BMI 534 - Introduction to Machine Learning
CS 534 - Machine Learning
The course is cross-listed as BMI 534 and CS 534, so please register for whichever section has available seats.
This is a summarized version of the syllabus. The detailed version will be available on the course Canvas page.
Instructor:
Hyeokhyen Kwon, Ph.D.
Assistant Professor
Department of Biomedical Informatics
Office: Rm 4105, 4th Floor, Emory Woodruff Memorial Research Building (101 Woodruff Cir, Atlanta, GA 30322)
Teaching Assistant: TBD
Course Overview
Machine learning is transforming applications across society, from autonomous vehicles to biomedicine and science. In this course, students will learn the fundamental theory (optimization, probability, linear algebra, etc.) and algorithms of machine learning (supervised and unsupervised learning, etc.), and will gain practical experience applying machine learning techniques and analyses to real-world problems in biomedical informatics.
Learning Objectives
This course introduces students to the fundamental theory and algorithms of machine learning through lectures, homework, a midterm, and a semester-long project. By the end of this course, students should be able to:
- Prepare datasets for machine-learning experiments
- Understand the basic building blocks and general principles that underlie machine learning algorithms
- Be familiar with specific, widely used machine learning algorithms for classification and regression
- Formulate rigorous validation protocols and evaluate the rigor of published experiments
- Understand the bias-variance tradeoff and strategies to mitigate overfitting
- Understand fundamental machine-learning algorithms presented in the second half of the course
- Learn methodology and tools to apply machine learning algorithms to real data and evaluate their effectiveness and performance
Prerequisites:
- Undergraduate-level linear algebra
- Undergraduate-level multivariate calculus
- Undergraduate-level statistics and probability theory
- Coding experience:
  - Python (required)
  - MATLAB (optional)
  - C/C++ (optional)
  - R (optional)
- Permission of the instructor (send the instructor an email)
Course Logistics
Communication & Course Materials:
- Canvas: The class syllabus, schedule, lecture slides, homework handouts and files, discussions, and grades are posted on Canvas. Please ask all questions in the Discussions section of Canvas. You are encouraged to answer other students’ questions when you know the answer. Do not ask or answer questions that reproduce homework problems verbatim.
- Email: For private matters specific to you (e.g., special accommodations or requests for alternative arrangements) or confidential matters you want to report, please email me with a subject line starting with “[CS534]”. I may not respond immediately, but I will try to respond within 24 hours (during particularly busy periods, I may need 48 hours). Please avoid last-minute questions that require immediate attention (e.g., right before a deadline) and plan accordingly.
Office Hours:
- Hyeokhyen Kwon: TBD
- TA-TBD: TBD
Textbook(s):
- Required:
  - “The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition” by Trevor Hastie, Robert Tibshirani & Jerome Friedman
- Supplemental:
  - “Machine Learning: A Probabilistic Perspective” by Kevin Murphy
  - “Pattern Recognition and Machine Learning” by Christopher Bishop
  - “A First Encounter with Machine Learning” by Max Welling
  - “A Course in Machine Learning” by Hal Daumé III
  - “Understanding Machine Learning: From Theory to Algorithms” by Shai Shalev-Shwartz & Shai Ben-David
Expectations & Grading:
Component | Weight |
---|---|
Participation | 10% |
Homework | 35% |
Midterm | 15% |
Project | 40% |
The detailed version of the syllabus, with instructions for each component and the grading policy, will be uploaded to Canvas.
University Policies and Academic Integrity
All class work is governed by the College Honor Code and departmental policy. All homework, code, and write-ups you submit must be your own work. Code or write-ups found to be substantially similar to another student’s are grounds for an honor code investigation by the Director of Graduate Studies, the Laney Graduate School, and the Honor Council. Additional extensions on assignments will be granted with appropriate documentation from the Office of Undergraduate Education (OUE).
A syllabus detailing these policies will be uploaded to Canvas.
(Tentative) Course Schedule
Topics may change, but the homework, midterm, and project deliverables are fixed. The reading material listed below is optional, and the lecture plan may change over the course of the semester.
Lecture | Date | Theme | Topic | Reference (Chapter) | Assignment |
---|---|---|---|---|---|
1 | 1/17 | Intro + Course Logistics | Review syllabus, overview of course topics | Ch. 1 (Hastie et al.); Ch. 1 (Murphy); Ch. 3 (Welling) | Homework #0 out (due 1/30) |
2 | 1/22 | Intro to Optimization | | Convex optimization notes, Parts I and II, from Stanford’s machine learning class; Rosenberg’s abridged notes | |
3 | 1/24 | Intro to Statistics, Probability, and Random Variables | Random variables, probability density functions, conditional and joint distributions, Bayes’ rule | Handouts | |
4 | 1/29 | Statistical Decision Theory + Linear Regression | Mapping machine learning problems to statistical concepts, regression, ridge regression | Ch. 1-2, 3.1-3.4 (Hastie et al.); Ch. 17.1-17.2 (Barber); Prof. Carlos Carvalho’s MLR slides | |
5 | 1/31 | Linear Regression + Naive Bayes | LASSO regression, elastic net regression | | Homework #1 out (due 2/13) |
6 | 2/5 | Linear Classification | Logistic regression, LDA, QDA | Ch. 2.1-2.4, 4.1-4.4 (Hastie et al.) | |
7 | 2/7 | Linear Classification + Bias-Variance Tradeoff | Training and test error, conditional and expected test error, bias-variance decomposition and tradeoff, training error optimism | Ch. 7.2-7.3 (Hastie et al.); Ch. 5.9 (Daumé III) | |
8 | 2/12 | Model Assessment + Error Measures | Validation as an estimation problem, cross-validation, bias and variance of cross-validation schemes, error measures, class imbalance, ROC analysis, precision-recall | Ch. 7.10 (Hastie et al.); Ch. 2.5-2.6 (Daumé III) | |
9 | 2/14 | Model Selection | Effective number of parameters, Akaike and Bayesian information criteria | Ch. 7 (Hastie et al.); Ch. 5.5-5.6 (Daumé III) | Homework #2 out (due 2/27) |
10 | 2/19 | Practical Issues | Preparing data, labeling issues, interpretation | Ch. 9-10 (Hastie et al.) | |
11 | 2/21 | Decision Trees | Decision trees, boosting | Ch. 9.2 (Hastie et al.); Ch. 1.3 (Daumé III) | |
12 | 2/26 | Perceptron + Support Vector Machines | Perceptron, SVM, kernel SVM | Ch. 12 (Hastie et al.); Ch. 4, 11 (Daumé III); Ch. 7-9 (Welling); Ch. 15 (Shalev-Shwartz & Ben-David); Stanford SVM notes; NYU SVM notes | |
13 | 2/28 | Neural Networks | Architectures, gradient optimization, backpropagation | Ch. 11 (Hastie et al.); Ch. 1-3 (Nielsen); Ch. 20.1-20.3 (Shalev-Shwartz & Ben-David) | Homework #3 out (due 3/14) |
14 | 3/4 | Neural Networks | | | Project proposal due 3/5 |
| | 3/6 | Spring Break | | | | |
| | 3/11 | Spring Break | | | | |
15 | 3/13 | Additive Models + Bootstrap | AdaBoost, gradient boosting | Ch. 7.11, 9.1 (Hastie et al.) | |
16 | 3/18 | Boosting | | Ch. 10 (Hastie et al.) | Homework #4 out (due 4/2) |
17 | 3/20 | Random Forest | Ensemble methods, random forests | Ch. 15-16 (Hastie et al.); Breiman’s paper | Project spotlight slides due 3/24 |
18 | 3/25 | Project Spotlight + Ensembles | | | |
19 | 3/27 | Prototype Methods + Challenges with High-Dimensional Data + Dimensionality Reduction | KNN, curse of dimensionality, sparse representation | Ch. 13-14, 18 (Hastie et al.); Ch. 3.2-3.3 (Daumé III); Ch. 5 (Welling); Ch. 19.1-19.2, 23 (Shalev-Shwartz & Ben-David); Stanford PCA notes | |
20 | 4/1 | Dimensionality Reduction | Principal component analysis, locally linear embedding, manifold learning | Ch. 14 (Hastie et al.) | |
21 | 4/3 | Clustering + Mixture Modeling | K-means, spectral clustering, expectation maximization | Ch. 14 (Hastie et al.) | Homework #5 out (due 4/16) |
22 | 4/8 | Reinforcement Learning | Markov decision processes | | |
23 | 4/10 | Reinforcement Learning | Q-learning | | |
24 | 4/15 | Bayesian Networks | Probabilistic graphical models | | |
25 | 4/17 | Filtering + Time-Series Analysis | Kalman filter, hidden Markov models | | |
26 | 4/22 | Midterm Exam | | | |
27 | 4/24 | Ethics in AI | | | |
28 | 4/29 | Project Presentations | | | Final report due 5/10 |