STATS 315B - Modern Applied Statistics
This course provides a broad and deep treatment of modern statistical machine learning. Students entering the course are assumed to have a working knowledge of statistics, probability, and basic machine learning concepts, though the material is designed to be broadly accessible.
The course starts with a quick review of linear regression and classification, error metrics, and the bias-variance tradeoff. We then delve into decision trees and deep learning techniques for non-linear regression and classification tasks. The subsequent modules move beyond regression and classification and turn to discovering patterns and low-dimensional structure via unsupervised learning; topics include clustering, dimensionality reduction and autoencoding methods, and matrix factorization. The final module treats time series and sequential data via state space models and deep learning methods.
Teaching team
Kevin Han
Amber Hu
Annette Jing
Course Logistics
When: Class meets Mondays and Wednesdays, 1:30–2:50 pm Pacific Time.
Where: Class will be in person at Gates B3.
Links:
- Ed: Ed is the main channel of communication between you and the teaching team: we will post all important announcements there, and you should ask all course-related questions there. For personal matters that you don’t wish to put in a private Ed post, you can email the teaching staff at stats315b-spr2223-staff@lists.stanford.edu.
- Canvas: The course Canvas page contains links and resources only accessible to students.
- Gradescope: We use Gradescope to manage coursework (assignment submission and grade return). Please use your @stanford.edu email address to sign up for a Gradescope account.
Prerequisites: A well-prepared student will have knowledge of:
- Math:
  - Linear algebra (matrix/vector operations, orthogonality, etc.)
  - Multivariate calculus (gradients, partial derivatives)
- Probability:
  - Random variables, expectations, the Gaussian distribution, conditional and marginal distributions
- Statistics / machine learning basics:
  - Linear regression and classification and, ideally, overfitting and the bias-variance tradeoff
  - Parameter estimation, including via maximum likelihood estimation
- Programming proficiency in:
  - Python (preferred for this course), or
  - R, Julia, etc., with the ability to (i) pivot to Python with starter code or (ii) code independently in your chosen language
In our experience, motivated students with a strong quantitative background are able to catch up and fully participate.
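To give a rough, unofficial sense of the expected baseline, the sketch below simulates data and fits a linear regression by ordinary least squares in NumPy, touching the linear algebra, maximum likelihood, and Python prerequisites at once. The setup and numbers are purely illustrative, not course material.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                        # simulated design matrix
beta_true = np.array([1.5, -2.0, 0.5])             # hypothetical "true" coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)  # noisy responses

# Ordinary least squares fit; under Gaussian noise this is also the
# maximum likelihood estimate of the coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to beta_true
```

If reading and writing code at this level feels comfortable, the programming side of the course should pose no difficulty.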
Course Grade: The course grade will be based on the following components.
- 5 Homework Assignments (45%): HW0 (5%), HW1–HW4 (40%)
- Concept Quizzes (15%)
- Final Project (40%): Project midway (8%), Project presentation (8%), Project report (24%)
Textbooks:
- Probabilistic Machine Learning: An Introduction; Kevin Murphy
- Probabilistic Machine Learning: Advanced Topics; Kevin Murphy
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition (12th printing); Trevor Hastie, Robert Tibshirani, Jerome Friedman
Other references:
- A Course in Machine Learning; Hal Daumé III
- Pattern Recognition and Machine Learning; Chris Bishop
- Machine Learning; Tom Mitchell
- Information Theory, Inference, and Learning Algorithms; David MacKay