Housekeeping
- Website organization update: added a Materials page.
- Start bringing headphones to class. You’ll be watching short videos and taking notes during class.
- ISLR Logistics
- We will use a single ISLR repo for all work. I will add files to it regularly and review your work using the feedback branch. Don’t merge the feedback branch. Push and pull regularly.
- Before class: watch the videos and prepare answers to the selected questions.
- During class: discuss and compare your answers with classmates; sometimes we will watch additional videos and share out.
- After class: finish answering any remaining questions and complete any assigned practice problems and/or activities.
- Project Logistics
- Weekly Wednesday updates to the project page on the ADS website.
- First 10-minute project report-out on Thursday the 10th, then every 2 weeks. Present on the overhead screen to the class and possibly to your client via Zoom.
- Reading logistics
- Reading discussion every other week
- Learning journal updates on off weeks.
Learning Path
Where we’ve been
- Getting oriented with the semester-long project
- Practicing data wrangling and report writing for a professional audience.
Where we’re at
- Learning how to balance textbook learning and project based learning while keeping the broader ethical implications in mind.
- If you don’t have an organizational schedule for your classes yet, make one as soon as possible. The workload in this class is going to ramp up a bit.
Where we’re going
- Digging into mathematical models of statistical learning.
- Learning new R coding methods and practicing building models.
Learning Objectives
- Describe the difference between training and testing data sets
- Describe the difference between a parametric and non-parametric model
- Identify and describe situations where classification, regression, and clustering models are appropriate.
- Explain the concepts of overfitting and the bias-variance trade-off.
Tuesday
Prepare
👥 Discuss these questions in your group and write your answers in the ch2-statistical-learning.Rmd file in your ISLR repo. You may not finish all the questions in the time allotted during class; finish the rest outside of class.
- What is f?
- Why do we care about estimating f?
- Describe the two types of errors in a model.
- What concept from 456 does the irreducible error portion of the model correspond to?
- Summarize the curse of dimensionality.
- What is the difference between a parametric & non-parametric model? Give an example of each.
- What are the advantages & disadvantages of a parametric approach to regression or classification (as opposed to a nonparametric approach)?
- Why would we ever choose to use a more restrictive method instead of a very flexible approach?
- What is the difference between supervised & unsupervised learning? Give an example of each.
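For reference while working through the first few questions, here is the decomposition ISLR Chapter 2 uses to split prediction error into reducible and irreducible parts. It assumes $Y = f(X) + \epsilon$ with $\epsilon$ independent of $X$ and mean zero, and treats $\hat{f}$ and $X$ as fixed; the cross term vanishes because $E(\epsilon) = 0$.

```latex
Y = f(X) + \epsilon, \qquad \hat{Y} = \hat{f}(X)

E(Y - \hat{Y})^2 = E\left[f(X) + \epsilon - \hat{f}(X)\right]^2
  = \underbrace{\left[f(X) - \hat{f}(X)\right]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```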
Thursday
Prepare
- Finish ISLR Chapter 2
- Open project work time
- What is the primary measure of model accuracy for regression models?
- Compare and contrast a smoothing spline with a linear regression line. (What is the same? What is different?)
- What’s the difference between training and testing data? Why do we need both?
- What is overfitting?
- If we don’t have a testing data set, what method can we use to estimate the MSE of the testing data?
- What is the bias-variance trade-off?
- What is the primary measure of model accuracy for classification models?
- Describe the Bayes classifier.
- What is the Bayes error rate?
- What is a limitation of the Bayes classifier?
- Describe how the K-Nearest Neighbors classifier works.
- Name a benefit of using a KNN model.
- What happens to the accuracy/bias of the model as K increases? Why?
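The KNN and training/testing questions can be explored with a small, self-contained sketch. It is written in Python with only the standard library rather than in R, purely for illustration; the toy data, the helper names (`knn_predict`, `error_rate`, `make_data`), the sample sizes, and the K values are all hypothetical choices, not part of the course materials. Each point is classified by majority vote among its K nearest training points (Euclidean distance), and the script reports training and test error rates for several values of K.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, point, k):
    """Predict a label for `point` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], point))
    # Count labels of the k closest neighbors; on a tie, Counter.most_common
    # keeps first-encountered order, so the label of the nearer neighbor wins.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_x, train_y, eval_x, eval_y, k):
    """Fraction of evaluation points whose predicted label differs from the true label."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(eval_x, eval_y))
    return wrong / len(eval_y)


def make_data(n, seed):
    """Hypothetical toy data: two overlapping 2-D Gaussian clouds labeled 'A' and 'B'."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        label = rng.choice(["A", "B"])
        center = 0.0 if label == "A" else 1.5
        xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        ys.append(label)
    return xs, ys


if __name__ == "__main__":
    # Separate random draws play the roles of the training and testing sets.
    train_x, train_y = make_data(200, seed=1)
    test_x, test_y = make_data(200, seed=2)
    for k in (1, 5, 25, 100):
        train_err = error_rate(train_x, train_y, train_x, train_y, k)
        test_err = error_rate(train_x, train_y, test_x, test_y, k)
        print(f"K={k:>3}  training error={train_err:.3f}  test error={test_err:.3f}")
```

With K = 1 the training error is zero, because each training point is its own nearest neighbor, while the test error is typically higher. As K grows, the decision boundary becomes smoother (lower variance, higher bias). The exact numbers depend on the random toy data, so none are listed here.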
Assignments
Full details in your ISLR repo.
- ISLR Ch 2 Exercises: 1, 2, 4 (One example each)
- Learning Tidymodels