This is the landing page for the Applied Statistics II course taught by Dr. Robin Donatello in Spring 2020. This page is used to post regular announcements and information for students in the class.

Chico State COVID-19 News & Information page: https://www.csuchico.edu/coronavirus/index.shtml


05-04-2020: We made it to the end.

And I’m super sad that it ended virtually. This class was the highlight of my days. I thoroughly enjoyed having each and every one of you with me on this ride this semester. I wish you all the best of luck in your future adventures, and know that I’ll keep our Slack channel open for a while (probably a year or so) in case you ever want to get a hold of me or someone else from the class. I’m also lurking around on Twitter if you use that platform.

I hope that you had a chance to work with existing friends and make new ones as part of our time together. I hope you feel proud of your accomplishments this semester. This class is an elective for most of you; even the minor is technically an elective for your program. You’ve chosen to invest in building your analytical skills, and I know it will benefit you in your career.

When this is all over and we’re allowed to meet up again, I’m always up for a chat over coffee. I truly love to hear how former students are doing.

Enough with the sappy end-of-semester closing monologue. Here are the things we’re going to work on today.

  1. Presentation expectations
  2. End of the semester Learning Journal entries: Reflections on the past 16 weeks.
    • Thinking Routine: I used to think… Now I think…
    • Lastly, and this is for my future course development, answer the following two questions:
      • What is one thing you are glad you learned in this class?
      • What is one thing you wished you had learned in this class?
  3. Stat modeling in the wild. Can we mathematically model when this pandemic will end?

“… Referred to in the original Post report as the ‘cubic model,’ it was attributed to former Council of Economic Advisers Chair Kevin Hassett, and the Post reported that ‘people with knowledge of that model say it shows deaths dropping precipitously in May — and essentially going to zero by May 15.’”

Discuss as a group what you think about this model. AFTER you talk with each other, google around to see what the scientific community thinks of this model. Here are a few places to start: [1] [2]
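To get the discussion started, here is a minimal sketch of what can happen when a cubic polynomial is fit to a time series and then extrapolated. This is not the model from the report, whose details are not given here. The data are simulated, and the names `day`, `deaths`, `cubic_fit`, and `forecast` are made up for illustration.

```r
# Hypothetical daily counts that rise, peak, then decline slowly.
# These numbers are simulated for illustration only; they are NOT real data.
set.seed(456)
day    <- 1:60
trend  <- 2000 * exp(-((day - 35)^2) / (2 * 15^2))  # bell-shaped trend
deaths <- round(trend + rnorm(length(day), sd = 75))

# Fit a cubic polynomial in time: pure curve fitting, with no
# epidemiological mechanism behind it.
cubic_fit <- lm(deaths ~ poly(day, 3, raw = TRUE))

# Extrapolate 30 days past the end of the observed data.
future   <- data.frame(day = 61:90)
forecast <- predict(cubic_fit, newdata = future)

# Within the observed range the fit can look reasonable ...
r2 <- summary(cubic_fit)$r.squared
r2

# ... but past the data the cubic just follows its own shape. In this
# simulation it heads toward impossible negative counts.
min(forecast)
```

Things to consider in your group: what does the polynomial "know" about how a pandemic behaves, and how would you check whether a forecast like this is believable?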


04-20-2020: End of new content.


04-13-2020: Cluster Analysis


04-06-2020:

Asynchronous discussions: asking, answering & discussing questions in Slack.

  • Still required, but timeline has been removed. Just get your questions & discussion in before the HW is due.
  • This is essential discussion for your learning. We don’t have opportunities to talk to each other in class.
  • You don’t have to have “the” answer to contribute meaningfully & work towards a discussion.

Office hours requirement

Come check in with me! Drop into open class time and ask a question. Check in with me to let me know you’re on track and making progress. This is not like coming to the principal’s office - you’re not in trouble! I’m not gonna get on your case if you’re behind. I want to help and support you, and it’s important for me to know how you all are doing.

Don’t forget about the exam error assessment.


03-30-2020:


03-23-2020: Getting back to business, Long but important notice about class changes.

A video version of this announcement is available in our Google Drive (“week 9 announcement”).

Changes due to Covid-19

See this page for comprehensive details on changes to the structure of our class. This is also linked under the ‘Materials’ tab in the navbar.

Gradebook

  • I switched priorities and will finish the exam before I go back to the homework and peer reviews.
  • Now that every educational tool is free, I’m testing out Gradescope to grade your exams.
  • You still have an opportunity to do an Exam error assessment.
    • You have until the end of the semester to get this done with me.
    • You can earn up to half the points you missed back. This will be added directly to your midterm grade.
  • I have added columns in the Blackboard gradebook to reflect assignments and points available for the rest of the semester.

Project Timeline Updates

  • The project presentation that we were going to do the Friday before finals will now be due Friday 3/27.
  • See the project page for more details, as there have been changes to the delivery requirements.

03-09-2020: Midterm, Project updates.

⚠️ UPDATE

Thursday

  • I still have OH at 1-2pm on Thursday.
  • This is a good time to make sure you know how to work Zoom.

Friday

  • We will continue with the plan to present the week 8 update.
  • See the [project] page for specifics and logistical updates.

Have a GOOD break! Use lots of 🧼 and stay away from 😷. I hope to see you all again in person real soon.

03-02-2020: Missing Data: Multiple Imputation with Chained Equations

  • Video on Multiple Imputation in Physics Education Research https://media.csuchico.edu/media/0_tgnydpgf
  • There is so much to cover regarding how to treat missing data that we could have an entire class on it. For more details, please refer to the textbook by a current leading researcher in this area and the creator of mice: Flexible Imputation of Missing Data, 2nd Ed, by Stef van Buuren: https://stefvanbuuren.name/fimd/
  • Quiz 3 has been posted. Due Thursday, group quiz on Friday.
  • Short turnaround time for HW3. Solutions will be posted on Monday, so no late assignments will be collected.
  • We have some open work days built into the schedule this week. That does not imply that you can skip class time.

HW 1 grading done.

  • Please read the HW 1 feedback in Google Drive. It contains responses to common problems throughout the homework.
  • If you got a 0, but submitted something, my guess is that you didn’t put the file into the final folder.
    • If this is the case, go ahead and resubmit and I’ll work on grading it when I have time, but it will be considered late.
  • HW 2 solutions will be posted shortly.

Status of analysis data sets

By now you should have:

  1. a data folder in your math456 folder containing all the data sets we work on in class and for homework.
  2. a data management (dm) file that reads in these data sets and performs some cleaning actions on them (e.g. converting to factors, collapsing levels). See examples here.
  3. a clean data set, named with the date that you last performed cleaning (e.g. lung_03012020.csv). This is the one you use in analysis.

For the full experience that is more similar to what you’re doing on your project, and what you will encounter in real world analysis, you should download the raw data files from the link above and make your own DM file. You can use the ones I have linked as starter code.
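If you are not sure what a DM file looks like, here is a minimal sketch of the read, clean, and write-a-dated-file pattern described above. It is not one of the linked starter files. The raw file, the variable names (`site`, `height_in`), the `-9` missing code, and the level labels are all made up so that the example runs on its own; your real raw data and codebook will differ.

```r
# --- Stand-in raw data (hypothetical) --------------------------------------
# In practice you download the real raw file. This tiny made-up file exists
# only so the sketch is self-contained.
data_dir <- file.path(tempdir(), "math456", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
raw_file <- file.path(data_dir, "lung_raw.txt")
write.table(
  data.frame(id        = 1:6,
             site      = c(1, 2, 3, 4, 1, 2),        # hypothetical numeric codes
             height_in = c(61, 67, -9, 70, 64, 72)), # -9 = hypothetical missing code
  file = raw_file, sep = "\t", row.names = FALSE
)

# --- 1. Read in the raw data -----------------------------------------------
lung <- read.delim(raw_file)

# --- 2. Cleaning actions ---------------------------------------------------
# Recode the missing-value code to NA.
lung$height_in[lung$height_in == -9] <- NA

# Convert a numeric code to a labeled factor (labels come from the codebook).
lung$site <- factor(lung$site, levels = 1:4, labels = c("A", "B", "C", "D"))

# Collapse four levels into two, keeping the original variable intact.
lung$site_grp <- lung$site
levels(lung$site_grp) <- list(AB = c("A", "B"), CD = c("C", "D"))

# Always check the result against the original.
table(lung$site, lung$site_grp, useNA = "always")

# --- 3. Write a clean data set named with today's date ---------------------
clean_file <- file.path(
  data_dir, paste0("lung_", format(Sys.Date(), "%m%d%Y"), ".csv")
)
write.csv(lung, clean_file, row.names = FALSE)
```

Your analysis files then read only the dated clean file, and the raw data never gets edited by hand.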

02-24-2020: Missing Data: Identification and Imputation

  • HW 01 feedback started. Please review for general info that applies to all HW.
  • Quiz 2 posted. Individual due Tuesday 2/25, Group on Wednesday.
  • HW 03 posted. Short turnaround time on it, check the calendar.
  • QFT on Monday - don’t forget the learning journal reflection afterward.

02-17-2020: Classification and Prediction

  • The Learning Journal section on the Learning Tools page has been updated to contain information on what LJ entries were assigned. I will update this on a more regular basis going forward.

PROJECT UPDATE

  • Project selection is over! Thank you to all who contributed project proposals.
    • All student proposals were funded!
    • You all have been assigned to either your 1st or 2nd choices.
    • Those that didn’t respond to the project selection form were placed on teams of my choosing.
  • You have been placed in a private Slack channel with your team members.
  • Check the Project page for details on your next deliverables (this week)

Report appearance

In the professional world, what matters is not just the correctness/quality of your work, but also how you present the results. There are several packages in R that help you with this by writing the HTML or LaTeX code for you.

While looking for updated references so I could write a list for you, I came across [these course notes] by Dr. Jeffrey Arnold that do a good job (even if they’re from 2018) of listing packages and giving examples of a few. I noticed that these notes also do a good job of explaining regression modeling concepts. There is a lot more to talk about than what I can go through this semester.

See the new code at the end of the logistic regression section of ASCN.

I highly recommend looking into the [kableExtra] package for most things related to printing nice tables (not raw regression output).

02-10-2020: Modeling Binary Outcomes

  • Little bit of recap on regression modeling for a continuous outcome, before we get into what to do when your outcome isn’t continuous.
  • We are going to only discuss the case when your outcome is binary, by fitting a Logistic regression model.

  • Quiz 1 is due by Monday 1pm
  • Draft of HW1 is due by Sunday EOD
  • Peer review opens Monday morning, due Wednesday
    • You have 2 assignments to peer review. The rotation sheet has been updated.

02-03-2020: Variable Selection

  • I will be reviewing and commenting on project proposals this week. Approved proposals will have until Saturday to create and post a video “pitch” to recruit other students to join their project.
  • Due to the lack of proposals submitted so far, I am extending the proposal deadline until Monday EOD.
    • Everyone has to work on a project.
    • I will have several project proposals for you to choose from. These won’t be nearly as fun or interesting as ones that you would choose yourself.
  • You should have started HW1 already. In fact, you should be done with a portion of HW 1 already. Quiz 1 is based on the content covered in HW1, will open Saturday, and close on Sunday EOD.

01-27-2020: Refresher on Linear regression modeling.

  • The main goal this week is to refresh/learn how to interpret all types of predictors in a linear regression model.
  • You’ll also get your first experience conducting a peer review this week.
  • Review the course notes and corresponding textbook chapters before class.

HW01 is not a “linear” homework, where early questions cover topics from early in the module and later questions cover later topics. Questions have various parts, some of which aren’t discussed directly in class until later. You may not be able to complete all parts of all questions until the end of this module. Do what you can.

General Guidance on Homework

  • Start early and do what you can. If you have any questions about the length or difficulty of my homework assignments, or about why you procrastinate at your own peril, just ask any of my prior students.
  • Don’t be afraid to ask questions. It is essential to your learning. This is a judgement-free class, as clearly stated in the [code of conduct] section of the syllabus.
  • Spell check your work, and for Pete’s sake look at your work before you turn it in and imagine if you had to print it out.
    • No 100 page documents because you accidentally printed out the whole data set.
    • Make sure all code is showing and readable
    • Make sure all text is readable and not sucked into a table or wrapping off the page.
  • All data sets referenced for the homework can be obtained from Dr. D’s [data website].
  • Don’t assume the data are clean.
    • Check the data type in R, versus the codebook.
  • We will be using the following data sets regularly: Depression, Lung function, Parental HIV.
    • For each data set we use you should maintain:
      1. the raw data
      2. the codebook
      3. a data management file
      4. a clean data set that is used for homework

The following file organization is recommended. Another option could be to have a subfolder for each data set.

math456/
  | - data/
        | - lung_081217.txt
        | - dm_lung.R (or .Rmd)
        | - lung_clean.Rdata
        | - depress_081217.txt
        | - dm_depress.R
        | - depress_clean.Rdata
  | - hw/
      | - hw01-modeling-userid.Rmd
      | - hw01-modeling-userid.pdf

01-20-2020: Welcome!

Office Hours Scheduled

Click this link to see the R code I use to choose office hours.

  • Welcome to spring semester in statistics with your host, Dr. Robin Donatello.
  • Make sure you acquaint yourself with this website, it will be your guide for the next 16 weeks.
  • We will use Google Drive for homework submission.
    • You will be added using your @mail.csuchico.edu email address.
    • I recommend you ⭐ the folder in your drive so you can find it easier.
  • We will use Slack for all outside of class communication.
    • Do not email me with questions relating to the class.
    • You must download the desktop app, or phone app (or both). Do not rely on logging into the web browser version. You will miss timely information.
    • You will be sent an invite link to your @mail.csuchico.edu email address. Accept and join the workspace by Wednesday.
  • Software Updates:
    • Update your version of RStudio for sure. Update R if you’re running a version older than 3.5.
  • If you’ve had me in a class before, you know that there will be typos and broken links on this webpage. Here is how you can contribute to the fix.
    • good: Notify me in the #general channel in Slack. That way I can confirm in public when the issue is resolved.
    • better: open up an Issue in GitHub
    • best: fork the repo and submit a PR with the fix!