Students will gain experience with the data science process including:
Students will learn about these topics by working on case studies based on problems that data scientists face in industry and academic research. Many of the cases will involve data collected by an organization or a scientist, published data, or data scraped from web pages. All projects will involve some form of collaboration or communication. Students are expected to be familiar with basic statistical methods for inference (e.g., general linear models) and prediction (e.g., linear and logistic regression), and to be comfortable with basic data analysis in a programming language such as R or Python. Students will be expected to adopt a reproducible research workflow using tools such as GitHub together with R Markdown or Jupyter.
Class time will be a mixture of informal lectures, class discussions, and student presentations.
All work will be graded on a scale from 1 to 4 (sometimes with pluses and minuses) where:
Grade value | Description |
---|---|
1 | Work does not meet expectations. |
2 | Work meets expectations minimally, possibly missing some of them. |
3 | Good work; meets all or most expectations. |
4 | Excellent work; exceeds expectations. |
Grades will almost always be 2 or 3 (1s and 4s are rare). Generally speaking, a 2 is a B, a 3 is an A, and a 4 is an A+.
Item | Description | Value |
---|---|---|
Lab #1 | In-class lab | 5.00% |
Lab #2 | In-class lab | 5.00% |
Lab #3 | In-class lab | 5.00% |
Project #1 | Project | 25.00% |
Project #2 | Project | 25.00% |
Project #3 | Project | 25.00% |
Reflection on projects | Written reflection | 5.00% |
Participation | Attendance, active participation in discussions, and preparation for class | 5.00% |
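The weights in the table above add up to 100%, so a final score on the 1 to 4 scale could be computed as a weighted average of the item scores. The sketch below assumes a straight weighted average with exactly one numeric score per item; the syllabus does not say how pluses and minuses translate into numbers, so the sketch simply accepts any value from 1 to 4. The names `WEIGHTS` and `weighted_grade` are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

import math

# Weights taken from the grading table (as fractions of the final grade).
# The name WEIGHTS is illustrative, not part of any course tooling.
WEIGHTS: dict[str, float] = {
    "Lab #1": 0.05,
    "Lab #2": 0.05,
    "Lab #3": 0.05,
    "Project #1": 0.25,
    "Project #2": 0.25,
    "Project #3": 0.25,
    "Reflection on projects": 0.05,
    "Participation": 0.05,
}

# Sanity check: the weights in the table should add up to 100%.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def weighted_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of per-item scores on the 1-4 scale.

    Hypothetical helper: assumes every graded item has exactly one
    numeric score between 1 and 4 (how +/- map to numbers is unspecified).
    """
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    unknown = sorted(set(scores) - set(WEIGHTS))
    if unknown:
        raise ValueError(f"unknown items: {unknown}")
    for item, score in scores.items():
        if not 1 <= score <= 4:
            raise ValueError(f"{item}: score {score} is outside the 1-4 scale")
    # Multiply each item's score by its weight and sum the contributions.
    return sum(weight * scores[item] for item, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Example: 2s on the three labs and 3s on everything else.
    example = {item: (2 if item.startswith("Lab") else 3) for item in WEIGHTS}
    print(f"{weighted_grade(example):.2f}")
```

Because the projects carry 75% of the weight, the project scores dominate this kind of average; the labs, reflection, and participation together shift it by at most a fraction of a grade point.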
This is a half-credit course that meets in both the fall and winter terms. Classes will meet approximately every two weeks.
| Class | Date | Description | Reading | Due |
|---|---|---|---|---|
| 1 | 10-Sep | Introduction to course; hypothesis testing in data analysis | ISLR: Ch. 2, 3.1; CASI: Ch. 1, 2 | |
| | 17-Sep | No class | | |
| 2 | 24-Sep | Multiple linear regression | ISLR: 3.2-3.6 | In-class lab #1 |
| | 01-Oct | No class | | In-class lab #2 |
| 3 | 08-Oct | Logistic regression and other linear models (GLMs) | ISLR: Ch. 4; CASI: Ch. 8 | |
| | 15-Oct | No class | | In-class lab #3 |
| 4 | 22-Oct | No class | | |
| | 29-Oct | Introduction to Project #1 | TBD | |
| | 05-Nov | Fall Reading Week (no class) | | |
| 5 | 12-Nov | Data visualization | | |
| | 19-Nov | No class | | |
| 6 | 26-Nov | Office hours | | |
| | 03-Dec | Student presentations for Project #1 | | Project #1 presentations |
| | 10-Dec | Winter Break | | |
| | 17-Dec | Winter Break | | |
| | 24-Dec | Winter Break | | |
| | 31-Dec | Winter Break | | |
| | 07-Jan | No class | | |
| 7 | 14-Jan | Introduction to Project #2: Diabetes Complications | Description will be posted after class | |
| | 21-Jan | Introduction to Project #2: RCT Fraud | Read descriptions posted on Quercus | |
| 8 | 28-Jan | No class | | |
| | 04-Feb | Office hours | | |
| 9 | 11-Feb | Student presentations for Project #2 | | Project #2 presentations |
| | 18-Feb | Reading Week (no class) | | |
| 10 | 25-Feb | Introduction to Project #3 | TBD | |
| | 03-Mar | No class | | |
| 11 | 10-Mar | Guest lecture: Dr. Inmar Givoni (Uber ATG), deep learning for self-driving | | |
| | 17-Mar | No class | | |
| 12 | 24-Mar | Student presentations for Project #3 | | Project #3 presentations |
| | 31-Mar | No class | | |
| | 07-Apr | No class | | |
Efron, Bradley, and Trevor Hastie. 2016. Computer Age Statistical Inference (CASI). Vol. 5. Cambridge University Press.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning (ISLR). Vol. 112. Springer.