The theme of project #2 will be problems from two different healthcare domains.
Presentation: Feb. 11
Written Report: Feb. 24
Mathieu Ravaut (MScAC Alumni) - Machine Learning Research Scientist at Layer 6 AI will introduce this problem on Jan. 14
Use the Diabetes 130 US hospitals for years 1999-2008 data to predict readmission.
Shuang Di (MSc - Statistics Alumni) - Data Scientist at Digital Health Innovation, Hamilton Health Sciences will introduce this problem on Jan. 21
Data for this project will be available from Hamilton Health Sciences. To access the data for this project, please follow the instructions on this page.
Coming soon …
The assignment is to answer either the diabetes or fraud question using the data provided, but you will almost surely need to develop more focused questions. You will have to wrangle the data into a format that can be analysed using statistical methods, and draw appropriate conclusions.
This is an individual assignment: each student will pick a project to work on, although you may collaborate with your classmates. The presentation and written report are to be completed individually.
The written report should be at most six pages. This means that you will have to be selective in what you choose to report, and which plots you choose to display.
The written report is due on Feb. 24.
Do not include code in the report (in R Markdown use the chunk options echo=FALSE, warning = FALSE, message = FALSE, and in Jupyter use the command line tool nbconvert) unless there is some part of the code that will contribute to describing what you have done in the data analysis. Don’t submit a report with warning messages from a library you loaded in your report. For example, don’t do this:
The distribution of XX is shown below …
library(tidyverse)
set.seed(1028)
data.frame(x = rnorm(100)) %>% ggplot(aes(x)) + geom_histogram(colour = "white", fill = "darkblue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Do this:
The distribution of XX is shown below …
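As a minimal sketch of one way to get the clean version in R Markdown (assuming the report is knitted with knitr, and assuming 30 bins is an acceptable choice for this plot), a setup chunk can switch off code echoing, warnings, and messages for every chunk, and setting bins explicitly removes the stat_bin() message at its source:

```r
# Setup chunk near the top of the .Rmd file: hide code, warnings, and
# messages in the knitted report. The .Rmd file itself still contains
# all of the code.
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

library(tidyverse)

set.seed(1028)

# Same histogram as in the example, but with the number of bins stated
# explicitly (30 is an assumption here), so ggplot2 has no reason to
# print the "Pick better value with `binwidth`" message.
data.frame(x = rnorm(100)) %>%
  ggplot(aes(x)) +
  geom_histogram(bins = 30, colour = "white", fill = "darkblue")
```

In Jupyter, a roughly equivalent approach is to convert the notebook with nbconvert and its --no-input option, which leaves the input cells out of the converted document.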
Also, you will be submitting your R Markdown/Jupyter Notebook file so I can see all the gory details. This leads to …
What should be in the report? A high level description of what you have done. This leads to …
Who is the intended audience for the report and what do you mean by a “high level description”? The intended audience is an educated person who has taken at least one basic statistics course, but might be a bit rusty on the details. For example, your supervisor at work completed an MBA ten years ago and took a few statistics courses, but the details are a bit hazy.
Your writing will be evaluated for clarity and conciseness.
Title [1-5] There should be an appropriate title, adequate summary, and complete information including names and dates.
Introduction [1-5] The purpose of the research should be clearly stated and the scope of what is considered in the report should be clear.
Methods [1-5] The role of each method should be clearly stated. The description of the analyses should be clear and unambiguous so that another statistician or data scientist could easily re-construct it. The methods should be described accurately.
Results [1-5] There should be appropriate tables and graphs. The results should be clearly stated in the context of the problem. The size and direction of significant results should be given. The results must be accurately stated. The research question should be adequately answered.
Conclusion / Discussion [1-5] The results should be clearly and completely summarized. This section should also include discussion of limitations and/or concerns and/or suggestions for future consideration as appropriate.
General Considerations [1-5] The ideas should be presented in logical order, with well-organized sections, no grammatical, spelling, or punctuation errors, an appropriate level of technical detail, and be clear and easy to follow.
Presentations will take place on Feb. 11. The time allotted for each group is 10 minutes plus 5 minutes for discussion. This time limit will be enforced. If your group exceeds the time limit then you will be asked to stop the presentation. This means that you should rehearse your presentation timing before you present to the class.
The goal of the presentation is to effectively communicate your findings to a non-technical, but educated, audience (e.g., physicians or health care executives). This doesn’t mean that you shouldn’t include technical details, but you should aim to communicate the findings to an audience without a background in statistics, math, or computer science.
You will need to remind us about the project, but only tell us what we really need to know. We are curious about the results, and how your group presents the results, but they are not the only purpose of this presentation. So, what should you include? Examples of questions to consider as you and your partner prepare your presentation are: