A mini-homework about a data science study that used Twitter data to predict election outcomes.
For this module exercise, you will answer a series of questions that check your understanding of the material covered in the Module 1 lecture videos.
Introductory Statistics with Randomization and Simulation
Click here to download the textbook.
Chapter 1: Introduction to data
A mini-homework on editing R Markdown files and saving to GitHub.
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 2: GitHub
Section 2.1 – Getting started: http://book.cds101.com/getting-started.html
Section 2.2 – Navigating the GitHub site: http://book.cds101.com/navigating-the-github-site.html
Section 2.3 – Repositories: http://book.cds101.com/repositories.html
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 27: R Markdown
R Markdown: The Definitive Guide
Book URL: https://bookdown.org/yihui/rmarkdown
Chapter 2: Basics
Introduction: https://bookdown.org/yihui/rmarkdown/basics.html
Section 2.2 – Compile an R Markdown document: https://bookdown.org/yihui/rmarkdown/compile.html
Section 2.5 – Markdown syntax: https://bookdown.org/yihui/rmarkdown/markdown-syntax.html
Introduction to Data Science: Data Analysis and Prediction Algorithms with R
Book URL: https://rafalab.github.io/dsbook/
Selections from chapters 2, 39, and 40
Section 2.4.1 – RStudio: The panes: https://rafalab.github.io/dsbook/getting-started.html#the-panes
Section 2.4.2 – RStudio: Key bindings: https://rafalab.github.io/dsbook/getting-started.html#key-bindings
Section 2.4.4 – RStudio: Global options: https://rafalab.github.io/dsbook/getting-started.html#changing-global-options
Section 39.6 – Using Git and GitHub in RStudio: https://rafalab.github.io/dsbook/git.html#rstudio-git
Section 40.1 – RStudio projects: https://rafalab.github.io/dsbook/reproducible-projects-with-rstudio-and-r-markdown.html#rstudio-projects
A mini-homework to practice using RStudio to run code blocks in R Markdown files and to create visualizations using ggplot2.
Your first major assignment is a set of exercises based around a single dataset called rail_trail, which will provide you with practice in creating visualizations using R and ggplot2.
A mini-homework for practicing how to make plots using the ggplot2 library.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 3: Data visualisation
Introductory Statistics with Randomization and Simulation
Click here to download the textbook.
Chapter 1: Introduction to data
Section 1.6 – Examining numerical data, skip subsection 1.6.8
Section 1.7 – Considering categorical data
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 3: Describing numerical data
For your second major assignment, you will explore a dataset about the passengers on the Titanic, the British passenger liner that struck an iceberg during its maiden voyage and sank early in the morning on April 15, 1912.
A mini-homework for practicing how to manipulate datasets using the dplyr library.
For this module exercise, you will follow along with the examples from the Module 4 lecture videos in an R Markdown file.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 4: Workflow: basics
Chapter 5: Data transformation
A mini-homework for practicing how to reshape datasets using the tidyr library.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 12: Tidy data
For this module exercise, you will answer a series of questions that check your understanding of the material covered in the Module 5 lectures.
A mini-homework for practicing how to analyze data distributions using basic statistical functions in R, ggplot2, and dplyr.
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 4: Representing distributions
Introduction: http://book.cds101.com/representing-distributions.html
Section 4.1 – Probability mass functions: http://book.cds101.com/probability-mass-functions.html
Section 4.2 – Cumulative distribution functions: http://book.cds101.com/cumulative-distribution-functions.html
For this module exercise, you will answer a series of questions that check your understanding of the material covered in the Module 6 lectures.
For your third major homework assignment, you will use statistical inference to answer a question about the National Survey of Family Growth, Cycle 6 dataset published by the National Center for Health Statistics.
A mini-homework for practicing how to conduct hypothesis tests and calculate confidence intervals using the infer package.
Introductory Statistics with Randomization and Simulation
Click here to download the textbook.
Chapter 2: Foundation for inference
Chapter 4: Inference for numerical data
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 5: Statistical inference with infer
For your fourth major homework assignment, you will build a regression model that predicts the market value of condominiums in New York City using a dataset published by the New York City Department of Finance.
A mini-homework for practicing how to build and analyze linear regression models.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 23: Model basics
Introductory Statistics with Randomization and Simulation
Click here to download the textbook.
Chapter 5: Introduction to linear regression
Introduction
Section 5.1
Section 5.4, read subsection 5.4.1 only
For the final project, you will be assigned to a team to conduct an exploratory data analysis of the U.S. Department of Education’s