About the Trainings

Many class sessions have both interactive Modules, courtesy of Data Camp¹, and Walkthroughs created by me. Work through these after doing the readings and reviewing the corresponding content (if applicable). The lessons are a central part of the class and focus on the tidyverse family of packages, though these approaches are certainly not the only ways to wrangle, clean, analyze, and visualize data in R.

Advice

Carve out some time every day to go through these. If you try to complete everything in one sitting, it will probably be overwhelming! However, if you are already familiar with some modules, please feel free to work ahead.

Grading

The ultimate point of Data Camp is to familiarize you with an environment that you have likely never seen or been exposed to. While you should absolutely go through each module, there is no expectation that you will get everything right. In fact, the points you earn or lose in Data Camp have no bearing on how you are assessed, so please use hints as needed! As with all things data science, you'll learn by doing. If you have a polar personality type when it comes to work (i.e., primarily a perfectionist or mostly careless), then the modules will likely prove to be a challenge. Pushing past your limits to master everything at once, or conversely assuming it will just come to you, is unlikely to work. So please work hard, but also take breaks, swear², look things up on the Internet, ask peers, or reach out for help. Your score is based on putting in a solid effort rather than on getting everything perfect, because perfection is not realistic when it comes to data.

Data Camp Schedule

A tentative schedule is given below. The Module and Chapters entries are the Data Camp course and chapter titles³:

Required

The following modules are required and will count toward your final grade.

Week | Due | Module | Chapters
Week 1 | 8/23/22 | Introduction to R | Intro to basics; Vectors; Matrices; Factors; Data Frames; Lists
Week 2 | 8/30/22 | Introduction to the Tidyverse | Data wrangling; Data visualization; Grouping and summarizing; Types of visualizations
Week 3 | 9/6/22 | Introduction to Data Visualization with ggplot2 | Explore your data; Tame your data; Tidy your data; Transform your data
Week 4 | 9/13/22 | Data Manipulation with dplyr | Transforming Data with dplyr; Aggregating Data; Selecting and Transforming Data; Case Study: The babynames Dataset
Week 5 | 9/20/22 | Reshaping Data with tidyr | Tidy Data; From Wide to Long and Back; Expanding Data; Rectangling Data
Week 6 | 9/27/22 | Joining Data with dplyr | Joining Tables; Left and Right Joins; Full, Semi, and Anti Joins; Case Study: Joins on Stack Overflow Data
Week 7 | 10/4/22 | Categorical Data in the Tidyverse | Introduction to Factor Variables; Manipulating Factor Variables; Creating Factor Variables; Case Study on Flight Etiquette
Week 8 | 10/11/22 | Modeling with Data in the Tidyverse | Introduction to Modeling; Modeling with Basic Regression; Modeling with Multiple Regression; Model Assessment and Selection
Week 9 | 10/18/22 | Fundamentals of Bayesian Data Analysis in R | What is Bayesian Data Analysis?; How does Bayesian inference work?; Why use Bayesian Data Analysis?; Bayesian inference with Bayes’ theorem
Week 10 | 10/25/22 | Survey and Measurement Development in R | Preparing to analyze survey data; Exploratory factor analysis & survey development; Confirmatory factor analysis & construct validation; Criterion validity & replication
Week 11 | 11/1/22 | Analyzing Survey Data in R | Introduction to survey data; Exploring categorical data; Exploring quantitative data; Modeling quantitative data
Week 12 | 11/8/22 | Introduction to Text Analysis in R | Wrangling Text; Visualizing Text; Sentiment Analysis; Topic Modeling
Week 13 | 11/8/22 | Text Mining with Bag-of-Words in R | Jumping into text mining with bag of words; Word clouds and more interesting visuals; Adding to your tm skills; Battle of the tech giants for talent
Week 14 | 11/15/22 | Sentiment Analysis in R | Fast & dirty: Polarity scoring; Sentiment analysis the tidytext way; Visualizing sentiment; Case study: Airbnb reviews

Recommended

The following modules are optional but highly recommended.

Module | Chapters
Communicating with Data in the Tidyverse | Custom ggplot2 themes; Creating a custom and unique visualization; Introduction to RMarkdown; Customizing your RMarkdown report
Dealing with Missing Data in R | Why care about missing data?; Wrangling and tidying up missing values; Testing missing relationships; Connecting the dots (Imputation)
Intermediate R | Conditionals and Control Flow; Loops; Functions; The apply family; Utilities
String Manipulation with stringr in R | String basics; Introduction to stringr; Pattern matching with regular expressions; More advanced matching and manipulation; Case studies

Extra Credit

The following modules are optional and may count as extra credit, contingent on successful completion of the Data Camp course and a corresponding assessment submitted via eCampus. Please note that each module depends on the one before it.

Due | Module | Chapters
12/9/22 | Analyzing US Census Data in R | Census data in R with tidycensus; Wrangling US Census Data; US Census geographic data in R; Mapping US Census Data
12/9/22 | Intermediate Data Visualization with ggplot2 | Statistics; Coordinates; Facets; Best Practices
12/9/22 | Introduction to Natural Language Processing in R | True Fundamentals; Representations of Text; Applications: Classification and Topic Modeling; Advanced Techniques
12/9/22 | Machine Learning in the Tidyverse | Foundations of “tidy” Machine learning; Multiple Models with broom; Build, Tune & Evaluate Regression Models; Build, Tune & Evaluate Classification Models

R Tasks

In some weeks, you will also be expected to complete an additional R task. These weeks are marked in the table above. Collectively, these tasks serve as the R Survey EDA noted on the syllabus.

Working Ahead

By no means do you have to wait for a particular module to be assigned. If you wish to enroll in a training, whether or not it is assigned, simply search for the name of that course on the Data Camp site. For modules assigned in this course, you will receive credit after the due date has passed.

Need Help?

While I am happy to meet face-to-face, it is just as easy to schedule a Zoom session, either through the calendar or by messaging me on Slack with @Dr. Abhik Roy tagged in your message.


  1. Please note that if you have (1) used Data Camp before and (2) logged in with the same username, any module you have already completed successfully will not need to be done again.

  2. And curse my name if you have to.

  3. Subject to change with notice.