About the Trainings
Many class sessions have both interactive Modules, courtesy of DataCamp, and Walkthroughs created by me. You will need to work through them after doing the readings and reviewing the corresponding content (if applicable). The lessons are a central part of the class and focus on the tidyverse family of packages, though these approaches are certainly not the only ways to wrangle, clean, analyze, and visualize data in R.
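As a small illustration of that last point, here is a minimal sketch (not taken from any assigned module) that computes the same summary twice, once with dplyr and once with base R. The built-in `mtcars` dataset and the names `tidy_result` and `base_result` are assumptions chosen only for this example.

```r
library(dplyr)

# tidyverse (dplyr) approach: mean miles per gallon by cylinder count
tidy_result <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# base R approach to the same summary
base_result <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Compare the two sets of group means
all.equal(tidy_result$mean_mpg, base_result$mpg)
```

Both versions produce one row per cylinder count, so which one you use is largely a matter of readability and habit.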
Advice
Carve out some time every day to go through these. If you try to complete everything in one sitting, it will probably be overwhelming! However, if you are already familiar with some modules, please feel free to work ahead.
Grading
The ultimate point of DataCamp is to familiarize you with an environment that you have likely never seen or been exposed to. While you should absolutely go through each module, there is no expectation that you will get everything right. In fact, the points you earn have no bearing on how you are assessed, so please use hints as needed! As with all things data science, you’ll learn by doing. If you have a polar personality type when it comes to work (i.e., primarily a perfectionist or mostly careless), then the modules will likely prove to be a challenge. The chance that you will comprehend everything by pushing beyond your limit or, conversely, by assuming it will just come to you is low, so please work hard but also take breaks, swear, look on the Internet, ask peers, or reach out for help. Your score is predicated on putting in a solid effort rather than getting it perfect, because that’s not realistic when it comes to data.
DataCamp Schedule
A tentative schedule is given below. The Course and Chapter names are the DataCamp titles:
Required
The following modules are required and will count toward your final grade.
| Week | Due | Required | Task | Module | Chapters |
|---|---|---|---|---|---|
| Week 1 | 8/23/22 | | | Introduction to R | Intro to basics |
| | | | | | Vectors |
| | | | | | Matrices |
| | | | | | Factors |
| | | | | | Data Frames |
| | | | | | Lists |
| Week 2 | 8/30/22 | | | Introduction to the Tidyverse | Data wrangling |
| | | | | | Data visualization |
| | | | | | Grouping and summarizing |
| | | | | | Types of visualizations |
| Week 3 | 9/6/22 | | | Introduction to Data Visualization with ggplot2 | Explore your data |
| | | | | | Tame your data |
| | | | | | Tidy your data |
| | | | | | Transform your data |
| Week 4 | 9/13/22 | | | Data Manipulation with dplyr | Transforming Data with dplyr |
| | | | | | Aggregating Data |
| | | | | | Selecting and Transforming Data |
| | | | | | Case Study: The babynames Dataset |
| Week 5 | 9/20/22 | | | Reshaping Data with tidyr | Tidy Data |
| | | | | | From Wide to Long and Back |
| | | | | | Expanding Data |
| | | | | | Rectangling Data |
| Week 6 | 9/27/22 | | | Joining Data with dplyr | Joining Tables |
| | | | | | Left and Right Joins |
| | | | | | Full, Semi, and Anti Joins |
| | | | | | Case Study: Joins on Stack Overflow Data |
| Week 7 | 10/4/22 | | | Categorical Data in the Tidyverse | Introduction to Factor Variables |
| | | | | | Manipulating Factor Variables |
| | | | | | Creating Factor Variables |
| | | | | | Case Study on Flight Etiquette |
| Week 8 | 10/11/22 | | | Modeling with Data in the Tidyverse | Introduction to Modeling |
| | | | | | Modeling with Basic Regression |
| | | | | | Modeling with Multiple Regression |
| | | | | | Model Assessment and Selection |
| Week 9 | 10/18/22 | | | Fundamentals of Bayesian Data Analysis in R | What is Bayesian Data Analysis? |
| | | | | | How does Bayesian inference work? |
| | | | | | Why use Bayesian Data Analysis? |
| | | | | | Bayesian inference with Bayes’ theorem |
| Week 10 | 10/25/22 | | | Survey and Measurement Development in R | Preparing to analyze survey data |
| | | | | | Exploratory factor analysis & survey development |
| | | | | | Confirmatory factor analysis & construct validation |
| | | | | | Criterion validity & replication |
| Week 11 | 11/1/22 | | | Analyzing Survey Data in R | Introduction to survey data |
| | | | | | Exploring categorical data |
| | | | | | Exploring quantitative data |
| | | | | | Modeling quantitative data |
| Week 12 | 11/8/22 | | | Introduction to Text Analysis in R | Wrangling Text |
| | | | | | Visualizing Text |
| | | | | | Sentiment Analysis |
| | | | | | Topic Modeling |
| Week 13 | 11/8/22 | | | Text Mining with Bag-of-Words in R | Jumping into text mining with bag of words |
| | | | | | Word clouds and more interesting visuals |
| | | | | | Adding to your tm skills |
| | | | | | Battle of the tech giants for talent |
| Week 14 | 11/15/22 | | | Sentiment Analysis in R | Fast & dirty: Polarity scoring |
| | | | | | Sentiment analysis the tidytext way |
| | | | | | Visualizing sentiment |
| | | | | | Case study: Airbnb reviews |
Recommended
The following modules are optional but highly recommended.
| Required | Task | Module | Chapters |
|---|---|---|---|
| | | Communicating with Data in the Tidyverse | Custom ggplot2 themes |
| | | | Creating a custom and unique visualization |
| | | | Introduction to Rmarkdown |
| | | | Customizing your RMarkdown report |
| | | Dealing with Missing Data in R | Why care about missing data? |
| | | | Wrangling and tidying up missing values |
| | | | Testing missing relationships |
| | | | Connecting the dots (Imputation) |
| | | Intermediate R | Conditionals and Control Flow |
| | | | Loops |
| | | | Functions |
| | | | The apply family |
| | | | Utilities |
| | | String Manipulation with stringr in R | String basics |
| | | | Introduction to stringr |
| | | | Pattern matching with regular expressions |
| | | | More advanced matching and manipulation |
| | | | Case studies |
Extra Credit
The following modules are optional and may count as extra credit, contingent on successful completion of the DataCamp course and the corresponding assessment, which is to be submitted via eCampus. Please note that each subsequent module depends on the previous one.
| Due | Required | Task | Module | Chapters |
|---|---|---|---|---|
| 12/9/22 | | | Analyzing US Census Data in R | Census data in R with tidycensus |
| | | | | Wrangling US Census Data |
| | | | | US Census geographic data in R |
| | | | | Mapping US Census Data |
| 12/9/22 | | | Intermediate Data Visualization with ggplot2 | Statistics |
| | | | | Coordinates |
| | | | | Facets |
| | | | | Best Practices |
| 12/9/22 | | | Introduction to Natural Language Processing in R | True Fundamentals |
| | | | | Representations of Text |
| | | | | Applications: Classification and Topic Modeling |
| | | | | Advanced Techniques |
| 12/9/22 | | | Machine Learning in the Tidyverse | Foundations of “tidy” Machine learning |
| | | | | Multiple Models with broom |
| | | | | Build, Tune & Evaluate Regression Models |
| | | | | Build, Tune & Evaluate Classification Models |
R Tasks
In some weeks you will be expected to complete an additional R task; these are indicated in the Task column of the table above. Collectively, these tasks serve as the R Survey EDA noted on the syllabus.
Working Ahead
By no means do you have to wait for a particular module to be assigned. If you wish to enroll in a training, whether assigned or not, simply search for the name of that course on the DataCamp site. For modules assigned for this course, you will receive credit after the due date has passed.
Need Help?
While I am happy to meet face-to-face, it is just as easy to schedule a Zoom session using the calendar or to notify me on Slack by adding @Dr. Abhik Roy to your message.