Six New Data Analytics Case Studies Available!

We are excited to announce six new case studies in our collection! Each case pairs a real-world scenario with a dataset so students can practice data analytics in domains ranging from retail and fitness to public health and urban transportation.

Here's a brief overview of each new case:

Case 1: Customer Revenue Analysis

Author: Vic Anand

Overview: Dive into the financials of "Holder's Boulders," an indoor rock-climbing gym. The owner, Sally, is trying to understand why her four-year-old business isn't thriving. Her cost structure is typical, so she suspects the problem lies in revenue generation. Using transaction-level data, this case challenges learners to analyze customer spending patterns to uncover insights.

What Learners Gain: Students will practice extracting customer insights from raw transaction data, use histograms and cumulative distribution functions (CDFs) to visualize customer spending and visit frequency, interpret those visualizations to make actionable business recommendations, and gain hands-on experience with data cleaning, filtering, and aggregation in the Python Pandas library.

Language/Tools: Python and Pandas

Link to the Customer Revenue Analysis Case
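To give a flavor of the techniques in Case 1, here is a minimal sketch of aggregating transactions per customer and computing an empirical CDF of spending. The toy data and the column names (customer_id, amount) are assumptions made for illustration; the case's actual dataset and notebook may be organized differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy transactions (not the case's data; column names are assumed).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "amount": [20.0, 15.0, 50.0, 10.0, 10.0, 5.0, 100.0],
})

# Aggregate to one row per customer: total spend and number of transactions.
per_customer = (
    transactions.groupby("customer_id")["amount"]
    .agg(total_spend="sum", visits="count")
    .reset_index()
)

# Empirical CDF: the share of customers spending at most each observed amount.
spend_sorted = np.sort(per_customer["total_spend"].to_numpy())
cdf = np.arange(1, len(spend_sorted) + 1) / len(spend_sorted)

# Histogram counts of total spend across four equal-width bins.
counts, bin_edges = np.histogram(spend_sorted, bins=4)

print(per_customer)
print(list(zip(spend_sorted, cdf)))
```

Plotting spend_sorted against cdf (for example with a step plot) shows at a glance what fraction of customers fall below any spending level.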

Case 2: Merging Data in Pandas Tutorial

Author: Vic Anand

Overview: This tutorial tackles the essential, yet sometimes tricky, task of merging datasets using Pandas. It uses data from the "Northwind Traders" sample database, representing a gourmet food retailer, to explain core concepts. It covers why relational databases split information (e.g., customer data, order data) and how merging brings it together for analysis.

What Learners Gain: Students gain practical experience merging Pandas data frames and learn the different types of joins (inner, left, right) and when to use each. The case also reinforces the importance of unique identifiers, introduces typical data structures in retail and e-commerce, and gives practice loading data from Excel and creating pivot tables in Pandas.

Language/Tools: Python, Pandas, and NumPy

Link to the Merging Data in Pandas Tutorial
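As a rough illustration of the join types covered in Case 2, the sketch below merges two small tables three ways. The tables and column names are invented for this example and are not taken from the Northwind Traders data.

```python
import pandas as pd

# Hypothetical tables (invented for illustration, not the Northwind data).
customers = pd.DataFrame({
    "customer_id": ["A", "B", "C"],
    "company": ["Alpha Foods", "Beta Market", "Gamma Deli"],
})
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["A", "A", "D"],
    "total": [120.0, 80.0, 45.0],
})

# The key should uniquely identify rows on the "one" side of the merge.
assert customers["customer_id"].is_unique

# Inner join: keep only customer_ids present in both tables.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: keep every customer; customers without orders get NaN order columns.
left = customers.merge(orders, on="customer_id", how="left")

# Right join: keep every order; orders without a matching customer get NaN company.
right = customers.merge(orders, on="customer_id", how="right")

print(len(inner), len(left), len(right))
```

Comparing row counts before and after a merge is a quick check that the join did what was intended, especially when keys are missing or duplicated.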

Case 3: Assessing Health Beliefs and Behaviors (Illinois Workplace Wellness Study)

Authors: David Molitor and Julian Reif

Overview: Explore the complex relationship between what people believe about their health and their actual health behaviors and outcomes. This case uses data from the Illinois Workplace Wellness Study, a large-scale randomized controlled trial (RCT) evaluating the "iThrive" wellness program. It delves into the challenges of accurately measuring health beliefs and assessing the true impact of wellness interventions, accounting for factors like selection bias. The study collected both self-reported health beliefs and clinical biometric data.

What Learners Gain: Students will understand how health beliefs relate to health behaviors and outcomes, learn how methodologies like RCTs establish the causal effects of health interventions, analyze real-world data comparing subjective beliefs with objective health measures, apply statistical techniques like linear regression and correlation analysis, and evaluate the multifaceted effects of a workplace wellness program.

Language/Tools: R

Link to the Health Beliefs Case

Case 4: Measuring Productivity using Principal Component Analysis (Illinois Workplace Wellness Study)

Authors: David Molitor and Julian Reif

Overview: This case introduces Principal Component Analysis (PCA), a powerful statistical technique for reducing the complexity of datasets with many related variables. Using data from the Illinois Workplace Wellness Study, students will apply PCA to construct a single "productivity index" from various employee self-reported metrics like sick days, job satisfaction, and focus.

What Learners Gain: Learners will gain an understanding of PCA as a dimensionality reduction tool, learn how to apply PCA to create a meaningful summary index from multiple variables, interpret PCA results (loadings), practice using correlation matrices to understand relationships between variables, and analyze the stability of the created index over time.

Language/Tools: Stata

Link to the Measuring Productivity Case
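For readers new to PCA, the following sketch shows the general idea of building a single index from several correlated variables. It is written in Python with NumPy purely for illustration (the case itself uses Stata), and the data are simulated, not drawn from the Illinois Workplace Wellness Study.

```python
import numpy as np

# Simulated data: three noisy measures driven by one hidden factor (an assumption).
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(3)])

# Standardize each variable so PCA operates on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(Z, rowvar=False)

# Eigen-decomposition of the correlation matrix, sorted by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Weights of the first principal component; the sign is arbitrary, so fix it.
loadings = eigvecs[:, 0]
if loadings.sum() < 0:
    loadings = -loadings

# The index is each observation's score on the first component.
index = Z @ loadings
explained = eigvals / eigvals.sum()

print("loadings:", loadings)
print("share of variance explained:", explained)
```

The share of variance explained by the first component indicates how well a single index summarizes the underlying variables.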

Case 5: Teenage Driving and Mortality (Regression Discontinuity)

Authors: David Molitor and Julian Reif

Overview: Motor vehicle accidents are a leading cause of death for young adults, with inexperienced teenage drivers at particularly high risk. This case examines the causal impact of obtaining a driver's license on mortality rates. It explains why traditional comparisons are vulnerable to selection bias and introduces the Regression Discontinuity (RD) research design. RD leverages the sharp cutoff of the minimum legal driving age (MLDA) to create a natural experiment, comparing outcomes for teens just below and just above the eligibility age. The analysis uses comprehensive US death certificate data from 1983 to 2014.

What Learners Gain: Students will explore challenges in causal inference, learn the theory and application of the Regression Discontinuity design, understand key RD concepts like bandwidth and parametric/non-parametric estimation, gain experience working with large-scale vital statistics data, and apply RD to estimate the real-world impact of driving eligibility on mortality.

Language/Tools: R

Link to the Teenage Driving and Mortality Case
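The core RD idea can be sketched in a few lines: fit a separate line on each side of the cutoff within a bandwidth, then compare the two fitted values at the cutoff. The example below is in Python for illustration only (the case uses R), and it runs on simulated data with a built-in jump of 2.0; nothing here reflects the case's actual mortality estimates.

```python
import numpy as np

# Simulated data (an assumption): x is the running variable centered at the cutoff,
# e.g. age minus the eligibility age, and y jumps by 2.0 at x = 0.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2.0, 2.0, size=n)
true_jump = 2.0
y = 1.0 + 0.5 * x + true_jump * (x >= 0) + rng.normal(scale=0.5, size=n)

# Bandwidth: use only observations close to the cutoff.
bandwidth = 1.0
below = (x < 0) & (x >= -bandwidth)
above = (x >= 0) & (x <= bandwidth)

# Fit a separate line on each side (local linear regression with a uniform kernel).
slope_below, intercept_below = np.polyfit(x[below], y[below], deg=1)
slope_above, intercept_above = np.polyfit(x[above], y[above], deg=1)

# The RD estimate is the gap between the two fitted lines at the cutoff (x = 0).
rd_estimate = intercept_above - intercept_below
print("estimated jump at cutoff:", rd_estimate)
```

Narrowing the bandwidth reduces bias from curvature away from the cutoff but leaves fewer observations, which makes the estimate noisier.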

Case 6: Chicago Ridesharing During COVID-19 (Visualizations)

Author: Ye Joo Park

Overview: Investigate how the COVID-19 pandemic dramatically altered urban mobility using rideshare (Uber/Lyft) data from the City of Chicago. This case focuses on visualizing and analyzing trends in trip counts, duration, distance, fares, tipping behavior, and geographic patterns before and after the onset of the pandemic and related public health measures in March 2020.

What Learners Gain: This case provides hands-on experience analyzing real-world transportation data to identify pandemic-related shifts. Students will practice using Python libraries (Pandas, Plotly) for data cleaning, transformation, and visualization of temporal and spatial trends. Learners will also develop skills in crafting data-driven narratives about the impact of major events on urban life.

Language/Tools: Python, Pandas, and Plotly

Link to the Chicago Ridesharing During COVID-19 Visualization Case