DATS 2102 — Data Visualization for Data Science
Week 1 — Getting Started
Focus: Course introduction, importance of data visualization in data science, and environment setup.
Learning Objectives:
- Understand visualization’s role in data analysis and communication.
- Install Python, Jupyter, and core libraries.
- Execute basic code and create markdown cells in Jupyter.
- Produce first bar and scatter plots.
Datasets: Seaborn penguins, small CSVs (population, GDP).
Core Libraries: pandas, matplotlib, seaborn.
Lecture Topics:
- What is data visualization and why it matters.
- Overview of course structure and expectations.
- Introduction to JupyterLab workflow.
In-Class Activities: Load dataset, inspect data, create bar and scatter plots.
Homework: Set up the environment, explore a CSV, and produce two labeled plots with captions.
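Example sketch (illustrative only): one way the first bar and scatter plots might look with pandas and matplotlib. The country names and numbers are invented placeholders, not the course CSVs, and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "population_millions": [5.0, 12.5, 8.2, 20.1],
    "gdp_billions": [40.0, 150.0, 95.0, 310.0],
})

# Two side-by-side axes: a bar chart and a scatter plot.
fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))

# Bar chart: one bar per country.
ax_bar.bar(df["country"], df["population_millions"])
ax_bar.set_xlabel("Country")
ax_bar.set_ylabel("Population (millions)")
ax_bar.set_title("Population by country")

# Scatter plot: population against GDP.
ax_scatter.scatter(df["population_millions"], df["gdp_billions"])
ax_scatter.set_xlabel("Population (millions)")
ax_scatter.set_ylabel("GDP (billions)")
ax_scatter.set_title("GDP vs. population")

fig.tight_layout()
```

In a Jupyter cell the figure is displayed when the cell finishes; a caption can go in a markdown cell directly below it.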
Week 2 — Language of Graphs
Focus: Visual encodings, tidy data principles, grammar of graphics.
Learning Objectives:
- Identify and apply core visual encodings (position, color, shape, size).
- Reshape data into tidy format.
- Use seaborn and altair for multi-encoding charts.
Datasets: Seaborn tips, Gapminder data.
Core Libraries: pandas, seaborn, altair.
Lecture Topics:
- Mapping data to visual attributes.
- Tidy data and why it matters.
- Grammar of graphics overview.
In-Class Activities: Reshape and plot categorical vs. numerical data.
Homework: Create three visualizations using different encoding strategies, with explanations.
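Example sketch (illustrative only): reshaping a wide table into tidy format with pandas `melt`, so that each row holds one observation and each variable has its own column. The table contents and the column name `life_expectancy` are made up for this example.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2000": [70.1, 65.3],
    "2010": [72.4, 68.0],
})

# Tidy form: one row per (country, year) observation.
tidy = wide.melt(
    id_vars="country",
    var_name="year",
    value_name="life_expectancy",
)

# Year labels arrive as strings from the column headers; convert to integers.
tidy["year"] = tidy["year"].astype(int)
```

With the data in this shape, `year` and `country` can each be mapped to a separate visual encoding (for example, position and color) in seaborn or altair.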
Week 3 — Distributions & Variation
Focus: Visualizing univariate distributions and variation.
Learning Objectives:
- Choose appropriate distribution plots.
- Understand and apply binning, kernel density estimation, ECDF.
Datasets: Flight delay data, iris dataset.
Core Libraries: seaborn, matplotlib.
Lecture Topics:
- When to use histograms vs. KDEs vs. box/violin plots.
- Understanding variability and spread.
In-Class Activities: Compare multiple distribution plot types.
Homework: Explore and visualize distributions in two datasets with narrative.
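Example sketch (illustrative only): computing an ECDF by hand and seeing how the bin count changes a histogram. The delays are simulated from an exponential distribution as a stand-in for real flight delay data; the helper name `ecdf` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated delays in minutes (hypothetical, not the course dataset).
delays = rng.exponential(scale=15.0, size=200)


def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


x, y = ecdf(delays)

# The same data binned coarsely and finely: the counts always sum to the
# sample size, but the apparent shape depends on the number of bins.
counts_coarse, edges_coarse = np.histogram(delays, bins=5)
counts_fine, edges_fine = np.histogram(delays, bins=30)
```

Unlike the histogram, the ECDF requires no binning choice, which is one reason to compare the two on the same data.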
Week 4 — Wrangling with pandas
Focus: Data cleaning, transformation, and preparation for visualization.
Learning Objectives:
- Select, filter, group, summarize, and reshape data.
- Work with datetime and categorical data.
Datasets: NYC taxi trips sample, COVID-19 data.
Core Libraries: pandas, matplotlib.
Lecture Topics:
- Data import and export.
- Common data wrangling operations.
In-Class Activities: Group data by category and visualize aggregates.
Homework: Clean a messy dataset and create three informative charts.
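Example sketch (illustrative only): parsing datetimes, converting a column to categorical, and grouping to produce aggregates ready for plotting. The four trip records and the column names are invented for this example.

```python
import pandas as pd

# Hypothetical trip records, invented for illustration only.
trips = pd.DataFrame({
    "pickup_time": [
        "2024-01-01 08:15",
        "2024-01-01 17:40",
        "2024-01-02 09:05",
        "2024-01-02 18:30",
    ],
    "payment_type": ["card", "cash", "card", "card"],
    "fare": [12.5, 8.0, 15.0, 22.0],
})

# Convert text columns to proper datetime and categorical dtypes.
trips["pickup_time"] = pd.to_datetime(trips["pickup_time"])
trips["payment_type"] = trips["payment_type"].astype("category")

# Summarize fares per calendar day.
daily = trips.groupby(trips["pickup_time"].dt.date)["fare"].agg(["count", "mean"])

# Total fare per payment type.
by_payment = trips.groupby("payment_type", observed=True)["fare"].sum()
```

Either result can be passed directly to a plotting call, for example `by_payment.plot(kind="bar")`.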
Week 5 — Perception & Principles
Focus: Visual perception theory and chart design principles.
Learning Objectives:
- Apply Cleveland–McGill perceptual rankings.
- Recognize and fix misleading visualizations.
Datasets: Simulated comparison datasets.
Core Libraries: seaborn, matplotlib.
Lecture Topics:
- How humans perceive visual encodings.
- Common design pitfalls.
In-Class Activities: Redesign poor visualizations.
Homework: Select a misleading chart, redesign it, and explain improvements.
Week 6 — Comparisons
Focus: Comparing categories, groups, and time series.
Learning Objectives:
- Create grouped bar charts, dot plots, slope charts.
- Use small multiples effectively.
Datasets: World Bank indicators.
Core Libraries: seaborn, matplotlib, plotly.
Lecture Topics:
- Designing fair comparisons.
- Aligning scales and baselines.
In-Class Activities: Build comparison visuals using small multiples.
Homework: Compare groups in a chosen dataset using two or more visualization types.
Week 7 — Text, Labels, & Tables
Focus: Enhancing visuals with annotations and well-formatted tables.
Learning Objectives:
- Apply direct labeling and meaningful captions.
- Create clear and concise tables.
Datasets: Sports statistics.
Core Libraries: matplotlib, seaborn, pandas.
Lecture Topics:
- Annotating charts for storytelling.
- Formatting tables for clarity.
In-Class Activities: Annotate key data points in charts.
Homework: Create a labeled and captioned visual from a dataset of your choice.
Week 8 — Mapping I
Focus: Fundamentals of geographic data visualization.
Learning Objectives:
- Create choropleth maps and understand coordinate reference systems.
- Join spatial and tabular datasets.
Datasets: US states shapefile, population data.
Core Libraries: geopandas, mapclassify, folium.
Lecture Topics:
- Spatial joins.
- Map classification schemes.
In-Class Activities: Produce a choropleth map from joined datasets.
Homework: Create a thematic map for a real-world topic.
Week 9 — Color & Accessibility
Focus: Effective and inclusive color usage in visualization.
Learning Objectives:
- Choose appropriate color palettes.
- Apply accessibility best practices.
Datasets: From previous assignments.
Core Libraries: seaborn, matplotlib, colorcet.
Lecture Topics:
- Sequential, diverging, qualitative palettes.
- Colorblind-safe schemes.
In-Class Activities: Recolor existing charts for better accessibility.
Homework: Revise a prior visualization with improved color design.
Week 10 — Relationships & Modeling
Focus: Visualizing relationships and model fit.
Learning Objectives:
- Plot scatterplots with regression lines.
- Visualize residuals and model diagnostics.
Datasets: Housing prices dataset.
Core Libraries: seaborn, statsmodels, matplotlib.
Lecture Topics:
- Visualizing correlation without implying causation.
- Checking model assumptions visually.
In-Class Activities: Fit and visualize a simple regression.
Homework: Analyze and visualize a bivariate relationship with commentary.
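Example sketch (illustrative only): fitting a simple linear regression and computing the residuals that a diagnostic plot would show. This uses NumPy's `polyfit` rather than statsmodels to keep the example small; the floor areas and prices are simulated, not the housing dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (hypothetical): price rises linearly with area, plus noise.
area = rng.uniform(50, 200, size=100)
price = 1000 + 30 * area + rng.normal(0, 200, size=100)

# Least-squares straight-line fit; polyfit returns the slope first for deg=1.
slope, intercept = np.polyfit(area, price, deg=1)

fitted = intercept + slope * area
residuals = price - fitted
```

Plotting `residuals` against `fitted` is a common visual check: a patternless band around zero is consistent with the linear model, while curvature or a funnel shape suggests a violated assumption.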
Week 11 — Uncertainty
Focus: Representing uncertainty in data visualizations.
Learning Objectives:
- Add error bars and confidence intervals.
- Visualize sampling variability.
Datasets: Polling data.
Core Libraries: seaborn, matplotlib.
Lecture Topics:
- Why uncertainty matters.
- Techniques for communicating uncertainty.
In-Class Activities: Compare plots with and without uncertainty intervals.
Homework: Visualize uncertainty in a selected dataset.
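Example sketch (illustrative only): two ways to obtain an interval for a poll proportion, a normal-approximation 95% confidence interval and a bootstrap percentile interval. The poll responses are simulated with an assumed true support of 52%; none of these numbers come from real polling data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated poll of 500 respondents (hypothetical); True means "supports".
responses = rng.random(500) < 0.52

# Point estimate and normal-approximation 95% confidence interval.
p_hat = responses.mean()
se = np.sqrt(p_hat * (1 - p_hat) / len(responses))
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se

# Bootstrap: resample the poll with replacement to see sampling variability.
boot = np.array([
    rng.choice(responses, size=len(responses), replace=True).mean()
    for _ in range(1000)
])
boot_low, boot_high = np.percentile(boot, [2.5, 97.5])
```

Either interval can be drawn as an error bar, for example with matplotlib's `ax.errorbar(x, y, yerr=...)`, and a histogram of `boot` shows the sampling variability directly.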
Week 12 — Visualization for ML/NLP
Focus: Visualizing machine learning and NLP outputs.
Learning Objectives:
- Plot feature importance, confusion matrices, and ROC curves.
- Visualize topic clusters and word clouds.
Datasets: IMDB reviews, classification dataset.
Core Libraries: scikit-learn, matplotlib, seaborn, wordcloud, bertopic.
Lecture Topics:
- Visualization in the ML workflow.
- Visualizing high-dimensional data.
In-Class Activities: Train a small model, visualize predictions.
Homework: Create three ML-related visualizations from a chosen dataset.
Weeks 13–14 — Final Project Workshops
Focus: Final project preparation, peer review, and refinement.
Learning Objectives:
- Integrate multiple visualization techniques into one narrative.
- Polish charts for professional presentation.
Datasets: Student-chosen.
In-Class Activities: Peer feedback, troubleshooting, improving visuals.
Homework: Finalize and submit project with report and reproducible code.