DATS 2102 — Data Visualization for Data Science


Week 1 — Getting Started

Focus: Course introduction, importance of data visualization in data science, and environment setup.
Learning Objectives:
- Understand visualization’s role in data analysis and communication.
- Install Python, Jupyter, and core libraries.
- Execute basic code and create markdown cells in Jupyter.
- Produce first bar and scatter plots.
Datasets: Seaborn penguins, small CSVs (population, GDP).
Core Libraries: pandas, matplotlib, seaborn.
Lecture Topics:
- What is data visualization and why it matters.
- Overview of course structure and expectations.
- Introduction to JupyterLab workflow.
In-Class Activities: Load dataset, inspect data, create bar and scatter plots.
Homework: Set up environment, explore CSV, produce two labeled plots with captions.


Week 2 — Language of Graphs

Focus: Visual encodings, tidy data principles, grammar of graphics.
Learning Objectives:
- Identify and apply core visual encodings (position, color, shape, size).
- Reshape data into tidy format.
- Use seaborn and altair for multi-encoding charts.
Datasets: Seaborn tips, Gapminder data.
Core Libraries: pandas, seaborn, altair.
Lecture Topics:
- Mapping data to visual attributes.
- Tidy data and why it matters.
- Grammar of graphics overview.
In-Class Activities: Reshape and plot categorical vs. numerical data.
Homework: Create three visualizations using different encoding strategies, with explanations.
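
As a small illustration of the tidy-data objective for this week, the sketch below reshapes a wide table (one column per year) into long form (one row per observation). It uses only the standard library so it runs anywhere; the country names and values are made-up placeholders rather than Gapminder figures, and `to_long` is a hypothetical helper, not part of any course library. In pandas, `DataFrame.melt` performs the same kind of reshaping.

```python
# Wide table: one row per country, one column per year (placeholder values).
wide = [
    {"country": "A", "2000": 10, "2010": 12},
    {"country": "B", "2000": 20, "2010": 25},
]


def to_long(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the id column is repeated on every output row
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: value}
            )
    return long_rows


tidy = to_long(wide, id_key="country", var_name="year", value_name="value")
for record in tidy:
    print(record)
```

In the long form, each variable (country, year, value) has its own column, which is the shape that encoding-based libraries generally expect when mapping columns to position, color, or size.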


Week 3 — Distributions & Variation

Focus: Visualizing univariate distributions and variation.
Learning Objectives:
- Choose appropriate distribution plots.
- Understand and apply binning, kernel density estimation, ECDF.
Datasets: Flight delay data, iris dataset.
Core Libraries: seaborn, matplotlib.
Lecture Topics:
- When to use histograms vs. KDEs vs. box/violin plots.
- Understanding variability and spread.
In-Class Activities: Compare multiple distribution plot types.
Homework: Explore and visualize distributions in two datasets with narrative.
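
To make the binning and ECDF objectives concrete, here is a minimal standard-library sketch. It assumes a short list of made-up delay values (not the course's flight delay data), and `ecdf` and `histogram_counts` are hypothetical helper names. It shows how the same data gives different histogram shapes under different bin edges, while the ECDF needs no binning choice at all.

```python
def ecdf(values):
    """Return sorted values and the cumulative fraction at each rank."""
    ordered = sorted(values)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


def histogram_counts(values, bin_edges):
    """Count values per bin; bins are [left, right), the last bin is closed."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    return counts


delays = [5, 12, 7, 30, 14, 3, 18, 45]  # placeholder delay minutes

xs, ys = ecdf(delays)
coarse = histogram_counts(delays, [0, 25, 50])
fine = histogram_counts(delays, [0, 10, 20, 30, 40, 50])

print("ECDF points:", list(zip(xs, ys)))
print("2 bins:", coarse)
print("5 bins:", fine)
```

The two-bin version hides the empty 20 to 30 range that the five-bin version reveals, which is one reason to compare several plot types before settling on one.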


Week 4 — Wrangling with pandas

Focus: Data cleaning, transformation, and preparation for visualization.
Learning Objectives:
- Select, filter, group, summarize, and reshape data.
- Work with datetime and categorical data.
Datasets: NYC taxi trips sample, COVID-19 data.
Core Libraries: pandas, matplotlib.
Lecture Topics:
- Data import and export.
- Common data wrangling operations.
In-Class Activities: Group data by category and visualize aggregates.
Homework: Clean a messy dataset and create three informative charts.
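
The group-and-summarize pattern from this week can be sketched without any third-party packages. The records below are invented placeholders, not rows from the NYC taxi sample, and the field names `pickup` and `fare` are assumptions. The sketch parses timestamps, groups by calendar day, and computes one summary row per group; in pandas the equivalent steps would typically use `pd.to_datetime` and `groupby`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Placeholder trip records (not real taxi data): pickup timestamp and fare.
trips = [
    {"pickup": "2024-01-01 08:15", "fare": 12.0},
    {"pickup": "2024-01-01 17:40", "fare": 20.0},
    {"pickup": "2024-01-02 09:05", "fare": 8.0},
    {"pickup": "2024-01-02 18:30", "fare": 16.0},
    {"pickup": "2024-01-02 23:10", "fare": 30.0},
]

# Split: parse each timestamp and collect fares under its calendar day.
fares_by_day = defaultdict(list)
for trip in trips:
    day = datetime.strptime(trip["pickup"], "%Y-%m-%d %H:%M").date()
    fares_by_day[day].append(trip["fare"])

# Apply and combine: one summary per day, in date order.
summary = {
    day.isoformat(): {"trips": len(fares), "mean_fare": mean(fares)}
    for day, fares in sorted(fares_by_day.items())
}

for day, stats in summary.items():
    print(day, stats)
```

A table like `summary` (one row per category, one column per aggregate) is the usual input for the bar charts built in the in-class activity.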


Week 5 — Perception & Principles

Focus: Visual perception theory and chart design principles.
Learning Objectives:
- Apply Cleveland–McGill perceptual rankings.
- Recognize and fix misleading visualizations.
Datasets: Simulated comparison datasets.
Core Libraries: seaborn, matplotlib.
Lecture Topics:
- How humans perceive visual encodings.
- Common design pitfalls.
In-Class Activities: Redesign poor visualizations.
Homework: Select a misleading chart, redesign it, and explain improvements.


Week 6 — Comparisons

Focus: Comparing categories, groups, and time series.
Learning Objectives:
- Create grouped bar charts, dot plots, slope charts.
- Use small multiples effectively.
Datasets: World Bank indicators.
Core Libraries: seaborn, matplotlib, plotly.
Lecture Topics:
- Designing fair comparisons.
- Aligning scales and baselines.
In-Class Activities: Build comparison visuals using small multiples.
Homework: Compare groups in a chosen dataset using two or more visualization types.


Week 7 — Text, Labels, & Tables

Focus: Enhancing visuals with annotations and well-formatted tables.
Learning Objectives:
- Apply direct labeling and meaningful captions.
- Create clear and concise tables.
Datasets: Sports statistics.
Core Libraries: matplotlib, seaborn, pandas.
Lecture Topics:
- Annotating charts for storytelling.
- Formatting tables for clarity.
In-Class Activities: Annotate key data points in charts.
Homework: Create a labeled and captioned visual from a dataset of your choice.


Week 8 — Mapping I

Focus: Fundamentals of geographic data visualization.
Learning Objectives:
- Create choropleth maps and understand coordinate reference systems.
- Join spatial and tabular datasets.
Datasets: US states shapefile, population data.
Core Libraries: geopandas, mapclassify, folium.
Lecture Topics:
- Spatial joins.
- Map classification schemes.
In-Class Activities: Produce a choropleth map from joined datasets.
Homework: Create a thematic map for a real-world topic.


Week 9 — Color & Accessibility

Focus: Effective and inclusive color usage in visualization.
Learning Objectives:
- Choose appropriate color palettes.
- Apply accessibility best practices.
Datasets: From previous assignments.
Core Libraries: seaborn, matplotlib, colorcet.
Lecture Topics:
- Sequential, diverging, qualitative palettes.
- Colorblind-safe schemes.
In-Class Activities: Recolor existing charts for better accessibility.
Homework: Revise a prior visualization with improved color design.


Week 10 — Relationships & Modeling

Focus: Visualizing relationships and model fit.
Learning Objectives:
- Plot scatterplots with regression lines.
- Visualize residuals and model diagnostics.
Datasets: Housing prices dataset.
Core Libraries: seaborn, statsmodels, matplotlib.
Lecture Topics:
- Visualizing correlation and causation.
- Checking model assumptions visually.
In-Class Activities: Fit and visualize a simple regression.
Homework: Analyze and visualize a bivariate relationship with commentary.


Week 11 — Uncertainty

Focus: Representing uncertainty in data visualizations.
Learning Objectives:
- Add error bars and confidence intervals.
- Visualize sampling variability.
Datasets: Polling data.
Core Libraries: seaborn, matplotlib.
Lecture Topics:
- Why uncertainty matters.
- Techniques for communicating uncertainty.
In-Class Activities: Compare plots with and without uncertainty intervals.
Homework: Visualize uncertainty in a selected dataset.
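
One common way to obtain the interval behind an error bar is the percentile bootstrap, sketched below with the standard library only. This is an illustrative choice, not necessarily the method used in lecture. The poll responses are made up, and `bootstrap_ci` with its parameters is a hypothetical helper. The idea is to resample the data with replacement many times and read the interval off the spread of the resampled means, which also makes sampling variability visible.

```python
import random


def bootstrap_ci(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Mean of each resample drawn with replacement, sorted ascending.
    means = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    low = means[round(tail * n_resamples)]
    high = means[round((1 - tail) * n_resamples) - 1]
    return low, high


# Placeholder poll: 1 = supports, 0 = does not (made-up responses).
responses = [1] * 54 + [0] * 46
p_hat = sum(responses) / len(responses)
low, high = bootstrap_ci(responses)

print(f"estimate {p_hat:.2f}, 95% interval roughly {low:.2f} to {high:.2f}")
```

The pair `(low, high)` can then be drawn as an error bar around `p_hat`, for example with matplotlib's `errorbar`.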


Week 12 — Visualization for ML/NLP

Focus: Visualizing machine learning and NLP outputs.
Learning Objectives:
- Plot feature importance, confusion matrices, and ROC curves.
- Visualize topic clusters and word clouds.
Datasets: IMDB reviews, classification dataset.
Core Libraries: scikit-learn, matplotlib, seaborn, wordcloud, bertopic.
Lecture Topics:
- Visualization in the ML workflow.
- Visualizing high-dimensional data.
In-Class Activities: Train a small model, visualize predictions.
Homework: Create three ML-related visualizations from a chosen dataset.
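
Before plotting a confusion matrix or ROC curve, it can help to see what numbers sit behind them. The following is a minimal standard-library sketch for a binary classifier; the labels and scores are invented, and `confusion_matrix_2x2` and `roc_points` are hypothetical helpers. In coursework, scikit-learn's `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_curve` would normally compute these.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at each score threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted 1
    for threshold in sorted(set(scores), reverse=True):
        preds = [int(s >= threshold) for s in scores]
        (tn, fp), (fn, tp) = confusion_matrix_2x2(y_true, preds)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up labels and classifier scores for six examples.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

matrix = confusion_matrix_2x2(y_true, [int(s >= 0.5) for s in scores])
curve = roc_points(y_true, scores)

print("confusion matrix at threshold 0.5:", matrix)
print("ROC points:", curve)
```

The matrix is what a heatmap would display, and the list of points is what a line plot would connect to draw the ROC curve.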


Weeks 13–14 — Final Project Workshops

Focus: Final project preparation, peer review, and refinement.
Learning Objectives:
- Integrate multiple visualization techniques into one narrative.
- Polish charts for professional presentation.
Datasets: Student-chosen.
In-Class Activities: Peer feedback, troubleshooting, improving visuals.
Homework: Finalize and submit project with report and reproducible code.