Teaches the skill that consumes an estimated 60-80% of a typical data professional's working time: turning messy, inconsistent, incomplete data into analysis-ready datasets. Covers data profiling, source reconciliation, standardization, deduplication, missing-value strategies, and systematic data quality monitoring. Students confront the reality that data cleaning is not a preliminary step; it is the core of the job.
Levels (Bloom's taxonomy): Remember · Understand · Apply · Analyze · Evaluate · Create. Higher levels demand more original thinking.
Systematic first-pass assessment of any dataset's shape, quality, and known issues before analysis begins.
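A first-pass profile can be sketched in a few lines of pandas. The `profile` function and the small `orders` table are illustrative assumptions, not course materials; the sketch reports type, completeness, and cardinality for each column and counts repeated keys.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, non-null count, % null, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "null_pct": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })

# Hypothetical messy extract: a repeated key, inconsistent region spelling, gaps.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "region": ["East", "east ", None, "West"],
    "amount": [10.0, None, 12.5, 8.0],
})

report = profile(orders)
print(report)
print("repeated order_id values:", orders["order_id"].duplicated().sum())
```

Note that "East" and "east " count as two distinct regions here, which is exactly the kind of finding a profile should surface before any standardization begins.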
Merging conflicting data from multiple systems with documented priority logic and traceability.
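A minimal sketch of documented priority logic with traceability, assuming two hypothetical sources (`crm` and `billing`) and an assumed rule that billing wins whenever it has a value. Each final value records its source, and rows where both sources disagreed go into a reconciliation log.

```python
import numpy as np
import pandas as pd

# Hypothetical sources that disagree on customer email.
crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "email": ["a@x.com", "b@x.com", "c@old.com"]})
billing = pd.DataFrame({"customer_id": [1, 2, 3],
                        "email": ["a@x.com", None, "c@new.com"]})

merged = crm.merge(billing, on="customer_id", how="outer",
                   suffixes=("_crm", "_billing"))

# Assumed priority rule: billing wins whenever it has a value, else fall back to CRM.
merged["email"] = merged["email_billing"].combine_first(merged["email_crm"])

# Traceability: record which source supplied each final value.
merged["email_source"] = np.select(
    [merged["email_billing"].notna(), merged["email_crm"].notna()],
    ["billing", "crm"],
    default="none",
)

# Reconciliation log: rows where both sources had a value and they disagreed.
conflicts = merged.loc[
    merged["email_crm"].notna()
    & merged["email_billing"].notna()
    & (merged["email_crm"] != merged["email_billing"]),
    ["customer_id", "email_crm", "email_billing", "email"],
]
print(conflicts)
```

Keeping both original columns alongside the resolved one means the priority rule can be changed later and re-applied without losing the evidence.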
Normalizing text, parsing dates, harmonizing categories, and eliminating duplicate records with auditable logic.
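A minimal sketch of auditable standardization and deduplication, assuming a tiny customer table, a hypothetical `STATUS_MAP` lookup, and ISO-formatted date strings. Duplicates are flagged on a normalized key before removal, so the decision can be reviewed rather than happening silently.

```python
import pandas as pd

raw = pd.DataFrame({
    "customer": ["  Acme Corp", "ACME CORP ", "Beta LLC"],
    "status": ["Active", "act", "inactive"],
    "signup": ["2024-01-05", "2024-01-05", "2024-02-10"],
})

# Hypothetical mapping from observed spellings to canonical categories.
STATUS_MAP = {"active": "active", "act": "active", "inactive": "inactive"}

clean = raw.assign(
    # Trim, lowercase, and collapse internal whitespace.
    customer=raw["customer"].str.strip().str.lower().str.replace(r"\s+", " ", regex=True),
    # Unmapped spellings become NaN instead of being guessed.
    status=raw["status"].str.strip().str.lower().map(STATUS_MAP),
    # Unparseable dates become NaT instead of raising.
    signup=pd.to_datetime(raw["signup"], format="%Y-%m-%d", errors="coerce"),
)

# Flag duplicates on the normalized business key, then drop them explicitly.
clean["is_dup"] = clean.duplicated(subset=["customer", "signup"], keep="first")
deduped = clean[~clean["is_dup"]].drop(columns="is_dup")
print(deduped)
```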
Diagnosing missingness patterns, handling nulls appropriately, and resolving data anomalies that break standard queries.
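One way to diagnose before handling, sketched on a hypothetical `sales` table: compare null rates across groups, since rates that differ sharply suggest the values are not missing completely at random (MCAR). The handling step shown, a null-indicator flag plus group-median imputation, is one assumed strategy among several, not a universal default.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "revenue": [100.0, None, None, None, 250.0],
})

# Diagnose: null rate per store. Unequal rates hint at a non-MCAR pattern.
null_rate_by_store = sales["revenue"].isna().groupby(sales["store"]).mean()
print(null_rate_by_store)

# Handle: keep an explicit flag so downstream users can see what was imputed,
# then fill gaps with the median of the same store's observed values.
sales["revenue_was_null"] = sales["revenue"].isna()
sales["revenue_filled"] = sales["revenue"].fillna(
    sales.groupby("store")["revenue"].transform("median")
)
print(sales)
```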
Building automated checks, freshness alerts, and quality reporting that catch problems before stakeholders do.
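The hypothetical `run_checks` function below is a hand-rolled Python stand-in for the kind of checks dbt provides through its built-in `not_null` and `unique` tests and its source freshness checks. The `events` table, the 24-hour limit, and the fixed `now` timestamp are assumptions chosen for illustration.

```python
import pandas as pd

def run_checks(df, key, loaded_at_col, max_age, now):
    """Hypothetical check runner: returns failure messages; an empty list means all passed."""
    failures = []
    n_null = int(df[key].isna().sum())
    if n_null:
        failures.append(f"{key}: {n_null} null value(s)")
    n_dup = int(df[key].duplicated().sum())
    if n_dup:
        failures.append(f"{key}: {n_dup} duplicate value(s)")
    # Freshness: how long since the most recent row was loaded?
    age = now - df[loaded_at_col].max()
    if age > max_age:
        failures.append(f"{loaded_at_col}: newest row is {age} old (limit {max_age})")
    return failures

events = pd.DataFrame({
    "event_id": [1, 2, 2, None],
    "loaded_at": pd.to_datetime(
        ["2024-05-30 08:00", "2024-05-30 09:00", "2024-05-30 09:30", "2024-05-30 10:00"],
        utc=True,
    ),
})
now = pd.Timestamp("2024-06-01 12:00", tz="UTC")

failures = run_checks(events, "event_id", "loaded_at", pd.Timedelta(hours=24), now)
for failure in failures:
    print("ALERT:", failure)
```

In practice a scheduler would run this after each load and route non-empty results to an alert channel, so problems surface before a stakeholder opens a dashboard.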
Data Quality Remediation Report: The student receives a realistic messy dataset (inconsistent categories, duplicate records, missing values, conflicting sources, timezone issues) and produces a data profiling report, a documented cleaning pipeline (Python + SQL) with every transformation explained, a reconciliation log for conflicting sources, a final clean dataset, and a data quality monitoring plan with proposed dbt tests. The deliverable also includes a summary of time allocation (profiling vs. cleaning vs. analysis) that illustrates the 80/20 reality.
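Timezone issues of the kind the capstone dataset contains can be sketched as follows, assuming one hypothetical feed that logs naive local times (taken to be America/New_York) and another that logs UTC. Localizing to the assumed source zone and converting to UTC makes the two comparable, including across the 2024 US daylight-saving change.

```python
import pandas as pd

# Hypothetical feeds: the store system logs naive local times (assumed to be
# America/New_York); the web system logs UTC. Raw values are not comparable.
store_log = pd.Series(pd.to_datetime(["2024-03-09 23:30", "2024-03-11 08:00"]))
web_log = pd.Series(pd.to_datetime(["2024-03-10 04:30", "2024-03-11 12:00"], utc=True))

# Attach the assumed source zone, then convert everything to UTC.
# The offset moves from UTC-5 to UTC-4 when US daylight saving time starts
# on 2024-03-10; tz_localize applies the correct offset to each timestamp.
store_utc = store_log.dt.tz_localize("America/New_York").dt.tz_convert("UTC")

print(pd.DataFrame({"store_utc": store_utc, "web_log": web_log}))
```

The source zone is an assumption that belongs in the cleaning log: if it is wrong, every converted timestamp is wrong by the same offset.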
Data manipulation libraries for profiling, cleaning, standardizing, and transforming datasets programmatically.
Warehouse-based data cleaning and quality checks using SQL queries and transformations.
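Warehouse-side quality checks are often written as queries that count violations, where zero means the check passes. The sketch below uses Python's built-in `sqlite3` as a stand-in for a warehouse (an assumption; warehouse SQL dialects differ) with a hypothetical `orders` table and check names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10, 25.0), (2, NULL, 40.0), (2, 11, -5.0), (3, 12, 15.0);
""")

# Each query counts violations: offending rows, or duplicated key values for uniqueness.
CHECKS = {
    "order_id_not_null": "SELECT COUNT(*) FROM orders WHERE order_id IS NULL",
    "customer_id_not_null": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "order_id_unique": """
        SELECT COUNT(*) FROM (
            SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1
        )""",
    "amount_non_negative": "SELECT COUNT(*) FROM orders WHERE amount < 0",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
for name, violations in results.items():
    print(f"{'PASS' if violations == 0 else 'FAIL'} {name}: {violations} violation(s)")
conn.close()
```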
Data transformation and testing framework for automated quality checks and freshness monitoring.
Interactive computing environment for documenting and executing data cleaning pipelines step by step.
Version control for tracking cleaning pipeline changes and maintaining reproducible workflows.
AI assistant for pandas help, cleaning strategy review, and debugging data quality issues.