M05-02 · AI + Data & Decision Science

Data Cleaning and Quality Management


Teaches the skill often estimated to consume 60-80% of a data professional's working time: turning messy, inconsistent, incomplete data into analysis-ready datasets. Covers data profiling, source reconciliation, standardization, deduplication, missing value strategies, and systematic data quality monitoring. Students confront the reality that "data cleaning" is not a preliminary step; it is the core of the job.

35 Hours
8 Learning objectives
Bloom's ceiling: Create
5 Competencies

Learning Objectives

Objectives · Depth
  • Execute a systematic data profiling methodology on any new dataset: null counts, duplicates, distributions, date ranges, unique value inventories, and data type validation · Apply
  • Reconcile conflicting data sources by establishing priority hierarchies, building mapping tables, and documenting reconciliation logic for auditability · Apply
  • Standardize text fields through normalization (case, whitespace, encoding), date parsing across mixed formats, and categorical value harmonization using mapping dictionaries · Apply
  • Implement deduplication strategies using composite keys, ROW_NUMBER() window functions, and fuzzy matching, documenting which record is kept and why · Apply
  • Evaluate missing value patterns to distinguish MCAR, MAR, and MNAR missingness (missing completely at random, missing at random, missing not at random) and select appropriate handling strategies (imputation, proxy, exclusion with documentation) · Evaluate
  • Identify and resolve edge cases that break standard queries (re-entries, retroactive changes, timezone inconsistencies, schema migrations) by investigating data lineage · Analyze
  • Create data quality reports that inventory known issues by severity, document assumptions made during cleaning, and propose upstream fixes · Create
  • Implement dbt tests and automated data quality checks (freshness, row count thresholds, referential integrity) to catch data issues before they reach dashboards · Apply

Levels (lowest to highest): Remember · Understand · Apply · Analyze · Evaluate · Create. The highest level demands the most original thinking.

What You'll Master

Data Profiling

Systematic first-pass assessment of any dataset's shape, quality, and known issues before analysis begins.
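
As a rough illustration of a first-pass profile, the sketch below counts nulls, distinct values, observed types, value ranges, and exact duplicate rows. It uses only the Python standard library, and the sample rows and the `profile` helper are hypothetical, not course material.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"id": 1, "country": "US", "signup": date(2024, 1, 5)},
    {"id": 2, "country": "us", "signup": date(2024, 2, 1)},
    {"id": 2, "country": "us", "signup": date(2024, 2, 1)},  # exact duplicate
    {"id": 3, "country": None, "signup": None},
]

def profile(rows):
    """Return a per-column summary plus a count of exact duplicate rows."""
    columns = list(rows[0].keys())
    report = {}
    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(present),
            "unique": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    # A row counts as a duplicate if its full tuple of values was already seen.
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    report["_duplicate_rows"] = sum(n - 1 for n in counts.values())
    return report

summary = profile(rows)
```

Here the `country` column reports two distinct values ("US" and "us"), which is the kind of finding that feeds the standardization step. With pandas, `DataFrame.isna().sum()`, `DataFrame.nunique()`, and `DataFrame.duplicated().sum()` give comparable figures.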

Source Reconciliation

Merging conflicting data from multiple systems with documented priority logic and traceability.
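
One way to picture documented priority logic is the minimal sketch below. It assumes a fixed trust order between three hypothetical sources (`billing`, `crm`, `web_form`) and records every conflict it resolves, so the choice can be audited later.

```python
# Hypothetical source names, ordered from most to least trusted.
SOURCE_PRIORITY = ["billing", "crm", "web_form"]

def reconcile(records_by_source, fields):
    """Pick each field from the highest-priority source that has a value.

    Returns the merged record and a log entry for every field where
    the sources disagreed.
    """
    merged, log = {}, []
    for field in fields:
        # Built in priority order, so the first key is the most trusted source.
        candidates = {
            src: records_by_source[src][field]
            for src in SOURCE_PRIORITY
            if src in records_by_source
            and records_by_source[src].get(field) is not None
        }
        if not candidates:
            merged[field] = None
            continue
        chosen = next(iter(candidates))
        merged[field] = candidates[chosen]
        if len(set(candidates.values())) > 1:
            log.append({"field": field, "kept_from": chosen, "seen": candidates})
    return merged, log

customer = {
    "crm": {"email": "a@example.com", "phone": "555-0100"},
    "billing": {"email": "a.new@example.com", "phone": None},
}
merged, log = reconcile(customer, ["email", "phone"])
```

The email comes from `billing` because it outranks `crm`, while the phone falls back to `crm` because `billing` has no value. Only the email appears in the log, since it is the only field where the sources conflict.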

Standardization & Deduplication

Normalizing text, parsing dates, harmonizing categories, and eliminating duplicate records with auditable logic.
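
The sketch below strings these steps together under stated assumptions: the date formats, the country mapping, and the "keep the most recently updated row per email" rule are all illustrative choices, not prescribed ones.

```python
import unicodedata
from datetime import datetime

# Assumed input formats, tried in order. An ambiguous value such as 03/04/2024
# resolves to the first matching format, so the order is a documented decision.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]
# Hypothetical mapping dictionary for one categorical field.
COUNTRY_MAP = {"us": "US", "usa": "US", "united states": "US", "uk": "GB"}

def normalize_text(value):
    """Unicode-normalize, collapse whitespace, and case-fold a string."""
    value = unicodedata.normalize("NFKC", value)
    return " ".join(value.split()).casefold()

def parse_date(value):
    """Try each known format; return None so failures can be reviewed by hand."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None

def dedupe(rows, key_fields, order_field):
    """Keep the row with the greatest order_field per composite key.

    Assumes order_field is never None in the rows passed in.
    """
    kept = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in kept or row[order_field] > kept[key][order_field]:
            kept[key] = row
    return list(kept.values())

raw = [
    {"email": " Ana@Example.com ", "country": "USA", "updated": "2024-01-05"},
    {"email": "ana@example.com", "country": "United  States", "updated": "05/02/2024"},
    {"email": "bo@example.com", "country": "uk", "updated": "Mar 3, 2024"},
]
clean = [
    {
        "email": normalize_text(r["email"]),
        "country": COUNTRY_MAP.get(normalize_text(r["country"]), r["country"]),
        "updated": parse_date(r["updated"]),
    }
    for r in raw
]
unique = dedupe(clean, key_fields=["email"], order_field="updated")
```

The two "Ana" rows only collapse into one after normalization, which is why standardization has to run before deduplication. In SQL, the same keep-the-latest rule is commonly written with `ROW_NUMBER()` partitioned by the key and ordered by the timestamp.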

Missing Value & Edge Case Resolution

Diagnosing missingness patterns, handling nulls appropriately, and resolving data anomalies that break standard queries.
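
A simple starting point for diagnosing missingness is to compare how often a field is missing across groups of another, fully observed field. The sketch below does that for hypothetical survey rows; the field names and data are invented for illustration.

```python
from collections import defaultdict

def missing_rate_by_group(rows, target, group_by):
    """Share of rows where `target` is None, split by the value of `group_by`."""
    totals, missing = defaultdict(int), defaultdict(int)
    for row in rows:
        group = row[group_by]
        totals[group] += 1
        if row[target] is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Hypothetical survey rows; None marks a missing answer.
rows = [
    {"plan": "free", "income": None},
    {"plan": "free", "income": None},
    {"plan": "free", "income": 40000},
    {"plan": "free", "income": 52000},
    {"plan": "paid", "income": 61000},
    {"plan": "paid", "income": 58000},
    {"plan": "paid", "income": 75000},
    {"plan": "paid", "income": None},
]
rates = missing_rate_by_group(rows, target="income", group_by="plan")
```

A clear gap between groups is evidence against MCAR, because missingness then depends on something observed. It cannot by itself separate MAR from MNAR, since that depends on the unobserved values and usually needs domain knowledge about how the data was collected.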

Data Quality Monitoring

Building automated checks, freshness alerts, and quality reporting that catch problems before stakeholders do.
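
To make the idea of automated checks concrete, here is a minimal sketch of three of them (freshness, a row count threshold, and referential integrity) in plain Python. The table names, thresholds, and helper functions are hypothetical, and a production setup would normally run such checks in the warehouse.

```python
from datetime import datetime, timedelta, timezone

def check_freshness(latest_loaded_at, now, max_age):
    """True if the newest record is no older than max_age."""
    return now - latest_loaded_at <= max_age

def check_row_count(count, minimum):
    """True if the table has at least `minimum` rows."""
    return count >= minimum

def orphaned_keys(child_rows, parent_rows, fk, pk):
    """Foreign-key values in the child table with no matching parent row."""
    parent_ids = {p[pk] for p in parent_rows}
    return sorted({c[fk] for c in child_rows} - parent_ids)

# Hypothetical tables.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1,
     "loaded_at": datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)},
    {"order_id": 11, "customer_id": 3,
     "loaded_at": datetime(2024, 5, 1, 7, 0, tzinfo=timezone.utc)},
]
# A fixed "now" keeps the example reproducible.
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)

results = {
    "fresh": check_freshness(
        max(o["loaded_at"] for o in orders), now, timedelta(hours=24)
    ),
    "row_count_ok": check_row_count(len(orders), minimum=1),
    "orphans": orphaned_keys(orders, customers, fk="customer_id", pk="id"),
}
```

In this sample the freshness check fails (the newest load is 29 hours old against a 24-hour limit) and customer 3 is flagged as an orphan. In dbt, the orphan check corresponds to the built-in `relationships` test, and freshness is configured on sources.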

What You'll Build

Data Quality Remediation Report: Students receive a realistic messy dataset (inconsistent categories, duplicate records, missing values, conflicting sources, timezone issues) and produce:
  • a data profiling report
  • a documented cleaning pipeline (Python + SQL) with every transformation explained
  • a reconciliation log for conflicting sources
  • a final clean dataset
  • a data quality monitoring plan with proposed dbt tests
The report also includes a summary of time allocation (profiling vs. cleaning vs. analysis), showing how much of the work went into preparation rather than analysis (the "80/20 reality").

Industry Tools, Not Toy Projects

Python (pandas, numpy)

Data manipulation libraries for profiling, cleaning, standardizing, and transforming datasets programmatically.

SQL (Snowflake/BigQuery)

Warehouse-based data cleaning and quality checks using SQL queries and transformations.

dbt

Data transformation and testing framework for automated quality checks and freshness monitoring.

Jupyter Notebooks

Interactive computing environment for documenting and executing data cleaning pipelines step by step.

Git / GitHub

Version control for tracking cleaning pipeline changes and maintaining reproducible workflows.

Claude

AI assistant for pandas help, cleaning strategy review, and debugging data quality issues.

Prerequisites
