Concept · Data · Governance · Analytics · Integration

Data Quality

A concept for ensuring and managing data quality through metrics, governance, and improvement processes.

Data quality describes the fitness of data for specific purposes, characterized by accuracy, completeness, consistency, and timeliness.
Established · Medium

Classification

  • Medium
  • Business
  • Organizational
  • Intermediate

Technical context

  • Data catalogs (e.g. Amundsen, DataHub)
  • Data pipelines (e.g. Airflow, dbt)
  • Monitoring and observability tools
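
As a minimal illustration of how a pipeline could enforce quality at ingestion, the sketch below runs a NULL check that a scheduler such as Airflow could invoke as a task; the table, column, and in-memory database are fabricated for the demo.

```python
import sqlite3

def check_not_null(conn: sqlite3.Connection, table: str, column: str) -> None:
    """Fail the pipeline task if `column` contains NULLs (identifiers are trusted here)."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    nulls = conn.execute(query).fetchone()[0]
    if nulls:
        raise ValueError(f"{table}.{column}: {nulls} NULL value(s) found")

# Self-contained demo against an in-memory database with a fabricated table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@example.com"), (2, "b@example.com")])
check_not_null(conn, "customers", "email")  # passes here; raises as soon as NULLs appear
```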

Principles & goals

  • Define and assign ownership for measurable quality metrics
  • Establish data governance and data contracts (see the sketch after this list)
  • Integrate feedback and remediation loops into processes
Iterate
Enterprise, Domain, Team
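
To make "data contracts with assigned ownership" concrete, here is a minimal sketch using plain dataclasses; the contract fields and the owning team are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class CustomerContract:
    """Hypothetical data contract: the required fields of a customer record."""
    customer_id: str
    email: str
    country: str

CONTRACT_OWNER = "team-customer-data"  # assumed owning team, accountable for the metrics

def violations(record: dict) -> list[str]:
    """Return the contract violations for a single record (missing or empty fields)."""
    return [f"missing required field: {f.name}"
            for f in fields(CustomerContract)
            if record.get(f.name) in (None, "")]

print(violations({"customer_id": "C1", "email": "", "country": "DE"}))
# -> ['missing required field: email']
```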

Compromises

  • Focusing on measurable metrics instead of actual value
  • Excessive gates that hinder innovation and speed
  • Lack of domain acceptance, leading to workarounds

Recommendations

  • Start with a few business-relevant metrics
  • Integrate automated tests into CI/CD (see the sketch after this list)
  • Define ownership and SLAs per data product
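
The CI/CD recommendation could take the shape of ordinary data tests that fail the build when a threshold is breached; the following pytest-style sketch uses an invented sample and thresholds.

```python
# test_data_quality.py -- run with `pytest`; sample data and thresholds are invented.

SAMPLE = [
    {"order_id": 1, "amount": 99.5, "currency": "EUR"},
    {"order_id": 2, "amount": 12.0, "currency": "EUR"},
    {"order_id": 3, "amount": None, "currency": "USD"},
]

def completeness(rows: list[dict], field: str) -> float:
    """Share of rows with `field` populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def test_amount_completeness_meets_threshold():
    # Fails the CI job when completeness drops below the assumed 60% floor.
    assert completeness(SAMPLE, "amount") >= 0.6

def test_currency_values_are_known():
    # Consistency check against an assumed reference set of currencies.
    assert all(r["currency"] in {"EUR", "USD"} for r in SAMPLE)
```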

I/O & resources

  • Data sources and their schemas
  • Business rules and acceptance criteria
  • Metadata and data lineage
  • Quality metrics and dashboards
  • Alerts and error reports
  • Improved data products and contracts

Description

Data quality describes the fitness of data for specific purposes, characterized by accuracy, completeness, consistency, and timeliness. The concept covers measurement methods, governance, data lineage and processes for improvement. It is vital for reliable analytics, operational processes and automated decision-making.

Benefits

  • Increased reliability of analytics and reporting
  • Reduced error costs in operational processes
  • A better decision basis for management

Limitations

  • Requires organizational alignment and ownership
  • Completely error-free data is often unattainable
  • Measurement and automation carry initial implementation costs

Key metrics

  • Completeness rate

    Share of records with required fields populated.

  • Accuracy rate

    Share of values validated against authoritative sources.

  • Freshness/latency

    Time since last update of relevant data fields.
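
A self-contained sketch of how the three metrics might be computed; the records and the authoritative reference are fabricated, and the choice of hours as the freshness unit is an assumption.

```python
from datetime import datetime, timedelta, timezone

NOW = datetime(2024, 1, 15, tzinfo=timezone.utc)  # fixed "now" for a reproducible demo
records = [
    {"id": 1, "email": "a@example.com", "updated_at": NOW - timedelta(hours=2)},
    {"id": 2, "email": None,            "updated_at": NOW - timedelta(days=3)},
    {"id": 3, "email": "c@example.com", "updated_at": NOW - timedelta(hours=30)},
]
reference = {1: "a@example.com", 2: "b@example.com", 3: "x@example.com"}  # fabricated authoritative source

# Completeness: share of records with the required field populated.
completeness = sum(r["email"] is not None for r in records) / len(records)

# Accuracy: share of populated values that match the authoritative source.
populated = [r for r in records if r["email"] is not None]
accuracy = sum(r["email"] == reference[r["id"]] for r in populated) / len(populated)

# Freshness: hours since the most recent update (dataset-level; lower is fresher).
freshness_hours = min((NOW - r["updated_at"]).total_seconds() / 3600 for r in records)

print(f"completeness={completeness:.0%} accuracy={accuracy:.0%} freshness={freshness_hours:.0f}h")
# -> completeness=67% accuracy=50% freshness=2h
```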

Use cases & scenarios

Customer master data consolidation

Harmonizing IDs and addresses, enriching missing fields, introducing duplicate detection.
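
A simplified sketch of the duplicate-detection step, grouping records by a normalized name-and-address key; the normalization rules, field names, and sample records are assumptions.

```python
import re
from collections import defaultdict

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for approximate matching."""
    text = re.sub(r"[^\w\s]", "", text.lower())   # drop punctuation
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace

customers = [
    {"id": "A1", "name": "Müller, Anna", "address": "Hauptstr. 5, Berlin"},
    {"id": "B7", "name": "Anna Müller",  "address": "Hauptstr 5 Berlin"},
    {"id": "C3", "name": "Jon Doe",      "address": "Elm St 12"},
]

groups = defaultdict(list)
for c in customers:
    # Sort name tokens so "Müller, Anna" and "Anna Müller" yield the same key.
    key = (" ".join(sorted(normalize(c["name"]).split())), normalize(c["address"]))
    groups[key].append(c["id"])

duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(duplicates)  # -> {('anna müller', 'hauptstr 5 berlin'): ['A1', 'B7']}
```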

BI dashboard with quality gate

Dashboards are published only when core metrics like completeness and timeliness meet defined thresholds.
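
A minimal sketch of such a publication gate, assuming invented metric names and SLA thresholds: the dashboard is published only when every core metric meets its threshold.

```python
THRESHOLDS = {"completeness": 0.95, "freshness_hours": 24}  # assumed SLA values

def may_publish(metrics: dict) -> bool:
    """Publish only if completeness is high enough and the data is fresh enough."""
    return (metrics["completeness"] >= THRESHOLDS["completeness"]
            and metrics["freshness_hours"] <= THRESHOLDS["freshness_hours"])

print(may_publish({"completeness": 0.97, "freshness_hours": 6}))  # True  -> publish
print(may_publish({"completeness": 0.91, "freshness_hours": 6}))  # False -> hold back
```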

Data trust for ML models

Continuous monitoring pipelines check data drift, missing labels and inconsistencies before training and inference.
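
A simplified sketch of two such pre-training checks, a mean-shift drift test and a missing-label rate, with fabricated data and thresholds; production systems would use more robust statistics.

```python
from statistics import mean

def mean_shift_drift(baseline: list[float], current: list[float],
                     max_rel_shift: float = 0.2) -> bool:
    """Flag drift if the mean shifts by more than `max_rel_shift` relative to baseline."""
    shift = abs(mean(current) - mean(baseline)) / abs(mean(baseline))
    return shift > max_rel_shift

def missing_label_rate(labels: list) -> float:
    """Share of examples without a label."""
    return sum(lbl is None for lbl in labels) / len(labels)

baseline_feature = [10.0, 11.0, 9.5, 10.5]
current_feature = [14.0, 15.0, 13.5, 14.5]          # mean has shifted upward
labels = ["spam", None, "ham", "ham", None]

if mean_shift_drift(baseline_feature, current_feature) or missing_label_rate(labels) > 0.1:
    print("data checks failed: hold training/inference for review")
```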

Implementation steps

  1. Initial assessment and definition of core metrics
  2. Introduce monitoring and validation pipelines
  3. Operationalize data contracts and governance processes

⚠️ Technical debt & bottlenecks

Technical debt

  • Ad-hoc remediation scripts without tests
  • Missing data lineage for historical remediation
  • Outdated validation rules after system changes

Bottlenecks

  • Missing metadata
  • Incompatible data formats
  • Legacy source systems

Anti-patterns

  • Optimizing the completeness metric in isolation while critical fields are missing
  • Automatically deleting suspicious records without review
  • Governance rules that prevent necessary fast remediations
  • Relying on single metrics instead of a holistic assessment
  • Ignoring context and domain logic in validations
  • Over-specification of rules that are hard to maintain

Required skills

  • Data modeling and metadata management
  • Data engineering and pipeline implementation
  • Domain knowledge to define business rules

Key aspects

  • Traceability of data lineage
  • Measurability and monitoring of quality metrics
  • Governance and responsibilities

Constraints

  • Privacy and compliance requirements
  • Limited resources for data maintenance
  • Heterogeneous system landscape