
Data Integration

Data integration unifies heterogeneous data sources into consistent, usable views to support analytics and operations.

  • Established
  • High

Classification

  • High
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Relational databases (e.g., PostgreSQL)
  • Message brokers and streaming platforms (e.g., Kafka)
  • Data warehouses and lakes (e.g., Snowflake, S3)
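The sketch below illustrates how such components are typically wired together: rows are extracted from a PostgreSQL source with psycopg2 and landed as a CSV object in S3. Connection details, table and bucket names are placeholder assumptions, and a production pipeline would stream the data or use a columnar format instead.

```python
import csv
import io

import boto3      # AWS SDK for Python
import psycopg2   # PostgreSQL driver

# Hypothetical connection details and object names -- replace with real values.
PG_DSN = "host=localhost dbname=shop user=etl password=secret"
BUCKET = "example-data-lake"
KEY = "raw/orders/orders.csv"


def extract_orders_to_s3() -> None:
    """Extract the orders table from PostgreSQL and land it as a CSV object in S3."""
    with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT order_id, customer_id, amount, updated_at FROM orders")
        header = [col[0] for col in cur.description]
        rows = cur.fetchall()

    # Serialize to CSV in memory; large tables would be streamed or written as Parquet.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)

    boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=buffer.getvalue().encode("utf-8"))


if __name__ == "__main__":
    extract_orders_to_s3()
```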

Principles & goals

  • Establish a single source of truth
  • Automate lineage and quality assurance
  • Explicit semantics and standardization
Build
Enterprise, Domain, Team

Risks & mitigations

  • Inconsistent or incorrect reports due to faulty mappings
  • Violation of privacy and compliance requirements
  • Operational outages due to faulty pipelines

Mitigations:

  • Version mappings and transformation logic
  • Integrate automated tests and validations into CI/CD (see the sketch after this list)
  • Capture comprehensive monitoring, alerts and lineage
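To make the CI/CD point above concrete, here is a minimal, framework-agnostic sketch of validation checks that could run as a pytest step before a release; the record shape, field names and thresholds are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical consolidated records as produced by an integration pipeline.
Record = dict


def check_no_duplicate_keys(records: list[Record], key: str = "customer_id") -> list[str]:
    """Return error messages for duplicated business keys."""
    seen, errors = set(), []
    for rec in records:
        if rec[key] in seen:
            errors.append(f"duplicate {key}: {rec[key]}")
        seen.add(rec[key])
    return errors


def check_required_fields(records: list[Record], required=("customer_id", "email")) -> list[str]:
    """Return error messages for missing or empty required fields."""
    return [
        f"record {i} is missing {field}"
        for i, rec in enumerate(records)
        for field in required
        if not rec.get(field)
    ]


def check_freshness(latest_load: datetime, max_lag: timedelta = timedelta(hours=6)) -> list[str]:
    """Fail if the newest load timestamp is older than the allowed lag."""
    lag = datetime.now(timezone.utc) - latest_load
    return [f"data is stale by {lag}"] if lag > max_lag else []


def test_sample_batch():
    # In CI, failing asserts fail the build and block the deployment.
    records = [{"customer_id": 1, "email": "a@example.com"}]
    assert not check_no_duplicate_keys(records)
    assert not check_required_fields(records)
```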

I/O & resources

Inputs:

  • Access to source systems and their schemas
  • Mapping definitions and transformation rules (see the mapping sketch after this list)
  • Governance and security policies

Outputs:

  • Consolidated datasets and views
  • ETL/ELT pipelines and artifacts
  • Data lineage and change logs
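As an illustration of keeping mapping definitions configurable rather than hard-coding them, the sketch below applies declarative field mappings and simple transformation rules to source records; all field names are examples.

```python
from typing import Any, Callable

# Declarative mapping: target field -> (source field, transformation rule).
# Keeping this as data makes it versionable and testable alongside the pipeline code.
MAPPING: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("cust_no", int),
    "email": ("mail", lambda v: v.strip().lower()),
    "country": ("land", lambda v: {"DE": "Germany", "FR": "France"}.get(v, v)),
}


def transform(source_record: dict[str, Any]) -> dict[str, Any]:
    """Apply the mapping to one source record, producing a harmonized target record."""
    return {
        target: rule(source_record[source])
        for target, (source, rule) in MAPPING.items()
        if source in source_record
    }


if __name__ == "__main__":
    raw = {"cust_no": "42", "mail": " Alice@Example.COM ", "land": "DE"}
    print(transform(raw))  # {'customer_id': 42, 'email': 'alice@example.com', 'country': 'Germany'}
```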

Description

Data integration describes processes, tools and concepts to combine heterogeneous data sources into consistent, usable views. It covers extraction, transformation, harmonization and consolidation for analytics, operations and decision support. Goals include semantic coherence, improved data quality and reliable access points across architectures and governance models.

Benefits:

  • Improved decision-making through consolidated data
  • Reusable data products and reduced integration effort
  • Better traceability and compliance support

Challenges:

  • High implementation effort with heterogeneous sources
  • Latency vs. consistency trade-offs in real-time scenarios
  • Dependence on metadata and governance discipline

Metrics:

  • Data freshness

    Time lag between a change in a source system and its appearance in the consolidated view.

  • Integration failure rate

    Share of failed pipeline runs per time unit.

  • MTTR for integration outages

    Mean time to recover after disruptions of integration processes.
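These three metrics can be derived from plain pipeline run logs. The sketch below assumes a hypothetical run record with timestamps and a success flag; it is illustrative and not tied to any specific scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    started_at: datetime
    finished_at: datetime
    succeeded: bool
    source_max_timestamp: datetime  # newest event time observed in the source


def data_freshness(run: PipelineRun) -> timedelta:
    """Lag between the newest source event and the end of the consolidating run."""
    return run.finished_at - run.source_max_timestamp


def failure_rate(runs: list[PipelineRun]) -> float:
    """Share of failed runs in the given window."""
    return sum(not r.succeeded for r in runs) / len(runs)


def mttr(runs: list[PipelineRun]) -> timedelta:
    """Mean time from a failed run until the next successful run."""
    recoveries, failed_at = [], None
    for run in sorted(runs, key=lambda r: r.finished_at):
        if not run.succeeded and failed_at is None:
            failed_at = run.finished_at
        elif run.succeeded and failed_at is not None:
            recoveries.append(run.finished_at - failed_at)
            failed_at = None
    return sum(recoveries, timedelta()) / len(recoveries) if recoveries else timedelta()
```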

Use cases & scenarios

Airbyte + dbt for ELT pipelines

Open-source ELT pipeline using Airbyte for extraction and dbt for modeling in the data warehouse.
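A minimal orchestration sketch, assuming a locally running Airbyte instance and a configured dbt project: the Airbyte endpoint, connection id and API path are placeholders (the exact path depends on the Airbyte version), and a real orchestrator would poll the sync job until it completes before starting dbt.

```python
import subprocess

import requests

# Hypothetical Airbyte deployment and connection id -- adjust to your setup.
AIRBYTE_SYNC_URL = "http://localhost:8000/api/v1/connections/sync"  # path may differ per version
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"


def run_elt() -> None:
    """Trigger the Airbyte extraction, then build and test warehouse models with dbt."""
    # 1. Ask Airbyte to sync the configured source into the warehouse's raw schema.
    #    The sync runs asynchronously; in practice, poll the returned job until it finishes.
    response = requests.post(AIRBYTE_SYNC_URL, json={"connectionId": CONNECTION_ID}, timeout=30)
    response.raise_for_status()

    # 2. Transform the raw data into modeled tables/views and run dbt's data tests.
    subprocess.run(["dbt", "run"], check=True)
    subprocess.run(["dbt", "test"], check=True)


if __name__ == "__main__":
    run_elt()
```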

Real-time inventory via Kafka

Event-driven synchronization of inventory levels via a Kafka-based broker system.
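A minimal consumer sketch using the kafka-python client; the topic name, payload shape and broker address are assumptions, and a production consumer would persist stock levels with idempotent upserts instead of an in-memory dict.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical payload: {"sku": "A-100", "warehouse": "BER", "delta": -2}
consumer = KafkaConsumer(
    "inventory-events",
    bootstrap_servers="localhost:9092",
    group_id="inventory-sync",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

# In-memory view of stock levels per (sku, warehouse).
stock: dict[tuple[str, str], int] = {}

for message in consumer:
    event = message.value
    key = (event["sku"], event["warehouse"])
    stock[key] = stock.get(key, 0) + event["delta"]
    print(f"{key}: {stock[key]}")
```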

Master data management for customers

Consolidation of distributed customer data with deduplication rules and governance processes.
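A toy sketch of a deduplication and survivorship rule, assuming the normalized e-mail address is the match key and the most recently updated record wins; real master data management adds fuzzy matching, stewardship workflows and audit trails.

```python
from datetime import datetime

# Hypothetical customer records from two source systems.
records = [
    {"source": "crm", "email": "Alice@Example.com ", "name": "Alice Smith", "updated": datetime(2024, 5, 1)},
    {"source": "shop", "email": "alice@example.com", "name": "A. Smith", "updated": datetime(2024, 6, 1)},
    {"source": "crm", "email": "bob@example.com", "name": "Bob Jones", "updated": datetime(2024, 4, 2)},
]


def match_key(record: dict) -> str:
    """Deduplication rule: normalized e-mail address."""
    return record["email"].strip().lower()


def consolidate(records: list[dict]) -> dict[str, dict]:
    """Group records by match key and keep the most recently updated one (survivorship)."""
    golden: dict[str, dict] = {}
    for rec in records:
        key = match_key(rec)
        if key not in golden or rec["updated"] > golden[key]["updated"]:
            golden[key] = rec
    return golden


if __name__ == "__main__":
    for key, rec in consolidate(records).items():
        print(key, "->", rec["source"], rec["name"])
```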

Approach

1. Clarify goals, domains and ownership; inventory the sources.
2. Define data models and mappings; establish quality rules.
3. Select the technology stack and run a POC (e.g., Airbyte, Kafka, dbt).
4. Implement, test and monitor pipelines; improve iteratively.
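One way to keep the outcomes of steps 1 and 2 explicit and versionable is to describe sources, mappings and quality rules as configuration. The sketch below is a hypothetical skeleton, not any specific tool's format.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    name: str
    kind: str   # e.g. "postgres", "kafka", "rest-api"
    owner: str  # accountable team or domain


@dataclass
class IntegrationPlan:
    domain: str
    sources: list[SourceSystem]
    mappings: dict[str, str]                 # target field -> source field
    quality_rules: list[str] = field(default_factory=list)
    freshness_slo_hours: int = 6             # monitored in step 4


plan = IntegrationPlan(
    domain="customer",
    sources=[
        SourceSystem("crm", "postgres", "sales"),
        SourceSystem("webshop", "kafka", "e-commerce"),
    ],
    mappings={"customer_id": "cust_no", "email": "mail"},
    quality_rules=["email not null", "customer_id unique"],
)
```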

⚠️ Technical debt & bottlenecks

  • Poorly documented transformation logic
  • Hard-coded mappings instead of configurable rules
  • No lineage or audit information stored (see the sketch below)
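The last point is often the cheapest to address: even a simple, append-only run log gives basic lineage and auditability. A minimal sketch with hypothetical fields:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LINEAGE_LOG = Path("lineage_log.jsonl")  # append-only audit trail, one JSON object per run


def record_lineage(pipeline: str, inputs: list[str], outputs: list[str], row_count: int) -> None:
    """Append one lineage/audit entry describing what a pipeline run read and wrote."""
    entry = {
        "pipeline": pipeline,
        "run_at": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,    # e.g. source tables or topics
        "outputs": outputs,  # e.g. warehouse tables or views
        "row_count": row_count,
    }
    with LINEAGE_LOG.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    record_lineage("orders_elt", ["postgres.shop.orders"], ["warehouse.raw.orders"], row_count=1234)
```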
Bottlenecks:

  • Source heterogeneity
  • Network and I/O bandwidth
  • Schema and version management

Anti-patterns:

  • Dumping raw data into a data lake and declaring it integrated
  • Relying solely on batch processing when real-time synchronization is required
  • Merging without deduplication and quality rules

Common pitfalls:

  • Underestimating the effort for data cleansing
  • Not planning for schema evolution from the start
  • Assuming stability of source systems

Required skills:

  • Data engineering and pipeline development
  • Data modeling and metadata management
  • Monitoring, observability and troubleshooting

Quality attributes:

  • Scalability for data volume
  • Data quality and schema governance
  • Security, privacy and compliance

Constraints:

  • Budget and operational resources
  • Legacy systems with limited interfaces
  • Regulatory requirements and data protection