concept #Data #Integration #Architecture #Security

Data Source

Origins of data that define format, semantics and timeliness; foundation for integration and data quality.

A data source is the origin of data consumed by systems: databases, files, sensors, applications or APIs that produce or expose data.
Established
Medium

Classification

  • Medium
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Relational databases (e.g. PostgreSQL)
  • Message brokers (e.g. Kafka)
  • Data platforms / data lakes

Principles & goals

  • Document explicit provenance and ownership
  • Define clear contracts (schemas/SLAs) between source and consumers
  • Manage metadata and semantics centrally
Discovery
Enterprise, Domain, Team

Use cases & scenarios

Compromises

  • Faulty sources lead to incorrect analyses
  • Data breaches through insecure integration
  • Schema changes require high adaptation effort

  • Establish source contracts (schema + SLA) early
  • Store provenance and metadata consistently
  • Perform validation at the ingestion boundary
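The recommendations above (source contracts and validation at the ingestion boundary) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `SourceContract`, `validate_record` and `split_at_boundary` are hypothetical names, and the contract is assumed to be nothing more than a mapping of field names to expected Python types plus an illustrative staleness limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical contract: required fields, their types, and an SLA value."""
    name: str
    required_fields: dict  # field name -> expected Python type
    max_staleness_seconds: int = 3600  # illustrative SLA value (assumption)


def validate_record(record: dict, contract: SourceContract) -> list:
    """Return a list of violation messages; an empty list means the record passes."""
    errors = []
    for field_name, expected_type in contract.required_fields.items():
        if field_name not in record:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            errors.append(
                f"wrong type for {field_name}: expected {expected_type.__name__}, "
                f"got {type(record[field_name]).__name__}"
            )
    return errors


def split_at_boundary(records, contract):
    """Separate accepted records from rejected ones before anything is stored."""
    accepted, rejected = [], []
    for record in records:
        errors = validate_record(record, contract)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the rejected records together with their violation messages, rather than dropping them silently, is what makes a later quality report possible.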

I/O & resources

  • Endpoint URL or file path
  • Schema or data model description
  • Access credentials and permissions

  • Ingested, normalized data
  • Metadata and provenance information
  • Quality and validation reports
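One way to connect the inputs (location, schema) to the provenance output is to record both alongside a fingerprint of what was received. The sketch below assumes a hypothetical `build_provenance` helper and illustrative field names; a real catalog would likely define its own metadata model.

```python
import hashlib
from datetime import datetime, timezone


def build_provenance(source_location: str, schema_version: str, payload: bytes) -> dict:
    """Describe where a payload came from and exactly what was received."""
    return {
        "source_location": source_location,  # endpoint URL or file path
        "schema_version": schema_version,    # which schema description applied
        "fetched_at": datetime.now(timezone.utc).isoformat(),  # always in UTC
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "payload_bytes": len(payload),
    }
```

Storing the checksum with the fetch time lets a consumer later verify that the data it analyses is the data that was ingested.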

Description

A data source is the origin of data consumed by systems: databases, files, sensors, applications or APIs that produce or expose data. It specifies format, semantics, update frequency and trust level, and is essential for integration, data quality and governance. Understanding data sources informs architecture, privacy and operational decisions.

  • Improved traceability and auditability of data
  • Better data quality through early validation
  • Enables targeted integration and efficient transformations

  • Sources can be unreliable or inconsistent
  • Constraints from SLAs, rate limits or formats
  • Privacy and compliance restrictions may limit use

  • Freshness

    Time since the source was last updated; important for time-sensitive data.

  • Completeness

    Proportion of expected fields/records that were successfully delivered.

  • Ingest error rate

    Share of erroneous or rejected records during ingestion.
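The three metrics above reduce to simple arithmetic. The following sketch shows one possible way to compute them; the function names and the handling of the zero-denominator cases are assumptions, not part of the definitions above.

```python
from datetime import datetime, timezone


def freshness_seconds(last_updated: datetime, now: datetime) -> float:
    """Time elapsed since the source was last updated, in seconds."""
    return (now - last_updated).total_seconds()


def completeness(delivered: int, expected: int) -> float:
    """Share of expected records (or fields) that were actually delivered."""
    if expected == 0:
        return 1.0  # assumption: nothing expected counts as complete
    return delivered / expected


def ingest_error_rate(rejected: int, total: int) -> float:
    """Share of records that were erroneous or rejected during ingestion."""
    if total == 0:
        return 0.0  # assumption: no records means no errors
    return rejected / total
```

Both timestamps passed to `freshness_seconds` should be timezone-aware; Python raises a `TypeError` when an aware and a naive datetime are subtracted.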

IoT platform as a data source

Sensors feed telemetry into an IoT platform used as the primary data source for analytics.

Legacy ERP as master data source

A legacy ERP system remains the authoritative source for product and customer master data.

Third-party API for market prices

An external market data provider supplies price data via API; timeliness and SLA are critical.

  1. Create and prioritize a source inventory
  2. Define schemas and contracts
  3. Build ingest pipelines with validation and monitoring
  4. Establish metadata and governance processes

⚠️ Technical debt & bottlenecks

  • Legacy connectors without automation
  • Hardcoded credentials in ingest scripts
  • Missing central metadata catalog
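For the hardcoded-credentials item, a common first step is to read secrets from the environment at run time. This is a minimal sketch: `load_credential` and `MissingCredentialError` are hypothetical names, and a dedicated secret manager may be the better choice in practice.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def load_credential(env_var: str) -> str:
    """Read a secret from an environment variable instead of the script source."""
    value = os.environ.get(env_var)
    if not value:
        # Fail early and name the variable, but never echo a secret value.
        raise MissingCredentialError(f"environment variable {env_var} is not set")
    return value
```

An ingest script would then call something like `load_credential("SOURCE_API_TOKEN")`, where the variable name is purely illustrative.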

  • Network latency
  • Schema incompatibilities
  • Rate limits / throttling

  • Using an unreliable public API for reporting
  • Merging different source formats without mapping
  • Leaving sensitive fields exposed instead of masking them
  • Assuming sources are immutable
  • Ignoring rate limits and backoff mechanisms
  • Overlooking time and timezone issues in timestamps
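Two of the pitfalls above, rate limits and timezone handling, lend themselves to small helpers. The sketch below is illustrative only: `call_with_backoff` and `to_utc` are hypothetical names, the retry parameters are arbitrary defaults, and a real client would catch only the specific throttling error of its source (for example an HTTP 429 response) rather than every exception.

```python
import random
import time
from datetime import datetime, timezone


def call_with_backoff(fetch, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `fetch` with exponential backoff and jitter; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # assumption: narrow this to the source's throttling error
            if attempt == max_attempts - 1:
                raise
            # The delay doubles on each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and normalize it to UTC.

    Naive timestamps are rejected rather than guessed, since their zone is unknown.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no timezone offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)
```

Rejecting naive timestamps at ingestion is a deliberate choice here: silently assuming a zone tends to hide exactly the errors this pitfall describes.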
  • Data modeling and semantics
  • API integration and authentication
  • ETL/ELT development and validation

  • Scalability of data ingestion
  • Data quality and trustworthiness
  • Regulatory requirements and compliance

  • Legal requirements and data protection
  • Existing data formats and protocols
  • Access rights and authentication