Data Source
The origin of data, which determines its format, semantics and timeliness; the foundation for integration and data quality.
Classification
- Complexity: Medium
- Impact area: Technical
- Decision type: Architectural
- Organizational maturity: Intermediate
Technical context
Principles & goals
Use cases & scenarios
Compromises
Risks
- Faulty sources lead to incorrect analyses
- Data breaches through insecure integration
- High effort for adaptations on schema changes
Best practices
- Establish source contracts (schema + SLA) early
- Store provenance and metadata consistently
- Perform validation at the ingestion boundary
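As a rough illustration of a source contract, provenance tagging and validation at the ingestion boundary, the following minimal sketch may help. The names `SourceContract`, `validate_record` and `ingest`, as well as the `_source` and `_ingested_at` metadata keys, are hypothetical and chosen only for this example; a real pipeline would likely rely on a schema library and a metadata catalog.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical contract: expected fields plus a freshness SLA."""
    source_id: str
    required_fields: dict[str, type]  # field name -> expected Python type
    max_age_seconds: int              # SLA value; recorded here, not enforced in this sketch


def validate_record(record: dict[str, Any], contract: SourceContract) -> list[str]:
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for name, expected in contract.required_fields.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}: {type(record[name]).__name__}")
    return errors


def ingest(records: list[dict[str, Any]], contract: SourceContract):
    """Split records into accepted (tagged with provenance) and rejected."""
    accepted, rejected = [], []
    ingested_at = datetime.now(timezone.utc).isoformat()
    for record in records:
        errors = validate_record(record, contract)
        if errors:
            # Keep the reasons so they can feed a validation report.
            rejected.append({"record": record, "errors": errors})
        else:
            # Provenance: where the record came from and when it was ingested.
            accepted.append(
                {**record, "_source": contract.source_id, "_ingested_at": ingested_at}
            )
    return accepted, rejected
```

Rejecting records at this boundary keeps faulty data out of downstream analyses, and the rejected list can serve as input for a quality and validation report.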
I/O & resources
Inputs
- Endpoint URL or file path
- Schema or data model description
- Access credentials and permissions
Outputs
- Ingested, normalized data
- Metadata and provenance information
- Quality and validation reports
Description
A data source is the origin of data consumed by systems: a database, file, sensor, application or API that produces or exposes data. Each source is characterized by its format, semantics, update frequency and trust level, which makes it central to integration, data quality and governance. Understanding a system's data sources informs architecture, privacy and operational decisions.
✔ Benefits
- Improved traceability and auditability of data
- Better data quality through early validation
- Enables targeted integration and efficient transformations
✖ Limitations
- Sources can be unreliable or inconsistent
- Constraints from SLAs, rate limits or formats
- Privacy and compliance restrictions may limit use
Trade-offs
Metrics
- Freshness
Time since the source was last updated; important for time-sensitive data.
- Completeness
Proportion of expected fields/records that were successfully delivered.
- Ingest error rate
Share of erroneous or rejected records during ingestion.
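The three metrics above could be computed roughly as follows. This is a minimal sketch: the function names are illustrative, and the handling of empty inputs (completeness of 1.0 when nothing was expected, error rate of 0.0 when nothing was ingested) is an assumed convention rather than a standard.

```python
from datetime import datetime, timezone


def freshness_seconds(last_updated: datetime, now: datetime) -> float:
    """Seconds elapsed since the source was last updated."""
    return (now - last_updated).total_seconds()


def completeness(delivered: int, expected: int) -> float:
    """Proportion of expected fields or records that actually arrived."""
    if expected == 0:
        return 1.0  # assumed convention: nothing expected, nothing missing
    return delivered / expected


def ingest_error_rate(rejected: int, total: int) -> float:
    """Share of records that were erroneous or rejected during ingestion."""
    if total == 0:
        return 0.0  # assumed convention: no records, no errors
    return rejected / total


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_updated = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)

print(freshness_seconds(last_updated, now))  # 1800.0
print(completeness(95, 100))                 # 0.95
print(ingest_error_rate(5, 200))             # 0.025
```

Using timezone-aware datetimes for both arguments avoids mixing naive and aware values, which Python rejects when subtracting.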
Examples & implementations
IoT platform as a data source
Sensors feed telemetry into an IoT platform used as the primary data source for analytics.
Legacy ERP as master data source
A legacy ERP system remains the authoritative source for product and customer master data.
Third-party API for market prices
An external market data provider supplies price data via API; timeliness and SLA are critical.
Implementation steps
Create and prioritize a source inventory
Define schemas and contracts
Build ingest pipelines with validation and monitoring
Establish metadata and governance processes
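The first step, a prioritized source inventory, might be sketched like this. The `SourceEntry` fields, the 1 to 5 scales and the scores assigned to the three example sources are assumptions made for illustration, not values from any real assessment.

```python
from dataclasses import dataclass


@dataclass
class SourceEntry:
    name: str
    kind: str             # e.g. "database", "api", "file", "sensor"
    business_impact: int  # assumed scale: 1 (low) to 5 (high)
    reliability: int      # assumed scale: 1 (unreliable) to 5 (stable)


def prioritize(inventory: list[SourceEntry]) -> list[SourceEntry]:
    """Order sources by highest impact first, then by lowest reliability."""
    return sorted(inventory, key=lambda s: (-s.business_impact, s.reliability))


# Illustrative scores only.
inventory = [
    SourceEntry("IoT platform", "sensor", business_impact=4, reliability=3),
    SourceEntry("Legacy ERP", "database", business_impact=5, reliability=2),
    SourceEntry("Market price API", "api", business_impact=5, reliability=4),
]

for entry in prioritize(inventory):
    print(entry.name)
```

With these assumed scores the ERP system comes first, because it combines high impact with the lowest reliability, so it would be the first candidate for a contract and a validated pipeline.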
⚠️ Technical debt & bottlenecks
Technical debt
- Legacy connectors without automation
- Hardcoded credentials in ingest scripts
- Missing central metadata catalog
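One common way to avoid hardcoded credentials in ingest scripts is to read them from the environment at runtime. The sketch below assumes a hypothetical naming scheme (`<PREFIX>_USER` and `<PREFIX>_PASSWORD`); a secrets manager would be a reasonable alternative.

```python
import os


def load_credentials(prefix: str) -> dict[str, str]:
    """Read credentials from environment variables instead of the script itself.

    The variable names <PREFIX>_USER and <PREFIX>_PASSWORD are an assumed
    convention for this sketch.
    """
    user = os.environ.get(f"{prefix}_USER")
    password = os.environ.get(f"{prefix}_PASSWORD")
    if not user or not password:
        # Fail fast with a message that names the variables, never the values.
        raise RuntimeError(
            f"Missing credentials: set {prefix}_USER and {prefix}_PASSWORD"
        )
    return {"user": user, "password": password}
```

Failing at startup when a variable is missing tends to be easier to diagnose than an authentication error in the middle of an ingest run.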
Known bottlenecks
Misuse examples
- Using an unreliable public API for reporting
- Merging different source formats without mapping
- Leaving sensitive fields exposed instead of masking
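To illustrate the last point, sensitive fields can be masked before records leave the ingestion layer. The following is a minimal sketch: the field names in `SENSITIVE_FIELDS`, the salt value and the 12-character token length are assumptions, and salted hashing is pseudonymization rather than full anonymization.

```python
import hashlib

# Assumed set of sensitive field names for this example.
SENSITIVE_FIELDS = frozenset({"email", "phone"})


def mask_record(record: dict, sensitive=SENSITIVE_FIELDS, salt: str = "example-salt") -> dict:
    """Return a copy of the record with sensitive values replaced by tokens.

    The same input always yields the same token, so masked values can still
    be used for joins. The default salt is a placeholder; a real one should
    be kept secret and loaded from configuration.
    """
    masked = dict(record)  # copy, so the caller's record is not mutated
    for field in record.keys() & sensitive:
        value = (salt + str(record[field])).encode("utf-8")
        masked[field] = hashlib.sha256(value).hexdigest()[:12]
    return masked
```

Fields that are not listed as sensitive pass through unchanged, so the mapping of which fields to mask has to be maintained per source.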
Typical traps
- Assuming sources are immutable
- Ignoring rate limits and backoff mechanisms
- Overlooking time and timezone issues in timestamps
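Two of these traps, rate limits and timezone handling, lend themselves to a short sketch. The `RateLimited` exception and the `fetch` callable are hypothetical stand-ins for whatever a real client library provides, and treating naive timestamps as UTC is an assumption that should be checked against each source's documentation.

```python
import random
import time
from datetime import datetime, timezone


class RateLimited(Exception):
    """Hypothetical error raised when the source reports too many requests."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff plus jitter on rate limits."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * 2 ** attempt
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 2))


def to_utc(timestamp: str, assume_tz=timezone.utc) -> datetime:
    """Parse an ISO 8601 timestamp and normalize it to UTC.

    Timestamps without an offset are interpreted in assume_tz (an assumption).
    Note that a trailing "Z" is only accepted by fromisoformat on Python 3.11+.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=assume_tz)
    return parsed.astimezone(timezone.utc)


print(to_utc("2024-03-10T14:00:00+02:00"))  # 2024-03-10 12:00:00+00:00
```

Storing all timestamps in UTC at ingestion and converting only for display keeps records from different sources comparable.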
Required skills
Architectural drivers
Constraints
- Legal requirements and data protection
- Existing data formats and protocols
- Access rights and authentication