Lambda Architecture
An architectural pattern combining batch and real-time processing for scalable data platforms.
Classification
- Complexity: High
- Impact area: Technical
- Decision type: Architectural
- Organizational maturity: Intermediate
Compromises
- Divergent results between speed and batch views can erode trust.
- High operational costs due to parallel infrastructure for batch and real-time.
- Lack of automated reconciliation leads to manual error corrections.
Mitigations
- Use idempotent events and unique timestamps.
- Implement automated reconciliation processes.
- Establish monitoring for latency, throughput and divergence.
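The first mitigation above can be sketched in code. This is a minimal, illustrative example of idempotent event ingestion keyed by a unique event ID; names such as `EventStore` are assumptions for the sketch, not part of any specific framework:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str   # globally unique ID, e.g. a UUID assigned at the source
    timestamp: int  # epoch milliseconds, needed later for corrections
    value: float


class EventStore:
    """Hypothetical store that makes ingestion idempotent."""

    def __init__(self):
        self._seen = {}  # event_id -> Event

    def ingest(self, event: Event) -> bool:
        # Re-delivering the same event_id is a no-op, so an
        # at-least-once transport cannot double-count values.
        if event.event_id in self._seen:
            return False
        self._seen[event.event_id] = event
        return True

    def total(self) -> float:
        return sum(e.value for e in self._seen.values())
```

With this in place, a retried delivery of the same event leaves the aggregate unchanged, which is what makes later batch corrections tractable.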
I/O & resources
Inputs
- Real-time event streams
- Raw data for batch processing
- Metadata, timestamps and schema definitions
Outputs
- Low-latency metrics and alerts
- Corrected, final aggregations
- Serving index for read APIs
Description
Lambda Architecture is an architectural pattern for large-scale data processing that runs batch and speed (real-time) pipelines in parallel. It defines batch, speed and serving layers to combine accuracy with low latency. Typical decisions involve trade-offs among consistency, complexity and operational cost in data integration.
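The layer interaction described above can be sketched as a serving-layer query that combines the accurate-but-stale batch view with the fresh-but-provisional speed view. The function and view shapes here are illustrative assumptions:

```python
def merged_view(batch_view: dict, speed_view: dict) -> dict:
    """Serving-layer read: batch aggregates plus speed-layer deltas.

    batch_view: key -> aggregate over all events up to the last batch run.
    speed_view: key -> incremental aggregate for events after that run.
    """
    result = dict(batch_view)
    for key, delta in speed_view.items():
        # Speed-layer deltas top up the batch values; keys only seen
        # since the last batch run appear with their delta alone.
        result[key] = result.get(key, 0) + delta
    return result
```

For example, a batch view of `{"clicks": 100}` merged with a speed view of `{"clicks": 7, "signups": 2}` yields `{"clicks": 107, "signups": 2}`; the next batch run then replaces the provisional figures with corrected ones.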
✔ Benefits
- Combines low latency and high accuracy via specialized pipelines.
- Clear separation of responsibilities eases scaling of individual layers.
- Batch layer allows full corrections and re-processing on errors.
✖ Limitations
- Considerable implementation and operational overhead due to duplicated logic.
- Complex consistency and error handling models between layers.
- Increasing maintenance effort when data models change.
Metrics
- End-to-end latency
Time from event arrival to availability in the serving layer.
- Batch processing duration
Duration of complete batch jobs to produce corrected results.
- Data divergence rate
Share of inconsistent values between speed and batch views.
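The data divergence rate defined above can be computed directly from the two views; this is a sketch under the assumption that both views map the same kind of key to numeric aggregates:

```python
def divergence_rate(batch_view: dict, speed_view: dict,
                    tolerance: float = 0.0) -> float:
    """Share of keys whose speed and batch values differ by more
    than `tolerance`. Only keys present in both views are compared;
    an empty overlap yields 0.0.
    """
    common = batch_view.keys() & speed_view.keys()
    if not common:
        return 0.0
    diverging = sum(1 for k in common
                    if abs(batch_view[k] - speed_view[k]) > tolerance)
    return diverging / len(common)
```

Alerting when this rate exceeds a threshold is one way to implement the divergence monitoring recommended earlier.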
Examples & implementations
Real-time analytics platform with Spark and Kafka
Apache Spark for the batch layer, Spark Streaming for the speed layer, and Kafka as the low-latency ingestion layer.
Log analysis with separate serving index
Batch computation yields corrected aggregations, speed layer supplies dashboards, serving layer indexes results.
Hybrid reporting in an e-commerce system
Real-time conversion metrics combined with daily computed revenue figures in the serving layer.
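The hybrid-reporting example above can be sketched as a serving-layer report that marks batch-computed days as final and the current day, fed by the speed layer, as provisional. All names and the report shape are assumptions for illustration:

```python
def hybrid_report(daily_revenue: dict, realtime_revenue: dict,
                  today: str) -> dict:
    """Hypothetical e-commerce report in the serving layer.

    daily_revenue: day -> finalized revenue from the batch layer.
    realtime_revenue: day -> running revenue from the speed layer.
    """
    rows = {day: {"revenue": rev, "final": True}
            for day, rev in daily_revenue.items()}
    # Today's figure is provisional until the nightly batch run
    # replaces it with a corrected value.
    rows[today] = {"revenue": realtime_revenue.get(today, 0.0),
                   "final": False}
    return rows
```

Flagging provisional rows explicitly is one way to keep speed/batch divergence from eroding trust in the dashboard.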
Implementation steps
Analyze requirements for latency, accuracy and volume.
Define data flows, storage locations and interfaces.
Develop a minimal speed layer for critical dashboards.
Implement the batch layer with re-processing capability.
Build a serving layer and establish validation processes.
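The steps above can be sketched end to end in a toy, in-memory form: no real stream or batch engine, and all names are assumptions made for the sketch:

```python
# Toy lambda pipeline: the speed layer keeps a running count per key,
# the batch layer recomputes counts from the immutable raw log, and
# the serving layer prefers corrected batch results where available.

raw_log = []     # immutable master dataset (batch-layer input)
speed_view = {}  # incrementally updated; may drift after failures


def ingest(key: str) -> None:
    raw_log.append(key)                           # durable raw record
    speed_view[key] = speed_view.get(key, 0) + 1  # low-latency update


def run_batch() -> dict:
    # Full re-processing over the raw log: slow but authoritative,
    # and repeatable after schema or logic changes.
    batch_view = {}
    for key in raw_log:
        batch_view[key] = batch_view.get(key, 0) + 1
    return batch_view


def serve(key: str, batch_view: dict) -> int:
    # Serving layer: corrected batch value if present, else the
    # speed-layer value, else zero.
    return batch_view.get(key, speed_view.get(key, 0))
```

In a real system each of these functions would be a separate, independently scaled component, which is exactly where the duplicated-logic overhead noted under Limitations comes from.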
⚠️ Technical debt & bottlenecks
Technical debt
- Ad-hoc reconciliations instead of proper re-processing pipelines.
- Outdated batch jobs that cannot handle new schemas.
- Insufficient test coverage for divergence cases.
Known bottlenecks
Misuse examples
- Implementing only the speed layer without batch corrections.
- Full duplication of complex logic in both pipelines.
- Neglecting reconciliation tests before production.
Typical traps
- Underestimating operational effort for parallel pipelines.
- Missing unique timestamps complicate corrections.
- Serving layer becomes bottleneck due to insufficient indexing.
Architectural drivers
Constraints
- Existing infrastructure for batch and stream processing
- Limited resources for parallel operation of multiple layers
- Regulatory requirements for data retention and correction