Star Schema
Dimensional data model for analytics that links a central fact table to multiple dimension tables and optimizes queries for reporting and OLAP.
Classification
- Complexity: Medium
- Impact area: Technical
- Decision type: Architectural
- Organizational maturity: Intermediate
Compromises
- Data inconsistencies when ETL logic is insufficient
- Performance issues with very large dimensions
- Hidden complexity from historized dimensions
Mitigations
- Define and document the grain early
- Use surrogate keys instead of natural keys
- Define an SCD strategy per dimension (see the sketch after this list)
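As a minimal sketch of the last two mitigations (all names, such as `dim_customer`, are illustrative, not prescribed by the pattern), a dimension table can carry an autoincrementing surrogate key alongside the natural key, plus SCD Type 2 validity columns:

```python
import sqlite3

# Hypothetical customer dimension: an autoincrementing surrogate key for
# joins, the natural key kept only for lineage, and SCD Type 2 validity
# columns so attribute changes create new versions instead of overwrites.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id  TEXT NOT NULL,                      -- natural key from the source
    city         TEXT,                               -- tracked attribute
    valid_from   TEXT NOT NULL,                      -- SCD Type 2 interval start
    valid_to     TEXT,                               -- NULL = current version
    is_current   INTEGER NOT NULL DEFAULT 1
)
""")
```

Facts then join on `customer_key` only, so a reloaded or changed natural key never duplicates or rewires history.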
I/O & resources
Inputs
- Operational source systems (transactions, logs)
- Master data (products, customers, time)
- ETL tools and scheduling
Outputs
- Fact and dimension data for analysis
- Dashboards and standardized reports
- Aggregated KPIs for BI users
Description
The star schema is a dimensional model for data warehouses that connects a central fact table with multiple dimension tables and optimizes queries for analytical reporting. Denormalization and simple joins improve query performance, at the cost of higher storage requirements and reduced flexibility for structural change. Typical applications include BI, reporting, and OLAP workloads.
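As a minimal sketch of this structure (table names such as `fact_sales`, `dim_product`, and `dim_date` are illustrative), an in-memory SQLite database can hold one fact table with measures and foreign keys, surrounded by denormalized dimensions, queried with a flat star join:

```python
import sqlite3

# Minimal illustrative star schema: one fact table plus two dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,  -- surrogate key
    product_id  TEXT,                 -- natural key from the source system
    name        TEXT,
    category    TEXT                  -- denormalized attribute (no snowflaking)
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,     -- e.g. 20240131
    day      TEXT,
    month    TEXT,
    year     INTEGER
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,              -- additive measures at the declared grain
    revenue     REAL                  -- grain: one row per product per day
);
""")

# A typical OLAP-style query: one simple join from the fact table to each
# dimension, then aggregation over denormalized dimension attributes.
rows = con.execute("""
    SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key    = f.date_key
    GROUP BY d.year, p.category
""").fetchall()
```

Because every dimension is exactly one join away from the fact table, the aggregation stays a flat star join; that simplicity is what the pattern trades storage for.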
✔ Benefits
- Improved query performance via simple joins
- Easier understandability for business analysts
- Well suited for aggregation and OLAP
✖ Limitations
- Increased storage due to denormalization
- Less flexible for frequent structural changes
- Not ideal for high-frequency transactional systems
Metrics
- Query latency (p50/p95): median and upper-percentile latency for typical reporting queries (a measurement sketch follows this list).
- Storage footprint: total storage for fact and dimension tables, including indexes.
- ETL duration: runtime of data loads and dimension history maintenance.
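The latency metric can be sampled directly. Below is a minimal sketch, assuming an in-memory SQLite database; `QUERY` and the run count of 50 are placeholders, and in practice you would sample the queries behind your most-used dashboards:

```python
import sqlite3
import statistics
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (product_key INT, date_key INT, revenue REAL)")

# Placeholder representative reporting query.
QUERY = "SELECT date_key, SUM(revenue) FROM fact_sales GROUP BY date_key"

samples_ms = []
for _ in range(50):  # run count is arbitrary for the sketch
    start = time.perf_counter()
    con.execute(QUERY).fetchall()
    samples_ms.append((time.perf_counter() - start) * 1000)  # milliseconds

p50 = statistics.median(samples_ms)
p95 = statistics.quantiles(samples_ms, n=20)[-1]  # last of 19 cut points = 95th percentile
print(f"p50={p50:.2f} ms  p95={p95:.2f} ms")
```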
Examples & implementations
Retail: sales data mart
Customer data, product catalog and orders modeled in a star schema for fast reporting.
Finance: monthly closes
Transaction facts combined with account and time dimensions for month-end analysis.
E-commerce: marketing attribution
Click and conversion data as fact tables, dimensions for campaign, channel and customer.
Implementation steps
1. Analyze business processes and define the grain.
2. Design fact and dimension tables and determine keys.
3. Implement ETL/ELT pipelines, handle SCDs, and run tests (see the sketch below).
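Step 3's SCD handling is often where errors creep in. A minimal sketch of Type 2 historization, assuming a hypothetical `dim_customer` dimension like the one sketched earlier (all names illustrative): when a tracked attribute changes, the current row is expired and a new version is inserted, so history is never overwritten:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id TEXT NOT NULL, city TEXT,
    valid_from TEXT NOT NULL, valid_to TEXT,
    is_current INTEGER NOT NULL DEFAULT 1)""")

def scd2_upsert(customer_id, city, load_date):
    """Type 2 historization: expire the current row on change, insert a new version."""
    cur = con.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1", (customer_id,)).fetchone()
    if cur and cur[1] == city:
        return  # attribute unchanged: nothing to historize
    if cur:
        # Close the previous version instead of overwriting it.
        con.execute("UPDATE dim_customer SET valid_to = ?, is_current = 0 "
                    "WHERE customer_key = ?", (load_date, cur[0]))
    con.execute("INSERT INTO dim_customer (customer_id, city, valid_from) "
                "VALUES (?, ?, ?)", (customer_id, city, load_date))

scd2_upsert("C-1", "Berlin", "2024-01-01")
scd2_upsert("C-1", "Munich", "2024-06-01")  # closes the Berlin row, adds a new version
```

Exactly this branch logic is what the "Typical traps" section below warns against shipping without tests.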
⚠️ Technical debt & bottlenecks
Technical debt
- Insufficient documentation of grain and mapping rules
- Ad-hoc transformations in the data mart instead of in a central pipeline
- Outdated dimension tables without maintained history
Known bottlenecks
Misuse examples
- Using star schema as a replacement for OLTP in real-time systems
- Ignoring historization and overwriting historical dimensions
- Not using surrogate keys and duplicating natural keys
Typical traps
- High dimension cardinality that is not accounted for and degrades join performance
- Missing indexes on fact and dimension keys (see the sketch after this list)
- Complex SCD logic without tests, leading to inconsistent reports
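For the indexing trap, a minimal sketch (reusing the illustrative `fact_sales` naming): index each foreign-key column the star join filters or groups on:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, revenue REAL);

-- One index per foreign key the star join touches; without them, every
-- filtered or grouped query falls back to full fact-table scans.
CREATE INDEX ix_fact_sales_product ON fact_sales(product_key);
CREATE INDEX ix_fact_sales_date    ON fact_sales(date_key);
""")
```

In a real warehouse the equivalent would be the engine's native mechanism, e.g. bitmap indexes, clustering keys, or sort keys, depending on the platform.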
Architectural drivers
Constraints
- Limited flexibility for schema changes
- Dependence on stable source systems
- Requirement for ETL capacity and scheduling