Catalog
concept #Data #Analytics #Architecture #Platform

Star Schema

Dimensional data model for analytics that links a central fact table to multiple dimension tables and optimizes queries for reporting and OLAP.

  • Established
  • Medium

Classification

  • Medium
  • Technical
  • Architectural
  • Intermediate

Technical context

  • ETL/ELT tools (e.g., Apache Airflow, Informatica)
  • Databases and data warehouse platforms (e.g., Snowflake, Redshift)
  • BI tools (e.g., Power BI, Tableau)

Principles & goals

  • Clear separation of facts and dimensions
  • Explicit grain definition for fact tables
  • Use surrogate keys for consistency
Build
Domain, Enterprise

Compromises

  • Data inconsistencies with insufficient ETL logic
  • Performance issues with very large dimensions
  • Hidden complexity from historical dimensions

Mitigations

  • Define and document the grain early
  • Use surrogate keys instead of natural keys
  • Define SCD strategies per dimension
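
The SCD strategy mentioned above can be sketched as a Type 2 update: instead of overwriting a changed attribute, the current dimension row is closed and a new row with a fresh surrogate key is appended. All field names here are illustrative assumptions, not a fixed convention.

```python
from datetime import date

def scd2_upsert(dim_rows, natural_key, attrs, today, next_key):
    """Type 2 SCD sketch: close the current row if attrs changed, append a new one."""
    for row in dim_rows:
        if row["natural_key"] == natural_key and row["current"]:
            if all(row[k] == v for k, v in attrs.items()):
                return dim_rows  # nothing changed, keep the current row
            row["current"] = False      # close the old version
            row["valid_to"] = today
    dim_rows.append({
        "surrogate_key": next_key,      # new surrogate key per version
        "natural_key": natural_key,
        **attrs,
        "valid_from": today,
        "valid_to": None,
        "current": True,
    })
    return dim_rows

dim = []
scd2_upsert(dim, "C42", {"city": "Berlin"}, date(2024, 1, 1), next_key=1)
scd2_upsert(dim, "C42", {"city": "Hamburg"}, date(2024, 6, 1), next_key=2)
print(len(dim), dim[0]["current"], dim[1]["current"])  # 2 False True
```

Queries that join facts against this dimension then pick either the current row or the row valid at the fact's date, which is exactly the history that overwriting would destroy.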

I/O & resources

Inputs

  • Operational source systems (transactions, logs)
  • Master data (products, customers, time)
  • ETL tools and scheduling

Outputs

  • Fact and dimension data for analysis
  • Dashboards and standardized reports
  • Aggregated KPIs for BI users

Description

The star schema is a dimensional model for data warehouses that connects a central fact table with multiple dimension tables and optimizes queries for analytical reports. Through denormalization and simple joins it improves query performance, at the cost of increased storage requirements and reduced flexibility to structural change. Typical applications include BI, reporting, and OLAP workloads.
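
A minimal sketch of this model, using SQLite for illustration: one fact table holding measures and foreign keys, two dimension tables, and the typical star join that aggregates facts grouped by dimension attributes. All table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Central fact table plus two denormalized dimension tables.
cur.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,  -- surrogate key
    product_name TEXT,
    category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,     -- e.g. 20240115
    year INTEGER,
    month INTEGER
);
CREATE TABLE fact_sales (             -- grain: one row per sale
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    revenue REAL
);
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240115, 2024, 1), (20240216, 2024, 2)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 20240115, 3, 30.0), (2, 20240115, 1, 25.0),
                 (1, 20240216, 2, 20.0)])

# Typical star join: aggregate the fact table, grouped by dimension attributes.
rows = cur.execute("""
    SELECT d.year, d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key    = f.date_key
    GROUP BY d.year, d.month, p.category
    ORDER BY d.year, d.month
""").fetchall()
print(rows)  # [(2024, 1, 'Hardware', 55.0), (2024, 2, 'Hardware', 20.0)]
```

Every query follows the same shape (fact joined to a handful of dimensions), which is what makes the schema easy for BI tools and optimizers alike.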

Benefits

  • Improved query performance via simple joins
  • Easier understandability for business analysts
  • Well suited for aggregation and OLAP

Drawbacks

  • Increased storage due to denormalization
  • Less flexible for frequent structural changes
  • Not ideal for high-frequency transactional systems

Metrics

  • Query latency (p50/p95)

    Median and upper latency for typical reporting queries.

  • Storage footprint

    Total storage for fact and dimension tables including indexes.

  • ETL duration

    Duration of data load processes and historizations.
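
For the latency metric above, p50/p95 can be computed from measured query times; a small sketch using the nearest-rank method (the latency samples are invented):

```python
import math

def percentile(values, p):
    # Nearest-rank percentile: value at rank ceil(p/100 * n), p in (0, 100].
    s = sorted(values)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(k, 0)]

# Reporting-query latencies in milliseconds (invented sample, one slow outlier).
latencies_ms = [120, 95, 110, 300, 105, 98, 115, 2500, 130, 102]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 110 2500
```

Tracking p95 alongside p50 is what surfaces the occasional slow star join that a median alone would hide.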

Use cases & scenarios

Retail: sales data mart

Customer data, product catalog and orders modeled in a star schema for fast reporting.

Finance: monthly closes

Fact transactions combined with account and time dimensions for month-end analysis.

E-commerce: marketing attribution

Click and conversion data as fact tables, dimensions for campaign, channel and customer.

  1. Analyze business processes and define the grain.
  2. Design fact and dimension tables and determine keys.
  3. Implement ETL/ELT pipelines, handle SCDs, and run tests.
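
The load step above can be sketched as a minimal routine that resolves natural keys from the source to surrogate keys in the dimension before inserting fact rows; unknown keys go to error handling instead of the fact table. Field names and the rejection strategy are illustrative assumptions.

```python
def load_facts(source_rows, product_dim):
    """ETL sketch: product_dim maps natural key (SKU) -> surrogate key."""
    facts, rejected = [], []
    for row in source_rows:
        key = product_dim.get(row["sku"])
        if key is None:
            rejected.append(row)  # unknown SKU: route to error handling
            continue
        # Grain: one fact row per source order line.
        facts.append({"product_key": key,
                      "quantity": row["qty"],
                      "revenue": row["qty"] * row["unit_price"]})
    return facts, rejected

dim = {"SKU-1": 1, "SKU-2": 2}
src = [{"sku": "SKU-1", "qty": 2, "unit_price": 5.0},
       {"sku": "SKU-9", "qty": 1, "unit_price": 9.0}]
facts, rejected = load_facts(src, dim)
print(len(facts), len(rejected))  # 1 1
```

Keeping the key lookup in one central routine, rather than in ad-hoc mart-level transformations, is what makes the loads testable and reproducible.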

⚠️ Technical debt & bottlenecks

  • Insufficient documentation of grain and mapping rules
  • Ad-hoc transformations in the data mart instead of a central pipeline
  • Outdated dimension tables without maintained history

join-cardinality · dimension-size · etl-latency

  • Using the star schema as a replacement for OLTP in real-time systems
  • Ignoring historization and overwriting historical dimensions
  • Not using surrogate keys, duplicating natural keys instead
  • Large dimension cardinality that, left unaccounted for, heavily affects join performance
  • Missing indexes on fact and dimension keys
  • Complex SCD logic without tests, leading to inconsistent reports
  • Data modeling (dimensional modeling)
  • SQL and ETL development
  • Understanding of business metrics

  • Query performance for BI
  • Data consistency and reproducibility
  • Scalability for large fact tables
  • Limited flexibility for schema changes
  • Dependence on stable source systems
  • Requirement for ETL capacity and scheduling