Descriptive Statistics
Core techniques to summarize and describe datasets using summary statistics and visualizations.
Classification
- Complexity: Low
- Impact area: Technical
- Decision type: Design
- Organizational maturity: Intermediate
Technical context
Principles & goals
Use cases & scenarios
Compromises
- Misinterpretation due to missing context.
- Overreliance on metrics without uncertainty estimates.
- Wrong decisions due to incorrect aggregation or sampling bias.
- Use robust measures (median, IQR) for skewed distributions.
- Document metric definitions and calculation rules.
- Include confidence intervals or uncertainty measures.
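As a minimal sketch of the last three points, the following computes robust measures (median, IQR) and attaches a percentile bootstrap confidence interval to the median. The sample values, the function names `iqr` and `bootstrap_ci`, the "inclusive" quantile method, and the choice of a bootstrap interval are all illustrative assumptions rather than a prescribed method.

```python
import random
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1, using the 'inclusive' quantile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def bootstrap_ci(values, stat=statistics.median, n_resamples=2000,
                 level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` (a sketch)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(values, k=n))
                       for _ in range(n_resamples))
    alpha = (1 - level) / 2
    lo = estimates[int(alpha * n_resamples)]
    hi = estimates[int((1 - alpha) * n_resamples) - 1]
    return lo, hi


# Hypothetical right-skewed sample: one extreme value dominates the mean.
orders = [12, 15, 14, 13, 16, 15, 14, 13, 17, 250]

print("mean:  ", statistics.fmean(orders))
print("median:", statistics.median(orders))
print("IQR:   ", iqr(orders))
print("95% bootstrap CI for the median:", bootstrap_ci(orders))
```

Reporting the interval next to the point estimate makes it visible how much the summary could move under resampling, which a bare number hides.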
I/O & resources
- Raw datasets (CSV, Parquet, databases)
- Metadata and field descriptions
- Business objectives and analysis questions
- Tabular metrics and summary statistics
- Visualizations (histograms, boxplots, time series)
- Reports with interpretations and recommendations
Description
Descriptive statistics summarize numerical and categorical data using summary measures and visualizations that describe a dataset's distribution, central tendency, dispersion, and shape. Typical measures include the mean, median, variance, standard deviation, and frequencies. These summaries support exploratory data analysis, reporting, and quality control, and they form the foundation for inferential methods.
✔ Benefits
- Rapid orientation about data quality and structure.
- Consistent basis for reports and dashboards.
- Foundation for subsequent inferential analyses.
✖ Limitations
- No causal conclusions; only descriptive summaries.
- Aggregated metrics can obscure important details.
- Sensitivity to outliers for certain measures (e.g., mean).
Trade-offs
Metrics
- Mean
Arithmetic average as a measure of central tendency.
- Median
Middle value, robust to outliers.
- Standard deviation
Measure of dispersion around the mean.
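The three metrics above can be written out directly from their definitions. The sketch below does so and cross-checks the results against Python's `statistics` module. The data values are made up, and using the sample standard deviation (dividing by n - 1) is an assumption, since the text does not say whether the sample or population form is meant.

```python
import math
import statistics


def mean(xs):
    """Arithmetic average: sum of the values divided by their count."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def sample_std(xs):
    """Sample standard deviation: root of squared deviations from the mean over n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

print(mean(data), median(data), sample_std(data))
# The standard library distinguishes the two variants explicitly:
print(statistics.stdev(data))   # sample form (n - 1)
print(statistics.pstdev(data))  # population form (n)
```

Whichever form is chosen, stating it in the metric definition avoids reports that differ only because of the divisor.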
Examples & implementations
Sales dashboard with summary metrics
Monthly sales figures summarized with mean, median and quartiles to identify trends and spikes.
Sensor-based quality monitoring
Readings from production sensors are summarized with measures of central tendency and dispersion to detect process deviations early.
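One simple way to turn such summaries into a deviation check is to compare new readings against the mean and standard deviation of a reference window. The sketch below assumes a hypothetical baseline, a hypothetical function name `flag_deviations`, and a threshold of three standard deviations. All of these are illustrative choices, not values taken from the scenario.

```python
import statistics


def flag_deviations(baseline, readings, k=3.0):
    """Return (index, value) pairs lying more than k baseline standard
    deviations away from the baseline mean."""
    center = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - center) > k * spread]


# Hypothetical reference window recorded while the process was stable.
baseline = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1]
new_readings = [10.0, 10.1, 11.5, 9.9]

print(flag_deviations(baseline, new_readings))
```

Because the mean and standard deviation are themselves sensitive to outliers, the baseline window should be free of known faults, or a median-based variant can be used instead.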
A/B test reporting
Descriptive metrics summarize performance and conversion rates of variants and reveal basic patterns before inferential tests.
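A descriptive A/B summary of this kind might look like the sketch below. The visitor and conversion counts are invented, and the `Variant` class and `describe` function are hypothetical helpers. The point is to report the baseline rate, the absolute difference in percentage points, and the relative lift side by side, so that a percentage is never shown without its reference value.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    visitors: int
    conversions: int

    @property
    def rate(self):
        """Conversion rate as a fraction between 0 and 1."""
        return self.conversions / self.visitors


def describe(control, treatment):
    """Purely descriptive comparison; no significance test is performed."""
    abs_diff = treatment.rate - control.rate
    return {
        "control_rate": control.rate,
        "treatment_rate": treatment.rate,
        "abs_diff_pp": abs_diff * 100,                  # percentage points
        "rel_lift_pct": abs_diff / control.rate * 100,  # relative to control
    }


a = Variant("A", visitors=2000, conversions=100)  # hypothetical control
b = Variant("B", visitors=2000, conversions=120)  # hypothetical treatment

for key, value in describe(a, b).items():
    print(f"{key}: {value:.3f}")
```

Whether the observed difference is more than noise is a question for the inferential test that follows; this summary only describes what was seen.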
Implementation steps
Ensure data access and cleaning.
Define and compute relevant summary measures.
Visualize results and summarize in reports.
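The three steps can be sketched end to end with the standard library alone. The inline CSV, the column name `sales`, the helper names, and the text histogram (standing in for a real chart) are assumptions made for illustration. In practice a dataframe and plotting library would typically fill these roles.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical raw export with two unusable entries.
RAW = """month,sales
2024-01,120
2024-02,135
2024-03,
2024-04,128
2024-05,n/a
2024-06,410
"""


def load_clean(text, column):
    """Step 1: read rows, keep values that parse as numbers, count the rest."""
    values, dropped = [], 0
    for row in csv.DictReader(io.StringIO(text)):
        try:
            values.append(float(row[column]))
        except (TypeError, ValueError):
            dropped += 1
    return values, dropped


def summarize(values):
    """Step 2: compute the summary measures chosen for the report."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"n": len(values), "mean": statistics.fmean(values),
            "median": q2, "q1": q1, "q3": q3,
            "stdev": statistics.stdev(values)}


def text_histogram(values, bin_width):
    """Step 3: crude text histogram, one '#' per observation in each bin."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return "\n".join(
        f"{start:>5}-{start + bin_width - 1:<5} {'#' * count}"
        for start, count in sorted(bins.items()))


values, dropped = load_clean(RAW, "sales")
print(f"dropped {dropped} unparseable rows")
print(summarize(values))
print(text_histogram(values, bin_width=100))
```

Reporting the number of dropped rows alongside the summary keeps the cleaning step visible to readers of the report.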
⚠️ Technical debt & bottlenecks
Technical debt
- Unclear or inconsistent metric definitions across reports.
- Outdated ETL scripts without tests for statistical correctness.
- Lack of automation for reproducibility of analyses.
Known bottlenecks
Misuse examples
- Reporting the mean instead of the median on highly skewed data, which misrepresents the typical value and leads to wrong conclusions.
- Aggregating across heterogeneous groups, which masks subgroup issues.
- Ignoring outliers when computing critical metrics.
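The aggregation problem in particular is easy to demonstrate. In the sketch below, with invented records and a hypothetical `summarize_by_group` helper, the overall mean describes neither group well, while the per-group summary exposes the difference.

```python
import statistics
from collections import defaultdict

# Hypothetical (region, response_time_seconds) records.
records = [
    ("north", 4.0), ("north", 4.2), ("north", 3.9), ("north", 4.1),
    ("south", 9.5), ("south", 10.1),
]


def summarize_by_group(rows):
    """Group values by key and summarize each group separately."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: {"n": len(vals),
                  "mean": statistics.fmean(vals),
                  "median": statistics.median(vals)}
            for key, vals in groups.items()}


overall_mean = statistics.fmean(value for _, value in records)
print(f"overall mean: {overall_mean:.2f}")
for region, stats in summarize_by_group(records).items():
    print(region, stats)
```

Showing the group sizes next to each summary also signals how much weight each subgroup carries in the aggregate.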
Typical traps
- Confusing correlation with causation.
- Misinterpreting percentages without a baseline.
- Insufficient handling of missing values before aggregation.
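The last trap can be illustrated with a small sketch. The readings and the helper name `mean_with_missing_report` are hypothetical. It contrasts silently treating missing values as zero with excluding them and reporting how many were excluded.

```python
import math
import statistics


def mean_with_missing_report(values):
    """Mean over present values, plus a count and share of missing ones.

    Both None and NaN are treated as missing (an assumption of this sketch).
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    n_missing = len(values) - len(present)
    return {
        "mean": statistics.fmean(present) if present else None,
        "n": len(present),
        "n_missing": n_missing,
        "missing_share": n_missing / len(values) if values else 0.0,
    }


readings = [20.0, 21.0, None, 19.0, float("nan"), 20.0]

# Naive approach: missing values silently become 0 and drag the mean down.
zero_filled = [0.0 if v is None or math.isnan(v) else v for v in readings]
print("zero-filled mean:", statistics.fmean(zero_filled))

print(mean_with_missing_report(readings))
```

Dropping missing values is itself a modeling choice; if values are not missing at random, the remaining mean can still be biased, which is why the missing share is reported with it.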
Required skills
Architectural drivers
Constraints
- Available compute resources limit real-time analyses.
- Privacy and compliance restrict access and aggregation.
- Heterogeneous data sources require normalization.