Catalog
concept#Data#Analytics#Observability#Software Engineering

Descriptive Statistics

Core techniques to summarize and describe datasets using summary statistics and visualizations.

Descriptive statistics summarize numerical and categorical data using summary measures and visualizations to describe distribution, central tendency, dispersion, and shape.
Established
Low

Classification

  • Low
  • Technical
  • Design
  • Intermediate

Technical context

  • ETL pipelines and data lakes
  • Business intelligence and dashboarding tools
  • Statistical libraries (pandas, scipy, R)

Principles & goals

  • Simplicity over complexity: use compact measures for quick orientation.
  • Transparency: methods and calculations must be documented and reproducible.
  • Context awareness: interpret measures in conjunction with domain knowledge.
Discovery
Enterprise, Domain, Team

Use cases & scenarios

Compromises

  Risks:

  • Misinterpretation due to missing context.
  • Overreliance on metrics without uncertainty estimates.
  • Wrong decisions due to incorrect aggregation or sampling bias.

  Mitigations:

  • Use robust measures (median, IQR) for skewed distributions.
  • Document metric definitions and calculation rules.
  • Include confidence intervals or uncertainty measures.
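The recommendation to prefer robust measures on skewed data can be illustrated with a minimal sketch using Python's standard library (the sample values are hypothetical):

```python
import statistics

# Hypothetical right-skewed sample: one large outlier pulls the mean upward.
values = [1, 2, 2, 3, 3, 3, 4, 50]

mean = statistics.mean(values)      # sensitive to the outlier: 8.5
median = statistics.median(values)  # robust to the outlier: 3.0

# Interquartile range (IQR) as a robust dispersion measure.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, IQR={iqr}")
```

The gap between mean and median is itself a quick, reportable signal of skew.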

I/O & resources

  Inputs:

  • Raw datasets (CSV, Parquet, databases)
  • Metadata and field descriptions
  • Business objectives and analysis questions

  Outputs:

  • Tabular metrics and summary statistics
  • Visualizations (histograms, boxplots, time series)
  • Reports with interpretations and recommendations

Description

Descriptive statistics summarize numerical and categorical data using summary measures and visualizations to describe distribution, central tendency, dispersion, and shape. Typical measures include the mean, median, variance, standard deviation, and frequencies. Descriptive statistics support exploratory data analysis, reporting, and quality control, and form the foundation for inferential methods.

  • Rapid orientation about data quality and structure.
  • Consistent basis for reports and dashboards.
  • Foundation for subsequent inferential analyses.

  • No causal conclusions; only descriptive summaries.
  • Aggregated metrics can obscure important details.
  • Sensitivity to outliers for certain measures (e.g., mean).

  • Mean

    Arithmetic average as a measure of central tendency.

  • Median

    Middle value, robust to outliers.

  • Standard deviation

    Measure of dispersion around the mean.
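The three core measures above can be computed directly with Python's standard library; the sample values here are illustrative:

```python
import statistics

# Illustrative sample measurements (hypothetical values).
data = [4.0, 4.5, 5.0, 5.5, 6.0]

mean = statistics.mean(data)      # arithmetic average: 5.0
median = statistics.median(data)  # middle value: 5.0
stdev = statistics.stdev(data)    # sample standard deviation around the mean

print(mean, median, round(stdev, 3))
```

For a symmetric sample like this, mean and median coincide; they diverge once the distribution is skewed.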

Sales dashboard with summary metrics

Monthly sales figures summarized with mean, median and quartiles to identify trends and spikes.
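A minimal sketch of such a summary, with hypothetical monthly figures (Python standard library only):

```python
import statistics

# Hypothetical monthly sales figures (units sold), one value per month.
monthly_sales = [120, 135, 128, 140, 132, 138, 145, 150, 210, 148, 152, 149]

summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "quartiles": statistics.quantiles(monthly_sales, n=4),
}

# The single spike (210) pulls the mean above the median,
# flagging a month worth inspecting on the dashboard.
print(summary)
```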

Sensor-based quality monitoring

Production sensors provide metrics for dispersion and central tendency to detect process deviations early.
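One common form this takes is a rolling mean and standard deviation over a fixed window; a sketch with hypothetical readings (the window size and threshold logic would be process-specific):

```python
import statistics
from collections import deque

def window_stats(readings, size=5):
    """Rolling mean and sample standard deviation over a fixed-size window."""
    window = deque(maxlen=size)
    out = []
    for r in readings:
        window.append(r)
        if len(window) == size:
            out.append((statistics.mean(window), statistics.stdev(window)))
    return out

# Hypothetical sensor readings with a deviation at the end.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 12.5]
stats = window_stats(readings)

# Dispersion jumps once the deviating reading enters the window.
print(stats[0], stats[-1])
```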

A/B test reporting

Descriptive metrics summarize performance and conversion rates of variants and reveal basic patterns before inferential tests.
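A descriptive A/B summary can be as simple as per-variant conversion rates; the counts below are hypothetical:

```python
# Hypothetical A/B test counts: (visitors, conversions) per variant.
variants = {"A": (1000, 50), "B": (1000, 65)}

report = {
    name: {"visitors": n, "conversions": c, "rate": c / n}
    for name, (n, c) in variants.items()
}

for name, row in report.items():
    print(f"Variant {name}: {row['rate']:.1%} ({row['conversions']}/{row['visitors']})")

# Descriptive only: whether the difference is significant
# requires an inferential test (e.g. a two-proportion z-test).
```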

  1. Ensure data access and cleaning.
  2. Define and compute relevant summary measures.
  3. Visualize results and summarize in reports.
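The steps above can be sketched end to end with the standard library (hypothetical raw values; visualization is omitted here):

```python
import statistics

# Step 1: data access and cleaning — drop missing values (None) from a raw column.
raw = [3.2, None, 4.1, 3.8, None, 4.4, 3.9]
clean = [x for x in raw if x is not None]

# Step 2: define and compute the relevant summary measures.
measures = {
    "n": len(clean),
    "mean": round(statistics.mean(clean), 2),
    "median": statistics.median(clean),
    "stdev": round(statistics.stdev(clean), 2),
}

# Step 3: summarize for the report.
print(measures)
```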

⚠️ Technical debt & bottlenecks

  • Unclear or inconsistent metric definitions across reports.
  • Outdated ETL scripts without tests for statistical correctness.
  • Lack of automation for reproducibility of analyses.
  • Data quality issues: missing metadata, inconsistent formats, incomplete samples.
  • Using mean instead of median on highly skewed data leads to wrong conclusions.
  • High aggregation across heterogeneous groups masks subgroup issues.
  • Ignoring outliers when computing critical metrics.
  • Confusing correlation with causation.
  • Misinterpreting percentages without a baseline.
  • Insufficient handling of missing values before aggregation.
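The last pitfall is easy to demonstrate: silently coercing missing values distorts the aggregate. A minimal sketch with hypothetical values:

```python
import statistics

# Hypothetical column where missing values were exported as None.
col = [10, None, 12, None, 11]

# Wrong: silently coercing missing values to 0 drags the mean down.
coerced = [x if x is not None else 0 for x in col]
# Better: make the handling explicit — drop and report the missing share.
dropped = [x for x in col if x is not None]
missing_share = 1 - len(dropped) / len(col)

mean_coerced = statistics.mean(coerced)  # biased low: 6.6
mean_dropped = statistics.mean(dropped)  # reflects observed values: 11
print(mean_coerced, mean_dropped, missing_share)
```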

Prerequisites

  • Basic statistics knowledge (mean, median, dispersion)
  • Data cleaning and preprocessing
  • Fundamentals of data visualization

Constraints

  • Data availability and quality
  • Reporting frequency requirements
  • Reproducibility and traceability
  • Available compute resources limit real-time analyses.
  • Privacy and compliance restrict access and aggregation.
  • Heterogeneous data sources require normalization.