Concept · Data · Analytics · Platform

Central Tendency

Fundamental statistical concept summarizing a distribution by a single representative value (mean, median, mode).

Established
Medium

Classification

  • Medium
  • Technical
  • Design
  • Intermediate

Technical context

  • Statistical libraries (NumPy, SciPy, pandas)
  • Reporting and BI tools
  • Modeling and ML pipelines

Principles & goals

  • Choose the measure based on the distribution and the presence of outliers.
  • Use robust measures for skewed or contaminated data.
  • Document the choice and its implications.

Discovery
Domain, Team

Use cases & scenarios

Compromises

  • A wrong choice of measure can bias decisions.
  • Ignoring outliers skews metrics.
  • Inappropriate aggregation across heterogeneous groups.

  • Compare mean, median and mode systematically.
  • Use robust measures when outliers are present.
  • Preserve information on distribution and dispersion alongside the central value.
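The last three points can be put into practice with a helper that always reports the three measures side by side with dispersion. A minimal sketch using Python's standard `statistics` module; `summarize` is a hypothetical helper and the sample values are made up for illustration.

```python
import statistics


def summarize(values):
    """Report mean, median and mode(s) together with two dispersion measures."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # all most frequent values, ties included
        "stdev": statistics.stdev(values),      # sample standard deviation
        "iqr": q3 - q1,                         # interquartile range
    }


print(summarize([2, 3, 3, 4, 5, 6, 40]))
```

For this toy sample the mean (9) is more than twice the median (4) because of the single value 40, which is exactly the kind of gap a systematic comparison is meant to surface.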

I/O & resources

  • Raw data or aggregated time series
  • Distribution diagnostics (histograms, QQ-plots)
  • Target context (reporting, model training)

  • Selected central measure per feature
  • Documentation of the selection decision
  • Expected impacts on models and metrics

Description

Central tendency summarizes a dataset by identifying a single representative value (mean, median, mode). It guides reporting, comparison and modeling choices by describing the 'center' of a distribution. The appropriate measure depends on the measurement scale, the shape of the distribution and the presence of outliers; understanding these trade-offs is essential for valid interpretation.

  • Condenses information into an interpretable value.
  • Enables quick comparisons across groups.
  • Useful for reporting, imputation and baseline tests.

  • Loses information about dispersion and distribution shape.
  • The mean is sensitive to outliers.
  • A single mode can be misleading for multimodal distributions.
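The outlier sensitivity of the mean is easy to demonstrate. The sketch below assumes made-up data and a hypothetical `trimmed_mean` helper that cuts 10 % from each tail; it replaces one value with an outlier. The mean roughly doubles (13.1 to 27.5), while the median and the trimmed mean stay at 13.0 and 13.125. SciPy provides `scipy.stats.trim_mean` for the same purpose.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after removing int(len * proportion) values from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    return statistics.mean(data[k:len(data) - k])


clean = [10, 11, 12, 12, 13, 13, 14, 15, 15, 16]
contaminated = clean[:-1] + [160]  # replace the largest value (16) with an outlier

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(label, statistics.mean(data), statistics.median(data), trimmed_mean(data))
```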

  • Mean–median difference

    Indicates skewness and the influence of outliers.

  • Robustness index

    Assesses how stable the central measure remains when the sample changes (e.g. under resampling).

  • Explainability (stakeholder comprehension)

    Qualitative measure of how well a metric can be communicated.
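The first two metrics can be computed directly. This is a sketch under stated assumptions: the mean–median difference is complemented by Pearson's median skewness coefficient, 3 * (mean - median) / s, and the robustness index is assumed here to be the standard deviation of bootstrap estimates divided by the full-sample estimate. The catalog entry gives no formula, so `robustness_index` and its parameters are hypothetical, and the sample is made up.

```python
import random
import statistics


def mean_median_gap(values):
    """Return (mean - median, Pearson's median skewness 3 * (mean - median) / stdev)."""
    mean, median = statistics.fmean(values), statistics.median(values)
    return mean - median, 3 * (mean - median) / statistics.stdev(values)


def robustness_index(values, estimator, n_boot=1000, seed=0):
    """Hypothetical index: std. dev. of bootstrap estimates / full-sample estimate.

    Lower is better: the measure moves less when the sample is resampled.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible bootstrap
    n = len(values)
    estimates = [estimator(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(estimates) / abs(estimator(values))


# Made-up sample: 19 moderate values and one extreme value.
sample = [20, 21, 22, 23, 24, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 250]
print(mean_median_gap(sample))
print(robustness_index(sample, statistics.fmean), robustness_index(sample, statistics.median))
```

With one extreme value in 20 observations, the bootstrap spread of the mean is several times larger than that of the median, so under this definition the median scores as the more robust measure.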

Median for salary data

Salary distributions are typically right-skewed; the median yields a more representative central value than the mean.
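A toy illustration with made-up salaries in thousands: a single executive salary pulls the mean far above what a typical employee earns.

```python
import statistics

# Illustrative (made-up) annual salaries in thousands; one value creates a long right tail.
salaries = [38, 41, 43, 45, 47, 50, 52, 55, 60, 240]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
share_below_mean = sum(s < mean_salary for s in salaries) / len(salaries)

print(f"mean={mean_salary}, median={median_salary}, below mean={share_below_mean:.0%}")
```

The output reads `mean=67.1, median=48.5, below mean=90%`: nine of ten employees earn less than the mean, while the median splits the group in half by construction.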

Mean for symmetric distributions

For normally distributed measurements, the arithmetic mean is informative and statistically efficient.
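Efficiency can be checked by simulation. A minimal Monte Carlo sketch, assuming standard normal samples of size 25 (the function name and parameters are illustrative): the sampling variance of the median comes out clearly larger than that of the mean. Asymptotically the ratio tends to π/2 ≈ 1.57, so under normality the median needs about 57 % more observations to reach the same precision.

```python
import random
import statistics


def sampling_variances(n=25, trials=4000, seed=1):
    """Monte Carlo variance of the sample mean and sample median for N(0, 1) data."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = sampling_variances()
# Theory: var(mean) = 1/n = 0.04 here; the ratio approaches pi/2 as n grows.
print(f"var(mean)={var_mean:.4f}  var(median)={var_median:.4f}  ratio={var_median / var_mean:.2f}")
```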

Mode for categorical features

For categorical (nominal) features, the mode, i.e. the most frequent category, is the only appropriate measure of central tendency; mean and median are undefined on a nominal scale.
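For a nominal feature only counting makes sense. A sketch with Python's `statistics.multimode`, assuming an illustrative `payment_method` column: it returns every category tied for the highest count, here `['card', 'cash']`, whereas `statistics.mode` would return only the first of them. Such ties are the categorical counterpart of the multimodality caveat listed under the drawbacks.

```python
import statistics

# Illustrative nominal feature: values are category labels, not numbers.
payment_method = ["card", "cash", "card", "invoice", "card", "cash", "paypal", "cash"]

modes = statistics.multimode(payment_method)  # all categories tied for the highest count
print(modes)
```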

1. Perform data exploration and distribution analysis.
2. Determine suitable central measures per variable.
3. Document the results and integrate them into reporting and models.
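The three steps can be sketched as a small selection routine that records its reasoning. This is a hypothetical example, not a prescribed method: `MeasureDecision` and `choose_measure` are illustrative names, the skewness threshold of 1.0 is an assumed rule of thumb, Tukey's 1.5 × IQR fences serve as the outlier check, and ordinal values are assumed to be encoded as ordered integers.

```python
import statistics
from dataclasses import dataclass


@dataclass
class MeasureDecision:
    """Hypothetical record documenting the chosen central measure for one feature."""
    feature: str
    scale: str  # "nominal", "ordinal" or "metric"
    measure: str
    value: object
    rationale: str


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0.0 for constant data)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(x - m) ** 2 for x in values])
    if m2 == 0:
        return 0.0
    m3 = statistics.fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5


def has_tukey_outliers(values):
    """True if any value lies outside Tukey's 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return any(x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr for x in values)


def choose_measure(feature, values, scale, skew_threshold=1.0):
    """Step 2: choose a central measure from scale and diagnostics (assumed rule)."""
    if scale == "nominal":
        return MeasureDecision(feature, scale, "mode", statistics.multimode(values),
                               "nominal scale: only the mode is defined")
    if scale == "ordinal":
        # median_low returns an observed category instead of interpolating between two.
        return MeasureDecision(feature, scale, "median", statistics.median_low(values),
                               "ordinal scale: order is meaningful, distances are not")
    g1 = skewness(values)  # Step 1: distribution diagnostics
    if abs(g1) > skew_threshold or has_tukey_outliers(values):
        return MeasureDecision(feature, scale, "median", statistics.median(values),
                               f"skewness {g1:.2f} or Tukey outliers present")
    return MeasureDecision(feature, scale, "mean", statistics.fmean(values),
                           f"roughly symmetric (skewness {g1:.2f}), no Tukey outliers")


decisions = [
    choose_measure("salary_k", [38, 41, 43, 45, 47, 50, 52, 55, 60, 240], "metric"),
    choose_measure("height_cm", [168, 171, 172, 174, 175, 176, 178, 181], "metric"),
    choose_measure("rating", [1, 2, 2, 3, 4, 4, 4, 5], "ordinal"),
    choose_measure("channel", ["web", "store", "web", "app"], "nominal"),
]
for d in decisions:  # Step 3: these records can feed reporting or model documentation
    print(d)
```

Each `MeasureDecision` doubles as the documentation called for in step 3 and can be exported alongside reports or model metadata.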

⚠️ Technical debt & bottlenecks

  • Legacy reports use outdated aggregation rules.
  • Automated pipelines lacking distribution checks.
  • Lack of documentation for imputation choices.

  • Outliers
  • Heterogeneous groups
  • Multimodality

  • Reporting the mean salary for heavily right-skewed data without the median.
  • Using the mode to summarize continuous measurements.
  • Uniform mean imputation for data with many missing values or systematic bias.

  • Confusing representativeness with statistical efficiency.
  • Ignoring the measurement scale (nominal vs. metric).
  • Insufficient communication of limitations to stakeholders.

  • Basic statistics knowledge
  • Data preparation and visualization
  • Understanding of data quality and biases

  • Data distribution and outlier prevalence
  • Measurement scale of features
  • Use case: reporting vs. modeling

  • Assumptions about distribution must be documented.
  • Aggregations may imply information loss.
  • Measurement scale limits appropriate measures (nominal vs. metric).