Central Tendency
Fundamental statistical concept summarizing a distribution by a single representative value (mean, median, mode).
Classification
- Complexity: Medium
- Impact area: Technical
- Decision type: Design
- Organizational maturity: Intermediate
Compromises
- Choosing the wrong measure can bias decisions.
- Ignoring outliers skews metrics.
- Inappropriate aggregation across heterogeneous groups.
- Compare mean, median and mode systematically.
- Use robust measures when outliers are present.
- Preserve distribution and dispersion alongside the central value.
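One robust measure that can sit alongside the median is a trimmed mean. The sketch below is illustrative only: the helper name `trimmed_mean`, the 10% trimming proportion, and the sample values are assumptions, not part of the original guidance.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # number of values dropped per end
    kept = ordered[cut:len(ordered) - cut]
    return statistics.mean(kept)


# Hypothetical sample with one extreme value.
values = [2, 3, 3, 4, 5, 6, 7, 8, 9, 100]

plain = statistics.mean(values)       # pulled up by the value 100
trimmed = trimmed_mean(values, 0.1)   # drops the smallest and largest value
middle = statistics.median(values)

print(plain, trimmed, middle)
```

Reporting the plain mean, the trimmed mean, and the median side by side makes the influence of the extreme value visible instead of hiding it.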
I/O & resources
- Raw data or aggregated time series
- Distribution diagnostics (histograms, QQ-plots)
- Target context (reporting, model training)
- Selected central measure per feature
- Documentation of the selection decision
- Expected impacts on models and metrics
Description
Central tendency summarizes a dataset by identifying a single representative value (mean, median, mode). It guides reporting, comparison and modeling choices by describing the 'center' of distributions. Selection depends on data scale, distribution and outliers; understanding trade-offs is essential for valid interpretation.
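As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The sample values are hypothetical and chosen so that a single outlier separates the mean from the median.

```python
import statistics

# Hypothetical sample: mostly small values plus one large outlier.
values = [2, 3, 3, 4, 5, 6, 40]

mean_value = statistics.mean(values)      # arithmetic mean, pulled up by the outlier
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 9 4 3
```

Here the mean (9) is larger than six of the seven observations, while the median (4) and mode (3) stay close to the bulk of the data.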
✔ Benefits
- Condenses information into an interpretable value.
- Enables quick comparisons across groups.
- Useful for reporting, imputation and baseline tests.
✖ Limitations
- Loses information about dispersion and distribution shape.
- Mean is sensitive to outliers.
- Mode can be misleading for multimodal distributions.
Trade-offs
Metrics
- Mean–median difference
Indicates skewness and outlier influence: a large gap suggests a skewed distribution or influential outliers.
- Robustness index
Assesses stability of central measure under sample changes.
- Explainability (stakeholder comprehension)
Qualitative measure of how well a metric can be communicated.
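The first two metrics can be approximated in code. The sketch below rests on assumptions: the source does not define the robustness index, so the bootstrap standard deviation of the estimator is used here as one possible stand-in. The function names `mean_median_gap` and `bootstrap_spread`, the resample count, and the data are hypothetical.

```python
import random
import statistics


def mean_median_gap(values):
    """Mean minus median; a positive gap hints at right skew or high outliers."""
    return statistics.mean(values) - statistics.median(values)


def bootstrap_spread(values, estimator, n_resamples=500, seed=0):
    """Standard deviation of an estimator across bootstrap resamples.

    Assumed stand-in for a 'robustness index': a smaller value means the
    estimator changes less when the sample changes.
    """
    rng = random.Random(seed)
    estimates = [
        estimator(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


# Hypothetical right-skewed sample with one extreme value.
values = [30, 32, 35, 36, 38, 40, 41, 43, 45, 47, 50, 52, 55, 60, 401]

gap = mean_median_gap(values)  # mean 67, median 43
mean_spread = bootstrap_spread(values, statistics.mean)
median_spread = bootstrap_spread(values, statistics.median)

print(gap, mean_spread, median_spread)
```

On this sample the median varies far less across resamples than the mean, which is the behavior a robustness check is meant to surface.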
Examples & implementations
Median for salary data
Salary distributions are right-skewed; median yields a more realistic central value.
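A short illustration with invented salary figures (all values are hypothetical) shows how far the two measures can diverge:

```python
import statistics

# Hypothetical annual salaries: six typical earners and one executive.
salaries = [32_000, 35_000, 38_000, 41_000, 45_000, 52_000, 250_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# How many people earn less than the "average" salary?
below_mean = sum(1 for s in salaries if s < mean_salary)

print(round(mean_salary), median_salary, below_mean)
```

In this sample six of the seven people earn less than the mean, so the median of 41,000 describes the typical salary better.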
Mean for symmetric distributions
For normally distributed measurements the arithmetic mean is informative and efficient.
Mode for categorical features
For categorical features, the mode (the most frequent category) is the appropriate representative measure.
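A minimal sketch for a categorical feature, using the standard library only. The category labels are hypothetical. `statistics.multimode` (Python 3.8+) is included because ties between categories are easy to overlook.

```python
import statistics
from collections import Counter

# Hypothetical categorical feature.
departments = ["ops", "sales", "ops", "it", "sales", "ops"]

most_common = statistics.mode(departments)  # single most frequent category
counts = Counter(departments)               # full frequency table for context

# With ties, multimode returns every most frequent value in first-seen order.
tied = statistics.multimode(["a", "b", "a", "b", "c"])

print(most_common, dict(counts), tied)
```

Reporting the frequency table next to the mode shows how dominant the most frequent category actually is.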
Implementation steps
Perform data exploration and distribution analysis.
Determine suitable central measures per variable.
Document results and integrate into reporting/models.
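The steps above can be sketched as a small per-variable selection routine. This is a heuristic under stated assumptions, not an established rule: the function name `choose_central_measure`, the 0.2 threshold on the mean-median gap (in units of standard deviation), and the dataset are all hypothetical.

```python
import statistics


def is_numeric(value):
    """True for int/float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def choose_central_measure(values, skew_threshold=0.2):
    """Return (measure_name, value) for one variable.

    Assumed heuristic: non-numeric data gets the mode; numeric data gets the
    median when the mean-median gap exceeds `skew_threshold` standard
    deviations, otherwise the mean.
    """
    if not all(is_numeric(v) for v in values):
        return "mode", statistics.mode(values)
    mean_value = statistics.mean(values)
    median_value = statistics.median(values)
    spread = statistics.pstdev(values)
    if spread > 0 and abs(mean_value - median_value) / spread > skew_threshold:
        return "median", median_value
    return "mean", mean_value


# Hypothetical dataset with one skewed, one symmetric, and one nominal variable.
dataset = {
    "salary": [32_000, 35_000, 38_000, 41_000, 45_000, 52_000, 250_000],
    "height_cm": [168, 170, 171, 172, 173, 174, 176],
    "department": ["ops", "sales", "ops", "it", "ops"],
}

# The resulting mapping doubles as documentation of the selection decision.
report = {name: choose_central_measure(column) for name, column in dataset.items()}
for name, (measure, value) in report.items():
    print(f"{name}: {measure} = {value}")
```

Keeping the chosen measure name next to its value makes the selection auditable when the summary is later used in reports or as a model input.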
⚠️ Technical debt & bottlenecks
Technical debt
- Legacy reports use outdated aggregation rules.
- Automated pipelines lacking distribution checks.
- Lack of documentation for imputation choices.
Known bottlenecks
Misuse examples
- Mean salary reported for heavily right-skewed data without median.
- Using mode to summarize continuous measurements.
- Uniform mean imputation for data with many missing values and systematic bias.
Typical traps
- Confusing representativeness with statistical efficiency.
- Ignoring measurement scale (nominal vs. metric).
- Insufficient communication of limitations to stakeholders.
Constraints
- Assumptions about distribution must be documented.
- Aggregations may imply information loss.
- Measurement scale limits appropriate measures (nominal vs. metric).