Concept · Analytics · Data · Quality Assurance · Software Engineering

Hypothesis Testing

A formal statistical framework to evaluate assumptions about populations using sample data. It defines null and alternative hypotheses, test statistics and error probabilities.

Established · Medium

Classification

  • Medium
  • Business
  • Design
  • Intermediate

Technical context

  • Analytics stacks (e.g., Python, R, SQL)
  • Experiment frameworks and feature flags
  • Reporting and BI systems

Principles & goals

  • Predefine hypotheses and analysis rules.
  • Perform power analysis to determine sample size (see the sketch below).
  • Report metrics and uncertainties transparently.
Discovery · Domain, Team
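As a minimal sketch of the power-analysis step, assuming Python with statsmodels and hypothetical inputs (standardized effect size d = 0.3, α = 0.05, target power 0.8):

```python
# A priori power analysis for a two-sample t-test.
# effect_size, alpha and power are assumed example values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"Required sample size per group: {n_per_group:.0f}")  # ~175
```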


Compromises

  • Misinterpretation can lead to wrong business decisions.
  • Underpowered studies yield unreliable results.
  • Inadequate multiple-testing corrections inflate error rates.

Mitigations

  • Document analyses in advance and avoid data peeking.
  • Report p-values alongside effect sizes and confidence intervals (see the sketch after this list).
  • Adjust for multiple testing and perform sensitivity analyses.
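To make the reporting mitigation concrete, a sketch (assuming Python with NumPy/SciPy and simulated data) that reports a p-value together with Cohen's d and a 95% confidence interval:

```python
# Report a p-value alongside an effect size and a confidence interval.
# Data arrays a and b are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(10.0, 2.0, 200)   # e.g. control group
b = rng.normal(10.5, 2.0, 200)   # e.g. treatment group

t_stat, p_value = stats.ttest_ind(a, b)

# Cohen's d with a pooled standard deviation
pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                    / (len(a) + len(b) - 2))
cohens_d = (b.mean() - a.mean()) / pooled_sd

# 95% CI for the difference in means (normal approximation)
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"p = {p_value:.4f}, d = {cohens_d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```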

I/O & resources

Inputs

  • Raw data or aggregated sample data
  • Hypotheses (null and alternative)
  • Significance level and testing plan

Outputs

  • Test decision and p-value
  • Confidence intervals and effect size estimates
  • Recommendations for next steps

Description

Hypothesis testing is the statistical framework for evaluating assumptions about populations using sample data. It formalizes decision-making by specifying null and alternative hypotheses, test statistics, and error rates. Widely used in science, product experiments and quality control, it requires careful design, power analysis and interpretation to avoid common pitfalls.

Strengths

  • Provides a structured basis for decisions rather than relying on intuition.
  • Quantifiable error probabilities and effect sizes.
  • Transferable standard for scientific and product-related tests.

Limitations

  • Dependence on model assumptions (e.g., distributional assumptions).
  • A p-value is not the probability that a hypothesis is true.
  • Sensitive to selective reporting and p-hacking.

  • p-value

Probability, under H0, of observing data at least as extreme as what was actually observed.

  • statistical power

Probability of detecting a true effect (1 − β).

  • effect size

    Measure of practical importance of an effect independent of sample size.
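The p-value definition above can be made concrete with a permutation test, where "at least as extreme under H0" becomes a simple count. This is an illustrative sketch on simulated data, not a prescribed method:

```python
# The p-value as the fraction of permuted datasets whose test statistic
# is at least as extreme as the observed one, assuming H0 makes the
# group labels exchangeable. All data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.4, 1.0, 50)

observed = abs(b.mean() - a.mean())   # two-sided test statistic
pooled = np.concatenate([a, b])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)               # relabel groups at random under H0
    diff = abs(pooled[:50].mean() - pooled[50:].mean())
    if diff >= observed:
        count += 1

p_value = count / n_perm
print(f"Permutation p-value: {p_value:.4f}")
```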

Use cases & scenarios

Clinical demonstration of drug efficacy

Randomized controlled trial using hypothesis testing to evaluate primary endpoints.

A/B test of a landing page

Comparison of two page variants to test for significant conversion differences.
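A sketch of such a comparison as a two-proportion z-test, assuming Python with statsmodels and hypothetical conversion counts:

```python
# Two-proportion z-test for an A/B conversion comparison.
# Counts and sample sizes are hypothetical example values.
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 150]   # variant A, variant B
visitors = [2400, 2450]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```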

Control of defect rates in production

Sample-based tests to determine whether a batch meets quality criteria.
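A sketch of acceptance sampling as a one-sided binomial test, assuming Python with SciPy, hypothetical counts, and a 2% nominal defect rate:

```python
# Acceptance sampling as a one-sided binomial test.
# H0: defect rate <= 2%; a small p-value is evidence the batch exceeds it.
# Sample size, defect count and threshold are assumed example values.
from scipy import stats

n_sampled = 500
defects = 17
result = stats.binomtest(defects, n_sampled, p=0.02, alternative="greater")
print(f"p = {result.pvalue:.4f}")
```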

Procedure

  1. Clearly define goals and hypotheses; select metrics.
  2. Plan sample size via power analysis; create a test protocol.
  3. Collect data, run tests, interpret results robustly.

⚠️ Technical debt & bottlenecks

  • Insufficient instrumentation hampers clean tests.
  • Lack of standard protocols for experiment execution.
  • Outdated analysis scripts without tests and documentation.

sample-size · measurement-error · selection-bias

Common pitfalls

  • Running multiple A/B tests and reporting only significant outcomes.
  • Interpreting a small sample as proof of no effect.
  • Presenting the p-value as the probability of H1 being true.
  • Confusing statistical with practical significance.
  • Ignoring biases from selection or attrition.
  • Missing adjustment for multiple comparisons (see the sketch below).
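As referenced in the last pitfall, a sketch of a multiple-comparisons adjustment using the Benjamini-Hochberg procedure (assuming Python with statsmodels; the p-values are hypothetical):

```python
# Adjust a family of p-values for multiple comparisons (FDR control
# via Benjamini-Hochberg). The raw p-values are example values.
from statsmodels.stats.multitest import multipletests

p_values = [0.003, 0.012, 0.041, 0.049, 0.210]
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p:.3f} -> adjusted p = {p_adj:.3f}, reject H0: {r}")
```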
Required skills

  • Basic statistics knowledge (hypotheses, distributions)
  • Experience with experiment design and power analysis
  • Ability to interpret and communicate data-driven results

Success factors

  • Data quality and measurement accuracy
  • Statistical competence and interpretation
  • Experiment design and infrastructure

Constraints

  • Legal and privacy constraints for data collection
  • Time and budget limits for sampling
  • Requirement for pre-specified analysis protocols