Hypothesis Testing
A formal statistical framework to evaluate assumptions about populations using sample data. It defines null and alternative hypotheses, test statistics and error probabilities.
Classification
- Complexity: Medium
- Impact area: Business
- Decision type: Design
- Organizational maturity: Intermediate
Compromises
Risks:
- Misinterpretation can lead to wrong business decisions.
- Underpowered studies yield unreliable results.
- Inadequate multiple-testing corrections inflate error rates.
Mitigations:
- Document analyses in advance and avoid data peeking.
- Report p-values alongside effect sizes and confidence intervals.
- Adjust for multiple testing and perform sensitivity analyses.
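The multiple-testing adjustment mentioned above can be sketched with the Holm-Bonferroni step-down procedure; the p-values below are illustrative only.

```python
# Holm-Bonferroni step-down correction for multiple comparisons.
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject (True) / fail-to-reject (False) decision per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

print(holm_bonferroni([0.001, 0.04, 0.03, 0.20]))
```

The step-down scheme is uniformly more powerful than a plain Bonferroni correction while still controlling the family-wise error rate.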
I/O & resources
- Raw data or aggregated sample data
- Hypotheses (null and alternative)
- Significance level and testing plan
- Test decision and p-value
- Confidence intervals and effect size estimates
- Recommendations for next steps
Description
Hypothesis testing is the statistical framework for evaluating assumptions about populations using sample data. It formalizes decision-making by specifying null and alternative hypotheses, test statistics, and error rates. Widely used in science, product experiments and quality control, it requires careful design, power analysis and interpretation to avoid common pitfalls.
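The framework described above can be illustrated with a minimal two-sample t-test; the sample values are illustrative, and the test assumes approximately normal data.

```python
# Minimal two-sample hypothesis test sketch (illustrative data).
from scipy import stats

control = [4.8, 5.2, 5.0, 4.9, 5.1, 5.3, 4.7, 5.0]
treatment = [5.4, 5.6, 5.2, 5.5, 5.3, 5.7, 5.1, 5.4]

# H0: equal means; H1: means differ (two-sided test).
t_stat, p_value = stats.ttest_ind(treatment, control)

alpha = 0.05  # pre-specified significance level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Note that the decision rule (reject when p < alpha) must be fixed before looking at the data.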
✔ Benefits
- Provides a structured basis for decisions instead of relying on intuitive judgment.
- Quantifiable error probabilities and effect sizes.
- Transferable standard for scientific and product-related tests.
✖ Limitations
- Dependence on model assumptions (e.g., distributional assumptions).
- p-values are not the direct probability of a hypothesis.
- Sensitive to selective reporting and p-hacking.
Metrics
- p-value
Probability, under H0, of observing data at least as extreme as the observed data.
- statistical power
Probability of detecting a true effect (1 − β).
- effect size
Measure of practical importance of an effect independent of sample size.
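The three metrics can be tied together in code: Cohen's d as an effect size, and an approximate power for a two-sided two-sample test via the normal approximation. All sample values are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

def cohens_d(a, b):
    """Standardized mean difference with pooled SD (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                          / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z = d * math.sqrt(n_per_group / 2) - z_alpha
    return NormalDist().cdf(z)

d = cohens_d([5.4, 5.6, 5.2, 5.5], [4.8, 5.2, 5.0, 4.9])
print(f"effect size d = {d:.2f}")
print(f"power at n=64/group for d=0.5: {approx_power(0.5, 64):.2f}")
```

Unlike the p-value, Cohen's d does not grow with the sample size, which is why both should be reported together.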
Examples & implementations
Clinical demonstration of drug efficacy
Randomized controlled trial using hypothesis testing to evaluate primary endpoints.
A/B test of a landing page
Comparison of two page variants to test for significant conversion differences.
Control of defect rates in production
Sample-based tests to determine whether a batch meets quality criteria.
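The A/B-test example above can be sketched as a two-proportion z-test; the conversion counts are illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return z statistic and two-sided p-value under H0: equal conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Variant B converts 160/2400 visitors vs. variant A 120/2400.
z, p_value = two_proportion_z_test(120, 2400, 160, 2400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```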
Implementation steps
Clearly define goals and hypotheses, select metrics.
Plan sample size via power analysis, create test protocol.
Collect data, run tests, interpret results robustly.
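The power-analysis step above can be sketched as a sample-size calculation for a two-sided two-sample test (normal approximation); the target effect size, alpha, and power are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Required n per group to detect a standardized effect at given alpha/power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(sample_size_per_group(0.5))  # medium effect, 5% alpha, 80% power
```

Halving the detectable effect size roughly quadruples the required sample, which is why the target effect must be fixed in the test protocol before data collection.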
⚠️ Technical debt & bottlenecks
Technical debt
- Insufficient instrumentation hampers clean tests.
- Lack of standard protocols for experiment execution.
- Outdated analysis scripts without tests and documentation.
Known bottlenecks
Misuse examples
- Running multiple A/B tests and reporting only significant outcomes.
- Interpreting a small sample as proof of no effect.
- Presenting the p-value as the probability of H1 being true.
Typical traps
- Confusing statistical with practical significance.
- Ignoring biases from selection or attrition.
- Missing adjustment for multiple comparisons.
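The first trap above can be demonstrated numerically: with a very large sample, a practically negligible difference still becomes "statistically significant". All numbers are illustrative.

```python
import math
from statistics import NormalDist

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference of means (known SD, equal group sizes)."""
    z = mean_diff / (sd * math.sqrt(2 / n_per_group))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A 0.01-SD difference measured on 200,000 users per group:
p = two_sample_z_p_value(mean_diff=0.01, sd=1.0, n_per_group=200_000)
d = 0.01 / 1.0  # standardized effect size: far below any practical threshold
print(f"p = {p:.4f}, effect size d = {d}")
```

The p-value falls below conventional thresholds while the effect size remains negligible, so significance alone says nothing about practical relevance.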
Required skills
Architectural drivers
Constraints
- Legal and privacy constraints for data collection
- Time and budget limits for sampling
- Requirement for pre-specified analysis protocols