Glossary

Simpson's paradox

Simpson's paradox is a statistical phenomenon in which a trend that appears in several groups of data disappears or reverses when the groups are combined. It shows that aggregated data can suggest a different relationship than the same data examined group by group: the direction of an association between two variables can depend on whether the data are analyzed within subgroups or as a whole.
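A small numeric illustration may make the reversal concrete. The sketch below uses hypothetical counts, invented purely for illustration and not taken from any study, for two treatments (A and B) across two severity groups. The names counts, subgroup_rates, and pooled_rates are assumptions of this example.

```python
# Hypothetical (successes, patients) counts, chosen only to show the reversal.
counts = {
    "mild":   {"A": (18, 20), "B": (64, 80)},
    "severe": {"A": (40, 80), "B": (8, 20)},
}


def subgroup_rates(counts):
    """Success rate of each treatment within each subgroup."""
    return {
        group: {arm: s / n for arm, (s, n) in arms.items()}
        for group, arms in counts.items()
    }


def pooled_rates(counts):
    """Success rate of each treatment after summing counts over subgroups."""
    totals = {}
    for arms in counts.values():
        for arm, (s, n) in arms.items():
            prev_s, prev_n = totals.get(arm, (0, 0))
            totals[arm] = (prev_s + s, prev_n + n)
    return {arm: s / n for arm, (s, n) in totals.items()}


by_group = subgroup_rates(counts)
pooled = pooled_rates(counts)

for group, rates in by_group.items():
    print(f"{group:>6}: A={rates['A']:.2f}  B={rates['B']:.2f}")
print(f"pooled: A={pooled['A']:.2f}  B={pooled['B']:.2f}")
```

With these counts, A has the higher success rate in both the mild and the severe group, yet B has the higher rate once the groups are pooled. The reversal arises because A is given mostly to severe cases and B mostly to mild ones, so severity is tied to both the treatment and the outcome.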

Context and Usage

Simpson's paradox is commonly encountered in statistics, data science, epidemiology, medical research, the social sciences, and decision theory. Analysts meet it when working with observational studies, clinical trials, gender bias studies, treatment effectiveness comparisons, and A/B tests. It typically emerges when a confounding variable affects subgroups unequally, for example when subgroups differ both in size and in their baseline outcome rates, which makes it relevant to anyone who aggregates data across multiple categories or populations.

Common Challenges

The paradox can lead to incorrect causal interpretations when analysts fail to account for lurking variables that affect subgroups differently. Misunderstandings usually arise from overlooking confounding factors or from aggregating data without regard to subgroup structure, so conclusions drawn solely from aggregated data can be misleading. The phenomenon runs against intuition about statistical relationships and can mask true causal effects, particularly in observational studies, where randomization is not possible.
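One common way to account for a known confounder is direct standardization: compute each treatment's rate within every subgroup, then average those rates using the same subgroup weights for every treatment. The sketch below is a minimal illustration under stated assumptions, not a general causal-inference procedure. The counts are hypothetical, the function name standardized_rates is an assumption of this example, and the weights are taken from the combined population.

```python
# Hypothetical (successes, patients) counts, invented for illustration only.
counts = {
    "mild":   {"A": (18, 20), "B": (64, 80)},
    "severe": {"A": (40, 80), "B": (8, 20)},
}


def standardized_rates(counts):
    """Directly standardized success rate per treatment.

    Each subgroup is weighted by its share of the combined population,
    so both treatments are compared on the same mix of subgroups.
    """
    grand_total = sum(n for arms in counts.values() for _, n in arms.values())
    adjusted = {}
    for arms in counts.values():
        # Share of all patients who fall in this subgroup.
        weight = sum(n for _, n in arms.values()) / grand_total
        for arm, (s, n) in arms.items():
            adjusted[arm] = adjusted.get(arm, 0.0) + weight * (s / n)
    return adjusted


adjusted = standardized_rates(counts)
print(f"standardized: A={adjusted['A']:.2f}  B={adjusted['B']:.2f}")
```

With these counts, the standardized comparison favors A, matching the subgroup-level result, whereas naive pooling would favor B. Standardization only corrects for the variables it stratifies on; an unmeasured confounder would still bias the comparison.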

Related Topics: confounding variables, causal inference, statistical bias, data aggregation, omitted variable bias, observational studies

Jan 22, 2026

Reviewed by Dan Yan