The Paradox of Simpson (rude)
What Even Is It, Homie?
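Simpson's paradox is what you get when a pattern in aggregated data reverses, or vanishes, once the data is split into groups. Here are some ways it can show up: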
Trend Reversal After Aggregation
Correlation Reversal for Subgroups
Non-Random Assignment as Confounder - A Case Study
A classic example of this in action is the kidney stone study (Charig et al., 1986). Working for the NHS, the authors were interested in comparing the efficacy and cost of two different treatments for kidney stones. We'll call them Treatment A and Treatment B; the details aren’t important.
flowchart TD
    Data["700 Patients<br/>NON-RANDOMLY Assigned to Treatment A or B"]
    Data --> Split
    Split{"Analyze by<br/>stone size?"}
    Split --> Small
    Split --> Large
    Split --> All
    subgraph DISAGGREGATED
        Small["Small Stones<br/>357 patients"]
        Large["Large Stones<br/>343 patients"]
        Small --> SA["Treatment A<br/>81 / 87 = 93%"]
        Small --> SB["Treatment B<br/>234 / 270 = 87%"]
        SA & SB --> SW["A wins"]
        Large --> LA["Treatment A<br/>192 / 263 = 73%"]
        Large --> LB["Treatment B<br/>55 / 80 = 69%"]
        LA & LB --> LW["A wins"]
    end
    subgraph AGGREGATED
        All["All Patients<br/>700 patients"]
        All --> AA["Treatment A<br/>273 / 350 = 78%"]
        All --> AB["Treatment B<br/>289 / 350 = 83%"]
        AA & AB --> AW["B wins"]
    end
    style SW fill:#1D9E75,color:#000
    style LW fill:#1D9E75,color:#000
    style AW fill:#7F77DD,color:#000
As we can see, there is something strange going on: when we split by stone size on the “disaggregated” side, Treatment A has the higher recovery rate in both groups (93% vs 87% for small stones, 73% vs 69% for large stones). However, in the overall population Treatment B appears superior (83% vs 78%).
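To check the reversal by hand, here is a minimal Python sketch that recomputes the recovery rates from the counts in the diagram. The dictionary layout and the helper names (`rate`, `winner`) are my own choices for illustration, not anything from the study.

```python
# Counts as (recovered, treated), copied from the figures quoted above.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(recovered, treated):
    """Recovery rate as a fraction between 0 and 1."""
    return recovered / treated


def winner(rate_a, rate_b):
    """Name of the treatment with the higher recovery rate."""
    return "A" if rate_a > rate_b else "B"


# Compare the treatments within each stone-size group.
stratum_winners = {}
for size, arms in counts.items():
    rate_a, rate_b = rate(*arms["A"]), rate(*arms["B"])
    stratum_winners[size] = winner(rate_a, rate_b)
    print(f"{size:>5} stones: A {rate_a:.1%}  B {rate_b:.1%}  -> {stratum_winners[size]} wins")

# Aggregate by summing the raw counts over both groups, then compare again.
totals = {
    arm: tuple(sum(counts[size][arm][i] for size in counts) for i in (0, 1))
    for arm in ("A", "B")
}
overall_a, overall_b = rate(*totals["A"]), rate(*totals["B"])
overall_winner = winner(overall_a, overall_b)
print(f"  all stones: A {overall_a:.1%}  B {overall_b:.1%}  -> {overall_winner} wins")
```

Note that the aggregate is built by summing counts, not by averaging the two percentages. That difference is what lets the group sizes tip the result.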
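The problem is that the size of the stone is a “confounder”. Treatment A was given much more frequently to those with large kidney stones (263 patients) than to those with small kidney stones (87 patients), while Treatment B was the other way around (80 large, 270 small). But the baseline recovery rate for those with large kidney stones is just generally lower, whichever treatment they get! So Treatment A's aggregate is dominated by the hard cases and Treatment B's by the easy ones, which makes Treatment A look worse overall even though it does better within each group.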
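One way to see the confounding in numbers: each treatment's aggregate rate is a weighted average of its per-group rates, weighted by how its own patients split between small and large stones. The sketch below recomputes that, then asks a hypothetical question: what would the aggregates look like if both treatments had seen the same case mix? Reweighting to the pooled mix is an assumption of mine for illustration (a simple form of standardization), not something the study reports, and the helper names are made up.

```python
# (recovered, treated) per stone size and treatment, from the study figures.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}
sizes = ("small", "large")
arms = ("A", "B")


def stratum_rates(arm):
    """Recovery rate of one treatment within each stone-size group."""
    return {s: counts[s][arm][0] / counts[s][arm][1] for s in sizes}


def case_mix(arm):
    """Share of one treatment's patients that fall in each group."""
    n = sum(counts[s][arm][1] for s in sizes)
    return {s: counts[s][arm][1] / n for s in sizes}


def weighted_rate(rates, weights):
    """Weighted average of per-group rates."""
    return sum(rates[s] * weights[s] for s in sizes)


# Observed aggregate: each treatment weighted by its OWN case mix.
observed = {arm: weighted_rate(stratum_rates(arm), case_mix(arm)) for arm in arms}

# Hypothetical: weight both treatments by the SAME pooled case mix.
n_total = sum(counts[s][arm][1] for s in sizes for arm in arms)
pooled_mix = {s: sum(counts[s][arm][1] for arm in arms) / n_total for s in sizes}
standardized = {arm: weighted_rate(stratum_rates(arm), pooled_mix) for arm in arms}

for arm in arms:
    print(f"Treatment {arm}: observed {observed[arm]:.1%}, "
          f"same-mix {standardized[arm]:.1%}, "
          f"share of large stones {case_mix(arm)['large']:.0%}")
```

With the case mix held equal, the ordering matches the per-group comparison again, which suggests the aggregate reversal comes from who got which treatment and not from the treatments themselves.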
The diagram of influence looks something like this:
flowchart LR
    C["Stone Size"]
    T["Treatment<br/>Assignment"]
    O["Recovery Rate"]
    C --> L1["biases assignment"]
    L1 --> T
    C --> L2["affects baseline"]
    L2 --> O
    T --> L3["what we care about"]
    L3 --> O
    style C fill:#BA7517,color:#000
    style T fill:#1D9E75,color:#000
    style O fill:#7F77DD,color:#000
    style L1 fill:none,stroke:none,color:#888780
    style L2 fill:none,stroke:none,color:#888780
    style L3 fill:none,stroke:none,color:#888780