The Paradox of Simpson (rude)

What Even Is It, Homie?

Here are some things that can happen:

Trend Reversal After Aggregation

Correlation Reversal for Subgroups

Non-Random Assignment as Confounder - A Case Study

An example that shows this in action is the classic kidney stone study (Charig et al., 1986). Working for the NHS, the authors were interested in comparing the efficacy and cost of two different treatments for kidney stones, Treatment A and Treatment B (the details aren’t important).


flowchart TD
    Data["700 Patients<br/>NON-RANDOMLY Assigned to Treatment A or B"]
    Data --> Split

    Split{"Analyze by<br/>stone size?"}
    Split --> Small
    Split --> Large
    Split --> All

    subgraph DISAGGREGATED
    Small["Small Stones<br/>357 patients"]
    Large["Large Stones<br/>343 patients"]

    Small --> SA["Treatment A<br/>81 / 87 = 93%"]
    Small --> SB["Treatment B<br/>234 / 270 = 87%"]
    SA & SB --> SW["A wins"]

    Large --> LA["Treatment A<br/>192 / 263 = 73%"]
    Large --> LB["Treatment B<br/>55 / 80 = 69%"]
    LA & LB --> LW["A wins"]
    end

    subgraph AGGREGATED
    All["All Patients<br/>700 patients"]
    All --> AA["Treatment A<br/>273 / 350 = 78%"]
    All --> AB["Treatment B<br/>289 / 350 = 83%"]
    AA & AB --> AW["B wins"]
    end

    style SW      fill:#1D9E75,color:#000
    style LW      fill:#1D9E75,color:#000
    style AW      fill:#7F77DD,color:#000

As we can see, something strange is going on. When we split by stone size (the “disaggregated” side), Treatment A has the higher success rate in both groups. However, across the overall population, Treatment B appears to be superior.
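To double-check the arithmetic in the diagram, here is a small Python sketch that recomputes every success rate from the raw counts and reports which treatment wins at each level. The dictionary layout and the function names are illustrative choices for this post, not anything from the original study.

```python
# (successes, patients) for each stone size and treatment, as in the diagram above.
DATA = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, patients):
    """Plain success proportion."""
    return successes / patients


def subgroup_rates(data):
    """Success rate of each treatment within each stone-size group."""
    return {
        size: {t: rate(*counts) for t, counts in by_treatment.items()}
        for size, by_treatment in data.items()
    }


def aggregate_rates(data):
    """Success rate of each treatment after pooling all stone sizes together."""
    totals = {}
    for by_treatment in data.values():
        for t, (s, n) in by_treatment.items():
            ts, tn = totals.get(t, (0, 0))
            totals[t] = (ts + s, tn + n)
    return {t: rate(s, n) for t, (s, n) in totals.items()}


if __name__ == "__main__":
    for size, rates in subgroup_rates(DATA).items():
        winner = max(rates, key=rates.get)
        print(f"{size:>5}: A={rates['A']:.1%}  B={rates['B']:.1%}  -> {winner} wins")
    pooled = aggregate_rates(DATA)
    winner = max(pooled, key=pooled.get)
    print(f"  all: A={pooled['A']:.1%}  B={pooled['B']:.1%}  -> {winner} wins")
```

Running it prints A as the winner for both stone sizes and B as the winner once everything is pooled: the same reversal as the diagram.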

The problem is that the size of the stone is a “confounder”: it affects both which treatment a patient got and how likely they were to recover. Treatment A was given much more frequently to those with large kidney stones (263 of its 350 patients) than Treatment B was (80 of its 350), but the baseline recovery rate for large stones is just generally lower, whichever treatment is used! Pooling the groups therefore drags Treatment A's overall success rate down.
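One way to see the mechanism: a treatment's pooled success rate is just a weighted average of its per-group rates, where the weights are that treatment's own case mix. The sketch below (hypothetical helper names, exact arithmetic via the standard library's `fractions.Fraction`) computes each treatment's share of large stones, confirms that the weighted averages reproduce the pooled 273/350 and 289/350, and then re-scores Treatment A using Treatment B's case mix.

```python
from fractions import Fraction

# (successes, patients) for each stone size and treatment, as in the diagram above.
DATA = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def case_mix(data, treatment):
    """Fraction of a treatment's patients that fall in each stone-size group."""
    total = sum(by_t[treatment][1] for by_t in data.values())
    return {size: Fraction(by_t[treatment][1], total) for size, by_t in data.items()}


def weighted_rate(data, treatment, mix):
    """Average the treatment's per-group success rates, weighted by `mix`."""
    return sum(mix[size] * Fraction(*by_t[treatment]) for size, by_t in data.items())


if __name__ == "__main__":
    mix_a = case_mix(DATA, "A")
    mix_b = case_mix(DATA, "B")
    print("share of large stones: A =", float(mix_a["large"]), " B =", float(mix_b["large"]))
    # Weighted by its own case mix, each treatment reproduces its pooled rate.
    print("A with A's mix:", float(weighted_rate(DATA, "A", mix_a)))
    print("B with B's mix:", float(weighted_rate(DATA, "B", mix_b)))
    # Give A the same (mostly small-stone) mix as B and A comes out ahead.
    print("A with B's mix:", float(weighted_rate(DATA, "A", mix_b)))
```

Scored on B's mostly small-stone mix, Treatment A comes out at roughly 88.5% versus B's 82.6%, so the aggregate gap is driven by who got which treatment rather than by the treatment itself.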

The diagram of influence looks something like this:

flowchart LR
    C["Stone Size"]
    T["Treatment<br/>Assignment"]
    O["Recovery Rate"]

    C --> L1["biases assignment"]
    L1 --> T
    C --> L2["affects baseline"]
    L2 --> O
    T --> L3["what we care about"]
    L3 --> O

    style C  fill:#BA7517,color:#000
    style T  fill:#1D9E75,color:#000
    style O  fill:#7F77DD,color:#000
    style L1 fill:none,stroke:none,color:#888780
    style L2 fill:none,stroke:none,color:#888780
    style L3 fill:none,stroke:none,color:#888780
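The diagram points at the usual fix: compare the treatments as if both had been given to the same stone-size mix. A sketch of this, assuming stone size is the only confounder (as drawn above), is the back-door adjustment formula, with $R$ for recovery, $T$ for treatment, $C$ for stone size, and $P(C)$ estimated from all 700 patients (357 small, 343 large):

$$
\begin{aligned}
P(R \mid \mathrm{do}(T = t)) &= \sum_{c} P(R \mid T = t, C = c)\,P(C = c) \\
P(R \mid \mathrm{do}(T = A)) &= \tfrac{81}{87}\cdot\tfrac{357}{700} + \tfrac{192}{263}\cdot\tfrac{343}{700} \approx 0.833 \\
P(R \mid \mathrm{do}(T = B)) &= \tfrac{234}{270}\cdot\tfrac{357}{700} + \tfrac{55}{80}\cdot\tfrac{343}{700} \approx 0.779
\end{aligned}
$$

Once both treatments are weighted by the same stone-size distribution, Treatment A comes out ahead again, matching the disaggregated view.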

Simulations