I was starting to procrastinate writing this, so I went to my YouTube subscription feed. And lo and behold, what should I see, but a video on Simpson’s Paradox! It was a sign! I’d heard of this paradox before, but I’d kinda forgotten it.
So, Simpson’s paradox isn’t about doughnuts or cartoon characters, sadly; it is about statistics. Now, statistics seem pretty straightforward. Count some stuff, do some math, get a figure or percentage. But Simpson’s paradox shows just how complicated it can be to get that process right. It turns out the scale at which you ‘zoom in’ can drastically change your results: a trend that shows up in the combined data can vanish, or even flip, once you split that data into groups.
The classic example of this seems to be UC Berkeley’s gender bias, so I’m just going to roll with it.
So, in 1973, UC Berkeley wanted to know if they were discriminating against women in their admissions process. Pretty simple, right? Count the dudes, count the chicks, do some simple math, and there you go!
Let’s say that, to start with, the folks at UC Berkeley do exactly that. They look at the University as a whole, and they count the men and women admitted. They end up with this chart (I straight-up snatched these from Wikipedia btw):
Aha! They’ve been caught! Men are admitted more often than women, by 9 percentage points! They are discriminating against women!
But wait, before you get up in arms, remember what I said about how zooming in can change the result? Let’s zoom in a bit, and look at the individual departments. We should get a similar result, right? If we look at the departments instead of the whole university, shouldn’t we still see the discrimination?
Well, as it turns out, no, we won’t. As this chart shows, out of the five biggest departments, two are biased towards men, yes, but three are biased towards women! This trend holds down to the smaller departments; in total, six departments were biased towards women, four towards men.
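So, yeah. On one scale, it seems that they are being sexist, but on another scale, if anything, they are biased towards women! How can both be true? The explanation the researchers landed on is that women tended to apply to the more competitive departments, where most applicants get rejected no matter who they are, while men tended to apply to departments that admit a bigger share of applicants. Lump everyone together, and all those rejections from the tough departments drag the women’s overall rate down.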
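If you want to see the flip happen with your own eyes, here is a tiny Python sketch. Big caveat: the numbers are completely made up by me to keep the arithmetic easy, they are NOT the real Berkeley figures, and the department names and function names are just placeholders I picked. It computes the admission rate for each group inside each department, then again with the departments lumped together.

```python
from fractions import Fraction

# Made-up numbers for illustration only; NOT the real Berkeley data.
# Layout: department -> group -> (admitted, applied)
applications = {
    "easy_dept": {"men": (80, 100), "women": (18, 20)},
    "hard_dept": {"men": (5, 20), "women": (30, 100)},
}


def department_rates(data):
    """Admission rate for each group, computed separately per department."""
    return {
        dept: {group: Fraction(admitted, applied)
               for group, (admitted, applied) in groups.items()}
        for dept, groups in data.items()
    }


def overall_rates(data):
    """Admission rate for each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (admitted, applied) in groups.items():
            so_far_admitted, so_far_applied = totals.get(group, (0, 0))
            totals[group] = (so_far_admitted + admitted,
                             so_far_applied + applied)
    return {group: Fraction(admitted, applied)
            for group, (admitted, applied) in totals.items()}


per_dept = department_rates(applications)
overall = overall_rates(applications)

# Print the zoomed-in view first, then the zoomed-out view.
for dept, rates in per_dept.items():
    print(dept, {group: f"{float(r):.0%}" for group, r in rates.items()})
print("overall", {group: f"{float(r):.0%}" for group, r in overall.items()})
```

In this toy data, women have the higher admission rate in both departments (90% vs 80%, and 30% vs 25%), yet men come out ahead overall (roughly 71% vs 40%). The only thing driving that is where people applied: most of the pretend women applied to the hard department, and most of the pretend men applied to the easy one.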
This pesky zooming-in thing shows up in lots of other problems. In that video I linked to up above, they demonstrated it with playing cards. Apparently it is the mechanism behind the Low Birth Weight Paradox, which relates to babies born to women who smoke tobacco. It also pops up in sports stats, like batting averages.
Statistics are hard, man. Forget political perspectives, ideologies, and anything high-level that can affect your results; just the scale at which you view the data can screw it up! It makes me glad that people are working hard to figure this out, because I have no clue.