In 2007 Red Sox rookie phenom Jacoby Ellsbury batted .353 while teammate Mike Lowell batted .324. In 2008, in what would be his first full season in the majors, Ellsbury again outperformed Lowell, batting .280 to Lowell's .274.
So Ellsbury clearly outperformed Lowell at the plate over the two-year stretch, right?
Wrong. Over the course of the two years, Lowell was superior at the plate, outbatting Ellsbury .304 to .293.
What at first glance may seem confusing comes down to a simple problem of aggregation -- and a classic example of the statistical phenomenon known as Simpson's Paradox.
Simpson's Paradox occurs when a relationship between two variables -- in this case, the two players' batting averages -- is reversed when an additional variable is taken into account. The additional variable to consider here is the number of at-bats each player had in each season. Leaving it out results in "omitted variable bias," a distortion in how the relationship between the two batting averages is understood.
And it matters -- not only for the important business of sizing up your favorite baseball players, but also for such pursuits as assessing the nation's schools or understanding state-by-state obesity rates.
How it Works
You weigh the importance of additional factors, or variables, implicitly in almost every evaluation you make. Your friend might tell you that his team is better because it has a better record, to which you might counter, "OK, but my team plays a harder schedule." You understand intuitively that an accurate assessment of the relationship between the teams cannot be made by looking at the record alone. When you condition your argument on a third variable (beyond just wins and losses), you can get a picture that differs from what you initially thought.
Simpson's Paradox goes one step further. It says that omitting an important variable can not only change a relationship but completely reverse how that relationship is perceived. Ellsbury may seem to have had the hotter bat across the two years, but the facts show Lowell actually had a better two-year average. Here's how it works:
|Year||2007||2008||2007 and 2008|
|Jacoby Ellsbury||41/116 (.353)||155/554 (.280)||196/670 (.293)|
|Mike Lowell||191/589 (.324)||115/419 (.274)||306/1008 (.304)|
Yes, Ellsbury had the better average both seasons. But he had far fewer at-bats than Lowell in 2007, having joined the team late in the season.
Ellsbury had only 116 at-bats in 2007, while Lowell had 589. Therefore, Ellsbury's combined average weights his second, weaker year much more heavily, while Lowell's average in his better season -- 2007 -- carries greater weight than his average in his worse season. The result is Lowell's superior average in aggregate, an instance of Simpson's Paradox.
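The arithmetic behind the table can be checked with a short sketch. The hits and at-bats come from the table above; the variable and function names (`seasons`, `average`, `combined_average`, `weights`) are illustrative choices, not part of the original analysis.

```python
# (hits, at-bats) per season, taken from the table above
seasons = {
    "Ellsbury": {2007: (41, 116), 2008: (155, 554)},
    "Lowell": {2007: (191, 589), 2008: (115, 419)},
}


def average(hits, at_bats):
    """Batting average for a single line of (hits, at-bats)."""
    return hits / at_bats


def combined_average(by_year):
    """Pool hits and at-bats across seasons, then divide once."""
    total_hits = sum(hits for hits, _ in by_year.values())
    total_at_bats = sum(at_bats for _, at_bats in by_year.values())
    return total_hits / total_at_bats


def weights(by_year):
    """Share of a player's total at-bats that falls in each season."""
    total_at_bats = sum(at_bats for _, at_bats in by_year.values())
    return {year: at_bats / total_at_bats for year, (_, at_bats) in by_year.items()}


for name, by_year in seasons.items():
    per_year = {year: f"{average(*line):.3f}" for year, line in by_year.items()}
    shares = {year: f"{share:.0%}" for year, share in weights(by_year).items()}
    print(name, per_year, "weights:", shares,
          "combined:", f"{combined_average(by_year):.3f}")
```

The combined average is a weighted average of the season averages, with each season weighted by its share of at-bats. Printing the weights shows that most of Ellsbury's at-bats fall in 2008, while most of Lowell's fall in 2007.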
Simpson's Paradox actually occurs with some frequency, and not just in baseball. Every year students across the country take the National Assessment of Educational Progress exams, with results mapped to a variety of factors, including whether students are eligible for the national school lunch program.
A comparison between school-lunch eligible eighth-graders in New York City and California for 2007 finds that a lower percentage of the New Yorkers scored below basic in math. The same was true for New Yorkers not eligible for school lunch compared to Californians not eligible.
However, in aggregate, a smaller share of California eighth-graders were below basic in math than of NYC eighth-graders. Why? A significantly higher percentage of NYC students are eligible for the school lunch program:
|Jurisdiction||School Lunch Eligible||School Lunch Ineligible||Combined|
|New York City (% Below Basic)|
|California (% Below Basic)|
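The school lunch comparison follows the same weighting logic as the batting averages. As a sketch in notation that is not in the original (here $w$ is the share of a jurisdiction's students who are eligible for school lunch, and $p_E$ and $p_I$ are the below-basic rates among eligible and ineligible students), the combined rate is:

```latex
\[
p_{\text{combined}} = w\,p_{E} + (1 - w)\,p_{I}
\]
\[
p_{E}^{\text{NYC}} < p_{E}^{\text{CA}}, \quad
p_{I}^{\text{NYC}} < p_{I}^{\text{CA}}, \quad
\text{yet} \quad
p_{\text{combined}}^{\text{NYC}} > p_{\text{combined}}^{\text{CA}}
\quad \text{is possible when } w^{\text{NYC}} > w^{\text{CA}} \text{ and } p_{E} > p_{I}.
\]
```

In words: if eligible students fall below basic at a higher rate than ineligible students, as the explanation above implies, a jurisdiction with a larger eligible share puts more weight on its higher rate. That can push its combined figure above the other jurisdiction's even though it does better within each group.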
In health, at least four examples of Simpson's Paradox are found in state-level obesity data among black, white and Hispanic adults reporting a Body Mass Index over 30:
|State||Black Adults||White Adults||Hispanic Adults||All Adults|
|District of Columbia||35||10||17||24|
Deciding What's Relevant
One of the most important aspects of reading these or any such statistics is deciding which ones matter.
Simpson's Paradox might be used to demonstrate that, in fact, New York City's schools were outperforming California's insofar as New York City has a smaller percentage of children performing below basic in both categories of lunch eligibility. However, one might also use it to try to demonstrate that, contrary to the common interpretation of baseball statistics, Lowell actually had a better two years than Ellsbury.
Recognizing Simpson's Paradox, and omitted variable bias in general, has very important policy implications. In the case of New York's schools, one could argue that rather than investing more in education, you need to address the underlying social conditions that are leaving such a high percentage of students eligible for the national school lunch program. Suddenly a policy question about education can become a policy question about poverty when looking at the system as a whole.
It is an important aspect of research to determine what other variables matter for a question and which ones don't, and there is yet another layer for policy in determining which ones can be most effectively addressed to produce desired outcomes. Not only is it important to be aware of the pitfalls of Simpson's Paradox, but it is also necessary to determine whether the paradox is relevant to the question being asked. Simpson's Paradox, then, serves as a pertinent reminder that there is often more to a statistic than meets the eye.
Arthur Smith is a recent graduate of Georgetown University, with a degree in international economics. He spent his junior year studying at the London School of Economics. At the State of the USA, he has worked on education and economy projects, including the data selection, collection and presentation processes. He has also worked as a research assistant on projects analyzing returns on education and perceptions of AIDS in Kenya.