In the last post I mentioned that when we have two different estimates, each with its own uncertainty range (note: I use the 95% confidence interval almost all the time, or to be precise the ±2σ range), the fact that their ranges overlap isn’t the proper statistical test for whether the estimates are significantly different. Somebody asked about that.
Just for a gut feeling: I know that when error ranges overlap, there are values that fall in the “plausible range” for both estimates, which suggests that the estimates may well be in agreement. But sometimes, those “plausible in both ranges” values are unlikely in both ranges. Unlikely isn’t so implausible, but unlikely for both is unlikely squared, and that’s too implausible to be plausible.
What follows is some of the math. It's really quite simple, but I know math turns off some readers; others want it. If you're one of the former, feel free to skip it and enjoy the remainder of the day.
Let the two estimates be x and y, and suppose their “standard errors” (the 1σ errors) are σx and σy. If we were testing whether or not x (or y) is zero, we’d form a test statistic like t = x/σx (or t = y/σy). But here we’re not interested in whether x or y is zero; we’re interested in whether their difference d = x − y is zero. So what’s the “standard error” for d?
That turns out to be σd = √(σx² + σy²). Our test statistic will be t = d/σd, or in more detail t = (x − y) / √(σx² + σy²).
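As a quick sanity check, here’s a small Python sketch of that computation (the numbers are made up for illustration, not taken from any real data set):

```python
from math import sqrt

def diff_test_stat(x, sigma_x, y, sigma_y):
    """Test statistic for whether two estimates differ:
    t = (x - y) / sqrt(sigma_x**2 + sigma_y**2)."""
    sigma_d = sqrt(sigma_x**2 + sigma_y**2)  # standard error of the difference
    return (x - y) / sigma_d

# hypothetical estimates with their 1-sigma errors
t = diff_test_stat(1.0, 0.3, 0.2, 0.4)
print(round(t, 2))  # 0.8 / 0.5 = 1.6
```

With a difference of 0.8 and σd = √(0.09 + 0.16) = 0.5, we get t = 1.6, which would not be significant at the usual 95% level.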
Statistically we’ll usually treat that as a t-test, and we’ll wonder about the number of “degrees of freedom” to use, which can be a sticky issue. But in many cases (including global temperature since 1979) the degrees of freedom are large enough (even allowing for autocorrelation) that we can safely treat them as large, in which case the t distribution approaches the normal distribution.
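In that large-degrees-of-freedom limit, the two-sided p-value can be computed from the standard normal distribution with nothing but the standard library (a small sketch, not a substitute for a proper t-test when the sample is small):

```python
from math import sqrt, erfc

def two_sided_p_normal(t):
    """Two-sided p-value treating the test statistic as standard normal,
    which is valid when the degrees of freedom are large.
    P(|Z| > |t|) = erfc(|t| / sqrt(2))."""
    return erfc(abs(t) / sqrt(2))

print(round(two_sided_p_normal(1.96), 3))  # about 0.05, the familiar cutoff
```

This is why t = 1.96 (roughly 2) is the familiar threshold for significance at the 95% level.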
But the important point is that two error ranges can overlap (at least in part) even when the estimates are significantly different. Suppose the two estimates have the same standard error, σx = σy = σ. Then σd = σ√2. If d = 3σ, their difference has a test statistic t = 3/√2 ≈ 2.12, which indicates that yes, their difference is statistically significant, in spite of the fact that their ±2σ error ranges overlap.
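The overlap-yet-significant example above is easy to verify numerically (taking σ = 1 for concreteness):

```python
from math import sqrt

sigma = 1.0
x, y = 0.0, 3.0 * sigma  # difference d = 3*sigma

# The ±2σ ranges are [-2, 2] and [1, 5]; they share the interval [1, 2].
overlap = (x + 2 * sigma) >= (y - 2 * sigma)

# Yet the test statistic exceeds the ~1.96 threshold for 95% significance.
t = abs(x - y) / (sigma * sqrt(2))

print(overlap, round(t, 2))  # True 2.12
```

So the two ranges overlap, yet t ≈ 2.12 > 1.96: the difference is significant even though the intervals share common ground.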
I know, it’s pretty simple really. But it’s the kind of detail that the non-statistician (even those mathematically savvy) may not be aware of, and it’s a mistake I’ve seen made at many levels (names will not be named).
Thanks to the kind readers who have donated to the blog. If you’d like to help, please visit the donation link below.
This blog is made possible by readers like you; join others by donating at My Wee Dragon.