One of the most important lessons to learn about statistics, and not forget, is that just because your model is statistically significant, that doesn’t mean it’s right. In fact, just about all models are wrong. And some models, even some which strongly pass statistical significance, aren’t even useful.
After I investigated some claims that temperature records show a “step change” rather than a continuous trend, and found those claims faulty, a reader mentioned yet another such claim. Specifically, it concerns a paper by Stockwell and Cox, submitted to the International Journal of Forecasting. The essence of all these claims is that the temperature trend is discontinuous: not just that noise causes sudden jumps in temperature, but that the trend itself is a discontinuous function of time.
The paper applies the Chow test to look for structural breaks in climate time series, specifically temperature and precipitation in Australia, and global temperature from HadCRU. The Chow test (as implemented) determines whether or not a pair of straight lines — which don’t need to meet at their endpoints so the model trend can be discontinuous — fits some data better than a single straight line. The first data set on which they report is annual average temperature over Australia from 1910 to 2008, which results in this graph:
The upper panel shows the F-statistic returned from the test, the lower panel shows the temperature data together with a linear trend line (in green) and a step-change model (in blue). The test indicates a structural break at 1978.
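For readers who want to see the mechanics, here is a minimal sketch of a Chow-type test in plain Python. It is not the authors' code and it does not use the Australian data: the series is a synthetic stand-in (a 0.5-degree step at 1978 plus a small alternating wiggle in place of noise), and the names `line_rss` and `chow_f` are made up for this illustration. For each candidate breakpoint it compares the residual sum of squares of one straight line against that of two independent straight lines.

```python
def line_rss(t, y):
    """Residual sum of squares of an ordinary least-squares straight line."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))


def chow_f(t, y, split):
    """Chow F statistic for a break just before index `split`."""
    k = 2  # parameters per straight line: intercept and slope
    rss_single = line_rss(t, y)
    # Two separate lines; they are not forced to meet at the break.
    rss_pair = line_rss(t[:split], y[:split]) + line_rss(t[split:], y[split:])
    return ((rss_single - rss_pair) / k) / (rss_pair / (len(t) - 2 * k))


# Synthetic stand-in data (NOT the Australian record): a 0.5-degree step at
# 1978 plus a small alternating wiggle in place of real noise.
years = list(range(1910, 2009))
temps = [(0.5 if yr >= 1978 else 0.0) + 0.05 * (-1) ** i
         for i, yr in enumerate(years)]

# Scan candidate breaks, keeping at least 5 points in each segment.
f_by_year = {years[s]: chow_f(years, temps, s)
             for s in range(5, len(years) - 4)}
best_year = max(f_by_year, key=f_by_year.get)
print(best_year, round(f_by_year[best_year], 1))
```

Plotting `f_by_year` against year gives a curve of the same kind as the upper panel described above, with its peak at the candidate break that most improves the fit.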
It looks to me as though the plotted F-statistic isn’t actually the result of the Chow test. That’s because I ran the Chow test and got a different result, although it indicates the same “breakpoint” as reported by Stockwell & Cox. The plotted statistic looks more like the result of a CUSUM test, which looks for a sudden shift in the mean value but doesn’t allow for a slope in either the before-shift or after-shift segments.
This may indeed be the case, but it looks like they determined the breakpoint from a Chow test, then rejected the slopes of the “before” and “after” segments due to their lack of statistical significance:
“The slope of the two segments is not significant (p = 0.15 and p = 0.14). Because of the significance of the break, and the non-significance of slopes in the segments, a model with a single step, or structural change model is justified (Figure 1a blue).”
The first salient point is that the step-change model really does fit better than the linear-trend model. This isn’t just indicated by the Chow test — if you compute the AIC of the two models (allowing for the fact that the breakpoint time is an additional parameter), the step-change model has much better (i.e., lower) AIC. The difference in AIC is big enough to make the linear-trend model downright implausible.
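The exact form of AIC used is not spelled out here. As a sketch, assuming least-squares fits with Gaussian errors, a common form for a model with $k$ fitted parameters, $n$ data points and residual sum of squares $\mathrm{RSS}$ is shown below. The parameter counts are also an assumption: two for the straight line (intercept and slope) and three for the step model (two levels plus the breakpoint time).

```latex
\begin{align}
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const} \\
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\text{step}} - \mathrm{AIC}_{\text{linear}}
  = n \ln\!\left(\frac{\mathrm{RSS}_{\text{step}}}{\mathrm{RSS}_{\text{linear}}}\right)
  + 2\,(k_{\text{step}} - k_{\text{linear}})
\end{align}
```

With those counts the step model pays a penalty of 2 for its extra parameter, so a negative $\Delta\mathrm{AIC}$ means it fits better even after that penalty.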
The second salient point is that the authors make the classic mistake cautioned against in the first sentence of this post: they interpret the strong statistical significance of their model as clear indication that it’s (at least approximately) correct. Permit me to doubt.
We can gain some insight by smoothing the data:
This doesn’t rule out, or even argue against, a step change, because if there were such a shift the smoothing method would, well, smooth it. But it does argue strongly against the linear-trend model. It reveals that the real reason that Stockwell & Cox found such a strong result from the Chow test, and place their faith in the step-change model, is that they are testing the step-change model against what is effectively a straw man. The “single continuous linear trend” model is clearly wrong, so it’s no surprise that the step-change model — whether right (even approximately) or wrong — fits better.
The linear model has its uses, but only if it’s not taken too seriously. It is strongly statistically significant, which shows that Australian temperature has increased overall rather than decreased or remained stable. It also gives us a numerical estimate of the average warming rate during the time span studied. But it doesn’t show that the warming rate has been constant. After all, the linear model is not just wrong, it’s provably wrong.
What about other models? We could try a quadratic function of time:
It definitely fits better than the linear model, and the improvement is statistically significant. That doesn’t make it right — but it may be useful. And when we compute its AIC, we find that not only is it way better than the linear model, it’s even slightly better than the step-change model. The difference in AIC between the quadratic and the step-change model is small, and although it prefers the quadratic model it’s not big enough to decide definitively. While a step-change model can’t be ruled out statistically, it certainly can’t be claimed to be established statistically either.
But wait! There’s more! Suppose we try a model which is continuous but is not differentiable — a model consisting of two straight lines which do meet at their endpoints so the trend remains continuous:
This model has even better AIC than the quadratic model! Of all the models tested this one fits best, although the differences are not big enough to rule out any but the “single continuous linear trend” model. Nor can one claim to have established any of these models statistically. To claim that the step-change model is correct (even approximately), and to wax philosophic about how it’s due to some great Pacific climate shift and the PDO, is, well, mathturbation.
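The comparison among these four models can be sketched in plain Python. Everything here is an illustrative assumption rather than the analysis behind this post: the data are synthetic (flat until 1959, then rising, with a small alternating wiggle in place of noise, so there is no step by construction), the helper names are invented, and AIC is taken as n·ln(RSS/n) + 2k with the breakpoint time counted as one extra parameter for the step and broken-line models.

```python
import math


def least_squares_rss(columns, y):
    """Fit y on the regressor columns via the normal equations; return RSS."""
    k = len(columns)
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * yi for p, yi in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * col[i] for b, col in zip(beta, columns))
              for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(rss, n, k):
    """Least-squares AIC up to an additive constant shared by all models."""
    return n * math.log(rss / n) + 2 * k


# Synthetic stand-in data: a continuous broken line, NOT a step.
years = list(range(1910, 2009))
n = len(years)
x = [(yr - 1959) / 49 for yr in years]  # time rescaled to [-1, 1]
y = [0.8 * max(xi, 0.0) + 0.03 * (-1) ** i for i, xi in enumerate(x)]
ones = [1.0] * n
breaks = range(5, n - 4)  # candidate break indices, 5+ points per side

rss = {
    "linear": least_squares_rss([ones, x], y),
    "quadratic": least_squares_rss([ones, x, [xi ** 2 for xi in x]], y),
    # Step: two flat levels, best break chosen by scanning.
    "step": min(least_squares_rss(
        [ones, [1.0 if i >= b else 0.0 for i in range(n)]], y)
        for b in breaks),
    # Hinge: two straight lines forced to meet at the break.
    "hinge": min(least_squares_rss(
        [ones, x, [max(xi - x[b], 0.0) for xi in x]], y)
        for b in breaks),
}
n_params = {"linear": 2, "quadratic": 3, "step": 3, "hinge": 4}
aic_by_model = {m: aic(rss[m], n, n_params[m]) for m in rss}
for m in sorted(aic_by_model, key=aic_by_model.get):
    print(f"{m:10s} AIC = {aic_by_model[m]:8.1f}")
```

On this made-up series the step model still beats the single straight line, even though the underlying trend contains no step at all. The margins are much larger than for the real data because the stand-in noise is tiny; the point is only the mechanics of the comparison.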
Why do so many want to claim step change in temperature records? Note that those who do so are consistently in denial of man-made global warming. In my opinion, they regard step changes as implausible in a greenhouse-gas warming world, and more important, they expect that people in general will perceive them as implausible. So, they hope that by establishing step changes they can cast doubt on greenhouse gases as the root cause of global warming. Of course there’s no proof (or even evidence, that I’m aware of) that greenhouse-gas warming can’t cause step changes. But then, there’s really no evidence that will stand up to scrutiny that there are step changes at all, rather than continuous trend plus natural-variation noise.
I’m sure if you look hard enough and long enough and search enough temperature records, you’ll eventually find some for which the step-change model is decidedly better than a continuous-trend model. I’m confident that fake skeptics will keep trying until eventually they do. But the difficulty in finding it argues against its being anything but a statistical fluke. And the readiness, even eagerness, to declare step changes that just don’t stand the light of day argues against the reality of their claimants’ skepticism.