Step 3

One of the most important lessons to learn about statistics, and not forget, is that just because your model is statistically significant, that doesn’t mean it’s right. In fact, just about all models are wrong. And some models, even some which strongly pass statistical significance, aren’t even useful.

After investigating some claims about temperature records showing a “step change” rather than continuous trend, and finding those claims faulty, a reader mentioned yet another such claim. Specifically, it discusses a paper by Stockwell and Cox, submitted to International Journal of Forecasting. The essence of all these claims is that the temperature trend is discontinuous, that there aren’t just sudden jumps in temperature caused by noise, but the trend itself is a discontinuous function of time.


The paper applies the Chow test to look for structural breaks in climate time series, specifically temperature and precipitation in Australia, and global temperature from HadCRU. The Chow test (as implemented) determines whether or not a pair of straight lines — which don’t need to meet at their endpoints so the model trend can be discontinuous — fits some data better than a single straight line. The first data set on which they report is annual average temperature over Australia from 1910 to 2008, which results in this graph:

The upper panel shows the F-statistic returned from the test, the lower panel shows the temperature data together with a linear trend line (in green) and a step-change model (in blue). The test indicates a structural break at 1978.
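The comparison the Chow test makes (one straight line versus two unconnected straight lines) can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, not the Australian record, and the function names and the `margin` parameter are hypothetical choices, not anything taken from the paper.

```python
import numpy as np

def rss_line(t, y):
    """Residual sum of squares of an ordinary least-squares straight line."""
    X = np.column_stack([np.ones_like(t), t - t.mean()])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(t, y, k):
    """Chow F-statistic for a break between index k-1 and k.

    Compares a single line over all the data (2 parameters) with two
    separate lines that need not meet at the break (4 parameters).
    """
    n = len(y)
    p = 2  # parameters per straight line
    rss_pooled = rss_line(t, y)
    rss_split = rss_line(t[:k], y[:k]) + rss_line(t[k:], y[k:])
    return ((rss_pooled - rss_split) / p) / (rss_split / (n - 2 * p))

def scan_breaks(t, y, margin=10):
    """Evaluate the statistic at every candidate break; return the best one."""
    ks = np.arange(margin, len(y) - margin)
    f = np.array([chow_f(t, y, k) for k in ks])
    best = int(np.argmax(f))
    return int(ks[best]), float(f[best])

# Synthetic illustration only (assumed step of 0.5 in 1978, noise sd 0.2)
rng = np.random.default_rng(0)
t = np.arange(1910, 2009, dtype=float)
y = np.where(t >= 1978, 0.5, 0.0) + rng.normal(0, 0.2, t.size)
k_best, f_best = scan_breaks(t, y)
print("break year:", t[k_best], "F:", round(f_best, 2))
```

One caution on interpretation: when the break is chosen by scanning all candidate years, the largest F found no longer follows the ordinary F distribution, so a textbook p-value for it would be too optimistic.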

It looks to me as though the plotted F-statistic isn’t actually the result of the Chow test. That’s because I ran the Chow test and got a different result, although it indicates the same “breakpoint” as reported by Stockwell & Cox. The plotted statistic looks more like the result of a CUSUM test, which looks for a sudden shift in the mean value but doesn’t allow for a slope in either the before-shift or after-shift segments.

This may indeed be the case, but it looks like they determined the breakpoint from a Chow test, then rejected the slopes of the “before” and “after” segments due to their lack of statistical significance:


The slope of the two segments is not significant (p = 0.15 and p = 0.14). Because of the significance of the break, and the non-significance of slopes in the segments, a model with a single step, or structural change model is justified (Figure 1a blue).
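For comparison, a CUSUM-style statistic for a shift in the mean, with no slope allowed on either side, might look like the following. This is a rough sketch of the general technique and not a reconstruction of whatever produced the plotted curve; the function names and the crude scaling are assumptions.

```python
import numpy as np

def cusum_mean_shift(y):
    """Scaled cumulative sums of deviations from the overall mean."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s = np.cumsum(y - y.mean())
    # Crude scale estimate (assumption): the sample sd, which a real
    # shift in the mean will inflate somewhat.
    return s / (y.std(ddof=1) * np.sqrt(n))

def cusum_break_index(y):
    """Index of the first point after the most likely shift in the mean."""
    stat = np.abs(cusum_mean_shift(y))
    # The final cumulative sum is zero by construction, so skip it.
    return int(np.argmax(stat[:-1])) + 1

# Synthetic illustration only (assumed step of 0.5 in 1978, noise sd 0.2)
rng = np.random.default_rng(0)
t = np.arange(1910, 2009, dtype=float)
y = np.where(t >= 1978, 0.5, 0.0) + rng.normal(0, 0.2, t.size)
print("estimated first year after the shift:", t[cusum_break_index(y)])
```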

The first salient point is that the step-change model really does fit better than the linear-trend model. This isn’t just indicated by the Chow test — if you compute the AIC of the two models (allowing for the fact that the breakpoint time is an additional parameter), the step-change model has much better (i.e., lower) AIC. The difference in AIC is big enough to make the linear-trend model downright implausible.
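The AIC comparison can be sketched as follows, assuming Gaussian errors so that AIC reduces (up to a constant) to n ln(RSS/n) + 2k. The data are synthetic and the helper names are hypothetical; the parameter counts follow the text in treating the breakpoint time as an extra fitted parameter.

```python
import numpy as np

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    return n * np.log(rss / n) + 2 * k

def rss_linear(t, y):
    """Residual sum of squares of a single straight line."""
    X = np.column_stack([np.ones_like(t), t - t.mean()])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def rss_step(y, margin=10):
    """Best single step (two flat means), searched over break positions."""
    best_rss, best_k = np.inf, None
    for k in range(margin, len(y) - margin):
        left, right = y[:k], y[k:]
        rss = float(np.sum((left - left.mean()) ** 2)
                    + np.sum((right - right.mean()) ** 2))
        if rss < best_rss:
            best_rss, best_k = rss, k
    return best_rss, best_k

# Synthetic illustration only (assumed step of 0.5 in 1978, noise sd 0.2)
rng = np.random.default_rng(0)
t = np.arange(1910, 2009, dtype=float)
y = np.where(t >= 1978, 0.5, 0.0) + rng.normal(0, 0.2, t.size)
n = t.size

# Linear: intercept, slope, noise variance -> k = 3
aic_lin = aic_gaussian(rss_linear(t, y), n, k=3)
# Step: two means, break time, noise variance -> k = 4
step_rss, step_k = rss_step(y)
aic_step = aic_gaussian(step_rss, n, k=4)
print("AIC linear:", round(aic_lin, 1), "AIC step:", round(aic_step, 1))
```

Because the synthetic series really is a step, the step model winning here says nothing about the real data; the point is only how the bookkeeping works.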

The second salient point is that the authors make the classic mistake cautioned against in the first sentence of this post: they interpret the strong statistical significance of their model as clear indication that it’s (at least approximately) correct. Permit me to doubt.

We can gain some insight by smoothing the data:

This doesn’t rule out, or even argue against, a step change, because if there were such a shift the smoothing method would, well, smooth it. But it does argue strongly against the linear-trend model. It reveals that the real reason that Stockwell & Cox found such a strong result from the Chow test, and place their faith in the step-change model, is that they are testing the step-change model against what is effectively a straw man. The “single continuous linear trend” model is clearly wrong, so it’s no surprise that the step-change model — whether right (even approximately) or wrong — fits better.

The linear model has its uses, but only if it’s not taken too seriously. It is strongly statistically significant, which shows that Australian temperature has increased overall rather than decreased or remained stable. It also gives us a numerical estimate of the average warming rate during the time span studied. But it doesn’t show that the warming rate has been constant; after all, the linear model is not just wrong, it’s provably wrong.

What about other models? We could try a quadratic function of time:

It definitely fits better than the linear model, and the improvement is statistically significant. That doesn’t make it right — but it may be useful. And when we compute its AIC, we find that not only is it way better than the linear model, it’s even slightly better than the step-change model. The difference in AIC between the quadratic and the step-change model is small, and although it prefers the quadratic model it’s not big enough to decide definitively. While a step-change model can’t be ruled out statistically, it certainly can’t be claimed to be established statistically either.
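A quadratic fit and its AIC can be computed the same way. Again this is a sketch on synthetic data (an assumed gently accelerating series), with hypothetical helper names; time is centered before fitting to keep the polynomial fit well conditioned.

```python
import numpy as np

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    return n * np.log(rss / n) + 2 * k

def poly_rss(t, y, degree):
    """Residual sum of squares of a polynomial fit of the given degree."""
    x = t - t.mean()  # centering avoids ill-conditioning for years ~2000
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return float(resid @ resid)

# Synthetic illustration only (assumed curvature and noise level)
rng = np.random.default_rng(2)
t = np.arange(1910, 2009, dtype=float)
y = 2e-4 * (t - 1910.0) ** 2 + rng.normal(0, 0.2, t.size)
n = t.size

# Linear: 2 coefficients + noise variance; quadratic: 3 + noise variance
aic_lin = aic_gaussian(poly_rss(t, y, 1), n, k=3)
aic_quad = aic_gaussian(poly_rss(t, y, 2), n, k=4)
print("AIC linear:", round(aic_lin, 1), "AIC quadratic:", round(aic_quad, 1))
```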

But wait! There’s more! Suppose we try a model which is continuous but is not differentiable — a model consisting of two straight lines which do meet at their endpoints so the trend remains continuous:

This model has even better AIC than the quadratic model! Of all the models tested this one fits best, although the differences are not big enough to rule out any but the “single continuous linear trend” model. Nor can one claim to have established any of these models statistically. To claim that the step-change model is correct (even approximately), and to wax philosophic about how it’s due to some great Pacific climate shift and the PDO, is, well, mathturbation.
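The continuous two-line model can be written as y = a + b·t + c·max(t − τ, 0), where τ is the time at which the slope changes. A minimal sketch of fitting it by searching over τ, on synthetic data with hypothetical names and an assumed slope change in 1950:

```python
import numpy as np

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    return n * np.log(rss / n) + 2 * k

def rss_linear(t, y):
    """Residual sum of squares of a single straight line."""
    X = np.column_stack([np.ones_like(t), t - t.mean()])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def fit_hinge(t, y, margin=10):
    """Best continuous two-segment line: y = a + b*t + c*max(t - tau, 0)."""
    x = t - t.mean()
    best_rss, best_tau = np.inf, None
    for tau in t[margin:-margin]:
        hinge = np.maximum(t - tau, 0.0)  # zero before tau, ramps up after
        X = np.column_stack([np.ones_like(x), x, hinge])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        rss = float(r @ r)
        if rss < best_rss:
            best_rss, best_tau = rss, float(tau)
    return best_rss, best_tau

# Synthetic illustration only: flat until 1950, then rising 0.02 per year
rng = np.random.default_rng(3)
t = np.arange(1910, 2009, dtype=float)
y = 0.02 * np.maximum(t - 1950.0, 0.0) + rng.normal(0, 0.1, t.size)
n = t.size

hinge_rss, tau_hat = fit_hinge(t, y)
# Hinge: a, b, c, tau, noise variance -> k = 5; linear -> k = 3
aic_hinge = aic_gaussian(hinge_rss, n, k=5)
aic_lin = aic_gaussian(rss_linear(t, y), n, k=3)
print("tau:", tau_hat, "AIC hinge:", round(aic_hinge, 1),
      "AIC linear:", round(aic_lin, 1))
```

Unlike the Chow setup, the two segments here share a value at τ, so the fitted trend can change slope but cannot jump.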

Why do so many want to claim step change in temperature records? Note that those who do so are consistently in denial of man-made global warming. In my opinion, they regard step changes as implausible in a greenhouse-gas warming world, and more important, they expect that people in general will perceive them as implausible. So, they hope that by establishing step changes they can cast doubt on greenhouse gases as the root cause of global warming. Of course there’s no proof (or even evidence, that I’m aware of) that greenhouse-gas warming can’t cause step changes. But then, there’s really no evidence that will stand up to scrutiny that there are step changes at all, rather than continuous trend plus natural-variation noise.

I’m sure if you look hard enough and long enough and search enough temperature records, you’ll eventually find some for which the step-change model is decidedly better than a continuous-trend model. I’m confident that fake skeptics will keep trying until eventually they do. But the difficulty in finding it argues against its being anything but a statistical fluke. And the readiness, even eagerness, with which step-changes are declared which just don’t stand the light of day, argues against the reality of their claimants’ skepticism.

36 responses to “Step 3”

  1. I don’t want to cast Stockwell and Cox as denialists, since I haven’t so much as glanced at their paper, but if they were, an alternate title for this post might be “Out of the frying pan, into the fire,” inasmuch as this looks like evidence strongly suggesting acceleration of (Australian) warming somewhere between 1955 (or so) and 1978.

  2. Strikes Eli that if you use a comparable 30 year period before 1980 (e.g. back to 1950), you are going to get a very different result for the Chow test as to what the before and after slopes are.

  3. Every statistical excursion by climate change denialists is simply another example of the warning that “there are three kinds of lies: lies, damned lies, and statistics”!

  4. Stephen Baines

    I often hear the claim, even from practising scientists, that you can use statistics to show anything. I have always thought that this mistaken impression comes because people think the actual doing of the test (i.e., learning software, coding) is the hard part of using stats, when in fact it is the proper interpretation of the tests which is the real trick. This is a good case in point.

    In the hands of the cynical, improper use of statistics can be used to distract the technically naive or fearful from straw man arguments underpinning interpretations. More commonly it is preexisting opinion or excitement that blinds individuals to alternative explanations and the limits of statistics.

    The problem lies not in the statistical tests, but in our naive interpretations of them.

  5. Jeez, you show this stuff so clearly it makes these guys seem silly. If I, not so slick at stats as you, was in their place, I would be thinking: okay, some unknown mechanism seems to have caused a step change at this location — what caused it? Probably the first subsequent thing I’d check is the spatial scale of this phenomenon — if many broadly distributed locations showed the same break point, then I might start to think I was on to something important. Has any of them done this? Finding different break points around the Pacific would argue against a shift in PDO, wouldn’t it?

  6. “Just about all models are wrong” you say. I’d say that absolutely all serious models are wrong as they cannot match the system they model. It’s whether they are useful that matters. The poor fit of the linear trend above is a useful finding.

    I am reminded of schooldays and the ‘pithy’ saying that preceded the chapter introducing statistics (complete with a cartoon of a drunk): “Do not use statistics like drunks use lamp posts. Use statistics for enlightenment, not simply for support.”
    Sadly such sanguine advice is absent from the “read me” files that these days instruct all and sundry how to run sophisticated statistical tests.

  7. Al Rodger–the saying is due to Andrew Lang, who actually said of a colleague:
    “He uses statistics as a drunkard uses a lamp post–for support rather than illumination.”

    Or, as I say: Any fool can lie with statistics. What takes skill is using them to tease out and illuminate the truth.

  8. It should be noted that a climate shift occurring around the time indicated wouldn’t be a controversial idea. AR4 Chapter 3 talks about ‘the 1976–1977 climate shift’ on just about every other page.

  9. For context, see SSWR, pp.110-112
    J. Scott Armstrong was the founder/cofounder of the Int. Institute of Forecasting and its journal – Int. J of Forecasting.
    This has been submitted a while ago, will be interesting to see what happens, like can they actually get relevant peer-review.

  10. Does a break around 1976 in the temperature data correlate with significant physical events which could feasibly explain that break UP and thereafter maintain the elevated temperature:

    http://www.hubmed.org/display.cgi?uids=9657714

    Is there similar correlation for asserting a break DOWN around 1998:

    http://www.agu.org/pubs/crossref/2004/2004GL020727.shtml

    Anyway, thanks for your, as usual, incisive comments; David will do a post at his site soon:

    http://landshape.org/enm/

    • Ah, cohenite, still pushing that empty barrow I see.

      If you want to blame an oscillating, heat-shifting phenomenon wholly and solely for increased planetary warming, and simultaneously exonerate carbon dioxide and its known ‘greenhouse’ properties, you need to explain both how ENSO and other such phenomena cause a net increase in global temperature, and why the physics of infrared absorption by ‘greenhouse’ gases don’t result in any planetary warming.

      You didn’t accomplish that on Deltoid, so I’m not holding my breath that you’re about to overturn physics now.

  11. Kevin,

    “I don’t want to cast Stockwell and Cox as denialists”

    Well, maybe this will shed some light on at least Dr. Stockwell’s position.
    From Dr. Stockwell’s site that Anthony Cox linked us to, [here he is referring to an interview with Ron Paul in the NY Times],

    “I think that is a fairer assessment than I have seen from a climate scientist. The problem is that when you dig into the field of climate science there is data and there are models and then there’s 50 feet of Climategate crap and big-government science funding. Below that there’s the IPCC.”

    Stockwell doesn’t sound like a true skeptic to me. In fact, it looks to me like they are seeking out excuses to try and dismiss AGW or at least obfuscate. However, this is even worse/weaker than the random walk kerfuffle from a couple of years ago.

    • I didn’t want to characterize someone whom I hadn’t taken the trouble to read. That scrupulosity shouldn’t be confused with defending Stockwell; I’d be at least as unwilling to do that without reading him.

      However, I had suspected what you have documented.

  12. Timothy (likes zebras)

    Surely the biggest problem with the step-change model is its physical implausibility?

    It claims that vast numbers of watts of energy are almost instantaneously introduced to the Earth’s climate system. How is that going to happen?

    You’d have thought someone would notice however many petawatts it would take being introduced over such a short period.

    • The energy is not introduced into the climate system but into the atmosphere. The climate system being atmosphere + oceans.

      The argument is that the energy is released from the ocean into the atmosphere suddenly. The ocean being a much larger reservoir of energy has all the energy needed to provide that. But who can think of a physical mechanism that releases such amounts of energy without anyone noticing it (except the rise in temperature of the atmosphere)? There would have to be an enormous anomaly in SST’s around that time.

      • Timothy (likes zebras)

        Yes, fair enough, but to be logically consistent the heat content of the ocean would have to fall, implying a drop in sea levels, and most likely the net radiation flux at TOA would be such that energy was leaving the Earth, due to the temporarily [by ocean circulation change] elevated surface temperatures.

        There’s simply no internal consistency at all.

  13. Why do so many want to claim step change in temperature records? Note that those who do so are consistently in denial of man-made global warming. In my opinion, they regard step changes as implausible in a greenhouse-gas warming world, and more important, they expect that people in general will perceive them as implausible. So, they hope that by establishing step changes they can cast doubt on greenhouse gases as the root cause of global warming.

    Specifically, and explicitly, what the denialists are trying to imply is that if there is no direct proportionality between atmospheric carbon dioxide concentration and temperature, there must ergo be no cause-effect relationship.

    In their efforts to cast doubt or disbelief there is no need for consideration of multiple, cumulative forcings on temperature. Just ‘show’ that there is no ‘direct’ correlation… et voila!, human-caused global warming disproved!

  14. Actually, there are quite a few papers claiming that a change of regime occurred circa 1978 and 1998. Pickover called those events “magic doors”. I bet that 2007, the year of the collapse of the Arctic ice, will be added to the list soon, since some other climate indicators have been affected by the cascading effect.

  15. Isn’t the glacial/ interglacial of the ice age cycle a step change mechanism? Once CO2 and insolation drop below a critical threshold, albedo can flip to a new state via permanent snow cover. It takes very little thickness of snow to change albedo and reinforce a cooling trend, and once it has changed the snow/ice can continue to become thicker and thicker, reinforcing and protecting the ice-house state.

    And yet once the ice becomes thick enough to be destabilized by insolation increasing above a tipping point, rapid melt disturbs ocean circulation, disgorging CO2 from the deep ocean, which reinforces the warming caused by insolation and locks in a new interglacial state.

    This is only the second most dramatic and obvious bi-stable feature of the climate system (the first being the annual swing from warming to cooling modes). Just because our tools lack the sensitivity (for the time being) to discern more subtle tipping-point systems doesn’t mean they don’t exist.

  16. Tamino, in a previous post you tested the validity of a detection model by generating artificial data with a known trend and used that to determine if various models were able to provide the correct answer (that there is a trend). I guess this also works for a step-change model? I.e. generate artificial data that has a step-change with random noise and then test the various detection models. Would the results be really different from what you’ve shown above?

    • Presumably not– since the AIC measure, at least, doesn’t show clearly that any of the (considered) alternatives to the step model is correct, the results for this case should fit reasonably well with such artificial, step-based data sets.

      • Cynicus and Bryson,
        Like all GOF measures or hypothesis test methods, AIC is only useful for comparing models, not for selecting “the right” model. You can see this if you look at Akaike’s derivation. AIC is an unbiased estimator for Kullback-Leibler Divergence between “the right” model and the candidate model(s). However there is always an undefined constant involved, so you can never be sure you have “the right” model.

  17. Pete Dunkelberg

    They keep coming:

    Abrupt increase in the land uptake of carbon in 1988

    Identification and characterization of abrupt changes in the land uptake of carbon – Beaulieu et al. (2012)

    Abstract: “A recent study of the net land carbon sink estimated using the Mauna Loa, Hawaii atmospheric CO2 record, fossil fuel estimates, and a suite of ocean models suggests that the mean of the net land carbon uptake remained approximately constant for three decades and increased after 1988/1989. Due to the large variability in the net land uptake, it is not possible to determine the exact timing and nature of the increase robustly by visual inspection. Here, we develop a general methodology to objectively determine the nature and timing of the shift in the net land uptake based on the Schwarz Information Criterion. We confirm that it is likely that an abrupt shift in the mean net land carbon uptake occurred in 1988. After taking into account the variability in the net land uptake due to the influence of volcanic aerosols and the El Niño Southern Oscillation, we find that it is most likely that there is a remaining step increase at the same time (p-values of 0.01 and 0.04 for Mauna Loa and South Pole, respectively) of about 1 Pg C/yr. Thus, we conclude that neither the effect of volcanic eruptions nor the El Niño Southern Oscillation are the causes of the sudden increase of the land carbon sink. By also applying our methodology to the atmospheric growth rate of CO2, we demonstrate that it is likely that the atmospheric growth rate of CO2 exhibits a step decrease between two fitted lines in 1988–1989, which is most likely due to the shift in the net land uptake of carbon.”

    Citation: Beaulieu, C., J. L. Sarmiento, S. E. Mikaloff Fletcher, J. Chen, and D. Medvigy (2012), Identification and characterization of abrupt changes in the land uptake of carbon, Global Biogeochem. Cycles, 26, GB1007, doi:10.1029/2010GB004024.

    H/T The observer.

  18. Great moments in science,
    at Anthony Cox’s Climate Sceptics Party.

    http://theclimatescepticsparty.blogspot.com/2012/01/case-against-co2-ipcc-must-be-wrong.html

  19. Didn’t you do a post on Cox called ‘A bag of hammers’ some time ago?

  20. Dikran Marsupial

    Another excellent post. One of the things that is often forgotten when discussing statistical significance is that it assumes that the data in question are the only source of information on whether the hypotheses in question are correct or not. In this case there are many lines of evidence that can give good physical explanations for the basic shape of the data, which means (as Tamino points out) we know a straight line model is inappropriate a priori. The other thing that is usually forgotten is the power of the test: it shouldn’t surprise anybody with a solid grasp of statistics that the trend post 1998 isn’t significantly positive. The reason is simple: there just isn’t enough data for the test to have useful power. However that doesn’t stop the likes of Lord Lawson this morning claiming that there hasn’t been any global warming for n years on the basis of a test with insufficient statistical power.

  21. Dikran Marsupial

    Was the paper accepted by the Journal of Forecasting?

  22. Michael Brown

    I’ve had a few discussions with Anthony Cox at http://www.theconversation.edu.au

    Based on those discussions, I question his ability to draw any reliable inferences from data.

    In recent months he has claimed sea level rise has stopped, misinterpreting the 2010 dip caused by the ENSO cycle. He has also claimed a low temperature sensitivity for CO2, based on an erroneous interpretation of Foster & Rahmstorf. He is also happy to conflate any acknowledgment of natural variability into all of climate change being caused by natural variability.

    Finally, he often inflates undergraduate geography subjects he undertook in the 1970s into a “degree in climatology”.

  23. I had it in my mind that there were “multiple lines of evidence” that have identified a sudden regime change in the Pacific in 1976/1977. For example this paper has a go at 100 different climatic and biological time series to identify the climate shift.

    hare-mantua_pio2000.pdf

    (and if you put the title of that paper in Google Scholar and click on the ‘cited by 899’ tag you can read a whole literature on the subject)

    I get your point with the blog post but pointing out the bad stats in one paper doesn’t necessarily make a step change in the Pacific in 1976/1977 go away.

    • Yes, and the idea of “regime shift” in 1976/77 began with papers led by Kevin Trenberth, someone who hardly could be considered a denier of global warming. I think that this series of posts is great, and I still question the idea of a sudden regime change at any time in data I’ve seen, but I don’t think the motivation always is to find ways of casting doubt on the role of greenhouse gases. I think many scientists just have a tendency of trying to identify step changes in noisy time series. Maybe that just seems like an opportunity for a more interesting story – the so-called 1976/77 regime change sure generated a lot of interest!

      • Yes, looking at some papers following on from HR’s linked study, I came to the conclusion that the ‘agenda’ was to better understand the evolution of fish stocks–and there certainly did seem to be step changes in those parameters.