Hammering the Trend

I tend to hammer away at the concept of trend. Like many, I’m especially interested in whether, and if so when, trends have changed.

Recent research from Oliva et al. is all about trend change in temperature on the Antarctic Peninsula (AP). It follows in the footsteps of Turner et al., and their common conclusion is that since about 1998, the Antarctic Peninsula — so often touted as one of the fastest-warming regions on Earth — has been cooling, cooling even faster than it was warming before 1998.

Naturally this has deniers all a-twitter; for instance, we are treated to an article by Justin Haskins in theBlaze telling us that:

“Warming on the Antarctic Peninsula has long been touted by supporters of the theory man is destroying the planet by using fossil fuels as proof of the dangers of global warming. Al Gore, the face of the world-is-going-to-end climate movement, has visited Antarctica on at least two occasions to highlight the alleged problem.”

Well, no, warming on the AP has not “long been touted … as proof …” etc. As for Al Gore visiting the place, I don’t think that has much to do with it at all.

What interests me is the statistics behind claims that the temperature trend changed. I began by putting some data together, combining the same 10 stations used by Oliva et al. I aligned them in my usual way [using the pseudo-“Berkeley method”] and formed a composite monthly average temperature anomaly for the AP as a whole.

Then I computed annual averages from the monthly averages. This really doesn’t lose any precision in trend estimates (somewhat counterintuitively), and greatly reduces the level of autocorrelation. Finally, I’ve got a time series — annual average surface air temperature — suitable for study.
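The monthly-to-annual step is simple enough to sketch. This is illustrative Python only, not my actual analysis code; the synthetic anomalies stand in for the real 10-station composite, and the variable names are placeholders:

```python
# Sketch: collapse monthly anomalies to annual means.
# Real input would be the station composite described above.
import numpy as np

rng = np.random.default_rng(0)
n_years = 66                                       # 1950 through 2015
monthly = rng.normal(0.0, 1.0, size=n_years * 12)  # stand-in monthly anomalies

# Annual mean = average of the 12 monthly anomalies in each year.
annual = monthly.reshape(n_years, 12).mean(axis=1)

years = np.arange(1950, 1950 + n_years)
print(years[0], years[-1], annual.shape)  # 1950 2015 (66,)
```

Averaging twelve correlated monthly values into one annual value is what tames the autocorrelation while leaving the trend estimate essentially as precise as before.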

And I can see right away what Oliva et al. and Turner et al. are talking about. The estimated trend (by linear regression) from 1998 through 2015 is rather downward:

Using my combined data set, it’s cooling at 6.4 °C/century, faster even than reported by Oliva et al. and Turner et al. And with p-value equal to 0.03, we have confidence (97%, in fact) that the AP has been cooling over these 18 years.
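The trend test itself is plain ordinary least squares. Here is a minimal sketch (again with synthetic data; the slope and noise level are set only to roughly mimic the numbers above, so it will not reproduce p = 0.03 exactly):

```python
# Sketch: OLS trend estimate and its p-value for an 18-year span.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
years = np.arange(1998, 2016)                    # 1998 through 2015
temps = -0.064 * (years - 1998) + rng.normal(0, 0.5, years.size)

fit = stats.linregress(years, temps)
print(f"slope = {fit.slope * 100:.1f} °C/century, p = {fit.pvalue:.3f}")
```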

And it surely doesn’t seem to be warming rapidly, in spite of the fact that it sure was prior to 1998:

Using that earlier time period (1950 through 1997), the linear regression slope is upward at 2.7 °C/century, and the p-value of 0.007 leaves little doubt about its warming at 99.3% confidence.

Conclusion: the trend changed. With a 97% chance it’s now going downward, almost no chance it’s going upward as fast as it used to be going (back when we were 99.3% sure it was going upward), that’s some solid evidence. Hell, even tamino would admit it’s a trend change, right?

Let’s look closer.

The narrative is: warming fast through 1997, then cooling even faster since 1998. If the last 18 years mark a change from warming to a rapid cooling trend, one would expect the 1998-2015 average to be lower than it would have been, had that trend change not happened.

We can of course take the pre-existing (1950-through-1997 warming) trend and extrapolate it to the present, to see what would have happened (trend-wise at least) had no trend change occurred:

The big blue dot marks the 1998-through-2015 average of the observations. Notice that it’s above the value at that time for the extrapolated pre-existing trend (shown in red); just when it was supposed to be cooling so rapidly, its 18-year average went up, above expectation.
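The comparison in the figure is easy to reproduce in outline. A sketch (synthetic data again; with the real composite the observed mean lands above the extrapolation, as the big blue dot shows):

```python
# Sketch: fit the 1950-1997 trend, extrapolate through 2015, and compare
# the observed 1998-2015 mean with the mean the extrapolation predicts.
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1950, 2016)
temps = 0.027 * (years - 1950) + rng.normal(0, 0.4, years.size)

pre = years < 1998
slope, intercept = np.polyfit(years[pre], temps[pre], 1)

post = ~pre
predicted_mean = np.mean(slope * years[post] + intercept)
observed_mean = temps[post].mean()
print(observed_mean - predicted_mean)  # > 0 means "above expectation"
```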

But wasn’t that linear regression trend downward at 97% confidence? What’s going on? To get that result, you can’t just change the trend slope from warming to cooling, you also have to change the value of the trend line at 1998. But if you’re allowed to do that, there’s an extra degree of freedom (in the statistical sense) in how you’re modelling the data.

We all tend to ignore the “intercept” in a linear regression, and usually it really doesn’t matter. But when we have a model with two intercepts, we can’t just ignore them both. One of them represents a degree of freedom that won’t go away.

The right way to account for this is the Chow test, and when I apply it to the given data with a breakpoint at 1998, the p-value for a trend change becomes a paltry 0.061. It doesn’t even make 95% confidence, although it does make 93.9% confidence.
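For readers who want to see the machinery, here is a compact implementation of the Chow test (my own sketch, not my production code; the toy data and the split index are placeholders):

```python
# Sketch of the Chow test for a structural break in a linear trend.
import numpy as np
from scipy import stats

def chow_test(x, y, split):
    """Chow test for a break in a straight-line fit; `split` is the
    first index of the second segment. Returns (F statistic, p-value)."""
    def rss(xs, ys):
        coef = np.polyfit(xs, ys, 1)
        return np.sum((ys - np.polyval(coef, xs)) ** 2)

    k, n = 2, x.size                       # 2 parameters (slope, intercept) per line
    rss_pooled = rss(x, y)                 # one line for the whole record
    rss_broken = rss(x[:split], y[:split]) + rss(x[split:], y[split:])
    f = ((rss_pooled - rss_broken) / k) / (rss_broken / (n - 2 * k))
    return f, stats.f.sf(f, k, n - 2 * k)

# Toy check on noise around a single trend (no true break at 1998):
rng = np.random.default_rng(3)
x = np.arange(1950, 2016, dtype=float)
y = 0.027 * (x - 1950) + rng.normal(0, 0.3, x.size)
f_stat, p_val = chow_test(x, y, split=48)  # index 48 corresponds to 1998
print(f_stat, p_val)
```

The key point: the broken-line model is compared against the single-line model with the proper penalty for its extra parameters, which is exactly the degree-of-freedom accounting the two-separate-regressions approach skips.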

Well, at least it’s 93.9% confidence by some pretty strict standards. That’s not a “lock,” it’s hardly “fer sure,” but it’s likely enough that it deserves very serious attention, right?

Let’s look closer.

The breakpoint being tested is 1998. That’s fine if you have a reason to pick 1998. But if you picked it because it looks like there might be a trend change, based on the data you’re using to test for a trend change, then you’re cherry-picking. I often call it “innocent” or “naive” cherry-picking. You don’t need a nefarious purpose to look at 1998-2015 and think “Looks like a trend change … let’s estimate the trend starting then.” It’s natural.

But that means that even when nothing is really going on, you still get to try all those previous years as starting points, and the chance that at least one of them will show signs of trend change is a lot higher than the chance that a single random series will show such signs. It’s the fallacy of multiple trials — essentially, if you’re allowed to buy 100 lottery tickets you have a bigger chance of winning, but that doesn’t mean the lottery odds have changed.

When I run Monte Carlo simulations to compensate for the multiple trials, using a 66-year-long record as for the Antarctic data (1950 through 2015), the “maybe impressive” p-value of 0.061 turns into a no-way-not-even-close p-value of 0.67. In other words, there’s about a 2/3 chance that pure noise will yield a time span at least as suggestive as the one in the Antarctic data.
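The simulation is straightforward to sketch. This is a reconstruction of the idea, not my actual code: it assumes white noise (reasonable after annual averaging), uses a small simulation count for speed, and the minimum segment length of 10 years is my own arbitrary choice:

```python
# Sketch: multiplicity correction by Monte Carlo. Simulate noise-only
# series, scan every admissible breakpoint, keep the smallest Chow
# p-value, and count how often noise alone beats the observed p = 0.061.
import numpy as np
from scipy import stats

def chow_p(x, y, split):
    # p-value of the Chow test for a break at index `split`
    def rss(xs, ys):
        coef = np.polyfit(xs, ys, 1)
        return np.sum((ys - np.polyval(coef, xs)) ** 2)
    k, n = 2, x.size
    rss_broken = rss(x[:split], y[:split]) + rss(x[split:], y[split:])
    f = ((rss(x, y) - rss_broken) / k) / (rss_broken / (n - 2 * k))
    return stats.f.sf(f, k, n - 2 * k)

rng = np.random.default_rng(4)
n, n_sims, min_seg = 66, 200, 10           # 66 "years", modest sim count
x = np.arange(n, dtype=float)

count = 0
for _ in range(n_sims):
    y = rng.normal(0, 1, n)                # pure noise: no trend change anywhere
    best = min(chow_p(x, y, s) for s in range(min_seg, n - min_seg + 1))
    if best <= 0.061:                      # at least as "suggestive" as observed
        count += 1
print(count / n_sims)                      # multiplicity-adjusted p-value estimate
```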

So no, I do not consider the evidence sufficient to regard cooling in the Antarctic Peninsula as being even likely, let alone established. It’s possible, and only time will tell, but the evidence presented so far isn’t strong enough to stand on its own.

Incidentally, since their studies a little more data has become available: 2016 and the first three months of 2017. Here’s an update of the previous graph, adding the 2016 and first-quarter-2017 values:

I note that 2016, and 2017-so-far, both came in above the trend line estimated from the pre-1998 data.

This blog is made possible by readers like you; join others by donating at Peaseblossom’s Closet.

52 responses to “Hammering the Trend”

  1. It is certainly correct that a fit to temperature data should be a continuous function, since temperature is an indicator of system energy content (and energy is a conserved quantity). How do we make that argument to a non-specialist audience?

  2. Tamino, how will the first figure look when you include 2016? Thanks.

  3. Another case for the Chow test.

  4. Fascinating. Let’s see if I’ve got this right. The analysis shows that the Antarctic data may plausibly be accounted for either as a trend change, or as a ‘pause’ resulting purely from unforced variability? And the data from the last year and a half, though still far from definitive, is suggestive that the latter hypothesis could well be the correct option?

  5. Ralph Snyder

    Thanks, Tamino. By the time you presented the data from Oliva et al, I was already thinking, “Why pick 1998? Is that cherry picking? What about nonphysical discontinuities in the temperature? What would a trend point analysis show?”

    Before reading your blog, I wouldn’t have known to ask such questions.

    That you went off in directions I didn’t expect (degrees of freedom, the Chow test) tells me how much more I have to learn.

  6. Andrew Haines

    Thanks Tamino, for another fine explanation of the proper use of statistics for us statistical semi-literates.

    General comments:
    – it was well explained that the apparent 97% downward trend since 1998 could have been known to be nowhere near that significant even in 2015
    – with the two later data points it is certainly clear now! Someone might care to do the official stats in case my eyes are deceiving me in the final graph, but I doubt it (but unlike the fake sceptics, I am open to the possibility)
    – and also a good explanation of “naive” cherry-picking. Most of us are more familiar with the cherry picking shown in the Blaze article (“deliberate”? “Dunning-Kruger”?), so it is a good reminder that it can also be more innocent/subtle.

  7. Reblogged this on Atmosphere and climate over Larsen C and commented:
    This is an extremely interesting consideration of the validity and robustness of trends estimated from limited numbers of data points. The jury is still out on the magnitude and even direction of temperature trends on the Antarctic Peninsula – clearly the need for further research and data collection is ever more pressing.

  8. Hey Tamino, could you add the trend and uncertainty in the trend from 1998 once you include the latest data?

  9. Great post. Just asking: you’ve done trend point analysis in past posts; is it possible to see if there have been any trend changes in the full record? Heck, in some ways, I expect a reduction in warming rate in Antarctic waters, due to high ice melt due to global warming effects – it would be interesting if this could be seen with trend point analysis. I have no idea how to do that.

  10. Isn’t the case for a change in the trend even weaker than you say? After all, there are many different areas across the globe whose temperature is described by trend+noise. I’m not a statistician but it seems to me that if you have enough areas of trend+noise it is likely that some will have an apparent period of negative trend.

  11. Ah…the old pick a local max and start the analysis from there trick!

    What would deniers do if they didn’t use that one?!

    • Looks like they missed an opportunity to start with the highest anomaly in the record around 1990. It might not have given them such a dramatic negative trend, and they wouldn’t have had such a neat link to the famous canard, but they would have been able to say that the AP has cooled for 25 years!

    • “They” = pseudo-skeptics, not the authors of the papers linked in the OP. Don’t mean to impugn Oliva, Turner et al.

      • As for the OP, I have observed over many years that physicists can be, as tamino notes, “naive” about regression and inference.

        Many also have naively searched for meaning in the “pause” data. I’m not saying there isn’t any, but you don’t get there by _assuming_ a subset of data is significant and then looking for reasons why. It just may be noise.

        All this reminds me of the Stats 101 exercise where you ask students to generate series of 100 heads/tails cognitively versus with actual coin flips. The two series are easy to reliably differentiate, as actual coin-flip data shows apparent “meaning” in runs, whereas cognitive series usually try to consciously avoid runs because they appear non-random to the eye. This, however, makes them observably and measurably nonrandom in actuality.

        This cognitive illusion is very powerful and hard to overcome. One needs to “fly on instruments” (stats done correctly) to avoid crashing because of vertigo. And this, of course, is why deniers use it constantly–usually on purpose, and even scientists get fooled.

  12. russellseitz

    Looking for, and not finding, a graph of the average albedo of the Calcite Belt, 1950-present.

  13. Tamino-san, if you get a chance, please let me know if I made any mistakes here, at least in the statistics part:


    • David B. Benson

      Liang causality is strictly superior to Granger causality. I suggest making the change.

      • Benson-san,

        I have looked up Liang’s papers, but I can’t make head nor tail of how to actually compute the results. He says it’s merely a matter of sample covariance, but I can’t seem to make it through his symbolism. Could you give a succinct algorithm?

      • David B. Benson

        Barton, look in the methods section of the paper linked in the following comment, about the causal connection between CO2 and global temperature. Equation 2 appears to be the means of determining the information flow, using expectations and second partial derivatives. If that is not enough, I suggest contacting the corresponding author of that paper for the additional details required.


    • David B. Benson

      “On the causal structure between CO2 and global temperature”
      explains and uses Liang causality, called IF, information flow, in the paper.

      As a separate matter, the results are interesting although the main 2 results are not surprising.

  14. Where can one find a source for the 2016 and 2017 so far anomaly numbers?

  15. It’s just the plain good ole “Pause”, “hiatus”, “global warming stopped since 1998” trick, only applied to another dataset.

  16. michael sweet

    Don’t scientists normally claim that 17 years is needed to see the trend rise over the noise for Global temperatures? For a small area like the Antarctic Peninsula more time would be required for the signal to escape the noise. An 18 year record is too short for a local area.

    As expected, with more data the signal starts to appear again.

    [Response: The required time depends on the strength of the signal and that of the noise. Specific time spans don’t seem like a very good idea.]

  17. From the abstract of Santer et al 2011, “Separating signal and noise in atmospheric temperature changes: The importance of timescale”:

    “Our results show that temperature records of at least 17 years in length are required for identifying human effects on global-mean tropospheric temperature.” http://onlinelibrary.wiley.com/doi/10.1029/2011JD016263/full

    Of course these findings have no meaning for a small part of the world like the Antarctic. There the needed timespan is quite probably way longer – it depends, again, on the characteristics of the data from there.

    As I said: the good ole “No warming since ’98, 2002, 2005, etc.” scam again.

  18. What are the odds that 5 years in a row would be below the trend estimate? Is that even meaningful?

    • Assuming white noise, 1 in 32.

      Now “what are the odds that we will see 5 years in a row somewhere in a longer series?” is a completely different matter. Tamino and many others have taken extreme pains to point this out over the years.
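Here’s a quick sketch (illustrative Python, my own code) checking both claims at once — the 1-in-32 figure for five *specific* years, and the much larger chance of a five-year run *somewhere* in a 66-year noise series:

```python
# The chance that 5 particular years all fall below the trend line,
# assuming white noise, is (1/2)**5 = 1/32. But finding 5 consecutive
# below-trend years SOMEWHERE in a 66-year series is far more likely.
import numpy as np

assert (1 / 2) ** 5 == 1 / 32

rng = np.random.default_rng(5)
n, n_sims, hits = 66, 2000, 0
for _ in range(n_sims):
    below = rng.normal(size=n) < 0         # which years fall below the trend
    # any run of 5 consecutive below-trend years?
    if any(below[i:i + 5].all() for i in range(n - 4)):
        hits += 1
print(hits / n_sims)                       # well above 1/32
```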

  19. On the subject of trend analysis, if you have time, in a future blog post it would be great if you could take a look at this new article on Arctic sea ice trends since 1901 https://twitter.com/1ronanconnolly/status/861295825808486400

    For context, in addition to Willie Soon, the other two authors have this website http://globalwarmingsolved.com/start-here/

  20. The scientists then applied a statistical technique, known as a Mann-Kendall test, to look for any point within the time series where the size or direction of the trend changes.

    Using this method, they found such a point occurred in late 1998 to early 1999. Before this, the stations recorded an average warming of 0.32C per decade. But afterwards, the trend turned negative, showing 0.47C of cooling per decade until the last measurement in 2014.

    By repeating the analysis without 1997 or 1998, the authors satisfied themselves that the switch from warming to cooling trend was not an artefact of the extreme El Niño conditions.

    No cherry picking to find a trend here?

    • A Mann-Kendall test is just a nonparametric test for a monotonic trend. “Nonparametric” means that it doesn’t assume a normal distribution–nonparametric tests are generally less powerful.

      It is, AFAIK, not a “magical” test that can “objectively” identify acceptable points where the trend changes. Correct me if I’m wrong.
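      For what it’s worth, here is a minimal sketch of the statistic itself (my own code, ignoring tie corrections, and not taken from the paper):

      ```python
      # Sketch of the Mann-Kendall test for a monotonic trend (no ties).
      import numpy as np
      from scipy import stats

      def mann_kendall(y):
          """Return (S, z, two-sided p), without tie corrections."""
          y = np.asarray(y, dtype=float)
          n = y.size
          s = 0.0
          for i in range(n - 1):
              # concordant minus discordant pairs involving y[i]
              s += np.sign(y[i + 1:] - y[i]).sum()
          var_s = n * (n - 1) * (2 * n + 5) / 18.0
          if s > 0:
              z = (s - 1) / np.sqrt(var_s)
          elif s < 0:
              z = (s + 1) / np.sqrt(var_s)
          else:
              z = 0.0
          return s, z, 2 * stats.norm.sf(abs(z))

      # A strictly increasing series: S is maximal, p is tiny.
      s, z, p = mann_kendall(np.arange(20.0))
      print(s, round(z, 2), p)
      ```

      Note there is no breakpoint anywhere in this machinery — it answers “is there a monotonic trend?”, nothing about where a trend changes.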

      • I wish you could edit posts.

        Anyway, so, I’m neither convinced that (1) this doesn’t involve cherry-picking, nor (2) this overcomes the “jumping trend” problem.

    • Angech,

      Could you please post an excerpt from the paper that describes how they did this?

      Basically, if they used any method to find the most promising part of the overall series, all further analysis of the chosen cherry needs to be corrected for multiplicity. Any statistical analysis performed on the cherry that does not account for how many possible start and end positions were originally available is not simply misguided, but deeply wrong.

      If the cherry-finding technique itself produces a measure of significance that includes a correction for multiplicity, that is different, but you have provided no evidence that this is the case.

      The Mann-Kendall test does not appear to be a method for finding out whether a long trend includes statistically significant subtrends [Response: You’re quite right.], at least not in the links I have followed. It is described as a test for assessing whether a monotonic trend exists.


      “The MK test tests whether to reject the null hypothesis ( Ho ) and accept the alternative hypothesis ( Ha ), where

      Ho : No monotonic trend

      Ha : Monotonic trend is present”

      Even if they had done their stats correctly, they would still have the problem that the start of their trend requires a sudden inexplicable breach in the prior trend. As shown in tamino’s graphs, their trend begins way above the main trend line. The jump to the start of their trend is entirely non-physical and unexplained. The improbability of their trend being due to chance, even if calculated correctly, would need to be balanced against the impossibility of their trend starting at a magical energy-creation point.

      • Could you please post an excerpt from the paper that describes how they did this?
        No, sorry. I must have reached a link through Tamino’s references but now find they have large costs to access, whereas I just linked in. I agree their method picks the best start date for a change of trend, but there are numerous dates pre and post 1998 which would still show downward trends of greater or lesser significance.

      • This is a phenomenon well known to particle physicists: They comb through their data looking for “bumps” that might indicate some sort of particle decay or resonance and then apply (hopefully) reasonable and physically justifiable filters. The result is a whole helluva lot of bumps–most of them spurious. They compensate for this by requiring 5 standard deviations of significance before they publish. Interestingly, a certain string theorist whose initials are LM does not understand this.

      • Angech,
        You really don’t understand stats at all, do you?

      • Angech:”…but there are numerous dates pre and post 1998 which would still show downwards trends of greater or less significance”

        Uh, dude. Come on, Angech. You’re almost there. What can you conclude from the fact that you can cherrypick many different short durations that show behavior different from the overall trend?

  21. libertador

    Isn’t your analysis compatible with the Turner et al. paper, as it suggests the so-called absence of warming is consistent with natural variability? Since natural variability should be found in the rest of the temperature data, the conclusion of a trend being natural variability seems to be compatible with the conclusion of no established trend change.
    Is the dispute then a dispute about the language of analysis? What do you call such a short period which seems to have no warming, but is not statistically sufficient for a trend change?

    [Response: I don’t know, but I do know this: you do NOT say “cooling at a statistically significant rate” (as in the abstract of Turner et al.) when it just ain’t so — like now.]

  22. scottdenning

    The real problem here is that the whole line of reasoning is spurious, as Tamino points out earlier in the post before he even starts doing statistics. There’s absolutely no scientific validity in choosing a particular region over a particular time and pretending that trends there (or lack thereof) either support or refute the hypothesis that adding heat to the Earth system will cause it to change its temperature. As Tamino shows, the contention on The Blaze was false, but even if it were 100% true it wouldn’t undermine the First Law of Thermodynamics one whit!

    • The Blaze apparently exists to lie.

    • “There’s absolutely no scientific validity in choosing a particular region over a particular time and pretend that trends there (or lack thereof) either support or refute the hypothesis that adding heat to the Earth system will cause it to change its temperature.”
      Ummm, scottdenning.
      If trends (or their absence) in a chosen region and time can neither support nor refute a hypothesis, how can you test a hypothesis at all?
      Extended, your assumption implies that trends in regions and times in general do not have scientific validity.
      Choosing any place and time will show trends, or a lack of them, which can scientifically support or refute a hypothesis.
      What needs to be done is to contrast and compare with bigger pictures at other times and places to confirm or refute.
      A cherry pick [pretence] is always scientifically valid at that one time and place, just not really.

        Your reading skills are less than mad. “Adding heat to the Earth system will cause it to change its temperature.” If you want to refute that hypothesis, you need to take the whole Earth over an extended period of time.

  23. As stated in Tamino’s excellent book, “Understanding Statistics,” a quintessence of statistics is separating signal from noise; and then making sure you are not fooling yourself. False positives are especially pernicious.

  24. Another way to state this: imagine a gambler who goes to a casino with $1,000, quickly loses $500 in the first hour, then hits a ‘winning streak’ in the second hour and gets back to $1,000. If he starts his ‘trend’ line right at the point when he starts winning, it sure looks like he had a good night!

  25. A repost of Singer’s arguments from 2006 on WUWT has shown up and this post of yours refuting it seems an apt place to start.


    Wanted to just mail you what I have so far as a starting point but I don’t think I have your e-mail and the fool where I posted it just now has some issues. I did link back to your work there … I might refine what I did so that it is actually polished up rather than simply reasonably complete.


    I also asked whether there was any change in the paper since 2008 and so far don’t see any; Dr Singer may not even have been involved in posting it. The usual Gish gallop, but I thought it possible that you might want to guest-post a rebuttal on SkS or re-do one here?

    I hope this is appropriate.

  26. Ah… and you won’t have my e-mail. bj*DOT*chippindale*AT*gmail*DOT*com

  27. Tamino, I used your constructed annual AP time series 1950-2015 and carried out a Bayesian change point analysis on it. I tested it for 0, 1, 2, 3 & 4 change points with DIC as the measure of model fit (it penalises for model complexity). I found for your time series that really no model other than the 0CP model has a DIC score sufficiently lower to be worth ‘the extra complexity’. I was concerned at the relatively few data points and decided to fill out the time series by interpolation and made the same graph into 1,070 time points and tested it also. The 1CP model is barely any better than 0CP and has the CP in mid 2012 which really is not a valid CP towards the end of the time series as it lowers to the 2015 value.

    In summary, Bayesian CP analysis would class this time series as a 0CP model with a median slope of 2.68C/century and 95% equal-tailed credible interval of 1.5-3.87C/century. It would be interesting to do a similar study on the monthly AP aggregated values.