There’s yet another paper debunking the so-called “hiatus” in global temperature, making five so far (of which I’m aware), including one of my own. But this one, in my opinion, isn’t helping. In fact I believe it has some very serious problems, some of which make the idea of “hiatus” too easy to reject, while others make it too hard to reject. Although I agree with their overall conclusion — and published that conclusion before they did — I find their evidence completely unconvincing.
We’ll start with the confusion about what data they’re using. They focus on global temperature data from NASA GISS, and repeatedly refer to it as “land-ocean temperature index” (LOTI). NASA produces two such indexes, and the one they refer to (LOTI) is based on a combination of meteorological station data with sea-surface temperature data. The other uses only meteorological stations, and NASA calls it “meteorological station data.”
They repeatedly say they’re using LOTI — but they’re not. At least, not according to the graphs they show. Here’s their figure 1:
Every month NASA updates their data because new data arrives, so it’s not clear which release they were using. I haven’t saved all the past releases, but I have squirreled away several of them, and I couldn’t find any which match this graph. It’s especially problematic during the 1935-1950 period. It finally dawned on me that they’re simply not using LOTI, they’re using meteorological station data. That’s OK, but repeatedly referring to it as “land-ocean temperature index” is not.
Which leaves open the question: which release? Unfortunately I haven't found a reference in the paper. But the version released in January of 2014 seems to match quite well:
I’m just going to assume they’re using the Jan. 2014 release of meteorological station data, and proceed.
They focus on comparing the trend from 1998 through 2013 (the purported “hiatus” period) to the trend leading up to 1998. Good plan. Unfortunately they characterize the “trend leading up to” by starting with 1950, and that’s a very bad choice.
Notice that in their graph (their figure 1) they include a table labelled “Temporal Dependence” showing results of tests for autocorrelation. Both the Durbin-Watson test (which essentially tests lag-1 autocorrelation) and the Ljung-Box test (which tests an ensemble of many autocorrelations) indicate the presence of autocorrelation strong enough to be detected with the given data. All their tests are based on the residuals of linear regressions starting in 1950. For the “1950-2013 full” test (i.e. residuals from a linear fit to the time span 1950 through 2013) they get a Ljung-Box statistic with p-value 0.0015, definitely significant at 95% confidence (I got 0.00135, but I’m probably not using exactly the same data they are).
The problem is that these test results don't prove autocorrelation; they merely reject the null hypothesis. It's safe to say that the residuals from a linear fit over 1950-2013 are not just independent random variables, and that might be due to autocorrelation, or it might not. Maybe it's because the signal isn't a straight line, indeed far enough from a straight line to produce these results. What's rather disturbing is that the alternative, that a straight line since 1950 isn't right, is suggested by their own plot, which includes 5-year moving averages (the thick red line). Of course "suggested by a plot" isn't statistical confirmation, but it is a good reason for statistical investigation.
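For readers who want to check this sort of thing themselves, here's a minimal pure-Python sketch of the two tests mentioned above (my own illustration, not the authors' code; the chi-square p-value comes from a series expansion of the regularized incomplete gamma function):

```python
import math

def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 means no lag-1 autocorrelation;
    values well below 2 suggest positive lag-1 autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def chi2_sf(x, df):
    """Chi-square survival function (1 - CDF) via the series expansion
    of the regularized lower incomplete gamma function."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    k = a
    for _ in range(1000):
        k += 1.0
        term *= z / k
        total += term
        if term < 1e-14 * total:
            break
    cdf = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - cdf)

def ljung_box(resid, max_lag=10):
    """Ljung-Box Q statistic and chi-square p-value.  A small p-value says
    the series is inconsistent with 'independent noise' -- which, as noted
    above, may mean autocorrelation OR a misspecified (non-linear) signal."""
    n = len(resid)
    mean = sum(resid) / n
    c0 = sum((e - mean) ** 2 for e in resid)
    q = 0.0
    for k in range(1, max_lag + 1):
        rk = sum((resid[i] - mean) * (resid[i - k] - mean)
                 for i in range(k, n)) / c0
        q += rk * rk / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, max_lag)
```

Feeding these the residuals of a straight-line fit starting in 1950, versus one starting in 1970, reproduces the contrast described in this post.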
Turns out that “straight line since 1950” is easy to reject. Here’s a linear fit to data from 1950 through 2013:
Here are the residuals from that fit together with two patterns for testing (quadratic and piece-wise linear models), both of which reject the straight-line idea:
Both fits are statistically significant, strongly so, and yes, I allowed for the fact that the piecewise linear model offers so many choices of where to place the "change point" (I used change-point analysis). It's disappointing that they didn't test for when the "trend leading up to" should start, when the whole point is to be statistically rigorous about when trends have actually, confirmably, changed.
One should really start about 1970 (within a year or three, depending on which data set one uses). If we do that, fitting a line to data from 1970 through 2013 and then testing the residuals for autocorrelation, the test fails to reject the null hypothesis, i.e. fails to establish autocorrelation. For example, the Ljung-Box test gives a p-value of 0.64, which is nowhere near significant by any standard.
Funny thing is, we know there is some autocorrelation because it’s definitely present in monthly data, and annual averages are averages of monthly data. But for annual averages, it’s weak. So weak that tests fail to establish it. It’s certainly a lot weaker than their tests and computations indicate. Which means, they’re overestimating autocorrelation by rather a lot, because they chose the wrong time to start their analysis.
The choice to start at 1950 also makes their estimate of the "trend leading up to 1998" wrong. Using data from 1950 through 1997 gives a "trend leading up to" of 0.0134 deg.C/yr, but the superior choice of data from 1970 through 1997 gives 0.0196 deg.C/yr. The actual "trend leading up to" is some 47% higher than what you get using their 1950 starting point. Such an artificially low "trend leading up to" makes it too easy to reject the idea of a trend change. And one thing we don't want is to reject the idea of a trend change based on "stacking the deck" by lowering the estimated pre-existing trend.
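The arithmetic, using the slopes as quoted above (rounded to three significant figures, which is why it comes out a hair under 47%):

```python
# Slopes quoted above, in deg C per year
trend_from_1950 = 0.0134   # linear fit over 1950-1997
trend_from_1970 = 0.0196   # linear fit over 1970-1997

# Fractional excess of the 1970-start trend over the 1950-start trend
excess = trend_from_1970 / trend_from_1950 - 1.0
print(f"1970-start trend exceeds 1950-start trend by {100 * excess:.0f}%")
```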
Perhaps the biggest problem of all is that they simply compare pre-1998 trends to post-1998 trends as though 1998 were an independently chosen dividing point. But 1998 wasn't picked out of a hat. It's the relevant choice precisely because that's when people (mainly deniers) have claimed a "hiatus." When you test that idea, you don't get to treat it like an independently chosen time span; if you want to claim the utmost in statistical rigor, you can't pick a starting time because of the result it gives. That's the essence of cherry-picking.
To do it right, you need to allow for the fact that there are many possible change points (which is one of the things change-point analysis is all about). The multiplicity makes it all too easy to get a result which appears to indicate a significant trend change. A result which fails to indicate a trend change you can rely on, but those which appear significant must be compensated for the many choices of change point.
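A toy Monte Carlo illustration of why the compensation matters (made-up noise parameters, not tuned to the actual temperature data): simulate a single unbroken trend plus white noise, scan all change points, and look at how large the best F statistic gets purely by chance.

```python
import random

def line_sse(x, y):
    """Residual sum of squares of an OLS straight-line fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def max_f_over_changepoints(x, y, min_seg=8):
    """Largest broken-trend F statistic over all admissible change points."""
    n = len(x)
    single = line_sse(x, y)
    best = min(line_sse(x[:i], y[:i]) + line_sse(x[i:], y[i:])
               for i in range(min_seg, n - min_seg))
    return ((single - best) / 2.0) / (best / (n - 4))

# Null simulation: one unbroken trend, no change point at all.
rng = random.Random(123)
n = 44
x = list(range(n))
null_max_f = sorted(
    max_f_over_changepoints(x, [0.02 * xi + rng.gauss(0.0, 0.1) for xi in x])
    for _ in range(400))
print("95th percentile of max-F under the null:", null_max_f[380])
```

The scanned maximum routinely beats the naive single-test F(2, 40) 5% point of about 3.2, even though there is no change point at all; that gap is the compensation described above.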
In fact, one of the strongest arguments against a "hiatus" is that tests fail to demonstrate it even before such compensation is applied. Skipping the adjustment is defensible when the tests fail anyway, but not even mentioning it, giving the distinct impression that they're simply unaware of the issue, fails to inspire confidence in the analysis. If they wanted to be the "gold standard" of rigor for this issue, they should not merely have mentioned it, but computed it.
One should mention that when you model temperature data as two independent linear segments, with independent slopes and intercepts, you're really creating a model of a "broken trend": a combination of a sudden change in slope and a sudden jump in value. That's distinctly non-physical. That doesn't mean you can't try it (I did myself), but I think it's incumbent on those who study the trend at least to mention this fact. I did, Cahill and Rahmstorf did, but this paper's authors pass over the point without mention.
I also take exception to the fact that one of the hypotheses they go to some trouble to test is whether or not there's a statistically significant trend since 1998. That is not the issue. The issue is whether or not the post-1998 trend is any different from the pre-1998 trend. There will always be time spans short enough to fail statistical significance; nobody would claim a statistically significant trend from January through July of this year, and nobody cares. Such a lack of significance is meaningless, and rather than go to the trouble of disputing a meaningless claim, I'd much rather simply emphasize that it's meaningless.
There are other technical problems, which I won’t go into. Suffice it to say that this paper doesn’t impress me, and although I agree that its conclusion is correct, I don’t agree that their analysis is correct.
I’ve said all along that when it comes to claims of a “hiatus” in global warming, the evidence just isn’t there. I’ve even published research about it. It’s now dawning on the scientific community in general: all that “hiatus” talk from global warming deniers was bloviating by blowhards.
Unfortunately, a lot of people were taken in by it, including many scientists. I’ll take credit for being one of the few who never jumped on the “pause” bandwagon — I have repeatedly and consistently emphasized that when it comes to a pause, hiatus, or even slowdown in global temperature, the evidence just doesn’t pass muster. It seems the tide has turned: more and more research is appearing which seriously questions whether the “hiatus” was real. Consistently, the result is: no. Fortunately the word is spreading, and not just among the scientific community but also in the media and to the general public — as this recent article in the Washington Post illustrates.
Ah, “right, but for the wrong reasons.”
Perhaps the Rajaratnam et al team will have some responses.
The merits of the 1950 starting point are not discussed in the paper. One could argue that (a) it’s the IPCC’s reference point for “warming since 1950 is mostly anthropogenic” and (b) it yields trends from xxxx-1998 that are somewhere in the middle of the trends you’d get from all possible starting points.
But the elephant-in-the-room counterargument is that there WAS a hiatus from 1940 to 1970. All their efforts to demonstrate that 1998-2013 is indistinguishable from 1950-1998 can only prove that the present hiatus, if it exists or ever existed, falls within the statistical envelope of the earlier hiatus.
If you want to determine whether the (alleged) hiatus period is distinguishable from a previous non-hiatus period, you have to pick a previous period without a hiatus.
I remember Trenberth making this argument from the perspective of someone who believes there has been a hiatus. To what extent his position actually differs from that of Tamino I do not know.
I looked at the Rajaratnam et al. paper and had a few nagging concerns that your analysis here helped bring into focus. It’s unfortunate that you weren’t a reviewer, since some of your points would have been easy for the authors to address. It seemed as if they applied powerful analytical tools without having framed their problem properly at the start; this could be due to the division of labor among the authors that they helpfully detail in the Author Contributions section.
One trivial but annoying issue for me was the title: ‘Debunking the climate hiatus’ carries far too much baggage for a research paper, and is more suited to a commentary.
Veering off-topic, sort of
The GISS LOTI plot makes a (disguised) appearance in Lewandowsky, Risbey and Oreskes’ new paper as well, where it was supplied to 25 economists under the guise of being annual world agricultural output by value for the period 1880-2010. The economists were then asked if statements made by a “critic of conventional economics” that growth had stopped in 1998 were correct (17 of 25 said they were not).
Some issues I have include the highly leading nature* of the questions asked, and the fact that the economists seem to have been asked to judge solely by visual inspection of the graph. In addition, the facts that the LOTI plot is widely recognizable, that it does not resemble the value of global agricultural production by year, and that it was presented with a scale putting ‘agricultural output’ at about 80% of world GDP may have triggered suspicions in the sharper economists.
*For example, the claim made by Mr. X is “misleading” or “ill-informed”, or worst of all “If incompetence is ruled out, the claim made about the data by Mr. X is fraudulent”.
Ironically (considering the authors) there may well be ‘seepage’ from the tactics and framing of AGW denier groups happening here, and that is not a good thing.
Cahill, Rahmstorf and Parnell (2015) Change points in global temperature, Env. Res. Lett., takes an approach consistent with what you recommend here, by the way.
This is also a problem with the trend analysis done by Karl et al. (2015).
Thanks for this, Tamino. One paper to evade when referring to the faux-pause.
Did you contact the authors? Looks to me like at the very least a correction is necessary.
“Evade”? I presume you mean “avoid”, as in avoiding its employment in arguments to the effect that there has been no pause.
It was late, English is not my first language, so yes, I meant avoid.
Thanks for checking this out. I had completely missed the LOTI/dTs confusion, they need to fix that.
The part of the paper I was interested in was the trend uncertainty analysis. The moving block bootstrap was new to me, but at an intuitive level it seems to me to do the right thing. You’re the time series expert though – do you have any thoughts on it?
[Response: It’s a sound way to approach the autocorrelation issue, but I suspect their implementation is problematic. By starting in 1950 to generate a residual series, they pollute their estimates (e.g. their tests for autocorrelation come out strongly positive, but if you start at 1970 they don’t). Hence I suspect the series of data they’re using for bootstrap samples isn’t right.]
I’m a long-time reader of your blog and I may be off-topic for my first post, but I would be very curious to see an update of your graph of global temperature anomalies adjusted for the main forcings (ENSO, solar, volcanic) from your 2011 study.
Did I miss it, or can we hope for an update?
Thanks anyway for your work and your stark skepticism.
Hiatus years have already happened and will happen again, simply because the Pacific shifts from warm to cold for periods as long as 20 years… Contesting the pause is like contesting the existence of a negative PDO. The question is not whether global warming paused; the point is that even with a negative PDO, global temperature did not drop like it should have.
[Response: Just because you can toss around the words “hiatus” and “pause” doesn’t make it real. Speaking of it as though it were an “obvious fact” when the evidence contradicts you and all you can say is “negative PDO,” just makes you look foolish.
As for actual evidence of that claim, I’ve looked hard for it. So far, none.]
Maybe I haven’t been clear. I’m not saying that the hiatus was a fact, just that a negative PDO happened and that part of the warming due to CO2 has been buried in the ocean. That natural process does not contradict global warming: the earth warms just as much during La Niña years, we simply don’t see it in air temps. So air temps can swing from +0.2 to -0.2 with shifts in the PDO, while CO2 has already warmed the planet by as much as +1°C since 1880.
To find whether there has been a departure of unknown form from linearity, the first thing I usually do is fit a generalized additive model (GAM), usually with either a smoothing spline or penalized regression splines as the non-linear components. I will usually choose the smoothing parameter by generalized cross-validation.
When I do this to the annual average global temperatures from 1970 on, generalized cross-validation chooses a linear fit. This says that there is no evidence for a departure from linearity.
This is done assuming independent errors. But if there is any autocorrelation, a GAM fitted under the assumption of independence will be under-smoothed, which makes the degree of smoothing in the GAM conservative. And it is already linear, that is, as smooth as it can be.
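The idea can be sketched in miniature with leave-one-out cross-validation and polynomial degree standing in for GCV and spline smoothness (a toy analogue of the comment above, not an actual GAM): let cross-validation choose the flexibility, and see whether it prefers anything beyond a straight line.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    xs = [0.0] * n
    for r in range(n - 1, -1, -1):
        xs[r] = (M[r][n] - sum(M[r][c] * xs[c]
                               for c in range(r + 1, n))) / M[r][r]
    return xs

def polyfit(x, y, deg):
    """Least-squares polynomial coefficients via the normal equations;
    x should be roughly centered and scaled for numerical sanity."""
    A = [[sum(xi ** (i + j) for xi in x) for j in range(deg + 1)]
         for i in range(deg + 1)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(deg + 1)]
    return solve(A, b)

def loocv(x, y, deg):
    """Leave-one-out cross-validation score for a degree-`deg` polynomial:
    refit without each point in turn and score the held-out prediction."""
    err = 0.0
    for i in range(len(x)):
        c = polyfit(x[:i] + x[i + 1:], y[:i] + y[i + 1:], deg)
        pred = sum(cj * x[i] ** j for j, cj in enumerate(c))
        err += (y[i] - pred) ** 2
    return err / len(x)
```

On genuinely linear data the cross-validation score gives the extra flexibility no reward; on curved data it clearly prefers the more flexible model, which is the sense in which "CV chooses a linear fit" counts as evidence of no departure from linearity.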
My view, and I think that this is also Tamino’s opinion, is that it is completely bogus to cherry-pick an extreme outlier year, 1998, for the starting point of a trend analysis. Deniers are experts in this approach. For example, they note that arctic sea ice has increased so many percent since 2012. They don’t look at the longer term trend, nor do they note that 2012 was an extreme record year. The rule is that we need to consider the long term trend. That’s what anyone with experience analyzing data does almost automatically.
Thanks for this analysis, as always!
Regarding scientists and bandwagons, recall that one person’s signal is another person’s noise. Given no change in trend, there is still the interesting and even fascinating science of understanding the physical mechanisms behind short-term fluctuations in global mean surface temperature. In my opinion the continuing interest in this problem stems not from hiatus-hype, but from the legitimate need to understand and predict interannual climate variations. Aside from ENSO, these are MUCH harder to model than either the mean state or the trend, and they have enormous real-world consequences.
“For example, they note that arctic sea ice has increased so many percent since 2012. They don’t look at the longer term trend, nor do they note that 2012 was an extreme record year.”
Oh, they do: Arctic sea ice was absent in summertime x thousand years ago, and they cite a paper or two that seems to support the point.
What they fail to do is to justify their time periods, if they even discuss them. It’s enough to say lookee at the pre-industrial record, obviously not CO2.
I don’t know what you call this sort of reasoning, only that it is free of any context save ABC.