There’s yet another paper debunking the so-called “hiatus” in global temperature, making five so far (of which I’m aware), including one of my own. But this one, in my opinion, isn’t helping. In fact I believe it has some very serious problems, some of which make the idea of “hiatus” too easy to reject, while others make it too hard to reject. Although I agree with their overall conclusion — and published that conclusion before they did — I find their evidence completely unconvincing.
We’ll start with the confusion about what data they’re using. They focus on global temperature data from NASA GISS, and repeatedly refer to it as “land-ocean temperature index” (LOTI). NASA produces two such indexes, and the one they refer to (LOTI) is based on a combination of meteorological station data with sea-surface temperature data. The other uses only meteorological stations, and NASA calls it “meteorological station data.”
They repeatedly say they’re using LOTI — but they’re not. At least, not according to the graphs they show. Here’s their figure 1:
Every month NASA updates their data because new data arrives, so it’s not clear which release they were using. I haven’t saved all the past releases, but I have squirreled away several of them, and I couldn’t find any which match this graph. It’s especially problematic during the 1935-1950 period. It finally dawned on me that they’re simply not using LOTI, they’re using meteorological station data. That’s OK, but repeatedly referring to it as “land-ocean temperature index” is not.
Which leaves open the question, which release? Unfortunately I haven’t found a reference in this paper. But the version released in January of 2014 seems to match quite well:
I’m just going to assume they’re using the Jan. 2014 release of meteorological station data, and proceed.
They focus on comparing the trend from 1998 through 2013 (the purported “hiatus” period) to the trend leading up to 1998. Good plan. Unfortunately they characterize the “trend leading up to” by starting with 1950, and that’s a very bad choice.
Notice that in their graph (their figure 1) they include a table labelled “Temporal Dependence” showing results of tests for autocorrelation. Both the Durbin-Watson test (which essentially tests lag-1 autocorrelation) and the Ljung-Box test (which tests an ensemble of many autocorrelations) indicate the presence of autocorrelation strong enough to be detected with the given data. All their tests are based on the residuals of linear regressions starting in 1950. For the “1950-2013 full” test (i.e. residuals from a linear fit to the time span 1950 through 2013) they get a Ljung-Box statistic with p-value 0.0015, definitely significant at 95% confidence (I got 0.00135, but I’m probably not using exactly the same data they are).
The problem is that these test results don’t prove autocorrelation; they merely reject the null hypothesis that the residuals are independent. It’s safe to say that the residuals from a linear fit 1950-2013 are not just independent random variables, and that might be due to autocorrelation or it might not. Maybe it’s because the signal isn’t a straight line, in fact far enough from a straight line to produce these results. What’s rather disturbing is that the alternative — a straight line since 1950 not being right — is suggested by their own plot, which includes 5-year moving averages (the thick red line). Of course “suggested by a plot” isn’t statistical confirmation, but it is a good reason for statistical investigation.
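To illustrate the point, here’s a minimal sketch in Python. The data are synthetic (a flat-then-rising signal with noise that is independent by construction — the signal shape, the 0.018 deg.C/yr slope, and the 0.08 noise level are all made-up illustrative values, not the GISS series), and the Durbin-Watson and Ljung-Box statistics are computed from their standard formulas. Even with genuinely independent noise, residuals from a single straight-line fit flunk both tests, because the signal isn’t a straight line:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
years = np.arange(1950, 2014).astype(float)
# synthetic signal: flat until 1970, then a steady rise (illustrative values only)
signal = np.where(years < 1970, 0.0, 0.018 * (years - 1970))
temps = signal + rng.normal(0, 0.08, years.size)  # noise independent by construction

# residuals from a single straight-line fit over the whole 1950-2013 span
slope, intercept = np.polyfit(years, temps, 1)
resid = temps - (slope * years + intercept)

def durbin_watson(e):
    """Durbin-Watson statistic: values near 2 indicate no lag-1 autocorrelation."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def ljung_box(e, lags=10):
    """Ljung-Box Q statistic and chi-square p-value over the first `lags` autocorrelations."""
    n = len(e)
    e = e - e.mean()
    denom = np.sum(e ** 2)
    acf = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, lags + 1)])
    q = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, lags + 1)))
    return q, chi2.sf(q, lags)

q, p = ljung_box(resid)
print(durbin_watson(resid), p)  # low p despite independent noise: the curvature does it
```

The tests can’t tell curvature in the signal apart from autocorrelation in the noise; all they report is that the residuals don’t look independent.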
Turns out that “straight line since 1950” is easy to reject. Here’s a linear fit to data from 1950 through 2013:
Here are the residuals from that fit together with two patterns for testing (quadratic and piece-wise linear models), both of which reject the straight-line idea:
Both fits are statistically significant — strongly so — and yes, I allowed for the fact that the piecewise linear model offers so many choices of where to place the “change point” (I used change-point analysis). It’s disappointing that they never test when the “trend leading up to” should start, when the whole point is to be statistically rigorous about when trends have actually, confirmably, changed.
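The comparison can be sketched as nested-model F tests: fit a straight line, then a quadratic, then a continuous piecewise-linear (“hinge”) model scanning all candidate change points, and see how much each alternative reduces the residual sum of squares. Again the data are synthetic (same made-up flat-then-rising signal as above); note the hinge p-value would need adjustment because the change point is chosen to maximize the fit:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)
years = np.arange(1950, 2014).astype(float)
signal = np.where(years < 1970, 0.0, 0.018 * (years - 1970))  # illustrative only
temps = signal + rng.normal(0, 0.08, years.size)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

n = len(years)
ones = np.ones(n)
rss_line = rss(np.column_stack([ones, years]), temps)

# quadratic alternative: one extra parameter
rss_quad = rss(np.column_stack([ones, years, years ** 2]), temps)
F_quad = (rss_line - rss_quad) / (rss_quad / (n - 3))
p_quad = f_dist.sf(F_quad, 1, n - 3)

# continuous piecewise-linear (hinge) alternative, scanning candidate change points
best = min(
    rss(np.column_stack([ones, years, np.maximum(years - c, 0)]), temps)
    for c in range(1955, 2009)
)
F_hinge = (rss_line - best) / (best / (n - 3))
# caution: the change point was chosen to maximize F, so the nominal
# F p-value is too optimistic; a proper test adjusts for the scan
print(p_quad, F_hinge)
```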
One should really start about 1970 (within a year or three, depending on which data set one uses). If we do that, computing a linear fit to data from 1970 through 2013, then test the residuals for autocorrelation, the test fails to reject the null hypothesis, i.e. fails to establish autocorrelation. For example, the Ljung-Box test gives a p-value 0.64, which is nowhere near being significant, by any standard.
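The effect of the starting year on apparent autocorrelation can be shown with the same kind of synthetic series (again, the signal shape and noise level are made-up illustrative values): residuals from the 1950-start straight-line fit show a large lag-1 autocorrelation, while residuals from a fit starting at the change point are just the independent noise we put in:

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1950, 2014).astype(float)
signal = np.where(years < 1970, 0.0, 0.018 * (years - 1970))  # illustrative only
temps = signal + rng.normal(0, 0.06, years.size)  # noise independent by construction

def lag1(e):
    """Lag-1 autocorrelation of a residual series."""
    e = e - e.mean()
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

def resid(yrs, y):
    """Residuals from a straight-line (degree-1) least-squares fit."""
    b = np.polyfit(yrs, y, 1)
    return y - np.polyval(b, yrs)

r_1950 = lag1(resid(years, temps))
m = years >= 1970
r_1970 = lag1(resid(years[m], temps[m]))
print(r_1950, r_1970)  # the 1950 start inflates the apparent autocorrelation
```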
Funny thing is, we know there is some autocorrelation because it’s definitely present in monthly data, and annual averages are averages of monthly data. But for annual averages, it’s weak. So weak that tests fail to establish it. It’s certainly a lot weaker than their tests and computations indicate. Which means, they’re overestimating autocorrelation by rather a lot, because they chose the wrong time to start their analysis.
The choice to start at 1950 also makes their estimate of the “trend leading up to 1998” wrong. Using data from 1950 through 1997 gives a “trend leading up to” of 0.0134 deg.C/yr, but the superior choice of data from 1970 through 1997 gives 0.0196 deg.C/yr. The actual “trend leading up to” is about 46% higher than what you get using their 1950 starting point. Such an artificially low “trend leading up to” will make it too easy to reject the idea of a trend change. And one thing we don’t want, is to reject the idea of a trend change based on “stacking the deck” by lowering the estimated pre-existing trend.
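A noiseless toy version makes the dilution mechanism obvious (the 0.02 deg.C/yr slope is a made-up illustrative value, not an estimate from real data): fit a line over a window that includes the flat era and you average the flat years into the slope, dragging the estimate below the true post-1970 warming rate:

```python
import numpy as np

years = np.arange(1950, 1998).astype(float)
# noiseless toy series: flat to 1970, then warming at 0.02 deg.C/yr (illustrative only)
temps = np.where(years < 1970, 0.0, 0.02 * (years - 1970))

slope_1950 = np.polyfit(years, temps, 1)[0]          # fit over 1950-1997
mask = years >= 1970
slope_1970 = np.polyfit(years[mask], temps[mask], 1)[0]  # fit over 1970-1997
print(slope_1950, slope_1970)  # the 1950 start dilutes the estimated warming rate
```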
Perhaps the biggest problem of all is that they simply compare pre-1998 trends to post-1998 trends as though they were independent intervals. But 1998 wasn’t picked out of a hat. It’s the right choice because that’s when people (mainly deniers) have claimed a “hiatus.” But when you test that idea, sorry but you don’t get to treat it like an independent time span; don’t pick a starting time because of the result it gives if you want to claim to be the utmost in statistical rigor. It’s the essence of cherry-picking.
To do it right, you need to allow for the fact that there are many possible change points (one of the things change-point analysis is all about). Scanning them all makes it far too easy to get a result which appears to show a significant trend change. If you get a result which fails to show significant trend change you can rely on it, but results which appear significant must be compensated for the multiplicity of possible change points.
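One way to compute that compensation is by Monte Carlo: simulate a steady trend plus white noise (the 0.018 deg.C/yr slope and 0.08 noise level below are made-up illustrative values), scan every candidate change point for the best-fitting “broken trend” (independent slopes and intercepts), and take the null distribution of the *maximum* F statistic. Its 95% point is well above the single-test F critical value, which is the whole reason the adjustment matters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64  # e.g. annual values 1950-2013
t = np.arange(n).astype(float)

def max_f(y):
    """Largest F statistic for a broken trend over all interior change points."""
    X0 = np.column_stack([np.ones(n), t])
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    rss0 = np.sum((y - X0 @ b0) ** 2)
    best = np.inf
    for c in range(5, n - 5):  # require a few points on each side
        # intercept, slope, step at c, slope change at c: two independent segments
        X = np.column_stack([np.ones(n), t, (t >= c).astype(float), np.maximum(t - c, 0)])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        best = min(best, np.sum((y - X @ b) ** 2))
    return (rss0 - best) / 2 / (best / (n - 4))

# null distribution of the scanned statistic: a single steady trend plus white noise
null = np.array([max_f(0.018 * t + rng.normal(0, 0.08, n)) for _ in range(200)])
crit = np.quantile(null, 0.95)
print(crit)  # well above the nominal single-test F(2, n-4) 95% point (~3.15)
```

An observed broken-trend F has to clear this scanned critical value, not the nominal one, before it counts as evidence of a trend change.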
In fact, one of the strongest arguments against a “hiatus” is that tests fail to demonstrate it even before such compensation is applied. It might be fine to skip this adjustment, as they do, but not even to mention it — giving the distinct impression that they’re simply unaware of the issue — fails to inspire confidence in the analysis. If they wanted to be the “gold standard” of rigor for this issue, they should not merely have mentioned it, but computed it.
One should mention that when you model temperature data as two independent linear segments, with independent slopes and intercepts, you’re really creating a model of a “broken trend” — a combination of a sudden change in slope and a sudden jump in value. That’s distinctly non-physical. That doesn’t mean you can’t try it — I did myself — but I think it’s incumbent on those who study the trend at least to mention this fact. I did, Cahill and Rahmstorf did, but these authors pass over the point without mention.
I also take exception to the fact that one of the hypotheses they go to some trouble to test is whether or not there’s a statistically significant trend since 1998. This is not the issue. The issue is whether or not the post-1998 trend is any different from the pre-1998 trend. There will always be time spans short enough to fail statistical significance. Nobody would claim a statistically significant trend from January through July of this year … and nobody cares. Such a lack of significance is meaningless, and rather than go to the trouble to dispute a meaningless claim, I’d much rather simply emphasize that it’s meaningless.
There are other technical problems, which I won’t go into. Suffice it to say that this paper doesn’t impress me, and although I agree that its conclusion is correct, I don’t agree that their analysis is correct.
I’ve said all along that when it comes to claims of a “hiatus” in global warming, the evidence just isn’t there. I’ve even published research about it. It’s now dawning on the scientific community in general: all that “hiatus” talk from global warming deniers was bloviating by blowhards.
Unfortunately, a lot of people were taken in by it, including many scientists. I’ll take credit for being one of the few who never jumped on the “pause” bandwagon — I have repeatedly and consistently emphasized that when it comes to a pause, hiatus, or even slowdown in global temperature, the evidence just doesn’t pass muster. It seems the tide has turned, now more and more research is appearing which questions seriously whether or not the “hiatus” was real. Consistently, the result is: no. Fortunately the word is spreading, and not just among the scientific community but also in the media and to the general public — as this recent article in the Washington Post illustrates.