Robert Grumbine has a post in which he takes an unusual look at global temperature data. I’m afraid I must take exception to his methodology.
He starts with the HadCRUT3v data set (and kindly provides his data as an Excel file here). He then defines a couple of different versions of “climate normal” as the average over some particular time frame. In fact his first choice is to use the entire data set, for which the climate normal is just the data average:
$$\bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j .$$
Then he transforms the data values into “cooling degree months.” These are the differences between a given month’s value and the average, which we can call $d_t$:
$$d_t = x_t - \bar{x} .$$
He then computes running sums of the cooling degree months to generate a new time series of accumulated cooling degree months; let’s call that $A_t$:
$$A_t = \sum_{j=1}^{t} d_j = \sum_{j=1}^{t} \left( x_j - \bar{x} \right) .$$
First of all we’ll note that $\bar{x}$ doesn’t depend on time. It’s a constant, so its sum is simply that constant times the number of terms. Therefore
$$A_t = \sum_{j=1}^{t} x_j - t \, \bar{x} .$$
Let’s give a name to the accumulated sums of the raw data values $x_t$. We’ll call them cumulative sums, and mathematically denote them as $S_t = \sum_{j=1}^{t} x_j$. Therefore
$$A_t = S_t - t \, \bar{x} .$$
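Here’s a minimal sketch of that identity in Python with NumPy. The data are made-up random numbers standing in for monthly anomalies (not the HadCRUT3v values), and the variable names are mine, chosen for illustration. It computes the accumulated deviations from average directly, then again as the cumulative sum of the raw data minus the month index times the average, and checks that the two agree.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=240)        # hypothetical monthly anomalies, NOT real data

x_bar = x.mean()                # "climate normal" over the whole record
d = x - x_bar                   # cooling degree months
A = np.cumsum(d)                # accumulated cooling degree months

t = np.arange(1, x.size + 1)    # 1-based month index
S = np.cumsum(x)                # cumulative sums of the raw data
A_alt = S - t * x_bar           # same quantity via the identity

print(np.allclose(A, A_alt))
```

Any series would do here; the identity is pure arithmetic and doesn’t depend on what the numbers are.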
For instance, when defining the average using the entire time span Grumbine gets this for the accumulated cooling degree months:
and I get this:
They’re the same.
Later he uses a different time span to define “normal.” But that just leads to a different value for the average $\bar{x}$, so it only changes the result by adding a linear trend to the previous values.
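A quick numerical check of that claim, again as a sketch on made-up random data. The alternative baseline (the first 120 months) is an arbitrary choice of mine, not one of Grumbine’s. The difference between the two accumulated series should be a straight line in the month index, with slope equal to the difference between the two “normals.”

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=360)            # hypothetical monthly anomalies
t = np.arange(1, x.size + 1)        # 1-based month index

normal_full = x.mean()              # baseline: entire record
normal_sub = x[:120].mean()         # hypothetical baseline: first 120 months

A_full = np.cumsum(x - normal_full)
A_sub = np.cumsum(x - normal_sub)

diff = A_sub - A_full               # what changing the baseline adds
slope = normal_full - normal_sub    # predicted slope of that difference

print(np.allclose(diff, slope * t))
```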
My complaint is that it’s way too easy to see patterns in cumulative sums that really don’t mean anything. Suppose for instance that “normal” temperature was zero, and the deviations from normal were purely random numbers, plain old white noise like this:
If we then define “cooling degree months” as departure from the average value, and accumulated cooling degree months as the cumulative sums of those, we get what looks like an extremely strong pattern:
But the pattern really doesn’t mean anything at all. By construction, the time series is just random noise.
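You can try this yourself. The sketch below generates white noise, forms the accumulated deviations from average, and prints a few summary numbers in place of a plot. The record length, the seed, and the `sign_changes` helper are my own illustrative choices. The raw noise flips sign about every other month, while the cumulative sums wander far from zero and stay on one side for long stretches, which is exactly what the eye reads as a “pattern.”

```python
import numpy as np

rng = np.random.default_rng(42)
n_months = 1920                       # hypothetical record length (160 years)
noise = rng.normal(size=n_months)     # white noise: no signal at all

deviations = noise - noise.mean()     # "cooling degree months"
accumulated = np.cumsum(deviations)   # accumulated cooling degree months

def sign_changes(v):
    """Count how often consecutive values switch sign."""
    return int(np.count_nonzero(np.diff(np.sign(v)) != 0))

print("sign changes, raw noise:      ", sign_changes(deviations))
print("sign changes, cumulative sums:", sign_changes(accumulated))
print("largest excursion of the sums:", np.abs(accumulated).max())
```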
The root of the false appearance is that the time series of cumulative sums has extraordinarily high autocorrelation, to an extremely high lag. Here’s the sample autocorrelation function for the random-noise cumulative sums:
Note that the autocorrelation is both extremely high and persists as long as lag 600 months (50 years!). That’s just the nature of cumulative sums. And that’s for white noise, i.e., random noise which isn’t already autocorrelated. Actual temperature values already show autocorrelation, which leads to even stronger autocorrelation in the cumulative sums.
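For anyone who wants to check the autocorrelation themselves, here’s a rough sketch. The `sample_acf` function is a plain textbook estimator I wrote for illustration (not necessarily the one used for the plot above), and the series length and seed are arbitrary assumptions, so the numbers will differ in detail. It compares the sample autocorrelation of white noise with that of its accumulated deviations from average.

```python
import numpy as np

def sample_acf(v, max_lag):
    """Sample autocorrelation of v at lags 0..max_lag."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    denom = np.dot(v, v)
    return np.array([np.dot(v[:v.size - k], v[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
noise = rng.normal(size=1920)                 # hypothetical white-noise record
accumulated = np.cumsum(noise - noise.mean()) # accumulated deviations

acf_noise = sample_acf(noise, 120)
acf_accum = sample_acf(accumulated, 120)

for lag in (1, 12, 120):
    print(lag, round(acf_noise[lag], 3), round(acf_accum[lag], 3))
```

The white-noise column should hover near zero at every nonzero lag, while the cumulative-sum column starts very close to one and decays only slowly.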
In fact we can generate random numbers with autocorrelation similar to the noise (not the signal!) in global temperature, like this:
Then we can compute accumulated deviations from average just as before, and we get this:
Note the extremely strong appearance of a powerful signal. But again, by construction the data are just random, pure noise.
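A sketch of the same experiment with autocorrelated noise follows. Big caveat: I’m using a simple AR(1) process with coefficient 0.6 as a stand-in here. Both the model and the coefficient are assumptions for illustration, not a fitted description of the noise in global temperature. The same random innovations are used for the white-noise comparison, so the only difference between the two accumulated series is the autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_months = 1920                      # hypothetical record length
phi = 0.6                            # hypothetical AR(1) coefficient (assumption)

eps = rng.normal(size=n_months)      # white-noise innovations
red = np.empty(n_months)
red[0] = eps[0]
for i in range(1, n_months):
    red[i] = phi * red[i - 1] + eps[i]   # AR(1): each month remembers the last

A_white = np.cumsum(eps - eps.mean())    # accumulated deviations, white noise
A_red = np.cumsum(red - red.mean())      # accumulated deviations, AR(1) noise

range_white = A_white.max() - A_white.min()
range_red = A_red.max() - A_red.min()
print("range of accumulated sums, white noise:", range_white)
print("range of accumulated sums, AR(1) noise:", range_red)
```

With positive autocorrelation the accumulated series swings over an even wider range than in the white-noise case, so the apparent “signal” looks stronger still, even though it’s noise by construction.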
There are ways to deal with cumulative sums, and in some (rare) circumstances it is natural to analyze them. But this isn’t one of those circumstances. There’s no information in the cumulative sums that isn’t already in the original data, and you don’t need the cumulative sums to glean insight about what the trend is doing — that too is clear from the original data.
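The point about information is easy to verify: cumulative summing is exactly undone by differencing, so the two series carry the same content. A minimal sketch on made-up random data (the series is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=500)                 # hypothetical original data

S = np.cumsum(x)                         # cumulative sums
x_recovered = np.diff(S, prepend=0.0)    # first differences give the data back

print(np.allclose(x, x_recovered))
```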
In fact computing cumulative sums is a very dangerous approach to analyzing data: the autocorrelation is too strong, and the likelihood of deducing patterns where none really exist is just too high. So I recommend against it, very strongly.