Global sea level before the satellite era is estimated from individual tide gauge records, which are combined to reconstruct a global average. One of the reconstructions climate deniers love best is from Jevrejeva et al., and the reason is obvious: it gives a result they like.
But there are problems with the methodology used by Jevrejeva et al. HUGE problems.
One that has been much criticized is the “virtual station method” for averaging large groups of station records. Rightly so; but that’s not the problem I want to discuss. The one that hasn’t gotten attention is with their method of aligning station records to form averages.
They use a variant of the first difference method. When I first learned of that method I thought it was ingenious, downright brilliant, probably the best way to align data. Now I think that it’s the worst.
We need to align station records when they have a different baseline. Here’s a sample of monthly data records which follow exactly the same trend, but have a different baseline:
The two records are plotted with different symbols and in different colors. The first (black circles) extends from January 1950 through December 1999, but the second (red triangles) doesn’t begin until 1975.
We’re interested in how the data have changed over time, so the difference in baseline means nothing, and with things like sea level it’s arbitrary anyway. The goal is to shift each record by a constant offset to bring them into the best alignment. Then we can average them, knowing that the inconsistencies caused by (possibly vast) differences in the baseline values are minimized.
The method I use is to align them so that the sum of the squares of differences from each momentary average is minimized. I’ll call this the least squares method, and when I apply it to these data I get an aligned and averaged reconstruction like this:
The data, when aligned and averaged, don’t seem to show any trend, just random fluctuation. And that’s correct, because these are artificial data right out of a random-number generator, plus a constant offset (different baseline) for the record which starts in 1975 rather than 1950.
I can also compute annual averages, again showing no trend (because there isn’t one):
The least squares method works fine, and hasn’t introduced any false trend. That’s good.
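For concreteness, here is a minimal sketch of one way to do that least squares alignment. It is an illustration under stated assumptions, not necessarily the exact code behind the plots: the function name `align_least_squares`, the use of `None` for months without data, and the alternating update scheme are all choices made here. It assumes every record has at least one value, and it pins the first record's offset at zero since only differences between offsets matter.

```python
import random

def align_least_squares(records, n_iter=200):
    """Align records by constant offsets; return (offsets, aligned average).

    records: equal-length lists of floats, with None where a record has no data.
    Minimizes the sum of squared differences between each offset-corrected
    value and the momentary average, by alternating between the two updates.
    """
    n_time = len(records[0])
    offsets = [0.0] * len(records)

    def momentary_average(offs):
        avg = []
        for t in range(n_time):
            vals = [r[t] - o for r, o in zip(records, offs) if r[t] is not None]
            avg.append(sum(vals) / len(vals) if vals else None)
        return avg

    for _ in range(n_iter):
        avg = momentary_average(offsets)
        # Each offset is the record's mean departure from the momentary average.
        new_offsets = []
        for r in records:
            resid = [r[t] - avg[t] for t in range(n_time) if r[t] is not None]
            new_offsets.append(sum(resid) / len(resid))
        # Only offset differences matter, so pin the first record at zero.
        offsets = [o - new_offsets[0] for o in new_offsets]
    return offsets, momentary_average(offsets)

# Two artificial records of pure noise, the second with a baseline of 5.
random.seed(1)
n = 600  # January 1950 through December 1999, monthly
first = [random.gauss(0.0, 1.0) for _ in range(n)]
second = [None] * 300 + [5.0 + random.gauss(0.0, 1.0) for _ in range(300)]
offsets, average = align_least_squares([first, second])
print(round(offsets[1], 2))  # estimated baseline difference, close to the true 5.0
```

With just two records the answer reduces to the mean difference between them over the months they share, so every overlapping month helps pin down the offset.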
What’s the first difference method? We begin by computing, for each record, its first differences. These are just the differences between each value and the preceding value. This transforms each time series into a time series of first differences. The usefulness is that if the data $x_t$ are some signal $v_t$ plus some unknown and arbitrary baseline $b$,
$$x_t = v_t + b,$$
then the first difference operation eliminates that arbitrary baseline:
$$x_t - x_{t-1} = (v_t + b) - (v_{t-1} + b) = v_t - v_{t-1}.$$
Now we can average the first differences themselves, and even if the baseline values b are different for different data series, it doesn’t matter because we’ve eliminated those by first-differencing.
Finally, we transform from “averaged first differences” back to “averaged value” by integrating, which is easily accomplished by computing cumulative sums. We have eliminated baseline values, so we don’t even have to choose an offset for each record!
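The steps just described can be sketched in a few lines. This is a minimal illustration, not anyone's published code: the function name `combine_first_differences`, the `None` convention for missing months, and the choice to carry the level forward when no record supplies a difference are assumptions made here.

```python
def combine_first_differences(records):
    """records: equal-length lists of floats, None where data are missing."""
    n_time = len(records[0])
    combined = [0.0]  # cumulative sums start from an arbitrary zero
    for t in range(1, n_time):
        # A record contributes only if it has both this value and the previous one.
        diffs = [r[t] - r[t - 1] for r in records
                 if r[t] is not None and r[t - 1] is not None]
        # Assumption: if no record supplies a difference, carry the level forward.
        step = sum(diffs) / len(diffs) if diffs else 0.0
        combined.append(combined[-1] + step)
    return combined

signal = [0.0, 1.0, 3.0, 2.0, 5.0]
record_a = signal                            # baseline 0
record_b = [None, None, 13.0, 12.0, 15.0]    # same signal, baseline 10, starts later
print(combine_first_differences([record_a, record_b]))
# [0.0, 1.0, 3.0, 2.0, 5.0]
```

These toy records are noise-free, so the shared signal comes back exactly no matter what the baselines are. Real records contain noise, and that is where things change.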
Let’s apply that method to these two artificial data sets. It gives this:
Whoa! What’s going on? That doesn’t look right. If we compute annual averages of this combined data set, it just emphasizes that “it ain’t right.”
What went wrong?
If you “do the math” you’ll discover that we have not avoided applying an offset. Instead, there’s a “hidden assumption” of what they are. And it is: that the very first value of the “new” series (the one plotted as red triangles) is exactly equal to the value of the first series at that starting moment.
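Here is that math for the two-record case, as a sketch. The notation is introduced here for illustration: $x_{1,t}$ and $x_{2,t}$ are the two records, the second one starts at time $t_0$, $y_t$ is the first-difference reconstruction (started from zero at $t = 1$), and $c$ is the offset it implies.

```latex
\begin{align*}
% Up to t_0 only record 1 contributes first differences:
y_{t_0} &= x_{1,t_0} - x_{1,1} \\
% After t_0 both records contribute, and the averaged differences telescope:
y_t &= y_{t_0} + \tfrac{1}{2}\sum_{s=t_0+1}^{t}
       \left[(x_{1,s} - x_{1,s-1}) + (x_{2,s} - x_{2,s-1})\right] \\
    &= \tfrac{1}{2}\left[x_{1,t} + (x_{2,t} - c)\right] - x_{1,1},
       \qquad c = x_{2,t_0} - x_{1,t_0}
\end{align*}
```

So for $t > t_0$ the result is just the average of record 1 and record 2 shifted by $c$, and $c$ is fixed entirely by the two values at the single moment $t_0$.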
And that’s a very serious problem. If we use the least squares method, of course our estimated offset will be imperfect because of the noise in the data, but at least its error will be kept small because we take into account all the data. But with the first difference method, the offset, although never computed or even necessarily thought about, is actually assumed based on a single moment of time. All the statistical precision we get from averaging large quantities of data goes out the window; the error in the offset is based on a single value only. And since that includes the random noise in both data series, it’s even bigger than the random noise in a single value.
The offset error which arises is not a bias. It’s random, and its expected value is zero so it’s unbiased. The problem is that its variance is so high; we lose the “power of large numbers” that comes with the least squares method. Hence we can get large offsets when a new data record begins, and although they can go either way (as they can with the least squares method), they’ll probably be a lot bigger than the offset error from least squares.
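A quick Monte Carlo check makes the size of the effect concrete for two records. This is a sketch under assumptions chosen here: unit-variance white noise, a 300-month overlap, and a true baseline difference of 5. It uses the fact that the first difference method's implied offset is the difference between the two records in their first shared month, and that for two records the least squares offset is the mean difference over the whole overlap.

```python
import random
import statistics

random.seed(0)
n_overlap = 300      # months in which both records have data (assumed)
true_offset = 5.0    # baseline difference between the records (assumed)
fd_errors, ls_errors = [], []
for _ in range(2000):
    first = [random.gauss(0.0, 1.0) for _ in range(n_overlap)]
    second = [true_offset + random.gauss(0.0, 1.0) for _ in range(n_overlap)]
    # First differences: the implied offset uses only the first shared month.
    fd_errors.append((second[0] - first[0]) - true_offset)
    # Least squares (two records): offset is the mean difference over the overlap.
    mean_diff = sum(b - a for a, b in zip(first, second)) / n_overlap
    ls_errors.append(mean_diff - true_offset)

fd_sd = statistics.pstdev(fd_errors)
ls_sd = statistics.pstdev(ls_errors)
print(fd_sd, ls_sd)  # theory: sqrt(2) versus sqrt(2 / 300)
```

Both error distributions are centered on zero, but in theory the first-difference spread is larger by a factor of the square root of the overlap length, about 17 here.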
If there are multiple records, the offset errors accumulate. Hence, even when they are all unbiased, the variance of the cumulative offsets keeps growing. It’s not good when adding more data makes probable errors get bigger.
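To see the accumulation at work, here is a small simulation sketch. Its assumptions are all choices made here: ten records of unit-variance white noise, a new one starting every 60 months, and every baseline set to zero so that the plain momentary average serves as an ideal reference with no offset error at all. The helper names `combine_first_differences`, `momentary_average`, and `ols_slope` are hypothetical. It compares the spread of fitted linear trends, which should all be zero apart from noise.

```python
import random
import statistics

def combine_first_differences(records):
    """Average the available first differences, then take cumulative sums."""
    combined = [0.0]
    for t in range(1, len(records[0])):
        diffs = [r[t] - r[t - 1] for r in records
                 if r[t] is not None and r[t - 1] is not None]
        combined.append(combined[-1] + (sum(diffs) / len(diffs) if diffs else 0.0))
    return combined

def momentary_average(records):
    """Plain average of whatever records have data at each time step."""
    out = []
    for t in range(len(records[0])):
        vals = [r[t] for r in records if r[t] is not None]
        out.append(sum(vals) / len(vals))
    return out

def ols_slope(y):
    """Ordinary least squares slope of y against its index (per month)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

random.seed(0)
n_time = 600
starts = range(0, 600, 60)  # a new record every 5 years: 1950, 1955, ..., 1995
fd_slopes, ideal_slopes = [], []
for _ in range(100):
    records = [[None] * s + [random.gauss(0.0, 1.0) for _ in range(n_time - s)]
               for s in starts]
    fd_slopes.append(ols_slope(combine_first_differences(records)))
    ideal_slopes.append(ols_slope(momentary_average(records)))

# Spread of spurious trends: first-difference method versus the ideal average.
print(statistics.pstdev(fd_slopes), statistics.pstdev(ideal_slopes))
```

Each new record adds one more random step to the first-difference reconstruction, and since nothing ever removes those steps they pile up like a random walk, which is what inflates the spread of the fitted trends.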
Here, for instance, are ten data records, the first starting in 1950, the next in 1955, next in 1960, etc. up to the tenth starting in 1995 (all end at December 1999):
Here’s what the least squares method gives as aligned averages:
That looks good, since all these data sets are nothing but random noise (plus a different offset for each). Here’s what the first-difference method gives:
Here are annual averages of same:
Not only have we introduced multiple large offset errors, they have conspired to make an apparently very strong, but totally spurious trend. Not good. Every time a new data record enters or exits the set, there’s another offset error added to the mix.
Let’s try these methods on some real data. I took sea level data from PSMSL (the Permanent Service for Mean Sea Level) for stations in Florida, and identified which station records have at least 600 months’ data. That amounts to seven stations:
But we’ll have offset errors more often than you might expect, because when a data record is missing a value you can’t compute a first-difference. It’s like a station drops out, then re-enters later, and those events too contribute to offset error.
Before doing any alignment, I’ll remove the seasonal cycle from each data series.
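One common way to do that, shown here as a sketch rather than as the exact procedure used for these plots, is to subtract from each value the mean for its calendar month. The function name `remove_seasonal_cycle` and the assumption that each series starts in January are choices made for this illustration.

```python
def remove_seasonal_cycle(values):
    """values: monthly series starting in January, None where data are missing."""
    sums, counts = [0.0] * 12, [0] * 12
    for i, v in enumerate(values):
        if v is not None:
            sums[i % 12] += v
            counts[i % 12] += 1
    # Mean for each calendar month, over the months that have data.
    monthly_mean = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [None if v is None else v - monthly_mean[i % 12]
            for i, v in enumerate(values)]

cycle = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]
series = [100.0 + cycle[i % 12] for i in range(48)]  # pure seasonal cycle
series[5] = None                                     # one missing month
anomalies = remove_seasonal_cycle(series)
print(anomalies[:4])
```

This also removes each record's overall mean, which is harmless here because the baseline is arbitrary and gets handled by the alignment anyway.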
Let’s compare the reconstruction using the least squares method in red, to that using the first difference method in blue:
For the first 30 years or so they agree perfectly (which is why the blue line is hidden by the red), because there’s only 1 data record covering that time span so there’s no offset error. But when new stations come and go, the differences become palpable; here in fact is the difference between the two reconstructions:
Not only are the differences sizeable, they have once again introduced a spurious difference in trend.
Jevrejeva et al. don’t actually use the basic first-difference method. Instead they compute differences between values 12 months apart rather than just 1 month apart, in order to eliminate the seasonal cycle. That’s fine, but it doesn’t solve the first-difference offset problem. If you “do the math” you find that it’s the same as using the first-difference method (with lag 1 month instead of 12), then computing 12-month moving averages.
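Here is a sketch of that math. The symbols $d_t$ (lag-1 difference) and $y_t$ (lag-1 first-difference reconstruction) are introduced here for illustration, and the factor of $1/12$ is an assumed normalization that turns a 12-month change into a per-month rate.

```latex
\begin{align*}
% A 12-month difference is the sum of twelve consecutive lag-1 differences:
x_t - x_{t-12} &= \sum_{j=0}^{11}\left(x_{t-j} - x_{t-j-1}\right)
                = \sum_{j=0}^{11} d_{t-j},
                \qquad d_t = x_t - x_{t-1} \\
% Cumulating the rescaled 12-month differences gives a trailing 12-month
% moving average of the lag-1 reconstruction, up to an additive constant:
\frac{1}{12}\sum_{s \le t}\left(x_s - x_{s-12}\right)
  &= \frac{1}{12}\sum_{j=0}^{11}\sum_{s \le t} d_{s-j}
   = \frac{1}{12}\sum_{j=0}^{11} y_{t-j} + \text{const},
   \qquad y_t = \sum_{s \le t} d_s
\end{align*}
```

Since the averaging across stations is linear, the same identity carries over to the averaged differences, so whatever offset errors sit in $y_t$ are smoothed but not removed.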
And that is yet another problem, although when it comes to estimating trends it’s a minor one. The Jevrejeva et al. data are 12-month moving averages; although that eliminates the seasonal cycle, it also introduces “wicked strong” autocorrelation in the data. If you analyze the Jevrejeva et al. data without being aware of this, you could easily reach a faulty conclusion by not taking the strong autocorrelation into account.
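To get a feel for how strong, here is a small sketch using artificial white noise (not the Jevrejeva et al. data). It compares the lag-1 autocorrelation before and after a trailing 12-month moving average; the helper name `lag1_autocorrelation` and the series length are choices made here.

```python
import random

def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(6000)]
# Trailing 12-month moving average of the white noise.
smoothed = [sum(noise[i - 11:i + 1]) / 12 for i in range(11, len(noise))]
raw_r1 = lag1_autocorrelation(noise)
smooth_r1 = lag1_autocorrelation(smoothed)
print(raw_r1, smooth_r1)  # theory: 0 and 11/12
```

Consecutive 12-month averages share 11 of their 12 values, so even completely uncorrelated noise ends up with a lag-1 autocorrelation near 0.92 after smoothing.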
The bottom line is that the Jevrejeva et al. data are faulty in multiple ways because of multiple problems with their methodology. That doesn’t make them totally useless, but it does mean that they’re a bad indicator of what global sea level trends have been over time. Which is one of the reasons climate deniers love to use these data so much.