There’s another paper about sea level rise in the Journal of Coastal Research by P. J. Watson (2011, Is There Evidence Yet of Acceleration in Mean Sea Level Rise around Mainland Australia?, Journal of Coastal Research, 27, 368–377). According to this powerpoint, Watson is genuinely concerned about sea level rise due to global warming and argues forcefully for addressing the issue. His primary interest seems to be helping those responsible for protecting Australia’s coastline be as well prepared as possible for the impending sea level rise. That’s a noble motive, and I wish him success. But in spite of the best of intentions, I can’t put much stock in Watson’s published results, because it’s clear that he is no data analyst.
Watson looks at tide gauge data from four stations near Australia: Fremantle on the west coast of Oz, Fort Denison in Sydney Harbor on the east coast, Newcastle (which is quite close to Sydney), and Auckland, New Zealand. The data can be obtained from PSMSL (the Permanent Service for Mean Sea Level). Watson obtained more current data to update these records through the end of 2009. I didn’t have to do that for the Australian stations because the present data sets in the PSMSL archive are already current up to the end of 2009. But I don’t have the up-to-date data for Auckland that Watson used — the Auckland data I acquired only go to May of 2000.
There are two very big problems with the analysis, and several not-so-big ones. For example: if we plot one of the data sets, say Fremantle on the West Australian coast, it shows an obvious increase over time as well as a great deal of fluctuation:
Close inspection of the graph (click on it for a larger, clearer view) suggests that there may be an annual cycle. This is easily confirmed by Fourier analysis (I first detrended the data, then computed the Fourier periodogram):
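For readers who want to try the detrend-then-periodogram step themselves, here’s a minimal sketch. It uses synthetic data in place of the actual Fremantle record (the trend, cycle amplitude, and noise level are all made up for illustration):

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)

# Synthetic stand-in for a monthly tide gauge record:
# linear trend + annual cycle + noise (all values hypothetical)
t = np.arange(0.0, 50.0, 1.0 / 12.0)            # 50 years, monthly sampling
msl = 1.5 * t + 40.0 * np.sin(2 * np.pi * t) + 20.0 * rng.standard_normal(t.size)

# First detrend (subtract the best-fit line), then compute the periodogram
trend = np.polyval(np.polyfit(t, msl, 1), t)
freqs, power = periodogram(msl - trend, fs=12)  # 12 samples/yr -> freqs in cycles/yr

peak = freqs[np.argmax(power)]                  # strongest periodicity
print(f"strongest period: {1.0 / peak:.2f} yr")
```

With an annual cycle this strong, the periodogram’s peak lands squarely at a frequency of 1 cycle per year.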
We can even get a good idea of the shape and size of the annual cycle by “folding” the data with a period of 1 year, i.e., plotting the (detrended) data as a function of phase (which is equivalent to a plot as a function of month) rather than time. I’ve plotted two whole cycles (just repeating one after the other) so that the shape of the annual cycle is clear:
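The “folding” step is also simple to sketch in code (again on synthetic data): reduce each time to its phase within the year, average month by month, and repeat the result twice so a plot shows two full cycles:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0.0, 50.0, 1.0 / 12.0)            # 50 yr of monthly times
# Hypothetical detrended record: annual cycle plus noise
detrended = 40.0 * np.sin(2 * np.pi * t) + 20.0 * rng.standard_normal(t.size)

# Fold with a period of 1 year: the phase is the fractional part of the time;
# with monthly sampling that maps each point to one of 12 phase bins
month = np.round((t % 1.0) * 12).astype(int) % 12

# Average within each bin to get the cycle's shape
cycle = np.array([detrended[month == m].mean() for m in range(12)])

# Repeat the cycle twice so a plot shows two whole periods
two_cycles = np.tile(cycle, 2)
```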
No doubt about it, there’s an annual cycle. We can improve trend analysis by removing it, which we can do because it’s not noise, it’s signal, and which we should do because it’s a part of the signal that doesn’t contribute to the trend. I chose to do so by computing anomaly: the difference between each month’s value and the average for that same month during a baseline period, which I chose as 1960.0 to 1990.0. Here are the anomalies:
Note that the noise is greatly reduced; in fact the range of the y-axis is a good bit less than it was before, because the variation shrinks once the annual cycle is removed. Good riddance!
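The anomaly computation itself is a one-liner per calendar month: subtract that month’s mean over the baseline period. A sketch on synthetic data (the 1960.0–1990.0 baseline follows the text; the series itself is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1897.0, 2010.0, 1.0 / 12.0)       # hypothetical Fremantle-like span
msl = (1.5 * (t - t[0]) + 40.0 * np.sin(2 * np.pi * t)
       + 20.0 * rng.standard_normal(t.size))

month = np.arange(t.size) % 12                  # calendar month of each sample
baseline = (t >= 1960.0) & (t < 1990.0)

# Each calendar month's mean over the 1960.0-1990.0 baseline period
clim = np.array([msl[baseline & (month == m)].mean() for m in range(12)])
anom = msl - clim[month]                        # monthly anomalies

# Removing the cycle shrinks the scatter about the trend line
raw_sd = np.std(msl - np.polyval(np.polyfit(t, msl, 1), t))
anom_sd = np.std(anom - np.polyval(np.polyfit(t, anom, 1), t))
print(f"detrended scatter: {raw_sd:.1f} -> {anom_sd:.1f}")
```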
Unfortunately Watson doesn’t remove the annual cycle at all. This leaves in place a source of variation which is due to non-trend signal — not due to trend, and not due to noise. If it had been eliminated it would have improved whatever trend analysis follows.
What he does do is remove much of the noise. That’s a great idea if you want to get insight from visual inspection of a graph. Under such circumstances I recommend a good smoothing method (I’m fond of the Lowess smooth, others prefer other kinds). Watson chose a moving average filter. That’s OK, except for two things. First, it’s not a very “smooth” smooth — it’s kinda choppy — but that’s just me being fussy. More important, a moving average filter of width T will cut off a time span equal to half of T from both the beginning and the end of the time series — you lose what just might be the most interesting parts, the beginning and the end. Since he uses a 20-year moving average filter, he loses the first and last 10 years of the time span.
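The edge loss is easy to verify numerically. A sketch with a hypothetical 120-year monthly series and a centered 240-month (20-yr) window:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(120 * 12)               # 120 yr of hypothetical monthly values

w = 240                                          # 20-yr window, in months
smooth = np.convolve(x, np.ones(w) / w, mode="valid")  # only full windows survive

print(x.size, "->", smooth.size)                 # 1440 -> 1201
# 239 months are gone: roughly 10 yr lost from each end of the record
```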
Well and good; he computed 20-yr moving averages of all four data sets, then aligned them by shifting them to set the zero point equal to the value in January 1940. I think this is a very poor choice of “baseline.” For one thing, it’s only one month. Mainly, we can’t be sure that the data quality was as good 70 years ago as it is more recently, and I think it would be a better idea to align the series so that they’re coincident during a period of better-quality data. Nonetheless, he gets this for the 20-yr moving averages of the four data sets (“Fort Denison” is the location of the Sydney data):
Note that the values from Newcastle are consistently much higher than those from other locations. This argues that there is indeed a data problem with Newcastle, and that the January 1940 choice for baseline is indeed a poor selection. When I computed 20-yr moving averages using the 1960.0 to 1990.0 baseline, it looks like this:
Now it’s clear that the 20-yr moving averages of Newcastle data are divergent from the other data sets prior to 1960. They shouldn’t be: Newcastle is so close to Sydney that it’s implausible to suggest it’s a regional effect. It’s probably a problem with data quality. Therefore the 20-yr moving averages from Newcastle shouldn’t be trusted prior to 1960, and shouldn’t be included in further analysis. Yet Watson does include these earlier, suspect data in his subsequent analysis.
It’s also clear that the 20-yr moving averages for all stations diverge prior to about 1930. Therefore any analysis which includes values prior to that time is also suspect, and the 20-yr moving averages shouldn’t be trusted prior to about 1930. Yet Watson includes them too, analyzing the 20-yr moving averages for Fremantle and Auckland from 1920 through 2000.
Those are all nontrivial problems. But now we come to one of the very big problems: instead of just using the smoothed (20-yr moving average) values to gain insight from the graph, he actually treats them as data and subjects them to analysis. This is a very bad idea. In fact, there’s a good book about analyzing astronomical time series, written to be accessible to the non-expert, which warns strongly against exactly this:
One might wonder, since we’ve already recommended analyzing averages rather than the raw data, why not analyze moving averages? Surely they reduce the impact of the noise, and won’t that improve results? The answer is an emphatic NO. When we compute averages with bins that don’t overlap, the noise for each computed average is independent of the noise for all the other averages, so we can apply all our tests and analyses which rely on assumptions like the noise being white noise. But when we compute moving averages, the noise in different averages is not independent because of the extreme overlap between the data used for different averages. In fact consecutive moving averages based on 50 data points each are based on 49 of the same data values! This strong dependence between nearby moving-average values leads to extremely strong autocorrelation in the noise of the moving averages, which invalidates the statistical treatment of moving averages as signal-plus-white-noise. Moving averages are a robust and simple way to smooth noisy data, but should never be used in analysis as a substitute for the original data.
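The book’s warning is easy to demonstrate numerically: take pure white noise, compute a 50-point moving average, and compare lag-1 autocorrelations. (A sketch; the estimator below is just the simple sample autocorrelation.)

```python
import numpy as np

rng = np.random.default_rng(4)
noise = rng.standard_normal(2000)               # pure white noise

w = 50
ma = np.convolve(noise, np.ones(w) / w, mode="valid")  # 50-point moving average

def lag1_autocorr(y):
    """Simple sample lag-1 autocorrelation."""
    y = y - y.mean()
    return float((y[:-1] * y[1:]).sum() / (y * y).sum())

print(f"white noise:     {lag1_autocorr(noise):+.2f}")  # near 0
print(f"moving averages: {lag1_autocorr(ma):+.2f}")     # near 49/50 = 0.98
```

Consecutive 50-point averages share 49 of their 50 values, so the smoothed series is almost perfectly autocorrelated even though the underlying noise is white.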
Analyzing moving averages totally invalidates the statistical evaluation of the analysis. In particular, it tends to inflate (greatly!) the apparent quality of fitting some model to the data. That’s why, when Watson gets around to fitting models to these data (as quadratic functions of time), he is able to report such large values of the squared correlation coefficient.
He also seems to be operating under the misconception that extremely high values of the squared correlation coefficient validate the model statistically. Not so. It does tell you how much of the variance is explained by the model, but it doesn’t reveal whether or not the fit is meaningful (i.e., not entirely random). And besides that, his impressively large squared correlation coefficients are tremendously inflated by virtue of his having analyzed 20-yr moving averages rather than the data themselves.
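A quick simulation shows the inflation. Fit the same quadratic to raw noisy data and to its 20-yr moving averages; the numbers below are made up, but the effect is generic:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(0.0, 70.0, 1.0 / 12.0)
y = 0.1 * t + 20.0 * rng.standard_normal(t.size)    # weak trend buried in noise

def r_squared(t, y, deg=2):
    """Squared correlation coefficient of a polynomial fit."""
    fit = np.polyval(np.polyfit(t, y, deg), t)
    return float(1.0 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2))

w = 240
ma = np.convolve(y, np.ones(w) / w, mode="valid")    # 20-yr moving averages
tm = t[w // 2 : w // 2 + ma.size]                    # their (approximate) center times

print(f"R^2, raw data:          {r_squared(t, y):.2f}")
print(f"R^2, 20-yr moving avgs: {r_squared(tm, ma):.2f}")
```

The raw-data R² is tiny because the noise dominates; the moving-average R² is far larger, not because the model got better but because the smoothing hid the noise.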
This also falls prey to the problem of chopping off the beginning and end of the time series. Think of each data point as a “voter” which should get one “vote” like every other data point. When you compute a 20-yr moving average using monthly data, each average is based on 240 months. Each month within that range gets 1/240th of a “vote” for that moving average. Most of the data points get to contribute to 240 of the moving averages, so they get 240 “partial votes” worth 1/240th each, for a total of 1 vote per data point. But the very last data point — which is very important for determining the recent trend and possible acceleration — only participates in voting for the very last moving average value. Essentially, it only gets to contribute to one “partial vote” worth 1/240th, so it only gets 1/240th of the total voting power that a central data point would get.
In fact the first and last 20 years of data get less than a full “vote” in the time evolution of the signal, and the closer to the beginning or end the less vote they get. This downplaying of the earliest and latest values — especially the latest — undermines our ability to determine the most recent behavior and whether or not the time series has recently shown acceleration.
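The “voting” arithmetic can be checked directly: count how many of the moving averages each data point enters, at 1/240th of a vote apiece. (A sketch with a hypothetical 50-year monthly series.)

```python
import numpy as np

n, w = 600, 240             # 50 yr of monthly data, 240-month (20-yr) window

# counts[i] = how many of the n - w + 1 moving averages include data point i
counts = np.convolve(np.ones(n - w + 1), np.ones(w), mode="full")
votes = counts / w           # each appearance is worth 1/w of a "vote"

print(votes[0], votes[n // 2], votes[-1])   # 1/240, 1.0, 1/240
```

The central points each get a full vote; the endpoints get 1/240th of one, and the weight ramps linearly in between across the first and last 20 years.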
When fitting a quadratic curve to estimate the acceleration, excluding the late data can actually change the sign of the result. Consider the Fremantle data from 1940 to the present (one of the data sets used by Watson). Watson estimates the acceleration as twice the coefficient of t² in the model fit. Using the 20-yr moving averages from 1940 to the present, he estimates the acceleration as -0.016 mm/yr/yr, and when I repeated that analysis I got the same result (actually -0.015 mm/yr/yr, but I suspect the difference is due to rounding). The negative value indicates deceleration. But when I use the actual data in the same analysis (with the annual cycle removed), the estimated acceleration is positive, 0.013 mm/yr/yr. By suppressing the influence of the most recent data, an estimate of acceleration has been changed to one of deceleration.
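The fit itself is only a few lines. This sketch just shows the mechanics on an exact synthetic quadratic (I don’t reproduce the Fremantle numbers here, since that would require the actual PSMSL series):

```python
import numpy as np

def acceleration(t, y):
    """Twice the quadratic coefficient of the best-fit y = a + b t + c t^2."""
    tc = t - t.mean()                   # center time for numerical stability
    c, b, a = np.polyfit(tc, y, 2)      # centering changes a and b, but not c
    return 2.0 * c

# Sanity check on an exact quadratic with known acceleration 2 * 0.007 = 0.014
t = np.arange(1940.0, 2010.0, 1.0 / 12.0)
y = 3.0 + 2.0 * (t - 1940.0) + 0.007 * (t - 1940.0) ** 2
print(f"{acceleration(t, y):.4f} mm/yr/yr")   # 0.0140
```

The same function applied to the 20-yr moving averages would, as argued above, simply never see the first and last decades of data.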
Finally, we come to the other very big problem with this analysis: the model itself. Watson models his data as a quadratic function of time,

y = a + bt + ct².

He then uses 2c (the 2nd time derivative of the model) as the estimated acceleration. But this model assumes that the acceleration is constant throughout the observed time span. That’s clearly not so. You can tell just by looking at a smooth (a lowess smooth, so as not to eliminate the starting and ending data), for instance for Fremantle (I’ve superimposed the lowess smooth and the 20-yr moving averages):
This is clearly not well approximated by a quadratic function of time, and just as clearly the signal does not show constant acceleration throughout.
If you want to know how the acceleration might be changing over time, you need to use a model which allows the acceleration to change. You might try, for instance, a quartic model,

y = a + bt + ct² + dt³ + et⁴.

Then you can approximate the acceleration as a function of time, again as the 2nd time derivative of the model:

d²y/dt² = 2c + 6dt + 12et².
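In the same spirit as before, here’s a sketch of the quartic fit and its second derivative, checked against an exact synthetic quartic (the actual Fremantle fit would need the real series):

```python
import numpy as np

def quartic_acceleration(t, y):
    """Fit y = a + b t + c t^2 + d t^3 + e t^4 and return the model's
    2nd time derivative, 2c + 6 d t + 12 e t^2, at each time."""
    tc = t - t.mean()                   # center time for numerical stability
    e, d, c, b, a = np.polyfit(tc, y, 4)
    return 2.0 * c + 6.0 * d * tc + 12.0 * e * tc ** 2

# Sanity check on an exact quartic: the fit should recover y''(t)
t = np.arange(0.0, 70.0, 1.0 / 12.0)
y = 1.0 + 0.1 * t + 0.002 * t**2 - 1e-4 * t**3 + 2e-6 * t**4
accel = quartic_acceleration(t, y)
expected = 0.004 - 6e-4 * t + 2.4e-5 * t**2     # analytic second derivative
print(np.allclose(accel, expected, atol=1e-6))  # True
```

The second derivative is independent of where you center the time axis, which is why the centered fit still recovers the right acceleration curve.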
When I fit this model, I get the following estimate of time-varying acceleration at Fremantle:
Note that the acceleration is strongly positive early, dips negative (deceleration) around 1950, then becomes strongly positive recently. In fact the most recent estimate shows the highest positive acceleration. That answers Watson’s question, “Is There Evidence Yet of Acceleration in Mean Sea Level Rise around Mainland Australia?” Yes.
This is just one model, and a particularly simple one. If I were really interested in investigating the issue, I’d try numerous models, select one based on AIC or BIC or stepwise regression, and use that to estimate the changes in acceleration over time. I’d also put some error bars on the estimates of acceleration. Who knows, I might even be able to publish it in the Journal of Coastal Research.
As a matter of fact, this kind of time-varying acceleration is exactly what would be expected from models of sea level rise based on temperature, such as that proposed by Vermeer & Rahmstorf (2009). That model reproduces observed sea level changes well, including some of the changes in acceleration over time. It also points to very large acceleration, leading to very troublesome sea level rise, during the upcoming century.