There’s another paper about sea level rise in the Journal of Coastal Research by P. J. Watson (2011, Is There Evidence Yet of Acceleration in Mean Sea Level Rise around Mainland Australia?, Journal of Coastal Research, 27, 368–377). According to this powerpoint, Watson is genuinely concerned about sea level rise due to global warming and argues forcefully for addressing the issue. His primary interest seems to be helping those responsible for protecting Australia’s coastline be as well prepared as possible for the impending sea level rise. That’s a noble motive, and I wish him success. But in spite of the best of intentions, I can’t put much stock in Watson’s published results, because it’s clear that he is no data analyst.
Watson looks at tide gauge data from four stations near Australia: Fremantle on the west coast of Oz, Fort Denison in Sydney Harbor on the east coast, Newcastle (which is quite close to Sydney), and Auckland, New Zealand. The data can be obtained from PSMSL (the Permanent Service for Mean Sea Level). Watson obtained more current data to update these records through the end of 2009. I didn’t have to do that for the Australian stations because the present data sets in the PSMSL archive are already current up to the end of 2009. But I don’t have the up-to-date data for Auckland that Watson used — the Auckland data I acquired only go to May of 2000.
There are two very big problems with the analysis, and several not-so-big ones. For example: if we plot one of the data sets, say Fremantle on the West Australian coast, it shows an obvious increase over time as well as a great deal of fluctuation:
Close inspection of the graph (click on it for a larger, clearer view) suggests that there may be an annual cycle. This is easily confirmed by Fourier analysis (I first detrended the data, then computed the Fourier periodogram):
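For readers who want to try the detrend-then-periodogram step themselves, here’s a minimal sketch. It uses synthetic data in place of the actual Fremantle record (the trend, cycle amplitude, and noise level are all made up for illustration):

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)

# Synthetic stand-in for a monthly tide gauge record:
# linear trend + annual cycle + noise (all values hypothetical)
t = np.arange(0.0, 50.0, 1.0 / 12.0)            # 50 years, monthly sampling
msl = 1.5 * t + 40.0 * np.sin(2 * np.pi * t) + 20.0 * rng.standard_normal(t.size)

# First detrend (subtract the best-fit line), then compute the periodogram
trend = np.polyval(np.polyfit(t, msl, 1), t)
freqs, power = periodogram(msl - trend, fs=12)  # 12 samples/yr -> freqs in cycles/yr

peak = freqs[np.argmax(power)]                  # strongest periodicity
print(f"strongest period: {1.0 / peak:.2f} yr")
```

With an annual cycle this strong, the periodogram’s peak lands squarely at a frequency of 1 cycle per year.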
We can even get a good idea of the shape and size of the annual cycle by “folding” the data with a period of 1 year, i.e., plotting the (detrended) data as a function of phase (which is equivalent to a plot as a function of month) rather than time. I’ve plotted two whole cycles (just repeating one after the other) so that the shape of the annual cycle is clear:
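The “folding” step is also simple to sketch in code (again on synthetic data): reduce each time to its phase within the year, average month by month, and repeat the result twice so a plot shows two full cycles:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0.0, 50.0, 1.0 / 12.0)            # 50 yr of monthly times
# Hypothetical detrended record: annual cycle plus noise
detrended = 40.0 * np.sin(2 * np.pi * t) + 20.0 * rng.standard_normal(t.size)

# Fold with a period of 1 year: the phase is the fractional part of the time;
# with monthly sampling that maps each point to one of 12 phase bins
month = np.round((t % 1.0) * 12).astype(int) % 12

# Average within each bin to get the cycle's shape
cycle = np.array([detrended[month == m].mean() for m in range(12)])

# Repeat the cycle twice so a plot shows two whole periods
two_cycles = np.tile(cycle, 2)
```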
No doubt about it, there’s an annual cycle. We can improve trend analysis by removing it, which we can do because it’s not noise, it’s signal, and which we should do because it’s a part of the signal that doesn’t contribute to the trend. I chose to do so by computing anomaly: the difference between each month’s value and the average for that same month during a baseline period, which I chose as 1960.0 to 1990.0. Here are the anomalies:
Note that the noise is greatly reduced; in fact the range of the y-axis is a good bit less than it was before, because the variation shrinks once the annual cycle is removed. Good riddance!
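The anomaly computation itself is a one-liner per calendar month: subtract that month’s mean over the baseline period. A sketch on synthetic data (the 1960.0–1990.0 baseline follows the text; the series itself is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1897.0, 2010.0, 1.0 / 12.0)       # hypothetical Fremantle-like span
msl = (1.5 * (t - t[0]) + 40.0 * np.sin(2 * np.pi * t)
       + 20.0 * rng.standard_normal(t.size))

month = np.arange(t.size) % 12                  # calendar month of each sample
baseline = (t >= 1960.0) & (t < 1990.0)

# Each calendar month's mean over the 1960.0-1990.0 baseline period
clim = np.array([msl[baseline & (month == m)].mean() for m in range(12)])
anom = msl - clim[month]                        # monthly anomalies

# Removing the cycle shrinks the scatter about the trend line
raw_sd = np.std(msl - np.polyval(np.polyfit(t, msl, 1), t))
anom_sd = np.std(anom - np.polyval(np.polyfit(t, anom, 1), t))
print(f"detrended scatter: {raw_sd:.1f} -> {anom_sd:.1f}")
```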
Unfortunately Watson doesn’t remove the annual cycle at all. This leaves in place a source of variation which is due to non-trend signal — not due to trend, and not due to noise. If it had been eliminated it would have improved whatever trend analysis follows.
What he does do is remove much of the noise. That’s a great idea if you want to get insight from visual inspection of a graph. Under such circumstances I recommend a good smoothing method (I’m fond of the Lowess smooth, others prefer other kinds). Watson chose a moving average filter. That’s OK, except for two things. First, it’s not a very “smooth” smooth — it’s kinda choppy — but that’s just me being fussy. More important, a moving average filter of width T will cut off a time span equal to half of T from both the beginning and the end of the time series — you lose what just might be the most interesting parts, the beginning and the end. Since he uses a 20-year moving average filter, he loses the first and last 10 years of the time span.
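The edge loss is easy to verify numerically. A sketch with a hypothetical 120-year monthly series and a centered 240-month (20-yr) window:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(120 * 12)               # 120 yr of hypothetical monthly values

w = 240                                          # 20-yr window, in months
smooth = np.convolve(x, np.ones(w) / w, mode="valid")  # only full windows survive

print(x.size, "->", smooth.size)                 # 1440 -> 1201
# 239 months are gone: roughly 10 yr lost from each end of the record
```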
Well and good; he computed 20-yr moving averages of all four data sets, then aligned them by shifting them to set the zero point equal to the value in January 1940. I think this is a very poor choice of “baseline.” For one thing, it’s only one month. Mainly, we can’t be sure that the data quality was as good 70 years ago as it is more recently, and I think it would be a better idea to align the series so that they’re coincident during a period of better-quality data. Nonetheless, he gets this for the 20-yr moving averages of the four data sets (“Fort Denison” is the location of the Sydney data):
Note that the values from Newcastle are consistently much higher than those from other locations. This argues that there is indeed a data problem with Newcastle, and that the January 1940 choice for baseline is indeed a poor selection. When I computed 20-yr moving averages using the 1960.0 to 1990.0 baseline, it looks like this:
Now it’s clear that the 20-yr moving averages of Newcastle data are divergent from the other data sets prior to 1960. They shouldn’t be: Newcastle is so close to Sydney that it’s implausible to suggest it’s a regional effect. It’s probably a problem with data quality. Therefore the 20-yr moving averages from Newcastle shouldn’t be trusted prior to 1960, and shouldn’t be included in further analysis. Yet Watson does include these earlier, suspect data in his subsequent analysis.
It’s also clear that the 20-yr moving averages for all stations diverge prior to about 1930. Therefore any analysis which includes values prior to that time is also suspect, and the 20-yr moving averages shouldn’t be trusted prior to about 1930. Yet Watson includes them too, analyzing the 20-yr moving averages for Fremantle and Auckland from 1920 through 2000.
Those are all nontrivial problems. But now we come to one of the very big problems: instead of just using the smoothed (20-yr moving average) values to gain insight from the graph, he actually treats them as data and subjects them to analysis. This is a very bad idea. In fact, there’s a good book about analyzing astronomical time series, written to be accessible to the non-expert, which warns strongly against exactly this:
One might wonder, since we’ve already recommended analyzing averages rather than the raw data, why not analyze moving averages? Surely they reduce the impact of the noise, and won’t that improve results? The answer is an emphatic NO. When we compute averages with bins that don’t overlap, the noise for each computed average is independent of the noise for all the other averages, so we can apply all our tests and analyses which rely on assumptions like the noise being white noise. But when we compute moving averages, the noise in different averages is not independent because of the extreme overlap between the data used for different averages. In fact consecutive moving averages based on 50 data points each are based on 49 of the same data values! This strong dependence between nearby moving-average values leads to extremely strong autocorrelation in the noise of the moving averages, which invalidates the statistical treatment of moving averages as signal-plus-white-noise. Moving averages are a robust and simple way to smooth noisy data, but should never be used in analysis as a substitute for the original data.
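The book’s warning is easy to demonstrate numerically: take pure white noise, compute a 50-point moving average, and compare lag-1 autocorrelations. (A sketch; the estimator below is just the simple sample autocorrelation.)

```python
import numpy as np

rng = np.random.default_rng(4)
noise = rng.standard_normal(2000)               # pure white noise

w = 50
ma = np.convolve(noise, np.ones(w) / w, mode="valid")  # 50-point moving average

def lag1_autocorr(y):
    """Simple sample lag-1 autocorrelation."""
    y = y - y.mean()
    return float((y[:-1] * y[1:]).sum() / (y * y).sum())

print(f"white noise:     {lag1_autocorr(noise):+.2f}")  # near 0
print(f"moving averages: {lag1_autocorr(ma):+.2f}")     # near 49/50 = 0.98
```

Consecutive 50-point averages share 49 of their 50 values, so the smoothed series is almost perfectly autocorrelated even though the underlying noise is white.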
Analyzing moving averages totally invalidates the statistical evaluation of the analysis. In particular, it tends to inflate (greatly!) the apparent quality of fitting some model to the data. That’s why, when Watson gets around to fitting models to these data (as quadratic functions of time), he is able to report such large values of the squared correlation coefficient.
He also seems to be operating under the misconception that extremely high values of the squared correlation coefficient validate the model statistically. Not so. It does tell you how much of the variance is explained by the model, but it doesn’t reveal whether or not the fit is meaningful (i.e., not entirely random). And besides that, his impressively large squared correlation coefficients are tremendously inflated by virtue of his having analyzed 20-yr moving averages rather than the data themselves.
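A quick simulation shows the inflation. Fit the same quadratic to raw noisy data and to its 20-yr moving averages; the numbers below are made up, but the effect is generic:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(0.0, 70.0, 1.0 / 12.0)
y = 0.1 * t + 20.0 * rng.standard_normal(t.size)    # weak trend buried in noise

def r_squared(t, y, deg=2):
    """Squared correlation coefficient of a polynomial fit."""
    fit = np.polyval(np.polyfit(t, y, deg), t)
    return float(1.0 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2))

w = 240
ma = np.convolve(y, np.ones(w) / w, mode="valid")    # 20-yr moving averages
tm = t[w // 2 : w // 2 + ma.size]                    # their (approximate) center times

print(f"R^2, raw data:          {r_squared(t, y):.2f}")
print(f"R^2, 20-yr moving avgs: {r_squared(tm, ma):.2f}")
```

The raw-data R² is tiny because the noise dominates; the moving-average R² is far larger, not because the model got better but because the smoothing hid the noise.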
This also falls prey to the problem of chopping off the beginning and end of the time series. Think of each data point as a “voter” which should get one “vote” like every other data point. When you compute a 20-yr moving average using monthly data, each average is based on 240 months. Each month within that range gets 1/240th of a “vote” for that moving average. Most of the data points get to contribute to 240 of the moving averages, so they get 240 “partial votes” worth 1/240th each, for a total of 1 vote per data point. But the very last data point — which is very important for determining the recent trend and possible acceleration — only participates in voting for the very last moving average value. Essentially, it only gets to contribute to one “partial vote” worth 1/240th, so it only gets 1/240th of the total voting power that a central data point would get.
In fact the first and last 20 years of data get less than a full “vote” in the time evolution of the signal, and the closer to the beginning or end the less vote they get. This downplaying of the earliest and latest values — especially the latest — undermines our ability to determine the most recent behavior and whether or not the time series has recently shown acceleration.
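The “voting” arithmetic can be checked directly: count how many of the moving averages each data point enters, at 1/240th of a vote apiece. (A sketch with a hypothetical 50-year monthly series.)

```python
import numpy as np

n, w = 600, 240             # 50 yr of monthly data, 240-month (20-yr) window

# counts[i] = how many of the n - w + 1 moving averages include data point i
counts = np.convolve(np.ones(n - w + 1), np.ones(w), mode="full")
votes = counts / w           # each appearance is worth 1/w of a "vote"

print(votes[0], votes[n // 2], votes[-1])   # 1/240, 1.0, 1/240
```

The central points each get a full vote; the endpoints get 1/240th of one, and the weight ramps linearly in between across the first and last 20 years.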
When fitting a quadratic curve to estimate the acceleration, excluding the late data can actually change the sign of the result. Consider the Fremantle data from 1940 to the present (one of the data sets used by Watson). Watson estimates the acceleration as twice the coefficient of t² in the model fit. Using the 20-yr moving averages from 1940 to the present, he estimates the acceleration as -0.016 mm/yr/yr, and when I repeated that analysis I got the same result (actually -0.015 mm/yr/yr, but I suspect the difference is due to rounding). The negative value indicates deceleration. But when I use the actual data in the same analysis (with the annual cycle removed), the estimated acceleration is positive, 0.013 mm/yr/yr. By suppressing the influence of the most recent data, an estimate of acceleration has been changed to one of deceleration.
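The fit itself is only a few lines. This sketch just shows the mechanics on an exact synthetic quadratic (I don’t reproduce the Fremantle numbers here, since that would require the actual PSMSL series):

```python
import numpy as np

def acceleration(t, y):
    """Twice the quadratic coefficient of the best-fit y = a + b t + c t^2."""
    tc = t - t.mean()                   # center time for numerical stability
    c, b, a = np.polyfit(tc, y, 2)      # centering changes a and b, but not c
    return 2.0 * c

# Sanity check on an exact quadratic with known acceleration 2 * 0.007 = 0.014
t = np.arange(1940.0, 2010.0, 1.0 / 12.0)
y = 3.0 + 2.0 * (t - 1940.0) + 0.007 * (t - 1940.0) ** 2
print(f"{acceleration(t, y):.4f} mm/yr/yr")   # 0.0140
```

The same function applied to the 20-yr moving averages would, as argued above, simply never see the first and last decades of data.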
Finally, we come to the other very big problem with this analysis: the model itself. Watson models his data as a quadratic function of time,

y = a + bt + ct².

He then uses 2c (the 2nd time derivative of the model) as the estimated acceleration. But this model assumes that the acceleration is constant throughout the observed time span. That’s clearly not so. You can tell just by looking at a smooth (a lowess smooth, so as not to eliminate the starting and ending data), for instance for Fremantle (I’ve superimposed the lowess smooth and the 20-yr moving averages):
This is clearly not well approximated by a quadratic function of time, and just as clearly the signal does not show constant acceleration throughout.
If you want to know how the acceleration might be changing over time, you need to use a model which allows the acceleration to change. You might try, for instance, a quartic model,

y = a + bt + ct² + dt³ + et⁴.

Then you can approximate the acceleration as a function of time, again as the 2nd time derivative of the model:

d²y/dt² = 2c + 6dt + 12et².
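In the same spirit as before, here’s a sketch of the quartic fit and its second derivative, checked against an exact synthetic quartic (the actual Fremantle fit would need the real series):

```python
import numpy as np

def quartic_acceleration(t, y):
    """Fit y = a + b t + c t^2 + d t^3 + e t^4 and return the model's
    2nd time derivative, 2c + 6 d t + 12 e t^2, at each time."""
    tc = t - t.mean()                   # center time for numerical stability
    e, d, c, b, a = np.polyfit(tc, y, 4)
    return 2.0 * c + 6.0 * d * tc + 12.0 * e * tc ** 2

# Sanity check on an exact quartic: the fit should recover y''(t)
t = np.arange(0.0, 70.0, 1.0 / 12.0)
y = 1.0 + 0.1 * t + 0.002 * t**2 - 1e-4 * t**3 + 2e-6 * t**4
accel = quartic_acceleration(t, y)
expected = 0.004 - 6e-4 * t + 2.4e-5 * t**2     # analytic second derivative
print(np.allclose(accel, expected, atol=1e-6))  # True
```

The second derivative is independent of where you center the time axis, which is why the centered fit still recovers the right acceleration curve.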
When I fit this model, I get the following estimate of time-varying acceleration at Fremantle:
Note that the acceleration is strongly positive early, dips negative (deceleration) around 1950, then becomes strongly positive recently. In fact the most recent estimate shows the highest positive acceleration. That answers Watson’s question, “Is There Evidence Yet of Acceleration in Mean Sea Level Rise around Mainland Australia?” Yes.
This is just one model, and a particularly simple one. If I were really interested in investigating the issue, I’d try numerous models, select one based on AIC or BIC or stepwise regression, and use that to estimate the changes in acceleration over time. I’d also put some error bars on the estimates of acceleration. Who knows, I might even be able to publish it in the Journal of Coastal Research.
As a matter of fact, this kind of time-varying acceleration is exactly what would be expected from models of sea level rise based on temperature, such as that proposed by Vermeer & Rahmstorf (2009). That model reproduces observed sea level changes well, including some of the changes in acceleration over time. It also points to very large acceleration, leading to very troublesome sea level rise, during the upcoming century.