Fooled Yet Again

In the previous post I discussed the refutation of LeMouel et al. (and a companion paper) by Legras et al. Now it seems that LeMouel et al. have responded to their critics (which is actually part of the discussion, not a published peer-reviewed response).


I won’t discuss the entire rebuttal except to say that it’s embarrassingly bad. They actually have the audacity to suggest that this:

is not clear evidence of an inhomogeneity (one which has a profound impact on their analysis since it coincides with one of the “high” solar cycles). Truly embarrassing.

What I will consider is their claim that they don’t actually need to account for autocorrelation in estimating the variance of their 21-day moving averages. They say this:


In addition, as can be seen in their supplementary material, when trying to account for dependencies in a 21-day interval (which we select), LMBY use 90- and 150-day intervals that naturally are affected by the seasonal variability of temperatures (plot and output on page 21, SM to LMBY). Figure 1 actually shows that autocorrelations of the daily temperatures in 21-day intervals fall below 0.2 in less than 3 days, while autocorrelation for the daily range of temperatures dT (which LMBY fail to consider) falls below 0.2 on the second day…

The sentence “the number of effective degree of freedom is about 9 times smaller than estimated by LKMC and consequently the estimated variance of the ensemble average is about three times larger” is therefore false.

They support their claim with this graph for Praha (and a couple of others for Bologna and Uccle):

First things first: even if the autocorrelation estimates of LeMouel et al. were correct, they would still need to be taken into account when estimating the variance of the 21-day moving averages, and they’re big enough to have a significant effect. But I strongly suspect that the estimates of LeMouel et al. are not correct.

At first I was puzzled how LeMouel et al. managed to get autocorrelation estimates so much lower than I got. This is especially puzzling because I estimated the autocorrelation of anomalies so as to remove the seasonal cycle — their “affected by the seasonal variability of temperatures” claim doesn’t apply. Then I noticed something interesting. The estimates from LeMouel et al. drop below zero rapidly — after day 5 for temperature and after day 3 for dT.

Does it seem implausible, on purely physical grounds, that there would be negative autocorrelation between temperature (or dT) and its value just a few days later? It does, and it is.

Then how did it happen? Here’s my theory: LeMouel et al. estimated the autocorrelation function (ACF) of actual 21-day intervals. They probably did so for many 21-day intervals and averaged the results, but their estimates are still based on 21-day intervals. But the usual estimate of the autocorrelation function (the “Yule-Walker” estimate) is a biased estimate, biased low, and for short spans of data the bias can be profound.

In fact the bias of the Yule-Walker estimate (and of others like the least-squares estimate) leads to exactly the characteristic pattern observed in the graphs from LeMouel et al., that the estimated ACF rapidly drops below zero even when the true ACF remains positive (as it does, e.g., for an AR(1) process).
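Here is a short sketch of why the estimate is forced below zero. It assumes the estimator is the standard sample ACF, with the sample mean of the same short span subtracted before the lagged products are formed (that is my assumption about the form of the estimator, not something taken from the LeMouel et al. reply). The symbols n, x_t, \bar{x} and r_k are introduced here for illustration: n is the number of points in the span, x_t the data, \bar{x} their sample mean, and r_k the estimated autocorrelation at lag k.

```latex
r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2},
\qquad k = 1, \dots, n-1

0 = \Bigl[ \sum_{t=1}^{n} (x_t - \bar{x}) \Bigr]^2
  = \sum_{t=1}^{n} (x_t - \bar{x})^2
  + 2 \sum_{k=1}^{n-1} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})

\Longrightarrow \quad \sum_{k=1}^{n-1} r_k = -\frac{1}{2}
```

Under that assumption, with n = 21 the estimates at lags 1 through 20 must always sum to -1/2, whatever the true ACF is. Any positive estimates at short lags have to be offset by negative estimates at longer lags.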

We’ve seen this problem before. When Schwartz estimated climate sensitivity using a simple 1-box energy balance model, he estimated autocorrelations using time series of only 125 data points. One of the points made in response was that such a small data set carries a large bias in the estimated autocorrelation, especially when the autocorrelation is sizeable. It was a big problem with data sets of only 125 data points, and it’s a much bigger problem with only 21.

Allow me to illustrate. I generated a 21-day time series of random noise from an AR(1) process, with (true) lag-1 autocorrelation 0.8 (which is big!), and estimated the autocorrelation. In fact I did so 1000 times and averaged the results. Here it is:

Note that the true ACF remains positive at all lags (as is always the case for an AR(1) process), but the estimate drops to zero by lag 4 — even though the true ACF at lag 4 is still sizeable (= 0.4096).
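For anyone who wants to try this at home, here is a minimal sketch of that kind of experiment. It is not the code used for the post: the function names, the burn-in length, and the random seed are all my own choices for illustration, and it uses the plain sample ACF (sample mean removed, normalized by the lag-0 sum), which I am assuming is close enough to the Yule-Walker estimate for this purpose.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Standard (biased) sample ACF: remove the sample mean, divide by the lag-0 sum."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[:len(d) - k], d[k:]) / denom
                     for k in range(max_lag + 1)])

def simulate_ar1(n, phi, rng, burn_in=200):
    """Generate n points of an AR(1) series x_t = phi * x_{t-1} + e_t.

    The burn-in (a hypothetical choice) lets the series forget its start value.
    """
    e = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = e[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]

def mean_estimated_acf(n=21, phi=0.8, n_trials=1000, max_lag=20, seed=0):
    """Average the sample ACF over many independent short AR(1) series."""
    rng = np.random.default_rng(seed)
    acfs = np.array([sample_acf(simulate_ar1(n, phi, rng), max_lag)
                     for _ in range(n_trials)])
    return acfs.mean(axis=0)

if __name__ == "__main__":
    est = mean_estimated_acf()
    true = 0.8 ** np.arange(21)
    # Compare the averaged estimate with the true AR(1) ACF at the first few lags.
    for lag in range(9):
        print(f"lag {lag}: true = {true[lag]:.4f}, mean estimate = {est[lag]:.4f}")
```

Changing `n` from 21 to something like 2000 is a quick way to see how the bias shrinks as the span gets longer.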

The actual impact of autocorrelation on the variance of a 21-day average is given by

\sigma_{ave}^2 = \frac{\sigma^2}{21} \Bigl [ 1 + 2 \sum_{j=1}^{20} \rho_j (1-j/21) \Bigr ],

where \sigma_{ave}^2 is the variance of the average, \sigma^2 is the variance of the data, and \rho_j is the (true) autocorrelation at lag j. The factor in square brackets is the inflation of the variance relative to the value \sigma^2/21 that would apply to uncorrelated data. I estimated the autocorrelation of the daily high temperature data from Praha, and used this formula to compute the inflation factor for variance and standard deviation (which is the square root of the factor for variance). Result? The standard error in the 21-day averages is inflated by a factor of 2.44. This isn’t as big as the factor of 3 estimated by Legras et al., but it’s a lot bigger than no inflation at all.
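The bracketed factor in the formula above is easy to compute once you have an ACF. What follows is a minimal sketch, not the calculation behind the 2.44 figure: the function name `variance_inflation` is my own, and since the Praha data aren't reproduced here the example plugs in a hypothetical AR(1) ACF with lag-1 autocorrelation 0.8 as a stand-in.

```python
import numpy as np

def variance_inflation(rho, n=21):
    """Bracketed factor 1 + 2 * sum_{j=1}^{n-1} rho_j * (1 - j/n).

    rho[j-1] must hold the autocorrelation at lag j, for j = 1 .. n-1.
    """
    rho = np.asarray(rho, dtype=float)
    if rho.shape[0] < n - 1:
        raise ValueError("need autocorrelations for lags 1 through n-1")
    j = np.arange(1, n)
    return 1.0 + 2.0 * np.sum(rho[:n - 1] * (1.0 - j / n))

if __name__ == "__main__":
    lags = np.arange(1, 21)
    # Hypothetical stand-in ACF (AR(1), lag-1 value 0.8); NOT the Praha estimate.
    rho_ar1 = 0.8 ** lags
    factor = variance_inflation(rho_ar1)
    print("variance inflation factor:", factor)
    print("standard error inflation factor:", np.sqrt(factor))
```

Two sanity checks: with all autocorrelations zero the factor is 1 (no inflation), and with all of them equal to 1 it is 21, meaning the average of 21 perfectly correlated values is no better determined than a single value.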

On another subject altogether: My wife reminds me that I shouldn’t be an ingrate. So I’d like to express my sincere gratitude to those who’ve made a donation to this blog using the “donate” button at the top right. It really has been a big help. Even the small donations make a real difference, and the not-so-small ones (you know who you are!) have been immensely helpful. Thanks.

12 responses to “Fooled Yet Again”

  1. And what peer-reviewed science journal will this be published in?

    • I agree, that sounds like a good idea!

      Then again, their reply is so weak that it barely deserves a response. It would probably be necessary to add something completely new to the analysis in order to merit publication.

      • Note that the “reply” is merely a “reply” on a discussion paper, not a peer reviewed paper.

        But I think Tamino could actually still(?) reply to Le Mouël.

      • I concur. When a reply begins with wide-handed gestures about epistemology without any substance for several pages, you know that you can skip the reply (unless you want to understand how wrong they were).
        I recognize totally the Le Mouël/Courtillot style. Good luck climatologists with them, we geophysicists (and especially French ones) are quite fed up with these arrogant guys.
        Oh, by the way, don’t be surprised if in a few years you see them abruptly change sides. They do that from time to time.

  2. Dear Tamino

    Would you recommend using the Burg estimator in this situation?

    Doug

    [Response: I’m not that well-versed in it, but I believe the bias of the Burg estimator is about the same as that of the least-squares estimator. I’ve worked on some estimates which reduce the bias in ACF estimates for small samples, but they suffer from the problem that in some cases they can give ill-defined results (i.e., impossible values). If you only have 21 data points, I recommend not putting too much faith in any autocorrelation estimator. And in this situation, I recommend not basing autocorrelation estimates on 21-point samples at all — there’s plenty enough data to get much more accurate results using any of the well-known estimators.]

    [Response 2: I ran some tests, the same as in the post but using the Burg estimator, and it gave very similar results. It was only marginally better than the Yule-Walker estimate, and didn’t really solve the problem of extreme bias with such a small sample.]

  3. Hi Tamino,

    “now it seems” seems a little misleading (and indeed your first commenter seems misled).

    The LeMouel “response” you discuss is actually their own review on the original Legras manuscript, not a response to the final paper. As such, it was “published” as part of the open review process (it is not a peer-reviewed publication though) and will have been appropriately taken into account by the editor when making their decision. “Appropriately” in this case meaning “wholly ignored”, I trust, other than perhaps to encourage some clarification :-)

    Legras et al. themselves responded to the autocorrelation stuff. They don’t seem to have done the sleuthing to reverse-engineer what precisely LeMouel did wrong (I’m sure you are correct BTW), but then they didn’t have the advantage you did of recently dealing with the Schwartz stuff! It seems that the bias in this estimator, although established decades ago in the statistics literature, is not well known among the general scientific population.

    You can access the whole gory mess most conveniently from

    http://www.clim-past-discuss.net/6/767/2010/cpd-6-767-2010-discussion.html

    This open review process certainly has its benefits, though it can lead to confusion as to what the status of all the documents is…in principle none are peer reviewed other than the final paper (if it exists, which it does in this case).

    [Response: Thanks for the clarification. I noticed that Eli Rabett has posted about some of the advantages of open review — interesting reading.]

  4. Gavin's Pussycat

    Hmmm, do I see correctly that the areas between the red curve and the horizontal axis to the left and to the right of the zero point are equal?

    [Response: Yes (at least approximately). It’s a property of the Yule-Walker estimator.]

  5. Tamino,

    Many thanks for the wonderful clarifications (from this post and previous ones).

    I have a general question. Are there any good technical books or online resources that you would recommend that cover time-series analysis and data analysis in a reliable fashion? Many of the books I’ve browsed through seem to be little more than cook books, and provide little understanding of why something works and something else doesn’t. For a long-in-the-tooth codger such as myself who had basic statistics back in the 80s in the UCL Physics and Astronomy Department, I’ve come to realize that the background I have is woefully inadequate. I’ve found your posts to be wonderful, and I’ve worked through many of the analyses afterwards, but often find myself hungry for more.

    All the best,

    Adrian

    [Response: I think you’ve put your finger on a “hole” in the textbook literature. There are some decent ones but they’re a little too much “text” and not as much *insight* as I’d like. If I ever finish mine …

    In the meantime, if you’re really keen to acquire more knowledge then just get a decent text and slog through it, maybe Shumway & Stoffer “Time Series Analysis and its Applications.” And, you can get a lot of insight from *experience* — so get some data and play with it.]

    • Tamino,

      Many thanks for the recommendation! As for finishing one’s own book – I know the feeling!!!

      Adrian

      [Response: You’re welcome.]

  6. There have been suggestions to publish some of this analysis, but I don’t think it’s necessary. The properties of the Yule-Walker estimator are already well-established, and I don’t think LeMouel et al. needs any further peer-reviewed refutation.

    That doesn’t mean blogs can’t turn into published science. Expect this to be submitted before long.

  7. Tamino, Paypal doesn’t like you – something about your E-mail address. Credit cards work.

    [Response: Hmmm… I’ll look into it.]

  8. Tamino,

    From above:
    “On another subject altogether: My wife reminds me that I shouldn’t be an ingrate. So I’d like to express my sincere gratitude to those who’ve made a donation to this blog using the “donate” button at the top right. It really has been a big help. Even the small donations make a real difference, and the not-so-small ones (you know who you are!) have been immensely helpful. Thanks.”

    You have done a great job here with the statistical analysis of climate data, and the posts have proved very useful to me in a number of ways, in both my professional and personal interests. Several series of posts have greatly improved my understanding of time series analysis (and statistical analysis in general). Examples include the Alphabet Soup series (together with the post on autocorrelation), the series on practical PCA, and the recent series on Comparing Data Sets. These groups of posts, just to name a few, constitute excellent tutorials on methods of statistical data analysis. A great benefit is that the instruction is naturally provided within the practical framework of climate data analysis.

    An equally important effort is the countering of denialist arguments that is conducted here. It’s instructive because not only are the denialist errors revealed but their thought processes and motivations as well. I used to think it was wasted effort exposing the arguments of Watts and the like, but given the impending attacks that will be coming from the right wing in Congress, like Issa, Barton et al., I see there is importance in these efforts. The sooner they crumble the better.

    It is important work that you do here, Tamino, and there is not much out there on the web equal to it, at least that I have seen. So yes, based on the benefits I have received from your work, it is worthy of a good contribution of support.

    Best regards,

    PJKar