Nothin’ but Noise

Pat Michaels claims (also here) that the journal Nature has lost its credibility. That’s an extraordinary claim, considering that Nature is one of the most prestigious peer-reviewed science journals in the world. There are those who believe Pat Michaels is the one lacking any credibility.

Michaels’ problem with Nature is that it publishes scientific research on the subject of global warming which he doesn’t like. His latest beef is with the publication of Booth et al., Aerosols implicated as a prime driver of twentieth-century North Atlantic climate variability. It casts (further) doubt on one of the favorite claims of fake skeptics, that entirely natural variations in the north Atlantic ocean, such as the AMO (Atlantic Multidecadal Oscillation), are responsible for some or perhaps even most of the global warming over the last century.

To counter Booth et al., Michaels touts Chylek et al., Greenland ice core evidence for spatial and temporal variability of the Atlantic Multidecadal Oscillation. Michaels compares them thus:


And Chylek and colleagues had this to say about the mechanisms involved:


The observed intermittency of these modes over the last 4000 years supports the view that these are internal ocean-atmosphere modes, with little or no external forcing.

Better read that again. “…with little or no external forcing.”

Chylek’s conclusion is vastly different from the one reached by Booth et al., which Nature, in an editorial, touted as [emphasis added]:


[B]ecause the AMO has been implicated in global processes, such as the frequency of Atlantic hurricanes and drought in the Sahel region of Africa in the 1980s, the findings greatly extend the possible reach of human activity on global climate. Moreover, if correct, the study effectively does away with the AMO as it is currently posited, in that the multidecadal oscillation is neither truly oscillatory nor multidecadal.

Funny how the ice core records analyzed by Chylek (as opposed to the largely climate-model exercise of Booth et al.) show the AMO to be both oscillatory and multidecadal — and to be exhibiting such characteristics long before any possible human influence.

Clearly Michaels is convinced that Chylek et al. is right and Booth et al. is wrong about north Atlantic climate variability, and that the AMO is a real phenomenon.

Unfortunately for Chylek et al., their claims don’t hold water. They have committed one of the most common mistakes in time series analysis, one which convinces them of the existence of oscillatory behavior when no such claim is justified by the data.

They studied data for d18O from ice cores in Greenland and northernmost Canada, looked for periodic behavior, and believed they had found it. One of the prime (and most relevant) examples is data from the Dye3 ice core:

They split the data into four segments and Fourier-analyzed each to look for oscillatory behavior. The most recent is from 930 AD to 1872 AD, which gives them this spectrum:

It’s the peak labelled “1” on which they base their claim that “In the southern region (Dye 3 site) the dominant multidecadal periodicity is again ~20 years.” That peak certainly does rise above the line they’ve labelled “95%,” which is meant to indicate 95% statistical confidence. But the peak is not, in fact, significant.

What they’ve plotted is actually an averaged spectrum. They first detrended the Dye3 data, then computed the FFT (fast Fourier transform), then averaged the results over 9 consecutive frequencies. When I do the same, I get a very similar result:

The results are extremely similar but not exactly so. It’s hard to be sure why, since the paper offers little discussion of the exact details of their analysis. For instance, many programs (R, for instance), when asked to compute a spectrum using the FFT, will automatically apply a taper at the edges of the data in order to reduce spectral leakage, and some will pad the series with zeros to adjust the number of data points so that the FFT will be especially fast. I didn’t do any of those things. It also looks like they may have simply smoothed the plot of the spectrum a little (I hope they didn’t oversample!). But the differences in our results are minor — we’re certainly in the same ballpark.
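For anyone who wants to try this at home, the procedure is simple enough to sketch in a few lines. This is a rough Python version of the steps just described (detrend, FFT, average over 9 consecutive frequencies); it’s my own sketch, not the exact code behind the plots, and the function name and normalization are arbitrary.

```python
import numpy as np

def band_averaged_spectrum(x, nband=9):
    """Detrend a regularly sampled series, compute its raw
    periodogram, and average the spectral estimates over blocks
    of `nband` consecutive frequencies.  No tapering, no
    zero-padding -- just the bare-bones procedure."""
    n = len(x)
    t = np.arange(n)
    # remove a linear trend by least squares
    slope, intercept = np.polyfit(t, x, 1)
    resid = x - (slope * t + intercept)
    # raw periodogram at the positive Fourier frequencies
    fft = np.fft.rfft(resid)
    power = 2.0 * np.abs(fft[1:]) ** 2 / n
    freq = np.fft.rfftfreq(n)[1:]          # cycles per time step
    # average over non-overlapping bands of `nband` frequencies
    nseg = len(power) // nband
    avg_power = power[: nseg * nband].reshape(nseg, nband).mean(axis=1)
    avg_freq = freq[: nseg * nband].reshape(nseg, nband).mean(axis=1)
    return avg_freq, avg_power
```

With annual data, a frequency of 0.05 cycles/year corresponds to the ~20-year period at issue.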

For an ordinary Fourier spectrum, each power level is often treated as proportional to a chi-square statistic with two degrees of freedom. But when you average over 9 consecutive frequencies, it’s proportional to a chi-square statistic with 18 degrees of freedom. If the noise were white noise, then the “critical values” (the level of the lines labelled “95%” etc.) would be the same for all frequencies (all periods). But the levels of their lines labelled “95%” etc. depend on period. That’s because they’ve corrected for the fact that the noise isn’t white, the data show autocorrelation:

Again it’s unclear exactly how they’ve done this, and what autocorrelation parameters they chose. I applied an AR(1) model estimating the autocorrelation from the sample ACF, which gives me this (dashed red line) for the 95% confidence limit:

Once again our results aren’t exactly the same, but we’re still quite close, and quite clearly in the same ballpark. In fact I assign slightly higher significance to the main peak than Chylek et al. do.
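Here’s a sketch of how such a red-noise critical level is typically computed. Since the paper doesn’t spell out its method, this assumes the standard recipe: a chi-square quantile with 2 × 9 = 18 degrees of freedom, scaled by the theoretical AR(1) spectral density at each frequency. The function name and defaults are my own.

```python
import numpy as np
from scipy import stats

def ar1_critical_level(freq, phi, sigma2=1.0, nband=9, conf=0.95):
    """Critical level for a band-averaged spectrum tested against
    an AR(1) ("red noise") null hypothesis.  Each averaged
    estimate is treated as proportional to a chi-square variable
    with 2*nband degrees of freedom, scaled by the theoretical
    AR(1) spectral density at that frequency."""
    # AR(1) spectral density, frequency in cycles per time step
    dens = sigma2 / (1.0 - 2.0 * phi * np.cos(2.0 * np.pi * freq) + phi ** 2)
    dof = 2 * nband
    return dens * stats.chi2.ppf(conf, dof) / dof
```

For phi = 0 this reduces to a flat white-noise line; for phi > 0 the level rises at low frequencies, which is why the “95%” curves in their figure vary with period.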

So why do I say that there’s no evidence of oscillatory behavior, and that Chylek et al. are wrong? Because this significance level is for testing a single period — one and one only. When you test more periods (and that’s what Fourier analysis is all about), you have many more chances to cross that critical value with your test statistic. After all, even if the data are nothing but noise we still expect 5% of all tested periods to yield a test statistic which exceeds the 95% confidence limit.

When I adjust the critical value to account for this, the given peak is no longer significant. Not even close. It just ain’t so.
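To make the adjustment concrete: if you test M (approximately independent) frequencies, the per-frequency confidence needed to hold the family-wide level at 95% is 0.95^(1/M), a Šidák-style correction. Here’s a minimal Python sketch; the count of ~52 averaged frequencies for the Dye3 segment is my own rough tally, not a number from the paper.

```python
from scipy import stats

def adjusted_chi2_level(n_freq, dof=18, family_conf=0.95):
    """Per-frequency chi-square level (normalized by dof) needed so
    that the *largest* of n_freq roughly independent band-averaged
    spectral estimates stays below the line with probability
    family_conf (Sidak-style multiple-testing correction)."""
    per_test = family_conf ** (1.0 / n_freq)
    return stats.chi2.ppf(per_test, dof) / dof

single = adjusted_chi2_level(1)    # a single pre-chosen period
many = adjusted_chi2_level(52)     # ~52 averaged frequencies tested
```

The bar rises substantially once the full set of tested frequencies is accounted for, which is what pushes the “20-year” peak below significance.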

If you’re reluctant to rely on all the complications of applying all this theory (such reluctance is well advised), you could just do some Monte Carlo simulations. I created 200 artificial AR(1) noise series with the same autocorrelation as the Dye3 data and subjected them to the same analysis, in order to estimate the probability density function of the maximum power level if the data are “nothin’ but noise.” The very first simulated series — right out of the box — gave a spectrum which looks eerily similar to that from the Dye3 data:

Taking the 200 simulations as a whole, here’s a histogram of the observed peak values, with the peak value for the Dye3 data indicated by the dashed red line:

It’s abundantly clear, whether you compute the critical level theoretically (allowing for testing multiple frequencies) or estimate it from Monte Carlo simulations, that the observed peak value from the Dye3 data is not significant. Not even close. It just ain’t so.
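The simulation loop itself is nothing exotic. Here’s a Python sketch of the exercise: generate AR(1) series with matched autocorrelation, push each through the same detrend/FFT/band-average pipeline, and keep the largest peak. The parameter values shown are illustrative, not the exact ones used above.

```python
import numpy as np

def simulate_max_peaks(n, phi, nsim=200, nband=9, seed=0):
    """Monte Carlo null distribution of the largest band-averaged
    spectral peak for AR(1) noise of length n with lag-1
    autocorrelation phi."""
    rng = np.random.default_rng(seed)
    peaks = np.empty(nsim)
    t = np.arange(n)
    for i in range(nsim):
        # generate an AR(1) series: x[j] = phi * x[j-1] + e[j]
        e = rng.standard_normal(n)
        x = np.empty(n)
        x[0] = e[0] / np.sqrt(1.0 - phi ** 2)   # stationary start
        for j in range(1, n):
            x[j] = phi * x[j - 1] + e[j]
        # same pipeline as for the real data: detrend, FFT, band-average
        slope, intercept = np.polyfit(t, x, 1)
        p = 2.0 * np.abs(np.fft.rfft(x - (slope * t + intercept))[1:]) ** 2 / n
        nseg = len(p) // nband
        peaks[i] = p[: nseg * nband].reshape(nseg, nband).mean(axis=1).max()
    return peaks
```

Compare the observed peak to, say, the 95th percentile of the simulated peaks; if it doesn’t clear that, it’s noise.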

We’ve actually already addressed exactly this very statistical issue, in relation to exactly this very situation: analysis of ice core data used as a proxy for AMO (by Knudsen et al., Tracking the Atlantic Multidecadal Oscillation through the last 8,000 years). But there are some notable differences between the two papers. Knudsen et al. were keenly aware of these issues, as was evident when I inquired of Dr. Knudsen about some of my concerns. That’s probably why they were cautious in drawing definitive conclusions in spite of the fact that their evidence was, it seems to me, quite a bit stronger than that of Chylek et al. On the other hand, reading the Chylek paper gives the distinct impression that they have no doubt whatever about the validity of their conclusion, in spite of the fact that — as we have seen — the evidence just isn’t there.

Another difference is that Knudsen et al. provided far more detail about their analysis methods. In my opinion, one of the annoying things about Chylek et al. is how the analysis details are glossed over (I’m a bit surprised that this wasn’t a major issue during the peer-review process). They applied wavelet analysis, for instance, but there’s no clue what wavelet method or program they used. And they managed to overinterpret the wavelet analysis — in the extreme — just as they did with the Fourier analysis.

I guess I shouldn’t blame Chylek et al. too much, because as I said at the outset, overinterpretation of Fourier analysis — especially the identification of periods for which there is nowhere near sufficient evidence — is one of the most common problems in the peer-reviewed scientific literature. But I do take exception to the extreme confidence they attach to their conclusions.

And, I have some sage advice for anyone who is doing analysis this complex. We have these things called “computers,” so run some damn Monte Carlo simulations — the theory can get extremely complicated with lots of ways to go astray, and Monte Carlo is a great way to get a basic reality check on your results.

As for Pat Michaels, I definitely blame him for pontificating about papers which, in my opinion, he doesn’t have the “skillz” to evaluate.


29 responses to “Nothin’ but Noise”

  1. Thanks Tamino for your analysis.

    For readers, here is a link to a power point presentation of the paper:

    Greenland ice core evidence for spatial and temporal variability of the Atlantic Multi-decadal Oscillation
    http://curry.eas.gatech.edu/santafe/papers/Chylek.pdf

    A video of the explanation of it:
    http://www.youtube.com/watch?v=SHD7P3qefSY

    And a question for you:

    Are the results of analysing the central Greenland ice cores (Milcent + Crete + GISP2) and the northern Greenland ones (Agassiz Ice Cap + Camp Century) similar to the results for Dye3?

  2. Tamino, looks to me like an excellent comment on the Chylek paper. Can I recommend you submit this (after some modification)? Make it a “learning” paper: “People, be careful when you use Fourier analysis to look at cycles. Case-in-point, the paper by Chylek et al. A critical analysis of their approach shows…etc, etc, etc”. There are some well-known people on the author list, who should be capable of learning. Pat Michaels? Not so much.

  3. It seems Nature bashing is flavour of the month. The UK’s The Telegraph on 14th April published a piece by its spectacularly ignorant (of global warming issues) and opinionated columnist Christopher Booker entitled “In the eyes of Nature, warming can’t be natural”, here (if you can stomach it):
    http://www.telegraph.co.uk/comment/columnists/christopherbooker/9204223/In-the-eyes-of-Nature-warming-cant-be-natural.html

  4. In high-energy physics, the phenomenon that peaks can appear due to chance – mimicking a (statistically significant) signal – is known as the Look Elsewhere Effect. Techniques have been developed to deal with this, and the use of MC is a prime example. This is why the Higgs mass peaks seen by ATLAS and CMS have “local” significances of ~2.5 sigma, but when the LEE is taken into account they drop to ~1.5.

  5. Klaus Flemløse

    Dye 3 / d18O
    My first problem with the Dye 3 d18O measurements is understanding what d18O is actually showing us. Is it the temperature variation 1 m above the ice, the temperature 10 km up in the atmosphere, or the variation of the temperature of the North Atlantic Ocean? If there is a common understanding of what the variation of d18O is telling us, that is fine with me.
    My next problem is to investigate whether there is a periodic behavior of d18O during the years 1989 BC to 1882 AD.
    I have been looking at the same data as Tamino without any adjustments at all. I can’t see any cyclic behavior at all:
    http://www.kflgavia.dk/files/8013/3474/2858/dye_3_fig.2.jpg
    The upper-left graph shows that there is a linear decreasing trend in the data. The upper-right and lower-left graphs indicate that the residuals from the linear regression follow an AR(2) process. The lower-right graph shows the spectrum, with no peaks indicating cyclic movements.
    The best way to describe the variation in d18O is consequently:
    1) There is a linear decreasing trend of -0.00009714 per year
    2) The error can be described by an AR(2) model with ar1 = 0.3065, ar2 = -0.615 and variance = 0.7468
    In other words, there is a decreasing trend with a two-year hangover. From a statistical point of view this is the best model to use. However, this does not mean that the physical mechanism behind the d18O variations is so simple. It could be much more complicated.
    If you split the data into shorter time periods and do a lot of averaging and testing, it is almost certain that some day you will find a periodic behavior. This is what Tamino has shown.
    It is of course possible that my analysis is flat wrong. Please let me know if you think so and come up with an alternative analysis, including the R code. Then we can start the discussion from there.
    If Tamino would release the simulation code for the AR(1) process and calculated spectrum, I would be pleased. I can’t exactly reproduce his result, only something similar.

  6. Klaus Flemløse

    How can I make the graphs pop up on the blog?

    I thought that xxx would make it.

    [Response: I don't want commenters posting graphics. Link to them instead.]

  7. What does a lag-one autocorrelation have to do with something (e.g. a peak period of, say, 20 years) that is so far removed from the Nyquist frequency (and even from the timebase, the lowest frequency of the spectrum)?

    [Response: Why do I suspect that if I explained it to you, you wouldn't believe me?

    So try some Monte Carlo simulations. Generate a large number of time series of AR(1) noise and compute their spectra. Or MA(1) noise. Or ARMA(1,1) noise (which is a better model for Dye3 than AR(1) or AR(2)).]

    I also take issue with your method of generating simulated spectra.

    I’ve done quite a bit of this stuff myself, with regards to water waves, Welch segment averaging (with a half lag window on top of the bottom segmented layer, giving 2N – 1 DOF) as opposed to band averaging, laboratory data, field (prototype) data and most specifically numerical modeling simulations (of 2nd order moored ship motion in harbors (POLA/POLB)).

    Never did like band averaging, never will.

    • You really need to explain what a lag one autocorrelation is to begin with.

      Lag one meaning? One year? Twenty Years? Remember we are talking about ice core data, so it’s fully expected to autocorrelate on a many years timescale simply due to the time of air capture and what not.

      However, that does not mean that we can just go about, in a willy nilly fashion, moving the spectral peak, what +/- 20%?

      You also would need to look at some actual ocean wave spectra. So, for example, if I were to do a lag N cross correlation and saw a very strong cross correlation over many lags, do I then accept your method that would produce a peak at say 15 seconds, when, in fact the measured peak was at 20 seconds?

      That would be a major no-no where I come from.

      Because that’s exactly what you’ve shown above.

      Something that anyone in physical oceanography would never do.

      Finally, you should present your data as log-log power spectra. Standard EE practice.

      The whole plot-the-frequency-domain-on-the-time-domain thing is so 1960s, and it distorts the energy density (the data points are no longer evenly spaced).

      [Response: If I "need to explain what a lag one autocorrelation is to begin with," then you're in over your head.

      Did you even bother to look at the data (the link is in the post)? You'll find it's regularly sampled with time spacing 1 year, so "lag-1" means 1 year.

      Your belief that it should "autocorrelate on a many years timescale" is contradicted by the data itself. The plot of the sample autocorrelation function is in the post.

      As for "moving the spectral peak," neither I, nor Chylek et al., does that. The purpose of spectral averaging is to get a better estimate of the spectral density when a process might not exhibit a line spectrum, because the variance of the raw spectral density estimate does not decrease as the number of data increases. It's pretty standard practice. The fact that you never liked it and never will is irrelevant.

      The claim that in order to understand spectral analysis I have to look at some ocean wave spectra is ridiculous. As in, worthy of ridicule. Appealing to physical oceanography is hubris -- perhaps you failed to notice that the relevant topic is statistics.

      Log-power spectra may be standard practice in EE, but that too is irrelevant. It's the same spectrum however it's plotted.

      As for "plot-the-frequency-domain on the time-domain is so 1960's and distorts the energy density," I too dislike that kind of plot. But that's what Chylek et al. did, so I emulated them in order to make visual comparison easy. If you really hate it that much, take it up with them.]

  8. Thanks Tamino. I always enjoy the opportunity to learn more about data analysis (and manipulation). I was particularly interested to see whose name should pop up in your discussion but Joseph Fourier’s….since he is often credited in 1824 and 1827 with the original discovery of the “greenhouse effect”. Some of the people who laid the foundations for climate science were really, really smart. Pat Michaels, not so much.

  9. You gotta stop with the titles. Do you have any idea what they are doing to Horatio’s brain?

    “You think that I don’t even mean
    A single graph displayed
    It’s only noise, and noise is all
    I have to take your warmth away…”

    Horatio won’t pollute this thread with any more noise –gotta go to Horatio’s for that (some might say that’s all you’ll find there ~@:>)

  10. I’m not going to really argue here with this because I think Tamino adequately debunks the results of Chylek et al… but I don’t think that Booth et al is the end-all in the argument of the AMO either. From their abstract:
    “Here we use a state-of-the-art Earth system climate model to show that aerosol emissions and periods of volcanic activity explain 76 per cent of the simulated multidecadal variance in detrended 1860–2005 North Atlantic sea surface temperatures.”

    Firstly, there is plenty of evidence in Earth System models of NA variability without major aerosol influences, so the fact that a particular model doesn’t have it to the same magnitude isn’t really enough evidence to disprove others’ work. Furthermore, using detrended SSTs is not the way the AMO is best characterized. Trenberth has challenged this method in the past and instead uses the difference between NA SSTs and global SSTs. The detrended SSTs underestimate anthropogenic influence on NA SSTs in the last 50 years.

    I’m not trying to start an argument. But I think that these sorts of discussions need to be had, because there is strong evidence of amplified NH and Arctic warming in response to multidecadal fluctuations. This is dangerous when you add it to AGW.
    http://img21.imageshack.us/img21/6961/amv.png

    I’ll point out a couple of recent studies in major climate journals which support my view that this is a real feature in our climate system, and which I believe provide stronger evidence than Booth et al. (2012):

    Wei, W., and G. Lohmann, 2012: Simulated Atlantic Multidecadal Oscillation during the Holocene. J. Climate (in press), JCLI-D-11-00667

    “In this study, the Atlantic Multidecadal Oscillation (AMO) under different boundary conditions during the Holocene, i.e. orbital change, greenhouse gas concentration, the Laurentide ice sheet and its melting, are examined in several long-term simulations using the Earth system model COSMOS… our results show the strong correlation between the AMO index and AMOC index on multidecadal timescales and, during a warm phase of the AMO, the AMOC is intensified significantly… the climate influence of the AMO during the Holocene demonstrates that there is no remarkable change in its spatial pattern under different climate background conditions, which further reveals that the AMO is an internal variability of the climate system…”

    Multidecadal Co-variability of North Atlantic Sea Surface Temperature, African Dust, Sahel Rainfall and Atlantic Hurricanes
    Wang et al. Journal of Climate (2012)
    (doi: 10.1175/JCLI-D-11-00413.1)

    Semenov et al. (2011). The Impact of North Atlantic-Arctic Multidecadal Variability on Northern Hemisphere Surface Air Temperature. Journal of Climate.
    “The authors present results from a set of climate model simulations that suggest natural internal multidecadal climate variability in the North Atlantic–Arctic sector could have considerably contributed to the Northern Hemisphere surface warming since 1980…”

    DelSole, T. A Significant Component of Unforced Multidecadal Variability in the Recent Acceleration of Global Warming. (2011). Journal of Climate.
    “The warming and cooling of the Internal Multidecadal Pattern matches that of the Atlantic multidecadal oscillation and is of sufficient amplitude to explain the acceleration in warming during 1977–2008 as compared to 1946–77, despite the forced component increasing at the same rate during these two periods…”

    Ting et al. (2011). Robust features of Atlantic multi-decadal variability and its climate impacts. Geophysical Research Letters. 38: L17705.
    “…Our study adds important contributions toward the understanding of AMV: We show that there is a well-defined spatial pattern for AMV in the North Atlantic that is consistent in 20th Century observations as well as the climate model simulations of the 20th, 21st and pre-industrial conditions…”

    Mahajan et al. (2011). Impact of the Atlantic Meridional Overturning Circulation (AMOC) on Arctic Surface Air Temperature and Sea Ice Variability. Journal of Climate.
    “…The recent declining trend in the satellite-observed sea ice extent also shows a similar pattern in the Atlantic sector of the Arctic in the winter, suggesting the possibility of a role of the AMOC in the recent Arctic sea ice decline in addition to anthropogenic greenhouse-gas-induced warming.”

    Polyakov et al. Long-term variability of Arctic climate: Trends and multidecadal fluctuations. (Invited Lecture) (American Geophysical Union Fall Meeting. (2010).
    “A large-amplitude multidecadal-scale mode with an approximate time scale of 50-80 years was prevalent. We estimate that this mode accounts for ~50% of Arctic atmospheric warming since 1979…”

  11. I thought the AMO might be important, too–until I saw that Granger causality led from dT to the AMO and not the other way around. Do a Sims test, people.

  12. Very nice post! I agree that Monte Carlo simulations can be our friend. Is it also possible to use a “false discovery rate” test here? After learning about such a test, I found it to be an elegant and robust way to test for global significance in other applications. (Anyone interested can find a nice paper here: http://journals.ametsoc.org/doi/abs/10.1175/JAM2404.1 .)

  13. BPL,
    What AMO definition did you use?

    It’s clear that the AMO is not AGW and that there is AGW signal in the AMO but when you use an appropriate AMO definition (NA – NH SST) then you see multidecadal variability. Which is what we expect. The important point is that the AMO only increases or decreases the trend over short intervals in the NA and Arctic. It does not impact the long term trend. I think that some of the analysis I presented above are much more rigorous than Granger causality tests for this type of thing.

    • Problem is, of course, that the NA is a part of the NH SST, so the result will understate the AMO (although perhaps by an estimable fraction). The best thing would be to compute NH minus NA SST, but then the question is where the margin between them lies, so better still you would have to impose a buffer zone excluded from the calculation.

  14. In the modern instrumental SST temperature record (i.e. after 1880) the AMO is defined as the North Atlantic SST anomaly minus the global SST anomaly.

    But what definition for the AMO is used for the pre-instrumental times (i.e. before 1880)?

    Is it still the proxy-reconstructed North Atlantic SST anomaly minus the global one?

    I ask this because I don’t know how much of the pre-instrumental global SST have been reconstructed from the proxy record.

    [Response: Chylek et al. uses ice core d18O as a proxy for north Atlantic climate variability, and likens this (some might even think they equate it) to AMO variation. Ice core d18O is often taken as a temperature proxy. No accounting was made for comparing it to any global temperature estimate, they just used the bare d18O estimates.]

  15. Michaels: people might recall the infamous Soon&Baliunas paper in Climate Research, where incoming E-i-C von Stroch and other editors resigned in protest over Chris De Freitas’ pal reviews.

    But actually, Michaels was the “king of the pals,” having gotten 8 papers though de Freitas. See Skeptics Prefer Pal Review Over Peer Review: Chris de Freitas, Pat Michaels And Their Pals, 1997-2003.

  16. The AMO is not a climate forcer in senso strictu.

  17. Is there really any solid evidence of a multidecadal oscillation in the North Atlantic paleoclimate record, or just some fluctuations that could be just noise?

    [Response: Of the data I've analyzed (admittedly a limited quantity) I haven't yet found solid evidence of multidecadal oscillation.]

  18. Dear Tamino, thank you for yet another eloquent and devastating takedown. I have another question for you regarding the De Bilt station: What is the reason for the apparent discrepancy between the raw data for HadCRUT and GHCN for this station as seen on this graph?

    Both are supposed to be raw, unadjusted data, but it appears to me that there must have been some kind of station move or another similar event which has required adjustment around 1950. Do you know the reason for this? I’d greatly appreciate your answer.

  19. There is an interesting comment here by Ben Booth

    http://allmodelsarewrong.com/how-to-be-engaging/#comment-1084

    where he objects to comparisons which have been drawn between his paper and the Chylek paper. His objection is not that the Chylek paper is wrong but that it simply does not actually conflict with his own conclusions. Which is not to say that Tamino’s analysis is not also valid.

  20. The GHCN series for De Bilt before 1950 is outdated. It was based on only 3 measurements a day. Full details in: van der Schrier et al., 2011, Climate of the Past.

  21. As a rank amateur here: the term ‘oscillation’ itself implies and requires a regular, observable cyclical behavior. We don’t have to know the cause to be able to test for cyclicity, but that does help. A confirmed ‘oscillation’ necessarily implies some external forcing, therefore meaning the oscillation is an effect, not a cause. By my understanding, in part from Tamino’s interesting analysis of cyclicity in the AMO several years back (don’t have the link handy), there is not even good evidence that the AMO exhibits regular cyclicity, let alone evidence that it is a forcer rather than an effect.

    (Also, sorry. sensu stricto, not verse vice-a).

  22. If this is just noise why do two of the ice cores give peaks at the same point (20 years)?

    [Response: Because they have the same noise. Noise isn't just measurement error, it's also random fluctuation in physical systems -- which we'd expect to be similar for nearby locations.]

    • I’m not sure what you mean by “random fluctuation”. Surely temperature change happens for a reason. [You flip a coin. It lands heads instead of tails for a reason. It's still random] Something is driving the particular shape of the Fourier analysis in these two separate ice cores. And that shape indicates a peak oscillation at 20 years. [No it doesn't. That's the entire point of this post. Pay attention.] Isn’t that Chylek’s point? BTW, the two ice cores are, I guesstimate, about 1500 km apart and seem to be subject to very different local climates. [If their local climate is so different, why is Chylek using them as proxies for north Atlantic climate variability?]

      (And BTW the killer punch you deliver seems to be this “When I adjust the critical value to account for this, the given peak is no longer significant. Not even close. It just ain’t so.” But that’s maybe the only part of the analysis you don’t show the graph for. Can we see how bad the miss is?) [Since you seem unwilling to take my word for it, I'll find other uses for my time.]