8,000 years of AMO?

I’ve often said that the evidence for actual periodic (or even pseudoperiodic) behavior in ocean cycles is sketchy at best. What are usually quoted as periods are better referred to as characteristic time scales. Furthermore, it’s all too easy to misinterpret period analysis (usually in the form of spectral analysis) even when estimating the values of, or assessing the existence of, characteristic time scales.

I don’t deny the existence of fluctuations (which I regard as a better description). Nor do I claim that they don’t show characteristic time scales — just that the evidence is often sketchy at best. As far as being actually periodic (in the sense that knowledge of the last few “cycles” enables us to make some useful prediction of the next, or the next few, “cycles”), I believe that they’re not. I could be wrong — but I’m still waiting for evidence.

A reader recently pointed to a fascinating paper (Knudsen et al. 2011, Tracking the Atlantic Multidecadal Oscillation through the last 8,000 years. Nat. Commun. 2:178 doi: 10.1038/ncomms1186) which studies a variety of data which are believed to be correlated to AMO (Atlantic Multidecadal Oscillation) over the last 8,000 years. They analyzed the data using windowed Fourier analysis, in which a time slice (“window”) of the data is Fourier-analyzed to gather information about the behavior for a particular moment (or better to say, interval), then the window itself slides through time to see how the behavior changes. It’s one of the more popular (and excellent) time-frequency methods, enabling us to look for periodic and pseudoperiodic behavior, as well as characteristic time scales, and investigate how they might change over time.

They used the Lomb-Scargle periodogram as their Fourier method of choice. In my opinion it’s one of the best methods available (although I prefer Ferraz-Mello’s DCDFT, or date-compensated discrete Fourier transform), and it has the tremendous virtue that it does not require regular time sampling. This eliminates the need to interpolate the data onto a regular time grid before spectral analysis; the Lomb-Scargle periodogram can analyze the raw data directly.
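
For readers who want to experiment, here is a minimal sketch of the classic Lomb-Scargle periodogram in plain Python. It is not the code used by Knudsen et al. or for this post; the function name `lomb_scargle`, the synthetic 66-yr test signal, the noise level, and the frequency grid are all assumptions chosen for illustration.

```python
import math
import random


def lomb_scargle(times, values, freqs):
    """Classic Lomb-Scargle power at each frequency (cycles per time unit).

    Power is normalized by the sample variance, so white noise gives a
    power of roughly 1 on average at any single frequency.
    """
    n = len(times)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    var = sum(r * r for r in resid) / (n - 1)
    powers = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal
        # on this particular (possibly uneven) set of sample times.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cs = [math.cos(w * (t - tau)) for t in times]
        sn = [math.sin(w * (t - tau)) for t in times]
        c_term = sum(r * c for r, c in zip(resid, cs)) ** 2 / sum(c * c for c in cs)
        s_term = sum(r * s for r, s in zip(resid, sn)) ** 2 / sum(s * s for s in sn)
        powers.append((c_term + s_term) / (2.0 * var))
    return powers


# Synthetic example: unevenly sampled 66-yr sinusoid plus noise (made-up data).
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 2000.0) for _ in range(200))
values = [math.sin(2.0 * math.pi * t / 66.0) + rng.gauss(0.0, 0.5) for t in times]
freqs = [0.01 + 0.0001 * k for k in range(101)]  # 0.01 to 0.02 cycle/yr
powers = lomb_scargle(times, values, freqs)
best = freqs[powers.index(max(powers))]
print("strongest peak at period", round(1.0 / best, 1), "yr")
```

Note that the sample times are passed in as they are; nothing is interpolated onto a regular grid.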

Their analysis uses a 2,000-year sliding window (although the window shrinks as they approach the present day), which enables them to estimate periods precisely enough. At each “step” the window is advanced 50 yr in time. Finally, they de-trended the data before spectral analysis.
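
The windowing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the source only says the data were de-trended, so the straight-line (least-squares) detrending here is my assumption, the helper names `linear_detrend` and `sliding_windows` are hypothetical, and the sketch keeps a fixed window width rather than shrinking it toward the present day.

```python
def linear_detrend(times, values):
    """Subtract the ordinary least-squares straight line from the values."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else 0.0
    return [v - v_mean - slope * (t - t_mean) for t, v in zip(times, values)]


def sliding_windows(times, values, width=2000.0, step=50.0, min_points=10):
    """Yield (window_center, window_times, detrended_values) for each step."""
    start = min(times)
    last = max(times)
    while start + width <= last:
        idx = [i for i, t in enumerate(times) if start <= t < start + width]
        if len(idx) >= min_points:
            t_win = [times[i] for i in idx]
            v_win = linear_detrend(t_win, [values[i] for i in idx])
            yield start + width / 2.0, t_win, v_win
        start += step


# Made-up series: 0 to 8000 yr sampled every 20 yr, containing only a trend.
times = [20.0 * k for k in range(401)]
values = [0.001 * t for t in times]
windows = list(sliding_windows(times, values))
print(len(windows), "windows; first is centered at", windows[0][0], "yr")
```

Each yielded window would then be handed to whatever periodogram is in use, and the results stacked by window center to build the time-frequency picture.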

Their results are illustrated with their figure 5:

In this graph, the plotted colors indicate the statistical significance of various periodic components in the spectral analysis. Orange indicates 90% confidence, red 95% confidence, and dark red 99% confidence.

I looked at some of the data in some detail, and got results which are dramatically different from those reported by Knudsen et al. So I emailed Dr. Knudsen, who kindly replied in detail, which confirmed my suspicion that the difference was because we were using a different sense of the word “significance.”

Their results are correct, and computed according to standard practice, but a serious caveat applies. A result reported as passing 99% significance is not necessarily a periodic result at 99% confidence! It would be if, and only if, the test were applied to a single, precisely determined, pre-defined test period. But the spectral analysis tests a wide range of periods (i.e., of frequencies), covering at least the plotted frequency range from 0.01 to 0.02 cycle/yr (periods from 50 to 100 yr). This means that there are many more chances to get an apparently “significant” result just by chance.

A stated significance level of 90% actually means that we can expect 10% of spectral peaks to surpass that level even if the data are just noise. There are enough peaks in a frequency range as broad as 0.01 to 0.02 cycle/yr that we should expect to see such a threshold surpassed more often than not. Therefore, the vast majority of the plotted responses may be only the response to random noise. That doesn’t mean that they are just a noise response, but it does mean that there’s insufficient evidence to conclude with real confidence that they indicate genuine periodic (or pseudoperiodic) behavior. Furthermore, even in those few cases where the response is strong enough to constitute real evidence, the real statistical significance is far less than indicated by the single-frequency significance level.
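
A back-of-the-envelope version of this argument: if the scanned range held some number of statistically independent frequencies, the chance that noise alone beats the single-frequency threshold at one or more of them follows from simple arithmetic. The independence assumption and the counts tried below are purely illustrative; with real, unevenly sampled data the effective number of independent frequencies is not known in advance, which is why simulation is the safer route.

```python
def chance_of_false_peak(single_test_level, n_independent):
    """Probability that pure noise exceeds the single-frequency threshold
    at one or more of n_independent (assumed independent) frequencies."""
    return 1.0 - single_test_level ** n_independent


# Illustrative counts only; the true effective number is data-dependent.
for level in (0.90, 0.95, 0.99):
    for m in (5, 10, 20):
        print(level, m, round(chance_of_false_peak(level, m), 3))
```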

Consider, for instance, the d18O data for the Agassiz ice cap. At time 7000 yr BP (before present), there’s dark red indicating 99% confidence at period 66 yr. This means that the peak in the periodogram for the time span from 6000 to 8000 yr BP passed the 99% confidence level for a single peak. In fact here’s the peak itself, in a DCDFT periodogram (which, for these data, is nearly identical to the Lomb-Scargle periodogram):

But what are the odds of finding a peak that strong when we scan the frequency range from 0.01 to 0.02 cycle/yr? I generated 500 white-noise data series with the same time sampling as the Agassiz d18O data from 6000 to 8000 yr BP. Then I computed the strength of the strongest peak in the DCDFT spectrum over the frequency range from 0.01 to 0.02 cycle/yr. This sample of 500 simulated noise spectra enabled me to define the probability distribution for the strongest peak in this case, and therefore to define the true significance level for the result from the Agassiz ice cap. It turns out that the peak which passes 99% confidence for a single-frequency test, is only significant at 93% confidence when taken in the context of having scanned a range of frequencies.
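
The Monte Carlo procedure just described can be sketched along these lines. Several things here are assumptions rather than the actual analysis: the sketch swaps in a classic Lomb-Scargle periodogram for the DCDFT, uses made-up random sample times in place of the real Agassiz sampling, and takes the single-frequency 99% threshold from the usual exponential approximation for noise power. The names `build_basis`, `tallest_peak` and `global_confidence` are hypothetical.

```python
import math
import random


def build_basis(times, freqs):
    """Precompute Lomb-Scargle sine/cosine terms for fixed sample times."""
    basis = []
    for f in freqs:
        w = 2.0 * math.pi * f
        tau = math.atan2(sum(math.sin(2.0 * w * t) for t in times),
                         sum(math.cos(2.0 * w * t) for t in times)) / (2.0 * w)
        cs = [math.cos(w * (t - tau)) for t in times]
        sn = [math.sin(w * (t - tau)) for t in times]
        basis.append((cs, sn, sum(c * c for c in cs), sum(s * s for s in sn)))
    return basis


def tallest_peak(values, basis):
    """Height of the strongest variance-normalized peak over all frequencies."""
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    var = sum(r * r for r in resid) / (n - 1)
    best = 0.0
    for cs, sn, cc, ss in basis:
        c_term = sum(r * c for r, c in zip(resid, cs)) ** 2 / cc
        s_term = sum(r * s for r, s in zip(resid, sn)) ** 2 / ss
        best = max(best, (c_term + s_term) / (2.0 * var))
    return best


def global_confidence(times, observed_peak, freqs, n_sims=500, seed=0):
    """Fraction of white-noise series whose tallest peak is below observed_peak."""
    rng = random.Random(seed)
    basis = build_basis(times, freqs)
    below = 0
    for _ in range(n_sims):
        noise = [rng.gauss(0.0, 1.0) for _ in times]
        if tallest_peak(noise, basis) < observed_peak:
            below += 1
    return below / n_sims


# Made-up uneven sampling over a 2000-yr window (not the real Agassiz times).
rng = random.Random(1)
times = sorted(rng.uniform(6000.0, 8000.0) for _ in range(80))
freqs = [0.01 + 0.00025 * k for k in range(41)]  # 0.01 to 0.02 cycle/yr
single_freq_99 = -math.log(0.01)  # approximate 99% level at ONE frequency
conf = global_confidence(times, single_freq_99, freqs)
print("confidence after scanning the whole band:", conf)
```

The number printed depends on the sampling and the frequency grid, so it should not be expected to reproduce the 93% quoted above; the point is only that it comes out below the nominal single-frequency level.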

I did similar tests (defining the probability distribution for the tallest peak by Monte-Carlo simulations) for the entire time span of the GISP2 d18O data. It turns out that all the plotted results fail to pass 90% significance except for a brief outburst of the 63-yr band between 6500 and 7000 yr BP.

I also analyzed the GISP2 d18O data using another popular time-frequency method, wavelet analysis (using the WWZ, Foster 1996, Astronomical J., 112, 1709). At first glance the result may look dramatically different from that of Knudsen et al.:

However, when I stretch the plots so they’re nearly on the same scale, we can see that the results are essentially the same:

If, however, I restrict the plot of the wavelet analysis to include only the response that is truly significant (taking into account having scanned a range of frequencies), we’re left with this:

Clearly, the evidence for genuine periodic or pseudoperiodic behavior — or even for fluctuation at a consistent time scale — is far less convincing than one might suspect from a cursory examination of the original graph.

Dr. Knudsen and his co-authors are aware of these issues. As he mentioned in his reply to my inquiry,

We are not completely happy about this way of describing significance. It may easily create a feeling by the average reader that significances are higher than they really are. But we have adhered to this standard used in other literature on the subject.

Honestly, I agree that their approach is perfectly valid, and that it is in accord with the way this kind of analysis is treated in the literature. I’ll also agree that it is easy for these results to be misinterpreted.

But I will emphasize that the results are less “significant” than they may appear at first sight, so they should be treated as more tentative than definitive. But hey, that’s the way science is. Nature shows us tentative results more often than definitive ones, and if we don’t heed such revelations, then we won’t learn as much about her workings as we could.

I will also point out that just as spectral analysis results must be viewed in full context, so too must be the overall analysis. As Dr. Knudsen also stated in his response to me,

… our interpretation deals with patterns. We believe that the total picture of lineaments contains the real significance of our study. In a way the “total significance” should relate to the statement: “How often would random data create a joint pattern with this degree of similarity and consistency”. Such a significance level is very difficult to quantify.
As we discuss only briefly in our paper, there are so many noise effects, errors and inadequacies working against the development of these patterns, so it is perhaps a bit of a surprise that the AMO has managed to create a pattern at all in these proxy records.

In the end, I remain more skeptical of the persistence of the AMO than Knudsen et al. But that’s because I’m a naturally skeptical time-series guy. Knudsen et al. remain fully aware of the uncertainties, and I doubt they would ever make definitive pronouncements that were unjustified. And I fully recognize that they likely know a great deal more about the physical phenomenon than I do!

As for the true nature of the AMO (or as I’d prefer to call it, the “AMF” for “Atlantic Multidecadal Fluctuation”), time — and nature — will tell.


19 responses to “8,000 years of AMO?”

  1. An interesting and useful look into the topic! Thanks.

  2. Steve Metzler

    tamino said:

    I generated 500 white-noise data series with the same time sampling as the Agassiz d18O data from 6000 to 8000 yr BP.

    There’s your problem. Should have used ‘trendless red noise’ with insanely high persistence instead. Of course, then your plots would be filled with hockey-stick shaped artefacts, instead of the ones in the plots above which look like birds, fish, ring worms, and t-rex. /sarcasm

  3. Gavin's Pussycat

    What about Gauss-Vanicek?

  4. Ray Ladbury

    If I could have all the fun-with-Fourier guys read just one analysis, this would be it. It very neatly lays out the potential pitfalls of looking for periodicity in noisy data that at most exhibits quasi-periodic behavior.

    The tendency to over-estimate significance because of trying out several hypothetical frequencies reminds me a lot of the situation wrt significance of results in particle physics–and that is why particle physics requires 5 sigma (a factor a certain cancelled Czech thoroughly misunderstands).

  5. Halldór Björnsson

    How sensitive are your results to the choice of a white-noise model for the pseudo series? It is easy to argue that for oceanic time series a red-noise model is more fitting. There was a flurry of papers on natural variability on the interdecadal timescale (period 40 – 60 years) following the Delworth et al. (1993) paper. (These were modelling papers, and the argument was the usual one: what drives it, and is it ocean-only or coupled?) Regardless, my point is that on physical grounds one should a priori expect “power” on the interdecadal time scale. That is not the same as saying that there is a cycle with that period; indeed it might make it harder to spot real oscillations.

    [Response: Unless the autocorrelation is very long-lived, it won’t increase the actual (compared to estimated) significance but will decrease it. And I actually tested the data for autocorrelation (after detrending), and found it to be quite small. But I didn’t analyze all their data, just the GISP2 and Agassiz d18O data sets.]

  6. You close with “time – and nature – will tell”. But with wildly fluctuating forcings, etc, it seems to me that whatever windows lend themselves to observing a regular periodicity will be obscured by trend for the foreseeable future. Am I off-base? Could detrending the data really pull the signal out within, say, the next 100-200 years?

    [Response: I’d say yes, especially since the true nature of AMO is probably its contrast with patterns elsewhere on the globe. And in spite of my continual reminders of the pitfalls of data analysis, I firmly believe in its power to reveal.

    It is possible, however, that climate change could alter the very *nature* of the AMO. Let’s hope we’re still around to answer the question in a few centuries.]

  7. Tamino

    Very well said. The *only* thing that has periodic physics that anyone has found yet (besides Milankovitch forcing and the seasonal cycle) is ENSO. As a colleague likes to say of the PDO, it might be in the Pacific, but it is neither decadal nor oscillatory!

  8. The statistical part is IMHO very interesting – but in a more Feynman-like search for a more elegant description – why would we be surprised that a 66-year period was found for a 2000 year window? It makes very good sense that a specific combination of forcings and feedbacks would be superimposed on the (far stronger) 11/22-year oscillation period of the Sun with the result of an oscillation with a longer period that still ‘resonates’ at the 3rd (or 6th ?) harmonic. It might have been really unusual if a new ‘resonating period’ was something more odd – like 15 or 30 years – which would point to a very, very strong influence, but something that just resonates with a harmonic of the ‘major driver’ points IMHO to a minor feedback which might have resulted from a transient (but not necessarily random) combination specific for that time window.

    [Response: First, one of the points of this post is that the evidence for a 66-year cycle is weak, and even in the most generous interpretation (look at Figure 5 from Knudsen et al.) it’s not nearly as persistent as your reasoning implies.

    Second, where did you get the idea that some solar cycle is the “major driver”? There’s no evidence at all of a cycle near that length in solar output (and Knudsen et al. went looking for it). Your claim that anyone would expect a response which “resonates” at the 3rd or 6th harmonic strikes me as mathturbation.]

  9. Thank you so much for taking the time to write about this paper.
    Their use of statistics is really over my head. But now I am beginning to understand a little of it.

  10. Horatio Algeranon

    AMO = Amaranthine Mathturbation Oscillation?

  11. So what is the relevance of the period you do see in your even more controlled example?

    [Response: Possibly none. For this analysis, I compensated for scanning a range of frequencies but not for scanning a range of times — so the genuine statistical significance is even less.

    Or possibly, that a genuine pattern existed, briefly, which mimics a periodic fluctuation but isn’t actually periodic.

    Or possibly, that a temporary but genuine periodic fluctuation happened at that time.]

  12. David B. Benson

    Or its just
    or something akin thereunto.

  13. John Brookes

    Thanks tamino – I actually understood most of that, particularly the issue of needing to be sure of exactly what is significant.

  14. You close with “time – and nature – will tell”, but why have they not done so until now? This is an excellent example of why you need a physical model to confirm a statistical analysis.

  15. Who knows, has comet Halley something to do with this … 77 years minus 66 is that sort of 11 year solar cycle again… call it the Halley-Whipple Tractor-Pull wave :-)

  16. Marion Delgado

    what about the arctic oscillation? Or do you mean the differences from normal for that, as we had last year?

  17. Halldór Björnsson

    Tamino, thank you for your response.

    Knudsen et al. define the AMO as a timeseries of the difference of Atlantic temperature anomalies and global SST anomalies. This method, which differs from the traditional definition (see http://www.esrl.noaa.gov/psd/data/timeseries/AMO/), is supposed to take care of undue influence of global warming on the AMO index.

    However, they then cite correlations between AMO and various phenomena (precipitation patterns in N America, droughts in Africa, tropical storms etc), most of which (or perhaps all) were derived using the traditional AMO index. It is not a priori obvious that the same conclusions apply for the new AMO index.

    I am not sure if this is a serious issue for the article since its conclusion is not likely to hinge on the exact definition of the AMO (especially not the conclusions on the spectral periods of the proxy timeseries).

    However, I find it irksome when subtle (or not so subtle) changes are made in the calculation of “standard” indices without changing the name.

    Furthermore, physical connections to this new index do not have as clear a physical explanation as to an index solely based on N-Atlantic temperature. As an example, Holland and Scott (2006) found that the warm region in the subtropical N-Atlantic has extended eastwards in recent decades, allowing more and stronger tropical storms to form. In view of this, it is not surprising that there is a correlation between the classical AMO and the frequency of N-Atlantic tropical storms. The physical mechanism is clear.

    However, the AMO as used by Knudsen et al. can increase due to a drop in SSTs in some corner of the global ocean far away from the N-Atlantic. Thus, a correlation between this new AMO and N-Atlantic tropical storm frequency is harder to interpret.

    Finally, I like the traditional AMO because it contains the AGW signal in it. If you plot a time-latitude plot for the zonally averaged Atlantic SST anomalies from the extended Kaplan dataset (see link above) it is clear that the Atlantic mid century warming is mainly a N-hemisphere phenomenon, and when cooler SSTs prevailed in the N-Atlantic from the 1960s – 80s the S-Atlantic was warmer than before. (A pattern like this is sometimes referred to as a see-saw pattern.)

    However, the recent warming period really stands out since the warming since the 1990s extends throughout the Atlantic. It is not just a hemispheric phenomenon. This shows that the current warming is different from the mid 20th century warming in the N-Atlantic.

    From this point of view AGW has already changed the AMO.

  18. This is interesting stuff, and surely these are all part of a complex mix of redistribution effects, which are related to, but may or may not be out of sync with ENSO.
    Given that ENSO when considered in that region alone is roughly biphasic surely it is logical the redistribution effect is represented by another oceanic oscillation elsewhere.
    Now if this is not true global SSTs should simply follow the trend as per R+F paper but are they?

    [Response: Your comment makes no sense to me.]