I’ve often said that the evidence for actual periodic (or even pseudoperiodic) behavior in ocean cycles is sketchy at best. What are usually quoted as periods are better referred to as characteristic time scales. Furthermore, it’s all too easy to misinterpret period analysis (usually in the form of spectral analysis) even when estimating the values of, or assessing the existence of, characteristic time scales.
I don’t deny the existence of fluctuations (which I regard as a better description). Nor do I claim that they don’t show characteristic time scales — just that the evidence is often sketchy at best. As far as being actually periodic (in the sense that knowledge of the last few “cycles” enables us to make some useful prediction of the next, or the next few, “cycles”), I believe that they’re not. I could be wrong — but I’m still waiting for evidence.
A reader recently pointed to a fascinating paper (Knudsen et al. 2011, Tracking the Atlantic Multidecadal Oscillation through the last 8,000 years. Nat. Commun. 2:178 doi: 10.1038/ncomms1186) which studies a variety of data believed to be correlated with the AMO (Atlantic Multidecadal Oscillation) over the last 8,000 years. They analyzed the data using windowed Fourier analysis, in which a time slice (“window”) of the data is Fourier-analyzed to gather information about the behavior at a particular moment (or better to say, interval), then the window itself slides through time to see how the behavior changes. It’s one of the more popular (and excellent) time-frequency methods, enabling us to look for periodic and pseudoperiodic behavior, as well as characteristic time scales, and investigate how they might change over time.
They used the Lomb-Scargle periodogram as their Fourier method of choice. In my opinion it’s one of the best methods available (although I prefer Ferraz-Mello’s DCDFT, or date-compensated discrete Fourier transform), and it has the tremendous virtue that it does not require regular time sampling of the data. This eliminates the need to interpolate the data onto a regular time grid before spectral analysis: the Lomb-Scargle periodogram can analyze the raw data directly.
Their analysis uses a 2,000-year sliding window (although the window shrinks as they approach the present day), which is long enough to estimate periods in the plotted 50 to 100 yr range with useful precision. At each “step” the window is advanced 50 yr in time. Finally, they de-trended the data before spectral analysis.
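For readers who want to try this kind of analysis, here is a minimal sketch of a sliding-window Lomb-Scargle periodogram in Python. It is not the code used by Knudsen et al.; the function names, the linear de-trending inside each window, the `min_points` guard, and the choice to simply stop at the end of the record (rather than shrink the window) are all my assumptions for illustration.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Scargle-normalized periodogram for unevenly sampled data.

    t, y  : sample times and values (1-D arrays)
    freqs : trial frequencies in cycles per unit of t
    For Gaussian white noise, the power at ONE pre-chosen frequency
    exceeds z with probability of roughly exp(-z).
    """
    y = y - y.mean()
    var = y.var(ddof=1)
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Time offset that makes the sine and cosine terms orthogonal
        tau = np.arctan2(np.sin(2 * w * t).sum(), np.cos(2 * w * t).sum()) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s)) / (2.0 * var)
    return power

def sliding_periodogram(t, y, freqs, width=2000.0, step=50.0, min_points=10):
    """Periodogram in a window of the given width, advanced by `step`.

    Assumption: each window is linearly de-trended on its own, and the
    scan stops when a full-width window no longer fits in the record.
    Returns (window centers, power array of shape [n_windows, n_freqs]).
    """
    centers, rows = [], []
    start = t.min()
    while start + width <= t.max():
        m = (t >= start) & (t < start + width)
        if m.sum() >= min_points:
            tw, yw = t[m], y[m]
            yw = yw - np.polyval(np.polyfit(tw, yw, 1), tw)  # remove linear trend
            centers.append(start + width / 2.0)
            rows.append(lomb_scargle(tw, yw, freqs))
        start += step
    return np.array(centers), np.array(rows)
```

Calling `sliding_periodogram(t, y, np.linspace(0.01, 0.02, 101))` on a record with times in years scans the 50 to 100 yr band in each 2,000-year window. Note that no interpolation onto a regular grid is needed at any point.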
Their results are illustrated with their figure 5:
In this graph, the plotted colors indicate the statistical significance of various periodic components in the spectral analysis. Orange indicates 90% confidence, red 95% confidence, and dark red 99% confidence.
I looked at some of the data in some detail, and got results which are dramatically different from those reported by Knudsen et al. So I emailed Dr. Knudsen, who kindly replied in detail, which confirmed my suspicion that the difference was because we were using a different sense of the word “significance.”
Although their results are both correct and computed according to standard practices, an extreme caveat applies. A result reported as passing 99% significance is not necessarily a periodic result at 99% confidence! It would be if, and only if, the test were applied to a single, precisely determined, pre-defined test period. But the spectral analysis tests a wide range of periods (i.e., of frequencies), covering at least the plotted frequency range from 0.01 to 0.02 cycle/yr (periods from 50 to 100 yr). This means that there are lots more chances to get an apparently “significant” result just by chance.
A stated significance level of 90% actually means that we can expect 10% of spectral peaks to surpass that level even if the data are just noise. There’ll be enough peaks in a frequency range as broad as 0.01 to 0.02 cycle/yr that we should actually expect to see such a threshold surpassed more often than not. Therefore, the vast majority of the plotted responses may be only the response to random noise. That doesn’t mean that they are just a noise response, but it does mean that there’s insufficient evidence to conclude with real confidence that they indicate genuine periodic (or pseudoperiodic) behavior. Furthermore, even in those few cases in which the response is strong enough to constitute real evidence, the real statistical significance is far less than indicated by the single-frequency significance level.
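The arithmetic behind that “more often than not” can be sketched in a few lines. If the scanned band contained N statistically independent frequencies, the chance that at least one of them beats a single-frequency threshold by accident would be 1 - (1 - p)^N, where p is the single-frequency false-alarm rate. The function name and the idea of a fixed count of independent frequencies are my simplifying assumptions; in a real periodogram, neighboring frequencies are correlated, so the effective N has to be estimated (for instance by simulation).

```python
def false_alarm_probability(single_freq_confidence, n_independent):
    """Chance that pure noise beats the threshold at one or more frequencies.

    single_freq_confidence : e.g. 0.90 for a "90%" single-frequency level
    n_independent          : assumed number of independent frequencies scanned
    """
    p = 1.0 - single_freq_confidence      # false-alarm rate at one frequency
    return 1.0 - (1.0 - p) ** n_independent

# Illustration with hypothetical counts of independent frequencies
for n in (1, 7, 20):
    print(n, false_alarm_probability(0.90, n))
```

With a hypothetical seven independent frequencies, for example, a 90% single-frequency threshold would be exceeded by noise alone slightly more than half the time.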
Consider, for instance, the d18O data for the Agassiz ice cap. At time 7000 yr BP (before present), there’s dark red indicating 99% confidence at period 66 yr. This means that the peak in the periodogram for the time span from 6000 to 8000 yr BP, passed the 99% confidence level for a single peak. In fact here’s the peak itself, in a DCDFT periodogram (which for these data, is nearly identical to the Lomb-Scargle periodogram):
But what are the odds of finding a peak that strong when we scan the frequency range from 0.01 to 0.02 cycle/yr? I generated 500 white-noise data series with the same time sampling as the Agassiz d18O data from 6000 to 8000 yr BP. Then I computed the strength of the strongest peak in the DCDFT spectrum over the frequency range from 0.01 to 0.02 cycle/yr. This sample of 500 simulated noise spectra enabled me to define the probability distribution for the strongest peak in this case, and therefore to define the true significance level for the result from the Agassiz ice cap. It turns out that the peak which passes 99% confidence for a single-frequency test, is only significant at 93% confidence when taken in the context of having scanned a range of frequencies.
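The Monte Carlo procedure just described can be sketched as follows. This is an illustration, not my actual analysis code: in place of the DCDFT it uses a plain least-squares fit of a constant plus a sinusoid at each trial frequency (similar in spirit, but I make no claim that the normalization matches), and the function names, the “fraction of variance explained” power measure, and the seed are assumptions.

```python
import numpy as np

def ls_power(t, Y, freqs):
    """Fraction of variance explained by constant + sinusoid at each frequency.

    t     : sample times (1-D array, need not be evenly spaced)
    Y     : one series (1-D) or many series as columns (n_times x n_series)
    Returns an array of shape (n_freqs, n_series).
    """
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    tss = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)   # total sum of squares
    out = np.empty((len(freqs), Y.shape[1]))
    for i, f in enumerate(freqs):
        arg = 2.0 * np.pi * f * t
        A = np.column_stack([np.ones_like(t), np.cos(arg), np.sin(arg)])
        coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
        rss = ((Y - A @ coef) ** 2).sum(axis=0)     # residual sum of squares
        out[i] = 1.0 - rss / tss
    return out

def scan_significance(t, y, freqs, n_sim=500, seed=0):
    """Confidence of the strongest observed peak, allowing for the scan.

    Generates n_sim white-noise series on the SAME time sampling, records
    the strongest peak of each over the same frequency range, and returns
    (observed strongest peak, fraction of noise runs with a weaker one).
    """
    rng = np.random.default_rng(seed)
    observed = ls_power(t, y, freqs).max()
    noise = rng.standard_normal((len(t), n_sim))
    null_max = ls_power(t, noise, freqs).max(axis=0)
    return observed, float(np.mean(null_max < observed))
```

The key design choice is that the comparison is made against the distribution of the *strongest* peak in each noise spectrum, not against the distribution of power at a single frequency. With only 500 simulations, the resulting confidence is itself uncertain by a percent or so.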
I did similar tests (defining the probability distribution for the tallest peak by Monte-Carlo simulations) for the entire time span of the GISP2 d18O data. It turns out that all the plotted results fail to pass 90% significance except for a brief outburst of the 63-yr band between 6500 and 7000 yr BP.
I also analyzed the GISP2 d18O data using another popular time-frequency method, wavelet analysis (using the WWZ, Foster 1996, Astronomical J., 112, 1709). At first glance the result may look dramatically different from that of Knudsen et al.:
However, when I stretch the plots so they’re nearly on the same scale, we can see that the results are essentially the same:
If, however, I restrict the plot of the wavelet analysis to include only the response that is truly significant (taking into account having scanned a range of frequencies), we’re left with this:
Clearly, the evidence for genuine periodic or pseudoperiodic behavior — or even for fluctuation at a consistent time scale — is far less convincing than one might suspect from a cursory examination of the original graph.
Dr. Knudsen and his co-authors are aware of these issues. As he mentioned in his reply to my inquiry,
“We are not completely happy about this way of describing significance. It may easily create a feeling by the average reader that significances are higher than they really are. But we have adhered to this standard used in other literature on the subject.”
Honestly, I agree that their approach is perfectly valid, and that it is in accord with the way this kind of analysis is treated in the literature. I’ll also agree that it is easy for these results to be misinterpreted.
But I will emphasize that the results are less “significant” than they may appear at first sight, so they should be treated as more tentative than definitive. But hey, that’s the way science is. Nature shows us tentative results more often than definitive ones, and if we don’t heed such revelations, then we won’t learn as much about her workings as we could.
I will also point out that just as spectral analysis results must be viewed in full context, so too must be the overall analysis. As Dr. Knudsen also stated in his response to me,
“… our interpretation deals with patterns. We believe that the total picture of lineaments contains the real significance of our study. In a way the ‘total significance’ should relate to the statement: ‘How often would random data create a joint pattern with this degree of similarity and consistency’. Such a significance level is very difficult to quantify.
“As we discuss only briefly in our paper, there are so many noise effects, errors and inadequacies working against the development of these patterns, so it is perhaps a bit of a surprise that the AMO has managed to create a pattern at all in these proxy records.”
In the end, I remain more skeptical of the persistence of the AMO than Knudsen et al. But that’s because I’m a naturally skeptical time-series guy. Knudsen et al. remain fully aware of the uncertainties, and I doubt they would ever make definitive pronouncements that were unjustified. And I fully recognize that they likely know a great deal more about the physical phenomenon than I do!
As for the true nature of the AMO (or as I’d prefer to call it, the “AMF” for “Atlantic Multidecadal Fluctuation”), time — and nature — will tell.