In this post I’d like to examine another claim made in one of the Berkeley papers, that there is a periodic fluctuation in the AMO (Atlantic Multidecadal Oscillation) with period about 9.1 years.
The claim is based on Fourier analysis of the AMO time series:
It should be noted, however, that this isn’t the raw AMO time series. First, it’s only the data since 1950. Second, it has been smoothed with a 12-month moving-average filter and detrended with a 5th-degree polynomial. When I do the same thing, then compute a Fourier transform, I get a similar result:
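For readers who want to follow along, here's a sketch of that preprocessing in Python. The data here are synthetic stand-ins (I'm not reproducing the actual AMO file in a code box), so the point is the pipeline, not the numbers:

```python
import numpy as np

# Synthetic stand-in for monthly AMO anomalies since 1950.
rng = np.random.default_rng(0)
t = 1950 + np.arange(12 * 70) / 12.0           # monthly time axis, 70 years
amo = rng.normal(size=t.size)                  # placeholder data

# 12-month moving-average filter.
smoothed = np.convolve(amo, np.ones(12) / 12, mode="valid")
t_s = t[: smoothed.size] + 11 / 24.0           # shift to the window center

# Detrend with a 5th-degree polynomial fit
# (standardize time first to keep the fit well-conditioned).
u = (t_s - t_s.mean()) / t_s.std()
detrended = smoothed - np.polyval(np.polyfit(u, smoothed, 5), u)

# Fourier power spectrum, scaled so the average power equals 1.
power = np.abs(np.fft.rfft(detrended)) ** 2
power /= power.mean()
freq = np.fft.rfftfreq(detrended.size, d=1 / 12.0)   # cycles per year
```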
Although the two Fourier spectra look similar, you may have noticed that the actual numerical power values are dramatically different. That’s because Muller et al. (2011, hereafter M2011) used the normalization convention which is customary in geophysics, for which the total integrated power in the power spectrum equals 1, whereas my program uses the scaling customary in astronomy, where the average power equals 1. It’s just a different choice of scaling.
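For the curious, the two conventions differ only by a constant factor, so peak positions and shapes are unchanged. A quick demonstration (treating "integrated" as a plain sum here, which is my reading of the convention):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=512)
raw = np.abs(np.fft.rfft(x)) ** 2

# Astronomy-style scaling: the *average* power equals 1.
astro = raw / raw.mean()

# Geophysics-style scaling (as I read M2011): the *total* power equals 1.
geo = raw / raw.sum()

# astro / geo is the same constant at every frequency,
# so the two spectra have identical shapes.
```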
We both used Fourier transforms with no taper. However, M2011 padded the time series with zeros in order to oversample the periodogram. This enables one to identify a frequency more precisely. After all, the plain Fourier transform at the “default” frequency spacing looks like this:
I simply oversampled the periodogram by a factor of 4, since my program has a feature to do that without zero-padding. We probably didn’t oversample the periodogram at the same rate — but it doesn’t make much difference.
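Zero-padding is easy to demonstrate: padding a series to 4 times its length shrinks the frequency spacing of the transform by a factor of 4, so the periodogram is sampled 4 times more finely.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=600)          # e.g. 50 years of monthly data

# Default frequency spacing is 1/N (cycles per sample).
plain = np.abs(np.fft.rfft(x)) ** 2
f_plain = np.fft.rfftfreq(x.size)

# Padding with zeros to 4N shrinks the spacing to 1/(4N):
# the periodogram is oversampled by a factor of 4.
padded = np.abs(np.fft.rfft(x, n=4 * x.size)) ** 2
f_pad = np.fft.rfftfreq(4 * x.size)
```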
In any case, clearly there’s a peak at frequency 0.11 cycles/yr (period about 9 years). But is it significant evidence of periodic behavior? To answer that question, M2011 performed a Monte Carlo analysis. They noted that there are 16 points at which the smoothed, detrended time series crosses the zero line headed upward, dividing the time series into 17 segments. They then created 10,000 artificial time series by randomly permuting the 17 segments and computed the power spectrum of each. According to M2011, only 170 of the 10,000 artificial series gave a spectral peak at least as strong as that observed in the actual AMO data. Hence they concluded that there’s only a 170/10000 = 1.7% chance of this peak being accidental, i.e., that it is statistically significant at 98.3% confidence (p-value 1.7%).
I’m not convinced that the period is real. I’m not convinced it isn’t real either. On the face of it, a p-value of 1.7% is strong evidence. However, I reproduced their test exactly as it was described. I took the same AMO data from 1950 to the present, applied a 12-month moving-average filter, and subtracted a 5th-degree polynomial fit to create a smoothed time series. Then I split it into 17 segments at the 16 points at which it crosses the zero line headed upward. Then I generated 10,000 random permutations of those segments to create artificial time series, computed the Fourier spectrum of each, and noted the power level of the strongest peak for each artificial data set. I got a very different result from exactly the same test: 461 out of 10,000 artificial data sets showed a power level at least as high as that observed for the real data.
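Here's my reading of the procedure as a Python sketch, for anyone who wants to check my interpretation. The handling of the segment boundaries is my own assumption; M2011 don't spell it out:

```python
import numpy as np

def upward_crossings(y):
    """Indices where the series crosses the zero line headed upward."""
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1

def max_peak_power(y):
    """Highest power in the mean-normalized Fourier spectrum."""
    p = np.abs(np.fft.rfft(y)) ** 2
    p = p[1:]                       # drop the zero-frequency term
    return (p / p.mean()).max()

def permutation_pvalue(y, n_trials=10000, rng=None):
    """Fraction of segment-permuted series whose strongest spectral
    peak is at least as high as that of the original series."""
    if rng is None:
        rng = np.random.default_rng()
    segments = np.split(y, upward_crossings(y))
    observed = max_peak_power(y)
    count = 0
    for _ in range(n_trials):
        order = rng.permutation(len(segments))
        fake = np.concatenate([segments[i] for i in order])
        if max_peak_power(fake) >= observed:
            count += 1
    return count / n_trials
```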
I believe I’ve interpreted their method correctly, and I’ve double-checked my program for errors. But my result is flatly incompatible with theirs. One of us has made a mistake somewhere. It might be me.
But if any interested reader has the “skillz” and wants to try it, please do. I think it would be a worthwhile effort — just be sure to follow exactly the same procedure.
Even if my result is correct, that only increases the p-value to 4.61%, so it’s still significant at 95.4% confidence. But frankly, I don’t completely trust the Monte Carlo simulation. I don’t completely distrust it either, but I don’t think it’s precise enough to be reliable at a “borderline” confidence level. Here’s why.
The autocorrelation of the AMO data is very strong. Even if you omit the 12-month moving-average filter, the lag-1 autocorrelation is a whopping 0.857! In fact the autocorrelation structure is very much like that of an AR(1) noise process. So, I generated 10,000 artificial data sets of an AR(1) noise process with autoregression coefficient 0.857. Then I applied a 12-month moving-average filter followed by subtracting a 5th-degree polynomial fit, and noted the highest power level in the Fourier spectrum of each.
For each artificial data set, I also split it into segments at each upward crossing of the zero line, randomly permuted the segments, then recorded the highest power level in the Fourier spectrum of the permuted artificial data set.
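Here's a sketch of the surrogate-generation step. The autoregression coefficient follows the text; the preprocessing details (window centering, standardizing time before the polynomial fit) are my own illustrative choices:

```python
import numpy as np

def ar1_series(n, phi=0.857, rng=None):
    """AR(1) noise with autoregression coefficient phi."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.empty(n)
    x[0] = rng.normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    return x

def preprocess(x, t):
    """12-month moving average, then subtract a 5th-degree polynomial fit."""
    s = np.convolve(x, np.ones(12) / 12, mode="valid")
    u = t[: s.size]
    u = (u - u.mean()) / u.std()   # standardize to keep the fit well-conditioned
    return s - np.polyval(np.polyfit(u, s, 5), u)
```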
Here’s the distribution function (not the probability density function or “pdf” but the cumulative distribution function or “cdf”) from the non-permuted (in black) and the permuted (in red) data sets:
The vertical line marks the power level (in my analysis) of the actual AMO data. Here’s a close-up:
Note that the permuted data sets consistently give higher significance levels than the unpermuted. The “significance level” of the actual AMO peak power for the unpermuted data is only 90.5% confidence, but for the permuted data it’s 93.7% confidence. Hence the process of “splicing and dicing” the artificial data reduced the p-value to only 2/3 of its actual value, inflating the significance level of the peak. It seems to me possible, even likely, that the Monte Carlo test used by M2011 likewise inflated the significance level.
If I correctly reproduced their method, then the indicated p-value according to the Monte Carlo test is 4.6% rather than 1.7%. If the Monte Carlo test inflated significance, then it’s really more like 7%, which is only 93% confidence. I quite agree that those are some nontrivial “if”s. I also agree that 93% confidence is still genuine evidence of periodic behavior, although it’s not as strong as reported in M2011.
Incidentally, one can compute the Fourier spectrum without the 12-month moving-average filter (which I regard as wholly unnecessary, and which introduces tremendous complications). There’s still a peak at the given period:
However, when I generated artificial AR(1) noise series with the same autocorrelation but without the 12-month moving-average filter, then used them to define the likelihood of a periodogram peak at the observed power level, the peak is only significant at the 90% confidence level. Again, that’s some evidence of periodic behavior but isn’t nearly as strong as reported in M2011.
Of course, there’s more data for the AMO than used by M2011, since the data linked at the beginning of this post start in 1856 rather than 1950. I detrended the entire data set with a lowess smooth:
Then I Fourier analyzed the entire time span:
The given peak is still present, but it’s still only about 90% significant. We can find the same peak if we use only the data prior to 1950, but it’s weaker, and it’s actually broader:
This suggests that possibly the fluctuation is present but its characteristics are variable. So — time for a wavelet transform!
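For readers who want to experiment, here's a bare-bones Morlet wavelet power computation in numpy. It uses the standard scale-to-period relation for a Morlet wavelet with w0 = 6; the normalization is a crude choice of mine, and this is not the software I actually used:

```python
import numpy as np

def morlet_power(x, periods, dt=1 / 12.0, w0=6.0):
    """Morlet wavelet power at each (period, time), by direct convolution.
    Rows correspond to the requested periods (in years); dt is the
    sampling interval in years (monthly data by default)."""
    n = x.size
    out = np.empty((len(periods), n))
    for i, p in enumerate(periods):
        # Scale giving Fourier period p (standard Morlet relation).
        s = p * (w0 + np.sqrt(2 + w0 ** 2)) / (4 * np.pi)
        # Truncate the wavelet at +/- 4 scales (and at the series length).
        m = min(int(4 * s / dt), (n - 1) // 2)
        tt = np.arange(-m, m + 1) * dt
        wav = np.exp(1j * w0 * tt / s - tt ** 2 / (2 * s ** 2)) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, wav, mode="same")) ** 2
    return out
```

Fed a pure 9-year sinusoid, the power along the 9-year row dwarfs that of neighboring periods, which is the sanity check I'd run first.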
Fluctuation at the given frequency is somewhat persistent, but there are also times when it seems to be largely absent. In addition, both the frequency and the strength seem somewhat variable, as shown by a closeup of the wavelet transform:
Here’s my bottom line: there seems to be a persistent fluctuation with period about 9 years, but both the period and the strength are variable. In fact the period seems to have been a good bit longer (frequency lower) prior to 1900 (although I can’t testify about the data quality from that era). Whether to call it a “period” or a “characteristic time scale” is, I think, still an open question — but on the whole I’d say “periodic” is not a bad description. There’s a good case for periodicity, but it’s not yet proved.