Decadal Variations and AMO, Part I

In the preceding post we examined criticism of one of the Berkeley team’s papers, Decadal Variations in the Global Atmospheric Land Temperatures (Muller et al. 2011, hereafter referred to as M2011), by a fake skeptic. Now I’ll offer my own critique.

The more I study this paper, the less I like it. There are a few serious problems and some bogus numbers. They’ve taken steps which don’t invalidate the analysis but do make it a helluva lot harder. And there are unanswered questions which are relevant to the central theme. In fact the more I think about it … maybe I was too hard on Doug Keenan? Nah.

I have quite a lot to say about this paper, so I’m going to do so in two posts. In this, the first, I’ll address some of the statistical issues as well as one of the central results.

The paper computes correlation (actually cross-correlation, which allows for a lag between the factors) between land-only temperature and ocean oscillation indexes, including AMO (Atlantic Multidecadal Oscillation), ENSO (El Niño Southern Oscillation), PDO (Pacific Decadal Oscillation), NAO (North Atlantic Oscillation), and AO (Arctic Oscillation). The temperature data they used for this comparison was an average of four data sets: the land-only data from NASA GISS, NOAA/NCDC, and HadCRU, and a Berkeley analysis based on a random sample of data sets which were not used in any of the other temperature estimates. Because of greater variance in data prior to 1950, the analysis is restricted to the time span 1950 through 2010.

As for the statistical analysis, for one thing, there isn’t enough of it — some of the most important questions aren’t addressed at all. Also, in spite of the use of Monte Carlo tests to compensate for the quirks in the data, the standard errors they quote aren’t based on such tests. Instead they’re naive, and simply in error. Further, there are some issues with the Monte Carlo tests themselves. And although it’s by no means forbidden to smooth data before analysis, doing so imposes the severest standard of care and rigor, a standard which I’m beginning to suspect simply hasn’t been met.

M2011 specifically examine variations on time scales from 2 to 15 years. Therefore before analysis, all data were subject to two filtering steps. First, a 12-month moving-average filter was applied. This “smoothing” step effectively acts as a low-pass filter, removing rapid fluctuations but leaving slower ones in place. Then, a 5th-degree polynomial fit was subtracted from the data, to produce a “pre-whitened” data set which was the object of all subsequent analyses. Subtracting the low-order polynomial effectively acts as a high-pass filter, removing the very slow fluctuations. Finally, the time series were normalized so that they would all have the same total variance (i.e., all be on the same “scale”).
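For readers who want to experiment, the filtering steps just described can be sketched in a few lines of Python. This is a minimal illustration and not M2011's actual code: the function name `band_filter`, the use of NumPy, the decision to drop the incomplete windows at the ends of the series, and the choice of unit variance as the common scale are all assumptions made here.

```python
import numpy as np

def band_filter(x, window=12, degree=5):
    """Moving average, then polynomial detrend, then rescale (illustrative only)."""
    x = np.asarray(x, dtype=float)
    # Low-pass step: average over `window` consecutive months.
    # mode="valid" keeps only complete windows, so the output is shorter.
    smooth = np.convolve(x, np.ones(window) / window, mode="valid")
    # High-pass step: subtract a least-squares polynomial of the given degree.
    # Time is scaled to [-1, 1] to keep the fit numerically well behaved.
    t = np.linspace(-1.0, 1.0, smooth.size)
    resid = smooth - np.polynomial.Polynomial.fit(t, smooth, degree)(t)
    # Rescale so every filtered series has the same (unit) variance.
    return resid / resid.std()
```

With 732 monthly values (1950 through 2010) the output has 721 points, because the first and last partial windows are discarded.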

Eliminating slow fluctuations was motivated by a desire to remove the long-term trend due to global warming or other influences. This is important because the time series are very “red” — meaning the variance is dominated by slow fluctuations. If it’s not removed, then it will overwhelm the variations at medium timescales which M2011 want to study. But eliminating fast fluctuations seems to me unnecessary — it really doesn’t interfere with medium range fluctuations — and has the severe drawback of introducing artificial autocorrelation into the data. I would have omitted this step, instead turning the focus onto fluctuations on time scales up to (but not greater than) 15 years.

Perhaps their choice was motivated by the fact that the time series show a significant (but not necessarily large) annual cycle. This is because even those time series which are computed as anomalies (like temperature and AMO) have had the average annual cycle during the baseline period removed. However, the annual cycle itself is subject to fluctuations, so removing its average form during a baseline period leaves a residual annual cycle, which is especially noticeable because the analysis data covers the time period 1950 through 2010, during which the various annual cycles can deviate from their averages during the “baseline period” used for anomaly calculation.

The 12-month moving-average filter eliminates fluctuations with a period of 1 year. I would instead have removed any residual annual cycle directly (say, by fitting a Fourier series) rather than invite the extreme complications due to artificial autocorrelation from the moving-average filter. However, their choice is a valid one and does concord with their intention to study fluctuation in the time scale range 2 to 15 years, although it imposes extreme demands on the statistical analysis — especially as the time series already have strong autocorrelation even before the moving-average step. For some questions addressed by the paper those demands are perhaps met, but for others they’re not.
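As a rough illustration of the alternative mentioned above, a residual annual cycle can be removed by least-squares fitting a short Fourier series and subtracting only the periodic part. This is a sketch under assumptions: the function name `remove_annual_cycle` and the default of two harmonics are hypothetical choices, and a single fixed fit like this only removes the average residual cycle over the whole analysis period.

```python
import numpy as np

def remove_annual_cycle(x, harmonics=2, period=12.0):
    """Subtract a fitted Fourier series with the given period (illustrative only)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    # Design matrix: a constant plus cosine/sine pairs for each harmonic.
    cols = [np.ones(x.size)]
    for k in range(1, harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    # Remove only the periodic terms; the mean level is left untouched.
    return x - design[:, 1:] @ coef[1:]
```

Unlike a moving average, this step does not average neighbouring months together, so it does not add autocorrelation of its own.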

I would also have removed the slow fluctuations by a method other than 5th-degree polynomial fit because polynomials of even modest order can show undesirable behavior near the ends of the time series — I would probably have used a lowess smooth instead. But again, their choice is a valid one.

The main conclusion is that land-only global temperature correlates more strongly with AMO than the other indexes, even ENSO. For correlation of temperature with AMO, M2011 report a peak correlation at lag 0, with value r=0.65 +/- 0.04. The quoted “+/-” value is said to refer to 1 standard error (so a 95% confidence interval, in the normal approximation, would be plus or minus twice that value). However, the quoted value for the standard error is just plain wrong. That’s the value which would apply if the input data series were white noise, but they’re not. They are autocorrelated even before the moving-average filter is applied, and the autocorrelation is even stronger after its application. Even if one omits the moving-average filter (which I think is a much better idea), the standard error according to a crude calculation should be at least twice as large. With the moving-average filter, by a crude calculation the standard error is about 4 times as large, and considering all the uncertainties present (the impact of the moving-average filter, the crudeness of the calculation) it could well be even more than that. It is possible that M2011 simply reported the standard errors which were spit out by whatever program computed the correlations, which would be based on a white-noise model. Regardless, all the quoted standard errors for correlations are simply bogus, far too low.
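To see how autocorrelation inflates the uncertainty of a correlation, here is one crude correction of this general kind. It is not necessarily the calculation behind the factors quoted above. It assumes both series behave like AR(1) noise, uses the common approximation n_eff = n(1 - r1*r2)/(1 + r1*r2) with r1 and r2 the lag-1 autocorrelations, and uses the large-sample standard error (1 - r^2)/sqrt(n). The function names are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def corr_standard_errors(x, y):
    """Return (r, white-noise SE, AR(1)-adjusted SE); a rough sketch only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    # Standard error if every data point were independent (white noise).
    naive = (1.0 - r**2) / np.sqrt(n)
    # Effective number of independent points under the AR(1) assumption.
    rho = lag1_autocorr(x) * lag1_autocorr(y)
    n_eff = n * (1.0 - rho) / (1.0 + rho)
    adjusted = (1.0 - r**2) / np.sqrt(n_eff)
    return r, naive, adjusted
```

When both series have positive lag-1 autocorrelation, `n_eff` is smaller than `n` and the adjusted standard error is larger than the white-noise one.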

Fortunately they don’t rely on those values for statistical tests. Unfortunately, they don’t actually test the main claim, that the AMO correlation is stronger than the ENSO correlation. Therefore the primary result of M2011 is not demonstrated by their analysis.

They do directly test whether or not the AMO correlation is statistically significant. This is done by Monte Carlo simulations, i.e., generating multiple artificial AMO signals with the same basic properties in order to see how they correlate with the temperature data. To get artificial signals with the same properties as the actual (smoothed) AMO data, they note that there are 16 times at which the smoothed AMO signal crosses the zero line headed upward. They split the AMO data at these 16 points, which creates 17 segments of data. These 17 segments are then “scrambled,” i.e., arranged into a random order, to generate artificial data with behavior (such as autocorrelation) which is like that of the original smoothed AMO, but is not the same as the original smoothed AMO.

On the face of it this seems like a sound procedure, especially since it would seem to guarantee that where different segments are joined together they will both represent an upward crossing of the zero line, so the scrambled signal should maintain “continuity” and be a realistic “fake” smoothed AMO. But that’s not quite true, because the two endpoint segments are incomplete: they’re not full “one upward zero crossing to the next” sections. Here, for instance, is one such “scrambling” of the AMO series (to which I’ve applied the same 12-month moving average filter and 5th-degree polynomial filter, but I didn’t bother to normalize the series, which makes no difference for this particular point):

I’ve marked with red arrows where section 17 ends, and where section 1 begins, because at those points there’s an unreal discontinuity in the artificial signal. Perhaps a better idea would have been to keep the first and last sections in place, and randomly scramble the 15 sections in between.
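The scrambling procedure, along with the alternative of pinning the two incomplete end sections, can be sketched as follows. This is an illustration under assumptions rather than M2011's implementation: the function names and the `keep_ends` option are hypothetical, and the significance test here uses only the lag-0 correlation.

```python
import numpy as np

def upward_crossings(x):
    """Indices i where the series goes from below zero to zero or above."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0)) + 1

def scramble_segments(x, rng, keep_ends=False):
    """Cut at upward zero crossings and shuffle the pieces."""
    x = np.asarray(x, dtype=float)
    segments = np.split(x, upward_crossings(x))
    m = len(segments)
    if keep_ends:
        if m < 3:
            return x.copy()  # nothing in the middle to shuffle
        # Leave the two incomplete end sections where they are.
        middle = rng.permutation(np.arange(1, m - 1))
        order = np.concatenate(([0], middle, [m - 1]))
    else:
        order = rng.permutation(m)
    return np.concatenate([segments[i] for i in order])

def monte_carlo_pvalue(index, temp, n_sim=1000, seed=0, keep_ends=False):
    """Fraction of scrambled indexes correlating at least as strongly as the real one."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(index, temp)[0, 1])
    hits = 0
    for _ in range(n_sim):
        fake = scramble_segments(index, rng, keep_ends=keep_ends)
        hits += abs(np.corrcoef(fake, temp)[0, 1]) >= observed
    # Add-one form keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_sim + 1)
```

Sixteen upward crossings give seventeen pieces from `np.split`, matching the count in the text. With `keep_ends=True` only the fifteen complete sections are permuted.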

I have another misgiving about their Monte Carlo procedure. Because of the combination of strong autocorrelation in the original data, and additional autocorrelation introduced by the moving-average filter, the autocorrelation of the smoothed AMO data persists to large lags. Therefore scrambling the “one upward zero crossing to the next” sections might alter the autocorrelation structure significantly at lags for which it still has an impact on the analysis results. I haven’t done the analysis to prove this, but my intuition tells me that the scrambling process is courting trouble.

And it’s all because the autocorrelation is made so strong, and so persistent, by the moving-average filter. Clearly M2011 are aware of the fact that a simple white-noise test is insufficient, and it’s true that Monte Carlo tests can overcome a great many severe difficulties without even having to know what the noise structure is. But I suspect that M2011 didn’t really appreciate just how severe the problems become with the application of their moving-average filter, and may not have invested careful enough thought into designing tests which would be proof against the severity of the problem.

In spite of the difficulties with the Monte Carlo tests, I don’t think they’re hampered so much as to be useless for testing the significance of the AMO correlation. This is especially true as the indicated p-value from their tests is less than 0.000001, which would be significant even if it were too low by four orders of magnitude! Therefore I accept the result that the correlation with AMO is statistically significant.

But that doesn’t establish that the AMO correlation is stronger than the ENSO correlation. It’s like testing two coins to determine whether they’re “fair” (i.e., have equal chances of landing heads or tails). Suppose we flip each one 100 times. Coin “A” gives 62 heads and 38 tails, which rejects the null hypothesis (fair coin) with statistical significance. Coin “B” gives 58 heads and 42 tails, which fails to reject the null hypothesis. But that does not establish (with statistical significance) that coin “A” has a higher “heads” probability than coin “B”. They could very well both have a 60%-40% chance of heads over tails.
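The coin example is easy to check numerically. The sketch below uses an exact two-sided binomial test for each coin and a pooled two-proportion z-test for the comparison between them. The function names are hypothetical and the z-test is just one reasonable choice for the comparison.

```python
from math import comb, erfc, sqrt

def binom_two_sided_p(heads, n):
    """Exact two-sided p-value against a fair coin (doubles the larger tail)."""
    k = max(heads, n - heads)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2.0 * tail)

def two_proportion_p(h1, n1, h2, n2):
    """Two-sided p-value for 'both coins have the same heads probability'."""
    pooled = (h1 + h2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (h1 / n1 - h2 / n2) / se
    return erfc(abs(z) / sqrt(2))  # normal approximation

p_a = binom_two_sided_p(62, 100)            # coin A against "fair"
p_b = binom_two_sided_p(58, 100)            # coin B against "fair"
p_diff = two_proportion_p(62, 100, 58, 100) # coin A against coin B
print(p_a, p_b, p_diff)
```

The three p-values come out at roughly 0.02, 0.13, and 0.56: coin A is significantly unfair, coin B is not, and the difference between them is nowhere near significant.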

M2011 don’t give a graph of the cross-correlation function (CCF) between AMO (or other indexes) and their average temperature record, but they do graph AMO correlation with each component temperature record. Here, for instance, is their graph of the CCF for AMO and the various temperature records:

It’s worth noting that the various CCF’s don’t all peak at lag 0. An extreme closeup shows that for some (most?) of the temperature records, the peak happens at a very small negative lag (click any of the graphs for a larger, clearer view):

I don’t have their Berkeley temperature record, but I filtered the AMO and the land-only data from NASA GISS by their method and computed the CCF, giving this (my plot covers a smaller range of lags, since I already know from their analysis about where the peak lies):

For the GISS temperature data, the peak correlation is at lag -1 months, meaning that temperature leads AMO rather than lags it. This argues against causality from AMO to temperature, but only very (very!) weakly since the difference between the lag 0 correlation and lag -1 correlation is so tiny.
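For anyone wishing to repeat this kind of lag analysis, a lagged correlation can be computed directly as in the sketch below. The function name and the `max_lag` default are hypothetical. The sign convention is chosen to match the one used here: a negative lag means temperature leads the index.

```python
import numpy as np

def cross_correlation(index, temp, max_lag=24):
    """Correlate index[t] with temp[t + lag] for each lag in [-max_lag, max_lag]."""
    a = np.asarray(index, dtype=float)
    b = np.asarray(temp, dtype=float)
    n = a.size
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.empty(lags.size)
    for j, k in enumerate(lags):
        if k >= 0:
            u, v = a[: n - k], b[k:]   # temperature k months after the index
        else:
            u, v = a[-k:], b[: n + k]  # temperature |k| months before the index
        ccf[j] = np.corrcoef(u, v)[0, 1]
    return lags, ccf
```

The lag of peak correlation is then `lags[np.argmax(ccf)]`. Both inputs should already have been filtered the same way before being passed in.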

I did the same computation, but without the 12-month moving-average filter:

Now the peak correlation is at lag -2 months (again temperature leads AMO) and the difference from the lag 0 correlation is larger. I think this suggests two things. First, it’s yet another reason it may have been better to omit the moving-average filter. Second, the argument against causality from AMO to temperature is stronger. It’s still very weak — but based only on the time series, the argument for causality is even weaker.

The correlation between ENSO and temperature is not as strong as that between AMO and temperature. But it does peak at a realistic lag for a causal relationship, as temperature lags ENSO by 5 months:

For this computation I used the MEI (Multivariate ENSO Index) to characterize ENSO, rather than the Nino 3.4 index used by M2011. Even so, using their choice doesn’t alter the result substantively: the correlation is weaker than with AMO (probably!), but the lag is causally realistic.

I also computed the CCF between ENSO and AMO. The peak correlation is at lag 8 months and is much stronger than the lag 0 correlation, so if the relationship is causal then it’s ENSO driving AMO rather than the other way around:

Given all these considerations, I find implausible the speculation of M2011 that:

However, it is also interesting to consider whether oceanic changes in the AMO may be driving short-term fluctuations in land surface temperature.

I agree that it’s interesting to consider! But I consider it to be implausible. It’s hardly impossible, but it seems more likely to me that the correlation of AMO with land-only temperature reflects a common cause rather than causality from AMO to temperature.

There are other issues which I’d like to discuss, so there’s lots more to say. But this post is already long enough — so those will appear in Part II (coming soon).

21 responses to “Decadal Variations and AMO, Part I”

  1. I don’t think you can use just M2011 – more like M2011Var, M2011Avg, M2011Qty and M2011UHI :)

  2. You can find the Berkeley Data here:

    I think the answers to these questions are likely impacted by the form of AMO they use. i.e. Trenberth has a version, as do NOAA and so does another person. They can be found on the KNMI climate explorer… The issue is that I think BEST used NOAAs version which is just detrended SSTs and likely has the anthropogenic signal inside it. This could significantly impact the analysis. A better way of calculating it would be to subtract the North Atlantic SSTs by the rest of the Atlantic SSTs or something similar.

  3. much appreciated Tamino, AMO is Muller’s pet theory ?

  4. Alexander Johannesen

    Have I told you lately that I (platonically) love you? Great analysis, and highly educational, eagerly awaiting part II.

  5. David B. Benson

    Good. There are plenty of papers demonstrating the effect of ENSO on climate (weather?). There are plenty of reasons to doubt that the standard AMO product represents much of anything of interest. Several authors have abandoned it for their own variant based on the North Atlantic gridded data.

  6. Perhaps these errors are part of the Koch-funded plan. They know they can’t really poke holes in the temperature record, so they’ve come out with a study that confirms it, but that is so poorly done that scientists and statisticians will have to poke holes in it for poor procedure. And then they can claim “scientists say study confirming warming is not valid!”.

    [Response: The decadal-variations analysis doesn’t reflect on the quality of the Berkeley temperature reconstruction itself, which would depend on the paper about their averaging process (lead author Robert Rohde).]

  7. I skimmed through the paper and couldn’t find what definition of the AMO they used.

    I think normally a linear warming trend is subtracted from Atlantic SSTs (1880-2010 or so) to get the AMO index. Might produce a rather big slope in the last few decades.

    [Response: Indeed they don’t detail the definition, but their references give a link to the old “standard” definition (linearly detrended) data at

    I assumed that’s what they used. It also matches the graph they display.]

  8. Tamino, All your points about autocorrelation are exactly what I thought, on my first read. We didn’t go into these details in our RC post, so I’m glad you’ve done this. As you wrote, “I accept the result that the correlation with AMO is statistically significant. But that doesn’t establish that the AMO correlation is stronger than the ENSO correlation.” Exactly.

  9. Great display of the way science works.

    The BEST team have several enormous names attached to the project. And yet, the actual work is combed for flaws.


  10. Dunno if this is one of the ones Robert was referring to above, but the AMO-Atl index by Guan and Nigam (2009) is interesting in this context because it directly excludes Atlantic temperatures that are driven by Pacific temperature patterns such as ENSO.

    Guan, B., and S. Nigam, 2009: Analysis of Atlantic SST variability factoring interbasin links and the secular trend: Clarified structure of the Atlantic Multidecadal Oscillation. J. Climate, 22, 4228-4240, doi:10.1175/2009JCLI2921.1

  11. Timothy (likes zebras)

    These issues sound like the sort of thing that would – hopefully – get picked up on during peer review. Perhaps the final paper will address your comments?

  12. I don’t have your stats chops so took their results as given, but I did notice that AMO tended to lag land surface temperatures. As I said in the other thread, the AMO-Tland correlation is very strong in the early-90s when temperatures were dominated by the effects of the Pinatubo eruption, which also suggests a non-causal interpretation. I agree that their speculation/conclusion concerning AMO is unjustified.

    On the other hand, the lag statistics are averages are they not? So it’s statistically feasible that some, but not most, of the fluctuation could be driven by some internal changes that are being picked up in the AMO index?

    I was curious about this passage from Muller’s paper: “ENSO is locally a more intense effect, but it is also a more complex one giving rise to both correlated and anti-correlated behavior. By contrast, the AMO map shows positive (or neutral) correlation nearly everywhere.”

    Isn’t the description of ENSO here exactly the behaviour you would expect from a driver of climate variability with a localised source? And the description of AMO what you would expect from regional temperatures which are simply following the general global trend?

  13. Horatio Algeranon

    Perhaps a little off topic, but from the same paper:

    What’s with the very large peak in the BEST temperature curve around 1968 in Fig 1?

    Between 1968 and 1970, BEST indicates a change of about -0.8C when the other records only show a change of between -0.3C and -0.4C.

    It is possible that the large peak in 1968 is purely a result of the random procedure used by BEST to select the sites used to generate the record (they say 2000 sites are selected randomly from among 30,964 stations worldwide), but if that is the case, to remain in the final average, most of the (randomly) selected sites would have to share the uncommonly large peak for the year 1968.

    There was an El Nino in 1968-1969 but it was relatively weak (followed by a moderate La Nina in 1970) and the BEST peak shown in Fig 1 for 1972 (which was actually a strong El Nino year followed by a strong La Nina) is less than half the height of the peak in 1968.

  14. OK, the only clear definition I’ve seen of AMO defines it as the detrended Sea Surface Temperature anomaly of the North Atlantic Ocean (between 7.5 and 75 deg W and 0 and 60 deg N). I understand that this definition is perhaps a bit out of date, but I haven’t been able to find anything better.

    I crudely estimate this region to be on the order of 8% of the entire surface of the Earth. As such, I can’t understand why anyone should be surprised that the AMO should be well correlated with the residual detrended signal of the global surface temperature anomaly.

    If I was to define a “Global Multidecadal Oscillation” as the detrended surface temperature anomaly of the global surface temperature anomaly, it would be ridiculous to act surprised that this GMO could explain 100% of the variation in the detrended global mean surface temperature signal.

    So why does anyone act surprised that the temperature anomaly of 8% of the surface of the Earth is well correlated with the temperature anomaly of 100% of the Earth?

    Or am I missing something fundamental here?

    [Response: No. You’re hitting something fundamental there.]

    • This comment offers a lot of clarity where typically there is not much of it–and I hasten to add that the set I am referring to is “discussions of AMO,” not “comments by Ernst!” ;-)


    • Also, wouldn’t this apparent 1-2 month lag between land temperature and AMO be equally unsurprising, given that AMO is measuring the temperature of an area covered in water, which has a relatively high thermal inertia compared to the land? If the land is responding to the same forcings as the ocean (roughly), then the ocean temperature’s response will be damped and delayed relative to the land. So it would seem that there is a lag built into the physics, just not a very big one, and not in the direction Muller was gunning for.

  15. It’s also worth pointing out that the AMO record is linearly detrended over a century-long period, and thus will still contain quite a bit of forced response over the past 50 years (since forcings have not been linearly increasing over the century). I examined this in detail a while back:

  16. Horatio Algeranon

    For the GISS temperature data, the peak correlation is at lag -1 months, meaning that temperature leads AMO rather than lags it. This argues against causality from AMO to temperature, but only very (very!) weakly since the difference between the lag 0 correlation and lag -1 correlation is so tiny.

    Given the uncertainty in the correlation, doesn’t the mere fact that the correlation peak is “nearly” centered on lag zero simply mean that one can not conclude with any confidence that there is a causal relationship either way?

    In other words, the correlation uncertainty translates into a lag uncertainty +- which also includes 0 lag. So, even if AMO appeared to lead temp by 1 month it would not really change anything with regard to conclusions about possible causation.

    You state that BEST has underestimated the uncertainty and inflated the cross correlation — and also apparently “distorted” the cross correlation graph a bit– but even using their estimate for 2-sigma (~0.08 where 1 standard error is 0.04) with their cross correlation graph would seem to allow for the possibility of a lag of about +- 3 months.
    ie, that either temp could lead, temp could follow or neither lags the other.

    Or perhaps this is not the correct way of looking at this?

    Also, since there always has to be some lag between an “effect” and its “cause”, any guess about the minimum for this case?

    In other words, in your opinion, given the thermal masses involved (particularly of water), would it be plausible that a short lag (one month, say) could be sufficient for a cause-effect relationship (in either direction)?

    BEST specifically states that “correlation does not imply causation” but the sentence you quote raising the “possibility” that AMO might influence land surface temps over the short term coupled with their statement that “it is possible that some of the land warming is a direct response to changes in the AMO” along with their comments about what they say is a 70 year cycle in AMO would seem to imply that they are “raising the possibility” that AMO may have contributed to global warming over the past century.
    In other words, they seem to be using the following argument: given AMO is “so closely” (their words) linked to world land temps over the short (2-15 year) term, perhaps it has influenced global temp over the long term and hence accounts for some of what has been attributed to fossil fuel burning.

    Or perhaps this is a misinterpretation of what BEST has said?

    Finally, it is interesting that their “so closely linked” comment is based on their 0.65 correlation when you have shown that this has been inflated by the moving average and that the correlation is probably significantly lower (~0.4). They make a point in their paper that the Monte Carlo simulation with the artificial AMO never gave a value above 0.49 for the correlation, “substantially less than the value of 0.65 obtained with the real AMO”. But a value of 0.4 would at least fall below the max value given by the Monte Carlo tests.

    • It might be an interesting exercise to compute an AMO-like index for random patches of similar extent (~8% of the globe, per Ernst K) and then do the same correlation analysis. It seems like a good bet that the North Atlantic is not the only area that’s “so closely linked” to global land temperatures.

      The risk of doing this, I suppose, would be that when the obviousness becomes obvious, the fake skeptics might adopt this new analysis wholeheartedly. “It has now become clear that the Central European Oscillation is what’s really driving global climate!” It would certainly fit the pattern: first as farce, then as farce again, and repeat until you finally arrive at tragedy.

  17. Using anomaly data detrended by a 2nd order polynomial fit I get the following results:

    – AMO lags NINO3.4 by 17-19 months
    – CRUTEM3 lags NINO3.4 by 3-4 months
    – AMO lags CRUTEM3 by 9-12 months

    The lags don’t quite add up, but close enough. It just reinforces that AMO lags land temperatures.

    Here’s the turnkey R source code:

  18. Tamino,
    I think that if I’ve got my code right then your results regarding the lag of AMO vs BEST depend on the version of AMO you use. I could have done something wrong, but using versions based on Trenberth’s AMO and the other version on climate explorer (and only analyzing from 1880 to 2009) I find the AMO leading temperature both with and without detrending. I will admit I only did linear detrending, but I think the point should be that using the old AMO version that others have argued against is probably inappropriate for this type of analysis on the part of Muller et al., and that your replication may have different results with different AMO versions.