Misalignment

I mentioned in this post that Ed Caryl’s “normalization” procedure was an erroneous way to align temperature station records when their time spans were not coincident. There seems to be some question about the reason why.


You can read Caryl’s description here:


After putting all the stations into one spreadsheet with the total year span in the leftmost column, and each station with its own column, aligned with the correct years, each station column was averaged using the SUM of the column divided by the COUNT of the cells in each column with data. Then the average of all the columns was computed. This number is then the average of all the temperatures in all the stations over the whole time period. Call that the “table” average.

The next step was to “normalize” the data for each station by subtracting the “table” average from each column average. This results in a normalization factor for each column. That normalization factor was then subtracted from each value in that column. The normalization factor will be different for each station.

Caryl’s procedure ensures that each station record, after adjustment, will have the same overall average. But that doesn’t mean they’re properly aligned! The reason is that if station records cover different time spans, then the relevant average (global, regional, whatever) may not be the same for those different time spans. In fact we rather expect that to be the case.
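The procedure quoted above can be sketched in a few lines of Python. This is only a reading of Caryl's description, not his actual spreadsheet: the function name `caryl_normalize`, the dictionary layout, and the two-station toy data are all assumptions made for illustration.

```python
def caryl_normalize(stations):
    """stations maps name -> {year: temperature}; missing years are simply absent."""
    # Column average: SUM of the cells with data divided by their COUNT.
    column_avg = {name: sum(rec.values()) / len(rec) for name, rec in stations.items()}
    # "Table" average: the average of the column averages.
    table_avg = sum(column_avg.values()) / len(column_avg)
    # Normalization factor: column average minus table average.
    factor = {name: column_avg[name] - table_avg for name in stations}
    # Subtract each station's factor from every value in its column.
    return {
        name: {year: temp - factor[name] for year, temp in rec.items()}
        for name, rec in stations.items()
    }


# Hypothetical toy data: two stations rising at the same rate, one shared year.
toy = {"A": {2000: 1.0, 2001: 3.0}, "B": {2001: 10.0, 2002: 12.0}}
normalized = caryl_normalize(toy)
```

Both toy stations end up with the same mean (6.5), yet in the shared year 2001 station A reads 7.5 and station B reads 5.5. Equal averages, but no agreement where the records overlap.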

Allow me to illustrate. Let’s take some artificial data for a hypothetical planet which is warming, consistently and uniformly, at a rate of 2 deg.C/century (0.02 deg.C/yr). We’ll use three stations, the first in the far north, very cold, covering the time span 1900 to 1950. The second is at midlatitudes, covering the time span 1925 to 1975. The third is tropical, covering the time span 1950 to 2000. This is an imaginary planet (not the actual earth), so the temperature records show pure trend with no noise. And here’s the raw data:

Each record shows exactly the same trend — increase at 2 deg.C/century — so clearly this limited data set indicates overall warming also at 2 deg.C/century, which is consistent throughout the century.
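A minimal sketch of such an artificial data set follows. The time spans and the 0.02 deg.C/yr trend come from the description above; the 1900 baseline temperatures, the choice of one value per year, and the helper names `make_record` and `ols_slope` are assumptions, since the actual values behind the graph are not listed.

```python
TREND = 0.02  # deg C per year (2 deg C/century), as stated in the post

# Assumed (baseline in 1900, first year, last year); baselines are made up.
SPANS = {
    "north": (-8.0, 1900, 1950),
    "midlat": (4.0, 1925, 1975),
    "tropical": (10.0, 1950, 2000),
}


def make_record(base, start, end):
    """One noise-free value per year: station baseline plus the uniform trend."""
    return {year: base + TREND * (year - 1900) for year in range(start, end + 1)}


def ols_slope(points):
    """Ordinary least-squares slope of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


stations = {name: make_record(*spec) for name, spec in SPANS.items()}

# Trend of each station on its own, in deg C per century.
station_trends = {
    name: 100 * ols_slope(list(rec.items())) for name, rec in stations.items()
}
```

Each entry of `station_trends` comes out at 2 deg.C/century (up to floating-point rounding), whatever baselines are chosen.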

If we didn’t align the records at all, instead simply estimating the trend from the raw data, we’d get a whopping 25.7 deg.C/century! That would probably be fatal, even on this imaginary planet. But it’s obviously not right — the different stations are at different locations, and the fact that the coldest station reported earliest while the warmest reported last is purely accidental.
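The unaligned calculation can be sketched as a single least-squares fit through all the station values pooled together. Pooling is an assumption about how the trend was estimated, and the baselines below are hypothetical: they were picked so that the tropical station sits 18 deg.C above the northern one, which lands the pooled slope near the figure quoted above.

```python
TREND = 0.02  # deg C per year, as stated in the post

# Assumed (baseline in 1900, first year, last year); baselines are made up.
SPANS = [(-8.0, 1900, 1950), (4.0, 1925, 1975), (10.0, 1950, 2000)]


def ols_slope(points):
    """Ordinary least-squares slope of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Pool every (year, temperature) pair from every station, with no alignment.
pooled = [
    (year, base + TREND * (year - 1900))
    for base, start, end in SPANS
    for year in range(start, end + 1)
]

raw_trend = 100 * ols_slope(pooled)  # deg C per century, about 25.7 here
```

The result depends entirely on the assumed gap between the cold and warm stations, which is the point: the pooled slope measures station geography, not warming.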

Instead, let’s align them by Caryl’s method. This will reset them so that all station records have the same average, and that produces this:

This too is not right, and using artificial noise-free data makes that obvious. The station records should not have the same average value, because the “planet” was not at the same temperature during the different time spans they cover. Incidentally, trend analysis of this misaligned data indicates warming at a mere 0.68 deg.C/century.
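The 0.68 deg.C/century figure can be checked with a short sketch, assuming one value per year and a single least-squares fit through the pooled, adjusted values. The baselines are hypothetical, and they do not affect the answer because the adjustment removes them.

```python
TREND = 0.02  # deg C per year, as stated in the post

# Assumed (baseline in 1900, first year, last year); baselines are made up.
SPANS = [(-8.0, 1900, 1950), (4.0, 1925, 1975), (10.0, 1950, 2000)]


def ols_slope(points):
    """Ordinary least-squares slope of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


records = [
    {year: base + TREND * (year - 1900) for year in range(start, end + 1)}
    for base, start, end in SPANS
]

# Caryl's adjustment: shift every record so its mean equals the table average.
column_avgs = [sum(rec.values()) / len(rec) for rec in records]
table_avg = sum(column_avgs) / len(column_avgs)
adjusted = [
    {year: temp - (avg - table_avg) for year, temp in rec.items()}
    for rec, avg in zip(records, column_avgs)
]

pooled = [(year, temp) for rec in adjusted for year, temp in rec.items()]
caryl_trend = 100 * ols_slope(pooled)  # deg C per century, about 0.68
```

With these spans the fit recovers only about a third of the true 2 deg.C/century.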

The right way is to align them so that different station records have the best match to each other during their period of overlap. Using the “Berkeley method” gives this:

Note that they’re aligned so well that the data points from different stations end up being plotted right on top of each other, since the planet is warming uniformly and there’s no noise in these data. And, just as it should, this aligned data set indicates warming at a rate of 2 deg.C/century.
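The idea of matching records during their overlap can be illustrated with a much simpler stand-in than the method named above. The sketch below is not that algorithm: it just shifts each record, in turn, by its mean difference from a running composite over the shared years. The function name `align_by_overlap` and the baselines are assumptions.

```python
TREND = 0.02  # deg C per year, as stated in the post

# Assumed (baseline in 1900, first year, last year); baselines are made up.
SPANS = [(-8.0, 1900, 1950), (4.0, 1925, 1975), (10.0, 1950, 2000)]


def ols_slope(points):
    """Ordinary least-squares slope of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def align_by_overlap(records):
    """Shift each record to match the running composite over shared years."""
    composite = dict(records[0])  # the first record sets the reference level
    aligned = [dict(records[0])]
    for rec in records[1:]:
        shared = [year for year in rec if year in composite]
        if not shared:
            raise ValueError("record has no overlap with the composite")
        # Mean difference between this record and the composite in the overlap.
        offset = sum(rec[year] - composite[year] for year in shared) / len(shared)
        shifted = {year: temp - offset for year, temp in rec.items()}
        aligned.append(shifted)
        # Extend the composite; years already covered keep their existing value.
        for year, temp in shifted.items():
            composite.setdefault(year, temp)
    return aligned


records = [
    {year: base + TREND * (year - 1900) for year in range(start, end + 1)}
    for base, start, end in SPANS
]
aligned = align_by_overlap(records)
pooled = [(year, temp) for rec in aligned for year, temp in rec.items()]
aligned_trend = 100 * ols_slope(pooled)  # deg C per century
```

On this noise-free data the shifted records coincide wherever they overlap and the pooled trend returns to 2 deg.C/century. Real data with noise would call for a proper simultaneous least-squares fit of all the offsets rather than this one-pass shortcut.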

Caryl’s method will generally tend to suppress trends, whether warming or cooling, biasing different time spans to have the same average value when that may not be the case. It’s only a problem when different data sets don’t cover exactly the same times of observation — but that seems to be a rather ubiquitous condition for real temperature data.
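The suppression works the same way in both directions, which a small sketch can show. It assumes annual, noise-free records and a pooled least-squares fit; `caryl_pooled_trend` is a hypothetical helper. Because the adjustment gives every station the same mean, each adjusted record is just the trend line centred on the midpoint of its own span, so the baselines drop out.

```python
def caryl_pooled_trend(trend, spans):
    """Pooled OLS slope after every station has been forced to the same mean."""
    points = []
    for start, end in spans:
        mid = (start + end) / 2
        # Adjusted record: the trend line centred on this span's midpoint
        # (the common mean is taken as 0, which does not affect the slope).
        points += [(year, trend * (year - mid)) for year in range(start, end + 1)]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


staggered = [(1900, 1950), (1925, 1975), (1950, 2000)]
coincident = [(1900, 2000)] * 3

warming = caryl_pooled_trend(0.02, staggered)     # about 0.0068 deg C/yr
cooling = caryl_pooled_trend(-0.02, staggered)    # about -0.0068 deg C/yr
unbiased = caryl_pooled_trend(0.02, coincident)   # 0.02 deg C/yr
```

For the staggered spans the recovered trend is about 34% of the true one whether the planet warms or cools, while identical spans give the true trend back.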

16 responses to “Misalignment”

  1. An elegant exposition. Thanks.

  2. And the fake skeptics accuse the gubmint of manipulating data?? The fact that you can’t jam multiple stations together regardless of period of record should be intuitively obvious.

  3. It’s ironic that the website where Caryl’s method lies [is ‘lies’ the right word?] is called ‘notrickszone dot com’. Seems that Caryl’s method was intended to mislead the Public.

    The next question is of course who pays him for this?

    • I’m not sure it’s fair to assume it’s intentional. For several generations, now, university lecturers and teachers have been plagued by the ExcelEffect – kids who think because the numbers are in a spreadsheet and it gives out a value, the value must be true…
      I saw a post by another monkey somewhere claiming s/he’d calculated the temp rise as – something like 1.85356853326…. I’m pretty sure if you put a million of these chimps in front of a million spreadsheets for long enough they will eventually calculate π..

      • havinasnus,

        A quick glance at Caryl’s web page shows an advert to the Montford Delusion. Other web pages link to TWATTS, etc. There can be little doubt that the shoddy maths revealed by Tamino are almost certainly:
        a) not accidental
        and
        b) smoke and mirrors.

        Repeated liars do not deserve the benefit of the doubt.

  4. Gavin's Pussycat

    I notice that I cannot comment on this post without becoming abrasively, obnoxiously insulting to mankind in general and this specimen in particular. So I won’t.

  5. Why can’t you just extract the numerical derivative of each timeseries, average, and then re-integrate. Wouldn’t this obviate the need for any of these procedures?

    • Blimey. I read your comment and thought ‘don’t be stupid’. Then I thought ‘whoa’. Then my head started hurting.

      I *think* that would work. (And if you need to deal with annual cycles, you can treat each series as 12 separate series running in parallel). The place it gets clumsy is for series with gaps in them. If you just treat it as a new series at that point you lose a little bit of info. Otherwise … I haven’t quite figured out what goes after ‘otherwise’ yet.

      However, just treating each continuous fragment of record as a separate sequence and using your difference method might provide an easy way for a non-technical spreadsheet user to get a pretty good approximation to the right answer.

    • Mitch,

      I believe you are describing the first difference method. It’s got its problems. The least squares method and its variants are more robust.

      Peterson, T., T. Karl, P. Jamason, R. Knight, and D. Easterling, 1998: First difference method: Maximizing station density for the calculation of long-term global temperature change. J. Geophys. Res., 103, 25 967–25 974.

      Free, Melissa, James K. Angell, Imke Durre, John Lanzante, Thomas C. Peterson, Dian J. Seidel, 2004: Using first differences to reduce inhomogeneity in radiosonde temperature datasets. J. Climate, 17, 4171–4179.
      doi: 10.1175/JCLI3198.1

      • Chad:

        I wasn’t being particular about the way in which the derivative was extracted or the integration performed. But if you do something along these lines it seems to me that it removes the arbitrariness associated with the different procedures Tamino described in the post.

        Kevin C:

        How you deal with the gaps will depend I think on how you choose to extract the derivative. I wasn’t necessarily thinking of simple differences as the way to do it, but if you did do it that way, you could just interpolate values for the missing ones (say as a straight line) and then take the derivative of that. It won’t have any effect on the overall trend.

    • Mitch – one of my long term goals has been to come up with a way someone can do a realistic ITR calculation for themselves in a spreadsheet. I think your comment and Chad’s above, combined with Nick Stokes’s work on picking small sets of long-running stations which uniformly sample the globe, may provide the key to doing this. Thanks both!

    • OK, I tried it. It didn’t work.

      Why not? Most likely reason, I included a bug. However there are a couple of problems with the method:
      1. If you interpolate across gaps, then you are introducing false autocorrelation, and also messing up the annual cycle. OK if you only want a trend, but not otherwise.
      2. If you don’t interpolate across gaps, then any bias in when the gaps appear biasses the results. So, for example, if spring months when delta-t’s are positive are systematically absent, you get a false cooling trend.

  6. Thank you! I’m slapping my head for not getting it earlier, but at least now I understand.

  7. Horatio Algeranon

    You put your column sums in,
    You get your column average out,
    You put your table average in,
    And you shake it all about.

    You do the hokey pokey
    and you turn yourself around
    That’s what it’s all about.

  8. I think you might be an excellent teacher, Tamino. If you had explained what was wrong with Caryl’s method in words, I probably would have failed to understand. But the longer explanation, with graphs and analysis made everything perfectly obvious.

    Thank you.