Some people ask “Why use temperature *anomaly* rather than just temperature?” because they are curious, maybe even confused about it, and want to learn. Some climate deniers claim that doing so is a mistake which invalidates trend analysis, in part because they’re stupid, in part because they’re members of the “pompous ass” club.

For the sake of those who really want to know: why do we use anomaly? What exactly is anomaly, anyway?

Suppose we want to know whether or not the climate in the state of Maine is getting hotter, colder, or showing no significant change. We might study monthly average temperatures from 1895 to the present (data available from NOAA, the National Oceanic and Atmospheric Administration). Here’s a graph of that data:

I’ve added a thick red line, which is an estimate of the linear trend using least squares regression. But the most obvious feature isn’t any trend, it’s the constant up-and-down seasonal cycle. In most parts of the earth winter is colder, summer is hotter, and Maine is no exception.

There are two interesting things about the trend estimate. First, it suggests an increase (getting hotter overall) at a rate of 2.6°F per century. Second, that result *fails* statistical significance tests (at the “de facto” standard 95% confidence). Conclusion: maybe it’s getting hotter at a pretty rapid pace — or maybe it’s not really changing at all, there’s just random fluctuation making it *seem* so!
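To make the trend estimate concrete, here's a minimal sketch of the least-squares fit using *synthetic* data standing in for Maine's monthly means (the real series comes from NOAA): a big seasonal cycle, a 2.6°F/century trend, and random noise. The standard error here assumes white noise; a proper significance test on real climate data also has to account for autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 124 years of monthly means, 1895-2018:
# seasonal swing of ~40°F, a 2.6°F/century trend, plus noise.
n_months = 124 * 12
t = 1895 + np.arange(n_months) / 12.0
temps = (40.0
         - 20.0 * np.cos(2 * np.pi * (np.arange(n_months) % 12) / 12.0)
         + 2.6 * (t - 1895) / 100.0
         + rng.normal(0.0, 3.0, n_months))

def ols_trend(t, y):
    """Least-squares slope with its standard error (white-noise assumption)."""
    tc = t - t.mean()
    slope = np.sum(tc * (y - y.mean())) / np.sum(tc * tc)
    resid = (y - y.mean()) - slope * tc
    se = np.sqrt(np.sum(resid ** 2) / (y.size - 2) / np.sum(tc * tc))
    return slope, se

slope, se = ols_trend(t, temps)
print(f"trend = {100 * slope:.2f} °F/century, standard error = {100 * se:.2f}")
```

Notice how large the standard error is relative to the slope when the seasonal cycle is still sitting in the residuals: that's exactly why the raw-data trend struggles to clear significance.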

Let’s try something different: let’s use just the data for the 124 Januarys in the data set. That looks like this:

Again I’ve included a linear trend estimate, but the thing is, this time it *passes* statistical significance tests. Not just at 95% confidence, at 99.5% confidence no less! Hmmmmm…

We could try Februarys, or Marches, etc. etc. We can even try ’em all! It turns out that warming passes statistical significance, not just for January, but for each and every one of the 12 months of the year. But … but … how can that be? How can Maine’s climate not really be getting hotter, when each and every one of the 12 months of the year is getting hotter?

Answer: the seasonal cycle is so big that it swamps the trend. It doesn’t make the trend disappear, it just dominates the trend *estimate* when you don’t take that seasonal cycle into account.

One way to do so, is to look at individual months separately as we’ve just done. Another way is to transform monthly averages to *yearly* averages. Those look like this:

Whoa! What’s going on with that final data value, for the year 2018? It sure seems out of place.

And it is. That’s because 2018 isn’t over yet; we only have data for the first six months. In Maine, the first six months of the year tend to be colder than the last six months of the year, global warming or no, so of course the 2018 value is way too cold because it doesn’t represent a complete year, just the colder half.

We really should omit incomplete years from our annual averages, giving us this:
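Dropping incomplete years is a one-liner. Here's a sketch with made-up monthly numbers mimicking the post's situation (not the real NOAA values), where 2018 stops at June:

```python
# Toy monthly means for a few years, with 2018 truncated at June.
monthly = {
    2015: [10, 12, 25, 40, 52, 62, 68, 66, 57, 45, 33, 18],
    2016: [11, 13, 26, 41, 53, 63, 69, 67, 58, 46, 34, 19],
    2017: [12, 14, 27, 42, 54, 64, 70, 68, 59, 47, 35, 20],
    2018: [13, 15, 28, 43, 55, 65],   # only January-June so far
}

def annual_means(data):
    """Average each year's months, skipping any year with fewer than 12 values."""
    return {yr: sum(vals) / len(vals) for yr, vals in data.items() if len(vals) == 12}

means = annual_means(monthly)
# 2018 is dropped rather than reported as spuriously cold.
```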

Now we have a sensible result. We also have statistically significant warming, not just at 95% confidence, not just at 99.5% confidence, but at 99.9999999999% confidence. Yes, it’s getting hotter.

But … but … gosh darn it, I want that 2018 data! I want monthly data! Can’t I do it without having to resort to yearly averages, or having to do 12 separate analyses, one for each month? Why, yes I can.

Here’s the heart of the matter: we don’t need to test whether July is really hotter than January, *we already know that*. What we really want to know is: is *this* July hotter than the *average* July? Is this January hotter than the average January? February, March, etc., rinse, repeat.

To do so, let’s take each value for each month and subtract the average value *for that same time of year* (i.e. for the same month). This will give us *anomaly* values. Positive values indicate temperatures hotter than the average (for that month), negative values are colder than average.

Of course we have to decide how to define when things were “average.” The usual choice is to select a *baseline* period. For global temperature, NASA uses the period from 1951 to 1980 as their baseline, while HadCRU uses the period from 1961 through 1990. I’ll go with NASA’s baseline, and define “average” as the average for each month from 1951 through 1980. Then I’ll subtract those values from the monthly temperatures to compute monthly values of temperature anomaly, which gives this:
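The whole procedure fits in a few lines. This sketch uses synthetic monthly data (illustrative, not the NOAA Maine series), computes per-month baseline averages over 1951-1980 following NASA's convention, and subtracts them:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic monthly series, 1895-2018: seasonal cycle + slow warming + noise.
idx = np.arange(124 * 12)
year = 1895 + idx / 12.0
temps = (40.0
         - 20.0 * np.cos(2 * np.pi * (idx % 12) / 12.0)  # seasonal cycle
         + 2.6 * (year - 1895) / 100.0                   # slow warming
         + rng.normal(0.0, 3.0, idx.size))

# Baseline: the average for each calendar month over 1951-1980.
in_base = (year >= 1951) & (year < 1981)
baseline = np.array([temps[in_base & (idx % 12 == m)].mean() for m in range(12)])

# Anomaly = each value minus the baseline average for that same month.
anomaly = temps - baseline[idx % 12]
```

By construction the anomalies average to zero over the baseline period, and their scatter is far smaller than the raw data's because the seasonal cycle is gone, leaving the trend much easier to see.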

The estimated trend rate using linear regression (the red line) is 2.6°F per century, and the statistical significance is 99.9999999999% confidence.

The fact is — and yes, it’s a fact — that the seasonal cycle doesn’t affect the trend, it just interferes with our ability to measure the trend. By using anomaly values we eliminate the seasonal cycle without interfering with the trend. We also get to use monthly data, we don’t have to leave out the last six months just because the year-so-far is incomplete. All of which makes trend analysis so much stronger, we’d be fools not to do so.

But wait! There’s more! Using anomalies doesn’t just help us get the seasonal cycle out of the way. It helps us compare and combine different locations in a sensible way. After all, we don’t need to test whether downtown Portland, Maine is hotter than the top of Mount Katahdin — *we already know that*. What we really want to know is, has Mt. Katahdin warmed faster or slower than Portland? For that comparison, anomalies are tailor-made.

Or suppose we want to form an average for Maine based on those two locations, but Portland and Mt. Katahdin don’t have data for every month we’re interested in, some are missing — different missing months for different locations. If we just average raw temperature, then whenever Portland data are there but Katahdin’s are not, the average will be hotter just because we’re leaving out the coldest location; when we have Katahdin data but not Portland, the average will be colder because it’s based on one of Maine’s coldest spots. But if we use anomaly values, we not only eliminate the changes due to the seasons, we eliminate the changes due to location. Then we can safely average values — missing values will make the averages less precise, but won’t make them downright wrong.
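You can see the effect directly with two hypothetical stations (made-up numbers): a warm "Portland" around 45°F and a cold "Katahdin" around 25°F that share the same month-to-month anomaly, each with different months missing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical stations sharing the same anomaly signal.
n = 120
signal = rng.normal(0.0, 1.0, n)
portland = 45.0 + signal   # warm site
katahdin = 25.0 + signal   # cold site

# Different months are missing at each station (never both at once here).
p_miss = np.zeros(n, dtype=bool); p_miss[::10] = True
k_miss = np.zeros(n, dtype=bool); k_miss[5::10] = True

def combine(a, b, a_miss, b_miss):
    """Average whichever station values are present each month."""
    out = np.empty(len(a))
    for i in range(len(a)):
        vals = [x for x, miss in ((a[i], a_miss[i]), (b[i], b_miss[i])) if not miss]
        out[i] = np.mean(vals)
    return out

raw_avg = combine(portland, katahdin, p_miss, k_miss)
anom_avg = combine(portland - 45.0, katahdin - 25.0, p_miss, k_miss)
# raw_avg lurches by ±10°F whenever one station drops out;
# anom_avg tracks the shared signal no matter which station is missing.
```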

The fact is — and yes, it’s a fact — that using anomaly values helps eliminate many things that are irrelevant to climate *change*, and that makes us able to measure climate change with far more correctness and precision. Anyone who tells you different, is just plain wrong.

Now for the *very* interesting part. The analysis I’ve given is basically correct, but there are nuances involved. I won’t go into those here (don’t worry, they don’t invalidate the results) since, after all, this is a post about the very basic and very correct process of using temperature anomaly. Those who sincerely want to learn can find many resources; those who make bold assertions on topics they don’t understand, will probably never learn.

This blog is made possible by readers like you; join others by donating at My Wee Dragon.

Good basic explanation, showing the results every step of the way. Applause.

Very clear and engaging treatment. Thanx much!

Very interesting about looking at particular months and anomalies.

I really trust the U.S. Climate Reference Network (USCRN).

“Three independent measurements of temperature .. The stations are placed in pristine environments expected to be free of development for many decades.”

Looking at the most recent month of June, it shows 2006 and 2012 were hotter than this year 2018.

https://www.ncdc.noaa.gov/temp-and-precip/national-temperature-index/time-series?datasets%5B%5D=uscrn&parameter=anom-tavg&time_scale=12mo&begyear=1895&endyear=2018&month=6

USCRN is great. But it has two key limitations: 1) small geographic area (the US is only 2 percent of global area), and 2) limited timescale (13 years). So we can look at two other metrics:

1) extending the timescale by overlaying ClimDiv and USCRN and looking back to 1895: https://www.ncdc.noaa.gov/temp-and-precip/national-temperature-index/time-series?datasets%5B%5D=uscrn&datasets%5B%5D=climdiv&parameter=anom-tavg&time_scale=12mo&begyear=1895&endyear=2018&month=6

2) Looking globally and back to 1895: http://www.ysbl.york.ac.uk/~cowtan/applets/trend/trend.html

I’m not sure exactly what point you were trying to make, but these two analyses show long-term and global patterns that are more reflective of climate change rather than short-term weather (e.g., ENSO and other variability).

April 2018 (on a one month timescale) was the coldest ever for the USCRN (2005-2018)

https://www.ncdc.noaa.gov/temp-and-precip/national-temperature-index/time-series?datasets%5B%5D=uscrn&parameter=anom-tavg&time_scale=1mo&begyear=1895&endyear=2018&month=4

Cherry picking ?

[Response: Tell you what: make the same graph, but for May instead of April. Then you tell me whether it’s cherry-picking or not.]

And May 2018 is the hottest. And the NCDC page helpfully includes a graph, so you can see how much variation there is, and thus how large the random component is compared to the trend.

You appear to have accidentally hit the question mark key after correctly identifying your comment as cherry picking.

Tamino, you don’t waste any time coming up with a comment, once you say you will. Thank you for expanding what I already learned.

Mole Whackers Anonymous rates this blog post 5 moles, its highest rating.

“…using anomaly values helps eliminate many things that are irrelevant to climate change…”

Which is why ‘skeptic’ blog articles and comments are full of people bemoaning the use of anomalies.

Really good point about how the seasonal cycle in absolute temperatures from specific locations or regions skews the trend estimate. Slightly ashamed to say I hadn’t thought of that before. Thanks.

A very good way to inform people who are not so familiar with data treatment.

Now if only we could get one Mr. Watts to read this article for comprehension… nah, not going to happen.

well done, sir

This post has left out the other reason to use anomalies rather than averages: because averages have much higher uncertainty than anomalies.

Suppose that in a certain area we have 3 temperature stations (X, Y, Z) giving us average July temperatures of X=28.3°C, Y=26.7°C and Z=18.0°C over the time period 2010-2018. What’s the average temperature in this area? The correct answer is that you have almost no idea. Perhaps station Z is on a mountain peak and it’s the only mountain in the area. Or perhaps the area is filled with mountains, but the two other stations happen to sit in a small low-lying plain because the mountains are hard to access, so we’ve only bothered to put one temperature station up on those slopes.

Clearly, there’s not enough data to directly estimate the true average temperature. (We could use statistical records from weather or climate models to estimate the missing data, but the deniers would simply complain about “models” instead of “anomalies”.)

But let’s say we have July temperatures averaged over 1970 to 1979 of X=27.3°C, Y=25.8°C and Z=16.9°C. This gives us anomalies for 2010-2018 of ΔX = 1.0°C, ΔY = 0.9°C and ΔZ = 1.1°C. Temperature _anomalies_ are much more consistent across stations than _absolute_ temperatures, and so we have far less uncertainty about the anomaly. When reporting anomalies we can therefore use much smaller error bars.
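The arithmetic in that example can be checked in a few lines; the spread across stations collapses once you switch from absolute temperatures to anomalies:

```python
import statistics

# The numbers from the comment above: July means at three stations.
recent = {"X": 28.3, "Y": 26.7, "Z": 18.0}   # 2010-2018 averages
base   = {"X": 27.3, "Y": 25.8, "Z": 16.9}   # 1970-1979 averages

anomalies = [recent[s] - base[s] for s in recent]      # ~[1.0, 0.9, 1.1]
spread_absolute = statistics.pstdev(recent.values())   # several °C
spread_anomaly = statistics.pstdev(anomalies)          # about a tenth of a °C
```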

Here is my work from today using Tmax temperature anomalies. I thought this crowd might be interested: https://datablends.us/2018/08/01/using-temperature-anomalies-to-visualize-global-warming-via-alteryx-tableau-and-mapbox/