[Note: see the UPDATE at the end of the post]
It starts with a comment on this graph:
“I was sure there was no way I saw a smoothed line go up while the unsmoothed values went down. Only, I had.”
It’s followed by a comment on this graph:
“Again, I was shocked. This figure showed an even greater uptick than the last! My mind couldn’t understand how the data was going down, but Tamino’s derived lines were going up.”
He commented here, asking why, but also making sure to emphasize at length how much it defied belief. I tried to explain it to him. Really. Repeatedly. The problem was that his intuition — that if the data go down the smoothed line couldn’t possibly still go up — was wrong. He was misinterpreting the noise in the data as the signal. A smoothed fit which took that into account could easily continue up.
He wasn’t buying it. He got more and more obstinate, and frankly, his comments got more and more insinuating. It culminated in his making a statement that I can only interpret as an outright accusation. How nice of him.
In his blog post, Shollenberger does his own Lowess smooth and gets this (note: his graph says “USHCN” whereas I used data from NCDC, but that’s not really important):
“You’ll note in my figure, the smoothed line goes down at the end, exactly as I would expect, and exactly as my intuition told me.”
But in my smoothed curve, the line does not go down at the end (it curves slightly that way, but doesn’t actually decline). He concludes that the reason is my modification to the Lowess algorithm:
“There’s only one answer. For some reason, Tamino’s modified version of LOWESS gives extremely low weight to the data near the ends of the series.”
Once again, Shollenberger is wrong.
Let’s take the data I used (anomalies of monthly data from NCDC) and compute a Lowess smooth — not using my modified version, just the built-in vanilla “lowess” function in R. Then let’s compare the unadorned Lowess smooth (in blue) to my modified version (in red):
Whoa! That’s different! It certainly does not go down at the end. But then, it doesn’t really follow the data that well either, as we can see by overlaying the 5-year averages (black dots):
What’s up with that?
A Lowess smooth includes a tuneable parameter which controls how much of the data is used in the “window” for each estimated value. A wider window means more smoothing — maybe too much, so we might fail to pick up some variations which are genuine signal. A shorter window means less smoothing — maybe too little, so we pick up some variations which are really due to the noise, not the signal.
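For readers who want to see the mechanics, here is a minimal sketch of a plain Lowess smooth in Python, using only the standard library. It is not R's "lowess" and it is not my modified version: it is a bare local-linear fit with tricube weights and no robustness iterations, run on made-up data (a trend plus noise). The function names, the synthetic series, and the `roughness` measure are all assumptions for illustration.

```python
import math
import random

def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess(x, y, f):
    """Plain local-linear smooth; f is the fraction of points per window."""
    n = len(x)
    k = max(2, int(math.ceil(f * n)))          # points in each window
    fitted = []
    for x0 in x:
        # Window half-width: distance from x0 to its k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        w = [tricube((xi - x0) / h) for xi in x]
        # Weighted least-squares straight line, evaluated at x0.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def roughness(s):
    """Sum of squared second differences: larger means more wiggle."""
    return sum((s[i - 1] - 2 * s[i] + s[i + 1]) ** 2 for i in range(1, len(s) - 1))

# Synthetic series (NOT the NCDC data): slow trend plus white noise.
random.seed(0)
years = list(range(1895, 2013))
temps = [0.01 * (t - 1895) + random.gauss(0.0, 0.5) for t in years]

wide = lowess(years, temps, 2 / 3)    # wide window, heavy smoothing
narrow = lowess(years, temps, 0.2)    # narrow window, light smoothing
print("roughness, f=2/3:", roughness(wide))
print("roughness, f=0.2:", roughness(narrow))
```

The only thing that changes between the two calls is `f`. The narrower window produces a wigglier curve from exactly the same data, which is why the choice of window width matters more than the details of the weighting function.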
The default (in R) is to include 2/3 of the data at each moment of time. That’s too wide, so although the Lowess smooth correctly identifies the overall increase, it fails to pick up some variations which are genuine signal. The default in my program is to use a window a bit less than 25 time units wide (note: the tricube function has finite support, so it has a well-defined window width; my weighting function doesn’t). For these data, that corresponds to a parameter value of about f=0.2 in the R implementation. Let’s compare the R “lowess” result using f=0.2 (in blue) to my modified version (in red):
Now the two results are quite similar. There are differences because my algorithm is different. And no, I don’t care to go into it in depth simply to satisfy Brandon Shollenberger’s lust for fault-finding. Notice that the vanilla Lowess does not go down at the end. It curves slightly that way (as does mine) but doesn’t slope downward. Imagine that.
How did Shollenberger get such a different result? Let’s compare the vanilla R version of “lowess” but with parameter value f=.15 (blue line) to my original smooth (red line):
Well whaddaya know? Now it goes down at the end.
He thinks this is the “right” result — exactly as his intuition told him. I say it’s the wrong result, that the smoothing time scale is too short so it’s picking up variations which are really due to noise, not signal. Let’s test that idea, shall we?
Since about 1975, USA temperature has risen dramatically. Let’s take the data since then and fit a straight line; we all agree there’s an increase. Then let’s look at the residuals for some sign of statistically significant departure from the linear increase. Here are those residuals:
One way to look for other signal is by fitting higher-order polynomials, and we can even test such fits for statistical significance. I used polynomials of degree 2 through 6 and computed the p-value for each. In all cases, the p-value was nowhere near significant. I didn’t bother to correct for autocorrelation, but it’s not very large and would only make the fits less significant.
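The kind of test described above can be sketched as a nested-model F-test: fit a straight line, fit a polynomial of degree d, and ask whether the extra terms reduce the residual sum of squares by more than noise alone would. The Python below is a minimal illustration under stated assumptions, not the code used for this post: the data are synthetic (a linear trend plus white noise), the helper names are made up, and the noise is treated as uncorrelated.

```python
import random

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            k = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= k * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def poly_rss(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    lo, hi = min(t), max(t)
    u = [2.0 * (ti - lo) / (hi - lo) - 1.0 for ti in t]   # rescale time to [-1, 1]
    cols = [[ui ** p for ui in u] for p in range(degree + 1)]
    a = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j]))
          for j in range(degree + 1)] for i in range(degree + 1)]
    b = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(degree + 1)]
    coef = solve(a, b)
    return sum((yi - sum(c * ui ** p for p, c in enumerate(coef))) ** 2
               for ui, yi in zip(u, y))

def f_statistic(t, y, degree):
    """F for 'degree-d polynomial' versus 'straight line' (nested models)."""
    n = len(t)
    rss_line, rss_poly = poly_rss(t, y, 1), poly_rss(t, y, degree)
    extra = degree - 1                       # parameters added beyond the line
    return ((rss_line - rss_poly) / extra) / (rss_poly / (n - degree - 1))

# Synthetic monthly series, January 1975 through April 2012 (448 months).
random.seed(2)
t = [1975.0 + i / 12.0 for i in range(448)]
y = [0.03 * (ti - 1975.0) + random.gauss(0.0, 1.0) for ti in t]

for d in range(2, 7):
    print("degree", d, "F =", round(f_statistic(t, y, d), 3))
```

Under the null hypothesis (nothing but a linear trend plus white noise), each statistic follows an F distribution with d-1 and n-d-1 degrees of freedom, so the p-value is its upper tail probability; a statistics library can supply that (for example `scipy.stats.f.sf` in SciPy). Autocorrelation in the noise reduces the effective number of degrees of freedom, which is why correcting for it can only make a fit look less significant.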
Maybe … maybe since 1975 is too much data to detect a brief downturn near the end. Let’s repeat the experiment using only the data since 2000. Result: the same.
Although not statistically significant, the smooths you get from polynomial fits show an interesting behavior. Here’s the 4th-degree polynomial fit to data since 2000:
Notice that if we allow statistically insignificant fits we can get a rather dramatic upturn at the end. It’s the kind of false impression that noise can create.
If you set the window width to too small a value, then you can get the recent downturn in the smoothed curve with my version of Lowess too — not because of the different weighting function. Brandon Shollenberger was wrong. Again.
Why did I choose the degree of smoothing I used? Because it’s the default for my smoothing program. I chose it as the default because considerable experience in its use has shown it to be a good general-purpose value for climate time series. If I notice what looks like real signal that isn’t detected, I’ll try a shorter window; if I notice what looks like response to noise, I’ll try a longer time window. If I’m working on a peer-reviewed paper I’ll probably run some statistical tests (like above) to confirm or deny what is signal and what is not. In this case, my intuition told me the default was good. I didn’t try a bunch of values until I got the result I wanted.
Now let’s talk about the other graph. Shollenberger says:
“Naturally, I suspected that graph was questionable too, so I decided to test something. What if when calculating the five-year averages, we don’t include 2012, which only had four months of data?”
Then he gives us this graph:
And he says:
“I was speechless. If I took the four months from 2012 as a whole year and averaged them with the data for 2008-2011, I got Tamino’s results. If I took the average from 2007-2011, I got a result nearly half a degree lower. Not only does this have a huge visual impact, it means Tamino used one data set in three different ways:
1) Annual averages, excluding data from 2012.
2) Monthly smoothed, using all data.
3) Five-year averages, using four months from 2012 as a whole year.”
No, Brandon, you’re wrong. Again.
I did compute the smooth using all monthly data. I also computed the 5-year averages using all monthly data. Damn me for not calling them “60-month averages.” I certainly didn’t use four months from 2012 as a whole year. I did use 28 months from January 2010 through April 2012, so the final average is based on less data than its predecessors. Damn me again for not mentioning that, especially since it has no effect on the subject at hand — that Tom Harris’s claims about the USA temperature record are wrong. But at least I had the courtesy to show the (2-sigma) error bars for the averages.
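To make the bookkeeping concrete, here is a minimal Python sketch of averaging monthly values in 5-year calendar blocks when the last block is incomplete. Several things in it are assumptions for illustration: the data are random numbers rather than the NCDC anomalies, the series is taken to start in January 1895, the function name is made up, and the error bar is computed as twice the standard error of the mean with months treated as independent.

```python
import math
import random

def block_averages(months, values, start_year, span_years=5):
    """Average monthly values in calendar blocks of span_years.

    months is a list of (year, month) tuples. Returns a list of
    (block_start_year, n_months, mean, two_sigma) tuples, where
    two_sigma is twice the standard error of the block mean.
    """
    blocks = {}
    for (year, _), v in zip(months, values):
        first = start_year + span_years * ((year - start_year) // span_years)
        blocks.setdefault(first, []).append(v)
    out = []
    for first in sorted(blocks):
        vals = blocks[first]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1)
        out.append((first, n, mean, 2.0 * math.sqrt(var / n)))
    return out

# Synthetic monthly values, January 1895 through April 2012.
random.seed(1)
months = [(y, m) for y in range(1895, 2013) for m in range(1, 13)
          if (y, m) <= (2012, 4)]
values = [random.gauss(0.0, 1.0) for _ in months]

result = block_averages(months, values, start_year=1895)
for first, n, mean, two_sigma in result[-3:]:
    print(first, n, round(mean, 3), "+/-", round(two_sigma, 3))
```

Every month is used exactly once, and no month is counted as more than a month. The final block starts in January 2010 and holds 28 values instead of 60, so its error bar is computed from fewer points and tends to be wider.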
Let’s recap. For 5-year averages I used all the monthly data. You added a 5-year average for 2007-2011, which means the 2007-2009 data (the below-the-smooth stuff) gets used twice. And you omitted everything from 2012 (the way-over-the-smooth stuff).
The only departure from using all the monthly data was for the graph of 1-year averages. Using only January through April to represent 2012 gives this:
If I had shown that, I suspect Brandon Shollenberger would have had a conniption. If, on the other hand, I had used May-through-April for the 1-year averages it would have looked like this:
If I had shown that, I suspect Brandon Shollenberger would have accused me of deliberately choosing a period which makes the final data point the warmest.
So instead, in this and only this instance, I omitted the first four months of 2012 — the hottest third-of-a-year on record — entirely. How alarmist of me.
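The difference between calendar-year and May-through-April averages is only a matter of where each 12-month span starts. A minimal Python sketch, with a made-up function name and placeholder values (the month's position in the series, not temperatures), might look like this:

```python
def twelve_month_means(months, values, first_month=1):
    """Mean of each complete 12-month span that starts in first_month.

    months is a list of (year, month) tuples. Spans are labelled by the
    year in which they start; incomplete spans are dropped.
    """
    groups = {}
    for (year, month), v in zip(months, values):
        label = year if month >= first_month else year - 1
        groups.setdefault(label, []).append(v)
    return {k: sum(g) / 12 for k, g in sorted(groups.items()) if len(g) == 12}

# Placeholder data: January 2010 through April 2012, value = month index.
months = [(y, m) for y in range(2010, 2013) for m in range(1, 13)
          if (y, m) <= (2012, 4)]
values = [float(i) for i in range(len(months))]

calendar = twelve_month_means(months, values, first_month=1)
may_april = twelve_month_means(months, values, first_month=5)
print(calendar)    # {2010: 5.5, 2011: 17.5}
print(may_april)   # {2010: 9.5, 2011: 21.5}
```

With calendar years the four months of 2012 form an incomplete span and are dropped. With May-through-April spans the last complete span ends in April 2012, so the most recent data are included, at the cost of dropping the first four months of the series instead.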
Shollenberger closes with this:
“Does any of this have any impact on the point of his blog post? No. To be honest, I still know nothing about the Tom Harris his post criticizes. The point of this is people promoting global warming concerns should make it possible to understand their work without having to figure out undisclosed details based on arbitrary and unexplained decisions. In fact, everyone should.”
I suppose I could turn every blog post into an exact elucidation of every technical detail. Lord knows I never get very technical.
How about this instead: people criticizing global warming concerns should have sufficient competence to know what they’re doing.
UPDATE: It seems Brandon Shollenberger is having difficulty understanding the difference between his result and mine, since he says he used a parameter f=.2, not f=.15.
Brandon Shollenberger (Comment #95500)
May 10th, 2012 at 10:11 pm
Can you verify that using that parameter value generates Tamino’s graph? If it’s true that the difference between “standard LOWESS” and Tamino’s “fancy smoothing method” is a 0.05 difference in one parameter, the dark mutterings about “very weird and undisclosed arithmetical choices” seem a bit overdramatic.
It’ll take me a little bit to see if I can figure out what is going on in Tamino’s post, but I can say this much right off the back. I did not use f=0.15 as Tamino portrays. I tested a number of different values for f to see which matched his line best, and I found 0.2 gave a near perfect match. 0.15 gives a terrible match, and it looks dramatically different than the graph I posted. There is no way anyone trying out LOWESS filters would actually think I used f=0.15, and I have no idea why Tamino implies I used it.
Heck, if you just look at the line he comes up with using f=0.15, it’s obviously different than my figure by a large margin. As an example, the last rising slope in my figure pretty much rises steadily, but his supposed replication has a huge flat portion. There’s also the obvious fact the period from 1950-1960 is basically flat in my figure, but in his, it’s rising notably. Given how obvious these differences are, I’m at a loss as to how you can say:
A parameter value of f=0.15 seems to give the graph that Brandon shows.
All it takes to see that wrong is to look at the two graphs. That’s it.
It took me about a minute to figure out his problem. He computed his Lowess smooth using 1-year averages, not the monthly data. This in spite of the fact that he was undoubtedly aware I had used monthly data. It’s number “2” in his list of ways I had used the data. Damn me for assuming that he would attempt to reproduce my result by doing the same thing!
Gosh Brandon, you could have told us. Why didn’t you follow your own advice to “make it possible to understand their work without having to figure out undisclosed details”?
Perhaps he’s surprised that using annual averages instead of monthly data could have that much effect on a Lowess smooth. What does that suggest to you, Brandon? Is it possible that the sensitivity of the result to annual averaging indicates that the wiggle you are so fond of is just an artifact of the noise?
Here’s something else for you to try, Brandon. Compute 1-year averages for May-through-April since you don’t want to bias the result by leaving out the most recent data. Then study the time span from 1975 to the present. I’ll get you started: