# More Mathturbation

After the last post I expected a firestorm of commentary about Donald Trump. Personally, I’m not very fond of Donald Trump.

Instead, most comments since then have focused on a post by “GreenHeretic” referred to in a comment.

Putting aside some of GreenHeretic’s nonsense which commenters have already addressed, he rejects the use of linear regression to establish that the temperature trend (actually he refers to its relationship to CO2) is statistically significant. Why? Because the residuals fail the Durbin-Watson test.

That means those residuals exhibit autocorrelation. He even confirms this himself by computing the autocorrelation function. The horrid mistake is the conclusion he draws from this: that “Estimates or inferences that depend on error variance are suspect, at best. That includes any tests of statistical significance. The errors are not independently and identically distributed (iid).”
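To make the diagnostic concrete, here is a minimal sketch of the Durbin-Watson statistic and the autocorrelation function of regression residuals. It runs on a synthetic series (a linear trend plus AR(1) noise), not on the temperature data; the length, slope, noise parameters and function names are all assumptions chosen for illustration.

```python
import random


def linear_fit_residuals(y):
    """Residuals from an ordinary least-squares line fitted against t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def durbin_watson(resid):
    """Near 2 for uncorrelated residuals, below 2 for positive autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def autocorrelation(resid, lag):
    """Sample autocorrelation at a given lag (OLS residuals already have mean zero)."""
    num = sum(resid[i] * resid[i - lag] for i in range(lag, len(resid)))
    return num / sum(e * e for e in resid)


def synthetic_series(n, slope, phi, sigma, seed):
    """Hypothetical data: linear trend plus AR(1) noise with coefficient phi."""
    rng = random.Random(seed)
    noise, out = 0.0, []
    for t in range(n):
        noise = phi * noise + rng.gauss(0.0, sigma)
        out.append(slope * t + noise)
    return out


series = synthetic_series(n=432, slope=0.0012, phi=0.6, sigma=0.1, seed=1)
resid = linear_fit_residuals(series)
dw = durbin_watson(resid)
acf = [autocorrelation(resid, k) for k in range(1, 6)]
print(f"Durbin-Watson: {dw:.2f}")
print("ACF, lags 1-5:", [round(r, 2) for r in acf])
```

The statistic is roughly 2(1 − r1), where r1 is the lag-1 autocorrelation, so a value well below 2 signals positively correlated residuals. That is a reason to adjust the error estimate, not a reason to discard the regression.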

He seems to be yet another who thinks (the first I’m aware of was our old friend Tim Curtin) that this invalidates linear regression. It doesn’t. It does require that we compensate for autocorrelation, both when evaluating statistical significance and when estimating probable error ranges. When you do so, you find that statistical significance is still quite real. Evidently, GreenHeretic doesn’t know how to do this.
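One common way to compensate can be sketched as follows. It treats the residuals as AR(1) noise and inflates the ordinary least-squares standard error by sqrt((1 + r1)/(1 − r1)), where r1 is the lag-1 autocorrelation of the residuals. The post does not say which correction it has in mind, so the AR(1) form is an assumption, as are the synthetic data and every name in the code.

```python
import math
import random


def trend_with_ar1_correction(y):
    """OLS trend against t = 0, 1, 2, ... with an AR(1)-inflated standard error."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [v - (intercept + slope * t) for t, v in enumerate(y)]
    sse = sum(e * e for e in resid)
    # Standard error that would be correct for independent residuals
    se_naive = math.sqrt(sse / (n - 2) / sxx)
    # Lag-1 autocorrelation of the residuals
    r1 = sum(resid[i] * resid[i - 1] for i in range(1, n)) / sse
    # AR(1) approximation: effective sample size is n * (1 - r1) / (1 + r1)
    inflation = math.sqrt((1 + r1) / (1 - r1))
    return {"slope": slope, "r1": r1, "se_naive": se_naive,
            "se_corrected": se_naive * inflation}


# Hypothetical monthly series: trend plus AR(1) noise (not real temperature data)
rng = random.Random(3)
noise, series = 0.0, []
for t in range(432):
    noise = 0.6 * noise + rng.gauss(0.0, 0.1)
    series.append(0.0012 * t + noise)

result = trend_with_ar1_correction(series)
print(f"slope        = {result['slope']:.5f} per time step")
print(f"naive SE     = {result['se_naive']:.6f}")
print(f"corrected SE = {result['se_corrected']:.6f}")
print(f"t (corrected) = {result['slope'] / result['se_corrected']:.1f}")
```

The corrected standard error is larger than the naive one, so the bar for significance is higher, but a real trend can still clear it comfortably. When the noise is more persistent than AR(1), a richer noise model would be needed, and this simple factor would understate the uncertainty.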

His other mistake is to substitute an ARIMA(1,1,0) model. I’ll bet he learned that in econometrics class. The “1” in the middle makes it a first-difference model, which is how they tend to deal with trends. It also means that the ARIMA model has what’s referred to as a “unit root.”
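For readers who have not met the notation, the model can be written out. This is the standard textbook form, with symbols ($B$, $\phi$, $c$, $\varepsilon_t$) chosen here for illustration rather than taken from GreenHeretic’s post:

$$
(1 - \phi B)(1 - B)\,x_t = c + \varepsilon_t
\quad\Longleftrightarrow\quad
\Delta x_t = c + \phi\,\Delta x_{t-1} + \varepsilon_t ,
\qquad \Delta x_t = x_t - x_{t-1} .
$$

Here $B$ is the backshift operator ($B x_t = x_{t-1}$). The factor $(1 - B)$ is the first difference, and because the polynomial $(1 - \phi z)(1 - z)$ vanishes at $z = 1$, the model has a unit root by construction. Any trend enters only through the constant $c$: the differences have mean $c/(1 - \phi)$, which is the “drift” term whose significance is at issue.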

Curiously, as intent as he was on applying significance tests to anything that shows a trend in temperature, he doesn’t report any tests of his first-difference model. A natural thing to do is to ask whether or not that unit root is real.

That’s not easy to establish; tests for unit root generally have very little power, so even if there is no unit root it’s hard to reject the idea. But in this case, the unit root is so thoroughly absent that the idea is rejected easily. The Phillips-Perron test, e.g., rejects it resoundingly. GreenHeretic’s model simply does not apply.

To sum up: First, he rejects linear regression, not because it isn’t valid but because he doesn’t know how to deal with autocorrelation. Second, he puts his faith in a first-difference model which is easily shown to be invalid.

My opinion is that “GreenHeretic” is a fine example of the phenomenon referred to as Dunning-Kruger.

I’m also of the opinion that there’s less and less value in refuting arguments from the ignorant which are rooted in mathturbation. Rather than refute you, let’s forget you, better still.

I’m wondering if you have any further comments about interpreting the data analysis. I get that if the question is whether there is a trend, the correct way of approaching this problem is to find the linear fit, then correct the standard error for the autocorrelation of the residuals, then perform a hypothesis test using the observed slope and the corrected standard error.

However, I’ve been experimenting with some artificial data sets. If you deliberately create some time series data with a unit root, and then add a linear trend to that data, the resulting data will pass the unit root test. I don’t have a great deal of experience with time series, but my previous understanding had been that differencing the data accounts for both the unit root and the linear trend. And this leads potentially to the same conclusion that GreenHeretic had: the standard error on the mean term is sufficiently large that the hypothesis test with a null hypothesis of a zero mean (or no trend) fails to reject the null hypothesis.

This can occur when the autocorrelation corrected hypothesis test for a zero slope on the linear fit clearly rejects the null hypothesis. The linear fit concludes that there is a trend, but the time series model cannot conclude that there is a trend.

Is there an explanation for this discrepancy? Am I misinterpreting the standard error of the mean in the time series model? Fitting the linear model first and then fitting the ARIMA model to the residuals results in nearly the same model as just fitting the ARIMA model in the first place. The differences are that with the linear model, there are estimates for the intercept and slope, and the hypothesis test for the slope shows it is statistically significant. If the ARIMA model is fit directly, the intercept from the linear model gets lost due to the differencing, while the slope from the linear model becomes the mean in the ARIMA model.

The slope and the mean are nearly equal, the parameter values of the ARIMA model are nearly equal, the residuals of the final model are nearly equal, and the predicted values from the models are nearly equal.

The only substantial difference is that the hypothesis test in the first case rejects a zero slope, but the hypothesis test in the second case does not reject a zero mean. I would expect the two tests to give the same result, but they clearly do not.

From a practical standpoint, it’s clearly correct to perform the hypothesis test on the linear regression. But is there a theoretical reason for the different outcomes for the hypothesis tests?
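One way to see where such a gap can come from, in the case where the data are a trend plus stationary noise: the mean of the first differences telescopes to (last value − first value)/(n − 1), so it uses only the two endpoints, while the least-squares slope uses every point. The sketch below uses synthetic data (trend plus white noise, with made-up values for the length, slope and noise level) and a plain mean of differences instead of a fitted ARIMA model, so it illustrates the mechanism and does not reproduce the experiment described above. For data that really do contain a unit root the comparison runs the other way, since the least-squares standard error then understates the uncertainty.

```python
import math
import random


def ols_trend(y):
    """Return (slope, standard error, t ratio) for a line fitted against t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se, slope / se


def mean_of_differences(y):
    """Return (mean, naive standard error, t ratio) of the first differences."""
    d = [y[i] - y[i - 1] for i in range(1, len(y))]
    m = sum(d) / len(d)
    var = sum((v - m) ** 2 for v in d) / (len(d) - 1)
    se = math.sqrt(var / len(d))
    return m, se, m / se


# Hypothetical trend-stationary data: a line plus white noise
rng = random.Random(42)
n, true_slope, sigma = 200, 0.02, 1.0
y = [true_slope * t + rng.gauss(0.0, sigma) for t in range(n)]

slope, se_slope, t_slope = ols_trend(y)
drift, se_drift, t_drift = mean_of_differences(y)
endpoint_estimate = (y[-1] - y[0]) / (n - 1)

print(f"OLS slope      = {slope:.4f}, SE = {se_slope:.4f}, t = {t_slope:.1f}")
print(f"mean of diffs  = {drift:.4f}, SE = {se_drift:.4f}, t = {t_drift:.1f}")
print(f"(last - first) / (n - 1) = {endpoint_estimate:.4f}")
```

Both estimates target the same quantity, but differencing replaces noise of variance σ² with differenced noise of variance 2σ² while the signal per step stays at the tiny per-step slope. The test on the mean of the differences therefore has far less power than the test on the regression slope.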

Also, because I can’t avoid beating dead horses, I noticed that in a previous post you had mentioned that autocorrelations can be reduced by averaging the data. So I went ahead and confirmed that if you compute the annual averages of the temperature data, linear regression shows that the slope is clearly statistically significant, and the resulting residuals are effectively uncorrelated. It’s just one more argument showing that GreenHeretic is wrong.
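The effect of averaging can be checked on synthetic data as well. This sketch builds a hypothetical monthly series (trend plus AR(1) noise), forms 12-month means, and compares the lag-1 autocorrelation of the detrended residuals at the two resolutions. The series is 100 years long, longer than any satellite record, only so that the sample autocorrelations are stable; that and all parameter values and names are assumptions.

```python
import random


def detrend(y):
    """Residuals from a least-squares line fitted against t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation of a zero-mean series."""
    return sum(x[i] * x[i - 1] for i in range(1, len(x))) / sum(v * v for v in x)


def annual_means(monthly):
    """Average consecutive blocks of 12 values, dropping any incomplete final year."""
    years = len(monthly) // 12
    return [sum(monthly[12 * k:12 * k + 12]) / 12 for k in range(years)]


# Hypothetical monthly data: trend plus AR(1) noise with coefficient 0.6
rng = random.Random(7)
noise, monthly = 0.0, []
for t in range(12 * 100):
    noise = 0.6 * noise + rng.gauss(0.0, 0.1)
    monthly.append(0.0012 * t + noise)

r1_monthly = lag1_autocorrelation(detrend(monthly))
r1_annual = lag1_autocorrelation(detrend(annual_means(monthly)))
print(f"lag-1 autocorrelation, monthly residuals: {r1_monthly:.2f}")
print(f"lag-1 autocorrelation, annual residuals:  {r1_annual:.2f}")
```

Averaging does not remove the correlation entirely; it shrinks it, because most of the month-to-month persistence ends up inside each annual mean instead of between neighbouring means.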

[Response: When testing for a unit root one should allow for a possible trend; rather than simply the Dickey-Fuller test, e.g., you should really use the augmented Dickey-Fuller test. Otherwise, you’re going to make genuine trends look like unit roots (as you seem to have discovered).

At the heart of the matter is the fact that there are so many possible mathematical models to apply, that often it’s all too easy to find one which won’t reject your “desired” result (for GreenHeretic, that’s “no trend”). Economists in particular have a proclivity for finding ways which destroy the trend in order to “disprove” the trend — first-differencing seems inordinately popular. For many of them, it’s the only way they can imagine to deal with a trend anyway. Or, they’ll just keep throwing different exotic possibilities at the wall until something seems to stick; “fractional Gaussian noise” is one of the choices du jour.

And, such methods often require vast quantities of data to give even reasonable estimates. If, for instance, one tries to estimate the “Hurst exponent,” one’s estimates are bound to be unreliable unless there are so many data values that you’ll never find such a sample in real climate data. At the same time, they seek out data sets which cover very brief time spans (as when rejecting surface temperature data to take refuge in satellite data).

One should also recognize that this is a question of physics, not econometrics, so it is constrained by the laws of physics. ARIMA(1,1,0) models are unbounded, a situation which is quite clearly impossible physically.

I question the motives of “GreenHeretic” and his ilk. Why would one prefer a physically impossible, improperly applied, and provably wrong model, over one which those pesky laws of physics don’t just support but *require*?

Meanwhile … still it warms.]
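The unit-root test with a trend allowance mentioned in the response can be written as a regression. This is the usual textbook form of the augmented Dickey-Fuller test; the symbols and the lag count $p$ are notation chosen here, not taken from the response:

$$
\Delta x_t = \alpha + \beta t + \gamma\, x_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta x_{t-i} + \varepsilon_t ,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \quad H_1:\ \gamma < 0 .
$$

The $\beta t$ term is what lets a genuine trend be told apart from a unit root. The $t$ ratio for $\gamma$ is compared with Dickey-Fuller critical values, not with the usual Student $t$ table. The remark about unbounded models can be seen in the simplest case, a pure random walk with innovation variance $\sigma^2$, where

$$
\operatorname{Var}(x_t - x_0) = t\,\sigma^2 ,
$$

so the spread of possible values grows without limit as time goes on.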

8. Andy

9. A minor point I don’t recall seeing mentioned: the choice of lower trop data is also a cherry pick in the sense that, since it is much more variable than the instrumental data, it will be that much harder to show a significant trend, regardless of which test or model one adopts.

Not a new preference/tactic/whatever, I know–it’s probable that some of these folks don’t even know why the satellite record is ‘better’; they have just absorbed the meme that it ‘is.’ But perhaps worth naming that aspect of the cherry.

10. john byatt

Tamino. It is claimed that the last three years’ “recovery” in PIOMAS is statistically significant, may I get your expertise, tks

• you need 30 years to tell a climate trend. 3 doesn’t cut it.

• JCH

It’s possibly a function of the ramp up of the positive phase of the PDO, about which not a great deal is known.

• The graph is here:

I don’t know if the recent dramatic swing following the record low minimum of 2012 is statistically significant or not; it’s certainly dramatic, covering something like 3 sd. It’s not unprecedented in the data, though, since you can see a broadly similar swing between roughly ’82 and ’86.

But who really cares? The current value is right smack on the long-term trend line of -3,000 km³ per decade.

• john byatt

my reply to him was that the large upswing is obvious but your statistical test does not prove that it is beyond natural variation

11. The satellite data isn’t better. It’s worse. It’s just that some people mistake quantity for quality. The fact is that satellites don’t measure temperature. They measure microwave brightness at various wavelengths, and it takes very long, complex, finicky algorithms to get temperatures out of those data. Which is why two different teams, using exactly the same raw data from exactly the same satellites, can have tropical lower troposphere trends that differ by a factor of three.

Climate skeptics don’t love satellite data because it’s better. They love it because it’s shorter, and therefore shows less warming.

• Hence the scare-quoted ‘better’ in my comment.

12. GreenHeretic’s Response to Tamino’s blog posting, ‘MORE MATHTURBATION’

Tamino: If you are going to critique someone else’s blog posting, especially with gratuitous insults, why isn’t it your practice to post something ‘over there’ to alert them? I don’t think much of your ethics.

Did you actually READ my post? Apparently not since you misrepresented why I rejected the Temp=f(CO2) relationship. True, I rejected the original model because of the strong autocorrelation of the errors. However, you are correct that such a deficiency can be ‘compensated’.

In the article I wrote, I rejected pursuing the question down that rabbit hole because CO2 explained no more than a simple time trend model. Real analysts with decades of modeling experience (like myself) understand the importance of that fact.

CO2 has no discernible incremental association with temperature beyond mere correlation over time. Nevertheless, I did waste considerable time exploring, but found nothing worth reporting. That led me to ask the question as to whether an actual trend existed. I have learned the hard way to always check model assumptions. It turns out that there isn’t, which is what the article demonstrated and concluded.

You claim that substituting an ARIMA(1,1,0) aka simple change model is not appropriate and say “I’ll bet he learned that in econometrics class.” I learned that in a graduate level advanced regression class in the late 1970s.

When I started analyzing weather in the energy sector in a professional capacity nearly twenty years ago, I validated the application of ARIMA methods for weather. My citation for the appropriateness of analyzing weather data using ARIMA is Daniel Wilks, ‘Statistical Methods in the Atmospheric Sciences’ published by Academic Press in 1995 (First Edition). It’s up to Third Edition today. You can find it easily enough on Amazon. Chapter 8 in my edition, entitled ‘Time Series’, should convince even you that my methodology is accepted by meteorology professionals.

I am not sure what your point was in your discussion of ‘unit root’. If you believe that my analysis has a problem with stationarity, then you should show it, with numbers. Hint: The problem when the dataset fails stationarity is that spurious regression relationships are reported, NOT when no regressions are reported. It’s clear that you have no idea what you are blathering about.

Your point regarding my lack of appropriateness tests for the ARIMA model is actually partially well taken. The ARIMA(1,1,0) shows an annoying negative residual autocorrelation at the fourth lag. A better fit model would have been to add a seasonal term. For other purposes, I would have done that. However, since it didn’t change the outcome (which was to check for statistical significance for the drift term in the ARIMA model), I didn’t include it.

As for your application of the so-called Dunning Kruger phenomenon, I suspect that you should really look in the mirror for the best example of that. You really haven’t a clue what you are talking about. Your multiple insults show a lack of maturity and lack of basic respect for those who disagree with you. Grow up.

[Response: The fact that you don’t understand why one should test for a unit root tells us a lot. Your further bloviating about the relationship between temperature and CO2 being “mere correlation” reveals astounding ignorance; causation follows from fundamental laws of physics. Your disdain of surface temperature data and overconfidence in satellite data is certainly not founded in sound science; I suspect it’s only because it gave you an excuse to get your silly result. A little understanding has given you far too high an opinion of your own ability, hence the reference to Dunning-Kruger. As for your post, it’s a fine example of why I coined the term “mathturbation.”

I don’t think much of your ethics, either.]

• GreenHeretic: I rejected pursuing the question down that rabbit hole because CO2 explained no more than a simple time trend model.

BPL: Except that we have a PHYSICAL reason for believing CO2 to influence temperature, and the correlation found only confirms that. The relation between CO2 and temperature was NOT found from data mining. I strongly suggest you read a textbook on atmosphere physics. Houghton’s “The Physics of Atmospheres” is a good one. So is Petty’s “A First Course in Atmospheric Radiation.” Read one through–or better yet, both of them–and work the problems. Then give us your insights on climate change.

• Green Imbecile,
In the face of such overconfident cluelessness, I find myself only able to point and laugh. Dude, you do know that CO2 has been known to be a greenhouse gas since the 1850s and that anthropogenic CO2 was predicted to cause warming in 1896, right?

A piece of advice: clowns get laughed at. If you don’t want to get laughed at, don’t be a clown.

• PJKar

Green Heretic,
This reply combines several of your comments from this site as well as your own. I probably should have posted this at your site but the login there with Disqus is too much of a pain in the ass what with their requirements to get to profiles and the like. Anyway……….

Just a few points:
1. “Your commentary on what satellites measure is misplaced. How is that any different in principle than any temperature measurement?”
What is misplaced about his commentary? Satellite measurements are indirect measurements of temperature. They carry microwave radiometers that measure microwave radiance of atmospheric oxygen in 4 bands. Radiance in a band can be estimated via Planck’s radiation law.
Here is a tutorial on the subject that is very worthwhile:
http://www.skepticalscience.com/Primer-Tropospheric-temperature-measurement-Satellite.html

2. What is your fixation with lower tropospheric temperatures? Why don’t you try using the surface temperature data? Isn’t that what you are really interested in? Using annual global GISS data over the period 1979-2014 and annual CO2 data over that period I am sure you will find a statistically significant trend. Here’s what I got using Matlab:

N = 36
Slope: .00905
t-stat: 11.4
p-value: 3.57e-13
Rsq = .793
DW stat = 1.701
DW p-value = .2

3. Maybe we should straighten this out too while we’re at it. At your site you question reporting of July as the warmest month and you criticize reporters for providing false information. To prove this you provide the monthly GISS global anomaly data to show that other months have larger anomalies.

The articles you refer to are referencing absolute highest temperature. I think you may be missing the point that the climatology baseline from 1951 to 1980 varies from month to month so you would need this monthly baseline to reconstruct the absolute temperature for each month. That baseline may be difficult to acquire but this site has a post on this same topic that may be helpful to you:

https://tamino.wordpress.com/2015/08/15/

Also here is a GISS page on the topic:

http://data.giss.nasa.gov/gistemp/abs_temp.html

13. I regressed CRUTEM4GL anomalies on both CO2 and time. I got:

a = -4.178 + 0.01296 c – 0.0004415 t
R^2 = 82% N = 176 DW 1.36 rho^ 0.314

The t-statistic on the CO2 term is 11.24 (p < 4.96 x 10^-22), while that on the time term is -0.6518 (p < 0.5389). That means CO2 has a highly statistically significant relationship with temperature anomalies, whereas elapsed time, considered with it, does not. It adds nothing.

Thus GreenHeretic’s contention that time explains temperature as well as CO2 is prima facie wrong from a simple regression test.

14. Sorry, that should read “N = 165.”

15. “CO2 has no discernible incremental association with temperature beyond mere correlation over time.”

Even if you made no mistakes in your methods, is it really surprising to find that CO2 and time have a similar relationship to temperature during the satellite era, given that CO2 has risen steadily over time? Which is more likely on the basis of physics, that time is heating the planet or that CO2 is heating the planet? One could as easily argue that time does not offer an incremental association, once allowing for CO2.

You have failed to make a case that your analysis was worth doing in the first place.

What happens when you apply your technique to climate model runs? Over the time-frame of interest, and with comparable data sets, does CO2 have a significantly stronger association with temperature in climate models than in reality? If so, you might have a point worth raising, and we could start to care if the technique was valid.

16. Jim Eager

GreenHeretic wrote: “You really haven’t a clue what you are talking about.”

Which tells us GH really hasn’t a clue who he is talking to.

Time to make some popcorn.

17. I wandered over to look at Green Heretic’s blog… Wish I hadn’t. Quite painful to see such a lame line of argument advanced with so much misguided earnestness and pomposity.

Basically he sets out to prove that the recent CO2 signal, deliberately emasculated by removing the time-linked component, no longer correlates well with temperature. Well, duh.

In other news, experts have shown that smoking does not cause cancer because (after allowing for the number of cigarettes bought) the number of cigarettes a person has actually smoked “has no discernible incremental association with cancer risk beyond mere correlation with the number of cigarettes bought”.

18. jgnfld

For my part, I personally see little value in aggregating data by months to make climate inferences. Even annually may be too fine. I know tamino disagrees. Anyway, aggregating the RSS data annually (1979 to 2014, all months equally weighted) leads to the following results:

durbinWatsonTest(summary, max.lag=5)
lag Autocorrelation D-W Statistic p-value
1 0.05481342 1.885276 0.614
2 -0.24706498 2.423485 0.218
3 0.04960155 1.786935 0.676
4 0.06550576 1.736973 0.678
5 -0.22103625 2.232786 0.232
Alternative hypothesis: rho[lag] != 0

Call:
lm(formula = Data2$x ~ Data2$Group.1)

Residuals:
Min 1Q Median 3Q Max
-0.24270 -0.09515 0.01569 0.08663 0.38143

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -27.724283 4.401486 -6.299 3.53e-07 ***
Data2$Group.1 0.013895 0.002205 6.303 3.49e-07 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1374 on 34 degrees of freedom
Multiple R-squared: 0.5388, Adjusted R-squared: 0.5253
F-statistic: 39.73 on 1 and 34 DF, p-value: 3.491e-07

• jgnfld

I should have given the overall R code (updated with correct names here). Place the RSS text file in the working directory and execute.

#Be sure to delete comment rows at bottom before reading
# Assumes the RSS file has already been read into a data frame named Data
# with columns Year and Globe
# durbinWatsonTest comes from the car package
library(car)
# Grab full years only
Data2 <- aggregate(Data$Globe[2:433], list(Data$Year[2:433]), mean)
# Rename generated names
names(Data2) <- c("Year","AnnualMean")
# Run lm
fit <- lm(Data2$AnnualMean ~ Data2$Year)
# Run DurbinWatson
durbinWatsonTest(fit, max.lag=5)
# Print lm results
summary(fit)

19. Maybe

Maybe we need factcheck crowdfunding. If enough people are interested enough in having you weigh in on a contrarian claim, you accept the offer and provide. Otherwise it sits there, with insufficient pot attached.

20. PJKar

Getting back to issues about Trump. In my opinion Trump’s popularity directly results from the fact that he exposes the hopeless corruption of American electoral politics. His authority on this issue is based on his admitted use of a corrupt system for his own gain. He says here:

“Q: You’ve also supported a host of other liberal policies, you’ve also donated to several Democratic candidates, Hillary Clinton included, Nancy Pelosi. You explained away those donations saying you did that to get business related favors. And you said recently, quote, when you give, they do whatever the hell you want them to do.

TRUMP: You better believe it… I will tell you that our system is broken. I gave to many people. Before this, before two months ago, I was a businessman. I give to everybody. When they call, I give. And you know what? When I need something from them, two years later, three years later, I call them. They are there for me. And that’s a broken system.”

The inside financier of so many of his opponents’ campaigns (Bush, Huckabee, Graham, Pataki, Hillary) is now a candidate, and here he is extracting his due from them in a way they never could have imagined: I know him… I bought him, he’s a lightweight… I bought him, he’s a loser.

He strengthens his position relative to his opponents with quotes like this:

“I’m using my own money. I’m not using the lobbyists. I’m not using donors. I don’t care. I’m really rich”

He said it all with this one:

“So I’ve watched the politicians. I’ve dealt with them all my life. If you can’t make a good deal with a politician, then there’s something wrong with you.”

His trouble seems to be with Ben Carson who is not a politician.

His supporters find him refreshingly honest for it even if he’s admitting his own participation in the corruption, something they never seem to consider. In the end it’s all just another con job. Instead of selling a casino or a golf course he’s selling Trump. He builds his support by arousing hostility towards undocumented people but how many undocumented workers helped build trump towers? His supporters don’t seem to care. His bigotry is wrapped in spectacle but spectacle can get elected particularly in a corrupt system like ours.

Interestingly the one thing he said (in contrast to many of his opponents) in earlier speeches that got the most applause was that he would “save social security”. Hardly a Republican value but one that resonates with many people.

21. Just had my attention redirected to the ‘Syrian drought’ paper, here:

http://www.pnas.org/content/112/11/3241.full

Interestingly,

…we separated the observed anthropogenic precipitation trend from the residual, presumably natural, variability by regressing the running 3-year mean of observed (CRU) 6-month winter precipitation onto the running 3-year mean of observed annual global atmospheric carbon dioxide (CO2) mixing ratios from 1901–2008 (39, 40). The latter time series was used as an estimate of the monotonic but nonlinear change in total greenhouse gas forcing (Materials and Methods). After removing the CO2 fit from the total observed winter precipitation timeseries (Fig. 3A), we constructed frequency distributions of the total and residual timeseries (Fig. 3B) and applied gamma fits to the distributions. The difference in the total and residual distributions is significant (P < 0.06), based on a Kolmogorov−Smirnoff test, and is due almost entirely to the difference in the means. Thresholds are shown at 10%, 5%, and 2% (in percent of the total sample size of 76 3-year means) in the dry tail for the timeseries (Fig. 3A) and for the distribution of the total (Fig. 3B). The result is that, when combined, natural variability and CO2 forcing are 2 to 3 times more likely to produce the most severe 3-year droughts than natural variability alone. Residual, or natural, events exceeding the 10% threshold of the total occur less than half as often (3 versus 8, out of 76). For the residual alone, no values exceed the 5% threshold of the total.

Thoughts?

[Response: I’d have to look a lot closer to give a good opinion, but on the face of it, it raises some red flags. For one thing, why are they regressing 3-year running means against 3-year running means? That’s almost never justified, and it requires careful consideration when it does apply. Also, wouldn’t we expect the distribution of raw and residual to be different, no matter what? Especially different mean?

But, I really haven’t looked at this closely enough, just what I’ve read in reader comments here, so don’t take my word for it.]