Following up on the subject of step changes, and how to tell them apart from a linear increase, a reader pointed to a post on Roger Pielke’s site which claims to prove a step change (rather than a linear increase) in two time series: warm and cold nights over South America from 1960 to 2000. There’s a good bit of hand-waving about visual inspection of graphs (which really amounts to “it sure looks like a step change”), but the essence of the “proof” comes from modelling the data as two different straight lines: one fit to the data before the purported step change, the other fit to the data after it. The stated conclusion is:
… the slopes before and after the change points are not statistically significant (P > 0.05) and thus not significantly different from zero. In each data series, therefore, μ1 ≠ μ2 and β1 = β2 = 0, proving, beyond doubt, the presence of flat step changes in the two data series.
Permit me to doubt.
I tested the validity of such a “proof” by generating artificial data which were the sum of a linear trend and random (Gaussian white) noise, so we already know the right answer. I then subjected each series to change-point detection, and fit the stated model (one straight line before the change point, another independent one after it). Finally, I noted whether the slopes of both straight-line segments failed statistical significance — which, it is suggested, is sufficient for “proving, beyond doubt, the presence of flat step changes.”
If it’s even “proof” in the statistical sense (let alone “beyond doubt”), then no more than 5% of such simulations should exhibit this behavior. And what was the result? Out of 1000 simulations, 580 (58%) met the stated “proof” criterion, in spite of the fact that they were all artificial data with a known, linear trend. More than half the time, the data indicated “proof, beyond doubt” of the wrong answer.
It’s not that the trend itself was too small to detect. In 998 out of 1000 cases (99.8%) the linear trend (for the entire data set) passed statistical significance.
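Checking the full-series trend is a single regression on the whole data set. A minimal sketch, again with my own illustrative parameters rather than the original settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 41
t = np.arange(n)
y = 0.02 * t + rng.normal(0.0, 0.15, n)   # known linear trend + white noise

# Significance test of the linear trend over the ENTIRE series:
# in the simulations this passed at the 5% level almost every time.
p_full = stats.linregress(t, y).pvalue
print(f"full-series trend p-value: {p_full:.4f}")
```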
The fact is that unless the signal-to-noise ratio is very high, the difference between linear trend and step change is damn hard to establish. Even when we know the trend is linear, the response to change-point detection will be quite strong (and will pass statistical significance with flying colors) and it’s highly likely (in the simulations, more than half the time) that the slopes of the linear fits to “before” and “after” segments can fail statistical significance.
In fact, in these simulations, even if you compare the linear model (which is right) to the step-change model (which is wrong) using AIC, the wrong model had better AIC in 238 of 1000 cases (about 24%).
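For Gaussian residuals, AIC can be computed (up to an additive constant) from the residual sum of squares as n·ln(SSE/n) + 2k, where k counts the fitted parameters. A sketch of the comparison, with my own parameter counts and illustrative data (I count the change-point location as a fitted parameter of the step model; other conventions are possible):

```python
import numpy as np

rng = np.random.default_rng(0)

def aic(sse, n, k):
    # Gaussian-likelihood AIC up to an additive constant.
    return n * np.log(sse / n) + 2 * k

n = 41
t = np.arange(n)
y = 0.02 * t + rng.normal(0.0, 0.15, n)   # illustrative trend + noise

# Linear model: intercept + slope + noise variance -> k = 3
coef = np.polyfit(t, y, 1)
sse_lin = ((y - np.polyval(coef, t)) ** 2).sum()

# Step-change model: split minimizing SSE of two flat means;
# two means + change-point location + noise variance -> k = 4
sse_step = min(
    ((y[:c] - y[:c].mean()) ** 2).sum() + ((y[c:] - y[c:].mean()) ** 2).sum()
    for c in range(3, n - 3)
)

print("AIC linear:", aic(sse_lin, n, 3))
print("AIC step  :", aic(sse_step, n, 4))
```

Whichever model has the lower AIC is preferred; the point of the simulations is that the (wrong) step model wins this comparison far more often than it should.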
An example, which is not atypical, is the last of the 1000 simulated data sets. Here’s the data, with a linear trend line superimposed:
Here’s the step-change model, which not only has smaller residuals, it even (like 24% of the samples) has better AIC:
And here’s the two-linear-segments model, for which (like over half of the samples) neither of the linear segment slopes is statistically significant:
Simply put, what the idea fails to account for is that when a linear trend is present, if you pick the single moment which gives the strongest “step change” model (which is what change-point detection does), you have in a sense “hand-picked” the moment most unfavorable for establishing a significant trend within each subsection. It’s easy to see why someone who hasn’t thought the problem through closely enough would miss this, and be fooled into overestimating the significance of this test. Extrapolating to “proof, beyond doubt” is just a bridge too far.
Do the series of warm nights and cold nights over South America follow linear increase or step change? I don’t know, I haven’t seen that data. If slopes before and after a purported step change both fail statistical significance, is that “proof, beyond doubt” of a step change, or even disproof of a linear trend? No.