Lately I’ve been having fun with change-point analysis.
There are many versions, depending on what kinds of change you allow. The most common is to allow a change in the value of the data at some particular time. Essentially, one creates a step-function model in which the value is constant within each time span, but steps to a new level from one span to the next. How many times this happens (how many spans there are), when the steps occur, and what the values are during each span, are estimated by change-point analysis. Perhaps most important, it applies some rigorous statistics to determine whether or not the changes are statistically justified.
That’s quite necessary, and definitely not trivial, because you can’t just go looking for the change-point time which gives the best fit, then evaluate that fit in isolation. When you have many possible change-point times and you test them all (or at least, almost all), you get so many chances to find one that fits well just by accident that you have to compensate for all those chances you took. After all, hitting the bull’s-eye isn’t nearly so impressive if you’re allowed to take a million shots to do so.
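To make the "million shots" point concrete, here is a minimal sketch in Python of one way to compensate for the search. It is not the method used for the analyses in this post; it is a plain permutation test, and it assumes the noise is uncorrelated (which real climate data usually are not, so it would be too lenient there). The function names and the synthetic data are made up for illustration. The key idea is that the null distribution is built from the best split of each shuffled series, so the search itself is part of the test.

```python
import random

def best_split_gain(y, min_seg=3):
    """Try every allowed split of y into two constant-level spans.

    Returns (k, gain): the split index k (spans y[:k] and y[k:]) with the
    largest reduction in squared error compared to one constant level,
    and the size of that reduction."""
    n = len(y)
    total = sum(y)
    best_k, best_gain = None, -1.0
    left_sum = sum(y[:min_seg - 1])
    for k in range(min_seg, n - min_seg + 1):
        left_sum += y[k - 1]          # now equals sum(y[:k])
        right_sum = total - left_sum
        # SSE(one mean) - SSE(two means), written with sums only.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain

def permutation_p_value(y, trials=500, min_seg=3, seed=0):
    """Fraction of shuffled series whose *best* split does at least as well
    as the best split of the real series (assumes exchangeable noise)."""
    rng = random.Random(seed)
    _, observed = best_split_gain(y, min_seg)
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)         # destroys any real time ordering
        _, gain = best_split_gain(shuffled, min_seg)
        if gain >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

# Synthetic illustration only: 60 points of unit noise, with and without
# a step of 3 units starting at index 30.
rng = random.Random(1)
noise_only = [rng.gauss(0, 1) for _ in range(60)]
with_step = [v + (3.0 if i >= 30 else 0.0) for i, v in enumerate(noise_only)]

k_step, _ = best_split_gain(with_step)
p_step = permutation_p_value(with_step)
p_noise = permutation_p_value(noise_only)
print("estimated step location:", k_step)
print("p-value with a real step:", p_step)
print("p-value for pure noise:  ", p_noise)
```

Judging the best split against an ordinary single-test threshold would flag far too many "changes" in pure noise; comparing best against best is what keeps the false-alarm rate honest.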
Another form of change-point analysis looks for a change not in the value of a time series, but in the slope of a best-fit model. In this case we assume the underlying model is a straight line over each time segment, but at the change-points the slope is allowed to change. Even so, the segments have to match up at those change-point times, so our model is a continuous piecewise-linear function of time. Continuity is what ensures that it doesn’t suddenly “jump” from one value to another (although the slope is allowed to jump from one value to another).
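One common way to write a continuous piecewise-linear model with a single change-point at time tau is y = a + b*t + c*max(0, t - tau): the slope is b before tau and b + c after, and the two pieces meet at tau automatically. The sketch below fits that by least squares for each candidate tau and keeps the best one. It is only the fitting step, on noise-free synthetic data, with hypothetical function names; deciding whether the change is statistically justified still requires accounting for the search over tau. The tiny solver is there to keep the example dependency-free; in practice a library routine such as `numpy.linalg.lstsq` would do that job.

```python
def solve(A, b):
    """Solve a small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_hinge(t, y, tau):
    """Least-squares fit of y = a + b*t + c*max(0, t - tau) via the normal equations.

    Returns ((a, b, c), sse)."""
    X = [[1.0, ti, max(0.0, ti - tau)] for ti in t]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    a, b, c = solve(XtX, Xty)
    sse = sum((yi - (a + b * r[1] + c * r[2])) ** 2 for r, yi in zip(X, y))
    return (a, b, c), sse

def best_hinge(t, y, min_seg=3):
    """Try each interior observation time as the change-point; keep the lowest SSE.

    Returns (tau, (a, b, c), sse)."""
    candidates = t[min_seg:len(t) - min_seg]
    fits = [(tau,) + fit_hinge(t, y, tau) for tau in candidates]
    return min(fits, key=lambda fit: fit[2])

# Synthetic, noise-free illustration: slope -0.1 up to t = 10, then -0.6.
t = [float(i) for i in range(20)]
y = [10.0 - 0.1 * ti - 0.5 * max(0.0, ti - 10.0) for ti in t]

tau, (a, b, c), sse = best_hinge(t, y)
print("change-point:", tau)
print("slope before:", round(b, 3), " slope after:", round(b + c, 3))
```

More change-points just mean more hinge terms, one per change-point, at the cost of a bigger search.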
Then there’s the version in which both the value and the slope are allowed to change at the change-point. This still gives a model which is piecewise-linear, but it’s no longer necessarily continuous. Sometimes such models are contraindicated by other considerations. For instance, on physical grounds we really don’t expect something like temperature to suddenly jump from one value to another; we expect it to change by the flow of energy into the system, which implies a continuous change and therefore a continuous model.
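When both value and slope may change, the pieces no longer constrain each other, so for a given change-point the fit is simply a separate straight line on each side. Here is a minimal sketch of that, with made-up function names and noise-free synthetic data. Measuring the jump at the first time of the second segment is a choice made for this illustration, not something dictated by the method.

```python
def fit_line(t, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def fit_broken(t, y, k):
    """Fit independent lines to t[:k] and t[k:].

    Returns (line_before, line_after, jump), where jump is the difference
    between the two fitted lines evaluated at t[k]."""
    a1, b1 = fit_line(t[:k], y[:k])
    a2, b2 = fit_line(t[k:], y[k:])
    tau = t[k]
    jump = (a2 + b2 * tau) - (a1 + b1 * tau)
    return (a1, b1), (a2, b2), jump

# Synthetic illustration: a slow rise, then a sudden jump and a faster rise.
t = [float(i) for i in range(12)]
y = [2.0 + 0.1 * ti if ti < 6 else 10.0 + 1.0 * (ti - 6.0) for ti in t]

(a1, b1), (a2, b2), jump = fit_broken(t, y, 6)
print("slope before:", round(b1, 3), " slope after:", round(b2, 3))
print("jump at the change-point:", round(jump, 3))
```

A continuous model forces that jump to be zero; whether allowing it is justified is again a question for the statistics, and for whatever else we know about the system.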
An important thing to bear in mind is that, although change-point analysis (in all its forms) creates a good model of a time series, one which is statistically justified in that the changes are reliably real rather than random accidents, that doesn’t mean that the model is right. Remember the words of George Box: “All models are wrong, some models are useful.” After all, we really don’t expect the trend in something like temperature to follow a perfectly straight line, not even over a number of different pieces.
But if change-point analysis gives us a decent model like that, we can be confident that the piecewise-linear model is a good approximation. We can also be confident that where changes occur, they’re real. As for other changes which are different from the model, they may well be there, but the data don’t let us confirm that they exist, and we can’t justify saying when or how they take place. There are likely other changes indeed, but they’re bound to be small compared to those we can confirm, and we don’t have enough information in the data itself to justify such claims. The piecewise-linear model from change-point analysis probably isn’t all that’s going on, but it’s close to all we can deduce from the given data.
Let’s begin with the version which looks for change in the trend rate but not the value.
We’re rapidly approaching the summer minimum of Arctic sea ice (which generally happens in September). Some have claimed that since its precipitous decline, reaching an unprecedented low in 2012, Arctic sea ice has been in “recovery.” It ain’t so; those who say it’s recovering are either willfully deceptive, or woefully ignorant.
The National Snow and Ice Data Center has released their monthly data through August, so let’s take a look. Here’s the latest graph of anomaly values (to remove the rather large annual cycle):
I’ve added two different smooths (thick lines in red and blue), more clearly to emphasize what the trend might be. The difference in the smooths is the time scale on which they smooth. Both show steady decline at first, then an apparent acceleration of ice loss around 1998; in fact the two smooths agree well up to that point. But after 1998, the “slower” smooth (red) just goes steadily down while the “faster” smooth (blue) first declines rapidly, then less rapidly, seeming even to level off. Which is more correct? What can be justified statistically?
Change-point analysis to the rescue! Using the trend-rate change-point version, we find not just one but two change-points that are statistically justified, giving a model like this:
Arctic sea ice loss did indeed accelerate, around 2002. Then it went through a period of extremely rapid decline, but about 2007 it returned to a slower rate of decline; in fact, it may have stopped declining, becoming somewhat “stable.” If we look at the rates determined by change-point analysis, we get this for the three episodes:
The estimated rate of change since 2007 is negative as before, but is not statistically significant, so it is indeed possible that Arctic sea ice has stabilized (but a million and a half square kilometers below its prior value) since then. But it’s also possible that it has continued to decline since then, at the same rate it was declining prior to its period of hyper-active ice loss.
Even if it has stabilized since 2007, sea ice extent is still below what it would have been had it continued at the same rate since the beginning. So, in no sense can this be called some kind of “improvement,” and the notion of “recovery” is downright foolish. The Arctic sea ice is still in deep trouble, as the Arctic continues to warm. What will happen in the near future, only the near future can tell for sure.
For quite a different example, here are poll numbers among Republican voters over the last six months, for presidential candidate Donald Trump:
One can smooth the results on many different time scales, but one which seemed to me to be reasonable looks like this:
It makes it seem that Trump’s support among Republican voters took off at a rapid pace, then the pace declined (although, only a little). The same change-point analysis we used on Arctic sea ice gives this:
Which justifies the chosen time scale for the smoothing, and supports the idea of very rapid increase followed by somewhat less rapid increase. But is that the whole story?
Popularity among voters can change suddenly, so it need not be a continuous function of time. The moment when Trump’s popularity “took off” coincides with the announcement of his presidential campaign in June, when he made the now-famous comment about Mexicans (which rather displeased both Mexicans and Mexican-Americans). Could his popularity have changed both its trend rate and its value at that moment?
The version of change-point analysis which allows both value and slope to change confirms that indeed it did:
It appears that Trump’s June announcement, the one that caused so much ire among so many people, also caused a sudden increase in his popularity among Republican voters. His popularity has further increased, apparently steadily, since then, with him now holding a big lead over the overcrowded field of Republican candidates.
There’s great value in the many forms of change-point analysis. As for Donald Trump, you are invited to form your own opinion.