Probably the most commonly used way to estimate a trend in something is a mathematical process called linear regression. Basically, it means to fit a straight line [for those who must be pedantic, a flat hyperplane if we have multiple predictor variables]. In the case of time series, use time as the predictor variable and look for a linear relationship. If we find it, we declare “Trend!” and might even posit how big it is.
Why linear? Does anybody really believe that global average temperature since, say, 1970 has followed a straight line? Couldn’t it have wiggled around a little, just a little maybe — not noise, mind you, but genuine signal, real climate change rather than random fluctuation? Might it actually have accelerated, or even decelerated, or — heavens forbid! — taken a “hiatus”? Hell, mightn’t there have been brief episodes of all three, just not strong enough to be detected statistically (for a stickler like me)?
Of course. To my mind, the idea that as far as global temperature goes the climate — the signal, not the noise — followed a perfect straight line, is ludicrous.
So why the hell do I fit so many straight lines? I do it all the time, and if it’s so obvious that the signal is not a straight line that nobody can get away with the idea, as often as not I’ll resort to making a model out of straight-line pieces.
Then I can wax philosophic about the trend rate during each episode, i.e. along each straight-line piece, and estimate not only how fast it’s going, but how uncertain we are about how fast it’s going.
And I’m not the only straight-line maven. Far from it. Very, very, very far from it. Straight-line models (and that means linear regression) are everywhere. Global temperature, local temperature, rain, drought, snow, ice, rate of CO2 growth, the rate of growth of the rate of CO2 growth, … they’re everywhere.
For all those physical variables, the idea of perfectly linear trend is ludicrous. In many cases, just looking at a graph makes one question whether or not the linear-trend model is even useful, let alone “correct.” Yet linear regression persists. I think a lot of people, including many scientists, don’t fully appreciate what linear regression is useful for, and in some cases damn good at.
I’ll offer my opinion, that the most fundamental use of linear regression is to confirm or deny that the trend is doing something nontrivial.
By “trend” I mean the signal, the expected value apart from the noise, and by “doing something” I mean anything except lying there flat as a pancake going nowhere.
Whether or not things are changing is one of the most common and important questions in all of science. That’s the same as whether or not the trend is doing something other than going nowhere. Note that according to my terminology, there’s a trend even when it’s going nowhere and doing nothing — it’s just a flat trend. Others would say that if it’s not doing anything, there’s no trend at all. Po-tay-to, To-mah-to.
For answering this most fundamental question, is there something or nothing, linear regression is terrific! In many cases it’s one of the most powerful methods, if not the most powerful. I believe that the source of its power is the fact that the null hypothesis (that the trend is doing nothing) is exactly the question of greatest importance. The fact that the “alternate hypothesis” (so says the statistician, the climate scientist might say “model”) is a straight line does not (did I say that strongly enough?) mean that the real signal is a straight line. Not. Did I emphasize that strongly enough?
If linear regression confirms something going on, we can generally rely on its answer to the additional question: is it heading generally up or down? But don’t forget that the estimated rate of change that we get out of linear regression is really an estimate of the average over the entire interval. The true rate might not be following that straight line model.
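To make that concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not anybody's production method: the data are synthetic, the function name `slope_test` is made up for this example, and the noise is assumed to be white (independent), which real climate series often are not. The signal is deliberately curved, yet the straight-line fit still reports a strongly significant slope. That slope is the average rate over the interval, not proof that the signal is a line.

```python
import math
import random

def slope_test(t, y):
    """Ordinary least-squares line fit.

    Returns (slope, standard error, t statistic) for the slope.
    Assumes independent (white) noise; autocorrelated noise would
    make the true uncertainty larger than this standard error.
    """
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

# Synthetic example (assumed values): a curved signal plus white noise.
rng = random.Random(0)
t = list(range(50))
y = [0.001 * ti ** 2 + rng.gauss(0.0, 0.1) for ti in t]

slope, se, t_stat = slope_test(t, y)
print(f"slope = {slope:.4f} +/- {se:.4f}, t = {t_stat:.1f}")
```

A large t statistic here says "the trend is doing something" and "it's heading up." It says nothing about whether the straight line is the right shape.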
But hey, we knew that. Everybody knows that! Basic stats, right? Nobody would ever take a powerfully significant linear regression (statistically significant, that is) and use that alone to conclude it’s following a straight line? Especially when there’s further evidence that it’s not only doing something, it’s doing something besides just that straight-line stuff.
Alas, too often, even in the scientific literature, I see that basic mistake of extending a linear relationship or interpreting it as physically real (not just a good model mind you, but physically real) when there’s no justification for taking it that far. I won’t be naming names.
None of which negates the tremendous usefulness of getting good answers to that most basic question. Linear regression has weaknesses (like all methods) and complications (love ’em!), but it remains a powerful, efficient, and effective way to test whether or not change is happening in scientific data.
When the data actually do follow a straight line, not perfectly perhaps but close enough to make a model that’s downright useful, then the rate of increase or decrease will be constant, and it’s very good to know what that rate is.
In many cases we can say that linear regression is the best method, meaning it gives the most precise and accurate answers. In some of the exceptional circumstances that tend to trick the analysis, we have clever methods to avoid the pitfalls (linear regression isn’t just least-squares regression, you know). Here’s another fundamental usefulness of linear regression:
If you want to rely on the idea that the trend is linear, I think you should either have a compelling physical reason to support it, or you should search the data for evidence that there’s something more.
The linear model means a straight line, and that means whether it’s up or down it’s going at a constant rate. What if the rate is changing? Wouldn’t linear regression fail to detect that?
Of course it would. If your model doesn’t include rate change, it’s never going to detect rate change.
That means you need another analysis. A common choice is to fit a quadratic function of time, a model that allows rate change. Then we test the quadratic term (which is responsible for that rate change) for significance. If it passes, we can declare that the rate is not constant, and even give a decent answer whether it’s getting faster or slower.
Again, that doesn’t mean that the signal is actually following a quadratic curve. But it can confirm that it’s not just a straight line, and give us an idea of how large the effect is.
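The quadratic test can be sketched in a few lines of plain Python. This is only an illustration: the data are synthetic, the names `quadratic_term_test` and `_residuals` are invented for the example, and white noise is assumed. Rather than solving the full three-parameter fit directly, it uses the standard trick of removing the straight-line part from both the data and the squared-time term, then regressing one set of residuals on the other. That gives the same quadratic coefficient as the full fit.

```python
import math
import random

def _residuals(t, v):
    """Residuals of v after an ordinary least-squares straight-line fit on t."""
    n = len(t)
    t_mean = sum(t) / n
    v_mean = sum(v) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, v)) / sxx
    return [vi - v_mean - slope * (ti - t_mean) for ti, vi in zip(t, v)]

def quadratic_term_test(t, y):
    """Return (c, standard error, t statistic) for the t**2 coefficient
    in the model y = a + b*t + c*t**2, assuming white noise."""
    n = len(t)
    q = _residuals(t, [ti ** 2 for ti in t])  # part of t**2 a line can't explain
    r = _residuals(t, y)                      # part of y a line can't explain
    sqq = sum(qi ** 2 for qi in q)
    c = sum(qi * ri for qi, ri in zip(q, r)) / sqq
    rss = sum((ri - c * qi) ** 2 for qi, ri in zip(q, r))
    se = math.sqrt(rss / (n - 3) / sqq)       # 3 fitted parameters: a, b, c
    return c, se, c / se

# Synthetic examples (assumed values): one accelerating series, one straight.
rng = random.Random(1)
t = list(range(50))
accelerating = [0.02 * ti + 0.001 * ti ** 2 + rng.gauss(0.0, 0.1) for ti in t]
straight = [0.05 * ti + rng.gauss(0.0, 0.1) for ti in t]

c_acc, se_acc, t_acc = quadratic_term_test(t, accelerating)
c_lin, se_lin, t_lin = quadratic_term_test(t, straight)
print(f"accelerating: c = {c_acc:.5f}, t = {t_acc:.1f}")
print(f"straight:     c = {c_lin:.5f}, t = {t_lin:.1f}")
```

The sign of `c` says whether the rate is getting faster (positive) or slower (negative). A small t statistic for the straight series is the "no evidence of rate change" answer.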
A quadratic curve is only one choice. Another is a function made of two straight-line pieces joined at their endpoints. I call it the continuous (joined at their endpoints) piecewise-linear (made of straight-line pieces) model. It too allows for a rate change, but only a single, sudden rate change. Just when that happens is one of the parameters of the model.
There are enough possible such models (enough “degrees of freedom”) to make the stats rather complicated, in particular the choice of changepoint time, the moment when the rate changes. But it can be done, and it turns out that the continuous piecewise-linear model is very powerful for detecting rate changes. It’s one of the main weapons in my arsenal when I go looking for that.
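One simple way to fit such a model is sketched below, assuming a brute-force search over candidate changepoint times. It is not necessarily how any particular analysis does it. The model is y = a + b*t + c*max(0, t - tau), where tau is the changepoint and c is the change in slope. The function name `fit_changepoint`, the synthetic data, and the choice of candidates are all assumptions for illustration. The sketch deliberately reports no p-value: because tau was chosen to give the best fit, the usual significance tables for c do not apply, and a proper test needs to account for that selection (for instance by simulation).

```python
import random

def _residuals(t, v):
    """Residuals of v after an ordinary least-squares straight-line fit on t."""
    n = len(t)
    t_mean = sum(t) / n
    v_mean = sum(v) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, v)) / sxx
    return [vi - v_mean - slope * (ti - t_mean) for ti, vi in zip(t, v)]

def fit_changepoint(t, y, candidates):
    """Grid-search tau in y = a + b*t + c*max(0, t - tau).

    Returns (tau, c, rss) for the candidate with the smallest residual
    sum of squares. Candidates must lie strictly inside the time span,
    otherwise the hinge term is indistinguishable from a straight line.
    """
    y_res = _residuals(t, y)
    best = None
    for tau in candidates:
        hinge = [max(0.0, ti - tau) for ti in t]
        h_res = _residuals(t, hinge)          # part of the hinge a line can't explain
        shh = sum(h ** 2 for h in h_res)
        c = sum(h * r for h, r in zip(h_res, y_res)) / shh
        rss = sum((r - c * h) ** 2 for h, r in zip(h_res, y_res))
        if best is None or rss < best[2]:
            best = (tau, c, rss)
    return best

# Synthetic example (assumed values): flat until t = 30, then rising at 0.05 per step.
rng = random.Random(2)
t = list(range(60))
y = [0.05 * max(0, ti - 30) + rng.gauss(0.0, 0.1) for ti in t]

tau_hat, slope_change, rss = fit_changepoint(t, y, candidates=t[5:-5])
print(f"changepoint near t = {tau_hat}, slope change = {slope_change:.3f}")
```

Keeping the candidates away from the first and last few points is a common precaution, since a changepoint that close to the edge is determined by only a handful of data values.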
This is, essentially, another fundamental usefulness of linear regression, rooted in the fact that any function [for the pedantic: bounded smooth] can be approximated by a continuous piecewise linear function as closely as we want. For high precision it might require a lot of pieces, but it can be done.
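The approximation claim is easy to check numerically. The sketch below (plain Python, with a sine wave chosen arbitrarily as the smooth function, and `max_interp_error` a made-up helper name) joins evenly spaced points on the curve with straight lines and measures the worst miss on a fine grid. More pieces, smaller error.

```python
import math

def max_interp_error(f, a, b, pieces, samples=2000):
    """Largest gap, on a grid of sample points, between f and the continuous
    piecewise-linear function through `pieces + 1` evenly spaced knots."""
    knots = [a + (b - a) * i / pieces for i in range(pieces + 1)]
    values = [f(k) for k in knots]
    worst = 0.0
    for j in range(samples + 1):
        x = a + (b - a) * j / samples
        # Index of the straight-line piece containing x.
        i = min(int((x - a) / (b - a) * pieces), pieces - 1)
        w = (x - knots[i]) / (knots[i + 1] - knots[i])
        approx = (1 - w) * values[i] + w * values[i + 1]
        worst = max(worst, abs(f(x) - approx))
    return worst

# One full cycle of a sine wave, approximated with 4, 16, and 64 pieces.
errors = {k: max_interp_error(math.sin, 0.0, 2 * math.pi, k) for k in (4, 16, 64)}
for k, err in errors.items():
    print(f"{k:3d} pieces: max error = {err:.5f}")
```

For a smooth function the worst-case error of this kind of interpolation shrinks roughly with the square of the piece width, so quadrupling the number of pieces cuts the error by about a factor of sixteen.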
And now we come to a drawback. The statistics of fitting multiple straight lines must be tested with great care; it’s oh so way too easy to get a result one thinks is significant (rate change!) which really isn’t (sorry!). The drawback is that although the piecewise linear model (continuous or not) is terrific for statistical testing of trend change when done right, it’s also so easily done wrong that it’s the source of far too many mistaken results published in the scientific literature. I’m not naming names.
Do bear in mind that the piecewise linear model is just one choice. There are polynomials, smoothing and averaging filters, splines, you can get exceptionally fancy if you want to (wavelets and singular spectrum analysis). But for reliability and power of testing whether rates have changed or not, those piecewise linear models are among the best.
Perhaps the best use of linear models (such as piecewise linear) is that they enable us to say when there is *no evidence* for a rate change. There are so many claims of rate change, often used unwisely (sometimes nefariously), that this is a very fundamental usefulness.
Sometimes, they even make good models. The piecewise linear model of global average temperature (data from NASA) shown in this post is one example. The model isn’t just useful, it’s competitive with other statistical models including some pretty fancy schmancy stuff. Let’s face it, sometimes things actually do follow a straight line very closely. Damn closely.