A previous post addressed some issues with linear regression, “linear” meaning that we’re fitting a straight line to some data. Let’s devote another post to scrutinizing the issue. This post is all about the math, so readers who aren’t that interested can rest assured we’ll get back to climate science soon.
It was mentioned in a comment that least-squares regression is BLUE: the best linear unbiased estimator. That holds under the usual assumptions, namely noise with zero mean, constant variance, and no correlation between data points. In this acronym, “B” is for “best,” meaning “least-variance.” For practical purposes it means (among other things) that if a linear trend is present, we have a better chance to detect it with fewer data points using least-squares than with any other linear unbiased estimator. “U” is for “unbiased,” meaning that the expected value of the fitted line is the true trend line. Both of these are highly desirable qualities.
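To make “unbiased” concrete, here is a minimal sketch in Python with NumPy. It assumes a made-up trend (slope 0.5, intercept 1.0) and uncorrelated Gaussian noise; none of these numbers come from real data. It fits a line to many noisy realizations and averages the fitted slopes, and if least squares is unbiased, that average should sit very close to the true slope.

```python
import numpy as np

# Hypothetical setup: the true trend and the noise level are invented for illustration.
rng = np.random.default_rng(42)
true_slope, true_intercept = 0.5, 1.0
t = np.arange(30, dtype=float)

# Fit a straight line to many independent noisy realizations of the same trend.
slopes = np.empty(2000)
for i in range(slopes.size):
    x = true_intercept + true_slope * t + rng.normal(scale=1.0, size=t.size)
    slopes[i] = np.polyfit(t, x, 1)[0]  # coefficient of t, i.e. the fitted slope

# Individual fits scatter around the truth; their average should land on it.
mean_slope = slopes.mean()
print(f"true slope {true_slope}, mean fitted slope {mean_slope:.4f}")
```

Any single fit will miss the true slope by a bit; the point is only that the misses don’t lean consistently one way or the other.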
Finally, “L” is for “linear,” which in this context has nothing to do with the fact that our model trend is a straight line. It means that the best-fit line we get is a linear function of the input data. Therefore, if we’re fitting data x as a linear function of time t, and it happens that the data x are the sum of two other data sets a and b observed at the same times, then the best-fit line to x is the sum of the best-fit line to a and the best-fit line to b. In some (perhaps even many) contexts that is a remarkably useful property.
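This kind of linearity is easy to check numerically. The sketch below (Python with NumPy) uses two invented series a and b on a shared time axis; the helper `fit_line` is just a hypothetical wrapper around `np.polyfit`. It fits a line to a, to b, and to their sum x, then compares the coefficients.

```python
import numpy as np

def fit_line(t, x):
    """Return [slope, intercept] of the least-squares line through (t, x)."""
    return np.polyfit(t, x, 1)

# Two made-up data sets sampled at the same times, and their sum.
rng = np.random.default_rng(0)
t = np.arange(20, dtype=float)
a = 0.3 * t + rng.normal(size=t.size)
b = -0.1 * t + 2.0 + rng.normal(size=t.size)
x = a + b

fit_a = fit_line(t, a)
fit_b = fit_line(t, b)
fit_x = fit_line(t, x)

# Linearity: the fit to the sum matches the sum of the fits (up to rounding).
print(fit_x, fit_a + fit_b)
print(np.allclose(fit_x, fit_a + fit_b))
```

The agreement doesn’t depend on the noise or on the particular slopes chosen here. It follows from the fitted coefficients being a fixed linear combination of the data values once the times t are given.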