Why Not Weighted?

The question arose on another blog, when analyzing the Berkeley data, why not use weighted least squares with weights determined by the uncertainty levels listed in the Berkeley data file?


Weighted least squares uses knowledge of the noise level in the data to improve the precision of least-squares regression. If the noise level of each individual data point is \sigma, then we should assign weights given by w = 1/\sigma^2, and our regression will be more precise. That’s great! It requires knowing the noise level for each individual data point. The problem is, the uncertainties listed with the Berkeley data are not those noise levels. They’re only part of the noise — the part we could call measurement error or estimation error.

But there’s more noise in temperature data than just estimation error. There’s also physical noise, which has nothing to do with the uncertainty in instruments or area-weighted averages. It’s noise that arises as an actual physical phenomenon. Even if our instruments were perfectly precise and we had total area coverage of the planet, so that all the uncertainty levels listed were zero, there would still be noise in the data. When you do weighted least squares but omit a major part of the noise, you can get into some very serious trouble.

Let’s create some artificial data. We’ll start with 35 years of data with a perfectly linear trend at a rate of 0.02 deg.C/yr — climate change! Then we’ll add physical noise — weather! — and just to keep things simple we’ll make it Gaussian white noise with standard deviation 0.2 deg.C. That gives us this:

The thin black line gives the actual temperature — not an estimate. But as you can see, it’s noisy.

However, there’s more noise because our measurements are imperfect. This is the noise which is indicated by the “uncertainty” numbers in the Berkeley data file. Let’s simulate that by adding more Gaussian white noise — not weather, but estimation error. However, let’s suppose that the final two data points are much more precise than those which precede them. Maybe we used superior thermometers, or had much better spatial coverage, but the estimation error in those last two data points is far less than in the others, only 0.001 C. This gives us the measured data:

We can estimate the trend by ordinary least squares (not weighted). In fact we get a good estimate, indicated by the fact that the estimated trend (the red line) is so close to the actual trend (the green line):

But if we estimate it by weighted least squares using only the estimation error to determine the weights, we get this:

That’s not just less precise. It’s wrong.

The root of the problem is that the last two data points get so much more weight than the others. That would be OK if their low error levels were truly reflective of the noise in those data points. But they’re not, because even though those final points have far less measurement error, they have just as much physical noise as all the others.

Weighted least squares really is better than unweighted if you know the true noise level. But if you use the wrong noise level, then weighted least squares is just wrong.

15 responses to “Why Not Weighted?

  1. Weather noise and statistical noise should not be too difficult to estimate. Combine the two and you get your adequate weighting function.

    • Timothy (likes zebras)

      “Weather noise … should not be too difficult to estimate.”

      I beg to differ. How are you going to estimate it?

      You could use past observations of weather and climate, but these contain inaccuracies due to measurement errors, plus there is variability introduced due to changing climate forcings [greenhouse gases, aerosols, solar cycle, etc], where these forcings are themselves uncertain.

      You could use a control integration of a climate model, where all the forcings are held constant and you don’t have to worry about the measurement errors. However, you’re then assuming that the model has a good representation of the weather noise. While I don’t think the models are all that bad, they clearly have some deficiencies.

      Tamino has shown that if you get it wrong, you will make a horlicks of the trend analysis. He has also shown that this doesn’t really matter, because with a suitably long period of data – in this case 35 years – a simple unweighted ordinary least squares regression gets very close to the true trend in the data.

    • Yvan, Given that climateis changing, how do you know that noise is stationary.

      • You have point. I was just thinking about that when posting. Statistical behaviour of non linear system changes just before jumping from a state to another. The whole system might be fundamentally heterocladistic. That alone might be a good research project.

        In practice, estimating the physical noise over a long time frame would be probably adequate if you assume the stationary. In consequence, it would not be too difficult to do a weighted averaging.

        However, there is a catch, the length of time in this specific case is very short, in consequence noise will be poorly sampled. This alone probably creates to much uncertainties to do a proper analysis. In other term, the amount of information gained would not change much the final result. Maybe simple criteria like Chauvenet’s would be more appropriate in that case.

      • YD:

        I know heteroskedastic, but heterocladistic is a new one for me. What is that?

      • John, your right, My fault. Memory translation fault.

  2. David B. Benson

    That is useful to know. Thank you.

  3. A slightly more subtle point is that the underlying forced response is not generally linear anyway. This certainly crops up in estimates of ocean heat content. Modelling it as a straight line plus “noise” is not really valid when the residual “noise” actually includes the signal we are interested in in the first place (plus natural variability and observational uncertainty, of course).

    Amusingly, some authors do use the obs errors as weights on the data points, and others do a simple straight line fit.

    Even more amusingly, in both the cases I know about (it’s not always clear from the paper, I had to check with the authors), the choice happens to be the one that maximises the trend for the respective data sets… however it has to be admitted that the effect is not huge.

  4. Nice, but YD’s point is a good one. We (well, when I say we… :-) can probably estimate the “weather noise” from the better parts of the series and add that in. Although then you run into the problem that the series as a whole clearly isn’t well fitted by a linear trend.

    [Response: The data since 1975 actually are well fit by a linear trend. I think you’d have a hard time showing that the residuals are other than autocorrelated noise.]

  5. It should be noted that your (synthetic) graphs show that has warming stopped. You have found JC’s basis at last!

    [Response: If that’s her basis then let her say so. Then I’ll comment on it.]

  6. If, hypothetically speaking, your last 2 points instead had a much greater error margin than the rest of the data, wouldn’t a weighted least square minimise the possible distortion due to those 2 points in the fitted line ?

    [Response: Yes. In that case, the much lesser weight attached to them would correctly reflect their much greater uncertainty — the weights still wouldn’t be right (ignoring as they do the physical noise) but at least it wouldn’t throw things off as much as unrealistically high weights. But when points have very low measurement noise, giving them high weight because of that neglects that their physical noise is just as big as all the other data points.]

  7. I strongly believe that as the world is warming the weather noise level will stay the same or increase in magnitude. So expect more “decades of cooling” as we are boiling up

  8. “why not use weighted least squares with weights determined by the uncertainty levels listed in the Berkeley data file?”

    The Berkeley Preliminary text dataset (downloaded after the TMAX posting error was corrected) appears to have no uncertainty values differing from zero (or more precisely, values other than 0.0000 or -0.0000) in the data file data.txt. The README.data file does indicate that “some of the expected fields are currently missing (essentially reserved for future use)”, so uncertainty may be such a field. Can anyone using the Preliminary Mathlab dataset confirm this, or confirm the existence of actual uncertainty values differing from zero in that version?

  9. Answering my own post: more careful reading above shows that in the Berkeley data file above refers in fact to the Berkeley analysis chart data (results) rather than the Berkeley Earth data set, which I had mistakenly assumed on quick reading, and which is the data of interest to me at this point.

    As I see on another thread (https://tamino.wordpress.com/2011/11/01/questions-for-judith-curry/ that there has been some discussion here of the raw data, I’ll add that I have compared the Matlab data set now, with the text data set, and the text version appears to be a correct export of the Matlab version. I have therefore reported to Berkeley an error in this data set, indicative of a coding error on their part. (I had wished to rule out the possibility that this might only be present in the text version in the event of a faulty export to text).

    A caution for anyone using the data set: the “# of Sources” in site_detail.txt in the text dataset, and the “sources” field in “sites” in the Matlab version do not correctly reflect the set of sources shown when the sources shown for the monthly data, for more than half the station records, are examined. In 17,887 cases this represents an undercount of sources (by up to 9), but in 2,456 cases this is an overcount of sources (by up to 6).

    I have not examined the Matlab code to see where this error arises, or to determine what effect (if any) this has on subsequent analysis, as my own interest in the raw data is in a different aspect. The inconsistency however indicates a coding error which, whether or not the count of sources and list of sources is currently needed for subsequent analysis, should be corrected by the Berkeley team.