Trend and Uncertainty

A reader by the name of “Will” fails to understand why trends can be established with confidence even if the uncertainty of the individual values is unknown, even if those values are averages rather than raw data values. It’s a failure he shares with William M. Briggs, numerologist to the stars. Perhaps we can enlighten reader “Will” — I very much doubt anyone can enlighten William M. Briggs.

But first, let’s dispense with a challenge issued by “Will”:

Please make sure that you explain why the time series I presented (102, 97, 98), given the experiment Gator described, is in fact showing a negative trend.

That time series does not show a negative trend. More to the point, nobody claimed it does.

However, we can use a similar example to illustrate when we might actually need to know the uncertainty to detect change, and when we can detect change (in particular, trend) in the absence of knowing the uncertainty.

Here’s a classic statistical problem: we have some data, say the number of gamma rays detected per minute, from only two days, and we need to know whether there’s any difference between the background levels for those two days. To make that comparison, we really do need to know the uncertainty level of the measurements. And we won’t accept anyone saying “It must follow the Poisson distribution”; we insist on 100 separate measurements for each day, which will enable us to compute both an average and an uncertainty level. So we’ll use a gamma-ray detector for 100 minutes and note the number of detections for each minute, or we’ll use 100 separate gamma-ray detectors for one minute, or 20 gamma-ray detectors for 5 minutes each. Any way you look at it, we’ll have 100 individual (and presumably independent) measurements for each of the two days.

I generated such data (following a Poisson process just to be realistic). The first day’s data have a mean of 101.46 and standard deviation 9.33. Hence the standard error (standard deviation of the average) is 0.933. For the second day, the mean is 100.44 with standard deviation 11.18, so the standard error is 1.118.

We can already see that the two daily means are not very far apart, compared to their standard errors — they’re not nearly far enough apart to establish a statistically significant difference between the two days. More rigorously, we could apply a t-test, giving a t-value of 0.7 and a p-value 0.48. As far as statistical tests for a difference in mean between two data sets, this one falls into the broad category of “no way!”
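For anyone who wants to check the arithmetic, here is a minimal sketch of that comparison working only from the summary statistics quoted above. The function name is made up for illustration, and the p-value uses a normal approximation to the t distribution, which is an assumption that should be harmless with roughly 200 degrees of freedom.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, two-sided p) for a difference in means from summary statistics."""
    se1 = sd1 / math.sqrt(n1)        # standard error of the first mean
    se2 = sd2 / math.sqrt(n2)        # standard error of the second mean
    se_diff = math.hypot(se1, se2)   # standard error of the difference of means
    t = (mean1 - mean2) / se_diff
    # Assumption: normal approximation to the t distribution (fine for ~200 df).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

t, p = two_sample_t(101.46, 9.33, 100, 100.44, 11.18, 100)
print(f"t = {t:.2f}, p = {p:.2f}")
```

With the numbers above this should land on a t-value of about 0.7 and a p-value of about 0.48.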

Note that we can only complete this analysis because we know the uncertainty of the two daily means.

Now suppose we look for a trend using only the two daily means, not using all the individual measurements, and ignoring the uncertainty level of the daily means. From day 1 to day 2, the mean level decreased from 101.46 to 100.44 — does that mean there’s a trend, declining at 1.02 counts/day?

No, it doesn’t mean that. If we fit a straight line by linear regression, a statistical test of that fit fails — it gives a t-value of NaN (not a number) on 0 degrees of freedom. There’s not really enough data to do the test.

OK, let’s do another day’s work, another 100 minutes, and compute an average of those for a third day’s value. This time the daily mean count is 100.22, and it’s starting to look like a trend! If we had computed the uncertainty of our daily means, then we’d know that wasn’t the case, but we’re ignoring that information. Will that cause the evil trend test to give us a false result?

No. It won’t. Testing for trend using only these three values (and ignoring their uncertainty levels) gives a t-value of -2.685. In many cases that would be strongly significant, but in this case the test has only 1 degree of freedom. So the p-value of the test is 0.227 — there’s a 22.7% chance of a result that extreme, or more extreme, just by chance. No evidence of a trend.
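The three-point test is small enough to do by hand, or with a short sketch like the one below. It fits a line by ordinary least squares to the daily means against day number and estimates the noise from the residuals alone. The function names are hypothetical; the p-value formula relies on the fact that a t distribution with 1 degree of freedom is the Cauchy distribution, so it applies only to that case.

```python
import math

def trend_test(y):
    """OLS fit of y against day number 1..n; return (slope, t-value, df)."""
    n = len(y)
    x = range(1, n + 1)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    df = n - 2
    if df < 1:
        # Two points: the line fits exactly, nothing is left to estimate noise.
        return slope, math.nan, df
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(resid_ss / df / sxx)   # uncertainty from residuals only
    return slope, slope / se_slope, df

def p_value_df1(t):
    """Two-sided p-value for a t statistic with exactly 1 degree of freedom."""
    return 1 - 2 * math.atan(abs(t)) / math.pi

slope, t, df = trend_test([101.46, 100.44, 100.22])
print(f"slope = {slope:.2f}, t = {t:.3f}, df = {df}, p = {p_value_df1(t):.3f}")
```

Feeding it only the first two daily means returns a NaN t-value on 0 degrees of freedom, which is the two-day case described earlier.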

At this point, some of you (especially Will) might be wondering, “How the hell can you compute a t-test with no knowledge of the uncertainties? That requires some measure of uncertainty!” Or as Will himself said:

The issue, as I see it, is when one fits a line to a series of estimates without accounting for the fact that the estimates contain uncertainty.

The answer is, “The uncertainty level is estimated from the variance of the residuals to the linear fit.”

That’s the really interesting, absolutely fascinating, fact of the matter. Even if you compute means and throw away their uncertainty estimates, the fact that you have many values to analyze for a trend test enables you to estimate the uncertainty of your trend analysis.

Let’s show a more realistic example. I generated 100 sets of 100 values to simulate 100 days of data. Then I computed the mean for each day, as well as its standard error. That enables me to plot the data with “error bars,” which looks like this:

This plot gives the visual impression that the uncertainty levels of the daily values are larger than the trend (if it exists) over the 100 days. That impression is correct, but one shouldn’t draw rigorous conclusions based on visual inspection of graphs. It’s a great way (indispensable, in fact) to get ideas and gain insight, but it’s no substitute for statistical tests.

We can also plot the data without showing error bars, instead superimposing a trend line estimated by linear regression:

That trend line might give the visual impression of a declining trend, or it might not, depending on your level of experience analyzing data and interpreting time series graphs. Some might even want you to get the impression of a declining trend whether there is one or not!

But again, visual impression is a great way to get ideas but not to have statistical confidence. We should apply a statistical test to that linear regression. The result? The test for trend gives a p-value of 0.173, nowhere near statistically significant.

We were able to do the test because the residuals from the linear fit enable us to estimate the uncertainty level in individual days’ values. In fact we already know the uncertainty, because these data were created as a Poisson process with mean value 100, so the individual measurements will have a standard deviation of exactly 10, so the average of 100 measurements taken on any given day will have a standard error of exactly 1. The “estimate from the residuals” is that the residual standard error is 1.01. That’s not just a valid estimate of the uncertainty, it’s a damn good one.
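Here is a rough sketch of that check for readers who want to repeat it. It is not the code used for this post: the function names and seed are made up, and the Poisson draws use Knuth's multiplication method only because Python's standard library has no Poisson generator. It simulates 100 days of 100 counts each, keeps only the daily means, and estimates their uncertainty from the residuals of a straight-line fit.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_daily_means(n_days=100, n_per_day=100, lam=100.0, seed=0):
    """Return one mean per day; the per-day standard errors are thrown away."""
    rng = random.Random(seed)
    return [sum(poisson(lam, rng) for _ in range(n_per_day)) / n_per_day
            for _ in range(n_days)]

def residual_standard_error(y):
    """Residual standard error of an OLS line fit to y against day 1..n."""
    n = len(y)
    x = range(1, n + 1)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(resid_ss / (n - 2))

means = simulate_daily_means()
print(f"residual standard error = {residual_standard_error(means):.2f} (theory: 1.00)")
```

The exact figure depends on the seed, but it should come out close to the theoretical value of 1.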

[Technical note: to do the test itself, we compare the variance of the residuals to the variance of the fit as part of an F-test. That way we avoid the pitfall of assuming the trend is real in order to demonstrate the presence of a trend. If the trend is real, then we estimate the uncertainty of the data by the residuals — if the trend is not real, we estimate the uncertainty from the raw data.]
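To make the technical note concrete, the F statistic for a straight-line fit to n values can be written as follows. The symbols are notation introduced here for illustration, not taken from any particular software package.

```latex
F = \frac{\mathrm{SS}_{\mathrm{fit}} / 1}{\mathrm{SS}_{\mathrm{res}} / (n - 2)},
\qquad
\mathrm{SS}_{\mathrm{fit}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\qquad
\mathrm{SS}_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

For a single slope, F is the square of the slope's t-value. With the three daily means used earlier (101.46, 100.44, 100.22), the sums of squares are about 0.7688 for the fit and 0.1067 for the residuals, so F is about 7.21 on 1 and 1 degrees of freedom, which matches the square of the t-value of -2.685.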

The trend in global temperature is also subjected to statistical tests. Even absent the uncertainty level of any single month’s (or year’s) estimate, the aggregated data allow us to estimate the uncertainty. That’s why we can quote “+/-” values for trend estimates, that’s how we compute p-values for trend tests, and that’s how we know that the trend in global temperature over the last 30 years and more is real, and William M. Briggs is full of …

This is what Will didn’t seem to get (although we don’t blame him) and what William M. Briggs either didn’t get or didn’t want to admit (in either case we do blame him).


38 responses to “Trend and Uncertainty”

  1. Wow – I think I did the calculations correctly in Excel: using the annually averaged GISTEMP LOTI across the satellite era and the explanation given at Wikipedia for a regression estimate (the Student’s t-test article, section “Slope of a regression line”),
    I get a t-value of 8.59. At 33 degrees of freedom, right? Hot DAMN, can someone verify that for me?

    • Agh – nope. Too many degrees of freedom, it’s 31. Rechecking my work in Excel though, that’s what I used, so I misspoke above.

  2. It seems to me that one of the things William Briggs is asking for is to have a credible interval on the regression itself, as I did here:
    or the frequentist equivalent. I rather like this for communicating the uncertainty of the trend to a non-statistical audience. I am intending to have a look at the effect of the uncertainties of the observations in the BEST dataset to see if it has much effect on the width of this credible interval.

    Apologies if this isn’t one of the things Dr Briggs is asking for.

  3. BTW, very enlightening post Tamino, thanks for this.

  4. Section 2 of a paper derived from my PhD explains in an awful statistical demonstration the impact of the length of the sample on slope determination.

    Click to access Dutil.pdf

    Please, I know this derivation is far from rigorous. I did it for illustrative purpose.

  5. Interesting… Are you claiming that the uncertainty of the slope is independent of the uncertainty in each individual measurement?

    [Response: No. I’m stating that the variance of the residuals provides enough information about that uncertainty to determine the uncertainty level of the slope itself.]

    • So if I understand correctly, you are using the fact that the measurements are all drawn from the same probability distribution. Right? Then, of course, it seems reasonable that the variance in the distribution of measurements can be estimated from the residuals, i.e., the differences between the regression line and the measurements. However, would you say the same is true if the uncertainty in the measurements varies for some reason?

      • Gyro,
        This depends on how the measurement uncertainties varied. A random variation could be modeled, and the error model for the points adjusted. If there were a systematic error, this might be corrected. None of this is particularly rare in dealing with data. Briggs and Will are on drugs.

      • Ray,
        Thanks. Let’s leave it at that.

  6. I think there is another point to make. The error bars are seldom independent of the residuals. In a single measurement, you start off with a rough but educated guess of the uncertainty. With multiple measurements, the residuals tell you if your uncertainty estimate was too high or too low. You then correct your error bars (and hopefully learn more about where the uncertainty is coming from). When you are told that a thermometer is accurate to 0.5C, that value was established statistically not from the physics and chemistry of mercury in glass tubes.

  7. ” … the residuals from the linear fit enable us to estimate the uncertainty level in individual days’ values.”

    Correct me if I’m wrong, but isn’t there a very important piece of information to be had by comparing the uncertainty in individual values as estimated from the process of measuring/deriving those values to the uncertainty in individual values as estimated from the linear fit? Specifically, if they’re significantly different from each other, it means either (a) the uncertainty estimate from the measurement process is wrong (so the process is not properly understood) or (b) a linear fit is inappropriate (two subcases for which uncertainty estimate is larger – either any real relation is more complex than linear, or there’s no evidence for any relation).

    This is something I learned the hard way when I was schooled for not taking this into account in my Generals experiment. My data had the exact relation it should have had – it was designed to be fit by a line, the slope of which was the essential ingredient in calculating the physical property (the shear modulus of a foam) I was trying to measure, and I estimated the uncertainty in the slope of the line purely from the residuals. Apparently I should have estimated the error in each individual measurement to make sure there wasn’t a problem with the experiment’s design. Anyway, it wasn’t critical and I passed :-)

    [Response: Of course there is additional information from detailed knowledge of the raw data values. But that’s not what these posts are about, is it? They’re about how wrong William Briggs is to claim that lack of knowledge of the uncertainty in the averages falsifies statistical significance of the regression line. It doesn’t.]

  8. See comment 239, where Will says:
    February 5th, 2012 at 9:15 pm

    Oh my… Attacked? Really? This has nothing to do with climate change Phil, and everything to do with data analysis.

    You fit a line to the mean of a set of time series measurements. There are good reasons to not do this. Why didn’t you just use the raw measurements? Did you try incorporating those error bars into your analysis, or are you saying they don’t mean anything?

    You are too certain of your conclusion. End of story.

  9. I think that a strong trend might also tell you something about the error of your measurements. For example, if I have a series of points, 0.14, 0.23, 0.35, 0.44, 0.55, 0.66, 0.74, 0.86, 0.95, etc. – I’m pretty sure that I know that the trend is there. 0.1 per time period, highly statistically significant.

    So, what if I then claimed that the uncertainty of any individual point was plus or minus 1.0? Well, you’d tell me that something is screwy. There’s no way that the data points can have a trend that consistent with that kind of uncertainty. So either I’ve overestimated my uncertainty, or there is something wrong with my measurement device.

    In any case, the point I’m trying to make is that the behavior of the residuals can provide some data on the maximum possible uncertainty (they don’t necessarily tell you what the minimum uncertainty is – it is possible that you are very precisely measuring a noisy system – but it would be hard for the uncertainty in measurement to be LARGER than the noise in the data…)


  10. Dikran Marsupial: Thank you. You are one of the few to admit that the big red line isn’t the only plausible red line.

    Imagine that uncertainty in the measurement would result in uncertainty in the model… Funny that.

    [Response: Hold on there, bud.

    Almost every time I’ve quoted a trend rate for a temperature time series (which is a helluva lot of times) I’ve also stated the estimated uncertainty in the trend rate. That includes published research for all five major global temperature data sets. Every time I do that, it’s tacit acknowledgement that there’s more than one possible pathway of global temperature.

    I’ve also corrected the estimates for autocorrelation in a way which is, as far as I can tell, better than done by my predecessors in the analysis of global temperature trends.

    I’ve also stated explicitly on many occasions that the real trend is obviously not perfectly linear, that the linear estimate is only an approximation, but that for data since 1975 I see no evidence of demonstrable nonlinearity.

    It’s WAY past high time for you to admit that any plausible line will be goddamn close to “the big red line,” that any line which isn’t increasing over time is so unlikely as to defy belief let alone be implausible, and that your claim that the uncertainty in monthly averages invalidates the trend analysis was completely wrong and has been utterly demolished.

    Man up.]

    • Time to get out your crayons or colored chalk and draw up any old line by hand through a data set.

      Or just draw an N-1 polynomial through N data points. Perfect fit every time.

      Since any old line is just as good, don’t need no R^2 or statistical tests either.

      Oh, and what dikranmarsupial showed was the confidence limits of the trend line estimate, not the uncertainty (due to measurement inaccuracy) in the individual data point measurements themselves.

    • Try this for an exercise:

      – take a series of 101 points, with x running from 0 to 100 in steps of 1 and y running linearly from 50 to 60, which has an underlying slope of 0.1

      – add random Gaussian noise, e, with a standard deviation = s

      – choose several values of s from 1 to at least 10

      – perform a statistical significance test of your choice (such as the MAKESENS Excel utility, which is easy to find via Google) to estimate the uncertainty in the slope of y = ax + b + e for each value of s

      – plot the uncertainty of the slope vs the standard deviation of the noise, s

      – plot the uncertainty of the slope vs the standard deviation of the residuals for each value of s

      you should find that the uncertainty in the slope estimate, the standard deviation of the noise, and the standard deviation of the residuals are all VERY highly linearly correlated with each other.

      So yes, you can estimate the uncertainty of the slope from the variance of the residuals.
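The exercise above can be sketched in a few lines, assuming ordinary least squares as the "test of your choice". The function names, seed, and particular noise levels are illustrative choices, not part of the original comment.

```python
import math
import random

def fit_line(x, y):
    """OLS fit; return (slope, standard error of slope, residual SD)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(resid_ss / (n - 2))
    # The slope uncertainty comes only from the residuals and the x spacing.
    return slope, resid_sd / math.sqrt(sxx), resid_sd

def run_exercise(noise_levels=(1, 2, 5, 10), seed=0):
    """Return one (s, slope, slope SE, residual SD) row per noise level s."""
    rng = random.Random(seed)
    x = list(range(101))                                   # 101 points, x = 0..100
    rows = []
    for s in noise_levels:
        y = [50 + 0.1 * xi + rng.gauss(0, s) for xi in x]  # true slope is 0.1
        rows.append((s,) + fit_line(x, y))
    return rows

for s, slope, se, resid_sd in run_exercise():
    print(f"s = {s:>2}: slope = {slope:.3f} +/- {se:.4f}, residual SD = {resid_sd:.2f}")
```

For a fixed set of x values, the slope's standard error is exactly the residual standard deviation divided by a constant (about 293 here), and the residual standard deviation in turn tracks the noise level s.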

    • Will, as Tamino correctly suggests the standard statistical approaches to the problem are completely open about the uncertainty (and I suspect the Bayesian credible interval on the regression is pretty much the same as the frequentist one). It seems to me that plotting the credible interval is a good way of communicating the uncertainty to a non-specialist audience who don’t understand statistics, but the confidence interval on the trend is a perfectly adequate indication for those that do.

      Tamino is an excellent statistician, who obviously knows time series analysis very well. However, unlike Dr Briggs, he also seems to have a good grasp of the climatology, which means his analyses are useful as they address the important issues, rather than just viewing the observations as a set of numbers.

    • Horatio Algeranon

      Sorry, but the irony is killing Horatio.

      Tamino has cautioned ad nauseam against simply taking the trend (eg returned by OLS) at face value — ie as “THE (one and only possible) trend” — without considering the error bars.

      OTOH, parading THE Trend around through the city square naked (wearing no error bra or nothin’) is something the pseudo-skeptics do ALL the time. It seems to turn them on.

      And coincidentally, in the latter case THE (naked)Trend always seems to be looking down. Maybe it’s just because she’s naked and embarrassed. Then again, perhaps it has something to do with the fact that she’s usually looking down from the top of an El nino peak (eg, 1998) — another frequent focus of Tamino’s posts.

    • Gotta hand it to you, Will. You are bound and determined to play the fool. First, no one has contended that there is only one line one can draw. Indeed, there are an infinite number. However, to contend as you have been that they are all equally likely is simply flat, weapons grade stupid.

      I have published an algorithm for fitting curves when the data points exhibit a given error model (in conjunction with radiation effects in satellites). Your criticism is simply flat ignorant.

    • Whoa there.. You just put a whole lot of words into my mouth.

      I never once said that the conclusion of the trend analysis was “demolished”. I said that the graph, and the post at discovery, were too confident. And they are! While you personally may include error estimates for your linear fit, the graphic in question does not. One red line without a CI over a bunch of points without error bars… Why wouldn’t I claim that it’s over certain???

      Secondly, you did say that the mean trend line was the most likely, and I said “that’s not true”. All kinds of lines, while close to the mean, are equally likely. When I said that the last time you, condescendingly, asked if I’d ever heard of maximum likelihood. Sheesh.

      We disagree on some technicalities regarding averaging measurements.. Or maybe we don’t. I’m starting to think that you perceive any criticism (or observation) of anything climate related as a criticism of you.

      [Response: What was “demolished” was your claim that the trend analysis of averages without taking into account their individual uncertainties was not valid. Yes, that’s what you claimed, and yes, it was demolished.

      I said at the outset that when I showed this, it would be interesting to note whether you were willing to budge. But you won’t budge an inch. And you won’t admit you were wrong.

      I won’t waste my time demolishing your latest folly about “all kinds of lines” being equally likely. Why bother, when you won’t admit it even if I prove it? Perhaps the very idea that global warming is really happening is so terrifying for you that you’d rather cling to doubt than admit the truth, even to yourself. Perhaps that’s why you perceive any demonstration of your own mistakes to be a personal attack.]

      • Will, do yourself a favor: Quit while you’re behind. You have no leg to stand on. First, the model here is that each data point is defined by the trend (expected value) and some random error following some distribution. Unless there is a systematic error in addition to the random error, the expected value will be the mean. The central limit theorem tells us that the sample mean is an unbiased estimator of the population mean, with errors decreasing with sample size. Therefore, the trend through the means WILL BE the most likely. PERIOD!!!

        If errors are large on the data points, then we do not expect the trend to be the whole story. However, this will be reflected in the goodness of fit of the trend to the data. The error bars on individual data points are not needed.

        If you persist in making absurd arguments you don’t fully understand (e.g. that there are many equally probable trends), you will only persist in playing the fool. You have a PhD statistician here who you can learn from. Take advantage of it.

  11. It might become even a teensy bit clearer if one considers a series of measurements of something unknown made in a black box with only a digital readout on the cover.

    Will doesn’t know what the instrument is. Will doesn’t know how accurate or precise the instrument is. Will doesn’t know what the instrument is measuring. Will doesn’t know if what the instrument is measuring is changing in time. Will doesn’t know much, but we knew that.

    All Will knows is the numerical (digital) representation of what he reads on the indicator.

    Will diligently records the numbers and then arranges them in a table as a function of time. We know that Will never gives up so he gives us a long series or maybe not.

    Will can now use statistical analysis to estimate the probability of there being a trend or not and to estimate the trend and the uncertainty in the trend. The residuals and trends in the residuals can be used to estimate the probability of the trend being linear or higher order (since everything reduces to a power series we don’t have to directly deal with anything but polynomials. From that point of view accepting or rejecting the hypothesis that the behavior of whatever is being measured is unchanging is simply checking if the zeroth order term in the series is the only significant one).

    From this POV, the residuals tell us of the summed variability of what is being measured and the measurement device. We would, of course, have to break into the box and calibrate the instrument against a well characterized source to separate accuracy and precision.

  12. Dikran Marsupial

    I just had a go at incorporating the uncertainty in the response variable using a Bayesian analysis. Even if you accidentally overstate this uncertainty by a factor of two (as I did – mea culpa) it appears to make very little difference to the regression or the credible interval. See

  13. Does Will not understand measuring temperature and using anomalies?

  14. Will is dealing in [edit], as described here:
    “…bullshitters seek to convey a certain impression of themselves without being concerned about whether anything at all is true. They quietly change the rules governing their end of the conversation so that claims about truth and falsity are irrelevant. Frankfurt concludes that although bullshit can take many innocent forms, excessive indulgence in it can eventually undermine the practitioner’s capacity to tell the truth in a way that lying does not. Liars at least acknowledge that it matters what is true. By virtue of this, Frankfurt writes, bullshit is a greater enemy of the truth than lies are.”

    • Thanks for that Stewart. Sounds like it’s worth reading. I’m guessing that WTFUWT is rather full of it.

    • I heard Frankfurt interviewed one time on Fresh Air. He told the story of being approached by a representative of Princeton University Press shortly after the publication of his essay, “On Bullsh*t”. The Press, he was told, was interested in publishing the essay as a book.
      “Oh,” Frankfurt said, “I’m afraid it is too short for a book.”
      “You’d be surprised,” said the rep, “what we can do with fonts and spacing.”

      Frankfurt did not say whether the rep was sufficiently savvy to appreciate the irony. However, unfortunately, the copyright on the book has kept Frankfurt’s amusing and trenchant essay from being more widely read and known. It really deserves to be a standard in the education of any literate person.


  15. We, the undersigned, being brilliant genius climate scientists like Galileo, have agreed on the following points, all of which are true.

    1. The world is not warming.
    2. The warming the world is undergoing is caused by the sun.
    3. Whatever it is, it’s not carbon dioxide.
    4. The theory of Anthropogenic Global Warming is a lying fraudulent hoax meant to implement world Communism, strangle puppies and kittens, and pollute our precious bodily fluids.

    We call on governments everywhere to burn every last ounce of carbon between Earth’s surface and mantle, and to do so as fast as possible. To use any other source of energy is treason and should be treated as such.


    Dr. Elmer Fastbuck, Ph.D., Department of Insider Stock Trading, Iowa State Business College.
    Dr. Alfven Sartorius, Ph.D., Department of Advanced String Theory, Bob Jones University of the Stars.
    Dr. Fred Flintstein, M.D., Chairman, Department of Dental Medicine, Guadalajara Community College.

  16. The forces of anti-science descend on Michael Mann and the Climate Wars

    Eminent climate scientist Michael Mann has written a book describing what it’s like to be on the receiving end of an orchestrated anti-science campaign. The comments page has collected a swarm of negative reviews from climate science denialists.

    • And I am dealing with the comments from my own review. As James points out, there are some really excellent reviews from the reality-based crowd.

    • nice how they decided to further prove Mann’s point, by heading off to fill the reviews section with personal attacks at the request of WUTWAT.

  17. I’m very interested in the second paragraph of Tamino’s response to Will (Feb 8th, 12.06am), in which he states “I’ve also corrected the estimates for autocorrelation in a way which is, as far as I can tell, better than done by my predecessors in the analysis of global temperature trends.” I often fit a tentative model to time series, and compute the assorted (and relevant) statistics, some of which have been addressed in this thread. However, exactly what is the size of the modification to inferential statistics such as confidence intervals for the fitted line and for a further observation from the same population? Is it simply (equivalent to?) a modification of the degrees of freedom associated with the residuals, or is it a function of the autocorrelation function of the time series, or possibly some other more complex things too? Whatever it is, I am really interested in finding out what you deem to be an acceptable method for taking account of autocorrelation.

    Hope you can help


    [Response: It’s given in the appendix of the paper.]

  18. Nicely done. Too bad it won’t convince any skeptics.

  19. The lagged auto-correlation function gives you an estimate of the decorrelation time scale (formally its integral to infinity is, but the integral to the first zero-crossing often provides a good upper bound). Divide the record length of your time series by this decorrelation time scale and you get the degrees of freedom to be used in subsequent statistical tests. Not sure why one needs to “model” the lagged auto-correlation function with something that deviates from the data …

    [Response: First, you need to integrate (actually sum) the ACF over the entire range of lags, both positive and negative. Second, this is only approximately equal to the decorrelation time scale (actually twice the decorrelation time scale) when that time scale is sufficiently large. Third, in cases such as this you need to include the impact of autocorrelation at more lags than can reliably be estimated by the sample ACF, especially since the sample ACF is biased low and would tend to underestimate the impact of autocorrelation. A decent model (like AR(1) or ARMA(1,1)) is a much better way to do this than just to use the sample ACF. As for “model the lagged auto-correlation function with something that deviates from the data,” both the model and the sample ACF are estimated from the data; they’re the same in that regard.]
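As a generic illustration of the first point (and explicitly not the correction given in the paper's appendix), here is a small sketch of summing a model ACF over all lags for the simplest case, an AR(1) process, where the autocorrelation at lag k is rho to the power |k|. The function names are hypothetical. The large-sample approximation assumed here is that the sum inflates the variance of a trend or mean estimate, so the effective number of independent values is roughly n divided by that sum.

```python
def lag1_autocorrelation(resid):
    """Sample lag-1 autocorrelation of a residual series."""
    mean = sum(resid) / len(resid)
    dev = [r - mean for r in resid]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def acf_sum_ar1(rho, max_lag=None):
    """Sum of AR(1) autocorrelations rho**|k| over positive and negative lags."""
    if max_lag is None:
        return (1 + rho) / (1 - rho)   # closed form of the infinite sum
    return 1 + 2 * sum(rho ** k for k in range(1, max_lag + 1))

def effective_sample_size(n, rho):
    """Approximate number of independent values in n AR(1)-correlated values."""
    return n / acf_sum_ar1(rho)

# Example: lag-1 autocorrelation of 0.5 in 120 monthly residuals.
print(acf_sum_ar1(0.5), effective_sample_size(120, 0.5))
```

With a lag-1 autocorrelation of 0.5, the sum is 3, so 120 correlated values carry about as much information as 40 independent ones. In practice rho would be estimated from the residuals of the fit, and as the response notes, a sample estimate tends to be biased low.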

  20. (along with a half dozen other papers published in 2012) looks at trend detection using the GRACE satellite. This one in particular focuses on problems sorting out or lumping together all other effects besides melting. It might be worth featuring.

  21. Here’s another that might be interesting to look at, though not climate-related (well, I’d guess not), as an example of adjusting for a known factor (age) that affects what we make of the raw numbers counted per year:

  22. Thanks for the pointer towards your technique for adjusting the time series analysis for autocorrelation. I have now read the Appendix. I have another request! Do you have the numbers used in your example of 100 data points, each the average of 100 “observations” or realisations of the Poisson process? I would really like to try my software on them!


    [Response: No. I do so many artificial data sets for blog posts that if I saved them all, it’d fill up my hard drive. But you can generate such data yourself — most statistical packages (including R) will generate random numbers following a Poisson process.]