A reader by the name of “Will” fails to understand why trends can be established with confidence even when the uncertainty of the individual values is unknown, and even when those values are averages rather than raw data. It’s a failure he shares with William M. Briggs, numerologist to the stars. Perhaps we can enlighten reader “Will” — I very much doubt anyone can enlighten William M. Briggs.
But first, let’s dispense with a challenge issued by “Will”:
Please make sure that you explain why the time series I presented (102, 97, 98), given the experiment Gator described, is in fact showing a negative trend.
That time series does not show a negative trend. More to the point, nobody claimed it does.
However, we can use a similar example to illustrate when we actually need to know the uncertainty to detect change, and when we can detect change (in particular, trend) without knowing the uncertainty.
Here’s a classic statistical problem: we have some data, say the number of gamma rays detected per minute, from only two days, and we need to know whether there’s any difference between the background levels for those two days. To make that comparison, we really do need to know the uncertainty level of the measurements. And we won’t accept anyone saying “It must follow the Poisson distribution”; instead, we insist on 100 separate measurements for each day, which will enable us to compute both an average and an uncertainty level. So we’ll use a gamma-ray detector for 100 minutes and note the number of detections for each minute, or we’ll use 100 separate gamma-ray detectors for one minute, or 20 gamma-ray detectors for 5 minutes each, but any way you look at it, we’ll have 100 individual — and presumably independent — measurements for each of the two days.
I generated such data (following a Poisson process just to be realistic). The first day’s data have a mean of 101.46 and standard deviation 9.33. Hence the standard error (standard deviation of the average) is 0.933. For the second day, the mean is 100.44 with standard deviation 11.18, so the standard error is 1.118.
We can already see that the two daily means are not very far apart, compared to their standard errors — they’re not nearly far enough apart to establish a statistically significant difference between the two days. More rigorously, we could apply a t-test, giving a t-value of 0.7 and a p-value of 0.48. As statistical tests for a difference in means between two data sets go, this one falls into the broad category of “no way!”
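For the curious, here’s a minimal sketch of that two-day comparison in Python (assuming NumPy and SciPy are available; the simulated counts, and hence the exact means and p-value, will differ from the figures quoted above because the random draws differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 100 one-minute gamma-ray counts for each of two days,
# drawn from a Poisson process with the same true rate (100 per minute)
day1 = rng.poisson(lam=100, size=100)
day2 = rng.poisson(lam=100, size=100)

mean1, mean2 = day1.mean(), day2.mean()
se1 = day1.std(ddof=1) / np.sqrt(len(day1))  # standard error of the daily mean
se2 = day2.std(ddof=1) / np.sqrt(len(day2))

# Two-sample t-test for a difference between the two daily means
t, p = stats.ttest_ind(day1, day2)
print(f"day 1: mean {mean1:.2f}, SE {se1:.3f}")
print(f"day 2: mean {mean2:.2f}, SE {se2:.3f}")
print(f"t = {t:.2f}, p = {p:.3f}")
```

Note that the test is only possible because we kept all 100 individual measurements for each day, which is what lets us compute the standard errors.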
Note that we can only complete this analysis because we know the uncertainty of the two daily means.
Now suppose we look for a trend using only the two daily means, not using all the individual measurements, and ignoring the uncertainty level of the daily means. From day 1 to day 2, the mean level decreased from 101.46 to 100.44 — does that mean there’s a trend, declining at 1.02 counts/day?
No, it doesn’t mean that. If we fit a straight line by linear regression, a statistical test of that fit fails — it gives a t-value of NaN (not a number) on 0 degrees of freedom. There’s not really enough data to do the test.
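To see the breakdown concretely, here’s a sketch of that two-point “trend test” done by hand: a straight line through two points fits them exactly, so the residual sum of squares is zero, the residual degrees of freedom are zero, and the t-statistic is 0/0 — undefined.

```python
import numpy as np

x = np.array([1.0, 2.0])         # day number
y = np.array([101.46, 100.44])   # the two daily means from above

slope = (y[1] - y[0]) / (x[1] - x[0])
df = len(x) - 2                  # residual degrees of freedom: 2 - 2 = 0
rss = 0.0                        # a line through two points fits them exactly

# The slope's standard error needs rss/df = 0/0, which is undefined:
with np.errstate(invalid="ignore", divide="ignore"):
    t = slope / np.sqrt(np.float64(rss) / df)
print(f"slope = {slope:.2f} counts/day, df = {df}, t = {t}")  # t = nan
```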
OK, let’s do another day’s work, another 100 minutes, and compute an average of those for a third day’s value. This time the daily mean count is 100.22, and it’s starting to look like a trend! If we had computed the uncertainty of our daily means, then we’d know that wasn’t the case, but we’re ignoring that information. Will that cause the evil trend test to give us a false result?
No. It won’t. Testing for trend using only these three values (and ignoring their uncertainty levels) gives a t-value of -2.685. In many cases that would be strongly significant, but in this case the test has only 1 degree of freedom. So the p-value of the test is 0.227 — there’s a 22.7% chance of a result that extreme, or more extreme, just by chance. No evidence of a trend.
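Here’s a sketch of that three-point test in Python (SciPy assumed available), using the three daily means quoted above; it reproduces the t-value of -2.685 and p-value of 0.227:

```python
import numpy as np
from scipy import stats

days = np.array([1, 2, 3])
means = np.array([101.46, 100.44, 100.22])  # the three daily means

# Linear regression estimates the slope AND its standard error,
# the latter from the variance of the residuals
res = stats.linregress(days, means)
t = res.slope / res.stderr                  # only 1 degree of freedom (3 - 2)
p = 2 * stats.t.sf(abs(t), df=len(days) - 2)
print(f"slope = {res.slope:.2f} counts/day, t = {t:.3f}, p = {p:.3f}")
```

With only one degree of freedom, even a t-value near -2.7 isn’t significant — exactly the point made above.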
At this point, some of you (especially Will) might be wondering, “How the hell can you compute a t-test with no knowledge of the uncertainties? That requires some measure of uncertainty!” Or as Will himself said:
The issue, as I see it, is when one fits a line to a series of estimates without accounting for the fact that the estimates contain uncertainty.
The answer is, “The uncertainty level is estimated from the variance of the residuals to the linear fit.”
That’s the really interesting, absolutely fascinating, fact of the matter. Even if you compute means and throw away their uncertainty estimates, the fact that you have many values to analyze for a trend test enables you to estimate the uncertainty of your trend analysis.
Let’s show a more realistic example. I generated 100 sets of 100 values to simulate 100 days of data. Then I computed the mean for each day, as well as its standard error. That enables me to plot the data with “error bars,” which looks like this:
This plot gives the visual impression that the uncertainty levels of the daily values are larger than the trend (if it exists) over the 100 days. That impression is correct — but one shouldn’t draw rigorous conclusions based on visual inspection of graphs. It’s a great way (indispensable, in fact) to get ideas and gain insight, but it’s no substitute for statistical tests.
We can also plot the data without showing error bars, instead superimposing a trend line estimated by linear regression:
That trend line might give the visual impression of a declining trend, or it might not, depending on your level of experience analyzing data and interpreting time series graphs. Some might even want you to get the impression of a declining trend whether there is one or not!
But again, visual impression is a great way to get ideas but not to have statistical confidence. We should apply a statistical test to that linear regression. The result? The test for trend gives a p-value of 0.173, nowhere near statistically significant.
We were able to do the test because the residuals from the linear fit enable us to estimate the uncertainty level in individual days’ values. In fact we already know the uncertainty, because these data were created as a Poisson process with mean value 100, so the individual measurements will have a standard deviation of exactly 10, so the average of 100 measurements taken on any given day will have a standard error of exactly 1. The “estimate from the residuals” is that the residual standard error is 1.01. That’s not just a valid estimate of the uncertainty, it’s a damn good one.
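A sketch of the whole 100-day exercise (again assuming NumPy and SciPy; with a different random seed the p-value and residual standard error will differ slightly from the figures quoted above, but the residual standard error should land close to the true value of 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 100 days, each with 100 one-minute Poisson counts (true rate 100/min)
daily_counts = rng.poisson(lam=100, size=(100, 100))
daily_means = daily_counts.mean(axis=1)  # the only values we keep
days = np.arange(1, 101)

# Trend test using the daily means alone, uncertainties thrown away
res = stats.linregress(days, daily_means)
fit = res.intercept + res.slope * days
resid = daily_means - fit

# Residual standard error: the estimated uncertainty of a single day's mean
resid_se = np.sqrt((resid**2).sum() / (len(days) - 2))
print(f"trend = {res.slope:+.4f} counts/day, p = {res.pvalue:.3f}")
print(f"residual standard error = {resid_se:.2f}  (true value: 1.0)")
```

Even though the daily standard errors were discarded, the scatter of the 100 daily means about the fitted line recovers essentially the same uncertainty estimate.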
[Technical note: to do the test itself, we compare the variance of the residuals to the variance of the fit as part of an F-test. That way we avoid the pitfall of assuming the trend is real in order to demonstrate the presence of a trend. If the trend is real, then we estimate the uncertainty of the data by the residuals — if the trend is not real, we estimate the uncertainty from the raw data.]
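For those who want the F-test spelled out, here’s a sketch on trendless simulated data. For simple linear regression the F-statistic is exactly the square of the slope’s t-statistic, so the two tests give identical p-values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
days = np.arange(1, 101)
daily_means = rng.normal(loc=100, scale=1, size=100)  # trendless, SE = 1

res = stats.linregress(days, daily_means)
fit = res.intercept + res.slope * days
ssr = ((fit - daily_means.mean())**2).sum()  # variation explained by the fit
rss = ((daily_means - fit)**2).sum()         # residual variation

# F-test: mean square of the fit over mean square of the residuals
F = (ssr / 1) / (rss / (len(days) - 2))
p = stats.f.sf(F, 1, len(days) - 2)
t = res.slope / res.stderr
print(f"F = {F:.3f}, t^2 = {t**2:.3f}, p = {p:.3f}")  # F equals t squared
```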
The trend in global temperature is also subjected to statistical tests. Even absent the uncertainty level of any single month’s (or year’s) estimate, the aggregated data allow us to estimate the uncertainty. That’s why we can quote “+/-” values for trend estimates, that’s how we compute p-values for trend tests, and that’s how we know that the trend in global temperature over the last 30 years and more is real, and William M. Briggs is full of …
This is what Will didn’t seem to get (although we don’t blame him) and what William M. Briggs either didn’t get or didn’t want to admit (in either case we do blame him).