Here’s some data, annual values for the time span from 1979 through 2013:
If we test for a trend using linear regression (the trend line is also plotted) we get a “p-value” of about 0.54. That’s not less than 0.05 (the de facto cutoff for “significance,” which gives us “95% confidence”), so no, that trend is not statistically significant. I believe the correct statistical nomenclature would be: “No way.”
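The basic trend test is easy to reproduce. Here’s a minimal sketch using scipy’s `linregress`; the seed and data below are illustrative white noise, not the actual series plotted above:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(42)  # illustrative seed, not the post's actual data
years = np.arange(1979, 2014)    # 35 annual values, 1979 through 2013
values = rng.standard_normal(len(years))  # white noise by construction

# Ordinary least-squares trend test: the p-value is for the
# null hypothesis of zero slope.
result = linregress(years, values)
print(f"slope = {result.slope:.4f}, p-value = {result.pvalue:.4f}")
```

For white noise the p-value will usually (95% of the time) come out above 0.05, just as it does for the plotted data.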
Now let’s look for a change in trend by testing whether we can get statistical significance using just the more recent data. We’ll try every possible start year from 1979 through 2009 and use the data all the way up to 2013 (so we’ll have at least 5 years of data in every case), in order to identify which start year gives the smallest p-value and therefore the most likely real trend. That gives us this:
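The start-year scan described above can be sketched like this; again the data are stand-in white noise with an illustrative seed, not the plotted series:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)   # illustrative seed
years = np.arange(1979, 2014)
values = rng.standard_normal(len(years))

# Try every start year from 1979 through 2009 (so every fit has
# at least 5 data points), keeping the one with the smallest p-value.
best_start, best_p = None, 1.0
for start in range(1979, 2010):
    mask = years >= start
    p = linregress(years[mask], values[mask]).pvalue
    if p < best_p:
        best_start, best_p = start, p

print(f"best start year: {best_start}, p-value: {best_p:.4f}")
```

Note that the scan is *guaranteed* to return the most favorable-looking start year, whatever the data.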
Note that I highlighted the final 15 years of data in red, and added a trend line for those data (also in red). If we test that trend line for statistical significance, we get a p-value of 0.0055. That’s highly significant, at over 99% confidence. Should we conclude that the final 15 years of data demonstrate, convincingly, a significant departure from the trend that preceded it?
The answer is “No.”
I say that with confidence because I created the data using a random-number generator: by construction they’re white noise with no trend. But that’s not why we should reject the conclusion of a convincing change in trend.
If you take some data and test for trend, but the data are just white noise, then the “p-value” you get is the probability that white noise alone would produce a result at least that strong. If that p-value is small enough (usually, 0.05 or less for 95% confidence) then we can claim statistical significance, because white noise would give so extreme a result only 5% of the time.
But that’s not what we did. We tried every possible start year which included at least 5 data points. That means we gave ourselves a lot of chances to get a small p-value, so the actual chance of finding some start year which gives a p-value 0.05 or less is quite a bit bigger than 0.05. In fact, in this particular case the chance of finding at least one start year which gives a p-value of 0.05 or less, is just about 0.39.
Yeah, you read that right. When we do this particular analysis on plain old white noise, we have a 39% chance of finding a “statistically significant” trend.
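That 39% figure can be checked by simulation. Here’s a minimal Monte Carlo sketch: generate white noise, run the full start-year scan, and count how often the smallest p-value dips below 0.05. The seed and simulation count are illustrative choices, not the post’s actual runs:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)   # illustrative seed
n_years, n_sims = 35, 2000       # 35 years of data, 2000 trials
x = np.arange(n_years)

hits = 0
for _ in range(n_sims):
    y = rng.standard_normal(n_years)  # pure white noise, no trend
    # Smallest p-value over all start indices leaving at least 5 points
    min_p = min(linregress(x[s:], y[s:]).pvalue for s in range(n_years - 4))
    if min_p <= 0.05:
        hits += 1

print(f"fraction of trials with a 'significant' trend: {hits / n_sims:.3f}")
```

With enough trials the fraction lands near the 39% quoted above, even though every single series is trendless by construction.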
Why such a difference between the “apparent” p-value and reality? Because when we pick out the start year that gives the lowest p-value, we’re cherry-picking: we chose that start year because of the result it gives. And as in this example, the procedure usually ends up choosing a start year which is an extreme value.
If we want to allow this kind of “cherry-picking” but not invalidate our statistics, we have to allow for the vastly greater chance of getting small p-values somewhere. I ran Monte Carlo simulations which indicate that in this specific case (35 years of data, white noise, minimum 5 years for trend) if we want a genuine p-value of 0.05 (for genuine 95% confidence), we have to require an observed p-value of 0.0038.
Yeah, you read that right. For a genuine p-value of 0.05 we have to require an observed p-value of 0.0038.
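The adjusted cutoff can be estimated from the same kind of Monte Carlo: simulate many white-noise series, record the *minimum* p-value the start-year scan finds in each, and take the 5th percentile of that distribution. The seed and trial count here are illustrative; the post’s own simulations give the 0.0038 figure:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(2)   # illustrative seed
n_years, n_sims = 35, 2000       # 35 years of data, 2000 trials
x = np.arange(n_years)

min_ps = []
for _ in range(n_sims):
    y = rng.standard_normal(n_years)  # white noise, no trend
    min_ps.append(min(linregress(x[s:], y[s:]).pvalue
                      for s in range(n_years - 4)))

# For a genuine 5% false-alarm rate, the required cutoff is the
# 5th percentile of the minimum-p distribution under white noise.
cutoff = np.percentile(min_ps, 5)
print(f"required cutoff for genuine 95% confidence: {cutoff:.4f}")
```

The estimate should come out close to the 0.0038 reported above, an order of magnitude stricter than the naive 0.05.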
And that is why we should reject the claim of statistically significant trend “since 1999.” Because the observed p-value (0.0055) isn’t less than the required cutoff (0.0038) when allowing for “honest cherry-picking.”
As I said, the start time which is selected by such a procedure is usually an extreme value. Look at any of the many graphs of global temperature, think about the oft-repeated claim about a “pause” in global warming since 1998 — then consider how the start year for that claim was selected.