Extreme Trends — Detection is Hard

Not long ago I posted a graph, from NOAA, of the number of billion-dollar weather/climate disasters in the U.S. since 1980.

It is adjusted for inflation, although some may argue about how that was done (using consumer price index), some may point out that it doesn’t take into account the increase in population or total value at risk, and others that it doesn’t account for improvements in building codes and protective technology. And of course, the United States is not the entire world.

Let’s set all that aside, and consider whether the data as given support claims of an increase in the number of such extreme disasters, not just total but for different classes.

To test for trends we should use something more appropriate than least-squares regression. These are counts, and if the mean number (the expected value) changes then the variance will also change, so it won’t follow the constant-variance model inherent in least squares regression. Instead we’ll use Poisson regression, which is tailor-made for the purpose. It confirms right away (and overwhelmingly) that the trend in total disasters is real and strongly “statistically significant.”
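
The idea can be sketched in a few lines of code. Here is a minimal pure-Python version (the counts used below are illustrative, not NOAA's actual series): fit the log-linear Poisson model lambda(t) = exp(a + b*t) by Newton-Raphson, and judge the slope b by its Wald z-statistic.

```python
import math

def poisson_trend(counts):
    """Fit the log-linear Poisson model lambda_t = exp(a + b*t) by
    Newton-Raphson; return (slope b, Wald z-statistic for b)."""
    t = [i - (len(counts) - 1) / 2 for i in range(len(counts))]  # centered year index
    a = math.log(max(sum(counts) / len(counts), 1e-9))  # start at the log of the mean
    b = 0.0
    for _ in range(100):
        mu = [math.exp(a + b * ti) for ti in t]
        # score vector and Fisher information matrix for (a, b)
        ga = sum(y - m for y, m in zip(counts, mu))
        gb = sum((y - m) * ti for y, m, ti in zip(counts, mu, t))
        iaa = sum(mu)
        iab = sum(m * ti for m, ti in zip(mu, t))
        ibb = sum(m * ti * ti for m, ti in zip(mu, t))
        det = iaa * ibb - iab * iab
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    se_b = math.sqrt(iaa / det)  # standard error of b from the inverse information
    return b, b / se_b
```

A slope whose z-statistic exceeds about 2 in magnitude corresponds to the usual two-sided 5% significance level; for a steadily rising count series the statistic comes out strongly positive, while a flat series gives a z of essentially zero.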


However, when we look at the counts for individual types of disaster only one of them gives a statistically significant result: severe storms:


The other types show a range of responses, but the possible error in estimated rates is just too large to draw conclusions. For instance, there’s clearly no noticeable change in the number of billion-dollar freeze events (which is why I’ve plotted the estimated trend as a dashed line rather than a solid line):


Even if the trend estimate seemed to be changing, with only 6 total events in the last 36 years we shouldn’t expect the statistics to support strong conclusions.

There might seem to be an increase in the number of billion-dollar wildfires since 1980; after all, the present estimated number per year is seven times larger than the 1980 estimate. But the uncertainty is still too large to put confidence in that conclusion.


The p-value for the trend is close to the standard 5% cutoff, but at 0.057 it doesn’t make the cut.

One might be tempted to think, therefore, that the increasing trend in total billion-dollar disasters is entirely due to the increasing trend in billion-dollar severe storms. But that’s not the case; if we tally the number of “other” billion-dollar disasters, i.e. those which do not fall into the “severe storm” category, again we see a statistically significant rise:


The salient point is that even when a trend is present, if we’re looking at rare events there may be too few for trends to reach statistical significance. This problem plagues the detection of trends in disasters, and in extreme weather generally. By definition, extreme events are rare — we won’t see very many of them, so we need data for a long time to have enough for conclusions to be reliable.
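
To make that point concrete, here is a small simulation sketch (pure Python, with illustrative rates rather than the NOAA data): even when the underlying rate genuinely doubles over a 36-year record, a simple conditional test for a rate increase seldom reaches 5% significance if events average only one every few years.

```python
import math
import random

def poisson_sample(lam):
    """Draw one Poisson variate (Knuth's product method)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rate_increase_pvalue(counts):
    """Given n total events, under a constant rate each event falls in the
    second half of an even-length record with probability 1/2, so the
    second-half count is Binomial(n, 1/2).  One-sided p-value for excess."""
    n = sum(counts)
    k = sum(counts[len(counts) // 2:])
    return sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n

def power_at_5pct(base_rate, years=36, reps=500):
    """Fraction of simulated records, each with its rate doubling linearly
    over the record, in which the test reaches p < 0.05."""
    hits = 0
    for _ in range(reps):
        counts = [poisson_sample(base_rate * (1 + i / (years - 1)))
                  for i in range(years)]
        if rate_increase_pvalue(counts) < 0.05:
            hits += 1
    return hits / reps
```

With a base rate like billion-dollar freezes (a couple of events per decade) the power is low; with several events per year it is high. The trend is identical in both cases; only the number of observed events differs.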

In fact, some people go out of their way to limit the number of cases just when it’s most necessary to do the opposite. For example, instead of looking at tropical cyclones they may count only those in the Atlantic ocean basin, or only those which reach hurricane strength, or only Atlantic hurricanes that make landfall in the U.S., or only Atlantic hurricanes which were still at hurricane strength when making landfall in the U.S. and had over a billion dollars of insured losses. There are lots of ways to exclude the events that tell a story you don’t want to hear (more to the point, that you don’t want others to hear).

If you like what you see, feel free to donate at Peaseblossom’s Closet.


17 responses to “Extreme Trends — Detection is Hard”

  1. Instead of using the arbitrary filtering criterion of discrete events over a certain value, why not use the TOTAL value of these events per year?

    E.g. Total yearly cost of wildfires, severe storms etc.

    There has to be some insurance / reinsurance data out there on this. Peter Sinclair sometimes posts this type of information from Munich Re.

  2. Might it be possible to do a nonparametric test of runs (Wald–Wolfowitz runs test)? I suppose the assumption that extreme events are independent might be questionable.
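
For what it’s worth, the runs test asked about above is simple to sketch (pure Python; the input is any binary sequence, for instance years above/below the median count):

```python
import math

def runs_test(seq):
    """Wald-Wolfowitz runs test on a binary sequence.  Returns
    (observed runs, z-score under the null of randomness): too many
    runs (rapid alternation) gives z > 0, too few runs (clustering,
    as with a trend) gives z < 0."""
    n1 = sum(1 for x in seq if x)
    n2 = len(seq) - n1
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if bool(a) != bool(b))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - mean) / math.sqrt(var)
```

As the comment notes, the test assumes the observations are independent, and (like most nonparametric tests) it tends to have less power than a parametric model such as Poisson regression.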

  3. Another factor which would increase the trend is the continual upgrading of infrastructure: after each event, whatever bridges or flood defenses get rebuilt will be an improvement on the last, so any event that tops the new defenses will be progressively greater.

    • Actually, that would _decrease_ the trends. Improved infrastructure will be _less_ susceptible to an equivalent event, decreasing the cost of similar extremes. The rising costs despite infrastructure upgrades indicate the severity of the trends.

      • Could be wrong, but I think he meant that it would require a correction factor, which would result in increasing the trend in comparison with ‘raw’ data.

  4. Harold Brooks

    There are a lot of very difficult issues to deal with there. One of the reasons I strongly prefer the tangible wealth adjustment to CPI is that it implicitly takes the population increase into account. Adjusting by CPI is guaranteed to lead to more events and more total cost as you go through time.

    The data are problematic. Our best information is probably insured losses, but the question of the fraction of total losses that are insured always comes up and is likely to be different for different variables (wildfire is likely to have a much lower fraction than hurricanes), and has changed over time and location. The aggregators of insurance losses are also different for different events. We tend to have better information on the biggest events because they trigger reinsurance contracts, so we have more accurate estimates for a single $1B event than we would for 10 $100M events.

    There are also issues for what gets paid out for insurance. Changes in business procedures at the insurance companies and roofing companies have led to a dramatic increase in hail damage payouts in the US in the last 15 years with no similar increase in event frequency.

    We also have issues with the historical databases. Until the mid-90s, the official database of tornado damage in the US only recorded damage in order of magnitude categories ($50K-$500K, $500K-$5M, $5M-$50M, etc.) It’s typically possible to find the losses associated with the really big events, but not for smaller events. Typically, half of the damage during a year occurs with a couple or a few tornadoes, but there are large uncertainty bars on the other half.

  5. Thanks for another trenchant analysis.

    I had an experience years ago that we’d all better hope is *not* analogous to our current historical moment.

    I found myself in the bow of a canoe after dark, descending a largely unfamiliar Southern river. As with many such, the banks were lined with ‘strainers’–trees partially toppled into the stream. You want to avoid them, for multiple reasons.

    The good news was that you could hear them ahead, as the current rushed through the lower branches. The bad was that you couldn’t identify which bank they were on by sound, and you couldn’t see them until avoidance was impossible.

    Statistics to the rescue! I reasoned that avoiding at least half of them would be possible by committing to one side or the other as soon as detection was made. So that’s the leap of faith in logic that we made.

    The kicker: we didn’t hit a single strainer from then on. Every single choice I made was right, even though none of them ‘felt’ trustworthy at the time. It could have been chance, though it would have been highly unlikely, but I don’t believe it. What I do believe is that there is such a thing as unconscious (maybe preconscious) knowledge–and that there are times that the only successful strategy involves attending to it (contradictory though that sounds on the face of it.)

    All of which goes against the implications of the tenet that strict adherence to scientific method is the best prescription for avoiding fooling yourself, a tenet which I also believe.

    That is where the uncomfortable possible parallel to climate risk lives: in the space for action between ‘detection’ and ‘attribution’.

    • I may be misremembering, but don’t they tend to be on the outside bank? Possibly because it’s eroding more quickly and undercutting more trees? Certainly the ones on the outside bank are more dangerous, regardless of where they are more common.

      • You are right that the outside bank is usually the eroding one, but much of the course is straight(ish) and at night you aren’t always able to well assess which way the bend is going. So I think it was pretty random in terms of right/left.

  6. @IanR, non-parametric tests tend to be less sensitive than the Poisson regression Tamino used, even if one is checking for runs. In other words, if the runs test comes back insignificant, it doesn’t tell one much. To check on the mean == variance assumption in his data, one could use a Negative Binomial regression instead. The small number of counts for severe storms is not really enough to establish dependence or not. Instead, a “tiny data analysis” (see http://www.sumsar.net/blog/2014/10/tiny-data-and-the-socks-of-karl-broman/) might be done under each assumption in turn, no correlation and correlation, to see how they compare. Alternatively, the Harvey-Fernandes idea of using a Poisson model with Gamma priors in a state-space context, e.g., in the pgam package of R (https://cran.r-project.org/web/packages/pgam/pgam.pdf), could be tried to see if there’s much variation. [Ref: Harvey, A. C., Fernandes, C. (1989) Time series models for count data or qualitative observations. Journal of Business and Economic Statistics, 7(4):407–417]
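
As a concrete illustration of the mean == variance check suggested above (a pure-Python sketch with simulated rather than real counts, using the stdlib `random.gammavariate` to mimic a fluctuating rate): the variance-to-mean ratio of the counts should sit near 1 for a Poisson process, and well above 1 when the rate itself varies, which is the negative-binomial situation.

```python
import math
import random

def dispersion_index(counts):
    """Sample variance divided by sample mean: approximately 1 for
    Poisson counts; substantially above 1 indicates overdispersion."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((y - m) ** 2 for y in counts) / (n - 1)
    return v / m

def poisson_sample(lam):
    """Draw one Poisson variate (Knuth's product method)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(1)
# Pure Poisson at a constant rate of 3: index should sit near 1.
pure = [poisson_sample(3.0) for _ in range(2000)]
# Poisson with a Gamma-fluctuating rate (i.e. negative binomial),
# same mean of 3 but extra variance: index should be well above 1.
mixed = [poisson_sample(random.gammavariate(2.0, 1.5)) for _ in range(2000)]
```

If the index for real counts comes out well above 1, the Poisson standard errors are too optimistic and a negative binomial (or the Harvey-Fernandes state-space approach) is the safer model.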

  7. A couple other metrics that you might find a useful addition to the mix include: insured losses from catastrophes, weather-related disruptions to the US electric grid, and declared FEMA disasters. Declared FEMA disasters could always include a “political” component, so they’re a bit questionable; insured losses and disruptions, not so much.

    • There are some effects that might make the disruptions less severe when there is a shorter interval between extreme events in an area. First, one cause of disruptions is when a large amount of dead overhanging tree limbs build up between storm events strong enough to bring them down. Second, the more disruptions you have the more resources will likely be allocated to preparedness for fixing future disruptions (trucks, crews, trimming operations, etc) and so the future disruptions might be shorter.

  8. That Atlantic hurricane study (cat 4 or 5 in N. Atlantic) always bothered me. They may be seeing a real effect, but it’s hard to know how large a space they pulled that out of.

    • One does get a bit tired of isolating one kind of extreme storm and drawing conclusions. Just like the temperature record, using the maximum amount of information that is available produces the most indication of trends. I watch the Pacific and that has been cruelly wild (use of “cruel” intentional: so sick of what happens to the victims going under the radar).

  9. A while back Tamino posted a Poisson regression analysis of wildfire counts, which intrigued me because the idea looked like it might be useful for a project we have in Oregon. Tamino shared his data promptly on request, too. This train of thought eventually became Figure 2 in a paper, although we used different wildfire and climate data, with a negative binomial model instead of Poisson. Anyway the paper should be coming out soon in Regional Environmental Change, with acknowledgement for the source of inspiration.

  10. For wildfires and such, how about a plot of the number of years WITHOUT a wildfire/freeze, etc in some window?

  11. Some online links for testing the accuracy of Eyeball Trend Detection, here: