One of the subjects that interests me (and a lot of other people) is extreme values. By definition, they’re not very common. It follows that when we look at observed data to discover the likelihood of extreme values, we have little data to go on.
In fact, sometimes there’s a sense in which we have no data to go on. One of the more interesting questions is, how likely is some quantity to reach a given value which is larger than any we’ve yet observed? That is, without doubt, quite a challenge. But these days it’s a recurring theme as observations take us into new territory, terra incognita.
The situation is inherently uncertain, but not hopeless. We’re interested in the total probability of a value being as big as, or greater than, some value x. That is just 1 minus the cdf (cumulative distribution function), something called the survival function.
We do know some of the properties of the survival function. Of course it’s always in the range from 0 (no chance of being that high or higher) to 1 (everything is that high or higher). We also know that as x increases, the survival function S(x) cannot increase, i.e. the survival function is monotone nonincreasing. In almost all cases we expect it to be monotone decreasing. Finally, for x equal to negative infinity the survival function is 1 (every actual value is bigger than negative infinity) while for x equal to infinity, the survival function is 0 (no chance of a value being infinite or bigger). These are just mirror images of the properties of the cdf, since the survival function is 1 minus the cdf.
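These properties are easy to verify numerically. Here’s a quick check in Python (using SciPy’s normal distribution as a convenient stand-in; the language and library are my choice, and any distribution would do):

```python
from scipy.stats import norm

# Survival function S(x) = 1 - CDF(x); SciPy calls it sf()
xs = [-10.0, -1.0, 0.0, 1.0, 10.0]
S = [norm.sf(x) for x in xs]

# Always between 0 and 1
assert all(0.0 <= s <= 1.0 for s in S)
# Monotone nonincreasing as x increases
assert all(a >= b for a, b in zip(S, S[1:]))
# Near 1 far to the left, near 0 far to the right
assert S[0] > 0.999 and S[-1] < 0.001
```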
So the survival function starts out at 1, then decreases to 0; the extreme values are those near the end. We can take a look for some well-known probability distributions. Suppose for instance something actually follows the normal distribution. Here’s the pdf for the normal distribution (the bell curve, you’ve probably seen it before):
More relevant to our discussion is the survival function, which looks like this:
For extremes, we’re interested in the tail of this distribution. Here it is for x values 2 or greater, which is “pretty extreme” (but not extremely extreme) when we’re working in units of standard deviations:
Not only does the survival function necessarily decay to zero as x increases, for the normal distribution it decays with extreme rapidity. That means that the chances of extreme values, for a variable following the normal distribution, are extremely low. To put it in more familiar terms, ‘taint likely.
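To put numbers on “extreme rapidity,” here’s a short Python sketch (SciPy assumed) of how fast the normal tail falls off:

```python
from scipy.stats import norm

# Tail probability P(X >= x) for a standard normal variable,
# with x in units of standard deviations
for x in (2, 4, 6, 8):
    print(f"P(X >= {x}) = {norm.sf(x):.3e}")

# The decay is faster than exponential: each step of 2 standard
# deviations shrinks the tail probability by an ever-larger factor
assert norm.sf(4) / norm.sf(2) > norm.sf(6) / norm.sf(4)
```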
Other distributions don’t show such rapid decay of the survival function. Here (for an “extreme” example) is the t-distribution with only 1 degree of freedom:
Its survival function looks like this.
It gives us an example of a heavy-tailed distribution. In fact here’s that heavy tail:
Compared to the normal distribution it has a very heavy tail. So heavy, in fact, that I haven’t plotted it in terms of standard deviations: with only 1 degree of freedom the variance diverges, so the standard deviation can’t even be computed.
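The contrast shows up clearly in the numbers. A small Python comparison (SciPy assumed):

```python
from scipy.stats import norm, t

# Tail probabilities: t-distribution with 1 degree of freedom
# (heavy-tailed) versus the normal distribution
for x in (2, 5, 10):
    print(f"x={x:2d}  t(1): {t.sf(x, df=1):.2e}   normal: {norm.sf(x):.2e}")

# At x = 10 the heavy tail still holds a few percent of the
# probability, while the normal tail is essentially zero
assert t.sf(10, df=1) > 0.01
assert norm.sf(10) < 1e-20
```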
What about an in-between case? The archetype is the one distribution for which the pdf looks exactly the same as the survival function for allowed x values: the exponential distribution (in its standard form, with rate parameter 1). It’s only defined for nonnegative x values, i.e. negative x values aren’t allowed:
One of the fascinating properties of the exponential distribution is that its tail also looks the same:
This is a case for which the survival function decays faster than the heavy-tailed t-distribution (with 1 or more degrees of freedom) but not as fast as the normal distribution.
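That self-similarity is easy to confirm: for the standard exponential, both the pdf and the survival function equal e^(−x). In Python (SciPy assumed):

```python
import numpy as np
from scipy.stats import expon

# For the standard exponential, pdf(x) = S(x) = exp(-x) for x >= 0
x = np.linspace(0.0, 5.0, 11)
assert np.allclose(expon.pdf(x), expon.sf(x))
assert np.allclose(expon.sf(x), np.exp(-x))
```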
There are even distributions whose tails decay faster than normal. An example is the uniform distribution, for which the survival function decays so fast that it actually hits zero in finite time:
Of course we can’t know what the real distribution is, other than in certain exceptional (and rare) circumstances. But we can approximate the tail of a distribution as decaying slowly, at intermediate speed (exponential distribution), or rapidly. This leads to the Pickands–Balkema–de Haan theorem. It states that for many underlying probability distributions, the survival function for large values of x is well approximated by the generalized Pareto distribution, often expressed in the form

S(x) = [1 + k(x − μ)/σ]^(−1/k).
The quantities μ and σ don’t have their usual meanings (the mean and standard deviation of the distribution). Instead they are a generic location parameter and scale parameter. The quantity k is sometimes referred to as the shape parameter.
There are three “regimes” for the shape parameter. When k is positive, the survival function decays as x increases according to a power law. Therefore the survival function gets lower and lower as x increases (as it must), but never quite reaches zero. In fact, because of its power-law decrease it decays rather “slowly” (compared to other distributions). When k is negative, it decays but actually hits zero at a finite value of x — this is the case where there’s an upper limit to x values which are at all possible.
When k is equal to zero the generalized Pareto expression is undefined, but we can use the limiting distribution as k goes to zero, which turns out to be the exponential distribution.
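The three regimes can be seen directly with SciPy’s generalized Pareto implementation (an illustration only; SciPy calls the shape parameter c, with location 0 and scale 1 by default):

```python
import numpy as np
from scipy.stats import genpareto, expon

# Shape c > 0: power-law tail, positive no matter how large x gets
assert genpareto.sf(1000.0, c=0.5) > 0.0

# Shape c < 0: the survival function hits zero at x = -1/c (here x = 2)
assert genpareto.sf(1.9, c=-0.5) > 0.0
assert genpareto.sf(2.5, c=-0.5) == 0.0

# Shape c = 0: reduces to the exponential distribution
x = np.linspace(0.0, 5.0, 6)
assert np.allclose(genpareto.sf(x, c=0.0), expon.sf(x))
```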
Hence one approach to estimating the probability of extreme values more extreme than any yet observed is to fit a generalized Pareto distribution to the tail we have been able to observe. We then extrapolate to higher x values and voilà: we have an estimate of their probability.
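Here’s a minimal sketch of that “peaks-over-threshold” idea in Python (SciPy assumed; the simulated data, the 99th-percentile threshold, and the extrapolation point are all illustrative choices, not recommendations):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
data = rng.standard_normal(100_000)   # stand-in for observed data

# Keep only the exceedances above a high threshold
u = np.quantile(data, 0.99)
exceedances = data[data > u] - u

# Fit a generalized Pareto to the exceedances (location fixed at 0)
c, loc, scale = genpareto.fit(exceedances, floc=0)

# Extrapolate a bit beyond the largest observed value:
# P(X > x) is approximately P(X > u) * S_GPD(x - u)
p_u = (data > u).mean()
x_new = data.max() + 0.1
p_extreme = p_u * genpareto.sf(x_new - u, c, loc=0, scale=scale)
print(f"estimated P(X > {x_new:.2f}) = {p_extreme:.2e}")
```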
The procedure is fraught with uncertainty. But that doesn’t mean it’s useless (however much certain deniers might want you to believe that). As imperfect as it might be, it gives us a realistic (at least in the ballpark) and quantitative estimate.
How could we tell how much of the “observed tail” to use, and whether the extreme part of the tail is heavy, light, or in between? A diagnostic I’ve found useful is the logarithm of the survival function which we’ve estimated from the data. Actually it’s useful to use the negative of the logarithm, a quantity referred to in survival theory as the cumulative hazard function, H(x) = −ln S(x).
For in-between decay (exponential distribution) the cumulative hazard function follows a straight line. Here for instance is the cumulative hazard function for the standard exponential distribution:
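In practice we estimate the cumulative hazard from data. Here’s a Python sketch of my own construction: simulated exponential data, where the empirical cumulative hazard should track the straight line H(x) = x:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.sort(rng.exponential(size=50_000))

# Empirical survival function at each sorted data point,
# then the cumulative hazard H(x) = -log S(x)
n = len(sample)
S_emp = 1.0 - np.arange(1, n + 1) / (n + 1)
H_emp = -np.log(S_emp)

# For exponential data, the points (x, H(x)) should hug the line
# H = x; skip the noisy extreme tail when checking
mid = slice(100, 40_000)
max_dev = np.max(np.abs(H_emp[mid] - sample[mid]))
print("max deviation from the line H(x) = x:", max_dev)
assert max_dev < 0.2
```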
For slowly decaying tails, the cumulative hazard function curves downward as x increases to extreme values, as in the t-distribution:
For rapidly decaying tails, however, it curves upward as x increases. For the generalized Pareto distribution with negative shape parameter it goes to infinity (the survival function goes to zero) at a finite value of x, as for example the uniform distribution:
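For the uniform distribution on [0, 1] this is easy to see explicitly: S(x) = 1 − x, so the cumulative hazard is −log(1 − x), which blows up as x approaches 1. In Python (SciPy assumed):

```python
import numpy as np
from scipy.stats import uniform

# Cumulative hazard of the uniform distribution on [0, 1]:
# H(x) = -log S(x) = -log(1 - x), diverging as x -> 1
for x in (0.9, 0.99, 0.999):
    print(f"H({x}) = {-np.log(uniform.sf(x)):.3f}")

# The survival function itself hits exactly zero at x = 1
assert uniform.sf(1.0) == 0.0
```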
Another useful diagnostic is something else borrowed from survival theory, the mean lifetime. But I’ve gotten mathematical enough already; I’ll leave it to interested readers to research the topic themselves.
Not all distributions approach the generalized Pareto distribution for large x values. For example, for the normal distribution the cumulative hazard function does curve upward for large x values:
But it does not reach infinity in finite time (the survival function does not go to zero in finite time), so we can’t really extrapolate to very very large x values. Likewise for the log-normal distribution.
But such an approach can still be useful to extrapolate to values higher than observed, as long as they’re not too much higher. Near the highest-yet-observed values the survival function of the normal distribution is approximately exponential; its cumulative hazard curves upward, but not by much, as long as we’re only looking a little above those values.
All of which reinforces my opinion that this approach to extreme values is fraught with uncertainty, but by no means hopeless.
One final note: such attempts to estimate the likelihood of bigger-than-yet-seen values are different from the other tactic in extreme value theory, estimating the distribution of the “hottest-per-year” or “biggest-per-century” values. That’s covered by the other part of extreme value theory, which comes complete with its own set of special distributions (Gumbel, Fréchet, reverse Weibull). But that is a topic for another day.