The topic: it’s probably not what you think.
This post has nothing to do with climate change. It’s about the survival function, which finds application in many fields, most notably medical research (where survival is considered important) and failure testing. I’ve come up with a new way to estimate it from observed data, and most important, what I believe is a much-improved way to estimate confidence limits.
The traditional way is called the Kaplan-Meier (KM) estimator. Back in 1957, Edward L. Kaplan and Paul Meier both submitted manuscripts defining essentially the same method to the Journal of the American Statistical Association. The editor at the time was John Tukey, who suggested they join forces and submit together. Which they did.
To say their method became a success, is a gold-medal understatement. It’s nearly universal these days, indispensible in medical research, and their original paper has become one of the most-cited scientific papers of all time. When your work has been referred to by other scientific publications over 30,000 times, you know it’s had an impact. They well deserve one of the most flattering accolades one can achieve in statistics: to have a method named after you.
The germ of the idea is simple. Suppose we observe a number of individuals, say n of them. At a particular time t, some may have died (or been diagnosed, or gone into remission, or any kind of notable event) while others still live. Life or death, in this case, can be viewed as a binomial event. If the number still alive is k, then the chance of surviving at least that long is easy to estimate using binomial statistics: it’s just k / n.
So far, so ridiculously easy. But medical research (and failure testing, and other applications) comes with a catch: sometimes we don’t know whether some patients lived or died because we lost track of them. Maybe they were checking in every week (so it’s easy to tell whether they’re dead or alive), but stopped checking in and we can’t find out what happened. So we have some data — they lived at least “so long” — but we don’t know how long they lived after that. Or, perhaps we simply chose to end the study after a year or two. Even if we know the status of everyone who was still alive at that time, we don’t know how long they lived because their lives aren’t over yet.
Such data are referred to as censored data. In this particular case they’re right-censored because the lifetime is at least some given length, but could also be any value higher than that. Left-censored data mean that the event happened no later than a given time, but we don’t know how much earlier. Finally, interval-censored data are events that happen no earlier than one given time and no later than some other, but could have occured at any time in between. Medical research tends to be dominated by right-censored data.
Censoring complicates the calculation of binomial statistics for the Kaplan-Meier estimator. We can’t just ignore it, or ignore it after it’s censored, because that will bias the results. Even if it didn’t, we wouldn’t want to; many studies need the maximum information but suffer from small samples, it’s too much of a loss to ignore any of it.
Therefore Kaplan and Meier devised their method, also called the product-limit estimator. It’s ingenious; interested readers can find details on Wikipedia (linked to at the start of this post).
Of course just estimating the survival function isn’t enough, we need to know how uncertain our estimate is. A method to do that had already been worked out by Greenwood, and is called “Greenwood’s method.” It estimates the standard error of the Kaplan-Meier estimate, then uses the normal-distribution approximation to assign confidence limits. It’s akin to the calculation of confidence limits for binomial statistics called the Wald interval.
If, for instance, you want 95% confidence intervals, you take the mean value (the Kaplan-Meier estimate), subtract 1.96 times the standard error to get the lower confidence limit, add 1.96 times the standard error for the upper confidence limit. That approximation can be shown to apply asymptotically, meaning as the sample size gets larger and larger. But it’s just an approximation, a better one the more data we have.
One of the things noticed before long was that at late times, the lower confidence limit estimated by Greenwood’s method tended to be too low. Another method was devised to estimate the standard error with censored data, which gave somewhat improved results, called “Peto’s method.”
But the main problem is that the Wald interval isn’t very good for binomial statistics. This is especially a problem with small samples, even medium-size samples. It’s even a problem with very large samples when the number of “events” or “non-events” (deaths or survivals) gets very low. Unfortunately, for survival analysis it’s often important to know what’s happening at precisely those times, either very early (e.g. infant mortality) or very late (total life expectancy).
Those are the times for which the naive Wald interval for binomial statistics can give nonsense results, like a lower confidence limit less than zero. That of course is impossible; the probability of survival (or of anything, for that matter) can’t be less than none. Likewise it can lead to upper confidence limits greater than one, similarly impossible. The custom in such cases is simply to truncate the confidence limits at zero below and one above. It works, the result is at least no longer nonsense, but it’s not very good and certainly not very satisyfing.
The survival function estimate is essentially a probability, so it’s always between zero and one. If we log-transform, the result is between minus infinity and zero. If we compute the standard error of the log-transformed value, then we can subtract 1.96 times that to get a lower limit which is surely finite (and negative), that can be exponentiated (un-log-transformed) to give a lower limit estimate which is never below zero. Problem solved! Or at least, improved. Such a confidence interval is called, not surprisingly, “log-transformed.”
It solves the problem of the lower limit being negative, but doesn’t solve the problem of the upper limit being above one. So, we can log-transform the negative of the log-transform (whew!), compute the standard error of that, add and subtract 1.96 standard errors, then un-log-transform the un-log-transformed resultant limits, and we’re guaranteed our confidence limits will never be below zero or above one. Such an interval is called, not surprisingly, log-log-transformed.
Hence we seem to have three different ways to estimate confidence limits for the Kaplan-Meier estimate, “plain,” “log,” and “log-log.” But actually there are six, because for any of the three we can use the standard error from Greenwood’s method, or Peto’s.
The simple fact that there are six viable alternatives for confidence limits, should be taken as a sign: we can’t have much “confidence” in our confidence limits.
Perhaps at this point you’re thinking, “What’s wrong, Tamino? All this text and no graphs?” Fear not.
I retrieved some data for survival of AIDS patients. It’s only an illustrative sample, and I even limited it to those who did not have a history of IV drug use. That left 51 subjects, and for 9 of them the data were censored but for the other 42 the time of death was known. Here’s the plot of the KM estimate, with the “plain” (95%) confidence limits shown as dashed lines:
The little vertical lines (tick marks, if you like) mark the times at which subjects were censored, i.e. were lost track of in the study.
This also makes evident another property of the KM estimate, which I consider a shortcoming, that it’s not a continuous function of time. That’s the way it is; between event times the numbers don’t change so the KM estimate doesn’t change either. But it’s implausible that the survival function is constant from one moment of death until the next is imminent. Some have attempted to deal with this by smoothing the KM estimate, but my new method takes a different approach.
How do those plain confidence limits compare to alternatives? I won’t bother showing the versions using Peto’s method, there are enough already with Greenwood’s. Here are the plain (already shown), log, and log-log estimated confidence intervals (all 95% confidence):
There are clearly visible differences among them (of course). What’s surprising is how big the differences are. It’s easier to see if we zoom in, say, on the earliest times, the first six months:
Just after month 1, the upper confidence limit by the log-log method is more than twice as far from 1 as the other two estimates. The lower confidence limit is also substantially different. Here’s a zoom-in of the latest times, the last months:
For the lower confidence limit it’s the “plain” estimate that is the “odd man out.” It even falls to zero at month 57 (it gets truncated there). But for the upper confidence limit it’s the log estimate that’s the extreme. In both cases, the differences can well be described as substantial.
Regardless which is the odd-ball or how far apart the estimates are, it’s an open question which one is closest to being “right.” Until now, I believe. I think my new method doesn’t just give good answers, at certain moments of time it gives the correct answers.
That’s why I’ve written it up and submitted to the Journal of the American Statistical Association. I chose them for two reasons. First, it’s a top-flight statistics journal. Second, it’s where Kaplan & Meier published their groundbreaking paper over 50 years ago.
And what’s the new method, you wonder? There’s another post coming soon.