In the last post I mentioned that I’ve come up with a new way to estimate the survival function. I first got the idea years ago while studying survival analysis. But recently I became aware just how troublesome it is to get good estimates of confidence limits, so I elaborated on the idea and came up with what I think is the answer.
Of course I could be wrong. And of course it’s possible somebody else already thought of it, and it’s published somewhere I’m not aware of. That’s the way science works. But I have submitted a manuscript for publication, so I should find out before too long. In the meantime, I’ll share it here.
The Kaplan-Meier (KM) method estimates the survival function between events. After one event but before the next, it uses binomial statistics to record the maximum-likelihood estimate, which is also an unbiased estimator of the number of events remaining. Between events the numbers don’t change, so neither does the KM estimate.
One idea I came up with was to estimate the survival function, not between events, but at the moment of events. And I decided to do so with order statistics. This is possible because, even though the times of event follow some unknown probability distribution, the actual values of the survival function follow a known distribution: the uniform distribution on the interval [0,1]. This enables us to compute the exact probability density function (pdf) for the jth ordered value. For those interested, it’s this:
in which z is the value of the cumulative distribution function (which is 1 minus the survival function). From this we can compute numerous estimates of the survival function at that moment (not between times). One is the mode of this distribution, the single most likely value
Another, which is the one I selected, is the mean of the distribution, which gives the minimum expected mean-squared-error:
In any case, it differs slightly from the KM estimate. More to the point, it applies to the moment of event, not between events (have I mentioned that often enough?).
It is sometimes said that the KM estimate is not only maximum-likelihood, it’s also unbiased. However, that’s not actually true. It’s based on the unbiased estimate of the number of observed events as a function of the survival function, but that’s not the same as an unbiased estimate of the survival function as a function of the number of observed events. That idea is central to the theme of Bayesian statistics. In fact another approach is simply to form a Bayesian estimate of the survival function at the time of event.
The bugaboo about Bayesian is the choice of prior probability. But in this particular case, there’s an obvious, and (I think) correct choice, because we know a priori that the survival function at some observed moment, viewed as a random variable, follows the uniform distribution U[0,1]. Therefore the clear choice is a uniform prior. If you use that, you get the same formula for the pdf of the survival function at the jth event as already given. It’s always satisfying when more than one approach leads to the same result.
And now for one of the good things: by estimating the survival function at event moments rather than between, it becomes easy to estimate it between events by interpolation. I chose linear interpolation because of its simplicity, but other choices might be better (that’s a topic for further study). By doing so, we get a continuous function rather than the discontinuous step-function of the KM estimate.
And now for the best thing: at the event times we don’t just have an estimate of the survival function (and its standard error too), we have the actual pdf for it. That means we can compute, directly, the confidence interval (Bayesians refer to it as a credible interval) for any confidence level we want. These values too can be extended to cover times between events by linear interpolation.
This gives us confidence limits which are not just sensible (never below zero or above 1), they’re actually correct at the event times, and should be quite close to correct in between. There’s no need for six different ways to estimate confidence limits — just one, the right one.
So far, it’s just a Bayesian approach to binomial statistics, with the added twist of viewing the data as order statistics and estimating at the moment of one of them rather than between times. That’s a fine thing to do because other methods of estimating confidence limits for the binomial probability suffer from a lot of drawbacks. But I believe the Bayesian approach to binomial has been done already. The original part (as far as I know) is working out a way to deal with censored data. The details are a bit technical so I won’t discuss them here.
So, how do the results compare to the KM estimate? Let’s start with some artifical data consisting of 9 event times from 1 to 9, without any censoring. Here’s the KM estimate with 95% confidence limits by the “plain” (in black), “log” (in green), and “log-log” (in cyan) methods:
It’s plain to see that the three methods don’t agree. In fact they differ by rather a lot. The “log” method gives the most implausible upper confidence limit, which is equal to 1 past the 3rd event and almost equal to 1 up to just before the 5th event. The idea that the survival function even might be equal to 1 after 3 of 9 deaths have already occured is not just implausible, it’s impossible. The plain method also has the upper limit at 1 even after the 2nd event, similarly impossible. At late times, the log method is way above the other estimates.
For lower confidence limits, at early times it’s the “log-log” method which is the “odd man out.” It’s nearly twice as far below 1 as the others.
What about the new estimate? Let’s add it to the graph, in red — along with the revised estimate of the survival function itself:
I regard the new way as correct, so this enables us to evaluate the accuracy of the existing methods. At early times “log-log” is best for upper confidence limits, in fact it’s not too bad, the others have that unrealistic behavior. But for lower limits at early times “log-log” is worst. The others aren’t very good either. They’re quite close at the times of event, but between times they persist at too-high values.
At late times, the “log” method has an upper limit that’s way too high, the others are too low especially at the moment of event, but get closer as one approaches the next event. For lower limits they’re all not too bad, but not too good either.
At the final event the KM estimate doesn’t even provide confidence limits. That’s a property of the Wald interval for binomial statistics. Another factor, one which I don’t like, is that at the final event the KM estimate itself (quite apart from confidence limits) is zero. In most cases it’s hard to believe, in some cases it’s impossible, that just because the final death was observed at that time the probability of survival past that is “no chance at all.”
To get an idea of results with censoring, let’s look again at the HIV data shown in the last post. Here are the results, again using black for “plain,” green for “log,” and cyan for “log-log” intervals, together with red for both the new survival function estimate and its confidence interval:
They’re all pretty close, as they must be. But we can get a better idea of the differences if we zoom in on early and late times. Here’s the first six months:
All the usual methods tend to have limits that are too high, for both upper and lower limits, except the “log-log” estimates which seem to me to be the best of the lot. Here’s what we get at late times:
The “log” method looks like the loser for the upper confidence limit, the “plain” method is worst for the lower confidence limit. Overall, the “log-log” method seems to give best results overall among the traditional methods. But the new method, I believe, is actually correct.
Which traditional method is best can actually vary from case to case, depending on the sample size and on the number and distribution of censored data. That can make it quite problematic to choose among them. But, if I’m correct, the new method solves the problem. It never gives unrealistic results. It provides continuous functions, not discontinuous step-functions. And one needn’t choose among as many as six ways to determine confidence limits, the one will do.
Its advantages make it, in my opinion, the clear winner. But there’s no doubt that I’m biased; after all, it’s my baby. Only time will tell if the new method survives.