In the last post I mentioned that I’ve come up with a new way to estimate the survival function. I first got the idea years ago while studying survival analysis. But recently I became aware just how troublesome it is to get good estimates of confidence limits, so I elaborated on the idea and came up with what I think is the answer.
Of course I could be wrong. And of course it’s possible somebody else already thought of it, and it’s published somewhere I’m not aware of. That’s the way science works. But I have submitted a manuscript for publication, so I should find out before too long. In the meantime, I’ll share it here.
The Kaplan-Meier (KM) method estimates the survival function between events. After one event but before the next, it uses binomial statistics to form the maximum-likelihood estimate, which is built on an unbiased estimator of the number of observed events. Between events the numbers don't change, so neither does the KM estimate.
One idea I came up with was to estimate the survival function, not between events, but at the moment of each event. And I decided to do so with order statistics. This is possible because, even though the event times follow some unknown probability distribution, the corresponding values of the survival function follow a known distribution: the uniform distribution on the interval [0,1]. This enables us to compute the exact probability density function (pdf) for the jth ordered value. For those interested, it's this:

f_j(z) = [n! / ((j-1)! (n-j)!)] z^(j-1) (1-z)^(n-j),
in which z is the value of the cumulative distribution function (which is 1 minus the survival function). From this we can compute numerous estimates of the survival function at that moment (not between times). One is the mode of this distribution, the single most likely value, which works out to (j-1)/(n-1).
Another, which is the one I selected, is the mean of the distribution, which minimizes the expected mean-squared error:

mean(z_j) = j / (n+1),

so the estimated survival function at the jth event is 1 - j/(n+1) = (n+1-j)/(n+1).
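Since the mean of the order-statistic pdf works out to j/(n+1), the estimate at the jth event is 1 - j/(n+1). Here's a minimal sketch (pure Python; the function names are mine, and censoring is ignored) comparing it with the KM value:

```python
from math import comb

def order_stat_pdf(z, j, n):
    """pdf of the j-th smallest of n independent Uniform(0,1) values:
    the distribution of the survival-function value at the j-th event
    under the order-statistics view described in the post."""
    return j * comb(n, j) * z ** (j - 1) * (1 - z) ** (n - j)

def km_estimate(j, n):
    """Kaplan-Meier survival estimate after the j-th of n events (no censoring)."""
    return (n - j) / n

def new_estimate(j, n):
    """Mean-of-the-pdf estimate at the moment of the j-th event: 1 - j/(n+1)."""
    return 1 - j / (n + 1)

# With 9 events and no censoring the two estimates differ slightly:
for j in range(1, 10):
    print(j, round(km_estimate(j, 9), 3), round(new_estimate(j, 9), 3))
```

Note that at the final event the KM estimate drops to zero while the new estimate stays positive.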
In any case, it differs slightly from the KM estimate. More to the point, it applies at the moment of an event, not between events (have I mentioned that often enough?).
It is sometimes said that the KM estimate is not only maximum-likelihood, but also unbiased. However, that's not actually true. It's based on an unbiased estimate of the number of observed events as a function of the survival function, but that's not the same as an unbiased estimate of the survival function as a function of the number of observed events. That distinction is central to Bayesian statistics. In fact, another approach is simply to form a Bayesian estimate of the survival function at the time of each event.
The bugaboo of Bayesian methods is the choice of prior probability. But in this particular case there's an obvious, and (I think) correct, choice, because we know a priori that the survival function at an observed event time, viewed as a random variable, follows the uniform distribution U[0,1]. Therefore the clear choice is a uniform prior. If you use that, you get the same formula for the pdf of the survival function at the jth event as already given. It's always satisfying when more than one approach leads to the same result.
And now for one of the good things: by estimating the survival function at event moments rather than between, it becomes easy to estimate it between events by interpolation. I chose linear interpolation because of its simplicity, but other choices might be better (that’s a topic for further study). By doing so, we get a continuous function rather than the discontinuous step-function of the KM estimate.
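A sketch of the interpolation step; the helper name and the flat extrapolation beyond the endpoints are my choices, not the post's:

```python
from bisect import bisect_right

def interp(t, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at time t.
    xs must be sorted ascending; values are held flat beyond the endpoints."""
    if t <= xs[0]:
        return ys[0]
    if t >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, t)
    frac = (t - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])

# Hypothetical example: 9 event times with the 1 - j/(n+1) estimates
times = list(range(1, 10))
surv = [1 - j / 10 for j in range(1, 10)]
print(interp(4.5, times, surv))  # halfway between the 4th and 5th estimates
```

The same helper extends the confidence limits between events, just by passing the limit values as ys.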
And now for the best thing: at the event times we don’t just have an estimate of the survival function (and its standard error too), we have the actual pdf for it. That means we can compute, directly, the confidence interval (Bayesians refer to it as a credible interval) for any confidence level we want. These values too can be extended to cover times between events by linear interpolation.
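Here's one way such an interval can be computed, assuming the order-statistic pdf. The quantiles can be had without special functions, because the CDF of the jth order statistic equals a binomial tail probability, which can then be inverted by bisection (all names here are mine):

```python
from math import comb

def cdf(z, j, n):
    """CDF of the j-th order statistic of n uniforms at z, via the identity
    P(jth smallest <= z) = P(at least j of n uniforms fall below z)."""
    return sum(comb(n, k) * z ** k * (1 - z) ** (n - k) for k in range(j, n + 1))

def quantile(p, j, n, tol=1e-9):
    """Invert the CDF by bisection (it is monotone on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid, j, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def credible_interval(j, n, level=0.95):
    """Equal-tailed interval for the survival function S = 1 - z
    at the j-th of n event times."""
    a = (1 - level) / 2
    return 1 - quantile(1 - a, j, n), 1 - quantile(a, j, n)

print(credible_interval(5, 9))  # 95% limits at the 5th of 9 events
```

By construction these limits can never fall below 0 or above 1.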
This gives us confidence limits which are not just sensible (never below zero or above 1) but actually correct at the event times, and should be quite close to correct in between. There's no need for six different ways to estimate confidence limits, just one: the right one.
So far, it’s just a Bayesian approach to binomial statistics, with the added twist of viewing the data as order statistics and estimating at the moment of one of them rather than between times. That’s a fine thing to do because other methods of estimating confidence limits for the binomial probability suffer from a lot of drawbacks. But I believe the Bayesian approach to binomial has been done already. The original part (as far as I know) is working out a way to deal with censored data. The details are a bit technical so I won’t discuss them here.
So, how do the results compare to the KM estimate? Let's start with some artificial data consisting of 9 event times, from 1 to 9, without any censoring. Here's the KM estimate with 95% confidence limits by the "plain" (in black), "log" (in green), and "log-log" (in cyan) methods:
It's plain to see that the three methods don't agree; in fact they differ by rather a lot. The "log" method gives the most implausible upper confidence limit, which equals 1 past the 3rd event and stays almost equal to 1 until just before the 5th event. The idea that the survival function might even equal 1 after 3 of 9 deaths have already occurred is not just implausible, it's impossible. The "plain" method also has the upper limit at 1 even after the 2nd event, similarly impossible. At late times, the "log" method is way above the other estimates.
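That behavior of the "plain" limits is easy to reproduce. A quick check, assuming no censoring so that Greenwood's variance reduces to the binomial form S(1-S)/n:

```python
from math import sqrt

n = 9
for j in range(1, n):
    s = (n - j) / n             # KM estimate after the j-th event
    se = sqrt(s * (1 - s) / n)  # Greenwood SE (no censoring)
    upper = s + 1.96 * se       # "plain" (Wald) upper 95% limit
    print(j, round(s, 3), round(upper, 3))
```

The uncapped upper limit exceeds 1 after the 1st and 2nd events, which is why the plotted limit sits at 1 there.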
For lower confidence limits, at early times it’s the “log-log” method which is the “odd man out.” It’s nearly twice as far below 1 as the others.
What about the new estimate? Let’s add it to the graph, in red — along with the revised estimate of the survival function itself:
I regard the new way as correct, so this enables us to evaluate the accuracy of the existing methods. At early times "log-log" is best for the upper confidence limit; in fact it's not too bad, while the others show that unrealistic behavior. But for the lower limit at early times "log-log" is worst. The others aren't very good either: they're quite close at the event times, but between events they persist at values that are too high.
At late times, the "log" method has an upper limit that's way too high; the others are too low, especially at the moment of an event, though they get closer as one approaches the next event. The lower limits are all not too bad, but not too good either.
At the final event the KM estimate doesn't even provide confidence limits. That's a property of the Wald interval for binomial statistics. Another feature, one I don't like, is that at the final event the KM estimate itself (quite apart from confidence limits) is zero. In most cases it's hard to believe, and in some cases impossible, that just because the final death was observed at that time, the probability of survival past it is "no chance at all."
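A tiny numerical illustration of that final-event behavior, again using the 9-event example with no censoring:

```python
n = 9

# KM at the final (9th) event: the estimate is exactly zero, and
# Greenwood's formula breaks down there (its last term divides by zero
# while the S^2 factor is zero), so no usable confidence limits.
s_km = (n - n) / n

# The new estimate stays positive: 1 - n/(n+1) = 1/(n+1)
s_new = 1 - n / (n + 1)

print(s_km, s_new)
```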
To get an idea of results with censoring, let’s look again at the HIV data shown in the last post. Here are the results, again using black for “plain,” green for “log,” and cyan for “log-log” intervals, together with red for both the new survival function estimate and its confidence interval:
They’re all pretty close, as they must be. But we can get a better idea of the differences if we zoom in on early and late times. Here’s the first six months:
The usual methods all tend to have both upper and lower limits that are too high, except the "log-log" estimates, which seem to me the best of the lot. Here's what we get at late times:
The "log" method looks like the loser for the upper confidence limit, while the "plain" method is worst for the lower. Among the traditional methods, "log-log" seems to give the best results overall. But the new method, I believe, is actually correct.
Which traditional method is best can vary from case to case, depending on the sample size and on the number and distribution of censored data. That makes it quite problematic to choose among them. But, if I'm correct, the new method solves the problem. It never gives unrealistic results. It provides continuous functions, not discontinuous step-functions. And one needn't choose among as many as six ways to determine confidence limits; the one will do.
Its advantages make it, in my opinion, the clear winner. But there’s no doubt that I’m biased; after all, it’s my baby. Only time will tell if the new method survives.
How would you do “real world” testing?
You could get a large data set, so that you have good estimates of values and confidence intervals, and then take a random subset. Then run your method and the others on the subset, and see which gets closer to the large sample values.
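A minimal sketch of that procedure; everything here (exponential lifetimes, the sample sizes, and the plain empirical survival function as the estimator under test) is an arbitrary stand-in:

```python
import random

random.seed(42)

# Treat a large sample as "truth", then see how far estimates
# from small random subsets fall from it.
population = [random.expovariate(1.0) for _ in range(5000)]

def surv(t, sample):
    """Empirical survival function: fraction of values exceeding t."""
    return sum(x > t for x in sample) / len(sample)

truth = surv(1.0, population)  # large-sample reference value

errors = [abs(surv(1.0, random.sample(population, 20)) - truth)
          for _ in range(200)]
print(truth, sum(errors) / len(errors))
```

The same scaffolding works for comparing competing confidence-limit methods: check how often the subset's interval actually covers the large-sample value.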
Good luck! It does look promising.
I think I've been able to recreate the artificial-data mean and confidence intervals using the method you've described (I don't know anything about dealing with censored data, so I didn't try any of that), but given what I'm seeing from my code and from your picture, I have a question. For that data, what if the distribution of the 9 events was not uniform? It seems that the mean estimates derived from j/(n+1) are visually nice because of this, but if a lot of people (say) died at 3 months, and then a fraction lingered to 9, how are the values informed by this fact? f_j(z) does not appear to depend on the data; but shouldn't it?
[Response: The estimated values don’t depend on the times, only on their order — and the same is true of the KM estimate. But the times at which those values apply (and are plotted) *do* depend on the data (just as for the KM estimate).]
OK, I think I understand it now. If I may rephrase: if a lot of people died at 3 months, then the spot in the order at which the first 5-month person died would be different than the spot it would be at if the distribution of deaths was uniform. So I’d choose a different j value than 5, based on the spot in the ordering?
[Response: The j values are the order of events, but the “t” values are the times. The j values alone determine the estimated survival function (and confidence limits) but the t values determine where they apply, and are plotted.
I see now that for my artificial data example, choosing the time values to be the same as the j values was confusing. I should have chosen random time values, and of course just assigned j values based on their order; I think that would have been much clearer. Look closely at the graphs for the HIV data, it may make more sense than my artificial example.]
Looks good this far. I'll bet pretty many microprocessors will be running at lower temps should this be accepted and applied. I'll be interested in the article too; please post a note when it's published. Intermittent data is a problem for many.
I suggest using a Beta prior instead of the Uniform. You don't really lose anything by doing so, since a Uniform(x; 0,1) is a Beta(1,1), and you then have a way of incorporating prior information. Better, because the Beta is the conjugate prior for the Binomial, you get a compound distribution in closed form, the Beta-Binomial, as its posterior predictive: https://en.wikipedia.org/wiki/Beta-binomial_distribution. (Also see these notes: http://web.as.uky.edu/statistics/users/pbreheny/701/S13/notes/1-15.pdf) And there's a physical mechanism, the Polya Urn Model, which also applies to your model with the Uniform prior, but it's specialized.
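The conjugate update this comment describes is one line (the function name is mine):

```python
def beta_update(a, b, k, n):
    """Conjugate update: a Beta(a, b) prior plus k successes in n
    Bernoulli trials yields a Beta(a + k, b + n - k) posterior."""
    return a + k, b + n - k

# The Uniform prior is Beta(1, 1); e.g. after 3 events among 9 subjects:
print(beta_update(1, 1, 3, 9))
```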
See also Example 3 and following beginning on page 335 of http://www.glicko.net/research/glickman-vandyk.pdf
I also believe there is some discussion of this in the last comment at
Great stuff. looking forwards to the paper.
Take also a look at “Interval estimation for a binomial proportion”:
[PDF: 2001a, Interval estimation for a binomial proportion, with T. T. Cai and A. DasGupta]