Note: See the update at the end of the post.
Since I’ve devised a new way to estimate the survival function, I wrote some code (in R) to execute it. The program is in its early stages of development, so there’s a lot of work still to be done. But one of the ways to learn about it is to let people play with it. So, I’ve decided to share my code so others can have some of the fun.
I’ll start with the main caveat: it’s slow. There’s a lot of calculation involved! And, it’s written in R which is an interpreted language rather than a compiled language like C. This means that if you have a large sample it tends to crawl, and the more heavily censored the data are, the slower it crawls. Samples of around 500 take several minutes, samples of around 1000 are painfully slow. Again, the more heavily censored, the slower it gets.
Now, to how to run it. First import it with the single command:
Then you’re ready to use it. There are two primary inputs. One is a set of times, let’s call them “t”. The other is a set of censoring indicators, let’s call them “cens”. The censoring indicators are equal to 0 for censored data, equal to 1 for event data, so all the values should be either 0 or 1. When you’ve got them, you can estimate the survival function (and assign it to a variable, let’s call it “gf”) thus:
gf = gfsurv(t,cens)
This will put the results into a data frame called “gf” with the following columns:
t = times
S = survival function
se = standard error
lower.ci = lower confidence interval
upper.ci = upper confidence interval
H = cumulative hazard function
It also produces a plot.
There’s one other option. In addition to passing it times and censoring indicators, you can also pass it a factor which separates the data into different groups. Suppose, for instance, you have a variable called “gender” which indicates male or female. You can compute separate survival functions for each gender with this call:
gf = gfsurv(t,cens,gender)
This will return a list with two parts, one for males and one for females. It will also produce a plot showing both, each in a different color. Do note that groups with fewer than 4 uncensored data will simply be ignored.
There’s also a plotting program, in case you’ve got one of those multiple-set results and you want to plot it. You can call it like this:
That instruct the program to plot the results, adjust the y-axis so it only goes from 0.5 to 1, use line width 2 for heavier lines, and NOT to plot confidence limits.
As usual, it’s entirely possible I’ve made some mistake which means it won’t run on your system even though it does on mine.
So, there it is. Except of course, the program itself. WordPress won’t allow me to upload an R program, or even a text file, so I changed the name to “gfsurv.xls” to make it think it’s an Excel file. Just change the name back, from “gfsurv.xls” to “gfsurv.r” and you should be good to go. Here ’tis:
I’ve made a few changes. The most important one is that you can now choose your own confidence level. The default is 0.95 (for 95% confidence), but if you want, say, 99% confidence you can change the call to this:
gf = gfsurv(t,cens,fact,conf.level=0.99)
The other change is that you can now use the “gfplot” routine to plot an already-computed result, even if you didn’t pass a factor so it has only one survival function in it.
Here it is, and as before I renamed it to have a suffix “.xls” just so WordPress will allow me to upload it:
As before, enjoy!