Suppose I have built a prediction model for the occurrence of a particular disease in one dataset (the model-building dataset) and now want to check how well the model works in a new dataset (the validation dataset). For a model built with logistic regression, I would calculate the predicted probability for each person in the validation dataset from the coefficients estimated in the model-building dataset. After dichotomizing those probabilities at some cutoff value, I can construct a 2x2 table that gives the true positive rate (sensitivity) and the true negative rate (specificity). By varying the cutoff, I can construct the entire ROC curve and obtain its AUC.
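For concreteness, here is a minimal sketch of that workflow in R (the data frame and variable names `train`, `valid`, `disease`, `x1`, `x2` are assumptions for illustration; the ROC/AUC step uses the pROC package):

```r
library(pROC)

## Hypothetical data frames `train` and `valid` with a binary outcome
## `disease` and predictors x1, x2 (names are assumptions)
fit <- glm(disease ~ x1 + x2, data = train, family = binomial)

## Predicted probabilities for the validation set, using the coefficients
## estimated in the model-building dataset
p <- predict(fit, newdata = valid, type = "response")

## 2x2 table at a single cutoff (here 0.5)
pred <- factor(as.integer(p >= 0.5), levels = c(0, 1))
obs  <- factor(valid$disease, levels = c(0, 1))
tab  <- table(predicted = pred, observed = obs)
sens <- tab["1", "1"] / sum(tab[, "1"])   # true positive rate
spec <- tab["0", "0"] / sum(tab[, "0"])   # true negative rate

## Full ROC curve and AUC, obtained by varying the cutoff
roc_obj <- roc(obs, p)
auc(roc_obj)
```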
Now suppose that I actually have survival data, so I fit a Cox proportional hazards model in the model-building dataset and now want to check how well it works in the validation dataset. Since the baseline hazard is left unspecified in a Cox model rather than estimated parametrically, I do not see how I can get a predicted survival probability for each person in the validation dataset from the coefficients estimated in the model-building dataset. How can I go about checking how well the model works in the validation dataset? Are there established methods for doing this, and if so, are they implemented in any software? Thanks in advance for any suggestions!
I know that this question is pretty old, but what I did when I encountered the same problem was to use the predict function to get a "score" for each subject in the validation set, split the subjects according to whether the score was higher or lower than the median, and plot the Kaplan-Meier curves; this should show a separation of the groups if the model is predictive. I also tested the association of the score (actually of its natural log, to make its distribution more nearly normal) with survival, using the coxph function from the survival package in R.
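A minimal sketch of this approach in R (the data frame and variable names are assumptions; note that the linear predictor returned by predict() is the log of the relative-risk score, so testing it directly corresponds to the log-transformed score described above):

```r
library(survival)

## Hypothetical data frames `train` and `valid`, each with follow-up time
## `time`, event indicator `status`, and predictors x1, x2
fit <- coxph(Surv(time, status) ~ x1 + x2, data = train)

## Linear predictor ("score") for each validation subject, computed from
## the coefficients estimated in the model-building dataset
valid$score <- predict(fit, newdata = valid, type = "lp")

## Split at the median score and compare Kaplan-Meier curves
valid$group <- ifelse(valid$score >= median(valid$score), "high", "low")
km <- survfit(Surv(time, status) ~ group, data = valid)
plot(km, col = 1:2, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = names(km$strata), col = 1:2, lty = 1)

## Test the association of the continuous score with survival in the
## validation data
summary(coxph(Surv(time, status) ~ score, data = valid))
```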