Background
Numerous methods are available for developing clinical prediction models that estimate the risks of a nominal polytomous outcome, but a comprehensive evaluation of which method is most appropriate has not yet been undertaken. We compared the predictive performance of a range of models in a simulation study and illustrated how to implement them with shrinkage in a clinical example.
Methods
The performance of five models [One-versus-All with normalisation (OvA-N), One-versus-One with pairwise coupling (OvO-PC), two types of continuation ratio regression (c-ratio and c-ratio-all) and multinomial logistic regression (MLR)] was evaluated in terms of calibration, discrimination and magnitude of overfitting. We considered two data-generating mechanisms and four underlying data structures, which allowed us to evaluate how robust each method was to model misspecification.
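As an illustration of the normalisation step in OvA-N, the following minimal sketch assumes that one binary model per outcome category (that category versus all others) has already been fitted, and that normalisation means dividing each predicted probability by the sum across categories so that the risks sum to 1. The function name `normalise_ova` and the input values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def normalise_ova(binary_probs):
    """Rescale one-versus-all probabilities so that each row sums to 1.

    binary_probs: list of rows, one per patient. Each row holds K
    probabilities, the k-th coming from a (hypothetical) binary model
    of category k versus all other categories.
    """
    normalised = []
    for row in binary_probs:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs at least one positive probability")
        # Divide every category's probability by the row total.
        normalised.append([p / total for p in row])
    return normalised


# Made-up outputs of three binary models for two patients (assumption).
raw = [[0.60, 0.30, 0.30], [0.10, 0.20, 0.50]]
risks = normalise_ova(raw)
```

Before normalisation the three probabilities for a patient need not sum to 1, because each binary model is fitted separately; after it, each row of `risks` is a valid set of polytomous risks.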
Results
At large sample sizes, OvO-PC and MLR had comparable calibration across all scenarios. When the models were misspecified, MLR and OvO-PC had the best calibration, followed by c-ratio-all and then c-ratio. Discrimination was similar for all methods across most scenarios; however, c-ratio had poor discrimination in certain scenarios. At small sample sizes, MLR and c-ratio had a similar level of overfitting, and OvO-PC suffered from the highest level of overfitting. MLR and c-ratio-all had the best calibration, followed by OvO-PC and then c-ratio. OvA-N had the worst performance at both large and small sample sizes.
Conclusions
We recommend MLR for developing clinical prediction models for polytomous outcomes, as it was the most robust to model misspecification and had the joint-lowest level of overfitting.