Survival analysis (SA) predicts the time to an event of interest (TTE) from input attributes. The main challenge in SA is censored instances, in which the event is not observed. Censoring can result from an alternative event (e.g., death) or from missing data.
Most SA prediction methods suffer from drawbacks that limit the use of advanced machine learning methods: (A) simplistic models, (B) ignoring the input of censored samples, (C) no separation between the model and the loss function, and (D) typically small datasets with high-dimensional inputs.
We propose a loss function, denoted suRvival Analysis lefT barrIer lOss (RATIO), that explicitly incorporates the input of censored samples in the prediction while still accounting for the difference between censored and uncensored samples. RATIO can be combined with any prediction model. We further propose a new data augmentation (DA) method based on the TTE of uncensored samples and the input of censored samples.
We show that RATIO drastically improves the precision and reduces the bias of SA prediction, and that the DA method allows high-dimensional data to be used in SA methods even when only a small number of uncensored samples is available.
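
To illustrate how a loss can use censored samples while treating them differently from uncensored ones, the following is a minimal sketch of a generic one-sided ("left barrier" style) loss. It is not the RATIO definition, which is not given in this abstract. The function name `left_barrier_style_loss`, the squared penalties, and the assumption that a censoring time is a lower bound on the true TTE are all illustrative choices.

```python
def left_barrier_style_loss(predictions, times, observed):
    """Mean loss over samples (illustrative sketch, not the RATIO loss).

    predictions: predicted TTE for each sample
    times:       observed event time if observed, else censoring time
    observed:    True if the event was observed, False if censored
    """
    if not (len(predictions) == len(times) == len(observed)):
        raise ValueError("all inputs must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")

    total = 0.0
    for pred, t, is_observed in zip(predictions, times, observed):
        if is_observed:
            # Uncensored: the true TTE is known, so penalize any deviation.
            total += (pred - t) ** 2
        else:
            # Censored (assumption: the censoring time is a lower bound on
            # the true TTE): penalize only predictions below that time.
            shortfall = max(0.0, t - pred)
            total += shortfall ** 2
    return total / len(predictions)


# Usage: one uncensored sample and two censored samples.
loss = left_barrier_style_loss(
    predictions=[5.0, 3.0, 10.0],
    times=[4.0, 6.0, 8.0],
    observed=[True, False, False],
)
```

Because the function depends only on predictions and labels, it can be paired with any model that outputs a TTE estimate, which mirrors the separation between model and loss described above.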