Joint modeling versus discriminant analysis for dynamic prediction of a binary outcome based on longitudinal data: a simulation study

Background: In the literature, much emphasis has been placed on new statistical methods for dynamic risk prediction using longitudinal biomarker information. However, few studies have compared their performance for predicting a long-term binary outcome. In this paper we perform a simulation study to compare the dynamic predictive performance of two commonly used methods for longitudinal and binary data, namely longitudinal discriminant analysis (LoDA) and joint modeling (JM-Bin).

Methods: Motivated by a real-world dataset, we simulate different scenarios in which we vary the event rate (i.e. the percentage of patients) and the subject-specific variability of the biomarker to assess their influence on the predictive performance of the two approaches. More specifically, we allow the variability to differ between subjects according to their outcome, i.e., the between- and/or within-subjects variability is larger in patients than in healthy subjects. Time-dependent predictive measures (mean squared error of prediction, area under the ROC curve) are used to evaluate the dynamic predictive performance.

Results: LoDA produces more accurate predictions than JM-Bin in most of the simulation scenarios. In general, increasing the biomarker's between-subjects variability reduces the predictive accuracy of both approaches to the same extent. Increasing the number of events does not influence the prediction accuracy of either method.

Conclusions: The predictive advantage of LoDA over JM-Bin is most pronounced when the biomarker's (within- and/or between-subjects) variability differs between the outcome groups.


In many medical applications, longitudinal biomarker data can serve as a predictor for a future clinical outcome, for example the occurrence of a disease. In some applications the interest is to use, at each moment in time, all collected information on a patient to obtain up-to-date predictions of the risk of that disease, and to revise these predictions whenever new information becomes available. For example, prostate-specific antigen (PSA) levels are used to provide dynamic predictions of the future development of prostate cancer (1), and the measurement of blood pressure at antenatal appointments is used to identify the risk of pre-eclampsia in pregnant women.

In the past, only the most recent information (obtained at the most recent follow-up visit) was considered in predicting a patient's risk of a clinical outcome. All previously gathered information was ignored, which could be an inefficient use of data. For example, in cases [...] (LoDA) for two groups as a special case of the pattern mixture models (PMM).

In a previously published paper we presented a tutorial for using different SREM approaches to obtain dynamic predictions for a binary outcome (2). The presented SREM approaches consist of two sub-models. The first sub-model is a linear mixed model fitted to the longitudinal biomarker data, while the second is a binary logistic regression for the clinical outcome in which the predicted random effects from the first model are used as covariates. The two sub-models are linked either in separate steps through a two-stage approach or through the joint modeling (JM-Bin) approach, in which the parameters of the longitudinal biomarker model and the binary outcome model are estimated simultaneously.

The joint modeling approach is a novel statistical tool that can be used not only to predict a binary outcome but also to accommodate categorical, count, continuous and survival outcomes when a suitable regression model is used. We recommended applying joint modeling approaches rather than two-stage methods in applications where the biomarker data are measured with large error and/or when the variability in the biomarker data between the study subjects is large.
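As an illustration of the two sub-models described above, a generic shared random effects formulation might look as follows. The notation is an assumption made for this sketch (random intercept and slope, outcome indicator $d_i$, coefficients $\beta$ and $\alpha$) and is not necessarily the specification used in this paper.

```latex
\begin{align}
  y_i(t) &= \beta_0 + \beta_1 t + b_{0i} + b_{1i}\, t + \varepsilon_i(t),
    & \varepsilon_i(t) &\sim N(0, \sigma^2), \\
  (b_{0i}, b_{1i})^\top &\sim N(\mathbf{0}, \Sigma_b), \\
  \operatorname{logit} \Pr(d_i = 1 \mid b_{0i}, b_{1i})
    &= \alpha_0 + \alpha_1 b_{0i} + \alpha_2 b_{1i}.
\end{align}
```

In a two-stage approach the random effects would be predicted from the first equation and then plugged into the logistic model, whereas in the joint approach all parameters are estimated simultaneously.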

The LoDA framework models the longitudinal biomarker distribution using a mixed effects model separately for patients and healthy subjects in available, historic data. Then, for a new subject, it estimates the probability of the biomarker measurements given the disease status and [...] of studies that compare their predictive performance. Therefore, in this study we aim to compare the dynamic predictive performance of these two approaches and to explore the situations in which one of them outperforms the other. We focus on the situation where we have one longitudinal biomarker and one binary outcome. We investigate the influence of changes in the event rate (i.e. the percentage of patients) and the subject-specific variability of the biomarker on the predictive performance of the two approaches. We do so using various simulation scenarios informed by a real-world dataset.
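The LoDA classification step can be sketched as follows. This is a minimal illustration, not the implementation used in this study: it assumes a random-intercept linear model in each group (the actual mixed models may include further random effects), it combines the group-specific densities with a prior event rate through Bayes' theorem, and all parameter values and function names are hypothetical.

```python
import math


def log_marginal_density(y, times, beta0, beta1, sd_between, sd_within):
    """Log density of the measurements y under a random-intercept linear model.

    The marginal covariance is sd_between^2 * J + sd_within^2 * I, whose
    inverse and determinant have closed forms, so no matrix library is needed.
    """
    n = len(y)
    resid = [yi - (beta0 + beta1 * t) for yi, t in zip(y, times)]
    s2e = sd_within ** 2
    s2b = sd_between ** 2
    total = s2e + n * s2b
    sum_r = sum(resid)
    sum_r2 = sum(r * r for r in resid)
    quad = (sum_r2 - s2b / total * sum_r ** 2) / s2e
    log_det = (n - 1) * math.log(s2e) + math.log(total)
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def posterior_patient_probability(y, times, prior, patient, healthy):
    """Posterior probability of being a patient via Bayes' theorem."""
    log_p = math.log(prior) + log_marginal_density(y, times, **patient)
    log_h = math.log(1 - prior) + log_marginal_density(y, times, **healthy)
    diff = log_h - log_p
    if diff > 700:  # avoid overflow in exp; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical group-specific parameters (NOT estimates from the paper).
PATIENT = {"beta0": 4.0, "beta1": 0.0, "sd_between": 1.0, "sd_within": 0.5}
HEALTHY = {"beta0": 3.0, "beta1": -0.3, "sd_between": 0.5, "sd_within": 0.3}

if __name__ == "__main__":
    times = [0, 1, 2]
    y_new = [4.0, 4.0, 4.0]  # hypothetical new subject
    # Dynamic prediction: update the risk after each new measurement.
    for k in range(1, len(y_new) + 1):
        risk = posterior_patient_probability(
            y_new[:k], times[:k], prior=0.3, patient=PATIENT, healthy=HEALTHY
        )
        print(f"after {k} measurement(s): risk = {risk:.3f}")
```

Because each group has its own variance components, differences in between- or within-subject variability enter the posterior probability directly.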

The rest of this paper is organized as follows: section 2 briefly reviews the general settings of [...]

where [...] is the biomarker measurement for the subject measured at time point [...], the joint model combining the two sub-models can be written as follows: [...] (2).

We apply the MCMC technique, where two chains are initiated with 1,000 burn-in iterations and are run for 10,000 iterations. We use the rjags package (version 4.9), which provides an interface between R (version 3.6.2) and the JAGS library, to perform the analysis (7-9).

[...] are higher and more stable compared to the hCG levels for the healthy subjects, which start at lower levels and decrease with time. These data have been analysed elsewhere (2, 10, 11).

We generated the data using parameters for the different scenarios described in Table 1. The above scenarios were repeated for different event rates (i.e. percentage of patients) of 10%, 30% and 50%.
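The data-generating idea, with outcome-specific variability and a chosen event rate, can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper: the outcome is drawn first and the trajectory is then generated from a group-specific random-intercept model, and the parameter values are hypothetical placeholders rather than the values in Table 1.

```python
import random

# Hypothetical parameter values for illustration only; they are NOT the
# values from Table 1. Patients get larger between- and within-subject SDs.
PARAMS = {
    "patient": {"beta0": 4.0, "beta1": 0.0, "sd_between": 1.0, "sd_within": 0.5},
    "healthy": {"beta0": 3.0, "beta1": -0.3, "sd_between": 0.5, "sd_within": 0.3},
}


def simulate_subject(rng, group, times, params=PARAMS):
    """Simulate one biomarker trajectory from a random-intercept linear model."""
    p = params[group]
    intercept_dev = rng.gauss(0.0, p["sd_between"])  # between-subject deviation
    return [
        p["beta0"] + intercept_dev + p["beta1"] * t
        + rng.gauss(0.0, p["sd_within"])  # within-subject measurement error
        for t in times
    ]


def simulate_dataset(n_subjects, event_rate, times, params=PARAMS, seed=2024):
    """Simulate outcomes at the given event rate, then group-specific trajectories."""
    rng = random.Random(seed)
    dataset = []
    for subject_id in range(n_subjects):
        outcome = 1 if rng.random() < event_rate else 0
        group = "patient" if outcome == 1 else "healthy"
        dataset.append({
            "id": subject_id,
            "outcome": outcome,
            "y": simulate_subject(rng, group, times, params),
        })
    return dataset


if __name__ == "__main__":
    # Repeat the same scenario for the three event rates used in the study.
    for rate in (0.10, 0.30, 0.50):
        data = simulate_dataset(200, rate, times=[0, 1, 2, 3])
        observed = sum(d["outcome"] for d in data) / len(data)
        print(f"target event rate {rate:.2f}, observed {observed:.2f}")
```

Changing `sd_between` or `sd_within` in only one group mimics scenarios in which the variability differs between the outcome groups.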

In this paper, we have compared two approaches to obtain (dynamic) predictions for a future

When the subject-specific variability is noticeably higher in the patients than in the healthy subjects, LoDA provides higher prediction accuracy than JM-Bin. The difference is especially large when both the within- and the between-subjects variability are larger in only one of the groups (in our study, the patient group). This could be due to the structural difference between the two approaches. The joint model will not capture the differences in variability between the two groups since we only fit one mixed effects model for [...]

We used only summarized data from the following paper: [...]

The area under the ROC curve for JM-Bin and LoDA using the scenarios from Table 1 and a 30% event rate.
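The two predictive measures used in this study, the mean squared error of prediction and the area under the ROC curve, can be computed at each prediction time from the observed outcomes and the predicted risks available at that time. The sketch below is an assumption-laden illustration: it takes the mean squared error of prediction to be the average squared difference between the binary outcome and the predicted risk, computes the AUC as the proportion of concordant patient/healthy pairs (ties counted as one half), and uses hypothetical function names and toy data.

```python
def auc(outcomes, risks):
    """Area under the ROC curve as the proportion of concordant pairs."""
    pos = [r for o, r in zip(outcomes, risks) if o == 1]
    neg = [r for o, r in zip(outcomes, risks) if o == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient and one healthy subject")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def msep(outcomes, risks):
    """Mean squared difference between binary outcomes and predicted risks."""
    return sum((o - r) ** 2 for o, r in zip(outcomes, risks)) / len(outcomes)


def measures_over_time(outcomes, risks_by_time):
    """Evaluate both measures at every prediction time.

    risks_by_time maps a prediction time to the list of predicted risks
    obtained from the biomarker data collected up to that time.
    """
    return {
        t: {"auc": auc(outcomes, risks), "msep": msep(outcomes, risks)}
        for t, risks in sorted(risks_by_time.items())
    }


if __name__ == "__main__":
    # Toy data: four subjects, predictions after one and two visits.
    outcomes = [0, 0, 1, 1]
    risks_by_time = {
        1: [0.10, 0.40, 0.35, 0.80],
        2: [0.05, 0.20, 0.60, 0.90],
    }
    for t, m in measures_over_time(outcomes, risks_by_time).items():
        print(t, m)
```

A higher AUC and a lower mean squared error of prediction at a given time indicate better predictions at that time.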