Imputing Partial Status and Estimating Incidence Rate in an Illness-death Model with Application to a Phase IV Cancer Trial


Background: Phase IV clinical trials are designed to monitor the long-term toxic effects of drugs in cancer survivors. The long-term effects of cancer treatment are often evaluated with cross-sectional surveys. This leads to interval-censored data, since the exact time of onset of toxicity is not known. In addition to identifying prognostic factors for long-term survival outcomes, it is also of interest to estimate and compare cumulative incidence rates of adverse outcomes from interval-censored data. However, the analysis of such data is complicated by issues such as incomplete data, competing risks, and selection bias. For example, Hudson et al. designed one such study to examine the effect of anthracycline exposure, received as part of treatment for childhood cancer, on cardiotoxicity. Rai et al. used a parametric approach to assess the effect of anthracycline on the cumulative incidence of cardiotoxicity, but excluded patients with missing information on the parameters used for assessing cardiotoxicity.

Methods: In this paper, we focus on imputing the missing data and then using the current status regression methods previously described in Rai et al. to estimate and compare cumulative incidence rates in an illness-death/failure model.

Results: We undertook a comprehensive simulation study to evaluate the performance of our imputation approach and applied it to a Phase IV clinical trial, which had missing cardiotoxicity information, to evaluate the effect of anthracycline exposure on long-term cardiotoxicity in childhood cancer survivors.

Conclusions: Our simulations suggest that the results obtained by imputing the missing values using regression methods are significantly more efficient than those obtained without imputation. The proposed approach is easy to implement, and we demonstrate its usefulness by applying it to the data reported in Rai et al. and comparing the results reported there with those from our approach that utilizes imputation.

were deleted from our analysis and we focused our attention on imputing 34 missing AF observations. The imputation approach discussed could, however, easily be applied in a recursive manner to data with missing values in several variables. The scatter plot of AF and FS displayed in Figure 1 does not show any clear missing pattern; the missing values of AF occur across the entire range of FS values. Also note that the proportions of missing AF values in the NR (7/54 = 13%) and AR (27/218 = 12%) groups were similar. AF are displayed in Figure 1, also do not show any pattern.

A crude approach to estimating the incidence rates and obtaining confidence intervals is to apply the Kaplan-Meier estimator with the assumption of the evaluation time as the onset time.

However, this information is available from the medical record abstraction, and with longer follow-up the number of deaths would increase. Hence, we present the general theory here for cross-sectional data with indicators of cardiac abnormality and death/cardiac failure, and time from treatment to the survey or to death/cardiac failure, as depicted in Figure 2. We also assume that cardiac abnormality is the precursor of cardiac failure.
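The crude Kaplan-Meier approach mentioned above can be sketched in plain Python as follows. This is only an illustrative sketch: the function name `kaplan_meier` and the toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : evaluation times, naively treated as onset times
    events : 1 if the abnormality was present at evaluation, else 0 (censored)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects tied at the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_removed
    return curve


# Toy data (hypothetical): years since treatment and abnormality indicator.
times = [2.0, 3.5, 3.5, 6.0, 8.0, 10.0]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
cumulative_incidence = [(t, 1.0 - s) for t, s in curve]
```

Because the true onset precedes the evaluation at which an abnormality is detected, treating the evaluation time as the onset time tends to overstate the time to onset, which is one reason this approach is described as crude.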

Let the cardiac measure, such as AF, be denoted by $Y$. Assume that $Y = (Y_1, Y_2)$ is an $n \times 1$ response vector with $Y_1$ ($n_1 \times 1$) observed and $Y_2$ ($n_2 \times 1$) missing, and let $X = (X_1, X_2)$ be the correspondingly partitioned covariate matrix.

There are several methods for imputation, which can be broadly classified as single imputation or multiple imputation (MI). In the MI approach, several copies of the complete data set are created, the appropriate statistical method is applied to each data set, and the results from these analyses are then combined to provide the final results. MI approaches are usually preferred over single imputation because they incorporate the variability due to imputation [17, 29-30]. Many MI approaches are discussed in the literature, but the two most commonly used, based on joint multivariate modeling and on fully conditional specification (FCS), perform quite well in the regression setting, as seen in Huque et al. [31]. PROC MI can perform imputations for data that have monotone or arbitrary missing patterns. PROC MI with the FCS option, a standard feature in SAS version 9.4 [32], utilizes the conditional distribution and can incorporate both continuous and categorical variables appropriately; see Liu and De [33]. In our setting we had missing values only in AF, and we wanted to take advantage of the relationship between AF and other covariates of interest, which included categorical variables. Therefore, we preferred to perform the imputations using PROC MI in SAS with the FCS option.

The method can be briefly described as follows. A multivariable regression model $Y = X\beta + \varepsilon$ is fitted based on the complete data $Y_1$ and $X_1$, and the least squares estimator $\hat{\beta}$ of $\beta$ ($p \times 1$) and the associated variance-covariance matrix are obtained.

Then, missing values in $Y_2$ are imputed using the posterior predictive distributions; see PROC MI in SAS [32] for details. It is natural to use the imputed data $(Y_1, \hat{Y}_2)$ instead of only $Y_1$, and it is anticipated that the imputed information in $\hat{Y}_2$ will improve the results of the statistical analysis.
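The regression-based imputation step described above can be illustrated with a minimal Python sketch. It is not the PROC MI implementation: it assumes a single continuous covariate, a normal linear model with noninformative priors, and hypothetical names (`impute_once`) and toy values. One call produces one posterior-predictive draw of the missing responses; repeated calls give multiple imputations.

```python
import math
import random


def impute_once(x_obs, y_obs, x_mis, rng):
    """One posterior-predictive draw of the missing responses.

    A simple linear regression of y on one centered covariate x is
    fitted to the complete cases only (an assumption of this sketch).
    """
    n = len(y_obs)
    x_bar = sum(x_obs) / n
    y_bar = sum(y_obs) / n
    sxx = sum((x - x_bar) ** 2 for x in x_obs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_obs, y_obs))
    slope = sxy / sxx
    sse = sum((y - y_bar - slope * (x - x_bar)) ** 2
              for x, y in zip(x_obs, y_obs))
    df = n - 2
    # Draw the residual variance: sigma^2 = SSE / chi-square(df).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    sigma = math.sqrt(sse / chi2)
    # With x centered, the intercept and slope estimates are uncorrelated,
    # so their posterior draws can be made independently.
    a_star = rng.gauss(y_bar, sigma / math.sqrt(n))
    b_star = rng.gauss(slope, sigma / math.sqrt(sxx))
    # Impute each missing y from its predictive distribution.
    return [a_star + b_star * (x - x_bar) + rng.gauss(0.0, sigma)
            for x in x_mis]


rng = random.Random(2024)
x_obs = [0.25, 0.28, 0.30, 0.32, 0.35, 0.38]  # hypothetical covariate values
y_obs = [80.0, 72.0, 66.0, 61.0, 52.0, 44.0]  # hypothetical observed responses
x_mis = [0.27, 0.33]                          # covariates of the missing cases
completed = [impute_once(x_obs, y_obs, x_mis, rng) for _ in range(5)]
```

Drawing the regression parameters afresh for each imputation, rather than plugging in the least squares estimates, is what lets the imputed values reflect uncertainty in the fitted model.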

A likelihood ratio test was applied to each completed data set to compare the two risk groups (AR and NR). In the regression framework, one could use PROC MIANALYZE in SAS to combine the results from multiple imputations and conduct inference that incorporates the variability introduced by the imputations [17, 29-30]. In our setting, however, a p-value is obtained for each imputation from the likelihood ratio test. The overall conclusion can then be based on a summary measure of all the p-values, such as the mean or the median. We prefer to report results based on the median, which is more robust than the mean.
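To make the per-imputation test and the pooling of p-values concrete, the following Python sketch may help. It assumes a one-parameter exponential model for current status data with no deaths, so that a subject surveyed at time t contributes 1 - exp(-rate * t) if abnormal and exp(-rate * t) otherwise. The function names, the ternary-search maximizer, and all numbers are hypothetical choices for illustration.

```python
import math
import statistics


def loglik(rate, times, status):
    """Current-status log-likelihood under a constant incidence rate."""
    total = 0.0
    for t, d in zip(times, status):
        if d:
            total += math.log(-math.expm1(-rate * t))  # log(1 - exp(-rate*t))
        else:
            total -= rate * t
    return total


def max_loglik(times, status, lo=-20.0, hi=5.0, iters=200):
    """Maximize over log(rate) by ternary search (the target is unimodal)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loglik(math.exp(m1), times, status) < loglik(math.exp(m2), times, status):
            lo = m1
        else:
            hi = m2
    return loglik(math.exp((lo + hi) / 2.0), times, status)


def lrt_pvalue(t_ar, d_ar, t_nr, d_nr):
    """Likelihood ratio test of equal rates in the two groups (1 df)."""
    full = max_loglik(t_ar, d_ar) + max_loglik(t_nr, d_nr)
    reduced = max_loglik(t_ar + t_nr, d_ar + d_nr)
    stat = max(0.0, 2.0 * (full - reduced))
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail


def pool_pvalues(pvalues):
    """Summaries of the p-values across imputed data sets."""
    return {"mean": statistics.mean(pvalues),
            "median": statistics.median(pvalues),
            "min": min(pvalues),
            "max": max(pvalues)}


# Toy current-status data (hypothetical): everyone surveyed at 10 years.
t_ar, d_ar = [10.0] * 20, [1] * 15 + [0] * 5
t_nr, d_nr = [10.0] * 20, [1] * 2 + [0] * 18
p = lrt_pvalue(t_ar, d_ar, t_nr, d_nr)

# Hypothetical p-values from five imputed data sets.
summary = pool_pvalues([0.01, 0.03, 0.02, 0.40, 0.02])
```

In the hypothetical summary, a single outlying p-value of 0.40 pulls the mean up to about 0.096 while the median stays at 0.02, which illustrates why the median is the more robust summary.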

Application: Cancer Survivor Study

In this section we obtained the imputed data for the cardiotoxicity example and applied the theory for the exponential model described in the appendix to evaluate the effect of anthracyclines on cardiotoxicity. The results obtained using the imputation approach are then compared with those obtained without imputation, reported in Rai et al. [2], under the assumption of no deaths/cardiac failures. The simplest model, Parameter-1, is the one-parameter exponential model. Since there are very few events before 5 years and after 10 years, we also fit two piecewise exponential models: Parameter-2, based on two incidence rates, one up to year 5 and the second for year 5 and above; and Parameter-3, based on three incidence rates, one up to year 5, the second between years 5 and 10, and the last for year 10 and above (see Figure 3).

given in Table 3b. The cumulative incidence function (CIF) was derived for the exponential and piecewise exponential models for the imputed data using the above regression model and the SAS procedure (PROC MI, with m=5, 10 and 100 imputations), and was compared with that based on the original data (without imputation).
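For illustration, the cumulative incidence implied by a piecewise-constant incidence rate can be computed as in the Python sketch below. It assumes no deaths/cardiac failures, so that the CIF is one minus the exponential of the negative cumulative hazard. The function name `piecewise_cif` and the rates used are hypothetical and are not estimates from the study.

```python
import math


def piecewise_cif(t, rates, cuts):
    """Cumulative incidence 1 - exp(-H(t)) for piecewise-constant rates.

    rates : one rate per interval, e.g. [r1, r2, r3]
    cuts  : interior cut points in years, e.g. [5.0, 10.0]
    """
    if len(rates) != len(cuts) + 1:
        raise ValueError("need exactly one more rate than cut points")
    hazard = 0.0
    start = 0.0
    for rate, end in zip(rates, list(cuts) + [math.inf]):
        if t <= start:
            break
        # Add the hazard accumulated within this interval up to time t.
        hazard += rate * (min(t, end) - start)
        start = end
    return 1.0 - math.exp(-hazard)


# Hypothetical rates per year for the one-, two- and three-rate models.
cif_one = piecewise_cif(12.0, [0.02], [])
cif_two = piecewise_cif(12.0, [0.01, 0.03], [5.0])
cif_three = piecewise_cif(12.0, [0.01, 0.03, 0.02], [5.0, 10.0])
```

With the three hypothetical rates, the cumulative hazard at 12 years is 0.01 x 5 + 0.03 x 5 + 0.02 x 2 = 0.24.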

In Table 3b, the group effects are reported for the data both without imputation and with imputation for m=5, 20 and 100. For the imputed data, we report the mean, minimum, maximum, and median of the p-values for the group effect. A comparison of cumulative incidence rates can be found in Figure 4.

To assess the performance of the imputation approach, we conducted simulation studies as described below.

The primary focus is on assessing the performance of imputing AF in the Anthracycline Cardiac Toxicity data for comparing the cumulative incidences of cardiac toxicity in the illness-death model, as discussed above. The detailed steps are as follows.
Step 1: From

Step 2: Then, for simulation studies we first created four subgroups:

Step 4 (Incomplete data in AF): From the sample generated in Step 2, we randomly deleted R% (R = 20 or 30) of the AF values, yielding incomplete data with (100-R)% of the original sample size.

Step 5 (Imputed data): Using the SAS procedure PROC MI with the FCS option, we imputed the AF values and obtained a complete copy of the data set.

Step 6 (Calculate p-value for group effect): For the one-parameter exponential distribution, the p-values for the group effect (comparing AR with NR) were obtained using the likelihood ratio test for the complete (originally generated), incomplete, and imputed data sets.

Step 7: The imputation process (Step 5) was repeated 20 times to obtain 20 completed data sets, which resulted in 20 p-values. The mean, median, minimum, and maximum of these p-values are summarized in Table 4.

Step 8: Steps 2-7 were repeated 10 times to assess the performance of the imputation approach on 10 independently generated data sets. The results of the simulation study are summarized in Table 4.
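The loop structure of the steps above can be sketched in Python as follows. Only the random deletion (Step 4) and the nesting of repetitions (Steps 7 and 8) are meant literally. The `generate`, `impute` and `analyze` functions are hypothetical stand-ins for the data generation, for PROC MI with the FCS option, and for the likelihood ratio p-value, respectively, and the cutoff of 74 is an arbitrary illustrative value.

```python
import random
import statistics


def delete_at_random(values, percent, rng):
    """Step 4: set `percent`% of the values to None, completely at random."""
    n_delete = round(len(values) * percent / 100.0)
    drop = set(rng.sample(range(len(values)), n_delete))
    return [None if i in drop else v for i, v in enumerate(values)]


def run_simulation(generate, impute, analyze, percent,
                   n_imputations=20, n_datasets=10, seed=0):
    """Nested repetition of the steps; returns one summary row per data set."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_datasets):                        # Step 8
        complete = generate(rng)                       # data generation
        incomplete = delete_at_random(complete, percent, rng)
        results = [analyze(impute(incomplete, rng))    # Steps 5 and 6
                   for _ in range(n_imputations)]      # Step 7
        rows.append({
            "complete": analyze(complete),
            "incomplete": analyze([v for v in incomplete if v is not None]),
            "mean": statistics.mean(results),
            "median": statistics.median(results),
            "min": min(results),
            "max": max(results),
        })
    return rows


# Stand-ins (hypothetical, for illustration only).
def generate(rng):
    return [rng.gauss(60.0, 15.0) for _ in range(100)]


def impute(values, rng):
    observed = [v for v in values if v is not None]
    # Crude stand-in for PROC MI: draw each missing value from the observed ones.
    return [rng.choice(observed) if v is None else v for v in values]


def analyze(values):
    # Stand-in for the likelihood ratio p-value: share of values above 74.
    return sum(v > 74.0 for v in values) / len(values)


rows = run_simulation(generate, impute, analyze, percent=20)
```

Each row pairs the result from the originally generated data with the result from the incomplete data and with summaries across the 20 imputations, mirroring the three data sets compared in Step 6.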

The simplified forms of the intensities, $\lambda_j(t) = \lambda_j$ for $j = 1$ or $2$ and $\lambda_3(t \mid x) = \lambda_3$, lead to $S_j(t) = e^{-\lambda_j t}$ for $j = 1$ or $2$, $S_3(t \mid x) = e^{-\lambda_3 (t - x)}$ and $S(t) = e^{-(\lambda_1 + \lambda_2) t}$. We derive the corresponding likelihood contributions from $L_1(\cdot)$ to $L_4(\cdot)$ for the four observation types in Table 2

The model is further extended to allow the intensity $\lambda_1$ to be piecewise constant [2]. Assume two intervals, less than $\tau$ years and $\tau$ years or more (say, $\tau = 5$), and let the two rates be $\lambda_{11}$ and $\lambda_{12}$. Then the log-likelihood function is derived as

has three or more pieces.
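As a numerical illustration of the constant-intensity case, the Python sketch below computes the probabilities of being healthy, of being alive with the abnormality, and of having died/failed by time t. It assumes that intensity 1 governs the transition from healthy to abnormality, intensity 2 from healthy to death/failure, and intensity 3 from abnormality to death/failure. The function names and parameter values are hypothetical, and the exact likelihood contributions of the paper are not reproduced here.

```python
import math


def state_probabilities(t, lam1, lam2, lam3):
    """Return (P healthy, P abnormal and alive, P dead/failed) at time t.

    Assumes constant intensities with lam1 + lam2 != lam3.
    """
    total = lam1 + lam2
    healthy = math.exp(-total * t)
    ill = lam1 / (total - lam3) * (math.exp(-lam3 * t) - math.exp(-total * t))
    return healthy, ill, 1.0 - healthy - ill


def ill_by_quadrature(t, lam1, lam2, lam3, steps=20000):
    """Midpoint-rule check: integrate over the unobserved onset time x."""
    h = t / steps
    acc = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        # Healthy until x, onset at x, then no death/failure from x to t.
        acc += lam1 * math.exp(-(lam1 + lam2) * x) * math.exp(-lam3 * (t - x))
    return acc * h


# Hypothetical intensities per year.
healthy, ill, dead = state_probabilities(8.0, 0.05, 0.01, 0.02)
ill_check = ill_by_quadrature(8.0, 0.05, 0.01, 0.02)
```

The closed form for the middle state comes from integrating the product of the survival functions over the unobserved onset time, which is the kind of integral that arises when only the status at the survey time is known.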