Prediction Models of Tumor Growth Trajectories Based on Pretreatment CT Images of TKI-Treated Lung Cancer Patients


 We have developed prediction models of tumor growth trajectories (TGTs) based on pretreatment computed tomography (CT) images, prior to targeted therapy with four specific tyrosine kinase inhibitors (TKIs)—erlotinib, gefitinib, afatinib, and osimertinib—for epidermal growth factor receptor (EGFR)-mutated non–small-cell lung carcinoma (NSCLC) patients. TGTs of the time-variant number of tumor cells were predicted for individual patients with tumor growth equations under the assumption that each tumor could contain three cell types: TKI drug-sensitive, drug-persister, and drug-resistant populations. Seven parameters of the TGT models were estimated by support vector regression, which learned the relationships between principal component features and the referenced parameters that were optimized by a Levenberg–Marquardt method. The Spearman correlation coefficient was employed to evaluate the correlation in the number of tumor cells at each CT acquisition timepoint between the reference numbers (derived from CT images) and predicted numbers. The average Spearman correlation was 0.822 (p=0.073) for a training dataset (27 treatments) and 0.818 (p=0.042) for a test dataset (10 treatments). The proposed models could predict TGTs for TKI non-treated patients, thereby estimating how lung tumors will respond to specific TKI drugs to optimize the selection of treatment strategies.


Introduction
Lung cancer is the commonest malignant cancer (11.6% of the total cancer cases) worldwide, with an associated mortality rate of 18.4% 1 . Targeted therapy with tyrosine kinase inhibitors (TKIs) is the most efficient treatment modality for controlling tumor growth in stage III or IV non-small cell lung carcinoma (NSCLC), which accounts for more than 80% of lung cancer cases 2 . TKIs mainly target the epidermal growth factor receptor (EGFR) inside the tumor to block the growth factor signal, thereby causing tumor cells to die. Unfortunately, tumor response to first-line agents, which includes first-and second-generation TKIs such as erlotinib, gefitinib, and afatinib, is unexpectedly weakened within 10-12 months. This is because of the emergence of TKI-resistant cells due to the secondary EGFR mutation T790M, which appears in more than half of all cases 3 . Therefore, the second-line TKI, osimertinib (third-generation TKI), should be applied in a timely manner for patients who make the transition from complete/partial response (CR/PR) to stable/progressive disease (SD/PD). However, there are two challenging issues, as follows: (1) the prediction of the tumor response to specific TKI drugs (e.g., CR/PR or SD/PD); and (2) the prediction of the transition period from the CR/PR to PD. One of approaches to solve this problem would be the prediction of the tumor growth trajectories (TGTs) with regard to time, and this approach could improve the selection of lung cancer treatment strategies prior to targeted TKI therapies.
Studies on the estimation of TGTs after targeted TKI therapies for EGFR-mutated NSCLC patients 4,5,6,7 . Michor et al. 4 developed a mathematical model which describes the four layers of the differentiation hierarchy of the hematopoietic system in NSCLC patients with chronic myeloid leukemia treated by breakpoint cluster region-abelson (BCR-ABL) TKI-imatinib. The authors calculated the probability of developing imatinib resistance (BCR-ABL) mutations and estimated the time until the detection of resistance, thereby providing the first quantitative insights into the in vivo kinetics of a human cancer. Foo et al. 5 , in contrast, constructed a simple mathematical model using birth and death processes to describe the evolution of resistant cells during a targeted anticancer therapy for NSCLC patients. Their model aimed to develop a methodology for designing optimal drug administration strategies to minimize the risk of resistance in clinical practice. With an effort to discover the emergence of resistance factors, Mumenthaler et al. 6 investigated the effect of local microenvironmental factors such as drug concentration, oxygen, and glucose on the evolution of erlotinib-resistant cells by exposing the same cancer cells (either drug-sensitive or drug-resistant) to specified conditions for observing the changes in cell behaviors in terms of growth and response to the therapy. In each microenvironmental factor, a multitype time-inhomogeneous stochastic branching process model was developed to describe the population of cancer cells. By considering the abnormal growth of post-treatment resistant population caused by T790M mutation, Grassberger et al. 7 applied a mathematical Gompertz growth equation combined with a cell loss model to describe the TGTs for a TKI (erlotinib) treatment by using tumor volume data calculated from pretreatment and post-treatment computed tomography (CT) images. In Grassberger's study, the assumption was that the lung cancer tumor contains three types of cell components: TKI drug-sensitive cells, which can be killed by TKIs; TKI drug-resistant cells, which are resistant to TKIs; and drug-tolerant persister cells, which can potentially mutate into resistant cells via a T790M mutation. Accordingly, Grassberger successfully estimated the initial fraction of persister and resistant populations to administer treatment strategies.
The numbers of cancer cells estimated both before and after the targeted TKI therapy were required to predict the TGTs in earlier studies and, therefore, TGTs for new NSCLC patients cannot be predicted prior to the TKI therapy using the abovementioned approaches. Meanwhile, prediction models of tumor responses before the application of TKIs have been increasingly demanded for the timely selection of a more appropriate treatment strategy for each patient.
Therefore, in this study, we developed prediction models of TGTs to estimate the responses of NSCLC tumors based on pretreatment CT images obtained before TKI (i.e., erlotinib, gefitinib, afatinib, and osimertinib) treatment. We hypothesized that quantitative imaging features ascertained from pretreatment CT images can determine the TGT models for individual patients.

Results
The prediction models of TGTs developed in this study is described in Fig. 1, wherein support vector regression (SVR) is used, because the SVR showed the best results for the prediction of TGTs among two machine learning models, that is, the SVR and random forest (RF), in this study.
In this approach, the input information comprised the pretreatment CT images and the name of the TKI. Tumor contours were manually delineated on the pretreatment CT images. Initial       Table S1

Discussion
The proposed model showed a suitable potential for predicting the tendency of TGTs of patients who were treated with TKIs or their combinations based on the pretreatment CT images. Tumor response prediction prior to the initiation of TKI treatment could increase precision of TKI selection as well as selection of timepoints for switching to a different TKI during the treatment in clinical practice.
The pretreatment CT images to calculate the imaging features evaluated in this study were obtained using four scanners made by different manufacturers in Kyushu University Hospital (KUH) as follows: GE Medical (10 treatments, PET/CT), Toshiba (16 treatments), Philips (7 treatments), and Siemens (4 treatments) ( Table 3). In clinical practice at the study center, lung cancer is diagnosed mainly based on conventional CT images that have a higher image quality (resolution and noise) rather than on CT images in PET/CT scanners. Despite the use of different types of CT scanners to obtain the pretreatment images, we predicted TGTs with relatively high correlations. Therefore, the proposed model may have the robustness to predict the TGT whereas overcoming the different imaging qualities of multiple CT scanners.
Earlier studies included patients with the T790M mutation, whereas this study cohort comprised participants with 19del, L858R, and T790M mutations who were treated with targeted TKI therapy under a defined clinical protocol. Other mutations and wild-type lesions were excluded from this study. Nonetheless, the proposed model can be applied to other mutation types, such as ALK, KRAS, BRAF, and so on. Further studies are needed to extend the application of the model to hematological malignancies.
We developed TGT prediction models for four TKIs-erlotinib, gefitinib, afatinib, and osimertinib-that have been specified for NSCLC EGFR-mutation treatments. It is necessary to apply the proposed model for other drugs approved by drug administration offices, such as the Food and Drug Administration, for other mutations in NSCLC patients. We believe that this approach will validate the TGT prediction methods for managing the targeted TKI therapeutic strategies.
This study has two main limitations. First, the number of patients included in this study was small. The total number of patients from both the training and test datasets was 24, which was larger than that in an earlier study 7 (17 patients). The proposed model could predict the tumor growth tendencies during the specific TKI treatments with average Spearman correlation coefficients of 0.822 and 0.818 in training and test datasets, respectively, although average MAPEs of 13.6% (training) and 33.2% (test) were reported. These large errors could be reduced by increasing the number of patients. Second, the primary criterion for selecting the patient cohort was patients with only targeted TKI therapy during the treatment duration. However, the combination of targeted therapy with other treatment modalities, such as radiation therapy or chemotherapy, has been proved to ensure the best treatment outcomes. 8,9 Therefore, we suggest that future studies should develop strategies for prediction of complete tumor responses after combination treatment to provide physicians with more complete and appropriate information for treatment selection. We believe that this is a potential subject for our future research.

Conclusions
The TGT models for predicting the tumor response to specific TKI drugs using pretreatment CT could predict TGTs for non-TKI-treated patients, thereby facilitating the estimation of how lung tumors will respond to specific TKI drugs to enable the selection of optimal treatment strategies.

Clinical data
The study protocol was approved by the Kyushu University Certified Institutional Review Boards for Clinical Trials, and informed consent waiver was granted because this research was a retrospective study. We confirmed that all methods were performed in accordance with the relevant guidelines and regulations of the institutional review boards. The patient information is summarized in Table 1. There were 24 patients with 37 treatments who were selected from 100 NSCLC patients in the KUH patient database based on the study eligibility criteria. We mainly focused on solid NSCLC tumors which had three EGFR mutation subtypes: L858R (8 patients),   Table 3 shows the treatment and its related information in the training and test datasets.
The study cohort was divided into four groups of specific TKI treatments. Each TKI group was then randomly subdivided into two groups, the training dataset (80%) and test dataset (20%), to develop four independent TGT models for the four TKI groups (Supplementary Fig. S1).
Therefore, those patients received 2 or 3 specific TKI treatments, which were the first, second, and third treatments. That is because patient IDs were overlapped in both training (20 patients) and test (7 patients) datasets (Supplement Tables S1-S4).  TGT models Tumor growth could be described by several mathematical models 15 . The first preference was for an exponential model; however, from a biological perspective, the exponential model was unrealistic as it implied the tumor could grow until infinity despite the nutrient limitation in the human body. Therefore, the Gompertz model was introduced, after adding the following upper limit: the tumor-carrying capacity was added to restrict the infinite-growth problem. Based on this idea, many models were subsequently proposed, including the Power Law model, Logistic model, Dynamic CC model, etc. Nevertheless, for evaluating the best TGT model, we used a statistical root mean square error (RMSE), which proved that the Gompertz model had the lowest error values compared to others 16 .
Thus, under the assumption that each tumor could contain three types of cells, drugsensitive (cells that could be killed by TKI drugs), drug-tolerant persister (cells that could mutate into resistant cells), and drug-resistant (cells that resist the effect of TKI drugs) cells, a TGT model ( Fig. 5) was developed from the original Gompertz equation 17 : where ( ) was the total number of tumor cells at a timepoint , was the growth rate parameter, and was the tumor-carrying capacity, which was set to 4.09 × 10 12 cells (constant) in this study 7 .
Each cell population had their own Gompertz equation to describe their growth. In particular, for the drug-sensitive population, the Gompertz equation was combined with a cell loss model to describe the tumor shrinkage during TKI treatment as follows 7 : with where the 1/2 (days) and (μg/mL) were the half-life and maximum concentration in a steady state of TKI drugs, and was the cell loss parameter. The values of the half-life and maximum concentration depend on the specific TKI drugs listed in Table 2. Thus, a drug-sensitive population 7 was defined as: where was the number of drug-sensitive cells, was the growth rate of sensitive cells, was the total number of tumor cells. Similarly, a drug-tolerant persister population 7 was described by the subtraction of persister cells that mutate into resistant cells from persister cells to contribute to tumor growth as follows: where was the number of persister cells, was the growth rate of persister cells, and was compound mutation probability. Simultaneously, the drug-resistant population 7 was expressed by: where was the number of resistant cells and was the growth rate of resistant cells. The right side represents the sum of the resistant cells that contribute to tumor growth and the mutated cells from persister cells.
where was the initial fraction of the persister population and was the initial fraction of the resistant population.
Determination of reference parameters in TGTs model using a Levenberg-Marquardt method In total, seven parameters were employed in this TGT model: three growth rates of sensitive, persister, and resistant populations ( , , and ), the cell loss parameter ( ), the compound mutation probability ( ), and the initial fractions of persister and resistant populations ( , ). We assumed that the persister cells mutated into resistant cells when the TKI treatment was initiated.
The tumor growth models of Eqs. (4) to (6) that include seven parameters were simultaneously optimized using the Levenberg-Marquardt optimization method 18 , which combined the Gauss-Newton algorithm and the gradient algorithm for nonlinear least squares curve-fitting problem ( Supplementary Fig. S2). The optimization constraints were restrictions from 0 to 1 for initial fractions of persister and resistant populations, three cells population growth rates, and the mutation probability.
Seven reference TGT model parameters that were optimized by Levenberg-Marquardt method for each treatment were normalized, with the mean as 0 and standard deviation as ±1, before use as the training and test input of the TGT model: where was the value of the normalized parameters, was the original value of the referenced optimal parameters, and and were the maximum and minimum values, respectively, of the referenced optimal parameters.

Prediction of TGTs determined using a machine learning with imaging features
The pretreatment CT imaging features, which comprised tumor characteristics, were used for training the machine learning program, which determined the TGTs. In total, 486 image features were calculated based on 14 histogram-based and 40 texture-based features (Supplementary Table   S5) for an original image and 8-wavelet-decomposition-filtered images. The texture-based features were calculated from the 4 texture characterization matrices of the gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), and the neighborhood gray-tone difference matrix (NGTDM). The 8-wavelet decomposition filters, that is, LLL, HLL, LHL, HHL, LLH, HLH, LHH, and HHH, contained the combinations of the lowpass filter (L) and high-pass filter (H) in three dimensions 19 .
In each TKI group, the PCA was applied to 486 original image features using the function "sklearn.decomposition.PCA" in Python sklearn package for machine learning 20 in the training and test datasets to select three principal component features. The best combinations of first, firstsecond, and first-second-third principal component features were selected for each TKI by minimizing the RMSE between the predicted and referenced parameters value of TGT models.
Seven parameters of the TGT models were determined by two machine learning approaches, that is, SVR and RF, which learned the relationships between the principal component features and the reference parameters and were optimized by the Levenberg-Marquardt method.
The SVR, which uses a symmetrical loss function that equally penalizes high and low misestimates, is an effective tool in a real-value function estimation 21 . The optimal SVR parameters were obtained along with leave-one-out (LOO) cross validation to minimize the cost function, which was defined as: where was the number of tumor cells predicted by the TGT model, and was the number of tumor cells derived from CT images. Three SVR parameters were optimized by using the following ranges: 10− 5 to 10 (step: 0.00001) for a gamma of the radial basic function (RBF); 1 to 10 (step: 1) for the regularization parameter C; and 10 −3 , 10 −2 , and 10 −1 for an epsilon parameter. The function "sklearn.svm.SVR" 20 and "sklearn.model_selection.GridSearchCV" 22 were used as the SVR model and the optimization tool of its parameters, respectively ( Supplementary Fig. 3).
Furthermore, the RF 23 was employed to compare the prediction outcomes with SVR. All of the abovementioned input data settings were maintained for testing the robustness of each model. The function "sklearn.ensemble.RandomForestRegressor" 20 was used for the RF model with two parameters, which were also optimized using "sklearn.model_selection.GridSearchCV".
Two RF parameters were optimized by using the following ranges: 1 to 100 (step: 1) for n_estimators (number of trees in the forest), and 1 to the maximum number of treatments minus 1 (step: 1) for max_depth (the maximum depth of the tree). The maximum value of max_depth was determined for each TKI group. The maximum values of max_depth were 6, 3, 7, and 7 for the afatinib, erlotinib, gefitinib, and osimertinib groups, respectively.
The RMSE between the predicted and referenced parameters value of TGTs model was calculated to optimize SVR parameters. Finally, the SVR models of four TKI groups were built and they were applied independently to the four test datasets.

Evaluation of proposed approach
The evaluation metrics in this study were a Spearman correlation coefficient and MAPE between the number of tumor cells derived from tumor volumes and the predicted number of tumor cells in the TGT model. The Spearman correlation coefficient evaluated the correlation between reference and predicted TGTs with the interpretation range 24 (Supplementary Table S6). In this study, the Spearman correlation coefficient could assess the tumor growth tendency, such as a transition from CR/PR to SD/PD.