Feasibility Study of Deep Learning Based Radiation Sensitivity Prediction Model Using Gene Expression Profiling

Since radiation sensitivity prediction can be used in various fields, we investigate the feasibility of an in vitro radiation sensitivity prediction model using a deep neural network. Microarray data of the National Cancer Institute-60 tumor cell lines and clonogenic surviving fraction values at an absorbed dose of 2 Gy are used to predict radiation sensitivity. The prediction model is based on a convolutional neural network, and a 6-fold cross-validation approach is applied to validate the model. Of the 174 samples, 170 (97.7%) show a relative error of less than 10% and 4 (2.30%) show a relative error of more than 10%. Through an additional validation, the model accurately predicts 172 out of 174 samples, representing a prediction accuracy of 98.85% under the criteria of an absolute error < 0.01 or a relative error < 10%. These results demonstrate that in vitro radiation sensitivity prediction from gene expression can be carried out with deep learning technology.


Introduction
Quantitative prediction of the response of normal tissue and tumors to radiation is considered necessary in radiation risk assessment, radiation protection, and radiotherapy. In radiation protection, it is generally assumed that all members of the protected population are equally sensitive to adverse health effects related to radiation exposure, which is a limitation of existing radiological protection practices. To address this weakness and account for differences in radiation sensitivity among members of a protected group, an accurate and robust method to evaluate the radiation sensitivity of individuals or subgroups is needed 1 . Likewise, in radiotherapy, accurate prediction of radiation sensitivity is critical for determining patient-specific treatment methods, doses, and fractionation schedules, for anticipating the corresponding clinical outcomes, and for reducing possible side effects of radiotherapy 2,3 .
From this perspective, several researchers have reported that the sensitivity of cancer cells to radiation damage depends on the type, characteristics, and gene expression level of the cancer cells 4 . Simultaneously, advances in gene expression profiling technology have allowed the analysis of a growing variety of genetic factors that influence gene expression in cancer cells 5 . Building on this, several recent studies have shown that in vitro radiation sensitivity can be quantitatively analyzed using gene expression profiling data and have suggested models that can predict radiation sensitivity from gene expression data 3,4,6,7,8,9 . These studies have improved our understanding of the relationship between gene expression and radiation sensitivity. However, further discussion and research are still needed to establish a robust paradigm for predicting radiation sensitivity 4,10 .
Meanwhile, deep learning has recently emerged as a major tool for decision-making, classification, and prediction. A deep learning model updates itself using hidden relationships within large amounts of data, relationships that clearly exist but are hard to represent numerically. Given this characteristic, it seems reasonable to expect that deep learning can improve the performance of prediction models when applied to in vitro radiation sensitivity prediction.
Therefore, in this study, we aimed to investigate the feasibility of a model that predicts in vitro radiation sensitivity from gene expression profiling data, based on several previously established deep learning modalities. Moreover, by comparing the performance of the resulting model with that of models from previous studies, we demonstrate the applicability and potential power of using deep learning algorithms to predict radiation sensitivity from gene expression data.
Fig. 1 shows the overall flowchart of the radiation sensitivity prediction model. A total of 174 samples from 59 NCI-60 cell lines and the corresponding values of the surviving fraction at 2 Gy (SF2) were split into test and training sets by 6-fold cross-validation. For each round of cross-validation, the training sets were fed to the model, and the parameters were trained with a gradient descent algorithm. After training, the test data were fed to the model, and the SF2 value of each test sample was predicted. Evaluation metrics including absolute error, relative error, and prediction accuracy were calculated for the entire test set. Samples that failed to be classified as "correct prediction" were subjected to additional prediction validation. If the error still exceeded the criteria after the prediction validation, these samples were classified as "prediction-hard" cases. These processes were performed over the entire cross-validation test set, and the performance of the model was measured as the prediction accuracy over the entire dataset.
Table 1 shows the average of the predicted radiation sensitivity over five rounds of the 6-fold cross-validation experiment. As shown in Fig. 2, of the 174 replicate samples, 142 (81.61%), 28 (16.09%), and 4 (2.30%) samples fell into groups with relative errors of less than 2%, 2 to 10%, and 10% or more, respectively.
The model correctly predicted 171 of the 174 samples, indicating that the initial prediction accuracy of the model was 98.28%. Three samples (red points in Fig. 3; one each from the cell lines MOLT-4, MDA-MB-435, and HL-60) with abnormally large errors (relative errors of 527.59%, 129.11%, and 72.88%, respectively) were subjected to prediction validation, and the resulting predicted SF2s were 0.302, 0.362, and 0.301 (relative errors of 504.27%, 102.34%, and 4.54%), respectively. Therefore, one sample (from HL-60) was reclassified as a "correct prediction", and only two samples (one each from MOLT-4 and MDA-MB-435), which still produced a relative error larger than 10%, were classified as final "prediction-hard" cases. The overall average, standard deviation, absolute and relative errors, and other detailed information can be found in Table S1.

Performance of model and validation of prediction
As shown in Fig. 3, the SF2 value predicted by our model and the true (measured) SF2 value had a distinct linear correlation, indicating that the model successfully predicted the radiation sensitivity of the cell lines from their gene expression data (95% CI: 0.9834 to 0.9909, Pearson's r = 0.9877).
The average relative error and absolute error of the "correct prediction" samples were 1.351 ± 1.875% and 0.00596 ± 0.00638, respectively (n=172). In contrast, the relative errors of the "prediction-hard" cases were 102.34% (MDA-MB-435) and 504.27% (MOLT-4), and the absolute errors were 0.1832 and 0.2521, respectively. The overall prediction accuracy after the validation of prediction was 98.85% (172 out of 174 were correct), and the RMSD was 0.0252 with prediction-hard cases and 0.00867 without prediction-hard cases.

Discussion
Deep learning is a recently emerging research field that has gained prominence as hardware has advanced. It is widely used for decision-making, prediction, and classification. From this perspective, we demonstrated the feasibility of deep learning as a novel methodology by developing a deep learning-based in vitro radiation sensitivity prediction model from gene expression with an accuracy of 98.85%. This is the first study to attempt to use deep learning in in vitro radiation sensitivity prediction.
In their analyses of model accuracy, Torres-Roca et al. and Zhang et al., who similarly tried to predict the radiation sensitivity of the NCI-60 cancer cell lines, both defined a "correct prediction" as a predicted SF2 within 10% of the true (measured) value 4,9 . With this criterion, they proposed models with an accuracy of 62% (22 out of 35) and 91% (54 out of 59), respectively.
By comparison, in our study, 172 of the 174 samples were correctly classified using the similar but more reasonable criteria described in the "Measurement of model performance" section, representing an accuracy of 98.85%. Moreover, the RMSD of our deep learning model was 0.0251 with the prediction-hard cases and 0.00867 without the prediction-hard cases, compared to 0.2 described by Torres-Roca et al., or 0.011 of Zhang et al. 4,9 . These results indicate that the complex nonlinear biological interactions that influence the radiation sensitivity of a cell are likely to be well represented by a deep learning model.
The three samples with large errors in our study, one each from the cell lines MOLT-4, MDA-MB-435, and HL-60, were subjected to prediction validation because their errors might not be due merely to variance and bias over trials. This was supported by two observations: the fluctuation of these samples, represented by the standard deviation of their predictions across rounds of the experiment, was not significantly different from that of the other samples, and the other samples from the same cell lines showed relatively low errors.
As a result of this prediction validation, the sample from the HL-60 cell line showed a significantly improved prediction error. Hence, the large error of the HL-60 sample in the initial prediction appears to be due to a lack of training data in the training folds of the initial prediction.
For the remaining "prediction-hard" cases, MOLT-4 and MDA-MB-435, we were unable to determine whether there was an insufficient amount of data to predict their SF2s correctly or whether there were other possible problems that could not be investigated in this study, such as mislabeling. Further research is needed to investigate these problems. Notably, even if these "prediction-hard" cases are due to a lack of training samples, the model still predicted these samples as radiosensitive. The model can thus be considered robust to these "prediction-hard" cases, in that it is still able to predict whether the cell is radiosensitive or not, which is fundamentally important.
There are two major limitations of this study. First, deep learning algorithms are generally fed enormous amounts of data to train the model and thereby enable it to provide general decision making, as AlphaGo does 11 . In this study, however, the limited number of cell line samples with survival data available for training may not have fully demonstrated the overall potential of deep learning. Thus, it seems necessary to further boost the performance of this deep learning-based radiation sensitivity prediction model by additional training with a large amount of radiation sensitivity data, not only from NCI-60 cell lines but also from other types of cancer cell lines. Second, the use of classical microarray analysis rather than the latest gene expression profiling methodology, RNA sequencing, may be considered a limitation. Although microarray analysis is a somewhat outdated method and is steadily being replaced by RNA sequencing, we used microarray data to demonstrate the feasibility of deep learning-aided radiation sensitivity prediction through comparison with previous studies. Further research is therefore needed on prediction models that use RNA sequencing data.
Despite these limitations, several improvements in radiation sensitivity prediction analyses are expected from our study. First, since deep learning aims to "let the data speak" without any additional step to extract the features that represent the characteristics of the input data (as is required in existing statistical methods), we can expect the model to learn a direct relationship between the input genes, since high-dimensional data are fed as the input variable 12 . Second, the deep learning model can be trained further on additional data presented after the initial training, which enables it to self-correct and absorb large amounts of data to become more robust 13 .
Third, the literally "deep" structure of the model enables high-level feature learning, which is especially effective when handling complexly combined data such as genetic information. Therefore, the deep learning-based methodology can provide better model performance than conventional shallow statistical machine learning models, which leads to more valid and accurate prediction results.
In deep learning, inputs and outputs are the only information provided. Such models remain "black boxes" with respect to their internal processes even when they provide good results. The deep learning model used in this study is likewise useful for its ability to predict radiation sensitivity with high accuracy, but it cannot provide a scientific explanation for how such predictions are made. Therefore, further research is needed to improve the explainability of deep learning. Advanced research that attempts to understand and explain the inner workings of deep learning may help to identify the biological and medical mechanisms by which organisms react to radiation exposure.
In conclusion, this study successfully demonstrated the feasibility of a deep learning-based in vitro radiation sensitivity prediction model using gene expression profiling data. We established a model consisting of a CNN-based feature extractor and a prediction part with residual blocks, using previously established deep learning methodologies. With additional research and external validation, this model and its methodology can be expanded to in vivo radiation sensitivity prediction.

Radiation response
Since the clonogenic surviving fraction of cells at an absorbed dose of 2 Gy (SF2) is widely used as a measurement of in vitro radiation sensitivity, we also selected SF2 as an indicator of radiation sensitivity in this study. The true (measured) SF2 values used in this study were obtained from previous publications 9, 14 .

National Cancer Institute-60 (NCI-60) cell lines
The NCI-60 panel contains 60 cell lines representing nine types of tumors. It was established by the US National Cancer Institute in the 1980s for in vitro drug screening 6 . This NCI-60 panel is now a valuable research resource, considering the continuous use of this panel for investigations of radiation response analysis 9,15,16,17 . With this perspective, this panel was used as a platform representing multiple cancer cell lines to evaluate the performance of the radiation sensitivity prediction model of this study.

Gene expression profiling data
Gene expression profiling data of NCI-60 cancer cell lines were obtained from the Gene Expression Omnibus (GEO; available at https://www.ncbi.nlm.nih.gov/sites/GDSbrowser; series accession number GSE32474 18 ) database, generated from microarray analysis performed with Affymetrix Human Genome U133 Plus 2.0 chips (54,675 probe sets). The entire transcript/gene set from the Affymetrix array was used to predict radiation sensitivity. Excluding the melanoma cell line MDA-N, which was listed as "not available" in the NCI-60 data, 174 samples (duplicates or triplicates) of the remaining 59 tumor cell lines were used as inputs to the radiation sensitivity prediction model.

Radiation sensitivity prediction modeling
The deep learning-based radiation sensitivity prediction model is based on a convolutional neural network (CNN) architecture. It comprises two distinct components: a feature vector extractor based on convolutional layers and a fully connected (FC) layer.
First, a feature extractor based on a convolutional layer was designed 19 . A convolutional layer is a type of neural network layer that only connects nodes within a certain range, which leads to two distinct advantages: it inherently limits overfitting, and it can be trained with a relatively small amount of data. In this study, high-level feature vectors were extracted from the input gene expression vector using a one-dimensional convolutional layer with pooling and no padding.
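As an illustration of what a one-dimensional convolution without padding followed by pooling does to an expression vector, the following is a minimal pure-Python sketch. It is not the model used in this study: the kernel values, the stride, the choice of max pooling, and the function names `conv1d_valid` and `max_pool1d` are assumptions made only for this example.

```python
def conv1d_valid(signal, kernel, stride=1):
    """1-D convolution (cross-correlation form) with no padding.

    The output has (len(signal) - len(kernel)) // stride + 1 elements,
    because the kernel is only placed where it fits entirely.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def max_pool1d(values, size):
    """Non-overlapping max pooling (assumed pooling type for this sketch)."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


# Toy "expression vector" of 8 probe intensities (made-up numbers).
expression = [0.2, 1.5, -0.3, 0.8, 2.1, -1.0, 0.4, 0.9]
kernel = [0.5, -0.25, 0.5]  # hypothetical learned filter

conv_out = conv1d_valid(expression, kernel)  # 8 - 3 + 1 = 6 values
features = max_pool1d(conv_out, size=2)      # 6 // 2 = 3 values
```

Because no padding is used, each convolution shortens the vector, and pooling shortens it further, so a long probe-level input is progressively condensed into a compact feature vector.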
After the convolutional layers, radiation sensitivity is predicted via the FC layer with a residual skip connection 20 . The skip connection is designed to let calculated gradients propagate across several hidden layers during gradient descent, allowing the deep learning model to be constructed more deeply 20,21 . Each residual connection skips a single layer. The overall structure of the prediction model is presented in Table 2.
For both the convolutional layer-based feature extractor and the residual block-added FC layer, a leaky rectified linear unit activation was applied 22 . L2 regularization and dropout were also used at the end of every convolutional and FC layer during training to prevent overfitting to specific data or features and to let the model learn from all interactions within the entire dataset 19,23,24,25 .
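The idea of a fully connected layer with a skip connection over a single layer and a leaky rectified linear unit activation can be sketched as follows. This is a simplified pure-Python illustration, not the trained model: the weights, the negative slope `alpha=0.01`, and the function names are assumptions, and regularization and dropout are omitted.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: pass positives through, scale negatives by alpha (assumed value)."""
    return x if x > 0 else alpha * x


def dense(x, weights, bias):
    """Fully connected layer: one output per row of the weight matrix."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def residual_fc_block(x, weights, bias, alpha=0.01):
    """FC layer whose activated output is added back to its input.

    The layer must preserve the vector length so that the element-wise
    sum with the skipped input is defined.
    """
    hidden = [leaky_relu(v, alpha) for v in dense(x, weights, bias)]
    return [h + xi for h, xi in zip(hidden, x)]


# With all-zero weights the block reduces to the identity mapping, which is
# why the skip path keeps information (and gradients) flowing through depth.
x = [0.3, -1.2, 0.7]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
out = residual_fc_block(x, zero_w, zero_b)
```

The identity behaviour under zero weights is the property that lets such blocks be stacked deeply without the signal vanishing.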

Training and testing of the prediction model
To train and test the developed model, five rounds of 6-fold cross-validation were applied. The 6-fold cross-validation method divides the entire dataset into six sub-datasets and uses each dataset in turn as a test set with the remaining five datasets used as a training set to test the model. To prevent overfitting to data of a certain tumor cell line, a 6-fold cross-validation was designed such that the data of a particular cell line were not included in the same fold. The final predicted SF2 was determined as the average of five rounds of independent cross-validations to increase the stability and reduce the deviation of the predicted value.
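A fold assignment of this kind could be sketched as below. The sketch reads the constraint literally, placing the replicate samples of one cell line in different folds so that no fold contains two samples of the same cell line; this reading, the random assignment, and the names `assign_folds` and `average_rounds` are assumptions rather than the procedure actually used.

```python
import random
from collections import defaultdict


def assign_folds(cell_lines, n_folds=6, seed=0):
    """Return a fold index per sample; replicates of a cell line never share a fold."""
    rng = random.Random(seed)
    by_line = defaultdict(list)
    for idx, line in enumerate(cell_lines):
        by_line[line].append(idx)

    folds = [None] * len(cell_lines)
    for line, indices in by_line.items():
        if len(indices) > n_folds:
            raise ValueError(f"{line} has more replicates than folds")
        # Draw distinct fold indices for the replicates of this cell line.
        for idx, fold in zip(indices, rng.sample(range(n_folds), len(indices))):
            folds[idx] = fold
    return folds


def average_rounds(predictions_per_round):
    """Average each sample's predicted SF2 over independent rounds."""
    n_rounds = len(predictions_per_round)
    return [sum(vals) / n_rounds for vals in zip(*predictions_per_round)]


# Toy labels: three cell lines with duplicate or triplicate samples.
labels = ["A", "A", "A", "B", "B", "C", "C", "C"]
folds = assign_folds(labels, n_folds=6, seed=1)
```

Changing `seed` for each of the five rounds would give five different partitions whose predictions can then be averaged with `average_rounds`. Note that a purely random draw does not balance fold sizes; a real pipeline would likely add that constraint.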

Measurement of model performance
The performance of the radiation sensitivity prediction model was evaluated based on calculation of the root mean squared deviation (RMSD). RMSD is defined as
$$\mathrm{RMSD} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^2}$$
where $\hat{y}_t$ represents the true (measured) SF2 of sample t, $y_t$ represents the SF2 value predicted by the model, and T represents the number of samples.
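For concreteness, the RMSD over a set of true and predicted SF2 values can be computed as in the following short sketch. The function name `rmsd` and the example values are illustrative and are not taken from the study data.

```python
import math


def rmsd(true_sf2, predicted_sf2):
    """Root mean squared deviation between measured and predicted SF2 values."""
    if len(true_sf2) != len(predicted_sf2) or not true_sf2:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_sf2, predicted_sf2)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for three samples.
example = rmsd([0.50, 0.30, 0.80], [0.52, 0.29, 0.77])
```

Because the deviations are squared before averaging, a few samples with large errors dominate the RMSD, which is why the value is reported both with and without the prediction-hard cases.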
The absolute error and the relative error of the SF2 prediction were defined as the absolute deviation between the true and predicted SF2 and the absolute error divided by the true SF2, respectively.
To evaluate the performance of this prediction model, the "correct prediction" criteria were defined. In previous studies, a correct prediction was defined using only the relative error, following the known variability of the clonogenic cell survival assay 4 . However, this definition tended to be overly strict for samples with a low true SF2. Therefore, we classified a sample as a "correct prediction" if either the absolute error of the sample was less than 0.01 (1% in terms of survival fraction, not considered to be clinically significant) or the relative error between the measured and predicted SF2 was less than 10%.
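The two-part criterion can be written as a small predicate. The sketch below assumes a strictly positive true SF2; the function names are illustrative.

```python
def is_correct_prediction(true_sf2, predicted_sf2, abs_tol=0.01, rel_tol=0.10):
    """True if absolute error < 0.01 OR relative error < 10%.

    Assumes true_sf2 > 0 so that the relative error is defined.
    """
    abs_err = abs(predicted_sf2 - true_sf2)
    rel_err = abs_err / true_sf2
    return abs_err < abs_tol or rel_err < rel_tol


def accuracy_percent(n_correct, n_total):
    """Prediction accuracy as a percentage of all samples."""
    return 100.0 * n_correct / n_total


# A low-SF2 sample: relative error is about 14%, but absolute error is 0.007,
# so it passes the absolute criterion that a relative-only rule would reject.
low_sf2_ok = is_correct_prediction(0.05, 0.057)
```

With the counts reported in this study, `accuracy_percent(171, 174)` and `accuracy_percent(172, 174)` round to the stated 98.28% and 98.85%.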
The model was evaluated and trained with the NVIDIA TITAN RTX and the TensorFlow 1.14.0 framework based on Python version 3.6.8.

Validation of the prediction
If a particular sample cannot be classified as correctly predicted, it is necessary to determine whether the error was caused by insufficient training set data used for sample prediction due to the folded cross-validation, or whether the total data used in the study could not provide sufficient explanation to predict that sample correctly. Therefore, additional experiments were conducted to identify such "prediction-failed" samples, using them as an independent test set and all the other samples as a broader training set.
If the prediction was successful in this additional experiment, it could then be determined that the corresponding fold was not able to provide sufficient evidence to predict the data correctly.
Conversely, if the prediction failed again, the data could be classified as "prediction-hard" cases.
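The decision logic of this validation step can be outlined as below. The sketch assumes that all failed samples are held out together and that the model is retrained once on every remaining sample; `fit_predict` is a hypothetical stand-in for training the network and predicting SF2, and the mean predictor used in the example is only a placeholder, not the CNN.

```python
def validate_failed_samples(samples, targets, failed, fit_predict, is_correct):
    """Re-predict failed samples with all other samples as the training set.

    fit_predict(train_x, train_y, test_x) -> list of predictions (hypothetical API).
    Returns a dict mapping each failed index to its final label.
    """
    failed_set = set(failed)
    train_x = [s for i, s in enumerate(samples) if i not in failed_set]
    train_y = [t for i, t in enumerate(targets) if i not in failed_set]
    test_x = [samples[i] for i in failed]

    predictions = fit_predict(train_x, train_y, test_x)
    return {
        i: "correct prediction" if is_correct(targets[i], p) else "prediction-hard"
        for i, p in zip(failed, predictions)
    }


def mean_predictor(train_x, train_y, test_x):
    """Placeholder model: always predicts the mean training target."""
    mean = sum(train_y) / len(train_y)
    return [mean for _ in test_x]


def within_criteria(true_sf2, pred):
    """Absolute error < 0.01 or relative error < 10%."""
    err = abs(pred - true_sf2)
    return err < 0.01 or err / true_sf2 < 0.10


labels = validate_failed_samples(
    samples=[[0.1], [0.2], [0.3], [0.4]],  # toy feature vectors
    targets=[0.5, 0.5, 0.5, 0.9],          # toy SF2 values
    failed=[2, 3],
    fit_predict=mean_predictor,
    is_correct=within_criteria,
)
```

A sample that becomes correct here points to an uninformative training fold in the original cross-validation, whereas a sample that fails again is labeled "prediction-hard".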

Statistical analysis
Statistical analysis was utilized to evaluate the predictive performance of the model. Two-tailed Pearson correlation analysis with a 95% confidence interval was used to investigate the correlation between true (measured) SF2 and predicted SF2. Statistical analysis was performed using GraphPad Prism version 7.03 (GraphPad Software, San Diego, CA, USA).
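One common way to obtain a confidence interval for a Pearson correlation coefficient is the Fisher z-transformation, sketched below with the Python standard library. Whether GraphPad Prism uses exactly this procedure is an assumption here; the function names are illustrative.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Two-sided confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)                 # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Using the reported r and sample count as inputs.
low, high = fisher_ci(0.9877, 174)
```

With the rounded inputs r = 0.9877 and n = 174, this approach should land close to the interval reported for the correlation between true and predicted SF2.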

Data availability
The