Development and Validation of Radiomics Signatures to Predict KRAS Mutation Status Based on Triphasic Enhanced Computed Tomography in Patients with Colorectal Cancer

Purpose: In this study, we used computed tomography (CT)-based radiomics signatures to predict the mutation status of KRAS in patients with colorectal cancer. Methods: This study involved 447 patients who underwent KRAS mutation testing and preoperative triphasic enhanced CT. They were divided into training (n = 313) and validation (n = 134) cohorts in a 7:3 ratio. Radiomics features were extracted from the CT images. The Boruta algorithm was used to retain the features closely associated with KRAS mutation. Multivariate logistic regression was used to develop radiomics, clinical, and combined clinical-radiomics models for KRAS mutation. Receiver operating characteristic curves were used to evaluate the predictive performance and clinical usefulness of each model. Results: Fourteen radiomics features were retained as the final signatures for predicting KRAS mutations. Delayed phase models showed superior predictive performance compared to arterial phase or venous phase models. The clinical-radiomics fusion model showed good performance, with an AUC, sensitivity, and specificity of 0.772, 0.792, and 0.646 in the training cohort and 0.755, 0.724, and 0.684 in the validation cohort, respectively. Conclusions: The clinical-radiomics fusion model can be used as a potential imaging marker for preoperative detection of KRAS mutation status.
The t-test or Mann-Whitney U test was used to compare continuous variables; the chi-squared test or Fisher's exact test was used to compare categorical variables.
The LoG filter can smooth the image and improve the efficiency of capturing phenotypic features related to tumor heterogeneity 19 . The wavelet filter can decompose the frequency signal of the image to extract edges and substantial features of the tumor more effectively. This study finally screened out 25 radiomics features as the radiomics signatures of the AP, VP, DP, and triphasic enhanced combined phase. The features with wavelet filtering accounted for 52% (13/25) of the total features.
This shows that the wavelet filter is very important for extracting features related to KRAS mutation status, whereas no features from LoG-filtered images were retained as radiomics signatures for predicting KRAS mutations, indicating that the features extracted from the LoG-filtered images were only weakly correlated with KRAS mutations. After multivariable regression analysis combining the radiomics signatures of the triphasic enhanced phases, 11 radiomics features were retained as key features for identifying KRAS mutation status (Table 3), including 5 texture features: A_wavelet.HHH_glszm_GrayLevelNonUniformityNormalized, A_wavelet.LLL_glcm_MCC, D_wavelet.HLL_glcm_Idn, D_wavelet.HLL_gldm_SmallDependenceLowGrayLevelEmphasis, D_wavelet.LLL_glcm_Idn. Texture features are microscopic descriptions of tumors that reflect the interaction between adjacent pixels and thus tumor heterogeneity 20 . These features are not easily identified by the human visual system and cannot be interpreted as having a clear meaning. Previous studies have shown that texture features may be associated with the tumor microenvironment, reflecting tumor heterogeneity and the presence of hypoxia or angiogenesis 21–23 . One study 24 found that KRAS mutations were associated with higher texture feature values (Gskewness and SDs), indicating that tumors with mutated KRAS were more heterogeneous than those with wild-type KRAS. The radiomics score values of texture features (A_wavelet.LLL_glcm_MCC, D_wavelet.HLL_gldm_SmallDependenceLowGrayLevelEmphasis) in the KRAS mutation group ... wild-type.


Introduction
Colorectal cancer (CRC) is the second leading cause of cancer-related deaths worldwide and causes almost 881,000 deaths every year 1 . The incidence of colorectal cancer is approximately 3-fold higher in developed countries than in developing countries. However, as developing countries become richer, an increasing trend is likely to be seen in these countries 2 . The Kirsten rat sarcoma (KRAS) viral oncogene homologue encodes a G protein, and KRAS mutations occur in 40-50% of CRC cases. Following mutation of the KRAS gene, the mutant protein activates the downstream mitogen-activated protein kinase (MAPK) pathway, subsequently leading to uncontrolled cell proliferation and malignancy 3 . The National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines explicitly indicate that CRC patients with KRAS mutations are resistant to anti-EGFR monoclonal antibody therapy 4 . Therefore, KRAS mutation testing is crucial for individualised and effective treatment of CRC.
Pathologic specimens obtained via invasive procedures such as colonoscopy or surgery are usually required for the identification of KRAS mutation status. However, the presence of extensive heterogeneity in CRC archival samples represents a major limitation of the histological approach 5 . Additionally, postoperative tissue specimens might not be obtainable for testing from patients with metastatic CRC 6 . Furthermore, biopsy testing might not be an effective approach to determine the mutational status of KRAS due to poor DNA quality 7 . Therefore, it is necessary to develop a non-invasive and easy-to-use method to identify KRAS mutation status.
Several studies have demonstrated the use of medical imaging technology, such as fluorine-18 fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) and magnetic resonance imaging, in the prediction of KRAS status 8, 9 . However, these studies involved small sample sizes and lacked validation. Radiomics provides a variety of parameters for quantitative analysis, and these parameters have been widely used in cancer diagnosis, classification, and prediction 10 . A previous study demonstrated a significant correlation between a CT-based radiomics signature and KRAS/NRAS/BRAF mutations in CRC patients 11 . However, this study involved a small sample size and was only performed in the venous phase. Moreover, the superiority of the venous phase over the arterial or delayed phase in the prediction of KRAS mutation status in CRC patients remains to be confirmed. The aim of this study was to investigate whether a CT-based radiomics signature could identify KRAS mutation status in CRC patients and whether the venous phase is superior to the arterial and delayed phases for this prediction.

Clinical Characteristics
This study involved a total of 447 CRC patients in the final analysis, including 263 men (58.8%) and 184 women (41.2%), with an average age of 58.93 ± 12.85 years. Among the 447 patients, 207 had mutated KRAS and 240 had wild-type KRAS. We used stratified sampling to divide the study cohort into a training cohort (n = 313) and a validation cohort (n = 134) in a ratio of 7:3. The training cohort and validation cohort were used for model building and internal validation, respectively. Patient and tumor characteristics in the training cohort are listed in Table 1. In the training cohort, age, CEA, CA199, and cT stage differed significantly (P < 0.05) between the mutated KRAS and wild-type KRAS groups, whereas the other characteristics did not (P > 0.05) (Table 1). After multivariate analyses, age, CEA and cT stage were selected as independent predictors of KRAS mutation and entered into the clinical model. The clinical model showed low performance in predicting KRAS mutation in both cohorts, with an AUC of 0.654 (95% CI, 0.593-0.714) in the training cohort and 0.575 (95% CI, 0.478-0.672) in the validation cohort (Table 2). The accuracy, sensitivity, and specificity were 0.617, 0.664, and 0.573 (training cohort) and 0.552, 0.552, and 0.553 (validation cohort), respectively.

Radiomics signature building and discrimination performance assessment
Finally, 4, 3 and 7 radiomics features were selected as the final signatures. The feature names and distributions are listed in Table 3. Following stepwise regression analysis, three features were removed after combining the AP, VP and DP radiomics features. Four models were built based on the above radiomics signatures for preoperatively predicting KRAS mutation. The AUC, accuracy, sensitivity, specificity, PPV, and NPV are listed in Table 2.
The DP model had better predictive performance than the AP or VP model in both the training and validation cohorts (Figure 3A-B, Table 2). In the training cohort, the AUCs for predicting KRAS mutations with the AP, VP and DP models were 0.711, 0.692 and 0.752, respectively. In the validation cohort, the AUCs of the three models were 0.723, 0.673 and 0.746, respectively. The radiomics model combining the triphasic enhanced CT phases showed moderate KRAS mutation prediction performance, with an AUC, accuracy, sensitivity, specificity, PPV and NPV of 0.754, 0.700, 0.738, 0.665, 0.667 and 0.736 in the training cohort and 0.775, 0.701, 0.707, 0.697, 0.641 and 0.757 in the validation cohort, respectively (Figure 3A-B, Table 2).

Predictive Performance of the Combined Model
As shown in Table 2 and Figure ... The DCA curves for the clinical model, radiomics models, and clinical-radiomics model are presented in Figure 5A-B. The combined clinical-radiomics model achieved more clinical utility in predicting KRAS mutation than the clinical model and the other radiomics models. The DCA curves of the clinical-radiomics model demonstrated that when the threshold probability of a patient or doctor ranged between 20% and 65%, use of the clinical-radiomics nomogram added greater benefit for KRAS mutation prediction than the treat-all-patients scheme or the treat-none scheme in both the training and validation cohorts.
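The net benefit plotted on a decision curve can be illustrated with a minimal sketch. It uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names and the toy labels and probabilities below are hypothetical and are not taken from this study's data or code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the treat-all scheme (every patient is classed as positive)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = mutated KRAS, 0 = wild-type KRAS.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.5]

for pt in (0.20, 0.35, 0.50, 0.65):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    # The treat-none scheme always has a net benefit of 0.
    print(f"pt={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}  treat-none=0.000")
```

A model is considered useful over the range of thresholds where its curve lies above both the treat-all and treat-none curves.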

Discussion
In this study, clinical, radiomics, and clinical-radiomics models were developed for the preoperative prediction of KRAS mutations. We found that the DP model outperformed the AP and VP models. Additionally, the clinical-radiomics model showed higher predictive performance than the clinical model or radiomics model alone. The calibration curve and decision curve of the clinical-radiomics model showed good model stability and actual benefit.
KRAS mutations can lead to continuous activation of the EGF/RAS/RAF/ERK signalling pathway without the regulation of EGFR, gradually leading to increased cell proliferation and decreased apoptosis 12–14 . KRAS mutation in colorectal cancer is a negative marker for anti-EGFR targeted drugs 15 .
Numerous studies 16,17 have used 18F-FDG PET/CT to investigate the association between KRAS mutation and 18F-FDG uptake and demonstrated that cells with KRAS mutation had higher 18F-FDG uptake than those with wild-type KRAS. However, no such correlation was observed in a study by Riklis et al. 18 . The major clinical use of PET/CT in CRC is to detect potentially curable metastases. Yang et al. 11 proposed a CT-based radiomics model to identify KRAS/NRAS/BRAF mutation in CRC and found a relatively high predictive performance. However, this study defined the positive group based on mutations in any of KRAS/NRAS/BRAF, which would complicate the clinical application.
During the image preprocessing stage, the LoG filter and wavelet filter were applied to process the original image. The LoG filter can smooth the image and improve the efficiency of capturing phenotypic features related to tumor heterogeneity 19 .
Among the triphasic enhanced phase models of KRAS mutation prediction in the training cohort, the DP model showed the highest performance, with an AUC value of 0.752, followed by 0.711 in the AP model and 0.692 in the VP model. To our knowledge, this is the first time that triphasic enhanced CT radiomics has been used in KRAS mutation prediction. Although the VP is the most commonly used phase in gastrointestinal radiomics research, and contrary to our initial assumption, the enhancement phase with the best predictive performance was the DP rather than the VP. One possible reason for the high predictive performance of the DP model is the high content and uniform distribution of contrast agent in the lesions during the DP 25 . Another reason may be that the ROI range of tumors in the DP images is larger than that in the AP and VP 25 .
In terms of clinical characteristics, age, CEA and CA199 were independent predictors for KRAS mutation. In this study, patients with KRAS mutations were older than patients with wild-type KRAS, and the difference was statistically significant (P < 0.05); this result is consistent with the previous literature 26 . CEA and CA199 were significantly higher in the mutated KRAS group than in the wild-type KRAS group in our study. Our finding is in line with those from previous studies 27,28 . Both KRAS mutation and elevated serum levels of CEA and CA199 are associated with more aggressive biological behaviour in CRC patients 29–31 . A correlation between KRAS mutations and higher CEA and CA199 levels suggests that genetic alterations may have independent influences on CRC development, thus resulting in increased tumor biomarkers 32 .
Triphasic enhanced CT is often performed in the CT examination of gastrointestinal tumors. The AP is used for tumor detection, the VP is used to differentiate the tumor from adjacent organs, and the DP is used to determine the depth of tumor invasion 33 . As for radiation dose, the average DLP value of triphasic enhanced scans was 1917.52 ± 152.31 mGy·cm, which is higher than the diagnostic reference level (DRL) for adults (1490 mGy·cm) published in China's National Health Industry standard (WS/T 637-2018) 34 . Application of new techniques such as multi-model iterative reconstruction technology could effectively reduce the radiation dose in clinical practice 35 .
Our study should be interpreted in light of several limitations. First, 269 patients were excluded because they did not meet the inclusion criteria or met the exclusion criteria, which inevitably produced selection bias. Second, our study was conducted at a single centre with only an internal validation cohort. Reproducibility should be addressed in future multi-centre studies. Third, due to the irregular shape of some tumors, the ROI delineation process is difficult and time-consuming. In future studies, it will be necessary to develop an automated or semi-automated tool to achieve effective tumor segmentation. Finally, in this study, different imaging instruments and acquisition parameters were used for CT scanning, and these differences clearly influence radiomics features. Therefore, it is important to standardise scanning protocols across instruments and institutions.

Conclusion
In conclusion, triphasic enhanced CT radiomics models were constructed to predict KRAS mutation status in colorectal cancer, and the results showed that the AP, VP and DP models could predict KRAS mutation status in both the training and validation cohorts. The DP models showed a higher predictive performance than the AP or VP models. Additionally, the clinical-radiomics model that incorporates both clinical risk factors and radiomics features of DP images showed good performance in predicting KRAS mutations. The clinical-radiomics fusion model can be used as a potential imaging marker for preoperative detection of KRAS mutation status and may guide the selection of molecular targeted drug therapy for CRC.

Patients
Ethical approval for this retrospective study was obtained from the medical ethics committee of Lanzhou University Second Hospital, and the informed consent requirement was waived. All methods were carried out in accordance with relevant guidelines and regulations. For the primary cohort of this study, we analysed the institutional database in our hospital between March 2014 and June 2020 to identify eligible patients with confirmed CRC who underwent curative resection. A total of 447 patients met the inclusion criteria in our study. The inclusion criteria were as follows: (1) pathologically identified primary CRC adenocarcinoma; (2) KRAS mutation status testing prior to treatment; (3) pre-treatment abdominal triphasic enhanced CT with a reconstruction slice thickness of 1.25 mm. The exclusion criteria were as follows: (1) abdominal triphasic enhanced CT was not performed before surgery or the interval between abdominal triphasic enhanced CT and surgery was more than two weeks; (2) any anticancer treatment received prior to the collection of pathological tissue samples; (3) insufficient CT quality for qualitative and quantitative analyses; (4) incomplete clinical information; (5) intussusception in the area where the tumor was located. Figure 1 shows the flow diagram of the recruitment pathway. Patients were divided into a training cohort and a validation cohort in a 7:3 ratio.
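The 7:3 division can be sketched as a label-stratified split. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name, the seed, and the per-class rounding rule are hypothetical, and only the class totals (207 mutated, 240 wild-type) come from the Clinical Characteristics section. The sketch reproduces the reported cohort sizes, but the per-class composition of the actual cohorts may differ.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and validation sets, class by class."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))  # assumed rounding rule
        train_idx.extend(idx[:n_train])
        valid_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# 1 = mutated KRAS (207 patients), 0 = wild-type KRAS (240 patients).
labels = [1] * 207 + [0] * 240
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Splitting within each class keeps the mutation rate similar in both cohorts, which matters when performance is later compared between them.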

Clinical-pathologic characteristics and semantic features
Baseline clinicopathological characteristics included age, sex, tumor location, KRAS mutation status, CEA level (threshold value ≥ 5 ng/mL, < 5 ng/mL), CA125 level (threshold value ≥ 35 U/mL, < 35 U/mL), and CA199 level (threshold value ≥ 37 U/mL, < 37 U/mL). These data were collected from medical records, blindly and independently by two radiologists, to ensure the accuracy of the extracted data. Two experienced gastrointestinal radiologists (Y T C and J Z) analysed the images (including tumor location, maximum diameter, cT stage, and cN stage). Both radiologists were blinded to the patients' clinicopathological data.
The maximum tumor thickness was defined as the maximum diameter perpendicular to the long axis of the cross-sectional image. Clinical tumor stage (cT stage) and clinical node stage (cN stage) were identified according to the 8th edition of the American Joint Committee on Cancer Staging system 36 .

KRAS mutation evaluation
Formalin-fixed tumor tissue samples were obtained following CRC operations, and the specimens used to extract DNA were confirmed to be clearly infiltrated by the tumor. DNA was extracted from formalin-fixed tumor sections. KRAS mutation status (exons 2, 3, and 4) was detected via polymerase chain reaction (PCR).

CT Image Acquisition and segmentation
Abdominal triphasic enhanced CT scans were performed on a Discovery CT 750 HD scanner (GE Healthcare, Waukesha, WI) and an iCT 256 scanner (Philips, Amsterdam, Netherlands). The scanning parameters are listed in supplementary Table S1. Enhanced CT scanning was performed using a high-pressure dual-cylinder syringe to inject intravenous iohexol (1 mL/kg) through the median cubital vein at an injection rate of 3.5-4.5 mL/s. Following the injection of contrast medium, the arterial phase (AP), venous phase (VP), and delayed phase (DP) were scanned at 25-30 s, 60-70 s, and 120-150 s, respectively.
The original images of the AP, VP and DP were stored in the corresponding folders in DICOM format. Two gastrointestinal radiologists (reader 1, Y T C, and reader 2, J Z) performed three-dimensional (3D) radiomics segmentation on the AP, VP, and DP images using ITK-SNAP software (version 3.6.0; www.itksnap.org). Reader 1 segmented 247 cases, and reader 2 segmented the other 200 cases.
For 3D radiomics segmentation, the ROI was manually delineated on each slice of the tumor. Air and faeces in the intestinal tract and pericolonic fat were carefully excluded from the contours (Figure 2). Each patient thus had three ROIs (AP ROI, VP ROI and DP ROI). To evaluate the inter-observer reproducibility and robustness of feature extraction, 30 patients were randomly selected and segmented manually by both reader 1 and reader 2. We estimated the reproducibility of feature extraction using interclass correlation coefficients (ICCs); an ICC value greater than 0.80 was taken to indicate good reproducibility 37 .
Additionally, 30 patients were randomly selected from each CT scanner to build the CT scanners set for calculating the intra-/interclass correlation coefficients (ICCs).
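The reproducibility check can be illustrated with a short sketch that computes an ICC for one feature measured by two readers. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is an assumption here, and the function name and reader values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per patient and one column per reader."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: patients (rows), readers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values of one radiomics feature from two readers on five patients.
reader_1 = [10.1, 12.3, 9.8, 15.2, 11.0]
reader_2 = [10.4, 12.0, 9.9, 15.0, 11.3]
icc = icc_2_1([list(pair) for pair in zip(reader_1, reader_2)])
print(f"ICC = {icc:.3f}; reproducible at the 0.80 cut-off: {icc > 0.80}")
```

In a pipeline of this kind, a feature would be kept for further analysis only if its ICC exceeded the chosen cut-off.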

Feature extraction
Radiomic features were extracted and selected using PyRadiomics software 38 .

Feature selection and radiomics prediction model building
After radiomics feature extraction, all missing data in the training cohort were replaced by the median value and z-score normalisation was performed on each feature; the same preprocessing procedure was applied to the validation cohort. After feature preprocessing, the most important features for predicting KRAS mutation were selected using a three-step procedure. First, univariate analysis was performed, and features with p < 0.05 were retained for the following steps. Second, the Boruta method 39 was used to retain the features closely associated with KRAS mutations. Finally, multivariable stepwise regression was used to eliminate irrelevant features and retain the most informative ones. Ten repetitions of five-fold cross-validation were applied to avoid overfitting and to identify the model with the best performance. A clinical-radiomics fusion model was developed based on correlated clinical risk factors, strongly correlated imaging characteristics, and radiomics features to verify whether the combination of radiomics signatures and clinical factors could improve performance in the prediction of KRAS mutations. Two steps were used to build the fusion model in this study. First, the AP, VP and DP models were compared to determine the enhancement phase with the best KRAS mutation prediction performance. Second, the Random Forest (RF) algorithm was used to combine clinical factors, imaging characteristics, and the radiomics features of the best-performing phase to construct a clinical-radiomics fusion model in the training cohort, and the discriminant ability of the fusion model was evaluated by the AUC value in the validation cohort.
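The preprocessing step (median imputation followed by z-score normalisation) can be sketched as follows. The sketch assumes that the medians, means, and standard deviations are learned on the training cohort and reused unchanged on the validation cohort, which is a common way to avoid information leakage; the text does not specify this detail. The function names, feature name, and values are hypothetical.

```python
import statistics


def fit_preprocessor(train_columns):
    """Learn the median, mean and SD of each feature from the training cohort only."""
    params = {}
    for name, values in train_columns.items():
        observed = [v for v in values if v is not None]
        median = statistics.median(observed)
        filled = [median if v is None else v for v in values]
        mean = statistics.mean(filled)
        sd = statistics.pstdev(filled)  # population SD; sample SD is an equally valid choice
        params[name] = (median, mean, sd if sd > 0 else 1.0)  # guard constant features
    return params


def apply_preprocessor(columns, params):
    """Impute missing values and z-score each feature using the training statistics."""
    result = {}
    for name, values in columns.items():
        median, mean, sd = params[name]
        result[name] = [((median if v is None else v) - mean) / sd for v in values]
    return result


# Hypothetical feature values; None marks a missing value.
train = {"feature_a": [1.0, None, 3.0, 5.0]}
valid = {"feature_a": [None, 7.0]}

params = fit_preprocessor(train)
z_train = apply_preprocessor(train, params)
z_valid = apply_preprocessor(valid, params)
print(z_train["feature_a"])
print(z_valid["feature_a"])
```

With this design, a missing validation value is mapped to the training median, so it lands at the same z-score as an imputed training value.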

Statistical analysis
All statistical analyses were conducted using the R statistical software package (version 3.6.3; http://www.R-project.org). The Student's t-test or Mann-Whitney U test was used to compare continuous variables, and the chi-squared test or Fisher's exact test was used to compare categorical variables, as appropriate. A two-sided P value < 0.05 was considered statistically significant. The intra-/interclass correlation coefficients (ICCs) were used to assess the consistency of measurements between the two radiologists and between the different CT scanners. ROC analysis was used to evaluate the predictive accuracy of the different models. The AUC value and 95% confidence interval (CI), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were also calculated.
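The performance measures listed above can be illustrated with a small sketch. The analyses in this study were run in R; the Python below is only an illustration, with hypothetical function names, toy data, and an assumed cut-off of 0.5. The AUC is computed through its Mann-Whitney interpretation, that is, the probability that a randomly chosen mutated case receives a higher score than a randomly chosen wild-type case.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at a given score cut-off."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy data: 1 = mutated KRAS, 0 = wild-type KRAS.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.5]

auc = auc_mann_whitney(y_true, scores)
metrics = confusion_metrics(y_true, scores)
print(f"AUC = {auc:.3f}")
print(metrics)
```

Unlike the AUC, the other five measures depend on the chosen cut-off, so they should always be reported together with it.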

Declarations
Final approval of the manuscript: All authors

Data Accessibility
Data are available from the corresponding author upon reasonable request.

Conflicts of interest
The authors declare no conflicts of interest.

Figure 1 Flow diagram of the recruitment pathway.

Supplementary Files
SupplementaryMaterials.docx