An Integrated Clinical, Pathological, and Radiomic Approach to Predict Microsatellite Instability in Rectal Carcinoma


Objective: To develop an integrative model combining clinical, pathological, and radiomic characteristics to predict microsatellite instability (MSI) status in rectal carcinoma (RC). Methods: A cohort of 788 patients with RC, 97 with high MSI status (MSI-H) and 691 with microsatellite stable status (MSS), was enrolled. Clinical and pathological characteristics were recorded. Radiomic features were calculated after segmentation of the volumes of interest, and patients were then randomly divided into a training set and a validation set at a ratio of 7:3. Logistic models based on clinical characteristics (LM-Clin), pathological characteristics (LM-Patho), and radiomic features (LM-Radio) were constructed to distinguish MSI-H from MSS, and the corresponding radiomic score was calculated. Finally, an integrative nomogram (LM-Nomo) including the significant clinical characteristics, pathological characteristics, and the radiomic score was developed. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate predictive performance. Results: LM-Clin, which included CEA and hypertension, and LM-Patho, which included gross type and lymph node metastasis ratio (LNR), had AUCs of 0.584 (95% CI, 0.549-0.619) and 0.585 (95% CI, 0.550-0.619), respectively, both lower than that of LM-Radio, which included 12 radiomic features and had an AUC of 0.737 (95% CI, 0.675-0.799). LM-Nomo contained CEA, hypertension, LNR, and the radiomic score, and its AUC was 0.757 (95% CI, 0.726-0.787). Conclusion: The AUCs of LM-Clin and LM-Patho were disappointing and lower than that of LM-Radio. LM-Nomo demonstrated the best performance in predicting MSI-H status.


Background
Colorectal carcinoma (CRC) is one of the most frequently diagnosed cancers and a leading cause of cancer-related mortality [1]. Rectal carcinoma (RC) accounted for approximately 29% of newly diagnosed CRC between 2012 and 2016 and is the most common type of CRC in people younger than 50 years old [2]. The prognosis of CRC depends on the biology and heterogeneity of the tumor [3]. Microsatellite instability (MSI) is an important biomarker of CRC with prominent diagnostic, prognostic, and predictive significance. Tumors in which immunohistochemistry detects loss of one or more mismatch repair (MMR) proteins are considered to be high-MSI (MSI-H) [4], whereas those with intact MMR proteins are expected to be microsatellite stable or low-MSI (MSS or MSI-L). MSI is detected in about 15% of CRC patients and has emerged as a predictor of patient response to adjuvant chemotherapy [5]. MSI tumors, whose clinicopathological characteristics differ distinctly from those of MSS tumors, have been reported to be more prevalent in stage II CRC [6] and to carry a better prognosis [7].
Radiomics, which extracts high-throughput quantitative data from conventional images to improve diagnostic and predictive accuracy [8], is gaining great attention in medical research. Previous studies indicated that computed tomography (CT)-based [9] or magnetic resonance (MR)-based [10] radiomics analysis helped to predict MSI status in CRC. To the best of our knowledge, only three articles have studied MR-based [11,12] and T2WI-based [13] radiomic signatures for predicting the MSI phenotype of RCs, and no CT-based radiomic analysis has been reported in this field. It is therefore meaningful to develop a non-invasive, reproducible CT radiomic approach to evaluate the MSI-H status of RCs. The purpose of this article was to construct and confirm an integrative model with clinical, pathological, and radiomic features to evaluate the MSI-H status of RCs preoperatively on the basis of three-phase CT images.

Materials And Methods
This retrospective study was conducted with the permission of the Medical Ethics Committee (No. 2020QT251) and in conformity with the Declaration of Helsinki. Informed consent was waived for this retrospective study.

Patient selection
A search of the surgical database of our hospital identified 1103 patients with pathologically proven RC between January 2015 and January 2021. The inclusion criteria were as follows: (a) pathologically proven RC, including classical adenocarcinoma, mucinous adenocarcinoma, and signet-ring cell carcinoma; (b) CT examination performed within two weeks before surgery; and (c) tumors located in the rectum or extending from the rectum to the adjacent sigmoid colon. The exclusion criteria were as follows: (a) preoperative therapy, including radiation, chemotherapy, or chemoradiotherapy; (b) metachronous or recurrent cancer; (c) lesions located in the ascending, descending, or sigmoid colon (cancers originating from the rectosigmoid junction were regarded as upper rectal carcinoma); and (d) no MSI evaluation.
Finally, a total of 788 patients (97 MSI-H and 691 MSS) were retrospectively included in this analysis.

Clinical and pathological characteristics of RC patients
The baseline clinical variables for analysis included age, gender, body mass index (BMI), CT-displayed long diameter, tumor location (low RC: lesion within 5 cm of the anal margin; middle RC: lesion 5-10 cm from the anal margin; high RC: lesion more than 10 cm from the anal margin), carcinoembryonic antigen (CEA) with a threshold of 5.0 µg/L, carbohydrate antigen 19-9 (CA19-9) with a threshold of 37.0 U/mL, and history of smoking, drinking, diabetes, and hypertension. In addition, tumors originating from the rectosigmoid junction at a distance of more than 10 cm from the anal margin were classified as high RC.
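The location categories and serum marker thresholds above can be expressed as simple rules. The following is a minimal sketch only: the function names are hypothetical, and the handling of values lying exactly on a boundary (5 cm, 10 cm, or a marker equal to its threshold) is an assumption, since the text does not specify it.

```python
CEA_THRESHOLD = 5.0     # µg/L, as stated in the text
CA199_THRESHOLD = 37.0  # U/mL, as stated in the text


def classify_location(distance_cm: float) -> str:
    """Map the distance from the anal margin (cm) to low / middle / high RC."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    if distance_cm <= 5.0:   # assumption: exactly 5 cm counts as low RC
        return "low"
    if distance_cm <= 10.0:  # assumption: exactly 10 cm counts as middle RC
        return "middle"
    return "high"


def is_elevated(value: float, threshold: float) -> bool:
    """True when a serum marker exceeds its threshold (assumed strict)."""
    return value > threshold
```

For example, `classify_location(7.5)` returns `"middle"`, and `is_elevated(6.2, CEA_THRESHOLD)` returns `True`.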

Evaluation of MSI status
Immunohistochemistry was used to test the MMR proteins MLH1, MSH2, MSH6, and PMS2. Tumors lacking one or more MMR proteins were collectively classified as defective mismatch repair (dMMR) and expected to be MSI-H, while those with intact MMR proteins were considered proficient mismatch repair (pMMR) and expected to be MSS or MSI-L. According to the revised Bethesda guideline for MSI, MSI-L CRCs are categorized as MSS tumors for clinical purposes [14]. Therefore, our study divided all the RC patients into two groups based on the MMR proteins: the MSI-H cohort and the MSS cohort.
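The grouping rule described here (loss of any one of the four MMR proteins leads to the MSI-H cohort, otherwise the MSS cohort) can be sketched as follows. The function name and the input format, a mapping from protein name to whether expression is retained, are illustrative assumptions.

```python
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def msi_group(ihc: dict) -> str:
    """Assign a study group from immunohistochemistry results.

    `ihc` maps each MMR protein name to True (expression retained)
    or False (expression lost). This input format is hypothetical.
    """
    missing = [p for p in MMR_PROTEINS if p not in ihc]
    if missing:
        raise ValueError(f"missing IHC result for: {missing}")
    lost = [p for p in MMR_PROTEINS if not ihc[p]]
    # dMMR (any protein lost) is treated as MSI-H; pMMR as MSS (MSI-L merged into MSS)
    return "MSI-H" if lost else "MSS"
```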

CT examination
All 788 RC patients underwent three-phase examinations on a 64/128-slice CT scanner (Siemens, Somatom Definition AS). The three-phase (unenhanced, arterial, and venous) CT examination was performed using computer-aided bolus tracking with 1.3 mL/kg of contrast medium (iomeperol 350, GE Healthcare) injected at 3.0 mL/s via a high-pressure injector. The arterial phase was scanned 35 s after injection of iomeperol, and the venous phase followed 25 s after the arterial phase. The specific parameters were as follows: tube voltage, 120 kV; tube current, 200 mA; field of view, 360 mm; collimation, 64 × 0.625 mm; rotation time, 0.75 s; slice thickness and interval, 5 mm; window width, 300 HU; and window level, 40 HU.
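As a worked illustration of the injection arithmetic, the sketch below computes the contrast volume, injection duration, and phase start times for a given body weight. The function name is hypothetical, and measuring both delays from the start of injection is an assumption.

```python
DOSE_ML_PER_KG = 1.3          # contrast dose, from the text
RATE_ML_PER_S = 3.0           # injection rate, from the text
ARTERIAL_DELAY_S = 35         # arterial phase delay after injection, from the text
VENOUS_AFTER_ARTERIAL_S = 25  # venous phase delay after the arterial phase, from the text


def contrast_plan(weight_kg: float) -> dict:
    """Return contrast volume (mL), injection time (s), and phase start times (s)."""
    volume = DOSE_ML_PER_KG * weight_kg
    return {
        "volume_ml": volume,
        "injection_s": volume / RATE_ML_PER_S,
        "arterial_start_s": ARTERIAL_DELAY_S,
        # assumption: venous delay is counted from the arterial start time
        "venous_start_s": ARTERIAL_DELAY_S + VENOUS_AFTER_ARTERIAL_S,
    }
```

Under these assumptions, a 60 kg patient would receive about 78 mL of contrast medium over about 26 s, with the venous phase starting 60 s after injection.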

Tumor segmentation and radiomics features selection
All three-phase CT images were retrieved from our picture archiving and communication system in DICOM format. After standardization of the original images in "A.K. 3.0.0" (Artificial Intelligence Kit, GE Healthcare), the volumes of interest (VOIs) were manually segmented in "itk-SNAP 3.4.0" (http://www.itksnap.org/) by two radiologists with 7 and 10 years of experience, respectively (Figure 1a,b). Regions of necrosis, intraluminal air, non-invaded rectal wall, vessels, and peri-rectal fat were excluded from the VOI contours.
The radiomic features of the tumors were then automatically calculated by the A.K. software. The intraclass correlation coefficients (ICCs) of the radiomic features obtained by the two radiologists were calculated; all ICCs were greater than 0.75, which was interpreted as good agreement between observers [15]. Therefore, the mean values of the radiomic features from the two radiologists were used for subsequent analysis. Because the sizes of the two groups were not balanced, the synthetic minority over-sampling technique (SMOTE), a straightforward approach for adjusting the ratio between unbalanced groups [16], was used to balance them. The cohort (97 MSI-H and 691 MSS) was randomly partitioned into a training set (68 MSI-H and 484 MSS) and a validation set (29 MSI-H and 207 MSS) at a ratio of 7:3. Thereafter, analysis of variance or the Mann-Whitney U test (ANOVA or MW), correlation analysis, and the least absolute shrinkage and selection operator (LASSO) were performed to select the optimal radiomic features. A 10-fold cross-validation approach was used in both the training and validation cohorts to construct the model with the best performance. The details of tumor segmentation and radiomic feature selection are described in the Supplementary Material.
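The partition sizes reported above are consistent with a class-wise (stratified) 7:3 split, and SMOTE generates synthetic minority samples by interpolating between neighbouring minority cases. The sketch below illustrates both ideas in plain Python. It is not the authors' R implementation; the function names, the seed, and the choice of k are assumptions, and it makes no claim about the order in which the two steps were applied in the study.

```python
import math
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class separately."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_frac)
        train += idx[:n_train]
        valid += idx[n_train:]
    return train, valid


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority feature vectors.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic
```

With 97 MSI-H and 691 MSS labels, `stratified_split` yields 68 and 484 training cases and 29 and 207 validation cases, matching the counts reported in the text.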

Clinical, pathological, and radiomics models construction
After radiomic feature selection, the corresponding logistic model (LM-Radio) was constructed from the selected radiomic features, and the radiomic score was obtained. The clinical and pathological characteristics were first analyzed by independent t-test or chi-square test. Then the clinical logistic model (LM-Clin) and the pathological logistic model (LM-Patho) were developed from the corresponding significant variables. The areas under the receiver operating characteristic (ROC) curves (AUCs), evaluated with the DeLong test, were used to assess the performance of all logistic models. Finally, an integrative clinical-pathological-radiomic nomogram (LM-Nomo) with the radiomic score, significant clinical characteristics, and pathological parameters was constructed to evaluate the MSI-H status.
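To make the quantities in this subsection concrete, the sketch below shows how a radiomic score is commonly formed as a coefficient-weighted sum of the selected features, how a logistic model converts a linear score into a probability, and how an AUC can be computed from its rank-based (Mann-Whitney) definition. The function names and the feature names used in the example are hypothetical, the coefficients are not those of LM-Radio, and the DeLong confidence interval is not reproduced here.

```python
import math


def radiomic_score(features, coefficients, intercept=0.0):
    """Weighted sum of the selected features (dicts keyed by feature name)."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


def predicted_probability(score):
    """Logistic link: map a linear score to a probability of MSI-H."""
    return 1.0 / (1.0 + math.exp(-score))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as one half. Labels are 1 (MSI-H) or 0 (MSS).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, `auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])` gives 0.75, because three of the four positive-negative pairs are ordered correctly.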

Statistical analysis
Radiomic feature selection, including ANOVA or MW, correlation analysis, and LASSO, was performed in R software (https://www.r-project.org/). The clinical and pathological characteristics were analyzed in SPSS software (https://spss-64bits.en.softonic.com/) using the independent t-test or chi-square test. ICCs were used to assess the consistency of VOI segmentation between the two radiologists. The logistic models LM-Clin, LM-Patho, and LM-Nomo were developed in R software using the enter method. The DeLong test was carried out in MedCalc software (https://www.medcalc.org/), and the corresponding AUCs and 95% confidence intervals (CIs) were recorded. A Hosmer-Lemeshow test was used to evaluate the goodness-of-fit and accuracy of the model. A two-tailed p value < 0.05 indicated a statistically significant difference.
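For readers unfamiliar with the Hosmer-Lemeshow test, the following minimal sketch computes the usual statistic over equal-size risk groups together with its chi-square p value. It is an illustration under stated assumptions rather than the procedure used in the study: the function name, the grouping into 10 groups, and the tie handling are assumptions, the predicted probabilities must lie strictly between 0 and 1, and the closed-form p value is valid only for an even number of groups.

```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, p_value) for predicted probabilities and 0/1 outcomes.

    Cases are sorted by predicted probability and cut into `groups`
    near-equal parts; df = groups - 2. The number of cases must be at
    least `groups`.
    """
    if groups < 4 or groups % 2:
        raise ValueError("groups must be an even number >= 4")
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Pearson contributions of events and non-events in this group
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    # Chi-square upper tail for even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half, k = stat / 2.0, (groups - 2) // 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, p_value
```

A large p value (for example, above 0.05) indicates no detectable disagreement between predicted and observed event rates.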

Baseline clinical and pathological characteristics
The baseline clinical and pathological characteristics are outlined in Table 1.

Performance of the clinical and pathological model
The LM-Clin and LM-Patho with significant clinical and pathological characteristics were constructed, respectively. The detailed parameters of models were reported in  The

Discussion
Unlike MSS CRCs, MSI-H CRCs have been shown to be associated with abundant lymphocyte infiltration, poor differentiation, longer postoperative survival, predominant occurrence in the proximal colon [17], and a mucinous or signet-ring cell component [18]. They may have a mildly better prognosis and could not benefit from 5-FU-based chemotherapy compared with patients with MSS [19].  [20] suggested that the radiomics signature of triphasic enhanced CT was a reliable method to predict MSI in CRCs, and the clinical-radiomics nomogram including age, location, CEA, and radiomics showed promising prediction. Our integrative clinical-pathological-radiomic nomogram including CEA, hypertension, LNR, and radiomic score was the most meaningful model in predicting the MSI-H phenotype of RCs, with an AUC of 0.757 (95% CI, 0.726-0.787), higher than those of LM-Clin, LM-Patho, and LM-Radio. The p values of the Hosmer-Lemeshow tests of all models were non-significant, indicating good fit of the models.
Despite some inspiring strengths, there were several limitations. First, this retrospective analysis was subject to several biases, including a single-center design, unbalanced sample sizes, and limited generalizability. Thus, a future multi-center study is necessary to validate and improve the performance of the predictive nomogram. Second, we only evaluated tumoral radiomics to predict the MSI-H phenotype of RCs, and peri-tumoral radiomics was neglected, although it may provide additional information to better predict the MSI-H status. Third, owing to the irregular shape of RCs, variation in manual segmentation may affect the radiomic analysis, although ICCs were calculated to assess the inter-observer difference. An automatic approach to segmenting RCs for radiomic analysis needs to be further explored.

Conclusion
In conclusion, an integrative clinical-pathological-radiomic nomogram including a history of hypertension, CEA, LNR, and radiomic score

Figure 1
The VOIs were manually segmented in "itk-SNAP". Figure 1a shows the VOI segmentation in the axial image. Figure 1b shows the VOI segmentation in the sagittal image.

Figure 2
In the radiomic analysis, 12 radiomic features were selected by the LASSO method.

Figure 3
The coefficients of the 12 optimal radiomic features in LM-Radio are listed.

Figure 4
The integrative clinical-pathological-radiomic nomogram including variables of CEA, hypertension, LNR, and radiomic score was developed.