Development and Validation of the Pyroptosis-related genes Signature for Predicting the Prognosis of Colorectal Cancer

Background: Pyroptosis is an important component of the tumor microenvironment, associated with the occurrence and progression of cancer. However, the expression of pyroptosis-related genes and its impact on the prognosis of colon cancer (CC) remains unclear. Here, we constructed and validated a pyroptosis-related genes signature to predict the prognosis of patients with CC. Methods: Public data source was obtained to screen out candidate genes for further analysis. Various methods were combined to construct a robust pyroptosis-related genes signature for predicting the prognosis of patients with CC. Based on the gene signature and clinical features, a decision tree and nomogram were developed to improve risk stratication and quantify risk assessment for individual patients. Results: The pyroptosis-related genes signature successfully discriminated CC patients with high-risk in the training cohorts. The prognostic value of this signature was further conrmed in independent validation cohort. Multivariable Cox regression and stratied survival analysis revealed this signature was an independent prognostic factor for CC patients. The decision tree identied risk subgroups powerfully, and the nomogram incorporating the gene signature and clinical risk factors performed well in the calibration plots. Conclusions: Pyroptosis-related genes signature was an independent prognostic factor, and can be used to predict the prognosis of patients with CC.

Pyroptosis is a form of cell death, leading to the cleavage of gasdermin D (GSDMD) and activation of inactive cytokines like IL-18 and IL-1β [11]. Existing evidence suggests that pyroptosis plays an important role in the development of cancer cells. Kolb R summarized the role of various in ammasome factors in cancer progression and therapy. It indicated that key components of pyroptosis, such as gasdermin (GSDM) proteins and proin ammatory cytokines, are associated with tumorigenesis and invasion, as well as metastasis [12]. For example, Wei J evaluated the impact of the GSDM on the occurrence and prognosis of lung adenocarcinoma (LUAD). The results indicated that GSDMC is signi cantly upregulated in LUAD tissues and the overexpression of GSDMC was an independently negative prognostic factor in patients with LUAD [13]. Meanwhile, Tulotta C et al found that IL-1β produced by breast cancer cells could promote epithelial-to-mesenchymal transition (EMT), invasion and migration. IL-1β also caused expansion of the bone metastatic niche, which led to tumor proliferation [14]. As for CC, by knocking out the in ammatory vesicle-related genes (NLRP3 and CASP1), Dupaul-Chicoine et al found that transgenic mice were more likely to develop CCs than its counterparts [15].
Based on these ndings, it is increasingly clear that pyroptosis may play an important role in the development of cancer, as well as the disease progression. However, its functional impact in CC represented a critical knowledge gap. Thus, we performed a systematic study to explore the pyroptosisrelated genes for CC and the prognostic value of gene signature was validated in independent cohorts.
Also, an integrated model, which was based on the gene signature and clinical features, was developed to improve the predictive power and accuracy.

Dataset
The methods have been well-established in previous studies [16,17]. Brie y, a total of 1086 patients with CC were included in our study across different platforms. The data of gene expression was downloaded from gene expression omnibus (GEO) database. The microarray datasets GSE14333, GSE226, GSE17536, GSE177, GSE41258and GSE250 were integrated into a new train cohort, using the sva package (COMBAT) for removing batch effects. All raw CEL les from the three datasets were downloaded and normalized using a robust multichip average (RMA) algorithm. Meanwhile, the RNA-seq data of 432 CC patients with clinical information was downloaded from The Cancer Genome Atlas (TCGA) and transcripts per million (TPM) normalized. All microarray and RNA-seq data included in our study were log2 transformed.
Development and validation of the pyroptosis-related genes prognostic modelThirty-three pyroptosisrelated genes were extracted from prior reviews and presented in Table S1 [18 -22]. Based on these genes, ssGSEA were used to construct the pyroptosis-related risk score (PRS). Speci cally, univariate Cox proportional-hazards (Cox-PH) regression model was performed to evaluate the signi cance of different cancer hallmarks in CC, which was based on the R package 'survival'. With the package of 'wgcna' (weighted gene co-expression network analysis), the scale-free co-expression network was performed to identify the module which was most correlated with pyroptosis based on transcriptome pro ling data and ssGSEA scores. The association of individual gene with pyroptosis ssGSEA score was quanti ed by Gene signi cance (GS), while the correlation between module eigengenes and gene expression pro les was represented by module membership (MM). Subsequently, a least absolute shrinkage and selection operator (LASSO) Cox regression model was utilized to narrow down the candidate genes, screening the most robust prognostic markers. The PRS was established as follows: The Z-score method was used to normalize ssGSEA scores and HRS when necessary. CC patients from GEO and TCGA were divided into low-and high-risk according to the risk score. Kaplan-Meier analysis with a two-sided log-rank test was employed to compare the OS between the two groups. With the R package 'survival ROC', time-dependent receiver operating characteristic (tROC) analysis was performed, and the areas under the curve at different time points [AUC(t)] of all the variables were compared. To evaluate the prognostic value in the pooled cohort, meta-analysis (I 2 > 30%, random-effects model) was conducted. And non-negative matrix factorization (NMF) consensus clustering was used to divide one cohort without a full-scale gene signature expression pattern into different clusters according to the best k value with the R package 'nmf'.

Independent Prognostic Analysis Of Prs
Clinical information of CC patients was extracted from the GEO and TCGA (Table S2), then analyzed in the regression model, combined with PRS. Univariate and multivariable Cox regression models were employed for the analysis. A decision tree for risk strati cation with the R package 'rpart' was constructed, using recursive partitioning analysis (RPA). A nomogram and a calibration curve were plotted using the R package 'rms'.

Results
Pyroptosis is the risk factor for overall survival in CC patients The correlation heatmap containing the candidate pyroptosis-related genes is presented in Fig. 1A (red: positive correlations; blue: negative correlations). Survival-related genes were screened out by univariate Cox regression analysis. With the criteria of P<0.05, 11 genes (ZBP1, SCAF11, PRKACA, NOD2, GZMA, GSDMD, CASP8, CASP5, CASP3, CASP1 and APIP). Among them, only PRKACA was associated with increased risk, as HRs >1, while the others were protective genes with HRs <1 (Fig. 1B).
Based on the Z-scores of the pyroptosis ssGSEA score, 654 CC patients were divided into low-and highrisk subgroups equally. Patients with low-risk scores had a survival advantage over patients with high-risk scores (Fig. 1C). And Figure 1D shows that Z-scores of the pyroptosis ssGSEA were signi cantly elevated in dead patients compared with living patients during follow up. Statistical difference in overall survival (OS) was observed between two subgroups (P < 0.0001, Fig. 1E). As tROC demonstrated, that the area under the ROC curve (AUC) was 0.70 for 96-month, 0.68 for 60-month, 0.68 for 36-month and 0.65 for 24month survival (Fig. 1F).

Establishment Of Pyroptosis-related Genes Signature For Prognosis
Using whole-transcriptome pro ling data and pyroptosis ssGSEA Z-scores in the training set, WGCNA was developed. Since a power of β = 5 was the optimal threshold to ensure a scale free co-expression network (supply_Fig S1), a total of 26 non-grey modules were generated ( Fig. 2A). The module which was considered the most correlated with pyroptosis was represented by lightcran (r = 0.39, p = 3e-25) (Fig. 2B, C). Hub genes extracted from the lightcran module were used for further univariate Cox regression analysis, as a threshold of p value for GS set as <0.0001. With a threshold of p value for uni-Cox of <0.05, 67 potential candidates were identi ed (Fig. 2D). Among of them, 45 were protective markers, while 22 were risk markers. Furthermore, the most robust markers for prognosis were identi ed by the LASSO Cox regression model. To address the over-tting, ten-fold cross-validation was applied, with selected optimal λ value of 0.0249 ( Fig Fig. 2G. Finally, the PRS formula was developed as follows: .The expression level of each gene was log2 normalized.

PRS is an accurate predictor for overall survival of CC patients
The prognostic value of PRS was validated in the training set. Using the 33-gene set of pyroptosis reported in previous studies, GSEA con rmed the difference of gene set enrichment in the two subgroups, since risk markers positively correlated with pyroptosis accumulated more in the high-PRS group, compared with the low-PRS group (Fig. 3A). Furthermore, the follow-up data indicated that the risk score signi cantly decreased in patients alive (Fig. 3B). As Kaplan-Meier survival curve indicated, patients with lower PRS enjoyed a statistically signi cant survival bene t than its competitor (p < 0.0001, Fig. 3C). Since various clinicopathological factors were included in multivariate Cox regression model of the training cohort, the results revealed that TNM stage (HR = 1.8732, p < 0.1) and PRS (HR = 4.302, p < 0.1) were two independent risk factors for OS (Fig. 3D). In addition, tROC analysis suggested that PRS was an accurate variable for predicting the survival of CC patients (Fig. 3E).
The robustness of predictive value of the pyroptosis-related genes signature was validated in other independent external cohorts. Brie y, by using TCGA, higher enrichment score of pyroptosis gene set was con rmed in the high-PRS group in the validation I cohort (Fig. 4A). Also, patients alive had lower PRS, compared with deceased patients (p < 0.0001, Fig. 4B). Kaplan-Meier analysis con rmed the survival bene t in patients with lower PRS in the validation cohort (p<0.0001, Fig. 4C). Furthermore, NMF consensus clustering was used to divide both TCGA and GEO cohort into two groups (Fig. 4D, E). The division was based on the best k value, which was the remaining expression pattern of the gene signature. The results revealed statistical difference in OS between NMF-derived groups (Fig. 4F, G). Moreover, multivariate Cox regression analysis indicated that not only in the training and validation cohort, PRS was an independent risk factor for OS (Fig. 4H), but also in the all cohorts (Fig. 4I).
PRS acts as an indicator of worse prognosis in the pooled cohort and a promising marker of therapeutic resistance.
To evaluate the prognostic value of the pyroptosis-related genes signature in the pooled cohort including the training and validation cohorts, meta-analysis was performed. The results suggested that CC patients with higher PRS had worse prognosis than those with lower PRS (pooled Hazard Ration (HR) = 2.63, 95% CI 2.07-3.35) (Fig. 5A, B). Subsequently, 1096 patients were extracted for further investigation. PRS Zscores were signi cantly elevated in those patients who died during the follow-up, especially in patients with shorter survival (Fig. 5C). When the candidate patients were divided into different subgroups, according to virous clinicopathological features (age, sex and stage), PRS also helped to screen out highrisk patients with poor prognosis (Fig. 5D).
Since limited evidence can be reached, we tried to explore whether the pyroptosis-related genes signature is a marker of chemotherapy resistance, as well as recurrence. As shown in Fig. 6A, GSEA con rmed that higher PRS is not only signi cantly associated with resistance to chemotherapy drugs, including cisplatin and uorouracil, but also with disease recurrence. Further functional assays indicated that PRS is negatively associated with virous cancer therapeutic pathways, which have been validated in previous study [23] (Fig. 6B). However, PRS is related to immunosuppressive cells, such as Treg. In contrast, it is negatively associated with T cell (Fig. 6C). Furthermore, by using GSCALite, a landscape plot was generated to depict the relationships between the response to targeted drugs and pyroptosis-related genes signature expression (Fig. 6C). The results presented as the bubble heatmap were consistent with that of LASSO coe cients (Fig. 2C). Specially, CCDC88A was associated with multi-drug resistance, while RAB38, CYSLTR1exhibited drug sensitivity. Medical information from TCGA were used to validate the prediction. As Fig. 6D shown, although statistical difference was not reached, the disease control rate (DCR) was much higher in patients with low-PRS, compared with its competitor (88 vs 83%, p=0.092). Moreover, low-PRS is a prognostic marker of a more favorable outcome among CC patients who received anti-cancer drugs (p = 0.032) (Fig. 6E) or surgery (p = 0.026) (Fig. 6F). When patients were strati ed by the location of tumor, low-PRS group still had a survival advantage over high-PRS group (p<0.0001) (Fig. 6G).
Combination of the pyroptosis signature and clinicopathological features improves risk strati cation and survival prediction A decision tree improving risk strati cation for OS was constructed, as 1096 patients with four parameters available, age (>70 or ⩽70), sex (male or female), TNM stage (<IV or >=IV) and PRS (low or high) were included. The results indicated that only TNM stage and PRS remained, as three different risk subgroups was identi ed (Fig. 7A). Furthermore, in the node of stage <IV, PRS replaced TNM stage. Kaplan-Meier analysis indicated that statistical difference of OS was reached among the three risk subgroups (p<0.0001) (Fig. 7B).
A nomogram with PRS combined with other clinicopathological features was developed, aiming to quantify the risk assessment and survival probability for individual CC patients (Fig. 7C). In the calibration analysis, all the prediction lines (red, blue and purple line represented 8-year, 5-year, 3-year survival) of the nomogram were close to the ideal performance (45-degree dotted line) (Fig. 7D), validating the accuracy of the nomogram. As shown in Fig. 7E, tROC analysis indicated that nomogram had the most powerful predictive ability, as the clinical bene t of the nomogram has been demonstrated in decision curve analysis (DCA) (Fig. 7F).

Discussion
In current study, using public data source, we rstly identi ed pyroptosis is a risk factor for OS in CC patients. Then combined methods were used to construct a pyroptosis-related genes prognostic model. The prognostic value of PRS, derived from the gene signature, was validated in independent cohorts. Further analysis suggested that PRS could be served as an independent risk factor to identify patient population with high-risk. Functional analyses indicated that PRS is negatively associated with virous pathways, such as cell cycle and p53 signaling pathway, but related to immunosuppressive cells, such as Treg.
Pyroptosis, an in ammatory type of regulated cell death, is characterized by cell swelling, lysis, and the release of many proin ammatory factors, such as IL-1β and IL-18 [20]. In the past decade, the relationship between pyroptosis and cancer have attracted widespread attention. Existing evidence suggested that pyroptosis may impact the proliferation, invasion, and metastasis of tumor, making it a promising therapeutic target [21,22,24]. Also, some individual pyroptosis genes have been studied in various cancer types, such as NOD2 in colorectal cancer [25,26], Gasdermin D (GSDMD) in gastric cancer [27], and gasdermin B (GSDMB) in digestive system [28]. However, current studies are clearly inadequate.
To date, how pyroptosis-related genes interact and whether they are related to the survival of patients with cancer remain little known. Only a study was powered by Ye Y et al, aiming to address this question in patients with ovarian cancer. Using data from TCGA cohort, a multigene signature was constructed to evaluate the prognostic value of pyroptosis-related genes in this patient population and was further validated in GEO cohort [29]. The results con rmed that prognostic value of pyroptosis-related genes in patients with ovarian cancer. But the in uence of these gene signatures may vary for different types of cancer. Thus, it is worthy to establish a pyroptosis-related risk score for patients with CC, providing evidence of prognostic value.
Unlike 7-gene risk signature constructed by Ye Y et al [29], we established a 27-gene signature, which was derived from LASSO Cox regression model. Among those genes, few of them showed evident correlations with cancer or pyroptosis in previous studies. The protective value of ACOT11, ST3GAL5, and SERPINB9 has been validated in renal cell carcinoma, bladder cancer and colorectal cancer [30][31][32], while MYO5A, RAB38, and CYSLTR1 were involved in the progression of various cancer type, such as gastric cancer, pancreatic cancer, and non-small cell lung cancer [33][34][35][36]. As for SOCS1, a biomarker with the largest protective coe cient in our study, is widely recognized as a tumor suppressor. However, its role in CRC remains controversial. Hanada T et al found that SOCS1 is a tumor suppressor which could prevents chronic in ammation-mediated carcinogenesis by regulation of the IFN gamma/STAT1 pathways [37]. Inconsistent with this nding, study conducted by Tobelaim WS et al indicated that SOCS1 may work as an oncogene in CRC [38]. In summary, the biological functions associated with tumor pyroptosis of the novel gene signature still require further investigation in CC.
After the selection of pyroptosis-related genes signature, the risk score, decision tree and the nomogram were constructed. The methods were well-established in previous studies. Aiming to develop a hypoxiabased method to identify patients with high-risk in early-stage lung adenocarcinoma (LUAD), Sun J et al used combined methods to screen for robust biomarkers and establish a hypoxia-related gene signature for prognosis. The prognostic value of the gene signature was further validated by the decision tree and the nomogram, con rming that the hypoxia-related gene signature could discriminated patients with highrisk in early-stage LUAD powerfully [16]. In addition, Ye Y et al used similar methods to establish a novel de ned pyroptosis-related genes signature to predict the prognosis of ovarian cancer [29]. The application of univariate and multivariable Cox regression analyses con rmed that the risk score was a prognostic factor.
Again, our results indicated that this combined method is a useful tool to develop and validate of the pyroptosis-related genes prognostic model in CC patients. Patients with lower PRS had favorable survival compared with those with higher PRS, which was con rmed in training and validation sets. Potential reason could be derived from our study, as higher PRS is signi cantly associated with resistance to chemotherapy drugs, as well as the disease recurrence. Previous studies have been powered to investigate the involvement of pyroptosis in cancer treatment [39,40]. For example, Wang X et al found that the treatment of Taxol caused pyroptotic death in nasopharyngeal cancer cell line, which was mediated by Caspase-1 and GSDMD [39]. Meanwhile, based on a series of experiments conducted by Westbom C et al, the results suggested that doxorubicin and cisplatin could activate Caspase-1 and pyroptosis, which contributed to the death of cancer cells [40]. As for our ndings, the pyroptosis-related genes signature-derived resistance to therapies, need to be conquered by further studies. Y et al proposed that pyroptosis could regulate the composition of the tumor immune microenvironment [29]. Interestingly, after strati ed by PRS, Ye Y et al found that higher proportion of Treg was found in the low-risk group. As Treg was redeemed as immunosuppressive and associated with poor outcomes, possible reason for this discrepancy may be that the regulation of the overactive in ammatory reaction caused by pyroptosis requires Treg. In our study, functional assays indicated PRS is closely related to immunosuppressive cells, including Treg. Notably, Saito T identi ed two main subtypes of Treg in colon cancers, which had opposite roles in the regulation of the tumor microenvironment [43]. Therefore, it is worthy to further identify the speci c subtype of Treg correlated to PRS.
To our best knowledge, this is the rst study to explore the pyroptosis-related genes for CC patients. A pyroptosis -related risk score (PRS) was established and furtherly validated in independent cohorts. However, this study is limited by its retrospective nature. As little evidence could be reached, our ndings should be conquered by further experimental studies. And the molecular mechanism also needs to be furtherly addressed.

Conclusions
In summary, we established a pyroptosis-related genes signature to discriminate CC patients with different risk and predict the survival outcomes of this patient population. Further decision tree and nomogram con rmed the predictive value of pyroptosis-related genes signature.

Consent for publication: Not applicable
Availability of data and materials: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Competing interests: The authors declare that they have no competing interests.    Patients with lower PRS exhibited better prognosis in the validation cohort. (D, E) The best k value was chosen for NMF consensus clustering in the TCGA ( Fig 4D) and GEO (Fig 4E) cohorts. (F, G) Statistical difference in OS was observed in NMF-derived clusters based on the expression pattern of the gene signature (TCGA: Fig 4F; GEO: Fig 4G). (H, I) Multivariate Cox regression analysis indicated that PRS was an independent risk factor for OS in the training and validation cohorts (Fig 4H), as well as in the all cohorts (Fig 4I).

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.