Identi ﬁ cation of Pyroptosis-related Gene Prognostic Signature in Patients with Hepatocellular Carcinoma

Abstract


Background
Hepatocellular carcinoma (HCC), which accounts for the most signi cant proportion of liver cancers, is one of the most common malignancies in humans and the fourth leading cause of cancer-related deaths worldwide 1 .Studies have shown that pathogenic factors for HCC include hepatitis B and C viruses, alcohol abuse, dietary toxins such as a atoxin and aristolochic acid, metabolic liver disease (mainly nonalcoholic fatty liver), and exposure to carcinogenic substances such as hexachlorobenzene 2 .In the past few decades, due to the popularization of health knowledge and advanced screening procedures, including alpha-fetoprotein (AFP) detection and non-invasive imaging technology, signi cant progress has been made in the treatment of HCC [2][3][4][5][6] .However, the survival rate of advanced HCC is still poor 7 .
Nowadays, the indicator of tumor lymph node metastasis (TNM) staging is widely applied in clinical neoplasm management for predictive risk assessment and treatment decisions.Nevertheless, due to the high degree of molecular heterogeneity, even among patients with the same clinicopathological characteristics, the risk of recurrence and death may vary greatly 8,9 .Therefore, there is an urgent need for new prognostic signatures to identify the risk of HCC patients more accurately.
Pyroptosis, also referred to as in ammatory cell death, is de ned as gasdermin-mediated programmed necrosis 10,11 .The characteristic of cell pyroptosis is that gasdermin is activated to form cell membrane pores, which is different from the mechanism of cell apoptosis.Subsequently, the intact cell membrane is destroyed under the dual action of pro-in ammatory cytokines and alarm factors, eventually leading to cell death 10 .The initial research on pyroptosis was mainly focused on its function in ghting infections.Nowadays, a large number of studies have shown that pyroptosis plays an indispensable role in the evolution of human malignancies.The interactions between pyroptosis and cancers are incredibly complicated 12 .On the one hand, the growth of tumors can be inhibited by mediating the pyroptosis of cancer cells.Still, on the other hand, pyroptosis provides nutritional and environmental support for the occurrence and development of neoplasm 13 .Increasing evidence has highlighted the role of pyroptosis in cancer 13,14 .
Research on pyroptosis-related genes (PRGs) and ovarian cancer, lung adenocarcinoma, and gastric cancer have been published recently [15][16][17] .The molecular characteristics, clinical signi cance, and immune interaction of PRGs in the above-mentioned cancer species have been elucidated by applying bioinformatics methods.These studies revealed that using PRGs as a new gene signature to forecast the survival prognosis of cancer suffers is extremely valuable.However, the expression levels, clinical prognostic characteristics, and immune in ltration status of the PRGs in HCC have not yet been clari ed, making our research signi cant.In this study, gene expression data from The Cancer Genome Atlas (TCGA) were employed to identify molecular subtypes of HCC based on genes related to pyroptosis.To develop a prognostic risk model, ve genes were ltered from 35 PRGs by LASSO-Cox regression.The risk model can assess the survival status of HCC suffers and be veri ed by the International Cancer Genome Consortium (ICGC) external validation cohorts.This risk score model can be utilized as an independent prognostic evaluation mark for suffers from HCC.
In order to further understand the interaction of these pyroptosis-related genes at the protein level, we constructed a PPI network.The minimum interaction score required for PPI analysis was set to 0.4 (the default recommended value).Through MCC, DMNC, and MNC algorithms in Cytoscape, we determined that NLRP1, PYCARD, AIM2, IL18, CASP1, NLRC4, CASP4, CASP8, NLRP3, and TNF were hub genes (Fig 1B ).These hub genes are also DEGs between the normal specimens and HCC.

Molecular subtypes based on pyroptosis-related genes
To investigate the relationship between the expression pro les of 35 PRGs and HCC subtypes, we conducted a consistent cluster analysis on all 343 HCC samples in the TCGA training cohort.By adjusting the clustering variable (k), we observed that when k=2, the inter-group correlation was lowest and the intra-group correlation was highest, suggesting that based on 35 PRGs, 343 HCC patients could be well divided into two robust clusters (cluster A contains 59 patients and cluster B contains 284 patients) (Fig 1C and 1D).

Prognostic differences and immune landscape of the subtypes
The prognosis signature among molecular subtypes was further analyzed.Survival analysis in OS showed that cluster A had a worse prognosis than cluster B (Figure 2A).We applied ssGSEA to study the immune cell in ltration of the two molecular subtypes of the pyroptosis gene and observed that compared with cluster-B, cluster-A not only had more in ltration of innate immune cells (macrophages, mast cells, and immature dendrites cells and MDSC) but also adaptive immune cells, such as activated B cells and activated CD4 T cells, effector memory CD4 T cells and effector memory CD8 T cells (Fig 2B).
Interestingly, in the 17 immune-related pathways, cluster-A was also more abundant than cluster-B (Figure 2C).In addition, through the ESTIMATE package, we found that cluster A had higher stromal scores, immune scores, and ESTIMATE scores than cluster B, but the tumor purity was the opposite (Figure 2D), which was consistent with the above ndings (Figure 2B; Figure 2C).We noticed that patients with poor prognoses among the subtypes of pyroptosis molecules were more abundant in immune in ltration, revealing the complex relationship between the pyroptosis pathway and the immune microenvironment in HCC.

Development of a Prognostic Risk Model in the Training Dataset
The risk prognostic model of PRGs was generated based on the TCGA cohort.First, we used LASSO-Cox regression analysis to reduce the amount of PRGs.The minimum lambda (0.06544434) was obtained by ten-fold cross-validation to construct our prognostic model containing ve genes (Fig 3A; According to the above formula, the up-regulation of GPX4, CASP8, NOD2, and GSDME was associated with poor prognosis, while NLRP6 was the opposite.With the best cutoff value of risk score obtained by the "survminer" package (cutoff = -0.01471409),sufferers in the TCGA cohort were well divided into highrisk groups and low-risk groups.In Figure 3C, an obvious difference in the KM survival curve between the two subgroups was observed (P<0.0001).Furthermore, in Figure 3D, we followed that the basic conditions of the high-risk and low-risk groups include survival and death status, population distribution trends, and the expression heat map of the ve genes that made up the risk score.The AUC in the 1-year, 3-year, and 5-year ROC curves were 0.63, 0.65, and 0.60, respectively (Figure 3E), indicating that the risk score had high sensitivity and accuracy in forecasting the prognosis of hepatocellular carcinoma.

External validation of the prognostic risk model
The ICGC cohort was regarded as an external validation dataset to verify the robustness of the risk model established by the TCGA training cohort.Using the same method as described above, the ICGC cohort was divided into high-risk and low-risk groups based on its best cutoff value (cutoff =-0.00717451).The KM survival curve illustrated that the prognosis of the high-risk group is much worse than that of the lowrisk group (Figure 4A). Figure 4B displayed the overall trend of the ICGC cohort, including survival and death status, population distribution, and ve prognostic PRGs expression levels, which is consistent with the TCGA cohort.In addition, the ROC curve generated based on the risk score was applied to evaluate the 1-year, 3-year, and 4-year prognosis of the validation cohort, as shown in Figure 4C.

Comparison of the immune activity between subgroups
We further investigated the distinction in the abundance of immune cell in ltration between the high-risk score group and the low-risk score group.Through the "GSVA" package, we observed that CD56dim natural killer cells and eosinophil in ltration in the low-risk score group were relatively higher than in the high-risk group.In contrast, the cells with higher in ltration in the high-risk score group included activated CD4 T cells, central memory CD4 T cells, effector memory CD4 T cells, γδ T cells, immature dendritic cells, MDSC, memory B cells, and natural killer T cells, plasmacytoid dendritic cells, follicular helper T cells, and type 2 helper T cells (Figure 4D).Subsequently, by the TIMER database, compared with the low-risk group, we found that the high-risk score group had a higher degree of in ltration of B cells, CD4 T cells, CD8 T cells, neutrophils, macrophages, and dendritic cells (Figure 4E).
We next explored relationships between molecular subtypes of pyroptosis and the risk score.The higher risk score was found in patients in cluster-A (Fig 4F), previously con rmed to have a poor prognosis.

Analysis of Enrichment Pathways and Genome Changes between Risk Groups
We employed the Wilcoxon rank-sum test (implemented in the R software) to evaluate differentially expressed genes between the high-risk and low-risk scoring groups.The threshold was set as FDR<0.05 and fold change (expressed in log2 (average FPKM ratio between the high-risk group and low-risk group) ≥1.Using the above settings, we found that in the high-risk group of the TCGA cohort, there were 2629 signi cantly up-regulation genes and 76 signi cantly down-regulation genes (Additional le: Figure S1).Refer to the attached le for detailed illustrations of highly expressed genes and low expressed genes (Table S3; Table S4).
GO analysis showed that high-expressed genes in the high-risk group were mainly enriched in various synaptic signaling pathways, embryonic organ morphogenesis, and G protein−coupled receptor signaling pathway (Figure 5A).However, low-expressed genes in the high-risk group were mainly enriched in various biological metabolic processes, such as the xenobiotic metabolic process, lipid hydroxylation, and cellular response to the xenobiotic stimulus.(Figure 5B).KEGG analysis revealed that upregulated genes were mainly enriched in neuroactive ligand−receptor interaction, nicotine addiction, and Rheumatoid arthritis (Figure 5C), while downregulated genes were enriched primarily on pathways such as chemical carcinogenesis, retinol metabolism, and drug metabolism (Figure 5D).
As shown in Figure 5E and Figure 5F, the 20 most frequently mutated genes in the high-risk and low-risk cohorts were not wholly consistent, which might reveal the differences in the genomic alteration of the risk score group.Based on the frequency of mutation, the mutated genes in the high-risk cohort included TP53, TTN, CTNNB1, MUC16, PCLO, AXIN1, XIRP2, APOB, RYR1, ABCA13, CSMD3, FAT3, LRP1B, ARID1A, CACNA1E, CCDC168, RYR2, ABCB5, COL6A6, and FRAS1 (Figure 5E).The mutant genes in the low-risk cohort were CTNNB1, TTN, TP53, MUC16, ALB, MUC4, RYR2, ABCA13, APOB, OBSCN, PCLO, BAP1, USH2A, ARID2, FLG, HMCN1, LRP1B, SPTA1, CUBN, and PRKDC (Figure 5F).We used Fisher's exact test (P threshold was set to <0.05) to explore different mutant genes in the high and low-risk groups and found that TP53 occupies the rst position (Figure 5G).This result implied that TP53 has a high correlation with pyroptosis in patients with HCC.The lollipop chart revealed the different mutation sites of TP53 between the two cohorts (Figure 5H), and the difference in survival prognosis of TP53 mutation in HCC was observed (Figure 5I).In addition, as shown in Figure 5J and Figure 5K, in the co-occurrence and mutually exclusive mutations of the high and low-risk groups, we observed a particular TP53-CSDM1 co-occurrence mutation, suggesting that their respective mutations in HCC might cause a combined effect.

Establishment and Assessment of a Nomogram based on Risk score
Univariate and multivariate Cox analysis suggested that the pyroptosis risk score could be identi ed as an independent prognostic signature to forecast the survival status of HCC (Table 1).Additionally, we integrated the risk score and prognostic clinical characteristics to develop a nomogram to forecast the 1year, 3-year, and 5-year overall survival rates of sufferers with hepatocellular carcinoma (Figure 6A).The AUC of the time-dependent ROC curve illustrated that the performance of the nomogram prediction was better than that of the risk score alone (Figure 6B).The result of the KM curve of the OS of the nomogram was signi cant (Figure 6C).Furthermore, compared with the ideal model, the calibration chart showed that the nomogram with the risk score and TNM stage performed well (Figure 6D).

Discussion
HCC is highly malignant and progresses rapidly, and the prognosis of patients with HCC is usually poor.The symptoms of early HCC are insidious, and most patients with HCC are at the advanced stage when they are diagnosed, which means that most of them have already lost the opportunity to undergo surgical liver resection.Although radiotherapy, chemotherapy, liver transplantation, and other potential treatment methods have produced various effects and prolonged the survival time of patients, the prognosis of HCC is still unsatisfactory owing to the high risk of recurrence and intrahepatic spread.Pyroptosis is a unique sort of programmed cell death characterized by the swelling and lysis of cells and the release of many pro-in ammatory factors.The relationship between malignancies and pyroptosis is complex, and pyroptosis of different human tissues and genetic landscapes has different effects on cancer.With the continuous deepening of tumor molecular biology research, prognostic gene signatures re ecting malignancies progression have received more attention in predicting survival in HCC, which may help achieve more accurate individualized treatment management.This study is signi cant in exploring the relationship between pyroptosis genes and hepatocellular carcinoma.
Herein, we clari ed the mRNA levels of 35 PRGs from the published literature in HCC and normal tissues and determined that most of them are differentially expressed.Compared with healthy liver tissue, the expression of 28 PRGs decreased in HCC, and 6 PRGs increased in HCC.Then, we determined that PYCARD, AIM2, IL18, CASP1, CASP4, CASP8, NLRC4, NLRP1, NLRP3, and TNF were hub genes in the pyroptosis pathway.
We also applied the unsupervised clustering algorithm to distinguish pyroptosis-related subtypes (cluster-A and cluster-B).Cluster-A was overexpressed in immune cells and immune-related pathways, and the prognosis was worse than cluster-B.Due to the prognostic effects of the PRGs mentioned above, we employed LASSO Cox regression analysis to develop a predictive risk model based on 5 PRGs (GPX4, CASP8, NOD2, GSDME, and NLRP6).This model had high accuracy and practicability in forecasting the 1year, 3-year, and 5-year overall survival of HCC sufferers.Subsequently, we used the ICGC validation set to validate the dependability of the model.The results indicated that GPX4, CASP8, NOD2, and GSDME were risk factors, and NLRP6 was a protective factor for HCC prognoses.
In recent years, glutathione peroxidase 4 (GPX4) has been considered a vital regulatory factor of ferroptosis tumor cell death.Unlike other GPXs, GPX4 has the function of catalyzing the reduction of lipid peroxides in a complex cell membrane environment, indicating that GPX4 has a distinctive role in physiology 18 .Inhibition of GPX4 can induce the death of cancer cells that are resistant to conventional chemotherapy or radiotherapy 19,20 .Herein, we revealed that up-regulation of GPX4 is associated with poor prognosis and a high risk of hepatocellular carcinoma in patients.In this way, inhibiting the high expression of GPX4 may offer a novel therapeutic strategy for reducing the mortality rate in patients with HCC.
Caspase-8 (CASP8) plays an essential enzyme in the pyroptosis pathway 21 .The signi cance of CASP8 in initiating necrosis receptor-induced pyroptosis and maintaining immune homeostasis and surveillance is well established.It has long been held that the expression levels of CASP8 in most tumors are usually down-regulated, leading to pyroptosis escape and enhancing resistance to anticancer therapies 22 .Multiple studies have shown that, compared with healthy liver tissues, the expression levels of CASP8 are down-regulated in HCC, which is consistent with our results [23][24][25] .Nevertheless, the relationship between the levels of CASP8 expression and the prognosis of HCC patients has not yet been con rmed.Our analysis revealed that up-regulation of CASP8 could increase the mortality risk of in HCC which may be related to the RIPK1/FADD/Caspase-8 signal complex causing liver parenchymal cell damage, thereby impairing the normal functioning of the liver 26 .
Nucleotide-binding oligomerization domain 2 (NOD2), a recognized innate immune sensor, has a signi cant effect on carcinogenesis 27 .It was reported that NOD2 dysregulation is involved in the pathogenesis of Crohn's disease (CD) and colitis-related colon cancer.NOD2 gene polymorphism was associated with colorectal cancer, lymphoma, lung cancer, gastric cancer, ovarian cancer, laryngeal cancer, and breast cancer 28 .In their research on HCC, Ma et al. showed that NOD2 restrains tumorigenesis and enhances tumor chemotherapeutic sensitivity through targeted regulation of the AMPK pathway, which may reveal a novel strategy for the treatment of HCC based on NOD2 adjustment 29 .A recent study on HCC demonstrated that NOD2 initiates the nuclear autophagy pathway, leading to DNA damage and increased genomic instability, and proved that NOD2 as a poor prognostic factor is closely related to the increased risk of death in HCC patients 30 .Our conclusion that overexpression of NOD2 had a bad prognosis and high-risk score is consistent with the above research.
Gasdermin E (GSDME), also known as DFNA5, was rst identi ed as a mutated gene inherited in an autosomal dominant manner, causing the loss of cochlear hair cells and eventually developing into progressive, sensorineural and non-syndromic hearing impairment.In recent years, multiple studies have shown that GSDME, a member of the Gasdermin family, is closely related to cancer 31 .Compared with healthy human tissues, due to epigenetic suppression of methylation, the expression of GSDME is downregulated in gastric cancer, colorectal cancer, breast cancer, and most human cancer cell lines 10 .This is consistent with our research results.GSDME is a molecular drug target for treating human malignancies, and clinicians can choose appropriate chemotherapeutic drugs according to its expression levels to improve the sensitivity of chemotherapeutic drugs and reduce drug resistance 13 .NLRP6 (originally called PYPAF5) is a novel node-like receptor (NLR) family member that produces in ammasomes.NLRP6 consists of an N-terminal pyrin domain, a central NACHT domain, and a terminal leucine-rich repeat (LRR) domain 32 .Nowadays, the majority of research on NLRP6 has concentrated on the intestine in the mouse model, and only a few studies have involved human patients to date 32,33 .The survey by Domblides et al. determined that down-regulation of NLRP6 is related to the poor prognosis of human colorectal cancer 34 .Our research revealed that NLRP6 is a protective factor for the prognosis of hepatocellular carcinoma, which provides new clues for the further study of NLRP6 in HCC.
It is well known that the tumor microenvironment plays a vital role in anti-tumor molecular therapy.Therefore, we also explored the abundance of immune and stromal cells in the high-risk and low-risk groups.Interestingly, unlike the phenomenon observed in ovarian cancer and gastric cancer 16,17 , in our research, the degree of immune cells and stromal cells in ltration of patients in the high-risk group was higher than that in the low-risk group.This may partly indicate that the mechanism of pyroptosis genes in the microenvironment of hepatocellular tumors is more complicated.Unlike the low-risk group related to various biological metabolic pathways, the high-risk group mainly participated in signal transduction pathways.It is worth noting that the high-risk group is also signi cantly associated with nicotine addiction and autoimmune diseases (rheumatoid arthritis).By comparing the genomic changes in different risk groups, we observed that the high-risk group was signi cantly associated with more aggressive molecular alteration (such as TP53 mutations).More importantly, the combination of univariate and multivariate Cox analysis illustrated that the nomogram using the pyroptosis risk score and TNM stage has high prediction capacity and may serve as a meaningful indicator in personalized medicine.
Our research had some limitations.First, the pyroptosis risk score based on ve genes was constructed and validated based on retrospective cohorts.Moreover, further in vivo and in vitro experiments could be applied to further verify our results.

Conclusions
In summary, we determined two pyroptosis molecules subtypes in hepatocellular carcinoma patients with different prognoses.A risk model with good prognostic performance containing ve genes (GPX4, CASP8, NOD2, GSDME, and NLRP6) was constructed.This risk prognosis model could be applied as a prognostic predictor for the survival status of sufferers with hepatocellular carcinoma.

Datasets and preprocessing
The TCGA-liver hepatocellular carcinoma (LIHC) data set was composed of the gene expression RNA sequencing (RNA-seq) data of HCC and adjacent normal samples and corresponding clinical prognostic traits and was obtained from the TCGA database (https://portal.gdc.cancer.gov/).The validation group data of ICGC contained the gene expression data of HCC specimens with clinical characteristics and was received from the ICGC data portal(https://icgc.org/).After removing data with no clinical information and OS < 30 days, this study included 343 HCC and 50 adjacent normal samples in the TCGA cohort and 228 HCC samples in the ICGC cohort.The gene expression RNA-seq data of 110 normal human liver tissue samples were downloaded from the GTEx data set (https://xenabrowser.net/datapages/).The somatic mutation pro le was gained from the TCGA data portal (https://portal.gdc.cancer.gov/).Subsequently, we used the R package "maftools" to process somatic mutation pro les sorted in the form of mutation annotation format (MAF).

Identi cation of pyroptosis-related subtypes
The RNA-seq expression pro les of 35 PRGs were used to perform k-means unsupervised clustering through the "ConsensusClusterPlus" package, which was repeated 1000 times 39,40 .

Estimation of immune in ltration and immune-related pathway activity
In order to clarify the immune in ltration status of each specimen in the TCGA data set corresponding to the pyroptosis subtype, we used single-sample gene set enrichment analysis (ssGSEA) through the "GSVA" package based on immune cell gene sets and immune-related pathways obtained from published literature 41,42 (Additional le: Table S1; Table S2).In addition, we applied the ESTIMATE algorithm as a tool assessing each tumor specimen to compute the immune score, stromal score, ESTIMATE score, and tumor purity 43 .

Construction of the risk model based on PRGs
We directly used the least absolute shrinkage and selection operator (LASSO) penalty method to further narrow down the prognostic-related pyroptosis genes to construct a prognostic risk model.Then, we determined the best parameter λ through ten-fold cross-validation.According to the optimal cut-off value calculated by the "survminer" package, we divided the TCGA cohort into the high-risk and low-risk groups based on the prognostic risk score obtained from the screened genes.In addition, we plotted timedependent receiver operating characteristic (ROC) curves by the "timeROC" package, to assess the sensitivity and speci city of the risk model in the TCGA and ICGC cohorts.

Correlations of the risk score with immune in ltration
We applied the above-mentioned ssGSEA method to estimate the difference in in ltration levels of 28 immune cells between the high-risk score group and the low-risk score group in the TCGA cohort.However, using a single algorithm and marker gene set alone to evaluate tumor-in ltrating immune cells (TIIC) may cause calculation errors.To avoid these problems, we also used the TIMER database (https://cistrome.shinyapps.io/timer/) to quantify the immune in ltration per specimen to make the results more convincing 44 .

Gene function and pathway enrichment analysis
WebGestalt (https://www.webgestalt.org/option.php)was utilized to explore the function and pathway enrichment of target genes 45 .First, we utilized Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) database overexpression enrichment analysis method to input the genes of interest to the WebGestalt server.Then, we applied the "ggplot2" package to plot the results.

Construction and validation of the prognostic nomogram
Univariate and multivariate Cox regression analyses were utilized to evaluate the survival prognosis of pyroptosis risk score and clinical characteristics.Subsequently, we integrated the risk score and clinical characteristics that have prognostic value to construct a prognostic nomogram.Calibration curves for 1year, 3-year, and 5-year survival rates were applied to assess the deviation of the nomogram from the ideal model.The ROC curve was constructed and the area under the curve (AUC) was calculated to evaluate the survival prognosis prediction ability of the nomogram containing the pyroptosis gene risk score and TNM staging.

Statistical analysis
Data were analyzed with the R (version 4.0.5) and R Bioconductor packages.Comparisons between two groups were evaluated with the student's-test or Wilcoxon test.ROC curves were generated with the "timeROC" package.The nomogram was developed with the "rms" package.Overall survival (OS) was compared in pyroptosis molecular subgroups and risk score groups with Kaplan-Meier curves and logrank tests.Unless otherwise stated, all statistical tests were two-way, and a P-value <0.05 was considered signi cant.Comprehensive analyses of enriched pathways and genomic alterations between different risk groups.
A & B Gene Ontology analysis performed with signi cantly upregulated and downregulated genes, respectively.
C & D Kyoto Encyclopedia of Genes and Genomes analysis was performed with signi cantly upregulated and downregulated genes, respectively.
E & F Top 20 most frequently mutated genes were illustrated in High-Risk Score Cohort and Low-Risk Score Cohort.
G TP53 occupies the top 1 position among differently mutated genes between High-Risk Score Cohort and Low-Risk Score Cohort.
H A lollipop plot showed the different mutation spots of TP53 between two cohorts.
I Kaplan-Meier analysis shows the independent relevance between overall survival and TP53 mutation.
J & K The heatmap illustrates the co-occurrence and mutually exclusive mutations of the top 25 frequently mutated genes in each cohort.B ROC curves and AUC for1-year, 3-year, and 5-year survival of the nomogram.

Figures Figure 1
Figures

Figure 2 B
Figure 2