Immune and Stromal Score-Related Gene Signatures For Risk Stratication of Gynaecologic Cancers

Background: The immune and stromal landscape of gynaecologic cancers is far from elucidated, and there is a lack of systematic prognostic analysis of the immune or stromal parts of the tumour microenvironment (TME) in all types of gynaecologic cancers. We aimed to establish immune and stromal score-based risk signatures for each type of gynaecologic cancer. Methods: In the present study, using the algorithm of “Estimation of stromal and immune cells in malignant tumours using expression data”, we calculated immune and stromal scores for cervical squamous carcinoma (CESC), uteri corpus endometrial cancer (UCEC) and ovarian cancer (OVCA) from the cancer genome atlas (TCGA) to establish immune and stromal score-based risk signatures. Multi-omics analysis combining single nucleotide polymorphism (SNP), tumour mutation burden (TMB) and gene set variation analysis was further performed to investigate the mechanism of immune and stromal score-based risk signatures in gynaecologic cancers. Eventually, the prognostic value of some signature genes was examined in independent microarrays and aberrant expression of IHH in UCECs through immunohistochemistry. Results: According to the results, the prognosis prediction ability of the immune and stromal score-related risk signatures constructed for pan-gynaecological cancers withstood the tests of tROC curves and Kaplan-Meier survival curves, proving to be effective in risk stratication. Three signature genes, including BNC1, CCDC80, and IHH, were veried in independent datasets to have prognostic value in gynaecologic cancers. Conclusions: In summary, the immune and stromal score-related risk signature constructed in this study might be promising prognostic markers for pan-gynaecological cancers. six KEGG pathways, including ECM receptor interaction, glycosaminoglycan biosynthesis heparin sulphate, B cell receptor signalling pathway, glycosaminoglycan biosynthesis chondroitin sulphate, aldosterone regulated sodium reabsorption, and arachidonic acid metabolism, all showed signicantly differential enrichment between high- and low-risk groups of three gynaecologic cancers.


Background
Gynaecologic cancers are a set of cancers that arise from female reproductive organs, including the uterus, ovary, cervix, vagina, fallopian tube and vulva [1]. This group of tumours have similar embryonic origins from Mullerian ducts and was affected by hormones [2]. As one of the most frequent neoplasms worldwide, gynaecologic cancers are the main contributors to cancer-related mortality in women, taking a severe toll particularly on women in developing countries [3]. From the perspectives of incidence and mortality, cervix uteri neoplasms and OVCAs are ranked as the deadliest gynaecological cancers [4].
Accurate prognosis evaluation of patients suffering from gynaecologic cancers was an important segment of oncological care, which propelled us to identify effective prognostic biomarkers for gynaecological cancers.
It has been gradually recognised in the past two decades that the tumour microenvironment (TME) plays an essential role in governing the biological behaviours of human cancers. Non-tumour components in the TME, such as immune and stromal cells, have emerged as promising predictors of prognosis for a wide variety of human cancers, including gynaecological cancers. Hinchcliff et al. discovered that lymphocyte-speci c kinase presented signi cant prognostic value in OVCA [5]. Ittiamornlert reported that the neutrophil-lymphocyte ratio could predict the clinical outcomes for stage IVB, persistent, or recurrent cervical cancer patients after receiving chemotherapy [6]. In endometrial cancer, intraepithelial CD8 + tumour-in ltrating T-lymphocytes indicate a favourable prognosis [7]. Despite great research progress on the crosstalk between non-tumour components of the microenvironment and cancer cells of gynaecological cancers, the immune and stromal landscape of gynaecologic cancers was far from elucidated, and there lacks systematic analysis of the prognostic value of immune or stromal parts of TME in all types of gynaecologic cancers.
In the present study, using the "Estimation of stromal and immune cells in malignant tumours using expression data (ESTIMATE)" algorithm proposed by Kosuke Yoshihara et al. [8], we calculated immune and stromal scores for cervical squamous carcinoma (CESC), uteri corpus endometrial cancer (UCEC) and ovarian cancer (OVCA) from the cancer genome atlas (TCGA) to compare the commonness and differences between the TME landscape of gynaecologic cancers, based on which immune and stromal score-based risk signature was established for each type of gynaecologic cancer. Multi-omics analysis combining gene set variation analysis (GSVA), single nucleotide polymorphism (SNP) and tumour mutation burden (TMB) was further performed to investigate the mechanism of immune and stromal score-based risk signature in gynaecologic cancers. Immunohistochemistry (IHC) was performed to evaluate the differential expression of one of the signature genes, IHH, in UCEC and non-cancerous endometrium tissues.
Material And Methods 1. Data download RNA-seq data in the form of fragments per kilobase million (FPKM), count data, and clinical information on CESC, OVCA and UCEC were available from the Genomic Data Commons (GDC) Data Portal of the TCGA database. Normalised RNA-seq data in the form of log2(TPM+0.001) was reckoned from FPKM RNA-seq data and was utilised for calculating immune and stromal scores. Each sample of gynaecologic cancers was assigned an immune score and a stromal score through the ESTIMATE package in R software v.3.6.1. Because the sample size of uterine carcinosarcoma in the TCGA database was not large enough for subsequent prognostic analysis, we did not include it in the current work. Independent sample's t-test and a Kruskal-Wallis test were performed in GraphpadPrism v.8.0.1. for comparing immune scores between different groups of clinical variables of the three gynaecologic cancers.
Thresholds of statistical signi cance were P < 0.05.
Differentially expressed genes (DEG) related to immune and stromal scores All samples of the three gynaecologic cancers were divided into high and low immune or stromal score groups according to the median value of immune and stromal scores of all samples. DEGs between the two groups of immune and stromal scores were identi ed using the count data of the three gynaecologic cancers. The limma voom package in R software v.3.6.1 was employed for identi cation of DEGs [9]. Adjusted P value < 0.05 and |log2FC|>1 were considered as the threshold for DEGs.

DEGs with prognostic value
Patients of the three gynaecologic cancers with more than 90 days of overall survival time were reserved for further survival analysis. Intersected genes of signi cantly upregulated DEGs from the high immune score group and signi cantly upregulated DEGs in the high stromal score group were submitted to univariate Cox regression analysis for selection of prognostic DEGs in three gynaecologic cancers (P < 0.05), which was conducted via survival package in R software v.3.6.1. We further conducted gene ontology (GO) analysis via the ClusterPro ler package of R software v.3.6.1 to investigate the functional enrichment of prognostic DEGs in biological process (BP), cellular component (CC) and molecular function (MF). Thresholds of signi cant GO terms were P < 0.05.
3. Immune and stromal score-based risk signature for pan-gynaecologic cancers For each of the three gynaecologic cancers, multivariate Cox regression analysis was conducted in SPSS v.22.0 to choose the signature genes of the prognostic model. Signi cant prognostic DEGs from multivariate Cox regression analysis were de ned as the signature genes (P < 0.01 for CESC, P < 0.05 for OVCA and UCEC). Immune and stromal score-based risk signatures composed of signi cant prognostic DEGs were built for three gynaecologic cancers with the formula: risk score = (ßi was the regression coe cient from multivariate Cox regression analysis). The prognostic prediction power of the risk signature was appraised by Kaplan-Meier survival curves and time-dependent receiver's operating characteristics (tROC) curves [10]. Independent sample t-tests and Kruskal-Wallis tests were performed in GraphpadPrism v.8.0.1. to compare risk scores between different groups of clinical variables of the three gynaecologic cancers. P < 0.05 was considered signi cant.

Immune in ltration of common prognostic DEGs in gynaecologic cancers
We took intersection of signi cant prognostic DEGs from multivariate Cox regression analysis of the three gynaecologic cancers. For genes from intersected parts, we consulted Tumour Immune Estimation Resource (TIMER, https://cistrome.shinyapps.io/timer/) to analyse the correlation between the expression of common prognostic DEGs and the tumour purity or the in ltration level of various immune cells in the three gynaecologic cancers.

Construction of nomogram combining immune-related risk signature and clinical features
The independence of the immune-related risk signature was evaluated by univariate and multivariate Cox regression analysis, considering the clinical features of the three gynaecologic cancers. Clinical features signi cantly correlated with survival from univariate Cox regression analysis (P < 0.05) and immunerelated risk signature were combined into nomogram. Calibration curves were plotted to assess the prediction accuracy of the nomogram against real survival conditions at three and ve years. Univariate and multivariate Cox regression analyses for this part were carried out in SPSS v.22.0. Nomogram and calibration curves were drawn by Rms, foreign and survival packages in R software v.3.6.1.
. GSVA analysis Enrichment analysis of the KEGG pathway for three gynaecologic cancer samples included for survival analysis was analysed via GSVA package in R software v.3.6.1. Enrichment scores of KEGG pathways against in each sample were calculated according to the log2(TPM+0.001) normalised expression data.
7. Mutation analysis for gynaecologic cancers TMB refers to the rate of somatic nonsynonymous mutations occurring in the coding area of sequenced DNA per megabase [11]. The TMB of low-and high-risk gynaecologic cancer samples was estimated by the MutSigCV algorithm based on the whole exome sequencing data of the three gynaecologic cancers from TCGA [12]. The mean TMB of samples with available mutation data was set as the cutoff value for allocating gynaecologic patients to low and high TMB groups. Differential distribution of risk scores in high or low TMB groups of three gynaecologic cancers was compared by independent sample's t-test in GraphpadPrism v.8.0.1. SNP data of the three gynaecologic cancers were inferred from the mutation annotation les in TCGA using the method of VarScan in the TCGA database. A waterfall plot was created using "GenomeInfoDbData" and "GenVisR" packages in R software v.3.6.1 to illustrate the mutation type and its translational effect of signature genes in three gynaecologic cancers.
. Validation of immune-related risk signatures for pan-gynaecologic cancers GSE44001 and GSE9891 were two microarrays that contained disease-free survival information of 300 CESC patients and overall survival information of 285 OVCA patients. Part of the signature genes for CESC and OVCA were available in the two microarrays. The two microarrays served as the validation set for CESC and OVCA. Kaplan-Meier survival curves were plotted to assess the prognostic value of part of the signature genes in the two validation datasets. The median value of the normalised expression value was set as the threshold. No microarray with survival time and survival status for UCEC patients was found. To remedy the problem, we examined the expression value of part of the signature genes in 21 survivors and 24 non-survivors diagnosed with UCEC from GSE21882. Expression difference was measured by independent sample's t-test in SPSS v.22.0. P < 0.05 was considered statistically signi cant.

IHC of IHH in UCEC and paracarcinoma tissues
A total of 215 UCEC tissue samples and 48 non-cancer endometrium tissue samples were included for tissue microarray by Pantomics, Inc. Part of the samples (20 UCEC tissues and 15 non-cancer endometrium tissues) were collected from the UCEC patients of the First A liated Hospital of Guangxi Medical University. Permission for the study was obtained from the ethics committee of the First A liated Hospital of Guangxi Medical University, and all patients signed informed consent forms.
The procedure of IHC experiments and the assessment criteria of the tissue slices were described in a previous study [13]. Expression data were analysed by an independent sample's t-test in GraphpadPrism. P < 0.05 was considered statistically signi cant.

Immune scores and stromal scores for gynaecologic cancers
Immune and stromal scores were deduced from 306 CESC samples, 379 OVCA samples and 552 UCEC samples from TCGA. Intersected upregulated and downregulated immune or stromal score-related genes in the three gynaecologic cancers were displayed in a set of Venn plots (Additional le 1). Differential expression of these DEGs in high and low immune or stromal score groups is exhibited in Additional le 2. Because the intersection part of upregulated immune and stromal score-related DEGs occupied the majority of DEGs, we selected these DEGs to perform further survival analysis.

DEGs with prognostic value
According to the selection criteria, a total of 211 CESC patients, 351 OVCA patients and 456 UCEC patients were eligible for survival analysis. Altogether, 162 DEGs, 78 DEGs and 64 DEGs showed prognostic signi cance for CESC, OVCA and UCEC, respectively (P < 0.05). BP, CC and MF terms signi cantly assembled by these DEGs are presented in Additional le 3 and Additional le 4 and 5, which were mainly correlated with the immune system.
3. Immune and stromal score-based risk signature for pan-gynaecologic cancers Three immune score-based risk signatures were constructed for each of the three gynaecologic cancers. Univariate and multivariate regression coe cients of signature genes for the three gynaecologic cancers are listed in Table 1. As revealed by 5-year tROC curves, the three immune and stromal score-based risk signatures all presented strong prognosis prediction ability, with the immune-score-based risk signature for CESC performing best (AUC = 0.854; Fig. 1). Kaplan-Meier survival curves indicated that high-or lowrisk gynaecologic cancer patients strati ed by immune and stromal score-based risk signature showed signi cantly different survival outcomes (P < 0.001, P < 0.001, P = 0.003; Fig. 2). Moreover, the immune and stromal score-based risk signatures were signi cantly associated with the worse clinical progression of gynaecologic cancers. CESC patients with distant metastasis, higher clinical stage and higher T stage had signi cantly higher risk scores than patients in control groups (Additional le 6A-C). OVCA patients with younger age, venous invasion and higher clinical stage had signi cantly higher risk scores than patients in control groups (Additional le 6D-F). UCEC patients with elder age, serous type, higher histological grade and higher clinical stage had signi cantly higher risk scores than patients in control groups (Additional le 6G-J).

Immune in ltration of common prognostic DEGs in gynaecologic cancers
Six genes, TRAV9-2, CD3G, TRBC2, LINC00861, MS4A1 and GIMAP7, were common genes of prognostic DEGs from three gynaecologic cancers. Three of the genes were available query terms in TIMER (CD3G, GIMAP7 and MS4A1). All three genes were negatively correlated with tumour purity and positively correlated with in ltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells (P < 0.05, Additional le 7 and 8).

Construction of nomogram combining immune and stromal-related risk signature and clinical features
Univariate analysis and multivariate analysis for immune and stromal score-related risk signature and other clinical variables of CESC suggested that the prognostic value of immune and stromal score-related risk signature for CESC was not affected by clinical parameters (P < 0.05, Table 2). Although immune and stromal score-based risk signatures for OVCA and UCEC showed insigni cant prognostic value in the coexistence of other clinical features, they were signi cant indicators of worse prognosis from univariate Cox regression analysis (P < 0.05, Table 3 and 4). Since there were too many missing values of lymph node metastasis and distant metastasis for CESC, we excluded the two variables from the construction of the nomogram. Only immune and stromal score-related risk signature, clinical stage and T stage were component variables for the nomogram of CESC. With regard to OVCA and UCEC, immune and stromal score-related risk signatures and all signi cant clinical variables from univariate Cox regression analysis were combined to construct a nomogram. Nomograms for the three gynaecologic cancers all highlighted the importance of immune and stromal score-related risk signatures in prognosis prediction of CESC, OVCA and UCEC (Additional le 9). Calibration curves at three and ve years proved the accuracy and stability of the combined nomograms for CESC and UCEC (Fig. 3).  (Fig. 4). For CESC and UCEC, most of the differentially enriched KEGG pathways were notably enriched in the low-risk group, while most of the 110 KEGG pathways were notably enriched in the high-risk group of OVCA. In particular, six KEGG pathways, including ECM receptor interaction, glycosaminoglycan biosynthesis heparin sulphate, B cell receptor signalling pathway, glycosaminoglycan biosynthesis chondroitin sulphate, aldosterone regulated sodium reabsorption, and arachidonic acid metabolism, all showed signi cantly differential enrichment between high-and low-risk groups of three gynaecologic cancers.

. Mutation analysis for gynaecologic cancers
We compared TMBs in high-and low-risk groups of three gynaecologic cancers. There was no signi cant relationship between TMB and risk of gynaecologic cancer patients (data not shown). Mutation type and mutation frequencies of signature genes for signature genes in the three gynaecologic cancers are illustrated in Additional le 10. It could be inferred from the waterfall plots that various types of mutations occurred in signature genes, and missense mutation was the predominant mutation type.
RASAL3, DIO3 and IHH were the most frequently mutated signature genes (Additional le 10).

Validation of immune and stromal-related risk signatures for pan-gynaecologic cancers
Kaplan-Meier survival curves for eight signature genes (CHST11, HAVCR2, ASGR2, SELP, SLITRK4, ARHGAP15, P2RY14 and RASAL3) of immune and stromal score-related risk signatures for CESC suggested that none of them was a statistically prognostic factor for disease-free survival of 300 CESC patients (data not shown). Two signature genes (BNC1 and CCDC80) were signi cantly correlated with the inferior disease-free survival of 285 OVCA patients from GSE9891, consistent with their indication of worse survival from univariate and multivariate Cox regression analysis. As for UCEC, IHH expression was obviously higher in 21 survivors of UCEC than in 24 non-survivors of UCEC, corresponding to the prognosis-protective effect of IHH from univariate and multivariate Cox regression analysis (Additional le 11).
. Protein expression of IHH in UCEC and paracarcinoma endometrium tissues As illustrated in Fig. 5, IHH presented low expression in cancer cells of UCEC tissues (4.833±2.863). By comparison, the expression of IHH was high in non-cancer endometrium tissues (9.917±2.019) (P < 0.001).

Discussion
In the current work, we constructed a set of immune and stromal score-based risk signatures for gynaecologic cancers and explored the molecular mechanism underlying the immune and stromal scorerelated risk signature across pan-gynaecologic cancers. The prognosis prediction ability of the immune and stromal score-related risk signature withstood the tests of tROC curves and Kaplan-Meier survival curves, proving to be effective in risk strati cation. Although previous researchers have worked out immune-related risk signatures for cervical cancer and OVCA [14,15], risk signatures developed by prior studies were constructed on immune cells or immune genes imported from the ImmPort database for only one gynaecologic cancer. The highlights of the current work embodied in the methods adopted to identify immune and stromal score-related signature genes, the research scope of pan-gynaecologic cancers, and multi-omics analysis for investigating the molecular mechanism of immune and stromal score-related signatures in gynaecologic cancers.
The immune and stromal landscapes of the three gynaecologic cancers varied from each other, and there was also similarity between the immune backgrounds of the three gynaecologic cancers. Six genes, including TRAV9-2, CD3G, TRBC2, LINC00861, MS4A1 and GIMAP7, all demonstrated signi cant prognostic values across three gynaecologic cancers. We assumed that the six genes might serve as the pivots for connecting the immunogenetic mechanism of the immune and stromal score-related risk signatures in gynaecologic cancers. None of the six genes have been studied in gynaecologic cancers.
We for the rst time mined out the prognostic values of these genes in pan-gynaecologic cancers and further evaluated the close association between expression of the six genes in gynaecologic cancers and tumour purity or in ltration of immune cells in the three gynaecologic cancers. GSVA analysis for immune-score-related risk signature strati ed groups of three gynaecologic cancers also provided essential clues for the crosstalk immunogenetic mechanisms underlying the initiation and progression of the three gynaecologic cancers. . Participation of the immune and stromal score-related risk signature genes in the above KEGG pathways might explain the mechanism of risk signatures across pan-gynaecologic cancers. The different mutations that occurred in risk signature genes from SNP analysis in this study also drop hints for interpreting the differential survival conditions in high-and low-risk gynaecologic patients.
Finally, validation of the risk signatures in independent datasets was an indispensable step. Although we did not nd appropriate microarrays to examine the prognostic e ciency of the immune and stromal score-related risk signatures in gynaecologic cancers, we at least veri ed the signi cant prognostic value of genes such as BNC1, CCDC80 and IHH in gynaecologic cancers. Moreover, the down-expression of IHH in UCEC was con rmed by IHC, which was in alignment with the protective effect of IHH on the prognosis of UCEC patients. IHH is a key constituent of the hedgehog (HH)  Limitations of the present study will be narrated in the following aspects. We only included three types of gynaecologic cancers with ample prognostic data in this study (CESC, OVCA and UCEC). We would construct immune and stromal score-related risk signatures in other gynaecologic cancers, such as uterine carcinosarcoma, when we collect enough survival information to test the application of the immune and stromal score-related risk signatures in pan-gynaecologic cancers. Second, in vitro and in vivo experiments were warranted to investigate the functional roles of the signature genes on the immune or stromal-related processes of gynaecologic cancers. Third, it was necessary to validate immune and stromal score-related risk signatures built in the present study through sequencing analysis of clinical gynaecologic samples when experimental conditions permit.

Conclusion
In summary, the immune and stromal score-related risk signature constructed in this study might be promising prognostic markers for pan-gynaecologic cancers.     Differentially enriched KEGG pathways between high-or low-risk groups of gynaecologic cancer patients.