A Novel Immune Checkpoint-Related Gene Signature to Predict Overall Survival and Immunotherapy Response in Triple-Negative Breast Cancer

Triple-negative breast cancer (TNBC) is a highly aggressive subtype of breast cancer that lacks effective therapeutic targets. Immunotherapy is considered as a novel treatment strategy for TNBC. However, only some patients could benet from the treatment. Limited studies have comprehensively explored expression patterns and prognostic value of immune checkpoint genes (ICGs) in TNBC. In this study, we downloaded relevant ICGs expression proles and clinical TNBC data from the Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) database. The least absolute shrinkage and selection operator (LASSO) Cox regression analysis was employed to develop a multi-gene signature for predicting the prognostic outcome. PDCD1, PDCD1LG2 and KIR3DL2 were identied as hub genes and incorporated into the model. This gene signature could stratify patients into two prognostic subgroups, and unfavorable clinical outcomes were observed in high-risk patients. The predictive performance was assessed by the receiver operating characteristic curves. Moreover, we also analyzed differences in immune status and therapeutic response between both groups. This novel gene signature may be served as a robust prognostic marker, but also an indicator reecting immunotherapy response.


Introduction
Breast cancer (BC) is the most common malignant tumor, posing a serious threat to the health of women worldwide [1]. As a result of high heterogeneity of BC, different types differ in terms of morphology, molecular pathological features, clinical characteristics and response to treatment [2,3]. Triple negative breast cancer (TNBC), a highly aggressive subtype characterized by negative expression of hormone receptors and human epidermal growth factor receptor 2, comprises about 15-20% of all BC cases and is more common in younger patients [4,5]. Due to the paucity of effective therapeutic targets, surgery combined radiotherapy and chemotherapy is the main therapeutic approach [6]. However, the resistance of the chemoradiotherapy and the aggressive nature of TNBC trigger to the high postoperative recurrence rate which even reached 25%-40% [5] and the median survival time was only 20 months [7,8]. Therefore, it's urgent to explore new therapeutic approaches to improve treatment outcomes.
Recently, immunotherapy has continued to bring breakthroughs and surprising results in the treatment of refractory tumors such as melanoma [9], lymphoma [10] and lung cancer [11], and some studies have also demonstrated immunotherapy may provide new ideas for the treatment of TNBC [12,13]. Immune checkpoints are considered as the most important therapeutic targets and immune checkpoint inhibitors (ICIs) were well-studied in different malignancies [14]. Mechanistically, ICIs could block immunosuppressive receptors and strengthen the capability of cytotoxicity and proliferation of tumorin ltrating lymphocytes (TILs) [15]. In fact, TNBC shows pronounced genomic instability and tumor mutational loads so that it appears higher tumor immunogenicity compared with other subtypes [16,17].
Thus, TNBC is readily recognized by the immune system and expected to gain bene ts from immunotherapy. However, the overall remission rate for TNBC patients remained only 5%-23% [18], which indicated that not all individuals could bene t from the immunotherapy. Therefore, a robust biomarker is needed to identify potential bene ciary population.
Several published studies revealed that immune checkpoint genes (ICGs) correlated with immunotherapy response and prognosis of patients with nasopharyngeal carcinoma, hepatocellular carcinoma and other tumors [19,20]. However, studies for TNBC were limited to a single immune checkpoint gene and the ndings were still under debate. Few studies have systematically investigated the expression pattern of ICGs in TNBC or attempted to establish an immune checkpoint genes-based prognostic gene signature to facilitate and improve clinical decision making. In our present study, taking the advantage of genomics databases, we aimed to identify the prognostic ICGs and construct a novel gene signature for TNBC based on the transcriptomic and clinical data from the TNBC cohort in the Cancer Genome Atlas (TCGA) database and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) database.
We expected that the novel gene signature could provide more references for individual prognosis and targeted therapy in clinical work.

Data source
The transcriptome data and clinical characteristics of samples were obtained from the TCGA (https://portal.gdc.cancer.gov/) and the METABRIC database (http://molonc.bccrc.ca/apariciolab/research/metabric/). All patients were required to meet the following inclusion criteria: (1) TNBC samples determined by immunochemistry results of ER, PR and HER2 status; (2) Transcriptome data and clinical data were comprehensive and available; (3) The overall survival time was longer than one month. After screening, 113 TNBC patients in the TCGA database and 286 patients in the METABRIC database were included for the subsequent analysis. The clinical characteristics of two cohorts were presented in Table 1. We used samples in the TCGA database as the training cohort and the METABRIC database was employed as the validation cohort. The overall work ow of this research is presented in Supplementary  Table 1)[21]. The expression data of above ICGs of TNBC and normal tissues from the TCGA database were extracted. We screened out differentially expressed immune-checkpoint genes (DE-ICGs) using the "limma" package under the R environment (version 4.0). |log 2 Fold Change|>1 and p value<0.05 were the cut-off criteria. Then, the heatmap of DE-ICGs was generated using the "pheatmap" R package. Volcano-plot was plotted with Graphpad Prism 7.
GO and KEGG enrichment analysis "Cluster pro ler" R package was applied to conduct GO functional annotation analysis and KEGG pathway enrichment analysis, aiming at exploring biological functions and signaling pathways that the DE-ICGs mainly participated in. GO functional analysis includes three main categories: biological processes (BP), cellular components (CC), and molecular functions (MF). The results are visualized using the "enrichplot" package.

Identi cation of prognosis-related ICGs
To identify the prognosis-related ICGs, we rstly matched the expression data of these ICGs with the survival information of TNBC patients. The "survival" package was used to conduct univariate Cox analysis to select the prognostic genes. Statistical signi cance was determined by p<0.05. Next, LASSO regression analysis was applied to reduce collinearity between genes and prevent over-tting of the subsequently constructed prognostic risk model. Dimensionality reduction could be achieved by compressing the regression coe cients of independent variables. Finally, the genes identi ed were entered into the multivariate cox regression analysis to obtain the ICGs associated with overall survival (OS) of TNBC patients.

Construction and validation of an ICGs-based prognostic gene signature
The coe cients of hub genes in the multivariate Cox regression were used to calculate the risk score for each sample. The risk score = β 1 *X 1 +β 2 *X 2 +…+β n *X n , where X represents the gene expression values and β represents the regression coe cient of each hub gene included in the ICGs signature model. The median score was considered as the cut-off value to divide the TNBC patients into a high-risk group and a low-risk group. The Kaplan-Meier survival curves were generated by R package "Survival" to further explore the association between the risk scores and prognostic outcomes. We also used "timeROC" package to plot the receiver operating characteristic (ROC) curve of the model and evaluate the accuracy of this prognostic model by calculating the area under the curve (AUC). Meanwhile, in order to verify the stability of our gene signature, external validation was also performed in the METABRIC database.

Correlation analysis with immune cell in ltration and immunotherapy response
To further elucidate the association between immune status of TNBC patients and risk scores, a single sample gene set enrichment analysis (ssGSEA) was implemented using "GSVA" package. The in ltration score of 16 immune cell subsets and 13 immune function pathways were compared between two risk groups. Immunophenoscore (IPS), a machine learning-based algorithm, was used for the quantitative evaluation of tumor immunogenicity. It was calculated according to the Z-score of representative cell type gene expression including: immunomodulators, effector cells, immunosuppressive cells and MHC molecules. The IPS ranges from 0 to 10. A higher score indicates an increased immunogenicity and a better immunotherapy response. We retrieved patients' IPS from The Cancer Immunome Atlas (TCIA) database and compared them between two groups. At the same time, expression of four ICGs (PD1, CTLA-4, PD-L1, PD-L2) was also compared between two groups.

Establish a predictive nomogram combined with clinical parameters
To evaluate whether this gene signature could serve as an independent prognostic predictor, univariate and multivariate Cox regression analysis were carried out. Variables were included in the model as covariates including: age, AJCC stage, whether the patients have taken surgery, radiotherapy or chemotherapy and risk score. Based on independent variables in both univariate and multivariate analysis, we established a nomogram by using "rms" R package. Next, calibration curves were further plotted to evaluate the consistency between the actual and predicted survival.

Results
Differentially expressed immune-checkpoint genes in TNBC Among the 79 immune-checkpoint genes, we identi ed a total of 43 genes from TCGA dataset, and extracted the expression pro le of these genes. Compared with normal tissues, 20 immune-checkpoint genes were differentially expressed in TNBC tumor tissues, among which 19 genes were upregulated and 1 gene was downregulated. The results were visualized using the heatmap ( Figure 1A) and the volcano plot ( Figure 1B).

Functional enrichment analysis of DE-ICGs
To further investigate functions and signaling pathways these DE-ICGs were involved in, GO and KEGG pathway analysis were conducted. The GO terms "T cell activation", "positive regulation of lymphocyte activation" and "regulation of leukocyte cell-cell adhesion" displayed signi cant enrichment in the category of biological process. For the category of cellular component, MHC protein complex was found to be enriched. For the molecular function, peptide antigen binding and antigen binding were the most abundant categories (Figure 2A). Meanwhile, KEGG pathway analysis revealed that these DE-ICGs were mainly associated with cell adhesion molecules, antigen processing and presentation, PD-L1 expression and PD-1 checkpoint pathway in cancer ( Figure 2B).

Identi cation of prognosis-related ICGs
Firstly, univariate Cox regression analysis was used to identify candidate ICGs associated with overall survival (OS) of TNBC. 10 ICGs were identi ed as potential OS predictors of TNBC patients, including IDO1, CD274, PDCD1LG2, PDCD1, CTLA4, ICOS, KIR3DL2, HLA-B, HLA-F, LGA3 ( Figure 3A). All these genes were protective factors. Comparison of mRNA expression of these genes in normal and TNBC tissues was shown in Figure 3B, and correlation between ten genes was depicted in Figure 3C. Next, LASSO regression analysis showed that 3 of 10 ICGs were the OS-related ICGs including PDCD1LG2, KIR3DL2 and PDCD1 ( Figure 3D-E). Finally, the 3 OS-related ICGs were included in the multivariant Cox regression model ( Table 2). The coe cients of the 3 OS-related ICGs were shown in Figure 3F.
Development and validation of an ICGs-based gene signature for individual survival prediction Based on coe cients and expression level of the 3 genes, the gene signature was generated: Risk score=-0.20*ExpPDCD1LG2-3.20*ExpKIR3DL2-0.17*ExpPDCD1. The median risk score (-0.91) was used as cut-off point, and 113 patients were classi ed into two prognostic subgroups. As shown in Supplementary Figure 2, clinical parameters distribution and expression patterns of three genes were visualized in the form of heatmap between the two groups. The high-risk group tended to have advanced pathological stages and lower expression level of these hub genes. Next, survival analysis showed that patients in the high-risk group exhibited a worse prognosis (p=4.719E-04, Figure 4A). The distribution of risk score and survival status further corroborated the above conclusion ( Figure 4B). We then evaluated the predictive accuracy of this gene signature by operating ROC curve analysis. The AUC for predicting 1-, 2-and 3-year OS was 0.925, 0.822 and 0.835, respectively ( Figure 4C). To validate the robustness of this model, we also applied this gene signature to the METABRIC cohort. In the validation cohort, the survival of low-risk patients was superior to high-risk patients ( Figure 4D, E). Moreover, ROC curves also indicated a fairly good predictive value of 3-ICGs signature in the validation cohort ( Figure 4F).

Establish a nomogram for clinical application
Univariate analysis showed that radiotherapy (p=0.032), AJCC-stage (p<0.001) and risk score (p=0.003) were statistically signi cant in predicting the prognosis of TNBC patients ( Figure 5A). Then, we investigated if the risk score was an independent predictor for clinical outcomes. As indicated by the multivariate analysis ( Figure 5B), AJCC-stage (p<0.001) and risk score (p=0.018) were independent predictive factors for TNBC prognosis. By integrating above risk factors, we established a nomogram, which could facilitate clinicians to make risk quanti cation for TNBC patients ( Figure 5C). By adding the points signed for each variable, we can obtain the total points and predict corresponding OS of each patient. Additionally, calibration curves revealed nomogram-predicted survival matched well with the actual observations, which laid a solid foundation for clinical application ( Figure 5D-F).

Patterns of immune cell in ltration and immunotherapy response of TNBC patients
To evaluate the differences of immune status in two prognostic subgroups, enrichment scores of immune cell subsets and immune function pathways were quanti ed. As expected, adaptive immunity cells and innate immunity cells were evidently enriched in low-risk patients, demonstrating a high percentage of CD8 + T cells, B cells, DCs, macrophages and NK cells ( Figure 6A). Higher enrichment scores were observed for cytokine-cytokine receptor interaction, check-point, cytolytic activity and T cell co-stimulation signaling pathways in low-risk group. Furthermore, the scores of type and IFN response were signi cantly higher than that in the high-risk group ( Figure 6B). Altogether, these results indicated that effective immune response was more activated in the low-risk group, which may contribute to differences in the therapeutic e cacy. Meanwhile, we compared the immunogenicity in patients between two groups using IPS as a quantitative index for analysis. Our results showed that low-risk group obtained higher IPS of PD-1, CTLA4, PD-L1 and PD-L2 ( Figure 6C). Similarly, the gene expression of CTLA4, PD-1, PD-L1 and PD-L2 was elevated in low-risk group ( Figure 6D). We could speculate that low-risk group patients might get a favorable response to immunotherapy due to their higher immunogenicity.

Discussion
In recent years, cancer immunotherapy has gained great attention. And its application in TNBC also shows new developments that hold the promise for improving outcomes of patients. Nonetheless, TNBC patients could hardly achieved complete pathological remission from therapy of ICIs [12]. Therefore, it is critical to explore validated biomarkers to accurately screen out the population that might bene t from the immunotherapy. Considering the complexity of tumor heterogeneity, a single biomarker to predict prognosis and treatment response, appears not optimal and needs further improvement. The previous studies indicated that several immune checkpoint gene signature model had fairly good predictive value of long-term prognostic outcomes and therapeutic response in lung cancer [22], endometrial cancer[23] and hepatocellular carcinoma [20]. However, relevant studies in TNBC are not systematic and remain to be further explored. So, in this study, utilizing the TCGA and METABRIC database, we analyzed the expression of 43 immune checkpoint genes retrieved from literature, and screened out 3 genes independently associated with prognosis of TNBC patients. Based on above genes, a 3-gene signature was nally generated, which could be used to realize the individualized prediction of prognosis and immunotherapy response.
The novel gene signature was composed of three genes including PDCD1, PDCD1LG2 and KIR3DL2. Our ndings suggested that all these genes were related to a favorable prognosis in patients with TNBC. showed that PD-L2 may be served as an indicator of poor prognosis in some digestive system tumors, while the opposite result was found in head and neck cancer [32]. Asano et al. [33] indicated that PD-L2 expression was not correlated with the prognosis of TNBC patients, which seemed not to be consistent with the ndings of our study. However, the conclusion was limited by the small sample size. Similarly, KIR3DL2 could inhibit the tumorkilling ability by suppressing the secretion of IFN-γ and TNF-α and the cytotoxic function of NK cells [34].

PDCD1 (also known as PD-1) is an important immunosuppressive molecule, mostly expressed on activated T cells and B cells. It is responsible for mediating the immune escape of tumor cells and
The effect of Lacutamab, an anti-KIR3DL2 monoclonal antibody, is being evaluated by the clinical trial with encouraging results [35,36]. However, the role of KIR3DL2 in TNBC has not been reported and further research is required.
With the risk signature, we could divide patients into low-and high-risk groups. Survival analysis revealed that the prognosis of low-risk group was clearly superior to the high-risk group, and ROC curve analysis re ected good predictive power of the gene signature. Meanwhile, we also testi ed the robustness of risk signature by external validation. In recent years, the nomogram-plot has been widely used in medical research and practice as a tool for prognosis evaluation. In our study, we developed a nomogram integrating three independent predictors for clinical outcomes, including risk signature and pathological stage. Consistently, the calibration curves showed the clinical value of nomogram was a reliable risk strati cation tool.
The immune microenvironment is known to have a close relationship with the tumorigenicity and development of TNBC. Our ndings showed that signi cantly abundant TILs, NK cells, CD8 + T cells, Tfh cells and B cells were observed in low-risk group patients determined by the 3-ICGs signature. The proportion of TILs in TNBC was closely related to the prognostic outcome [37]. Moreover, it was generally believed that enriched NK cells, B cells, CD8 + T cells were associated with favorable prognosis in TNBC patients. Likewise, Tfh cells played a key role in regulating the anti-tumor immune response of B cells [38].
In terms of immune signaling pathways, impaired immune activity was associated with a higher risk score, including cytolytic activity, T cell co-stimulation, type or IFN-response. In addition, IPS and the expression level of CTLA-4, PD-1, PD-L1 and PD-L2 were further investigated. IPS has been used to evaluate the response of tumors to immune checkpoint inhibitors [39]. For example, when atelizumab was approved for urothelial carcinoma and lung cancer, IPS was then de ned as an evaluation index.
Compared with patients in the high-risk group, our results showed that low-risk patients had higher expression of four representative ICGs and IPS. This suggested that low-risk patients were more likely to have higher immunogenicity and expected to bene t from immunotherapy. In clinical practice, for intractable breast cancer or recurrent or advanced breast cancer, the immune status of the patient's tumor microenvironment could be determined by detecting the expression levels of three genes in puncture specimens, which will be conducive to select an appropriate treatment plan, and develop a reasonable disease monitoring strategy.
In our study, we constructed an ICG-related prognostic gene signature for TNBC patients. The gene signature contained only three genes, and was more cost-effective and easier-to-use in clinical application. Meanwhile, external validation in the METABRIC database further con rmed the reliability of this novel gene signature. Moreover, from the immune status and IPS analysis, it could also be concluded that the gene signature may function as a potential immunotherapy response predictor. All these results will provide important guidance for clinical decision making. However, there were three limitations of our study. Firstly, this gene signature was constructed and validated based on public genomics databases, lacking prospective data to con rm these results. Secondly, the study subjects were dominated by North American population, and it was unclear whether this gene signature was equally valid for other ethnic population. Finally, the biological roles of genes in this gene signature remained to be further explored using experimental methods. Therefore, we would like to collect pathology specimens and clinical data from our own center for further validation and carry out more experimental work focusing on the function and biological mechanism of these hub genes in the future work.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.