CD52 is a prognostic biomarker and correlated with immune features in breast cancer

Background: Breast cancer (BRCA) is the most commonly diagnosed cancer in women; it is an aggressive cancer with a high mortality rate. CD52 and its monoclonal antibody (Alemtuzumab) play a critical role in inflammatory diseases, but the relationship between CD52 and BRCA is not clear. Methods: We first used the random forest algorithm to find the most critical genes related to the prognosis of BRCA patients. Then, based on RNA-seq and clinical data from the TCGA dataset, we explored the relationship of CD52 with immune response-related pathways and immune metagenes. A pan-cancer analysis was used to assess the importance of CD52 in a variety of tumors. Results: CD52 was related to the prognosis of BRCA patients (p < 0.001). Subsequent analysis based on RNA-seq and clinical data from the TCGA dataset revealed that CD52 is positively correlated with immune response-related pathways and immune metagenes. TIMER analysis showed that CD52 expression was positively correlated with the infiltration levels of B cells, CD4+ T cells, CD8+ T cells, macrophages, neutrophils, and dendritic cells (DCs) in BRCA (r = 0.466, r = 0.645, r = 0.483, r = 0.149, r = 0.542, and r = 0.665, respectively; p < 0.001). Methylation at six CpG sites (cg16068833, cg19743891, cg16664472, cg19677267, cg22517705, and cg27430637) was negatively correlated with CD52 expression (r = -0.662, r = -0.629, r = -0.598, r = -0.519, r = -0.492, and r = -0.445, respectively; p < 0.001). Furthermore, CD52 expression was significantly associated with T stage, N stage, and survival state (p = 0.024, p = 0.047, and p = 0.007, respectively). The results of the pan-cancer study suggest that CD52 may play an important role in the occurrence, development, and prognosis of multiple tumors. Conclusions: These findings suggest that CD52 is a promising immunotherapy target and has prognostic value for BRCA.

Background
Breast cancer (BRCA) is the most commonly diagnosed cancer in women and the second-deadliest cancer after lung cancer. The overall breast cancer death rate increased by 0.4% per year in the two decades after 1975, although by 2017 it had declined by 40%. According to 2019 estimates from the American Cancer Society, about 13% of women are likely to be diagnosed with invasive breast cancer during their lifetime [1]. Invasive breast cancer accounts for a significant proportion of cases [2,3]. Early-stage disease (stages I and II) has a favorable prognosis, with 5-year survival rates of 98% and 92%, respectively, owing to the popularization of mammography and advances in targeted therapy.
Nevertheless, the prognosis and five-year survival rate of breast cancer vary significantly with scale, region, age, clinical stage, molecular phenotype, and local immune infiltration. As a result, poor prognosis is not rare [1,4]. Thus, more sensitive and useful prognostic biomarkers are still needed.
Tumor-associated macrophages (TAMs) are the most critical components of tumor-infiltrating immune cells in the tumor microenvironment [5]. TAMs have two principal functional states: proinflammatory M1 macrophages, which are regarded as protective factors that eliminate tumor cells, and alternatively activated M2 macrophages, which are regarded as unfavorable factors that promote tumor proliferation [3,6,7]. Previous research has established that macrophages can decrease the expression of estrogen and progesterone receptors while increasing the expression of urokinase-type plasminogen activator receptor and Ki67 in breast cancer. In addition, those results demonstrated a significant positive association between TAMs and poor prognosis in breast cancer patients [8].
In this study, we first performed univariate Cox proportional hazards analysis to select prognostic macrophage-related genes. Then, a random forest was used to build a 13-gene signature for BRCA, and the variable importance suggested that CD52 was the most critical gene, so it was chosen for further analysis. CD52 (CAMPATH-1 antigen) is a glycosylphosphatidylinositol (GPI)-anchored protein of 12 amino acids present on the surface of immune cells, including monocytes/macrophages [9,10].
Piccaluga et al. found that CD52 was up-regulated in peripheral T-cell lymphoma, and that estimating CD52 expression might provide a rationale for predicting treatment response [11].
Moreover, Alemtuzumab, an anti-CD52 monoclonal antibody, has been investigated as a targeted immunotherapy for acute myeloid leukemia [12]. Nevertheless, the relationship between CD52 expression, prognostic value, and immune infiltration in BRCA is not clear. Therefore, we analyzed the clinical and molecular data of CD52 in BRCA samples from the TCGA dataset to explore the expression of CD52 and its relationship with immune-related molecules. This may provide a possible basis for the use of Alemtuzumab in the treatment of BRCA patients.

Data source and download
We downloaded the available RNA-seq and clinical data of invasive breast cancer patients from the TCGA database (https://portal.gdc.cancer.gov). The RNA-seq results were combined into a gene expression matrix. We obtained methylation data for patients with BRCA and normal tissue controls from the UCSC Xena browser (https://xenabrowser.net/).

Extraction of macrophage-related gene matrix and selection process of the target gene
We extracted macrophage-related gene expression patterns according to the gene signatures of M1 and M2 macrophages in the literature [13]. We performed univariate Cox proportional hazards regression to identify the macrophage-related genes associated with overall survival time (P < 0.05 was considered statistically significant). We then used a random forest to establish a prognostic model.
The most crucial gene, ranked by variable importance, was selected as the target gene for further analysis. The Kaplan-Meier (KM) method was used to evaluate survival differences. The receiver operating characteristic (ROC) curve was used to assess the accuracy of the model prediction. We used the STRING database (https://string-db.org/), version 11.0, to obtain protein-protein interaction (PPI) network information for the target gene.
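As an illustration of the Kaplan-Meier step, the following minimal Python sketch estimates survival curves for high and low expression groups. It is not the code used in this study: the median split, the function names, and the toy cohort are assumptions made purely for illustration.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    # Walk through the distinct follow-up times in increasing order.
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


# Hypothetical toy cohort (not TCGA data): expression, follow-up months, death flag.
expression = [2.1, 8.4, 3.3, 9.0, 1.7, 7.5, 2.9, 8.8]
months = [12, 60, 20, 72, 8, 55, 30, 65]
died = [1, 0, 1, 0, 1, 1, 1, 0]

groups = split_by_median(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])

for label, curve in curves.items():
    print(label, curve)
```

In practice the two curves would be compared with a log-rank test; the sketch only covers the estimator itself.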

Relationship between CD52 expression and clinical characteristics
We used the "ggstatsplot" package to examine the relationship between CD52 expression in the TCGA database and six clinical characteristics (age, survival state, stage, T stage, M stage, and N stage).

GSEA-based enriched KEGG analysis
To identify pathways differentially activated in BRCA, we performed GSEA (Gene Set Enrichment Analysis)-based KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis between the low and high CD52 expression phenotypes using GSEA software. An enrichment score (ES) > 0.4 was used as a filter, and a false discovery rate (FDR) value < 0.05 was considered statistically significant.
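To make the enrichment score concrete, the sketch below computes the weighted running-sum statistic on which GSEA is based. It is a simplified illustration rather than the GSEA software itself: the gene names and scores are hypothetical, the FDR (which requires permutation testing) is taken as a given input, and applying the 0.4 cutoff to the absolute ES is an assumption.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes : gene names sorted by the ranking metric, descending
    scores       : ranking metric for each gene, in the same order
    gene_set     : set of genes belonging to the pathway
    weight       : exponent applied to |score| when a pathway gene is hit
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up (weighted) on a pathway gene, step down (uniform) otherwise.
        running += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        # The ES is the running sum at its largest deviation from zero.
        if abs(running) > abs(best):
            best = running
    return best


def passes_filter(es, fdr, es_cutoff=0.4, fdr_cutoff=0.05):
    """Assumed reading of the thresholds: |ES| > 0.4 and FDR < 0.05."""
    return abs(es) > es_cutoff and fdr < fdr_cutoff


# Hypothetical ranked list: positive scores = higher in the CD52-high group.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_up = enrichment_score(genes, metric, {"G1", "G3"})    # enriched at the top
es_down = enrichment_score(genes, metric, {"G5", "G6"})  # enriched at the bottom
print(es_up, es_down)
```

A positive ES indicates that the gene set is concentrated among genes higher in the high-expression group, and a negative ES indicates the opposite.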

ssGSEA analysis revealed the immune features of CD52 in BRCA
We downloaded 16 signatures from the literature [14], including immune-relevant, stromal-relevant, and mismatch-relevant signatures. The list of these genes is shown in Supplementary Table 1. We performed ssGSEA (Single-Sample GSEA) analysis to determine the enrichment scores of immune features using the GSVA package in R [15]. We then calculated Pearson correlation coefficients between CD52 expression and the immune features.

Association between CD52 expression and immune metagenes
We downloaded seven immune metagenes from the literature [16], including IgG, HCK (hemopoietic cell kinase), MHC-I (major histocompatibility complex-I), MHC-II (major histocompatibility complex-II), LCK (lymphocyte-specific kinase), STAT1 (signal transducer and activator of transcription 1), and Interferon. The list of these genes is shown in Supplementary

Correlation of CD52 expression and methylation
We obtained DNA methylation data from the UCSC Xena browser. We calculated the correlation between CD52 expression and the methylation of each CpG site using Pearson's correlation analysis. A correlation coefficient of 0.4 was used as the filter value.
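The correlation filter can be sketched as follows. This is a minimal illustration in plain Python, not the analysis code of this study: the probe names and values are hypothetical, and applying the 0.4 cutoff to the absolute value of r is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def filter_probes(expression, methylation, cutoff=0.4):
    """Keep probes whose |r| with expression reaches the cutoff (assumed rule)."""
    selected = {}
    for probe, beta_values in methylation.items():
        r = pearson_r(expression, beta_values)
        if abs(r) >= cutoff:
            selected[probe] = r
    return selected


# Hypothetical toy data: one expression value and one beta value per sample.
cd52_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
methylation = {
    "probe_a": [0.9, 0.8, 0.6, 0.5, 0.2],  # falls as expression rises
    "probe_b": [0.5, 0.4, 0.6, 0.5, 0.5],  # roughly flat
}

selected = filter_probes(cd52_expression, methylation)
print(selected)
```

Only probe_a survives the filter here, and its negative r mirrors the direction reported for the CpG sites in this study.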

Assessment of the expression and prognostic importance of CD52 in pan-cancers
We used the TIMER database to evaluate the expression of CD52 in pan-cancers. Moreover, we assessed the prognostic importance of CD52 in pan-cancers using the TCGA database and the Kaplan-Meier Plotter database (http://kmplot.com/analysis/).

(Fig.1C). Based on the variable importance in Fig.1B, CD52 is the most crucial gene, and the survival analysis indicated that the prognostic value of CD52 was significant (Fig.1D). The PPI network of the CD52 protein is shown in Fig.1E. High expression of CD52 was associated with low risk and was a protective factor. These results indicated that CD52 is a prognostic marker worthy of further analysis.

Relationship between expression of CD52 and clinical characteristics
By exploring the association between clinical characteristics and CD52 expression in the TCGA database, we found that CD52 expression was significantly associated with T stage, N stage, and survival state (p = 0.024, p = 0.047, and p = 0.007, respectively) (Fig.2A-C). Age, M stage, and stage were not significantly correlated with CD52 expression (Supplementary Figure 3).

GSEA analysis of CD52-related pathway
GSEA analysis showed that the B cell receptor, chemokine, NOD-like receptor, Toll-like receptor, and T cell receptor signaling pathways were significantly enriched in the CD52 high expression group, all of which are closely related to tumor immunity. In contrast, glycosylphosphatidylinositol (GPI)-anchor biosynthesis and metabolism-related pathways were significantly enriched in CD52 low expression samples (Fig.2D).

Assessment of the expression and prognostic importance of CD52 in pan-cancers
We used the TIMER and Kaplan-Meier Plotter databases to evaluate the expression and prognostic value of CD52 in pan-cancers.

Discussion
Based on the expression of macrophage-related genes in BRCA patients from TCGA, univariate Cox proportional hazards analysis and the random forest algorithm were used to build a prognostic model. The variable importance suggests that CD52 is the most critical gene. Moreover, patients with high CD52 expression have a better prognosis. CpG methylation typically results in abnormal gene expression [19]. In our study, methylation at six CpG sites (cg16068833, cg19743891, cg16664472, cg19677267, cg22517705, and cg27430637) was negatively correlated with CD52 expression (r = -0.662, r = -0.629, r = -0.598, r = -0.519, r = -0.492, and r = -0.445, respectively; p < 0.001). DNA methylation occurs most commonly at CpG dinucleotides and is related to the clinicopathological features of BRCA patients, including stage and histological grade [20,21]. Furthermore, CD52 expression was significantly associated with T stage, N stage, and survival state. These results suggest that CD52 is important for the prognosis of BRCA patients.
CD52 epitopes are expressed on the surface membrane of peripheral lymphocytes, monocytes, and macrophages, and on the epithelial membrane of the male reproductive system [22]. Rashidi et al. demonstrated that CD52 can inhibit the activation of NF-κB by inhibiting signal transduction through Toll-like receptors and the tumor necrosis factor receptor, and thus inhibit the production of inflammatory cytokines by macrophages, monocytes, and dendritic cells [23]. As the GSEA results show, high CD52 expression was associated with significant enrichment of a variety of immune-related pathways, such as the B cell receptor, chemokine, NOD-like receptor, Toll-like receptor, and T cell receptor signaling pathways. These enriched pathways are closely related to immune infiltration in cancer [24,25].
Previous studies have demonstrated that immune cell infiltration is associated with activation of the immune response, which contributes to anti-tumor effects and a better prognosis in BRCA [26][27][28]. Using ssGSEA analysis to clarify the correlation between CD52 expression and immune features, we found that CD52 expression is related to T cell and macrophage-related pathways and functions. Previous studies have confirmed that immune infiltration is widespread in breast cancer tissues and affects patient prognosis [29,30]. We further investigated the correlation of CD52 expression with the infiltration levels of multiple immune cell types in BRCA. Our results indicated a significant positive correlation between CD52 expression and the infiltration levels of CD8+ T cells, CD4+ T cells, B cells, DCs, and neutrophils in BRCA. Together, these data indicate that CD52 plays a role in tumor immunity in BRCA.
Alemtuzumab, an anti-CD52 monoclonal antibody, has been used in the treatment of various immune-related diseases, including multiple sclerosis and inflammatory myopathy [31,32]. We found that CD52 expression is abnormal in breast cancer patients and that it is involved in regulating tumor immunity. Meanwhile, the results of the pan-cancer study suggest that CD52 is differentially expressed in multiple tumors and may play an essential role in their occurrence, development, and prognosis. These results may indicate the possibility of expanding the application of Alemtuzumab in tumor immunotherapy.
However, the current study is limited by the absence of experimental evidence. Our PPI results identified proteins that interact with CD52, which may provide a basis for further mechanistic studies.

Conclusion
These findings will deepen our understanding of CD52 expression, prognosis, and immune-related features in BRCA. CD52 is a promising immunotherapy target for most cancer patients, and the drug Alemtuzumab may also bring new hope for cancer immunotherapy.