Selection of CD52 as a prognostic marker in BRCA
A univariate Cox proportional hazards regression analysis showed that 237 genes were associated with overall survival time (Supplementary figure 3). The random forest prognostic model identified 13 genes (HEXA, TIE1, TMED5, SNX5, CD52, DLAT, IFNAR1, XPNPEP2, STX4, LONRF3, SLAMF8, EXOC5, CLCN7) (Fig.1A, B). We calculated the risk score of each sample and divided the samples into high-risk and low-risk groups according to the median risk score (cutoff = 43.255). The prognoses of the high-risk and low-risk groups differed significantly (Supplementary figure 1, Supplementary figure 2). The average of the 12-, 36-, and 60-month areas under the curve (AUC) for CD52 was 0.98 (Fig.1C). According to the variable importance in Fig.1B, CD52 was the most important gene, and survival analysis indicated that the prognostic value of CD52 was significant (Fig.1D). The protein-protein interaction (PPI) network of the CD52 protein is shown in Fig.1E. High expression of CD52 was associated with low risk, indicating that CD52 is a protective factor. These results identified CD52 as a prognostic marker, and it was selected for further analysis.
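The median-based grouping described above can be illustrated with a minimal sketch. The function name, the sample identifiers, and the toy scores are hypothetical and are not taken from the study; assigning samples that fall exactly on the median to the low-risk group is also an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def split_by_median(risk_scores):
    """Split samples into 'high' and 'low' risk groups at the cohort median.

    risk_scores: dict mapping sample ID -> risk score.
    Samples with a score strictly above the median are labeled 'high';
    all others (including ties) are labeled 'low' (an assumption).
    """
    cutoff = median(risk_scores.values())
    groups = {
        sample: ("high" if score > cutoff else "low")
        for sample, score in risk_scores.items()
    }
    return cutoff, groups


# Hypothetical toy scores for illustration only (not data from the study).
toy_scores = {"S1": 10.0, "S2": 40.0, "S3": 50.0, "S4": 90.0}
cutoff, groups = split_by_median(toy_scores)
```

With a real cohort, the returned cutoff would play the role of the reported median risk score, and the group labels would feed the survival comparison.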
Relationship between expression of CD52 and clinical characteristics
By exploring the association between clinical characteristics and CD52 expression in the TCGA database, we found that CD52 expression was significantly correlated with the T stage, N stage, and survival state (p = 0.024, p = 0.047, and p = 0.007, respectively) (Fig.2A-C). Age, N stage, and stage were not significantly correlated with CD52 expression (Supplementary figure 3).
GSEA of CD52-related pathways
GSEA showed that the B cell receptor, chemokine, NOD-like receptor, Toll-like receptor, and T cell receptor signaling pathways were significantly enriched in the CD52 high-expression group, all of which are closely related to tumor immunity. In contrast, glycosylphosphatidylinositol (GPI)-anchor biosynthesis and metabolism-related pathways were significantly enriched in the CD52 low-expression group (Fig.2D).
CD52 was closely related to different signatures
The correlation analysis demonstrated that CD52 expression was positively correlated with the activation of immune-relevant signatures (Fig.3A), especially the CD8 T effector (cor = 0.758), immune checkpoint (cor = 0.758), and antigen processing machinery (cor = 0.483) signatures (Fig.3B-D).
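As an illustration of how a correlation coefficient between gene expression and a signature score can be computed, the following is a minimal sketch. It assumes a Pearson correlation, which the text does not specify; the function name and the toy vectors are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values: expression of one gene and one signature score.
toy_expression = [1.0, 2.0, 3.0, 4.0]
toy_signature = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(toy_expression, toy_signature)
```

A rank-based coefficient such as Spearman's could be substituted if the expression values are strongly skewed; the choice here is for illustration only.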
CD52 was closely related to immune metagenes
The correlation analysis indicated that CD52 expression was positively correlated with the activation of immune metagenes (Fig.4A), especially LCK (cor = 0.838), MHC-I (cor = 0.688), and HCK (cor = 0.677) (Fig.4B-D).
CD52 was correlated with immune infiltration level in BRCA
The TIMER database was used to evaluate the relationship between CD52 expression and immune infiltration level. CD52 expression was positively correlated with the infiltration levels of B cells (r = 0.466, p = 6.29e-78), CD8+ T cells (r = 0.483, p = 4.59e-58), CD4+ T cells (r = 0.645, p = 3.79e-114), macrophages (r = 0.149, p = 2.77e-06), neutrophils (r = 0.542, p = 1.45e-73), and dendritic cells (r = 0.665, p = 1.11e-122) in BRCA (Fig.5A). Using the SCNA module, we also compared tumor infiltration levels among BRCA samples with different somatic copy number alterations of CD52, including deep deletion, arm-level deletion, diploid/normal, arm-level gain, and high amplification. Fig.5B shows that CD52 was related to the infiltration level in each SCNA category (*p < 0.05; **p < 0.01; ***p < 0.001).
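The asterisk notation used for Fig.5B can be expressed as a small helper. This is only a sketch of the stated thresholds; the function name is hypothetical, and treating p values exactly equal to a threshold as not meeting it follows the strict inequalities in the text.

```python
def significance_stars(p_value):
    """Map a p value to the asterisk code: * p < 0.05, ** p < 0.01, *** p < 0.001."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


# Example with a p value reported above for B cells.
b_cell_stars = significance_stars(6.29e-78)
```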
Correlation of CD52 expression and methylation
Seven methylation sites (cg00813993, cg16068833, cg19743891, cg16664472, cg19677267, cg22517705, and cg27430637) were identified. Six of these sites had a correlation coefficient higher than 0.4 and a P value less than 0.05 (Fig.6A-F).
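The selection rule stated above (correlation coefficient higher than 0.4 and P value less than 0.05) can be sketched as a simple filter. The function name, the placeholder site labels, and all numeric values below are hypothetical and are not results from the study; the `use_abs` option is an added assumption for cases where the magnitude of a negative correlation should also count.

```python
def select_sites(site_stats, min_cor=0.4, max_p=0.05, use_abs=False):
    """Return site IDs whose correlation exceeds min_cor with p below max_p.

    site_stats: dict mapping site ID -> (correlation coefficient, p value).
    use_abs: if True, compare the absolute value of the coefficient
             (an assumption; the text only states "higher than 0.4").
    """
    selected = []
    for site, (cor, p_value) in site_stats.items():
        strength = abs(cor) if use_abs else cor
        if strength > min_cor and p_value < max_p:
            selected.append(site)
    return selected


# Hypothetical placeholder sites and values for illustration only.
toy_stats = {
    "site_a": (0.55, 0.001),
    "site_b": (0.30, 0.010),
    "site_c": (-0.60, 0.002),
    "site_d": (0.45, 0.200),
}
kept = select_sites(toy_stats)
kept_abs = select_sites(toy_stats, use_abs=True)
```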
Assessment of the expression and prognostic importance of CD52 in pan-cancers
We used the TIMER and Kaplan-Meier Plotter databases to evaluate the expression and prognostic value of CD52 in pan-cancers. CD52 expression was significantly higher in BRCA (Breast invasive carcinoma), CHOL (Cholangiocarcinoma), ESCA (Esophageal carcinoma), HNSC (Head and neck squamous cell carcinoma), KIRC (Kidney renal clear cell carcinoma), and KIRP (Kidney renal papillary cell carcinoma). In contrast, CD52 expression was significantly lower in BLCA (Bladder urothelial carcinoma), COAD (Colon adenocarcinoma), KICH (Kidney chromophobe), LUAD (Lung adenocarcinoma), LUSC (Lung squamous cell carcinoma), PRAD (Prostate adenocarcinoma), and READ (Rectum adenocarcinoma) (Figure 7A). Furthermore, CD52 had prognostic value in eight cancer types: BRCA, CESC (Cervical squamous cell carcinoma and endocervical adenocarcinoma), ESCA, HNSC, LUAD, SARC (Sarcoma), THYM (Thymoma), and UCEC (Uterine corpus endometrial carcinoma) (Figure 7B-G).