1. Analysis of differentially expressed genes in SLE and HCC
The SLE mRNA-Seq datasets GSE17755, GSE46923, GSE50772, GSE72326 and GSE14860 and the HCC mRNA-Seq datasets GSE14323, GSE14520, GSE25097, GSE36376 and GSE76427 were downloaded from the GEO database. The SLE and HCC data were collated and normalized separately, and the mRNA expression data of each disease were then screened using GEO2R (Fig. 2.A, B), the "Limma" R package (Fig. 2.C, D) and WGCNA (Fig. 3.A-F). The screening thresholds were LogFC ≥ 1 and P < 0.05. In the WGCNA analysis, the MEblack module in SLE and the MEyellow module in HCC were selected as the modules of interest based on the significance of expression differences. GEO2R, the "Limma" R package and WGCNA identified 234, 286 and 1710 differentially expressed genes (DEGs) in HCC and 263, 64 and 85 DEGs in SLE, respectively. We then compared the results of the three methods within each disease, and genes identified by at least two of the three methods were defined as the DEGs of this study. This yielded 254 DEGs in HCC (Fig. 3.G) and 47 DEGs in SLE (Fig. 3.H). Finally, we intersected the DEGs of the two diseases and found that three genes were significantly differentially expressed in both SLE and HCC (Fig. 3.I), namely TOP2A, DTL and CCNB2 (Fig. 3.J-L, in that order).
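The two-step selection described above (keep genes reported by at least two of the three methods within each disease, then intersect the two disease-level lists) can be illustrated with a minimal Python sketch. This is not the code used in the study; the function names and the toy gene identifiers (G1, G2, ...) are hypothetical placeholders chosen only to show the logic.

```python
from collections import Counter


def consensus_degs(method_results, min_methods=2):
    """Return genes reported by at least `min_methods` screening methods.

    `method_results` maps a method name to the set of DEGs it reported.
    """
    counts = Counter(
        gene for genes in method_results.values() for gene in set(genes)
    )
    return {gene for gene, n in counts.items() if n >= min_methods}


def shared_degs(disease_a, disease_b, min_methods=2):
    """Intersect the consensus DEG sets of two diseases."""
    return consensus_degs(disease_a, min_methods) & consensus_degs(
        disease_b, min_methods
    )


# Toy inputs (hypothetical identifiers, not the study's gene lists).
hcc = {
    "GEO2R": {"G1", "G2", "G3"},
    "limma": {"G2", "G3", "G4"},
    "WGCNA": {"G3", "G5"},
}
sle = {
    "GEO2R": {"G2", "G6"},
    "limma": {"G2", "G8"},
    "WGCNA": {"G3", "G7"},
}

hcc_consensus = consensus_degs(hcc)  # genes seen in >= 2 HCC methods
sle_consensus = consensus_degs(sle)  # genes seen in >= 2 SLE methods
common = shared_degs(hcc, sle)       # genes in both consensus sets
print(sorted(hcc_consensus), sorted(sle_consensus), sorted(common))
```

Counting occurrences across methods, rather than intersecting all three sets, is what implements the "at least two methods" rule; setting `min_methods=3` would reduce it to a strict three-way intersection.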
2. Characteristic gene screening
After the initial screening, three genes were shared between the DEGs of SLE and HCC. We further screened these three genes with the random forest (RF) algorithm based on their expression. The best NTree was selected according to the minimum cross-validation error over 10 cross-validations, and MTRY and NTree were set to 76 and 500 in our study (Fig. 4.A). Based on the importance circle plot, we selected the top two genes and defined them as the common characteristic genes of SLE and HCC (Fig. 4.B).
3. Expression analysis of characteristic genes in SLE and HCC
After identifying the common signature genes of SLE and HCC, we analyzed the expression of TOP2A and CCNB2 in both diseases. The expression levels of TOP2A and CCNB2 were significantly upregulated in both SLE and HCC compared with the normal population (Fig. 4. C-F). To validate these results, we examined the expression of the two characteristic genes in HCC mRNA-Seq data from the TCGA database and reached the same conclusion (Fig. 4. G, H). We then examined the immunohistochemical staining results of the two signature genes in HCC using the HPA database (https://www.proteinatlas.org/). The levels of both TOP2A and CCNB2 were significantly upregulated in HCC tissues compared with paracancerous tissues, in line with our previous analysis (Fig. 4. I, J).
4. Protein interaction network and GO/KEGG analysis
We then explored the genes related to TOP2A and CCNB2, considering co-expression, co-localization, gene co-occurrence, gene neighborhood, etc., with Confidence > 0.40. The results are presented as protein interaction networks in Fig. S1.A, B in the Supplementary Material. We also used the MCODE plugin in Cytoscape to extract the main submodules of the two protein interaction networks separately, and the results are shown in Fig. S1.C, D in the Supplementary Material. To clarify the functions of TOP2A, CCNB2 and their related genes, we performed GO and KEGG analyses on the genes in each protein interaction network. The interacting genes of TOP2A were mainly enriched in organelle fission, nuclear division, and chromosome segregation at the biological process (BP) level; in the spindle, condensed chromosome, and chromosome, centromeric region, etc. at the cellular component (CC) level; and in microtubule motor activity, microtubule binding, and tubulin binding at the molecular function (MF) level. In the KEGG analysis, they were mainly enriched in cell cycle-related pathways. The interacting genes of CCNB2 were mainly enriched in nuclear division, mitotic nuclear division, and mitotic sister chromatid segregation, etc. at the BP level; in the spindle, chromosomal region, and chromosome, centromeric region, etc. at the CC level; and in microtubule binding at the MF level. In the KEGG analysis, they were also enriched in cell cycle-related pathways (Fig. S1.E, F).
The GO and KEGG analyses thus showed that the interacting genes of TOP2A and CCNB2 were mainly enriched in functions and pathways related to cell division and the cell cycle, suggesting that the functions of TOP2A and CCNB2 may be closely related to these processes.
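For readers unfamiliar with how a term comes to be called "enriched", the sketch below shows the one-sided hypergeometric test that is commonly used for over-representation analysis. The text does not state which test or tool produced the GO/KEGG results, so this is an assumption for illustration only; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_query, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_universe: number of genes in the background
    n_term:     background genes annotated to the GO/KEGG term
    n_query:    genes in the query list (e.g. the network genes)
    n_overlap:  query genes annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 annotated to a term,
# 4 query genes of which 2 carry the annotation.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(round(p_value, 4))
```

In practice the p-values of all tested terms would also be corrected for multiple testing before terms are reported as enriched.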
5. Diagnostic efficacy analysis, survival analysis, and risk model construction
We then evaluated the diagnostic efficacy of TOP2A and CCNB2 in HCC. The diagnostic ROC-AUC of both genes was greater than 0.9, indicating good diagnostic efficacy in HCC (Fig. 5. A, B). We also investigated the relationship between the two genes and the survival prognosis of HCC patients. Upregulated expression of TOP2A and CCNB2 was associated with poorer prognosis in terms of overall survival (OS), progression-free interval (PFI) and disease-specific survival (DSS) (Fig. 5. C-H). Based on the above analysis, we considered that TOP2A and CCNB2 not only showed good diagnostic ability for HCC but were also significantly correlated with the survival prognosis of patients, making them important biomarkers and predictive targets for HCC. Therefore, we constructed a nomogram risk prediction model for HCC based on the expression levels of TOP2A and CCNB2 (Fig. 5.I). To validate the predictive performance of the nomogram, we first plotted a calibration curve to assess the model fit. The model predictions fit well with the standard curve (Fig. 5.J), suggesting that the model has excellent predictive performance. In the clinical impact curve, the positive rate predicted by the model was close to the true positive rate, suggesting that the disease specificity of the model is high (Fig. 5.K). We also performed decision curve analysis (DCA) of the model. The decision curve showed a high net clinical benefit, again suggesting that the model has good predictive efficacy and good potential for clinical application (Fig. 5.L).
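Two of the quantities used above can be made concrete with a short Python sketch: the ROC-AUC, computed here as the probability that a randomly chosen case scores higher than a randomly chosen control, and the net benefit plotted in a decision curve, computed at a given risk threshold pt as TP/n - FP/n x pt/(1 - pt). This is an illustration under stated assumptions rather than the study's pipeline; the function names and the toy values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit at one risk threshold (0 < threshold < 1)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = tumor, 0 = normal.
labels = [0, 0, 0, 1, 1, 1]
expression = [1.0, 2.0, 3.5, 3.0, 4.0, 5.0]   # e.g. expression of one gene
predicted = [0.1, 0.2, 0.6, 0.4, 0.7, 0.9]    # e.g. model-predicted risk

auc = roc_auc(expression, labels)
nb = net_benefit(predicted, labels, threshold=0.5)
print(round(auc, 3), round(nb, 3))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.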
6. Tumor immune correlation analysis
We also analyzed the association of TOP2A and CCNB2 with different immune checkpoints. TOP2A was positively correlated with many immune checkpoints, such as CD276, TNFSF4, LAIR1, PDCD1, CTLA4, CD86, ICOS, TNFSF15, CD80, HAVCR2 and NRP1 (R > 0.25, Fig. 6.A). CCNB2 was also positively correlated with common immune checkpoints, such as CD276, PDCD1, LAG3, TNFRSF18, TNFSF4, LGALS9, CTLA4, LAIR1, CD86 and HAVCR2 (R > 0.30, Fig. 6.B). In addition, we found widespread positive correlations among the immune checkpoints themselves, suggesting a degree of co-expression between different immune checkpoints, which may be useful for further research on multi-targeted drugs.
Tumor immunity is an important part of tumor biology, and the immune response in the tumor microenvironment plays an important feedback role in tumor development and treatment and is closely related to patient prognosis. The composition of immune cells in the tumor microenvironment of HCC was analyzed using CIBERSORT, as shown in Fig. 6.C. Comparing the infiltration level of each immune cell type between normal tissues and HCC showed that B cells memory, T cells regulatory (Tregs), NK cells activated and Macrophages M0 were significantly increased in HCC compared with normal tissues, while B cells naive, T cells CD4 memory resting, T cells CD8, T cells gamma delta, Monocytes, Macrophages M1 and Macrophages M2 were significantly decreased (Fig. 6.D, P < 0.05).
Following this, we analyzed the relationships between different immune cells. Except for a few immune cells, such as Mast cells resting and NK cells activated, and T cells regulatory (Tregs) and B cells memory, which showed significant positive correlations, most immune cells were negatively correlated with each other. Examples include Macrophages M0 and Macrophages M1, Macrophages M0 and Monocytes, Macrophages M0 and T cells CD8, Macrophages M0 and T cells gamma delta, T cells follicular helper and T cells CD4 memory resting, T cells CD4 memory resting and T cells CD8, and T cells CD4 memory resting and T cells CD4 memory activated, etc. (|R| > 0.40, P < 0.001, Fig. 6.E).
In addition, we investigated the correlation between the two characteristic genes, TOP2A and CCNB2, and the levels of immune cell infiltration. Both TOP2A and CCNB2 showed a strong positive correlation with Th2 cells (R > 0.70). CCNB2 showed a negative correlation with Th17 cells and Neutrophils (Fig. 6.F, |R| > 0.35, P < 0.05). TOP2A showed a negative correlation with DC, Th17 cells and Neutrophils (Fig. 6.G, |R| > 0.35, P < 0.05).
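The correlation screening in this section amounts to computing a correlation coefficient between a gene's expression and each immune feature across samples, then keeping the features whose coefficient exceeds a cutoff in absolute value. The sketch below uses Spearman's rank correlation, which is an assumption because the text does not name the coefficient; the function names and the toy values are hypothetical, and significance testing is omitted.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def strong_correlations(gene_expr, cell_scores, cutoff):
    """Keep cell types whose |R| with the gene exceeds `cutoff`."""
    kept = {}
    for cell, scores in cell_scores.items():
        r = spearman(gene_expr, scores)
        if abs(r) > cutoff:
            kept[cell] = r
    return kept


# Toy data (hypothetical): one gene and three cell-type scores in 5 samples.
gene = [1.0, 2.0, 3.0, 4.0, 5.0]
cells = {
    "Th2 cells": [2.0, 4.0, 5.0, 8.0, 9.0],
    "Th17 cells": [9.0, 7.0, 6.0, 3.0, 1.0],
    "Other": [3.0, 1.0, 4.0, 2.0, 5.0],
}
selected = strong_correlations(gene, cells, cutoff=0.6)
print({cell: round(r, 2) for cell, r in selected.items()})
```

Filtering on the absolute value is what allows both positive and negative associations to pass the same cutoff; the sign of the retained coefficient then indicates the direction.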
7. Drug sensitivity analysis
The association between the expression levels of TOP2A and CCNB2 and sensitivity to common drugs was predicted with the pRRophetic algorithm, based on drug half-maximal inhibitory concentrations and HCC gene expression profiles. Sensitivity to dozens of drugs, including 5-Fluorouracil, Crizotinib, Cyclopamine, Dasatinib, Doxorubicin, Erlotinib and Gemcitabine, increased significantly with increasing TOP2A expression (Fig. 7.A, P < 0.001). Similarly, sensitivity to dozens of drugs, such as Crizotinib, Cyclopamine, Doxorubicin, Etoposide, Midostaurin, Parthenolide and Sunitinib, increased significantly with increasing CCNB2 expression (Fig. 7.B, P < 0.001). These results allow the sensitivity of HCC to some clinical drugs to be inferred and can guide further studies of HCC therapeutic drugs, which may provide new therapeutic strategies for HCC, although a large number of clinical trials will be needed to confirm them.
8. Pathway analysis
The preceding functional and pathway analysis of the interacting proteins of the two signature genes showed that they are mainly involved in cell proliferation and cell cycle-related biological processes, so we further explored the pathways in which the two signature genes are involved. The results showed that inhibition of CCNB2 expression in the P53 signaling pathway resulted in cell cycle arrest in the G2/M phase. In contrast, we found that the expression of CCNB2 was significantly upregulated in HCC (Fig. 8). Therefore, we speculate that the upregulation of CCNB2 expression may promote the G2/M phase transition and inhibit apoptosis in HCC cells, which may be closely related to the proliferation and derivation of tumor cells.