3.1 Identification of module genes in silicosis and COVID-19
The quality control box plots of GSE32147 and GSE49144 datasets after being merged was shown in Figure S1. For these datasets, after removing the outlier sample with h > 18 (Fig. 2A), the optimal soft threshold, β = 9, was determined according to a scale-free fitting index of 0.80 (Fig. 2B). The silicosis gene set was then divided into eight gene modules, of which the turquoise module had the strongest negative correlation with silicosis (r = -0.90), and the blue module had the strongest positive correlation with silicosis (r = 0.41) (Fig. 2C, D). Moreover, the correlation between turquoise and blue module membership and the gene significance for silicosis was 0.92 and 0.25, respectively, which is statistically significant (Fig. 2E, F).
For the GSE157103 dataset, after removing the outlier sample with h > 100 (Fig. 2G), the optimal soft threshold, β = 12, was determined according to a scale-free fitting index of 0.80 (Fig. 2H). The COVID-19 gene set was then divided into eight gene modules, of which the magenta module had the strongest negative correlation with COVID-19 (r = -0.54), and the brown module had the strongest positive correlation with COVID-19 (r = 0.41) (Fig. 2I, J). Moreover, the correlation between magenta and brown module membership and the gene significance for COVID-19 was 0.82 and 0.40, respectively, which is statistically significant (Fig. 2K, L).
3.2 GO pathway enrichment analysis
For GO enrichment analysis, the top ten significant terms showed that the blue module of silicosis was mainly involved in ameboidal-type cell migration, epithelial cell migration, epithelial migration, muscle system processes, ossification, the regulation of protein binding, the regulation of vasculature development, response to mechanical stimulus, tissue migration, and vascular processes in the circulatory system (Fig. 3A). Concerning the GO enrichment analysis in the turquoise module of silicosis, the top ten significant terms were ameboidal type cell migration, cell chemotaxis, ERK1 and ERK2 cascade, leukocyte mediated immunity, leukocyte migration, myeloid leukocyte migration, neutrophil migration, regulation of angiogenesis, regulation of ERK1 and ERK2 cascade, and regulation of vasculature development (Fig. 3B).
Meanwhile, the top ten GO enrichment terms of the brown module in COVID-19 were lysosome organization, lytic vacuole organization, mitochondrial translational elongation, mitochondrial translational termination, mitochondrial transport, neutrophil activation involved in the immune response, neutrophil degranulation, the positive regulation of proteolysis, the proteasomal protein catabolic process, and the proteasome-mediated ubiquitin-dependent protein catabolic process (Fig. 3C). The top ten GO enrichment terms of the magenta module in COVID-19 were mainly involved in chromosome segregation, the defense response to bacteria, mitotic nuclear division, mitotic sister chromatid segregation, neutrophil activation involved in the immune response, neutrophil degranulation, nuclear chromosome segregation, nuclear division, organelle fission, and sister chromatid segregation (Fig. 3D).
3.3 Identification of shared genes in silicosis and COVID-19
According to the Venn diagram, the positive correlation module of silicosis and COVID-19 shared 20 genes, and the negative correlation module shared 23 genes (Fig. 4A). The PPI network of shared genes included 43 nodes and 33 edges, for which the PPI enrichment P-value was 1.65e-05. After removing the isolated genes, 21 genes were visualized using Cytoscape software (Fig. 4B). Based on the results of the enrichment analysis, these genes were mainly involved in nucleobase – containing small molecule metabolic pro, glial cell activation, mitotic cell cycle process, response to hydrogen peroxide, extrinsic apoptotic signaling pathway, regulation cell cycle process, regulation of cyclin-dependent protein serine/threonir, mitochondrial gene expression, neutrophil degranulation (Fig. 4C). Finally, submodule analysis showed that PRC1, KIFC1, BUB1, MCM6, CCNB2, CDKN3, and RRM2, which were hub shared genes of silicosis and COVID-19, possessed the highest MCODE scores (Fig. 4D).
3.4 Transcription factors, miRNAs, and the disease regulatory network
To reveal the regulatory mechanisms of shared genes at the transcriptome level, we utilized a network-based approach to determine the regulatory relationships between TFs and miRNAs. In the regulatory network of miRNAs, these seven shared genes were linked to 67 miRNAs, with RRM2 having the highest number of miRNA linkages (29) (Fig. 5A). Simultaneously, in the TF regulatory network, these shared genes were linked to 11 TFs, including TAF1, FOXP3, PAX3, ELF1, UBTF, AIRE, CCNE1, CCND1, HSF1, CEBPA, and MYC (Fig. 5B). The fact that there are genes that are shared between different diseases provides a direction for identifying common mechanisms for these diseases. According to the gene-disease association, 25 diseases may share these seven hub genes with silicosis and COVID-19 (Fig. 6). Interestingly, we also found that some lung diseases also share these hub genes; these include non-small cell lung carcinoma and adenocarcinoma of the lung.
3.5 Immune Infiltration Analysis of COVID-19
According to the CIBERSORT algorithm, the immune infiltration abundance of 22 immune cells in 128 samples of GSE157103 were evaluated (Fig. 7A). In the correlation analysis of immune cells, resting CD4 + T cells showed a strong positive correlation (r = 0.54) with plasma cells, whereas resting CD4 + T cells showed a strong negative correlation (r =-0.51) with M2 macrophages (Fig. 7B). The differences in immune cell abundance between patients with COVID-19 and healthy individuals were then explored. As shown in Fig. 7C, the number of naïve B cells, regulatory T cells, activated NK cells, and monocytes was higher in the COVID-19 group than in the normal group. Moreover, the numbers of plasma cells, naïve CD4 T cells, CD4 memory activated T cells, follicular helper T cells, gamma delta T cells, resting dendritic cells, and activated dendritic cells were higher in the normal group than in the COVID-19 group.
Subsequently, we analyzed the correlation between the seven hub-shared genes and 22 immune cells (Fig. 7D-J). PRC1 had the strongest positive correlation with naïve CD4 T cells and the strongest negative correlation with activated NK cells. KIFC1, MCM6, CDKN3, and RRM2 showed the strongest positive correlation with plasma cells and the strongest negative correlation with regulatory T cells. BUB1 showed the strongest positive correlation with plasma cells and the strongest negative correlation with activated NK cells, while CCNB2 had the strongest positive correlation with plasma cells and the strongest positive correlation with monocytes.
3.6 Analysis of COVID-19 single-cell data
Based on the screening criteria, 9,115 and 16,316 cells were retained in the COVID-19 and normal groups, respectively (Fig. 8A). After performing t-SNE dimension reduction, 11 cell clusters were identified (Fig. 8B). Combined with manual and single R package annotations, six cell clusters were annotated: B cells, CD4 + T cells, megakaryocytes, monocytes, and NK cells (Fig. 8C). A heatmap of the top 10 genes expressed in each cell cluster is shown in Fig. 8D. Subsequently, we visualized the cell expression of the seven shared genes on the t-SNE map (Fig. 8E). PRC1 and MCM6 had high expression levels in the various cell subsets, except in megakaryocytes (Fig. 8F). Meanwhile, KIFC1, CCNB2, CKKN3, and RRM2 were all expressed at relatively high levels in NK cells. Lastly, we also visualized the expression levels of the seven genes in the COVID-19 and normal groups using the t-SNE map (Fig. 8G).
3.7 Drug prediction and molecular docking
We predicted small-molecule compounds that might be bound by the seven shared genes in the DSigDB database. As shown in Table 1, the top 10 drugs that may bind to shared genes in the order of their binding score are testosterone, calcitriol, estradiol, cyclosporin A, lucanthone, cryptolepine, dasatinib, piroxicam, troglitazone, and resveratrol. Resveratrol is a natural polyphenol with important antiviral and antioxidant effects, which is beneficial for improving the prognosis of patients with COVID-1917, 18. Therefore, we docked resveratrol with BUB1, KIFC1, and PRC1 proteins and it showed suitable binding energies (≤ -5.0 kcal/mol) (Fig. 9A-C).
Table 1
The top 10 drugs predicted by DSigDB database that may bind to shared genes
Drug | Adj.P value | Combined Score | Binding Genes |
testosterone | < 0.0001 | 2551157.00 | CCNB2; RRM2; PRC1; KIFC1; MCM6; BUB1; CDKN3 |
calcitriol | < 0.0001 | 2055055.00 | CCNB2; RRM2; PRC1; KIFC1; MCM6; BUB1; CDKN3 |
estradiol | 0.0003 | 1173564.00 | CCNB2; RRM2; PRC1; KIFC1; MCM6; BUB1; CDKN3 |
cyclosporin A | 0.0005 | 1057437.00 | CCNB2; RRM2; PRC1; KIFC1; MCM6; BUB1; CDKN3 |
lucanthone | < 0.0001 | 14473.00 | CCNB2; RRM2; PRC1; KIFC1; BUB1; CDKN3 |
cryptolepine | < 0.0001 | 11319.00 | MCM6; BUB1; CDKN3 |
dasatinib | < 0.0001 | 4755.00 | RRM2; PRC1; KIFC1; MCM6; BUB1; CDKN3 |
piroxicam | < 0.0001 | 4219.00 | CCNB2; RRM2; PRC1; MCM6; BUB1; CDKN3 |
troglitazone | < 0.0001 | 3350.00 | CCNB2; RRM2; PRC1; MCM6; BUB1; CDKN3 |
resveratrol | < 0.0001 | 1811.00 | PRC1; KIFC1; BUB1 |