Immune cell infiltration among controls, mild, and severe COVID-19 patients
The results, derived from CIBERSORT algorithm of R program language, converts the normalized gene expression data into 22 immune cell signatures. As can be seen from the distribution of immune cells (Figure 1A), the data were displayed in three groups: control, mild COVID-19 patients, and severe COVID-19 patients. Each column represents a sample, each color represents a different type of immune cell, and the sum is 1. Figure 1B presents an analysis diagram of the principal components of different groups of immune cells. We conduct dimension reduction analysis on these groups of data and conclude that there is a certain degree of clustering in these three groups, but separation and low overlap also exist, which can be used for subsequent analysis. We can observe that the ratio of lymphocytes in COVID-19 patients gradually decreased and the proportion of neutrophils slightly increased with the development of the disease. The results of the heat map of immune cell correlation showed that there were strong correlations between cells, such as navie B cells and gamma delta T cells (R=0.7), and T cells Gamma delta and Eosinophils (R=0.84) (Figure 1C).
We make a pairwise comparison of the data for these three groups. Compared with the controls, the memory B cells and the macrophages in the M0 period in the mild group decreased, while the number of monocytes increased statistically significantly (Figure 2A). Memory B cells in severe Covid-19 patients continued to decline significantly, and monocyte levels continued to rise statistically (Figure 2B). And there was also a decrease in CD8+T cells and resting CD4+ memory T cells. By comparing mild and severe cases, we more intuitively found that the number of memory B cells continues to decline, and resting NK cells and resting mast cells increase with the disease development (Figure 2C). At the same time, we used Lasso regression to screen the immune cells of significance in COVID-19 infection based on the controls and COVID-19 patient samples (Figure 2D, 2E). The intersection of these two methods is taken to obtain the final four meaningful immune cells, that is, memory B cells, macrophages in the M0 stage, resting memory CD4 T cells, and resting mast cells.
Identification of DE-IRGs
GEO2R tool was used to analyze the dataset GSE183071, while the gene expression normalized data with control, mild and severe patients collected in this dataset were comprehensively compared. Since it’s an immune-related gene chip, we finally got a total of 198 DE-IRGs according to adj.P.Val <0.05 (Additional files 2). In order to figure out the dynamic process of gene expression in different stages of COVID-19, we obtained genes by pair comparison of these three stages (screening criteria: adj.P.Val <0.05, |log2 (Fold Change)| >1). Differentially expressed genes in each group were screened and visualized with R package. We obtain the relevant volcano map (Figure 3) and a collection of three data sets comparing the differentially expressed genes in pairs (Additional file 3). Red represents up-regulated genes, blue represents down-regulated genes, and gray represents genes whose expression is not statistically significant. As shown in Fig. 3A and 3B, compared with the mild group, more immune-related differentially expressed genes were obtained in the severe group (4 in the mild group and 32 in the severe group), indicating that the immune response of the organism in the severe stage was more complex. And the gene FCER1A was significantly down-regulated in both mild and severe COVID-19 patients. Finally, we set the intersection between the differential genes obtained from the comprehensive comparison of the three groups and the genes compared in pairs. Finally, 36 DE-IRGs were obtained from the blood expression profile data of GSE183071 (Table 1).
We also made q-q diagram, umap diagram, mean-variance diagram, density diagram and box diagram to ensure the availability of GSE183071 dataset data (Figure 4).
Table 1:36 DE-IRGs from COVID-19 patients blood sample chip
ID
|
adj.P.Val
|
P.Value
|
F.Value
|
MAPK14
|
0.00000713
|
1.44E-08
|
27.9963
|
IRAK3
|
0.00000826
|
3.78E-08
|
25.84862
|
TLR2
|
0.00000826
|
7.96E-08
|
24.25522
|
FCER1A
|
0.00000826
|
8.35E-08
|
24.15655
|
ARG1
|
0.00000896
|
1.27E-07
|
23.2879
|
TLR5
|
0.00000996
|
1.81E-07
|
22.5589
|
IL18R1
|
0.00002481
|
5.51E-07
|
20.35938
|
FKBP5
|
0.00002747
|
6.66E-07
|
19.99774
|
TNFRSF17
|
0.00007726
|
2.34E-06
|
17.66412
|
GPR183
|
0.00009787
|
3.26E-06
|
17.06953
|
BCL6
|
0.00020076
|
7.59E-06
|
15.59795
|
IL1R2
|
0.00020076
|
8.46E-06
|
15.41328
|
TCF7
|
0.00020076
|
8.52E-06
|
15.40219
|
CEACAM8
|
0.00022924
|
1.20E-05
|
14.81809
|
S100A9
|
0.0002757
|
1.66E-05
|
14.2798
|
CD163
|
0.00030508
|
1.91E-05
|
14.05323
|
LTF
|
0.00032037
|
2.16E-05
|
13.85047
|
CD5
|
0.00032037
|
2.20E-05
|
13.8224
|
IFITM1
|
0.00033472
|
2.37E-05
|
13.704
|
CLEC4E
|
0.00035506
|
2.65E-05
|
13.51851
|
IL18RAP
|
0.00050176
|
4.28E-05
|
12.75556
|
CD4
|
0.00051411
|
4.57E-05
|
12.65173
|
CEACAM6
|
0.000569
|
5.33E-05
|
12.41118
|
CIITA
|
0.00077928
|
8.05E-05
|
11.77173
|
IL7R
|
0.0011739
|
1.28E-04
|
11.06578
|
CAMP
|
0.00228898
|
3.24E-04
|
9.70075
|
LEF1
|
0.002489
|
3.57E-04
|
9.55989
|
TGFBI
|
0.00461156
|
9.03E-04
|
8.25603
|
MS4A1
|
0.00510872
|
1.05E-03
|
8.04587
|
CTSG
|
0.00672334
|
1.44E-03
|
7.62101
|
CD24
|
0.00951044
|
2.17E-03
|
7.07321
|
ZBTB16
|
0.01363807
|
3.75E-03
|
6.36077
|
C1QB
|
0.01851245
|
5.80E-03
|
5.80373
|
S100A8
|
0.01909469
|
6.09E-03
|
5.74043
|
SERPING1
|
0.0207565
|
6.71E-03
|
5.61959
|
MX1
|
0.04178061
|
1.65E-02
|
4.5155
|
GO and pathway enrichment analysis
The obtained DE-IRGs were imported into Enrichr (https://maayanlab.cloud/Enrichr/) website for GO and pathway analysis. It divided the analysis results of GO into three parts. For biological processes, the cytokine-mediated signaling pathway is at the top of the list, followed by neutrophil degranulation, which corresponds with clinical data (Figure 5A). And for molecular function, cytokine receptor activity and Toll-like receptor binding are of great importance (Figure 5B). And we know the secretory granule lumen is the most important cellular component for our upload gene list in the part of cellular constituent (Figure 5C). KEGG Pathway analysis is one of the most classic ways for pathway analysis (Figure 6B), and the top 10 pathways can be obtained through the study of relevant databases, including Hematopoietic cell lineage, Inflammatory bowel disease, Cytokine-Cytokine receptor interaction, Tuberculosis, Amoebiasis, Primary immunodeficiency, Salmonella Infection, Acute myeloid leukemia, Pertussis, Neutrophil extracellular trap formation (Relevant data materials are provided in Additional file 3). And the rest of them are WikiPathways database (Figure 6A), Reactome database (Figure 6C), and BioCarta database (Figure 6D).
Protein-proteins interactions and hub gene analysis
Proteins are substantial basis of biological activity. By understanding the central proteins in the pathogenesis of the disease, we can use these proteins as potential targets to mitigate the progression of the disease. Based on the differentially expressed genes, a protein interaction map was constructed by STRING and Cytoscape. Subsequently, we import the results into R and use the “upset” package to screen out six hub genes by using six algorithms (Figure 7A). Finally, we got six immune-related hub genes, CD4, IL7R, BCL6, CAMP, S100A9, and TLR2, which could become potential biomarkers in further research (Figure 7B). To test whether these six genes have potential predictive value, we showed their expression at different stages (Figure 7C).
The Potential Diagnostic Value of Hub Genes of COVID-19
The gene expression of six hub genes at different stages and their diagnostic values as potential biomarkers were analyzed. Figure 8 demonstrates the expression analysis of hub genes. Compared with the control group, the expression of these six hub genes in the severe stage was statistically significant, in which the expressions of IL7R and CD4 were significantly down-regulated, while the expressions of other genes were significantly up-regulated. Although our data showed significant differences in the expression of all screened hub genes in the severe stage, only TLR2 and CAMP showed statistically significant differences in the mild stage. Another dataset, GSE171110, which contains blood transcriptional profiling data from severe COVID-19 patients and healthy individuals, was used to verify that the expression of these six genes was significantly different (Figure 8B).
The diagnostic values of the selected six hub genes were evaluated by the receiver operating characteristic (ROC) curve. Usually, when its area under the curve (AUC) >0.7, indicating that it was considered to have potential value as a biomarker. Firstly, we evaluated the diagnostic values of genes in the severe stage, and it could be seen that all six genes had good diagnostic values (Figure 9A). Among them, BCL6, CAMP, S100A9, and TLR2 could be used as positive markers in the severe stage, and their AUC were all >0.8. CD4 and IL7R as negative markers also have strong diagnostic significance (AUC=0.904, 0.894, respectively). However, only CAMP and TLR2 show good diagnostic value in the early mild stage, which can be used as potential positive predictors of COVID-19(AUC >0.7, figure 9B). In comparison, BCL6 and S100A9 are more likely to take function in the progress of severe covid-19. Similarly, CD4 and IL7R, as negative markers, were not significant in the early stage of the disease. In addition, we also used GSE171110 for validation. The six hub genes in GSE171110 were significantly differentially expressed, and the ROC curve indicated that they had good diagnostic values (Figure 9C).
Relationship between hub gene and immune cell
We already know six hub genes’ vital roles in COVID-19, and we have had four significant infiltrating immune cells. Spearman analysis was used to analyze the relationship between hub genes and immune cells. The results are shown in the figure below (Figure 10). It can be seen that both B cell memory and resting mast cell are strongly correlated with these six genes. For example, for TLR2, the relationship with resting mast cell is the strongest (r=0.59, p=2.70E-05), and for S100A9, is B cell memory (r=-0.56, p=1.00E-04). In addition, we used a single cell database of COVID-19, DISCO, to see the expression of each gene in the single cell sequencing of COVID-19 blood sample (Figure 11).
TFs AND miRNA of hub genes
After removing the miRNA and transcription factors with low correlation, we obtained the correlation action map of the hub gene. Hsa-miR-9, Hsa-miR-765, and the-miR-302b were all related to multiple hub genes. At the same time, several transcription factors, such as STAT3 and MYB, were found to be related to hub genes closely (Figure 12).
Hub genes and disease
It can be seen that the screened hub genes of our COVID-19 database are associated with many diseases (Figure 13), for example, IL7R is related to pneumonia and fever, and CD4 is related to AIDS-related opportunistic infections. And both IL7R and S100A9 have a connection with Liver Cirrhosis. The network diagram further suggests a link between COVID-19 and other diseases.