1. DEGs Screening
We included GSE108000 and GSE135511 into analysis. Descriptions of the datasets were shown in Table 1. GSE108000 dataset included gene expression profiles of pathological specimens of brain motor cortex of 20 patients with multiple sclerosis (specimens with and without meningeal infiltration were collected) and motor cortex specimens from 10 patients without neurological diseases. GSE135511 dataset included gene expression profiles of 30 white matter lesions from 7 patients with chronic active lesion multiple sclerosis and 8 patients with inactive lesion, and 10 white matter samples from control group. All specimens are non-living specimens.
We downloaded series matrix from GSE108000 and GSE135511. GSE108000 contains gene expression profiles of white matter from10 control group and 30 MS specimens, and GSE135511 dataset contains gene expression data of grey matter of 10 control group and 40 MS specimens. We combined and normalized the two datasets, removed batch effect between groups and got the merge dataset. We set the threshold of significant difference of DEGs as p < 0.05 and | log2FC | > 0.5 in order to get more DEGs, and we finally get 882 DEGs between multiple sclerosis and control group, including 399 up-regulated genes and 483 down-regulated genes (Fig. 1A, Fig. 1B).
2. Construction of protein-protein interaction network
We input 882 DEGs into STRING websites, set the confidence score as 0.9 and removed isolated gene. And we obtained a PPI network containing 872 nodes and 1265 edges. We downloaded network information and used Cytohubba plug-in MCC method to get the top 10 hub genes, as shown in Table 2 Top 10 Hub Gene column. The genes were displayed in different color levels according to the critical degree of genes in the network, as shown in Figure 2F. The PPI network of 872 nodes was analyzed with MCODE plug-in, and we got 24 subnetworks. We then selected the first 5 subnetworks (as shown in Table 2) for further genes enrichment analysis.
3. Functional enrichment analysis of differential expressed genes, hub genes and subnetwork genes
GO and KEGG enrichment analysis of the first 10 hub genes was conducted with Cytoscape plug-ins ClueGo and the Cluepedia. As shown in Figure 3.1, results with the highest GO enrichment intensity suggested that the molecular functions of the 10 hub genes were closely related to the activity of MHC class II receptors, polypeptide antigens and polysaccharides binding, ER-phagosome pathway and cytokine signal transduction in the immune system. While the top 5 KEGG enrichment intensity were allograft rejection, graft-versus-host disease, type 1 diabetes, asthma, and autoimmune thyroid disease, all of which were autoimmune diseases. The enrichment results of Cluster 1 obtained from MCODE were similar to the enrichment results of the first 10 genes of Cytohubba. GO analysis suggested that the gene set was related to MHC II, MHC I, glycation, growth factor binding and amyloidosis. Cluster 2 was associated with ubiquitin mediated proteolysis pathway. Cluster 3 was related to spliceosome and mRNA surveillance pathway, and Cluster 4 was related to the production of ROS, RNS and collagen degradation by phagocytes. Cluster 5 was associated with the regulation and signal transduction of TRP channels by inflammatory mediators.
GO (BP, CC, MF) enrichment was performed on the DEGs obtained in step 1 by using "Clusterprofiler" in R package. Results of the most significant pathways were displayed, as shown in figure 3.2. The most significant GO BP showed negative immunoregulation, myeloid differentiation, antigen and positive regulation of cytokines production, control of protein catabolism, glial cells activity and response to interferon gamma pathway, etc. The cellular components are mainly related to collagen fibers, extracellular matrix, adhesion plaques, endocytic vesicles, endocytic vesicle membrane, secretory granular membrane, MHC protein complex, etc. Molecular function is related to amide binding, protein serine/threonine kinase activity, polypeptide binding, carboxylic acid binding, organic acid binding, cytokine binding, immune receptor activity, cell adhesion mediator activity and MHC class II receptor activity. KEGG enrichment analysis results are shown in Figure 3.2. Enrichment pathways can be classified into virus infection (such as the human papilloma virus, swine flu, cytomegalovirus, EB virus, etc.), bacterial infection (salmonella, staphylococcus aureus, etc.), autoimmune diseases (include inflammatory bowel disease, type 1 diabetes, autoimmune thyropathy, asthma, etc.) and others (include atherosclerosis, endogenous ligand and other related pathways).
GO and KEGG enrichment results of the first 10 hub genes obtained by Cytoscape Cytohubba plug-in, the first subgene set Cluster 1 obtained by MCODE plug-in are consistent with enrichment results of DEGs, which mainly focus on immune regulatory pathways and immune-related disease pathways. However, the enrichment results of DEGs are more comprehensive and suggest that in addition to immune response, the response of glial cells of brain tissues to antigens also plays an indispensible role in the pathological mechanisms.
In our study, we also selected Epstein-Barr virus pathway to study how the specific environment factor interacts with susceptible genes of multiple sclerosis. Pathways were visualized by using "PathView" R package. As shown in Figure 3.3, Epstein Barr virus mainly acts on B cells through B cell receptors such as MHC-II and TLR2. After entering B cells, Epstein Barr virus further affects the immune activation pathway and up-regulates genes related to antigen presentation of MHC-I, which further activates cytotoxic T cells. In addition, Epstein-Barr virus may indirectly activate the PI3K pathway and affect cell proliferation and cell cycle.
4. Characteristics of immune infiltration in brain tissues of multiple sclerosis patients
In order to explore the characteristics of immune infiltration in brain tissues of multiple sclerosis from the cellular level, we uploaded the merge dataset through the CIBERSORTX website, calculated 100 times by CIBERSORT deconvolution method and finally we obtained the proportion of 22 immune cells of different samples. A total of 70 samples with a relatively stable proportion of immune cells (including 12 control samples and 58 multiple sclerosis samples) were screened by P<0.05, as shown in Figure 4.1. According to the results of analysis, the proportion of resting mast cells, macrophage M2, neutrophils, plasma cells, T cells CD8 and macrophage M0 were relatively high in multiple sclerosis brain tissues.
The correlation heatmap of immune cells between multiple sclerosis and control was drawn as shown in 4.2A. The correlation coefficient with the highest absolute value was the correlation between follicular helper T cells and eosinophils (R=0.5), suggesting that there was a certain positive correlation between follicular helper T cells and eosinophils. Correlation coefficient of macrophage M0 and neutrophils (R=-0.41) was followed, suggesting there is a negative correlation between macrophage M0 and neutrophils. Overall, the correlations between immune cells in the specimens ere weak to moderate. Differentiation of immune cells in the MS and control samples was analyzed and shown as the violin diagram. As shown in Figure 4.2B, CD4+T cells naive were significantly reduced in the MS group compared with the healthy control group (P=0.013) and resting synaptic cells were significantly increased (P=0.003). The PCA cluster diagram showed differences in the level of immune cell infiltration between MS and healthy control samples (4.2C).
5. Construction of weighted gene co-expression network
In order to further observe the differences in gene expression in different brain regions, we also performed weighted gene co-expression analysis (WGCNA) on the combined dataset to observe the differences in gene expression in white matter lesions, normal white matter tissue, gray matter lesions and normal gray matter tissue of multiple sclerosis patients. The variance of all genes was taken and the top 25% of the variance was extracted to obtain the expression profile of 3709 genes as the input dataset of WGCNA. The optimal soft threshold β value was calculated to be 4 based on R2, slope and mean k parameters. In our study, R2=0.95, which means the connectivity degree of the scaling free network construction is optimal (as shown in Figure 5.1). The network was constructed by step-by-step method, and the minimum number of genes in the module was set as 30. Then we further merged modules of those module similarity above 0.75, and finally we obtained 11 different module module (figure 5.2). We further draw the correlation heatmap between gene modules and tissue characteristics (figure 5.3). Among the modules obtained, the ones with the most genes and the highest correlation with traits were turquoise modules (946 genes). Correlation analysis between gene modules and tissue traits showed that there were significant differences in gene expression between white matter of multiple sclerosis group and control group, and between gray matter of disease group and control group. In addition to distinguishing control group and disease group, the module with the greatest correlation with traits (turquoise module) may well distinguish control group and disease group. There was also a certain degree of differentiation between the chronic active white matter lesions and the chronic active lesion adjacent tissues in the disease group, but no effective gene module that could clearly distinguish the gray matter lesions from the normal gray matter tissue. In our study, we further compared the turquoise gene module with the hub genes of Step 2, and found that all the hub genes were in the turquoise module (as shown in Figure 5.4), suggesting that there was a strong correlation between the hub genes and chronic active white matter lesions. There was no significant difference between normal gray matter and pathological gray matter in multiple sclerosis, but there were significant differences between the two gray matter and the control group. However, in the correlation diagram between gene modules and traits, the modules with the greatest difference correlation fell in the gray module, which belonged to the unclassified gene module. In conclusion, compared with the control population without central disease, patients with multiple sclerosis generally have abnormal gene expression in both gray matter and white matter of brain tissue. The turquose module gene set obtained in our study can further distinguish chronic active lesions and chronic active adjacent tissues in patients with multiple sclerosis.
6. Correlation test between hub genes and immune cells
Spearman correlation analysis was performed between the first 5 hub genes of Cytohubba (HLA-DRA, HLA-DRB1, HLA-DRA5, HLA-DRA and HLA-DPA1) and the 22 immune cells in the samples obtained from CIBERSORT. It was found that the expression levels of 5 hub genes were all negatively correlated with the number of M0 in macrophages, and positively correlated with the number of M1 or M2 in macrophages and the number of gamma delta in T cells. In neuroinflammatory diseases, macrophages play a dual role in the process of tissue damage according to their activation state (M1 / M2). Macrophage M1 can damage neurons, while M2 macrophages are believed to contribute to the regeneration and repair of neurons, and M0 is the resting state [95]. The correlation diagrams of the analyzed gene correlations (P <0.05) was shown as scatter plots in Figure 6.