Prediction and validation of hub-genes related to major depressive disorder based on co-expression network analysis

Background : Major depressive disorder (MDD) is generally among the most prevalent psychiatric illnesses. Significant advances have occurred in comprehension of the MDD biology. However, it is still essential to recognize new biomarkers for potential targeted treatment of patients with MDD. Methods and Results : The present work deals with in-depth comparative computational analyses to obtain new insights, such as gene ontology and pathway enrichment analyses and weighted gene co-expression network analysis (WGCNA) through gene expression dataset. The expression of selected hub-genes was validated in MDD patients using quantitative real-time PCR (RT-qPCR). We found that MDD progression includes the turquoise module genes (p-value = 1e-18, r = 0.97). According to gene enrichment analysis, the cytokine-mediated signaling pathway mostly involves genes in this module. By selection of four candidate hub-genes ( IL6 , NRG1 , TNF, and BDNF ), RT-qPCR validation was performed. A significant NRG1 downregulation was revealed by the RT-qPCR outcomes in MDD. In MDD patients, TNF and IL6 expression were considerably higher, and no considerable differences were found in the BDNF expression. Ultimately, based on ROC analyses, IL6, NRG1 , and TNF had a higher MDD diagnostic performance. Conclusions: Therefore, our study presents information on the intricate association between MDD development and cytokine-mediated signaling thus providing new rationales to develop new therapeutic approaches.


Introduction
One of the most common psychiatric diseases worldwide is major depressive disorder (MDD), which is one of the leading causes of disability and mortality [1].MDD has the characteristics of irritability, low mood, lack of sleep, fatigue, anhedonia, anorexia, attention deficit, guilty feeling, chronic pain, and overexpression of sadness in almost all periods of life [1].Presently, DSM V criteria has been presented to diagnose MDD [2].The patients with this disorder have the mean age of 30.4 years, and there is a double prevalence for females compared to males [1].It is considered an important cause of morbidity and suicide in young adults and adolescents.Thus, it must be recognized and cured in these age groups [1].There are complex causes for MDD including multiple genetic, endocrine factors, and immune system, that are initiated by stress-based psychosocial circumstances [1,3].Indeed, similar pathophysiological changes (metabolic alterations, inflammatory state, and prothrombotic state) were found for major depression and acute stress [3].Incremented activity of the amygdala, medial prefrontal cortex, and ventral stratum against adverse emotional stimuli have been reported by neuroimaging studies [4].Furthermore, epigenetic and genetic factors are also effective in the evolution and development of MDD [1].So far, nearly 200 genes have been reported involved in MDD [5], the most important of which are APOE, 5HTTP/SLC6A4, GNB3, DRD4, HTR1A, SLC6A3, and MTHFR [6].Furthermore, there have been mutations in FKBP5 allele including the hypothalamic-pituitary-adrenal (HPA) axis dysfunction, related to the increased blood cortisol and incremented adrenocorticotropic hormone (ACTH) in plasma [7].The increased cortisol levels and failure in the inhibitory mechanism Interfere with the communication between the HPA axis and the immune system [1,8].
A great deal of effort has been expended over the decades, however, no quantitative and noninvasive test has been presented to diagnose and treat MDD [9].The reason is the presence of several problems in assessing the biological mechanisms for MDD [10].Among the effective network-based methods for this purpose is weighted gene co-expression network analysis (WGCNA), which recognizes modules or clusters of highly correlated genes [11,12].WGCNA analyzes the gene expression datasets through quantification of the associations between independent gene sets and their similar neighbors [12].This instrument is mostly appropriate to identify novel therapeutic target genes or potential biomarkers successfully within the disorders [13][14][15][16][17].
In the present work, the raw data of the MDD samples were analyzed, which were gained from an online database.Following this, 50 specimens from Iranian MDD patients were used to validate the in-silico results.Using transcriptomic data, we aim to identify potential biomarkers or therapeutic targets for MDD.

Microarray dataset and preprocessing
The GEO database (https://www.ncbi.nlm.nih.gov/geo/) was used to download the GSE76826 microarray dataset.GPL17077 Agilent-039494 SurePrint G3 Human GE v2 8x60K Microarray 039381 (Probe Name version) was the basis of GSE76826 dataset including 32 samples.The samples included 12 healthy individuals and 20 patients with MDD. Background correction was conducted utilizing the normexp technique [18] with an offset of 15.To perform between-array normalization, the quantile algorithm was used utilizing normalizeBetweenArrays() function from linear models for microarray data (limma) R package [19].Probe IDs were transformed to gene symbols.The averages for a single expression level per gene per sample were used to collapse probe signals for the same gene when multiple probes existed for genes.All gene symbols were filtered based on their variance across all samples.The subsequent analyses were performed using only the genes with variances in the top 25%.The quality was evaluated using AgiMicroRna Bioconductor package (version 2.40.0).The ggplot2 package in R (version 4.0.3) was considered to discover the similarities between each sample group, using the principal component analysis (PCA) for a dimensional reduction analysis [20].

Differentially Expressed Genes (DEGs) Identification
To compared MDD samples with normal samples differentially expressed gene analysis (DEGA) was performed through the limma package [19].The statistically significant genes were recognized using Student's t-test.Genes with a false discovery rate (adjusted P-value) less than 0.001 and log2 fold change (log2FC)| ≥ 1 were considered as DEGs.To create the volcano plot for DEGs, the Enhanced Volcano was used (version 1.8.0).

Making the co-expression network
The GSE76826 raw data were pre-processed, then WGCNA package [21] in R was used to construct a gene co-expression network using the expression profile of top 25% genes.To form a network with a scale-free topology, a soft threshold power of 7 (scale-free R2 = 0.8) was utilized.
After calculation of adjacency matrices, they were transformed into the topological overlap matrix (TOM).The gene modules (with a minimum module size of 30 genes) were detected using dynamic tree cut algorithm.Ultimately, with a threshold value of 0.25, modules with higher similarity scores were combined.

Module-trait relationships construction
Each module's expression profiles were summarized through their module eigengene (ME) as the eigenvector associated with the first principal component of the expression matrix to recognize modules significantly related to the evaluated clinical traits.To measure the individual gene associations with MDD, the gene significance (GS) values were utilized.Moreover, association between the ME and the gene expression profile was considered for each module to determine the module membership (MM).Moreover, there was a close relation between the most significant elements (central) in the modules and the trait [12] if the MM and GS were correlated highly (defined by Pearson correlation analysis).The signature genes of the modules had GS and MM values greater than 0.70.

Hub-genes detection
The hub-genes were the genes with both GS and MM ≥ 0.7 if expressed differently from the control specimens.The Venn diagram was formed utilizing VennDiagram package (verssion 1.7.3) by selecting filtering DEGs and hub-genes.

Investigating the expression of chosen genes by RT-qPCR
Peripheral blood specimens were gathered from 50 MDD patients and 50 healthy individuals to assess the interested hub-genes.The individuals were identified by neurology specialist utilizing the Diagnostic and Statistical Manual of the American Psychiatric Association (DSM-V) criteria [2].The exclusion criteria include the comorbidity of psychiatric disorders, presence of organic brain syndrome, substance abuse, any kind of general medical disease with cancer and neuropsychiatric signs.A semi-structured interview was performed to choose the control subjects among volunteers with psychiatric disorders.To extract the total mRNA of the samples, the Hybrid-R™ Blood RNA purification kit (GeneALL, Seoul, South Korea) was used based on the instructions of the manufacturer and treated with DNase I to eliminate DNA contamination.Using NanoDrop (Thermo Scientific, Wilmington, DE, United States), the quality and quantity of extracted RNA were evaluated.cDNA was synthesized via the cDNA synthesis Kit (GeneALL) considering the guidelines of the manufacturer.Then, to perform RT-qPCR, the RealQ Plus2x Master Mix (Ampliqon, Odense, Denmark) and the Applied Biosystems 7500 Real-time PCR system (Thermo Fisher Scientific, Inc., Waltham, MA, USA) were used.Table 1 shows the primer sequences applied in RT-qPCR.Here, GAPDH was utilized as an endogenous control for estimating the expression levels for standardizing the relative expression.Then, the data were analyzed using the 2 -ΔΔCT approach.
This study was conducted in duplicate for each sample.

SPSS (version 27), GraphPad Prism (version 8), and R were used to analyze RT-qPCR results,
Receiver Operating Characteristic (ROC) curves, and correlations.Both the specificity and sensitivity of the considered hub-genes expression were evaluated using ROC curve analysis for diagnosis of MDD.The Kolmogorov-Smirnov normality test was then utilized for measuring the variables normality between the two examined groups.A p-value > 0.05 was considered statistically significant for the discrepancy between normally distributed variables, using a Student's t-test.

Preprocessing the data and identifying DEGs
Before DEGA, background correction, gene filtering, and normalization were all conducted.Using the AgiMicroRna Bioconductor package, the quality was monitored.Data distribution after normalization was assessed by using the box plots from the gene expression dataset (Fig. 1).
Fig. 1 GSE76826 boxplot.Gene expression is represented on the vertical axis, while samples are represented on the horizontal axis.
The similar expression level medians in distinct arrays in the box plots indicated the proper correction.Using a PCA plot, the spatial dispersion of specimens was illustrated (Fig. 2).To determine the similarities between specimens, PCA represents the specific structure of the assessed data.
Fig. 2 PCA for GSE76826 dataset.On PC1, all samples are separated by group (MDD or control).
Totally, 194 genes were recognized as DEGs in the MDD specimens utilizing |log2FC| ≥ 1 and thresholds of adjusted p-value < 0.001, including 74 down-regulated and 120 upregulated genes.These genes are shown in volcano plot in Fig. 3.

Results of gene co-expression network analysis
WGCNA included the top 25% of genes in terms of their differences in expression values.The Pearson correlation intensity between each gene-pair was measured by increasing the matrix to a soft threshold and the WGCNA picksoftthreshold function was used to generate an adjacency matrix.One outlier was identified in the clustering of network structure specimens based on the cutreestatic feature.Then, the analysis included 31 samples (Fig. 4a and 4b).With a soft threshold power = 7 (Fig. 4c), 7 modules were obtained (Fig. 4d).A close correlation was found between the turquoise module (r = 0.97, p-value = 1e-18) and clinical properties as a candidate main module (Table 2 and Fig. 5).

Enrichment analysis and hub-genes detection
Turquoise module genes were involved frequently in cytokine-related signaling (Fig. 6).When an asterisk appears next to a P-value, it indicates that the term also has a significant adjusted P-value Considering the association between the features (GS and MM) of the chosen module (Fig. 7a), hub-genes were discovered with strong relation with MDD pathogenesis.Then, the genes with the maximum GS and MM scores in the module were compared with the DEG list and the genes related to cytokines.The ultimate hub-genes were the common genes (Fig. 7b andTable 3).Ultimately, four following genes with the maximum GS and MM scores were chosen for the next step: IL6, NRG1, BDNF, and TNF.

Analyzing the expression of the chosen genes by RT-qPCR
Table 4 shows the demographic data of patients and healthy subjects.RT-qPCR was utilized to assess the expression level of the chosen genes.Fig. 8 displays the relative expression levels of chosen genes in MDD patients and controls.All samples were normalized by GAPDH expression levels.Using the t-test and 2 ^-ddct formula, we evaluated the transcripts' relative expression.
The NRG1 expression was lower significantly in the blood samples of the patient compared to the control blood samples (adjusted p-value < 0.0001, fold change = 3.50).Furthermore, the expression of TNF and IL6 was higher significantly in the MDD patients' blood samples than in the controls (adjusted p-value < 0.0001, fold change = 2.42, adjusted p-value < 0.0001, fold change = 2.05, respectively).There were no significant alterations in BDNF expression among normal samples and MDD patients (adjusted p-value = 0.634).There were no expression difference between the gender subgroups (Table 5).Likewise, the association between all gene pairs' expression levels was determined utilizing the Pearson correlation test (Fig. 9).Moreover, the sensitivity and specificity of the expression of NRG1, IL6, and TNF as biomarkers were obtained using ROC analysis.For all three genes, significant results were found (Fig. 10a-c and Table 6).All three genes can be used as diagnostic biomarkers in MDD, specifically NRG1 (sensitivity = 86% and specificity = 72%), and IL6 (sensitivity = 84% and specificity = 70%), considering the AUC values (area under the curve).Moreover, through ROC analysis, NRG1, TNF, and IL6 expression levels were combined.For discrimination between MDD patients and controls, AUC was 0.96 (Fig. 10d and Table 6).

Discussion
It was highly indicated that a bidirectional pathway includes MDD and a dysregulation of the inflammatory process ("cytokine theory of MDD") [23].It was also found that inflammatory mediators influence several factors (glutamate neurotransmission and monoamine, hippocampal neurogenesis, and GR resistance) that are important in the MDD etiopathogenesis.Inflammatory markers are suggested as a marker to diagnose and treat the depression response [24].In the present work, we evaluated the expression levels of four genes included in cytokine-mediated signaling, their correlation, and their potential diagnostic value for MDD patients.
Cytokines belong to the interleukin (IL) group that are formed by natural killer cells, macrophages, and T lymphocytes [25].Cytokines are categorized as anti-inflammatory or proinflammatory.Tumor necrosis factor α (TNFα), IL-6, and IL-1 are known as pro-inflammatory cytokines, while IL-4, IL-13, and IL-10 are anti-inflammatory cytokines [25].A higher expression of TNF and IL6 genes was found in MDD patients.Cytokines contribute to the normal brain function maintenance through supporting neuronal integrity, synaptic remodeling, and neurogenesis, along with their importance in brain development [26].They also create behavioral responses through effects on neurotransmitter systems [27].The central nervous system (CNS) was reported to enter by the cytokines released by adipose tissue and peripheral immune cells through some areas of the blood-brain barrier or by the active reuptake mechanism thus affecting behaviors [10,28].Inflammatory variables are suggested to result in depression by 1) influencing neurotransmitters, 2) reducing serotonin and induction of glutamate toxicity by stimulation of 2,3 dioxygenase (IDO) in glial cells, 3) suppressing neurogenesis by reduction of Brain-derived neurotrophic factor (BDNF) activity, or 4) increment in HPA axis activity [25,28,29].Pro-inflammatory cytokines stimulated IDO in glial cells.By IDO, the tryptophan is transformed to kynurenine, which is transformed then to neurotoxic quinolinic acid in the brain.
Quinolinic acid is attached to N-methyl D-aspartate (NMDA) receptors.Therefore, depression may be developed by reducing the serotonin levels owing to the IDO-inducing pro-inflammatory cytokines.This is caused by the decreased tryptophan and the glutamatergic neurotoxicity [29].In a meta-analysis, the significance of IDO in linking cytokines to depression is confirmed by the low plasma tryptophan levels in MDD [30].Stress and consequent inflammatory cytokine activation affect neuroplasticity and neurogenesis adversely [31,32].It was reported that hippocampal proliferation is suppressed by IL-1 [33] and a direct suppressive effect is exerted by cytokines such as IL-6 and TNFα on hippocampal neurogenesis [34,35].Moreover, the HPA axis is stimulated by the cytokines incremented levels [36].Cytokines on the HPA axis may be affected directly by stimulating the corticotropin releasing hormone (CRH) [37] or changing the GR expression for induction of GR resistance [38].Consistent with our findings, it was reported that in MDD patients serum levels of TNFα, IL-6, and IL-1 increased [39][40][41].It was highly indicated that this increment returns to normal with antidepressants [42,43].According to a meta-analysis, the selective serotonin reuptake inhibitors (SSRIs) particularly decrease TNFα and IL-6 levels [44].
Electroconvulsive therapy reduces plasma TNFα levels [45].Some studies presented an association between the elevated levels of IL-6, IL-1, and TNFα before treatment with non-response to antidepressant treatment [41,46].
BDNF can be also reduced by inflammatory cytokines interfering with TrkB receptor signaling, which may influence adversely neuroplasticity and neurogenesis [28].Neural plasticity, survival, and migration are regulated by BDNF in the peripheral and central nervous system [47].
In addition to neurons, peripheral cells also release it like endothelial cells, leukocytes, and platelets passing through the blood-brain barrier [48].According to previous studies, in depressed patients, serum and plasma BDNF levels are reduced [48][49][50].It should be noted that the outcomes of our in-silico analysis are in line with former studies, revealing the possible reduction in BDNF expression level in MDD samples.Though, no dramatic change was indicated by our wet-lab results in BDNF expression levels.The reason may be because of the differences in MDD patients' average age between our samples and microarray dataset.BDNF levels are lower in geriatric depression, regarding the subclasses of depression [51].
Moreover, a lower NRG1 expression was found in MDD patients.NRG1 belongs to epidermal growth factor-like proteins [52] that are distributed largely in the frontal cortex, midbrain and cerebellum.They interact with the ErbB family in synaptic plasticity and neurodevelopment [53].Reduced synaptic plasticity and increased inhibitory connections are caused by NRG1 shortage within the cortical projection neurons [54,55].Similarly, chronic stress reduces the spine density in the cortex [56].The medial prefrontal cortex (mPFC) is especially among the critical brain areas included in regulation of the pathological reactivity to stress [57], neuroendocrine response, variations in affective behavior, and cognition that are affected negatively by chronic stress [58].Though, the exact molecular mechanism of chronic stress affecting the neural representation of behavior in the mPFC is not clear.Several studies have been performed on NRG1 in stroke [59][60][61][62][63], cardiovascular diseases [64,65], cerebral malaria [66], and cancer [67,68].It was revealed that tissue damage is weakened by NRG1 in acute brain injuries supporting cardiac functions [61][62][63][64][65][66][67].Furthermore, inflammatory pathways are altered or inactivated by exogenous NRG1 treatments related to the tissue damage during ischemic episodes [61][62][63].It was recently reported that cytokines and chemokines expression and secretion are regulated by NRG1 signaling [69].Besides, Nedd4l-mediated downregulation of NRG1 plays a vital role in mice's depression-like phenotypes in chronic social defeat stress [70], in line with our outcomes.Thus, the involvement of NRG1 in MDD pathogenicity must be extensively studied.
Medical studies have objectively determined a biomarker as "a property and also an indicator for a normal biological procedure, a pathological procedure, or an individual's response to a therapeutic intervention" [71].There are two classes of biomarkers including diagnostic biomarker which is effective to distinguish the absence or presence of a disorder as well as the treatment biomarker for prediction of treatment response.Clinically, it was reported that high specificity (>80%) and sensitivity are essential for a biomarker in the classification and diagnosis of a disorder [72].Furthermore, to use a biomarker clinically, it needs to be non-invasive, reliable, reproducible, and inexpensive.It was found that IL6, NRG1, and TNF might be helpful in differentiation of MDD from healthy people as diagnostic biomarkers.Though, numerous controls and MDD samples are needed to validate these data.

Conclusion
It is concluded that there misregulation of the peripheral blood expression of NRG1, IL6, and TNF are possibly related to an increased MDD risk.Besides, combining the expression levels of these three genes may present a diagnostic biomarker for MDD.However, our findings need to be validated in larger samples of patients.Although our preliminary findings require more investigations, the present study further proved the cytokine-mediated signaling putative role in pathophysiology of MDD.

Fig. 4
Fig.4 Outlier detection using sample clustering.(a) Preprocessing of data and (b) clustering of WGCNA Modules.(c) Soft-thresholding power selection.(d) Cluster dendrogram and module assignment of WGCNA.

Fig. 5
Fig.5 Module-Trait association analysis.The row represents an Eigen gene, and the column represents MDD status.A correlation and p-value are represented in each cell.

Fig. 6
Fig.6 Turquoise module's enrichment analysis.Here are the top 10 enriched pathways, biological process, molecular function, and their corresponding P-values.Colored bars represent significant P-values (< 0.05).

Fig. 7
Fig.7 Hub-genes detection.(a) GS MM turquoise module features were significantly associated with MDD status.Each point represents a gene within each module, which were plotted by GS on the y-axis and by MM on the x-axis.(b) In order to identify which genes in DEGs, cytokine-related genes, and turquoise modules are similar, a Venn diagram analysis was performed.

Fig. 9
Fig.9 Correlation analysis.The diagonal shows the distribution of variables.In addition to the correlation coefficients, the significance level is shown as a star.*, **, and *** show significant correlations at P < 0.05, P < 0.01, and P < 0.001, respectively.

Table 1
Sequences of primers used in RT-qPCR reactions.

Table 2
Characteristics of module colors.

Table 3
Common hub-genes characteristics.

Table 5 .
Relative expression of selected hub-genes.