Transcriptional analysis of lung epithelial cells using WGCNA revealed the role of IRF9 and IFI6 genes in SARS-CoV-2 pathogenicity

The coronavirus disease 2019 (COVID-19) outbreak is an ongoing global health emergence, but the pathogenesis remains unclear. Here, we applied weighted gene co-expression network analysis to comprehensively characterize transcriptional changes in bronchial epithelium cells (NHBE and A549 cells) during SARS-CoV-2 infection. Our analysis identied a network highly correlated to COVID-19 pathogenicity based on MX1, IFIT1, ISG15, IFI6, DDX60, IRF9, PARP9, PGLYRP4, IL36G, SAA2 and IL-8 hub genes. The results also indicated a unique transcriptional signatures of infected cells including IFI6 and IRF9 as novel gene candidates and suggested their prospective mechanism in COVID-19 pathogenesis. The result of hub genes enrichment showed that the most correlation topic in biological process and KEGG were type I interferon signaling pathway, IL-17 signaling pathway, cytokine mediated signaling pathway, and defense response to virus categories which all play signicant roles in restricting viral infection. Also according to the drug-target network, we recognized 54 FDA-approved drug candidates for other indications could potentially use for the treatment of COVID-19 patients through regulation of six hub genes of the co-expression network. Our ndings also showed that the 19 experimentally validated miRNAs regulated the co-expression network through 5 hub genes (SLC19A3, FAM13A, PLA2G16, and HRASLS5). In conclusion, these hub genes had potential roles in the translational medicine and might become promising therapeutic targets further in vitro and in vivo experimental studies are needed to evaluate the role of above mentioned genes in COVID-19.


Introduction
The SARS-CoV-2 pandemic has affected millions of people globally since December 2019. As of 23 rd April 2020, more than 2,650,000 people are reported to be infected, with ~7% mortalities (https://coronavirus.jhu.edu/map.html). The WHO has declared this outbreak as a currently worldwide public health emergency and named its disease as the coronavirus infected disease-19 . This novel pathogen is a single-stranded RNA virus belongs to the Coronaviridae family and can infect mammals and birds [1]. Clinical research has de ned COVID-19 as an acute respiratory tract infection with diverse symptoms including fever, dry cough, fatigue, myalgia, conjunctivitis, and pneumonia. However, some patients develop severe illnesses including pneumonia, acute respiratory distress syndrome, pulmonary edema, acute kidney injury, or multiple organ failure quickly [2][3][4]. Despite the high infection and mortality rates, the host's immune response to infection with Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains poorly understood [5,6].
In multiple cases, involvement of both lungs suggests viral dissemination after the initial infection. Viral RNA has been identi ed in symptomatic patients in the upper airways, with greater viral loads in nasal swabs opposed to those collected from the throat [7]. Comparable viral loads have been observed in an asymptomatic patient, suggesting that the nasal epithelium is a signi cant portal for initial infection, and may act as a main reservoir for viral dissemination across the respiratory mucosa and an essential viral transmission mediating locus. To improve diagnostic and therapeutic approaches, it is important to recognize the cells and mechanisms that host viruses and enable viral replication as well as those which contribute to in ammation and pathogenesis of the disease.
Dysregulated host immune response and in ammatory cytokine production are believed to correlate with severity of the disease and poor prognosis in two other corona virus related infections, SARS coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus; MERS-CoV (MERS-CoV) [8,9]. Nevertheless, the underlying molecular mechanisms of the anomalous in ammatory responses under SARS-CoV-2 infection are still not clear.
Transcriptomic study of cells when infected with viruses is remarkably useful in determining the dynamics of the host immune response as well as genes regulatory networks [10,11]. In this regard, Xiong et al., conducted a transcriptome sequencing of the RNAs isolated from the bronchoalveolar lavage uid as well as peripheral blood mononuclear cells (PBMC) specimens of COVID-19 patients. Their results reveal showed host in ammatory cytokine pro les to SARS-CoV-2 infection in patients, and highlight the association between COVID-19 pathogenesis and excessive cytokine release such as chemokine (C-C motif) ligand 2 (CCL2)/ monocyte chemoattractant protein 1 (MCP-1), C-X-C motif chemokine 10 (CXCL10)/IP-10, CCL3/MIP-1A, and CCL4/MIP1B [11]. In another study, Xu et al., used Single-cell RNA sequencing (scRNA-seq) analysis to evaluate the expression and distribution of Angiotensin-converting enzyme 2 (ACE2) and Transmembrane Serine Proteases (TMPRSSs) genes in kidney cell components. They found that podocytes and proximal straight tubule cells were potential host cells targeted by SARS-CoV-2, resulting in Acute kidney injury (AKI) caused by the SARS-CoV-2-induced cytopathic effect [12].
In this study, weighted gene co-expression network analysis (WGCNA) was used to investigate the transcriptional changes in primary human lung epithelium (NHBE) and transformed lung alveolar (A549) cells cell lines after SARS-CoV-2 infection hope to shed a light in the pathogenesis of this virus.

2.1.Gene expression dataset processing
The RNA-seq gene expression dataset GSE147507 was obtained and downloaded from the Gene Expression Omnibus (GEO) database which is based on Illumina NextSeq 500 platform. In total, this dataset contains raw read count of 20 samples of human lung epithelial cells including independent biological triplicates of primary human lung epithelium (NHBE) and transformed lung alveolar (A549) cells.
The NHBE cells were mock treated or infected with SARS-CoV-2 (USA-WA1/2020) and independent biological duplicates of A549 cells were mock treated or infected with human respiratory syncytial virus (RSV) or seasonal in uenza A virus (IAV). The raw data were modi ed and standardized using R 3.6.1 affy package in Bioconductor [13,14]. Following the mapping of the probe to gene IDs, the expression coe cient of variance was calculated to obtain a list with decreasing coe cient of variance for all processed expression data and the rst 4000 genes with large coe cient of variance expression were selected.

2.3.WGCNA network Construction and identi cation of signi cant modules
Gene co-expression network of treated and control groups reconstructed using WGCNA [13]. Brie y, the matrix of the gene expression pro le was converted into the matrix of pairwise gene similarity according to the Pearson test, followed by conversion into the matrix of adjacency. According to already represented scale-free gene co-expression topological algorithm, a minimum possible of β value was considered in order to adjacency matrix met the scale-free topology criteria. For the next step, topological overlap matrix (TOM) and dissimilarity TOM (dissTOM) were created using TOM similarity and dissimilarity modules. Finally, module identi cation was performed by the dynamic tree cut, while the minimum of module size was de ned as 30. Modules with high similarity score were merged with a threshold of 0.25.

2.4.Feature vectors in WGCNA network
The Module Membership (MM) was de ned as the correlation between individual gene expression and module eigengenes and Gene signi cance (GS) was de ned as the absolute value of the correlation between genes and SARS-CoV-2 infection. Thus, the genes with high GS and MM have been identi ed as intra-modular hub genes that are highly related to the trait.

2.5.Functional enrichment
The ClueGO (version 2.2.5) Plug-in tool on Cytoscape (version 3.6.0) was used to identify and visualize the enriched gene ontology (GO), KEGG pathway and biological pathways in interesting gene modules (kappa score = 0.4).

2.7.Identi cation of candidate regulatory drugs
The well-rmly established Drug-Gene Interaction Database (DGIDB) (http://www.dgidb.org/) was used to predict functional and drug-able hub-genes with the list of commercially available or clinical trial drugs [15].

3.1.Preprocessing and DEGs analysis
We performed quantile normalization to reduce the effects of technical noises. No outliers were observed in dataset samples by sample clustering; thus, all samples were included in the analysis (Fig. S1). Based on DEG analysis between SARS-Cov-2 treated and mock-treated cell lines, a total of 62 (42 up-regulate, 20 down-regulate) and 167 (96 up-regulate, 71 down-regulate) genes were identi ed as DEGs in A549 and NHBE subsets, respectively. These DEGs were then selected for subsequent analysis. The GO and KEGG pathway analysis of DEGs are represented in gure 1.

3.2.Identi cation of WGCNA modules
Based on the variance of expression values, a total of 4000 genes were included in WGCNA. Afterward, β value equal to 12 and 24 was considered for A549 and NHBE cells, respectively and weighted gene coexpression network of treated and mocked samples reconstructed (Fig. S2). The result of dynamic tree cut gave a total of 20 co-expressed modules recognizing by hierarchical clustering dendrogram with range size of 30 (mediumpurple2) to 1313 (brown4) in A549 dataset (Fig. S3) and 24 co-expressed modules with range size of 33 (yellow4) to 841 (darkolivegreen) in NHBE dataset.

3.4.Functional annotation analysis of interesting modules
Gene Ontology (GO) and functional and KEGG pathway enrichment analysis were conducted to explore meaningful biological relevance of the interesting modules using ClueGO (Fig. 3). As shown in gure 3, functional annotations of the Turquoise and Yellowgreen modules showed similar enrichments mainly focused on immune responses against viral infection. Common pathways are as follows: regulation of viral life cycle, negative regulation of viral genome replication, type 1 interferon signaling pathway, NODlike receptor signaling pathway, defense response to viruses such as in uenza A, measles, Herpes simplex 1 virus, and Epstein-bar virus.

3.5.Hub gene identi cation and network analysis of interesting modules
Focusing on the Turquoise and Yellowgreen modules, genes with high GS and MM named as Hubgenes are important for the integrity and proper functioning of the entire network. Co-expression networks were reconstructed for each module using hubgenes with logarithm of fold changes |log 2 FC|≥2. The two hubgene sets which met these criteria (Fig. 4) were then separately imported to GeneMANIA database to visualize interactions between the genes in each co-expression module. The resulted co-expression networks are visualized as red circles in gure 5.

3.6.Identi cation of candidate regulatory miRNAs
In order to nd the potential molecular mechanisms of the hub-genes, their predicted miRNAs were analyzed by the miRWalk database. The experimentally validated miRNAs were shown in gure 5. According to the gure 5, 4 hub genes of interesting modules regulated by these miRNAs were Interleukin-1β (IL-1β), Interleukin 8 (IL-8), and Serum Amyloid A1(SAA1).

3.7.Drug-target network construction
Drug repositioning analysis was performed on the same co-expressed Hub genes to identify potential agents suitable for combating to COVID19. As illustrated in g. 5, hub genes of which are target of detected FDA-approved drugs were Interferon Induced Protein with Tetratricopeptide Repeats 1(IFIT1), ISG15 Ubiquitin Like Modi er (ISG15), CXCL2, IL-1β, SAA1, IL-8.

Discussion
The infection of SARS-CoV-2 can cause severe pneumonia as well as other complications with signi cant morbidity and mortality. Currently, there is no suitable cure or appropriate medication for this fatal lung involvement. Our knowledge about the host immune interaction with SARS-CoV-2 is restricted, making it more di cult for management and developing new therapies. Viral infection typically induces massive changes in the transcriptome of the host, resulting in aberration of host cells' metabolism and modulation of immune response, toward enhancing viral replication [16,17]. Using several samples expression data from human lung epithelial cells including independent biological triplicates of primary human lung epithelium (NHBE) and transformed lung alveolar (A549) cells, we performed transcriptome analysis to better understand the molecular basis of the COVID-19 and identify putative markers. We found that 62 and 167 genes were differential expressed (p-value<0.05, and |log 2 FC|≥2) when A549 and NHBE samples of COVID-19 cells were compared to mock treated control, respectively. As expected for GO functional and KEGG pathway enrichment analysis, these genes belongs to type I interferon signaling pathway, IL-17 signaling pathway, cytokine mediated signaling pathway, and defense response to virus categories which all play signi cant roles in restricting viral infection. We found that the expression level of IL-6 was increased several folds in both studies cells' lines. In parallel, Systemic in ammation and cytokine storms with elevated levels of CXCL family, IL-8, and IL-6 are reported among SARS patients [16,18]. Latest laboratory ndings from Wuhan patients also showed that COVID-19 mild patients had elevated levels of IL-1B, Interferon gamma (IFNγ), CXCL10 and CCL2, while in severe cases, Granulocyte-colony stimulating factor (G-CSF), CXCL10, CCL2 and CCL3 were elevated [2]. Consistent with our results, a signi cant elevated expression of a large number of cytokines was observed in SARS-CoV-2 treated samples compared to mock-treated control. These ndings show that infection with the SARS-CoV-2 virus can resulted to a cytokine storm that associated with severity of the disease. In another study, IL-6 was increased in patients with SARS disease, which is needed to regulate in ammatory response, B-cell differentiation and antibody development [19].
Use of WGCNA for co-expression analysis does not depend on particular contrasts (differential expression) and may establish associations in the study design between the co-expressed genes and important factors. Based on our analysis of data in the convenient models of bronchial epithelium cells (NHBE and A549 cells), we created gene co-expression network using WGCNA and recognized two associated Family Member 9), PGLYRP4, IL36G, SAA2 and IL-8 which had a high signi cance for immune related COVID-19 response. The important role of these genes in host response to various viruses has been described by many studies [21][22][23][24][25][26][27][28][29][30][31]. In line with this, some of these Hubgenes has been recently reported to be greatly related to COVID-19. For example, functional overexpression of MX1 was signi cantly mediates antiviral response to SARS-CoV as well as probably SARS-CoV-2 [21,32]. Elevated expression levels of IL-8 and IL-36G have been declared to be closely related to COVID-19 severity [33,34].
Transcriptional changes of well-characterized Hub-genes of Turquoise module in table 3 shows direct effectors of the IFN-stimulated antiviral response including MX1, IFIT1, ISG15, IFI6, DDX60, OAS1-3 and Signal transducer and activator of transcription 1 (STAT1) in response of SARS-Co-2.
In order to compare differences between SARS-CoV-2 and RSV molecular pathogenicity, we performed DEG analysis on A549 cell which infected with RSV. Interestingly, the pattern of gene expression between two lists revealed diminished antiviral response to SARS-CoV-2 discriminating this response from common robust interferon induction of antiviral genes which reported in other studies after viral infections [35]. However, considering obvious suppressed expression levels of IFN-1 and IFN-3 by means of SARS-CoV-2, the underling mechanism of this unexpected antiviral response remains to be ascertained. In line with this, IRF9 and IFI6, amongst these hub genes are so attracting due to unabated transcriptional response to SARS-Co-2 as compared to RSV. IRF9 is a well-known essential IFN regulatory factor acts as a part of interferon-stimulated gene factor 3 (ISGF3) complex consisting of STAT1, STAT2 and IRF9 that mediates expression of several IFN-inducible genes including MX1, OAS1-3, ISG15, IFIT1, IFI6 as well as transcription factors of IFN-1 expression, interferon regulatory factor 3 and 7 (IRF3 and IRF7) [35,36]. The pivotal role of IRF9 in antiviral responses against various common viruses including respiratory viruses has been well demonstrated [36,37].
Importantly, emerging data suggest that various viruses such as human papillomavirus (HPV), reovirus, hepitis B virus, adenovirus, human cytomegalovirus and RSV evolved mechanisms to evade from IRF9mediated antiviral response [38][39][40][41][42]. Strikingly, IRF9 overexpression ampli es the interferon response to viral infections. Interestingly, more recent study declared that initial concentration of IRF9 is a substantial determinant regulating amplitude, speed as well as strength of IFNα-mediated antiviral signaling [38].
Taking together, these ndings strengthen the idea that unique unmuted expression of IRF9 in response of SARS-Co-2 infection would responsible for observed signi cant up-regulation of at least a subset of ISGs to escape from robust control over IFN-1 and IFN-3 expression. Therefore, the early increase of IRF9 expression levels during early stages of SARS-CoV-2 infection is a crucial regulator of early, enhanced and accelerated initiation immune response despite of suppressed expression of IFN avoiding presumed route used by coronavirus to escape from immune response during early phase of infection. Thus, variations in either IRF9 basal levels or the amount of IRF9 induction rate may has a robust predictive value not only for disease severity and clinical outcomes providing a putative potential target for COVID19-related therapy, but also may be a reason for patient-to-patient differences in disease development.
Whereas, IRF9 hub gene is a pathway activator/regulator, IFI6 is known as an antiviral effector and its expression can be induced by IFN-1 [43]. Recently, IFI6 has been purposed as a gene expression signature distinguishes between viral and bacterial infections of respiratory tract [25]. Importantly, some studies reported an essential role of IFI6 overexpression in pro-survival signaling by subverting TRAIL-mediated mitochondrial loss, cytochrome C release and subsequent apoptosis [44]. In this regard, RSV has been revealed to suppressed IFN-induced up-regulation of IFI6 in a time dependent manner at early phase of infection to escape from IFN-mediated antiviral response [45].
Remarkably, recent study suggests that diffuse alveolar damage and selective viral-mediated type 2 pneumocytes apoptosis would be as the major pathological basis of COVID-19, culprit for both decreased capacity of air exchange and uid leakage result in acute respiratory distress syndrome (ARDS) and even death [46]. These ndings support a model in which IFI6 up-regulation could serve as a possible antiviral strategy to protect healthy respiratory epithelial cells from early apoptosis elicited by surrounding proapoptotic cytokines leading to enhanced IFN production. Therefore, IFI6 could serve as an appealing target for innovate new therapies in combating COVID-19.
On the other hand, it is identi ed that IFI6 up-regulation potentiates the antiviral activity of IFN-α and restricts replication of several RNA viruses including dengue, Zika, West Nile and yellow fever viruses (Dengue virus (DENV), Zika virus (ZIKV), West Nile virus (WNV) and Yellow fever virus (YFV) respectively) as shown by its ability to interfere in YFV replication indirectly via preventing viral invagination into endoplasmic reticulum (ER) [47][48][49]. However, this antiviral mechanism of IFI6 localized in ER seem to be speci c to aviviruses due to lack of inhibitory effect on neither HCV nor coronavirus replication in COS-7 cells that organized in double-membrane ER structures [48]. Moreover, IFI6 is also identi ed to improve IFNinduced antiviral response as a result of activation of STAT3 signaling [45]. Given this ndings, one can speculate that perhaps localization of IFI6 at either mitochondria or ER membrane in various cell types might be an important determinant of IFI6 activity against viruses.
As depicted in gure 5, further network-based drug repurposing is calculated and identi ed novel and potent drugs targeting SARS-CoV-2. However, some of predictions have been showed by recent literature to be bene cial for COVID-19. For instance, recent computational studies offer Cidofovir, Deferoxamine, Cytarabine and Clarithromycin as potent drugs for COVID-19 treatment [50][51][52][53]. ACE2-associated network analysis suggest a crucial role of Fluticasone propionate for COVID-19 treatment [54]. Moreover, Cyclosporine exhibited antiviral e cacy (with the half maximal inhibitory concentration (IC50) value of 5.82 µM) against SARS-CoV-2 [55]. Notably, considering immunomodulatory and anti-oxidative properties as well as bene cial roles of melatonin in reducing vessel permeability, anxiety, and ameliorating sleep quality, strongly suggest potential adjuvant use of melatonin for critical care patients infected by COVID-19 [56]. Importantly, emerging data supporting the bene cial role of Dipyridamole, a convenient antiplatelet agent, against broad spectrum of viruses mainly the positive-stranded RNA viruses as well as COVID-19 which stimulate type I interferon responses, decrease lung injury and suppressed in vitro replication of COVID-19 [46]. Moreover, corticosteroids such as Hydrocortisone, Beclomethasone and Fluticasone have been suggested to might have bene cial effects for the sever patients with pulmonary edema in combination with ventilator to prevent COVID-19 associated-ARDS development [46].
Additionally, Naproxen is a familiar non-steroid anti-in ammatory drug (NSAID) currently used as a part of triple combination therapy (Azithromycin, Naproxen and Prednisolone) that largely administered by Iranian's physicians to treat patients with COVID-19 that supposed to reduce duration of hospitalization [57].
According to gure 5, Ribavirin and PEG interferon α-2b are shown to interact with IFIT1 which its vital role in the IFN-induced host defense against RSV, IAV, HBV and vesicular stomatitis virus (VSV) has been reported by several studies [23,24]. Ribavirin is a well-known directly acting antiviral drug against hepatitis C virus (HCV) with a broad-spectrum of antiviral [58]. It has been reported that ribavirin exhibits an inhibitory effect against zoonotic paramyxoviruses leading to severe respiratory infection such as Hendra virus (HeV) and Nipah virus (NiV) [59]. Moreover, ribavirin has been reported to effectively prevents RSV reproduction [60]. Surprisingly, recent modeling and docking analysis suggests the effectiveness of ribavirin as a potent drug for COVID-19 treatment [58]. PEG interferon α-2b is another broad-acting antiviral drug used for hepatitis treatment that its inhibitory effect on coronaviruses has been demonstrated by in vitro studies [61]. Interestingly, both IFN-α and ribavirin therapy of HCV infection could lead to up-regulation of IFI6 [62]. Consistent with our results, Ribavirin and PEG interferon α-2b have been suggested in the guidelines for the Prevention, Diagnosis, and Treatment of COVID-19 issued by National Health Commission (NHC) of the People's Republic of China [61]. Taking together, Ribavirin and PEG interferon α-2b may offer a potential combination therapy for COVID-19 that emerged with promising results by synergistically targeting IFIT1 and IFI6 hub genes.

Conclusion
In summary, we used weighted gene co-expression network analysis to construct a gene co-expression network for COVID-19 and identi ed network highly correlated hub genes including MX1, IFIT1, ISG15, IFI6, DDX60, IRF9, PARP9, PGLYRP4, IL36G, SAA2 and IL-8 as well as unique transcriptional signatures of this virus including IFI6 and IRF9 as novel gene candidates and suggested their prospective mechanism in COVID-19 pathogenesis. Moving forward, these hub genes had potential roles in the translational medicine and might become promising therapeutic targets further in vitro and in vivo experimental studies are needed to evaluate the role of above-mentioned genes in COVID-19.     (see Table 4    Module-trait relationship. Each row corresponds to a module eigengene and column corresponds to SARS-CoV-2 trait. Numbers in each cell represent the corresponding correlation and p value. A) A549 dataset and B) NHBE dataset.

Figure 3
Biological processes and pathways detected within the A) Turquoise module from A549 dataset and B) Yellow-green module from NHBE dataset. Gene ontology and pathway analysis was performed using signi cant genes across all datasets. Node size represents the counts (number of genes that are part of Page 22/24 each pathway), and node color corresponds the statistical signi cance. The darker the pathway node, the more statistically signi cant it is, with a gradient from red (p-value 0.05-0.005) to black (p-value < 0.0005).

Figure 4
Similarity assessment between DEGs and hub genes of the interested module from A549 (A) and NHBE (B) cells using Venn diagram. A total of 7 and 4 hub genes which were similar in both list were chosen and then imported to GeneMANIA to construct a co-expression network.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. Supplementary20200423.docx