Bioinformatics analysis of expression profiling by high throughput sequencing for identification of potential key genes among SARS-CoV-2 / COVID-19


Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has recently emerged as a pandemic and is spreading rapidly in humans. However, the precise molecular mechanisms underlying the advancement and progression of SARS-CoV-2 infection remain unclear. The current investigation aimed to identify and functionally analyze the differentially expressed genes (DEGs) between SARS-CoV-2 infected and mock samples using comprehensive bioinformatics analyses. The GSE148729 dataset (expression profiling by high throughput sequencing) was downloaded from the Gene Expression Omnibus (GEO) and analyzed using the limma package in R to identify DEGs. Pathway and gene ontology (GO) enrichment analyses of the up and down regulated genes were performed in ToppGene. The HIPPIE database was used to evaluate the interactions of up and down regulated genes, and a protein-protein interaction (PPI) network was constructed using Cytoscape software. Hub genes were selected using the Network Analyzer plugin. Subsequently, target prediction and network analysis methods were used to construct the target gene - miRNA regulatory network and the target gene - TF regulatory network. Receiver operating characteristic (ROC) analysis was used for validation. A total of 928 DEGs (461 up regulated genes and 467 down regulated genes) were identified between SARS-CoV-2 infected and mock samples. Pathway enrichment analysis showed that the up and down regulated genes were significantly enriched in cytokine-cytokine receptor interaction and in ascorbate and aldarate metabolism, respectively. Several significant GO terms, including response to biotic stimulus and oxoacid metabolic process, were closely associated with these up and down regulated genes. The top hub genes and target genes included JUN, FBXO6, PCLAF, CFTR, TXNIP, PMAIP1, BRI3BP, FAHD1, PROX1, CXCL11, SERHL2 and CFI. ROC curve analysis showed that the messenger RNA levels of ten genes (DDX58, IFITM2, IRF1, PML, SAMHD1, ACSS1, CYP2U1, DDC, PNMT and UGT2A3) exhibited good diagnostic efficiency for distinguishing SARS-CoV-2 infection from mock samples. The current investigation identified a series of key genes and pathways that may be involved in the progression of SARS-CoV-2 infection, providing a new understanding of its underlying molecular mechanisms.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a key cause of pandemic respiratory disease [1], named COVID-19. In 2020, more than 406,207 people died of SARS-CoV-2 complications worldwide. Restricting the spread of SARS-CoV-2 has been difficult, although transmission of the virus appears to require close contact through large-particle aerosols [2]. SARS-CoV-2 infection in humans causes a severe pneumonia that is fatal in about 11% of infected individuals [3].
The virulence and evolution of the SARS-CoV-2 virus have been investigated intensively [4]. However, very little is known about the influence of specific human genes or biomarkers that contribute to susceptibility to SARS-CoV-2 infection. Consequently, more effort is needed to clarify the molecular mechanisms underlying the advancement and progression of SARS-CoV-2 infection, which holds promise for finding potential drug targets and diagnostic biomarkers.

Protein-protein interaction network construction and module analysis
The online Human Integrated Protein-Protein Interaction rEference (HIPPIE) (http://cbdm.unimainz.de/hippie/) [19] database was used to identify potential interactions among the up and down regulated genes. HIPPIE integrates various PPI databases such as IntAct (https://www.ebi.ac.uk/intact/) [20], BioGRID (https://thebiogrid.org/) [21], HPRD (http://www.hprd.org/) [22], MINT

Module analysis was applied to identify more densely connected gene groups. In addition, the modules were further subjected to pathway and GO enrichment analysis.
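Once interactions have been retrieved, the PPI network can be treated as an undirected edge list, and highly connected genes can be ranked by node degree. The sketch below illustrates that idea only; it is not the Network Analyzer plugin or the HIPPIE query itself. The function names and the placeholder gene labels (GENE_A to GENE_E) are hypothetical and do not represent real interactions.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {gene: degree} for an undirected PPI edge list.

    Duplicate edges and self-interactions are counted only once or ignored,
    because each gene's partners are stored in a set.
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def top_hubs(edges, n=3):
    """Return the n genes with the highest degree (ties broken by name)."""
    degrees = node_degrees(edges)
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical toy network; labels are placeholders, not real interactions.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_D", "GENE_E"),
    ("GENE_B", "GENE_A"),  # duplicate of the first edge, reversed
]

print(top_hubs(toy_edges, n=2))
```

Degree is only one of several centrality measures that could be used to rank hub genes; the text does not state which measure was applied.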
Construction of target genes -TF regulatory network
The NetworkAnalyst database (https://www.networkanalyst.ca/) [44] is a bioinformatics platform for predicting pairs of target genes (up and down regulated genes) and TFs. In the current investigation, the target genes were predicted using JASPAR (http://jaspar.genereg.net/) [45]. The screening criterion was that the TF target was also present in the TF database. The target genes -TF regulatory network was visualized using Cytoscape software.

ROC analysis
Receiver operating characteristic (ROC) curve analysis was performed to calculate the sensitivity and specificity of the up and down regulated genes for the diagnosis of SARS-CoV-2 infection using the R "pROC" package [46]. The area under the curve (AUC) was determined and used to summarize the ROC result.
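To make the ROC and AUC computation concrete, the following minimal sketch builds the curve by sweeping a threshold over a gene's expression values and integrates it with the trapezoidal rule. It is an illustration in Python rather than the pROC workflow used in the study. It assumes that higher expression indicates infection (label 1) and lower expression indicates mock (label 0); the function names and the toy values are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: expression values, higher assumed to indicate infection.
    labels: 1 for infected samples, 0 for mock samples.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical expression values for one gene across six samples.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(toy_scores, toy_labels)), 4))
```

For a down regulated gene the assumed direction is reversed, so the scores would need to be negated before calling `roc_points`.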

Results
Data pre-processing and identification of DEGs
The expression profiling by high throughput sequencing dataset GSE148729 was downloaded from the GEO database. After quality assessment of the raw data, the dataset, based on the GPL18573 platform, comprised 14 SARS-CoV-2 infection samples and 8 mock samples. The limma package in R was used for preprocessing and differential gene expression analysis (Fig. 1A and Fig. 1B). A total of 928 DEGs (fold change >1, Fig. 2), consisting of 461 significantly up regulated genes and 467 significantly down regulated genes, were identified for the subsequent bioinformatics analysis. The hierarchical clustering heat maps of up and down regulated genes show the differences between SARS-CoV-2 infection and mock samples (Fig. 3 and Fig. 4).
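The thresholding step can be illustrated with a simplified sketch that computes a log2 fold change between group means and labels each gene as up regulated, down regulated or unchanged. This is not the limma procedure, which fits linear models with moderated statistics. The sketch assumes that the cutoff is applied to the absolute log2 fold change, which the text does not state explicitly. The pseudocount, the function names and the toy counts are all hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(infected, mock, pseudocount=1.0):
    """log2 ratio of mean expression (infected over mock).

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return math.log2((mean(infected) + pseudocount) / (mean(mock) + pseudocount))


def classify(log2fc, threshold=1.0):
    """Label a gene by comparing its log2 fold change with the threshold."""
    if log2fc > threshold:
        return "up"
    if log2fc < -threshold:
        return "down"
    return "unchanged"


# Hypothetical expression values: gene -> (infected samples, mock samples).
toy_expression = {
    "GENE_X": ([31, 31], [7, 7]),
    "GENE_Y": ([3, 3], [15, 15]),
    "GENE_Z": ([9, 11], [10, 10]),
}

for gene, (infected, mock) in toy_expression.items():
    change = log2_fold_change(infected, mock)
    print(gene, round(change, 2), classify(change))
```

A real analysis would also apply a significance cutoff, typically an adjusted P value, alongside the fold change threshold.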

Pathway enrichment analysis of DEGs
To analyze the biological classification of up and down regulated genes, functional and pathway enrichment analyses were performed using ToppGene. Pathway analysis revealed that the up regulated genes were mainly enriched in superpathway of steroid hormone biosynthesis, aspirin triggered resolvin E biosynthesis, cytokine-cytokine receptor interaction, TNF signaling pathway, calcineurin-regulated NFAT-dependent transcription in lymphocytes, ATF-2 transcription factor network, cytokine signaling in immune system, interferon signaling, steroid hormone metabolism, androgen and estrogen metabolism, genes encoding secreted soluble factors, cytokines and inflammatory response, toll receptor signaling pathway, interleukin signaling pathway, nuclear factor kappa B signaling, c-Jun N-terminal kinases MAPK signaling, 17-beta hydroxysteroid dehydrogenase III deficiency and adrenal hyperplasia type 3 or congenital adrenal hyperplasia due to 21-hydroxylase deficiency (Table 2), while down regulated genes were mainly enriched in catecholamine biosynthesis, noradrenaline and adrenaline degradation, ascorbate and aldarate metabolism, steroid hormone biosynthesis, signaling mediated by p38-gamma and p38-delta, aurora A signaling, biological oxidations, the citric acid (TCA) cycle and

module 70 (nodes 20 and edges 41) revealed that the four significant modules were mainly enriched in cytokine-cytokine receptor interaction, influenza A, Jak-STAT signaling pathway, inflammation mediated by chemokine and cytokine signaling pathway, response to cytokine, response to biotic stimulus, response to virus and response to endogenous stimulus. Similarly, PEWCC1 was used to identify a total of 1405 modules in the PPI network of down regulated genes, and the four significant modules were selected (Fig. 10). Pathway and GO enrichment analysis of module 14 (nodes 111 and edges 135), module 15 (nodes 107 and edges 121), module 27 (nodes 81 and edges 97) and module 45 (nodes 54 and edges 75) revealed that these four significant modules were mainly enriched in hedgehog signaling events mediated by Gli proteins, metabolism of proteins, metabolism of lipids and lipoproteins, the citric acid (TCA) cycle and respiratory electron transport, mitochondrion, growth, intrinsic component of plasma membrane and cellular lipid metabolic process.
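Enrichment of a gene list in a pathway or GO term is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation for a single term. Whether ToppGene applies exactly this test and which multiple-testing correction was used are not stated in the text, so this is an illustrative assumption. The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, sample_size, overlap):
    """One-sided hypergeometric P value, P(X >= overlap).

    population:   number of genes in the background
    pathway_size: number of background genes annotated to the term
    sample_size:  number of genes in the tested list (e.g. DEGs)
    overlap:      number of tested genes annotated to the term
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy example: 10 background genes, 4 in the pathway,
# 3 genes in the tested list, 2 of which fall in the pathway.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

When many terms are tested, the resulting P values would normally be adjusted for multiple comparisons before terms are reported as significant.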

Construction of target genes -miRNA regulatory network
To further understand the regulatory relationships between miRNAs and target genes, interactions drawn from various miRNA databases (TarBase, miRTarBase, miRecords, miR2Disease, HMDD, PhenomiR, SM2miR, PharmacomiR, EpimiR and starBase) through miRNet were used to construct networks in Cytoscape. As shown in Fig. 11, the target genes -miRNA regulatory network for up regulated genes contained 5667 edges and 2144 nodes. The up regulated target genes regulated by miRNAs are shown in Table 7; the top five miRNAs (hsa-mir-4511, hsa-mir-3924, hsa-mir-4478, hsa-mir-3650 and hsa-mir-4252) were predicted to regulate target genes such as TXNIP, PMAIP1, APOL6, CHAC1 and KLF2. Pathway and GO enrichment analysis revealed that target genes in this network were mainly involved in NOD-like receptor signaling pathway, direct p53 effectors, signaling receptor binding and response to cytokine. Similarly, as shown in Fig. 12, the target genes -miRNA regulatory network for down regulated genes contained 6679 edges and 2236 nodes. The down regulated target genes regulated by miRNAs are shown in Table 7; the top five miRNAs (hsa-mir-4312, hsa-mir-4527, hsa-mir-4673, hsa-mir-4496 and hsa-mir-3153) were predicted to regulate target genes such as BRI3BP, FAHD1, CPM, HOXA13 and TMBIM6. Pathway and GO enrichment analysis revealed that target genes in this network were mainly involved in mitochondrion, metabolic pathways, metabolism of proteins and ion transport.

Construction of target genes -TF regulatory network
To further understand the regulatory relationships between TFs and target genes, interactions from JASPAR retrieved through NetworkAnalyst were used to construct networks in Cytoscape. As shown in Fig. 13, the target genes -TF regulatory network for up regulated genes contained 2844 edges and 439 nodes. The up regulated target genes regulated by TFs are shown in Table 8; the top five TFs (FOXC1, GATA2, YY1, FOXL1 and NFKB1) were predicted to regulate target genes such as PROX1, CXCL11, C4A, JAK2 and IL15RA. Pathway and GO enrichment analysis revealed that target genes in this network were mainly involved in viral genome replication, cytokine-cytokine receptor interaction, innate immune system, influenza A and Jak-STAT signaling pathway. Similarly, as shown in Fig. 14, the target genes -TF regulatory network for down regulated genes contained 3191 edges and 505 nodes. The down regulated target genes regulated by TFs are shown in Table 8; the top five TFs (FOXC1, GATA2, YY1, NFIC and FOXL1) were predicted to regulate target genes such as SERHL2, CFI, TCN1, PGP and PGK1. Pathway and GO enrichment analysis revealed that target genes in this network were mainly involved in mitochondrion, peptidase activity, drug metabolic process, metabolic pathways and carbon metabolism.

ROC analysis
As these five up and five down regulated genes are prominently expressed in SARS-CoV-2 infection, we performed a ROC curve analysis to evaluate their sensitivity and specificity for the diagnosis of SARS-CoV-2 infection. As shown in Fig. 15

Discussion
SARS-CoV-2 infection has spread worldwide and its mortality rate has increased. This research represents the first comprehensive study of genes linked with SARS-CoV-2 virus infection using bioinformatics analysis. However, the molecular mechanisms of SARS-CoV-2 infection remain poorly understood. Thus, highly effective biomarkers for diagnosis and treatment are urgently needed. Microarray technology has proved to be a useful approach for identifying novel biomarkers in SARS-CoV-2 infection.
In the current investigation, an expression profiling by high throughput sequencing dataset was analyzed to obtain DEGs between SARS-CoV-2 infection samples and mock samples. A total of 928 DEGs were identified, including 461 up regulated genes and 467 down regulated genes. Genes such as SOCS3 [47] and IL1A [48] have been implicated in the progression of influenza virus infection, and these genes may also be linked with progression of SARS-CoV-2 infection. Genes such as KLF6 [49], DUSP1 [50] and OLFM4 [51] were associated with development of respiratory syncytial virus infection, and these genes may also be responsible for advancement of SARS-CoV-2 infection. IL6 was involved in progression of SARS-CoV-2 infection [52]. Genes such as MAGT1 [53], CD24 [54] and UGT1A7 [55] were involved in progression of various viral infections, and these genes may also be associated with development of SARS-CoV-2 infection. AQP1 was linked with development of porcine reproductive and respiratory syndrome virus infection [56], and this gene may also be responsible for progression of SARS-CoV-2 infection.
Target gene -miRNA regulatory network and target gene -TF regulatory network analysis (up and down regulated genes) can be regarded as key to the understanding of SARS-CoV-2 infection and might also lead to new therapeutic approaches. Hub genes such as PROX1 [151] and HOXA13 [152] were responsible for development of hepatitis B virus infection, and these genes may also be essential for advancement of SARS-CoV-2 infection. As given in the target gene -miRNA regulatory network and target gene -TF regulatory network for up and down regulated genes, APOL6, CHAC1, KLF2, CPM (carboxypeptidase M), CFI (complement factor I) and PGP (phosphoglycolate phosphatase) are potential novel biomarkers for SARS-CoV-2 infection.

Conclusion

Protein-protein interaction network of down regulated genes. Red nodes denote the down regulated genes.
Modules in PPI network. The green nodes denote the up regulated genes.
Figure 10 Modules in PPI network. The red nodes denote the down regulated genes.
Figure 13 The network of up regulated genes and their related TFs. The green circle nodes are the up regulated genes, and the purple triangle nodes are the TFs.
Figure 14 The network of down regulated genes and their related TFs. The green circle nodes are the down regulated genes, and the blue triangle nodes are the TFs.