Network of “drug-target-SARS-CoV-2 Related Genes” Through Integrated Analysis of Pharmacology and Geo Database

Coronavirus Disease 2019 (COVID-19) respiratory disease rapidly caused a global pandemic and social and economic disruption. The combination of Traditional Chinese medicine (TCM) and Conventional Western medicine (CWM) is more effective for COVID-19 treatment. Moreover, TCM and CWM are important data source for developing new drug targets and promote strategies treat SARS-CoV-2 infections. However, many studies have analyzed the therapeutic mechanism of CWM or TCM alone for COVID-19, it is still unclear the interaction mechanism between TCM and CWM on COVID-19. This paper integrates network pharmacology and GEO database to mine and identify COVID-19 molecular therapeutic targets, providing potential targets and new ideas for COVID-19 gene therapy and new drug development. It includes: 1) using TCMSP, TTD, PubChem and CTD databases to analyze drug interactions and associated phenotypes for SARS-CoV-2, to correlate drug and disease interaction mechanisms to screen key drug targets; 2) using GEO database to correlate differential genes and drug targets to screen potential antiviral gene therapy targets, to construct regulatory network and key points of SARS-CoV-2 therapeutic drugs; 3) using computer simulation of molecular docking to screen virus-related proteins for new drugs.


Introduction
The recently discovered beta-coronavirus severe acute respiratory syndrome (SARS-CoV-2), the causative agent of Coronavirus Disease 2019 (COVID-19) respiratory disease rapidly caused a global pandemic and social and economic disruption 1,2 . As of October 23th, 2020, a total of over 41 million cases of  were reported and more than 1100 thousand lives were claimed globally (https://covid19.who.int). SARS-CoV-2 is sense single-stranded genomic RNA virus, approximately 30 kb in length, encodes four structural proteins: the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins 3 . S protein binds to angiotensin-converting enzyme (ACE2) receptor on the cytoplasmic membrane of type 2 lung cells and intestinal epithelium. After binding, S protein was cleaved by the host membrane serine protease TMPRSS2, which promoted the virus entry 4 .These structural proteins are components of the mature virus and play a crucial role in viral structure integrity, or as in the case of the spike protein, for viral entry into the host 5,6 . The SARS-CoV-2 genome contains 14 open reading frames (ORFs), preceded by transcriptional regulatory sequences (TRSs) 7 . The structural and accessory proteins are translated from a set of nested sub-genomic (g) RNAs. Underlining how SARS-CoV-2 hijacks the host during infection absolutely could promote the development, characterization, and deployment of an effective vaccine or antibody prophylaxis or treatment against SARS-CoV-2 and could prevent morbidity and mortality and curtail its epidemic spread 8 .
Traditional Chinese medicine (TCM) formulations had been originally developed for earlier viral diseases and been used in COVID-19 treatment 9 . TCM was extensively wildly used to control the epidemic in China 10 . Shufeng Jiedu Capsule and Lianhua Qingwen capsule have good clinical effect on patients with COVID-19 11,12 . The further study of TCM may lead to the identi cation of novel antivirus compounds for the treatment of SARS-CoV-2 or other emerging fatal viral diseases 9 . The antiviral medicine mainly target on host protease or other cellular proteins, such as teicoplanin, arbidol, chloroquine and derivatives, and niclosamide; and target on viral proteins, such as niclosamide, lopinavir/ritonavir, remdesivir, favipiravir, and ribavirin 13 . Combination of TCM and conventional Western medicine (CWM) can effectively inhibit SARS-CoV-2 replication and prevent the production of proin ammatory cytokines induced by SARS-CoV-2 infection. Many studies have only analyzed the therapeutic mechanism of CWM or TCM alone for COVID-19, so this study integrated the mechanism of CWM and TCM together to explore their generality and characteristics. Based on the concept of "disease-target-compound-drug", network pharmacology systematically analyzes the mechanism of drug action and its interaction network from multiple levels and angles 14,15 . Network pharmacology combines the information of genome, proteome, drugs, diseases and other related databases with experimental veri cation data through the method of big data mining, so as to establish an inter related network of " disease-target-compound-drug " 16,17 . Therefore, this method can be used to systematically analyze the intervention and in uence of TCM and CWM on COVID-19, reveal the mechanism of action of drugs in human body, correlate the pharmacophores and interactions of various drugs, and screen the core targets. In addition, the accumulation of big data resources is conducive to the screening of molecular markers for disease prevention 18 . Biomolecular targets are the basis of accurate diagnosis and personalized drug design, which can be used to capture speci c disease signals in the early stage to formulate appropriate treatment plans 18 . Screening of molecular targets for COVID-19 therapy is of great signi cance for understanding the interaction mechanism between SARS-CoV-2 and host. Therefore, it is of great signi cance to integrate network pharmacology and GEO database to screen COVID-19 molecular therapeutic targets with clinical value. Identifying the regulatory effects of core hub genes based on big data mining can provide potential targets and new ideas for COVID-19 gene therapy and new drug development.
Materials And Methods 2.1 Active compounds and drug target genes Chemical ingredients and chemical target genes were collected from the Traditional Chinese Medicines for Systems Pharmacology Database and Analysis Platform (TCMSP) Database (http://tcmspw.com/tcmsp.php). In TCMSP database (http://tcmspw.com/tcmsp.php), the nal candidate active ingredients were screened, according to the oral bioavailability (OB) ≥30%, drug-likeness (DL) ≥0.18, drug half-life (HL)≥10. The compound target genes were imported into UniProt (https://www.uniprot.org/) that de ning the species as human, and obtained the gene names corresponding to the target genes.
The differential expression genes (DEGs) between EGFR-TKI sensitive and resistance were calculated from public dataset of Gene Expression Omnibus (GEO). GSE147507 dataset include SARS-CoV-2 infected NHEB and A549 cell lines, and the uninfected group. SARS-004 group indicated that SARS-CoV-2 infected NHBE cells at MOI 2, Cov-002 group indicated that SARS-CoV-2 infected A549 cells at MOI 0.2, and the control group were the corresponding uninfected cell lines. For assay the deregulation gene expression, "limma" package of R software was employed to test the DEGs. mRNAs with log2 fold change (|log2FC|, P < 0.01) considered to be differentially expressed mRNAs.
2.3 Identify core sub-network from compound-target network For identifying core targets from compounds-targets network, the overlap of DEGs and targets were extracted. There eight overlap targets were identi ed. We selected eight targets and their neighbors (only compounds) as core sub-network.

Construction of Gene Enrichment Analysis
To better understand the processes associated with COVID-19 and Chinese medicine treatment, we performed GO terms and KEGG pathway enrichment analyses. The GO enrichment analysis provides three structured networks of de ned terms to describe gene attributes. Enriched GO terms are classi ed according to biological process (BP), molecular function (MF), and cellular component (CC). The KEGG (http://www.genome.jp/kegg/) is a database for large-scale systematic analysis of molecular interaction networks of genes or proteins. The DAVID bioinformatics resources consist of an integrated biological knowledge base analytic tools, aimed at systematically extracting biological meaning from large gene or protein lists. We used the web-based search engine, DAVID, to determine over-represented GO terms and KEGG pathways with thresholds of an enrichment score > 2, count > 5, and P < 0.05 and analyzed COVID-19-related pathways and GO terms. Venn diagram and bubble graphs of were performed using the OmicShare tools, a free online analysis tool (http://www.omicshare.com/tools). Signi cant pathway terms of KEGG were mapped into a bubble graph. The big and higher bubbles represent those highly signi cantly enriched pathway terms.

Protein-Protein Interaction Network Analysis
Protein-protein interaction (PPI) networks included information on the biological processes and molecular functions of cells. We used the online search tool for recurring instances of neighboring genes (STRING, Version 9.1) (http://www.string-db.org) to predict the interactions. The Cytoscape software 3.7.2 (http://cytoscape.org/) was used to visualize networks. To identify crucial relationships in the PPI network, potentially overlapping modules that were densely connected were subsequently identi ed using Molecular Complex Detection (MCODE) plugin in the Cytoscape program. The MCODE plugin was used to re-analyze the clusters among the network according to the k-core = 2. A value of P < 0.01 was considered the signi cant threshold.

Network Construction of KEGG Pathway
To better analysis the holistic mechanism of Chinese herbs in endometriosis treatment, the subnetworks pathway was compiled by following the procedures: All targets of Chinese herbs in endometriosis treatment were submitted to an online tool KEGG Search Pathway (https://www.genome.jp/kegg/tool/map_pathway1.html). Based on the mechanism of endometriosis, multiple pathways were integrated and overlapped according to cross-talk targets in these maps. Based on the cross-talk of the pathways, we further constructed a targets-pathways network of Chinese medicine treatment.

Network Construction
We established a network analysis of the relevant compound targets. The components of the target networks for the herbs were constructed using the Cytoscape software 3.7.2. The nodes in each network were evaluated based on three indices: degree, node betweenness, and node closeness. Degree indicated the number of edges between a single node and other nodes in a network. Node closeness represented the inverse of the sum of the distance from one node to other nodes. The importance of a node in a network was indicated by the values of these indices, with higher values indicating greater importance.

Component-molecular target docking
From RSCB PDB database https://www.rcsb.org/ Download the 3D structure pdb format les of SARS-CoV-2 related protein, angiotensin converting enzyme II (ACE2) and angiotensin converting enzyme (ACE) remove ligands and non-protein molecules (such as water molecules) in target proteins by using discovery studio 2020 client software, and then save them as PDB su x les. From PubChem database https://pubchem.ncbi.nlm.nih.gov/ Download the SDF format le of compound 2D structure. Using pyrx software, we rst upload the protein le after water removal and hydrogenation, convert it into pdbqt format le, then upload the compound le to minimize its energy, and convert it into pdbqt format le. Finally, Vina is used for docking. The binding energy is less than 0, which indicates that the ligand and the receptor can spontaneously bind. The active components with binding energy ≤-5.0kj/mol were selected as the screening basis of antiviral particles for COVID-19 target.

Identi cation of effective compounds and target genes of COVID-19 clinical TCM
Effective compounds and target genes were identi ed through integrated bioinformatics analysis based on TCMSP, TTD, PubChem, CTD and GEO datasets (Fig.S1) (Fig.1A). PubChem analysis showed the hub compound structure ( Fig.1B and C). There are 77 kinds of effective components associated with prescriptions 1, 3, 4, 6, 7, 9 and 10 (Table S2). Each prescription has its own unique active ingredients for patients with different diseases. Prescription 1 has 23 unique active ingredients, prescription 2 has 3, prescription 3 has 3, prescription 4 has 3, prescription 6 has 2, prescription 7 has 2, prescription 8 has 8, prescription 9 has 3, prescription 10 has 43 (Fig.1A). In addition, TCM target genes were collected for GO and KEGG pathway analysis, and the result implied that target genes signi cantly involved positive regulation of transcription from RNA polymerase II promoter, nucleus and protein binding category (Fig.1D), and mainly involved in pathway in cancer, TNF signaling and Osteoclast differentiation (Fig.1E).

Screening of joint effective compounds and drug-target genes by analysis of combination of Chinese and Western Medicine
We rstly constructed Venn graphs of TCM and CWM compounds and target genes based on TCMSP and TTD/PubChem database, and found 3 hub compounds and 64 target genes ( Fig.2A and B). We further analyzed 3D structure of 3 hub compounds (Baicalein, Estrone and Quercetin) based on NCBI PubChem, and the result showed the three structures are similar, Baicalein and Quercetin have large conjugation systems, and Estrone only has conjugation of benzene rings (Fig.S2B). Hub target genes signi cantly involved in cytosol and protein binding category, and pathway in cancer and Hepatitis B, according to GO and KEGG pathway analysis ( Fig. 2C and Fig.S2A). Moreover, Cytoscape was used to construct the regulatory network of "Traditional Chinese medicine-Hub compound-Target gens" (Fig.2D), which indicated that Quercetin had more complex network systems and Estrone had less target genes. MAPK14, MAPK3, MCL1 connected Quercetin and Baicalein, OPRM1, ADRB2, SCN5A, EGF and ACE connected Estrone and Quercetin.

Identi cation of differentially expressed genes (DEGs) induced by SARS-CoV-2 infection based on GEO database: GSE147507
This project is based on GEO database to download the original data of GSE147507. The data set explored the response of NHBE cell lines and A549 cell lines to SARS-CoV-2 infection at the transcriptional level, with a total of 110 samples. The raw data was analyzed by R language and the conditions P < 0.05 and |log2-FC|>1 de ned as differential genes. The results showed that there were 1177 differentially expressed genes in A549 cells and 1407 genes in NHBE cells. The volcano plots of DEGs among Cov-set and SARS-set were shown in Fig.3A and B. Venn analysis showed that 8 upregulated genes and 113 down regulated genes were the same in the two kinds of cells ( Fig.3C and Table  1). NCBI database was used to analyze the tissue expression pro le characteristics of hub differential genes (Fig.S3). The results showed that the up-regulated differentially expressed genes induced by SARS-CoV-2 had high tissue speci city. SYNE1 was highly expressed in ovary, CDKN2C in adipose tissue, gnpda1 in kidney, RPAGD in goiter, ANKH in pancreas and PRODH in small intestine. The down-regulated genes induced by virus were highly expressed in bone marrow, lymph node, spleen, lung, cecum, gallbladder and bladder. In addition, the expression levels of pivotal genes in pancreas and liver tissues were signi cantly lower than those in other tissues. Among all DEGs, only C3, C1R, C1S, ASS1, ERRF11, KYNU, HSD11B1, SERPINA3, SAA1 and CFB were signi cantly expressed in liver tissues, while ERRF11, SERPINA3 and BCL2A1 were signi cantly expressed in pancreas.
The DEGs KEGG pathway enrichment showed DEGs signi cantly involved in pathway of Herpes simplex infection, in uenza A, Measles and TNF signaling (Fig3D). GO analysis results indicated DEGs were enrichment in type I interferon signaling pathway, defense response to virus, cytoplasm, cytosol and protein binding (Fig.3E). The above results indicated that SARS-CoV-2 infection wildly down-regulated the immune response of host.

Identi cation of key hub DEGs by integrating internet-pharmacology and GEO database
In order to further screen molecular targets for COVID-19 regulation, and provide targets for exploring the mechanism of interaction between virus and host and screening new drugs, we further conduct molecular docking between drug targeted genes (DTGs) and differential expressed genes (DEGs) induced by SARS-CoV-2 infection, and screen core pivotal genes. Venn analysis was carried out on the target genes of ten TCM prescriptions and the DEGs induced by virus infection. The results showed that the number of key genes in the target genes of SARS-004 group was more than that of Cov-002 group (Fig.4A). These joint genes were down regulated by viral infection in both Cov and SARS groups ( Fig.4B and C). Furthermore, Venn analysis was carried out on the related genes of each prescription and GEO set, and found the common associated genes ( Fig.S4A and B). ICAM1, IL1A, PTGS2, SLPI, STAT1, TOP2A were the common key genes in CoV and SARS groups. Except TOP2A, they were all down regulated by virus infection (Fig.   S4C).
Besides, based on TTD (Therapeutic Target Database), GEO and TCMSP databases, the drug targets of Chinese and Western medicine and the differential expressed genes induced by virus were analyzed, and 14 core pivotal genes were obtained. IL6, IL8, IL1B, NFKBIA, PTGS2, STAT1 and THBD were all present in both Cov and SARS groups. Except THBD, other genes were down regulated by viral infection in both groups ( Fig.4D and E). Finally, the interaction compounds of core hub genes were analyzed by CTD database, and two common compounds, folic acid and ozone, were screened (Fig.S4E). Folic acid can down regulate the expression of core hub genes, and zone can up regulate the expression of core hub genes.

Identi cation of Component-molecular target docking
The hub genes and SARS-CoV-2 related genes were respectively done component-molecular target docking with compounds. The lower the Binding energy is, the rmer the binding between compound and molecular target. This means the compound have greater ability to make a difference. The binding energy was lower than 5KJ/mol as the screening condition, and the docking results were listed ( Table 2 and  Table 3), indicating that these compounds had good binding ability to target proteins. Among the four compounds, the binding energy between folic acid and target protein were -7.2/kcal.mol -6.8/kcal.mol -6.5/kcal.mol -8.5/kcal.mol -10.0/kcal.mol and -6.8/kcal.mol, generally lower than that of other compounds. NFKBIA and PTGS2 also have low binding energy with compounds ( Table  2 and Fig. 5A). All the four compounds had lower binding energy with ACE2, among which folic acid was the most signi cant. The binding energy between the compounds and ACE2 is slightly higher than that of the compounds with ACE. Among the eight proteins of SARS-COV-2, Nucleocapsid Phosphoprotein (N) had the lowest binding energy with compounds. The results with the minimum binding energy for each protein were selected for visualization ( Table 3 and Fig. 5B).

Discussion
SARA-Cov-2 infection molecular therapeutic network model is built based on intelligent computer and big data integration bioinformatics. The project mainly relies on database analysis to analyze the interactions and related phenotypes among SARA-Cov-2 drugs, and analyze the key drug targets by correlation analysis. Infected cells and normal cells were investigated by GEO database. This research mainly relies on the published data of the database to integrate bioinformatics analysis to obtain the key drug targets and potential targets of gene therapy, explore the mechanisms of effective compounds in the treatment of COVID-19.
The category of "epidemic disease" in traditional Chinese medicine has accumulated rich experience in ghting epidemics for thousands of years. After analyzing the active compounds of ten TCM prescriptions, a common core compound, kaempferol, was found. It has been shown that kaempferol not only has some anti-in ammatory, anti-cough and expectorant effects, but also lowers blood glucose and prevents vascular complications of diabetes 19 . Coronavirus infection can induce COVID-19 patients with normal glucose tolerance to increase blood glucose, sustained hyperglycemia is not only conducive to virus replication in the body, but also reduce the body's ability to resist infection, hyperglycemia patients develop serious illness and face a higher risk of death 20,21 . This indicates that TCM can play a full therapeutic role in COVID-19. In addition, GO function was performed on the target sites of TCM, and the results showed that the target genes were enriched in the positive regulation of transcription from RNA polymerase II promoter, oxidation-reduction process, protein binding. Conducting KEGG pathway analysis revealed that target genes were mainly involved in pathway in cancer, TNF signaling pathway, and osteoclast differentiation. It was shown that these terms play a key role in the treatment of COVID-19 by TCM.
In the association analysis of TCM and CWM, three central compounds were identi ed, namely, baicalein, estrone and quercetin. Baicalein is a avonoid with antihyperglycemic activity 22 , which, like kaempferol, can be useful in patients with diabetes mellitus. Quercetin, like baicalein, is a avonoid that has the bene cial effects on in ammation and immunity 23 . Study had shown that quercetin may decreases the frequency and duration of respiratory tract infections as an effective intervention; however, more research is needed 24 . Natural estrogens are mainly estradiol, estrone and estriol. Numerous studies had shown that estrogens are associated with the development of liver cancer and work by binding to estrogen receptors 25 . What's more, it had shown that estrogen treatment silences the in ammatory reactions and decreases virus titers leading to improved survival rate in animal experiments 26 . It is evident that TCM and CWM are effective in targeting patients with comorbid diabetes, liver damage, and kidney damage. GO analysis was performed on the overlapping fraction of targets obtained from target association analysis of TCM and CWM to obtain target enrichment in cytosol and protein binding category. KEGG analysis was performed to obtain target enrichment in the pathway in cancer and Hepatitis B. The commonality between TCM and CWM in the treatment of COVID-19 was shown.
In the analysis of differential genes induced by viral infection, up-regulated genes such as gnpda1 were found to be highly expressed in the kidney, RPAGD in the thyroid, ANKH in the pancreas, PRODH in the small intestine, and down-regulated genes were highly expressed in the bone marrow, lymph nodes, spleen, lung, cecum, gallbladder, and bladder. In addition, key genes were expressed at signi cantly lower levels in pancreatic and liver tissues. It is hypothesized that these genes are key targets in the clinical symptoms of COVID-19 that cause fever, cough, chest tightness, and diarrhea and sore throat symptoms. After GO functional analysis of the differential genes, it was found that the differential genes were mostly enriched in the type I interferon signaling pathway, defense response to virus, cytoplasm, cytosol, and protein binding. KEGG analysis revealed that the differential genes were mostly enriched in herpes simplex infection, in uenza A, and TNF signaling pathways. It is evident that SARS-CoV-2 infection affects the immune response of the host. It had shown that after infection with SARS-CoV-2, uncontrolled in ammatory mediators in the body, causing a cytokine storm, will lead to an excessive immune response, consistent with the results of differential gene enrichment analysis 27 . In addition, the results of GO classi cation showed that both TCM drug target genes and virus induced genes were signi cantly involved in protein binding of molecular functions. Whether TCM can inhibit protein synthesis more effectively, thus hindering the replication and proliferation of virus in the host?
After analyzing the drug targets and differential genes separately, we obtained six core hub genes, namely IL6, CXCL8, IL1B, NFKBIA, PTGS2, STAT1, by correlating them to establish a " disease-targetcompound-drug " regulatory network. SARS-CoV-2 causes activation of different immune cells that helps to secrete proin ammatory cytokine IL6 and other in ammatory cytokines then causes cytokine storm 28 . Both IL8 and IL1B encode proteins that are cytokines and, like IL6, are more abundant in critically ill patients and statistically different (P<0.05) 27,29 . Analysis of the KEGG pathway for six core hub genes yielded four more important nodes: legionellosis, interleukin-6-mediated signaling pathway, IL-17 signaling pathway, and Leishmaniasis. Among them Legionellosis may involve pneumonia.IL-17 signaling pathway plays an important role in both acute and chronic in ammatory responses. This reveals the mechanism of herbal and Western medicine in the treatment of COVID-19.
Identi cation of potential molecular targets is essential for drug repurposing [30][31][32] . Through the construction of molecular docking model between hub host gene and hub compound, it was found that folic acid (FA) and protein ILB, PTGS2, STAT1 were stable binding, and estrogen was stable binding to IL6, IL8. Estrone has good binding properties to proteins, and exogenous estrogen is recommended as a drug for the prevention and treatment of COVID-19 26 . Folic acid may be a potential drug target. FA had a relatively lower binding energy to the protein, suggesting that it works better as a drug to treat COVID-19. This suggests that folic acid is a potential drug to treat COVID-19. FA have preventive effects of on Zika virus-associated poor pregnancy outcomes in immunocompromised mice. Mice with FA treatment showed lower viral burden and better prognostic pro les in the placenta including reduced in ammatory response, and enhanced integrity of BPB 33 .It had indeed been shown to related to pregnant women with COVID-19 34 . However, the mechanism by which folic acid has apparently protected pregnant women during the COVID-19 pandemic has not been determined.
For the precise treatment of COVID-19, our research performed molecular docking between drug compounds and viral proteins. The role of N in RNA recognition, replicating, transcribing the viral genome, and modulating the host immune response is indispensable 35 . The 3CL protease and the NSP13 helicase are crucial to viral replication 36,37 . The spike proteins help in fusing into the host and the NSP9 replicase plays a major role in viral replication 38 .The ORF3a protein can activate the NLRP3 in ammasome by promoting TRAF3-dependent ubiquitination of ASC 39 . The ORF7A protein blocks cell cycle progression at G0/G1 phase via the cyclin D3/pRb pathway 40 .The ORF8 can mediate immune suppression and evasion activities potentially 41 . These proteins play an important role in the viral infection of the host, so they are selected for molecular docking with the effective compounds obtained from the analysis. 3CL Protease and ORF3a protein, ORF8 also have low binding energy with the compounds. The binding of Phosphoprotein N to folic acid was stable, indicating that it can be targeted for molecular therapy. Together, above results showed that the binding energy between FA and N is the lowest, so it was speculated that N might be the target of folic acid in the treatment of COVID-19.

Conclusions
In general, through the analysis of the effective compounds and drug targets of COVID-19 clinical TCM and CWM, we found the pivotal genes and effective compounds, and further established the "drug-targetvirus infection" molecular network through the GEO database combined with the virus induced differential genes, and screened out the potential new drugs (FA) and new molecular targets (SARS-CoV-2 N protein and PTGS2) for COVID-19 treatment, which promote underlying the interaction mechanism between virus and host, and provide a new insight for the development of new drugs.  Tables   Table 1 Total