Bioinformatics analysis of single-cell mRNA-seq of SARS-CoV-1 and SARS-CoV-2 infection compared to MERS-CoV from Sequence Read Archive (SRA) database reveals novel targets for therapies

The global pandemic of COVID-19 caused by SARS-CoV-2 is still threatening the world. By May 13, 2020, more than 40 million people have been infected by SARS-CoV-2 and almost 300 thousand deaths were reported. The discovery and development of anti-viral drugs and vaccines are being conducted worldwide and the understanding of the molecular responses of a single cell to SARS-CoV-2 is in urgent need. The comparative analysis of gene expression in SARS-CoVs and MERS-CoV infected Calu-3 cells reveals that although the coronaviruses cause similar acute respiratory distress syndromes, the molecular responses of Calu-3 cells to SARS-CoVs infections showed a unique signature. A total of 64 correlated differentially expressed genes (DEGs) were identied in this study. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses indicated that the DEGs were signicantly involved in biological process of ‘Response to interferon-γ’, ‘Viral life cycle’, ‘Phagosome’ and ‘Epstein-Barr virus infection’. STRING analysis showed that the DEGs that were up-regulated after SARS-CoVs infections but down-regulated after MERS-CoV infections showed a strong interaction network. Molecular Complex Detection (MCODE) analysis further rened a unique network consisted of eight hub genes out of 64 DEGs, which are involved in cytokine response (CXCL8, CCL20, and CSF2), ISGylation (ISG15), macrophage activation (ITGAM), complement system (C3), and NFκB signaling pathway (TRAF1 and NFκB2). The unique network identied here will be a potential ngerprint for distinguishing SARS-CoVs infections from MERS-CoV infection. The identication of the eight hub genes will lead to the discovery of new possible therapeutic targets for ghting COVID-19.


Introduction
Since December, 2019, a novel coronavirus, known as severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2), spreads rapidly and globally. To date, the coronavirus disease 2019  caused by SARS-CoV-2 has become a world pandemic with a high mortality 1 .
The global outbreak of severe acute respiratory syndrome (SARS) happened in 2002 from Guangdong Province, China 2 . At that time, SARS-CoV-1 was not closely related to any known human or animal coronavirus or torovirus (Drosten et al., 2003;Ksiazek et al., 2003). Patients' age ranged from 1 to 100 years 2 . Fever of 38ºC or more, cough, myalgia and shortness of breath were the most frequent symptoms (Peiris et al., 2003a). SARS-CoV-1 was reported to have affected 8096 people and caused 774 deaths with the fatality ratio of 9.6% 2 . SARS-CoV-1 belongs to the coronavirus genus in the Coronaviridae family and has a positive-sense RNA genome of 29.7 kb (He et al., 2004a). Similar to other coronaviruses, the spike (S) proteins of SARS-CoV-1 are responsible for virus binding, fusion and entry, and are major inducers of neutralizing antibodies (Gallagher and Buchmeier, 2001;Ho et al., 2004). It has been demonstrated that membrane-associated angiotensin-converting enzyme 2 (ACE2), an essential regulator of heart function, is a functional receptor in target cells for SARS-CoV-1 (Dimitrov, 2003;Li et al., 2003;He et al., 2004b).
The latest local outbreak of SARS was reported in April 2004, in Beijing and Anhui Province, China. Then SARS-CoV-1 seemed to disappear in human world 2 .
Ten years after the emergence of SARS, Middle East respiratory syndrome (MERS) was rst identi ed in Saudi Arabia in 2012 3 . MERS-CoV is a novel member of the betacoronavirus and distinct from SARS-CoV-1 (Peiris et al., 2003b;van Boheemen et al., 2012). The typical presentation of MERS-CoV disease is fever, cough and shortness of breath, similar to SARS 3 . But infection with MERS-CoV can be asymptomatic 3 . At the end of January 2020, a total 2519 cases of MERS, including 866 associated deaths were reported globally. The fatality ratio is quite high at around 34.3% 3 . MERS-CoV has a large, positive-sense RNA genomes of 30.1 kb. The MERS-CoV also utilizes S proteins for interaction with and entry into target cells (Raj et al., 2013). Raj et al. identi ed that dipeptidyl peptidase 4 (DPP4) functions as a cellular receptor for MERS-CoV (Raj et al., 2013). However, DPP4, expressed on the surface of several cell types, does not share any sequence or structural similarities to ACE2 (Li, 2005;Wang et al., 2013). To date, there are still about 200 MERS cases reported every year, mainly in Saudi Arabi 3 . E cient vaccines against MERS-CoV are still under development (Haagmans et al., 2016;Jiaming et al., 2017). Now, human is facing with the new challenge of COVID-19 caused by SARS-CoV-2. Typical clinical symptoms of COVID-19 are fever, dry cough, breathing di culties, headache and pneumonia . Most COVID-19 patients with other diseases, such as diabetes mellitus, hypertension, cardiac diseases, obesity or cancer, are prone to get severe (Liang et al., 2020;Wang et al., 2020). By May 13, 2020, the number of con rmed COVID-19 cases is 4,170,424, including 287,399 deaths with the fatality ratio of 6.9% 4 . And the number is still increasing. SARS-CoV-2 is closely related to SARS-CoV-1, sharing 79.6% sequence identity . SARS-CoV-2 uses the ACE2 for entry and the serine protease TMPRSS2 for S proteins priming, same with SARS-CoV-1 (Matsuyama et al., 2010;Hoffmann et al., 2020;Walls et al., 2020). On the other side, the receptor-binding domain (RBD) in the S proteins of SARS-CoV-2 differs from SARS-CoV-1. Thus SARS-CoV-2 is supposed to have an RBD that binds with high a nity to ACE2 (Andersen et al., 2020;Wan et al., 2020). The threat of SARS-CoV-2 is much serious than previous SARS-CoV-1 and MERS-CoV. However, the origin, transmission path and infection mechanism of SARS-CoV-2 are still elusive. Effective medicines and vaccines against SARS-CoV-2 are still under investigation.
Consequently, it is urgent to understand the molecular responses of a single cell to coronaviruses infection. Here, we compared the public single-cell (Calu-3 cell) mRNA-seq data from Sequence Read Archive (SRA) database among MERS-CoV, SARS-CoV-1, and SARS-CoV-2 and identi ed 64 differentially expressed genes (DEGs). We further re ned eight potential hub proteins in responding to SARS-CoVs. The results will provide clues for understanding the molecular mechanism of SARS-CoV-2 infection and provide potential strategies for combating the ongoing COVID-19 outbreak and preventing future emergence of lethal coronaviruses.

Results
Differentially expressed genes in the infected Calu-3 cells In the MERS-CoV infected datasets, a total of 282 up-regulated genes and 1181 down-regulated genes were identi ed ( Figure 1A & Figure 2). In the SARS-CoVs infected datasets, more genes were up-regulated ( Figure 1B&C). In detail, a total of 761 up-regulated genes and 55 down-regulated genes were identi ed in SARS-CoV-1 infected cells ( Figure 2) and a total of 1229 up-regulated genes and 396 down-regulated genes were identi ed in SARS-CoV-2 infected cells ( Figure 2). The intersections from the upset-plot demonstrated that 64 DEGs were changed in all three coronaviruses infection groups compared to control groups ( Figure 2). The list of 64 DEGs was illustrated in Figure 3. Seven genes were up-regulated both in the MERS-CoV infected cells and SARS-CoVs infected cells, whereas 15 genes were down-regulated both in the MERS-CoV infected cells and SARS-CoVs infected cells ( Figure 2). In addition, 41 genes were downregulated in MERS-CoV infected cells, but up-regulated in SARS-CoVs infected cells ( Figure 2). Only one gene was up-regulated in MERS-CoV infected cells but down-regulated in SARS-CoVs infected cells ( Figure 2). Interestingly, no gene was identi ed that is up-regulated in SARS-CoV-1 but down-regulated SARS-CoV-2, and no gene was identi ed that is down-regulated in SARS-CoV-1 but up-regulated SARS-CoV-2 ( Figure 2).

GO enrichment and KEGG pathway analyses
The GO enrichment analyses suggested that the DEGs were signi cantly involved in biological process of 'Response to interferon-γ' and 'Viral life cycle' ( Figure 4A & Table S1). In addition, KEGG analysis results indicated that the DEGs were mainly enriched in the pathways of 'Phagosome' and 'Epstein-Barr virus infection' ( Figure 4B & Table S2).
PPI network and hub genes detection STRING database analysis identi ed two networks with 38 DEGs involved in the interaction. In the main PPI network, 63 nodes and 88 edges were identi ed, which included up-regulated and down-regulated DEGs that either share the same or opposite trend in MERS-CoV infected cells and SARS-CoVs infected cells ( Figure 5A). The DEGs that were involved in the main PPI network contained 26 genes that were down-regulated in MERS-CoV infected cells but up-regulated in SARS-CoVs infected cells, six genes that were down-regulated both in MERS-CoV infected and SARS-CoVs infected cells, two genes that were upregulated both in MERS-CoV infected and SARS-CoVs infected cells, and one gene that was up-regulated in MERS-CoV infected cells but down-regulated in SARS-CoVs infected cells. A total of seven genes had more than 5 nodes and were highlighted by black boxes in the PPI network, including ICAM1, HLA-B, EGR1, CXCL8, ITGAM, OASL, and ISG15 ( Figure 5A). In STRING analysis, all the co-expressed DEGs were involved in immune response, response to cytokine and stress. Molecular Complex Detection (MCODE) analysis further identi ed ten top hub genes, including CXCL8, ITGAM, C3, CCL20, CSF2, EGR1, ISG15, TRAF1, NFKB2, and IFITM3 ( Figure 5B). Eight out of ten predictive genes were also predicted by GO and KEGG analyses. However, EGR1 only predicted by KEGG analysis and IFITM3 only predicted by GO analysis.

Diseases related to the DEGs
The results returned by DAVID database showed that a total of 83 diseases were related to the DEGs with a p-value < 0.05 (Table S3). Type II Diabetes, edema, rosiglitazone, pharmacogenomic, infection, and cancer had the ratio (gene counts in each disease to 64 input DEGs) > 30% and were considered as mainly related diseases ( Figure 6).

Materials And Methods
Data collection Data objects were obtained from Sequence Read Archive (SRA) database in National Center for Biotechnology Information (NCBI) by sra-tools (SRA Handbook [Internet]. Using the SRA, 2010). The sra les, returned by the 'prefetch' function, were converted to fastq format using the 'fasterq-dump' function.
The following high-throughput sequencing data of the mRNA from human lung cancer cell line, Calu-3 cells were selected as the study target. All samples of treatment groups have been post-infected by the respective coronaviruses for 24 h.

Data process
Quality control checks on the raw sequence data were performed by FastQC 4 . The mapping of the sequence data were performed by HISAT2 (Kim et al., 2015). The expression level of mRNA was counted by featureCount (Liao et al., 2014), using human genome index GRCh38 provided by HISAT2 (Kim et al., 2019). The expression matrixes created by featureCount were then estimated by MultiQC (Ewels et al., 2016) and analyzed in R software (version R 4.0.0) using the DESeq2 R package (Love et al., 2014). The fold change (FC) of the original data was log 2 transformed and the log 2 FC shrinkage was performed by apeglm ( Genomes (KEGG) database (Kanehisa, 2000(Kanehisa, , 2019Kanehisa et al., 2019). The GO and KEGG enrichment analyses were performed by clusterpro ler R package (Yu et al., 2012). The adjusted p-value < 0.05 was considered signi cant.

Protein-protein interaction (PPI) analysis
To better understand the interactions between the proteins that correspond to the DEGs, STRING database was used to visualize the PPI network (Szklarczyk et al., 2015). Cytoscape software (Shannon, 2003) was used to re ne the PPI network by selecting the interactions with a combined score of >0.4 and the MCODE (Bader and Hogue, 2003) algorithm was used to extract the hub genes. The parameters of MCODE used in this study were as follows: the degree of cut-off = 2; cluster nding = haircut; node score cut-off = 0.2; k-core = 2; and the maximum depth = 100.

DEGs in other diseases
The relationship between the DEGs and other diseases were analyzed by searching in the DAVID (Huang et al., 2009b(Huang et al., , 2009a using database 'GAD_DISEASE', 'GAD_DISEASE_CLASS', and 'OMIM_DISEASE'. The p-value < 0.05 was considered signi cant.

Discussion
The present study identi ed 64 correlated DEGs in SARS-CoVs infected and MERS-CoV infected cells ( Figure 3). No gene was identi ed that has opposite expression trend between SARS-CoV-1 infected and SARS-CoV-2 infected cells ( Figure 2). The comparison of the differences and similarities of DEGs among

MERS-CoV infected, SARS-CoV-1 infected and SARS-CoV-2 infected cells demonstrated that the molecular responses of Calu-3 cells to SARS-CoV-2 infection are quite similar to SARS-CoV-1 infection but different from MERS-CoV infection. Genes that were down-regulated in the MERS-CoV infected cells but up-regulated in SARS-CoVs infected cells are responsible for telling the difference between MERS-CoV
and SARS-CoVs infections. CXCL8, CCL20, CSF2, ITGAM, C3, ISG15, TRAF1, and NFKB2 were identi ed as the main hub genes since all of them were predicted important in STRING, GO and KEGG analyses ( Figure  biomarker for acute respiratory distress syndromes (García-Laorden et al., 2017). The up-regulation of the CXCL8 gene expression by SARS-CoV-1 is resulted from a direct effect of the virus at the cellular level (Spiegel and Weber, 2006). On the contrary, a study using human lung tissues has found that CXCL8 transcription was up-regulated only by SARS-CoV-1, but not SARS-CoV-2 infection , whereas a study using mouse lung and brain tissues have reported that a signi cant increase of CXCL8 expression after infection with MERS-CoV (Hao et al., 2019). However, in the present study, by analyzing the single Calu-3 cell mRNA sequencing data, the up-regulation of the CXCL8 was only found in the cases of SARS-CoV-1 and SARS-CoV-2 infections but not in the case of MERS-CoV infection. In the case of CCL20, a study using human peripheral blood mononuclear cells on CCL20 has reported that an early enhancement in the expression of CCL20 after the infection of SARS-CoV-1 (Ng et al., 2004). However, the study on the expression of CCL20 after the infection of SARS-CoV-2 and MERS-CoV is still insu cient.
CSF2, which is called colony-stimulating factor 2, is a white blood cell growth factor that also functions as a cytokine (Francisco-Cruz et al., 2014). CSF2 ghts infection by rapidly growing its number. Interestingly, CSF2 is reported to exhibit a signi cantly increased serum level at the early stage during SARS-CoV-2 infection . But the up-regulation of CSF2 is not observed at the early stage of SARS-CoV-1 and MERS-CoV infections, whereas only a moderate up-regulation is observed at the late stage (Sun et al., 2020). The infection mechanism of SARS-CoV-2 may be different with SARS-CoV-1. Therefore, CSF2 is a potential unique biomarker for SARS-CoV-2.
ITGAM (also known as CR3A) is one protein subunit that forms macrophage-1 antigen (Mac-1), which is involved in the innate immune system and mediates in ammation (Solovjov et al., 2005). Studies have reported an increase of ITGAM expression in SARS-CoV-1 (Reghunathan et al., 2005;Kong et al., 2009) and SARS-CoV-2 infected cases (Wangzhou et al., 2020), but such increase has not been reported in MERS-CoV infected cases.
C3 (complement component 3), which is a pivotal complement component in the complement system and contributes to innate immunity (de Bruijn and Fey, 1985), involving coronaviruses infection  and pro-in ammation (Mastellos et al., 2019). The binding between C3 and ITGAM has been experimentally veri ed by a nity chromatography (Micklem and Sim, 1985). SARS-CoV-1 has been reported to induce the up-regulation of C3 (Danesh et al., 2011). Furthermore, because C3-de cient mice infected with SARS-CoV-1 were reported to exhibit less respiratory dysfunction (Gralinski et al., 2018), the inhibition of C3 was also considered a potential target to alleviate the in ammatory lung complications of SARS-CoV-2 infection (Mastaglio et al., 2020;Risitano et al., 2020).
ISG15 is an ubiquitin-like protein and is called interferon-stimulated gene 15. It is tightly related to speci c signaling pathways that are involved in innate immunity. The main cellular function of ISG15 is ISGylation but also has a wide range of anti-viral activity (Morales and Lenschow, 2013;Dzimianski et al., 2019). The papain-like proteases (PLpro) derived from MERS-CoV and SARS-CoV-1 have deISGylating activities and are reported to bind ISG15 for the antagonism of the innate immune response (Lindner et al., 2007;Bailey-Elkin et al., 2014;Mielech et al., 2014;Ratia et al., 2014). Additionally, a current study has demonstrated that the infection of SARS-CoV-2 will induce the expression of interferon-induced transmembrane family genes, including ISG15 (Khoury et al., 2020). But in the case of SARS-CoV-2, on evidence has been showed that PLpro can bind with ISG15.
TRAF1 (TNF receptor-associated factor 1) and NFKB2 (nuclear factor NFκB p100 subunit) are involved in the NFκB signaling pathway that is linked to in ammation and is also predicted as a target pathway for the COVID-19. The expression level of TRAF1 was reported to increase during the SARS-CoV-1 infection (Mitchell et al., 2013) and SARS-CoV-2 infection (Rian et al., 2020a). The expression response of TRAF1 to MERS-CoV infection has not been reported yet. Meanwhile, an increased expression of NFKB2 was reported in COVID-19 patients (Yan et al., 2020), SARS-CoV-2 ( Rian et al., 2020), and SARS-CoV-1 infected cells (Tang et al., 2005), but the expression level of NFKB2 were not signi cantly changed in MERS-CoV infected Calu-3 cells (Selinger et al., 2014), which is in consistent with the data analyzed in the present study ( Figure 3).

Conclusion
In summary, the eight hub genes identi ed in the present study are involved in the cytokine response (CXCL8, CCL20, and CSF2), ISGylation (ISG15), macrophage activation (ITGAM), complement system (C3), and NFκB signaling pathway (TRAF1 and NFKB2). Those genes are also highly related to other diseases, such as Type II Diabetes, edema, rosiglitazone, and cancer, which brought a further understanding of the molecular mechanism that COVID-19 patients with those diseases will be more likely to have severe symptoms and mortality. Although the expression level change of CXCL8 has been reported differently in MERS-CoV and SARS-CoV-2 infection, the other hub genes, according to the data published to date, showed the same expression level change as in the present study. The unique network of the eight hub genes will be a potential ngerprint for telling SARS-CoVs infections from MERS-CoV infection and exploit new possible therapeutic targets for ghting COVID-19.      Diseases enrichment analysis. Top 20 diseases enriched in DAVID database and ranked by -log (p-value).

Declarations
The red dotted line represented p = 0.05. The blue columns that over the red dotted line denoted signi cant diseases that are related to the DEGs. The orange dots represented the ratio of gene counts in each disease to 64 input DEGs.