DOI: https://doi.org/10.21203/rs.3.rs-1766515/v1
We performed a meta-analysis that consisted of re-analyzing publicly available RNA-sequencing datasets across representative species within Orthohantavirus and across host cell types to identify differentially expressed genes associated with hantavirus infection. In addition, we determined signaling pathways that were significantly perturbed in human tissue, as well as predicted existing drugs that could be repurposed to potentially reduce the effects of acute hantavirus infection.
NCBI-GEO and ENA databases were searched to find datasets for this analysis, then processed using the Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data (ARMOR) to identify differentially expressed genes (DEGs). The results from this analysis were then processed using signaling pathway impact analysis (SPIA) to identify enriched signaling pathways. Finally, that data was processed using the Pathways2Targets program to identify existing therapeutics that target the impacted cellular pathways.
Our results include affirming the roles of the previously identified interferon pathway and the MYBL1 and CCR7 genes, as well as newly identified genes, including SLC27A3, TSEN34 and neuron associated NOG and AMIGO1. We predicted dinaciclib, alvodicib, and ruxolitinib, among many other drugs, that could be useful as potential HFRS and HCPS treatments.
These data can be used in the design of future wet lab experiments to better characterize the human transcriptional response to infection, as well as to improve host-based diagnostics and/or therapeutics to reduce the severity of hantavirus infection.
Orthohantavirus is a viral genus that includes the hantaviruses that are pathogenic in humans. Members of this viral genus are transmitted by various rodent hosts throughout Asia, Europe, the Americas, and parts of Africa [1]. Hantavirus is generally well adapted to its rodent vector and is unlikely to cause disease in that host. However, when humans come into direct contact with infected rodent vectors or their feces, virions can enter host cells and potentially cause an infection [2]. Once human infection occurs, hantaviruses are far more likely to cause severe disease, permanent injury, or even death. Human-to-human spread of hantaviruses generally does not occur, with the exception of some rare cases of Andes virus [3]. However, as metropolitan areas continue to expand, the threat of hantavirus spillover into humans is increasingly becoming a threat. This is especially true in eastern Asia, where thousands of cases of hemorrhagic fever with renal syndrome (HFRS) due to hantavirus infection are reported each year [1]. New world hantavirus disease, known as hantavirus cardiopulmonary syndrome (HCPS), is far less common, but has been reported to have up to a 50% fatality rate [4]. It was the new world hantavirus Sin Nombre virus that was responsible for the 1993 hantavirus outbreak in the Four Corners region of the western United States [5].
Like all members of the Bunyavirales family, an Orthohantavirus virion is composed of a segmented negative-sense RNA genome enclosed by a spherical capsid. Each of its three segments encodes reading frames that code for the RNA-dependent RNA polymerase (RDRP), glycoprotein, and the nucleocapsid (L, M, and S segments respectively) [6]. The glycoprotein precursor is cleaved by endogenous proteases to form a heterodimeric protein receptor [7]. The integrin β3 surface protein serves as the primary receptor for all infectious hantaviruses [8]. Past work has shown that hantaviruses do not directly cause cytopathic effects in their host; rather that the host immune response to infection causes the breakdown of endothelial integrity, leading to severe symptoms or death. To better understand the pathogenesis of hantavirus, it is essential to characterize the underlying host intracellular transcriptional responses. Currently, there are no FDA-approved treatments for hantavirus infection besides supportive therapy, but effective vaccines are being investigated for several strains of hantavirus [9]. However, vaccination may be difficult to administer due to the sporadic and relatively limited nature of many hantavirus outbreaks.
Hantavirus infections are characterized by the loss of endothelial barrier integrity leading to severe symptoms and death, as well as high levels of T-cell proliferation in affected tissue [10]. Interestingly, several prior experiments have shown that hantavirus infection causes no cytopathic effects in vitro [7]. Instead, the highly elevated levels of cytokines along with T-cell levels in infected tissue indicate that host inflammatory responses are primarily responsible for the observed clinical signs and symptoms.
While several studies have examined the proteomics and health outcomes of infected individuals, relatively little has been done to transcriptionally characterize human infection. Furthermore, no studies have compared the in vitro and in vivo host transcriptional response during hantavirus infection. Evidence of differentially expressed genes (DEGs) across tissues would provide evidence that further research is required to understand how these DEGs affect differences in morbidity in human tissues. This study seeks to further examine how infection impacts host gene expression and intracellular signaling pathways by re-analyzing existing human RNA-seq data collected during hantavirus infection. We anticipate that this approach will enable us to better compare the DEGs across tissue types to predict more effective host-based therapeutics.
A search of the Gene Expression Omnibus (GEO) database, hosted at the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/geo/), was performed in mid-2021 to find RNA-sequencing datasets for hantaviruses using terms including “hantavirus”, “Seoul virus”, “andes virus”, “sin nombre virus” and “Hantaan virus”. In addition, these same terms were used to find relevant datasets within the European Nucleotide Archive (ENA) database. The corresponding RNA-sequencing data for five GEO series were retrieved from the Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) at NCBI: GSE158712, GSE161354, GSE133319, GSE133634, GSE133751[11–13]. Sequencing data with the experiment accession number PRJEB41624 was also downloaded from ENA and used in this analysis[14]. These datasets were generated from either human cell cultures, or patients who were infected with either Hantaan Virus (HTNV) or Puumala virus (PUUV). The SRA files were downloaded and converted to fastq format using GNU wget and version 2.10.1 of the NCBI sratools software package (https://github.com/ncbi/sra-tools).
Much of the data processing was managed by the Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data (ARMOR) [15]. Within this workflow the data was processed as follows: quality control of the reads was performed using fastQC, adapters and low-quality reads were trimmed using TrimGalore, reads were mapped and quantified to the GRCh38 human transcriptome using Salmon [16], and differential gene expression was calculated using edgeR [17]. The list of DEGs from this workflow was subsequently analyzed with the Signaling Pathway Impact Analysis (SPIA) algorithm [18], which uses pathway enrichment to identify cellular pathways using the Biocarta, KEGG, NCI, Panther, and Reactome databases [18–22]. The SPIA output file from this analysis was then analyzed with the Pathways2Targets program to identify existing therapeutics that target the impacted cellular pathways [23] from the opentargets.org database [24]. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was used to identify and visualize relevant protein-protein interactions [25].
We began by collecting the relevant fastq files for human-derived samples infected with Orthohantavirus strains. In total, we included 73 samples from six datasets in this meta-analysis, including 23 controls, and included all pertinent data and metadata that could be extracted from the NCBI-GEO and ENA databases [11–14]. Fastq files were divided into in vitro and in vivo groups for the analysis according to sample type and each group was separately input to ARMOR. The in vivo dataset consists of sequencing data from peripheral blood mononuclear cells (PBMC) of HFRS (Hantaan virus infected) patients, with relevant control fastq files. The in vitro dataset consisted of A549 and HUVEC cells infected with either HTNV or Puumala virus, with relevant control fastq files.
We next used the automated ARMOR bioinformatics workflow to calculate genes with significantly altered expression from the raw fastq files. We observed approximately 5900 significant DEGs in the in vivo dataset (Fig. 1), which consisted of human PBMC samples (corrected p-value ≤ 0.05). This relatively high number of DEGs was expected since the infected material was being affected by the host immune response. Interestingly, four of the most significant 50 genes have been previously identified in earlier hantavirus studies including: OLFM1, CCR7, KLRG1, and PTGDR2–all of which were downregulated [26–29]. We observed the most significant upregulated genes found were NUSAP1, KIF20A, CDC25C, and DEPDC1.
We then wanted to perform a similar analysis on the in vitro dataset. In this case, our analysis identified only 26 significant DEGs during infection including: IFIT1, OAS2, CMPK2, IFIT3, IFI6 (Additional file 1). We found that three genes were downregulated, while the remaining 23 significant genes were upregulated (Fig. 2).
To better understand whether these gene products directly interacted with each other, we used the STRING database to retrieve protein-protein interaction data [25]. We found that this protein network revealed most of the proteins directly interact with each other, with two proteins not connected to the rest of the graph and a third protein not represented in the database (Fig. 3).
After identifying the significant DEGs, we used them as input to the signaling pathway impact analysis (SPIA) algorithm, which performs robust statistical analyses to identify cellular pathways that were significantly enriched in DEGs [18]. We identified 79 enriched pathways in the in vivo dataset, largely associated with transcription, cellular repair, homeostasis, and the cell cycle, with a few associated with immune response (Table 2). In contrast, the in vitro dataset enrichment identified 11 pathways, all of which are associated with the antiviral response (Table 3).
Pathway Name |
Status |
Corrected p-value |
Source Database |
---|---|---|---|
Cytokine-cytokine receptor interaction |
Inhibited |
4.96E-12 |
KEGG |
Systemic lupus erythematosus |
Activated |
0.000115 |
KEGG |
Cell cycle |
Activated |
0.000276 |
KEGG |
Carbohydrate digestion and absorption |
Activated |
0.000529 |
KEGG |
Staphylococcus aureus infection |
Activated |
0.00062 |
KEGG |
Alcoholism |
Activated |
0.006032 |
KEGG |
Generic Transcription Pathway |
Activated |
7.52E-12 |
Reactome |
Activated PKN1 stimulates transcription of AR (androgen receptor) regulated genes KLK2 and KLK3 |
Activated |
5.01E-11 |
Reactome |
RNA Polymerase I Promoter Opening |
Inhibited |
5.01E-11 |
Reactome |
Meiotic recombination |
Activated |
5.26E-11 |
Reactome |
Pathway Name |
Status |
Corrected p-value |
Source Database |
---|---|---|---|
Influenza A |
Activated |
1.13E-08 |
KEGG |
Herpes simplex infection |
Activated |
9.39E-06 |
KEGG |
Measles |
Activated |
0.000117 |
KEGG |
RIG-I-like receptor signaling pathway |
Activated |
0.005427 |
KEGG |
Interferon Signaling |
Inhibited |
3.68E-31 |
Reactome |
Interferon alpha/beta signaling |
Activated |
1.21E-27 |
Reactome |
Cytokine Signaling in Immune system |
Inhibited |
8.60E-18 |
Reactome |
Antiviral mechanism by IFN-stimulated genes |
Inhibited |
5.06E-10 |
Reactome |
ISG15 antiviral mechanism |
Inhibited |
5.06E-10 |
Reactome |
Negative regulators of RIG-I/MDA5 signaling |
Activated |
5.49E-06 |
Reactome |
RIG-I/MDA5 mediated induction of IFN-alpha/beta pathways |
Inhibited |
7.24E-05 |
Reactome |
We next wanted to predict existing drugs that could be repurposed as host-based anti-viral therapies. We consequently used the enriched pathway data generated by SPIA as input to the Pathways2Targets algorithm [23]. This software identifies known drug targets in the significant pathways, then searches the public opentargets.org database to generate a list of potential therapeutics that could affect each pathway. This method was justified since reversing the phenotype of the significantly impacted pathways should reduce viral replication and/or pathogenesis. We sorted the results of this analysis to prioritize therapeutics that were predicted to affect multiple pathways. We intentionally selected this approach to maximize the potential antiviral effect of each drug by simultaneously interfering with multiple intracellular processes that the virus uses to replicate. We generated predictions for the top 20 drugs for the in vivo and in vitro datasets (Table 4). The top hits for the in vivo analysis included dinaciclib, alvocidib, and roniciclib.
Predicted Therapeutic Name |
Number of Target Pathways |
---|---|
DINACICLIB |
33 |
ALVOCIDIB |
33 |
RONICICLIB |
33 |
SELICICLIB |
33 |
AT-7519 |
33 |
MILCICLIB |
33 |
ATALUREN |
33 |
ELX-02 |
33 |
MT-3724 |
33 |
UCN-01 |
24 |
PANOBINOSTAT LACTATE |
23 |
ROMIDEPSIN |
23 |
PANOBINOSTAT |
23 |
BELINOSTAT |
23 |
ENTINOSTAT |
23 |
ABEXINOSTAT |
23 |
VORINOSTAT |
21 |
TACEDINALINE |
19 |
AMG-232 |
17 |
APG115 |
17 |
Predicted Therapeutic Name |
Number of Target Pathways |
---|---|
RUXOLITINIB |
7 |
BARICITINIB |
7 |
RUXOLITINIB PHOSPHATE |
7 |
TOFACITINIB |
7 |
TOFACITINIB CITRATE |
7 |
FILGOTINIB |
7 |
UPADACITINIB |
7 |
RONTALIZUMAB |
6 |
SIFALIMUMAB |
6 |
PF-06823859 |
6 |
PEGINTERFERON ALFA-2A |
5 |
PEGINTERFERON ALFA-2B |
5 |
INTERFERON BETA-1A |
5 |
INTERFERON ALFA-2B |
5 |
INTERFERON BETA-1B |
5 |
INTERFERON ALFACON-1 |
5 |
ULIXERTINIB |
5 |
LY3214996 |
5 |
RAVOXERTINIB |
5 |
MK-8353 |
5 |
The purpose of this meta-analysis was to better characterize and compare the in vitro and in vivo human intracellular transcriptional response to infection with various hantaviruses. Specifically, we identified unique sets of DEGs in both comparisons that have been identified previously, as well as novel DEGs that could provide additional knowledge into the underlying mechanisms of viral pathogenesis. We determined statistically significant signaling pathways that were enriched with the identified DEGs for each set of samples, and subsequently used these pathways to predict potential therapeutics that could be relevant to natural infections.
We believe that this analysis was somewhat hindered by the relatively small number of samples in the public datasets. Specifically, the public data that we used came from HTNV infected tissue, rather than a variety of several hantaviruses. This is somewhat unsurprising due to various logistical reasons including the difficulty and danger of culturing hantavirus in a laboratory setting, or a perceived lack of interest in or access to hantavirus. Such studies can be quite useful in identifying potential causes or compounding factors in HFRS and HCPS. Other hantaviruses, such as Andes virus and Sin nombre virus, may cause far fewer infections each year, but infections by these pathogens have a higher mortality rate. Much research is still needed to determine whether the differential expression patterns vary between species of hantaviruses and between infected human tissue types, but this study attempts to address hantaviruses as a whole in an attempt to identify common expression patterns between different viral species. Our findings could potentially be applied across multiple hantaviruses.
In vivo, we found that of the top 10 genes found, only the NUSAP1 gene, which plays a role in microtubule organization [30], was upregulated. MYBL1 was the top hit, which is a protooncogene associated with promoting piRNA expression, was found to be strongly downregulated.
As stated earlier, of the top 100 identified genes, only OLFM1, CCR7, KLRG1, and PTGDR2 have been identified in previous hantavirus publications. OLFM1, which produces olfactomedin and was found to be associated with hantavirus in a previous study, is also found to be highly expressed in neuronal tissue and may aid in nervous system development; however, relatively little is understood about many of its functions in other cell types [31]. Interestingly, while this gene is expressed in monocytes, its function is unknown and may warrant future investigation [32]. Additionally, three of the top 10 genes found in this analysis, ZNF365, NOG and AMIGO1, regulate proper neuronal development [33–36]. Hantavirus rarely affects the nervous system and has not been observed to directly infect nervous tissue, but these genes may also play a role in development in other cell types which can be affected during hantavirus infection.
CCR7, or C-C Motif Chemokine Receptor seven, is upregulated in lymphocytes and monocytes during a variety of viral infections. A previous study showed that CCR7 induces monocyte egress during hantavirus infection, and therefore a decrease in the number of CCR7-expressing monocytes occurs in the bloodstream [27]. While this may account for the decrease in overall expression, it is unknown if lymphocyte expression of this gene is modulated by hantavirus infection. Similarly, KLRG1 is upregulated in T-cells during viral infection, but it is unclear if it is modulated by hantavirus infection [28].
PTGDR2, or the prostaglandin D2 receptor, promotes inflammation during a viral infection [37]. While no studies of this gene in association with hantavirus have been performed in humans, a 2014 study found that deer mice, a natural non-symptomatic host for the virus, upregulate this gene during active infection of Andes virus, a species of hantavirus [29]. Our analysis showed that this gene is strongly downregulated in humans during infection of HTNV, another hantavirus. Although our observation could be due to a difference in viral species or strain, the gene product may play a role in human pathogenesis of the virus that is not seen in murine species, and likely warrants additional investigation.
The most significant upregulated genes in vivo NUSAP1, KIF20A, CDC25C, and DEPDC1 were associated with cellular division, especially in rapidly dividing cells, which is expected during active viral infection and tissue repair [38–42]. In addition, CDC25C has been shown to play a role in viral replication of HSV, however there is no indication if this is true in other viruses such as hantavirus [43].
In vitro, of all the DEGs, only SLC27A3, AC090527.2 (a novel, uncharacterized gene not shown on figure STRING), and TSEN34 were not associated with interferon stimulation and are not part of the antiviral response. SLC27A3, which is a Very Long-Chain Acyl-CoA Synthetase, is both a synthetase and a transport protein. It has a role in brain development but has not previously been associated with viral pathogenesis or the immune response [44].The tRNA Splicing Endonuclease 34 (TSEN34) gene, and novel gene AC090527.2 similarly have no immune associated function. Finally, Immunoglobulin Heavy Constant Gamma 2 (IGHG2), is shown to be strongly downregulated. This is unexpected since the endothelial cells used in this analysis normally do not produce immunoglobulin. This may be a result of the cells being immortalized, or as an artifact of the analysis. However, at least one prior study has shown that primary endothelial cells can produce antibodies in vitro [45].
The pathway analysis revealed a much wider variety of genes and perturbed pathways in the in vivo dataset as compared to the in vivo dataset. We believe that one of the primary causes of this difference is likely the higher content and complexity of cellular signals that are present in vivo. As the putative cause for the disease is a cytokine storm, we expected to see cytokine signaling along with the interferon response in addition to repair pathways that would not be seen in vitro [46].
Using data from the pathway analysis, medications identified for the in vivo dataset generally treat rapidly dividing and surviving cells, such as in cancer. Specifically, the drugs we predicted would affect the highest number of in vivo pathways in this analysis included those that target cyclin-dependent kinases and JAK kinases such as dinaciclib, alvocidib, and roniciclib. All three of these therapeutics have shown promise as a host-based anti-viral in other pathogens [47–50], but none have been identified or tested for hantaviruses previously. Interestingly, our approach ranked these drugs higher than other potential therapeutics that were further down on the list that have previously been used as host-based anti-virals, such as vorinostat [51–53]. In contrast, the therapeutic prediction analysis from the in vitro dataset identified baricitinib and interferon derivatives, which are commonly used for viral infections [54–56]. As of now, there are no effective treatments for hantavirus infection, and the current suggested remedy is supportive therapy such as bed rest and IV fluids [9]. We believe that testing a subset of these drugs in a laboratory setting would be justified, and that laboratory validation of these findings could improve treatment options for patients in the clinic.
The primary purpose of this study was to analyze the existing public RNA-seq datasets to improve our understanding of the response of human cells during hantavirus infection and to predict existing drugs that could be repurposed as host-based therapeutics. Specifically, we identified seven genes that are novel to the human response to hantaviruses, as well as multiple signaling pathways and drugs that could be repurposed as therapeutics. Such treatments could benefit patients by reducing the occurrence of HFRS and HCPS. Further research is needed to understand how hantaviruses impact expression in the human cell and to improve the identification, development, and testing of effective treatments.
ARMOR: Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data
DEGs: differentially expressed genes
SPIA: signaling pathway impact analysis
HFRS: hemorrhagic fever with renal syndrome
HCPS: hantavirus cardiopulmonary syndrome
RDRP: RNA-dependent RNA polymerase
GEO: Gene Expression Omnibus
NCBI: National Center for Biotechnology Information
ENA: European Nucleotide Archive
SRA: Sequence Read Archive
HTNV: Hantaan Virus
PUUV: Puumala virus
PBMC: peripheral blood mononuclear cells
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins
Ethics approval and consent to participate
Not applicable
Consent for Publication
Not applicable
Availability of data and materials
All data generated or analyzed during this study are included in this published article are including as additional file 1.
Competing Interests
The authors declare that they have no competing interests
Funding
This research was funded by the College of Life Sciences at Brigham Young University. No additional external funding was used.
Authors' contributions
BEP: Principal Investigator, Writing, Editing, Conceptualization. JK: Writing, Computational Analysis, Data Interpretation. All authors read and approved the final manuscript.
Acknowledgments
We would like to thank the Office of Research Computing at Brigham Young University for their support on this project.