Statistics of H2V data
Because the number of available studies differs among viruses, the H2V datasets differ in the viruses they cover. As shown in Table 2, genes/proteins that respond to SARS-CoV-2 infection are found in seven datasets, namely DEGs, PPIs, DEPs, DPPs, DTPs, DUPs and SAPs. In comparison, genes/proteins that respond to SARS-CoV and MERS-CoV infections are found in three (DEGs, PPIs and DEPs) and two (DEGs and PPIs) datasets, respectively. DEGs datasets are available for all three viruses: 9321 human genes respond to MERS-CoV infection, 2249 to SARS-CoV infection, and only 1395 to SARS-CoV-2 infection. PPIs datasets are also available for all three viruses, with 1581, 1150 and 296 interaction pairs between human proteins and the proteins of SARS-CoV-2, SARS-CoV and MERS-CoV, respectively. DEPs datasets are available for SARS-CoV-2 and SARS-CoV infections, with 253 and 66 responding human proteins, respectively. The DPPs, DTPs, DUPs and SAPs datasets are available only for SARS-CoV-2 infection, and contain 2198 (5046 phosphorylation sites), 232, 516 (730 ubiquitination sites) and 610 responding proteins, respectively.
To determine whether any proteins participate in more than one process in response to SARS-CoV-2 infection, the intersections of DEPs, DPPs, DTPs and DUPs were analyzed. Figure 1a shows that 11 proteins change markedly in both expression and translation upon infection, that 180 proteins change markedly in both phosphorylation and ubiquitination, and that one protein changes noticeably in expression, phosphorylation, translation and ubiquitination. We then used Venn diagrams to identify genes/proteins that respond in common to infections by different viruses, which may help to elucidate the fundamental mechanisms of viral pathogenesis. Figure 1b shows that 130 genes are differentially expressed in common across infections. Figure 1c shows that 62 human proteins interact with proteins of all three viruses.
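The overlap analysis described above comes down to set intersections over protein identifiers. The following is a minimal sketch of that idea, not the code used to produce Figure 1; the function name `shared_members` and the toy identifiers are hypothetical. Note that it reports inclusive intersections, whereas a region of a Venn diagram counts only members exclusive to that region.

```python
from itertools import combinations


def shared_members(datasets):
    """Map every combination of two or more dataset names to the
    identifiers that occur in all datasets of that combination."""
    overlaps = {}
    names = sorted(datasets)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            # Inclusive intersection: members may also occur in other datasets.
            overlaps[combo] = set.intersection(*(datasets[name] for name in combo))
    return overlaps


# Hypothetical toy identifiers for illustration only (not H2V data).
toy = {
    "DEPs": {"P1", "P2", "P3"},
    "DPPs": {"P2", "P3", "P4"},
    "DTPs": {"P3", "P5"},
    "DUPs": {"P3", "P4"},
}
overlaps = shared_members(toy)
```

In this toy input, `overlaps[("DPPs", "DUPs")]` holds the proteins found in both the DPPs and DUPs sets, and the four-name key holds those found in all four.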
Overview of H2V
As shown in Figure 2a, the header of each web page contains a navigation bar and a search box. The search box accepts a query from the user and tries to match it against gene and protein names. The navigation bar provides access to all resources in the database. The SARS2 drop-down menu links to the genes/proteins that respond to SARS-CoV-2 infection. Similarly, the SARS1 and MERS drop-down menus link to the genes/proteins that respond to SARS-CoV and MERS-CoV infections, respectively. The Utilities drop-down menu provides utilities such as links for downloading data from or uploading data to H2V. On a page that lists response genes/proteins, each gene/protein is shown as a row of a table, with additional information about it in the columns (Figure 2b). The Score column indicates the reliability of the gene/protein; the score was calculated as the number of studies in which the gene/protein was identified [29]. Each gene/protein in the table is clickable and links to a page showing details of how it responds to virus infection. This page has two notable features: one is to examine changes of the gene/protein at different timepoints post infection (Figure 2c), and the other is to discover known drugs that target the gene/protein. For PPIs, an embedded sequence viewer, as shown in Figure 2d, is provided for easy inspection of the gene/protein annotation in the viral genome. In addition, PPIs can be visualized as an interaction network on the page (Figure 2e).
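The Score described above is a count of supporting studies per gene/protein. A minimal sketch of such a count is given below, assuming the input is a list of (study, gene) pairs; the function name `study_count_scores`, the record layout, and the toy records are assumptions for illustration and do not reflect the H2V schema.

```python
from collections import defaultdict


def study_count_scores(records):
    """Return a dict mapping each gene/protein to the number of distinct
    studies that reported it. `records` is an iterable of (study_id, gene)."""
    studies_by_gene = defaultdict(set)
    for study_id, gene in records:
        # A set ensures a study reporting the same gene twice counts once.
        studies_by_gene[gene].add(study_id)
    return {gene: len(studies) for gene, studies in studies_by_gene.items()}


# Hypothetical toy records (not H2V data).
toy_records = [
    ("study_A", "GENE1"),
    ("study_B", "GENE1"),
    ("study_A", "GENE2"),
    ("study_A", "GENE1"),  # duplicate report within the same study
]
scores = study_count_scores(toy_records)
```

Counting distinct studies rather than rows keeps a gene that is reported several times within one study from receiving an inflated score.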
Application cases
To facilitate rapid drug discovery for COVID-19 during the pandemic, H2V provides a drug finder that retrieves the drugs targeting a given protein from its UniProt accession number. The drugs found are then displayed, with their DrugBank identifiers, in the lower part of the same page. For example, searching for Q9BYF1 returns several drugs, including Chloroquine and Hydroxychloroquine (Figure 3a).
To help users see how all genes/proteins change over time post infection, H2V provides a utility named Data animation. On its page, a settings panel is used to select the data to animate. For example, Figure 3b shows the settings for animating DPPs in response to SARS-CoV-2 infection. The results of this example (Figures 3c and 3d) show that more human proteins are differentially phosphorylated at 24 h than immediately after SARS-CoV-2 infection. This suggests that the human body responds to SARS-CoV-2 infection by continuously rewiring cellular pathways.
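The comparison between timepoints described above can be thought of as building one frame per timepoint, each holding the proteins that are differentially phosphorylated at that time. The sketch below illustrates this under stated assumptions: the function name `frames_by_timepoint`, the (protein, hours, is_significant) record layout, and the toy measurements are all hypothetical and are not taken from the Data animation utility.

```python
from collections import defaultdict


def frames_by_timepoint(measurements):
    """Group significant proteins by timepoint.

    `measurements` is an iterable of (protein, hours_post_infection,
    is_significant). Returns a list of (hours, sorted protein list),
    ordered by time, with one entry per animation frame."""
    by_time = defaultdict(set)
    for protein, hours, is_significant in measurements:
        bucket = by_time[hours]  # creates the frame even if it stays empty
        if is_significant:
            bucket.add(protein)
    return [(hours, sorted(by_time[hours])) for hours in sorted(by_time)]


# Hypothetical toy measurements (not H2V data).
toy_measurements = [
    ("P1", 0, False),
    ("P2", 0, True),
    ("P1", 24, True),
    ("P2", 24, True),
]
frames = frames_by_timepoint(toy_measurements)
counts = [(hours, len(proteins)) for hours, proteins in frames]
```

Comparing the entries of `counts` across frames gives the kind of per-timepoint tally that the comparison in the text relies on.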
H2V can be used to analyze integrated findings from different studies. Figure 4 shows an example in which the Enrichment analysis utility is used to identify enriched pathways among DPPs that respond to SARS-CoV-2 infection. DPPs identified in at least two studies were analyzed first (analysis 1). After the parameters were set on the left of Figure 4a, the analysis was started by clicking the button at the bottom. When the analysis was completed, the input DPPs were listed on the right of Figure 4a, and the result was shown in Figure 4b. Seven pathways were enriched, including the FAS signaling pathway, the p38 MAPK pathway and the PDGF signaling pathway. Findings replicated by independent studies are expected to be more reliable than unreplicated ones. To examine the effect of including unreplicated findings, the same analysis (analysis 2) was performed for DPPs identified in at least one study. This time more pathways were enriched, and the top seven are shown in Figure 4c. The comparison shows that the top two pathways in analysis 1 were not among the top seven pathways of analysis 2. This indicates that including low-confidence DPPs can distort the analysis result. By filtering on the number of supporting studies, H2V can help exclude such low-confidence findings and support more reliable biological inferences.
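The effect of the study-count threshold on an enrichment result can be illustrated with a simple over-representation test. The sketch below uses a one-sided hypergeometric test, which is a common choice for pathway enrichment but is an assumption here: the statistical method used by the Enrichment analysis utility is not stated in this section. The function names, the toy scores, the toy pathways and the background size are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(background_size, pathway_size, sample_size, overlap):
    """P(X >= overlap) when sample_size proteins are drawn without
    replacement from a background containing pathway_size pathway members."""
    upper = min(pathway_size, sample_size)
    total = comb(background_size, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(scores, pathways, background_size, min_studies):
    """Keep proteins supported by at least min_studies studies, then rank
    pathways by ascending hypergeometric p-value."""
    selected = {protein for protein, score in scores.items() if score >= min_studies}
    results = {}
    for name, members in pathways.items():
        overlap = len(selected & members)
        results[name] = hypergeom_pvalue(
            background_size, len(members), len(selected), overlap
        )
    return sorted(results.items(), key=lambda item: item[1])


# Hypothetical toy data (not H2V data); pathway members are assumed to lie
# within a background of 10 proteins.
toy_scores = {"P1": 2, "P2": 2, "P3": 1, "P4": 1}
toy_pathways = {"pathway_X": {"P1", "P2"}, "pathway_Y": {"P3", "P4", "P5"}}
strict = enrich(toy_scores, toy_pathways, background_size=10, min_studies=2)
lenient = enrich(toy_scores, toy_pathways, background_size=10, min_studies=1)
```

Running the same test with `min_studies=2` and `min_studies=1` mirrors analysis 1 and analysis 2: lowering the threshold changes both the input set and the resulting p-values, so the ranking of pathways can shift.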