Between viral targets and differentially expressed genes in COVID-19: the sweet spot for therapeutic intervention

The COVID-19 pandemic is raging. It has revealed the importance of rapid scientific advancement towards understanding and treating new diseases. To address this challenge, we adapt an explainable artificial intelligence algorithm for data fusion and apply it to new omics data on viral-host interactions, human protein interactions, and drugs to better understand SARS-CoV-2 infection mechanisms and predict new drug-target interactions for COVID-19. We discover that in the human interactome, the human proteins targeted by SARS-CoV-2 proteins and the genes that are differentially expressed after the infection have common neighbors central in the interactome that may be key to the disease mechanisms. We uncover 185 new drug-target interactions targeting 49 of these key genes and suggest re-purposing of 149 FDA-approved drugs, including drugs targeting VEGF and nitric oxide signaling, whose pathways coincide with the observed COVID-19 symptoms. Our integrative methodology is universal and can enable insight into this and other serious diseases.


Introduction
The ongoing COVID-19 pandemic exposed the shortcomings of healthcare systems and devastated the economy [1-3]. A major [...]

[...] which we did not use in the data fusion so that we could use them to validate our predictions. Overall, 187 out of the 573 (32.64%) predicted drug-target interactions (DTIs) are present in at least one of these databases (Supplementary Table S2).
Interestingly, among the 143 genes targeted in the predicted DTIs obtained by our data fusion, only one is a host protein targeted by the viral proteins: HDAC2, which is targeted by cannabidiol. To explore the other 142 genes and their possible relations with SARS-CoV-2 infection, we study their connection to the host proteins that interact with the viral proteins (which we term viral interactors, VIs) in the context of the molecular interaction network (MIN). We find that 58 drug-targeted genes obtained by the data fusion are direct neighbors of the VIs and the remaining 84 genes are at distance 2 or 3 from the VIs in the MIN (79 are at distance 2 and 5 are at distance 3). In addition, to further explore the relation of the genes targeted by COVID-19 proteins after the infection, we [...]

In summary, we adapt a data fusion framework that, by jointly decomposing the viral-host protein interactions and the host-drug interactions, successfully predicts new DTIs between the human targets and existing drugs that could be re-purposed. Moreover, we validate one third of the predicted DTIs through external databases. Lastly, when focusing on the targeted proteins in the predicted DTIs, we find that one third of the targeted proteins directly connect the host proteins that interact [...]

After finding that one third of the human targets in the predicted DTIs directly connect in the MIN to both the human proteins that interact with the viral proteins (viral interactors, VIs) and those corresponding to differentially expressed genes (DEGs) in COVID-19 infection, we further explore how the VIs and the DEGs are connected in the human interactome, in particular in the above described MIN. Our reasoning is that neighboring genes can act as links between the signal inputs, the VIs, and the observed outputs, such as dysregulated genes, and may thereby be involved in the disease mechanisms.

We use the 332 host genes reported by Gordon et al.
2020 [5] as the set of viral interactors (we term this gene set the "VI"). For the DEG set, we use the 1,910 DEGs identified by Blanco-Melo, D. et al. 2020 [14] in lung tissue samples from 2 infected patients (see section "Datasets, pre-processing and matrix construction" in Methods). Furthermore, since previous studies showed that disease genes tend to form densely connected communities [33] in the MIN, we identify the direct network neighbors of both of the above described gene sets (we term these two new gene sets the "VI neighbors" and "DEG neighbors"). As shown in Figure 4A, these two sets have 52.30% overlap (statistically significant with p-value = 0, using the hypergeometric test; for more details see section "Analysis of the molecular interaction network and its wiring patterns" in Methods) and hence, we also explore this overlap as a separate gene set (termed the "common neighbors"). Thus, VI and DEG genes, while mostly disjoint, are largely (52.30%) indirectly connected by their neighbors. To fully explore the entire set of neighbors in the MIN of the proteins participating in VIs and of the protein products of DEGs in COVID-19 disease, we study separately those VI neighbor and DEG neighbor genes that overlap and those that do not overlap; within those that do not overlap, we term the neighbors of only VIs the "VI-unique neighbors" and the neighbors of only DEGs the "DEG-unique neighbors". The rest of the genes in the MIN that are not present in any of these five gene sets (VI, DEGs, VI-unique neighbors, DEG-unique neighbors, common neighbors) are termed "background genes".
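The distance and neighbor-set analyses described above can be sketched on a toy network. This is a minimal illustration, not the pipeline used in the study: the gene names, the toy edges, the exclusion of seed genes from their own neighbor sets, and the choice of all network genes as the hypergeometric population are assumptions made for the example.

```python
from collections import deque
from math import comb

def distances_from(adjacency, seeds):
    """Multi-source BFS: shortest distance from the seed set to every reachable gene."""
    dist = {s: 0 for s in seeds if s in adjacency}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def direct_neighbors(adjacency, seeds):
    """Genes at distance exactly 1 from the seed set (seeds themselves excluded; assumption)."""
    return {g for g, d in distances_from(adjacency, seeds).items() if d == 1}

def hypergeom_pvalue(overlap, population, size_a, size_b):
    """P(X >= overlap) when size_b genes are drawn from a population holding size_a marked genes."""
    total = comb(population, size_b)
    tail = sum(comb(size_a, k) * comb(population - size_a, size_b - k)
               for k in range(overlap, min(size_a, size_b) + 1))
    return tail / total

# Hypothetical toy MIN: V1 plays the role of a viral interactor, D1 of a DEG.
edges = [("V1", "A"), ("V1", "B"), ("D1", "B"), ("D1", "C"), ("A", "T"), ("E", "F")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

vi, deg = {"V1"}, {"D1"}
vi_dist = distances_from(adjacency, vi)        # distance of every gene from the VIs
vi_neighbors = direct_neighbors(adjacency, vi)
deg_neighbors = direct_neighbors(adjacency, deg)

common = vi_neighbors & deg_neighbors          # "common neighbors"
vi_unique = vi_neighbors - deg_neighbors       # "VI-unique neighbors"
deg_unique = deg_neighbors - vi_neighbors      # "DEG-unique neighbors"
background = set(adjacency) - vi - deg - vi_neighbors - deg_neighbors

p_value = hypergeom_pvalue(len(common), len(adjacency),
                           len(vi_neighbors), len(deg_neighbors))
```

On a real interactome the same set operations apply unchanged; only the adjacency structure and the seed sets differ.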

To establish whether a SARS-CoV-2 infection affects proteins that are central in the MIN, we analyze the above described gene sets using the following commonly used network properties: four centrality measures (degree, eigenvector, betweenness and closeness centralities) and the clustering coefficient (for more details see section "Analysis of the molecular interaction network and its wiring patterns" in Methods). As shown in Figure 4B, VI and DEG genes show significantly higher degree centralities (p < 0.0001) compared to the background genes, indicating their importance in the MIN. In addition, genes in both of these sets have a higher clustering coefficient than the background genes, indicating their higher tendency to form clusters (Table 1). Notably, the common neighbor gene set exceeds both VI and DEG genes in all of these measures except for closeness centrality. Thus, common neighbor genes are likely to participate in many functions, since they are central in the MIN. The [...] the ones deregulated after the infection, and hence, they might be key for understanding the underlying molecular mechanism of COVID-19.

Table 1. Network properties of the molecular interaction network (MIN), focusing on the following gene sets: viral interactors (VI), differentially expressed genes after infection (DEG), overlap of the direct network neighbors in the MIN of these two sets (common neighbors), neighbors of the VI and DEG gene sets that were not in the common neighbor gene set (VI-unique neighbors and DEG-unique neighbors), and the rest of the genes in the MIN (background genes).
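As an illustration of two of the network properties compared across gene sets, the sketch below computes degree centrality and the clustering coefficient on a toy graph and averages them per gene set. The graph, the gene-set membership and the helper names are hypothetical; the eigenvector, betweenness and closeness centralities and the significance tests are not reproduced here.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}

def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    nbrs = list(adjacency[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adjacency[nbrs[i]])
    return 2 * links / (k * (k - 1))

def mean_over(values, genes):
    """Average a per-gene property over one gene set."""
    return sum(values[g] for g in genes) / len(genes)

# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to A.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

degree = degree_centrality(adjacency)
clustering = {g: clustering_coefficient(adjacency, g) for g in adjacency}

# Hypothetical gene sets, compared the way VI/DEG genes are compared with background genes.
gene_set, background = {"A", "B"}, {"C", "D"}
set_mean_degree = mean_over(degree, gene_set)
background_mean_degree = mean_over(degree, background)
```

Comparing `set_mean_degree` with `background_mean_degree` mirrors the comparison reported in Table 1, although a real analysis would also test whether the difference is statistically significant.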
To assess whether the genes participating in the aforementioned sets have similar biological functions in the MIN, [...] 1.485025e-02). We perform the same enrichment analysis for the rest of the gene sets and find that VI-unique neighbor, DEG-unique neighbor and background genes are not enriched in viral processes (see Supplementary Table S6, Supplementary Table S7, and Supplementary Table S8). These results indicate that the common neighbor genes participate in SARS-CoV-2 infection and hence, they might be potential drug targets to treat COVID-19.

Based on these results, we conclude that SARS-CoV-2 proteins mainly interact with central human proteins, or influence the expression of host proteins that are central in the MIN. Moreover, we find that the neighbors of these two gene sets (common neighbor genes of the VIs and the DEGs) are also central in the MIN. Interestingly, the common neighbor genes are enriched [...]

We check whether any of these 149 drugs targeting common neighbor genes have been investigated for treating COVID-19 using the CORona Drug InTEractions (CORDITE) database (https://cordite.mathematik.uni-marburg.de). We also ask whether they are part of interventional clinical trials currently being conducted (retrieved from https://clinicaltrials.gov). As shown in Supplementary Table S11, 17.44% of the drugs involved in the common neighbor DTIs are listed in CORDITE and 11.40% are subject to at least one active clinical trial on COVID-19. These results support the relevance of the predicted DTIs.

We perform an enrichment analysis across multiple functional annotation databases: Gene Ontology (GO), KEGG, REACTOME and CORUM (for more details see section "Enrichment analysis of gene and drug clusters" in Methods). As [...]

Therefore, we propose to further investigate the well-tolerated drugs that modulate NO signaling and its related pathways. A potential candidate from our list of common neighbor DTIs is triflusal, which is known to interact with NFKB, NOS2, PDE10A and PTGS1, and for which we predict PTGS2 and NOS3 as additional target genes. Triflusal is a trifluoromethylated analogue of acetylsalicylic acid and, unlike acetylsalicylic acid, is not yet under investigation as a COVID-19 treatment.
Of note, both triflusal and acetylsalicylic acid act as anticoagulants, and a recent study associated anticoagulation with lower mortality and intubation rates for hospitalized COVID-19 patients, providing further evidence for the validity of our findings [54].

Related to VEGF signaling, we suggest as a putative target gene KDR (VEGFR-2), which appears in the common neighbor [...]

In this work, we adapt our GNMTF-based data fusion framework to predict candidate target genes and existing drugs that could be re-purposed for treating COVID-19. Moreover, we investigate within the human interactome the interplay between the human proteins that are directly targeted by the SARS-CoV-2 proteins and those genes that are differentially expressed after COVID-19 infection. Our study reveals that the host proteins targeted by viral proteins and the differentially expressed genes are indirectly connected by their neighbors (which we term common neighbor genes). Furthermore, we find that the common neighbors are enriched in various viral processes and hence might be key to the infection mechanisms used by the virus. By focusing on the predicted drug-target interactions involving FDA-approved drugs and targeting the common neighbor genes, we utilize our integrative framework to predict novel drug-target interactions for genes related to the disease-affected pathways.

In particular, we find NO and VEGF signaling as potential molecular pathways whose functions correspond closely to several observed COVID-19 symptoms.

The framework we adapt in this study differs from other network-based computational studies for drug re-purposing applied

The presented data fusion framework exhibits robust performance, as exemplified by its capability to identify previously predicted DTIs involving drugs under current clinical investigation. Beyond its application in this work, the framework is highly versatile and has been successfully applied to the identification of cancer driver genes, patient stratification and drug re-purposing [21]. To [...]

Figure 1A shows a schematic illustration of the datasets used in this study.

Following our previous data fusion methodology [21], we used graph-regularized non-negative matrix tri-factorization (GNMTF) to simultaneously decompose each of the two relation matrices into a product of three non-negative low-dimensional matrices while preserving the network structure of the MIN and DCS. The two decompositions, $R_{12} \approx G_1 H_{12} G_2^T$ and $R_{23} \approx G_2 H_{23} G_3^T$, share the matrix factor $G_2$, which fuses the data by simultaneously decomposing the VHI and DTI networks. The network structure of the MIN and DCS is preserved by adding two regularization terms ($\mathrm{tr}(G_2^T L_2 G_2)$ and $\mathrm{tr}(G_3^T L_3 G_3)$, respectively), so that $G_2$ favors grouping together genes that interact in the MIN and $G_3$ favors grouping together drugs that are chemically similar in the DCS network. Figure 1B shows an illustration of the GNMTF. Briefly, the low-dimensional matrices can be obtained by solving the optimization problem shown in equation 1:

$$\min \; J = \|R_{12} - G_1 H_{12} G_2^T\|_F^2 + \|R_{23} - G_2 H_{23} G_3^T\|_F^2 + \mathrm{tr}(G_2^T L_2 G_2) + \mathrm{tr}(G_3^T L_3 G_3) \qquad (1)$$

where $\|\cdot\|_F$ denotes the Frobenius norm and $\mathrm{tr}$ denotes the trace of a matrix. The objective function, $J$, is heuristically minimized with an iterative method, starting from an initial solution and using multiplicative update rules to converge towards a locally optimal solution [65]. The final decomposition (used for predicting novel DTIs) was obtained by using the Singular Value [...] on its cluster stability measured by the dispersion coefficient. In particular, the hard clustering procedure was applied to the corresponding matrix factor $G_i$, obtaining a clustering encoded in a connectivity matrix $C_i$, which is defined as a binary matrix whose rows and columns are the clustered entities (viral proteins, human genes or drugs) and in which an entry of 1 means that both entities belong to the same cluster.
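By applying this procedure with Random Acol initialization, we computed the average of the obtained $C_i$'s, $\bar{C}_i$, and measured the stability of these clusterings according to the dispersion coefficient: $\rho_{k_i} = \frac{1}{n^2} \sum_{l=1}^{n} \sum_{j=1}^{n} 4 \left( \bar{C}_{lj} - \frac{1}{2} \right)^2$, where $\bar{C}_{lj}$ denotes the entry of $\bar{C}_i$ for entities $l$ and $j$, and $n$ is the number of clustered entities.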
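The stability measure above can be illustrated with a short sketch that goes from cluster indicator matrices to connectivity matrices, their average, and the dispersion coefficient. The assignment rule used here (each entity goes to the column holding its largest entry) is an assumption, since the text does not spell out the hard clustering rule; the factor values and function names are made up for the example.

```python
def hard_clusters(G):
    """Assign each entity (row) to the cluster (column) with its largest entry (assumed rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in G]

def connectivity_matrix(labels):
    """Binary matrix C with C[l][j] = 1 when entities l and j share a cluster."""
    n = len(labels)
    return [[1.0 if labels[l] == labels[j] else 0.0 for j in range(n)] for l in range(n)]

def average_connectivity(label_runs):
    """Entry-wise mean of the connectivity matrices obtained from several runs."""
    mats = [connectivity_matrix(labels) for labels in label_runs]
    n = len(mats[0])
    return [[sum(m[l][j] for m in mats) / len(mats) for j in range(n)] for l in range(n)]

def dispersion_coefficient(avg):
    """rho = (1 / n^2) * sum over l, j of 4 * (avg[l][j] - 1/2)^2; equals 1 for fully stable clusterings."""
    n = len(avg)
    return sum(4.0 * (c - 0.5) ** 2 for row in avg for c in row) / (n * n)

# Hypothetical cluster indicator matrices from two runs (4 entities, 2 clusters).
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.3, 0.6]]
run_b = [[0.2, 0.9], [0.6, 0.5], [0.7, 0.1], [0.1, 0.4]]

labels_a = hard_clusters(run_a)
labels_b = hard_clusters(run_b)

rho_stable = dispersion_coefficient(average_connectivity([labels_a, labels_a]))
rho_mixed = dispersion_coefficient(average_connectivity([labels_a, labels_b]))
```

Identical clusterings across runs give binary averages and therefore a coefficient of 1, whereas runs that disagree push entries towards 1/2 and lower the coefficient.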

The matrix factors $G_2^{n_2 \times k_2}$ and $G_3^{n_3 \times k_3}$ from the GNMTF decomposition are the cluster indicators of genes and drugs, respectively; based on their entries, $n_2$ genes are assigned to $k_2$ clusters and $n_3$ drugs are assigned to $k_3$ clusters. In particular, [...]

To compute the functional enrichments of the common neighbor genes, either for the whole list of genes or for the 49 common neighbor genes that were predicted to be targeted by FDA-approved drugs, we used the gprofiler Python package v.1.0.0 (parameters: organism="hsapiens", source=c("GO","KEGG","REAC","CORUM")) [68]. We used this software for its capability to perform the enrichment analysis across multiple functional annotation databases.

To assess the quality of the obtained clusters of genes and drugs, we computed the enrichment of biological annotations in the clusters. For each gene (or equivalently, protein, as a gene product) in the network, we used the most specific experimentally [...] where N is the number of annotated genes (drugs) in the cluster, X is the number of genes (drugs) in the cluster that are [...]

Figure 1. Illustration of the data and framework. a) Schematic illustration of datasets used in this study. Three data types are represented: SARS-CoV-2 proteins (in orange), human genes (in green) and drugs (in blue). Two relational datasets connect different types of data: virus-host protein-protein interactions (VHIs) and drug-target interactions (DTIs). Network structural knowledge from these data types is contained in the molecular interaction network (MIN) and the drug chemical similarity (DCS) network. b) Graph-regularized non-negative matrix tri-factorization (GNMTF) used for fusing the VHIs, DTIs, MIN and DCS networks. The matrix factor $G_2$ is shared across decompositions to simultaneously decompose the VHI and DTI networks. Network structure (topology) information from the MIN and DCS networks is incorporated into the data fusion by using two regularization terms (illustrated by arcs with arrows). The parameters $k_1$, $k_2$ and $k_3$ indicate the numbers of clusters of viral proteins, human genes and drugs, respectively.

[...] The illustration shows that many of the pathways are tied to NO and VEGF signaling. NO production is directly related to the VEGFR-2 receptor and, at the same time, NO regulates the VEGF signaling pathway, among others: inflammatory signaling, hypoxia signaling and platelet aggregation.