Text mining-based COVID-19 comorbid disease network
The total number of unique disease terms associated with COVID-19 was 559. To reduce the influence of noise on the network, we counted the number of occurrences (frequency) of each disease term across the 6238 abstracts (Supplementary Table 1). The word cloud analysis revealed 54 diseases, 50 of which appear in the word cloud, with term frequencies ranging from 2 to 552 and corresponding to font sizes of 1% and 100%, respectively (Fig. 1, Supplementary Table 2). Retaining only strong disease associations based on the DO score reduced the number of diseases to 54 (Supplementary Table 3). Finally, with the DO score threshold set at 0.5, 44 diseases showed dense associations among themselves (89 edges; Fig. 2). Like many empirically observed biological networks, the disease-disease network was scale-free, with a degree distribution that followed a power law (data not shown); that is, most nodes connect to only a few other nodes, whereas a few nodes act as hubs with a large number of links. The most connected disease hubs were grouped into five clusters (clusters 1 to 5) based on their topological properties in the network (Fig. 3A, B; Supplementary Table 4). Clusters 1, 2, 3 and 4 constituted the most densely connected subnetworks, with almost 10 diseases each. Cluster 1 contained the highest number of diseases with node sizes >10 (7 out of 44 diseases). Earlier studies indicated that densely connected diseases in a network should share similar Medical Subject Headings (MeSH) classifications35 (Fig. 3C). We therefore mapped the DO terms of the core network to MeSH communities and investigated whether they fell under similar MeSH disease classifications. For instance, the pulmonary diseases in Cluster 1 belonged to the same MeSH classes and formed two sets of diseases whose unique identifiers (UIDs) share the same first three digits.
The set 1 diseases were Pulmonary edema (UID: D011654), Viral pneumonia (UID: D011024) and Pulmonary fibrosis (UID: D011658), whereas the set 2 diseases were Asthma (UID: D001249) and Bronchitis (UID: D001991).
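The network-building and MeSH-grouping steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the DO similarity scores below are hypothetical values (only the MeSH UIDs are taken from the text), edges are kept when the score exceeds the 0.5 threshold, and "closely defined in 3 digits" is assumed to mean that UIDs share the same first three digits after the "D".

```python
from collections import defaultdict

# Hypothetical DO semantic-similarity scores (illustrative values, not the study's data).
DO_SCORES = {
    ("Pulmonary edema", "Pulmonary fibrosis"): 0.72,
    ("Pulmonary fibrosis", "Viral pneumonia"): 0.64,
    ("Asthma", "Bronchitis"): 0.58,
    ("Asthma", "Pulmonary fibrosis"): 0.41,
    ("Hypertension", "Asthma"): 0.12,
}

# MeSH unique IDs as quoted in the text.
MESH_UID = {
    "Pulmonary edema": "D011654",
    "Viral pneumonia": "D011024",
    "Pulmonary fibrosis": "D011658",
    "Asthma": "D001249",
    "Bronchitis": "D001991",
}


def build_network(scores, threshold=0.5):
    """Keep only edges with a DO score above the threshold.

    Diseases without any surviving edge never enter the adjacency map,
    so isolated nodes are dropped automatically.
    """
    adjacency = defaultdict(set)
    for (a, b), score in scores.items():
        if score > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def group_by_uid_prefix(diseases, uid_map, digits=3):
    """Group diseases whose MeSH UIDs share the same first `digits` digits after 'D'."""
    groups = defaultdict(list)
    for disease in sorted(diseases):
        uid = uid_map.get(disease)
        if uid is not None:
            groups[uid[: 1 + digits]].append(disease)
    return dict(groups)


network = build_network(DO_SCORES)
n_edges = sum(len(nbrs) for nbrs in network.values()) // 2
print(len(network), "diseases,", n_edges, "edges")
print(group_by_uid_prefix(network, MESH_UID))
```

With these toy scores, "Hypertension" drops out as an isolated node, and the remaining diseases split into a D001 group (Asthma, Bronchitis) and a D011 group (the three pulmonary diseases), mirroring the two sets named in the text.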
Figure 1: The word cloud represents COVID-19-associated diseases derived through text mining. It shows 50 unique major diseases, symptoms and disorders, with frequencies of appearance ranging from 2 to 552 and corresponding to font sizes of 1% and 100%, respectively. The frequencies are highly correlated with the prevalence of the comorbid diseases that contribute to severity in COVID-19 patients29. This consolidated disease network can be readily used by medical professionals for regular reference.
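The term counting behind Fig. 1 and the mapping from frequency to font size can be sketched as follows. This is a hedged illustration: the abstracts are hypothetical toy strings, matching is assumed to be whole-word and case-insensitive, the noise cut-off `min_count=2` is an assumption based on the lowest frequency shown in Fig. 1, and the frequency-to-font-size mapping is assumed to be linear between the reported end points (frequency 2 to 1%, frequency 552 to 100%). Real word-cloud tools may scale differently.

```python
import re
from collections import Counter


def count_disease_terms(abstracts, terms, min_count=2):
    """Count whole-word, case-insensitive occurrences of each term across abstracts.

    Terms seen fewer than `min_count` times are dropped as likely noise
    (the cut-off of 2 is an assumption matching the lowest frequency in Fig. 1).
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] += len(re.findall(pattern, lowered))
    return {term: n for term, n in counts.items() if n >= min_count}


def font_size_percent(freq, f_min=2, f_max=552, s_min=1.0, s_max=100.0):
    """Linearly map a term frequency onto a font-size percentage (assumed linear scale)."""
    if f_max == f_min:
        return s_max
    return s_min + (freq - f_min) * (s_max - s_min) / (f_max - f_min)


# Toy abstracts (hypothetical) to show the counting step.
abstracts = [
    "Pneumonia and ARDS were common.",
    "ARDS developed in patients with diabetes.",
    "No ARDS; pneumonia resolved.",
]
freqs = count_disease_terms(abstracts, ["ARDS", "pneumonia", "diabetes", "asthma"])
print(freqs)
print({term: round(font_size_percent(n), 2) for term, n in freqs.items()})
```

Under the linear assumption, a term midway between the extremes (frequency 277) would be drawn at 50.5% of the maximum font size.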
Figure 2: Disease-disease network of 44 non-isolated diseases (nodes) linked on the basis of the disease ontology (DO)-based semantic similarity score. The diseases are connected by 89 potential links. Nodes represent diseases, and node colors reflect each disease's frequency of appearance in the abstracts. Nodes are drawn in two shapes, circles (diseases) and squares (disorders), and in three colors: yellow (size > 75), green (size 25 to 75) and blue (size < 25). The width of each link represents the similarity score; significantly comorbid links (similarity score > 0.5) are shown in gray. The diseases are grouped by human organ system and type of disease/disorder (labeled background colors).
Figure 3: (A) Dendrogram of the DO score-based clustering of the 44 comorbid diseases of COVID-19, showing five disease clusters (clusters 1 to 5). (B) Heatmap, scaled from 0 to 14, in which rows represent 35 diseases and columns represent nine topological properties of each disease node: neighborhood connectivity, degree, eccentricity, average shortest path length, topological coefficient, clustering coefficient, closeness centrality, radiality and betweenness centrality. Note that all Cluster 1 (pulmonary-related) diseases and their symptoms rank high in the network in terms of both the DO association score and the network properties (blue backgrounds). (C) Subnetwork of the highly comorbid pulmonary diseases of COVID-19. Node colors, node shapes and edges are as in Fig. 2; highly significant edges (DO score > 0.5) are shown as red lines.
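Several of the node properties used in Fig. 3B can be computed for an unweighted network with only the standard library. The sketch below is illustrative and uses a hypothetical four-node toy graph. It covers degree, clustering coefficient, average shortest path length, closeness centrality (taken here as the reciprocal of the average shortest path length, an assumed definition) and eccentricity; the remaining properties in the heatmap are not reproduced.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from `source` via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_properties(adj, node):
    """Degree, clustering coefficient, average shortest path length,
    closeness centrality and eccentricity of one node."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    # Clustering coefficient: share of neighbour pairs that are themselves linked.
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    clustering = 2.0 * links / (k * (k - 1)) if k >= 2 else 0.0
    # Distances to all other reachable nodes.
    others = [d for n, d in bfs_distances(adj, node).items() if n != node]
    avg_spl = sum(others) / len(others) if others else 0.0
    return {
        "degree": k,
        "clustering": clustering,
        "avg_shortest_path": avg_spl,
        "closeness": 1.0 / avg_spl if avg_spl else 0.0,
        "eccentricity": max(others) if others else 0,
    }


# Hypothetical toy network: a triangle A-B-C with D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
for node in sorted(adj):
    print(node, node_properties(adj, node))
```

In this toy graph, node C behaves like a small hub: it has the highest degree and closeness and the lowest eccentricity.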
Assessment of COVID-19 comorbid respiratory diseases by means of associated genes
Using CTD, we associated 203 unique genes with 8 comorbid disorders of Cluster 1. As defined in the Methods, 34 of the 203 genes interacted with at least two of the 6 diseases, yielding 79 disease-gene associations. These genes were mainly associated with the six highly prevalent diseases of COVID-19 (Fig. 1), excluding bronchitis, pneumothorax and neonatal asphyxia (Fig. 4). The disease-gene relationships were further investigated using disease-gene association (DGA) scores derived from several association indices, namely the Jaccard, Simpson, geometric and cosine indices (Supplementary Table 5). The DGA scores revealed that pulmonary fibrosis was the most dominant disease in Cluster 1. In addition, pulmonary fibrosis was strongly associated with asthma, chronic obstructive pulmonary disease (COPD), pulmonary edema and ARDS. In Fig. 4, genes strongly associated with two diseases are referred to as “driver genes”. Furthermore, almost 30% of the genes in the network (10 genes) interacted with 3 diseases, revealing dense disease-gene associations. Functional pathway enrichment of the driver genes revealed “oxygen signalling” as the dominant pathway. In addition, our study identified clinically relevant disease associations; in particular, ARDS (the most prevalent disease of COVID-19) was strongly associated with pulmonary fibrosis through ACE2, CCL2, EDN1 and TIMP1, and with viral pneumonia through ACE. These findings highlight that these 34 “driver genes” could play a major role in the manifestation as well as the regulation of the six major diseases/disorders of COVID-19.
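The four association indices named above can be computed from the gene sets of two diseases as simple set overlaps. This sketch assumes the common definitions: Jaccard |A∩B|/|A∪B|, Simpson |A∩B|/min(|A|,|B|), geometric |A∩B|²/(|A|·|B|) and cosine |A∩B|/√(|A|·|B|). The study's exact formulas are given in its Methods and may differ. The gene sets are hypothetical: the shared genes follow the ARDS associations stated in the text, while GENE_X and GENE_Y are placeholders. `driver_genes` is a hypothetical helper that keeps genes linked to at least two diseases.

```python
import math
from collections import Counter


def association_indices(genes_a, genes_b):
    """Set-overlap indices between the gene sets of two diseases."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return {"jaccard": 0.0, "simpson": 0.0, "geometric": 0.0, "cosine": 0.0}
    shared = len(a & b)
    return {
        "jaccard": shared / len(a | b),
        "simpson": shared / min(len(a), len(b)),
        "geometric": shared ** 2 / (len(a) * len(b)),
        "cosine": shared / math.sqrt(len(a) * len(b)),
    }


def driver_genes(disease_genes, min_diseases=2):
    """Genes linked to at least `min_diseases` diseases."""
    counts = Counter(g for genes in disease_genes.values() for g in set(genes))
    return {g for g, c in counts.items() if c >= min_diseases}


# Hypothetical gene sets: the shared genes follow the text, GENE_X/GENE_Y are placeholders.
disease_genes = {
    "ARDS": {"ACE", "ACE2", "CCL2", "EDN1", "TIMP1"},
    "Pulmonary fibrosis": {"ACE2", "CCL2", "EDN1", "TIMP1", "GENE_X"},
    "Viral pneumonia": {"ACE", "GENE_Y"},
}
print(association_indices(disease_genes["ARDS"], disease_genes["Pulmonary fibrosis"]))
print(sorted(driver_genes(disease_genes)))
```

The indices differ in how they penalize unequal set sizes. Simpson ignores the larger set entirely, while Jaccard penalizes every non-shared gene, so reporting several of them together makes a DGA score less sensitive to any single normalization.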
Figure 4: Bipartite network of the strongly associated pulmonary diseases (D; red ellipses, second row) and driver genes (G; green squares) from Cluster 1. The driver genes are arranged in row 3 (G-2D, genes interacting with more than 2 diseases), row 4 (G-3D, genes interacting with more than 3 diseases) and row 5 (G-4D, genes interacting with more than 4 diseases). The interactions of each disease with its genes are shown in a different color (grey, yellow, violet, pink, blue and black). The number of disease-gene interactions is also given; for example, D-22G denotes that asthma interacts with 22 genes. Similarly, disease-gene-disease associations are given as Jaccard scores (first row, values in square boxes). Note that ARDS interacts with viral pneumonia and pulmonary fibrosis through ACE and ACE2, respectively, with a Jaccard score of 0.2.
Tripartite network-based repurposing of drugs to treat comorbid diseases of COVID-19
To be effective in patients with severe COVID-19, a drug should lie within, or interact directly with, the “COVID-19 target network”. We used CTD to extract 505 drugs or chemicals associated with the 6 target-network diseases. This expanded network contained a total of 662 disease-drug and 2284 drug-gene connections. We applied the filtering strategy described in the Materials and methods section to extract an integrated scale-free disease-chemical-gene network, in which only a few drug nodes had a large number of links (hubs) and the degree distribution followed a power law. The overall network consisted of 34 genes, 6 pulmonary diseases and 15 drugs (Supplementary Figure 1). We first focused on the Food and Drug Administration (FDA)-approved drugs, namely dexamethasone, tretinoin, acetylcysteine, oxygen, simvastatin and aspirin, together with the chemical resveratrol, which is not FDA approved (Fig. 5). Nevertheless, resveratrol is already available as a nutritional supplement in many countries, despite the challenges in translating it into a clinical drug36. We further investigated whether network motif analysis could help prioritize drug targets based on the associations between diseases and their surrounding genes. Under the “guilt by association” rule, diseases that are similar to each other are more likely to be affected by the same genes/pathways, and chemicals acting on the same genes are more likely to be strongly associated with these diseases. For instance, “pulmonary edema” and “pulmonary fibrosis” shared 15 of the 34 genes in the network. Interestingly, these 15 genes were also associated with two chemicals, “resveratrol” and “oxygen” (Fig. 6). Consistently, KEGG pathway enrichment analysis of the 15 genes revealed key regulation of virus-related and oncogenic/hypoxic pathways (Fig. 6). This result supports our association rule that similar diseases can be treated by the same drugs, allowing us to generate hypotheses for drug repositioning.
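The guilt-by-association motif described above (two diseases sharing a gene module, plus chemicals that act on that module) can be sketched with plain set operations. This is a hypothetical illustration. All gene and chemical names are placeholders rather than CTD data, and a chemical counts as "associated with" the shared module when it interacts with at least `min_fraction` of the shared genes. That criterion is an assumption; the default of 1.0 requires every shared gene.

```python
def shared_genes(disease_genes, d1, d2):
    """Genes associated with both diseases."""
    return disease_genes[d1] & disease_genes[d2]


def chemicals_covering(chemical_genes, genes, min_fraction=1.0):
    """Chemicals interacting with at least `min_fraction` of the given genes."""
    if not genes:
        return []
    hits = [
        chem
        for chem, targets in chemical_genes.items()
        if len(genes & targets) / len(genes) >= min_fraction
    ]
    return sorted(hits)


# Hypothetical toy data (placeholder gene and chemical names).
disease_genes = {
    "Pulmonary edema": {"G1", "G2", "G3", "G4"},
    "Pulmonary fibrosis": {"G2", "G3", "G4", "G5"},
}
chemical_genes = {
    "chem_A": {"G2", "G3", "G4", "G6"},
    "chem_B": {"G2", "G3"},
    "chem_C": {"G1", "G2", "G3", "G4"},
}
core = shared_genes(disease_genes, "Pulmonary edema", "Pulmonary fibrosis")
print(sorted(core))
print(chemicals_covering(chemical_genes, core))
print(chemicals_covering(chemical_genes, core, min_fraction=0.6))
```

Lowering `min_fraction` trades specificity for recall. With full coverage, only chem_A and chem_C qualify; at 0.6, chem_B (two of three shared genes) is also retained.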
In this analysis, the most densely connected drug hub was resveratrol, which was associated with 4 of the 6 diseases and 28 of the 34 genes (Fig. 5). On this basis, resveratrol was selected for repositioning to treat the highly comorbid respiratory disorders of COVID-19, namely asthma, pneumonia, pulmonary fibrosis and ARDS.
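Identifying the drug hub amounts to ranking drugs by their numbers of disease and gene links in the tripartite network. The sketch below is a minimal, hypothetical version of that ranking. The drug, disease and gene names are placeholders, not CTD data, and ordering by disease count first and gene count second is an assumed tie-breaking choice.

```python
def rank_drug_hubs(drug_diseases, drug_genes):
    """Rank drugs by number of linked diseases, then genes (both descending)."""
    drugs = set(drug_diseases) | set(drug_genes)
    rows = [
        (drug, len(drug_diseases.get(drug, ())), len(drug_genes.get(drug, ())))
        for drug in drugs
    ]
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


# Hypothetical toy links (placeholder names, not CTD data).
drug_diseases = {
    "drug_A": {"D1", "D2", "D3", "D4"},
    "drug_B": {"D1", "D2"},
    "drug_C": {"D1", "D2"},
}
drug_genes = {
    "drug_A": {"G1", "G2", "G3", "G4", "G5"},
    "drug_B": {"G1", "G2", "G3"},
    "drug_C": {"G1"},
}
for drug, n_dis, n_gen in rank_drug_hubs(drug_diseases, drug_genes):
    print(f"{drug}: {n_dis} diseases, {n_gen} genes")
```

Here drug_A plays the role of the hub, while drug_B and drug_C, tied on diseases, are separated by their gene counts.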
Figure 5: Tripartite network of pulmonary disease nodes (red circles, level 1), drug nodes (brown circles, level 2) and gene nodes (green circles, level 3). Interactions between the different node types are shown as lines in three colors. The numbers of disease-drug and disease-gene edges are shown in yellow and green squares, respectively, and the numbers of drug-disease and drug-gene edges in red and blue squares, respectively. The numbers inside the squares above and below the circles give the number of edges of the corresponding node. Note that our predicted FDA-approved drugs (6) and nutraceutical (1) interact strongly with the driver genes as well as with more than three pulmonary comorbid diseases of COVID-19. Note also that resveratrol has the highest number of edges to both diseases and genes.
Figure 6: Subgraph of the tripartite network showing the chemicals resveratrol and oxygen (yellow circles). The 15 genes (out of the 34 driver genes) highly perturbed by resveratrol are shown as green squares. Note that the same 15 genes are also shared by oxygen. These genes are also perturbed during the manifestation of pulmonary fibrosis and pulmonary edema (red ellipses). Functional enrichment strongly associates these genes with major oxygen signalling pathways (HIF, ERK, MAPK, TLR, ...; cyan circles), which are linked to the exacerbation of several viral infections (pink circles) and cancers (violet circles).
Resveratrol significantly reduced hypoxia-induced vascular leakage
The effect of resveratrol on transvascular fluid leakage was assessed by quantifying sodium fluorescein dye leakage (Fig. 7B). After exposure to hypoxia, mean fluorescein dye leakage in lung tissue (223.33±23.60 rfu/g) was significantly higher (p<0.05) than under normoxic conditions. Animals pre-treated with resveratrol (15 mg/kg BW) and then exposed to hypoxia showed significantly lower (p<0.05) mean relative fluorescence in the lungs (130.75±5.63 rfu/g) than hypoxic control animals. Overall, administration of resveratrol (15 mg/kg BW) significantly decreased fluorescein dye leakage (lung vascular leakage) in hypoxia-exposed animals. However, the rfu values of resveratrol-treated, hypoxia-exposed animals remained significantly higher than the normoxic control values.
Figure 7: (A) Rats were injected with the drug via the tail vein. (B) Rats were exposed to a simulated altitude of 25,000 feet (7620 m) at 22°C for 8 h to examine the effect of resveratrol (15 mg/kg BW) on hypobaric hypoxia-induced vascular permeability. Values are mean ± SD (n=6). Statistical significance between groups was determined by one-way ANOVA followed by Tukey's test. * versus control; # versus hypoxia. The experimental conditions are labeled normoxia (N), normoxia plus drug (N+D), hypoxia (H) and hypoxia plus drug (H+D).
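Two quantities behind the leakage results can be made concrete: the relative reduction in leakage and the one-way ANOVA F statistic. The sketch below computes the percentage reduction from the reported hypoxic (223.33 rfu/g) and hypoxia plus resveratrol (130.75 rfu/g) means. It also shows how an F statistic follows from per-group summary statistics (mean, SD, n) using pooled within-group variance. The ANOVA call uses hypothetical toy numbers, because the normoxic group values are not given in this section, so this is not a re-analysis of the study. P-values would additionally need the F distribution (e.g. `scipy.stats.f.sf`), and Tukey's post hoc test needs the raw observations (e.g. `scipy.stats.tukey_hsd`).

```python
def percent_reduction(control_mean, treated_mean):
    """Percentage decrease of the treated mean relative to the control mean."""
    return 100.0 * (control_mean - treated_mean) / control_mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic from per-group summary statistics.

    `groups` is a list of (mean, sd, n) tuples; returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Reported hypoxic control vs. hypoxia + resveratrol means (rfu/g).
reduction = percent_reduction(223.33, 130.75)
print(f"Leakage reduced by {reduction:.1f}%")

# Toy summary statistics (hypothetical) to exercise the ANOVA helper.
print(one_way_anova_f([(1.0, 1.0, 3), (3.0, 1.0, 3)]))
```

With the reported means, resveratrol pre-treatment corresponds to roughly a 41% reduction in lung fluorescein leakage relative to hypoxic controls.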