This study uses the data and the reusable pre-trained vectors shared by Burkhardt et al. (2019). The dataset collection contains the following data from Decagon:
- 964 different polypharmacy side effects derived from a wider side effect dataset (Tatonetti et al., 2012), each observed at least 500 times.
- A graph network consisting of 645 drug nodes and 19,085 protein nodes (4,651,131 drug-drug and 18,596 drug-protein edges)
- A graph network containing protein-protein and drug-protein relationships (8,083,300 in total)
The LitCovid dataset was used to determine the drugs used in Covid-19 treatment. Covid-19-focused articles published in PubMed are added to this dataset, which is updated daily. At the time of the study, in mid-June 2020, the dataset contained bibliographic records and abstracts for about 17,288 studies.
B. DDI with Graph Convolutional Networks (GCN)
Convolutional Neural Networks (CNN) and GCNs are quite similar in architecture; however, a GCN takes graphs as input (Bastings et al., 2017). A standard GCN architecture is shown in Figure 1.
A GCN aims to exploit the properties and signals of a graph (Kipf & Welling, 2019). Each node is assumed to be characterized by the properties of its neighboring nodes and by the relationships it establishes with them (Z. Wu et al., 2019). Through the convolution layers and an activation function (such as ReLU), the properties of all nodes are processed. Depending on the study, the GCN output can take different forms, such as a graph, a set of features, or a representation of pairwise relations.
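To make the neighbor-aggregation idea concrete, the following is a minimal sketch of a single graph-convolution step using the widely cited propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an illustration only and is not taken from the Decagon or ESP code; the function names, the toy graph and the weight values are assumptions made for the example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so that every node also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalization by node degree (degree >= 1 due to self-loops).
    deg = [sum(row) for row in a_hat]
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    # Aggregate neighbor features, project with the weights, apply ReLU.
    projected = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Toy example (hypothetical): a path graph 0 - 1 - 2 with one-hot features.
A = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
out = gcn_layer(A, H, W)
```

Each output row mixes a node's own features with those of its direct neighbors; stacking several such layers lets information travel across longer paths in the graph.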
Detecting relationships in biomedical data is one of the main application areas of GCN (Zhang et al., 2019). Graph-based studies on DDI and on the detection of side effects and ADRs are available in the literature, and they carry out DDI prediction with different approaches. As shown in Figure 2, a graph containing drugs and proteins as nodes was created in the Decagon study.
Figure 2 shows that estimates of possible side effects are produced by examining drug-drug, drug-protein and protein-protein interactions. In this study, the graph shown in Figure 2 is used as the GCN input, and the drug interaction results are obtained as output in the form (drug1, drug2, side_effects). Using this infrastructure, Burkhardt et al. (2019) have also classified side effects according to diseases or organ systems.
C. Choosing the Target Drugs
In this study, a combination of three sources was used to select the drugs whose interactions with other drugs would be calculated. The first is the Decagon study, from which we use the network infrastructure. The second is the LitCovid dataset, which compiles Covid-19-oriented studies on PubMed. The third is “covid19-druginteractions.org”, an online system shared by the University of Liverpool that predicts drug interactions.
First, the 645 drugs from the Decagon project were taken from the 'drug_names' dataset. Next, the frequency with which each of these drugs appears in the LitCovid dataset was measured. The aim was to identify the drugs most often mentioned in Covid-19 studies published in PubMed. To count the mentions, a series of operations was performed on the LitCovid dataset. So that repeated mentions of a term within the same article would not distort the frequency calculation, only the abstract sections of the studies were searched. This gives more consistent results for the number of papers that include a drug. Preprocessing steps were applied to reduce the dataset to the 'abstract' sections and make it searchable. The frequency of all 645 drugs was then measured on the normalized data.
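The frequency measurement described above can be sketched as follows. This is a minimal illustration and not the code used in the study: it assumes that each abstract is counted at most once per drug and that matching is done on whole words, case-insensitively. The function name, the sample abstracts and the shortened drug list are hypothetical.

```python
import re


def count_drug_mentions(abstracts, drug_names):
    """Count, for each drug, how many abstracts mention it at least once."""
    # Whole-word, case-insensitive patterns (an assumption of this sketch).
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in drug_names
    }
    counts = {name: 0 for name in drug_names}
    for text in abstracts:
        for name, pattern in patterns.items():
            # search() stops at the first hit, so an abstract counts once.
            if pattern.search(text):
                counts[name] += 1
    return counts


# Hypothetical abstracts, for illustration only.
abstracts = [
    "Chloroquine and hydroxychloroquine were evaluated.",
    "Heparin dosing in severe cases; heparin resistance was noted.",
    "No drug is named in this abstract.",
]
drugs = ["chloroquine", "heparin", "clozapine"]
counts = count_drug_mentions(abstracts, drugs)
```

With whole-word matching, "hydroxychloroquine" does not count as a mention of "chloroquine"; whether such variants should be merged is a design decision that this sketch leaves open.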
After frequency measurement on the LitCovid dataset, it was found that 543 drugs were never mentioned. The remaining 102 drugs were mentioned 9.06 times on average. However, when the 8 most frequently mentioned drugs are removed from the list, the average mention frequency of the remaining 94 drugs drops below 3. For this reason, the experiments focused only on the top 8 drugs, which are listed in Table 1 with their mention frequencies. Table 1 also shows whether each drug obtained from the LitCovid scan is available in the online system at www.covid19-druginteractions.org.
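The selection step can be illustrated with a short sketch that ranks the mentioned drugs, separates the k most frequent ones, and compares the mean frequency before and after their removal. The function name and the counts below are hypothetical and do not reproduce the values in Table 1.

```python
def split_top_k(counts, k):
    """Return the k most mentioned drugs and the mean frequencies
    of all mentioned drugs and of those left after removing the top k."""
    # Drugs that were never mentioned are excluded from both averages.
    ranked = sorted(
        ((drug, n) for drug, n in counts.items() if n > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    top, rest = ranked[:k], ranked[k:]
    mean_all = sum(n for _, n in ranked) / len(ranked) if ranked else 0.0
    mean_rest = sum(n for _, n in rest) / len(rest) if rest else 0.0
    return [drug for drug, _ in top], mean_all, mean_rest


# Hypothetical counts, for illustration only.
toy_counts = {"drug_a": 40, "drug_b": 30, "drug_c": 2, "drug_d": 1, "drug_e": 0}
top_drugs, mean_all, mean_rest = split_top_k(toy_counts, k=2)
```

A large gap between the two means, as in the toy data, indicates that a few drugs account for most of the mentions, which is the pattern that motivated restricting the experiments to the top 8 drugs.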
It was observed that 6 of the selected drugs are also available on the 'www.covid19-druginteractions.org' website, which is designed to show drug interactions for Covid-19.
Although Heparin and Clozapine are not found in this system, they have been addressed in many studies on Covid-19 treatment (Leung et al., 2020; Perna et al., 2020; Tang et al., 2020; Thachil, 2020). Some of these studies were carried out as clinical trials directly on patients with Covid-19. Considering how frequently they appear in the literature, Heparin and Clozapine were included in this study.
Experiments were conducted using a modified version of the project shared by Burkhardt, together with its pre-trained vectors (GitHub - Hannahburkhardt/Predicting_ddis_with_esp: Predicting Adverse Drug-Drug Interactions with Neural Embedding of Semantic Predications, n.d.). Thus, there was no need to re-vectorize and retrain the Decagon network of drugs and proteins. Only the drugs selected in the previous section were given as input to the project's side effect estimation module. This module was updated to accept this input format; for each input drug, interactions with all drugs in the network are examined and the 5 highest-scoring results are produced. It is also possible to check whether the side effects defined in the project belong to a disease group or organ system. In a further update, all side effects resulting from the interaction of the target drugs with other drugs were classified, and the organ systems and disease groups that these drugs are most likely to affect were determined. The way the project works and how it differs from previous work are depicted in Figure 3.
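The two updates described above, keeping the 5 highest-scoring results for a target drug and tallying the predicted side effects by organ system or disease group, can be sketched as follows. This is an illustration under stated assumptions and not the code of the referenced project: `score_fn` is a hypothetical stand-in for whatever scoring the pre-trained vectors provide, the ranking is assumed to run over all (drug, side effect) pairs, and the drug names, scores and category mapping are invented for the example.

```python
from collections import Counter


def top_interactions(target, candidates, side_effects, score_fn, k=5):
    """Score the target against every other drug and side effect,
    then keep the k highest-scoring (score, drug, side_effect) triples."""
    scored = [
        (score_fn(target, other, effect), other, effect)
        for other in candidates
        if other != target
        for effect in side_effects
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def tally_categories(results, effect_to_category):
    """Count how often each organ system or disease group appears."""
    return Counter(
        effect_to_category.get(effect, "unknown") for _, _, effect in results
    )


# Hypothetical scores and mapping, for illustration only.
toy_scores = {
    ("chloroquine", "drug_x", "nausea"): 0.9,
    ("chloroquine", "drug_y", "arrhythmia"): 0.8,
    ("chloroquine", "drug_x", "arrhythmia"): 0.3,
}


def toy_score(drug_a, drug_b, effect):
    return toy_scores.get((drug_a, drug_b, effect), 0.0)


top = top_interactions(
    "chloroquine",
    ["chloroquine", "drug_x", "drug_y"],
    ["nausea", "arrhythmia"],
    toy_score,
    k=2,
)
categories = tally_categories(
    top, {"nausea": "gastrointestinal", "arrhythmia": "cardiovascular"}
)
```

Passing the scoring function as a parameter keeps the ranking and tallying logic independent of how the scores are actually computed.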
The first part, shown as Section I in Figure 3, reflects the architecture of the previous study, while the second part represents the architecture of the current study. The vector representation shown in Figure 3 was created by combining the ESP study with the Decagon study. The values of these vectors lie between 0 and 1: the effect increases as the value approaches 1 and decreases as it approaches 0. Figure 3 uses Chloroquine as the example; all other drugs shown in Table 1 went through the same steps.