MTDDI: a graph convolutional network framework for predicting Multi-Type Drug-Drug Interactions

Although polypharmacy offers both higher therapeutic efficacy and less drug resistance in combating complex diseases, drug-drug interactions (DDIs) may trigger unexpected pharmacological effects, such as side effects, adverse reactions, or even serious toxicity. Thus, it is crucial to identify DDIs and explore their underlying mechanisms (e.g., DDI types) for polypharmacy safety. However, the detection of DDIs in assays is still time-consuming and costly, due to the need for experimental search over a large drug combinational space. Machine learning methods have proved to be a promising and efficient means of preliminary DDI screening. Most shallow learning-based predictive methods focus only on whether one drug interacts with another. Although deep learning (DL)-based predictive methods address a more realistic screening task of identifying DDI types, they only predict the types of known DDIs, ignore the structural relationship between DDI entries, and cannot reveal knowledge about the dependence between DDI types. Thus, we propose a novel end-to-end deep learning-based predictive method (called MTDDI) to predict DDIs as well as their types, exploring the underlying mechanism of DDIs. MTDDI designs an encoder derived from enhanced deep relational graph convolutional networks to capture the structural relationship between multi-type DDI entries, and adopts a tensor-like decoder to uniformly model both single-fold interactions and multi-fold interactions to reflect the relations between DDI types. The results


Introduction
Polypharmacy, also termed drug combination, has become a promising strategy for treating complex diseases (e.g., diabetes and cancer) in recent years [1].
When two or more drugs are taken together, they may trigger unexpected side effects, adverse reactions, and even serious toxicity [2]. The pharmacological effects triggered by multiple drugs in a treatment are named drug-drug interactions (DDIs). DDIs can be divided into two cases: one in which a pair of drugs triggers only one pharmacological effect, and another in which a pair of drugs causes two or more related pharmacological effects. We call the former a single-fold interaction and the latter a multi-fold interaction. For example, the interaction between Sucralfate and Metoclopramide states that "Sucralfate may decrease the excretion rate of Metoclopramide, resulting in a higher serum level". Apparently, this pair of drugs may trigger two related pharmacokinetic effects, namely Excretion and Serum Concentration.
Therefore, it is crucial to identify DDIs and unravel their underlying mechanisms for polypharmacy safety. However, it is still both time-consuming and costly to detect DDIs among a large number of drug pairs in assays. Over the past decade, the build-up of experimentally determined DDI entries has boosted the application of computational methods, especially machine learning-based methods, to find potential DDIs [3].
In recent years, other deep learning (DL)-based predictive methods [11, 12, 22] have been developed to address another screening task: identifying the pharmacological effects caused by known DDIs, that is, predicting multi-type DDIs. For example, DeepDDI [11] designs a nine-layer deep neural network to predict 86 types of DDIs using the structural information of drug pairs as inputs. Lee et al. [22] predict the pharmacological effects of DDIs by using three drug similarity profiles of known drug pairs (the structural similarity profile, the Gene Ontology term similarity profile, and the target gene similarity profile) to train a three-layer autoencoder and an eight-layer deep feed-forward network. DDIMDL [12] predicts DDI events by using drug similarity features computed from chemical substructures, targets, enzymes, and pathways to separately train three-layer deep neural networks (DNNs), and then averages (sums up) the individual predictions of the trained DNNs as the final prediction.
Despite these efforts on identifying multi-type DDIs, there remains room for improvement in DL-based methods. i) Existing DL-based methods require known DDIs as input, while the interactions of most drug pairs are unknown. Therefore, it is necessary to develop new algorithms to identify whether an unknown drug pair has one or more pharmacological effects. ii) DDIs can form an interaction network that helps to improve a predictor's performance; however, existing DL-based methods treat drug pairs as independent samples, ignoring the structural relationship between DDI entries. iii) Existing DL-based methods cannot reveal knowledge (e.g., that the excretion of a drug slows down due to its increasing serum concentration caused by a DDI) about the relations between DDI types. To address the above issues, we propose a novel predictive method (called MTDDI) to identify whether an unknown drug pair results in one or more pharmacological effects.
The main contributions of our work are as follows: i) MTDDI leverages an encoder built upon an enhanced relational graph convolutional network (R-GCN) to capture the structural relationship between multi-type DDI entries. ii) MTDDI employs a tensor-like decoder to uniformly model both single-fold and multi-fold interactions, identifying whether an unlabeled type-specific drug pair results in one or more pharmacological effects. iii) MTDDI adopts a set of type-specific feature importance matrices (i.e., a tensor) in the decoder to reveal the dependency between DDI types by calculating their correlations.

Datasets
We built the multi-type DDI dataset by collecting DDI entries from DrugBank (version of July 16, 2020) [23] in the following steps. First, we downloaded the complete XML-formatted database (including the comprehensive profiles of 11,440 drugs), from which we selected 2,926 small-molecule drugs together with their chemical structures and binding proteins. After extracting all descriptive sentences of DDIs from the XML file, we collected 859,662 interaction entries among the 2,926 drugs in total. Furthermore, we obtained 274 different interaction patterns by parsing these sentences. According to the pharmacological effects triggered by DDIs [24], we finally grouped these patterns into 11 types of DDIs, including Absorption, Metabolism, Serum Concentration, Excretion, Synergy Activity, Antagonism Activity, Toxicity Activity, Adverse Effect, Antagonism Effect, Synergy Effect, and PD triggered by PK [25]. The task of multi-type DDI prediction directly discriminates whether an unknown drug pair results in one or more pharmacological effects of interest (Fig. 1-C). It learns a set of type-specific mapping functions $\mathcal{F}_r: \mathcal{D} \times \mathcal{D} \to \{0,1\}$, $r = 1, 2, \dots, m$.

This work focuses on the task of multi-type DDI prediction, since the multi-type DDI classification task is just its degraded version. Referring to DDI-triggered pharmacological effects as interaction types, we represent a set of multi-type DDIs as a multi-relation network $G = (V, E)$, where the vertices are drugs and the edges between vertices are multi-type interactions (Fig. 2-A). Let $V = \{v_1, v_2, \dots, v_n\}$ be the vertex set, $R = \{r_1, r_2, \dots, r_m\}$ be the interaction type set, and $(v_i, r, v_j)$ be the interaction of type $r$ caused by the pair of drug $v_i$ and drug $v_j$. Furthermore, $G$ is decomposed into $m$ sliced sub-networks $\{G_1, G_2, \dots, G_m\}$ with respect to interaction types (Fig. 2-A).

Feature extraction
In addition to interaction entries, we extracted drug chemical structures, represented by SMILES strings, as well as drug binding proteins (DBPs), including targets, enzymes, transporters, and carriers. Drug chemical structures were encoded into feature vectors by Extended Connectivity Fingerprints (ECFPs) [26] and MACCS keys fingerprints [26], respectively. ECFPs represent a molecular structure through circular atom neighborhoods as a 1024-dimensional binary vector, where each element denotes the presence or absence of a specific functional substructure. In contrast, MACCS keys fingerprints represent a molecular structure as a 166-dimensional binary vector w.r.t. a set of pre-defined substructures. These two fingerprints were computed with the RDKit Python package, and the radius of the ECFP neighborhood was set to 4.
Moreover, we consider DBPs (targets, transporters, enzymes, and carrier proteins) as the third type of drug features, because they are crucial factors when a DDI occurs.
Subsequently, each drug is represented as a 3,334-dimensional binary vector in which each element indicates whether the drug binds to a specific protein. Finally, by calculating Tanimoto coefficients between drug feature vectors, we obtained three drug similarities derived from ECFP_4, MACCS keys, and DBPs, respectively.
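The Tanimoto coefficient used above can be sketched as follows; the toy 8-bit fingerprints are hypothetical stand-ins for the 1024-d ECFP, 166-d MACCS, and 3,334-d DBP vectors described in the text:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprint vectors:
    |a AND b| / |a OR b|; 1.0 for identical non-empty fingerprints,
    0.0 for fingerprints sharing no set bits."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0

# Toy 8-bit "fingerprints" (illustrative only).
fp1 = [1, 0, 1, 1, 0, 0, 1, 0]
fp2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(tanimoto(fp1, fp2))  # 3 shared bits / 5 set bits overall = 0.6
```

Computing this coefficient for every drug pair on each of the three feature representations yields the three similarity matrices.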

Model construction
Upon the above representation of multi-type DDIs, we cast the task of multi-type DDI prediction as a multi-relational link prediction problem, and design an end-to-end Multiple-Type Predictor for Drug-Drug Interactions (MTDDI) to address this task.
MTDDI contains an encoder $\mathcal{F}_e$ and a decoder $\mathcal{F}_d$.
Derived from the multi-relational GCN (R-GCN) [27][28][29][30], we construct a multi-layer R-GCN in which the encoder $\mathcal{F}_e$ extracts a global latent feature matrix $Z \in \mathbb{R}^{n \times d}$ ($d \ll n$) by capturing the topological feature matrices $\{A_r \in \mathbb{R}^{n \times n}\}$ of all drugs across the sliced sub-networks $\{G_r\}$. However, the primary multi-layer GCN suffers from the over-smoothing issue, which makes all nodes in a network have highly similar feature values. To relax the over-smoothing issue, $\mathcal{F}_e$ does not use the output embedding representations of its final layer; instead, it sums the embedding representations (named residuals) of its hidden layers together as its final embedding feature matrix $Z$. In addition, considering the few possibly missing interactions in the network, $\mathcal{F}_e$ utilizes a pre-defined drug similarity matrix to constrain similar drugs to be closer in the embedding space.
Since the original decoder in the primary GCN [30] is just an inner product $ZZ^T$ between drug embedding vectors, it cannot reflect the essence of multi-type interactions. R-GCN employs RESCAL [31], which utilizes $m$ additional type-specific feature association matrices $M_r$ to capture the essence of multi-type interactions (i.e., $Z M_r Z^T$). Inspired by the literature [27,32], we suppose that feature importance varies across interaction types, and we also assume that interaction types are not completely independent of each other. Therefore, our decoder $\mathcal{F}_d$ adopts a tensor factorization-like matrix operation that integrates the embedding feature matrix $Z$, $m$ type-specific feature importance matrices $\{D_r\}$, and an average feature association matrix $R$ to reconstruct the multi-type DDI network (i.e., $Z D_r R D_r Z^T$).
Finally, our MTDDI trains $\mathcal{F}_e$ and $\mathcal{F}_d$ simultaneously to obtain an end-to-end model for multi-type DDI prediction. Notably, 79.6% of the collected DDIs are single-fold, 19.36% are two-fold, and 1.04% are three-fold (Fig. 2-F).

Encoder in Multi-relation graph convolutional network
We employed the extended GCN (i.e., R-GCN) to extract node embeddings in the multi-type DDI network. First, the network $G$ is decomposed into $m$ sliced sub-networks $\{G_1, G_2, \dots, G_m\}$, in which each slice accounts for a specific interaction type (Fig. 2-A). Then, both the feature vector $h_i^{(0)}$ of drug $v_i$ (or node $v_i$) and those of its neighbors in $G_r$ are aggregated by a graph convolutional operation. After that, similar aggregations across all the sliced sub-networks are further summed up to generate the updated feature vector $h_i^{(1)}$ of drug $v_i$. Such a single layer of R-GCN integrates the topological neighborhood of drug $v_i$ across the interaction types it involves. For any layer in a multi-layer R-GCN, the general propagation rule is defined as:

$$h_i^{(k+1)} = \sigma\left( \sum_{r=1}^{m} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(k)} h_j^{(k)} + W_0^{(k)} h_i^{(k)} \right)$$

where $c_{i,r} = |N_i^r|$ is a normalization constant, $N_i^r$ denotes the set of $v_i$'s neighbors in $G_r$, $h_j^{(k)}$ is the input feature vector and $W_r^{(k)}$ is the trainable weight matrix in the $k$-th layer of R-GCN, and $\sigma$ is a non-linear element-wise activation function (i.e., ReLU). Last, the aggregation process is propagated through $p$ layers of R-GCN to obtain the final embedding feature vector $h_i^{(p)}$ of drug $v_i$.
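A minimal numpy sketch of this propagation rule; the toy sizes, the random adjacency matrices, and the `rgcn_layer` helper are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out, m = 5, 4, 3, 2           # drugs, input/output dims, interaction types

# One adjacency matrix per sliced sub-network G_r (hypothetical toy topology).
A = [rng.integers(0, 2, size=(n, n)) for _ in range(m)]
H = rng.standard_normal((n, d_in))        # input features h_i^(k)
W = [rng.standard_normal((d_in, d_out)) for _ in range(m)]  # per-type weights W_r
W0 = rng.standard_normal((d_in, d_out))   # self-loop weight W_0

def rgcn_layer(A, H, W, W0):
    """One R-GCN layer: aggregate neighbours per type, normalise by degree
    c_{i,r} = |N_i^r|, add the self contribution, then apply ReLU."""
    out = H @ W0                           # self term W_0 h_i
    for A_r, W_r in zip(A, W):
        deg = np.maximum(A_r.sum(axis=1, keepdims=True), 1)  # avoid divide-by-zero
        out += (A_r @ (H @ W_r)) / deg     # (1/c_{i,r}) * sum_{j in N_i^r} W_r h_j
    return np.maximum(out, 0)              # ReLU

H1 = rgcn_layer(A, H, W, W0)
print(H1.shape)  # (5, 3)
```

Stacking $p$ such layers propagates information from $p$-th order neighbors, as described above.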
Such a multi-layer propagation of R-GCN enables the extraction of higher-order topological features of the multi-type DDI network [33]. However, it usually causes the 'over-smoothing' issue inherited from GCN [33], where the features of neighboring drugs, or even of all drugs in the case of many layers, become extremely similar. As a result, a good GCN contains only a few hidden layers (e.g., no more than 2) [28][29][30]. To enhance the network representation ability of GCN, a residual strategy is adopted to relax the 'over-smoothing' issue for the multi-layer R-GCN.
Let the final embedding feature output by the encoder $\mathcal{F}_e$ be $z_i$. For a $p$-layer R-GCN, we set $z_i$ as:

$$z_i = \sum_{k=2}^{p} h_i^{(k)}$$

Notably, this sum requires that the dimensions of the different layers are the same. Because the first hidden layer accounts for the dimension reduction of the high-dimensional one-hot features $h_i^{(0)}$, the residual sum starts from the second hidden layer.
Moreover, it is anticipated that two interacting drugs are close in the embedding space generated by $\mathcal{F}_e$. Thus, possible interactions can be deduced among close drugs according to their embedding features [30]. However, an existing but unlabeled interaction between two drugs may cause their remoteness in the network.
Such missing interactions would hamper the learning of $\mathcal{F}_e$.
Therefore, based on the observation that similar drugs tend to interact in terms of chemical structures [2] or binding proteins [34], pre-defined drug similarities, taken as a regularization term $s_{i,j} \cdot \| z_i - z_j \|_2^2$, are employed to constrain similar drugs to be as close as possible in the embedding space. Refer to Section 2.5 Loss function for details.

Decoder
Once the encoder $\mathcal{F}_e$ generates the drug embedding features $\{z_i\}$, which integrate topological information across interaction types, the decoder $\mathcal{F}_d$ employs $\{z_i\}$ to reconstruct the multi-type DDI network $\hat{G}$. In the case of binary DDI prediction, the inner product $z_i z_j^T$ indicates how likely drug $v_i$ interacts with drug $v_j$. To reflect the difference between interaction types, R-GCN employs $z_i M_r z_j^T$ to calculate the likelihood of a type-specific interaction, where $\{M_r\}$ are $m$ type-specific association matrices. Inspired by the literature [35], we suppose that feature importance varies across interaction types, and we also assume that interaction types are not completely independent of each other. Therefore, our decoder $\mathcal{F}_d$ adopts a tensor factorization-like matrix operation $z_i D_r R D_r z_j^T$ to calculate the type-specific interaction likelihood. Thus, how likely the pair of drug $v_i$ and drug $v_j$ triggers an $r$-type pharmacological effect is formally defined by the scoring function:

$$p_{i,j}^r = \sigma\left(z_i D_r R D_r z_j^T\right)$$

where $z_i$ and $z_j$ are the $1 \times d$ embedding vectors of drug nodes $v_i$ and $v_j$ respectively, $D_r$ is a $d \times d$ diagonal feature importance matrix concerning type $r$, $R$ is a $d \times d$ feature association matrix shared across interaction types, and $\sigma(\cdot)$ is the sigmoid function that converts the confidence score of being an $r$-type interaction into a probability value in $[0,1]$.
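The scoring function above can be sketched in a few lines of numpy; the embedding dimension, the random embeddings, and the random $D_r$ and $R$ matrices are toy stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 4, 3                                      # embedding dim, interaction types

z_i = rng.standard_normal(d)                     # drug embeddings from the encoder
z_j = rng.standard_normal(d)
D = [np.diag(rng.random(d)) for _ in range(m)]   # type-specific importance D_r (diagonal)
R = rng.standard_normal((d, d))                  # shared feature association matrix

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(z_i, z_j, D_r, R):
    """DEDICOM-style score: sigmoid(z_i D_r R D_r z_j^T)."""
    return sigmoid(z_i @ D_r @ R @ D_r @ z_j)

probs = [score(z_i, z_j, D_r, R) for D_r in D]
print(probs)  # one interaction probability in (0, 1) per type
```

Because only $D_r$ varies with the type while $R$ is shared, the $m$ diagonal vectors of $\{D_r\}$ can later be correlated to study dependencies between DDI types.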

Loss function
The encoder ℱ  and the decoder ℱ  can be trained as an end-to-end model of multi-type DDI prediction.The loss function of MTDDI is composed of two components.The first one measures the difference between the original multi-type interaction network  and the reconstructed network  � .The second one is a regularization item, which keeps the similar drugs as close as possible in the embedding space.
Let $y_{i,j}^r$ be the true label of a triplet $(v_i, r, v_j)$ for the pair of drug $v_i$ and drug $v_j$ in the $r$-th sliced network $G_r$, and $p_{i,j}^r$ be the predicted probability of being an interaction of type $r$. For the $r$-th sliced network $G_r$, its loss function $\mathcal{L}_r$ is defined by a binary cross-entropy as follows:

$$\mathcal{L}_r = -\frac{1}{|T_r|}\sum_{(v_i, r, v_j) \in T_r} \left( y_{i,j}^r \log p_{i,j}^r + \left(1 - y_{i,j}^r\right) \log\left(1 - p_{i,j}^r\right) \right)$$

where $T_r$ is the set of training triplets in $G_r$. The positive samples are taken as the interactions in $G_r$, while the negative samples are randomly sampled among its unlabeled drug pairs. The number of negative samples is the same as that of positive samples. Over all the sliced networks, the global loss function is defined as $\mathcal{L}_{BCE} = \sum_{r=1}^{m} \mathcal{L}_r$.
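A minimal sketch of this per-slice loss with balanced negative sampling; the five-drug toy network, the fixed "predicted" probabilities, and the `bce` helper are illustrative assumptions:

```python
import math
import random

def bce(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy over one sliced network G_r."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, p_pred)) / len(y_true)

random.seed(0)
positives = [(0, 1), (1, 2), (2, 3)]                  # known interactions in G_r
all_pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
unlabeled = [p for p in all_pairs if p not in positives]
negatives = random.sample(unlabeled, len(positives))  # as many negatives as positives

labels = [1] * len(positives) + [0] * len(negatives)
preds = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]                # hypothetical decoder outputs
print(round(bce(labels, preds), 4))  # ≈ 0.2284
```

Summing this quantity over the $m$ slices gives the global cross-entropy term of the loss.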
Let $S = [s_{i,j}] \in [0,1]^{n \times n}$, $i, j = 1, 2, \dots, n$, be the drug similarity matrix. The regularization term is defined as:

$$\mathcal{L}_{sim} = \sum_{i,j} s_{i,j} \left\| z_i - z_j \right\|_2^2$$

where $z_i$ and $z_j$ are the embedding representations of drug $v_i$ and drug $v_j$ generated by the encoder, respectively. It can be written in an elegant matrix form as follows:

$$\mathcal{L}_{sim} = 2\,\mathrm{tr}\left(Z^T L Z\right)$$

where $Z$ is the $n \times d$ feature matrix stacked by the embedding vectors, $L = D - S$ is a Laplacian matrix, and $D$ is an $n \times n$ diagonal matrix derived from $S$ with elements $D_{ii} = \sum_j s_{i,j}$. This regularization term utilizes pre-defined drug similarities to constrain similar drugs to be as close as possible in the embedding space. This idea is similar to that in literature [35].
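The equivalence between the element-wise sum and the Laplacian trace form can be checked numerically; the random symmetric similarity matrix and embedding matrix below are toy data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 3
Z = rng.standard_normal((n, d))          # embedding matrix (rows are z_i)
S = rng.random((n, n))
S = (S + S.T) / 2                        # symmetric similarity matrix in [0, 1]

# Element-wise form: sum_{i,j} s_ij * ||z_i - z_j||^2
elementwise = sum(S[i, j] * np.sum((Z[i] - Z[j]) ** 2)
                  for i in range(n) for j in range(n))

# Matrix form: 2 * tr(Z^T L Z) with L = D - S and D_ii = sum_j s_ij
L = np.diag(S.sum(axis=1)) - S
matrix_form = 2 * np.trace(Z.T @ L @ Z)

print(np.isclose(elementwise, matrix_form))  # True
```

The identity holds for any symmetric $S$, which is the case here since similarity is a pairwise (symmetric) measure.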
Therefore, the final loss of MTDDI is as follows:

$$\mathcal{L} = \mathcal{L}_{BCE} + \alpha\,\mathcal{L}_{sim}$$

where $\alpha$ is a hyperparameter that adjusts the weight of the similarity constraint in the training phase.

Assessment
To measure the performance of MTDDI, the whole DDI dataset is randomly split into a training set, a validation set, and a testing set. The training set is used to train the learning model, and the validation set is used to tune the model to ensure an optimal predictive performance. The testing set is used to measure the generalization performance of the model on unlabeled data. In each experiment, we use 75% of the samples of the DDI dataset as the training set, 5% as the validation set, and the remaining 20% as the testing set. The splitting process is repeated many times (e.g., 20 times) with different random seeds, and the average performance over these repetitions is reported as the final performance.
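The 75/5/20 split with varying random seeds can be sketched as follows; `split_dataset` is a hypothetical helper, not the paper's code:

```python
import random

def split_dataset(samples, seed, ratios=(0.75, 0.05, 0.20)):
    """Shuffle and split samples into train/validation/test by the given ratios."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)   # each seed yields a different split
    n = len(samples)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

train, val, test = split_dataset(range(1000), seed=42)
print(len(train), len(val), len(test))  # 750 50 200
```

Repeating the split with 20 different seeds and averaging the resulting metrics gives the reported performance.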
Since our task is a multi-type prediction problem, a group of metrics is used to measure the prediction, including the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), Accuracy, Recall, Precision, and F1-score. Remarkably, Recall, Precision, and F1-score each have macro and micro versions. Macro metrics reflect the average performance across different interaction types; for example, Macro Precision is defined as the average of the Precision values over the interaction types. In contrast, Micro metrics are analogous to the corresponding metrics in binary classification, obtained by summing the numbers of true positive, false positive, true negative, and false negative samples across all interaction types. Their definitions are as follows:

$$\text{Macro Precision} = \frac{1}{m}\sum_{i=1}^{m} \frac{TP_i}{TP_i + FP_i}, \qquad \text{Micro Precision} = \frac{\sum_{i=1}^{m} TP_i}{\sum_{i=1}^{m} \left(TP_i + FP_i\right)}$$

$$\text{Macro Recall} = \frac{1}{m}\sum_{i=1}^{m} \frac{TP_i}{TP_i + FN_i}, \qquad \text{Micro Recall} = \frac{\sum_{i=1}^{m} TP_i}{\sum_{i=1}^{m} \left(TP_i + FN_i\right)}$$

where $TP_i$, $TN_i$, $FP_i$ and $FN_i$ represent the numbers of true positive, true negative, false positive and false negative samples in the $i$-type DDI prediction, respectively, $m$ is the number of DDI interaction types, and each F1-score is the harmonic mean of the corresponding Precision and Recall. In addition, AP@50 is employed to measure the Macro Precision of each DDI type in terms of the top-50 predicted DDIs. For all of the above metrics, larger values indicate better predictions.
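The difference between the macro and micro variants is easy to see on toy per-type counts (the numbers below are made up for illustration):

```python
# Per-type confusion counts (TP_i, FP_i) for a hypothetical 3-type prediction.
counts = [
    (30, 10),   # type 1: precision 30/40 = 0.75
    (5, 15),    # type 2: precision  5/20 = 0.25
    (40, 10),   # type 3: precision 40/50 = 0.80
]

# Macro: average the per-type precisions, so every type weighs equally.
macro_precision = sum(tp / (tp + fp) for tp, fp in counts) / len(counts)

# Micro: pool the counts first, so frequent types dominate.
micro_precision = sum(tp for tp, _ in counts) / sum(tp + fp for tp, fp in counts)

print(macro_precision)  # (0.75 + 0.25 + 0.80) / 3 = 0.6
print(micro_precision)  # 75 / 110 ≈ 0.6818
```

The gap between the two values reflects the class imbalance across interaction types, which is pronounced in this dataset (Fig. 2-E).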

Results and Discussion
We designed experiments to address the following questions: 1) Does MTDDI improve multi-type DDI classification? 2) Can MTDDI achieve a good predictive performance in multi-type DDI prediction? 3) How do the residual strategy and the similarity regularization in the encoder help the prediction? 4) How do the feature importance matrices in the decoder help find the dependency between DDI types?

Parameter settings
To learn a good model of multi-type DDI prediction, we first determined the architecture of the encoder as follows. The one-hot encodings of the 2,926 nodes in the multi-type DDI network were adopted as the input features of our MTDDI. The encoder is composed of four hidden layers, in which the numbers of neurons were determined empirically. Besides, to accommodate the residual strategy, the second, third, and fourth hidden layers contain the same number of neurons. Thus, the numbers of neurons in the input layer and the four hidden layers are 2,926, 1,024, 128, 128, and 128, respectively.
With this encoder architecture and the tensor factorization-like decoder, we performed a grid search with the Adam optimizer [36] to tune the major parameters of our MTDDI.

DDI classification
To answer the first question, we compared MTDDI with three state-of-the-art multi-class classification models, namely DeepDDI [11], Lee's model [22], and DDIMDL [12]. We focus only on these deep learning-based models because they have demonstrated superior performance to regular shallow models. In common, these methods first treat the rows of a drug similarity matrix as the corresponding drug feature vectors, then set the concatenation of two feature vectors as the feature vector of a DDI, and finally train a multi-layer DNN on the feature vectors and the types of DDIs as the classifier. Differently, in terms of model architecture, DeepDDI is a model for a homogeneous interaction feature (i.e., chemical structure), whereas both Lee's model and DDIMDL accommodate heterogeneous DDI features (e.g., pathways, GO terms, and binding proteins).
Moreover, to cope with the high dimensionality of DDI features, they utilized various tricks to enhance their models. To further verify the performance of our MTDDI in predicting new DDIs and their interaction types, the inspiring prediction results impelled us to perform a transductive inference of potential DDIs and their interaction types among all drug pairs. Such an inference validates the performance of MTDDI in practice. To accomplish this task, we first used the whole dataset of known DDIs to train MTDDI, and then employed the trained MTDDI to infer how likely unlabeled drug pairs are to trigger specific pharmacological effects among the 11 interaction types. After that, we ranked the unlabeled drug pairs in each interaction type according to their type-specific prediction scores. In the prediction results of multi-fold interactions, 17 out of 50 two-fold predicted candidates and 8 out of 60 three-fold predicted candidates were confirmed, respectively.

The comparison results in
The detailed results are listed in Table S2 of the supplementary file. As illustrated, we picked a two-fold interaction case (Case 8) and a three-fold interaction case (Case 18) to show how MTDDI contributes to finding multi-fold interactions. For the two-fold example, DrugBank states "Acebutolol may increase the arrhythmogenic activities of Digoxin.", while DDI Checker states "Using Acebutolol together with Digoxin may slow your heart rate and lead to increased side effects." (Case 8). Both statements show that the pair of Digoxin and Acebutolol triggers a PK antagonistic activity and further results in a PD adverse effect. For the three-fold example, two statements are similarly found, but contain three pharmacological effects: "Voriconazole may increase the blood levels and effects of Trazodone" and "The risk or severity of QTc prolongation can be increased when Trazodone is combined with Voriconazole" (Case 18). The pair of Voriconazole and Trazodone increases both the serum concentration (PK) and the synergistic effect (PD) of Trazodone, and also increases the risk of adverse effects. In total, these newly predicted multi-type DDIs demonstrate the potential of our MTDDI in practice.

Influence of hidden layers, residual strategy and similarity regularization in encoder
In this section, we investigated how three factors in the encoder (i.e., the number of hidden layers, the similarity regularization, and the residual strategy) affect the performance of MTDDI. First, after removing the similarity regularization and the residual strategy from MTDDI, we built two variants, that is, MTDDI with 2 hidden layers (denoted as MTDDI-2) and with 4 hidden layers (denoted as MTDDI-4). The results are shown in Table 3, from which we can draw the following three crucial points.
(1) MTDDI-4 is worse than MTDDI-2 on all the metrics. Obviously, increasing the number of hidden layers degrades the predictive performance because of the "over-smoothing" issue inherited from GCN.
(2) Compared with MTDDI-2 and MTDDI-4, MTDDI-4-R (i.e., MTDDI-4 equipped with the residual strategy) achieves a significant improvement. Thus, the residual strategy can relax the "over-smoothing" issue in the case of a deeper GCN architecture.
(3) Compared with these variants, the full architecture of MTDDI, which has the additional similarity regularization, further improves the prediction. Thus, the similarity regularization helps constrain similar drugs to be as close as possible in the embedding space, coping with the issue that missing interaction labels between similar drugs cause their remoteness in the network.
In summary, with the help of the residual strategy, MTDDI can accommodate a deep GCN architecture (e.g., containing more than 2 layers). Also, its similarity regularization further helps capture missing interactions.
Table 3. Performance of the similarity constraint and residual strategy in MTDDI

Influence of different implementations of decoder
Since the decoder in MTDDI is loosely coupled with the encoder, we can adopt various decoder models. In this section, we compared three implementations of the decoder: the inner product $z_i z_j^T$ in the traditional GCN, the type-specific association $z_i M_r z_j^T$ in R-GCN, and our type-specific importance association $z_i D_r R D_r z_j^T$. According to their original algorithms, these three implementations are denoted as InnerProd, RESCAL, and DEDICOM, respectively.
See Section 2.3.2 (Decoder) for details. To further validate whether the feature importance matrices capture the dependency between DDI types, we calculated the pairwise correlations among the matrices $\{D_r\}$. First, since these matrices are diagonal, we calculated their correlations on the diagonal vectors of $\{D_r\}$ (Figure 3). Then, we categorized the DDI types into a pharmacokinetic (PK) group and a pharmacodynamic (PD) group in terms of their pharmacological behaviors; the PK group contains the first 7 types, while the remaining types belong to the PD group. After that, we calculated the average absolute correlation within PK and within PD (denoted as $C_{PK}$ and $C_{PD}$), and the average absolute correlation between PK and PD (denoted as $C_B$). The results reveal that $C_{PK}$ (0.264) is significantly greater than $C_{PD}$ (0.086), and $C_B$ (0.344) is the greatest. Similarly, we calculated the average absolute correlations of the individual DDI types (Figure 3), and found that Absorption has the maximum (0.301) and Antagonism Effect the minimum (0.074).
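This correlation analysis can be sketched as follows; the random diagonal vectors stand in for the learned importance matrices $\{D_r\}$, and the toy sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 4, 8                                   # interaction types, embedding dim

# Diagonal vectors of hypothetical importance matrices {D_r}; in MTDDI these
# are learned parameters, here they are random stand-ins.
diags = rng.random((m, d))

corr = np.corrcoef(diags)                     # m x m pairwise Pearson correlations
print(corr.shape)                             # (4, 4)

# Average absolute off-diagonal correlation, analogous to C_PK / C_PD / C_B.
off = np.abs(corr[~np.eye(m, dtype=bool)])
print(round(off.mean(), 3))
```

Grouping the off-diagonal entries by the PK/PD membership of each pair of types and averaging their absolute values yields $C_{PK}$, $C_{PD}$, and $C_B$ as reported above.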

The comparison results in
Moreover, we enumerated the correlations between individual DDI types. For example, Absorption is significantly correlated with Serum Concentration (ρ = −0.55) and Toxicity Activity (ρ = −0.45), respectively; Synergy Activity is significantly correlated with Synergy Effect (ρ = −0.53); Antagonism Effect is independent of Synergy Effect (ρ = −0.005). All the p-values of the significant correlation entries are far less than 0.0001. Overall, the results in Figure 3 demonstrate that DDI types are not independent of each other, and some of them show significant correlations. Thus, the feature importance matrices in the decoder can capture the dependency relations of DDI types to some extent, and they would contribute to uncovering the forming mechanism of DDIs as well as finding potential synergistic drug combinations with the aid of more medical knowledge.

Example DDI statement from DrugBank: "The metabolism of Fluvastatin can be decreased when combined with Glyburide."
Figure 3. Heat map of the correlation analysis for different DDI types.
Suppose there are $n$ drugs $\mathcal{D} = \{d_i\}$ and $k$ interactions $\mathcal{L} = \{l_j\}$ among them. The traditional DDI prediction, multi-type DDI classification, and multi-type DDI prediction are different pharmacological tasks. The task of traditional DDI prediction learns a mapping function $\mathcal{F}: \mathcal{D} \times \mathcal{D} \to \{0,1\}$ to deduce potential interactions between unlabeled drug pairs in $\mathcal{D}$ (Fig. 1-A). The task of multi-type DDI classification identifies which pharmacological effects are caused by known DDIs (Fig. 1-B). It learns a mapping function $\mathcal{F}: \mathcal{L} \to \{t_i\}$, $i = 1, 2, \dots, T$, where $t_i$ is the pharmacological effect type of a DDI, and $T$ is the cardinality of all pharmacological effect types.

Figure 2. Overall framework of MTDDI and multi-type DDI statistics. (A) Decomposition of the multi-type DDI network. The multi-type DDI network is decomposed into m sliced (i.e., type-number) sub-networks, which are represented by m adjacency matrices and taken as the input of the encoder. (B) The encoder of MTDDI. It constructs a p-layer multi-relational GCN (R-GCN) to encode the drugs in the multi-type DDI network into embedding vectors (i.e., rows in the colorful matrix) by capturing their complex topological properties. A residual strategy (i.e., the black arrow) is added from the second hidden layer to the last hidden layer. Meanwhile, a drug similarity matrix is employed to constrain similar drugs to be as close as possible in the embedding space (i.e., the purple matrix). (C) The decoder of MTDDI. It is a tensor factorization-like matrix operation, which integrates the embedding feature matrix, the type-specific feature importance matrices $\{D_r\}$, and an average feature association matrix $R$ to reconstruct the multi-type DDI network. (D) An example illustrating one layer of R-GCN in the encoder. A node of interest (i.e., the blue node) aggregates both the features of its first-order neighbors (i.e., orange) and its own in each of the m sliced networks to update its features (i.e., the green bar). Then, all the updated features are accumulated and passed through a ReLU activation function to produce its final embedding (i.e., the colorful vector). The whole multi-type DDI network is propagated through a p-layer R-GCN to capture the information of p-th order neighbors. (E) Statistics of the different pharmacological effects caused by DDIs. From left to right, the interaction types are: Absorption, Metabolism, Serum Concentration, Excretion, Synergy Activity, Antagonism Activity, Toxicity Activity, Adverse Effect, Antagonism Effect, Synergy Effect, and PD triggered by PK. The Y-axis indicates their occurrence numbers. (F) Proportional distribution of the numbers of single-fold and multi-fold DDIs: 79.6% of DDIs are single-fold, 19.36% are two-fold, and 1.04% are three-fold.

To tune the major parameters of our MTDDI, including the epoch, the learning rate, the batch size, and the hyperparameter for the similarity regularization, we performed a grid search. The epoch, referring to the number of training iterations, was tuned over the list {5, 10, 20, 30, 40, 50, 60, 70}. The learning rate, determining whether and when the objective function converges to the optimum, was empirically investigated over the list {0.0001, 0.001, 0.005, 0.01, 0.05, 0.1}. A mini-batch strategy, sampling a fixed number of drug pairs in each batch, was tuned over {50, 100, 200, 400, 600, 1000, 2000}. The hyperparameter α, adjusting the weight of the similarity constraint, was examined over the list {0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5}. Finally, we experimentally determined a well-trained MTDDI by setting the epoch to 40, the learning rate to 0.001, the batch size to 400, and the hyperparameter α to 0.02.
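The grid search over these candidate lists can be sketched as follows; the `validation_score` function is a toy stand-in for training MTDDI and evaluating it on the validation set:

```python
from itertools import product

# Search space mirroring the candidate lists in the text.
epochs = [5, 10, 20, 30, 40, 50, 60, 70]
learning_rates = [0.0001, 0.001, 0.005, 0.01, 0.05, 0.1]
batch_sizes = [50, 100, 200, 400, 600, 1000, 2000]
alphas = [0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]

def validation_score(epoch, lr, batch, alpha):
    """Hypothetical stand-in: in practice this would train the model with the
    given configuration and return its validation performance."""
    return -abs(epoch - 40) - abs(lr - 0.001) - abs(batch - 400) - abs(alpha - 0.02)

# Exhaustively evaluate all configurations and keep the best one.
best = max(product(epochs, learning_rates, batch_sizes, alphas),
           key=lambda cfg: validation_score(*cfg))
print(best)  # (40, 0.001, 400, 0.02) under this toy scoring function
```

With a real scoring function each configuration requires a full training run, so the grid (2,352 combinations here) is usually pruned or searched one hyperparameter at a time.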
Analysis (PCA) to reduce the feature dimension before training the nine-layer DNN. Lee et al. [22] first utilized three three-layer autoencoders for three sources of raw DDI features respectively, and then concatenated the three sources of dimension-reduced features as the training input of an eight-layer DNN. DDIMDL [12] trained four three-layer DNNs for four sources of DDI features respectively, and averaged the individual predictions of the trained DNNs as the final prediction. These methods are designed for the classification of multi-type DDIs: they determine the pharmacological effect type for a given DDI, while our MTDDI goes beyond this task by directly discriminating whether an unknown drug pair results in one or more pharmacological effects of interest. Thus, we accommodated our MTDDI to the multi-class classification task for comparison. In detail, all DDIs were divided into training samples and test samples. For each type, the DDIs belonging to that type were considered positive samples, and the DDIs not belonging to it were considered negative samples. We implemented DeepDDI, Lee's model, and DDIMDL with their published source codes and default parameters.
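The per-type positive/negative construction described above amounts to a multi-label matrix over the known DDIs; the pair list and type indices below are toy data:

```python
import numpy as np

def per_type_labels(pairs, type_sets, num_types):
    """Build a (num_pairs x num_types) 0/1 matrix: for each type r, the DDIs
    annotated with r are positives and all other DDIs are negatives."""
    Y = np.zeros((len(pairs), num_types), dtype=int)
    for i, types in enumerate(type_sets):
        for r in types:
            Y[i, r] = 1
    return Y

# toy data: 3 drug pairs, 4 types; the second pair is a multi-fold DDI
pairs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
type_sets = [{0}, {1, 2}, {3}]
Y = per_type_labels(pairs, type_sets, 4)
```

A row with several ones corresponds to a multi-fold interaction, which is exactly the case the single-label competitors cannot represent.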
Finally, we picked the top-20 type-specific candidates in each interaction type and validated them against both the latest version of DrugBank (version 5.1.8, January 18, 2021) and the online Drug Interaction Checker tool (Drugs.com). The validation was performed on single-fold interactions and multi-fold interactions respectively. In the prediction results of single-fold interactions, 40 out of 220 predicted DDI candidates (18.2%) were confirmed. The average rank of the 40 verified DDIs is 7.75, indicating that our MTDDI can effectively detect potential DDIs as well as their types. The detailed results can be found in Table S1 of the supplementary file. We further picked several validated DDI candidates (i.e., Case 31, Case 34, Case 15, and Case 16) to show how DDI prediction contributes to synergistic drug combination and drug contraindication. For example, when Pregabalin and Benmoxin are combined, the therapeutic efficacy of Benmoxin can be increased (Case 31). In addition, the therapeutic efficacy of Mebanazine can be increased when it is used in combination with Pregabalin (Case 34). In contrast, the risk or severity of QTc prolongation can be increased when Quinidine is combined with Promethazine (Case 15). Besides, the risk or severity of serotonin syndrome can be increased when Linezolid is combined with Ergotamine (Case 16). These results manifest that MTDDI can provide a preliminary screening for synergistic drug combinations and drug contraindications.
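The two summary numbers above (hit rate and average rank of verified candidates) follow from a simple computation; the `ranks` list in the example is toy data, not the paper's actual verified ranks:

```python
def validation_summary(verified_ranks, n_candidates):
    """Hit rate = verified / screened candidates; average rank is the
    mean position of the verified DDIs in the ranked prediction lists."""
    hit_rate = len(verified_ranks) / n_candidates
    avg_rank = sum(verified_ranks) / len(verified_ranks)
    return hit_rate, avg_rank

# toy example: 3 verified hits among 20 screened candidates
hit, avg = validation_summary([5, 10, 8], 20)
```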

4 Conclusions
Figure 3. Heat map of correlation analysis for different DDI types.

predicting the multi-fold interactions. In this sense, we ran a multi-type DDI prediction to demonstrate the predictive performance of MTDDI. The results in Table 2 show that our MTDDI can effectively predict both single-fold and multi-fold DDIs.

Table 2. Performance of MTDDI for multi-type DDI prediction on both single-fold and multi-fold DDIs

The results in Table 4 show that InnerProd is the worst and DEDICOM is the best. The potential reason that DEDICOM significantly outperforms the other two models is as follows. The inner product z_i^T z_j only indicates how likely drug d_i interacts with drug d_j, but it cannot model interaction types. In contrast, RESCAL reflects the difference between interaction types and models the likelihood of a type-specific interaction by m additional type-specific feature association matrices {M_r}, scoring a pair as z_i^T M_r z_j. Compared with RESCAL, to indicate how likely the pair of drug d_i and drug d_j triggers an r-type pharmacological effect, DEDICOM employs a global feature association matrix M as well as m additional type-specific diagonal matrices {D_r}, scoring a pair as z_i^T D_r M D_r z_j, which reflects that feature importance varies across interaction types.
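The three decoders compared above can be written side by side; the embedding and matrix values below are toy examples, and the symbols follow the notation in the text (z_i, M_r, M, D_r):

```python
import numpy as np

def innerprod_score(z_i, z_j):
    # one score per pair; no way to distinguish interaction types
    return z_i @ z_j

def rescal_score(z_i, z_j, M_r):
    # a full type-specific feature association matrix M_r per type
    return z_i @ M_r @ z_j

def dedicom_score(z_i, z_j, D_r, M):
    # a global association matrix M shared by all types, plus a
    # type-specific diagonal D_r weighting feature importance
    return z_i @ D_r @ M @ D_r @ z_j

# toy 2-dimensional embeddings
z_i, z_j = np.array([1.0, 2.0]), np.array([3.0, 4.0])
I = np.eye(2)
```

With identity matrices all three scores coincide; the decoders only differ once M_r, M, and D_r are learned per interaction type.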

Table S2. Investigation of novel multi-fold DDIs predicted by MTDDI
Acebutolol may increase the arrhythmogenic activities of Digoxin. (DrugBank) Using acebutolol together with digoxin may slow your heart rate and lead to increased side effects. (Drugs.com)

Sufentanil may have additive effects in lowering your blood pressure. You may experience headache, dizziness, lightheadedness, fainting, and/or changes in pulse or heart rate.