In this section, we first formulate the DTI prediction problem. Then, we provide an overview of the Graph-DTI framework, followed by its input and GNN layers. Finally, we discuss DTI prediction with Graph-DTI.
Problem Formulation
In a typical DTI scheme, let \({N}_{d}\) and \({N}_{t}\) denote the sets of drugs and target proteins, respectively. We then define the drug-target interaction matrix \(Y\in {\{0, 1\}}^{\left|{N}_{d}\right|\times \left|{N}_{t}\right|}\), where \(\left|{N}_{d}\right|\) denotes the number of drugs and \(\left|{N}_{t}\right|\) denotes the number of target proteins. In the matrix Y, an entry \({y}_{i,j}=1\left(i\in {N}_{d},j\in {N}_{t}\right)\) indicates that protein j has a known interaction with drug i; otherwise, \({y}_{i,j}=0\), which may correspond to an undetected potential DTI. DTI prediction can thus be regarded as a semi-supervised classification problem in machine learning: given the DTI matrix Y, our goal is to predict whether drug i (\(i\in {N}_{d}\)) has a previously undiscovered interaction with protein j (\(j\in {N}_{t}\)). To achieve this goal, our main task is to learn to predict \({\widehat{y}}_{i,j}=\phi \left(i,j\right|\beta ,Y)\), where \({\widehat{y}}_{i,j}\) denotes the probability that protein j interacts with drug i and β denotes the model parameters of the function φ.
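The setup above can be sketched in NumPy; the matrix sizes and known pairs here are toy illustrations, not the paper's dataset:

```python
import numpy as np

# Toy sizes (illustrative only; the paper's dataset has 708 drugs, 1512 targets).
n_drugs, n_targets = 4, 5

# Binary interaction matrix Y: Y[i, j] = 1 means drug i is known to
# interact with protein j; 0 means unknown / potential interaction.
Y = np.zeros((n_drugs, n_targets), dtype=int)
known_pairs = [(0, 1), (2, 3), (3, 0)]  # hypothetical known DTIs
for i, j in known_pairs:
    Y[i, j] = 1

# The semi-supervised prediction task: score every unobserved (i, j) pair.
candidate_pairs = [(i, j) for i in range(n_drugs)
                   for j in range(n_targets) if Y[i, j] == 0]
```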
Overview
Figure 1 shows an overview of Graph-DTI. It takes as input the processed DTI matrix and the heterogeneous network obtained from dataset preprocessing, and outputs interaction values for drug-target pairs. The central idea of Graph-DTI is to encode nodes and their topological neighborhoods into distributed representations by exploiting higher-order neighbor structure information and topological structure information with graph neural networks. Accordingly, we designed Graph-DTI as a three-step DTI prediction framework.
- The original dataset is processed to construct a heterogeneous network graph, and the DTIs in the graph are masked.
- Node features of drug-target pairs and their adjacency structures in the heterogeneous network are encoded.
- Prediction is performed by feeding the encoded vectors from the previous step into the classifier.
Overall, in the first step, we extract the DTI data sources from the dataset, which contain drug-drug pairs, protein-protein pairs, drug-target pairs, drug chemical structure similarity scores, and protein sequence similarity scores, and construct a heterogeneous network. In the second step, we use Graph-DTI to extract the features of nodes and the neighborhood structures of their related nodes from the constructed heterogeneous network. In the third step, to predict the interaction values between drug-target pairs, Graph-DTI outputs the latent representations of the nodes and the neighborhood topology between drug-target pairs; we then calculate the scores between them and output the predicted interaction value. We specify the details of the model below.
DTI Extraction and Heterogeneous Network Construction
The first step of Graph-DTI is divided into two parts, including the construction of a heterogeneous network and the construction of a mask for the DTI.
Graph-DTI predicts unknown DTIs from a heterogeneous network constructed from drugs and target proteins, where drugs and target proteins are denoted as nodes, and DTIs and other interactions or associations are denoted as edges. The heterogeneous network includes 2 types of nodes, drug and target, and 5 types of edges: drug-drug interaction (DDI), drug-target interaction (DTI), protein-protein interaction (PPI), drug chemical structure similarity, and protein sequence similarity.
The heterogeneous network is defined as an undirected graph G = (N, E), where N is the set of nodes and E is the set of edges, with node types O = {Drug, Protein} and edge types R = {Drug-target-interaction, Drug-drug-interaction, Protein-protein-interaction, Drug-structure-similarity, Protein-sequence-similarity}. The number of drugs is 708, the number of targets is 1512, and the number of nodes in N is 2220. The number of drug-target-interaction edges is 1923, the number of drug-drug-interaction edges is 10036, the number of protein-protein-interaction edges is 7363, the number of drug-structure-similarity edges is 708², and the number of protein-sequence-similarity edges is 1512². Specifically, drug-target-interaction, drug-drug-interaction and protein-protein-interaction edges are represented by binary sparse matrices; these three types of edges are undirected and unweighted. The scores in drug-structure similarity and protein-sequence similarity are used as the weights of the corresponding edges, i.e., these two types of edges are undirected and weighted. The construction process is detailed in Fig. 2. In the heterogeneous graph, two nodes can be connected by more than one edge.
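A minimal sketch of such a typed, multi-edge network in plain Python (node names, edge counts, and similarity scores are toy assumptions, not the paper's data):

```python
# Toy heterogeneous network stored as a typed edge list; parallel edges
# between the same pair of nodes are allowed, as in the paper's graph.
nodes = {"d0": "Drug", "d1": "Drug", "t0": "Protein", "t1": "Protein"}

edges = [
    # (u, v, edge type, weight) -- weight is None for unweighted edge types
    ("d0", "t0", "Drug-target-interaction", None),
    ("d0", "d1", "Drug-drug-interaction", None),
    ("t0", "t1", "Protein-protein-interaction", None),
    # similarity scores act as edge weights (hypothetical values)
    ("d0", "d1", "Drug-structure-similarity", 0.83),
    ("t0", "t1", "Protein-sequence-similarity", 0.61),
]

def neighbors(node, etype=None):
    """Undirected neighborhood lookup, optionally filtered by edge type."""
    out = []
    for u, v, t, w in edges:
        if etype is not None and t != etype:
            continue
        if u == node:
            out.append(v)
        elif v == node:
            out.append(u)
    return out
```

Note that `d0` and `d1` appear twice in each other's neighborhood because they are linked by both a DDI edge and a similarity edge, mirroring the parallel-edge property of the heterogeneous graph.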
Neighborhood Message Passing
For a given heterogeneous network graph, the model aims to automatically learn node embeddings from the network topology, that is, feature representations produced by a function mapping each node while preserving the original topological features as much as possible, which greatly facilitates DTI prediction. Suppose the feature of node u is \({h}_{u}\in R\). Because edge features are usually high-dimensional, a linear operation is introduced to reduce the dimension of the node features: \({h}_{u}\) and \({h}_{v}\) are concatenated and passed through a linear layer, W⋅(\({h}_{u}\left|\right|{h}_{v}\)) = \({W}_{u}{h}_{u} +{W}_{v}{h}_{v}\), where \({W}_{u}\) and \({W}_{v}\) are the left and right halves of the linear matrix W, respectively. This edge-feature computation appears in formula (1): the source node feature \({h}_{u}\) is multiplied by the edge feature to obtain the message, the messages are summed over all neighbors, and the result is multiplied by 2 to obtain the output feature \({h}_{v}^{\prime }\).
$${h}_{v}^{\prime }=2\sum _{u\in {N}_{v}}{h}_{u}\ast \left({W}_{u}{h}_{u} +{W}_{v}{h}_{v}\right), \left(1\right)$$
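The update in Eq. (1) can be sketched in NumPy; the feature dimension, neighbor count, and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Feature of the center node v and of its three neighbors u.
h_v = rng.standard_normal(dim)
neighbours = [rng.standard_normal(dim) for _ in range(3)]

# W acts on the concatenation (h_u || h_v); splitting it into halves gives
# W @ concat(h_u, h_v) == W_u @ h_u + W_v @ h_v.
W = rng.standard_normal((dim, 2 * dim))
W_u, W_v = W[:, :dim], W[:, dim:]

# Eq. (1): message = h_u * (W_u h_u + W_v h_v) (elementwise product),
# summed over neighbors, then scaled by 2 to give h_v'.
h_v_new = 2 * sum(h_u * (W_u @ h_u + W_v @ h_v) for h_u in neighbours)
```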
Node Embedding Learning
As a special graph representation method, the GCN uses neural networks to encode graph nodes, embedding the graph structure as a computationally tractable vector matrix. Although the GCN is defined in the Fourier domain, it can also be understood in the spatial domain as the so-called message passing mechanism: gathering information from the neighborhood and then updating the central node at each step [31]. \({H}^{\left(l\right)}\) is the input feature of each node in layer l; when l = 1, it is the original node feature. W is a linear transformation matrix that transforms (or maps) the dimension of the aggregated node features, \(\delta\) is an activation function, and A is the adjacency matrix. The product H · W is multiplied by the adjacency matrix A, which selects the first-order neighbor nodes; this is essentially information transmission among neighbors. The adjacency matrix is renormalized by a constant \({c}_{i}\), yielding formula (2).
$${H}^{(l+1)}=\delta \left(\sum _{j\in {N}_{i}}\frac{1}{{c}_{i}}{W}^{\left(l\right)}{H}^{\left(l\right)}\right), \left(2\right)$$
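Eq. (2) can be sketched as a single normalized aggregation step in NumPy; the toy adjacency matrix, constant features, and weight values are assumptions for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Toy 3-node graph with 4-dimensional input features H^(l).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)  # adjacency matrix
H = np.ones((3, 4))                     # H^(l), all-ones for a checkable result
W = np.full((4, 2), 0.5)                # shared weight matrix W^(l)

# c_i is the degree of node i, so dividing each row of A by c_i
# renormalizes the neighborhood sum into a mean.
c = A.sum(axis=1, keepdims=True)

# Eq. (2): H^(l+1) = delta( sum_{j in N_i} (1/c_i) H_j^(l) W^(l) )
H_next = relu((A / c) @ H @ W)
```

With all-ones features, each averaged neighborhood feature is 1 and each output entry is 4 × 0.5 = 2, which makes the normalization easy to verify by hand.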
In formula (2), \({W}^{\left(l\right)}\) is shared by all edges, which implicitly assumes that all edges are of the same type. However, the goal of this study is link prediction of potential DTIs on a heterogeneous network. In the heterogeneous graph G = (N, E), N represents the set of entities and E represents the set of relationships, where drug entities are denoted d (d∈N), target protein entities t (t∈N), and edges of the DTI relationship type \({r}_{dt}=(d,t)\) (\({r}_{dt}\in \mathbf{E}\)). Formula (2) is therefore converted to formula (3) using the R-GCN method. \({C}_{d,{r}_{dt}}\) is a regularization constant: \({C}_{d,{r}_{dt}}=\left|{N}_{d}^{r}\right|\), the number of neighbors of node d connected through edges of type \({r}_{dt}\). \(\delta\) is the nonlinear activation function ReLU(x) = max(0, x).
$${h}_{d}^{(l+1)}=\delta \left({W}_{0}^{(l)}{h}_{d}^{(l)}+\sum _{{r}_{dt}\in E}\sum _{t\in {N}_{d}^{r}}\frac{1}{{C}_{d,{r}_{dt}}}{W}_{t}^{\left(l\right)}{h}_{t}^{\left(l\right)}\right) \left(3\right)$$
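A minimal NumPy sketch of the relation-aware update in Eq. (3) for a single drug node; the relation set, neighbor lists, and dimensions are all toy assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

dim = 3
rng = np.random.default_rng(1)

h_d = rng.standard_normal(dim)  # drug node feature h_d^(l)

# Per-relation neighbor features and per-relation weight matrices
# (hypothetical relations and counts).
neighbours = {
    "Drug-target-interaction": [rng.standard_normal(dim) for _ in range(2)],
    "Drug-drug-interaction":   [rng.standard_normal(dim)],
}
W_rel = {r: rng.standard_normal((dim, dim)) for r in neighbours}
W_0 = rng.standard_normal((dim, dim))  # self-loop weight W_0^(l)

# Eq. (3): self term plus relation-wise normalized neighbor aggregation,
# with C_{d,r} = |N_d^r|, the number of r-neighbors of d.
agg = W_0 @ h_d
for r, hs in neighbours.items():
    C = len(hs)
    for h_t in hs:
        agg += (W_rel[r] @ h_t) / C
h_d_next = relu(agg)
```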
Link Prediction and Negative Sampling
The basic idea of link prediction is to calculate the likelihood score \({y}_{d,t}\) of a link between nodes d and t from their representation vectors \({h}_{d}^{\left(l\right)}\) and \({h}_{t}^{\left(l\right)}\), each obtained from formula (3); \({y}_{d,t}\) is calculated as in formula (4). After an MLP linear layer, a sigmoid produces the final score.
$${y}_{d,t}=\text{sigmoid}\left(W\ast \left({h}_{d}^{\left(l\right)}||{h}_{t}^{\left(l\right)}\right)\right) \left(4\right)$$
$$Loss={\sum }_{{t}_{i}\sim {P}_{n}\left(t\right),i=1,2,...,k}{\mathrm{max}\left(0,1-{y}_{d,t}+{y}_{d,{t}_{i}}\right)}^{2} \left(5\right)$$
Training compares the score difference between two connected nodes and between arbitrary pairs of nodes. DTI prediction usually serves as a binary classification problem, with known drug-target protein interactions as positive examples and unknown interactions as negative examples. Table 1 shows the generalized confusion matrix used to evaluate the performance of classification models. For a given edge (d, t), the model's score for that edge should be higher than its score for pairs formed by d and nodes \({t}_{i}\) sampled from a noise distribution \({t}_{i}\sim {P}_{n}\left(t\right)\), i.e., \({y}_{d,t}>{y}_{d,{t}_{i}}\); this requires negative sampling. The loss function uses the margin loss in formula (5), where k is the number of negative samples drawn per positive edge.
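The scoring and margin loss of Eqs. (4)-(5) can be sketched as follows; the embedding size, the uniform random noise distribution standing in for Pn(t), and k are all assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
dim = 4

def score(h_d, h_t, W):
    """Eq. (4): sigmoid(W . (h_d || h_t))."""
    return sigmoid(W @ np.concatenate([h_d, h_t]))

# One positive edge (d, t) and k negative targets t_i ~ Pn(t)
# (here simply drawn at random -- an assumption, not the paper's sampler).
W = rng.standard_normal(2 * dim)
h_d, h_t = rng.standard_normal(dim), rng.standard_normal(dim)
k = 5
negatives = [rng.standard_normal(dim) for _ in range(k)]

# Eq. (5): squared margin loss pushing y_{d,t} above each y_{d,t_i}.
y_pos = score(h_d, h_t, W)
loss = sum(max(0.0, 1.0 - y_pos + score(h_d, h_n, W)) ** 2
           for h_n in negatives)
```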
Table 1
The confusion matrix for performance evaluation.

|                   | Positive (Predicted) | Negative (Predicted) |
| ----------------- | -------------------- | -------------------- |
| Positive (Actual) | True Positive (TP)   | False Negative (FN)  |
| Negative (Actual) | False Positive (FP)  | True Negative (TN)   |
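The four cells of Table 1 can be computed from binary labels and predictions with a short helper (a generic sketch, not the paper's evaluation code):

```python
def confusion_matrix(y_true, y_pred):
    """Count TP / FN / FP / TN for binary labels (1 = interaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn
```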