Prediction of Protein-Protein Interactions Based on Weighted Extreme Learning Machine and Speeded Up Robust Features

Background: Protein-protein interactions (PPIs) are involved in a number of cellular processes and play a key role inside cells. The prediction of PPIs is an important task toward understanding many bioinformatics functions and applications, such as predicting protein functions, gene-disease associations and disease-drug associations. Given that high-throughput methods are expensive and time-consuming, developing efficient and accurate computational methods for predicting PPIs is a challenging task.


Background
As one of the most fundamental elements of living organisms, proteins contribute to nearly all fundamental biological processes in the cell. A large number of studies have shown that protein-protein interactions (PPIs) play key roles in understanding the functional properties of proteins and their potential as biomarkers. Growing evidence indicates that knowledge of PPIs helps in understanding the molecular mechanisms involved in biological activity, the regulation of protein function, and the mechanisms underlying cellular and genetic diseases. Although many high-throughput methods, including the yeast two-hybrid system, protein chips [1][2][3][4], and immunoprecipitation [5,6], are typically used to identify PPIs, these experimental methods are expensive and time-consuming and suffer from high rates of false positives and false negatives [7][8][9][10]. Thus, an increasing number of studies have focused on computational approaches to identifying PPIs [11][12][13].
A number of existing methods are limited by their computational cost and their dependence on large numbers of homologous proteins [14][15][16]. It is therefore very important to identify PPIs with efficient computational approaches based only on protein sequence information [17][18][19].
Over the years, a large number of studies have sought to predict PPIs with highly effective computational methods. You et al. [20] proposed a new Multi-scale Local Descriptor (MLD) feature extraction method based on protein sequences and used a Random Forest (RF) to carry out classification; MLD captures multi-scale local information, and RF is an ensemble learning approach. Huang et al. [21] proposed a computational method called WSRC-GE that combines weighted sparse representation classification (WSRC) with global encoding (GE) for PPIs prediction. Wang et al. [22] presented a method combining Discrete Cosine Transform (DCT) feature extraction with an ensemble Rotation Forest classifier for PPIs prediction. An et al. [23] proposed MKRVM-GWO, a multi-kernel Relevance Vector Machine classifier tuned with Grey Wolf Optimization; to capture protein interaction information, the method takes full account of the local and global characteristics of protein-protein interaction positions and achieves good experimental results. Zhang et al. [24] proposed a prediction model combining Random Tree with a Genetic Algorithm to predict PPIs from protein sequences, obtaining good prediction results. Yang et al. [25] used k-nearest neighbors for classification and local descriptors to extract features from protein sequences. Guo et al. [26] presented SVM-AC, which uses auto-correlation to generate feature vectors from protein sequences and an SVM classifier to predict PPIs. An et al. [27] proposed a feature extraction method that captures both the continuous and discontinuous information of protein-protein interactions from the PSSM encoding of local protein sequences; a number of key features can then be integrated by serial multi-feature fusion.
The above methods can explore correlational information between protein pairs, such as coevolution, co-localization and co-expression [28][29][30]. Nevertheless, it remains urgent to develop efficient computational approaches that further improve the accuracy of PPIs prediction.
In this study, a novel computational method named WELM-SURF was developed to predict PPIs. The proposed method uses the Position Specific Scoring Matrix (PSSM) to capture protein evolutionary information and employs Speeded Up Robust Features (SURF) to extract key features from the PSSM of each protein sequence. The Weighted Extreme Learning Machine (WELM) features short training time and a strong ability to classify efficiently by optimizing the loss function of the weight matrix; it was therefore used to carry out classification. The cross-validation results show that WELM-SURF obtains average accuracies of 97.36% and 95.12% on the yeast and human datasets, respectively. The prediction ability of WELM-SURF was also compared with those of ELM-SURF, SVM-SURF and other existing approaches, and the comparison further verifies that WELM-SURF is clearly better than the other methods.

Datasets
In this study, yeast and human datasets were used to evaluate the proposed method. Both datasets were obtained from the publicly available Database of Interacting Proteins (DIP) [31]. The protein sequences in the yeast and human datasets were cleaned before applying the proposed method, as follows: (1) protein sequences shorter than 50 residues were removed; (2) to eliminate bias from homologous protein sequences, protein pairs with sequence identity of 40% or greater were considered homologous and also removed. After cleaning, the yeast dataset contained 5594 positive and 5594 negative protein pairs (11188 pairs in total), and the human dataset contained 3899 positive and 4262 negative protein pairs (8161 pairs in total).
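The two cleaning rules can be sketched as follows. The `pairwise_identity` helper is a deliberately crude stand-in of our own (a real pipeline would measure sequence identity with an alignment tool such as CD-HIT or BLAST); only the length and identity thresholds come from the text above.

```python
# Sketch of the dataset-cleaning step: drop sequences < 50 residues,
# then greedily drop any sequence >= 40% identical to one already kept.
# pairwise_identity is a crude illustrative proxy, not a real aligner.

def pairwise_identity(a: str, b: str) -> float:
    """Fraction of matching residues over the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def clean_sequences(sequences, min_len=50, max_identity=0.40):
    # Rule 1: remove sequences shorter than min_len residues.
    kept = [s for s in sequences if len(s) >= min_len]
    # Rule 2: remove sequences too similar to an already-kept one.
    result = []
    for s in kept:
        if all(pairwise_identity(s, r) < max_identity for r in result):
            result.append(s)
    return result
```

A real redundancy filter clusters all-against-all rather than greedily, but the thresholds play the same role.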

Position Specific Scoring Matrix (PSSM)
In this paper, we used the Position Specific Scoring Matrix (PSSM) to extract the evolutionary information contained in each protein sequence. This is because a PSSM captures not only the evolutionary information of a protein sequence but also its positional information. In the experiment, the Position Specific Iterated BLAST (PSI-BLAST) tool [32] was used to transform each protein sequence into a PSSM. Figure 1 shows the schematic of a PSSM: an L × 20 matrix, where 20 is the number of standard amino acids, L is the length of the given sequence, and the entry S(i, j) is the score of the j-th amino acid at the i-th position of the query sequence. A score S(i, j) > 0 indicates that the residue at position i easily mutates into the j-th amino acid during evolution, and a larger value indicates a higher mutation probability. Conversely, S(i, j) < 0 indicates that the position is conserved and the probability of mutation is small; the smaller the score, the more conserved the position. To obtain highly and widely homologous sequences, the e-value parameter of PSI-BLAST was set to 0.001 and three iterations were performed.
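For reference, a PSI-BLAST run matching the parameters above could look like the following command-line fragment. The database name and file paths are placeholders of our own, and the exact flags depend on the BLAST+ version installed; this is an illustrative sketch, not the authors' exact invocation.

```shell
# Illustrative BLAST+ psiblast invocation: 3 iterations,
# inclusion e-value 0.001, ASCII PSSM written to a file.
psiblast -query protein.fasta -db swissprot \
         -num_iterations 3 -inclusion_ethresh 0.001 \
         -out_ascii_pssm protein.pssm
```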

Speeded Up Robust Features (SURF)
Speeded Up Robust Features (SURF) is an improvement of the Scale Invariant Feature Transform (SIFT) and runs faster than the SIFT algorithm. In SIFT, Lowe uses the Difference of Gaussians to approximate the Laplacian of Gaussian (LoG) when building the scale space; in contrast, SURF approximates the LoG with a box filter. A major advantage of this approximation is that the convolution with the box filter is easy to compute using an integral image and can be done in parallel at different scales. The SURF algorithm relies on the determinant and position of the Hessian matrix and consists of two steps: feature point detection and feature point description.

1) Feature Point Detection
The SIFT algorithm processes the image with continuous Gaussian filters of different scales and detects scale-invariant feature points through the Difference of Gaussians. SURF instead replaces the Gaussian filter with a square (box) filter that approximates the Gaussian. The box filter greatly improves computation speed through the use of an integral image, since the sum over any box requires only the values at its four corners. The determinant of the Hessian matrix measures the local change around a pixel, so SURF performs blob detection with the Hessian and identifies a feature point where the determinant attains a local maximum or minimum. In addition, to achieve scale invariance, SURF evaluates the determinant at scale σ. Given a point p = (x, y) in the image, the Hessian matrix at scale σ can be represented as

H(p, σ) = | L_xx(p, σ)  L_xy(p, σ) |
          | L_xy(p, σ)  L_yy(p, σ) |

where L_xx(p, σ), L_xy(p, σ) and L_yy(p, σ) are the second-order derivatives of the Gaussian-smoothed image at p. The scale space of SURF is not built by successive Gaussian blurring and downsampling; instead, it is determined by the size of the box filters. The lowest (initial) scale uses a 9 × 9 box filter, which approximates a Gaussian filter with σ = 1.2. Filters at higher scales grow larger and larger: 15 × 15, 21 × 21, 27 × 27, and so on.
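The integral-image trick that makes the box filter cheap can be sketched in a few lines of NumPy (the function names are ours): once the integral image is built, the sum over any axis-aligned box costs four lookups, regardless of whether the filter is 9 × 9 or 27 × 27.

```python
import numpy as np

# S[i, j] holds the sum of img[:i, :j]; a zero row/column is padded
# on so that box sums need no boundary special-casing.

def integral_image(img):
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def box_sum(S, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] via four integral-image lookups."""
    return S[r1, c1] - S[r0, c1] - S[r1, c0] + S[r0, c0]
```

This constant cost per box is why SURF can evaluate its Hessian approximation at many filter sizes in parallel.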
The scale corresponding to a filter of size n × n is σ ≈ 1.2 × n / 9. The SURF feature descriptor uses the concept of the Haar wavelet transform. To keep the descriptor rotation invariant, each feature point is first assigned a direction: Haar wavelet responses along the x- and y-directions are computed in a circular neighbourhood around the feature point, and each response is weighted by a Gaussian centred on the point. The x and y components of the responses inside a sliding 60° sector are summed to obtain a vector, and the longest such vector defines the direction of the feature point. Once the direction is set, the descriptor is built from the surrounding pixels relative to this direction. A 20 × 20 region around the feature point is divided into 16 sub-regions of 5 × 5 pixels each, and the Haar responses in the x and y directions are computed inside every sub-region. Each sub-region contributes four values (the sums of dx, |dx|, dy and |dy|), so the 16 sub-regions finally yield a feature vector of 64 dimensions.
In this paper, we treat each PSSM as an image matrix. As a result, the SURF feature extraction method is used to generate a 64-dimensional feature vector for each protein. The technology roadmap of the proposed method is shown in Figure 2.

Weighted Extreme Learning Machine (WELM)
Not all datasets have evenly distributed classes, so efficiently classifying imbalanced samples is a challenging task. To address this problem, Zong et al. [33] proposed the Weighted Extreme Learning Machine (WELM), which extends the Extreme Learning Machine (ELM) to imbalanced data classification. For the classification of PPIs datasets, we likewise build a WELM model based on ELM to predict PPIs. The network structure of ELM is shown in Figure 3.
The output function of ELM can be written as f(x) = h(x)β, where h(x) is the hidden-layer output vector for input x and β is the output weight matrix. WELM has two weighting strategies [34]. The first is automatic weighting, W_ii = 1 / Count(t_i), where Count(t_i) is the number of training samples in the class t_i of sample i. The second sacrifices some classification accuracy on the majority class to improve accuracy on the minority class by rebalancing the two classes toward the golden ratio 0.618 : 1; it keeps W_ii = 1 / Count(t_i) for minority-class samples and uses W_ii = 0.618 / Count(t_i) for majority-class samples. The output weight of the WELM hidden layer can then be computed as β = (I/C + Hᵀ W H)⁻¹ Hᵀ W T, where H is the hidden-layer output matrix, T is the target matrix, C is the regularization parameter, and the weighting matrix W is an N × N diagonal matrix whose diagonal elements correspond to the N training samples. Different weights are assigned to different sample classes, and samples of the same class share the same weight.
The algorithm retains the simplicity of ELM's mapping or kernel functions and can be directly applied to multi-class classification problems. Based on the idea of cost-sensitive learning, WELM assigns a weight to each training sample, and the N weights form an N × N diagonal matrix. In general, a training sample from a minority class receives a relatively large weight, whereas one from a majority class receives a relatively small weight. Through this weighting, the influence of minority classes on the classification result is enhanced and that of majority classes is correspondingly weakened. The advantage of WELM is that it introduces the weighting matrix W on top of the extreme learning machine, setting a different weight for each sample to be classified so that each sample obtains a corresponding hidden-layer output weight. Compared with BP neural networks and support vector machines, WELM offers fast training, simple parameter settings, and strong generalization ability while maintaining classification performance.
WELM thus has the advantages of short training time and good generalization ability and can efficiently classify imbalanced samples by optimizing the loss function of the weight matrix. Since PPIs datasets can be class-imbalanced, we used the WELM model with the automatic weighting strategy to predict PPIs in this study. The prediction flowchart of the WELM-SURF model is displayed in Figure 4.

Performance Evaluation
In this study, the following measures were employed to assess the performance of WELM-SURF: accuracy (Acc), sensitivity (Sn, also reported as TPR), specificity (Sp), precision (PPV), and the Matthews correlation coefficient (MCC). They are defined as Acc = (TP + TN) / (TP + FP + TN + FN), Sn = TP / (TP + FN), Sp = TN / (TN + FP), PPV = TP / (TP + FP), and MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)), where TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives, respectively. In addition, the Receiver Operating Characteristic (ROC) curve was used to further evaluate the performance of WELM-SURF.
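The measures above can be computed directly from the confusion-matrix counts; the small self-contained function below is a sketch (the function name is ours):

```python
import math

def metrics(tp, fp, tn, fn):
    """Acc, Sn (TPR), Sp, PPV and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn)     # sensitivity / true positive rate
    sp = tn / (tn + fp)     # specificity
    ppv = tp / (tp + fp)    # precision
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sn, sp, ppv, mcc
```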

Performance of the proposed WELM-SURF model
In this study, a prediction model called WELM-SURF was proposed to predict PPIs. It uses WELM to perform classification and SURF to generate highly efficient features. First, the performance of WELM-SURF was evaluated on benchmark datasets. Because overfitting can distort experimental results, the whole datasets were divided into training datasets and independent test datasets. Specifically, the human dataset was randomly split into 5 equal parts: four parts were selected as the training set and the remaining part as the independent test set. The yeast dataset was processed with the same strategy. To evaluate the ability of WELM-SURF to predict PPIs, it was run on the yeast and human datasets under five-fold cross-validation. For a fair comparison, several parameters of WELM were optimized through grid search: the number of hidden-layer nodes was set to 3000, C was set to 200, and default values were adopted for the other parameters. Tables 1-2 show the five-fold cross-validation results of the WELM-SURF model on the yeast and human datasets, respectively.
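The five-fold splitting strategy described above can be sketched with the standard library alone. The shuffling seed and function name are illustrative, since the paper does not specify its exact shuffling procedure.

```python
import random

def five_fold_indices(n, seed=0):
    """Yield (train, test) index lists for 5-fold cross-validation:
    samples are shuffled once, split into 5 parts, and each part
    serves as the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, test
```

Reported averages are then taken over the five test-set scores.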
As can be seen from Table 1, under five-fold cross-validation the proposed WELM-SURF achieves an average accuracy of 97.36%, an average TPR of 96.69%, an average PPV of 97.04% and an average MCC of 92.23%. Another promising finding, from Table 2, is that WELM-SURF also achieves good prediction results on the human dataset, with average accuracy, TPR, PPV, and MCC of 95.12%, 93.80%, 91.64% and 90.97%, respectively. These results demonstrate that the proposed WELM-SURF model is suitable for PPIs prediction.
The good experimental results for PPIs prediction can mainly be attributed to the SURF feature extraction method and the WELM classifier: SURF extracts key evolutionary features from the PSSM, and WELM has strong classification ability on imbalanced samples. More specifically, this is due to the following three reasons: (1) The PSSM contains not only the positional information but also the evolutionary information of a protein sequence, and it retains plenty of prior information, which helps in extracting sequence evolutionary information. (2) SURF uses the concept of "scale space" to capture features at multiple scale levels, which both increases the number of available features and makes the method highly tolerant to scale changes; this makes it possible to capture protein-protein interaction information and extract highly efficient features from the PSSM. (3) WELM has short training time and good generalization ability and can efficiently perform classification by optimizing the loss function of the weight matrix, so it performs much better at identifying PPIs in this study. In particular, WELM makes the class distribution well perceived by assigning larger weights to minority-class samples, pushing the separating boundary from the minority class toward the majority class; in this sense, the weighting provides an advantage for cost-sensitive learning. Overall, the results demonstrate two things: first, the SURF method is well suited to extracting protein sequence features; second, the WELM classifier performs well for predicting PPIs.

Comparison with the ELM-based and SVM-based Methods
Experimental results demonstrate that the WELM-SURF model can accurately and efficiently predict PPIs. To quantify the performance improvement of the WELM-SURF model, WELM was compared with the Extreme Learning Machine (ELM) classifier and the Support Vector Machine (SVM) classifier, using the same SURF features on the yeast and human datasets. To ensure a fair comparison, several ELM parameters were optimized by grid search: the number of hidden-layer nodes was set to 126, with default values for the other parameters. Similarly, the RBF kernel parameters of the SVM were optimized with the same strategy, giving c = 0.3 and g = 5.2, with default values for the other parameters; the SVM classifier was implemented with the LIBSVM tool [35]. Tables 3-6 show the five-fold cross-validation results of ELM-SURF and SVM-SURF on the yeast and human datasets, and Figures 5-6 compare the ROC curves of WELM, ELM and SVM on the two datasets. As outlined in Tables 3-4, the ELM-SURF model achieved 94.04% average accuracy and the SVM-SURF model 91.79% average accuracy on the yeast dataset. Similarly, as can be seen from Tables 5-6, ELM-SURF and SVM-SURF obtained average accuracies of 92.04% and 89.58% on the human dataset, respectively. Comparing these results with those of WELM-SURF shows that the WELM classifier significantly outperforms the other two classifiers, and Figures 5 and 6 show that its ROC curves are also clearly better.
A major reason for the good prediction results is that WELM has short training time and good generalization ability and can efficiently classify imbalanced samples by optimizing the loss function of the weight matrix. Specifically, it makes the class distribution well perceived by assigning larger weights to minority-class samples, pushing the separating boundary from the minority class toward the majority class. From the above analysis, this paper concludes that the proposed WELM-SURF model is a useful tool for PPIs prediction, as well as for other bioinformatics tasks.

Comparison with Other Methods
To further validate the prediction ability of the WELM-SURF model, Tables 7-8 compare WELM-SURF with previous methods on the yeast and human datasets. As can be seen from Table 7, the average accuracy of WELM-SURF is clearly higher than those of the other six approaches on the yeast dataset. Similarly, Table 8 shows that the prediction accuracy obtained by WELM-SURF is also significantly better than those of the other six methods on the human dataset. Together, the results in Tables 7-8 indicate that the proposed WELM-SURF model has excellent prediction capability and can be used for high-quality prediction of PPIs.

Conclusion
In this study, a new computational method called WELM-SURF was put forward for PPIs prediction, which combines the Weighted Extreme Learning Machine (WELM) with Speeded Up Robust Features (SURF) to predict PPIs based on protein evolutionary information. The experimental comparisons show that the performance of WELM-SURF is significantly better than those of ELM, SVM and other previous methods in the domain. The excellent performance of WELM-SURF is mainly attributable to the following factors: (1) The PSSM contains not only the positional information but also the evolutionary information of a protein sequence, and it retains plenty of prior information, which helps in extracting sequence evolutionary information. (2) SURF uses the concept of "scale space" to capture features at multiple scale levels, which both increases the number of available features and makes the method highly tolerant to scale changes; this makes it possible to capture protein-protein interaction information and extract highly efficient features from the PSSM. (3) WELM has short training time and good generalization ability and can efficiently perform classification by optimizing the loss function of the weight matrix; it makes the class distribution well perceived by assigning larger weights to minority-class samples, while pushing the separating boundary from the minority class toward the majority class.