MicroRNAs (miRNAs) are endogenous non-coding RNAs of about 22 nt that regulate the expression of target genes by pairing with complementary sequences in target mRNAs [1]. Because miRNA sequences are very short and are expressed only in specific tissues or cells at specific stages, miRNAs long went largely unnoticed and were often called the dark matter of life [2]. In 1993, Lee et al. [3] identified the first miRNA gene, lin-4, in Caenorhabditis elegans. Since then, numerous studies have shown that miRNAs play an important role in life processes, including cell metabolism, proliferation, apoptosis, and development [4–8]. miRNAs are also involved in the occurrence and development of many human diseases, such as prostatic neoplasms and breast neoplasms [9–11]. Identifying potential miRNA-disease associations is therefore crucial to the research and treatment of human diseases. Traditional experimental methods identify miRNA-disease associations with high accuracy, but they are small in scale, time-consuming, and costly. Hence, computational methods for predicting potential associations have attracted growing attention from researchers.
In the past few years, many computational methods have been developed to predict miRNA-disease associations. For example, Chen et al. [12] developed a model named RBMMMDA, which utilizes the restricted Boltzmann machine to predict multi-type associations between miRNAs and diseases. This method can not only discover new potential associations between miRNAs and diseases but also indicate the corresponding association types. Chen et al. [13] proposed a novel method based on heterogeneous graph inference (HGIMDA). This approach takes advantage of miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity, and known miRNA-disease associations. It overcomes limitations of traditional methods and can be applied to new miRNAs and diseases without any known associations. You et al. [14] proposed PBMDA, which constructs a heterogeneous graph and applies the depth-first search algorithm. Compared with previous models, this method has better reliability and accuracy. Chen et al. [15] proposed a within-score and between-score method named WBSMDA, which can be used for diseases without any known related miRNAs. Wang et al. [16] proposed a logistic model tree method (LMTRDA) that combines miRNA sequence information, miRNA functional similarity, and disease semantic similarity. Li et al. [17] designed a novel method (MCMDA) that predicts potential miRNA-disease associations by updating the adjacency matrix of known associations. Zheng et al. [18] developed a machine learning prediction model that combines Gaussian interaction profile kernel similarity, disease semantic similarity, miRNA functional similarity, and miRNA sequence information. It uses an auto-encoder neural network (AE) for feature extraction and a random forest for training. Zheng et al. [19] developed a novel model based on the distance sequence similarity method (DBMDA). This method uses regional distance to calculate global similarity and is implemented through a chaos game representation algorithm based on miRNA sequences, which provides a new idea for the field of miRNA-disease prediction.
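To make the chaos game representation mentioned above more concrete, the following minimal sketch maps an RNA sequence to a trajectory of points in the unit square. It shows the generic chaos game construction only and is not the DBMDA implementation; the function name `chaos_game_points`, the assignment of bases to corners, and the example sequence are assumptions made for illustration.

```python
def chaos_game_points(seq):
    """Return the chaos game trajectory of an RNA sequence in the unit square.

    Assumed convention (one of several in use): each base owns one corner.
    """
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper().replace("T", "U"):
        cx, cy = corners[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


# Illustrative 22-nt sequence (hypothetical input, not taken from the paper).
points = chaos_game_points("UGAGGUAGUAGGUUGUAUAGUU")
```

Each point encodes the full prefix of the sequence read so far, so sequences with similar subsequences visit nearby regions of the square. A distance between such point sets could then serve as a sequence similarity, which is the general idea behind distance-based approaches of this kind.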
At present, most state-of-the-art algorithms rely only on known miRNA-disease associations to predict potential ones. However, diseases are mainly caused by the disturbance of multiple interacting biomolecules rather than by the abnormality of a single biomolecule. In addition, the functionally dependent molecular components in human cells form a complex biological network, in which proteins are an important part of human tissues and cells. Protein-miRNA associations and protein-disease associations have been confirmed by many previous experiments [20–22]. Therefore, we propose a novel method (NEMPD) that predicts miRNA-disease associations based on the miRNA-protein-disease network and the GraRep network embedding method. More specifically, we first constructed and comprehensively analyzed a tripartite miRNA-protein-disease network by integrating the miRNA-protein and protein-disease associations (see Fig. 1). Second, a network representation method was used to obtain an embedding of each node while preserving the properties of the network. In recent years, network embedding methods such as LINE [23] and DeepWalk [24] have been applied to several bioinformatics problems with good performance. In this article, we chose the GraRep [25] method to learn the associations of miRNAs and diseases with proteins (behavior information). Third, the behavior information of miRNAs and diseases was combined with their own attribute information (disease semantic similarity and miRNA sequence information) to represent the 16427 known miRNA-disease pairs downloaded from the HMDD database [26]. Finally, a Random Forest classifier was trained on the resulting miRNA-disease pair features. The pipeline of NEMPD is shown in Fig. 2. Under five-fold cross-validation, NEMPD achieved an average AUC of 0.9233 and an average AUPR of 0.9301.
Furthermore, we measured the performance of NEMPD with different feature combinations and classifiers. To test NEMPD further, we also conducted case studies of three major human diseases. All the results demonstrate that NEMPD performs well and can be used as a reliable model for miRNA-disease association prediction.
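The network construction and embedding steps outlined above can be sketched on a toy example. The code below is a simplified illustration in the spirit of GraRep (log-transformed k-step transition probabilities factorized by SVD), not the implementation used for NEMPD. The node names, the edge list, the function names `grarep_embedding` and `pair_feature`, and all parameter values are hypothetical.

```python
import numpy as np


def grarep_embedding(adj, dim=2, max_step=2, neg_lambda=1.0):
    """Simplified GraRep-style embedding: one SVD per transition step."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0            # avoid division by zero for isolated nodes
    trans = adj / deg              # one-step transition probabilities
    beta = neg_lambda / n          # shift term from negative sampling
    power = np.eye(n)
    blocks = []
    for _ in range(max_step):
        power = power @ trans      # k-step transition probabilities
        col_sum = power.sum(axis=0, keepdims=True)
        col_sum[col_sum == 0] = 1.0
        with np.errstate(divide="ignore"):
            x = np.log(power / col_sum) - np.log(beta)
        x[x < 0] = 0.0             # keep positive entries only (also clears -inf)
        u, s, _ = np.linalg.svd(x)
        blocks.append(u[:, :dim] * np.sqrt(s[:dim]))
    return np.hstack(blocks)       # concatenate the per-step representations


# Toy tripartite network: miRNA-protein and protein-disease edges only.
mirnas = ["miR-a", "miR-b"]
proteins = ["P1", "P2", "P3"]
diseases = ["D1", "D2"]
edges = [("miR-a", "P1"), ("miR-a", "P2"), ("miR-b", "P2"), ("miR-b", "P3"),
         ("P1", "D1"), ("P2", "D1"), ("P3", "D2")]

nodes = mirnas + proteins + diseases
index = {name: i for i, name in enumerate(nodes)}
adj = np.zeros((len(nodes), len(nodes)))
for a, b in edges:
    adj[index[a], index[b]] = adj[index[b], index[a]] = 1.0

emb = grarep_embedding(adj, dim=2, max_step=2)


def pair_feature(mirna, disease):
    """Behavior feature of a miRNA-disease pair: concatenated node embeddings."""
    return np.concatenate([emb[index[mirna]], emb[index[disease]]])


feature = pair_feature("miR-a", "D1")
```

In a full pipeline, each pair vector would additionally be extended with attribute features (disease semantic similarity and miRNA sequence information) before being passed to a classifier such as a random forest and evaluated with five-fold cross-validation. Those steps are omitted here to keep the sketch short.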