Identifying essential proteins provides valuable information for medicine and related disciplines, especially for the diagnosis and treatment of diseases and for drug design. Various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. Therefore, it is necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks.
In this paper, we propose a model based on non-negative matrix factorization and multiple sources of biological information (NDM) for identifying essential proteins. The first stage of this process was to construct a weighted PPI network by combining protein domain information, protein complex information, and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique was used to reconstruct an optimized PPI network with more complete edge weights. In the final stage, the ranking score of each protein was computed by the PageRank algorithm, in which the initial scores were calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared it with other state-of-the-art essential protein prediction methods. The comparison of the results obtained from the different methods indicated that the NDM model performs better in predicting essential proteins.
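The last two stages described above can be illustrated with a minimal sketch. It is not the NDM implementation: the toy weight matrix `W`, the initial score vector `init`, the factorization rank, the damping factor, and the function names `nmf_reconstruct` and `pagerank_scores` are all assumptions made for illustration. The sketch uses the standard multiplicative-update rule for non-negative matrix factorization and a plain power-iteration PageRank with a personalized restart vector, which may differ from the exact formulation used in the paper.

```python
import numpy as np

def nmf_reconstruct(W, rank=2, iters=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix W by U @ V (multiplicative updates)
    and return the symmetrised product as the reconstructed network."""
    rng = np.random.default_rng(seed)
    n, m = W.shape
    U = rng.random((n, rank)) + eps
    V = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep U and V non-negative.
        V *= (U.T @ W) / (U.T @ U @ V + eps)
        U *= (W @ V.T) / (U @ V @ V.T + eps)
    R = U @ V
    R = (R + R.T) / 2.0        # keep the network undirected
    np.fill_diagonal(R, 0.0)   # drop self-interactions
    return R

def pagerank_scores(R, init, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration s = d * P @ s + (1 - d) * init, with P column-normalised."""
    col_sums = R.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # avoid division by zero for isolated nodes
    P = R / col_sums
    init = init / init.sum()
    s = init.copy()
    for _ in range(max_iter):
        s_new = damping * (P @ s) + (1 - damping) * init
        converged = np.abs(s_new - s).sum() < tol
        s = s_new
        if converged:
            break
    return s

# Hypothetical weighted PPI network over five proteins (symmetric, toy values).
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.0],
    [0.8, 0.0, 0.7, 0.0, 0.0],
    [0.6, 0.7, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.0],
])
# Hypothetical initial scores, standing in for homology and localization evidence.
init = np.array([0.9, 0.2, 0.5, 0.1, 0.3])

R = nmf_reconstruct(W)
scores = pagerank_scores(R, init)
ranking = np.argsort(-scores)   # protein indices, highest score first
print(ranking, scores)
```

Because the low-rank product `U @ V` is dense, the reconstructed matrix `R` assigns small positive weights to protein pairs that had no edge in `W`, which is one way to read the idea of filling in missing interactions.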
Employing the non-negative matrix factorization method and integrating multi-source biological data can effectively improve the quality of the PPI network, which in turn improves the performance of essential protein identification. This also provides a new perspective for other prediction tasks based on protein-protein interaction networks.