Identification of Essential Proteins based on Non-negative Matrix Factorization

doi:10.21203/rs.3.rs-1237007/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Background

Identifying essential proteins provides valuable information for medicine and related disciplines, especially for the diagnosis and treatment of diseases and for drug design. Various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. It is therefore necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks.

Results

In this paper, we propose NDM, a model for identifying essential proteins that is based on non-negative matrix factorization and multiple sources of biological information. In the first stage, a weighted PPI network is constructed by combining protein domain information, protein complex information and the topological characteristics of the original PPI network. Then, non-negative matrix factorization is used to reconstruct an optimized PPI network with more complete edge weights. In the final stage, the ranking score of each protein is computed by the PageRank algorithm, with initial scores calculated from homology and subcellular localization information. To verify the effectiveness of NDM, we compared it with other state-of-the-art essential protein prediction methods. The comparison indicates that NDM performs better in predicting essential proteins.
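
The last two stages described above (matrix factorization, then ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 5-protein network, the prior scores, the factorization rank, the `damping` value, and the function names `nmf_reconstruct` and `pagerank_with_prior` are all assumptions made for illustration, and the factorization uses the standard multiplicative-update rule, which may differ from the objective used in NDM.

```python
import numpy as np


def nmf_reconstruct(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix A by W @ H using
    multiplicative updates (Frobenius loss) and return W @ H."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update keeps W and H non-negative because A, W, H are.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W @ H


def pagerank_with_prior(A, prior, damping=0.85, n_iter=100, tol=1e-10):
    """PageRank-style iteration on a weighted network, restarting
    from a prior score vector instead of a uniform one."""
    prior = prior / prior.sum()
    col_sums = A.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalize the weights; a column with no weight falls back to the prior.
    P = np.where(col_sums > 0, A / safe_sums, prior[:, None])
    s = prior.copy()
    for _ in range(n_iter):
        s_new = damping * (P @ s) + (1.0 - damping) * prior
        converged = np.abs(s_new - s).sum() < tol
        s = s_new
        if converged:
            break
    return s


# Hypothetical weighted PPI network for five proteins (symmetric, zero diagonal).
weighted_ppi = np.array([
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.0],
])
# Hypothetical initial scores, standing in for homology and localization evidence.
initial_scores = np.array([0.5, 0.2, 0.4, 0.1, 0.1])

optimized = nmf_reconstruct(weighted_ppi, rank=2)
optimized = (optimized + optimized.T) / 2.0  # keep the network undirected
np.fill_diagonal(optimized, 0.0)             # drop self-interactions

scores = pagerank_with_prior(optimized, initial_scores)
ranking = np.argsort(-scores)                # protein indices, best first
```

The low-rank product `W @ H` assigns non-zero weights to pairs that had no edge in the input, which is one way a reconstructed network can compensate for missing interactions. The scores sum to 1, so they can be read as a relative ranking only.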

Conclusion

Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which in turn improves the performance of essential protein identification. This also provides a new perspective for other prediction tasks based on protein-protein interaction networks.

Matrix Factorization

Protein-protein interaction

Essential protein

Table 1 - Impact of the parameter λ on the performance of NDM

λ	Top 100	Top 200	Top 300	Top 400	Top 500	Top 600
0	0.78	0.77	0.74	0.72	0.67	0.63
0.1	0.88	0.83	0.76	0.73	0.69	0.64
0.2	0.93	0.86	0.79	0.75	0.7	0.65
0.3	0.93	0.88	0.81	0.75	0.71	0.66
0.4	0.92	0.88	0.82	0.76	0.7	0.67
0.5	0.93	0.87	0.82	0.75	0.7	0.66
0.6	0.9	0.88	0.83	0.76	0.7	0.66
0.7	0.9	0.87	0.81	0.76	0.7	0.66
0.8	0.9	0.85	0.78	0.74	0.69	0.64
0.9	0.86	0.77	0.7	0.69	0.67	0.63
1	0.49	0.52	0.51	0.52	0.5	0.49
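
One simple way to read Table 1 is to summarize each λ row and each Top-k column. The sketch below copies the values exactly as reported; the averaging over the six cutoffs is an assumption chosen here for illustration and is not a selection criterion stated by the authors.

```python
# Table 1 as reported: lambda -> scores for Top 100, 200, ..., 600.
table1 = {
    0.0: [0.78, 0.77, 0.74, 0.72, 0.67, 0.63],
    0.1: [0.88, 0.83, 0.76, 0.73, 0.69, 0.64],
    0.2: [0.93, 0.86, 0.79, 0.75, 0.70, 0.65],
    0.3: [0.93, 0.88, 0.81, 0.75, 0.71, 0.66],
    0.4: [0.92, 0.88, 0.82, 0.76, 0.70, 0.67],
    0.5: [0.93, 0.87, 0.82, 0.75, 0.70, 0.66],
    0.6: [0.90, 0.88, 0.83, 0.76, 0.70, 0.66],
    0.7: [0.90, 0.87, 0.81, 0.76, 0.70, 0.66],
    0.8: [0.90, 0.85, 0.78, 0.74, 0.69, 0.64],
    0.9: [0.86, 0.77, 0.70, 0.69, 0.67, 0.63],
    1.0: [0.49, 0.52, 0.51, 0.52, 0.50, 0.49],
}
cutoffs = [100, 200, 300, 400, 500, 600]

# Mean score per lambda across the six cutoffs (illustrative summary only).
mean_score = {lam: sum(row) / len(row) for lam, row in table1.items()}
best_lambda = max(mean_score, key=mean_score.get)

# Highest reported score at each cutoff, over all lambda values.
best_per_cutoff = {
    k: max(row[i] for row in table1.values()) for i, k in enumerate(cutoffs)
}

print(best_lambda, best_per_cutoff)
```

Under this summary the mid-range values of λ (roughly 0.3 to 0.6) score close to one another, while both extremes score lower, with λ = 1 lowest at every cutoff.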

No competing interests reported.
