Background: Several methods for identifying essential proteins achieve better results by incorporating biological information, and gene expression data is commonly used for this purpose. However, gene expression data is prone to fluctuations, which may reduce the accuracy of essential protein identification. We therefore propose an identification method that uses gene expression data together with protein-protein interaction (PPI) network data to calculate the similarity of the "active" and "inactive" gene expression states within a cluster of the PPI network. Our experiments show that the method improves the accuracy of essential protein prediction.
Results: We propose a new centrality measure, JDC, based on PPI network data and gene expression data. JDC uses a dynamic threshold to binarize gene expression data, then combines degree centrality with the Jaccard similarity index to compute a JDC score for each protein in the PPI network. We benchmark JDC on four organisms and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that JDC outperforms DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with NF-PIN and TS-PIN, two methods that predict essential proteins through active PPI networks constructed from dynamic gene expression.
Conclusions: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods given the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" gene expression states within a cluster of the PPI network.
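
The steps named in the Results (binarize expression, then combine degree centrality with Jaccard similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the threshold rule or the scoring formula, so the sketch assumes a per-gene mean as the "dynamic" threshold and assumes the score of a protein is the sum of Jaccard similarities with its PPI neighbors. The function names `binarize`, `jaccard`, and `jdc_like_scores`, and the toy data, are hypothetical.

```python
from statistics import mean


def binarize(profiles):
    """Map each expression profile to 0/1 states.

    ASSUMPTION: the threshold is the gene's own mean expression;
    the abstract only says the threshold is "dynamic".
    """
    binary = {}
    for protein, values in profiles.items():
        threshold = mean(values)
        binary[protein] = [1 if v > threshold else 0 for v in values]
    return binary


def jaccard(a, b):
    """Jaccard index of two binary vectors: |both active| / |either active|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def jdc_like_scores(edges, binary):
    """Sum Jaccard similarity over each protein's PPI neighbors.

    ASSUMPTION: summing over neighbors is one plausible way to combine
    degree centrality with Jaccard similarity; the paper's formula may differ.
    """
    scores = {protein: 0.0 for protein in binary}
    for u, v in edges:
        if u in binary and v in binary:
            similarity = jaccard(binary[u], binary[v])
            scores[u] += similarity
            scores[v] += similarity
    return scores


# Toy data (hypothetical): three proteins, four time points.
profiles = {
    "A": [1.0, 5.0, 6.0, 1.0],
    "B": [2.0, 8.0, 9.0, 1.0],
    "C": [7.0, 1.0, 1.0, 7.0],
}
edges = [("A", "B"), ("A", "C")]

binary = binarize(profiles)
scores = jdc_like_scores(edges, binary)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy network, A and B are active at the same time points and so reinforce each other, while C is active only when A is not and contributes nothing to A's score, even though it adds to A's degree. That is the sense in which a co-activity weight can separate proteins that plain degree centrality would rank the same.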

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7
Posted 14 Dec, 2020