Knowledge of protein structure assignment enriches the structural and functional understanding of proteins, and accurate, reliable assignment data is crucial for secondary structure prediction systems. Since the 1980s, methods based on hydrogen bond analysis and atomic coordinate geometry, followed by machine learning, have been employed for protein structure assignment. However, assignment becomes challenging when atoms are missing from protein files. We develop DLFSA, a multi-class classifier that assigns protein Secondary Structure Elements (SSE) using Convolutional Neural Networks (CNN). A fast and efficient GPU-based parallel procedure extracts fragments from protein files. The model is trained on a subset of protein fragments and achieves 88.1% training accuracy and 82.5% test accuracy. It uses only Cα coordinates for secondary structure assignment. The model has also been tested successfully on a few full-length proteins. Results from the fragment-based studies demonstrate the feasibility of applying deep learning to structure assignment problems.
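To make the fragment-based, Cα-only idea concrete, the following is a minimal sketch of how fixed-length fragments might be cut from a PDB-format file and turned into a numeric input for a classifier. It is not the DLFSA implementation: the function names, the window length of 5 residues, and the use of a pairwise Cα distance matrix as the fragment representation are all assumptions made for illustration, and the sketch runs serially on the CPU rather than in parallel on a GPU.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def parse_ca_coords(pdb_lines: Iterable[str]) -> List[Coord]:
    """Collect C-alpha coordinates from ATOM records of PDB-format text.

    Uses the fixed columns of the PDB format: atom name in columns 13-16,
    x, y, z in columns 31-38, 39-46 and 47-54. All other atoms are ignored,
    so missing side-chain or backbone atoms do not matter here.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def extract_fragments(coords: Sequence[Coord], window: int = 5) -> List[Sequence[Coord]]:
    """Slide a window over the chain; window=5 is an assumed, hypothetical length."""
    return [coords[i:i + window] for i in range(len(coords) - window + 1)]


def distance_matrix(fragment: Sequence[Coord]) -> List[List[float]]:
    """Pairwise C-alpha distances: one assumed way to encode a fragment
    as a small 2D array that a CNN could consume."""
    return [[math.dist(a, b) for b in fragment] for a in fragment]
```

Each fragment's matrix would then be labelled with a secondary structure class and passed to the classifier; a distance matrix is chosen here only because it is invariant to rotation and translation of the protein.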

Figures 1 to 10
The full text of this article is available to read as a PDF.
Posted 09 Mar, 2021
Received 03 Mar, 2021
Invitations sent on 02 Mar, 2021
On 21 Jan, 2021