Model optimization
In this study, the output of the RNN is connected directly to the output layer. We experimented with various numbers of fully connected hidden layers in between and observed no noticeable improvement in accuracy, only a worsened overfitting problem. For the LSTM cell in the RNN, we tried various numbers of units and found that 32 units produced the best accuracy with the minimum number of weights.
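As a rough check on model size, the weight count of a 32-unit LSTM feeding a 3-state output layer can be computed directly. The 20-dimensional one-hot residue encoding below is an assumption for illustration, not taken from the text:

```python
def lstm_param_count(n_input, n_units):
    """Weights in one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a bias vector."""
    return 4 * (n_units * n_input + n_units * n_units + n_units)

def dense_param_count(n_input, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_units * n_input + n_units

# Hypothetical encoding: each residue as a 20-dimensional one-hot vector.
lstm = lstm_param_count(20, 32)    # 4 * (32*20 + 32*32 + 32) = 6784
output = dense_param_count(32, 3)  # 32*3 + 3 = 99
print(lstm, output, lstm + output)
```

Under this assumed encoding, the whole model stays below 7000 weights, which is consistent with choosing the unit count that minimizes the number of weights.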
Multi-class cross-entropy loss is used as the cost function. For the Adam optimizer, a learning rate of 0.001 produced satisfactory results. Training was terminated at 20 epochs because of noticeable overfitting beyond that point, and it took about 18 minutes on the Linux workstation we used.
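The multi-class cross-entropy cost can be written out explicitly; a minimal NumPy sketch, with made-up logits for a 3-state prediction, is:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Summed multi-class cross-entropy over a batch.
    labels: integer class indices (0=H, 1=E, 2=C is an assumed ordering)."""
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).sum()

# Toy batch: 2 windows, 3 secondary-structure classes.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])
labels = np.array([0, 2])
print(cross_entropy(logits, labels))
```

Summing rather than averaging over samples matches the magnitude of the "Optimized cost" values in Table 1, but whether the paper sums or averages is not stated here.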
Here we plot the cost and accuracy versus epoch for a window size of 19 as an example. In Fig. 2 (right), the cost dropped significantly during the first 3 epochs, followed by a gradual decrease. Correspondingly, the accuracy increased dramatically during the first 3 epochs and then changed gradually. After 8 epochs, the accuracy on the training data set kept increasing, but that on the validation data set started to degrade, apparently due to overfitting. The optimal accuracy on the validation data set is 0.836 at the 8th epoch (Fig. 2 (left)). The models with the best accuracy on the validation data set were saved and used to benchmark the test data set.
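Keeping the checkpoint with the highest validation accuracy amounts to a simple argmax over the validation curve. The accuracy values below are illustrative, shaped to mimic the curve in Fig. 2, not the actual training log:

```python
def best_epoch(val_accuracy):
    """1-based epoch with the highest validation accuracy,
    i.e. the checkpoint that would be kept for benchmarking."""
    best = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best + 1, val_accuracy[best]

# Hypothetical validation curve: peaks at epoch 8, then degrades (overfitting).
val_acc = [0.700, 0.760, 0.800, 0.815, 0.824, 0.830, 0.834, 0.836,
           0.835, 0.833, 0.831, 0.830]
print(best_epoch(val_acc))  # (8, 0.836)
```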
The final minimized cost and all benchmark metrics for the training, validation, and test data sets are given in Table 1.
Table 1
Summary of benchmarks for Q3 prediction models.
Window size | 7 | 9 | 11 | 13 | 15 | 17 | 19 | 21 |
Optimal epochs | 17 | 11 | 18 | 20 | 8 | 7 | 8 | 13 |
Optimized cost | 3526.74 | 3146.50 | 2800.08 | 2561.26 | 2499.68 | 2411.42 | 2297.41 | 2149.44 |
Training data set accuracy | 0.781 | 0.808 | 0.826 | 0.845 | 0.840 | 0.846 | 0.848 | 0.859 |
Validation data set accuracy | 0.776 | 0.798 | 0.814 | 0.824 | 0.831 | 0.834 | 0.836 | 0.838 |
Test data set accuracy | 0.774 | 0.800 | 0.815 | 0.827 | 0.834 | 0.837 | 0.845 | 0.843 |
As shown in Table 1, LocalNet generally performs better as the window size increases. For the validation data set, the best accuracy is reached at a window size of 21; for the test data set, the best window size is 19. Extending the window size further yielded no significant improvement in prediction accuracy and even degraded performance on the validation data set. This implies that protein secondary structures are mainly determined by local sequences; long-range interactions, as claimed in several studies, do not seem necessary to achieve good prediction accuracy.
Performance on three states of helix, strand and coil
We measured the performance of the optimal model for a window size of 19 residues on the CASP11 [31], CASP12 [32], and CASP13 [33] data sets, which contain 105, 96, and 125 domain sequences, respectively. The performance of LocalNet is comparable across these four data sets (Fig. 3). Taking CASP11 as an example, the prediction accuracies of Q3, H, E, and C are 85.0%, 92.6%, 82.2%, and 60.5%, respectively. Moreover, the prediction accuracy of H is higher than 90% for CASP11, CASP12, CASP13, and Culled PDB.
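The per-state accuracies quoted above are per-class recalls: the fraction of residues of a given state that are predicted correctly. A minimal sketch on a toy 8-residue example (H/E/C labels assumed) is:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes=("H", "E", "C")):
    """Q3 plus per-state accuracy (recall): correctly predicted
    residues of a state divided by all residues of that state."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = {"Q3": float((y_true == y_pred).mean())}
    for c in classes:
        mask = y_true == c
        scores[c] = float((y_pred[mask] == c).mean())
    return scores

# Toy example: 8 residues, 2 misclassified.
true = ["H", "H", "H", "E", "E", "C", "C", "C"]
pred = ["H", "H", "E", "E", "E", "C", "H", "C"]
print(per_class_accuracy(true, pred))
```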
Comparison with recent predictors
We compared the performance of LocalNet and other state-of-the-art models on three independent data sets: CASP11, CASP12, and CASP13. All protein targets (template-based and free-modeling) were used to evaluate LocalNet, and the results are listed in Table 2. For the CASP11 and CASP13 data sets, LocalNet’s accuracy (85.0%) is comparable to those of DCRNN, MUFOLD-SS, and Ensemble of Contextnet. For CASP12, LocalNet performs worse than these three top performers with a Q3 accuracy of 80.5%, but it is still better than SPIDER3, RaptorX, and DeepProf.
DCRNN combines deep convolutional and recurrent neural networks, with multiscale CNNs and three layers of BGRUs, and is much more complicated than LocalNet. Its input includes the protein amino acid sequence, long-range contacts, sequence patterns, and other amino acid profiles. Yet DCRNN’s performance on the three CASP data sets is only marginally better than that of LocalNet, which uses a single RNN module and local amino acid sequences only.
In terms of input, both SPIDER3-single and our model are based on amino acid sequences only. The LSTM-BRNN structure of SPIDER3-single is similar to that of SPIDER3, but its accuracy is significantly lower. The authors attribute the accuracy of 72.5% to using the whole protein sequence as input and capturing long-range interactions between residues. LocalNet, with a much simpler structure than LSTM-BRNN, achieved better accuracy, and the sliding-window strategy may account for the improvement. By using a short window of the amino acid sequence instead of the entire protein sequence as input, we are able to generate a much larger number of samples to train LocalNet. A sufficient sample size is particularly crucial for deep learning models to extract the functional relationship between variables.
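The sliding-window sample generation described above can be sketched as follows: one window centered on each residue, so a protein of length L yields L training samples. The padding character and boundary handling below are our assumptions, as the text does not specify them:

```python
def windows(sequence, size=19, pad="X"):
    """Break a protein sequence into centered sliding windows,
    one per residue. Ends are padded with a placeholder residue."""
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]

frags = windows("MKTAYIAKQR", size=19)
print(len(frags))   # one window per residue: 10
print(frags[0])     # first window, left-padded
```

Each window is a self-contained sample, which is how a single protein contributes as many samples as it has residues.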
For each feature and each data set, the best three scores are marked in bold. Models that implement RNN or LSTM algorithms are marked in italics. Empty cells represent predictions that were not reported. The Q3 accuracies are taken from the respective papers [22, 34–36].
Table 2
Comparison of PSSP models of Q3 accuracy on CASP11-13.
Method/Algorithm | CASP11 | CASP12 | CASP13 | Year |
PSIPRED[18] | 80.7% | 79.2% | 80.7% | 1999 |
JPRED4[34] | 80.4% | 78.5% | -- | 2015 |
DCRNN[35] | 85.3% | -- | -- | 2016 |
SPIDER3[19] | 81.5% | 79.8% | 81.7% | 2017 |
MUFOLD-SS[20] | 85.2% | 83.4% | 79.6% | 2018 |
CRRNN[36] | 84.2% | 82.6% | -- | 2018 |
Porter5[21] | -- | -- | 82.9% | 2018 |
RaptorX[37] | 81.0% | 78.6% | 81.1% | 2018 |
DeepCNF[23] | 84.7% | 82.1% | 80.2% | 2019 |
Ensemble of Contextnet[24] | -- | 82.7% | 84.9% | 2019 |
NetSurfP-2.0 (hhblits)[38] | -- | 82.4% | -- | 2019 |
DeepProf[39] | -- | 76.4% | -- | 2019 |
Bi-LSTM ensemble[40] | 84.3% | -- | -- | 2019 |
DNSS2[41] | -- | -- | 82.2% | 2019 |
2DCNN-BLAST[26] | 81.5% | -- | -- | 2020 |
LocalNet | 85.0% | 80.5% | 85.0% | 2020 |
Time performance
Processing a single protein took 0.027 s on average (0.003 s–1.2 s) with NetSurfP-2.0 [37]. For DeepSeqVec [36], the average running time for a single protein was 0.08 s, with a minimum of 0.006 s for the batch containing the shortest sequences (67 residues on average) and a maximum of 14.5 s (9860 residues on average). The only preprocessing LocalNet needs is to break protein sequences into continuous fragments, which took less than a millisecond even for proteins of 9860 residues.