3.1 Predicted Secondary Structure and Predicted Solvent Accessibility
I-TASSER first identifies structural templates from the PDB using the multiple-threading approach LOMETS and then builds full-length atomic models through iterative, template-based fragment assembly simulations. Functional insights into the target are then obtained by re-threading the three-dimensional models through the BioLiP protein function database.
Figure 1 shows the first part of the predicted secondary structure of the SARS-CoV-2 spike glycoprotein tested in Jordan, with each residue assigned to helix (H), strand (S), or coil (C), together with the predicted solvent accessibility on a scale from zero (least accessible) to nine (most accessible).
3.2 Predicted Normalized B-factor
Figure 2 shows the predicted normalized B-factor. The B-factor indicates the extent of the inherent thermal mobility of residues or atoms in a protein. In I-TASSER, this value is deduced from the threading template proteins in the PDB in combination with sequence profiles derived from sequence databases. The profile in Figure 2 corresponds to the normalized B-factor of the target protein, defined as B = (B' - u)/s, where B' is the raw B-factor and u and s are the mean and standard deviation of the raw B-factors along the sequence.
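The normalization B = (B' - u)/s can be illustrated with a minimal Python sketch, taking u and s as the mean and standard deviation of the raw values. The function name and the raw values are hypothetical, and the use of the population standard deviation is an assumption; the text does not state which estimator I-TASSER applies.

```python
from statistics import mean, pstdev


def normalize_b_factors(raw):
    """Return (B' - u) / s for every raw B-factor B' in `raw`."""
    u = mean(raw)
    s = pstdev(raw)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("raw B-factors are constant; normalization is undefined")
    return [(b - u) / s for b in raw]


# Hypothetical raw B-factors for five residues (not taken from Figure 2).
raw_b = [10.0, 20.0, 30.0, 40.0, 50.0]
norm_b = normalize_b_factors(raw_b)
```

By construction the normalized profile has zero mean and unit standard deviation, so values below zero mark residues predicted to be less mobile than the chain average.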
3.3 Top Ten Threading Templates Used by I-TASSER
I-TASSER modeling starts from the structure templates that LOMETS identifies in the PDB library. LOMETS is a meta-server threading approach that combines multiple threading programs, each of which can generate tens of thousands of template alignments. I-TASSER uses only the templates of highest significance in the threading alignments, where significance is measured by the Z-score, i.e., the difference between the raw and the average scores in units of standard deviation. The templates in Figure 3 are the ten best templates selected from the LOMETS threading programs. Typically, the template with the highest Z-score is chosen from each threading program, and the threading programs are sorted by their average performance in large-scale benchmark tests.
In Figure 3, residues are shown in black by default; template residues that are identical to the residue at the same position in the query sequence are highlighted in color. The coloring scheme is based on amino acid properties: polar residues are brightly colored, while non-polar residues are shown in dark shades. Rank lists the top ten threading templates used by I-TASSER. Ident1 is the percentage sequence identity between the template and the query sequence within the threading-aligned region. Ident2 is the percentage sequence identity between the whole template chain and the query sequence. Cov is the coverage of the threading alignment and is equal to the number of aligned residues divided by the length of the query protein. Norm. Z-score is the normalized Z-score of the threading alignment; a normalized Z-score > 1 indicates a good alignment, and a value below 1 indicates a poor one. The top 10 alignments reported above (in order of their ranking) are from the following threading programs:
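The Z-score described above can be sketched as follows. This is an illustration only and not the LOMETS implementation: the template identifiers and raw scores are hypothetical, higher raw scores are assumed to be better, and the background distribution is assumed to be the set of scores produced by one threading program.

```python
from statistics import mean, pstdev


def z_score(raw_score, background_scores):
    """Difference between a raw score and the mean, in standard deviation units."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all background scores are equal; Z-score is undefined")
    return (raw_score - mu) / sigma


def best_template(raw_scores):
    """Return (template_id, z) for the template with the highest Z-score."""
    background = list(raw_scores.values())
    scored = {tid: z_score(s, background) for tid, s in raw_scores.items()}
    top = max(scored, key=scored.get)
    return top, scored[top]


# Hypothetical raw alignment scores from a single threading program.
scores = {"tmplA": 50.0, "tmplB": 30.0, "tmplC": 10.0}
top_id, top_z = best_template(scores)
```

Repeating this selection for each threading program, and then ordering the programs by their benchmark performance, would give a ranked list analogous to the one in Figure 3.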
1: MUSTER 2: FFAS-3D 3: SPARKS-X 4: HHSEARCH2 5: HHSEARCH I 6: Neff-PPAS
7: HHSEARCH 8: pGenTHREADER 9: PROSPECT2 10: PRC [12].
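The identity and coverage measures reported for each template can be sketched from a gapped pairwise alignment. The sequences below are hypothetical, and the denominator used for Ident2 (the full template chain length) is an assumption made for this illustration.

```python
def alignment_stats(query_aln, template_aln, query_length, template_length):
    """Compute Ident1, Ident2 and Cov from two gapped, equal-length strings.

    Gaps are written as '-'. A position counts as aligned when neither
    sequence has a gap there.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            aligned += 1
            if q == t:
                identical += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return {
        "ident1": identical / aligned,          # identity within the aligned region
        "ident2": identical / template_length,  # assumption: whole template chain
        "cov": aligned / query_length,          # aligned residues / query length
    }


# Hypothetical 10-residue query aligned to a 12-residue template chain.
stats = alignment_stats("MFVFLVLLPL", "MF-FLALL-L",
                        query_length=10, template_length=12)
```

Here 8 positions are aligned and 7 of them are identical, so Ident1 is 0.875 and Cov is 0.8; multiplying by 100 gives the percentages reported in Figure 3.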
3.4 Top Five Final Models Predicted by I-TASSER
For each target, the I-TASSER simulations generate a large ensemble of structural conformations, called decoys. I-TASSER uses the SPICKER program to cluster all decoys based on pairwise structural similarity and reports up to five models, corresponding to the five largest structural clusters. The confidence of each model is measured quantitatively by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score typically lies in the range [-5, 2], where a higher value signifies a model with higher confidence and a lower value a model with lower confidence.
The TM-score and RMSD are estimated from the C-score and the protein length, following the correlation observed between these quantities. Because the top five models are ranked by cluster size, a lower-ranked model can occasionally have a higher C-score. Although the first model is of better quality in most cases, lower-ranked models can also be better than higher-ranked ones, as seen in our research. If the I-TASSER simulations converge, fewer than five clusters may be generated; this usually indicates that the models are of good quality because the simulations converged (Figure 4). The top five proteins in the Protein Data Bank that are structurally closest to the spike glycoprotein (as identified by TM-align) are listed in Table 1. The top five hits for the closest Enzyme Commission (EC) numbers and active sites are listed in Table 2.
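The distinction between ranking by cluster size and ranking by C-score can be made concrete with a small sketch. The model names, cluster sizes, and C-scores are hypothetical and are not the values obtained for the spike glycoprotein; the code only illustrates the ordering logic, not SPICKER itself.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cluster_size: int  # number of decoys in the structural cluster
    c_score: float     # confidence score, typically within [-5, 2]


def rank_by_cluster_size(models, limit=5):
    """Order models by decreasing cluster size and keep at most `limit`."""
    return sorted(models, key=lambda m: m.cluster_size, reverse=True)[:limit]


# Hypothetical clusters: the second-largest cluster has the best C-score.
models = [
    Model("cluster_b", 350, -0.9),
    Model("cluster_a", 400, -1.2),
    Model("cluster_c", 100, -2.5),
]
ranked = rank_by_cluster_size(models)
most_confident = max(ranked, key=lambda m: m.c_score)
```

In this example the first-ranked model comes from the largest cluster, while the highest C-score belongs to the second-ranked model, which is the situation described in the text.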
Proteins are ranked by the TM-score of the structural alignment between the query structure and known structures in the PDB library. RMSDa is the RMSD between residues that are structurally aligned by TM-align; IDENa is the percentage sequence identity in the structurally aligned region; Cov is the coverage of the TM-align alignment and is equal to the number of structurally aligned residues divided by the length of the query protein. 5x58A: Prefusion structure of SARS-CoV spike glycoprotein, conformation 1 (viral protein); 6nzkA: Structural basis for human coronavirus attachment to sialic acid receptors (viral protein); 3aoiM: RNA polymerase-Gfh1 complex (Crystal type 2), (transcription, transferase/DNA/RNA); 1ileA: Isoleucyl-tRNA synthetase (aminoacyl-tRNA synthetase); 1ug9A: Crystal Structure of Glucodextranase from Arthrobacter globiformis I42 (hydrolase).
1ileA: Isoleucyl-tRNA synthetase (aminoacyl-tRNA synthetase); 1k32A: Crystal structure of the tricorn protease (hydrolase); 3eqlM: Crystal structure of the T. thermophilus RNA polymerase holoenzyme in complex with antibiotic myxopyronin (transferase); 2pdaA: Crystal structure of the complex between pyruvate-ferredoxin oxidoreductase from Desulfovibrio africanus and pyruvate (oxidoreductase); 1ej6A: Reovirus core (virus).
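The quantities used to rank the structural analogs in Table 1 can be sketched from a list of distances between structurally aligned residue pairs. The sketch uses the standard TM-score formula with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 and evaluates it for one fixed superposition; TM-align itself searches for the superposition that maximizes the score, which is not reproduced here. The function names and the floor of 0.5 for short chains are choices made for this illustration.

```python
import math


def d0(length):
    """Length-dependent distance scale of the TM-score, in angstroms."""
    if length <= 21:
        return 0.5  # guard for short chains (a simplification in this sketch)
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, query_length):
    """TM-score of a fixed superposition, normalized by the query length."""
    scale = d0(query_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / query_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    if not distances:
        raise ValueError("no aligned residue pairs")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def coverage(distances, query_length):
    """Number of structurally aligned residues divided by the query length."""
    return len(distances) / query_length
```

For a hypothetical 100-residue query with 80 aligned pairs, `coverage` returns 0.8. A perfect match over the full length gives a TM-score of 1, and the score decreases as the aligned distances grow or the coverage falls.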
One powerful method for multiple sequence alignment is Multiple Alignment using Fast Fourier Transform (MAFFT), as shown in Figure 5 (a and b) below [8].