Conformational Variability Assessment of the Mutation Sites for D614G, B.1.1.7, and B.1.351 using SSSCPreds

Complementary techniques are necessary for the analysis of mutation sites in flexible regions where the position of atoms could not be determined by cryo-electron microscopy (Cryo-EM), such as the furin cleavage site of SARS-CoV-2. The prediction data from SSSCPreds, a deep neural network-based program for predicting conformational flexibility or rigidity in proteins, can give insight into the conformational variability of mutation sites. We find that the conformation of G614, which is assigned as a left-handed (LH) α-helix-type one, is rigid, whereas that of D614 is flexible without the hydrogen bonding latch to T859. The rigidity of glycine stabilizes the local conformation more effectively than aspartic acid with the latch does, and thereby contributes to the reduction of S1 shedding and the increase in infectivity. Further, it is predicted that no other amino acid at position 614 allows the same conformation and stability as the glycine mutation. The individual mutations in B.1.1.7 and B.1.351 have a smaller effect and are not comparable to the overwhelming effectiveness of the D614G mutation. SSSCPreds provides important deep neural network-based insights into the conformational flexibility of the current mutation sites and the potential for new ones in the future.


Introduction
With the rapid expansion of the coronavirus disease 2019 (COVID-19) pandemic in 2020 1 , a rate of 23.12 substitutions per year for SARS-CoV-2 is currently observed, and evidence of possible reinfection with SARS-CoV-2 has been shown 2 . The infection has spread through the process of natural selection, so predicting and tracking the impact of spontaneous mutations is necessary. Further, complementary techniques are necessary for the analysis of mutation sites in flexible regions, in which the position of atoms could not be determined by cryo-electron microscopy (Cryo-EM), such as the furin cleavage site of SARS-CoV-2 3 .
Recently, we reported a deep neural network-based prediction program of conformational flexibility or rigidity in proteins (SSSCPreds) 6 using supersecondary structure code (SSSC) [7][8][9] . The sequence flexibility/rigidity map of the SARS-CoV-2 RBD (receptor binding domain), obtained from SSSCPreds, resembles the sequence-to-phenotype maps of ACE2-binding (angiotensin-converting enzyme 2-binding) affinity and expression, which were experimentally obtained by deep mutational scanning 10 . This suggests that the SSSC sequences that are identical among the predictions of the three deep neural network-based systems correlate well with the sequences showing both lower ACE2-binding affinity and lower expression.
The frequency of mutations increases with the exponential increase in the number of infected people.
The conformational flexibility of the protein sequences is deeply related to the ease of infection, and its accurate prediction is very important for developing countermeasures against COVID-19. In this paper, we report the conformational variability assessment of the mutation sites for D614G and for the further strains B.1.1.7 and B.1.351.

D614G mutation
As shown above, the D614G variant is now the dominant form worldwide 3 . Recently, Gobeil and coworkers reported Cryo-EM structures that reveal an altered RBD disposition, together with antigenicity and proteolysis experiments that reveal structural changes and enhanced furin cleavage efficiency of the G614 variant 3 .
However, the underlying factor of why glycine, and not other amino acids, can induce the effective strain replacement has not been explained.
The sequence flexibility/rigidity maps of all of the single amino acid mutations at the D614G mutation site, obtained using SSSCPreds, indicate that only the mutation to glycine makes the other-type conformation ("T" conformation) rigid and reproduces the observed "T" conformations of the Cryo-EM structures (Fig. 1a). On the other hand, although SSSCPred200 suggests the "T" conformation for D614, SSSCPred100 and SSSCPred predict the β-sheet-type conformation ("S" conformation). This means that the site of D614 is flexible without the hydrogen bonding latch between D614 and T859 (Fig. 1b,c).
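The scan over all single amino acid mutations at one site, and the rigid/flexible call for D614 and G614, can be sketched as follows. This is only an illustrative sketch and not part of SSSCPreds: the helper names `single_site_variants` and `classify_site` are hypothetical, the toy fragment is not the real spike sequence, and the rule "a site is rigid when all predictors return the same SSSC letter" is our reading of the description above. The prediction itself still has to be run with the SSSCPreds program.

```python
# Standard 20 amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(sequence, position):
    """Return {label: mutant sequence} for every substitution at a 1-based position."""
    index = position - 1
    wild_type = sequence[index]
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == wild_type:
            continue  # skip the unmutated sequence
        label = f"{wild_type}{position}{residue}"
        variants[label] = sequence[:index] + residue + sequence[index + 1:]
    return variants


def classify_site(codes):
    """Assumed rule: rigid if every predictor gives the same SSSC letter."""
    return "rigid" if len(set(codes)) == 1 else "flexible"


# Toy fragment (hypothetical, NOT the real spike sequence); D is at position 4.
toy = "QVADVNC"
variants = single_site_variants(toy, 4)
print(len(variants))      # 19
print(variants["D4G"])    # QVAGVNC

# Per-site SSSC letters as described in the text for residue 614,
# in the order SSSCPred, SSSCPred100, SSSCPred200.
print(classify_site(["S", "S", "T"]))  # flexible (D614)
print(classify_site(["T", "T", "T"]))  # rigid (G614)
```

Each of the 19 variant sequences would then be submitted to the predictors separately, and the per-site calls compared across variants.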
Both observed "T" conformations of the Cryo-EM structures for D614 and G614 (Protein Data Bank [PDB] ID codes 6XR8 and 6XS6) are the same conformation 11 , which is assigned as a left-handed (LH) α-helix-type one (Fig. 1b,c) 12 . In general, the LH α-helix is stabilized only by glycine because glycine does not have chirality. The rigidity of glycine stabilizes the local conformation more effectively than aspartic acid with the latch does, and thereby contributes to the reduction of S1 shedding 13 .
[...] (Fig. 2, Extended Data Fig. 1, and [...] Although the K417N mutation site can also contact ACE2, the SSSCPreds data suggest flexibility of the nearby site (Extended Dataset 1 and Extended Data Fig. 4).
The HV69-70 deletion, Y144 deletion, and P681H mutations for B.1.1.7 and the D80A, D215G, and R246I mutations for B.1.351 also correspond to the edges of flexible regions, in which the position of atoms could not be determined (Fig. 2, Extended Dataset 1, and Extended Data Fig. 2). The SSSCPreds data for the P681H mutation indicate the stabilization of the sequence CASYQT, with identical β-sheet-type conformations, before the furin cleavage site (Extended Data Fig. 3).
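Preparing a variant sequence that combines substitutions and deletions, as in the B.1.1.7 and B.1.351 mutation sets above, can be sketched as follows. This is a minimal sketch under stated assumptions: `apply_mutations` is a hypothetical helper, the six-residue toy sequence and its positions are invented for illustration, and all positions are taken to be in wild-type numbering.

```python
import re

# Matches substitutions written as wild type, position, mutant (e.g. "P681H").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence, substitutions=(), deletions=()):
    """Apply substitutions and deletions given in wild-type (1-based) numbering."""
    residues = list(sequence)
    for item in substitutions:
        match = SUBSTITUTION.match(item)
        if match is None:
            raise ValueError(f"unrecognized substitution: {item}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{item}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{item}: found {residues[position - 1]} at {position}")
        residues[position - 1] = mutant
    # Remove deleted positions last so that every index above still refers
    # to the original numbering.
    removed = set(deletions)
    return "".join(r for i, r in enumerate(residues, start=1) if i not in removed)


# Toy example (hypothetical sequence and positions, not the spike protein).
toy = "AHVDYP"
print(apply_mutations(toy, substitutions=["D4A", "P6H"], deletions=[2, 3, 5]))  # AAH
```

Checking the stated wild-type residue before substituting catches numbering mistakes, which are easy to make once deletions shift the downstream positions.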

Discussion
In this study, the conformational variability of the mutation sites for D614G, B.1.1.7, and B.1.351 has been evaluated by using SSSCPreds. The overwhelming D614G mutation is rationalized by the conformation of glycine, which is assigned as an LH α-helix-type one and is more rigid than that of aspartic acid.

ONLINE METHODS
The FASTA-format files containing the amino acid sequences and SSSCs of protein subunits were obtained from the observed Protein Data Bank (PDB) files 15 by using the SSSCview program (available online at https://staff.aist.go.jp/izumi.h/SSSCPreds/index-e.html) 9 .
The original and mutation sequences of protein subunits were converted to the predicted SSSCs by using the SSSCPreds program (available online at https://staff.aist.go.jp/izumi.h/SSSCPreds/index-e.html) 6 .
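Handling the original and mutation sequences as FASTA text can be sketched as follows. This is a generic illustration only: `to_fasta` and `parse_fasta` are hypothetical helpers, the record names and toy sequences are invented, and the exact input layout expected by the SSSCview and SSSCPreds programs is not specified here and should be checked against their documentation.

```python
def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


# Toy records (hypothetical names and sequences).
records = {"toy_original": "QVADVNC", "toy_mutant": "QVAGVNC"}
text = to_fasta(records, width=4)
print(text)
assert parse_fasta(text) == records  # round trip preserves the sequences
```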

Data availability
The reference models and the original amino acid sequences were downloaded from the PDB. The mutation sequences were obtained based on References 4 and 5.