Antigen evolution from D614, to G614, and to Delta subtype of SARS-CoV-2


Since 2019, the antigens of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) have kept evolving, from the initial D614 to G614, to Delta, and to Omicron. Seven peptides from SARS-CoV-2 are analyzed here. The finding is that the rule or route of the evolution of their antigens runs from a “rough” status to a “precise” one. In this paper, we present initial data on how the antigens evolved from the initial D614 to G614 and then to Delta. This finding can help the development of reagents for detecting both “rough” and “precise” antigens, or even help the development of vaccines against SARS-CoV-2, and finally help control the COVID-19 epidemic. According to this rule or route, the “common precise” antigens of SARS-CoV-2 can be designed in silico, developed in the laboratory, and confirmed in animals. The way will likely be very tough and long.


Introduction
Coronavirus disease 2019 (COVID-19) broke out in 2019.
In December 2019, coronavirus disease 2019 was first reported in China (Zhou, Yang et al. 2020).
The epidemic, which first broke out in Wuhan, China, started on 12 December 2019 and had caused 2,794 laboratory-confirmed infections, including 80 deaths, by 26 January 2020.
Two whole-genome sequences of SARS-CoV-2, MN908947 (Wu, Zhao et al. 2020) and MT007544 (Caly, Druce et al. 2020), were reported on March 18, 2020 and June 25, 2020, respectively. Both are D614. That is, the coding sequence (CDS) from position 21563 to 25384 of the whole genome is gene "S". The product of the "S" gene is the "surface glycoprotein", a "structural protein". In this protein, the 614th amino acid is "D". This D614 strain is the first generation of SARS-CoV-2. The protein ID is "QHD43416".
Based on 6244 global cases of COVID-19, G614 accounted for only 9.9% before March 1, 2020, but it rapidly increased to 54% by March 10, 2020, in only 10 days, and became the dominant strain (Korber, Fischer et al. 2020).
In the second stage of these 2 years, on June 21, 2020, the World Health Organization (WHO) Situation Report recorded over 8.7 million COVID-19 cases and 460,000 deaths. The numbers were increasing daily (Korber, Fischer et al. 2020).
On August 20, 2020 (Korber, Fischer et al. 2020) and October 8, 2020, detailed sequencing data of the G614 strain of SARS-CoV-2 were reported (Gobeil, Janowska et al. 2020). The protein IDs are "7KDK_A", "7LX5_B" (Pymm, Adair et al. 2021) and "7KEC_C". This is a trimeric protein. The authors mentioned that SARS-CoV-2 D614G is a 3-RBD-down spike protein trimer without the P986-P987 stabilizing mutations (S-GSAS-D614G). However, when analyzed in detail, 7LX5_B is still in the non-mutated status, D614. This might not interfere with the function of the other two units of this trimer, which are already mutated to G614.
Six months later, by January 2021, 78 million people had been infected and more than 1.7 million patients had died (Gaebler, Wang et al. 2021). That is about nine-fold more cases and four-fold more deaths within 6 months.
The disease is still in pandemic status after two years.
Normally, once a microbe infects a person, the IgG generated within around 14 days can protect the infected person. The acquired immunity acts against the infecting microbe, and the outcome should be recovery. At the same time, the microbe can no longer be easily transmitted from one person to another. So the epidemic should also be brought under control in about 14 days.
Why has coronavirus disease 2019 remained epidemic for 2 years? The major reason should be, at least, that the generated antibody, IgG, cannot protect the host from infection by new mutant strains of SARS-CoV-2 carrying brand-new antigens.
SARS-CoV-2, the pathogen of coronavirus disease 2019, is not stable in the antigens of its spikes. That is, they keep changing. Over these two years, the antigens of SARS-CoV-2 have kept shifting from D614 to G614, Delta, and finally Omicron.
The epidemic data show that G614 is the next generation of D614. As a consequence, G614 had a higher infectivity (Zhang, Jackson et al. 2020) and a higher toxicity to humans. It gained a stronger capability to survive under human immunity.
Many papers have mentioned this evolution of the antigens, from D614 to G614 and to Delta. But only a limited number of papers have addressed the reason why D614 evolved to G614, and G614 to Delta.
Once an antigen invades the human body, the immune system generates a specific antibody against that antigen. When the antigen is to invade the human body a second time, it must mutate, that is, evolve; otherwise it will fail to enter, blocked by the established immune defense specific to the non-mutant strain. Normally, one antigen matches one antibody. It is true that many cross-reactions exist between antigens and antibodies, but their efficiency might not be as good as that of the original antigen-antibody pairs. Antibody-Dependent Enhancement (ADE) might happen due to such cross-reactions. The matching of antigen and antibody is very similar to the pairing of lock and key.
In the 1990s, at the University of Graz, Austria, each student owned one key that opened his or her own lock and could not open any other lock. But the professor held one key that opened the locks of all of his or her students. The professor's key was very similar to each of the students' keys, but it had to be a little "smaller" than the students' keys. Why "smaller" and not "bigger"? A smaller key can enter the space shaped for the original student's key; a bigger key cannot.
The same rule might fit the evolution of antigens. The evolved antigen needs to be smaller than, and similar to, its non-mutant original antigen. This is true at least in the cases of D614 to G614 and G614 to Delta, because "G" is the smallest of all 20 amino acids and is certainly "smaller" than "D". This paper tries to find a rule explaining why the mutation happened at position D614 and not at another position, or, if other mutations happened, why the order was first from D614 to G614 and second from G614 to Delta.
The amino acid sequences of SARS-CoV-2, the D614 and the G614 mutant, were compared with the online software Clustal Omega.
In the Clustal Omega alignment result, all "G" amino acids are marked in yellow (Figures 1, 2, 3, 4).
All amino acids and their molecular weights are listed in Table 1.
Seven candidates for potential mutant peptides were searched. The criterion is the author's hypothesis that mutation happens in the "bigger" amino acids. Compared with "G", all other 19 amino acids are "bigger". That is, stretches of non-"G" amino acids are potential sites for mutation or evolution.
Thus, seven non-"G" fragments were selected.
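The search for non-"G" fragments can be illustrated with a minimal sketch. It assumes that a fragment is a maximal run of residues between two "G" residues in a one-letter amino acid sequence; the function name, the `min_length` parameter and the toy sequence are hypothetical and are not taken from the original analysis, which was read from the Clustal Omega alignment.

```python
def non_g_fragments(sequence, min_length=1):
    """Return (start_position, fragment) pairs for maximal glycine-free runs.

    start_position is 1-based within the sequence that was passed in.
    """
    fragments = []
    start = 0
    # A sentinel "G" is appended so that the last run is also closed.
    for i, residue in enumerate(sequence + "G"):
        if residue == "G":
            if i - start >= min_length:
                fragments.append((start + 1, sequence[start:i]))
            start = i + 1
    return fragments


# Toy sequence for illustration only (not the real spike protein).
toy = "MKGTNTSNQGAWG"
for position, fragment in non_g_fragments(toy, min_length=3):
    print(position, len(fragment), fragment)
```

Applied to a full spike sequence with a length threshold, a function of this kind would list the long glycine-free stretches from which candidates such as the 37-residue D614 fragment are chosen.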
From the longest, F718 with 43 amino acids, down to the size of the D614 fragment, 37 amino acids, there are 3 candidates.
To compare the difference between D614 and G614, the G614 fragment was also analyzed, even though it contains a "G".
These candidates were analyzed by the molecular weight of each of their amino acids. The standard deviations (SD) were calculated in Excel.
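The per-residue calculation can be sketched as follows. This is an illustration under stated assumptions, not the original Excel worksheet: the molecular weights are commonly tabulated average masses of the free amino acids (they agree with the values for G, D and W quoted in this paper, but Table 1 itself is not reproduced here), the SD is taken as the sample standard deviation (n - 1), and the first "D" of the fragment is assumed to be residue 614. With these assumptions the sketch gives a mean close to 129.3 and an SD close to 25.2 for the D614 fragment.

```python
from statistics import mean, stdev

# Assumed average molecular weights of the free amino acids (g/mol).
MW = {
    "G": 75.0669, "A": 89.0935, "S": 105.0930, "P": 115.1310,
    "V": 117.1469, "T": 119.1197, "C": 121.1590, "L": 131.1736,
    "I": 131.1736, "N": 132.1184, "D": 133.1032, "Q": 146.1451,
    "K": 146.1882, "E": 147.1299, "M": 149.2124, "H": 155.1552,
    "F": 165.1900, "R": 174.2017, "Y": 181.1894, "W": 204.2262,
}

# The 37-residue D614 fragment as given in the Results.
D614_FRAGMENT = "TNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYST"

# Assumption: the first D (index 12, 0-based) is residue 614; replace it with G.
assert D614_FRAGMENT[12] == "D"
G614_FRAGMENT = D614_FRAGMENT[:12] + "G" + D614_FRAGMENT[13:]


def mw_profile(peptide):
    """Return (mean, sample SD) of the per-residue molecular weights."""
    weights = [MW[residue] for residue in peptide]
    return mean(weights), stdev(weights)


for name, peptide in [("D614", D614_FRAGMENT), ("G614", G614_FRAGMENT)]:
    avg, sd = mw_profile(peptide)
    print(f"{name}: n={len(peptide)} mean={avg:.1f} SD={sd:.2f} W={'W' in peptide}")
```

`statistics.stdev` divides by n - 1, like Excel's STDEV; `statistics.pstdev` would give the population SD instead and a slightly smaller number.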

Results
3.1 Confirmed mutated peptides G614 and Delta, and other candidates of potential mutant peptides
As shown in Figure 1, the candidate mutant peptides are the fragments marked by arrows. One of them is D614, TNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYST, 37 amino acids. This non-"G" fragment contains the D614 amino acid, which has already been confirmed to mutate to G614. This potential mutated fragment has thus already become a confirmed mutated peptide.
Another, longer non-"G" peptide is N148, 38 amino acids, VYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE, shown in Figure 3. This fragment is also a confirmed peptide, already mutated in the Delta type, exactly on the basis of the G614 mutant.
Another similar peptide is I358, 41 amino acids, EVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCY, shown in Figure 1.
On the other hand, some of the non-"G" peptides are shorter than 37 amino acids, for example F58, 36 amino acids, VYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVS, shown in Figure 3.
Because the shorter peptides have a lower possibility of being mutated, only 3 of them were analyzed.
3.2 The molecular weight of confirmed mutated peptides D614, G614, Delta and other candidates of potential mutant peptides
As shown in Table 2, D614, a peptide of 37 amino acids, has a mean molecular weight of 129.3, and the standard deviation (SD) of the individual molecular weights of its amino acids is 25.23. This peptide contains the biggest amino acid, tryptophan (W), with a molecular weight of 204.2262.
N148 has 38 amino acids; its mean molecular weight is 140.1 and its SD is 26.92, with "W". Its SD is bigger than that of D614, 25.23, but smaller than the SD of any other "W"-containing fragment. The Delta subtype has already mutated in this peptide.
The mean molecular weights of the other peptides, their SDs, and whether they contain "W" are also indicated in Table 2.
The details are as follows.
I358 has 41 amino acids; its mean molecular weight is 134.2 and its SD is 31.66, with "W". G614 has 37 amino acids; its mean molecular weight is 127.7 and its SD is 26.74, with "W". Its SD is bigger than that of D614, 25.23.
Although F58, I1018 and D1138 are shorter than D614, the 37-amino-acid peptide, they were selected as references and analyzed for comparison with their longer counterparts.

Discussion
As with pairs of keys and locks, the more precise the key, the finer its outward structure.
For proteins or antigens, the SD of the molecular weight can work as an indicator of "fineness". The bigger the SD, the more "fine" or "precise" the structure. Conversely, the smaller the SD, the more "rough" it is.
D614 has already mutated to G614. Its SD of the molecular weight is 25.23. The molecular weight of aspartate (D) is 133.1032. For the smallest amino acid, glycine (G), the molecular weight is 75.0669.
D614 has already mutated, or evolved, to G614, and its size has become smaller. The SD is 25.23 for D614 and changed to 26.74 for G614. Thus, the structure of G614 is the result of an evolution to a more "precise" or complex status.
This evolution is essential for the virus. The first generation of the virus, usually in a "rough" status, stimulated the human host to generate "rough" antibody against the viral invaders. The war would end if the antibody killed the virus. The virus would not accept such a failure. To survive as a species, the virus has to evolve, if possible, to a more "precise" status, as in the mutation from D614 to G614. The first generation of antibody would not be able to recognize the more "precise" or more complex antigens in the second generation of the virus.
Why did the mutation happen first in the D614 fragment, and not in the longer fragments F718, I358 and N148? And why was N148 the second to mutate, in the Delta type, after the G614 mutation?
Among these 3 longer fragments, the SD is 31.66 for I358 and 26.92 for N148. Both SDs are bigger than the 25.23 of D614. Such "precise" fragments would have no reason to mutate. After D614 mutated to G614, N148, the second "roughest" fragment, was promoted to the "roughest" fragment. This caused the Delta type mutation, from G614 (with non-mutated N148) to the Delta type with the G614 mutation and mutated N148 (E156del, F157del, R158G).
F718 has an SD of 20.69, even smaller than 25.23. Why did this "rough" fragment not mutate before D614? A possible reason is a bias in the distribution of its amino acids: it does not contain the biggest amino acid, tryptophan (W). If a peptide does not contain "W", it might not tend to mutate at the very beginning.
The non-"W" fragments are excluded as candidates for the first-order evolution out of concern that this statistical bias interferes with the SD calculation.
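The ordering argued above can be restated as a small sketch: keep only the fragments that contain "W", then sort them by SD, smallest ("roughest") first. The SD values are those quoted in this section for the non-"G" fragments; the dictionary layout and the function name are illustrative assumptions, and the sketch only restates the hypothesis rather than testing it.

```python
# (SD of per-residue molecular weight, contains tryptophan "W") as quoted in the text.
REPORTED = {
    "D614": (25.23, True),
    "N148": (26.92, True),
    "I358": (31.66, True),
    "F718": (20.69, False),
}


def predicted_order(fragments):
    """Order W-containing fragments from 'roughest' (smallest SD) to most 'precise'."""
    eligible = [(sd, name) for name, (sd, has_w) in fragments.items() if has_w]
    return [name for sd, name in sorted(eligible)]


print(predicted_order(REPORTED))
```

Under the hypothesis, D614 is expected to mutate first and N148 second, which is the order reported for the G614 and Delta variants; F718 is left out only because of the "W" criterion.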
Tryptophan has its own biochemical functions. One of them is that it can be translated from a "stop codon" in mRNA. Without a stop codon, the protein chain would be extended. It was reported that the TGA codon in Spiroplasma codes for tryptophan, instead of being a stop signal as in other species (Meng, Gu et al. 2010).
To our knowledge, Spiroplasma species can generate non-specific super-allergic reactions in the human body. For this reason, tryptophan may play some role in virus evolution.
Tryptophan itself is also an inflammatory mediator. The potential evolution might not happen first in the non-"W" fragments.
In sum, as D614 mutated to G614 and G614 mutated to Delta, SARS-CoV-2 might start its evolution in the non-"G" fragments that contain the "W" amino acid and have the "roughest" status. The end point of the evolution should be the "precise" status. The effect on evolution is dose dependent, the dose being the "roughness". The first evolution happened from D614 to G614, consistent with the smallest SD, 25.23, with "W", in D614. The second evolution happened from G614 to the Delta type, also consistent with the second smallest SD, 26.92, with "W", in the N148 fragment of G614. The Delta mutation is on the basis of the G614 mutation.
The mutations in the Delta and Omicron types of SARS-CoV-2 also follow such rules. We would like to discuss them in other papers.
This finding can help the development of reagents for detecting both "rough" and "precise" antigens, or even help the development of vaccines against SARS-CoV-2, and finally help control the COVID-19 epidemic.
According to this rule or route, the "common precise" antigens of SARS-CoV-2 can be designed in silico, developed in the wet laboratory, and confirmed in animals. The way will likely be very tough and long.

Declarations
Author Contributions
Peijun Zuo searched for the information, performed the analysis and wrote the paper. Professor Dr. Liping Li provided key advice.

Tables
Tables 1 and 2 are available in the supplementary files section.

Figures
Figure 1
In the alignment result of Clustal Omega, all "G" amino acids are marked in yellow. I358 and D614 are indicated by the arrows. G614 is in the same position as D614 but mutated in the fragments of "7KDK_A" and "7KEC_C".

Figure 2
The amino acids of F718 and I1018. F718 and I1018 are indicated by the arrows.

Figure 3
The amino acids of F58 and N148.
In the alignment result of Clustal Omega, all "G" amino acids are marked in yellow. F58 and N148 are indicated by the arrows. The Delta mutations E156del, F157del and R158G have already happened in N148, after the G614 mutation.

Figure 4
The amino acids of D1138.
In the alignment result of Clustal Omega, all "G" amino acids are marked in yellow. D1138 is indicated by the arrow.
