Omicron BA.2 (B.1.1.529.2): high potential to becoming the next dominating variant

The Omicron variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has rapidly replaced the Delta variant as a dominating SARS-CoV-2 variant because of natural selection, which favors the variant with higher infectivity and stronger vaccine breakthrough ability. Omicron has three lineages or subvariants, BA.1 (B.1.1.529.1), BA.2 (B.1.1.529.2), and BA.3 (B.1.1.529.3). Among them, BA.1 is the currently prevailing subvariant. BA.2 shares 32 mutations with BA.1 but has 28 distinct ones. BA.3 shares most of its mutations with BA.1 and BA.2 except for one. BA.2 is found to be able to alarmingly reinfect patients originally infected by Omicron BA.1. An important question is whether BA.2 or BA.3 will become a new dominating “variant of concern”. Currently, no experimental data has been reported about BA.2 and BA.3. We construct a novel algebraic topology-based deep learning model trained with tens of thousands of mutational and deep mutational data to systematically evaluate BA.2’s and BA.3’s infectivity, vaccine breakthrough capability, and antibody resistance. Our comparative analysis of all main variants namely, Alpha, Beta, Gamma, Delta, Lambda, Mu, BA.1, BA.2, and BA.3, unveils that BA.2 is about 1.5 and 4.2 times as contagious as BA.1 and Delta, respectively. It is also 30% and 17-fold more capable than BA.1 and Delta, respectively, to escape current vaccines. Therefore, we project that Omicron BA.2 is on its path to becoming the next dominating variant. We forecast that like Omicron BA.1, BA.2 will also seriously compromise most existing mAbs, except for sotrovimab developed by GlaxoSmithKline.


Introduction
On November 26, 2021, the World Health Organization (WHO) declared the Omicron variant (B.1.1.529) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) initially discovered in South Africa a variant of concern (VOC). Within a few days (i.e., December 1, 2021), an artificial intelligence (AI) model predicted the Omicron variant to be about 2.8 times as infectious as the Delta variant, have a near 90% likelihood to escape current vaccines, and severely compromise the efficacy of monoclonal antibodies (mAbs) developed by Eli Lilly, Regeneron, AstraZeneca, and many others, except for GlaxoSmithKline's sotrovimab [1]. The subsequent experiments confirm Omicron's high infectivity [2,3], high vaccine breakthrough rate [4,5], and severe antibody escape rate [6][7][8]. The U.S. Food and Drug Administration (FDA) halted the use of mAbs from Eli Lilly and Regeneron in January 2022. Due to its combined effects of high infectivity and high vaccine breakthrough rate, the Omicron variant is far more transmissible than the Delta variant and has rapidly become the dominating variant in the world. , which were first detected in November 2021 in South Africa [9]. Among them, BA.1 lineage is the preponderance that has ousted Delta. Compared to the reference genome reported in Wuhan, Omicron BA.1 has a total of 60 mutations on non-structure protein (NSP3), NSP4, NSP5, NSP6, NSP12, NSP14, S protein, envelope protein, membrane protein, and nucleocapsid protein. Among them, 32 mutations are on the spike (S) protein, the main antigenic target of antibodies generated by either infection or vaccination. Fifteen of these mutations affect the receptor-binding domain (RBD), whose binding with host angiotensin-converting enzyme 2 (ACE2) facilitates the viral cell entry during the initial infection [10]. BA.2 shares 32 mutations with BA.1 but has 28 distinct ones. On the RBD, BA.2 has four unique mutations and 12 shared with BA.1. In contrast, the Delta variant has only two RBD mutations. BA.3 shares most of its mutations with BA.1 and BA.2, except for one on NSP6 (A88V). It also has 15 RBD mutations, but none is distinct from BA.1 and BA.2. Nationwide Danish data in late December 2021 and early January 2022 indicate that Omicron BA.2 is inherently substantially more transmissible than BA.1 and capable of vaccine breakthrough [11]. Israel reported a handful of cases of patients who were infected with original Omicron BA.1 strain and have reinfected with BA.2 in a short period [12]. Although BA.2 did not cause worse illness than the original Omicron BA.1 strain, its reinfection is very alarming. It means the antibodies generated from the early Omicron BA.1 were evaded by the BA.2 strain. It is imperative to know whether BA.2 will become the next dominating strain to reinfect the world population.
Studies show that binding free energy (BFE) between the S RBD and the ACE2 is proportional to the viral infectivity [10,14,15]. In July 2020, nature selection favoring more infectious variants was discovered as the fundamental law of biology that governs SARS-CoV-2 transmission and evolution [16], including the occurrence of Alpha, Beta, Gamma, Delta, and Omicron variants. Natural selection in SARS-CoV-2 mutations was conformed beyond doubt in April 2021 [17]. Two vital RBD mutation sites, N501 and L452, that later appeared in all main variants, Alpha, Beta, Delta, Gamma, Delta, Epsilon, Theta, Kappa, Lambada, Mu, and Omicron, were also predicted in July 2020 [16]. These discovery and predictions may not be achieved by experimental means.
Currently, there are no experimental results about the infectivity, vaccine breakthrough, and antibody resistance of BA.2 and BA.3 [13]. In this work, we present a comprehensive analysis of Omicron BA.2 and BA.3's potential of becoming the next prevailing SARS-CoV-2 variant. Our study focuses on the S protein RBD, which is essential for virus cell entry. The RBD is not only crucial for viral infectivity but also essential for vaccines and antibody protections. An antibody that can disrupt the RBD-ACE2 binding would directly neutralize the virus [18][19][20]. We integrate tens of thousands of mutational and deep mutational data, biophysics, and algebraic topology to construct an AI model. We systematically investigate the binding free energy (BFE) changes of an RBD-ACE2 complex structure and a library of 185 structures of RBD-antibody complexes induced by the RBD mutations of Alpha, Beta, Gamma, Delta, Lambda, Mu, BA.1, BA.2, and BA.3 to reveal their infectivity, vaccine-escape potential, and antibody resistance. Using our comparative analysis, we unveil that the Omicron BA.2 variant is about 1.5 times as infectious as BA.1 and about 4.2 times as contagious as the Delta variant. It also has a 30% higher potential than BA.1 to escape existing vaccines. Therefore, we project the Omicron BA.2 is on its path to becoming the next dominating variant.  Figure 1 a shows the three-dimensional (3D) structure of Omicron BA.1 [3]. At the RBD, Omicron BA.1, BA.2 and BA.3 share 12 RBD mutations, i.e., G339D, S373P, S375F, K417N, N440K, S477N, T478K, E484A, Q493R, Q498R, N501Y, and Y505H as shown in Figure 1 b. However, BA.1 has distinct RBD mutations S371L, G446S, and G496S, BA.2 has S371F, T376A, D405N, and R408S, and BA.3 has S371F, D405N, and G446S. Figures 1 d, e and f present the BFE changes of the RBD-ACE2 complex induced by the RBD mutations of Omicron AB.1, BA.2 and BA.3, respectively. The larger the BFE change is, the higher infectivity will be. Since natural selection favors those mutations that strengthen the viral infectivity [16], the most contagious variant will become dominant in a population under the same competing condition. The accumulated BFE changes are summarized in Figure 1 g. A comparison is given to other main SARS-CoV-2 variants Alpha, Beta, Gamma, Delta, Theta, Kappa, Lambda, and Mu. The Delta variant had the highest BFE change among the earlier variants and was the most infectious variant before the occurrence of the Omicron variant, which explains its dominance in 2021. Omicron BA.1, BA.2, and BA.3 have BFE changes of 2.60, 2.98, and 2.88 kcal/mol, respectively, which are much higher than those of other major SRAS-CoV-2 variants. Among them, Omicron BA.2 is the most infectious variant and is about 20 and 4.2 times as infectious as the original SARS-CoV-2 and the Delta variant, respectively. Our model predicts that BA.2 is about 1.5 as contagious BA.2, which is the same as reported in an initial study [12]. Another report confirms that Omicron BA.2 is more contagious than BA.1 [11]. Therefore, Omicron BA.2 may eventually replace the original Omicron strain BA.1 in the world.

Vaccine breakthrough
Omicron BA.1 is well-known for its ability to escape current vaccines [5,6]. Its 15 mutations at the RBD enable it to not only strengthen its infectivity by a stronger binding to human ACE2 but also create mismatches for most direct neutralization antibodies generated from vaccination or prior infection. Although BA.1, BA.2, and BA.3 share 12 RBD mutations, BA.1 has 3 additional RBD mutations, BA.2 has 4 additional RBD mutations, and BA.3 has one mutation the same as that of BA.1's additional ones and two mutations the same as those of BA.2's additional ones. Therefore, it is important to understand their vaccine-escape potentials. Currently, no experimental result has been reported about the vaccine-breakthrough capability of BA.2 and BA.3.
Experimental analysis of the variant vaccine-escape capability over the world's populations is subject to many uncertainties. Different vaccines may stimulate different immune responses and antibodies for the same person. Different individuals may have different immune responses and antibodies from the same vaccine due to their different races, gender, age, and underlying medical conditions. Uncontrollable experimental conditions and different experimental methods may also contribute to uncertainties. Consequently, it is impossible to accurately characterize a variant's vaccine-escape capability (or rate) over the world's populations.
In our work, we take an integrated approach to understanding the intrinsic vaccine-escape capability of SARS-CoV-2 variants. We collect a library of 185 known antibody and S protein complexes and analyze the mutational impact on the binding of these complexes [1,22]. The results in terms of mutation-induced BFE changes serve as the statistical ensemble analysis of Omicron subvariants' vaccine-breakthrough potentials. This molecular-level analysis becomes very useful when it is systematically applied to a series of variants.  Figure 3 and analysis of Alpha, Beta, Gamma, Lambda, and Mu is presented in Figure S2. Accumulated BFE changes are provided in Figure 3 a1, b1, and c1. Obviously, all Omicron subvariants have more negative accumulated BFE changes than positive ones, showing their antibody resistance. Among them, BA.2's distribution is extended to a wider negative domain, showing its strongest antibody resistance. In contrast, Delta variant's statistics is given in Figure 3 d1, showing a smaller domain of distribution.
As discussed earlier, it is difficult to obtain a variant's true vaccine-escape rate over world's populations. However, a molecular-based comparative analysis can offer desirable information. Figures 3 a2, b2, c2, and d2 depict the number of antibody-RBD complexes that is regarded as disrupted by BA.1, BA.2, BA.3, and Delta mutations, respectively, under different thresholds ranging from 0 kcal/mol, -0.3 kcal/mol, to <-3  kcal/mol. Previously, threshold -0.3 kcal/mol was used to decide whether a mutation disrupts an antibody-RBD complex [1], which gives rise to 163, 168, and 164 disrupted antibody-RBD complexes, respectively for BA.1, BA.2, and BA.3. The corresponding rates of potential vaccine breakthrough are 0.88, 0.91, and It is interesting to compare our analysis with experimental results [5]. In Figure 3 f, the sensitivity of 28 serum samples from COVID-19 convalescent patients infected with an earlier SARS-CoV-2 strain (D614G) was tested against pseudotyped Omicron, Alpha, Beta, Gamma, Delta, Lambda, and Mu [5]. The results indicate the Omicron (BA.1) and Delta variant have 8.4 and 1.6 fold reductions, respectively, to the mean neutralization ED50 of these sera compared with the D614G reference strain. Figure 3 e presents a comparison of accumulated negative BFE changes for variants Omicron BA.1, BA.2, BA.3, Alpha, Beta, Delta, Gamma, Lambda, and Mu. For each antibody-RBD complex, we only consider disruptive effects by setting positive BFE changes to zero and sum over RBD mutations (e.g., 15 mutations for Omicron BA.1 and 2 for Delta) to obtain the accumulated negative BFE change. As such, we have 185 accumulated negative BFE changes for each variant. We use the mean of these 185 values to computed the fold of affinity reduction, which can be compared for different variants against the original virus reported in Wuhan (BFEchange average = 0). The RBD mutations of the Delta variant cause 1.5 fold reduction in the neutralization capability. In the same setting, Omicron BA.1, BA.2, and BA.3 may lead to about 21, 27, and 18 fold increases in their vaccine-breakthrough capabilities. As such, BA.2 is about 30% more capable to escape existing vaccines than BA.1 and 17 times more than the Delta variant. Our prediction has a correlation coefficient of 0.9 with the experiment. With its highest infectivity and highest vaccine-escape potential, the Omicron BA.2 is set to take over the Omicron BA.1 in infecting the world population.

Antibody resistance
The design and discovery of mAbs are part of an important achievement in combating COVID-19. Unfortunately, like vaccines, mAbs are prone to viral mutations, particularly antibody-resistant ones. Early studies predicted that Omicron BA.1 would compromise the anti-COVID-19 mAbs developed by Eli Lilly, Regeneron, AstraZeneca, Celltrion, and Rockefeller University [1]. However, Omicron BA.1's impact on GlaxoSmithKline's mAb, called sotrovimab, was predicted to be mild [1]. These predictions have been confirmed and the FDA has halted the use of Eli Lilly and Regeneron's COVID-19 mAbs. Currently, Glax-oSmithKline's sotrovimab is the only antibody-drug authorized in the U.S. for the treatment of COVID-19 patients infected by the Omicron variant. An important question is whether sotrovimab remains effective for the BA.2 subvariant that might drive a new wave of infections in the world population.
In this work, we further analyze the efficacy of these mAbs for BA.2 and BA.3. Our studies focus on Omicron subvariants' RBD mutations, which appear to be optimized by the virus to evade host antibody protection and infect the host cell.    Figures 4 c2, c3  and c4. It is noticed that BA.1 mutation G446S has a disruptive effect on AZD1061. AZD8895 is weakened by two shared mutations. The AZD1061-AZD8895 combination is also disrupted by shared mutation Q493R. Therefore, the efficacy of AstraZeneca's mAbs would be reduced should BA.2 prevail in world populations.
As shown in Figure 4 d2, Celltrion's mAb CT-P59 is prone to shared mutations Q493R and E484A. BA.2 mutations may not bring additional destruction. However, the shared mutations pose a threat to Celltrion's mAb, which implies its efficacy would not restore should BA.2 prevail. Figures 4 e2 and f2 present BA.1 and BA.2's mutational impacts on Rockefeller University's mAbs. C135 is mainly disrupted by Omicron BA.1 and its C144 is made ineffective by shared mutation E484A. Therefore, C135 might become effective if BA.2 dominates.
Finally, we plot mutational impacts on antibody S309's binding with RBD in Figure 4 g2. Antibody S309 is the parent antibody for Sotrovimab developed by GlaxoSmithKline and Vir Biotechnology, Inc. It is seen from the figure that there is only one disruptive BFE change of -0.47kcal/mol and the rest of the BFE changes are mostly positive. The BA.2 mutations have little effect on S309. Therefore, we expect a mild effect from Omicron BA.1 and BA.2 on sotrovimab.
It is interesting to understand why S309 is the only antibody that is not significantly affected by Omicron variants. Figure 4 show that all mAbs that compete with the human ACE2 for the receptor-binding motif (RBM) are seriously compromised by Omicron subvariants because most of the RBD mutations locate at the RBM. A possible reason is that Omicron subvariants had optimized RBD mutations at the RBM to strengthen the viral infectivity and evade the direct neutralization antibodies. Consequently, all mAbs that target RBM are seriously compromised by Omicron subvariants. Figures 4 e1 and g1 show that antibodies C135 and S309 do not directly compete with ACE2 for the RBM. However, C135 is still very close to the RBM and significantly weakened by some Omicron mutations. In contrast, S309 is further away from the RBM and escapes from Omicron's RBD mutations.

Materials and Methods
The deep learning model is designed for predicting mutation-induced BFE changes of the binding between protein-protein interactions. A series of three steps consist of training data preparation, feature generations, and deep neural network training and prediction (see Figure S2). Here, we briefly discuss each step and leave more details in Supporting Information. Readers are also suggested literature [16,27,28] for more details.
Firstly, the training data is prepared to comprise experimental BFE changes and next-generation sequencing data. SKEMPI 2.0 [29] is the fundamental BFE change dataset. Additionally, SARS-CoV-2 related datasets are the mutational scanning data of the ACE2-RBD complex [30][31][32] and the CTC-445.2-RBD complex [32]. Next is to prepare the features. It is required a variety of biochemical, biophysical, and mathematics features from PPI complex structures, such as surface areas, partial charges, van der Waals interaction, Coulomb interactions, pH values, electrostatics, persistent homology, graph theory, etc. [16,33] A detailed list and description of these features are provided in Supporting Information. In the following, the key idea of the element-specific and site-specific persistent homology is illustrated briefly. As the persistent homology [34,35] introduced as a useful tool for data analysis for scientific and engineering applications, it is further applied to molecular studies [27,36]. For 3D structures, atoms are modeled as vertices in a point cloud. Then edges, faces, etc. can be constructed as simplices σ which form simplicial complexes X. Groups C k (X), k = 0, 1, 2, 3 are sets of all chains of kth dimension, which is defined as a finite sum of simplices as i α i σ k i with coefficients α i . The boundary operator ∂ k therefore, maps C k (X) → C k−1 (X) as where σ k = {v 0 , · · · , v k } and [v 0 , · · · ,v i , · · · , v k ] is a (k−1)-simplex excluding v i with ∂ k−1 ∂ k = 0. The chain complex is given as The k-th homology group H k is defined by Thus, the Betti numbers can be defined by the ranks of k-th homology group H k . Persistent homology can be devised to track Betti numbers through a filtration where β 0 describes the number of connected components, β 1 provides the number of loops, and β 2 is the number of cavities. Therefore, using persistent homology, the atoms of 3D structures are grouped according to their elements, as well as the atoms from the binding site of antibodies and antibodies. The interactions and their impacts on PPI complex bindings are characterized by the topological invariants, which are further implemented for machine learning training.
Lastly, a deep learning algorithm, artificial/deep neural networks (ANNs or DNNs), is used to tackle the features with datasets for training and predictions [28]. A trained model is available at TopNetmAb, a SARS-CoV-2-specific model, whose early model was integrating convolutional neural networks (CNNs) with gradient boosting trees (GBTs) and was trained only on the SKEMPI 2.0 dataset with a high accuracy [33].
Recent work with predictions from TopNetmAb [22,28,37] is highly consistent with experimental results. One should notice it is important with the help of the aforementioned deep mutational datasets related to SARS-CoV-2. The Pearson correlation of our predictions for the binding of CTC-445.2 and RBD with experimental data is 0.7 [28,32]. Meanwhile, a Pearson correlation of 0.8 is observed of the predictions of clinical trial antibodies against SARS-CoV-2 induced by emerging mutations in the same work [28] compared to the natural log of experimental escape fractions [38]. Moreover, the prediction of single mutations L452R and N501Y for the ACE2-RBD complex have a perfect consistency with experimental luciferase data [28,39]. More detailed validations are in Supporting Information.

Conclusion
The Omicron variant has three subvariants BA.1, BA.2, and BA3. The Omicron BA.1 has surprised the scientific community by its large number of mutations, particularly those on the spike (S) protein receptorbinding domain (RBD), which enable its unusual infectivity and high ability to evade antibody protections induced by viral infection and vaccination. Viral RBD interacts with host angiotensin-converting enzyme 2 (ACE2) to initiate cell entry and infection and is a major target for vaccines and monoclonal antibodies (mAbs). Omicron BA.1 exploits its 15 RBD mutations to strengthen its infectivity and disrupt mAbs generated by prior viral infection or vaccination. Omicron BA.2 and BA.3 share 12 RBD mutations with BA.1 but differ by 4 and 3 RBD mutations, respectively, suggesting potentially serious threats to human health. However, no experimental result has been reported for Omicron BA.2 and BA.3, although BA.2 is found to be able to alarmingly reinfect patients originally infected by Omicron BA. 1 [12]. In this work, we present deep learning predictions of BA.2's and BA.3's potential to become another dominating variant. Based on an intensively tested deep learning model trained with tens of thousands of experimental data, we investigate Omicron BA.2's and BA.3's RBD mutational impacts on the RBD-ACE2 binding complex to understand their infectivity and a library of 185 antibodies to shed light on their threats to vaccines and existing mAbs. We unveil that BA.2 is about 1.5 and 4.2 times as contagious as BA.1 and Delta, respectively. It is also 30% and 17-fold more capable than BA.1 and Delta, respectively, to escape current vaccines. It is predicted to undermine most existing mAbs, except for sotrovimab developed by GlaxoSmithKline. We forecast Omicron BA.2 will become another prevailing variant by infecting populations with or without antibody protection.

Data and model availability
The structural information of 185 antibody-RBD complexes with their corresponding PDB IDs and the results of BFE changes of PPI complexes induced by mutations can be found in Section S2 of the Supporting Information. The TopNetTree model is available at TopNetmAb. The detailed methods can be found in the Supporting Information S3 and S4. The validation of our predictions with experimental data can be located in Supporting Information S5.

Supporting information
The supporting information is available for S1 Supplementary figures: analysis of variant mutation-induced BFE changes for Alpha, Beta, Gamma, Lambda, and Mu variants (the extension of Figure 3).  Figure S1 provides the statistic analysis of BFE changes of RBD-ACE2 induced by mutations from Alpha, Beta, Gamma, Lambda, and Mu.

S2 Supplementary data
The Supplementary Data.zip contains four files as listed in the following subsection.

S2.1 Disrupted antibodies
File antibodies BFEs.csv shows the BFE changes of antibodies disrupted by Omicron mutations.

S3 Supplementary feature generation methods
In this section, the workflow of the deep learning-based BFE change predictions of protein-protein interactions induced by mutations for the present SARS-CoV-2 variant analysis and prediction will be firstly introduced, which includes four steps as shown in Figure S2: (1) training data preparation; (2) feature generations of protein-protein interaction complexes; and (3) prediction of protein-protein interactions by deep neural networks. Next, the validation of our machine learning-based model will be demonstrated, suggesting consistent and reliable results compared to the experimental deep mutations data.

S3.1 Methods for BFE change predictions
In this section, the process of the machine learning-based BFE change predictions is introduced. Once the data pre-processing and SNP genotyping are carried out, we will firstly proceed with the training data preparation process, which plays a key role in reliability and accuracy. A library of 185 antibodies and RBD complexes, as well as an ACE2-RBD complex, are obtained from Protein Data Bank (PDB). RBD mutation-induced BFE changes of these complexes are evaluated by the following machine learning model. According to the emergency and the rapid change of RNA virus, it is rare to have massive experimental BFE change data of SARS-CoV-2, while, on the other hand, next-generation sequencing data is relatively easy to collect. In the training process, the dataset of BFE changes induced by mutations of the SKEMPI 2.0 Figure S2: Illustration of genome sequence data pre-processing and BFE change predictions.
dataset [29] is used as the basic training set, while next-generation sequencing datasets are added as assistant training sets. The SKEMPI 2.0 contains 7,085 single-and multi-point mutations and 4,169 elements of that in 319 different protein complexes used for the machine learning model training. The mutational scanning data consists of experimental data of the binding of ACE2 and RBD induced mutations on ACE2 [30] and RBD [31,32], and the binding of CTC-445.2 and RBD with mutations on both protein [32].
Next, the feature generations of protein-protein interaction complexes are performed. The elementspecific algebraic topological analysis on complex structures is implemented to generate topological bar codes [28,[40][41][42]. In addition, biochemistry and biophysics features such as Coulomb interactions, surface areas, electrostatics, et al., are combined with topological features [22]. The detailed information about the topology-based models will be demonstrated in subsection S3.2. Lastly, deep neural networks for SARS-CoV-2 are constructed for the BFE change prediction of protein-protein interactions [28]. The detailed descriptions of dataset and machine learning model are found in the literature [16,28,43] and are available at TopNetmAb.

S3.2.1 Topology features
Among all features generated for machine learning prediction, the application of topology theory makes the model to a whole new level. Those summarized as other inputs are called as auxiliary features and are described in Section S3.3.2 and S3.3.3. In this section, a brief introduction about the theory of topology will be discussed. Algebraic topology [40,41] has achieved tremendous success in many fields including biochemical and biophysical properties [42]. Special treatment should be implemented for biology applications to describe element types and amino acids in poly-peptide mathematically, which element-specific and site-specific persistent homology [16,33]. To construct the algebraic topological features on protein-protein interaction model, a series of element subsets for complex structures should be defined, which considers atoms from the mutation sites, atoms in the neighborhood of the mutation site within a certain distance, atoms from antibody binding site, atoms from antigen binding site, and atoms in the system that belong to type of {C, N, O}, A ele (E). Under the element/site-specific construction, simplicial complexes is constructed on point clouds formed by atoms. For example, a set of independent k + 1 points is from one element/site-specific set U = {u 0 , u 1 , ..., u k }. The k-simplex σ is a convex hull of k +1 independent points U , which is a convex combination of independent points. For example, a 0-simplex is a point and a 1-simplex is an edge. Thus, a m-face of the k-simplex with m+1 vertices forms a convex hull in a lower dimension m < k and is a subset of the k+1 vertices of a k-simplex, so that a sum of all its (k−1)-faces is the boundary of a k-simplex σ as where u 0 , ...,û i , ..., u k consists of all vertices of σ excluding u i . The collection of finitely many simplices is a simplicial complex. In the model, the Vietoris-Rips (VR) complex (if and only if B(u ij , r) ∩ B(u i j ′ , r) = ∅ for j, j ′ ∈ [0, k]) is for dimension 0 topology, and alpha complex (if and only if ∩ ui j ∈σ B(u ij , r) = ∅) is for point cloud of dimensions 1 and 2 topology [42].
The k-chain c k of a simplicial complex K is a formal sum of the k-simplices in K, which is c k = α i σ i , where α i is coefficients and is chosen to be Z 2 . Thus, the boundary operator on a k-chain c k is such that ∂ k : C k → C k−1 and follows from that boundaries are boundaryless as a sequence of complexes by boundary maps. Therefore, the Betti numbers are given as the ranks of kth homology group H k as β k = rank(H k ), where H k = Z k /B k , k-cycle group Z k and the k-boundary group B k . The Betti numbers are the key for topological features, where β 0 gives the number of connected components, such as number of atoms, β 1 is the number of cycles in the complex structure, and β 2 illustrates the number of cavities. This presents abstract properties of the 3D structure.
Finally, only one simplicial complex couldn't give the whole picture of the protein-protein interaction structure. A filtration of a topology space is needed to extract more properties. A filtration is a nested sequence such that Each element of the sequence could generate the Betti numbers {β 0 , β 1 , β 2 } and consequentially, a series of Betti numbers in three dimensions is constructed and applied to be the topological fingerprints in Figure S2.

S3.2.2 Residue-level features
Mutation site neighborhood amino acid composition Neighbor residues are the residues within 10Å of the mutation site. Distances between residues are calculated based on residue C α atoms. Six categories of amino acid residues are counted, which are hydrophobic, polar, positively charged, negatively charged, special cases, and pharmacophore changes. The count and percentage of the 6 amino acid groups in the neighbor site are regrading as the environment composition features of the mutation site. The sum, average, and variance of residue volumes, surface areas, weights, and hydropathy scores are used but only the sum of charges is included.
pKa shifts The pKa values are calculated by the PROPKA software [44], namely the values of 7 ionizable amino acids, namely, ASP, GLU, ARG, LYS, HIS, CYS, and TYR. The maximum, minimum, sum, the sum of absolute values, and the minimum of the absolute value of total pKa shifts are calculated. We also consider the difference of pKa values between a wild type and its mutant. Additionally, the sum and the sum of the absolute value of pKa shifts based on ionizable amino acid groups are included.
Position-specific scoring matrix (PSSM) Features are computed from the conservation scores in the position-specific scoring matrix of the mutation site for the wild type and the mutant as well as their difference. The conservation scores are generated by PSI-BLAST [45].
Secondary structure The SPIDER2 software is used to compute the probability scores for residue torsion angle and residues being in a coil, alpha helix, and beta strand based on the sequences for the wild type and the mutant [46].

S3.2.3 Atom-level features
Seven groups of atom types, including C, N, O, S, H, all heavy atoms, and all atoms, are considered when generating the element-type features. Meanwhile, other three atom types, i.e., mutation site atoms, all heavy atoms, and all atoms, are used when generating the general atom-level features.
Surface areas Atom-level solvent excluded surface areas are computed by ESES [47].
Partial changes Partial change of each atom is generated by pdb2pqr software [48] using the Amber force field [49] for wild type and CHARMM force field [50] for mutant. The sum of the partial charges and the sum of absolute values of partial charges for each atomic group are collected.
Atomic pairwise interaction interactions Coulomb energy of the ith single atom is calculated as the sum of pairwise coulomb energy with every other atom as where k e is the Coulomb's constant, r ij is the distance of ith atom to jth atom, and q i is the charge of ith atom. The van der Waals energy of the ith atom is modeled as the sum of pairwise Lennard-Jones potentials with other atoms as where ǫ is the depth of the potential well, and r i is van der Waals radii.
In atomic pairwise interaction, 5 groups (C, N, O, S, and all heavy atoms) are counted both for Coulomb interaction energy and van der Waals interaction energy.
Electrostatic solvation free energy Electrostatic solvation free energy of each atom is calculated using the Poisson-Boltzmann equation via MIBPB [51] and are summed up by atom groups.

S4 Supplementary machine learning methods
The topology-based network model for BFE change predictions induced mutations on SARS-CoV-2 studying applies a deep neural network structure. Similar approaches have been widely implemented in the energy prediction of protein-ligand binding [52] and protein-protein interactions [33]. The neural network method maps an input feature layer to output layer and mimics biological brains for solving problems where numerous neuron units are involved and weights of neurons are updated by backpropagation methods. To make more complicated structure in order to extract abstract properties, more layers and more neurons in each layer can be constructed. In the training process, optimization methods are applied to avoid overfitting issue, such as dropout methods [53] that a partial of computed neurons of each layer is dropped. For the model cross validations, the Pearson correlation of 10-fold cross-validation is 0.864, and the root mean square error is 1.019 kcal/mol.

S4.1 Deep learning algorithms
A deep neural network is a neural network method with multi-layers (hidden layers) of neurons between the input and output layers. In each layer, the single neuron gets fully connected with the neurons in the next layer. It should preserve the consistency of all labels when applying the model for mutation-induced BFE change predictions. The loss function is constructed as follows: where N is the number of samples, f is a function of the feature vector x i parameterized by a weight vector W and bias term b, and λ represents a penalty constant.

S4.2 Optimization
The backpropagation is applied to evaluate the loss function starting from the output layer and propagates backward through the network structure to update the weight vector W and bias term b. Since gradient calculation is required, therefore, we apply the stochastic gradient descent method with momentum, which only evaluates a small part of training data and can be considered as calculating exponentially weighted averages, which is given as where W i is the parameters in the network, L(W i , b i ) is the objective function, η is the learning rate, X and y are the input and target of the training set, and β ∈ [0, 1] is a scalar coefficient for the momentum term. The momentum term involved accelerates the converging speed.

S5 Supplementary validation
In the main content, we briefly summarized validations of our machine learning predictions and experimental data. For large quantitative validations, we compared the BFE change prediction for mutations on S protein RBD to the experimental deep mutational enrichment data on RBD binding to human ACE2 and CTC-445.2 induced by RBD mutations [22,28,32]. To make these validations, we eliminated the experimental deep mutational enrichment data of RBD binding to human ACE2 and CTC-445.2 from the training sets and set them as testing sets, which have 1539 and 1500 samples, respectively. In the validation of RBD and CTC-445.2 complex, there is a very high correlation between the enrichment data and machine learning predictions, as well as the validation of RBD binding to ACE2, with Pearson correlations are 0.69 and 0.70, respectively. The deep mutational enrichment data can give a proportional descriptor of the affinity strength of protein-protein interactions induced by mutations. The machine learning methods, however, give stable and equalized evaluations, while experimental data might be different dramatically due to conditions and environments.
In addition, we compared our machine learning results with other experimental data, which are escape fraction, pseudovirus infection changes, and IC 50 fold changes [28]. In the comparison of 35 cases to experimental escape fractions on RBD binding to clinical trial antibodies induced by emerging mutations, our machine learning predictions have a Pearson correlation of 0.80. Especially, those high escaping mutations E484K and E484Q on LY-CoV555, and mutations K417T and K417N on LY-CoV016, are indicated by both our predictions and the experimental data [28]. We also use the pattern comparisons of our prediction to experimental data. Lastly, we collected experimental data from different literature [54][55][56][57]. According to variations from different research groups, they were summarized in increasing/decreasing patterns of emerging variant (including co-mutations) impacts on antibody therapies in clinical trials. In total, there are 20 pattern comparisons with an excellent agreement between various experimental data and our predictions, except for a minor discrepancy [28].