Predicting Future Mutations To Inform Vaccine Design

Influenza viruses constantly evolve [1], and mismatches between predicted and circulating strains reduce vaccine effectiveness [2]. A barrier to predicting the season-specific dominant strains is the limited ability to predict future mutations, or to estimate the numerical likelihood of specific future strains. In this study, we introduce a biology-aware sequence similarity metric, based on deep pattern recognition of emergent evolutionary constraints, that calculates the odds of future mutations, outperforming WHO-recommended flu vaccine compositions almost consistently over the past two decades.
Periodic adjustment of the Influenza vaccine components is necessary to account for antigenic drift [1,3]. The flu shot in each hemisphere is prepared annually, at least six months in advance, and is based on a cocktail of historical strains determined by the WHO via global surveillance [4], in the hope of matching the circulating strain(s) in the upcoming flu season. A variety of hard-to-model effects hinder this prediction and, despite observed cross-reactive effects [2], have limited vaccine effectiveness in recent years [5].
A key barrier to improving prediction of dominant circulating strains is the inability to numerically estimate the likelihood of specific future mutations. The state of the art urgently needs tools to compute the likelihood that a strain replicating in the wild spontaneously gives rise to another by random chance. Currently, this likelihood is qualitatively equated to sequence similarity, measured by how many mutations it takes to change one strain into another. In reality, the odds of one sequence mutating into another depend not just on how many mutations apart they are to begin with, but also on how specific mutations incrementally affect fitness. Ignoring the constraints arising from the need to conserve function leaves any assessment of the mutation likelihood open to subjective bias. Here, we show that a precise calculation is possible when sequence similarity is evaluated via a new biology-aware metric, which we call the q-distance.
We show that by learning from the mutational patterns of the key surface proteins Hemagglutinin (HA) and Neuraminidase (NA) for Influenza A (selected for their known roles in cellular entry and exit [6]), we can improve forecasts of the future dominant circulating strain under seasonal antigenic drift.
We begin by collecting HA/NA nucleotide sequences from two public databases (NCBI and GISAID; see SI-Table IV; ≈ 29,000 distinct sequences used), uncovering a network of dependencies between individual mutations revealed through variations of the aligned sequences. These dependencies define our organism-specific model, referred to as the quasi-species network or the Qnet (Fig. 1). The q-distance, informed by the inferred Qnet, adapts to the specific organism, allelic frequencies, and nucleotide variations in the background population.
Using aligned genomic sequences sampled from similar populations, e.g., HA from Human Influenza A in year 2008, we construct the Qnet via customized machine learning algorithms that learn models for predicting the mutational variations at each sequence index using the other indices as features. For example, in Fig. 1a, the predictor for index 1274 uses variation at index 1064 as a feature, the predictor for index 1064 uses index 1314 as a feature, and so on, ultimately uncovering a recursive dependency structure. The Qnet predicts the nucleotide distribution over the base alphabet (the four nucleic acid bases ATGC) at any specific index, conditioned on the nucleotides making up the rest of the sequence of the gene or genome fragment under consideration. Finally, we define the q-distance (see Eq. (3) in Materials and Methods) as the square root of the Jensen-Shannon (JS) divergence [7] of these conditional distributions from one sequence to another, averaged over the entire sequence. Invoking Sanov's theorem on large deviations [7], we show that the likelihood of spontaneous change is bounded above and below by a simple exponential function of the q-distance.
The mathematical intuition behind relating the new distance to change-probability is the same as in the prediction of a biased outcome when we sequentially toss a fair coin. With overwhelming probability, such an experiment with a fair coin should result in a roughly equal number of heads and tails. However, "large deviations" can happen, and the probability of such rare events is quantifiable [8] with existing theory. We show here that the likelihood of a spontaneous transition of a genomic sequence to a different variant by random chance may be similarly bounded, given that we have the Qnet as an estimated model of the evolutionary constraints. Importantly, the q-distance between two sequences may change even if only the background population changes (see SI-Table II, where the distance between two fixed sequences varies when we vary their collection years). Sequences may have a large q-distance and a small edit distance, and vice versa (although on average the two distances tend to be positively correlated, see SI-Table III). Hence, for tracking drift in Influenza A, we construct a seasonal Qnet for each subtype and protein that we consider.
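The coin-toss intuition can be made concrete with a small numerical sketch. The code below compares the exact probability of seeing at least k heads in n fair tosses with the standard Chernoff-type large-deviation bound exp(-n KL(k/n || 1/2)). It illustrates only the coin analogy, not the Qnet bound itself, and the function names are hypothetical.

```python
import math

def kl_bernoulli(a: float, p: float) -> float:
    """KL divergence D(Bernoulli(a) || Bernoulli(p)) in nats, for 0 < p < 1."""
    total = 0.0
    for q_a, q_p in ((a, p), (1.0 - a, 1.0 - p)):
        if q_a > 0.0:  # the term 0 * log(0 / q) is taken to be 0
            total += q_a * math.log(q_a / q_p)
    return total

def exact_tail(n: int, k: int, p: float = 0.5) -> float:
    """Exact probability of observing at least k heads in n tosses."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))

def large_deviation_bound(n: int, k: int, p: float = 0.5) -> float:
    """Chernoff-type upper bound exp(-n * KL(k/n || p)), valid when k/n >= p."""
    return math.exp(-n * kl_bernoulli(k / n, p))

if __name__ == "__main__":
    n, k = 100, 70
    print("exact tail :", exact_tail(n, k))
    print("upper bound:", large_deviation_bound(n, k))
```

The exact tail always sits below the exponential bound, and both shrink rapidly as the observed fraction of heads moves away from one half, which is the behavior the q-distance bound mirrors for sequences.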
For predicting future strains, we hypothesized that since the probability of a drift decreases exponentially with increasing q-distance, the centroid of the strain distribution in our metric will change slowly. If true, the strain selected closest to the q-centroid will be a good approximation of next season's dominant strain. We tested this hypothesis on the past two decades of sequence data for Influenza A (H1N1 and H3N2), with promising results: the q-distance based prediction demonstrably outperforms WHO recommendations by reducing the distance between the predicted and the dominant strain (Fig. 2). Here, we identify the dominant strain to be the one that occurs most frequently, computed as the centroid of the strain distribution observed in a given season in the classical sense (number of mutations). For H1N1 HA, the Qnet-induced recommendation outperforms the WHO suggestion by > 31% on average over the last 19 years, and by > 81% in the last decade, in the northern hemisphere. The gains for NA over the same time periods for H1N1 in the north are > 60% and > 22%, respectively. For the southern hemisphere, the gains for H1N1 over the last decade are > 72% for HA and > 50% for NA. The full table of results is given in SI-Table I. Fig. 2 illustrates the relative gains computed for both subtypes and the two hemispheres (since the flu seasons occupy distinct time periods and may have different dominant strains in the northern and southern hemispheres [3]). Additional improvement is possible if we recommend multiple strains every season for the vaccine cocktail (Fig. 2e,f,k,l). The details of the specific strain recommendations made by the Qnet approach for two subtypes (H1N1, H3N2), for two genes (HA, NA), and for the northern and the southern hemispheres over the previous 19 years are enumerated in the Supplementary text in SI-Table V through SI-Table XIV.
Comparing the Qnet-inferred strain (QNT) against the one recommended by the WHO, we find: 1) the residues that only the QNT matches correctly with the dominant strain (DOM), while the WHO fails, are largely localized within the receptor binding domain (RBD), with > 57% occurring within the RBD on average (see SI-Fig. 1a for a specific example), and 2) when the WHO strain deviates from the QNT/DOM matched residue, the "correct" residue is often replaced in the WHO recommendation with one that has a very different side chain, hydropathy and/or chemical properties (see SI-Fig. 1b-f), suggesting deviations in recognition characteristics. Combined with the fact that circulating strains are almost always within a few edits of the DOM (see SI-Fig. 2), these observations suggest that hosts vaccinated with the QNT recommendation are more likely to have season-specific antibodies that recognize a larger cross-section of the circulating strains.
Focusing on the average localization of the QNT to WHO deviations in the HA molecular structure, the changes are observed to occur primarily in the HA1 subunit (see SI-Fig. 1g-i; HA0 numbering used, other numbering conversions are given in Supplementary Tab. XVI), with the most frequent deviations occurring around the ≈ 200 loop, the ≈ 220 loop, the ≈ 180 helix, and the ≈ 100 helix, in addition to some residues in the HA2 subunit (≈ 49 and ≈ 124). Unsurprisingly, the residues we find to be most impacted in the HA1 subunit (the globular top of the fusion protein) have been repeatedly implicated in receptor binding interactions [9][10][11]. Thus, we are able to fine-tune the future recommendation over the state of the art, largely by modifying residue recommendations around the RBD and structures affecting recognition dynamics.
Calculation of q-distance is currently limited to similar and aligned sequences, e.g., Influenza strains from different subtypes, hosts or seasons. A multivariate regression analysis (see Materials and Methods) indicates that the most important factor for our approach to succeed is the diversity of the sequence dataset (see Supplementary text, SI-Table XV). Arguably, simply reducing the edit distance from the dominant strain is not guaranteed to translate to better immunological protection. Nevertheless, consistent improvement in this metric, achieved purely via computational means, suggests the possibility of improvement over current practice. In conclusion, we introduce a data-driven distance metric to track subtle deviations in sequences. The ability to predict future flu strains via subtle variations in a limited set of immunologically important residues suggests that the tools developed here could lead to more effective escape-resistant vaccines.

Data Source
In this study, we use sequences for the spike (S) protein on betacoronaviruses [12], which plays a crucial role in host cellular entry, and the Hemagglutinin (HA) and Neuraminidase (NA) proteins for Influenza A (subtypes H1N1 and H3N2), which are key enablers of cellular entry and exit mechanisms, respectively [13]. We use two sequence databases: 1) National Center for Biotechnology Information (NCBI) Virus [14] and 2) GISAID [15]. The former is a community portal for viral sequence data, aiming to increase the usability of data archived in various NCBI repositories. GISAID has a somewhat more restrictive user agreement, and use of GISAID data in an analysis requires acknowledgment of the contributions of both the Submitting and the Originating laboratories (the corresponding acknowledgment tables are included as Supplementary files). We use a total of 30,204 sequences in our analysis (see SI-Tab. IV).
Next, we briefly describe the details of the computational framework.

Qnet Framework
In defining the q-distance, we do not assume that the mutational variations at the individual indices of a genomic sequence are independent (see Fig. 1b in the main text). Irrespective of whether mutations are truly random [16], since only certain combinations of individual mutations are viable, individual mutations across a genomic sequence replicating in the wild appear constrained, which is what is explicitly modeled in our approach. The mathematical form of our metric is not arbitrary; JS divergence is a symmetrized version of the more common KL divergence [7] between distributions, and among different possibilities, the q-distance is the simplest metric such that the likelihood of a spontaneous jump (see Eq. (9) in Methods) is provably bounded above and below by simple exponential functions of the q-distance.
Consider a set of random variables $X = \{X_i\}$, with $i \in \{1, \cdots, N\}$, each taking value from the respective sets $\Sigma_i$. A sample $x \in \prod_1^N \Sigma_i$ is an ordered $N$-tuple, consisting of a realization of each of the variables $X_i$, with the $i$-th entry $x_i$ being the realization of random variable $X_i$. We use the notation $x_{-i}$ to denote the sample $x$ with its $i$-th entry removed, and $x^{i,\sigma}$ to denote $x$ with its $i$-th entry replaced by $\sigma \in \Sigma_i$. Also, $\mathcal{D}(S)$ denotes the set of probability measures on a set $S$, e.g., $\mathcal{D}(\Sigma_i)$ is the set of distributions on $\Sigma_i$. We note that $X$ defines a random field over the index set $\{1, \cdots, N\}$. Also, to clarify the biological picture, we refer to the sample $x$ as an amino acid or nucleotide sequence, identifying the entry at each index with the corresponding protein residue or nucleotide base pair.
Definition 1 (Qnet). For a random field $X = \{X_i\}$ indexed by $i \in \{1, \cdots, N\}$, the Qnet is defined to be the set of predictors $\Phi = \{\Phi_i\}$, i.e., we have:
$$\Phi_i : \prod_{j \neq i} \Sigma_j \to \mathcal{D}(\Sigma_i), \quad (2)$$
where for a sequence $x$, $\Phi_i(x_{-i})$ estimates the distribution of $X_i$ on the set $\Sigma_i$.
We use conditional inference trees as models for predictors [17], although more general models are possible.
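As a rough illustration of Definition 1, the sketch below fits one predictor per index from a toy alignment. It is not the conditional inference tree model used in this study: as a deliberately simple stand-in, each predictor is a depth-one lookup conditioned on the single other index that minimizes the empirical conditional entropy. The names `fit_stump_qnet` and `predict` are hypothetical and are not part of the quasinet package.

```python
import math
from collections import Counter, defaultdict

def _distribution(counts, alphabet):
    """Normalize symbol counts into a probability mass function over the alphabet."""
    total = sum(counts.values())
    return {s: counts.get(s, 0) / total for s in alphabet}

def _cond_entropy(seqs, i, j):
    """Empirical conditional entropy H(X_i | X_j) in nats."""
    joint = Counter((s[j], s[i]) for s in seqs)
    marg = Counter(s[j] for s in seqs)
    n = len(seqs)
    return -sum(c / n * math.log(c / marg[a]) for (a, _), c in joint.items())

def fit_stump_qnet(seqs, alphabet="ACGT"):
    """Fit one depth-one predictor per index (stand-in for a tree-based model).

    Assumes aligned sequences of equal length with at least two indices.
    """
    n_idx = len(seqs[0])
    qnet = []
    for i in range(n_idx):
        others = [j for j in range(n_idx) if j != i]
        # Pick the most informative single feature index for index i.
        j_best = min(others, key=lambda j: _cond_entropy(seqs, i, j))
        table = defaultdict(Counter)
        for s in seqs:
            table[s[j_best]][s[i]] += 1
        marginal = _distribution(Counter(s[i] for s in seqs), alphabet)
        cond = {a: _distribution(c, alphabet) for a, c in table.items()}
        qnet.append((j_best, cond, marginal))
    return qnet

def predict(qnet, x, i):
    """Estimated distribution of the symbol at index i given the rest of x."""
    j, cond, marginal = qnet[i]
    # Fall back to the marginal if the feature value was never observed.
    return cond.get(x[j], marginal)

if __name__ == "__main__":
    toy = ["AAT", "AAT", "CCT", "CCT"]
    model = fit_stump_qnet(toy)
    print(predict(model, "CAT", 0))
```

Note that `predict` never reads `x[i]` itself, matching the requirement that each predictor sees only the remaining indices.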

Qnet Induced Biology-Aware Distance Between Strains
Definition 2 (Pseudo-metric Between Sequences). Given two sequences $x, y \in \prod_1^N \Sigma_i$, such that $x, y$ are drawn from the populations $P, Q$ inducing the Qnets $\Phi^P, \Phi^Q$, respectively, we define a pseudo-metric $\theta(x, y)$ as follows:
$$\theta(x, y) \triangleq \mathbf{E}_i \left( J^{\frac{1}{2}} \left( \Phi^P_i(x_{-i}), \Phi^Q_i(y_{-i}) \right) \right) \quad (3)$$
where $J(\cdot, \cdot)$ is the Jensen-Shannon divergence [18] and $\mathbf{E}_i$ indicates expectation over the indices.
The square root in the definition arises naturally from the bounds we are able to prove, and is dictated by the form of Pinsker's inequality [7], making sure that we satisfy the requirement that distances along a path in a constructed phylogeny sum linearly. This allows standard algorithms to be used for phylogeny construction.
Importantly, the q-distance defined above is technically a pseudo-metric, since distinct sequences can induce the same distributions over each index, and thus evaluate to a zero distance. This is actually desirable, since we do not want our distance to be sensitive to changes that are not biologically relevant. The intuition is that not all sequence variations brought about by substitutions are equally important or likely. Even with no selection pressure, we might still see random variations at an index if such variations do not affect the replicative fitness. Under that scenario, the corresponding $\Phi_i$ will predict a flat distribution no matter what the input sequence is, thus contributing nothing to the overall distance. And even if two strains $x, y$ have the same entry at some index $i$, the remaining residues might induce different distributions $\Phi_i$ based on the remote dependencies, i.e., the entries in $x_{-i}, y_{-i}$. Also, it matters if the sequences come from two different background populations $P, Q$, i.e., if the induced Qnets $\Phi^P, \Phi^Q$ are different. Thus, if we construct Qnets for H1N1 Influenza A separately for the collection years 2008 and 2009, then the exact same sequence collected in the respective years might have a non-zero distance between its two instances, reflecting the fact that the background populations the sequences arose from are different, inducing possibly different expected mutational tendencies.
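A minimal sketch of the q-distance computation described above follows, assuming each Qnet is available as a callable `predict(x, i)` that returns the estimated distribution at index i as a dictionary (a hypothetical interface, not the quasinet API). The use of base-2 logarithms, which keeps the JS divergence in [0, 1], is also an assumption.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two pmfs given as dicts (base-2 logs)."""
    total = 0.0
    for s in set(p) | set(q):
        ps, qs = p.get(s, 0.0), q.get(s, 0.0)
        m = 0.5 * (ps + qs)
        if ps > 0.0:
            total += 0.5 * ps * math.log2(ps / m)
        if qs > 0.0:
            total += 0.5 * qs * math.log2(qs / m)
    return total

def q_distance(x, y, predict_p, predict_q):
    """Mean over indices of sqrt(JS) between the two conditional distributions."""
    n = len(x)
    return sum(
        # max(0.0, ...) guards against tiny negative values from rounding.
        math.sqrt(max(0.0, js_divergence(predict_p(x, i), predict_q(y, i))))
        for i in range(n)
    ) / n

if __name__ == "__main__":
    # A predictor that is flat everywhere: no index carries any constraint.
    flat = lambda x, i: {b: 0.25 for b in "ACGT"}
    print(q_distance("ACGT", "TGCA", flat, flat))  # distinct sequences, zero distance
```

The flat predictor in the usage example shows the pseudo-metric property directly: two different sequences are at zero distance when no index is constrained.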
Next, we induce a q-distance between a sequence and a population and between two populations.

In-silico Corroboration of Qnet Constraints
We carry out in-silico experiments to corroborate that the constraints represented within an inferred Qnet are indeed reflective of the biology in play. To that effect, we apply simulated mutational perturbations to sequences from our databases (for which we have already constructed Qnets), and then use NCBI BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) to identify whether our perturbed sequences match existing sequences in the databases (and if so, where and how many matches they produce). The objective here is to compare such Qnet-constrained perturbations against random variations. The results are shown in SI-Fig. 3, where we find that, in contrast to random variations, which rapidly diverge the trajectories, the Qnet constraints tend to produce smaller variance in the trajectories, maintain a high degree of match as we extend our trajectories, and produce matches closer in time to the collection time of the initial sequence, suggesting that the Qnet does indeed capture realistic constraints.
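The contrast between constrained and random perturbations can be sketched as follows. The exact sampling scheme used for the experiments is not spelled out here, so this is only one plausible reading: a constrained step redraws a randomly chosen index from the Qnet-predicted distribution, while a random step redraws it uniformly. The `predict(x, i)` interface and both function names are hypothetical.

```python
import random

def q_sample_step(x, predict, rng):
    """Redraw one random index from the predicted conditional distribution."""
    i = rng.randrange(len(x))
    dist = predict(x, i)
    symbols = list(dist)
    new = rng.choices(symbols, weights=[dist[s] for s in symbols], k=1)[0]
    return x[:i] + new + x[i + 1:]

def random_step(x, rng, alphabet="ACGT"):
    """Redraw one random index uniformly, ignoring any constraints."""
    i = rng.randrange(len(x))
    return x[:i] + rng.choice(alphabet) + x[i + 1:]

def trajectory(x0, step, n_steps):
    """Iterate a one-argument step function to build a perturbation trajectory."""
    path = [x0]
    for _ in range(n_steps):
        path.append(step(path[-1]))
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    always_a = lambda x, i: {"A": 1.0}  # toy predictor: every index must be A
    print(trajectory("CCCC", lambda s: q_sample_step(s, always_a, rng), 5))
    print(trajectory("CCCC", lambda s: random_step(s, rng), 5))
```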

Significance Test for Population Membership & Progressive Drift in Population Characteristics
For our modeling to be reliable, we need a quantitative test of how well the Qnet represents the data and whether we need to re-calculate the predictors or we have sufficiently many sequences. Here, we formulate an explicit membership test to address this.
Definition 4 (Membership Probability of a Sequence). Given a population $P$ inducing the Qnet $\Phi^P$ and a sequence $x$, we can compute the membership probability of $x$:
$$\omega^P_x \triangleq Pr(x \in P) = \prod_{j=1}^{N} \left( \Phi^P_j(x_{-j})|_{x_j} \right)$$
Note that $x_j$ is the $j$-th entry in $x$, and is thus an element of the set $\Sigma_j$. Since we are mostly concerned with the case where $\Sigma_j$ is a finite set, $\Phi^P_j(x_{-j})|_{x_j}$ is the entry in the probability mass function corresponding to the element of $\Sigma_j$ which appears at the $j$-th index in sequence $x$.
We can carry out this calculation for a sequence $x$ known to be in the population $P$ as well, which allows us to define the membership degree $\omega^P_x$.
Definition 5 (Membership Degree). Let $X$ be a random field representing a population $P$, i.e., $X = x$ is a randomly drawn sequence from $P$. Then the membership degree $\omega^P$ is a function of the random variable $X$:
$$\omega^P(X) \triangleq \prod_{j=1}^{N} \left( \Phi^P_j(X_{-j})|_{X_j} \right)$$
Note that $\omega^P$ takes values in the unit interval $[0, 1]$, and the probability that $x$ is a member of the population $P$ is $\omega^P(X = x)$, denoted briefly as $\omega^P_x$, or $\omega_x$ if $P$ is clear from context.
Since $\omega^P(X)$ is a random variable, we can now compute sets of sequences that better represent the population $P$, and ones that are on the fringe. We can also evaluate, using a pre-specified significance level, whether a particular sequence is not from the population $P$, thus identifying whether we need to recompute the predictors $\Phi$ or split the base population. We can set up a hypothesis testing scenario to determine if sequences are indeed from a test population, as follows:
Significance Test for the Validity of Inferred Model: Given a population $P$, inducing a Qnet $\Phi^P$, and a sequence $x$, we assume the null hypothesis is $x \in P$. We reject the null hypothesis at a pre-specified significance $\alpha$ if
$$Pr\left(\omega^P(X) \leq \omega^P(X = x)\right) \leq \alpha \quad (8)$$
The fraction of newly observed sequences that do not reject the null hypothesis can then be used as an estimate of the species-specific divergence in population characteristics.
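The membership probability and the significance test can be sketched as below. The code works in log space to avoid underflow for long sequences, and it estimates the probability in Eq. (8) empirically from a reference sample drawn from the population; that estimator, the `predict(x, j)` interface, and the function names are assumptions made for illustration.

```python
import math

def log_membership(x, predict):
    """log of the product over j of predict(x, j)[x_j]; -inf if any factor is zero."""
    total = 0.0
    for j, symbol in enumerate(x):
        prob = predict(x, j).get(symbol, 0.0)
        if prob == 0.0:
            return -math.inf
        total += math.log(prob)
    return total

def empirical_p_value(x, reference, predict):
    """Fraction of reference sequences whose membership is at most that of x."""
    lx = log_membership(x, predict)
    hits = sum(log_membership(r, predict) <= lx for r in reference)
    return hits / len(reference)

def reject_membership(x, reference, predict, alpha=0.05):
    """Reject the null hypothesis 'x belongs to the population' at level alpha."""
    return empirical_p_value(x, reference, predict) <= alpha

if __name__ == "__main__":
    # Toy predictor: every index is A with probability 0.9, C with 0.1.
    toy = lambda x, j: {"A": 0.9, "C": 0.1}
    reference = ["AAA"] * 19 + ["AAC"]
    print(reject_membership("CCC", reference, toy))
    print(reject_membership("AAA", reference, toy))
```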

Theoretical Probability Bounds
The Qnet framework allows us to rigorously compute bounds on several quantities of interest; these bounds are established in Theorem 1. The fundamental bound is on the probability of a spontaneous change of one strain to another, brought about by chance mutations. While any sequence of mutations is equally likely, the "fitness" of the resultant strain, or the probability that it is viable at all, is not. Thus, the necessity of preserving function dictates that not all random changes are viable, and the probability of observing some trajectories through the sequence space is far greater than that of others. The Qnet framework allows us to explore these constrained dynamics, as revealed by a sufficiently large set of genomic sequences.
With the exponentially exploding number of possibilities in the sequence space, it is computationally intractable to exhaustively model this dynamics. Nevertheless, we can constrain the possibilities using the patterns distilled by the Qnet construction.
We show in Theorem 1 that at a significance level $\alpha$, with a sequence length $N$, the probability of a spontaneous jump of sequence $x$ from population $P$ to sequence $y$ in population $Q$, $Pr(x \to y)$, is bounded by:
$$\omega^Q_y e^{\frac{\sqrt{8}N^2}{1-\alpha}\theta(x,y)} \geq Pr(x \to y) \geq \omega^Q_y e^{-\frac{\sqrt{8}N^2}{1-\alpha}\theta(x,y)} \quad (9)$$
where $\omega^Q_y$ is the membership probability of strain $y$ in the target population. The ability to estimate the probability of a spontaneous jump between sequences in terms of $\theta$ has crucial implications. It allows us to 1) construct a new phylogeny that directly relates the probability of jumps, rather than the number of mutations, between descendants, 2) simulate realistic trajectories in the sequence space from any given initial strain, and 3) estimate drift in the sequence space by analyzing the statistical characteristics of the diffusion occurring in the strain space.
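To make the shape of these bounds concrete, the sketch below evaluates them in log space. It assumes the bound has the form $\omega^Q_y \exp(\pm c\,\theta(x, y))$ with $c = \sqrt{8}N^2/(1-\alpha)$, which is our reading of the constants appearing in the path bound of Eq. (13); the function name is hypothetical.

```python
import math

def log_jump_bounds(theta, log_omega_y, n, alpha):
    """Return (lower, upper) bounds on log Pr(x -> y).

    theta       : q-distance between x and y
    log_omega_y : log membership probability of y in the target population
    n           : sequence length N
    alpha       : significance level, 0 <= alpha < 1
    Assumes the constant c = sqrt(8) * N^2 / (1 - alpha).
    """
    c = math.sqrt(8.0) * n**2 / (1.0 - alpha)
    lower = log_omega_y - c * theta
    # A probability cannot exceed 1, so the log upper bound is capped at 0.
    upper = min(0.0, log_omega_y + c * theta)
    return lower, upper

if __name__ == "__main__":
    for theta in (0.0, 1e-4, 1e-3):
        print(theta, log_jump_bounds(theta, -1.0, n=10, alpha=0.05))
```

Because the constant grows as $N^2$, the bounds are informative mainly for small q-distances, and the lower bound rises with the membership probability of the target strain.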

More Fit in the Target Population Makes Jump More Probable
As an immediate consequence of Eq. (9), we can argue that the lower bound on the likelihood of a jump to a target sequence is higher if the final sequence is more fit in the target population. Note that the membership degree by definition quantifies the probability of generating a sequence from our inferred Qnet, and since we are far more likely to collect dominant strains when we survey a population, it follows that the membership degree is related to the qualitative notion of fitness.
Conversely, as the fitness of the initial strain (in the neighborhood of $\omega^P_x = 1$), measured by its membership degree, falls, the minimum probability of going through a spontaneous jump is higher. We can see this by first noting that for $x \neq y$:
$$\omega^P_x = 1 \Rightarrow Pr(x \to y) = 0 \quad (10)$$
which follows since each term in the product on the right hand side in Eq. (17) is either zero or one if $\omega^P_x = 1$, and there is at least one zero since $x \neq y$. To see that the suppression of the probability of a jump holds not only at $\omega^P_x = 1$ but also in its neighborhood, note that: which implies that in the neighborhood of $\omega^P_x = 1$, we have: i ¨P i @x i A yi I R I ¨Q i @y i A yi > H (12) implying that the distance decreases as the membership degree of $x$ falls, thus raising the lower bound on the probability of a spontaneous jump. The argument is not necessarily true if $x$ is not in the neighborhood of $\omega^P_x = 1$ in the first place, and so is of lesser practical interest.
Next, we briefly describe the key applications of the Qnet framework explored in this study, highlighting the predictions made and validations obtained.

A Biology Aware Phylogeny
There is more than one computational approach to constructing phylogenies, but a majority of these algorithms require a notion of distance between biological sequences, and the edit distance is the one most commonly used. Using the Qnet-induced distance described earlier, we can construct phylogenetic trees distinct from those obtained using the classical metric. More importantly, the Qnet-induced phylogeny is reflective of evolutionary change in a manner that conventional trees are not. As we follow a path in a Q-phylogeny, we can explicitly compute the probability of the changes represented by that path. This probability is bounded above and below by a function of the total path length, i.e., the sum of the q-distances along the path. We can show that for the path $x = x_0 \to \cdots x_k \to \cdots x_m = z$, we have:
$$\frac{\sqrt{8}N^2}{1-\alpha}\Theta \geq \log Pr(x \to z) - \sum_{i=1}^{m} \log \omega_{x_i} \geq -\frac{\sqrt{8}N^2}{1-\alpha}\Theta, \quad \textrm{where } \Theta = \sum_{k=1}^{m} \theta(x_{k-1}, x_k) \quad (13)$$
Considering only the lower bound,
$$\log \frac{Pr(x \to z)}{\prod_{i=1}^{m} \omega_{x_i}} \geq -\frac{\sqrt{8}N^2}{1-\alpha}\Theta \quad (14)$$
where $\omega_{x_i}$ is the membership probability in the base population of the strain $x_i$. Thus, we relate closer phylogenetic distance to an explicit probability of spontaneous jump. Note that the definition of the distance function in the Qnet framework allows the summation in Eq. (13), allowing standard tools to construct the phylogenetic tree.

Predicting Seasonal Strains
Analyzing the distribution of sequences using the q-distance allows us to estimate seasonal drift, which is particularly applicable to Influenza and Influenza-like viruses for which periodic adjustments of vaccine components are necessary to account for antigenic variations.
Our prediction is based on the following intuition: since the probability of a spontaneous jump to a strain further away in the q-distance is exponentially lower, the q-centroid of the strain distribution (the centroid computed in the q-distance metric) observed over a season is expected to move slowly, and will be close to the dominant strain in the next season. Thus, we estimate the predicted dominant strain $x^{t+1}$ at time $t + 1$ as a function of the observed population at time $t$ as follows:
$$x^{t+1} = \arg\min_{x \in P^t} \sum_{y \in P^t} \theta(x, y) \quad (15)$$
where $P^t$ is the sequence population at time $t$. Here the unit of time is chosen to reflect the appropriate frequency over which vaccine components are re-assessed. In the case of Influenza, this is typically one year. Using this formulation, we test if the predicted strains actually turn out to be closer to the dominant strain in the classical edit distance, when compared against the WHO vaccine recommendation for that season. Our results in Fig. 2 in the main text show that our hypothesis turns out to be correct with few exceptions.

$$\omega^Q_y e^{\frac{\sqrt{8}N^2}{1-\alpha}\theta(x,y)} \geq Pr(x \to y) \geq \omega^Q_y e^{-\frac{\sqrt{8}N^2}{1-\alpha}\theta(x,y)} \quad (16)$$
where $\omega^Q_y$ is the membership probability of strain $y$ in the target population $Q$ (see Def. 4), and $\theta(x, y)$ is the q-distance between $x, y$ (see Def. 2).
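The centroid rule of Eq. (15) reduces to a one-line search once a distance function is available. The sketch below takes the distance as a parameter; the usage example plugs in a plain Hamming distance purely as a stand-in, where the actual method would use the q-distance from the seasonal Qnet. Function names are hypothetical.

```python
def predict_dominant(population, distance):
    """Return the strain minimizing the summed distance to all observed strains."""
    return min(population, key=lambda x: sum(distance(x, y) for y in population))

def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    return sum(ca != cb for ca, cb in zip(a, b))

if __name__ == "__main__":
    season = ["AAAA", "AAAT", "AATT", "AAAT"]
    # Stand-in distance only; replace with a q-distance for the actual rule.
    print(predict_dominant(season, hamming))
```

The search is quadratic in the number of observed strains, which is workable for seasonal populations of a few thousand sequences when pairwise distances are cached.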

Proof of Probability Bounds
Proof. Using Sanov's theorem [7] on large deviations, we conclude that the probability of a spontaneous jump from strain $x \in P$ to strain $y \in Q$, with the possibility $P \neq Q$, is given by Eq. (17). Writing the factors on the right hand side as in Eq. (18), we note that $\Phi^P_i(x_{-i})$, $\Phi^Q_i(y_{-i})$ are distributions on the same index $i$, and hence:
$$\left| \Phi^P_i(x_{-i})_{y_i} - \Phi^Q_i(y_{-i})_{y_i} \right| \leq \sum_{y_i \in \Sigma_i} \left| \Phi^P_i(x_{-i})_{y_i} - \Phi^Q_i(y_{-i})_{y_i} \right| \quad (19)$$
Using a standard refinement of Pinsker's inequality [19], and the relationship of Jensen-Shannon divergence with total variation, we get a bound in which $a_0$ is the smallest non-zero probability value of generating the entry at any index. We will see that this parameter is related to the statistical significance of our bounds. First, we can formulate a lower bound as follows:

Multivariate Regression to Identify Factors in Strain Prediction
We investigate the key factors that contribute to our successful prediction of the dominant strain in the next season. We carry out a multivariate regression with data diversity, the complexity of the inferred Qnet, and the edit distance of the WHO recommendation from the dominant strain as independent variables. Here we define data diversity as the number of clusters we have in the input set of sequences, such that any two sequences five or fewer mutations apart are in the same cluster. Qnet complexity is measured by the number of decision nodes in the component decision trees of the recursive forest.
We select several plausible structures of the regression equation, and in each case conclude that data diversity has the most important and statistically significant contribution (See SI-Tab. XV in Supplementary text).
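The data diversity measure defined above can be sketched as a connected-components count. The sketch assumes single-linkage clustering, i.e., clusters are the transitive closure of the "five or fewer mutations apart" relation, and counts mutations as mismatches between aligned sequences; both are our interpretation, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    return sum(ca != cb for ca, cb in zip(a, b))

def count_clusters(seqs, max_dist=5):
    """Count clusters where sequences within max_dist mutations are linked."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})

if __name__ == "__main__":
    toy = ["A" * 10, "A" * 8 + "TT", "T" * 10]
    print(count_clusters(toy))
```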

DATA MANAGEMENT
Models generated in this study are included as supplementary material, and working software is publicly available at https://pypi.org/project/quasinet/. Accession numbers of all sequences used, and acknowledgement documentation for GISAID sequences, are also available as supplementary information.

ACKNOWLEDGMENTS
This work is funded in part by the Defense Advanced Research Projects Agency (DARPA) project #FP070943-01-PR. The claims made in this study do not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
SI-Fig. 1. Sequence comparisons. Comparing against the observed dominant strain, we note that the correct Qnet deviations tend to be within the RBD, both for H1N1 and H3N2 for HA (panel a shows one example). Additionally, by comparing the type, side chain area, and the accessible side chain area, we note that the changes often have very different properties (panels b-f). Panels g-i show the localization of the deviations in the molecular structure of HA, where we note that the changes are most frequent in the HA1 subunit (the globular head), and around residues and structures that have been commonly implicated in receptor binding interactions, e.g., the 200 loop, the 220 loop and the 180-helix.

SI-Fig. 3. Q-distance validation in silico using Influenza A sequences from the NCBI database. Panel a illustrates that the Qnet-induced modeling of evolutionary trajectories initiated from known haemagglutinin (HA) sequences is distinct from random paths in the strain space. In particular, random trajectories have more variance and, more importantly, diverge to different regions of the landscape compared to Qnet predictions. Panels b-e show that unconstrained Q-sampling produces sequences that maintain a higher degree of similarity to known sequences, as verified by blasting against known HA sequences, have a smaller rate of growth of variance, and produce matches in closer time frames to the initial sequence. Panel c shows that this is not due to simply restricting the mutational variations, which increase rapidly in both the Qnet and the classical metric.