1. CFR comparison in different Countries.
Mortality calculations during epidemics are difficult, mostly because of calculation biases: during the initial period of the epidemic, many patients were diagnosed with COVID-19 only after developing critical illness or even at the time of death, whereas asymptomatic or paucisymptomatic patients were untested, leading to an underestimation of the denominator. Additional significant biases affect mortality curves, including the criteria used for counting deaths, the stringency of lockdown measures and the age structure of the population. Over time, Countries adopted better policies for diagnostic PCR testing and lockdown strategies; consequently, the spread of the virus was better monitored and the data were more carefully determined. We chose to analyze the Country-specific data on the number of COVID-19 deaths in April 2020, when some of the initial biases were likely attenuated, using a previously described method. The number of deaths on a specific day was divided by the total number of infected cases reported 14 days earlier. This method accounts for the fact that 14 days is the estimated average lag time between first symptoms and death. The data analyzed for Italy, France, Germany, Spain, UK, Sweden and USA are reported in Figure 1(a). For all Countries we observed a decrease in CFR values over time, with the exception of Germany (which maintains a very low value overall) and Sweden (where no decrease is observed). We identified two critical elements that might affect CFR among these Countries: a) the number of PCR tests performed and b) the total number of positive cases for each Country. Since parameter b) depends on parameter a), we introduced a Country-specific corrective factor ρ = a / b, which was then used to normalize the previously calculated CFR (Table 1). Data obtained through this normalization model are reported in Figure 1(b).
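The lagged CFR and the corrective factor described above can be illustrated with a minimal Python sketch. It assumes that deaths and cases are cumulative daily counts aligned on the same dates; the function names `lagged_cfr` and `rho` are hypothetical. Because the text does not spell out how ρ enters the normalization, the sketch stops at computing ρ itself.

```python
from typing import List, Optional, Sequence

LAG_DAYS = 14  # average lag between first symptoms and death, as stated in the text


def lagged_cfr(cum_deaths: Sequence[int], cum_cases: Sequence[int],
               lag: int = LAG_DAYS) -> List[Optional[float]]:
    """CFR (%) for each day t: deaths at day t divided by cases at day t - lag.

    Assumption: both series are cumulative counts on the same daily grid.
    Days without a usable denominator yield None.
    """
    if len(cum_deaths) != len(cum_cases):
        raise ValueError("series must have the same length")
    out: List[Optional[float]] = []
    for t in range(len(cum_deaths)):
        if t < lag or cum_cases[t - lag] == 0:
            out.append(None)  # no cases reported `lag` days earlier
        else:
            out.append(100.0 * cum_deaths[t] / cum_cases[t - lag])
    return out


def rho(pcr_tests: int, positives: int) -> float:
    """Country-specific corrective factor rho = a / b (PCR tests per positive case)."""
    if positives <= 0:
        raise ValueError("positives must be > 0")
    return pcr_tests / positives
```

With toy numbers and a shortened lag, `lagged_cfr([0, 0, 1, 2, 4], [10, 20, 40, 80, 160], lag=2)` returns `[None, None, 10.0, 10.0, 10.0]`; these values are purely illustrative and are not taken from the study data.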
By taking only the data calculated for the 30th of April and representing them in a bubble plot (Figure 2), we identify three clusters of Countries. Group 1 includes Germany and has a very low normalized CFR (0.31%, C.I. (95%) [0.29:0.33], on April 30th 2020). Group 2 includes Italy, USA and Spain and has an intermediate normalized CFR (1.62%, C.I. (95%) [1.51:1.72]; 1.65%, C.I. (95%) [0.97:2.33]; 1.76%, C.I. (95%) [1.36:2.15], respectively, on April 30th 2020). Group 3 includes France, Sweden and UK and has the highest normalized CFR (3.49%, C.I. (95%) [3.23:3.76]; 3.92%, C.I. (95%) [3.83:4.02]; 3.90%, C.I. (95%) [3.25:4.27], respectively, on April 30th 2020). The difference among the clusters' CFR (0.31% vs 1.68% vs 3.78%, respectively) was statistically significant (p<0.001). All pairwise comparisons were also significant (Holm-adjusted p<0.001).
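For readers unfamiliar with the Holm correction applied to the pairwise comparisons, the following is a minimal sketch of the step-down adjustment in plain Python. The function name `holm_adjust` and the example p-values are illustrative assumptions; the sketch does not reproduce the statistical test that generated the p-values in this study.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

For three hypothetical pairwise p-values `[0.01, 0.04, 0.03]`, the adjusted values are 0.03, 0.06 and 0.06 (up to floating-point rounding), in the original order.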
This result could be further refined by considering the variability of the lag time with patient age: older people (>70 y.o.) have a shorter lag time compared to others. However, even though the daily number of deceased patients stratified by age is available for each Country, we could not provide a further age-adjusted normalization of the CFR in this study, since a comparable daily database of infected people stratified by age is not publicly available. Nevertheless, since the infection is mostly fatal in older people or in those with ongoing severe illnesses (e.g. cardiovascular diseases, diabetes, cancer), we can speculate that the overall CFR estimate is driven by this class of patients. Therefore, the differences in the CFR curves observed among Countries after introducing the corrective factor ρ might be explained mainly by the different policies enacted by each Country. In support of this hypothesis, we note that in the Countries of group 3, where lockdown was not put in place (Sweden) or was adopted late and fewer SARS-CoV-2 PCR tests were performed (UK and France), the normalized CFR is higher than in the other groups. Although further data are needed to refine the CFR estimation, we improved the CFR estimate by using a new corrective factor that considers two important variables (number of positives and number of PCR tests performed). Indeed, several sources of variability affect CFR, but for modifiable confounding factors a standardization process could help to reduce biases, improving the interpretability and comparability of CFR across Countries.
2. Lockdown impact on viral mutation spread
A set of 487 genome sequences isolated from patients infected with SARS-CoV-2 in Italy, Spain, Germany, France, UK, Sweden and USA was randomly collected from the GISAID database, aligned and compared to the SARS-CoV-2 reference genome. A total of 27 genomes were considered for January 2020, 91 for February 2020, 210 for March 2020 and 159 for April 2020. We analyzed 54 genome samples collected in Italy, 61 in Spain, 62 in Germany, 52 in France, 80 in UK, 50 in Sweden and 128 in the United States (Table 2).
We studied the evolution of the mutation patterns in the selected Countries from January to April 2020, and we report only the recurrent mutations occurring more than 10 times in the time range considered, as described elsewhere. The occurrence of each mutation in a specific Country was normalized by the number of genomes collected in that geographic area for each timeframe, and silent mutations were separated from non-silent ones (Figure 3). Interestingly, the number of nonsynonymous mutations increased over time as the virus spread out of Asia, and appears to stabilize in April (Figure 3, top panel). The pattern of nonsynonymous mutations changed quite dramatically from January to February, when such mutations appeared for the first time. In more detail, part of the genomes analyzed in January 2020 belong to patients infected in China or to patients in close contact with people travelling to or returning from Asia. In February, most Countries first suspended flights to and from China and subsequently maintained only limited connections with other nations; during that month, locally transmitted outbreak cases occurred. We observed a pattern of recurrent mutations that reached a homogeneous distribution across the different Countries in March 2020. This observation was confirmed in April 2020 in all the analyzed Countries. It is likely that the lockdown policies implemented in this period greatly reduced further viral spread from Asia and hampered the mixing of SARS-CoV-2 strains among Countries. We observed a similar pattern for silent mutations (Figure 3, bottom panel).
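The filtering and normalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: each genome is represented as a hypothetical `(country, month, mutated positions)` tuple, a mutation is counted at most once per genome, and the name `recurrent_frequencies` is not taken from the original analysis pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (country, month, mutated nucleotide positions)
Genome = Tuple[str, str, List[int]]


def recurrent_frequencies(genomes: Iterable[Genome], min_count: int = 10
                          ) -> Dict[Tuple[str, str], Dict[int, float]]:
    """Per (country, month): fraction of genomes carrying each recurrent mutation.

    A mutation is kept only if it occurs more than `min_count` times overall.
    """
    genomes = list(genomes)

    # Count, over the whole time range, how many genomes carry each mutation.
    total = Counter()
    for _, _, muts in genomes:
        total.update(set(muts))
    recurrent = {pos for pos, n in total.items() if n > min_count}

    n_genomes = Counter()         # genomes collected per (country, month)
    hits = defaultdict(Counter)   # carriers of each recurrent mutation
    for country, month, muts in genomes:
        key = (country, month)
        n_genomes[key] += 1
        for pos in set(muts) & recurrent:
            hits[key][pos] += 1

    # Normalize occurrences by the number of genomes in that area and timeframe.
    return {key: {pos: c / n_genomes[key] for pos, c in hits[key].items()}
            for key in n_genomes}
```

Silent and non-silent mutations could then be plotted separately by splitting the returned positions into two sets; that annotation step is not shown here.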
Overall, our data show a number of recurrent silent mutations (nt241, nt3037) and nonsynonymous mutations (nt14408, nt23403 and nt28881-28882-28883) (Figure 3). Among the nonsynonymous mutations, we note the occurrence of a previously reported mutation at position 14408, which is located in the gene encoding the viral RNA-dependent RNA polymerase (RdRp), a key component of the replication/transcription machinery. This mutation (Figure 3, depicted in red) emerged in February 2020 and is quite homogeneously distributed across all the Countries analyzed. The same holds for a mutation occurring in the spike protein (nt23403, Figure 3, depicted in black) and, to a lesser extent, for a mutation in the nucleocapsid phosphoprotein (nt28881-28882-28883, Figure 3, depicted in blue). The RdRp mutation (nt14408) always co-occurs with the spike protein mutation (nt23403), the nsp3 mutation (nt3037) and the mutation in the leader protein (nt241). A different pattern of hotspot mutations characterized viral genomes detected in patients from the United States. In February we initially detected three novel mutations (at positions 17747, 17858 and 18060) that were not found elsewhere. These mutations were found predominantly in the viral genomes sequenced in Washington State (USA). The occurrence of this isolated pattern over time reflects the spread of a more “European-like” strain (nt241, nt3037, nt14408 and nt23403) in the rest of the US. Overall, the occurrence of this “European-like” group varies from 32.5% of analyzed genomes (in USA) to 100% (in Italy). Our data confirm the previous observations made by Korber et al., who hypothesized that this mutation group, associated with the G clade, could enhance viral fitness, possibly because the nt23403 mutation causes a significant amino acid substitution in a strongly immunogenic linear epitope of the Spike protein, which might affect sensitivity to neutralizing antibodies.
3. Emergence of new mutations
We noted the emergence of other recurrent mutation sites over time, both nonsynonymous (nt25563, nt28863) and silent (nt2480, nt2558, nt9476, nt15324, nt20268 and nt28656). The nonsynonymous mutations occur in ORF3a and ORF9 (nucleocapsid phosphoprotein), causing the amino acid substitutions Q56H (glutamine to histidine) and S197L (serine to leucine), respectively. All these mutations are found in most Countries and are not restricted to a specific geographic area. An additional recurrent mutation was detected exclusively in genomes from Sweden at nt24368 (G to T transversion); this mutation, which is located in the spike protein sequence, appeared in March (carried by 20% of genomes analyzed) and its frequency more than doubled in April (52% of genomes analyzed). This mutation causes an amino acid substitution at position 936, from an aspartic acid to a tyrosine, with a significant shift in isoelectric point from 2.85 to 5.64. The D936 residue in the SARS-CoV-2 Spike protein corresponds to the E918 residue of the homologous protein of SARS-CoV, and it is located in the heptad repeat 1 (HR1) domain [14, 15]. Heptad repeat 1 interacts with the heptad repeat 2 (HR2) domain and forms a six-helix bundle fusion core, which brings viral and cellular membranes into close proximity, promoting fusion and infection of the host cell [16, 17]. This makes HR1 and HR2 good target candidates for drug design. Recently, D936 (site of the recurrent mutation) has been shown to bind to R1185 of the heptad repeat 2 (HR2) domain through a salt bridge. Additional studies are required to determine whether the Y936 mutant, present in April in more than half of the Swedish genomes analyzed, could confer an advantage in terms of viral fitness, as hypothesized for mutation nt23403.
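The correspondence between the nucleotide position nt24368 and residue 936 of the spike protein can be checked with simple arithmetic. The sketch below assumes that the spike coding sequence starts at genomic position 21563, as annotated in the reference genome NC_045512.2; this coordinate is not stated in the text, and the helper name `codon_of` is hypothetical.

```python
# Assumption: 1-based start of the spike (S) coding sequence in the
# reference genome NC_045512.2; this coordinate is not given in the text.
SPIKE_CDS_START = 21563


def codon_of(nt_position: int, cds_start: int = SPIKE_CDS_START):
    """Return (1-based residue number, 1-based position within the codon)."""
    offset = nt_position - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


print(codon_of(24368))  # (936, 1): first base of codon 936
print(codon_of(23403))  # (614, 2): second base of codon 614
```

Under this assumption, nt24368 is the first base of codon 936. A G to T change at the first position of an aspartate codon (GAT or GAC) gives a tyrosine codon (TAT or TAC), which is consistent with the aspartic acid to tyrosine substitution reported above.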
Among the Countries in the different groups there are no significant differences in the distribution of mutations, since the recurrent mutation pattern is comparable among Countries (Figure 3, top panel). The only significant difference is the newly emerged mutation nt24368, which in our dataset was detected only in the genomes analyzed in Sweden.