Emergence of a putative novel SARS-CoV-2 P.X lineage harboring N234P and E471Q spike protein mutations in Southern Brazil

Novel SARS-CoV-2 lineages are constantly reported worldwide, raising concerns about transmissibility, virulence, immune response and vaccine/antigenic escape. Variants of concern (VOCs), such as B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma) and B.1.617.2 (Delta), caused epidemic outbreaks due to their higher transmissibility when compared with earlier waves of SARS-CoV-2 in 2019. The B.1.1.28 lineage has been evolving in Brazil since February 2020 and gave rise to P.1 (VOC), P.2 (VOI) and other P.X lineages proposed as new variants. This lineage harbors specific defining mutations, including two non-synonymous substitutions in the Spike (S) protein (D614G and V1176F). In this study, employing variant calling analysis on FASTQ reads and phylogenetic inference, we report a potentially new SARS-CoV-2 P.X variant. The mutational profile obtained by variant calling showed additional non-synonymous mutations when compared to B.1.1.28, including N234P and E471Q in the S protein. Further studies are required to understand the spread of the P.X variant and its potential effects on transmissibility and immune escape.


Introduction
Since the beginning of the SARS-CoV-2 pandemic in 2019, we have seen the consecutive emergence of Variants of Concern (VOCs), such as B.1.1.7 (Alpha) [1], B.1.351 (Beta) [2], P.1 (Gamma) [3] and B.1.617.2 (Delta) [4], which were first identified in the UK, South Africa, Brazil and India, respectively. These VOCs have been responsible for a large number of cases and increased transmissibility. Antibody response, virulence, reinfection potential and vaccine efficacy against the emerging variants are not yet fully known, posing a risk of future outbreaks and to the efficacy of vaccination programs [5]. In Brazil, the slow vaccination program and the lenient restrictive measures are certainly a risk for the emergence of new variants. SARS-CoV-2 phylogenetic studies are essential for monitoring new variants and for highlighting specific mutations that could confer fitness advantages and immunological resistance [6]. Corona-ômica-BR, a Brazilian network of researchers dedicated to SARS-CoV-2 genomic surveillance in Brazil, has strongly contributed to keeping the monitoring updated and available (http://www.corona-omica.brmcti.lncc.br/#/).
In this post, we report the finding of a potential new P.X variant. A total of 26 high-quality SARS-CoV-2 whole genomes were sequenced on the Illumina MiSeq platform. The sequences generated herein were first classified as B.1.1.28, but showed a unique profile of additional non-synonymous mutations when a variant calling approach was applied to the FASTQ reads in Geneious Prime software. Variant calling and phylogenetic tree analysis were carried out in order to describe the putative new variant. This analysis allows the emergence of new variants to be monitored in real time, contributing to SARS-CoV-2 genomic surveillance studies.

Methods
The SARS-CoV-2 genomic study was conducted at Laboratório de Microbiologia Molecular, Universidade Feevale, Rio Grande do Sul (RS), Brazil. A total of 26 SARS-CoV-2 positive samples were collected between late April and mid-June. Twenty samples were from passengers in Uruguaiana city crossing the border between Argentina and RS in Southern Brazil; four samples were from Secretaria Municipal de Saúde (SMS) and the remaining samples were from Garibaldi city, RS, received in our laboratory.
Brazilian SARS-CoV-2 complete genomes and the reference sequence (EPI_ISL_402124) (>29 kb) were retrieved from the GISAID database and aligned with the sequences generated herein. Sequence alignment was performed using Clustal Omega, and the reference sequence from Wuhan was used as an outgroup. Maximum Likelihood (ML) phylogenetic analysis was performed in the IQ-TREE v2.1.2 web server [7] under the General Time Reversible model, allowing for a proportion of invariable sites and with substitution rates inferred empirically, applying 200 replicates and 1000 bootstrap; the analysis corroborated the previous results. Furthermore, variant calling was conducted in Geneious Prime software, comparing the FASTQ sample dataset against a known reference sequence (hCoV-19/Wuhan/WIV04/2019) to identify Single Nucleotide Polymorphisms (SNPs) and amino acid changes.
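To illustrate how a called SNP is translated into an amino acid change, a minimal Python sketch is given here. It is not the Geneious Prime procedure used in this study; the function names `amino_acid_change` and `is_non_synonymous`, the toy coding sequence and the 1-based coordinate convention are assumptions made only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_change(ref_cds: str, nt_pos: int, alt_base: str) -> str:
    """Return a label such as 'D2G' for a single-nucleotide substitution.

    ref_cds  : reference coding sequence, in frame from the first codon
    nt_pos   : 1-based nucleotide position within ref_cds (assumed convention)
    alt_base : base observed in the reads at that position
    """
    ref_cds = ref_cds.upper()
    codon_index = (nt_pos - 1) // 3      # 0-based codon number
    offset = (nt_pos - 1) % 3            # position of the SNP inside the codon
    start = codon_index * 3
    ref_codon = ref_cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_index + 1}{CODON_TABLE[alt_codon]}"


def is_non_synonymous(label: str) -> bool:
    """True when the reference and alternate residues differ."""
    return label[0] != label[-1]


# Toy example (not SARS-CoV-2 data): ATG GAT TAA encodes M, D, stop.
toy_cds = "ATGGATTAA"
for pos, alt in [(5, "G"), (6, "C")]:
    label = amino_acid_change(toy_cds, pos, alt)
    print(label, "non-synonymous" if is_non_synonymous(label) else "synonymous")
```

The sketch handles one substituted base at a time. A change that alters more than one base of the same codon would require applying all substitutions to the codon before translating it.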

Results And Discussion
Consensus sequences were classified as B.1.1.28 according to the Pangolin COVID-19 Lineage Assigner tool (https://github.com/hCoV-2019/pangolin). Nevertheless, when the FASTQ reads were analyzed by variant calling, additional non-synonymous mutations were identified relative to those of B.1.1.28. Across the 26 complete genomes, eight additional non-synonymous mutations were identified in ORF1ab (Q3777H, M3934I, Q3998R, L4182F, E4572D, RV4573QI, Q4576K, Y5601I), seven in the S protein (G232A, I233L, N234P, N234K, T236S, V362E, E471Q), two in the N protein (P13L, D63E) and one in ORF8 (A65S). Among those, six drew attention due to their total frequency in the samples and appear to be signatures of the putative new P.X variant, especially N234P and E471Q in the S protein, and M3934I and L4182F in ORF1ab (Table 1). B.1.1.28 and P.2 were the predominant lineages in Brazil until the introduction of P.1 [8], which almost completely replaced them. It is important to highlight that B.1.1.28 lineage-defining mutations were also found in ORF1ab (L3930F and P4715L), the S protein (D614G and V1176F) and the N protein (RG203KR), confirming the descent of this putative new lineage. It is likely that B.1.1.28 is evolving and that the findings described herein represent a real-time view of this process. The fact that the additional mutations have not yet been detected in all of the sequences is expected, as described in a P.1 introduction study, in which just 13% of B.1.1.28 sequences already presented the P.1 signature mutation E484K [9]. Furthermore, it would not be surprising if the additional mutations are completely replaced in the coming months.
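The comparison described above, separating lineage-defining mutations from additional ones, can be sketched with simple set operations. This is only an illustrative Python sketch: the defining mutations are those listed in the text, while the function name `split_mutations` and the example sample are hypothetical and do not correspond to any genome in Table 1.

```python
# B.1.1.28 lineage-defining mutations as listed in the text, keyed by gene.
B_1_1_28_DEFINING = {
    ("ORF1ab", "L3930F"),
    ("ORF1ab", "P4715L"),
    ("S", "D614G"),
    ("S", "V1176F"),
    ("N", "RG203KR"),
}


def split_mutations(sample_mutations):
    """Return (missing_defining, additional) for one sample.

    missing_defining : defining mutations absent from the sample
    additional       : mutations in the sample beyond the defining set
    """
    sample = set(sample_mutations)
    missing_defining = B_1_1_28_DEFINING - sample
    additional = sample - B_1_1_28_DEFINING
    return missing_defining, additional


# Hypothetical sample carrying the defining set plus three additional changes.
example_sample = B_1_1_28_DEFINING | {
    ("S", "N234P"),
    ("S", "E471Q"),
    ("ORF1ab", "M3934I"),
}

missing, additional = split_mutations(example_sample)
print("Missing defining mutations:", sorted(missing))
print("Additional mutations:", sorted(additional))
```

An empty set of missing defining mutations is consistent with descent from B.1.1.28, while the additional set holds the candidate signatures to be tracked across samples.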
Among the likely P.X lineage S protein signatures, N234P was found in 13 genomes and E471Q in 23. Where these mutations were present, the minimum and maximum frequency per sample was around 26-30% and 70-78%, respectively (Table 1). E471Q has already been detected in some countries, such as India, in a study of locally circulating variants [10]; nevertheless, it has never been reported in Brazil, nor has it been associated with the B.1.1.28 lineage. A systematic study of all possible future S protein mutations tested E471Q, and the results showed increased binding affinity at the receptor-binding domain (RBD) [11], which can increase viral fitness.
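Two different quantities are reported here: the number of genomes carrying a mutation, and the frequency of the mutation among the reads of a single sample. The Python sketch below keeps them apart. The genome counts are taken from the text; the function names and the read counts in the last lines are hypothetical and are not data from Table 1.

```python
TOTAL_GENOMES = 26                                   # from the text
GENOMES_WITH_MUTATION = {"N234P": 13, "E471Q": 23}   # from the text


def prevalence(n_with_mutation: int, n_total: int) -> float:
    """Fraction of genomes in which the mutation was detected."""
    return n_with_mutation / n_total


def variant_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


for mutation, count in GENOMES_WITH_MUTATION.items():
    share = prevalence(count, TOTAL_GENOMES)
    print(f"{mutation}: {count}/{TOTAL_GENOMES} genomes ({share:.1%})")

# Hypothetical read counts for one site in one sample (illustration only).
alt_reads, depth = 52, 200
print(f"Per-sample frequency: {variant_frequency(alt_reads, depth):.0%}")
```

Per-sample frequencies well below 100% indicate that the mutation is present in only part of the reads, which is why it can be absent from a consensus sequence while still being detected by variant calling on FASTQ reads.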
The ML phylogenetic tree showed that the sequences generated herein clearly clustered into a separate group in a highly supported monophyletic clade (Figure 1).

Table 1 is available in the Supplementary Files.