NoV outbreak settings and geographical locations
According to ten county-level CDCs, there were 203 NoV outbreaks in Shenzhen between the period September 2015 and August 2018. The most outbreaks were from Nanshan and no outbreak was from Yantian (Figure 1). Information on the outbreak size was reported for 197 (97.0%), ranging from 5 to 115 cases per outbreak (Table 1). Of the 203 outbreaks, 91.6% occurred in school settings, with 8.4% occurring in non-school settings (Table 2). The reported outbreaks peaked in the cold season, especially from November to March (Figure 2).
Genotypic distribution of identified NoV
Of the 203 outbreaks detected as caused by NoV according to real-time RT-PCR from September 2015 to August 2018, 150 were successfully genotyped. Of these 150 outbreaks with genotype information, 137 (91.3%,137/150) and 12 (8.0%,12/150) were classified into GII and GI genogroups. 1 (0.6%,1/150) outbreak involved both GI- and GII-positive samples. A total of 15 capsid genotypes and 15 polymerase genotypes were identified. The dominant genotype was GII.2[P16]. In addition, we identified a novel recombinant genotype GII.Pe/GII.17 that had not been previously found in Shenzhen before (Table 3).
Genotype distribution and outbreak characteristics
For outbreaks caused by the GII.2 strain, most occurred in school settings: 73 (79.3%, 73/92) outbreaks occurred in child care centre and the dominant setting distribution (child care centre, primary school, middle school) of GII.2 infection showed no significant differences (Fisher's exact test=3.595, p=0.177). Of the thirteen outbreaks caused by the GII.3 strains, most (86%, 11/13) also occurred in child care centre.
Intergenic recombination
To determine whether all the GII.2[P16] strais in Shenzhen are recombinant strains, 21 nearly full-length genome sequences from Shenzhen were analysed using Simplot software v.3.5.1, with GII.P16/GII.4 Sydney2012 (KY887605.1), GII.2[P16] (LC20944.8) and GII.2[P16] (MG746027.1) as the reference strain sequences. The results showed that the ORF1 region of GII.2[P16] in Shenzhen is highly homologous to that of GII.P16/GII.4 Sydney2012. ORF2 and ORF3 are highly homologous with LC20944.8 and MG746027.1. The breakpoint was found to be 5003 nt, close to the ORF1/ORF2 overlap region (Figure 3).
Phylogenetic analysis of RdRp region and VP1 sequence of GII.2[P16] Strain
To examine strain evolution, 52 full-length RdRp region of GII.2[P16] strain sequences from Shenzhen and 95 reference sequences from GenBank were collected for analysis. MCMC chains were run for 1.0×108 steps for the RdRp region sequences. Effective sample sizes greater than 200 were confirmed by the Tracer. According to the MCC (Maximum clade credibility) tree, the evolutionary rate of the RdRp region of the Gll.P16/GII.2 strain was estimated as 2.1×10-3 substitutions/site/year (95% HPD interval, 1.7×10-3- 2.5×10-3 substitutions/site/year). The common ancestors of the GII.2[P16] strain from Shenzhen and GII.P16/GII.4 Sydney 2012 diverged from 2011 to 2012, and the prototype of the RdRp region of the GII.2[P16] strain from Shenzhen formed during 2012 to 2013. The phylogenetic analyses suggested that the RdRp region evolved from GII.P16/GII.4 Sydney2012 (Figure 4A).
Simultaneously, 72 full-length VP1 sequences of GII.2[P16] retrieved from Shenzhen and 146 GenBank reference sequences were used to explore the evolutionary rate. MCMC chains were run for 2.0×108 steps for the VP1 sequence, Effective sample sizes were greater than 200, as confirmed by Tracer. The evolutionary rate of the VP1 sequence of the GII.2[P16] strain was estimated at 2.7×10-3 substitutions/site/year (95% HPD interval, 2.4×10-3-3.1×10-3 substitutions/site/year) based on the MCC tree. The common ancestors of the GII.2[P16] strain from Shenzhen and previous GII.2[P16] diverged from 2003 to 2004, and the prototype of VP1 sequence of GII.2[P16] strain in Shenzhen was formed during 2013 to 2014. The phylogenetic analyses suggested that VP1 evolved from GII.2[P16] (2010-2012) (Figure 4B).
Amino acid mutations of non-structural region of GII.2[P16]
To explore the amino acid mutations within the non-structural region of the recombinant strains, 14 nearly full-length non-structural protein sequences and 22 full-length reference sequences, including GII.17[P16] (2002), GII.2[P16] (2009-2014), GII.2[P16] (2010-2012), GII.13[P16] (2015), GII.3[P16] (2012-2013), GII.4[P16] (2015-2016) and GII.17[P16] (2016-2018), from GenBank were aligned. Sequence data revealed 102 (6%) parsimony-informative sites, but no amino acid mutation in non-structural region of the GII.2[P16] recombinant strain. Furthermore, 6 amino acid substitutions (*77E, R750K, P845Q, H1310Y, K1546Q, T1549A) were found only in recent strains (GII.4 Sydney 2012[P16] and the GII.2[P16] recombinant strain), 2 sites (A644P, A1521V) were substituted in the GII.2[P16] recombinant strains and 1 site (S/T753T) reverted. The resulst showed that amino acid 1310 (Motifs G) was substituted (Table 4).
Amino acid mutations of HBGA-binding and epitopes sites of the GII.2[P16]
To explore HBGA-Binding profile, predicted epitopes and epitope A to E sites of the GII.2[P16] recombinant strain[24-26], 72 full-length VP1 sequences from this study and 65 reference sequences, including GII.2[Pc] (1976-1978), GII.2[Ph] (1997), GII.2[P2] (1987-2015), GII.2[P12] (2004-2006), GII.2[P21] (2010), GII.2[Pe] (2014), GII.2[P16] (2010-2012), GII.2[P16] (2008-2014) and GII.2[P16] (2016-2018), from 1975 to 2018 were collected and aligned. Sequencing data revealed 29 parisimony-informative sites but there were no mutations in the HBGA-binding profile, predicted epitopes and epitope A to E of the GII.2[P16] strain (S2 Table).