Genomic Characteristics and Evolutionary Analysis of a Rare GI.1 Norovirus Isolate From Beijing, China


Noroviruses are one of the main pathogens of acute gastroenteritis, causing frequent outbreaks worldwide every year that seriously affect human health. The GII.4 genotype causes most norovirus (NoV) infections and large-scale outbreaks. By contrast, the GI genotype is relatively rare. In this study, the whole genome of a newly isolated strain, ZD, from a patient in Beijing, China, was sequenced and analyzed. The ZD strain genome consisted of 7,597 nucleotides and contained three open reading frames. Whole-genome analysis indicated the strain was a GI.1 genotype, and no recombination site was detected in the genome. The histo-blood group antigen (HBGA) binding site associated with invasion of the GI genotype did not change, implying relatively conservative evolution. Phylogenetic analysis indicated the VP1 sequences of GI.1 strains could be divided into three clusters according to time of appearance: older (1968-2011), earlier (2011-2015), and new (2017-2018). Each cluster showed distinctive amino acid substitution characteristics, and the number of substitutions increased with time. The isolated ZD strain was in the new cluster. This study is the first to conduct a phylogenetic analysis of a GI genotype NoV isolated from Beijing. The results improve understanding of NoV diversity in China and can be a reference for further study of nondominant epidemic strains of NoVs as well as for epidemic prevention and control.


Introduction
Noroviruses constitute a genus in the family Caliciviridae. Virus particles are round, nonenveloped, and simple in structure, with only a single layer of capsid protein [1]. Formerly known as Norwalk virus, noroviruses were first collected from fecal samples during a gastroenteritis outbreak in Norwalk, Ohio, in 1968 [2]. In patients infected with norovirus, usual symptoms include vomiting, nausea, abdominal pain, fever, and self-limiting, nonhemorrhagic diarrhea. The incubation period is 24 to 48 h, and the symptoms usually last for 12 to 60 h [3].
The NoV genome is positive-sense single-stranded RNA that is 7.5 to 7.7 kb in length. The 5' end is covalently linked to the viral protein genome-linked (VPg), and the 3' end is polyadenylated. Except for those of murine NoVs, genomes of most NoVs contain three open reading frames (ORFs) [4]. The first (ORF1) encodes a polyprotein that is cleaved by the viral protease into six nonstructural proteins (NS1/2 to NS7), including the NoV protease and the RNA-dependent RNA polymerase (RdRp) [5]. The second (ORF2) encodes the main structural protein (VP1) of the virus, which comprises shell (S) and protruding (P) domains. The P domain includes P1 and P2 subdomains. The S domain surrounds the virus RNA, whereas the P domain connects to the S domain through a flexible hinge. The P2 subdomain is highly variable and is the most exposed region of the protein [6]. It includes the main neutralizing epitope, the binding site of histo-blood group antigen (HBGA), and the main antigenic determinant of the capsid [7]. The third (ORF3) encodes the minor structural protein (VP2) of the virus, which is located inside the virus particle and may be associated with capsid assembly and genome encapsidation [8]. A complete, mature NoV particle consists of 90 dimers of the major structural protein (VP1) and one or two copies of the minor structural protein (VP2), assembled with icosahedral symmetry.
Because suitable cell culture systems and animal models are lacking, NoVs cannot be readily cultured in vitro and serotyping cannot be used. Therefore, classification of NoVs is mainly based on the amino acid sequences of VP1 and RdRp. Currently, there are six recognized genogroups (GI to GVI) and 40 genotypes. In 2019, Preetti et al. expanded the number of NoV genogroups to 10 (GI to GX) on the basis of the diversity of amino acid sequences of VP1, with two of the genogroups tentative [9]. They suggested that complete genome sequences are needed to classify NoVs.
Human NoVs (HuNoVs) are one of the main pathogens of acute gastroenteritis (AGE), and they account for a very high proportion of diarrhea outbreaks [10]. Approximately 50% of AGE outbreaks worldwide are associated with NoVs. According to the World Health Organization (WHO), approximately 699 million people worldwide suffer from NoV infection annually, resulting in more than 200,000 deaths [11]. In addition, the average direct cost of NoVs to health systems is approximately 4 billion dollars per year, whereas the socioeconomic burden approaches 60 billion dollars [12]. Noroviruses are one of the leading causes of diarrhea in China, where 556 NoV outbreaks occurred from October 2016 to September 2018 [13]. Notably, the primary NoVs infecting patients in those outbreaks were GII, GI, and GIV genotypes. The GII genotypes occurred more frequently than GI genotypes, which in turn were more frequent than GIV genotypes. Among the genotypes, GII.4 was predominant in all outbreaks, which may be because of its rapid evolution and antigenic diversity achieved through recombination [14]. The frequency of GI incidence is associated with geographic area. In the United States from 2013 to 2016, NoV outbreaks caused by GI.1 were rare, accounting for approximately 0.3% [15]. Moreover, sequence information for GI.1 is relatively scattered.
In this study, a GI.1 NoV strain was isolated from clinical samples of Beijing Haidian Hospital in 2018 and named ZD. The whole genome of the ZD strain was sequenced, and its genomic characteristics were analyzed. The complete ZD sequence was compared with sequences of prevalent NoV strains worldwide to determine its phylogenetic position and evolutionary characteristics. The analysis showed that recombination did not occur in the ZD strain and that the HBGA-binding site, which is key for GI infection, was highly conserved. Phylogenetic analysis of all available GI.1 sequences showed that the prevalent strains differed between periods. In addition, strains prevalent in the same period had similar amino acid mutation characteristics, and the number of amino acid mutation sites accumulated over time. The study provides new information on the GI genome as well as insight into why particular genotypes of NoV strains are prevalent.

Extraction of norovirus RNA
Stool samples containing NoV were obtained in 2018 from Beijing Haidian Hospital, Beijing, China. Viral RNA was extracted using the MagaBio plus Virus RNA Purification Kit (BIOER, HangZhou Bioer Technology Co. Ltd., China) following the protocol provided by the manufacturer. In brief, approximately 200 mg of stool was weighed, and 2 ml of sterilized phosphate-buffered saline (PBS) was added. The mixture was centrifuged at 300 ×g for 5 min. The supernatant was removed and centrifuged at 16,260 ×g for 5 min. The supernatant was discarded, and the pellet was resuspended in 1 ml of PBS and centrifuged at 16,260 ×g for 5 min. That pellet was resuspended in 200 μl of PBS. An automatic nucleic acid extraction and purification instrument, Model NPA-32+ (BIOER, HangZhou Bioer Technology Co. Ltd., China), was used to extract virus RNA. Purified RNA was stored in a nuclease-free centrifuge tube at −80 °C.

Construction of norovirus RNA Library
KAPA Stranded RNA-seq Kit (Kapa Biosystems, FL, USA) was used to purify mRNA from the isolated RNA to prepare the RNA library. Samples of RNA, approximately 20 μl, were incubated at 94 °C for 4 min to deplete ribosomal RNA. The fragmented samples were prepared for first-strand synthesis. Reaction conditions were as follows: 4 min at 25 °C, 15 min at 42 °C, and 15 min at 70 °C, followed by 60 min at 16 °C. Agencourt® AMPure® XP reagent magnetic beads (New England BioLabs, USA) were used to purify samples after second-strand synthesis. A-tailing was then performed, and the reaction conditions were as follows: 30 min at 30 °C and 30 min at 60 °C. Adapters were ligated with KAPA T4 DNA ligase, and the PCR tube was incubated in a metal bath for 15 min at 20 °C. After two purifications with magnetic beads, library fragments were amplified by PCR with KAPA HiFi HotStart ReadyMix (2×), and the reaction conditions were as follows: predenaturation for 45 s at 98 °C; 20 cycles of 15 s at 98 °C, 30 s at 60 °C, and 30 s at 72 °C; and then 5 min at 72 °C. Magnetic beads were used to purify the amplified fragments. The concentration of the purified library was measured with Qubit, and the pooled library samples were then loaded onto the sequencer. Samples were sequenced as paired-end 150 bp reads (PE150) on an Illumina® (https://www.illumina.com/) NovaSeq. There was 4.1 Gb of clean ZD data.

Bioinformatics analysis
ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) was used to search for ZD ORFs and then compare those ORFs with corresponding ORFs in the reference genome (NC_001959.2). The Pfam sequence analysis tool (http://pfam.xfam.org/) was used to correct ORF positions [16]. The BLAST tool (http://www.ncbi.nlm.nih.gov/BLAST/) was used to search for similarity of nucleotide sequences, and BioAider (v1.3) was used to compare selected sequences in different genomes.
BioAider [17] was used to compare the ZD strain with the 2018 Shanghai strain (MT008453.1), which had the highest homology with ZD. The Norovirus Typing Tool (https://www.rivm.nl/mpf/typingtool/norovirus/) [18] was used to determine the genotypes. ClustalW in BioEdit (v7.0.9) was used for multiple alignment of selected sequences. After alignment, the best-fitting substitution model for the entire virus genome and for the different ORF sequences was selected with Find Best DNA/Protein Models in Molecular Evolutionary Genetics Analysis software (v10.0) (MEGA X). Whole genome sequences were analyzed using the GTR+G model, and ORF sequences were analyzed using the GTR+G+I model [19]. Phylogenetic trees were constructed by maximum likelihood (ML) in PhyML 3.0 with the previously selected optimal model parameters. A bootstrap test (100 replicates) was used. After multiple DNA sequence alignment with BioEdit, amino acid translations were obtained with MEGA X, and changes in amino acid sites of the main structural protein VP1 of related strains were compared. The GI.1 NoV strains with complete VP1 sequences used in this study are listed in Table 1.
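The ORF search described above can be illustrated with a minimal sketch. This is not the ORFfinder algorithm: it scans only the forward strand, accepts only ATG as a start codon, and uses a made-up toy sequence; the function name `find_orfs` and the `min_codons` parameter are hypothetical. It is meant only to show how ORFs in different reading frames can overlap, as they do in the NoV genome.

```python
# Minimal forward-strand ORF scan (illustrative; not the ORFfinder algorithm).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for ATG..stop ORFs.

    Coordinates are 1-based and inclusive; the stop codon is included.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)

# Toy sequence (hypothetical) with two ORFs in different frames.
toy = "ATGCCATGACCTAAGTGA"
for start, end, frame in find_orfs(toy):
    print(f"ORF in frame {frame}: {start}..{end}")
```

In the toy sequence the two ORFs share four nucleotides (positions 6 to 9), which is the same kind of frame-shifted overlap reported between adjacent NoV ORFs.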

Complete genome of the ZD strain
The complete nucleotide sequence of the ZD strain genome was 7,597 bp and included three ORFs spanning from 537 to 5,363 (ORF1), 5,347 to 6,939 (ORF2), and 5363 to 7,577 (ORF3) (Fig. 1). A 16-bp nucleotide overlap occurred between ORF1 and ORF2. The nucleotide sequence of the ZD strain genome was used as a BLAST query, and the query coverage was 100%. The highest homology (99.14%) was with the Shanghai strain isolated in 2018 (MT008453.1). Compared with the MT008453.1 nucleotide sequence, the ZD genome had 64 single nucleotide polymorphisms (SNPs) (Table S1). Among the SNPs, 57 were in coding regions, with 36 in ORF1, 16 in ORF2, and 5 in ORF3 (Table 2). Amino acid changes corresponding to the SNPs are also listed in Table 2. There were seven nonsynonymous mutations, with four in ORF1, one in ORF2, and two in ORF3. In addition, transitions were more frequent than transversions, and the transition-transversion ratio estimate was 20.3 (Table 3).

Table 3 Mutation types of DNA between ZD strain and MT008453.1 noroviruses

Mutation type      Amount
Transition T-C     37
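As an illustration of how SNPs between two aligned genomes can be sorted into transitions and transversions, the following minimal sketch counts both classes from a pairwise alignment. The sequences are toy data, the function name `classify_snps` is hypothetical, and the raw count ratio shown here is an assumption for illustration; the source does not state how its estimate of 20.3 was computed, and model-based estimates can differ from a simple count ratio.

```python
# Count transitions (purine<->purine, pyrimidine<->pyrimidine) and
# transversions (purine<->pyrimidine) between two aligned sequences.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_snps(ref, query):
    """Return (transitions, transversions) for two equal-length aligned sequences."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(ref.upper(), query.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip matches, gaps, and ambiguous bases
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions

# Toy alignment (hypothetical data, not the ZD or MT008453.1 sequences).
ts, tv = classify_snps("ACGTACGT", "GCGTCCAT")
print(f"transitions={ts}, transversions={tv}")
if tv:
    print(f"raw Ts/Tv ratio = {ts / tv:.2f}")
```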

Phylogenetic analysis of the ZD strain
The Online Norovirus Typing Tool v2.0 predicted that the ZD strain had the GI.1 genotype. To understand the genetic relations between the ZD strain and other NoV genotypes, a phylogenetic tree based on whole genomes was constructed (Fig. 2). Consistent with the prediction, the ZD strain was classified as genotype GI.1. As shown in Fig. 2, the ZD strain was most closely related to the Shanghai strain isolated in 2018, and the identity between the two strains was 99.14%, as obtained by BLAST. Previous studies show that gene recombination occurs frequently near the junction of ORF1 and ORF2 in human NoV, which is related to antigenic shift [20]. Recombination at the junction of ORF2 and ORF3 [14] is less common. Because antigen evolution is related to recombination [21], phylogenetic analyses were also performed on each ORF of the ZD strain and of selected NoV genomes to detect recombination (Fig. 3). The results showed that the ZD strain was nonrecombinant on the basis of both whole genome and single ORF analyses.

Comparison of the capsid protein VP1 sequence of the ZD strain with that of other GI.1 strains
Changes in the coding sequence of the viral capsid protein VP1 can reveal potential variations in antigenicity and receptor binding [21]. In addition, VP1 is an important component of norovirus vaccines [22]. Therefore, full-length VP1 proteins of all GI.1 genotype strains in GenBank were obtained. Notably, phylogenetic analysis of the aligned capsid protein VP1 sequences showed that all GI.1 strains, including the ZD strain, were divided into three clusters according to time of appearance. The GI.1 strains collected from 1968 to 2011 were clustered in an "older" cluster; those collected from 2011 to 2015 were clustered in an "earlier" cluster; and those collected from 2017 to 2018 were clustered in a "new" cluster (Fig. 4). The ZD strain collected in 2018 was in the "new" cluster. On the basis of multiple sequence alignments of the GI.1 capsid protein VP1, 30 amino acid sites of the 531 amino acids of VP1 changed, compared with the GI reference strain (NC_001959.2). "Older," "earlier," and "new" clusters were identified, consistent with those of the phylogenetic tree (Table 4). The results verified that all collected GI.1 NoVs could be clustered according to the time of collection. In the "older" cluster, there were no significant changes in amino acids compared with the reference strain. In the "earlier" cluster, the GI.1 strains began to change, but there were fewer changed sites, mainly concentrated at 252 (S252G). However, in the "new" cluster, multiple amino acid positions of the GI.1 strains changed consistently, namely S11N, V25A, A27T, A142S, S251G, and H286Q. There were also several exceptional strains that were collected in earlier years but appeared in more recent clusters, including a Swedish strain collected in 2007 (FJ384783.1) and a Japanese strain collected in 2003 (EF547392.1) in the "earlier" group and an American strain collected in 2001 (EF547392.1) in the "new" group.
Prevalent NoVs tend to be replaced periodically, and thus, it was not surprising that older strains became prevalent again. Notably, the P2 region of VP1 is exposed on the surface of the capsid protein and is the main epitope for immune recognition and contains the binding domain of HBGA. As a highly variable region in NoVs, the occurrence and accumulation of mutations in the P2 region is often the main driving force for the evolution of epidemic strains of NoV (such as GII.4) [23]. Such mutations are also important in developing new strains with new immune binding characteristics and antigenicity [24]. In this study, although the GI.1 strains had multiple mutation sites, the amino acid residue sites related to HBGA binding did not change.
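The comparison described in this section, listing substitutions in the notation used above (for example S11N) and checking whether any fall on binding-site residues, can be sketched as follows. The sequences and site positions below are toy values invented for illustration, and the function names are hypothetical. With a real VP1 alignment, the set of sites would be the reported HBGA-binding positions, and the sketch assumes the reference row contains no gaps so that alignment columns equal residue numbers.

```python
def list_substitutions(reference, variant):
    """Return (position, label) pairs such as (11, 'S11N') for two aligned proteins.

    Positions are 1-based alignment columns; gap characters ('-') are ignored.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1):
        if ref_aa != var_aa and ref_aa != "-" and var_aa != "-":
            subs.append((pos, f"{ref_aa}{pos}{var_aa}"))
    return subs

def substitutions_at(subs, sites):
    """Return labels of substitutions that fall on any of the given positions."""
    return [label for pos, label in subs if pos in sites]

# Toy data (hypothetical; not real VP1 sequences or real binding sites).
reference = "MSAVHTGQ"
variant = "MNAVQTGQ"
toy_sites = {5, 7}

subs = list_substitutions(reference, variant)
print([label for _, label in subs])
print("substitutions at sites of interest:", substitutions_at(subs, toy_sites))
```

An empty list from `substitutions_at` would correspond to the situation reported here, where substitutions exist but none falls on an HBGA-binding residue.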

Discussion
Noroviruses are one of the major pathogens of AGE and diarrhea and are an important cause of morbidity and mortality in children under 5 years of age [25]. They are highly contagious and environmentally stable and are usually viable for several weeks after infection. Probability of symptomatic disease from a single NoV particle may be as high as 0.5. Norovirus infection causes severe vomiting and diarrhea. There are 10^6 to 10^9 stable, nonenveloped virus particles per milliliter of excreta, creating unlimited opportunities for further infection transmission and outbreaks [26]. Noroviruses are primarily transmitted via fecal-oral and vomit-oral routes, through either person-to-person contact or contaminated food or water, as well as through environmental contamination [27].
Because NoVs cannot be cultured in vitro, it is important to use whole genome sequencing for classification and analysis of epidemic trends. In this study, sequencing techniques were used to obtain the whole genome of the ZD strain of norovirus, isolated from clinical samples of Beijing Haidian Hospital. According to the NoV online typing tool (v2.0) and MEGA X, the ZD strain had a GI.1 genotype. An online search found that whole genome and partial sequence data for GI.1 were scattered, with links to sporadic cases worldwide. There were only about 20 GI.1 whole genome sequences in the National Center for Biotechnology Information (NCBI) database, indicating that epidemics caused by GI.1 are extremely rare [15].
In the phylogenetic analysis, the ZD strain was clustered with the strain collected in Shanghai in the same year (data from NCBI online) (Figs. 2 and 3). The high similarity suggested the two strains shared a common origin. Recent studies indicate that NoV outbreaks are often closely associated with consumption of low-processed shellfish [28,29]. Moreover, several studies show that genogroup GI strains are more often associated with waterborne transmission than GII strains [30]. In general, GI strains have high environmental stability in seawater and bivalves and can usually be isolated from contaminated oysters [31]. Cheng et al. assessed NoV contamination of oysters imported from different countries for consumption in Hong Kong using RT-PCR and found the total positive rate was 10.5% [32]. Therefore, in countries where seafood such as oysters is consumed, the ZD strain may have come from contaminated seafood, indicating the need for kits to detect GI NoVs in seafood.
In the phylogenetic analyses of the whole genome and each ORF sequence of the ZD strain (Fig. 3), recombination [20] within the ZD genome was not detected. This result suggests the genome of genotype GI is conserved and stable. By contrast, in prevalent strains, such as the GII.4 strain, genomes are unstable and variable because of recombination. Changes in the sequence of NoV capsid proteins, especially VP1, can, at least partially, reveal the antigenicity and corresponding receptor binding functions of NoV variants [21]. In norovirus genotypes that cause epidemics, such as GII.4, evolution of capsid proteins can lead to evasion of host immune defenses and evolution of new subtypes. Therefore, changes in capsid proteins of NoVs need to be analyzed in the study of virus evolution. When the analysis was restricted to GI.1 VP1 sequences collected from 1968 to 2018, a linear relationship with time was detected. On the basis of phylogenetic analysis and amino acid changes in VP1, the GI.1 genotype was divided into three clusters related to time of appearance: "older," "earlier," and "new" (Fig. 4). Each cluster had unique amino acid substitution characteristics, and the number of amino acid substitutions increased over time. Consistent with a previous study [33], of the several amino acid substitutions identified in GI.1, most were in the highly variable P2 region. According to previous studies, the GI.1 amino acid residue sites associated with HBGA binding were 327, 329, 338, 342, 344, 375, 377, 378, and 380 [34]. However, in GI.1 strains analyzed in this study, amino acid substitutions within the P2 region did not occur at the sites that are highly associated with NoV invasion and infection (Table 4). Comparison of evolution rates between non-GII.4 and GII.4 strains [21] suggests non-GII.4 strains are subject to less adaptive pressure.
In this study, the low recombination frequency could explain the low prevalence and low incidence of GI.1.
To summarize, whole-genome sequencing and bioinformatic analysis demonstrated the ZD strain was a rare GI.1 genotype. Phylogenetic analysis indicated a conservative evolutionary pattern in GI.1, with high homology among strains isolated from different regions in the same period. These results improve understanding of the GI genotype, the nondominant epidemic NoV strain. The results also suggest possible correlation between GI NoVs and seafood. This study provides reference for future genetic analyses and studies of evolutionary patterns of nondominant epidemic strains.

Conflicts of Interest:
The authors declare no conflict of interest.
Availability of data and material: Data of the ZD strain are deposited in the China National Microbiology Data Center (NMDC) with accession number NMDCN0000Q78.
Code availability: Not applicable.
Authors' contributions: KM conceived and designed the experiments; WSZ, WZ and GBT collected the samples and performed the experiments; WSZ, XLH and XTZ analyzed the data; WSZ and KM wrote the paper; KM and YHX did the editing and proofreading. All authors have read and agreed to the published version of the manuscript.
Ethics approval: This article is in compliance with ethical standards for research.
Consent to participate: Informed consent was obtained from the study subjects themselves.