Genetic Variability and Phylogeny of Human Papillomavirus Type 16 Based On E6, E7 and L1 Genes in Central China


In the current study, a total of 74 HPV16 single-infection samples from women attending gynecological outpatient clinics in four cities of Henan province were collected and subjected to L1, E6 and E7 sequencing. Variations in the HPV16 L1, E6 and E7 genes were characterized by comparison with the reference sequence, and secondary structure analysis was conducted. Phylogenetic trees based on the L1 and E6-E7 sequences were constructed separately, and B-cell epitopes of the HPV16 E6 and E7 proteins were predicted. A total of thirty-seven novel variations, twenty in L1 and seventeen in E6-E7, were identified. Compared with the reference sequence, twenty-eight variations (1.8%, 28/1596) were identified in the L1 gene sequences, of which 10/28 (35.7%) were non-synonymous mutations. For the E6-E7 sequences, twenty-five gene changes were found (16 mutations (3.4%, 16/477) in the E6 gene and 9 mutations (3.0%, 9/297) in the E7 gene), of which 18/25 (72.0%) were non-synonymous. Phylogenetic analysis showed that 56.8% (42/74) of the samples belonged to the A1 sublineage, 37.8% (28/74) to A4, 4.1% (3/74) to A3 and 1.4% (1/74) to A2. B-cell epitope prediction identified seven potent epitopes for E6 and four for E7. Amino acid substitutions, including L90V, R62K, R142Q and F76L in E6 and S63F and N29S/H in E7, changed the epitope prediction scores. The HPV16 variants prevalent in central China belong predominantly to the European A1 sublineage. The HPV16 L1, E6 and E7 sequences reported in this study may assist in the improvement of HPV vaccines.


Introduction
Cervical cancer is the fourth most commonly diagnosed cancer and the fourth leading cause of cancer death among women worldwide, with an estimated 570,000 cases and 311,000 deaths in 2018 [1]. Around 85% of women diagnosed with cervical cancer and 87% of women who die from it live in developing countries [2]. Human papillomaviruses (HPVs) are found in the cervical carcinoma tissues of most patients, and oncogenic HPVs are regarded as the major cause of cervical cancer. In China, an estimated 110,650 new cancer cases and 36,714 cancer deaths were attributable to HPV infection in 2015, of which cervical cancer accounted for 85.6% and 78.1%, respectively [3].
HPVs are small, non-enveloped, double-stranded DNA viruses that belong to the Papillomaviridae family [4]. HPV genomes are about 7.2 kb to 8.0 kb in length and contain eight open reading frames (ORFs), comprising the early (E1, E2 and E4-E7) and late (L1 and L2) genes, together with a non-coding long control region (LCR) [5][6][7]. Continued expression of the E6 and E7 genes is associated with cellular immortalization, transformation and carcinogenesis [6]. The L1 protein is the major structural component of the HPV capsid and self-assembles into virus-like particles (VLPs) [8]. The first-generation commercial HPV vaccines, such as the bivalent and quadrivalent vaccines, are based on recombinant expression of the L1 protein in an expression system [9][10]. Individuals immunized with commercial HPV vaccines acquire robust immunity against the homologous genotypes [8]. The E6 and E7 proteins are candidates for the development of therapeutic vaccines [11].
HPV16 has been reported to be the primary genotype in Henan province, which is located in central China [16]. However, the HPV16 sublineages and nucleotide variations present in Henan province have rarely been characterized. The principal objective of the present study was to identify the sublineages and analyze the mutations of the HPV16 L1, E6 and E7 genes in central China. Investigating the sublineages of HPV16 would help to elucidate their intrinsic geographical relationships. Characterizing variations in the HPV16 genes may contribute to a better understanding of the role of HPV genes in oncogenesis and to the design of HPV vaccines.

Sample Collection
From May 2019 to May 2021, cervical swabs were collected from women who attended gynecological outpatient clinics in four cities (Jiyuan, Luohe, Luoyang and Zhengzhou) in Henan province, which is located in central China. After routine cytology examination, the specimens were subjected to HPV genotyping by flow-through hybridization and gene chip (Hybribio Limited Corporation, Guangdong, China) according to the manufacturer's instructions. Samples positive for HPV16 as a single infection were processed for variant analysis by sequencing. The study protocol was approved by the institutional ethics committee of the 989 Hospital of Joint Service Support Force of Chinese PLA, Military Training Medical Research Institute of the Whole Army.

HPV genotyping and sequencing
Genomic DNA was extracted from the samples by alkaline lysis using a DNA extraction kit (HybriMax, Hybribio Limited Corporation, Guangdong, China) and then stored at -70°C. The quality and integrity of the extracted DNA were monitored through amplification of the β-globin gene as an internal control. To amplify the full length of the L1, E6 and E7 genes, primers were designed based on the published sequence in GenBank (NC_001526). The primers used for the amplification of the L1, E6 and E7 genes are shown in Table 1. The primers were synthesized by Sangon Biotech, Inc. (Shanghai, China). The PCR reaction volume was 50 µl, which included 2 µl of template DNA, 25 µl of 2×PrimeSTAR Max Premix (Takara Biotechnology Co., LTD, Dalian, China), 2 µl of each primer and 17 µl of ultrapure water. The PCR program was as follows: an initial denaturation step at 94℃ for 10 min, followed by 30 cycles of 95℃ for 30 s, 55℃ for 30 s and 72℃ for 30 s, and a final extension at 72℃ for 10 min. The PCR products were visualized on 1% agarose gels stained with GoldView™ Nucleic Acid Stain. Identified plasmids containing the L1, E6 or E7 genes were used as positive controls, and reaction mixtures containing no template served as negative controls. The target fragments were then purified using the TIANgel Midi Purification Kit

Molecular characterization and Phylogenetic analysis of HPV16
Variations in the L1, E6 and E7 genes and proteins were identified and numbered by comparison with the HPV16 reference strain (GenBank NC_001526) using DNAStar (Madison, WI, USA). Variants between the study sequences and the reference sequence were recorded and their frequencies were calculated. Novel variants were identified by comparing the sample sequences with those previously published in GenBank using the BLAST program.
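The comparison described above can be illustrated with a minimal Python sketch. It is not the DNAStar workflow used in this study; it only shows, under simplifying assumptions, how single-nucleotide differences between an aligned sample and a reference coding sequence can be listed and classified as synonymous or non-synonymous. The function name `call_variants` and the toy sequences are hypothetical, the sequences are assumed to be gap-free, in frame and free of ambiguous bases, and positions are counted from the first base of the toy sequence rather than in genome coordinates.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def call_variants(reference, sample):
    """Return single-nucleotide variants between two aligned coding sequences.

    Hypothetical helper: both sequences must be gap-free, equal in length,
    in frame, and contain only A, C, G and T.
    """
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    variants = []
    for index, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if ref_base == alt_base:
            continue
        # Translate the codon containing this position in both sequences.
        codon_start = index - index % 3
        ref_aa = CODON_TABLE[reference[codon_start:codon_start + 3]]
        alt_aa = CODON_TABLE[sample[codon_start:codon_start + 3]]
        codon_number = codon_start // 3 + 1
        variants.append({
            "nucleotide": f"{ref_base}{index + 1}{alt_base}",
            "amino_acid": f"{ref_aa}{codon_number}{alt_aa}",
            "synonymous": ref_aa == alt_aa,
        })
    return variants


# Toy sequences (not HPV16 data): M-D-L in the reference, M-E-L in the sample.
toy_reference = "ATGGATCTG"
toy_sample = "ATGGAACTC"
toy_variants = call_variants(toy_reference, toy_sample)
non_synonymous = [v for v in toy_variants if not v["synonymous"]]
```

In this toy case the first difference changes the second codon from aspartate to glutamate and is therefore non-synonymous, whereas the second difference leaves the third codon encoding leucine. Dividing the number of variant positions by the gene length gives a variation rate of the kind reported for each gene.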

Discussion
Cervical cancer is the fourth most common malignancy in women worldwide [1]. Persistent infection with high-risk HPVs (e.g., HPV16 and HPV18) plays an important role in the development of cervical cancer. The prevalence of HPV infection among women in Henan province has been evaluated, and HPV16 is the predominant genotype [19,30]. The L1 protein is the major capsid protein and is able to induce an immune response [29]. Phylogenetic distance and amino acid variations of the L1 protein influence the immune efficiency of HPV vaccines [10,31]. Uncontrolled expression of the E6 and E7 proteins inactivates the p53 and pRb tumor suppressor proteins and is associated with persistent HPV infection [32]. HPV variants and nucleotide mutations have been suggested to affect the oncogenic potential of persistent HPV infection [23][24][25][33]. The sublineages and gene variations of HPV16 in Henan province have not previously been reported. Thus, the HPV16 L1, E6 and E7 sequences were selected to study lineage phylogeny and genetic polymorphisms.
In the present study, the sublineages of HPV16 among women in Henan province were investigated by sequencing the L1 gene and by phylogenetic analysis. The HPV16 A1-A4 sublineages were all present, and A1 was the most common sublineage in Henan province. In contrast, A4 was the most prevalent sublineage in other areas of China, such as Beijing city and Zhejiang and Yunnan provinces [34][35][36][37]. These discrepancies may be due to the geographical distribution of HPV or to ethnic characteristics [37]. Compared with the A1-3 sublineages, the A4 sublineage has been reported to be associated with more severe disease status in Chinese women and with a higher risk of cancer [23,24,38,39].
Fourteen non-synonymous mutations were found in the HPV16 E6 gene, and the amino acid substitutions D32E (29/74) and L90V (5/74) had the highest frequencies. It has been reported that D32E is the diagnostic E6 polymorphism for the A4 sublineage and that L90V is the most common polymorphism within the A1-3 sublineages [36]. In our study, D32E (29/74) was found in all A4 sublineage samples and L90V (5/74) was found in the A1-3 sublineages. Although the D32E substitution in the E6 protein did not induce a higher level of p53 degradation than the reference HPV16, the variation altered other gene profiles [42,43]. It has been suggested that the D32E substitution is significantly correlated with persistent HPV16 infection in women [37]. The effect of the L90V variation on the progression of HPV infection may differ depending on the population [44]. The L90V substitution (5/74) has no significant association with cervical cancer in Chinese women [36]. Only four non-synonymous mutations were found in the E7 gene, which indicates that the E7 gene is more conserved than the E6 gene. N29S (27/74) was the predominant variant, which is consistent with previous studies [41,45]. N29S is located in the domain that plays an important role in the binding of the E7 protein to the retinoblastoma suppressor protein (pRB), and formation of the E7-pRB complex is associated with cervical carcinoma [46]. Furthermore, it has been suggested that amino acid 29 is involved in host immune recognition and lesion progression [47].
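As a worked check of the sublineage distribution, the proportions can be recomputed from the sample counts reported in this study (42 A1, 28 A4, 3 A3 and 1 A2 among 74 samples). The sketch below is illustrative only; the variable names are hypothetical and the values are rounded to one decimal place, as in the text.

```python
# Sublineage counts as reported in this study (74 samples in total).
sublineage_counts = {"A1": 42, "A4": 28, "A3": 3, "A2": 1}

total_samples = sum(sublineage_counts.values())

# Percentage of samples in each sublineage, rounded to one decimal place.
sublineage_percent = {
    name: round(100 * count / total_samples, 1)
    for name, count in sublineage_counts.items()
}

for name, percent in sublineage_percent.items():
    print(f"{name}: {percent}% ({sublineage_counts[name]}/{total_samples})")
```

The rounded values sum to slightly more than 100% because each proportion is rounded independently.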
In summary, this study is the first to report that A1 is the predominant HPV16 sublineage in central China. Knowledge of the HPV16 sublineages in Henan province will be beneficial for characterizing HPV gene polymorphism in China. Phylogenetic trees based on the HPV16 L1 and E6-E7 sequences were constructed. Variations in the HPV16 L1, E6 and E7 genes were identified, and some novel variants were found. However, the relationships between these variations and their effects on protein function should be investigated further. The HPV16 L1 sequences may assist in the improvement of HPV vaccines. Associations between the HPV16 E6-E7 sequences and the regression of HPV-related lesions in cervical cancer need to be studied further.

Declarations
Note: the numbers of variations in each domain of the E6-E7 genes are given in brackets. The nucleotide sequence data reported are available in the GenBank database under the accession numbers MZ546225-MZ546256 (E6) and MZ546257-MZ546285 (E7).
Note: sequences with amino acid changes are highlighted in red.
Figure 1