The genomic distribution map of human papillomavirus in China

In this study, a total of 301,880 women were recruited from 4 different regions of China. The overall prevalence of HPV was 18.01%. Among HPV-positive women, high-risk (HR) HPV infection accounted for 79.14%, low-risk (LR) HPV infection for 12.56%, and mixed HPV infection for 8.30%. The 4 most common HR-HPV genotypes were HPV-52, 16, 58 and 53, which accounted for 20.49%, 19.93%, 14.54% and 10.01%, respectively. Among LR-HPV genotypes, HPV-6 ranked highest (28.17%), followed by HPV-81 (9.09%) and HPV-11 (3.78%). Genotype subgroup analysis also showed that single-type infections were the most prevalent (77.26%) among HPV-positive individuals. Among multiple-genotype infections, double infection was the most common, with a frequency of 76.04%. This large study showed that the overall prevalence of HPV in China was high and that its distribution differed across age groups and regions. Genotypes HPV 53 and HPV 6 were frequently detected in this population and merit significant clinical attention.


Introduction
Cervical cancer, a leading genital cancer, is considered the third most common female gynecologic malignancy and the fourth most common cause of female death from cancer, with an estimated 570,000 new cases and 311,000 deaths in 2018 (GLOBOCAN, 2018) (1,2). The age-standardized incidence rate of cervical cancer is higher in developing countries than in developed countries (16.7 vs 12.7 per 100,000 women-years, respectively) (3). China accounts for around 14% of the world's annual incidence of cervical cancer (4,5). Thus, in China, cervical cancer remains a relatively heavy public health burden, with increasing morbidity and mortality among young women (6,7).
Human papillomavirus (HPV), a sexually transmitted DNA virus belonging to the Papillomaviridae family, has been confirmed as the essential causative agent of cervical cancer (8). It is estimated that most sexually active adults have been infected by at least one HPV genotype (9). Persistent infection with high-risk HPV strains is a well-established cause of cervical cancer (10). More than 200 distinct HPV genotypes have been characterized to date, of which approximately 40 infect the mucosal epithelium of the anus and genital tract (11). Generally, they are classified as high-risk HPV (carcinogenic HPV types, HR-HPV), low-risk HPV (non-carcinogenic HPV types, LR-HPV) and intermediate-risk HPV (IR-HPV), according to their carcinogenic risk or potential pathogenicity (12).
It is well known that cervical cancer is one of the most preventable cancers. Currently, comprehensive strategies for the control of cervical cancer rely in part on vaccination against HPV and on HPV-based screening programs, which have been demonstrated to effectively reduce the burden of cervical cancer worldwide (13,14). Nonetheless, the prevalence and genotype distribution of HPV infections are heterogeneous (they vary between nations and regions, as well as within a country), which often frustrates progress towards prevention (15). Hence, an accurate understanding of the regional distribution of HPV genotypes is extremely important both for the development of prophylactic HPV vaccines and for the design of HPV-based cervical cancer screening strategies.
Reports of large-scale data on the genotypic spectrum of HPV infection in China are limited. Therefore, we conducted a retrospective summary of a large body of HPV genotype distribution data in China, and calculated and analyzed the overall prevalence, age-specific prevalence, and genotype distribution of HPV in different regions. This study may provide guidance for the development of future screening and prevention programs.

Materials And Methods
Ethical considerations, study population and sample collection. This investigation was approved (no. 2020-173) by the Ethics Committee of the First Affiliated Hospital of Chongqing Medical University Ethics Review Board, and all participants provided informed consent for inclusion in the study. Briefly, this retrospective study involved 301,880 samples from more than 8 clinical hospitals, women's health centers, clinics and physical examination centers located in 4 different provincial-level regions of China. For each woman, the HPV genotyping results and relevant clinical information, including age and region, were collected.
Subsequently, all HPV tests were performed with an HPV genotyping panel based on the polymerase chain reaction (PCR)-reverse dot blot (RDB) hybridization method. Briefly, according to the manufacturer's instructions, PCR was performed in a 25-μL reaction mixture containing 1 μL extracted DNA, 0.75 μL DNA Taq polymerase, and 23.25 μL PCR-mix solution containing the primer system. The PCR cycling parameters were as follows: an initial step at 95°C for 9 min, followed by 40 amplification cycles (denaturation at 95°C for 20 s, annealing at 55°C for 30 s, and extension at 72°C for 30 s), and a final extension at 72°C for 5 min. After amplification, HPV genotyping was performed by hybridization and RDB on strips fixed with HPV type-specific probes.
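As a rough consistency check on the protocol described above, the reaction volumes and cycling parameters can be written down as data and summed. This is only a sketch: the step labels ("initial denaturation", "extension") and all identifiers are our own illustrative names, not taken from the kit manual, and the time computed ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # illustrative label, not from the kit manual
    temp_c: float    # hold temperature in degrees Celsius
    seconds: int     # programmed hold time


# Values transcribed from the cycling parameters given in the text.
INITIAL = Step("initial denaturation", 95, 9 * 60)
CYCLE = (
    Step("denaturation", 95, 20),
    Step("annealing", 55, 30),
    Step("extension", 72, 30),
)
N_CYCLES = 40
FINAL = Step("final extension", 72, 5 * 60)

# Reaction mix volumes in microlitres, as given in the text.
MIX_UL = {"extracted DNA": 1.0, "Taq polymerase": 0.75, "PCR-mix": 23.25}


def programmed_hold_seconds() -> int:
    """Sum of programmed hold times; ramp times between steps are ignored."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


def total_volume_ul() -> float:
    """Total reaction volume implied by the listed components."""
    return sum(MIX_UL.values())


if __name__ == "__main__":
    print(f"Reaction volume: {total_volume_ul():.2f} uL")
    print(f"Programmed hold time: {programmed_hold_seconds() / 60:.1f} min")
```

The component volumes should add up to the stated 25-μL reaction, which is the main thing this check is meant to confirm.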
According to the manufacturer's instructions, the HPV type-specific probes immobilized on nylon membranes were used for reverse-blot hybridization and detection of all HPV genotypes in a single assay. To validate the HPV test, sterile water was used as the negative control, and specimens with known HPV genotypes were used as the positive control.
Statistical analysis. Data were analyzed with IBM SPSS version 21.0. Sample characteristics, including age (women with unknown ages were excluded), region, HPV infection result and genotype distribution, were summarized using frequency distributions to generate numbers and percentages. The chi-square test was used to compare HPV prevalence or proportions between groups. A two-sided P-value of less than 0.05 was considered statistically significant.
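The frequency summaries described above were produced in SPSS; the following is only a minimal Python sketch of the same idea, turning a list of categorical results into counts and percentages. The function name `frequency_table` and the toy records are hypothetical and are not study data.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percent)} ordered by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        category: (n, round(100 * n / total, 2))
        for category, n in counts.most_common()
    }


# Hypothetical toy records, NOT study data:
# one infection pattern per HPV-positive woman.
patterns = ["HR only"] * 6 + ["LR only"] * 3 + ["mixed"] * 1
table = frequency_table(patterns)

for category, (n, pct) in table.items():
    print(f"{category}: n={n} ({pct}%)")
```

The same function could be applied to any categorical column (age group, region, genotype) to produce the n and % pairs reported throughout the Results.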

Results
Overall HPV infection prevalence and genotype distribution. A total of 301,880 samples were collected and tested by HPV genotyping. Of these, 55,071 samples were HPV positive. The overall HPV prevalence was 18.01%, with regional prevalences of 28.14% in Tibet Autonomous Region, 18.59% in Chongqing Municipality, 10.33% in Guizhou Province and 29.09% in Shaanxi Province. HPV prevalence differed significantly among the four provinces, autonomous regions and municipalities (χ² = 6120.54, P<0.001). Details of the genotype distribution in each region are presented below.
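The regional comparison can be illustrated with the per-region counts of tested and HPV-positive samples reported in the regional subsections of the Results. The sketch below recomputes each regional prevalence and a Pearson chi-square statistic for the resulting 4 x 2 table in plain Python. It is not the authors' SPSS procedure, the identifiers are our own, and the statistic it yields need not equal the reported value, since the exact table entered into SPSS is not given in the text.

```python
# Per-region counts as reported in the Results: (samples tested, HPV positive).
REGIONS = {
    "Tibet Autonomous Region": (36073, 10150),
    "Chongqing Municipality": (201089, 37389),
    "Guizhou Province": (60205, 6219),
    "Shaanxi Province": (4513, 1313),
}

# Upper 0.1% critical value of the chi-square distribution, 3 degrees of freedom.
CRITICAL_DF3_P001 = 16.266


def prevalence_percent(tested, positive):
    """Prevalence as a percentage, rounded to two decimals."""
    return round(100 * positive / tested, 2)


def pearson_chi_square(regions):
    """Pearson chi-square statistic for a mapping region -> (tested, positive)."""
    rows = [(pos, tested - pos) for tested, pos in regions.values()]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


for name, (tested, positive) in REGIONS.items():
    print(f"{name}: {prevalence_percent(tested, positive)}%")

chi2 = pearson_chi_square(REGIONS)
print(f"chi-square = {chi2:.2f}; exceeds critical value: {chi2 > CRITICAL_DF3_P001}")
```

With four regions and two outcomes the test has 3 degrees of freedom, so any statistic above the critical value corresponds to P<0.001.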
Age and genotype distribution of HPV infection in different regions.
The HPV prevalence in Tibet Autonomous Region. A total of 36073 samples were collected in Tibet Autonomous Region, and the HPV prevalence was 28.14% (n = 10150). The highest HPV detection rate was found in the 36-50 years age group (detection rate = 53.40%) (Fig 1). Among HPV-positive women, HR-HPV infection accounted for 73.51% (n=7461). Among HR-HPV infections, types 52 (21.50%) and 16 (19.44%) were the most prevalent genotypes, followed by 58 (11.83%) and 53 (11.23%). Among the 1428 cases of LR-HPV infection, the two most commonly detected LR-HPV types were 6 (14.13%) and 81 (11.15%), in descending order. The rates of single and multiple (infection with ≥ 2 different HPV genotypes) infections were 70.35% (n=7141) and 29.65% (n=3009), respectively; among the multiple infections, the rate of double infections was 71.32%. Among the mixed (co-infection with HR-HPV and LR-HPV) infection cases, double-genotype infection accounted for more than half (n=761). Among multiple HR-HPV infections, double-genotype infection was also the most frequent, with 1298 cases. The detailed genotype distribution of HPV infection is shown in Fig 2 and [...].
HPV infection in Chongqing Municipality. A total of 37389 women were identified as HPV positive among the 201,089 samples in Chongqing Municipality, giving an HPV infection rate of 18.59%. As in Tibet Autonomous Region, the highest HPV detection rate was found in the 36-50 years age group (detection rate = 52.96%) (Fig 4). Among the infected women, HR-HPV infection accounted for 81.32% (n=30406). Among HR-HPV infections, the three most prevalent types were 16, 52 and 58, with frequencies of 20.82%, 20.80% and 15.64%, respectively. The most common LR-HPV type was HPV 6 (35.65%), a proportion that even exceeded that of the most prevalent HR-HPV genotype (Fig 5 and Fig 6). The rates of single and multiple infections were 79.00% (n=29537) and 21.00% (n=7852), respectively; among the multiple infections, the rate of double infections was 77.06%.
HPV infection in Guizhou Province. Among 60205 women in Guizhou Province, the prevalence of HPV infection was 10.33%, with 6219 cases identified as HPV positive. Across all tested age groups, the highest HPV detection rate was found in the 20-30 years age group (detection rate = 43.18%) (Fig 7). Among HPV-positive women, HR-HPV infection accounted for 62.58% (n=3892) and was thus the dominant infection pattern in Guizhou Province. Among HR-HPV infections, the four most prevalent types were 52, 16, 58 and 53, in descending order. The most common LR-HPV types were HPV 6 and 11 (Fig 8 and Fig 9). The rates of single and multiple infections were 79.71% (n=4957) and 20.29% (n=1262), respectively. Among the multiple-genotype infections, the rate of double-type infection was 82.96% (n=1047).
HPV infection in Shaanxi Province. A total of 4513 samples were recorded in Shaanxi Province, of which 1313 were identified as HPV positive (29.09%). Across all tested age groups, the highest HPV detection rate was found in the 26-55 years age group (detection rate = 79.06%) (Fig 10). Among HPV-positive women, HR-HPV infection accounted for 80.05% (n=1051). The four most prevalent HR-HPV types were 16, 52, 58 and 53, in descending order. The most common LR-HPV types were HPV 6 and 81 (Fig 11 and Fig 12). The rates of single and multiple infections were 69.92% (n=918) and 30.08% (n=395), respectively. Among the multiple infections, double-genotype infection was the dominant type, accounting for 69.62% (n=275).
HPV genotype distribution in all positive samples. There were 55,071 HPV-positive samples in this study. We also analyzed the genotype distribution across these samples to better inform decisions on HPV vaccine strategy (Table 2). HPV infection occurred at all ages, and women aged 26-50 years accounted for the main part (Fig 13). Older women (>50) and young women (≤25) accounted for a relatively lower proportion of HPV infections. In addition, we conducted a chi-square test to determine whether the age distribution differed significantly between regions. The result showed a significant difference among these groups (χ² = 32.83, P<0.001). We summarized the age distribution for three HPV infection patterns: HR-HPV only, LR-HPV only, and mixed HR-HPV and LR-HPV infection. HR-HPV only was the dominant infection pattern (Table 3).
Age distribution of dominant types. According to the above results, the 5 most common HPV types were the HR-HPV genotypes HPV 52, HPV 16, HPV 58 and HPV 53, and the LR-HPV genotype HPV 6. We therefore further explored the distribution of these five most prevalent genotypes across age groups. For types 52, 16, 58 and 53, the number of infection cases increased gradually, peaked in the 41-45 years age group, and declined thereafter. Interestingly, for type 6, infection was most common in women aged 21-30 years (Fig 14). Women aged 36-40 years and 46-50 years also accounted for a large number of infections with these 5 genotypes. From these results, we conclude that women aged 36-50 years account for the majority of infections with these five dominant HPV genotypes.

Discussion
A pooled estimate among women worldwide showed that HPV prevalence in eastern Asia (which includes China) was significantly higher than that in both southeastern Asia and south-central Asia (13.6% vs 6.2% and 7.5%, respectively) (5). In addition, the prevalence of HPV in less developed countries (15.5%) was higher than that in more developed countries (10.0%), and eastern Asia is the region of Asia most heavily burdened by HPV (5). The number of HPV infection cases varies widely among eastern Asian countries, and China, as the most populous developing country, faces a serious burden (16). Moreover, it has been shown that western China has the highest cervical cancer mortality rate and the second highest cervical cancer incidence rate in China; thus, the primary prevention of cervical cancer in western China is particularly important (17).
In the present analysis, the overall HPV prevalence among 301,880 western Chinese women with normal cervical cytology was estimated at 18.01%, which was higher than the global average, lower than that of some regions and countries such as Eastern Africa and Russia, and higher than that of Japan and India (5,15). A large, comparable population-based nationwide study of HPV genotype prevalence reported an overall HPV infection rate of 21.07% among 120,772 samples from 37 cities in China (18), which was somewhat higher than the overall prevalence of 18.01% in this study. With regard to regional prevalence, the present survey also showed high HPV prevalence: Tibet Autonomous Region (28.14%), Chongqing Municipality (18.59%), Guizhou Province (10.33%) and Shaanxi Province (29.09%). The rates obtained in our study differed from previously reported regional data for Chongqing (26.20%) (19), Guizhou (16.95%) (20) and neighboring Yunnan (12.90%) (21) in western China. Reported HPV prevalence varies from study to study, possibly because of several variables, including the composition of China's large population and the size of its territory. Taken together, the overall HPV-positive rate in the present study involving 301,880 cases was found to be slightly increased.
When stratified by HPV genotype, the most common HR-HPV type detected in our analysis was HPV52 (20.49%). This is inconsistent with previous Chinese population-specific investigations and related studies, which identified HPV 16 as the most common HR-HPV genotype (5,18,22-24)(17). In addition, in nationwide data from a Chinese population-based investigation of 37 cities, the infection rates of HPV16 (4.82%), HPV52 (4.52%) and HPV58 (2.74%) were all lower than those in our population; HPV53 was not reported (18). Knowledge of HPV prevalence and genotype distribution in different regions might facilitate the development and implementation of vaccination programs. In this study, the HPV infection types and their proportions differed between regions: the top three HPV genotypes were HPV 52, 16, 58 in Tibet Autonomous Region, HPV 16, 52, 58 in both Chongqing Municipality and Shaanxi Province, and HPV 52, 16, 58 in Guizhou Province. Different regions therefore showed diversity, each with its own proportions of HPV genotypes. Differences in economic conditions, geography, cultural habits, migration and other factors may affect lifestyles among different populations and thereby contribute to the observed variation in HPV prevalence.
Interestingly, among HR-HPV genotype infections, HPV53 ranked fourth in our population (10.01%). HPV53, traditionally a non-vaccine genotype, has been recognized as a probable high-risk genotype and was recently shown to be associated with putative viral carcinogenicity (odds ratio, 3.92) (23,25). Moreover, the prevalence of the HPV53 genotype gradually increased between 2011 and 2015 (23). HPV53 has also been reported as the fifth most common HPV type detected in eastern Africa and in Central and northern America (5). Thus, prophylactic HPV vaccines that include HPV53 may offer more sufficient protection for women in China.
In the present study, the three most common LR-HPV types were HPV6 (28.17%), HPV81 (9.09%) and HPV11 (3.78%). A recent analysis conducted in 2019 and enrolling 94,489 women in eastern China showed that the dominant LR-HPV genotypes were HPV81 and HPV6 (26). Furthermore, a previous cross-sectional survey of Arab women reported that HPV 81, 11 and 6 were the most commonly identified LR-HPV genotypes, in decreasing order (27), which is inconsistent with our findings. The high HPV 6 prevalence in our study was unexpected. The reasons for this deviation in LR-HPV distribution are unclear and may be related to the cultural diversity of the ethnic groups involved. Therefore, HPV vaccination in China might also consider including the HPV6 genotype.
With regard to the proportion of HPV infection in different age subgroups, the age distribution in the current study showed that the middle age group (26- (20,28). The most frequent age group in our HPV screening population was 25-50 years, which could explain this distribution. Interestingly, in the present survey we also found that the age distribution varied across HPV genotypes. For the four most common HR-HPV types (52, 16, 58 and 53), the highest HPV detection rate was found in the 41-45 years age group, whereas for LR-HPV 6 the highest detection rate was found in the 21-30 years age group, which differs from a previous study of HPV genotyping results in China (22). However, our result is consistent with the clinical observation that cervical cancer is increasingly diagnosed in middle-aged and older Chinese women (40-64 years old). Therefore, it has become particularly important to disseminate information about cervical diseases and to perform HPV infection screening in China according to the HPV distribution characteristics of each age group.
Among patients infected with multiple genotypes, co-infection with two HPV types was the most common (76.04%) in our population, which is comparable to previously reported regional results from Guangzhou, Sichuan and Macao in China (29)(30)(31). Many epidemiologic studies have found that infection with multiple HPV genotypes appears to considerably increase the risk of developing the tissue abnormalities or high-grade lesions that precede invasive cervical cancer, because HPV types might interact synergistically and thereby raise the baseline risk observed with single-type infections (27,32). Similar results have been reached by other investigations, which confirmed the association of multiple infection with cervical carcinogenesis and with longer duration of viral infection (33). Therefore, it has been suggested that diagnoses of HPV co-infection be taken into account when predicting the outcomes of HPV infections.
To our knowledge, this is the first HPV distribution study with such a large sample size in China. The principal strength of this study is that the large dataset, enrolling 0.3 million women, comes from western China; this population may therefore be representative of the general female population in China. One limitation of this study is sampling bias, because the most frequent age group in our HPV screening population was 25-50 years.
Secondly, the women enrolled in this study attended clinics to seek medical advice on the basis of routine gynecological examination and HPV results, and they were often accompanied by some clinical symptoms, which may have led to over-reporting of HPV prevalence in this research. In addition, no clinical characteristics or cervical cytology results were collected as part of the study, so the specific risk factors for cervical cancer and the correlation between HPV genotype and cervical cancer or precancerous lesions could not be accurately examined. Furthermore, a follow-up study should be conducted to track changes in genotype, cervical pathology and cytology, as there is a close relationship between cervical carcinoma and long-term persistent HR-HPV infection.
The present study represents one of the most comprehensive studies of the prevalence and genotype distribution of different HPV types in China to date. We examined the epidemiology of HPV infection in China and confirmed a high overall HPV prevalence of 18.01% among all women tested. Moreover, in addition to the common genotypes HPV52, 16 and 58, particular attention should be paid to the high prevalence of non-vaccine genotypes (e.g. HPV53, HPV6) in China. Therefore, future next-generation prophylactic HPV vaccines in China should also consider including more HPV types (e.g. HPV53, HPV6).
Regarding the age-specific distribution of HPV, the highest HPV detection rate was found in the 26-50 years age group (detection rate=74.00%). These results suggest that the main proportion of HPV infections in China may occur in middle age, which reminds us to pay more attention to middle-aged women in the prevention and control of HPV. Thus, our results indicate that preventative strategies, including the popularization of HPV vaccination and related educational campaigns, should begin in women from 25 years of age. This information might be valuable for estimating the potential clinical benefit of HPV-based screening in China.

Declarations
Authors' contributions. Yan Dong, Jiao Li, Li Xu and Jungao Lu carried out the sample collection, laboratory detection and drafted the manuscript. Ling Chen and Xiaosong Li participated in the design of the study and performed the statistical analysis. Yue Wu, Huandong Liu, Zuoyi Yao and Xiaosong Li conceived of the study, and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.