Comparison Study on Characteristics of Hepatitis B Virus Whole Genomes for Hepatocellular Carcinoma and Chronic Hepatitis B Patients

Objective This study aimed to characterize the whole-genome sequences of hepatitis B virus (HBV) derived from patients diagnosed with hepatocellular carcinoma (HCC) and from those with chronic hepatitis B (CHB) infection. Methods Patients in the HCC group and the CHB group were matched by age, gender, and region. Polymerase chain reaction (PCR) was used to amplify the complete HBV genome sequences. Results Serum samples were collected from 51 patients with HBV-related HCC and 76 patients with CHB. A total of 148 substitution sites with statistically significant differences were identified between the HCC and CHB groups. In addition, three mutational sites associated with HCC (F22I/L/P in the pre-S2 region, and P33S/T and S144A/T/V in the X region) were identified. Deletions in the pre-S and X regions were found in both HCC and CHB patients; however, deletions in the X region were more common in the HCC group than in the CHB group. In this study, the substitutions associated with a high risk of HCC were mostly located in the C-terminus of HBx and the C region, and some of them (C1470A/T, T1803A/G, and C1804T) have not been previously reported. Conclusion The findings suggest that substitutions at aa33 and aa144 in the X region may be new predictive markers for HCC. The results of this study provide important basic information for HCC prevention.


Introduction
Hepatitis B virus (HBV) is considered a major global health problem, with more than 250 million chronic HBV (CHB) carriers and more than one million HBV-associated deaths per year [1]. In 2018, hepatocellular carcinoma (HCC) was reported as the fourth leading cause of cancer-related death worldwide, with more than 841,000 new diagnoses and 782,000 deaths globally per year [2]. Among all the risk factors for HCC, HBV infection is the most important, accounting for approximately 80% of all cases [3].
Various types of mutations to the HBV genome have been reported, resulting in amino acid substitutions during long-term infection. Some of these could serve as markers to predict the development of HBV-associated HCC [4], such as mutations in the pre-surface antigen (pre-S) region (i.e., deletion to the pre-S1 [W4P/R] and pre-S2 start codon deletion), the pre-core region (i.e., G1896A at nucleotide 1896), or a double mutation in the basal core promoter (BCP) and X regions (V5M) [5].
To investigate the effects of various HBV mutations on the development of HCC, this case-control study reports current HCC prevalence in representative regions and analyzes the complete HBV genomes isolated from HCC and CHB patients.

Field background
Patients with HCC in this study were recruited from six hepatitis B surveillance sites in China (Guangxi, Hebei, Henan, Hunan, Qinghai, and Shanghai), where the incidence of HBV infection has been under surveillance since the 1980s. Patients with HCC were diagnosed in provincial hospitals or tumor specialist hospitals. All the HCC subjects were recruited between January 2012 and December 2015.

Serological markers
All serum samples were tested for three HBV markers (HBsAg, anti-HBe, and HBeAg), as well as anti-HCV antibodies and HDV-IgG. HBV markers were detected using MEIA kits and an ARCHITECT i2000SR immunoassay analyzer (Abbott Diagnostics, Chicago, IL, USA). Other markers were detected using enzyme-linked immunosorbent assay (Beijing Wantai Biological Pharmacy Enterprise Co., Ltd., Beijing, China).

Amplification and sequencing of the complete HBV genome from HCC patients and CHB patients
In total, 127 patients (51 in the HCC group and 76 in the CHB group) were selected to obtain complete genome sequences. The full-length HBV genome was divided into two parts, which were amplified using the nested polymerase chain reaction (PCR) method as we previously described [6]. The whole-genome nucleotide sequences of the 127 HBV isolates reported in this article have been deposited in the National Center for Biotechnology Information GenBank database under accession numbers MT644995 to MT645072.

Genotyping of HBV
HBV genotypes were determined by comparison with a set of 88 reference sequences (genotypes A to I) retrieved from the GenBank database [7]. Phylogenetic trees were reconstructed by the maximum likelihood (ML) method implemented in the IQ-TREE package under the GTR + I + G nucleotide substitution model, which was selected by ModelFinder [8]. Support for the ML tree was inferred by bootstrapping with 1000 replicates. To detect recombination in the HBV genome sequences, the sequences were investigated using SimPlot v3.5.1 software and the jpHMM (jumping profile hidden Markov model) method [9].

Nucleotide and amino acid mutation analysis
The substitution proportion of every nucleotide site was calculated for all 127 HBV whole-genome sequences. First, the nucleotide frequency distribution across the 127 sequences was mapped for each of the 3215 nucleotide sites of the HBV genome. The nucleotide type with the largest proportion at a site was taken as the prototype, and any other nucleotide type at that position was defined as a substitution. If a sequence had a deletion at a nucleotide site, it was not included in the calculation for that site. Second, the substitution proportion of every nucleotide site was calculated for the HCC group and the CHB group separately. Subsequently, sites with a nucleotide substitution rate of more than 5% in either the HCC group or the CHB group were compared between the two groups. A bar chart was drawn for the sites with a significant difference in nucleotide substitution rate.
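The per-site calculation described above can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes the sequences are already aligned to equal length and that deletions are encoded as "-", and the function names (`prototype_base`, `substitution_rates`, `candidate_sites`) are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumption: deletions appear as "-" in the alignment


def prototype_base(column):
    """Most frequent nucleotide in an alignment column, ignoring deletions."""
    counts = Counter(base for base in column if base != GAP)
    # Ties are broken by first occurrence; the source does not say how ties were handled.
    return counts.most_common(1)[0][0] if counts else None


def substitution_rates(sequences, groups):
    """Per-site substitution rate for each group.

    sequences: aligned, equal-length strings; groups: parallel list of labels.
    Returns a list of (1-based position, prototype, {group: rate}).
    """
    results = []
    for pos in range(len(sequences[0])):
        column = [seq[pos] for seq in sequences]
        proto = prototype_base(column)
        rates = {}
        for group in sorted(set(groups)):
            # Deleted positions are excluded from the denominator
            bases = [b for b, g in zip(column, groups) if g == group and b != GAP]
            substituted = sum(1 for b in bases if b != proto)
            rates[group] = substituted / len(bases) if bases else 0.0
        results.append((pos + 1, proto, rates))
    return results


def candidate_sites(results, threshold=0.05):
    """Positions whose substitution rate exceeds the threshold in either group."""
    return [pos for pos, _, rates in results if any(r > threshold for r in rates.values())]


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "AC-A", "TCGA"]          # toy data, not study sequences
    labels = ["HCC", "HCC", "CHB", "CHB"]
    table = substitution_rates(toy, labels)
    for row in table:
        print(row)
    print(candidate_sites(table))
```

The sites returned by `candidate_sites` would then be passed to a between-group test such as the χ2 or Fisher's exact test.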
Some amino acid substitutions in the pre-S, pre-C/C, and X regions have been shown to be associated with HCC. In the present study, the corresponding amino acid sites within the pre-S, pre-C/C, and X regions were deduced from these nucleotide sites. The amino acid substitution rate was then calculated for the HCC group and the CHB group separately. Finally, the amino acid substitution rate was compared between the two groups.
The sequences with pre-S deletions or X region deletions were aligned with reference HBV sequences of the same genotype. All nucleotide sequences in the datasets of this study and the reference HBV sequences were analyzed using MEGA 7.0 software.
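As an illustration of how an amino acid substitution can be deduced from nucleotide sequences, the sketch below translates an in-frame ORF with the standard genetic code and reports a substitution in the usual notation (for example, P33S). It is an assumption-laden toy example: the ORF strings are made up, the helper names (`translate`, `aa_substitution`, `aa_substitution_rate`) are hypothetical, and the study itself used MEGA 7.0.

```python
# Standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf):
    """Translate an in-frame ORF; codons with gaps or ambiguity codes become 'X'."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, usable, 3))


def aa_substitution(ref_orf, sample_orf, aa_pos):
    """Return e.g. 'P2S' if the sample differs from the reference at aa_pos (1-based)."""
    ref_aa = translate(ref_orf)[aa_pos - 1]
    sample_aa = translate(sample_orf)[aa_pos - 1]
    if sample_aa == ref_aa:
        return None
    return f"{ref_aa}{aa_pos}{sample_aa}"


def aa_substitution_rate(ref_orf, sample_orfs, aa_pos):
    """Fraction of samples carrying any substitution at aa_pos."""
    hits = sum(1 for s in sample_orfs if aa_substitution(ref_orf, s, aa_pos))
    return hits / len(sample_orfs)


if __name__ == "__main__":
    reference = "ATGCCT"                      # toy ORF: Met-Pro
    samples = ["ATGTCT", "ATGCCT", "ATGCCC", "ATGACT"]
    print([aa_substitution(reference, s, 2) for s in samples])
    print(aa_substitution_rate(reference, samples, 2))
```

Note that a synonymous nucleotide change (CCT to CCC in the toy data) is not counted, which is why nucleotide and amino acid substitution rates can differ at the same site.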

Statistical analysis
Pearson's χ2-test or Fisher's exact test was used for analysis, as appropriate. All statistical tests were two-tailed, and a probability (p) value of < 0.05 was considered statistically significant. All analyses were performed using IBM SPSS Statistics for Windows, version 22.0 (IBM Corp., Armonk, NY, USA).
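For readers who wish to reproduce a 2 × 2 comparison without SPSS, a two-sided Fisher's exact test can be sketched with the Python standard library alone. This is an illustrative implementation, not the software used in the study; the function name `fisher_exact_two_sided` is hypothetical, and the example table uses the X region deletion counts reported in this article (7 of 51 HCC patients vs. 4 of 76 CHB patients).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x events in row 1, given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Rows: HCC, CHB; columns: X region deletion present, absent
    p = fisher_exact_two_sided(7, 44, 4, 72)
    print(f"p = {p:.4f}")
```

Fisher's exact test is generally preferred over the χ2-test when expected cell counts are small, which is the usual meaning of "as appropriate" in this context.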

Baseline features of the study population
A total of 127 whole-genome sequences of HBV DNA were successfully amplified from serum samples of 51 patients with HCC and 76 with CHB. No anti-HCV antibody or HDV-IgG was detected in any of these 127 serum samples, and none of the patients had a history of alcoholism.
The gender, age, region, HBV genotype, and HBeAg status of the 51 HCC and 76 CHB patients are shown in Table 1. There was no obvious difference in sex, age, or region between the two groups. The HBeAg-positive rate was significantly higher in the CHB group than in the HCC group. Of the 51 patients in the HCC group, two (3.92%) were positive for the HBV/B genotype, 43 (84.31%) for the HBV/C genotype, and six (11.76%) for other genotypes (HBV/CD, HBV/I).
Of the 76 patients in the CHB group, 14 (18.42%) were positive for the HBV/B genotype, 55 (72.37%) for the HBV/C genotype, and seven (9.21%) for other genotypes (HBV/CD, HBV/I). The HBV/C genotype was predominant in both groups. There was no significant difference in the proportions of genotypes between the two groups (p > 0.05; Table 1).

Comparison of HBV DNA nucleotide substitutions between the HCC and CHB groups
Nucleotide substitutions of the complete genome with frequencies of > 5% derived from HCC and CHB patients were compared, and those with significant differences (p < 0.05) are visualized in Fig. 1. There were 148 (4.60%) sites with significant differences between the two groups. The 53 substitution sites with significantly higher rates in the HCC group than in the CHB group were mainly located in the C-terminus of HBx (T1802C, T1803A/G, C1804T, G1896A, G1899A, C1969T) and the C region (T2263C, A2269G, T2278C/G, C2281T, T2284A/C, T2287C, etc.); other such sites included T53A/C, C502A/G, and A929T. The 95 substitution sites with significantly higher rates in the CHB group than in the HCC group were mainly located in the pre-S, S, X, and P regions.

Comparative analysis of HCC-related nucleotide and amino acid substitutions
The deduced amino acids corresponding to the nucleotide sites in the pre-S, pre-C/C, and X regions were analyzed. The genotype-specific amino acid substitutions were removed. The amino acid substitution rates with significant differences between the HCC and CHB groups are shown in Table 2. All the nucleotide sites shown in Table 2 have previously been reported to be associated with HCC [10,11]. The substitution rates of three sites (aa22, aa33, and aa144) were greater in the HCC group than in the CHB group (47.83% vs. 28.38%, 29.41% vs. 9.21%, and 28.57% vs. 9.46%, respectively). The amino acid substitutions at position aa22 were located in the pre-S2 region, while those at positions aa33 and aa144 were located in the X region, and they were distributed among different genotypes.

Association of different pre-S deletion types with HCC
As there were few sequences in the genotype B, I, and CD subgroups, only genotype C was analyzed. Most of the pre-S deletion mutations were in the genotype C subgroup. Pre-S1 deletions were more frequent in the CHB group (11.63% in HCC vs. 25.45% in CHB), while pre-S2 deletions were more frequent in the HCC group (25.58% in HCC vs. 20.00% in CHB). Epitope mapping revealed frequent deletions in five epitopes. Deletions in pS1-B1, pS1-B2, pS2-B2, pS2-B3, and pS1-T1 were more common in the HCC group than in the CHB group. Details are shown in Table 3.
Functional mapping revealed that the frequencies of deletions in three functional domains (CBF, NBS, and pHSA) were higher in the HCC group than in the CHB group, in contrast to other domains (L start codon, HBS, S promoter, HSC70, CAD, M start codon, and VS start codon), which showed no significant differences between the two groups. As shown in Table 3, the frequency of HBS deletion was significantly higher in the CHB group than in the HCC group (p < 0.05). Notably, there were no deletion mutants affecting the functional domains of the L and HBS start codons in the HCC group.

Characteristics of X region deletions
A map of deletions in the HBx protein region of the CHB and HCC groups is shown in Fig. 2. C-terminal deletions are among the most frequently reported mutations to HBx and were detected in four and seven patients in the CHB and HCC groups, respectively. Most of these patients were genotype C, with the exception of one with genotype CD (in which the X ORF was in the genotype C fragment). The deletion rate was higher in the HCC group than in the CHB group (13.73% vs. 5.26%, respectively), although this difference was not significant. All four deletions in the CHB group were located in the C-terminus of HBx (aa 126-135).
Meanwhile, the C-terminal deletions covered a larger range in the seven patients in the HCC group: two were truncations of the C-terminus of HBx, and the other four were concentrated at the C-terminus of HBx (aa 104-154).

Table 3 is available in the supplemental file section.

Discussion
The etiology and pathogenesis of HCC are complex, owing to many related risk factors, such as hepatitis B virus (HBV) infection, aflatoxin exposure, and physical and chemical factors, especially alcoholism and other unhealthy lifestyle habits. HBV can promote carcinogenesis through chromosomal instability, numerous mutations, and the interaction between HBx protein and host proteins [12]. In addition, the indirect effects of HBV infection include chronic inflammation and oxidative stress, which can subsequently lead to varying degrees of hepatic injury [13]. Chronic hepatitis B (CHB) infection is a strong risk factor for the development of HCC, mostly due to HBV nucleotide level, HBV genetic mutations, positivity for the hepatitis B e antigen (HBeAg), HBV genotypes, and co-infection with hepatitis C virus [14].
Integration of the HBV genome is currently believed to be an early event in HBV chronic infection. Notably, mutations and deletions to the HBV genome are associated with an increased risk for the development of HCC and the clinical severity of other hepatic diseases [15] . Most mutations to the HBV genome are generated due to the lack of proofreading capacity of HBV polymerase or host immune pressure [16] .
In the present study, HBV complete genome sequences were obtained from samples of 51 HCC patients and 76 CHB patients.
To determine whether there was any difference in HBV sequences between the HCC and CHB groups, the nucleotide substitutions of HBV whole-genome sequences were compared across the four overlapping ORFs.
The substitution rates of the nucleotide sites located in the pre-C and C regions were mostly higher in the HCC group than in the CHB group. Most pre-C/C mutations are generated during HBeAg seroconversion. Several types of HBV pre-C/C mutations, such as G1896A, A1762T, and G1764A, were reportedly related to disease severity [5,16]. Also, the HBeAg-positive rate was significantly lower in the HCC group than in the CHB group. However, it remains unclear whether this phenomenon is associated with nucleotide substitutions in the C gene; thus, further studies are warranted.
It has been reported that the nucleotide substitution rate in HCC tends to be greater in the X and pre-C/C regions. In the present study, the substitution rates of many sites were greater in the CHB group than in the HCC group. Although many studies have examined HBV nucleotide substitutions in HCC [10,16], substitutions at 29 of the sites identified in this study have rarely been reported previously. The substitution rates at 13 of these sites were significantly greater in the HCC group than in the CHB group (Fig. 1). Among them, nt1470 and six other sites were located in a B cell epitope. The sites nt1726 and nt1730 were located in a T cell epitope. A previous study reported that almost 40% of integrated HBV genomes were cleaved at approximately nt1800 [10]. Therefore, the sites nt1799, nt1802, nt1803, and nt1804 may play a potential role in HBV genome integration during HCC development.
In this study, the genotype-specific amino acid substitution rates, as deduced from the nucleotide sites in Fig. 1, were compared between the HCC and CHB groups. The mutation rates of F22I/L/P in the pre-S2 region, as well as P33S/T and S144A/T/V in the X region, were significantly higher in the HCC group. F22I/L/P is reportedly associated with immune nonreactivity [17]. P33S/T and S144A/T/V in the X region have not been previously reported and, thus, are novel mutations possibly associated with a greater risk for the development of HCC. The amino acid at position aa33 is located in the negative regulation domain of HBx (aa 1-50), which forms a B cell epitope (aa 29-48). The HBx region partially overlaps with the RNase H part of HBV polymerase at the C-terminus and also contains several critical cis-elements. Genetic alterations in this region may affect not only the reading frame of HBx, but also the overlapping cis-elements and the possible binding affinities of this protein to its targets [18]. The amino acid at position aa144 is located in the core promoter of the C-terminus of HBx, which plays a key role in controlling cell proliferation, viability, and transformation [19].
The pre-S1 and pre-S2 regions contain several epitopes of T or B cells and play essential roles in the immune response [20] .
Pre-S deletion decreases the expression of the surface proteins of HBV, resulting in intracellular accumulation of HBV envelope proteins and viral particles, which induce endoplasmic reticulum stress and oxidative DNA damage, eventually leading to the development of HCC [21]. Truncated pre-S2/S sequences are often found in HBV DNA integration sites of HCC patients, and truncated pre-S2/S proteins could specifically activate the MAPK signaling pathway to activate transcription factors such as AP-1 and NF-κB, and thus promote abnormal proliferation of liver cells [22]. In the present study, there was no significant difference in pre-S1 and pre-S2 deletions between the HCC and CHB groups, while the frequency of pre-S2 deletions was higher than that of pre-S1 deletions among HCC patients, which is consistent with the findings of other studies [23].
Deletions or insertions in the C-terminus of HBx reportedly impair transactivation activity, thereby inhibiting cell proliferation, which may contribute to the development of HCC [24]. In the present study, all 11 deletions (ten for genotype C and one for genotype CD) were located in the C-terminus of HBx. In the CHB patients, the deletions affected codons 125 to 136 of HBx, whereas in HCC patients they involved more codons, especially truncations of the C-terminus of HBx. Previous studies have frequently reported deletions at the 3'-end of the X gene, which lead to truncations of the HBx C-terminus [25]. Reportedly, truncations of the HBx C-terminus occur in nearly 80% of HCC tissues, which may contribute to hepatocarcinogenesis via loss of proapoptotic capability of full genes, activation of cell transformation, and subsequent tumor promotion [26]. In the present study, the HBx deletion rate was 13.73% (7/51) in HCC patients, which is lower than in previous reports but higher than in CHB patients (5.26%, 4/76).
Some limitations of this study should be acknowledged. First, there was no significant difference in most of the substitution rates between the HCC and CHB groups. Second, as large fragment deletions in the HBV genome are characteristic of HCC samples, the chance of obtaining full-length sequences may be insufficient. To better understand the clinical relevance of HBV gene substitutions, further prospective investigations of HBV-infected patients are required.

Declarations
Ethics approval and consent to participate
The study protocol was approved by the Ethics Committee of the Chinese Center for Disease Control and Prevention and conducted in accordance with the ethical guidelines of the 1975 Declaration of Helsinki. Written informed consent was obtained from each participant before the interview and venous blood collection.

Conflicts of interest
The authors declare no conflicts of interest.

Authors' contributions
ZS designed and carried out the study, analyzed data, and wrote the manuscript. WF and QF carried out the serological experiments. LH and SQ carried out the molecular biology experiments. ZG, WF, and ZH received funding and supervised the study. SL and BS designed the study and reviewed the manuscript. All authors read and approved the final manuscript.

Figure 1
Distribution of nucleotide substitutions in the HBV genome. Each bar represents the substitution rate at the corresponding nucleotide position.

Figure 2