Human papillomavirus (HPV) integration signature in cervical lesions: identification of MACROD2 gene as HPV hot spot integration site

High-risk human papillomavirus (HR-HPV) is clearly associated with cervical cancer. Integration of HPV DNA into the host genome is considered a key event in driving cervical carcinogenesis. However, the mechanism by which HR-HPV integration influences the host genome structure has remained enigmatic. In our study, 25 DNA samples, including 11 from fresh-frozen cervical carcinomas and 14 from fresh-frozen high-grade squamous intraepithelial lesions (HSILs), were analyzed using HPV capture combined with next-generation sequencing. We calculated the breakpoint frequency in each viral gene or region and found that breakpoints were prone to occur in L1 and L2 rather than E2 in the cervical cancer group (P = 0.0004 and P = 5.15 × 10^−40) and the HSIL group (P = 2.1 × 10^−32 and P = 7.06 × 10^−13). HPV16 showed a strong tendency toward intronic regions (P = 5.02 × 10^−64) but only a subtle tendency toward intergenic regions (P = 0.04). The most frequent integration site was in the MACROD2 gene (introns 2, 4, 5, 6, 8 and 9), which lie within the MACROD2 functional domain. Our results indicate that MACROD2 is an HPV integration hot spot in cervical lesions, and that its deficiency alters DNA repair and sensitivity to DNA damage through impaired PARP1 activity, resulting in chromosome instability.


Introduction
Cervical cancer is the fourth most frequently diagnosed cancer and the fourth leading cause of cancer death in women, with an estimated 604,127 cases and 341,831 deaths worldwide in 2020 [1]. Persistent infection with high-risk human papillomavirus (HR-HPV) is the main causative factor of cervical cancer and cervical intraepithelial neoplasia (CIN); HR-HPVs have been detected in 99.7% of cervical cancers [2]. Infection with the high-risk types HPV 16 and 18 accounts for more than 80% of cervical cancer incidence [3]. Integration of HPV DNA into the host genome is considered a key event in driving cervical carcinogenesis [4]. The increase in both integration rate and number of integration sites from CIN to cancer highlights their potential value as predictors of disease progression [5]. Integration occurs in regions of micro-homology between the HPV and host genomes. Viral genome integration events usually result in dysregulation of E6 and E7 gene expression compared with expression from extrachromosomal viral genomes [6]. Most integration events result in expression of a spliced viral-cellular transcript [7]. These fusion transcripts are very often more stable than their viral counterparts, further increasing HPV oncogene expression [8]. However, the mechanism by which HR-HPV integration influences the host genome structure has remained enigmatic.
Despite increased attention to HPV integration hot spots, the characteristics of HPV integration and the relationship between HPV integration and cervical cancer remain elusive. In this study, we focused on integration site analysis of 25 HPV16-positive cervical lesion samples. Our data revealed a hot spot of HPV integration at MACROD2, a gene involved in impaired PARP1 activity and chromosome instability. Transcription of HPV16-MACROD2 gene fusions from the sites of genome integration was shown through transcriptome sequencing. Our study could provide further insight into the characteristics of HPV integration in DNA and RNA samples and a theoretical basis for understanding the mechanism of tumorigenesis.

Study population and specimen collection
A total of 11 fresh tissue specimens were collected from patients with cervical cancer who had undergone surgery, and 14 cervical biopsy specimens diagnosed as high-grade squamous intraepithelial lesion (HSIL) were collected, at Yantai Yuhuangding Hospital, Shandong Province, China, in 2021.
Individual informed consent had been collected from all study participants. This study received ethical approval from the Institutional Review Board of our hospital. All experiments were performed in accordance with relevant guidelines and regulations.

Genomic DNA isolation, HPV typing
DNA from the cervical cast-off cells was extracted with a TIANamp Genomic DNA Kit (No: 3304-9) according to the manufacturer's procedure. Human papillomavirus genotyping was conducted using an HPV GenoArray test kit (HybriBio Ltd), which combines DNA amplification with HybriBio's proprietary flow-through hybridization technique. Absence of HPV DNA contamination was confirmed by HPV L1 and an internal control of the human A globin in each reaction.

HPV integration detection
HPV probes were designed according to the full-length genomes of 32 HPV types by MyGenostics (MyGenostics, Baltimore, MD, USA). Eighteen HPV types (16, 18, 26, 31, 33, 35, 39, 45, 51, 52, 53, 56, 58, 59, 66, 68, 73 and 82) were analyzed in subsequent HPV assays. The overall experiment was conducted according to the manufacturer's protocol. A paired-end read that mapped uniquely with one end to a human chromosome and the other end to the HPV reference genome was identified as a discordant read pair. If a specific position had one or more discordant read pairs, it was considered a potential HPV integration site. PCR and Sanger sequencing were used to verify all the potential HPV integration breakpoints. All sequences of the fusion genes were characterized with the NCBI human MegaBLAST alignment tool and the UCSC BLAT tool.
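The discordant read pair rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `ReadPair` record, the `candidate_sites` function, the `min_pairs` parameter and the reference name `"HPV16"` are all hypothetical, every reference that is not listed as viral is treated as human, and the human-side mapping position stands in for the integration position.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """Simplified, hypothetical record for one aligned paired-end read."""
    ref1: str      # reference of mate 1, e.g. "chr20" or "HPV16"
    pos1: int      # mapping position of mate 1
    unique1: bool  # True if mate 1 maps uniquely
    ref2: str      # reference of mate 2
    pos2: int      # mapping position of mate 2
    unique2: bool  # True if mate 2 maps uniquely


def is_discordant(pair: ReadPair, viral_refs: frozenset) -> bool:
    """True if both mates map uniquely, one to a human chromosome and one to HPV."""
    if not (pair.unique1 and pair.unique2):
        return False
    # Exactly one of the two mates must sit on a viral reference.
    return (pair.ref1 in viral_refs) != (pair.ref2 in viral_refs)


def candidate_sites(pairs: Iterable[ReadPair],
                    viral_refs: frozenset = frozenset({"HPV16"}),
                    min_pairs: int = 1) -> dict:
    """Count discordant pairs per human-side position and keep supported positions."""
    support = Counter()
    for pair in pairs:
        if not is_discordant(pair, viral_refs):
            continue
        if pair.ref1 in viral_refs:
            support[(pair.ref2, pair.pos2)] += 1  # mate 2 is the human side
        else:
            support[(pair.ref1, pair.pos1)] += 1  # mate 1 is the human side
    return {site: n for site, n in support.items() if n >= min_pairs}


# Made-up example reads (not data from the study).
example = [
    ReadPair("chr20", 15000000, True, "HPV16", 6000, True),   # discordant
    ReadPair("HPV16", 6100, True, "chr20", 15000000, True),   # discordant
    ReadPair("chr1", 100, True, "chr1", 400, True),           # human-human
    ReadPair("chr3", 500, False, "HPV16", 700, True),         # not uniquely mapped
]
print(candidate_sites(example))  # {('chr20', 15000000): 2}
```

With `min_pairs=1` the sketch mirrors the permissive threshold stated in the text (one or more discordant pairs); raising it would trade sensitivity for specificity.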

Detecting integration breakpoints by RNA-seq
We selected 3 cervical cancers in which hot spot genes had been detected by HPV capture combined with next-generation sequencing and for which sufficiently high-quality RNA was available. RNA-seq libraries were sequenced as paired-end 90-bp sequence tags using the standard Solexa pipeline. We analyzed integration sites in the transcriptome data according to a previously described method [10].

Statistical analysis
Fisher's exact test was used for statistical analysis. P < 0.05 was used as the threshold for statistical significance. All P values in the present study are two-sided.
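For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2 × 2 table can be sketched as follows. This is a standard-library illustration only; the text does not state how the contingency tables were constructed or which software was used, so the function name `fisher_exact_two_sided` and the table in the example are assumptions made for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Textbook-style table, not data from the study.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

One plausible layout for the breakpoint analyses would place breakpoints inside and outside a given region in one row and the corresponding expected counts in the other, but that layout is an assumption rather than something stated in the Methods.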

Results
In this study, 25 DNA samples, including 11 from fresh-frozen cervical carcinomas and 14 from fresh-frozen HSILs, were analyzed using HPV capture combined with next-generation sequencing (Table 1). RNA sequencing (RNA-seq) was applied to validate viral-human breakpoints.

Determination of potential HPV integration sites
As described in the Bioinformatics Analysis method, if a specific position had one or more discordant read pairs mapped with one end to a human chromosome and the other to the HPV reference genome, it was considered a potential HPV integration site. A total of 20,243 potential HPV integration sites were discovered in the 25 HPV16-positive cases, including 5050 integration sites in the 11 cervical cancers and 15,193 sites in the 14 HSILs, with frequencies ranging from 11 to 6170 per sample (Table 1).

Characterization of integration breakpoints
We calculated the breakpoint frequency in each viral gene or region and found that breakpoints were prone to occur in L1 and L2 rather than E2 in the cervical cancer group (P = 0.0004 and P = 5.15 × 10^−40, Fig. 1) and the HSIL group (P = 2.1 × 10^−32 and P = 7.06 × 10^−13, Fig. 2). Previous reports indicate that integrated HPV16 should retain intact oncogenes E6 and E7 together with the long control region (LCR). We determined that breakpoints were less prone to occur in the LCR (P = 2.1 × 10^−32 and P = 7.06 × 10^−13, Figs. 1, 2), which was probably preserved because it acts as a strong cis-activator of nearby oncogene expression, promoting the malignant transformation of host cells. In addition, we found that breakpoints were less prone to occur in E1 (P = 3.79 × 10^−12 and P = 0.00007, Figs. 1, 2). These findings contradict the view that HPV integration preferentially disrupts the E1 or E2 gene (a negative regulator of oncogenes E6 and E7), which may lead to dysregulation of oncoproteins E6 and E7 and thereby promote cervical carcinogenesis. Of the HPV16 integration sites, 98.17% occurred in non-coding regions of the host genome: 42.15% were in intronic regions and 53.44% were in intergenic regions, whereas only 1.83% were in exonic regions (Table 2). To investigate HPV integration patterns in the human genome, we annotated HPV integration breakpoints in specific genomic elements. For instance, HPV16 showed a strong tendency toward intronic regions (P = 5.02 × 10^−64) but only a subtle tendency toward intergenic regions (P = 0.04). Breakpoints were less prone to occur in untranslated regions (P = 5.02 × 10^−50) (Fig. 3).
To validate the HPV integration breakpoints detected by HPV probes and to investigate whether HPV continues to transcribe viral genes after integrating into the host genome, we performed RNA-seq on 3 samples. The number of integration sites at the RNA level (n = 11) was lower than that at the DNA level (n = 19). Comparison of RNA and DNA breakpoints in the HPV genome revealed two patterns of breakpoint distribution in the same samples. Our data suggest that HPV integration may first occur in the E1/L1 genes (Table 4) and that RNA splicing may shift the breakpoint positions to the E1/L2/LCR regions. At the RNA level, the HPV integration sites still occurred in intronic regions, and the RNA retained the intronic region of the host gene (Table 4).
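The host-region percentages quoted above can be cross-checked with simple arithmetic. The sketch below uses only the figures given in the text; the helper name `remaining_share` is hypothetical, and attributing the remainder to other non-coding categories (which would be itemized in Table 2) is an interpretation, not a reported result.

```python
def remaining_share(total: float, *parts: float) -> float:
    """Percentage of `total` not accounted for by `parts`, rounded to 2 decimals."""
    return round(total - sum(parts), 2)


# Percentages as reported in the text.
non_coding, intron, intergenic, exon = 98.17, 42.15, 53.44, 1.83

# Non-coding and exonic shares together cover all integration sites.
assert abs(non_coding + exon - 100.0) < 1e-9

# Share of non-coding sites that is neither intronic nor intergenic.
print(remaining_share(non_coding, intron, intergenic))  # 2.58
```

The 2.58% remainder would be consistent with a small number of sites in other non-coding elements, such as the untranslated regions mentioned in the same paragraph.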

Discussion
Analysis of cervical squamous cell carcinoma shows that human papillomavirus (HPV) integration occurs in more than 80% of cervical cancers [15]. Many studies have compared the human genomic regions associated with HPV integration sites to elucidate the mechanisms that might promote integration and carcinogenesis [16]. Integration of HPV DNA occurs in all human chromosomes; however, integration sites are often found within or in close proximity to common fragile sites [17]. A series of hot spot genes for HPV integration was identified in a recent study [10]. Despite increased attention to HPV integration hot spots, the characteristics of HPV integration and the relationship between HPV integration and cervical cancer remain elusive.
HPV breakpoints can occur in any part of the viral genome, perhaps enabling the virus to adapt to the changing environment during carcinogenesis [18]. The hinge region of the HPV E2 gene has been reported to be the most common deletion or breakage site when HPV DNA integrates into the host genome [19]. Disruption of E2 blocks viral replication, resulting in aberrant viral gene expression and loss of control over the E6 and E7 proteins, and ultimately leads to cervical cancer progression [20]. We calculated the breakpoint frequency in each viral gene or region and found that breakpoints were prone to occur in L1 and L2 rather than E2 in the cervical cancer group (P = 0.0004 and P = 5.15 × 10^−40) and the HSIL group (P = 2.1 × 10^−32 and P = 7.06 × 10^−13). The HPV16 L1 protein activates innate immunity through the type I interferon pathway and exhibits an efficient anti-cancer effect when combined with immune checkpoint blockade therapy [21]. Furthermore, the L1 coding sequences of the immunogenic surface loops are distinctively poorly conserved owing to selective pressure for mutagenesis and immune evasion [22]. Therefore, we speculate that HPV integration leads to immune escape by disrupting HPV L1.
In this study, the 20,243 potential HPV integration sites in 25 HPV16-positive cases were used for bioinformatic analysis. We found that HPV16 showed a strong tendency toward intronic regions (P = 5.02 × 10^−64) but only a subtle tendency toward intergenic regions (P = 0.04). Our result is consistent with a previous study showing that breakpoints in DNA samples were significantly prone to occur in the INTRON region (P < 0.01, Chi-squared test) [11]. Moreover, Li et al. found that breakpoints are significantly enriched in the INTRON and PROMOTER regions [23]. Integration might therefore be directly related to the disruption and alteration of gene function.
We focused on 646 different HPV-chromosomal junctions; the most frequent integration site was in the MACROD2 gene (introns 2, 4, 5, 6, 8 and 9). Consistently, Kamal et al. found patients with HPV integration sites in the MACROD2 gene (introns 5, 6 and 7) [14]. Juliette Mainguené et al. reported that MACROD2 is the third most frequent HPV integration hot spot (4.1%) in head and neck squamous cell carcinoma, with two patients displaying intragenic HPV integration [24]. MACROD2 is a protein-coding gene located at a fragile site on human chromosome 20. The MACROD2 protein is a deacetylase involved in the removal of ADP-ribose from mono-ADP-ribosylated proteins; it has a key role in DNA repair [25]. MACROD2 deficiency promotes tumor growth and metastasis by activating GSK-3β/β-catenin signaling in hepatocellular carcinoma [26]. MACROD2 is a caretaker tumor suppressor gene. MACROD2 loss causes repression of PARP1 activity, impairing DNA repair [27,28]. In breast cancer, MACROD2 overexpression mediates estrogen-independent growth and tamoxifen resistance [29]. The Protein Data Bank showed that the functional domain of the MACROD2 protein is located at amino acids 59-240 and mainly interacts with PARP1. The HPV integration sites were in this functional domain, which alters DNA repair and sensitivity to DNA damage through impaired PARP1 activity, resulting in chromosome instability.
For many years, HPV oncogenic potential was attributed only to the viral oncoproteins E6 and E7, but recent studies highlight that HPV integration is itself an oncogenic event. In our study, a large portion of HPV integration sites in DNA samples was located in non-coding regions (intronic and intergenic). This suggests that HPV integration could directly trigger abnormal transcription, although the functions of the resulting novel transcripts remain unclear. We described recurrent HPV integration in the MACROD2 region, within the MACROD2 functional domain. MACROD2 loss alters DNA repair and sensitivity to DNA damage through impaired PARP1 activity, resulting in chromosome instability.