Multi-region sequencing depicts intratumor heterogeneity and clonal evolution in cervical cancer

Cervical cancer is a heterogeneous malignancy mainly caused by human papillomavirus (HPV). While a few studies have revealed heterogeneity of cervical cancer in chromosome levels, the correlation between genetic heterogeneity and HPV integration in cervical cancer remains unknown. Here, we applied multi-region whole-exome sequencing and HPV integration analysis to explore intratumor heterogeneity in cervical cancer. We sequenced 20 tumor regions and 5 adjacent normal tissues from 5 cervical cancer patients, analysis based on somatic mutations and somatic copy number alterations (SCNAs) levels were performed. Variable heterogeneity was observed between the five patients with different tumor stages and HPV infection statuses. We found HPV integration has a positive effect on somatic mutation burden, but the relation to SCNAs remains unclear. Frequently mutated genes in cervical cancer were identified as trunk events, such as FBXW7, PIK3CA, FAT1 in somatic mutations and TP63, MECOM, PIK3CA, TBL1XR1 in SCNAs. New potential driver genes in cervical cancer were summarized including POU2F2, TCF7 and UBE2A. The SCNAs level has potential relation with tumor stage, and Signature 3 related to homologous recombination deficiency may be the appropriate biomarker in advanced cervical cancer. Mutation signature analysis also revealed a potential pattern that APOBEC-associated signature occurs in early stage and signatures associated with DNA damage repair arise at the later stage of cervical cancer evolution. In a conclusion, our study provides insights into the potential relationship between HPV infection and tumor heterogeneity. Those results enhanced our understanding of tumorigenesis and progression in cervical cancer.


Introduction
Cervical cancer is the fourth most common malignancy among women worldwide for both incidence and mortality [1]. ITH in cervical cancer may contribute to poor clinical outcomes by generating phenotype diversity and initial drug resistance [2], and it is further associated with an enhanced risk of recurrence and metastasis [3,4]. Recently, ITH for genetic alterations has been demonstrated in various cancer types [3,5,6]. Subclonal populations of most tumors appear in spatial and temporal heterogeneities during tumor evolution, which is generated by genomic instability and genetic diversity [7]. ITH may promote tumor evolution and adaptation, and bring challenges to personalized medicine and biomarker development [5]. The traditional strategy of bulk sequencing in a single tumor sample is not able to reveal the nonuniform distribution of subclonal populations among different tumor sites. An accurate assessment of ITH is necessary for effective personalized therapies [8]. Multi-region sequencing is an emerging strategy in the last decade and has promoted the exploration of the complex clonal architectures of different cancer types. However, multi-region sequencing Chen Wang, Rui Bai and Yu Liu contributed equally to this work. Tissues from five cervical cancer patients were obtained from the first affiliated hospital of Shihezi University Medical College (Shihezi, China) between June and September 2019. The diagnosis of cervical cancer was independently confirmed by two pathologists. Representative images of hematoxylin and eosin (H & E) staining of cervical tumor tissues are shown in Fig. S1. None of the patients was treated with radiotherapy or chemotherapy. We have compiled all relevant ethical regulations for working with human participants, and that written informed consent was obtained from each patient. Four spatially isolated tumor specimens were obtained per patient with each primary tumor at least 0.5 cm away from the others. Meanwhile, adjacent normal tissues located 0.5-1 cm away from the corresponding tumors were collected. Clinical data and sequencing details are shown in Tables S1-S2. Tumor purity was estimated using the computational method ABSOLUTE (v1.0.6) based on the copy number aberrations and somatic mutations [13].

Multi-region whole-exome sequencing
After dissection of multiple regions of tumor and matched tumor-adjacent noncancerous tissues, a minimum of 200 ng DNA was extracted and used for sequencing. Exomecapture sequencing was performed using Illumina HiSeq X Ten. For each sample, we first used fastp (v0.20.0) for data trimming [14]. BWA was used to align the paired-end sequencing reads to the human reference genome build 38 (GRCh38/hg38) [15]. Samtools and GATK (4.1.4.0) were also employed in the pre-processing for variant discovery according to the standard pipeline [16,17]. The pipeline of GATK (GATK4 Mutect2) was used to analyze the filtered BAM files. ANNOVAR package was used to get the functional annotations of somatic mutations [18]. To avoid bias, data recovery was carried out. All mutations with allele frequency > 5% and sequencing coverage > 20× were collected. For each mutation, we inspected the presence of sequencing reads that support the related mutation in different spatial regions of the same patient.

Spatial mutation categories
Four regional biopsies of each patient were collected to examine ITH. Spatial mutation categories were classified into "ubiquitous", "shared" and "private". Ubiquitous mutations are observed in all sample regions. Shared mutations are present in at least two regions, but not all regions. Private mutations only exist in one region of the tumor. R packages maftools (v2.2.10) and pheatmap (v1.0.12) in R (v3.6.1) were used to generate distribution patterns of spatial regions [19].

Phylogenetic tree construction
We followed the method described in Murugaesu's research to generate the phylogenetic tree [6]. All non-silent mutations were used to create a binary absence/presence matrix. The matrix was used as input of R package phangorn (v2.5.5) [20]. UPGMA hierarchical clustering and parsimony ratchet analysis were employed to construct the unrooted tree. Branch lengths were determined by acctran function. The branch lengths were proportional to the number of nonsynonymous mutations. Trunk, branch and private mutations corresponded to ubiquitous, shared and private mutations in the heatmap.
Matched tumor-normal coverage files were used as input of ExomeCNV R script. The thresholds of the log2 ratios (tumor/matched adjacent) were set as 0.25 and − 0.25. R package circlize (v0.4.11) was used to compare the patterns of SCNAs between spatial regions [22].

Mutational spectra and signature analysis
To reveal the mutagenic processes in cervical cancer, the R package deconstructSigs was used to analyze the mutational spectra of trunks and branches [23]. SNVs were categorized into 6 mutation classes corresponding to different base-pair substitutions. The hypothesis that branch and private mutations tend to harbor more C > A substitution was tested with student's t-test.

HPV type detection
The "quick detect" mode of HPVDetector was used to identify the presence of HPV types and the number of HPV integrated reads [24]. The "integration detect" mode of HPV-Detector was used to determine the chromosome positions for the sites of HPV integration in each tumor sample, and annotate with human genes, chromosomal positions and HPV genes. Default index files of reference were used in the config documents and clean fastq files were used as input of analysis.

Gene ontology and KEGG pathway annotations
We integrated annotated SNV results of four regions into one file, and each entry was classified as trunk, branch or private. All mutation genes of 20 tumor regions of 5 patients were analyzed by DAVID [25,26]. The Gene Ontology biological process and KEGG were selected as the annotation results. In-house pipeline was developed to classify the GO functional annotations or KEGG pathway annotations, which are present in eight levels in each patient. For example, "all" represents the GO function or KEGG pathway that was related to trunk, branch and private mutation genes in a patient.

The mutation landscape of cervical cancer
We utilized multi-region WES to determine the mutational ITH of cervical cancer from 5 patients. For each patient, four tumor regions and one noncancerous tumor-adjacent tissue (as control) were collected. The average tumor purity of multiple regions of each patient is 0.35, 0.40, 0.68, 0.44 and 0.41 for pa01, pa07, pa10, pa11, pa14, respectively (Table S3). For WES, targeted exons were covered with a median depth of 260× and 124× for tumor and normal samples, respectively. Tables S4-S8 show the details of somatic mutations of each patient, including the frequency of mutated genes in 485 cervical tumor sample cohort obtained from cBioPortal, as well as other mutated genes that were not identified in cBioPortal [27,28]. Totally, 91 trunk mutations were identified in 5 patients, some driver genes had a high mutation frequency, such as FBXW7 (11.1% in 485 cervical tumor sample cohort), PIK3CA (28%), and FAT1 (6.8%) (Fig. 1A). The other driver genes mutated frequently in cervical cancer were categorized as branch and private mutations, including TTN (31.8% in 485 cervical tumor sample cohort) and KMT2D (13%). We found 68 new mutated genes in cervical cancer compared to cBioPortal, potential new driver genes in cervical cancer were POU2F2, TCF7, UBE2A. Collectively, 1,125 exonic functional somatic mutations were obtained from 20 tumor samples, including 953 missense, 50 nonsense, 21 splice-site, 4 non-stop mutations, 64 frameshift indels, and 33 in-frame indels (Fig. 1B). C > A and C > T transversions account for the major part of single-nucleotide variant (SNV) classes (Fig. 1C). Mutation burden shows a great discrepancy among 20 tumor samples (Fig. 1D). The top ten frequently mutated genes were illustrated with the proportion of samples (Fig. 1E). We found that different patients rarely share recurrent mutations, indicating the complex mechanism of cervical cancer progression and significant inter-patient heterogeneity. A substantial discrepancy of heterogeneity in mutations was observed in each patient by heatmap visualization (Fig. S2). Mutations were categorized into ubiquitous mutations, shared mutations and private mutations (Fig. S2, Tables S4-S8). To compare trunk mutation genes between cervical cancer and other cancer types, we utilized the CancerTracer which is a comprehensive ITH database containing over 6000 multi-region tumor samples from 1548 patients corresponding to 45 cancer types [29]. Most trunk mutation genes in each patient also mutated at trunk level in other cancer types, and part of important genes have numerous trunk mutations in different cancer types, such as APOB, FBXW7, FSIP2, PIK3CA, CSMD1 and FAT1 (Figs S3-S7, Table S9). Notably, PIK3CA (p.E542K) also occurred at trunk level in bladder cancer, breast cancer, esophageal squamous cell carcinoma and hepatocellular carcinoma. The amino acid changes of each trunk mutation in all patients were also scanned in The Cancer Genome Atlas (TCGA) datasets using cBioPortal. Twelve identity mutations and seven different mutations at the same location were found (Table S10), and PIK3CA (p.E542K) existed in 13 cancer types and 23 cervical cancer samples. In addition, detected somatic mutations were compared with reported driver genes in previous studies, including Cancer Gene Census [30], Bert Vogelstein's research [31], pancancer research [32], David Tamborero's research [33], and intOGen [34]. In general, the results of multi-region WES revealed that heterogeneities not only exist between individuals but also in different tumor regions of the same individual.

Phylogeny estimation from somatic mutations
The progression of ITH is comparable to a growing tree. Most cancers evolve through branched trajectories in which ubiquitous mutations are located on the trunk, and heterogeneous mutations on the branches of a phylogenetic tree [7]. To reveal the branch evolution patterns between different tumor regions, we reconstructed a phylogenetic tree based on the parsimony ratchet analysis method [20]. The branch lengths are proportional to the counts of mutations (Fig. 2). A significant discrepancy of phylogenetic tree was observed among patients. Each region harbored a similar number of mutations in the four regions of patient pa01, pa07, and pa14. However, in patients pa10 and pa11, one of the four regions in each patient harbored much more somatic mutations than the other tumor regions. Driver genes obtained from previous studies were labeled on the tree [30][31][32][33][34].

Multi-region analysis of somatic copy number alterations
SCNAs play an important role in cervical tumor progression. HPV infection may cause gene amplification or deletion in nearby integration regions [35]. SCNAs may influence the expression of both protein-coding and non-coding genes, changing the activity of many signaling pathways [36]. We performed SCNAs analysis using R package ExomeCNV (v1.4) in R programming language (v3.2.2) [21]. Significant SCNAs heterogeneity exists among different regions of each individual patient. Patients 11 and pa14 possess fewer SCNAs compared to patients pa01, pa07, and pa10. Patient pa10 harbored a great number of SCNAs, and hundreds of cancer driver genes were affected by amplification or deletion. SCNAs details of each region of each patient are illustrated in Fig. 3 and Figures S8-S11. Driver genes obtained from previous studies were labeled in the Circos plots. Table S11 shows genes that were amplified or deleted in 4 region tissues of the patient and gene frequency of SCNAs in 588 cervical tumor sample cohort obtained from cBioPortal [27,28]. There were 106 cancer consensus genes and 186 driver

Mutational spectra and signature analysis
The causes of somatic mutations are manifold, including exogenous and endogenous mutagen exposures, defective DNA repair and DNA replication infidelity, thus mutation signatures may be related to cancer etiology, prevention and therapy [37]. To further reveal the mechanisms of the mutational processes during cervical cancer evolution, we examined the mutation spectra of the somatic mutation in trunk and branch/private mutation using deconstructSigs [23]. As is shown in Fig. 4A, all cases were accompanied by a significant increase in C > A transversions in branch/private mutations compared with trunk mutations (t test, p = 0.044). The overall mutational signatures were different between trunk and branch/private mutations. Signature 1 (associated with age) was widely identified in trunk and branch/private mutations and has been found in all cancer types and most cancer samples [37]. APOBEC-associated Signature 2 and Signature 13 were observed in trunk mutations of pa07. DNA damage associated Signature 3 and Signature 15 was found in trunk mutations of pa10 (Fig. 4B). Interestingly, we noticed an increase of signatures associated with DNA mismatch repair ( Signatures 6,15,20) in the branch/private compared with trunk mutations for pa01, pa07, pa10, and pa14, although without statistical significance due to the relatively small sample cohort (Fig. 4, Fig. S12).

HPV-type detection and integration
HPV may randomly integrate into the human genome according to previous studies [38][39][40], and the integration is one of the major risk factors for cervical tumorigenesis [41]. Integration sites located in functional genes may promote the transformation of malignant cells [42,43]. Thus, frequently affected genes by integration may be used as biomarkers for personalized medicine strategies [44,45]. HPVDetector is used to detect HPV from tumor samples using sequencing data [24]. We used the quick detection mode to determine the HPV types and HPV-related reads to confirm whether the co-infection of multiple HPV existed in a tumor specimen. The integration detect mode was also used to obtain the detail of HPV integration. Compared to the comprehensive landscape of integrated genes reported in previous studies [38][39][40], 2420 HPV integration sites in gene region were detected in 13 of 20 samples, 458, 39 and 40 sites located in previously reported genes for HPV 16, 18 or other types, respectively (Tables S12-S13). Several previously reported genes frequently affected by HPV integration, including FHIT, LRP1B, and DLG2, were identified in our study [40]. Fifteen genes with more than 4 events including RB1, PTEN, TTN and NF1 [40], other frequently integrated genes were NOX5 and MIR548H4. The top five types of HPV infection in different cervical lesions in China are HPV 16, 18, 58, 52, and 33. Figure 5 illustrates the infection details and mutation counts in each sample. As expected, 4 patients were infected by HPV 16, 18, 52, and shows a great discrepancy in integration level.

Gene Ontology functional and KEGG pathway annotations
The mutations at the gene level were not converged to specific genes in cervical cancer from our multi-region sequencing data. To further determine whether the mutations   Gene Ontology and KEGG pathway annotations. Different colors represent the different levels of GO terms (A) or KEGG pathways B enriched in specific patient. "All types" represents the GO functions or KEGG pathways are related to trunk, branch and private mutation genes in that individual patient converge to specific pathways at different mutational levels, in-house pipeline was developed to classify the GO functional annotations or KEGG pathway annotations into 8 levels in each patient based on the trunk, branch or private mutational status of related genes (Fig. 6). The GO functional annotation results illustrated that patients pa01, pa07, and pa10 shared similar GO terms at the same level. However, patient pa11 shows great discrepancy compared with other cases because of fewer trunk and branch mutations presented in that patient. Patients pa07 and pa10 also possess similar pathways at the same level. The most frequently affected GO biological processes by trunk or branch mutations were regulation of transcription, signal transduction, protein phosphorylation, transport, multiple organism development, inflammatory response, and so on. As expected, the most frequently affected pathways by trunk or branch mutations were pathways in cancer, metabolic pathways, MAPK signaling pathway. These results may reveal the essential roles of the above pathways in cervical cancer progression.

Discussion
Multi-region sequencing is a valuable strategy in cancer research and has promoted the exploration of different cancer types. Previous studies on different types of cancer suggested that 4 regions were moderate for detecting ITH in cervical cancer [3,5,6]. Although heterogeneity studies on cervical cancer have been performed at chromosome level and single-cell RNA sequencing levels [46,47], our strategy of multi-region WES is eligible for evaluating ITH in cervical cancer with high sequencing depth. Inevitably, different tumor purity in each tumor region might give rise to lower ITH in somatic mutation or SCNA level.
In the current study, variable ITH was observed in both somatic mutation and SCNAs levels. HPV integration has a positive effect on somatic mutation burden (student t test, p = 0.005), consistent with previous conclusions that persistent HPV infection creates a suitable environment which is also advantageous to the accumulation of genetic alteration [39]. Notably, PIK3CA (p.E542K) is a trunk mutation in pa07 (HPV-16 positive) and several cancer types (CancerTracer database). Besides, it exists in 13 cancer types and 23 cervical cancer samples (Tables S9-S10).
As an important driver gene in human cancer, PIK3CA (p.E542K) is a hotspot related to PI3K-AKT-MTOR pathway, and it may be a potential biomarker in HPV positive cervical cancer [48,49]. Our studies have identified other driver genes in different heterogeneity levels in cervical cancer, and those genes may be a potential candidate target for precision medicine. The trunk driver genes which had a high mutation frequency in previous cervical cancer studies may play an important part in tumorigenesis, such as FBXW7, PIK3CA and FAT1. The other driver genes mutated frequently which were categorized as branch and private mutations may promote tumor progression, including TTN and KMT2D. The potential new driver genes of cervical cancer, POU2F2, TCF7 and UBE2A require further validation. The selective advantage of driver aberrations provides special mechanisms for cancer cell fitness. When employing targeted therapeutic approaches, driver events contribute significantly to cancer progression even when it presents in a low frequency [7]. In this study, two tumor regions (pa10_region4 and pa11_region2) with much more driver genes harbored the highest tumor burden indicating the significant impact of driver mutations in tumor progression [5]. Detection of driver mutations and genes has important significance, which can further facilitate the understanding of the molecular mechanism for tumor formation and the development of effective drugs and therapeutic regimens for cancer. Further analysis based on heterogeneity level shows a significant discrepancy between homogeneous and heterogeneous aberrations. For mutation spectra and signature analysis, the pattern of the increase in C > A transversions in branch/private mutations compared with trunk mutations was also found in the multi-region heterogeneity research of primary malignant melanoma of the esophagus and human colorectal adenoma [50,51]. To excavate the potential pattern of nucleotide changes between trunk and branch/private will require further investigation. The association between mutational signature and cancer etiology has been proved by previous studies [37]. In this study, we tend to reveal the pattern of signatures at heterogeneity level by comparing the similarities and differences between trunk and branch/private signatures. As expected, Signature 1 widely exists in the lifetime of tumor evolution. The APOBEC family of proteins has important functions in human diseases. The functions of these proteins include binding to RNA and single-strand (ss) DNA. APOBEC activities usually lose control during cancer development and lead to DNA hypermutation and promiscuous RNA editing [52]. Previous research has revealed that APOBEC is a molecular driver in cervical pathogenesis, and APOBEC-associated Signature 2 and 13 were observed in trunk mutations of patient pa07 further indicating they are important factors in tumorigenesis of cervical cancer [53]. Although without statistical significance, the increase of signature 6, 15 and 20 (associated with defective DNA mismatch repair) is an extensive pattern between trunk and branch/private, and defective DNA mismatch repair is a potential predictive marker in cervical cancer [54]. Signature 2, 13, 6, 15 also exist in other squamous-type cancers [37]. These results may suggest different mutational processes contribute to subclonal diversification in tumor progression of cervical cancer, although inconclusive owing to limited samples. Annotation based on different heterogeneity levels showed a new perspective in the research of tumor initiation and evolution. GO terms annotation presented an important role in transcriptional reprogramming, signal transduction and immune response in relatively early stages of cervical cancer progression. KEGG pathways analysis revealed the essential roles of pathways in cervical cancer progression, including MAPK signaling, Ras signaling and PI3K-Akt signaling.
The non-significant correlation between HPV integration and SCNAs shows a complicated pattern of SCNAs in cervical cancer. For SCNAs analysis, 106 cancer genes and 186 driver genes were identified in all the trunk SCNAs genes, such as TP63, MECOM, PIK3CA, TBL1XR1 amp (with a high frequency in previous studies). We also identified 69 novel trunk SCNAs genes. These driver genes affected by trunk SCNAs may play an important role in tumor evolution and are worth further study. Notably, patient pa10 is in clinical stage IIIC and harbored thousands of SCNAs genes. Patients pa01 and pa07 were diagnosed with stage IB and had relatively fewer SCNAs. Patients pa11 and pa14 were diagnosed with stage IA and harbored most fewer trunk SCNAs genes. CNA affected driver genes may be prognostically related to stage or grade [40,41]. The correlation between trunk SCNAs events and tumor stage requires further investigation and may promote the diagnosis of cervical cancer. The trunk mutation related to signature 3 may be the potential reason for pa10 possessing the greatest number of SCNAs and late tumor stage, signature 3 is a robust and independent biomarker of HRD [55]. In pa10, HRD genes NBN deletion and WRN amplification occurred in all regions may promote tumor progression [56]. Therapy directed toward HRD is now approved in breast and ovarian cancer, and there may also be potential opportunities for a benefit for clinical application in cervical cancer.
HPV detection reveals the correlation between somatic mutations and HPV integration. The potential function of frequently integrated genes in cervical cancer should be taken into account. RB1 is involved in tumor-associated pathways including cell cycle, P53 pathway, and PI3K-AKT signaling pathway. PTEN is associated with P53 and PI3K-AKT signaling pathways. NF1 is related to RAS signaling pathway [57]. NOX5 and MIR548H4 integrations occurred in 3 of 4 HPV-positive patients and are worth further study in cervical cancer. The biological significance of NOX5 to promote malignancy of tumor cells was reported in esophageal squamous cell carcinoma [58]. MIR548H4 was identified as one of the methylation markers for early occult lymph node metastasis in non-small cell lung cancer [59]. These results suggest that HPV integration detection is essential, and frequently affected genes in cancer pathways may be potential biomarkers [44,45].
In a conclusion, five patients with different tumor stages and HPV integration status show great heterogeneity discrepancy in somatic mutations and SCNAs levels. We classified somatic mutations, SCNAs, signatures and pathways at different ITH levels, it is meaningful to investigate ITH at different levels and combine it with HPV integration analysis in cervical cancer. The extensive ITH might contribute to poor clinical outcomes and enhance the risk of recurrence and metastasis in different individuals. This study provides a new perspective on ITH in cervical cancer.