Multiple-level copy number variations in circulating cell-free DNA for prognostic prediction of hepatocellular carcinoma with radical treatments

Copy number variations (CNVs) in circulating free DNA (cfDNA) are emerging as minimally invasive prognostic biomarkers for various cancers. However, little has been reported on the multiple-level analysis of cfDNA CNVs for hepatocellular carcinoma (HCC) patients with radical treatments. Here, CNVs at genome-wide, chromosomal-arm and bin levels were analyzed in cfDNA from 117 HCC patients receiving radical treatments via low-coverage whole genome sequencing. Then, the relationship between cfDNA CNVs and clinical outcomes was explored. Our results showed that a concordant prole of CNVs was observed between cfDNA and tumor tissue DNA. Three genome-wide CNV indicators including tumor fraction (TFx), prediction score (P-score) and stability score (S-score) were calculated based on genome-wide cfDNA CNVs. Kaplan-Meier analysis exhibited a signicantly poorer overall (OS) and recurrence free survival (RFS) than those with low TFx, and respectively (All P < Furthermore, a group of high frequency CNVs at chromosomal-arm level including loss of 17p, 19p and the gain of 8q, 1q clearly predicted of HCC Finally, a bin-level risk score was constructed based on three most relevant bin identied by a LASSO model. Patients with high bin-score had a signicantly poorer OS than those with low bin-score (P < 0.001). in CNVs and clinical outcomes was explored. Our study indicates that the multiple-level cfDNA CNVs are signicantly associated with OS and RFS in HCC patients with radical treatments, suggesting that cfDNA CNVs detected by low-coverage WGS may be used as potential prognostic biomarkers for HCC patients.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common cancers and a leading cause of death worldwide (1,2). More than 75% HCC patients are etiologically associated with chronic hepatitis B virus (HBV) infection in the Asia-Paci c region (3). Radical treatment, including surgery and radio frequency ablation, is the mainstay curative treatment for early-stage HCC patients (4). Unfortunately, tumor recurrence and metastasis still greatly contribute to the poor prognosis of HCC, partially due to the lack of effective biomarkers (5,6). Therefore, it is of primary importance to identify prognostic biomarkers for guiding treatment decisions, especially in patients with radical treatments. Currently, some new biomarkers, such as miR-181a-5p(7) and VEGFA (8), have been reported as potential prognostic indicators of HCC patients. However, these biomarkers still need further validation for clinical applications. To date, the only clinically available blood-based biomarker for HCC surveillance is alpha fetoprotein (AFP), whereas its clinic utility is severely limited by low sensitivity and speci city (9). Hence, there remains an urgent unmet need to develop novel prognostic biomarkers for HCC patients.
Cell free DNA (cfDNA) in tumor patients contains the fragmented circulating tumor DNA mainly derived from dead tumor cells through necrosis and apoptosis, which can be used as a surrogate source of tumor tissue DNA (10)(11)(12). Carrying the comprehensive information of tumor genome pro les, such as methylations, point mutations, and copy number variations (CNVs), cfDNA therefore holds great potential to represent the entire molecular picture of a patient's malignancy. Several studies have reported the signi cant promise of cfDNA as novel biomarkers for HCC by assessing its methylation status and point mutations (13,14). However, the high cost and complex detection pipeline limit the wide application of methylation-based assay. Meanwhile, only a small fraction of mutations presented in both tumor tissue and plasma, which greatly hinders their clinical utility. CNV pro ling offers several advantages over methylation or mutation analysis for cancer detection, including larger genomic region span and the ability for detecting structural genomic alterations (15). Moreover, the detection of cfDNA CNVs by ultra-low coverage sequencing makes it a cost-effective approach for potential clinical applications (16). Previous studies have suggested the prognostic utility of cfDNA CNVs in HCC patients treated with Sorafenib (17). Furthermore, Cai et al. have reported that plasma CNV indicator (TFx) at genomewide level are dynamically correlated with tumor burden in a relatively small cohort (n = 34) of HCC patients receiving surgical resection (18). Recent multiple-level analysis of CNVs in triple negative breast cancer patients shed light on a more appropriate and comprehensive way in the analysis of cfDNA CNVs (10). Nevertheless, little has been addressed on cfDNA CNVs at multiple levels for HCC patients who received radical treatments.
Here, CNVs at genome-wide, chromosomal-arm and bin levels were analyzed in plasma cfDNA from 117 HBV-HCC patients by low-coverage whole genome sequencing. Then, the relationship between cfDNA CNVs and clinical outcomes was explored. Our study indicates that the multiple-level cfDNA CNVs are signi cantly associated with OS and RFS in HCC patients with radical treatments, suggesting that cfDNA CNVs detected by low-coverage WGS may be used as potential prognostic biomarkers for HCC patients.

Materials And Methods
Patients and sample collection A total of 117 HBV-related hepatocellular carcinoma (HBV-HCC) patients who were subjected to radical treatments were recruited between May 2016 and July 2017 from Department of Hepatobiliary Surgery at Xijing Hospital, Fourth Military Medical University (FMMU), Xi'an, China (Fig. 1). Prior to treatment, 5mL of peripheral blood sample was collected in EDTA tubes (BD) from each patient. Fresh HCC tumor tissue samples were also obtained from available cases. This study was approved by the Ethical Committee of FMMU and written consent was obtained from each patient.
Plasma processing and cfDNA extraction To separate plasma, peripheral blood was rstly centrifuged at 1,600 rpm at 4℃ for 10 minutes. The resultant plasma supernatant was further puri ed by centrifugation at 10,000 rpm at 4℃ for 15 min to remove residual cells or debris. The plasma was stored at -80°C until DNA extraction. Plasma cfDNA was extracted using the QIAamp Circulating Nucleic Acid Kit (Qiagen) according to the manufacturer's protocol. The quantity and quality of cfDNA samples were evaluated using Qubit3.0 (Thermo) and Agilent 2100 bioanalyzer (Agilent).
Tumor tissue collection and DNA extraction A total of 19 HCC tissue samples were selected for further CNV analysis. H&E staining was carefully reviewed by a pathologist to ensure that the cancer cell content was >70%. Tissue DNA was extracted from frozen HCC tumor samples using the E.N.Z.A tissue DNA Kit (Omega) according to the manufacturer's protocol. The quantity and quality of tissue DNA samples were evaluated using NanoDrop 2000 (Thermo) and agilent 2100 bioanalyzer (Agilent).

Library preparation and whole genome sequencing
The construction of whole genome sequencing library for tissue DNA samples and plasma cfDNA samples was performed using the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) according to the manufacturer's protocol as previously described (19,20). The resulting libraries were sequenced to about 5× genome-wide depths for cfDNA and about 40 × genome-wide depths for tissue DNA on an Illumina HiSeq X Ten platform in 150-bp paired-end mode.

Sequencing data analysis
The raw paired-end reads were mapped to the human reference genome GRCh37 with BWA-0.7.4 and the BAM les were analyzed using Picard tools-1.9.2 (https://broadinstitute.github.io/picard/) to mark and remove the duplicate reads. Quality control was performed using QPLOT to analyze the genome-level statistics. Then, ichorCNA was used to pro le the CNVs across the genome for further multiple-level analysis using default algorithm parameters (available at https://github.com/broadinstitute/ichorCNA) (16). In addition, the copy number data of 370 HCC patients from TCGA database was obtained through the UCSC Genome Browser (http://xena.ucsc.edu/).

Calculation of genome-wide CNV indicators
To quantify the genome-wide CNVs in plasma cfDNA, three indicators were calculated based on three different algorithms. TFx was directly estimated by ichorCNA algorithm with default parameters (16). In addition, a Pscore was calculated as previously reported, which was based on a random forest (RF) model of HBV-HCC and cancer-free HBV patients using the genome-wide CNV pro les (21). This P-score was further used for prognostic assessment. Thirdly, we de ned a novel S-score, which was intended to re ect the extent of CNV deviation from diploid genome and was calculated using the following formula: where Xi means the log 2 ratio of bin-speci c value calculated by ichorCNA, 0 represents a desirable log 2 ratio value of diploid genome, n is the number of bins (n=2475).
Chromosomal-arm level CNV analysis First, the genome was divided into non-overlapping bins of 1-megabase size. Bins with corrected copy number greater than 2 were de ned as bin gain and bins with corrected copy number smaller than 2 were de ned as bin loss. Chromosomal-arm gain or loss was de ned when more than 50% regions of entire chromosome arm was identi ed as bin gain or bin loss. The frequency of each chromosome arm with gain or loss was calculated as the number of patients with speci c chromosomal-arm gain or loss / total patient number and used for subsequent analysis.
Construction of bin-level risk score A univariable Cox proportional hazard regression analysis was performed on all 1-megabase bins to select the signi cant bin candidates, and the least absolute shrinkage and selection operator (LASSO) regression model was used to further screen the candidate bins with prognostic signi cance (iteration=10000). The LASSO regression was analyzed using 'glmnet' package from R (version 3.6.2) (22). Next, the multivariate Cox proportional hazard regression model was used to construct the bin-level risk score, which was de ned as below: Where β represents the coe cient value and Cnv represents the bin's CNV level de ned by ichorCNA algorithm.

Statistical analysis
Pearson's correlation coe cients were used to determine the correlation of CNV frequencies among the cfDNA and tissue DNA in our cohort and tissue DNA from TCGA. Spearman correlation coe cients were used to evaluate the correlations among genome-wide CNV indicators. Associations between clinical variables and multiple-level CNV indicators were assessed using either Chi-square test or Fisher's exact test. Survival analysis was performed using Kaplan-Meier log-rank test. Univariable and multivariable Cox proportional-hazards models were used to examine the association between the CNV indicators and death or recurrence in HCC patients. The ROC curve was adopted to assess the performance of the CNV-related indicators using 'survival ROC' package. All above statistical analyses were carried out in R package (version 3.6.2). The P value of <0.05 was considered as statistically signi cant and all probabilities were two-tailed.

Results
Clinical characteristics of 117 HBV-related HCC patients As shown in Table 1, the median age of patients was 55 years old (ranging from 31-79 years), and 102 (87%) patients were male. In this cohort, tumor size was smaller than 3cm in 30 (26%) patients, 3-5 cm in 50 (42%) patients, and larger than 5cm in 37 (32%) patients. Microvascular invasion (MVI) was observed in 74 (63%) patients. There were 61 (52%), 44 (38%) and 12 (10%) patients with AJCC stage I, stage II and stage III diseases, respectively, while 98 (84%) and 19 (16%) of the patients with Barcelona Clinic Liver Cancer stage (BCLC) 0-A and B-C. The liver function in nearly all the patients (n=112, 96%) was distributed at Child-Pugh Class (CTP class) A. Follow-up information was obtained from 115 patients with a median follow-up time of 32 months (ranging from 1 to 45 months). During the follow-up period, 33 patients died of HCC and 66 patients developed recurrence.

Multiple-level CNV analysis of cfDNA
Whole-genome sequencing was carried out at a mean depth of 4.01× (range, 2.78 to 7.86) for plasma cfDNA samples and 29.51× (range, 18.42 to 49.23) for tissue DNA samples. The sequencing data information was summarized in Table S1, showing good sequencing quality. The ichorCNA was used for subsequent CNV analysis. TFx, P-score and S-score were calculated based on genome-wide cfDNA CNVs. The different distribution of TFx, P-score, S-score and cfDNA concentration was shown in Fig. S1A. Signi cant positive correlation was observed among TFx, P-score and S-score. The top 4 most frequent gains of 20p, 8q, 1q, 20q and the top 4 most frequent losses of 17p, 4q, 19p and 16q were identi ed in 117 HBV-HCC patients (Fig. S1C). Furthermore, signi cant negative correlation was found between gain and loss frequencies (Pearson r=-0.804, P<0.001, Fig. S1C), indicating that CNVs at chromosomal-arm level preferred to gain or loss, but rarely both. In addition, the bin-level CNVs were obtained from ichorCNA and the bin frequencies altered across genome among HBV-HCC patients were exhibited in Fig. S1D.

Concordant pro le of CNVs was observed between cfDNA and tumor tissue DNA
To estimate the concordance between cfDNA and tissue DNA, the frequency of CNVs at bin level or chromosomal-arm level was analyzed among the cfDNA, tissue DNA in our cohort and tissue DNA from The Cancer Genome Atlas (TCGA) database. Similar patterns of CNVs at both bin and chromosomal-arm level were observed between cfDNA and tissue DNA ( Fig. 2A and 2C). Moreover, our data indicated that the frequency of CNVs at bin level in cfDNA was signi cantly correlated with that in matched tumor tissue DNA (Gain: r=0.649, P<0.001; Loss: r=0.856, P<0.001. n=19; Fig. 2B). When analyzed at chromosomal-arm level, the frequency of CNVs in cfDNA was also signi cantly correlated with that in matched tumor tissue DNA (Gain: r=0.779, P<0.001; Loss: r=0.912, P<0.001. Fig. 2D). In addition, the CNV frequency in TCGA tissue DNA were signi cantly correlated with the CNV frequency in both tissue DNA (Gain: r=0.739; P<0.001, Loss: r=0.692; P<0.001.  Table 1 and Table S2-5, respectively. Patients were categorized into two groups by the cut-off values of TFx, P-score, S-score and cfDNA concentration determined by maximizing Youden index ( Table 1). The results indicated that patients with high TFx, P-score and S-score exhibited a signi cantly higher proportion at late stage (stage III) and with large tumor size (≥ 5cm) than those with low TFx, P-score and S-score, respectively (All P<0.05). In addition, an obvious trend was observed that TFx, P-score and S-score were associated with MVI and BCLC stage, although only P-score attained statistical signi cance (P<0.05). Importantly, patients with high TFx, P-score and S-score exhibited a signi cantly higher recurrence and death risk than those with low TFx, P-score and S-score, respectively (All P<0.05). The similar results were observed in patients with gains of 20p, 8q, 1q and 20q (Table S2), losses of 17p, 4q, 19p and 16q (Table S3), high bin-score (Table S4) and high cfDNA concentration (Table S5).
Genome-wide indicators of cfDNA CNVs were signi cantly associated with prognosis of HCC patients receiving radical treatments.
As shown in Table S6, the univariable Cox regression analysis indicated that BCLC stage, AJCC stage and tumor size were associated with overall survival (OS) and recurrence free survival (RFS) of HCC patients receiving radical treatment, which is greatly consistent with previous reports (23). Furthermore, HCC patients were respectively strati ed by cut-off value of three genome-wide CNV indicators determined by maximizing Youden index. Kaplan-Meier analysis showed that the patients with high TFx, P-score, S-score and cfDNA concentration exhibited a signi cantly poorer OS and RFS than those with low TFx, P-score, S-score and cfDNA concentration, respectively (All P<0.05. Fig.3). Similar results were observed when median value of TFx, P-score and S-score was used as cut-off values (All P<0.05. Fig. S2A, S2B and S2C). However, no signi cant difference was observed between patients with high and low cfDNA concentration in OS (P=0.117) and RFS (P=0.731.  (Table 2). Furthermore, the receiver operating characteristic (ROC) curves were used to characterize the discrimination potential of TFx, P-score, S-score, cfDNA concentration, MVI and BCLC stage for 3-year survival or 1-year survival. The results showed that the TFx, S-score and P-score had similar areas under the ROC curve (AUCs) and exhibited a slightly better performance than cfDNA concentration, BCLC and MVI, no matter of 3-year survival or 1-year survival (Fig. 3). It should be noted that the ROC curves of the concentration lay across the diagonal line, which may be due to an unstable estimation using concentration.
High frequency cfDNA CNVs at chromosomal-arm level predicted prognosis of HCC patients.
To evaluate the prognostic value of cfDNA CNVs at chromosomal-arm level in HCC patients receiving radical treatments, the top 4 most frequent gains or loss were used. The Kaplan-Meier survival analysis (Fig. 5 and Fig.   S3) showed that patients with loss of 4q, 17p, 19p 16q and the gain of 8q, 1q, 20q exhibited signi cantly poorer OS and RFS than those without corresponding chromosomal-arm loss and gain, respectively (All P<0.05). Moreover, 20p gain was only related to recurrence (P=0.002) but not death (P=0.105). Univariable Cox regression analysis also indicated the gain or loss of these chromosomal arms was signi cantly associated with OS and RFS in HCC patients (Table S6) Table 3). The 20p gain and 20q gain were only observed to be associated with recurrence risk, with HR of 1.76 (95%CI 1.03-3.01) and 1.96 (95%CI 1.13-3.39), respectively (All P<0.05. Table 3). ROC curve analysis showed that these gain and loss at chromosomal-arm level showed similar AUCs either for 3-year survival or 1-year survival, with a slightly better performance for 1year survival (Fig.6). These results suggest that high frequency cfDNA CNVs at chromosomal-arm level may be used as prognostic biomarkers for HCC patients.
A bin-level risk score improved the ability of CNVs in predicting prognosis To investigate the prognostic value of cfDNA CNVs at bin (1M) level in HCC patients, 1406 prognosisassociated candidate bins were selected from a total of 2475 bins by a univariable Cox regression analysis for further screening based on LASSO model and multivariable Cox regression model. Finally, three bins were chosen to construct the bin-level risk score (simpli ed as bin-score) according to the coe cients from the multivariable Cox regression analysis (Fig. 7A, 7B and Fig. S4B). The distribution of patients with the low and high bin-score, survival status of patients, and heat map of CNVs of the three hub bins were shown in Fig. S4A.
Two of three bins were located on the chr8q (gain) and chr17q (loss), respectively. Another one was located at chr6q12, which could be found in TCGA copy number dataset from cBioPortal (Fig. S4B). When HCC patients were strati ed into two groups by the median value of bin-score, the results showed that patients with high binscore had a signi cantly poorer OS than those with low bin-score (log rank P<0.001. Fig. 7C). Similarly, the patients with high bin-score were more likely to suffer recurrence than those with low bin-score (P=0.002. Fig.  S4C). The multivariable Cox regression analysis con rmed that the bin-score was a prognosis biomarker  Table S7). The AUCs of bin-score in predicting 1-year and 3-year survival were 0.8202 and 0.7457, respectively (Fig. 7D), which was superior to the genome-wide cfDNA CNV indicators, high frequency cfDNA CNVs at chromosomal-arm level, BCLC stage and MVI. Furthermore, the bin-score could also predict the prognosis in the patients whose TFx value was zero (Fig. S4D). These ndings suggest that the prediction ability of cfDNA CNVs for HCC prognosis may be improved by screening the most relevant CNV regions.

Discussion
Here, we comprehensively presented the genomic characterization of plasma cfDNA from a large cohort of HCC patients receiving radical treatments at genome-wide, chromosomal-arm and bin levels. The remarkably similar CNV pro les were observed between cfDNA and tumor tissue DNA of HCC patients, indicating the creditable implication of liquid biopsy for tumor detection. Using NGS-based approach, we demonstrated that indicators based on cfDNA CNVs at different levels could be used as independent non-invasive prognostic biomarkers for HCC patients with radical treatments.
The consistency of genomic copy number between the cfDNA and tumor tissue DNA has been demonstrated in previous studies (10,16). In this study, the consistency of CNV pro les between the tissue DNA of our cohort and TCGA indicated the reliability of our sequencing data. Moreover, our results exhibited remarkably similar CNV pro les between cfDNA and tissue DNA at both bin level and chromosomal-arm level, suggesting that the cfDNA was an alternative sample source of tissue DNA for sequencing. It should be noted that the frequency of CNVs in cfDNA was signi cantly lower than those in tissue DNA samples. A possible explanation is that the detectable CNVs were diluted by predominantly normal cfDNA as background from white blood cell (WBC) (24). Further comparison of CNV pro les between cfDNA and WBC DNA will broaden our understanding on the in uence of normal cfDNA on the detection of CNVs (25).
To analyze the association between cfDNA CNVs and HCC prognosis, genome-wide CNVs was taken into consideration rstly. As reported in previous studies, TFx could be used as a prognostic biomarker for HCC patients who received surgical resection and adjuvant therapies (25). Based on an extended sample size, similar result was also observed in the present study. In our previous study, a random forest model has been constructed for the detection of early-stage HCC patients. The current study further estimated the prognosis prediction capacity of the P-score calculated based on the RF model. As expected, the high P-score was found to be signi cantly associated with poor outcomes in HCC patients with radical treatments. The S-score, newly developed to estimate the genome stability, was also shown as an independent prognostic biomarker for HCC patients. Furthermore, no signi cant difference was observed among three genome-wide CNV indicators in prognosis prediction, possibly due to the high correlations among them, although they were calculated using different algorithms. In comparison, cfDNA concentration could distinguish patients with different prognosis only when the cut-off value was determined by Youden index. The cfDNA concentration may be in uenced by experimental operation errors in cfDNA extraction and quanti cation or by the patients' status at the time of blood extraction (26,27). And variations in cfDNA concentration have been commonly found in different laboratories (17,28). In general, all the genome-wide CNVs indicators were shown to be independent prognostic factors, exhibiting more superior e cacy to the cfDNA concentration and current evaluation variables.
Next, the cfDNA CNVs at chromosomal-arm level were investigated. A series of previous studies have reported that CNVs at chromosomal arm level are commonly observed in HCC tissues. Copy number gains at 1q, 6p, 8q and 20q and copy number losses at 16q, 17p, 19p, 19q, 4q, 1p, 8p, 13q and 22q have commonly been identi ed in HCC tissue samples (29)(30)(31)(32). Our results highlighted the top 4 frequent gain arms (20p, 8q, 1q, 20q) and the top 4 frequent loss arms (17p, 4q, 19p, 16q) in cfDNA of HCC patients. Several regions, such as gain of 1q, 8q24, 20q13.12-13.33 and loss of 17p, have been reported to be related with the prognosis of HCC patients (31,(33)(34)(35). In the present study, we focused on CNVs at chromosomal arm level. These arms characterized by high-frequency CNVs were observed to be associated with prognosis in HCC patients receiving radical treatments, among which gain of 20p and 20q could only be served as independent prognostic factors for predicting recurrence. In addition, the 20p gain was easier to be detected in cfDNA than in primary tissue DNA, which further highlighted the complementary role of cfDNA in the detection of HCC-related CNVs. The cfDNA might also provide recurrence prediction information, which could not be obtained from primary tumor tissue DNA. The biological signi cance of these CNVs at chromosomal arm level detected in cfDNA should be investigated in further studies.
Finally, the prognostic value of cfDNA CNVs was evaluated at bin level. Three bins were selected to construct a bin-level risk score by using the LASSO and multivariate Cox method. Our result indicated that the bin-score was an independent prognostic factor and superior to other prognostic factors, including clinical factors, genome-wide CNV indicators and chromosomal-arm CNVs, which provided a new way for developing the cfDNA CNVs as new prognostic biomarkers. The bin-score could also distinguish the patients with undetectable TFx into high and low risk of death and recurrence, indicating the wide application of the binscore in prognosis prediction of HCC patients. These ndings suggest that screening the most relevant prognostic regions will improve the predicting capacity of cfDNA CNVs. Moreover, the detection of focused regions would further reduce the cost of the assay, which makes it more suitable for future applications in clinical settings.

Conclusion
Taken together, a framework of multiple-level cfDNA CNV analysis is illustrated in this study. Our results provide a comprehensive genomic pro le of cfDNA CNVs from HCC patients. We demonstrate that cfDNA CNV indicators at different levels provide important prognosis information for HCC patients with radical treatments beyond clinicopathologic factors and cfDNA concentration. This approach is helpful for broadening the applicable strategy to reveal clinically useful biomarkers based on cfDNA CNVs analysis. Our study is also to some extent limited by a relatively small sample size. An independent data set is needed to validate the performance of cfDNA CNVs at three levels in predicting prognosis of HCC patients.

Availability of data and materials
The data that support the ndings of this study are available from the corresponding author upon reasonable request.

Ethics approval and consent to participate
Informed consent was obtained in accordance with the Declaration of Helsinki. This study was approved by the Ethical Committees of FMMU.

Consent for publication
Not applicable.

Competing interests
The authors have declared no competing interests. Authors' contributions YW, XW and KZ carried out the sample collection, performed the data analysis, and drafted the manuscript. YL, ZB and LS participated in the bioinformatics analyses. DG and KL performed the laboratory experiments. XG and XG participated in the design of the study and performed the draft revision. KT and JX conceived of the study, and participated in its design and coordination and helped to revise the manuscript. All authors read and approved the nal manuscript.   Abbreviations: HR, Hazard Ratio; CI, con dence interval. Figure 1 Flow diagram of the study design. Figure 1 Flow diagram of the study design.