Correlating Genomic Copy Number Alterations with Clinicopathologic Findings in a Case Series of Hepatocellular Carcinoma

Background: Oligonucleotide array comparative genomic hybridization (aCGH) analysis has been used to detect somatic copy number alterations (CNAs) in various types of tumors. This study aimed to assess the clinical utility of aCGH in a case series of hepatocellular carcinoma (HCC) and to evaluate the correlation between CNAs and clinicopathologic findings. Methods: Survival outcomes from this case series were analyzed based on Barcelona-Clinic Liver Cancer (BCLC) stage, Edmondson-Steiner (E-S) grade, and recurrence status. aCGH was performed on 75 HCC cases with paired DNA samples from tumor and adjacent nontumor tissues. Correlation of CNAs with clinicopathologic findings was analyzed by the Wilcoxon rank test and k-means clustering. Results: The survival outcomes indicated that BCLC stage and recurrence status could be predictors and E-S grade could be a modifier of prognosis for HCC. The most common CNAs involved gains of 1q and 8q and a loss of 16q (more than 50% of cases), losses of 4q and 17p and a gain of 5p (more than 40%), and losses of 8p and 13q (more than 30%). Correlation and clustering analyses noted that losses of 4q13.2q35.2 and 10q22.3q26.13 seen in cases of stage A, grade III and nonrecurrence were likely associated with good survival, while loss of 1p36.31p22.1 and gains of 2q11.2q21.2 and 20p13p11.1 seen in cases of stage C, grade III and recurrence were possibly associated with the worst prognosis. Conclusions: These results indicated that aCGH analysis could be used to detect recurrent CNAs and the key genes and pathways involved in patients with HCC. Further analysis of a large case series to validate the association of CNAs with clinicopathologic findings of HCC could provide information to interpret CNAs and predict prognosis.

In the present study, we performed aCGH analysis on 75 pairs of tumor and adjacent nontumor tissues from HCC patients. Genomic profiles of CNAs for all cases and for cases with different clinicopathologic findings were compared. Associations of CNAs with tumor stages, grades and recurrence were evaluated.
These results further confirmed the diagnostic value of aCGH and provided preliminary data for a large-scale analysis of genomic and clinicopathologic correlation for HCC.

Patients and sample collection
We selected 79 HCC patients with pathologic diagnosis, surgical treatment, and clinical follow-up results at the Affiliated Tumor Hospital of Guangxi Medical University from 2014 to 2016. Among them, 72 were male and seven were female; 67 were less than 60 years old and 12 were more than 60 years old; 35 were grade II and 44 were grade III by Edmondson-Steiner (E-S) grading; 32 were stage A, 12 were stage B, and 35 were stage C by Barcelona-Clinic Liver Cancer (BCLC) staging classification; and 29 had recurrence and 50 had no recurrence. As of June 30, 2018, 31 of the 79 patients were alive, 38 had died and 10 were lost to follow-up.
Formalin-fixed paraffin-embedded (FFPE) tumor tissue blocks from these patients were collected from the tumor biobank of Guangxi Medical University. This research project was approved by the Medical Ethics Committee of Guangxi Medical University, and research applications of residual specimens followed university policies and laboratory standards [19].

aCGH analysis
For each case, FFPE blocks of tumor tissue and adjacent nontumor tissue were dissected as a pair of test and control. DNA was extracted from dissected tissues using a Qiagen tissue kit following the manufacturer's instructions (Qiagen, Chatsworth, CA). The oligonucleotide aCGH assay was performed as previously described [20]. Briefly, differentially labeled test or control DNA and gender-matched normal reference DNA were co-hybridized to a SurePrint G3 Human CGH 8 × 60K Microarray slide (Agilent Technologies). Post-hybridization image capture, signal feature extraction and copy number analysis were performed using Agilent's Cytogenomics 2.5 [21,22]. Benign copy number variants recognized in the Database of Genomic Variants (http://dgv.tcag.ca/dgv/app/home) were excluded. CNAs detected in the tumor tissue and undetected in the paired adjacent nontumor tissue were considered pathogenic. The nucleotide positions for CNAs were designated according to the NCBI36/hg18 assembly in the Human Genome browser (http://genome.ucsc.edu/). The smallest overlap regions (SORs) were defined as the shared overlapped regions of CNAs. A relative frequency for each SOR was calculated as the number of cases with overlapped CNAs divided by the number of all cases. Genomic profiles for SORs of CNAs in all cases and in cases by different clinicopathologic classifications were generated. A detailed description of SORs is provided in the supplementary methods. The percentage of CNAs in each case was calculated as the total size of CNAs divided by the size of the human genome.
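The two summary quantities defined above, the relative frequency of an SOR and the percentage of the genome covered by CNAs in a case, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the interval representation, the function names and the genome size constant are assumptions introduced here for illustration, gains and losses are not distinguished, and CNAs within a case are assumed not to overlap one another.

```python
from typing import List, Tuple

# (chromosome, start, end) in base pairs; end is exclusive. Hypothetical layout.
Interval = Tuple[str, int, int]

# Assumed genome size in base pairs; the paper does not state the value used.
GENOME_SIZE_BP = 3_100_000_000


def percent_genome_altered(cnas: List[Interval],
                           genome_size: int = GENOME_SIZE_BP) -> float:
    """Total size of a case's CNAs divided by the genome size, as a percentage."""
    total = sum(end - start for _, start, end in cnas)
    return 100.0 * total / genome_size


def sor_relative_frequency(sor: Interval, cases: List[List[Interval]]) -> float:
    """Number of cases with a CNA overlapping the SOR divided by all cases."""
    chrom, sor_start, sor_end = sor
    hits = 0
    for cnas in cases:
        # A case counts once, however many of its CNAs overlap the SOR.
        if any(c == chrom and start < sor_end and end > sor_start
               for c, start, end in cnas):
            hits += 1
    return hits / len(cases)
```

In practice gains and losses would be tallied separately, so that each SOR carries one frequency for gains and one for losses.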
We compared survival between different clinicopathologic classifications of E-S grade, BCLC stage, and recurrence status using the Kaplan-Meier method, with p-values calculated from the log-rank test [23]. Considering the interaction among these three clinical variables, we stratified the survival analysis by recurrence status and E-S grade at BCLC stages A and C. We did not perform stratified survival analysis on cases in BCLC stage B because of the limited sample size of only 12 cases. The correlation between the clinical variables was tested by Fisher's exact test.
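As an illustration of the Kaplan-Meier method named above, the following Python sketch computes the product-limit survival estimate from follow-up times and event indicators. It is a simplified stand-in for the R routines used in the study, the function name and input layout are assumptions, and the log-rank comparison between groups is not shown.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if censored (e.g. lost to follow-up)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve
```

Censored patients contribute to the risk set up to their last follow-up time but never produce a drop in the curve, which is how patients lost to follow-up are retained in the analysis.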
We compared the genomic profiles of CNAs by relative frequencies of SORs between cases of different E-S grades, BCLC stages and recurrence status. P-values were calculated from the Wilcoxon rank test when comparing the changes of CNAs in selected regions between case classifications. We also used receiver operating characteristic (ROC) curves to evaluate the association between the percentage of genomic CNAs and the three clinicopathologic classifications. In this analysis, the percentage of CNAs in each case was used to predict each of the three classifications of the patients. Furthermore, k-means was used to cluster the cases according to SORs of CNAs for all cases and for the cases in different BCLC stages. The size of each SOR was used as its weight in clustering, considering that CNAs with larger SORs may have more influence on the tumors. Because the series included both male and female patients, chromosomes X and Y were excluded from clustering to avoid the bias of clustering cases of the same gender together.
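The ROC analysis and the Wilcoxon rank test described above are closely related: the area under the ROC curve (AUC) for a single continuous predictor equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The sketch below shows that relationship in Python. The function name and argument names are hypothetical, and any data passed in would be the per-case percentage of CNAs split by one binary classification (for example recurrence versus nonrecurrence).

```python
from typing import List


def auc_from_pairs(scores_pos: List[float], scores_neg: List[float]) -> float:
    """AUC computed as U / (n_pos * n_neg), with U the Mann-Whitney statistic.

    Each positive-negative pair scores 1 if the positive case has the higher
    value, 0.5 on a tie, and 0 otherwise.
    """
    u = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(scores_pos) * len(scores_neg))
```

An AUC close to 0.5 means the predictor does not separate the two groups, whereas values approaching 0 or 1 indicate strong separation in one direction or the other.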
Statistical analyses were performed using the statistical computing software R version 3.6.1 [24].
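The size-weighted clustering described in this section can be sketched as a k-means in which each SOR (feature) contributes to the squared distance in proportion to its weight. The Python code below is one plausible reading under stated assumptions, not the authors' R implementation: the case-by-SOR encoding (for instance +1 for gain, -1 for loss, 0 for no change), the function name, and the random initialization from observed cases are all hypothetical choices.

```python
import random
from typing import List, Optional, Tuple


def weighted_kmeans(X: List[List[float]], weights: List[float], k: int,
                    n_iter: int = 100, seed: int = 0
                    ) -> Tuple[List[int], List[List[float]]]:
    """Cluster rows of X with feature-weighted squared Euclidean distance.

    X       : one row per case, one column per SOR (assumed encoding)
    weights : one non-negative weight per SOR, e.g. its size in base pairs
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct cases.
    centroids = [list(row) for row in rng.sample(X, k)]
    labels: Optional[List[int]] = None

    def dist(row: List[float], center: List[float]) -> float:
        return sum(w * (x - m) ** 2 for w, x, m in zip(weights, row, center))

    for _ in range(n_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [min(range(k), key=lambda c: dist(row, centroids[c]))
                      for row in X]
        # Update step: centroid is the plain mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [row for row, lab in zip(X, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stable, converged
        labels = new_labels
    return new_labels, centroids
```

With per-feature weights the centroid update is still the unweighted mean of the member cases, because the weights rescale each coordinate independently. Like any k-means, the result can depend on the initialization, so several seeds would normally be compared.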

Results
Survival outcomes from different clinicopathologic classifications
The clinicopathologic classifications of the 79 HCC cases were summarized in Table 1. Survival outcomes from these cases were compared based on their classifications of BCLC stage, recurrence status, and E-S grade. The Kaplan-Meier analysis showed that cases in BCLC stage A had longer survival than cases in BCLC stages B and C (P < 0.001) (Fig. 1A). Cases without recurrence had better prognosis than cases with recurrence (P < 0.001) (Fig. 1B). Cases in E-S grade III seemed to have longer survival than cases in grade II (P = 0.022) (Fig. 1C). When survival was analyzed by different combinations of stage and recurrence, cases without recurrence in stage A had the best prognosis, while cases with recurrence in stage C had the worst prognosis (Fig. 1D). Considering the interactions between the three clinical classifications, we further stratified cases into BCLC stages A and C and evaluated the influence of recurrence status and E-S grade on survival. Cases without recurrence had better prognosis than those with recurrence in both stages A and C (P < 0.001) (Supplemental Fig. 1A/B). Cases in E-S grade III had better prognosis than those in grade II only in stage A (P = 0.018) but not in stage C (Supplemental Fig. 1C/D). It was also noted that E-S grade III was significantly associated with nonrecurrence in BCLC stage A (P < 0.001) but showed no significant association with recurrence status in BCLC stage C (P = 1.00) (Table 1). These results indicated that BCLC stage and recurrence status could be independent predictors of prognosis and that E-S grade might be a modifying predictor affected by stage and recurrence. However, the overall sample size of this case series was limited, and thus this preliminary observation would need further validation in a large case series.

The genomic profile of CNAs for the 75 cases analyzed by aCGH was shown in Fig. 2A.
Of the CNAs from these 75 cases, more than 50% of cases had gains of 1q and 8q and a loss of 16q, more than 40% of cases had losses of 4q and 17p and a gain of 5p, and more than 30% of cases had losses of 8p and 13q.
The genomic profiles of CNAs for cases in E-S grades II and III were shown in Fig. 2B Fig. 4). These results indicated that there were more CNAs in E-S grade III than in grade II (P = 0.003, one-tailed Wilcoxon rank test) but the percentage of genomic CNAs had no significant association with E-S grade, BCLC stage or recurrence.
During the analysis of survival outcome and clinicopathologic classifications, we noted an interaction of E-S grades and recurrence with BCLC stages (Supplemental Fig. 1). We therefore stratified cases into BCLC stages A and C to evaluate any difference in CNAs between E-S grades II and III. In BCLC stage A, grade III showed significantly more losses of 4q13.2q35.2 (P = 0.003 in BCLC A; P = 0.80 in BCLC C) and 10q22.3q26.13 (P = 0.033 in BCLC A; P = 0.55 in BCLC C) than grade II. In BCLC stage C, grade III showed significantly more gains of 2q11.2q21.2 (chr2:98,228,328-134,727,485, 36.5 Mb) (P = 1 in BCLC A; P = 0.008 in BCLC C) and 20p13p11.1 (P = 0.59 in BCLC A; P = 0.005 in BCLC C) than grade II (Supplemental Fig. 5).

Cluster analysis of CNAs for clinical classifications
We used the k-means method to cluster patterns of CNAs into three clusters for all cases and for cases in BCLC stages A and C. The CNAs from all 75 cases were divided into three clusters as shown in Fig. 3. The 28 cases in cluster 1 had more CNA losses, the 39 cases in cluster 2 had fewer CNAs, and the eight cases in cluster 3 had more CNA gains. There were more CNAs in clusters 1 and 3 than in cluster 2. The E-S grades, BCLC stages and recurrence status were mixed in clusters 1 and 2, while the eight cases in cluster 3 were all E-S grade III and were further divided into BCLC stage A without recurrence and BCLC stage C with recurrence. Further clustering analysis was performed on cases in BCLC stages A and C. For the 32 cases in BCLC stage A, 14 cases were in cluster 1 with more CNA losses, and 12 of them were classified as E-S grade III and nonrecurrence; 15 cases were in cluster 2 with fewer CNAs and were mixed in E-S grade and recurrence status; and three cases in cluster 3 with more CNA gains were all E-S grade III and nonrecurrence (Supplemental Fig. 6). For the 31 cases in BCLC stage C, 12 cases were in cluster 1 with more CNA losses, 14 cases were in cluster 2 with fewer CNAs, and five cases were in cluster 3 with more CNA gains. Cases in clusters 1 and 2 were mixed in E-S grade and recurrence status, while cases in cluster 3 were associated with E-S grade III and recurrence (Supplemental Fig. 7).
Survival differences between the three clusters in all three data sets were shown in Supplemental Fig. 8. For all 75 cases, cases in cluster 3 had worse survival than cases in clusters 1 and 2 (P = 0.021, Supplemental Fig. 8A). For cases in BCLC stage A, cases in cluster 1 showed better survival than cases in clusters 2 and 3 (P = 0.047); for cases in BCLC stage C, cases in cluster 3 showed worse prognosis than cases in clusters 1 and 2 (Supplemental Fig. 8B/C). Further analysis focused on a comparison between cases with the best prognosis (BCLC stage A and nonrecurrence) and cases with the worst prognosis (BCLC stage C and recurrence) (Fig. 1D). Of the 21 cases classified as BCLC stage A, E-S grade III and nonrecurrence in Table 1, there were 12, six and three cases in clusters 1, 2 and 3, respectively. The losses of 4q13.2q35.2 and 10q22.3q26.13 associated with E-S grade III were seen in cases of cluster 1, which suggested that these CNAs might play a protective or modifying role in survival outcome. Other gains of chromosomes 3, 6, 11p, 12q and 20 in cluster 3 were probably related to poor prognosis (Supplemental Fig. 6). Of the seven cases classified as BCLC stage C, E-S grade III and recurrence in Table 1, there was one case each in clusters 1 and 2 and five cases in cluster 3. The loss of 1p36.31p22.1 and gains of 2q11.2q21.2 and 20p13p11.1 associated with E-S grade III, together with additional gains of chromosomes 6, 7 and 20q, were seen in cluster 3 and thus likely correlated with the worst survival (Supplemental Fig. 7). Combining the clustering and survival results indicated that specific CNAs could provide further association with clinicopathologic classifications and survival prediction.

Discussion
We first performed survival analysis based on BCLC stage, E-S grade and recurrence status and then tried to correlate the genomic profile of CNAs with clinicopathologic classifications. The BCLC staging classifies HCC based on liver functional status, physical status and cancer-related symptoms and links the four stages with a treatment algorithm [25,26]. E-S grading is determined by tumor histologic and cytologic findings [27,28]. The general genomic profile of CNAs from this case series was consistent with, and shared similar CNAs with, previous studies [3-18]. The most commonly seen CNAs, present in more than 30% of our case series and including gains of 1q, 5p and 8q and losses of 4q, 8p, 13q, 16q and 17p as shown in Fig. 2A, were all recurrent CNAs for HCC. Bioinformatic and gene expression analyses had been used to define candidate genes from these recurrent CNAs. Candidate genes as well as key genes and pathways from nine studies were listed in Supplemental Table 2. This list of genes showed large variations between studies. Further gene functional analysis would be needed to validate the causal or modifying roles of these genes for HCC. Analysis of gene expression levels in CNAs noted that CNA gain or amplification of the MDM4 gene at 1q32 was associated with poor metastasis-free survival [5,16]. Mechanistically, gains of 1q resulting from hypomethylation of 1q heterochromatin and jumping translocations with two or more recipient chromosomes have been seen in myelodysplastic syndrome and acute myeloid leukemia [29]. Gains of 8q involving overexpression of the MYC gene were observed in viral and alcohol-related HCCs [5]. Significant upregulation of the ATAD2, SQLE, PVT1, ASAP1 and NDRG1 genes associated with gains of 8q24.13q24.3 was an unfavorable prognostic marker for HCC [15].
Loss of 8p involving the DLC1, CCDC25, ELP3, PROSC and SH2D4A genes was associated with poor outcomes for HCC; in vitro and in vivo analysis indicated that the PROSC, SH2D4A and SORBS3 genes have tumor-suppressive activities, along with the known tumor suppressor gene DLC1 [9]. Integrated analysis of somatic mutations and CNAs in HCCs identified 56 key genes and five pathways for HCC [8]. At least 38% (21/56) of these key genes could be mapped to the recurrent CNAs and affected four core pathways: p53/cell cycle control, chromatin remodeling, PI3K/Ras signaling, and oxidative and endoplasmic reticulum stress (Supplemental Table 2). Multiple genes from recurrent CNAs affected p53/cell cycle control, including the CDKN2C gene at 1p32.3, the CDK11A/B genes at 1p36.33, the IRF2 gene at 4q35, the RB1 gene at 13q14.2 and the TP53 gene at 17p13.1. These recurrent CNAs and the affected genes and pathways were considered an integral part of the comprehensive genetic landscape and genomic characterization of HCC [30,31].
Correlating genomic CNAs with HCC stages and grades could help to interpret test results and guide clinical treatment. In one previous study, a dendrogram of the cluster analysis showed first the gain of 1q, then gains of 8q and 5p, followed by other CNAs; the gains of 1q and 8q were significantly associated with E-S grades II-IV [18]. Another study compared the recurrent CNAs between E-S grades I/II and III/IV and noted that gain of 8q was significantly more frequent in grade III/IV [14]. In general, cases in E-S grade III seemed to have more CNAs than those in grade II. More specifically, significant associations of losses of 1p36.31p22.1, 4q13.2q35.2 and 10q22.3q26.13 and gains of 2q11.2q21.2 and 20p13p11.1 with E-S grade III were noted. After we stratified the cases according to BCLC stage, the two regions significant in BCLC stage A were losses of 4q13.2q35.2 and 10q22.3q26.13 (Supplemental Fig. 3). The 4q13.2q35.2 region included the IRF2 gene for p53/cell cycle control and the SMARCAD1 gene for chromatin remodeling, while the 10q22.3q26.13 region included the PTEN gene for PI3K/Ras signaling (Supplemental Table 2). These genes might be important for cancer cell differentiation and progression. Counterintuitively, our cases showed better survival and less recurrence for E-S grade III than for grade II at BCLC stage A. One hypothesis to explain this observation was that BCLC stage A usually involved a single tumor and that a tumor in E-S grade III was cytologically more distinct than a tumor in grade II, which might make a grade III tumor more likely to be removed completely during surgery. This could also explain the significantly lower recurrence in E-S grade III than in grade II in BCLC stage A (P < 0.001). In BCLC stage C, the cancer cells have spread to blood vessels, lymph nodes or other body organs. The recurrence rate after surgery was similar between E-S grades II and III in this stage (P = 0.72).
The loss of 1p36.31p22.1, gains of 2q11.2q21.2 and 20p13p11.1, and additional gains of chromosomes 6, 7 and 20q were associated with BCLC stage C, E-S grade III and recurrence, the group with the worst survival outcome. The ARID1A, CDKN2C, CDK11A and CDK11B genes at 1p could affect the p53/cell cycle control and chromatin remodeling pathways (Supplemental Table 2). A study focused on gene expression from gains of 20q identified candidate genes contributing to unfavorable outcomes for HCC. Overexpression of DDX27, B4GALT5, RNF114, ZFP64 and PFDN4 was significantly associated with vascular invasion, and high RNF114 expression was also associated with advanced tumor stage [12]. We did not find a significant association between CNAs and BCLC stage or recurrence status. The association of CNAs with cytologic and histologic findings by E-S grade may reflect a link between the molecular and cellular levels. Additionally, clustering analysis noted more CNAs in cases with poor survival outcomes, but ROC analysis did not support an association of the percentage of CNAs with clinicopathologic classifications.
This study provided preliminary results correlating genomic CNAs with clinicopathologic findings. However, two major limitations of this study should be mentioned. First, the limited number of cases possibly introduced bias in case stratification. The results from this study need to be further validated in a large cohort of HCC cases. The second limitation was the technical challenge of tracking clonal evolution and dissecting tumor heterogeneity [32,33]. All tumor specimens were collected at surgery, which made it difficult to examine the initial events and the aberrations accumulated across different tumor stages. HCC is a highly heterogeneous disease at the clinicopathologic level, in association with extensive genomic heterogeneity from accumulated CNAs, somatic mutations and epigenetic alterations. A biopsy-based integrative diagnostic approach including morphology, immunohistochemistry, transcriptomic data, mutational profiles, CNA and methylome analysis has been proposed for future analysis of HCC [34].

Conclusion
Clinical application of aCGH could detect genomic profiles of recurrent CNAs and the affected key genes and pathways for HCC. Specific CNAs could be associated with cytologic and histologic grading and are likely related to the prognosis of HCC. Further studies of genomic CNAs in a large case series to validate the association between CNAs and clinicopathologic findings, together with functional analysis of the genes in recurrent CNAs, could lead to a better understanding of the underlying causal mechanisms and of treatment strategies. Comprehensive genetic and genomic analysis could be an important part of an integrative diagnostic approach for HCC.

research project aiming to define the genetic defects in the tumorigenesis and progression of HCC.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
Study concepts and experiment design: QH and PL; Sample collection and processing: QH and YL; Clinicopathologic findings: SW; aCGH analysis: HC, WJ and QH; Statistical analysis and data interpretation: GP, HC, QH and PL; Manuscript writing: GP, PL and QH. All authors read and approved the manuscript.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.