The Potential and Analysis of ctDNA Sequencing in Hepatocellular Carcinoma

Background: The genomic landscape of hepatocellular carcinoma (HCC) is complex. We used targeted next-generation sequencing to characterize the mutations in patients with liver cancer. Methods: Circulating tumor DNA (ctDNA) from 10 patients with liver cancer (8 cases of primary hepatocellular carcinoma and 2 cases of liver metastasis) was sequenced. We used SAMtools to detect and screen single nucleotide polymorphisms (SNPs) and insertion/deletion (INDEL) mutations, and ANNOVAR to annotate the structure and function of the detected mutations. Results: Targeted capture and deep sequencing of 560 cancer-related genes in the 10 ctDNA samples revealed 8950 single nucleotide variants (SNVs) and 70 INDELs. The most frequently mutated gene was PDE4DIP, followed by SYNE1, KMT2C, PKHD1 and FN1. According to the American College of Medical Genetics and Genomics (ACMG) guidelines, we identified 54 pathogenic and likely pathogenic mutations in 39 genes in the exons and splice regions of the 10 patients. Conclusion: Our research provides a gene mutation map of Chinese patients with hepatocarcinoma and enriches the understanding of the pathogenesis of HCC.
We also found that some mutation sites are not present in the dbSNP or COSMIC databases. Ninety significant nonsynonymous mutations were identified, including 72 missense mutations, 2 deletion mutations, 1 frameshift deletion, 8 non-frameshift deletions and 7 non-frameshift insertions, located in 87 genes. After summarizing these loci, we extracted representative genes: AXIN1, CHD7, FN1, LAMA2, NIN, PRKDC, CEBPA, PTCH1 (2/10), PDE4DIP (4/10) and SYNE1 (3/10).


Introduction
HCC ranks sixth in cancer incidence and fourth in cancer mortality worldwide, and most cases occur in people aged 50-60 years [1]. According to the 2018 global cancer statistics report, there were 841,000 new liver cancer cases and 781,000 deaths, accounting for 4.7% and 8.2% of global cancer incidence and mortality, respectively [2]. The pathological types of liver cancer include the hepatocellular, cholangiocellular and mixed types, of which the hepatocellular type accounts for about 70% [3]. HCC is one of the most common fatal malignant tumors in the world and has also been reported as the fifth most common cancer worldwide. Its standardized mortality rate ranks third among malignant tumors, after gastric cancer and esophageal cancer [4]. In Japan and parts of China, the main risk factors for HCC are chronic hepatitis C virus (HCV) or hepatitis B virus (HBV) infection [5]. The incidence of HCC is increasing and is particularly high in sub-Saharan Africa and Southeast Asia [6]. Although the treatment and prognosis of HCC have improved over the years, its incidence is still rising. Over the past 20 years, progress has been made in elucidating the mechanisms of carcinogenesis, in early diagnosis, and in local and systemic treatment of HCC [7].
HCC can be diagnosed by monitoring high-risk groups with ultrasound and tumor markers. Alpha-fetoprotein (AFP) has long been used as a positive marker, but studies have found that its sensitivity is low: AFP is not elevated in many patients with liver cancer, and AFP levels can be normal even in patients with advanced disease [8]. At present, the European Association for the Study of the Liver and the American Association for the Study of Liver Diseases no longer recommend AFP measurement for the diagnosis of liver cancer, because of doubts about its diagnostic sensitivity [9]. The presence or absence of cirrhosis, tumor features (including vascular invasion and portal vein thrombosis), tumor size and AFP level are important prognostic indicators of HCC that affect treatment decisions and outcomes [10].
The occurrence of HCC is a progressive, dynamic pathological process regulated by multiple genes. Understanding the molecular mechanisms of its pathogenesis and screening high-risk groups according to the relevant factors can therefore help to prevent and treat HCC. Well-known environmental risk factors that may lead to underlying cirrhosis include hepatitis B virus, hepatitis C virus, exposure to toxins (such as aflatoxin) and alcohol consumption. Specific gene mutations have been identified for each cause of HCC [11]. In recent years, many researchers have sought tumor biomarkers to improve the prognosis of liver cancer. For example, Wei Lu reported that the expression of TCF21 was significantly decreased in HCC and negatively correlated with invasive progression of the disease, suggesting that TCF21 may be a biomarker for predicting the prognosis of HCC [12]. Jianguo Qiu et al. found that the expression of the lncRNA LOC285194 was significantly lower in tumor tissues than in adjacent normal tissues and was often downregulated in HCC. LOC285194 expression is closely related to the occurrence, development, invasion and metastasis of tumors and has an anti-tumor effect, so it may serve as a potential target for new liver cancer therapies [13]. Although many biomarkers have been proposed, prognostic evaluation of HCC remains a major clinical challenge, and new, specific biomarkers for this malignancy are still needed. Next-generation sequencing (NGS) is a newer technology for gene screening, prognosis and diagnosis, and an effective and accepted method of clinical gene detection [14]. Targeted sequencing is based on high-throughput sequencing and can simultaneously detect multiple mutation types (including point mutations, insertions/deletions and copy number changes) across multiple cancer types.
It is suitable for any tissue or liquid sample. In this study, sequencing covered the whole exon regions of 560 genes and the mutation hotspot region of the TERT gene promoter. The genes were drawn from thousands of classic publications, the Cancer Gene Census, authoritative commercial cancer panels, and most of the driver genes in databases closely related to clinical medication guidance. High-frequency mutated genes and susceptibility genes were taken from three classic reviews [15][16][17] and were supplemented and curated through literature collection and reading. The method is based on the Agilent SureSelect targeted sequence capture system, combined with repeatedly optimized probe design and high capture efficiency. The target genes have high coverage and strong specificity, which allows accurate detection of gene mutations and accurate screening of cancer-related mutations.

Materials And Methods
Patients and DNA extraction
A total of 10 subjects were included in this study. Nine patients with primary hepatocellular carcinoma, aged 33-68 years, were recruited from January 2019 to December 2019 in our hospital. Primary liver cancer was diagnosed when any of the following three criteria was met: (1) two typical imaging manifestations of liver cancer (ultrasound, CT, MRI or selective hepatic arteriography), with a lesion > 2 cm; (2) one typical imaging manifestation, with a lesion > 2 cm and AFP > 400 ng/ml; (3) a positive liver biopsy. Among the patients with primary liver cancer, 1 case was diagnosed by pathological biopsy and the other 8 cases were diagnosed clinically. Two patients with liver metastases (one with gallbladder cancer and liver metastasis, one with rectal cancer and liver metastasis) were also recruited. Their diagnostic criteria followed the guidelines for diagnosis and treatment of gallbladder cancer (2015 Edition) and the Chinese code for clinical diagnosis and treatment of colorectal cancer. We excluded patients with primary liver cancer complicated by other tumors (such as gastric, lung, cervical, ovarian or prostate cancer), human immunodeficiency virus infection, autoimmune liver disease, alcoholic liver disease, nonalcoholic fatty liver disease or a history of other chronic liver diseases, as well as samples with hemolysis during testing. According to the inclusion and exclusion criteria, 9 cases were collected for the primary HCC group and 2 cases for the liver metastasis group. One case in the primary liver cancer group was excluded because its ctDNA did not meet quality requirements after extraction; all other samples were qualified. The final cohort therefore comprised 8 cases in the primary HCC group and 2 cases in the liver metastasis group. From each patient, 5 ml of venous blood was collected by a designated nurse in an EDTA anticoagulant tube.
After balancing, the tube was placed in a centrifuge (within half an hour of blood collection) and centrifuged at room temperature for 5 minutes. After centrifugation, plasma was collected and stored in a 2 ml EP tube in an ultra-low-temperature freezer at -80 ℃ until ctDNA extraction. ctDNA was extracted from plasma with the GeneRead DNA FFPE Kit (Qiagen), strictly following the kit instructions. Agarose gel electrophoresis was used to assess the extent of DNA degradation and to check for RNA or protein contamination, and Qubit was used to quantify the DNA concentration (DNA samples with a concentration above 20 ng/μl and a total amount above 0.2 μg were used for library construction).

Target sequencing
Nuohe Zhiyuan uses Agilent's liquid-phase capture system to efficiently enrich DNA from specific human target regions, and then performs high-throughput, high-depth sequencing on the Illumina HiSeq platform. The Agilent SureSelectXT Custom kit was used for library construction and capture. The reagents and consumables recommended in the manual were strictly used, and the latest optimized experimental protocol was followed. Genomic DNA was randomly sheared into fragments of 180-280 bp with a Covaris instrument. After end repair and A-tailing, adapters were ligated to both ends of the fragments to prepare the DNA library. Libraries with specific indexes were pooled and hybridized in liquid phase with up to 500,000 biotin-labeled probes, and the target gene fragments were then captured with streptavidin-coated magnetic beads. After PCR amplification, the quality of the library was tested, and qualified libraries were sequenced. After library construction, Qubit 2.0 was used for preliminary quantification, the library was diluted to 1 ng/μL, and an Agilent 2100 was used to check the insert size. Once the insert size met expectations, Q-PCR was used to accurately quantify the effective concentration of the library (effective concentration > 2 nM) to ensure library quality. Sequencing was then performed on the Illumina HiSeq platform according to the effective library concentration and the data output requirements.
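The library threshold above is stated in molar units, whereas fluorometric quantification reports a mass concentration. As a rough sketch, the two can be related with the common approximation of about 660 g/mol per base pair of double-stranded DNA. The function names and the fragment length in the usage note are illustrative assumptions and do not come from the study; Q-PCR measures the effective concentration directly, so this conversion is only an estimate.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    Assumes an average mass of ~660 g/mol per base pair (common approximation).
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    # ng/uL -> g/L is a factor of 1e-3; mol/L -> nmol/L is a factor of 1e9.
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def passes_library_qc(conc_ng_per_ul: float, mean_fragment_bp: float,
                      min_nm: float = 2.0) -> bool:
    """Check the > 2 nM effective-concentration criterion stated in the text."""
    return library_molarity_nm(conc_ng_per_ul, mean_fragment_bp) > min_nm
```

For example, a library diluted to 1 ng/μL with a hypothetical mean fragment length of 330 bp corresponds to roughly 4.6 nM under this approximation.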

Data analysis
The raw image data files obtained by sequencing were converted into raw reads, which were filtered for low quality, base uncertainty and other factors to obtain clean reads. Erroneous sequencing data were screened out using the Phred quality value of each base, and most of the sequencing data had quality above Q20. The clean reads were aligned to the reference genome (b37) with BWA (Li H et al.), giving initial alignment results in BAM format. Duplicate reads in the BAM file were then marked with Samblaster to obtain the final BAM alignment. ANNOVAR was used to annotate the mutation sites, including gene structure annotation, genomic feature annotation, hazard prediction for nonsynonymous mutations, annotation against known mutation databases, and functional annotation of mutation-related genes. RefSeq and Gencode were used to annotate the gene structure at each mutation site, including mRNA, noncoding RNA, small RNA and microRNA. The genomic features annotated at the mutation sites included CpG islands, cytobands, phastConsElements46way conserved regions, genomic repeats, transcription factor binding sites and ENCODE annotation of the GM12878 cell line. SIFT, PolyPhen, MutationAssessor, LRT and other methods were used to comprehensively evaluate the impact of nonsynonymous mutations on disease or tumors. dbSNP, the 1000 Genomes Project SNP database, the HapMap database, the COSMIC database of known tumor somatic mutations and the ESP6500 mutation database were available for screening mutation results in any combination. GO biological process, GO cellular component, GO molecular function, KEGG, Reactome, BioCarta, PID and other functional annotation databases were used to interpret signal transduction and metabolic pathways.
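The Q20 criterion mentioned above follows from the Phred definition Q = -10 log10(p), so Q20 corresponds to a base-calling error probability of 1%. The sketch below shows one way a per-read quality filter could look. It assumes Phred+33 encoded FASTQ quality strings, and the rule "at least half of the bases at Q20 or better" is a hypothetical threshold chosen for illustration, not the filter actually used in this study.

```python
def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def fraction_at_least(quals: list[int], threshold: int = 20) -> float:
    """Fraction of bases whose quality is at or above the threshold."""
    if not quals:
        return 0.0
    return sum(q >= threshold for q in quals) / len(quals)


def keep_read(quality_string: str, threshold: int = 20,
              min_fraction: float = 0.5) -> bool:
    """Hypothetical filter: keep a read if enough of its bases reach the threshold."""
    return fraction_at_least(decode_phred33(quality_string), threshold) >= min_fraction
```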

Clinical characteristics of patients with liver cancer
Ten patients with liver cancer were included: 8 with primary liver cancer, 1 with gallbladder cancer and liver metastasis, and 1 with colorectal cancer and liver metastasis. Table 1 summarizes the clinical and pathological data of all patients in this study. The clinical indicators include average age, sex, age range, stage, tumor size, lymph node status, distant metastasis, AFP, hepatitis and cirrhosis.

Gene mutation spectrum
We sequenced the whole exon regions of 560 genes and the promoter mutation hotspot of the TERT gene in 10 patients with liver cancer. These 560 genes are widely studied in cancer research or related to clinical treatment, and include 194 driver genes with clearly established function and 127 cancer signaling pathway genes. Drug target and drug resistance genes were also annotated: 202 genes were labeled as drug target genes by at least one drug database, and 43 genes were labeled as drug resistance genes. Fifteen genes related to drug metabolism were found by drug metabolism pathway enrichment.
Targeted capture and deep sequencing of 560 tumor-related genes in the 10 liver cancer ctDNA samples identified 8950 SNVs in 481 genes, whereas the 70 insertions and deletions were found in only 20 genes. In total, 281 genes carried more than 10 mutations and 145 genes carried more than 20 mutations (Supplemental Table 2). We analyzed the single nucleotide changes of the detected SNVs and found that, in all patients with liver cancer, G > A, T > C, A > G and C > T changes were more common than other changes, as shown in Figure 1a.
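A substitution spectrum of the kind summarized in Figure 1a can be tallied directly from the reference and alternate alleles of each SNV. The following is a minimal sketch on made-up input; the function names and the toy variant list are illustrative assumptions. Note that the four most frequent changes reported above (G > A, T > C, A > G, C > T) are all transitions, which the second function quantifies.

```python
from collections import Counter

TRANSITIONS = {"A>G", "G>A", "C>T", "T>C"}


def substitution_spectrum(variants):
    """Count single-base substitution types from (ref, alt) allele pairs.

    Entries that are not single-base substitutions (e.g. INDELs) are skipped.
    """
    counts = Counter()
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) == 1 and len(alt) == 1 and ref != alt \
                and ref in "ACGT" and alt in "ACGT":
            counts[f"{ref}>{alt}"] += 1
    return counts


def transition_fraction(counts):
    """Share of substitutions that are transitions (purine<->purine, pyrimidine<->pyrimidine)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for change, n in counts.items() if change in TRANSITIONS) / total


# Toy input for illustration only; these are not variants from the study.
toy = [("G", "A"), ("g", "a"), ("T", "C"), ("A", "T"), ("AT", "A")]
spectrum = substitution_spectrum(toy)
```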
We also analyzed the genomic regions in which these SNVs were located. Among these variants (Fig. 1b), exonic variants (80.91%) were the most common, followed by intronic variants (13.31%), splicing variants (2.51%) and UTR variants (1.21%). In addition, we examined the genetic effects of the variants in exon regions, including missense, synonymous, stop-gain/stop-loss and unknown mutations.
Analysis of high-frequency mutated genes in primary liver cancer samples after screening
We detected a large number of cancer-related gene mutation sites. Compared with the top 20 mutated genes in the COSMIC database, no mutation sites were found in 6 of these high-frequency genes (PREX2, ZFHX3, FHIT, CAMTA1, GPHN and SND1), and the other 14 genes found in the samples are shown in the Table below. Thus, most of the high-frequency mutated genes of HCC had mutation sites in the ctDNA of patients with primary liver cancer, but the mutation frequencies were not exactly the same as the reference frequencies in the COSMIC database. This may be related to the insufficient number of samples, and may also be affected by other factors (somatic release) in the blood.
TP53 and FOXP1 were the mutated genes retained after screening, indicating that these mutations may be related to the development of liver cancer. They have potential diagnostic significance, suggest possible directions for targeted drug treatment, and merit further research.

Analysis of high-frequency mutated genes in circulating tumor DNA of patients with liver metastasis
In patients with liver metastases, mutation sites were detected in 12 of the high-frequency mutated genes of liver cancer tissue, as shown in the following Table. No mutation site was found in FOXP1; mutations of this gene occurred only in primary liver cancer, suggesting relatively high specificity for the diagnosis of liver cancer.
KMT2C and TP53 were retained after screening in the samples from patients with liver metastasis. Mutations in gallbladder cancer and rectal cancer were looked up in the COSMIC database: KMT2C is a high-frequency mutated gene in gallbladder cancer and TP53 is a high-frequency mutated gene in rectal cancer, which is consistent with the mutations found in circulating tumor DNA in blood. It can therefore be speculated that KMT2C and TP53 may be related to the development of gallbladder cancer and rectal cancer, respectively. KMT2C may serve as a reference gene for the diagnosis of gallbladder cancer, and TP53 as a reference gene for the diagnosis of colorectal cancer.
Following the guidelines of the American College of Medical Genetics and Genomics (ACMG) and starting from the preliminary analysis of the NGS data, the variants were analyzed with Sorting Intolerant From Tolerant (SIFT), PolyPhen, MutationTaster and Combined Annotation Dependent Depletion (CADD), together with zygosity, variation type, variation effect, location and filter coverage rate (the results showed no significant difference between them). Minor allele frequency and other conditions were then screened to further classify and analyze the genes. We found 39 genes with mutations possibly related to the development of liver cancer, including RET, TP53 (40%, 4/10), LAMA2 (30%), KAT6B, DNMT1, FGFR3, PRKDC, FOXP1, PDGFRB and SYNE1 (20%, 2/10).
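The screening described above combines a minor allele frequency condition with in silico predictions, and the per-gene percentages are simply the share of patients carrying a retained variant. The sketch below illustrates that logic only. It is not the ACMG classification itself, which weighs many more evidence criteria, and the field names, the 1% frequency cut-off and the "at least two damaging predictions" rule are hypothetical assumptions.

```python
from collections import defaultdict


def filter_candidates(variants, max_maf=0.01, min_damaging_calls=2):
    """Keep rare variants flagged as damaging by several predictors.

    Each variant is a dict with hypothetical keys: 'gene', 'patient',
    'maf' (None if absent from population databases) and 'predictions'
    (mapping of predictor name -> True if called damaging).
    """
    kept = []
    for v in variants:
        maf = v.get("maf")
        rare = maf is None or maf < max_maf
        damaging = sum(bool(flag) for flag in v.get("predictions", {}).values())
        if rare and damaging >= min_damaging_calls:
            kept.append(v)
    return kept


def gene_recurrence(variants, n_patients):
    """Fraction of patients carrying at least one retained variant per gene."""
    patients_by_gene = defaultdict(set)
    for v in variants:
        patients_by_gene[v["gene"]].add(v["patient"])
    return {gene: len(p) / n_patients for gene, p in patients_by_gene.items()}
```

Under this bookkeeping, a gene with retained variants in 4 of 10 patients is reported as 40%, matching the notation used in the text.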
We used DAVID to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. GO terms are annotated in three categories: cellular component (GO CC), molecular function (GO MF) and biological process (GO BP). Pathway annotation was based on enrichment against the KEGG pathway database (https://www.kegg.jp/kegg/). GO terms and KEGG pathways were considered statistically significant at P < 0.05. The protein-protein interaction (PPI) network was predicted with the STRING online tool (https://string-db.org/).
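The significance threshold above refers to an over-representation test: given a gene list, how surprising is it that so many of its members carry a particular GO term or belong to a particular KEGG pathway? DAVID reports a modified Fisher exact P value (the EASE score); the sketch below uses the plain hypergeometric upper tail as a simpler stand-in, so its values will not match DAVID output exactly. The function and parameter names are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric P value, P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the study list, k: study-list genes annotated with the term.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def is_enriched(k: int, n: int, K: int, N: int, alpha: float = 0.05) -> bool:
    """Apply the P < 0.05 criterion stated in the text."""
    return hypergeom_enrichment_p(k, n, K, N) < alpha
```

Because many terms are tested at once, a multiple-testing correction would normally be considered alongside the raw P value; the text does not state whether one was applied.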

GO annotation of the filtered pathogenic and likely pathogenic genes
According to the ACMG guidelines, we identified 54 pathogenic and likely pathogenic mutations in 39 genes in the exons and splicing regions of the 10 patients with liver cancer (Table 3). GO annotation and pathway analysis were carried out on these 39 pathogenic and likely pathogenic genes (Figure 2 and Supplemental Table 4). The BP terms of these genes were related to positive regulation of transcription from the RNA polymerase II promoter, peptidyl-tyrosine phosphorylation, positive regulation of transcription, positive regulation of GTPase activity, signal transduction, the MAPK cascade, negative regulation of the apoptotic process, protein phosphorylation and the transmembrane receptor protein tyrosine kinase signaling pathway. The GO MF annotation indicates that these genes are involved in protein binding, ATP binding, protein tyrosine kinase activity, transcription factor activity, sequence-specific DNA binding, Ras guanyl-nucleotide exchange factor activity, transcription factor binding, enzyme binding, protein kinase activity, receptor binding and protein heterodimerization activity. In addition, the CC terms of these pathogenic and likely pathogenic genes are mainly related to the perinuclear region of the cytoplasm, receptor complex, cell adhesion junction, cytoskeleton, extracellular components of the cytoplasmic side, replication fork, flotillin complex and catenin complex.

Pathway analysis and PPI protein prediction
Further enrichment analysis based on the KEGG database shows that these pathogenic and likely pathogenic genes are highly enriched in liver cancer and cancer-related pathways, as shown in Table 5. The pathways include cancer-related pathways such as central carbon metabolism in cancer, proteoglycans in cancer and microRNAs in cancer, as well as pathways for various cancers, including thyroid, prostate and pancreatic cancer. The PPI network of the 39 pathogenic and likely pathogenic genes was constructed using the STRING database. The STRING map shows that the network contains 44 nodes and 143 edges. Please refer to https://www.string-db.org/cgi/network?taskId=bydYaAww2Fdf&sessionId=b8FnnMXJsCm6 for more information.
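A PPI network such as the one described above is an undirected graph in which proteins are nodes and predicted interactions are edges. One simple way to look for hub genes is to rank nodes by degree, that is, by their number of distinct interaction partners. The sketch below assumes the network has been exported as a list of gene pairs. The edge list shown is invented for illustration and does not reproduce the STRING result of this study.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Rank nodes of an undirected network by number of distinct neighbors.

    Duplicate edges and self-loops are ignored. Ties are broken alphabetically.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sorted(((gene, len(partners)) for gene, partners in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


def network_size(edges):
    """Return (number of nodes, number of distinct undirected edges)."""
    distinct = {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    nodes = set().union(*distinct) if distinct else set()
    return len(nodes), len(distinct)


# Hypothetical edges for illustration only.
toy_edges = [("TP53", "RB1"), ("TP53", "ERBB2"), ("RB1", "TP53")]
ranking = degree_ranking(toy_edges)
```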

Discussion
In this study, a large number of liver cancer related mutated genes were detected in extracted ctDNA. The level of ctDNA was markedly increased, which was similar to the reference genes and consistent with the experimental results. Mutation sites were found in most of the high-frequency mutated genes in the HCC group, which indicates that the mutated genes in blood ctDNA are largely the same as those in liver cancer tissue. Confirming that blood ctDNA carries the mutation information of the liver cancer tissue is of diagnostic value.
In this study, 560 tumor-related genes from 10 liver cancer samples were sequenced in depth. We identified 8950 SNVs and 70 INDELs in 481 genes, among which PDE4DIP, SYNE1, KMT2C, PKHD1, FN1, LRP1B, ALK, FANCA, NOTCH1, ABCC4, RNF213, HNF1A, LAMA2, IGF2R, NIN, PIK3C2B, APC and DOCK8 each had more than 70 mutations. Thirty-nine pathogenic and likely pathogenic genes were identified according to the ACMG guidelines, and these genes were analyzed by GO enrichment, pathway analysis and PPI network analysis. Our results may provide molecular data for mapping the genetic variation of Chinese patients with liver cancer.
Mutation analysis was used to map the genetic variation of liver cancer. PDE4DIP was the most frequently mutated gene in almost all patients with liver cancer, followed by SYNE1, KMT2C, PKHD1 and FN1. PDE4DIP is a protein-coding gene. The protein it encodes anchors phosphodiesterase 4D to the Golgi/centrosome region of the cell, participates in microtubule dynamics, promotes microtubule assembly, and acts at the level of the Golgi apparatus or centrosome [18,19]. Defects in this gene may be one of the causes of myeloproliferative disorder (MBD) associated with eosinophilia, and one of the risk factors for adult pineoblastoma [20,21]. Database results suggest that the PDE4DIP protein may be involved in pathway interactions. Our study suggests that PDE4DIP may be involved in the pathogenesis of liver cancer. SYNE1 is a protein-coding gene that encodes an anchoring protein expressed in skeletal muscle, smooth muscle and peripheral blood lymphocytes. The protein is located in the nuclear membrane and links the nuclear lamina to the cytoskeleton [22]. It has been reported that this gene has higher transcript expression during the initiation, progression and staging of HCC, and that the encoded protein plays a potential role in the development of HCC and other tumors [23]. KMT2C is a protein-coding gene and a member of the ASC-2/NCOA6 complex (ASCOM); it has histone methylation activity and participates in transcriptional coactivation [24]. KMT2C is reported to be frequently mutated in a variety of human cancers and to be crucial for the occurrence and development of most cancers [25]. PRKDC is a member of the PI3/PI4-kinase family. It encodes the catalytic subunit of DNA-dependent protein kinase (DNA-PK), which functions together with the Ku70/Ku80 heterodimer in DNA double-strand break repair and recombination [26].
The expression of PRKDC was significantly correlated with overall survival in HCC. The FN1 gene encodes fibronectin, a glycoprotein present as a soluble dimer in plasma and as a dimer or multimer at the cell surface and in the extracellular matrix [27] [28]. Fibronectins are involved in cell adhesion, cell motility, opsonization, wound healing and maintenance of cell shape [29]. Fibronectin is a known biomarker for the early diagnosis of HCC, and changes in its level may be an alternative indicator for evaluating the response of patients with early HCC after therapy [30].
Applying the ACMG mutation classification guidelines to the 481 mutated genes, we found 54 pathogenic and likely pathogenic mutations in 39 genes, which may be the cause of liver cancer in these individuals. GO and KEGG analyses were performed to further understand the role of the pathogenic mutant genes. GO annotation showed that the BP terms included peptidyl-tyrosine phosphorylation, positive regulation of transcription from the RNA polymerase II promoter, positive regulation of GTPase activity, protein autophosphorylation and phosphatidylinositol phosphorylation, which may provide a basis for further understanding of the occurrence and development of liver cancer. In addition, the enriched KEGG pathways included hepatitis-related (hsa05161) and tumor-related (hsa05200, hsa05230, hsa05206, hsa05205) pathways, in which BCR, RB1, PDGFRB, RET, CASP8, APC, LAMA2, CDH1, ERBB2, CBL, FGFR3 and TP53 play key roles in carcinogenesis and tumor progression.

Conclusion
This study has several limitations. First, it is a single-center study, and a large-scale multicenter study is required to verify the results. Second, the potential functions and pathways were predicted by bioinformatics alone and need experimental validation. Third, because our sample size is small and ctDNA sequencing was used, there may be errors. Nevertheless, our targeted sequencing approach can screen the key genes of liver cancer efficiently, quickly and conveniently, and provides new ideas for the diagnosis and prognosis of patients with liver cancer. In conclusion, we sequenced liver cancer samples, identified some new gene mutation sites, and screened genes that are significant for the diagnosis and prognosis of liver cancer. Our research provides a gene mutation map of Chinese patients with hepatocarcinoma and enriches the understanding of the pathogenesis of liver cancer. In addition, we screened pathogenic genes according to the ACMG guidelines and carried out GO analysis, pathway analysis and PPI network analysis. Larger cohort studies and experimental research are still needed to explore the underlying mechanisms of HCC.