In-situ multiomics analysis reveals only minor genetic and epigenetic changes in human liver cancer stem cells

Cancer stem cells (CSCs) play pivotal roles in human cancer but their genetic and epigenetic changes are still largely unknown. Here we show that human CSCs are more similar to paratumor cells in liver cancer. Using single-cell-based in-situ multiomics analysis, we found that although liver CSCs (LCSCs) from patients had some potential key genetic and epigenetic changes in their genome, they had substantially fewer somatic single nucleotide variants (SNVs), copy number alterations (CNAs) and differentially methylated regions, and less change in the global levels of DNA cytosine modications, than other tumor parenchymal cells. The cluster analysis of SNVs, CNAs and DNA methylation patterns and spatial transcriptomes all clearly showed that the LCSCs were clustered with the paratumor liver cells. Thus, only minor genetic and epigenetic changes occurred in human LCSCs. Targeting key changes in CSCs, not just changes in bulk tumor cells, should be more effective for human cancer therapy.


Introduction
Human cancer development is a complex evolutionary process. To date, it is universally acknowledged that cancer cells possess many signi cant genetic and epigenetic changes caused by multiple factors compared with normal cells. Many cancers are thought to originate from cancer stem cells (CSCs), which pose a high risk of therapy resistance and cancer relapse 1 . Understanding the genetic and epigenetic changes in human CSCs should shed light on the design of new better cancer therapeutic approaches.
However, it is always di cult to study human CSCs, as they generally account for a very small proportion of tumor specimens from clinical patients. Consequently, the genetic and epigenetic changes that occur in human CSCs and whether these changes are the same as those in other tumor parenchymal cells of the same clinical tumor are still poorly understood.
In the current study, we developed a method combining immunohistochemistry (IHC), laser capture microdissection (LCM) and single-cell sequencing techniques (CIL-seq) to uncover the genetic and DNA methylation changes related to human CSCs. We rst selected specimens with a stem cell phenotype from mixed hepatocellular cholangiocarcinoma (HCC-CCA) patients to investigate liver CSCs (LCSCs). We found that LCSCs had only minor genetic and epigenetic changes and were more similar to paratumor tissues than to tumor tissues in both genetic and epigenetic patterns. We further con rmed these results in many other liver cancer samples using in-situ multiomics analysis.

Development of CIL-seq method
We selected epithelial cell adhesion molecule (EpCAM), which was used to label liver stem cells 2,3 or LCSCs 4-6 , as a marker molecule for liver stem cells in our study. To explore genetic and epigenetic differences between LCSCs and other EpCAM-stained cells in both tumor and paratumor tissues, we developed the CIL-seq method, which combines IHC, LCM and single-cell sequencing techniques ( Fig. 1a and Methods). First, to better obtain LCSCs for further study, we carried out EpCAM IHC staining on frozen sections. Second, we used LCM to capture needed cells more accurately and visually in multiple spatial regions, which is helpful for the investigation of tumor biology and heterogeneity 7,8 . Then, to test whether the IHC process damages the integrity of genomic DNA, we selected approximately 40 pure HCC or paratumor liver (PL) cells, which were clustered together within the same component in the tumor or paratumor, for whole genome sequencing (WGS) library analysis based on single-cell technology.
Simultaneous sequencing of several same-type cells could reduce the inherent single-base false positives caused by whole-genome ampli cation of single cells as much as possible. Last, the WGS results showed satisfactory coverage; at a 15× mean sequencing depth, WGS covered approximately 78% to 80% of the genome at ≥ 1× coverage (Supplementary Table 1), con rming the integrity of genomic DNA.

Characteristics of LCSCs in liver cancer patients
Mixed HCC-CCA is a rare type (less than 5%) of primary liver cancer (PLC) with mixed differentiation ( Supplementary Fig. 1a), and its diagnosis depends on postoperative histopathology 9 . Some cancer cells in mixed HCC-CCAs have stem cell characteristics 9,10 . Therefore, this type of PLC can provide us with a perfect tumor object directly from clinical patients to help investigate human CSCs and compare the similarities and differences between LCSCs and HCC or CCA cells within the same liver cancer sample. We collected liver cancer samples that were resected and diagnosed with PLC before surgery. According to the clinicopathological diagnosis, one mixed HCC-CCA case (patient P2, Supplementary Table 2) among a total of 35 cases (Supplementary Table 2) collected was identi ed. Fortunately, this mixed HCC-CCA specimen is the stem-cell feature type, which is the exact type of specimen that we aimed to research. Pathological histocytology ( Supplementary Fig. 1b) showed three distinct types of parenchymal cells coexisting in the tumor tissues: HCC cells, CCA cells, and LCSCs, which express liver stem cell markers, including EpCAM, cytokeratin 19 (CK19), OV6 and Sal-like protein 4 (SALL4), and account for approximately 10% of the overall tumor tissues. Some LCSCs grew in clonal clusters, and the cell morphology within each clone was not distinctly different under the microscope ( Supplementary Fig. 1b).
By integrating EpCAM staining and cell morphology, the liver parenchymal cells in the paratumor tissue could be further subdivided into two types: the most common type was EpCAM-negative cells, which are ordinary PL cells, and the other type was EpCAM-positive cells, which correspond to paratumor ductular reaction (PDR) cells ( Supplementary Fig. 1b). In PDR cells, each clone is generally believed to contain a portion of liver stem cells 11 . Therefore, EpCAM immunostaining combined with cell morphology features could clearly distinguish the two types of liver parenchymal cells in paratumor tissues and the three types of parenchymal cells in tumor tissues. At the junction between LCSCs and HCC cells or CCA cells, some transitional areas could be clearly observed (Fig. 1b), supporting that HCC and CCA cells may be derived from LCSCs. LCSCs and PDR cells had similar characteristic markers, such as EpCAM, CK19, OV6, and SALL4 ( Supplementary Fig. 1b). Therefore, LCSCs closely resembled PDR cells in both histomorphology and key markers of liver stem cells. We investigated the pathological sections of 12603 PLC patients who received surgical resection from 2017 to 2019 at our hospital. Among them, 278 cases (2.2%) were pathologically diagnosed as HCC-CCA and 58 cases of them (Supplementary Table 2) could be observed such LCSCs as in patient P2. We also investigated 265 HCC patients and found that 35 cases of HCC (Supplementary   Table 2) could be observed such LCSCs, whereas the number of LCSC clones was much less than that in HCC-CCA. Notably, however, LCSCs had signi cantly more Ki-67-positive staining than PDR cells ( Fig. 1c and Supplementary Fig. 1b), indicating that LCSCs have greater proliferative activity.
Whole-exome sequencing (WES) analysis of LCSCs reveals only minor key genetic changes We performed a WES analysis of specimen P2 based on CIL-seq. According to the aforementioned EpCAM staining, we isolated 5 types of cell samples ( Fig. 1d and Supplementary Fig. 2a, b) from frozen sections, including three components in the tumor: LCSCs (3 samples), CCA cells (4 samples) and HCC cells (2 samples), and two components in the paratumor: PDR cells (3 samples) and PL cells (4 samples).
In addition, we used bulk immune in ltrating cells (~ 500 cells) from the paratumor tissue (Fig. 1d) in this patient as the normal genomic reference. Taken together, WES libraries were prepared from a total of 17 samples and sequenced. One of the 17 samples, E3A, was sequenced repeatedly, and the results showed good sequencing concordance ( Supplementary Fig. 2c). At least two samples of each type had good coverage (average target depth > 62×) (Supplementary Table 1).
We utilized two algorithms to identify somatic single nucleotide variants (SNVs) and removed false positives by commonly used lters (Methods). Consequently, we detected 125 SNVs in all samples, and Sanger sequencing further con rmed the accuracy of SNV detection (Supplementary Table 3). They were predominantly C to T transitions (Fig. 2a), similar to those previously reported [12][13][14] . Through cluster analysis of the 125 SNVs, it was clearly seen that different samples from the same cell types could be clustered together. Intriguingly, the LCSC samples were clustered with the paratumor samples instead of the HCC or CCA samples (Fig. 2b).
Most SNVs were detected in HCC or CCA samples (73/125 and 70/125, respectively, Supplementary Table   3), and the number of SNVs was consistent with that of previous ndings in HCC or mixed HCC-CCA samples [15][16][17][18] . Notably, only minor SNVs (28 in total) were detected in LCSCs. There were considerably more independent SNVs among LCSC samples (Fig. 2c, Supplementary Fig. 3a 14,19 . Consistent with these ndings, we found many SNVs (43 in total, Supplementary Of the 69 genes with missense mutations or stop codon changes, only a small portion of these SNVs were detected in LCSCs ( Fig. 2e and Supplementary Table 3). We found that three genes (DHX9, ERBB4, FAM46D) were among the 299 reported driver genes 20 , and one (ZFHX4) was among the 161 driver genes of HCC 15 . These driver gene mutations were detected only in HCC or CCA cells (Supplementary  Table 3). Taken together, these results suggest that crucial gene mutations that may promote tumor growth occurred in HCC and CCA cells; however, substantially fewer mutations occurred in LCSCs.
We next analyzed the copy number alteration (CNA) using the WES data. Our analysis focused only on the ampli ed loci because of the low coverage of some samples. We found many CNA ampli cation regions in the tumor, but most of them were centralized in HCC and CCA cells, including the ampli cation of chromosome 8q, which is often detected in liver cancer 8,18,21 , while only very few CNA regions were detected in LCSCs ( Fig. 2f and Supplementary Table 4). We also analyzed the ampli ed genes. Overall, consistent with the CNA ampli cation regions, at the gene level, LCSCs also had apparently fewer ampli cation genes than HCC and CCA cells (Supplementary Fig. 3b and Supplementary Table 4). A phylogenetic tree constructed using the ampli ed genes clearly showed that similar to SNVs, LCSCs were more similar to paratumor cells than to HCC and CCA cells (Fig. 2g), and this analysis also showed that HCC and CCA cells might be derived from LCSCs, which is consistent with the aforementioned histological characteristics between these cells (Fig. 1b). Interestingly, some of these ampli ed genes are oncogenes, such as MYC, which was also ampli ed in one sample of LCSCs (Supplementary Fig. 4 and Supplementary Table 4). Activation of MYC is thought to be related to the formation of CSCs in the liver 12,22 . Taken together, WES data showed that LCSCs have only a few SNVs and CNAs compared to those in HCC and CCA cells, but LCSCs might possess a few crucial changes, such as ampli cation of the oncogene MYC.
Whole-genome bisul te sequencing (WGBS) analysis of LCSCs reveals minor key epigenetic changes Epigenetic aberrations play an important role in cancer development 7,23,24 , and our WES data showed few striking genetic changes in LCSCs. Thus, we are curious about what epigenetic changes hidden in LCSCs changed them to malignant cells instead of normal cells. Considering the actual situation of the samples we picked, we mainly examined the DNA methylation landscape of these samples. DNA methylation is the most essential DNA modi cation and is closely associated with other epigenetic contents 25 . We performed WGBS analysis of captured different cell clones ( Fig. 1d and Supplementary  Fig. 2) from specimen P2 using CIL-seq. The bisul te conversion e ciency of most samples was greater than 98%, and CpG coverage in most samples (Supplementary Table 1) was higher than that reported for single-cell WGBS sequencing [26][27][28] .
In general, HCC and CCA cells have undergone broad DNA demethylation compared with paratumor cells (Fig. 3a). However, the global methylation level of LCSCs changed much less (Fig. 3a), indicating a closer relationship to paratumor cells. In addition, there was extensive DNA methylation heterogeneity in tumor cells ( Supplementary Fig. 5a), as reported 16,[29][30][31] . Notably, there was also strong DNA methylation heterogeneity among LCSC samples ( Supplementary Fig. 5b). The cluster analysis of the DNA methylation pattern (Fig. 3b) clearly illustrated that HCC and CCA cells were clustered together, and different samples in the paratumor were clustered together. Intriguingly, LCSCs were clustered with paratumor cells. Thus, these results suggest that, consistent with our WES analysis shown above (Fig. 2b, g), for DNA methylation, LCSCs were also more similar to paratumor cells than to HCC and CCA cells.
By comparing the differentially methylated regions (DMRs) of distinct sample types, we found that LCSCs displayed limited variance with paratumor cells but had greater variance with HCC and CCA cells, which is consistent with the global DNA methylation pattern (Fig. 3b). Speci cally, 10.1% DMRs were found between LCSCs and PDR cells, while 27.1% or 32.3% DMRs were found between LCSCs and HCC or CCA cells ( Fig. 3c and Supplementary Table 5).
Formalin-xed, para n-embedded (FFPE) sample resources are far richer than frozen tissues. We chose an FFPE HCC-CCA sample (P4, Supplementary Table 2) and conducted WGBS analysis based on the CILseq method (Fig. 3d). Although the quality of the sequencing data was much lower than that of frozen tissues (Supplementary Table 1), we still observed that the overall methylation level and DNA methylation pattern of LCSCs were similar to those of PDR cells (Fig. 3e, f), consistent with the results of patient P2. Taken together, these results suggest that in the global pattern of DNA methylation, very large methylation changes occurred in HCC and CCA cells compared with paratumor cells, whereas there were fewer differences in DNA methylation patterns between LCSCs and paratumor cells.

DNA methylation changes in LCSCs correspond to a chromatin restriction state
We next examined the distribution region of DMRs in the genome of P2 samples. Overall, there were signi cantly fewer regions with prominent DNA methylation changes in LCSCs than in HCC and CCA cells (Supplementary Table 6). According to the 15-state model of chromatin 25 , the signi cant methylation changes in HCC and CCA cells were closely related to the methylation level of the 15 chromatin states in normal cells. Chromatin states with signi cant demethylation occurred on those initially exhibiting a hypermethylated state in normal cells 25 , including the 9_Het and 15_Quies chromatin states, and chromatin states with signi cant hypermethylation occurred on those initially exhibiting a hypomethylated state in normal cells 25 , including the 1_TssA, 10_TssBiv, 11_BivFlnk and 12_EnhBiv states (Fig. 4a, b and Supplementary Fig. 5c). However, substantially fewer methylation changes occurred in these chromatin states in LCSCs, which is consistent with the minor change in the global methylation level in LCSCs (Fig. 3a).
Remarkably, we found that a few crucial chromatin regions in LCSCs had the same DNA methylation tendency as in HCC and CCA cells. Signi cant hypermethylation changes in repressive Polycomb group (PcG) protein-marked regions (including the chromatin states 10_TssBiv, 11_BivFlnk, 12_EnhBiv and 13_ReprPC) occurred in HCC and/or CCA cells (Fig. 4a, b), which is consistent with previous ndings that PcG-marked genes in both embryonic and adult progenitor systems have a higher chance of becoming hypermethylated in cancers 32 . In addition, we also found that two of these PcG-marked regions had already undergone signi cant changes in LCSCs compared with PDR cells, including in the bivalent state 10_TssBiv and the chromatin state 13_ReprPC (Fig. 4b). The functions of those genes corresponding to these hypermethylated regions in LCSCs are mainly involved in cell differentiation and fate determination (Fig. 4c and Supplementary Table 7), and more than half of them (20/32) also correspond to bivalent genes in the human fetal liver 33 , such as APC2 (Supplementary Table 7), a homologue of the APC tumor suppressor 34 . Therefore, hypermethylation changes in these regions in LCSCs might result in a restrictive chromatin state that can block a differentiation program 24 . Taken together, these results suggest that although there were only a few DNA methylation changes in LCSCs, which were much less than the signi cant changes observed in HCC and CCA cells, the changes in LCSCs might result in a restrictive chromatin state, which may be important for cancer initiation.
Minor genetic and epigenetic changes in LCSCs of HCC patients HCC is a major type of PLC. We collected frozen tissues and prepared sections from 23 PLC cases (Supplementary Table 2). One of them (patient P7) was pathologically diagnosed with HCC and had obvious EpCAM+ LCSCs. We performed WES and WGBS analysis of this case with CIL-seq ( Fig. 5a and Supplementary Table 1). The results showed that the genetic (SNV and CNV) difference or DNA methylation pattern difference between LCSCs and PDR cells was small (Fig. 5b-f), consistent with the analysis results of patients P2 and P4. Interestingly, among the minor genetic changes in LCSCs, we found a stop gain mutation in the driver gene MGA 20 , which also appeared in an HCC sample (Supplementary Table 3). MGA is signi cantly mutated in lung adenocarcinoma and participates in the negative regulation of MYC 35 .
An important epigenetic change during cancer development is the global demethylation of DNA. We and others have found that 5-methylcytosine (5mC) can be oxidized to the demethylation intermediates 5hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) in mammalian cells 36,37 . Recently, we found that in the early stage of HCC, the global content of 5hmC and 5fC was decreased 38 , but it is not clear whether there is such a change in LCSCs. We immunostained the FFPE section samples of P7 as well as P2 and P4 for 5mC, 5hmC and 5fC ( Supplementary Fig. 6). The results showed that most LCSCs were different from other tumor parenchymal cells which had obviously decreased 5hmC and 5fC staining. There was little difference in 5mC, 5hmC or 5fC staining between LCSCs and PDR cells. Examination of 65 samples with obvious LCSCs (Supplementary Table 3), including those from 35 HCC patients, showed similar results (Fig. 5g, h), suggesting that LCSCs do not undergo extensive DNA demethylation. Taken together, these results are consistent with the results of WGBS analysis showing that minor DNA methylation changes occur in LCSCs.
Finally, we aimed to determine whether the RNA expression pro les of LCSCs are similar to those of paratumor liver cells, since the genetic and epigenetic changes in LCSCs are not signi cant. We performed a spatial transcriptomic (ST) analysis of P7 frozen tissues. We generated 4493 spot transcriptomes at a medium depth of 14455 UMIS/spot and 4064 genes/spot ( Supplementary Fig. 7a), which can clearly distinguish tumor and paratumor cells (Supplementary Fig. 7b). In general, there were large RNA expression differences between the tumor and paratumor cells (Supplementary Fig. 7c).
Further cluster analysis showed that the tumor cells had extensive heterogeneity (Fig. 5i). Because of the small proportion of LCSCs, we manually selected all LCSC spots in the tumor tissues and PDR cell spots in the paratumor tissues ( Supplementary Fig. 7d, e). In addition, we randomly selected some spots of tumor or paratumor parenchymal cells. Cluster analysis of the whole expression pro les showed that LCSCs and PDR cells were clustered together ( Supplementary Fig. 7f). The cluster analysis of the expression pro les of 11 liver-related genes of these spots showed similar results (Fig. 5j), indicating that there was only a small difference in RNA expression patterns between LCSCs and PDR cells, consistent with the results of the genetic and epigenetic changes.

Discussion
In the present study, using single-cell-based in-situ multiomics analysis, we provide evidence that only a few genetic and epigenetic changes occur in human LCSCs. Our data showed that HCC and CCA cells have undergone considerable genetic and epigenetic changes that we usually see in human cancers, but only minor genetic and epigenetic changes have occurred in LCSCs.
Because of the possible great genetic and epigenetic differences between CSCs and other tumor parenchymal cells, the detection of genetic and epigenetic changes in whole tumor tissue could not re ect what has happened in CSCs. The sequencing signals from bulk tumor samples would be dominated by major tumor parenchymal cells, rendering rare CSCs undetectable. Therefore, treatments that target most of the signi cant genetic and epigenetic changes (and hence the changes in RNA level and protein level) in these tumors may not be effective for CSCs because CSCs may not possess those changes.
The LCSCs in the patients in this study harbor CSC characteristics, including the expression of stem cell markers and a higher proliferative capacity than PDR cells, and histopathological and genetic evidence suggests that LCSCs may be the common origin of HCC and CCA cells. Nevertheless, LCSCs have signi cant genetic and epigenetic differences compared with HCC and CCA cells in the same patient. The proportion of CSCs in some mixed HCC-CCAs is usually much higher than that in other types of tumors; therefore, such mixed HCC-CCAs can provide us with an ideal tumor object to help investigate human CSCs. Furthermore, because LCSCs in mixed HCC-CCAs can differentiate into two kinds of tumor parenchymal cells, it is easier to identify them.
The CIL-seq method that we developed in this study, which combines IHC, LCM and single-cell sequencing techniques, is suitable for the investigation of rare CSCs, especially when spatial heterogeneity is taken into consideration. To obtain monoclonal CSCs, in-situ capture, such as LCM, is a direct and effective method. Information on cell spatial positions could not be obtained if using methods such as cell sorting. Because of the breakage of DNA in FFPE tissue cells, such samples are often not suitable for single-cell based genetic and epigenetic sequencing research. Frozen sections are much better, as illustrated in this study.
In conclusion, we found only minor genetic and epigenetic changes in CSCs in liver cancer patients. Further studies are needed to determine whether such changes are common in CSCs in patients with other types of cancer. Identifying few but key genetic and epigenetic changes in both CSCs and other tumor parenchymal cells of each speci c cancer patient, such as MYC ampli cation in patient P2 or MGA mutation in patient P7 in this study, could help design more effective methods to treat human cancer in the future.

Patients and collection of clinical specimens
We investigated pathological sections of 12603 PLC patients who received surgical resection from 2017 to 2019 at the Eastern Hepatobiliary Surgery Hospital (EHBH) in Shanghai, China. In addition, 265 HCC patients who underwent surgical resection from 2009 to 2020 at EHBH were randomly selected and investigated for LCSCs (Supplementary Table 2).
Fresh primary liver cancer tissues and paratumor liver tissues were obtained from 58 patients who underwent surgical resection at EHBH (Supplementary Table 2). Each tissue specimen was embedded in optimal cutting temperature (OCT) compound medium, immediately frozen in an isopentane slurry made with liquid nitrogen, and nally stored at -80 °C until further processing. Each sample was embedded within 30 minutes for frozen sectioning after surgical removal.
The archives of all patients were collected by the EHBH archive system. Informed consent was obtained from the patients, and all procedures were approved by the ethical committee of EHBH.
For formalin-xed, para n-embedded (FFPE) IHC, the tissues were xed overnight in 4% PFA, embedded in para n and cut into 4 μm serial consecutive sections. After depara nization, antigen retrieval was performed with sodium citrate buffer (10 mM sodium citrate, 0.05% Tween-20, pH 6) (for H&E, omit this step, subsequently by hematoxylin and eosin standard protocols). The following steps were the same as those in the aforementioned frozen section. The primary antibodies used here were as follows: anti-SALL4 (1:100, Santa Cruz Biotechnology, sc-101147), anti-CK19 (1:100, Abcam, ab9221), and anti-Ki-67 (1:1000, Abcam, ab15580). For 5mC, 5hmC or 5fC immunostaining, DNA was denatured with 2 N HCl for 15 minutes at room temperature after antigen retrieval treatment, followed by neutralization with 100 mM Tris-HCl (pH 8.0) for 10 minutes at room temperature. The primary antibodies used were as follows: anti-5mC (1:5000, Cell Signaling Technology, Cat#: 28692), anti-5hmC (1:10000, Active Motif, Cat#: 39769); and anti-5fC (1:1000, Active Motif, Cat#: 61227). For EpCAM and 5hmC double-staining, primary anti-5hmC antibody (rabbit) was rst incubated for 16 hours at 4 °C, followed by alkaline phosphatase-based streptavidin/biotinylated link system to visualize dark purple cell nuclei. Anti-EpCAM antibody (mouse) was then incubated for 2 hours at room temperature, followed by detection using a horseradish peroxidase/AEC system to visualize the red cell membrane. Speci cally, the slides were sealed with glycerol mounting medium and without hematoxylin counterstaining. We could not obtain good results for EpCAM and 5fC double staining, possibly because of the low 5fC content in cells and thus usually weak staining for 5fC.

CIL-seq IHC and LCM
OCT-embedded frozen tissues were cut into 10 µm thick sections and mounted on PEN Membrane Glass Slides (Leica). After IHC of EpCAM (see above), tissues were dehydrated through rising ethanol concentrations (50, 75, 85, 95, 100 and 100% ethanol, 60 s each) in 50 mL sterile centrifuge tubes. When membrane slides dried completely, the LCM procedure was initiated by a Leica LMD 7000 laser microdissection system according to the manufacturer's protocol. All captured samples were collected in 0.2 ml PCR tubes. Finally, the membrane slides were cleared with xylene and sealed for long-term preservation. Certi ed RNase/DNase-free materials were used whenever available.

Genetic libraries
Ultralow DNA from LCM samples was extracted by a Quick-DNA Microprep Plus Kit (Zymo Research) before single-cell multiple displacement ampli cation (MDA) using a REPLI-g Single Cell Kit (Qiagen) for trace DNA ampli cation. After obtaining high yields of high-quality whole-genome ampli ed DNA, wholeexome capture and library construction were carried out by an Agilent SureSelect Human All Exon V6 kit (Agilent) according to the manufacturer's instructions. Whole-genome libraries were constructed by the QIAseq FX Single Cell DNA Library kit (Qiagen).

Whole-genome bisul te sequencing (WGBS) libraries
Tissue samples from LCM were digested for up to 4 hours with 1 mg/ml proteinase K according to the EZ DNA Methylation-Direct Kit (Zymo Research). After thorough digestion, LCM cells were immediately sent to the Pico Methyl-Seq Library Prep Kit (Zymo Research) to accommodate ultralow DNA input according to the manufacturer's instructions with minor adjustment: bisul te conversion time was extended to 90 minutes to ensure complete conversion; for the sample M1BULK, the library only needed 6 cycles in Section 4, while other libraries needed 11 cycles.

Sequencing
All libraries were sequenced with the Illumina Xten platform to generate 150 bp paired-end reads.
Whole-genome sequencing (WGS) and whole-exome sequencing (WES) data analysis Data quality and ltering Raw reads were ltered with Trimmomatic to remove adaptor sequences and low-quality bases 39 . The remaining reads were aligned to the human genome (hg38) by the BWA algorithm 40 . After mapping, the duplicate reads were removed. Germline variants were called using GATK 41 .

Somatic mutations
Taking bulk immune in ltrating cells as a control, somatic mutations were identi ed from BAM les using muTect and varScan 42,43 . High-quality somatic single nucleotide variants (SNVs) were obtained by applying the following lters: 1) Removal of potential germline variants that are recorded in dbSNP or variants with allele frequency larger than 1% in population 44,45 ; 2) Removal of variants located within 10 bp distance, which are more possible to be errors introduced during library preparation; 3) Removal of mutations that are identi ed by only one algorithm; 4) Depth of the mutation sites should be larger than 10 in sample and control sample; 5) There are at least 2 reads supporting the alternative allele and the allele frequency is larger than 5% in sample , and no reads have the alternative allele in control sample; 6) Since there are more samples in P2 than in P7, only mutations that occur in more than one of P2 samples are analyzed to improve quality. This lter is not suitable for P7.
Functional consequences of somatic SNVs were annotated by ANNOVAR 46 . Somatic SNVs of all samples were converted to a 0-1 matrix ( ), in which 1 indicates having a mutation. To reduce the effects of allele dropout, when site had low coverage (≤ 5×) in the sample, was adjusted based only on other samples of the same pathology group. if more samples carry the mutation or if more samples are wild type. Euclidean distances between paired samples were calculated from and hierarchical clustering was performed to cluster samples.

Copy number alterations
Copy number alterations (CNAs) were identi ed from WES data by CNVkit 47 . Samples covering < 50% target regions were removed from the CNA analysis. Deletions were ignored to avoid the effects of uncovered regions. Segments and genes with log 2 (CN/2) ≥ 1 were regarded as ampli cations. To decrease false positives of ampli ed genes, we only analyzed genes that were ampli ed in ≥ 5% TCGA liver cancer patients (https://www.cancer.gov/tcga) and genes with three more ampli cations in tumor samples than in paratumor samples. Euclidean distances were calculated from ampli ed genes, and hierarchical clustering was used to plot the clustering tree.

WGBS data analysis
Data quality and ltering Raw reads were ltered with Trimmomatic to remove adaptor sequences and low-quality bases 39 . The remaining reads were aligned to the human genome (hg38) by Bismark 48 . Potential PCR duplicates were removed. The bisul te conversion rate was estimated as the ratio between the number of methylated non-CpGs and total non-CpG sites.

Data processing
The R package methylKit was used to generate and analyze the methylation matrix 49 . Bases whose coverage was larger than 500× or lower than 3× were discarded. The genome was split into 300-bp windows with 300-bp step size for fresh-frozen samples. For FFPE samples, the genome was split into 1 Mb windows with a 1 Mb step size. Windows that were covered by less than 3 bases were discarded.
Similarity between paired samples was measured by Pearson correlation coe cients. Hierarchical clustering was used to explore the clusters of samples. For differential methylation analysis between two groups, the chi-square test was used to compare windows covered by at least two samples in each group.
Differentially methylated regions (DMRs) were windows that satis ed Q-value < 0.05 and absolute difference > 0.25. DMRs were annotated with genomic contexts and 15 chromatin states of embryonic stem cell line H9 (downloaded from Roadmap Epigenomics Project) 25 . Annotation enrichment analysis was performed by Fisher's exact test. Annotations with Q-values < 0.01 were regarded as signi cantly enriched.

Gene Ontology (GO) enrichment analysis
The Metascape web-based tool 50 was used for GO analysis of the genes of interest. The Metascape analysis was performed using the default settings. A value of P < 0.05 was considered statistically signi cant.

DNA modi cation analysis
For statistical analysis of 5mC, 5hmC and 5fC staining, we set a standard assessment rule: within one patient, the strongest stained areas were marked with 3 scores. Speci cally, strong positive, score = 3; medium positive, score = 2; weak positive, score = 1; no signal, score = 0. Five randomly selected equal elds (400×) of IHC images of each cell type were analyzed.

Spatial transcriptome analysis ST experiments
Spatial transcriptomics (ST) experiments were performed according to the user guide of Visium Spatial Gene Expression Reagent Kits (10x Genomics). Brie y, HCC tissues were gently washed with cold PBS and lled with OCT in a proper mold and then snap frozen in chilled isopentane. Cryosections were mounted onto a spatially barcoded array of ST with 10-μm thickness, and serial adjacent cyosections were mounted onto regular glass slides for IHC staining of EpCAM or other markers with 8-μm thickness. For processing, the tissue was xed for 30 min with prechilled methanol at -20 °C, followed by H&E staining. Slides were nally taken on a Leica SCN400 F whole-slide scanner at 40× resolution. After capturing ideal tissue morphology information and ensuring that RNA was not degraded (RIN ≥ 7), tissue permeabilization and reverse transcription were immediately conducted by a Visium Spatial Tissue Optimization Kit (10x Genomics). Finally, the ST library was prepared with second strand synthesis and denaturation and sequenced by NovaSeq 6000 (Illumina). Each of the spots printed onto the array is 50 μm in diameter and 100 μm from center to center, covering an area of 6.5× 6.5 mm 2 (ref. 51).

Data processing and cluster analysis
Raw sequenced ST reads were processed using Space Ranger analysis software (version 1.0.0, 10× Genomics) mapped to the GRCH38 genome assembly following the standardized analysis rules. UMI counts in each spot were normalized and scaled by the median number transcript count across all spots.
Cluster analysis was based on the K-Means algorithm in Space Ranger software.

Data and materials availability
The raw sequencing data were deposited in the National Omics Data Encyclopedia under accession number OEP000991 (https://www.biosino.org/node/project/detail/OEP000991).  Table 2). Each dot represents the mean value of ve randomly selected equal elds of IHC images from one patient. NCT: non-CSC tumor cells. Data are presented as the mean ± SD. ***, P < 0.001 by Wilcoxon matched-pairs signed rank test. d, Locations and names of the samples of patient P2 used for WES and WGBS analysis. The samples were collected from two frozen sections that were stained with anti-EpCAM and had a spatial distance greater than 500 μm from each other. The dotted lines indicate the boundaries between the tumor and the paratumor.   repressed states, as suggested in reference 25. The red color represents signi cant enrichments of hypermethylated or hypomethylated DMRs in the latter cell type (Q-value < 0.01). b, DMR frequency between different cell types in different chromatin states. Both hypomethylation and hypermethylation DMR frequencies between the two indicated cell types in the chromatin states 10_TssBiv, 11_BivFlnk, 12_EnhBiv and 13_ReprPC are shown. DMR frequency = DMR tiles in the chromatin state/all common identi ed tiles between the two cell types × 100%. The symbol * represents signi cant DMR enrichment of hypomethylation or hypermethylation in the latter cell type, as shown in a. c, The functions of genes corresponding to the hypermethylated DMRs between PDR and LCSCs in the chromatin states 10_TssBiv and 13_ReprPC.