scONE-seq: A one-tube single-cell multi-omics method enables simultaneous dissection of molecular phenotype and genotype heterogeneity from frozen tumors

Genomic and transcriptomic heterogeneity both play important roles in normal cellular function as well as in disease development. To be able to characterize these different forms of cellular heterogeneity in diverse sample types, we developed scONE-seq, which enables simultaneous transcriptome and genome proling in a one-tube reaction. Previous single-cell-whole-genome-RNA-sequencing (scWGS-RNA-seq) methods require physical separation of DNA and RNA, often by physical separation of the nucleus from the cytoplasm. These methods are labor-intensive and technically demanding, time-consuming, or require special devices, and they are not applicable to frozen samples that cannot generate intact single-cell suspensions. scONE-seq is a one-tube reaction, thus is highly scalable and is the rst scWGS-RNA-seq method compatible with frozen biobanked tissue. We benchmarked scONE-seq against existing methods using cell lines and lymphocytes from a healthy donor, and we applied it to a 2-year-frozen astrocytoma sample proling over 1,200 nuclei, subsequently identifying a unique transcriptionally normal-like tumor clone. scONE-seq makes it possible to perform large-scale single-cell multi-omics interrogation with ease on the vast quantities of biobanked tissue, which could transform the scale of future multi-omics single-cell cancer proling studies. amplication process, while also incorporating UMIs 31–33 . Thus, DNA and RNA reads can be distinguished by their barcode information after sequencing. scONE-seq has several advantages: it has a simplied library construction workow; it is compatible with standard biology workows such as uorescence-activated cell sorting (FACS); being a one-pot reaction, its throughput can be easily scaled up using liquid-handling robots; most importantly, scONE-seq does not require physically separating the DNA and RNA, and is therefore applicable to a variety of sample types including single nuclei. This means that frozen clinical samples and tissue types like liver, bone, and brain, that are dicult to dissociate into single-cell suspensions, which are intractable with other scWGS-RNA-seq methods, can be proled using scONE-seq. We demonstrate that the technical performance of scONE-seq is comparable to existing single-cell methods in a benchmarking experiment using various sample types, including several cell lines and lymphocytes from the blood of a healthy donor. We then show how scONE-seq can be used to discover novel disease-related information by applying it to a frozen IDH1-mutant astrocytoma sample. Using scONE-seq data, we were able to simultaneously characterize both the molecular phenotype and genotype heterogeneity of this patient sample, proling the cell types in the tumor as well as the clonal events. We were able to fully dissect the genome level and transcriptomic level heterogeneity of this patient’s disease, and in doing so we identied a unique normal-like tumor clone by integration of clonal and transcriptomic information. Further analysis of RNA and DNA proles of this clone suggests that this subpopulation may play a role in the maintenance of the tumor, as well as in regulating the tumor’s interactions with normal cells in its microenvironment. The abnormal genomic prole of this small subpopulation along with its normal-like transcriptomic phenotype hints at its origins during the tumor initiation process and demonstrates the power of scONE-seq to distinguish genomic variations of near-normal cells from true normal cells. Using this frozen primary glioma sample as an example, we establish a framework for integrative single-cell multi-omic WGS/RNA-seq analysis on biobanked samples to generate new insights on tumor heterogeneity, enabled by the versatility and scalability of the scONE-seq approach. and sample applicability. Most importantly, we designed this technology to be compatible with frozen tissue samples that have been stored for years. This feature makes it easier to plan and perform larger-scale clinical multi-omic single-cell studies in two ways: First, by enabling studies on existing biobanked samples, which we have demonstrated herein; Second, for studies on new samples, it also removes the burden of having to immediately process tissues from clinical researchers whose priority is patient care. The application of scONE-seq on frozen tumor tissue in our study demonstrates a comprehensive investigation of the genotypic and molecular phenotypic heterogeneity of astrocytoma. By superimposing the multi-omic data, we directly observed and characterized the differentiated tumor clones, which supports the idea that tumor clones can produce a differentiation hierarchy 7,66,67 . In addition, we found a unique tumor clone in this patient’s 2R astrocytoma with a very similar transcriptomic phenotype as normal astrocytes. Putative clone 1 cells expressing the same gene markers as clone 1 were found using both independent 10X Genomics snRNA-seq as well as immunostaining on tissue sections. Incorporating the histological and transcriptomic information of clone 1 cells, we propose some potential roles of these cells in the TME, including cell-cell communication via both signaling molecules and potential tumor-neuron synapse formation. Specically, we found clone 1 to be the only tumor subpopulation that expresses GRIA1, an AMPA receptor encoding gene, while all the other tumor clones only express other GRIA genes. GRIA1-encoded GluA1 subunit often forms GluA1 homomer AMPARs, which are calcium-permeable and broadly found in synapses in early development 65 . Calcium-permeable AMPARs are a key signaling molecule in the tumor-neuron synapse, and the maintenance of long-term potentiation is also known to be modulated through post-translational modication of GluA1, making GluA1 critical for neural plasticity in the brain 65,68 . The unique, differential expression of GRIA1 in clone 1 hints at a unique, possibly multi-faceted role for clone 1 in cell-cell communication in the tumor microenvironment 7,62,63 . We also found TGFβ signaling expression to be prominently expressed in normal astrocytes, clone 1 cells, and TAMs. TGFβ signaling in glioblastomas can promote invasiveness, angiogenesis, and plays a role in the suppression of the immune system; it is a therapeutic target for the treatment of gliomas. The expression of TGFβ pathway-associated genes by clone 1 cells is comparable to that of the normal astrocytes within the tumor, raising intriguing questions about the role of clone 1 in tumor immune modulation, and its possible interactions with TAMs. These features of the tumor microenvironment and whether clone 1 cells are a commonly occurring cell-type in gliomas or specically astrocytomas will be an interesting avenue of future investigation on a larger sample set. Collectively from these results, we posit that scRNA-seq-only based cancer studies could underestimate important layers of tumor heterogeneity, and that simultaneous direct DNA measurement could contribute meaningful and informative insight on tumor evolution. Meanwhile, the clonal analysis based on scWGS-only data also ignores the complex interactions within a tumor microenvironment. By deciphering the genetic and phenotypic heterogeneity within the tumor ecosystem with scONE-seq, we can reveal the interplays of clonal expansion, tumor cells differentiation hierarchy, and TME. We anticipate that future large-scale studies on more samples will be even more informative, and has the potential to inform our knowledge of cancer and therapeutics in a more generalizable manner.


Introduction
Single-cell genomics has become a mainstay technology used to dissect multicellular organisms and tissues that are composed of cells with diverse functions [1][2][3][4] . The power of this approach has been demonstrated in several cell atlas studies: novel cell types have been discovered that further led to the elucidation of new mechanisms; complex cellular interactions and transitions associated with disease initiation or progression have been revealed; crossspecies analyses have shed light on evolutionary processes [5][6][7][8] . The use of single-cell technology in studying cancer is especially important. Regulatory mechanisms underlying drug resistance or immune evasion are elusive and complex, and tumor cell heterogeneity is a major contributing factor to this complexity, making it particularly challenging to dissect these mechanisms with bulk techniques [9][10][11] . Single-cell technologies have greatly enhanced our understanding of tumor heterogeneity and accelerated mechanistic discovery. At the phenotype level, single-cell RNA-seq (scRNA-seq) has been used to uncover drug-resistant melanoma subpopulations and to characterize cancer stem cell subpopulations in glioblastoma 7,[12][13][14][15] . scRNA-seq has also enabled a more comprehensive phenotypic understanding of the tumor microenvironment (TME) in many cancers including glioma and colorectal cancer [16][17][18][19][20] . At the genotype level, genomic instability contributes to cancer initiation, progression, relapse, and metastasis 9 . With single-cell whole genome sequencing (scWGS), the clonal structure of the tumor can be resolved, and evolutionary analysis based on copy number aberrations (CNAs) can reveal tumor progression 21,22 .
Evidently, both the genomic and transcriptomic heterogeneity of tumors contributes to the disease, and understanding the importance of both in cancer studies is crucial. Several single-cell methods that interrogate DNA and RNA simultaneously in the same cell (scWGS-RNA-seq) have been developed [23][24][25][26][27][28] .
However, these rst-generation scWGS-RNA-seq methods have not been widely applied since their invention, largely because they require physical separation of DNA and RNA, often by physical separation of the nucleus from the cytoplasm. These separation techniques are labor-intensive and technically demanding, time-consuming, requiring well-trained experimental techniques, or require special micro uidic devices 29,30 . Furthermore, they are not applicable to frozen samples, where it is impossible to obtain intact single-cell suspensions. As such, existing scWGS-RNA-seq methods cannot be applied to the vast majority of primary biobanked tumor samples, which represents most of the readily available clinical samples. All in all, these limitations make rst-generation scWGS-To achieve single-cell genome and transcriptome co-pro ling, we devised a work ow to amplify RNA and DNA simultaneously (Fig. 1a). Brie y, to perform scONE-seq, after the sample dissociation, cells or nuclei are sorted into PCR plates containing lysis buffer with a ow cytometer. Plates of sorted single cells can then be processed immediately, or store at -80 for months before processing. To start the single-cell ampli cation, we rst use Tn5 with a custom adaptor to fragment and label the genome or any other DNA within the cells [34][35][36] . In this step, the ampli cation adaptor, which includes a 6-nucleotide "DNA barcode" and 6-nucleotide UMI, is added to fragmented DNA (fDNA). Subsequently, we use reverse transcription (RT) to generate cDNA, where the RT primer is comprised of: a priming sequence adapted from that of the MATQ-seq protocol 37 ; a 6-nucleotide "RNA barcode"; and a 5-nucleotide UMI. The RT priming sequence is a modi ed random oligo and primes to the internal regions of RNA transcripts, thereby enabling detection of full-length transcripts, including nonpolyadenylated (non-polyA) RNAs. cDNA 3' adaptor is then added through subsequent poly-C tailing and degenerate PCR 38,39 . Once DNA-speci c and RNAspeci c barcodes have been added, fDNA and cDNA are ampli ed simultaneously, and the sequencing library is constructed with the pre-ampli ed products (Methods).
First, to characterize the transcriptome generated by scONE-seq, we benchmarked it against Smart-seq2 (SS2) 40,41 using a variety of test samples: extracted RNA-free E.coli genomes (mock DNA), extracted DNA-free human total RNA (mock RNA), as well as a mixture of the two (i.e. E. coli DNA mixed with human total RNA); and cultured HCT116 single cells. We evaluated the sensitivity by assessing the number of genes detected in each of the benchmark mock and HCT116 samples and found that scONE-seq detected more genes per cell than SS2 ( Fig. 1b and Supplementary Fig. 1a, P < 2×10 -16 , t-test). This is likely due to scONE-seq being able to capture total RNA 37,42 while SS2 only targets polyA RNA, and therefore capturing a more diverse set of molecules at any given sequencing depth ( Supplementary Fig. 1d). Also, scONE-seq enables full-length transcript pro ling and achieves gene body coverage uniformity comparable to SS2 (Fig. 1c). We then used sample-to-sample correlation analysis as well as the detection of ERCC Spike-In to estimate the accuracy of scONE-seq in comparison to SS2. The sample-to-sample correlation analysis shows the two methods are comparable (Fig. 1d, Supplementary Fig. 1c). Using ERCC Spike-In as a measure of accuracy, SS2 performs slightly better with a correlation 0.91 (R value) against the expected ERCC concentrations, while scONE-seq is 0.86; both are su ciently high for quantitative measurement of transcript abundance from single cells ( Supplementary Fig. 1b).
Next, we sought to validate the whole genome sequencing (WGS) capability of scONE-seq. Lorenz curves 43 compare the coverage uniformity for each method, showing a good performance by scONE-seq (Fig. 1e). Then, we used bulk HCT116 WGS data in comparison to scWGS data generated by scONE-seq to con rm that CNAs captured by scONE-seq are consistent with those de ned by bulk (5×10 6 cells) and pseudo bulk (86 cells) (Fig. 1f, g, h). In addition, we performed UMI deduplication on the scONE-seq DNA dataset, since our method adds UMI to the DNA fragments during the tagmentation step and found that deduplication successfully reduces the bias introduced during single-cell DNA ampli cation (Methods). To con rm this effect, we calculated the coe cient of variation (CV) for the raw genomic bin counts (genome bin size of 500kb), and for the UMI deduplicated data, and indeed the CV was signi cantly reduced upon UMI deduplication ( Supplementary Fig. 1f, P = 1.2×10 -7 , t-test). These analyses show that scONE-seq captures true CNAs from single cells at 500-kb resolution.
Summarily, the analysis of scRNA-seq and scWGS data generated using benchmark samples shows that scONE-seq can pro le genome and transcriptome data from the same single cell without compromising data quality as compared to existing standard methods. scONE-seq data correctly assigns cell types from primary donor samples After thoroughly assessing the technical performance of scONE-seq, we next applied it to known biologically heterogeneous samples to evaluate whether it can accurately identify cellular subtypes within a mixed population. To do so, we performed scONE-seq on four different cell lines, as well as on a primary peripheral blood mononuclear cell (PMBC) sample from a healthy donor.
First, we analyzed the cell-line dataset containing 86 HCT116 cells, 143 NPC43 cells, 37 HUVEC cells, and 17 H9 cells to check for accurate cell-type assignment. With unsupervised graph-based clustering of the RNA expression data, cells from the same cell lines successfully clustered together (Fig. 2a). We also checked the gene markers for each cell line (Fig. 2b) and notably, several well-studied gene markers for these cell lines are found in the scONE-seq dataset ( Fig. 2b, Supplementary Fig. 2a).
Next, we used lymphocytes from PBMC to test scONE-seq cell-types clustering accuracy in primary samples. We prepared sequencing libraries with scONE-seq and SS2 from the same PBMC sample for comparison. After quality control ltering to remove low-quality cells, we collected 200 cells for scONE-seq and 194 cells for SS2. With unsupervised graph-based clustering, we found no difference in the cell-type composition between the two methods ( Fig. 2c,e, P = 0.5109, Chi-squared test). After clustering, we annotated the cell types using known lymphocyte markers [44][45][46] (Fig. 2d,f, and Supplementary Fig. 2b): B cells distinguished by CD19 and MS4A1 (CD20); T cells characterized by CD3E; Less differentiated T cells (Naïve and memory T cells) identi ed by SELL (CD62L), CCR7, and LEF1; CD4+ T cells characterized by CD4; CD8+ T cells characterized by CD8A and CD8B; and cytotoxic T cells distinguished by PRF1 and NKG7. Notably, cytotoxic T cells comprise both Gamma delta T cells (γδ T cell; expressing TRDC, TRGC1, and TRGC2; Supplementary Fig. 2c) as well as effector memory T cells (TEM; lacking CCR7 expression and positively expressing IL2RB) ( Supplementary Fig. 2b). We surmised that these are TEM rather than effector T cells since the sample is from a healthy donor. Beyond that, in the scONE-seq dataset, we also captured some regulatory T cells (Treg cell; FOXP3+, CCR4+) ( Supplementary Fig. 2d) and detected several non-polyA genes including PZP and SESN3 whose expressions in T-cells have previously been described 47 (Supplementary Fig. 2e); these features were not found in the SS2 dataset.
These results collectively demonstrate that scONE-seq RNA data can accurately capture the biological variation within a heterogeneous sample. scONE-seq data identi es distinct clones in different samples The analysis above shows the feasibility of scONE-seq for the cell-type assignment using RNA data. Next, we evaluated the performance of clone identi cation with scONE-seq WGS data. Here, we utilized scONE-seq WGS data that was obtained simultaneously from the cell lines used in the previous celltype assignments analysis, and delineated the CNAs clonal structure of all four cell lines, followed by hierarchical clustering with their integer copy number pro les (Fig. 2g). From this analysis, we see that HCT116 maintains a relatively homogeneous clonal composition, whereas NPC43, a primary patient-derived cell line that shows strong genome instability, is comprised of 3 main clones (Fig. 2g). Furthermore, the CNAs structure of these 3 clones differ substantially compared to when the cell line was rst established 48 (Fig. 2h, Supplementary Fig. 3a, b, c, d), especially in chromosomes 1, 3, 4, 6, 7, and 11. Correspondingly, distinctions between clones are mainly found in chromosomes 1, 3, 7, and 11 ( Supplementary Fig.3a, b, c, d). Based on this observation, the change in chromosome copy numbers during cell culturing of primary cell lines could be a common phenomenon in cell lines with abundant CNAs and unstable genomes. Studies have shown extensive genetic variation across different cell culture lines, and that single cells from some cell lines can give rise to populations with multiple clones due to genome instability 21,49 . Additionally, with the matched transcriptomes of every single cell and their corresponding copy number states, we mapped the clonal information to the transcriptome UMAP for NPC43 and found that the CNAs in NPC43 did not impact the transcriptome state dramatically ( Supplementary Fig. 3e). This demonstrates that scONE-seq can identify both phenotype and genotype states for each individual cell.
Dissecting the clonal structure and cell-type subpopulations of an IDH1-mutant astrocytoma Gliomas, especially high grade ones, are some of the most aggressive malignant tumors originating in the brain 50,51 . When studying gliomas or other brain tissues using single-cell technology, it is challenging to obtain intact dissociated whole single cells, especially cells with complex morphology, and could lead to biases in cell-type sampling 52 . As such, for brain single-cell pro ling, single nucleus isolation is more widely used. To pro le both the genotypic and phenotypic heterogeneity in a biobanked sample, we apply scONE-seq on single nuclei isolated from a snap-frozen astrocytoma specimen that has been stored since the surgery for two years prior to analysis: a second recurrent (2R) astrocytoma sample with IDH1 (R132H), TP53 (P278S), and ATRX (R781*) mutations ( Supplementary Fig. 4a). The primary (P) and rst recurrent (1R) samples were limited in quantity, and subject to whole exome sequencing (WES) and RNA sequencing in bulk 53 (Fig. 3a). In total, we used scONE-seq to pro le over 1,200 nuclei, including 1,210 scRNA datasets, 1,089 scWGS datasets, generating 908 passed-QC paired DNA and RNA datasets.
First, we delineated the clonal structure of this 2R astrocytoma sample. Using dimension reduction on scWGS data, we clustered cells into four distinct genomic states (Fig. 3b), consisting of one cluster of normal cells and three tumor clones ( Supplementary Fig. 5a). Whole genome duplication was also found in this tumor and validated by measuring each cell's DAPI intensity using ow cytometry ( Supplementary Fig. 5b). We were also able to identify aneuploidy and loss of heterozygosity in multiple loci using the B-allele frequency estimation ( Supplementary Fig. 5c, d), and the integer copy number of cells was subsequently calculated (Fig. 3c). Based on the genomic pro le of each clone and the WES data from the primary and recurrent tumors, the phylogenetic tree of the patient was constructed. The 2R clone 1 was found to be closer to the root (normal cell), with fewer loss of heterozygosity (LOH) events, and it has very similar genome alterations as the primary tumor WES data (Fig. 3d, e; Supplementary Fig. 5e). The 2R clone 2 and clone 3 harbor many of the same deletion regions as the 1R tumor, resulting in LOH ( Fig. 3e; Supplementary Fig. 5e). We surveyed the known driver genes and found BRAF, MET, and MYC to be ampli ed in all the 2R clones [53][54][55] (Fig. 3b). Several key deletion events were found to only occur in the 2R clone 2 and clone 3, including deletion of CDKN2A, and PTEN; in particular, the homozygous deletion of CDKN2A is a known prognostic factor for IDH mutant astrocytomas, and was recently incorporated into the World Health Organization central nervous system tumors classi cation scheme as a su cient criterion for classifying cases as Grade 4, even when classical histology criteria are not ful lled 56 (Supplementary Fig. 5f).
Next, we analyzed the RNA data from this dataset. First, we performed unsupervised graph-based clustering on scONE-seq RNA data, obtaining multiple cell clusters that were then annotated based on their RNA markers. We found this tumor contains tumor-associated macrophages (TAM), neurons, astrocytes, oligodendrocytes, and tumor cells de ned by their canonical cell type gene signatures (Fig. 4a, b). The complex tumor microenvironment (TME) indicates the highly in ltrated phenotype of this recurrence tumor. The tumor cells -those that have transcriptomes classi ed as tumor-like -display high EGFR expression, a well-known feature of high grade gliomas. These tumor cells can further be subset into 4 cellular states based on meta-module scores described by Neftel et al. 13 (Fig. 4a): oligodendrocyte progenitor cell-like (OPC-like), neural progenitor cell-like (NPC-like), mesenchymal-like (MES-like), and astrocyte-like (AC-like) cellular states.
In addition to the phylogenetic tree obtained from DNA data, which dissects the clonality, we are also able to use paired RNA data to superimpose the cell-type information onto the clonal information to identify clonal subpopulations with unique functional, phenotypic features. To do so, we mapped the clonal information to the RNA UMAP to visualize the clonal distribution among different cell types (Fig. 4c). The 2R clone 3 is the major clone of this tumor and is differentiated into all 4 tumor phenotypes: OPC-like, NPC-like, MES-like, and AC-like cellular states. The 2R clone 2 consists predominantly of AC-like cells. The 2R clone 1 is the most interesting: using RNA data alone, all cells from this clone were clustered with normal astrocytes, indicating transcriptome similarity between 2R clone 1 and normal astrocytes that is indistinguishable using scRNA-seq data alone; but upon superimposing matched genotype and phenotype information, this unique population of astrocyte-like tumor cells with clearly abnormal genotype as compared to the true normal cells are revealed (Fig. 4c). This highlights the importance of directly measuring copy number pro les as a standard to identify cancer cells and is particularly important in the study of tumor initiation for capturing the normal-to-cancer transition.
Next, we sought to validate the presence of this clone within the tumor and to investigate its potential role in the tumor microenvironment.
Characterization of a unique tumor clone with normal astrocyte-like phenotype To verify the existence of clone 1 cells, we rst identi ed gene markers unique to clone 1, including XIST, RFX3, ADCY8, and GRIA1, which can distinguish them from other tumor and normal cells (Fig. 4d, Supplementary Fig. 6a). These markers were also found in the droplet-based snRNA-seq dataset to label a putative clone 1 population, also adjacent to normal astrocytes ( Supplementary Fig. 6b, c, d). As validation, we integrated the scONE-seq RNA dataset and 10X dataset to show that our scONE-seq dataset of 1210 nuclei exhaustively captured all cell types that were observed in droplet snRNA-seq of 4416 nuclei and showed that clone 1 merged with the putative clone 1 cells from the droplet-based snRNA-seq dataset (Fig. 4e).
Then, we performed histological analyses on FFPE sections from both the primary and 2R tumors of this patient to identify the cells with IDH1 (R132H) and ADCY8. Anti-IDH1 (R132H) is expected to label all tumor cells, as this is known to be a somatic mutation in IDH-mutant glioma cells 55,57,58 ( Supplementary   Fig. 4a), and anti-ADCY8 is expected to mark some normal neurons and normal astrocytes in addition to clone 1 cells (Fig. 4d, Supplementary Fig. 6e). As such, putative clone 1 cells are those cells marked by double-positive staining of IDH1 (R132H) and ADCY8. First, we looked at the overall staining pattern across the whole slide section and noted that the IDH1 (R132H) positive tumor cells are distributed over the entire section for both the primary and 2R tumors.
The ADCY8 signals appear stronger in 2R tumor sections and are speci cally concentrated to certain regions which also express IDH (R132H) more strongly (Supplementary Fig. 7a). Interestingly, these ADCY8 positive regions are always near the IDH1 (R132H) negative 'normal adjacent' regions ( Supplementary Fig.   7a). The double-positive cells that we suspect to be the putative clone 1 cells appear to be near other normal and malignant cells (Fig. 5a). These histological immune staining results provided additional details on the spatial distribution of putative clone 1 cells in the tumor sections.  64,65 . Interestingly, we found the GRIA1 genes to be differentially expressed between the different tumor clones in our sample (Fig.  5b). The major clone, clone 3, expressed GRIA2-4 and does not express GRIA1; clone 1, however, is the only tumor subpopulation that expresses GRIA1 and all three other GRIA family genes are expressed at much lower levels. The gene expression feature of clone 1 generally resembles astrocytes, including the expression of the astrocytic marker APOE (Supplementary Fig. 7b), but normal astrocytes do not express GRIA1 (Fig. 5b, Supplementary Fig. 7b). Next, we also performed ligand-receptor analysis for the different subpopulations and found TGFβ signaling transcripts to be strongly and speci cally expressed in normal astrocytes, clone 1 cells, and TAMs ( Supplementary Fig. 7c), with clone 1 cells expressing the ligand and predominantly the TAMs expressing the receptor (Fig.   5c). These observations hint at the potential role of clone 1 cells within the tumor, and future studies expanded to additional patient samples will be necessary to determine the generalizability of these phenomena.

Discussion
We have developed the versatile scONE-seq method, which is comparable with previous methods in multiple aspects including sensitivity and accuracy, and surpasses previous methods on throughput, versatility, and sample applicability. Most importantly, we designed this technology to be compatible with frozen tissue samples that have been stored for years. This feature makes it easier to plan and perform larger-scale clinical multi-omic single-cell studies in two ways: First, by enabling studies on existing biobanked samples, which we have demonstrated herein; Second, for studies on new samples, it also removes the burden of having to immediately process tissues from clinical researchers whose priority is patient care. The application of scONE-seq on frozen tumor tissue in our study demonstrates a comprehensive investigation of the genotypic and molecular phenotypic heterogeneity of astrocytoma. By superimposing the multi-omic data, we directly observed and characterized the differentiated tumor clones, which supports the idea that tumor clones can produce a differentiation hierarchy 7,66,67 . In addition, we found a unique tumor clone in this patient's 2R astrocytoma with a very similar transcriptomic phenotype as normal astrocytes. Putative clone 1 cells expressing the same gene markers as clone 1 were found using both independent 10X Genomics snRNA-seq as well as immunostaining on tissue sections. Incorporating the histological and transcriptomic information of clone 1 cells, we propose some potential roles of these cells in the TME, including cell-cell communication via both signaling molecules and potential tumor-neuron synapse formation. Speci cally, we found clone 1 to be the only tumor subpopulation that expresses GRIA1, an AMPA receptor encoding gene, while all the other tumor clones only express other GRIA genes. GRIA1-encoded GluA1 subunit often forms GluA1 homomer AMPARs, which are calcium-permeable and broadly found in synapses in early development 65 . Calcium-permeable AMPARs are a key signaling molecule in the tumor-neuron synapse, and the maintenance of long-term potentiation is also known to be modulated through post-translational modi cation of GluA1, making GluA1 critical for neural plasticity in the brain 65,68 . The unique, differential expression of GRIA1 in clone 1 hints at a unique, possibly multi-faceted role for clone 1 in cell-cell communication in the tumor microenvironment 7,62,63 . We also found TGFβ signaling expression to be prominently expressed in normal astrocytes, clone 1 cells, and TAMs. TGFβ signaling in glioblastomas can promote invasiveness, angiogenesis, and plays a role in the suppression of the immune system; it is a therapeutic target for the treatment of gliomas. The expression of TGFβ pathway-associated genes by clone 1 cells is comparable to that of the normal astrocytes within the tumor, raising intriguing questions about the role of clone 1 in tumor immune modulation, and its possible interactions with TAMs. These features of the tumor microenvironment and whether clone 1 cells are a commonly occurring cell-type in gliomas or speci cally astrocytomas will be an interesting avenue of future investigation on a larger sample set. Collectively from these results, we posit that scRNA-seq-only based cancer studies could underestimate important layers of tumor heterogeneity, and that simultaneous direct DNA measurement could contribute meaningful and informative insight on tumor evolution. Meanwhile, the clonal analysis based on scWGS-only data also ignores the complex interactions within a tumor microenvironment. By deciphering the genetic and phenotypic heterogeneity within the tumor ecosystem with scONE-seq, we can reveal the interplays of clonal expansion, tumor cells differentiation hierarchy, and TME. We anticipate that future large-scale studies on more samples will be even more informative, and has the potential to inform our knowledge of cancer and therapeutics in a more generalizable manner.
Initially, scONE-seq suffered cross-contamination from DNA to RNA due to mislabeling of barcodes; this problem was resolved by optimizing the reverse transcription temperature and ampli cation extension time (Supplementary Table 1). Further, we checked the mapping region of the RNA data from single cells with DNA and RNA co-pro les and single cells with only RNA pro les. The exon mapping proportion difference is inconsequential (Supplementary Fig.   1e), suggesting the mislabeled molecules are a negligible fraction of the total sequenced reads. Further benchmarking and validation experiments with different sample types demonstrated that scONE-seq data correctly identi es cell types and clones simultaneously.
Bene tting from its one-tube reaction system, scONE-seq possesses very high scalability, and a higher throughput version can be achieved by simply adapting this method to a robotic liquid handler. Alternatively, producing scONE-seq and droplet-based single-cell data in parallel and then integrating them, is also a useful complementary, multi-omics approach to study cancer with high throughput. Moreover, additional processing can be easily added to the scONE-seq work ow to enable pro ling of more layers of information: To detect chromatin accessibility simultaneously, an additional nuclei tagmentation step [69][70][71] with customized ATAC adaptors could be added before FACS sorting; Similarly, quantitative protein estimation 72 could be achieved by using DNA-barcoded antibodies before single-cell sorting steps of the scONE-seq; and by jointly performing whole-exome capturing or any hybridized target sequencing panels with the standard scONE-seq library, paired high-depth single-cell somatic mutation information could also be integrated into the scONE-seq dataset. In general, scONE-seq has very strong scalability to measure different signals from a cell. We expect that scONE-seq will be a powerful tool to dissect cellular heterogeneity and will inspire other ultra-high-throughput single-cell multi-omics methods development.
Fresh whole blood was taken in the HKUST clinic from a healthy human donor. Lymphocytes were isolated via Ficoll-Paque PLUS (GE Healthcare) density centrifugation. The red blood cells were removed with 1X Red Blood Cell lysis buffer (Thermo Fisher).
Cells or nuclei were then loaded to Aria III ow cytometer (BD Biosciences) to sort single cells into PCR tubes (96 or 384 PCR plates) containing lysis buffer.
Generation of scONE-seq libraries. To start the scONE-seq pre-ampli cation, the proteinase K (Sigma) was used to completely lyse cells or nuclei. Tagmentation reaction was performed to fragment the genome DNA and add the DNA-speci c barcode. This reaction includes the following components, 6 mM MgCl 2 , 0.5 mM dNTP (NEB), 8.5 mM TAPs-NaOH, 1.5 U/µl RNase Inhibitor, 0.05 U KAPA polymerase (Roche), 8 % PEG8000, and One-Tn5 35,36 with customed adaptor (GTCTCGTGGGCTCGGTCATGN 5 AGATGTGTATAAGAGA CAG) (Novoprotein). The reaction was incubated at 55°C for 10 mins followed by 72 °C for 10 mins. Then, proteinase K or thermolabile proteinase K (NEB) was used to deactivate the enzyme in the buffer. Thereafter, we performed reverse Then, the terminal transferase (NEB) was used to add the C-tail to cDNA fragments. This reaction was performed at 37°C for 5 mins and the enzyme was immediately deactivated with thermolabile proteinase K. Second strand synthesis was then performed by adding, 0. and puri ed with Ampure XP beads. scDASH protocol was then used to remove the abundant ribosome and mitochondrial sequences 75,76 . Double size selection can be performed to optimize the library size. The library was then sequenced on Illumina NextSeq500 with customized sequencing primers (Supplementary Table 2). DNA and RNA data separation. Sequencing data were rstly ltered with fastp 77 . Fastq les were then separated into DNA fastq les, RNA fastq les, and Unmatched fastq les with seqkit, seqtk, and bbduk [78][79][80] . During this process, UMI of the reads was extracted and labeled to fastq les head with fastp 77 .
If only performing the counts-based copy number variation analysis, Ginkgo was used to generate the normalized counts 43 . If performing the allele-speci c copy number variation analysis, CHISEL was used to generate allele frequency information 85,86 . The integer copy number calculation was based on the previous studies 21,22,87 . In this pipeline, the segmentation was performed with copynumber and aCGH in R 88 .
WES data from different tumor samples were analyzed with CNVkit and THetA to infer the copy number pro les [89][90][91] . The integer copy number data from WES and scONE-seq were combined to perform the phylogenetic reconstruction, adopted from a previous method 21,92,93 .
RNA data analysis. UMI-based deduplication was also performed with RNA fastq les. The work ow kept the same except replacing the BWA with STAR 94 .
Then, the fastq les can be quanti ed with Kallisto 95 (cDNA quanti cation) or Salmon 96 (pre-mature RNA quanti cation). 10x snRNA-seq data was quanti ed with kb-python 97 . The expression data were analyzed using Seurat with sctransform pipeline (normalization, dimension reduction, dataset integration, nding clusters, differential gene analysis) [98][99][100] . The astrocytoma cellular states scoring was performed following the original paper 13 . The ligand-receptor analysis was performed with CellChat 101 .
Visualization. Plots were created using the ggplot2 and ggtrees R package [102][103][104] . Heatmaps were created with the ComplexHeatmap package 105  IHC analysis. Slides were obtained from Prince of Wales Hospital. Xylene and ethanol were used to remove wax. Antigen retrieval was performed with Sodium Citrate Buffer (Thermo Fisher) at 98°C for 15 min. Blocking was performed in 10% normal serum (goat and donkey, Abcam) with 1% BSA in PBST buffer (0.05% Triton X-100). IDH1(R132H) antibody (Dianova, 1:40) and ADCY8 antibody (Abcam, 1:200) were added to slides and incubated at 4°C overnight in a humid box. Secondary antibodies (1:400,anti-mouse, anti-rabbit, Thermo Fisher) were used to provide the uorescent signal. Mounting buffer with DAPI (Abcam) was used to stain the nucleus and retain uorescence. Images were taken with Zeiss Axio Scan.Z1 Slide Scanner with 20X objective (Zeiss). Figure 1 Overview of scONE-seq: schematic and benchmark. a, Work ow of scONE-seq. DNA and RNA barcodes are added by Tn5 with custom adaptor and RT primers respectively. P7-Illumina sequencing primer is included in this pre-ampli cation process. P5-Illumina sequencing primer is added in the later library construction step with a second Tn5 tagmentation custom adaptor. b, Gene detection sensitivity for scONE-seq (n = 86, HCT116 cells) and Smart-seq2 (n = 94, HCT116 cells), downsampled to similar sequencing depth (0.2 million mapped reads) (P < 2x10-16, t-test). Shown are numbers of genes detected over 1 count. c, Gene body coverage for scONE-seq (n = 86) and Smart-seq2 (n = 94). Cell-cell variations are shown with error areas. d, Accuracy across mock samples for scONE-seq (RNA control, DNA+RNA group) and Smart-seq2. Pearson correlations were calculated from log-transformed TPM values. e, The Lorenz curve of bulk and scONE-seq single-cell data. Percentiles of the genome covered are plot against the cumulative fraction of reads. A perfect coverage uniformity results in a straight line with slope 1. f, g, h, Dots plot with normalized counts across the genome and solid line plots for corresponding estimated integer copy number. Ampli cation regions are highlighted with red; deletion regions are highlighted with light blue. Data from the bulk HCT116 whole genome sequencing (e), the HCT116 scONE-seq pseudo-bulk data (f, n = 86), and a single-cell HCT116 scONE-seq data (g). Figure 2 scONE-seq cell-types classi cation and CNAs clone identi cation. a, UMAP of scONE-seq cell lines RNA data, cells from same the cell line are clustered together. b, Differential expression genes (DEGs) heatmap; DEGs separate the cells based on their cell type; common markers for these cell lines are labeled in the heatmap. c, e, UMAP of lymphocytes RNA data from scONE-seq (c) and Smart-seq2 (e); cell-types annotations are based on known markers of immune cells. Cell-type composition shows no difference between two datasets (P = 0.5109, Chi-squared test). d, f, Dot plots of markers used for cell-types annotations with the scONE-seq dataset (d) and Smart-seq2 dataset (f). g, Copy number pro les calculated with scONE-seq cell line DNA dataset; cells are organized by hierarchical clustering (normal n = 27; HCT116 n = 48; NPC43 n = 108). h, The minimum evolution tree with diploid as root; NPC43 cells used in this study acquired more CNAs compared with the genome state when the cell line was established. Unit shows the Manhattan distance. Figure 3 scONE-seq reveals the clonal composition of the IDH1-mutant astrocytoma. a, Schematic showing the patient history for the sample used in this study. The patient has been diagnosed as IDH1-mutant (Grade IV) astrocytoma. Surgery was performed to excise the tumor, and concurrent chemoradiotherapy (CCRT) was then applied. The recurrences of the tumor were excised without further drug treatment. Tumor specimens were snap-frozen in the liquid nitrogen tank and stored for two years before subject to nuclei extraction. b, UMAP of scONE-seq DNA copy number data shows the 4 genome states in this IDH1-mutant astrocytoma sample. 2R -second recurrence. c, Heatmap of integer copy numbers of all the 2R astrocytoma cells pro led; 3 clones with different copy number pro les are observed. The bottom annotation bar represents the CNAs in this tumor sample. Some commonly known glioma/astrocytoma driver genes are shown. Ampli ed genes are highlighted with red; deleted genes are highlighted with dark blue. d, The minimum evolution tree with diploid as root;

Declarations
the WES data inferred CNV from the same patient were integrated to show the evolutionary relationship between the tumor recurrences and their various clones. The P clones are inferred from bulk WES data (see Methods). P -primary; 1R -rst recurrence. e, The top panel shows the dots plot with normalized counts across the human genome and the solid line representing the estimated integer copy number. The bottom panel shows the mirrored BAF dots plot across the genome. Relatively ampli ed regions are highlighted with red; relatively deleted regions are highlighted with blue. If dots are close to the red belt in the mirrored BAF dots plot, this indicates that there are LOH in those regions. If dots are close to the blue belt in the mirrored BAF dots plot, this indicates that there are imbalanced haplotypes in that regions. The top bar highlights the LOH regions of the genome. The clonal pseudo-bulk genome information for each 2R clone is also shown. Figure 4 scONE-seq reveals the tumor microenvironment composition of an IDH1-mutant astrocytoma. a, UMAP of scONE-seq RNA data shows the TME composition cell types in this second recurrence IDH1-mutant astrocytoma sample. Tumor cells are classi ed into 4 cellular states based on their meta-module scores. b, Dot plot shows some markers for the annotation of cell types. c, UMAP of scONE-seq RNA data annotated with clonal information. The 2R clone 1 cells are clustered with normal astrocytes. d, Volcano plot shows the DEGs between the 2R clone 1 and clone 3 cells. Genes with higher expression in clone 1 cells are colored with red. Genes with higher expression in clone 3 are colored with blue. e, UMAP showing the integration of scONE-seq and 10x snRNA-seq dataset.
The integrated data (left) retains all cell types identi ed with scONE-seq or 10x snRNA-seq. The split-UMAP (right) shows adequate mixing of cell-types found separately by the 2 methods, with clone 1 cells (from scONE-seq) and putative clone 1 cells (from snRNA-seq) falling into the same integrated cluster.