Ensemble of Nucleic Acid Absolute Quantitation Modules for Accurate Copy Number Variation Detection and Targeted RNA Pro ling

Lucia Wu Rice University https://orcid.org/0000-0001-5051-9685 Peng Dai Rice University https://orcid.org/0000-0002-4581-3473 Michael Wang Rice University https://orcid.org/0000-0001-7009-6958 Sherry Chen Rice University Evan Cohen The University of Texas MD Anderson Cancer Center https://orcid.org/0000-0003-4192-9644 Gitanjali Jayachandran U.T. M.D. Anderson Cancer Center Jinny Zhang NuProbe USA Inc. Angela Serrano NuProbe USA Inc. Naoto Ueno M. D. Anderson Cancer Center https://orcid.org/0000-0002-0166-7275 JAMES REUBEN UT MD ANDERSON CANCER CENTER Carlos Barcenas The University of Texas MD Anderson Cancer Center David Zhang (  genomic.dave@gmail.com ) Rice University


Introduction
Quantitation of speci c nucleic acid sequences is the basis of many important biological applications; one example is gene ploidy calculation for detection of copy number variations (CNVs) [1][2][3] , which is one of the most frequently observed genetic biomarker types in cancer 4,5 . It is present in 3%-98% of tumor cases depending on the cancer type 6 , and is clinically relevant as prognostic markers and as therapeutic targets [7][8][9] . Additionally, quantitation of speci c RNA molecules for gene expression patterns pro ling re ects the state of a cell or tissue 10,11 and may reveal pathological mechanisms underlying diseases [12][13][14] .
Droplet digital PCR (ddPCR) is the gold standard for absolute quantitation of speci c nucleic acid sequences 15,16 . The quantitation precision enabled small fold change measurements in CNV detection.
The minimum copy number gain that can be distinguished from normal ploidy of 2.0 was improved from 3.0 using quantitative PCR (qPCR) to approximately 2.4 using ddPCR 17 , rendering ddPCR useful for CNV detection in clinical settings 18 . However, improved CNV sensitivity is still highly desired especially for cellfree DNA (cfDNA) samples in which tumor DNA (ctDNA) are signi cantly "diluted" by DNA from normal tissues 19,20 . The physical limitation from stochasticity in molecule sampling leads to actual number of observed DNA molecules and thus the observed ploidy deviating from the expected "true value" in CNV quantitation ( Supplementary Fig. S1). Poisson statistics can be used to model this sampling process: the standard deviation of a Poisson variable and the coe cient of variation (CV) is Thus, in principle, increasing the DNA input amount or the number of genomic sites to quantify in the same gene would improve the limit of detection (LoD). Because the DNA input is usually limited especially in plasmaderived cfDNA, to overcome the Poisson distribution problem, ensemble of quantitation modules to sample a large number of independent genomic loci on the same gene is required to further improve CNV sensitivity. Highly multiplexed ddPCR remains challenging due to limited uorescence channels.
As an alternative approach to ddPCR, we present Quantitative Amplicon Sequencing (QASeq), a PCRbased molecular barcoding NGS approach for accurate absolute quantitation which is compatible with high multiplexing. Herein, we demonstrate that 2-plex QASeq exhibited higher and more consistent conversion yield than ddPCR in absolute molecule count quantitation, and enables CNV quantitation accuracy similar to ddPCR. Multiplexed QASeq improves LoD to allow con dent distinguishing 2.05 ploidy from normal 2.00 ploidy, and is applied to longitudinal serial 57 plasma cfDNA samples from patients with metastatic ERBB2+ (HER2+) breast cancer. Finally, an RNA QASeq panel covering 20 genes are demonstrated on a wide range of RNA samples including tumor and placenta FFPE RNA.

Results
QASeq development and 2-plex demonstration. Unique molecular identi ers (UMIs) 21,22 are attached to individual input DNA strand via two cycles of PCR with long annealing time for high and uniform barcoding e ciency. After further ampli cation, NGS reads originating from the same input DNA strand carry the same UMI sequence and thus belong to the same UMI family. Therefore, the unique UMI family count represents the number of input DNA strands (Fig. 1a).
We demonstrated QASeq for absolute quantitation and copy number calculation using a 2-plex panel containing 2 quantitation modules in gene ERBB2 (target) and EIF2C1 (reference) respectively (Supplementary Sect. 2 and Sect. 6), and compared with ddPCR for the same 2 genes side-by-side using ve replicated experiments for both methods (Fig. 1b). QASeq exhibited higher and more consistent conversion yield than ddPCR, where conversion yield is the fraction of input molecules that are observed in the experiment. 10 ng Human PBMC gDNA from the same healthy donor was used per experiment, corresponding to 2,790 haploid copies. QASeq showed higher conversion yield (86% on average) than ddPCR (53% on average). The coe cient of variation (CV) of molecule count was lower for QASeq (5.0% for ERBB2, 2.5% for EIF2C1) than for ddPCR (12.8% for ERBB2, 13.3% for EIF2C1) in 5 replicates.
High dynamic range of DNA input was observed for absolute quantitation using 2-plex QASeq (Fig. 1c).
Observed ERBB2 molecule counts by QASeq were close to expected value calculated from DNA input amount because of high conversion yield. Lower conversion yield at 1 ng input DNA was possibly a result of material loss at low concentration.
ERBB2 ploidy calculated from QASeq was accurate and highly reproducible (Fig. 1d). ERBB2 ploidy was calculated as 2 × ERBB2 molecule counts / EIF2C1 molecule counts. The mean of 5 replicates was 1.98, which is close to normal ploidy of 2.00. The CV for calculated ERBB2 ploidy was 4.8% in ve replicates. Spike-in cell-line DNA samples with different expected ERBB2 ploidy were assayed by 2-plex QASeq and ddPCR, and high correlation in calculated ploidy was observed between the methods (Supplementary Fig.   S2).
The histogram for observed UMI family size distribution followed log-normal distribution after removal of small families (Fig. 1e), because in theory random yield difference in PCR e ciency was ampli ed exponentially in PCR cycles post UMI attachment. Large number of UMI families with UMI family size < 3 were observed. These small UMI families were not used for molecule counting as they are likely results of polymerase and sequencing errors.
Highly multiplexed QASeq for CNV detection. Combination of multiple absolute quantitation modules for CNV detection was demonstrated to overcome the stochasticity in sampling. Experimentally, we did observe that the stochastic error in copy number quantitation reduced as a function of number of quantitation modules in the gene (Fig. 2a). A QASeq panel with 175 modules was designed for ERBB2 CNV detection, with 49 quantitation modules in ERBB2 and 123 modules in other regions of human genome serving as the reference (Supplementary Sect. 3 and Sect. 6). The rest of 3 modules are in Chromosome X thus are not used in CNV analysis. Five technical replicates were conducted with 8.3 ng healthy PBMC gDNA per library. 1 to 49 modules were used for ERBB2 ploidy quantitation. The CV of ERBB2 ploidy in 5 replicated experiments was reduced from 3.25-0.69% as the number of modules increases, consistent with theory based on Poisson distribution.
Combination of multiple quantitation modules in QASeq allowed con dent discrimination between 2.05 and 2.00 ploidy ERBB2 samples using QASEq. 2.05 ploidy sample was prepared by mixing a normal PBMC DNA sample and ERBB2-positive cell line (SK-BR-3) DNA. Normal sample was tested in quadruplicates, and 2.05 ploidy sample was tested in duplicates.
Multiplexed QASeq for CNV in tumor tissue samples. QASeq was applied to 18 fresh/frozen (FF) tumor samples from 16 breast cancer patients and was compared with both ddPCR and immunohistochemistry (IHC) results on ERBB2. QASeq ERBB2 ploidy results were concordant with IHC and ddPCR from the tumor tissue ( Fig. 2cd) with potentially fewer false positives. ERBB2 ploidy from QASeq showed high correlation with ddPCR ( Supplementary Fig. S3). For the single sample with discordance between QASeq and IHC, ddPCR agreed with QASEq. For all the 3 samples with discordance between QASeq and ddPCR, ddPCR ploidy were between 2.5 and 3 and IHC agreed with QASeq for negative call. There was no case where IHC and ddPCR agreed on a call that con icted with QASeq results.
175-plex QASeq is theoretically equivalent to C(175, 2) = 15225 different ddPCR CNV assays ( Supplementary Fig. S4). To utilize quantitation modules beyond just calculating ERBB2 ploidy, modules in 'reference' were further grouped based on gene. Ploidy for 10 genes and 2 chromosomal regions were calculated from QASEq. To reduce the false positives of CNV calls in clinical samples and account for potential poor sample quality, sequential Mann-Whitney U tests on each gene of interest was performed ( Supplementary Fig. S5, see Supplementary Sect. 3 on data analysis for multiplexed QASeq with > 2 quantitation modules). As an example, ploidy of each of the 175 quantitation modules for a normal PBMC DNA sample and for an FF DNA sample from breast cancer tumor section was shown in Fig. 2e. In clinical sample analysis, the ploidy values for a gene will be reported as 2.00 if there is no statistical difference between gene of interest and reference by Mann-Whitney U test. We summarized the CNV results in all of 18 FF samples (Fig. 2f). The LoD for ERBB2 CNV was calculated from the ve technical replicates in healthy gDNA to be 1.97 ploidy for copy number loss and 2.04 for gain (Supplementary   Table S2 and Supplementary Sect. 3). Additionally, ERBB2 ploidy in PBMC DNA from 10 different healthy donors were assessed with 175-plex QASeq and ddPCR respectively for biological variability ( Supplementary Fig. S6). There is no sample with ploidy deviating from 2.00 for over 10%, but ddPCR showed wider ploidy range (1.8 to 2.1) than QASeq (1.9 to 2.0) in the 10 normal samples.
QASeq could improve clinical sensitivity in CNV assessment. The ploidy values of all observed gene ploidies in 18 tumor DNA samples are plotted as a histogram ( Supplementary Fig. S7). Using methods with LoD at 1.6 and 2.4 ploidy, such as ddPCR, 44% of the CNVs may be missed. Additionally, QASeq allows quantitation of mutations down to 0.1% variant allele frequency (VAF) with UMI error correction ( Supplementary Fig. S8). As an NGS-based quantitation method, sequence mutation calling is performed in addition to the copy number calculation. We designed QASeq panel amplicons to include hot spot mutations commonly observed in breast cancer (Supplementary Sect. 4 and Sect. 6).
Study of QASeq liquid biopsy results with disease progression in ERBB2+ (HER2+) metastatic breast cancer patients. QASeq breast cancer liquid biopsy panel (Supplementary Sect. 6) was applied to serial longitudinal plasma cell-free DNA (cfDNA) samples and was compared with disease progression. 57 plasma cfDNA samples from 15 patients with ERBB2 + metastatic breast cancer were tested by QASeq, with 2-8 samples per patient, where all patients had a baseline sample obtained at the moment of diagnosis of metastatic breast cancer, and follow-up samples were obtained at different timepoints for each patient. All patients had a diagnosis of ERBB2+ (IHC II + or III + or FISH+) metastatic breast cancer and the ERBB2 status was con rmed from a biopsy obtained from tumor tissue.
We summarized ERBB2 ploidy change in cfDNA and disease progression dates for each patient using a swimmer plot (Fig. 3a). There were 8 patients who developed disease progression who had a plasma sample collected within 6 months before or after the disease progression. ERBB2 ampli cation or increase of ERBB2 ploidy relative to the previous time point was observed in 6 out of the 8 patients who developed disease progression. In the other two patients (de-identi ed patient ID 2697 and 2366), disease progression was reported 4 times for each patient and abnormal ERBB2 CNV can only explain part of the disease progression. Signi cant allele frequency changes in PIK3CA G1049R mutation in patient 2697 and in SNP rs1309838194 in patient 2366 were correlated with disease progression respectively (Fig. 3b). PIK3CA G1049R mutation (COSV55874453) is considered to be structural damaging alteration as disease-causing drivers 23,24 in breast cancer. The VAF for PIK3CA G1049R in circulating cfDNA was increased by over 10-fold during follow-up, serving as evidence for increased tumor fraction. The PIK3CA mutation may have contributed to disease progression. In patient 2366, VAF for the SNP rs1309838194 in ERBB2 changed from around 50% (heterozygous at baseline time point) to 80% during follow-up, which indicated increased tumor-derived DNA in plasma and may be associated to disease progression. Since the overall ERBB2 ploidy in plasma was still normal, we think that the copy-neutral loss of heterozygosity (LOH) 25,26 may be present in the tumor, leading to SNP allelic imbalance. Taken the CNV and mutation results together, abnormal change was reported by QASeq in all the 8 patients who developed disease progression.
The 15 cases were classi ed according to whether or not there was abnormal molecular ndings in QASeq liquid biopsy and whether or not disease progression was clinically observed (Fig. 3c). Chi-square test suggested that the QASeq result and progression are not statistically independent (p = 0.038). Although all cases with disease progression were featured with QASeq abnormal ndings showing good sensitivity, increase of ERBB2 ploidy was also observed in 4 patients who did not develop disease progression within this time frame. Because all patients presented here were treated with ERBB2-targeted therapy trastuzumab and pertuzumab until disease progression, molecular change may not translate to clinically observed disease progression in all cases.
Furthermore, tumor ERBB2 ploidy was inferred from plasma QASeq results and was compared with FISH. Because both CNV and mutation information are available from QASeq, tumor fraction in plasma cfDNA can be estimated based on VAF of tumor mutation; tumor ploidy can be calculated with plasma ploidy and tumor fraction. We demonstrated this normalization in two cases (Fig. 3d), where pathogenic mutation was observed in baseline cfDNA with VAF between 1% and 30%. Tumor FISH results were collected at three time points in case 1834. QASeq detected ERBB2 ampli cation in plasma cfDNA 5 months earlier than FISH from tumor tissue. In addition, the inferred tumor ERBB2 ploidy by QASeq was consistent with the available tumor ploidy from FISH. The inferred tumor ERBB2 ploidy was generally stable in both of the two patients, so ERBB2 ploidy change in cfDNA was in uenced by tumor fraction. Based on the correlation of QASeq results with progression and FISH results, we envision non-invasive and sensitive longitudinal study of CNV and mutation change in plasma by QASeq can help with understanding disease progression and resistance mechanism. RNA QASeq for gene expression level quantitation. Next, we demonstrated QASeq technology for RNA quantitation in a variety of samples including tumor tissue FFPE RNA, total blood RNA and total liver RNA. RNA sample was reverse transcribed to cDNA as input for QASEq. Random hexamer was chosen as reverse transcription primer to be compatible with low-quality fragmented FFPE RNA. A targeted multigene breast cancer panel covering 78 amplicons in 15 cancer-related and 5 reference genes similar to Oncotype DX 27 panel was built (Supplementary Sect. 5 and Sect. 6). Expression of each gene is calculated from the molecule count of each amplicon, based on UMI count and conversion yield, and is further normalized relative to the expression level of the 5 reference genes in log2 scale ( Supplementary  Fig. S9).
The RNA quantitation accuracy was rstly validated using ERCC RNA spike-in mix. 16 ERCC sequences were targeted with 16 amplicons. The ERCC RNA sample was diluted and mixed with commercial human total liver RNA for a nal expected molecule count between 3 and 100,000. The observed molecule count showed good correlation with the expected (Fig. 4a). QASeq quantitation for RNA was across ve orders of magnitude and as few as 3 expected molecules were detected.
Reproducibility for expression level relative to reference genes was evaluated. Total liver RNA was assayed with breast cancer panel in replicates and consistent expression level was observed (Fig. 4b). We observed that multiple quantitation modules (amplicons) per gene reduced quantitation variability in expression level. The standard deviation for relative expression level in triplicate experiments became lower as the number of amplicons per gene increased from 1 to 5 (Fig. 4c), with median standard deviation reduced from 0.44 to 0.21. Outlier was only observed when only 1 amplicon is considered.
RNA Expression level from QASeq was extensively compared with other technologies including RNAseq 28 , NanoString nCounter 29 , Microarray 30 and RT-qPCR using FFPE RNA from breast cancer and lung cancer tissue. The expression level was normalized in the same way relative to the 5 reference genes for all the methods, and was summarized in Fig. 5d for a breast cancer FFPE RNA. RNA QASeq is consistent with RNAseq and NanoString nCounter. Microarrary, however, showed poor correlation with any of the other methods. RNA QASeq was further compared with these technologies in a couple other samples ( Supplementary Fig. S10-13). Nanostring showed high correlation with QASeq in all samples, but required much higher input amount than RNA QASEq. Low expression level species are dropped out at 10 ng as compared to the typical 150 ng input ( Supplementary Fig. S14). Microarray showed poor concordance with both QASeq and nanostring in all samples (Supplementary Fig. S13 and S15). QASeq was consistent with RNAseq in most samples. However, because RNAseq was a non-targeted approach, most reads were wasted on genes of no interest and coverage uniformity issue led to poor robustness for the quantitation of lowly expressed genes as it was observed in the two FFPE samples (Supplementary Fig.   S11). RT-qPCR is consistent with UMI-based QASeq quantitation, but is limited by low multiplexing ability. We summarized the relative expression level in four clinical FFPE and 3 normal placenta FFPE samples (Fig. 4d). Hierarchical clustering indicated the expression patterns were the most similar between normal placenta samples.

Discussion
QASeq developed in this work provides an accurate absolute nucleic acid quantitation method that can be conveniently scaled up to high multiplexing, thus overcoming sampling error from Poisson distribution for CNV detection. We demonstrated con dent distinguishment of 2.05 ploidy from 2.00 ploidy. Absolute quantitation is based on a highly e cient PCR-based barcoding approach. Based on Poisson distribution theory and our observation, QASeq allows the construction of targeted panels with adjustable CNV sensitivity for each gene by changing the number of amplicons that cover each gene of interest.
Robust, consistent conversion yield and high number of quantitation modules are signi cant for QASEq. We chose to perform two cycles of PCR-based barcoding with long annealing time, because the fraction of DNA molecules in a sample represented in the nal NGS library is low for ligation-based UMI attachment. As the number of primer pairs increases for a multiplex PCR ampli cation, there is a combinatorial explosion of potential primer dimers and non-speci c genomic ampli cation. The dimer problem is more complicated when UMIs must be incorporated in the multiplex PCR. By combining simulated annealing design using dimer likelihood estimation (SADDLE) 31 , a primer set optimization software developed in our lab, with a nested protocol, about 60% on-target rate is maintained even in the liquid biopsy panel with 223 modules.
The most accessible sample type for cancer monitoring is cfDNA derived from plasma. However, CNV monitoring using cfDNA is understudied comparing to mutation detection 32 . Common CNV detection methods Immunohistochemistry and in situ hybridization are limited to tissue/cell samples, and not applicable to non-invasive cfDNA. Other cfDNA-compatible methods including ddPCR, NGS panels such as FoundationOne and Guardant 360, and microarray are not sensitive enough, all requiring > 25% heterozygous single copy loss or gain for detection which corresponds to 1.75 ploidy for loss and 2.25 ploidy for gain. The high CNV sensitivity of QASeq may allow better clinical sensitivity of copy number changes.
As a proof-of-concept demonstration, QASeq is used to infer tumor gene ploidy from cfDNA when tumor mutation is observed in two clinical cases. A single gene ploidy number in cfDNA may not be actionable because it is decided by both circulating tumor DNA (ctDNA) fraction and CNV in cancer cells, not to re ect any of the two explicitly. As an example, 1% tumor fraction in cfDNA with tumor copy number of 20, or 6% tumor fraction with tumor copy number of 5 has the same cfDNA overall ploidy (2.18), but the former may have better outcome to HER2-targeted therapy. Deconvolution of tumor CNV from cfDNA will be complicated when tumor tissue is highly heterogeneous. Clonal mutation, the mutation that is present in all cancer cells within tumors, needs to be identi ed in this case to infer the average gene ploidy in tumor.
We identi ed normal ERBB2 ploidy but signi cant SNP allelic imbalance in ERBB2 in one patient, which may be due to copy-neutral LOH in tumor. QASeq breast cancer panel covers hotspot cancer mutations while designing multiple regions in gene of interest so not many SNPs are included. Inspired by this case study, we can intentionally add quantitation modules for SNP calling in gene of interest to better identify allelic imbalance for higher con dence of LOH detection.

Methods
Oligonucleotides and Reagents. All oligonucleotides were purchased from Integrated DNA Technologies (100uM in IDTE, pH 8.0). Oligonucleotide sequences are provided in Supplementary Section 6. Primers in the 2-plex QASeq panel are dual-HPLC puri ed; primers in other panels are standard-desalted. Phusion High-Fidelity DNA polymerase and deoxynucleoside triphosphates (dNTPs) were purchased from New England Biolabs. PowerUp SYBR Green Master Mix was purchased from Thermo Fisher Scienti c. iTaq Universal SYBR Green Supermix was puchased from Bio-Rad Laboratories. AMPure XP was purchased from Beckman Coulter. NGS index primers (NEBNext Multiplex Oligos for Illumina) was purchased from New England Biolabs. QASeq protocol. Library preparation consists of three PCR reactions: UMI PCR, nested PCR and index PCR, all performed on a T100 Thermal Cycler (Bio-Rad). In UMI PCR, DNA sample was mixed with 1U Phusion High-Fidelity DNA polymerase, Phusion HF buffer, forward and outer reverse primers (15 nM each), and dNTPs (0.2 mM each) to reach a total volume of 50 μL.
Thermal cycling started with 30 s at 98 ˚C, followed by 2 cycles of 10 s at 98 ˚C, 30 min at 63 ˚C and 15 s at 72 ˚C, and then 2 cycles of 10 s at 98 ˚C, 15 s at 63 ˚C and 15 s at 72 ˚C, nally 5 cycles of 10 s at 98 ˚C and 30 s at 71 ˚C. During the last 5 min of the second 30 min at 63˚C, 1.5 μM of each universal primer was added while keeping the reactions inside the thermal cycler. After UMI PCR, 1.6X AMPure XP beads puri cation was performed.
In nested PCR, the eluate from the previous step was mixed with PowerUp SYBR Green Master Mix (1X nal concentration) and 15 nM each inner reverse primer. Thermal cycling started with 3 min at 95 ˚C, followed by 2 cycles of 10 s at 95 ˚C and 30 min at 60 ˚C. The PCR product was puri ed by 1.6X AMPure XP beads.
Next, index PCR was performed; the eluate from the previous step was mixed with iTaq Universal SYBR Green Supermix (1X nal concentration) and 250 nM each NEBNext index primers. Thermal cycling started with a 3 min incubation step at 95 ˚C, followed by 25 cycles of 10 s at 95 ˚C and 30 s at 65 ˚C, and nally 2 min at 65 ˚C. After index PCR, double-side size selection (0.4X, 0.4X ratio) was performed. Libraries were normalized and loaded onto an Illumina sequencer. DNA extracted from FF or blood samples was sheared to 150 bp peak length using Covaris LE220 Focused Ultrasonicator before library preparation.
In RNA QASeq, RNA sample was rstly reverse transcribed to cDNA as input for QASeq protocol. RNA was mixed with dNTP (0.5 mM), Murine RNase Inhibitor (8 U), M-MuLv buffer (1X), M-MuLV Reverse Transcriptase (8 U), and random hexamer (6 µM). The mixture was incubated at 25 ˚C for 5mins, at 42 ˚C for 60mins, and then inactivated at 65 ˚C for 20mins. The reaction mixture was directly used as input for UMI PCR without puri cation.
Sequencing was performed on HiSeq or NextSeq (Illumina) with 2X 150 bp paired-end reads and dual 8 bp index. NGS data processing. NGS adapter sequences were rst removed from FASTQ data using custom Python code; alignment was performed using Bowtie2 software 33 . UMI grouping and CNV analysis were performed using custom Matlab code; detailed description of the algorithm can be found in Supplementary Section 2 and Section 3. Mutation analysis was performed using custom Python and Matlab code; detailed description can be found in Supplementary Section 4.
Digital droplet PCR. ddPCR CNV Assays from Bio-Rad were used in this study. Speci cally, ddPCR Copy Number Assay: ERBB2, Human (Fluorophore: FAM, UniqueAssayID: dHsaCP1000116) and ddPCR Copy Number Assay: AGO1 (EIF2C1), Human (Fluorophore: HEX, UniqueAssayID: dHsaCP2500349) were purchased. Reaction setup, thermal cycling conditions and data acquisition were performed according to Bio-Rad protocol for ddPCR Copy Number Variation Assays. 10 ng of input DNA were used for each reaction.
Samples. Fresh frozen (FF) breast tissue samples from breast cancer patients were purchased from OriGene Technologies. ERBB2 status of the tumor tissue measured by immunohistochemistry (IHC) was obtained from the vendor. Genomic DNA from FF samples and buffy coat of blood samples was extracted using QIAamp DNA Mini (Qiagen) following manufacture's protocol. 56 plasma samples from 15 ERBB2+ metastatic breast cancer patients in de-identi ed format were collected from MD Anderson Cancer Center. All procedures performed in studies involving human participants were approved by Institutional Review Board at MD Anderson (protocols PA16-0507 and PA19-0375), and were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all participants.
Cell-free DNA was extracted from plasma using QIAamp MinElute ccfDNA Mini Kit (Qiagen) following manufacture's protocol. Samples were quanti ed by qPCR with Human cell-line gDNA NA18537 as reference. The concentration calculated from qPCR re ects the ampli able DNA.
Normal human placenta FFPE was purchased from BioChain. Total RNA from FFPE was extracted using RNeasy FFPE Kit (Qiagen). Human liver total RNA was purchased from Takara Bio. Human whole blood samples from healthy people were purchased from Zen-Bio Inc. RNA from fresh total blood was extracted using Monarch Total RNA Miniprep Kit (New England BioLabs). Infer tumor ERBB2 ploidy from plasma. cfDNA ERBB2 Ploidy = mean tumor ERBB2 Ploidy * tumor fraction% + 2.0 * (1 -tumor fraction%); tumor fraction% = mutation VAF in cfDNA / VAF in tumor. We took pathogenic mutation observed in baseline cfDNA with VAF between 1% and 30% for tumor fraction calculation, to avoid the in uence of SNP. Baseline mutation in 2 patients (1834 and 3669) were identi ed. We hypothesized mutation VAF in pure tumor is 50% (monoallelic), so that tumor fraction% = mutation VAF in cfDNA * 2.
RNAseq. Library preparation was performed using NEBNext Ultra II RNA Library Prep Kit for Illumina. Ribosomal RNA depletion was performed using NEBNext rRNA Depletion Kit v2. Raw fastq reads were initially quality ltered using Trimmomatic v0.39. Speci cally, individual reads were trimmed to the longest continuous segment for which phred quality score (Q) was ≥20 (Q ≥ 20 represents ∼99% accuracy per nucleotide position). Reads shorter than 50bp after trimming were discarded. Next, libraries were aligned to the human reference genome (GRCh38) using bowtie2 v2.4.4. After alignment, sam les were sorted and converted to bam les using samtools v1.12. HTseqv0.13.5 in mode 'intersection-strict' and with additional parameter '--minaqual 1' was used to estimate the number of reads that mapped to each gene of interest. Finally, StringTie v2.1.7 was used to calculate TPM-normalized gene abundance.  NGS-based QASeq DNA absolute quantitation modules. a. Schematic of a single QASeq module for quantitation of DNA bearing a speci c nucleotide sequence. Each input DNA strand was attached with a UMI by 2 PCR cycles. Further PCR ampli cation was performed after removing non-extended primers bearing UMI. NGS reads originating from the same input DNA strand carry the same UMI sequence, thus unique UMI family count represents the number of input DNA strands. b. Quantitation of ERBB2 and EIF2C1 molecule count using 2-plex QASeq and ddPCR. Five replicated experiments were performed for both methods with 10 ng gDNA input. QASeq showed higher conversion yield (86% on average) than ddPCR (53% on average). The CV of molecule count was lower for QASeq (5.0% for ERBB2, 2.5% for EIF2C1) than for ddPCR (12.8% for ERBB2, 13.3% for EIF2C1). c. Absolute quantitation of different DNA input. Observed ERBB2 molecule counts by QASeq were close to expected molecule counts calculated from DNA input amount because of high conversion yield. Lower conversion yield at 1 ng input DNA was possibly a result of material loss at low concentration. d. Technical variation of ERBB2 ploidy using 2plex QASeq. The mean of 5 replicates was 1.98, which is close to normal ploidy of 2. e. UMI family size distribution and data processing. UMI family size follows log-normal distribution after removing small families. Family size cutoff was calculated as 5% of the mean of top 3 largest family size here. Combination of multiple absolute quantitation modules for CNV detection. a. Stochastic error in copy number quantitation was reduced by increasing the number of quantitation modules in the gene. ERBB2 ploidy was calculated as 2 times the ratio between the mean of UMI family counts from modules in ERBB2 and in the reference. Because there are multiple possibilities of module down-selection, here each datapoint represents the average of 30 randomized selections. 123 modules served as the reference for ploidy calculation. The CV of 1 module was lower than that in Fig.1c, because only 1 module was used for both the reference and ERBB2. b. Discriminating 2.05 and 2.00 ploidy ERBB2 samples using QASeq.
The P-value calculated using the two-sided t-test was 0.0013. c. Concordance of QASeq ERBB2 ploidy with IHC. For the sample with discordance between QASeq and IHC, ddPCR agreed with QASeq; this might be a result of ERBB2 over-experession at protein level due to changes in promoters or enhancers. d. Concordance of QASeq ERBB2 ploidy with ddPCR. For the 3 samples with discordance between QASeq and ddPCR, IHC agreed with QASeq. e. Ploidy of each QASeq quantitation module for a healthy PBMC donor (top) and for a breast cancer fresh/frozen (FF) tumor section (bottom). Genes with called CNVs are displayed in orange, genes within expected variation are shown in black. f. QASeq CNV calls for 18 fresh/frozen breast tumor sections. Two non-adjacent tumor sections were tested from patients p15 and p16. Between 5 and 8 ng DNA input was used based on availability.

Figure 3
QASeq for longitudinal study of plasma samples from 15 ERBB2+ metastatic breast cancer patients. a.
Swimmer plot of clinical course and molecular ndings of patients. QASeq identi ed cfDNA ERBB2 ampli cation or increase of ERBB2 ploidy relative to the previous point in 6 out of the 8 patients with progression within 6 months of plasma sampling. The other 2 patients (sample ID 2697 and 2366) were each reported with 4 progressions. Although ERBB2 ampli cation or increase of ERBB2 ploidy was also observed in these two patients, abnormal ERBB2 events are only associated with part of the progression. b. ERBB2 ploidy and mutation allele frequency change in plasma sample of patients 2697 and 2366. Signi cant allele frequency changes in PIK3CA G1049R mutation in patient 2697 and in SNP rs1309838194 in patient 2366 were correlated with progression. c. Categorization of patients based on QASeq abnormal molecular ndings and disease progression. Chi-square test suggested that the QASeq result and progression is not statistically independent (p = 0.038). d. Plasma ERBB2 ploidy normalization with tumor fraction to infer tumor ERBB2 ploidy. Tumor FISH results at three time points were collected in patient 1834. QASeq detected ERBB2 ampli cation in plasma 5 months earlier than FISH from tumor tissue. The inferred tumor ERBB2 ploidy by QASeq was consistent with the available tumor ploidy from FISH. The inferred tumor ERBB2 ploidy was generally stable in both of the two patients