Development of a novel NGS methodology for ultrasensitive circulating tumour DNA detection as a tool for early-stage breast cancer diagnosis

doi:10.21203/rs.3.rs-2246067/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

BACKGROUND

Breast cancer (BC) is the most prevalent cancer in women. Although it is usually detected when localized, invasive procedures are still required for diagnosis.

METHODS

Herein, we developed a novel ultrasensitive pipeline to detect circulating tumour DNA (ctDNA) in a series of 75 plasma samples from localized BC patients prior to any medical intervention. We first performed a tumour-informed analysis to correlate the mutations found in tumour tissue and plasma. Next, disregarding the tumour data, we developed an approach to detect tumour mutations in plasma.

RESULTS

We observed a mutation concordance between tumour and plasma of 29.50% with a sensitivity down to 0.03% in mutant allele frequency (AF). We detected mutations in 33.78% of the samples, identifying 8 patients with plasma-only mutations. Altogether, we determined a specificity of 86.36% and a positive predictive value of 88.46% for BC detection. We demonstrated an association between higher ctDNA median AF and higher tumour grade, multiple plasma mutations with likelihood of relapse and more frequent TP53 plasma mutations in hormone receptor-negative tumours.

CONCLUSIONS

Overall, we have developed a unique ultra-sensitive sequencing workflow with a technology not previously employed in early BC, paving the way for its application in BC screening.

Circulating tumor DNA

ultra-deep sequencing

early breast cancer

liquid biopsy

Breast cancer (BC) is the most commonly diagnosed cancer in women worldwide (the Global Cancer Observatory, 2020). It is normally detected at early stages, mainly owing to surveillance programs that offer mammograms to asymptomatic women from 40–50 up to 70 years of age. Conversely, if the disease has spread to other organs outside the breast and axillary lymph nodes, it is largely incurable with current therapeutic options. BC is, in fact, the leading cause of cancer deaths among women (the Global Cancer Observatory, 2020). Once an abnormal finding is detected in a mammogram, a biopsy of the lesion remains the gold standard to confirm the presence of cancer cells. However, this well-established invasive clinical method exposes patients to inherent risks such as breast bruising, swelling, infections and altered breast appearance. Moreover, spatial heterogeneity is a common feature in cancer[1], and thus a localized solid biopsy, which only takes a small piece of the lesion for analysis, might not reflect the entire molecular landscape of the tumour.

Over the last decades, liquid biopsy has revolutionized the molecular oncology field as a non-invasive procedure to obtain crucial information from the tumour. It is a clinically validated methodology to detect minimal residual disease and treatment resistance and/or to guide cancer treatment; it readily permits continuous monitoring and theoretically captures the molecular heterogeneity of the tumour[2–4]. Importantly, it represents a promising tool for early-stage diagnosis[5] and potentially for screening asymptomatic individuals for the presence of tumours. However, little has been published about liquid biopsy in the screening process to detect BC in high-risk women. Several studies have been able to detect circulating tumour DNA (ctDNA) in the pre-treatment blood of BC patients with different sensitivities[3, 6, 7]; however, all of them required previous solid tumour genetic information to find cancer mutations in blood. A seminal study developed a pan-cancer methodology to screen tumours through ctDNA detection and protein biomarkers without prior somatic analysis, but the sensitivity to detect BC was the lowest amongst all tumour types[8]. Considering all of the above, it is crucial to find novel approaches to improve ctDNA detection in the first stages of cancer development and to demonstrate the utility of liquid biopsy to detect BC in women with a high probability of presenting this disease.

In this study, we developed a novel method employing a custom BC capture sequencing panel with unique molecular identifiers (UMIs), ultra-deep sequencing and a custom bioinformatic pipeline, to detect tumour mutations in plasma from localized BC patients before diagnosis. We investigated concordance between the mutational landscape of tumour and plasma and performed a non-tumour informed analysis to discriminate between cancer patients and healthy individuals that could potentially be used to non-invasively detect BC prior to any other medical intervention (Fig. 1).

Patients

Plasma samples from 75 women with BIRADS 4c/5 mammography findings were collected just before tissue biopsy prior to cancer diagnosis and treatment.

Tumour samples were obtained by core needle biopsy and fresh frozen. Immunohistochemical (IHC) analysis was performed to quantify expression of human epidermal growth factor receptor 2 (HER2), hormone receptors (HR) and Ki67. Estrogen receptor (ER) and progesterone receptor (PR) were considered positive in tumours presenting more than 1% nuclear-stained cells. HER2 staining was scored according to guidelines[9]. HER2 status was considered positive when graded as 3+ and negative when graded as 0 to 1+; a score of 2+ was considered inconclusive, in which case silver in situ hybridization was performed.

Blood sample processing

10 ml of blood were obtained from each recruited individual in Streck tubes (Streck, La Vista, NE). Within 2 h after collection, plasma was isolated from whole blood by centrifugation for 10 min at 3000 rpm at room temperature and stored at −80 °C until circulating cell-free DNA (cfDNA) extraction.

DNA extraction and quantification from plasma and solid biopsies

cfDNA was extracted from plasma samples using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany) according to manufacturer’s instructions. Tumour DNA was isolated from fresh frozen tissue samples using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following manufacturer’s instructions. cfDNA and DNA from solid tumours were quantified using droplet-digital PCR (Bio-Rad, Hercules, CA) and the RNAseP assay (Thermo Fisher Scientific, Waltham, MA, USA) as previously published[10].

Sequencing BC panel design

The genes included in the custom panel were selected as follows: i) genes with mutations in ≥ 1% of BC samples in a public database (https://www.cbioportal.org/), ii) genes analysed and found mutated in BC samples in a seminal study[11], iii) genes of interest in BC biogenesis and iv) other genes of interest showing low mutation frequencies in BC databases but with important roles in other cancer types. Thus, the panel included the coding regions of the following genes: AKT1, ARID1A, ATM, BAP1, BRAF, BRCA1, BRCA2, CBFB, CDH1, CDKN1B, CTCF, ERBB2, ESR1, GATA3, HRAS, KDM6A, KRAS, MAP2K4, MAP3K1, MEN1, NCOR1, NF1, PBRM1, PIK3CA, PIK3R1, PTEN, RB1, RUNX1, SF3B1, SMAD4, TBX3, TP53, USP9X (Additional file 2: Table S1). The custom NGS panel for BC was designed using the SureDesign software (Agilent, Santa Clara, CA) with the following settings: 5x for tiling, least stringent for masking, XTHSBoosting for boosting and a value of 30 for extension into repeats.

Sequencing library preparation

The SureSelect^XTHS (Agilent, Santa Clara, CA) methodology was employed to generate sequencing libraries. We constructed libraries using a median input of plasma DNA of 39.78 ng (range 5.01–173.91 ng) from BC patients and 21.78 ng (range 1.71–113.52 ng) from healthy individuals, and a median input of tissue DNA of 199.50 ng (range 6.95–200 ng) from tumours. The DNA from tissue was fragmented using the SureSelect Enzymatic Fragmentation kit (Agilent, Santa Clara, CA) and the libraries were prepared using the SureSelect^XT Target Enrichment System kits (Agilent, Santa Clara, CA) following the manufacturer's instructions. All PCR steps were carried out in the C1000 Touch Thermal Cycler (Bio-Rad, Hercules, CA).

Library fragment size ranges were assessed with Bioanalyzer High-Sensitivity DNA chips (Agilent, Santa Clara, CA) and libraries were quantified using the KAPA Library Quantification Kit (Roche, Basel, Switzerland). For tumour tissue DNA sequencing, 8 pools containing 8 to 9 library samples per pool were prepared and sequenced. For plasma DNA, 8 pools containing 9 to 10 library samples per pool from BC patients and 3 pools containing 7 to 8 library samples per pool from healthy controls were also prepared and sequenced. In total, 19 lanes (1 lane per pool) were employed to sequence the libraries, aiming for ultra-deep sequencing of around 20,000X before de-duplication, on the DNBseq-G400 platform (MGI, Hong Kong) with 100 bp paired-end reads following the manufacturer's instructions for UMI sequencing.

Sequencing data processing

We created a custom pipeline for processing the SureSelect^XTHS (Agilent, Santa Clara, CA) sequencing data. We initially performed quality control of the sequencing data using fastQC v0.11.9. Next, we trimmed adapters and quality-filtered reads using trim-galore v0.6.7. To perform the processing steps that involve barcoded data, we used a subset of fgbio tools v1.5.1. We mapped the data to the GRCh38 reference genome using bwa v0.7.17. We next used fgbio GroupReadsByUmi to group reads by barcode, using the Identity option to take into account that SureSelect^XTHS barcodes are degenerate. Next, we generated consensus reads using fgbio CallMolecularConsensusReads. The generated consensus reads were mapped again with bwa. We then filtered these aligned consensus reads using fgbio FilterConsensusReads, requiring a minimum base quality of 30 and keeping only consensus reads supported by at least a minimum number of reads (specified per sample type below). We then used fgbio ClipBam to clip the regions where forward and reverse reads overlap.
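The grouping and consensus steps can be illustrated with a toy sketch. It is not the fgbio implementation: fgbio builds consensus reads with a quality-aware model, whereas this example uses a simple per-position majority vote, and the read records, field names and helper functions are hypothetical. It only shows how exact-match ("identity") UMI grouping forms read families and how a minimum family size changes which consensus reads are kept.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group reads into families keyed by (start, end, umi), exact UMI match only."""
    families = defaultdict(list)
    for read in reads:
        key = (read["start"], read["end"], read["umi"])
        families[key].append(read["seq"])
    return families


def call_consensus(seqs, min_reads=1):
    """Majority-vote consensus per position; None if the family is too small.

    Assumes all sequences in a family have the same length (toy simplification).
    """
    if len(seqs) < min_reads:
        return None
    consensus = []
    for column in zip(*seqs):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical reads: three share a UMI and coordinates, one is a singleton.
reads = [
    {"start": 100, "end": 104, "umi": "ACGT", "seq": "GATTA"},
    {"start": 100, "end": 104, "umi": "ACGT", "seq": "GATTA"},
    {"start": 100, "end": 104, "umi": "ACGT", "seq": "GACTA"},  # one discordant base
    {"start": 100, "end": 104, "umi": "TTGA", "seq": "GACTA"},  # single-read family
]

families = group_by_umi(reads)
consensus_min1 = {k: call_consensus(v, min_reads=1) for k, v in families.items()}
consensus_min3 = {k: call_consensus(v, min_reads=3) for k, v in families.items()}
```

With a minimum of 1 read per family both families yield a consensus, while a minimum of 3 discards the single-read family, which mirrors the trade-off between sensitivity and error suppression behind the two thresholds used for plasma and tumour.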

Finally, we performed variant calling with Mutect2 (gatk v4.2.2.0–1) including a panel of non-cancer DNA and a germline variant annotation file for the GRCh38 genome, obtained from the gatk resource bundle, that we used to annotate variants for filtering and only considering the regions included in the SureSelect panel. We annotated the variants with ANNOVAR[12] v20200608 with custom made databases for COSMIC version 95 and TCGA, downloading the calling results generated with the MuTect2 variant caller from the GDC data portal[13] for the latter.

Variant filtration and analysis

For tumour, we used a more stringent approach in order to create a solid reference to compare with the ctDNA findings. We generated consensus reads requiring a minimum of 3 contributing reads per read family. We accepted as valid calls only variants with VAF > 0.05 that were also present in either COSMIC or TCGA, increasing the VAF threshold to VAF > 0.2 for Formalin-Fixed Paraffin-Embedded tissues.
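As a minimal sketch of this tumour filter, assuming each annotated variant is available as a record with a VAF expressed as a fraction and two database flags (the dictionary keys and the function name are hypothetical, not part of the pipeline):

```python
def passes_tumour_filter(variant, ffpe=False):
    """Return True if a tumour variant passes the VAF and database criteria.

    `variant` is a hypothetical dict with keys:
      vaf       - variant allele frequency as a fraction (0.05 == 5%)
      in_cosmic - True if the variant is annotated in COSMIC
      in_tcga   - True if the variant is annotated in TCGA
    """
    vaf_threshold = 0.2 if ffpe else 0.05  # stricter threshold for FFPE tissue
    in_database = variant["in_cosmic"] or variant["in_tcga"]
    return bool(variant["vaf"] > vaf_threshold and in_database)


# Hypothetical records for illustration.
kept = passes_tumour_filter({"vaf": 0.146, "in_cosmic": True, "in_tcga": False})
low_vaf = passes_tumour_filter({"vaf": 0.03, "in_cosmic": True, "in_tcga": True})
ffpe_case = passes_tumour_filter(
    {"vaf": 0.146, "in_cosmic": True, "in_tcga": False}, ffpe=True
)
```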

In the case of ctDNA, we identified mutations using two methods: i) Stringent: using the same approach as described above but requiring a minimum of 1 read per read family, with no VAF threshold applied. To consider mutations not found in tumour as detected in plasma, we required them to have a duplex configuration, with at least two fragments mapping to different coordinates, and to be present in both COSMIC and TCGA BC. We applied the same processing approach to control samples. ii) Exploratory: visualizing the alignments in the IGV genome browser[14] in order to identify mutations previously found in the corresponding tumours but missed by the variant caller. When we detected a variant not reported by the variant caller, we counted the number of reads carrying the mutation and the number of reads not carrying it in the given sample. Then, we compared these counts against the corresponding read counts in controls and in BC plasma samples without the mutation using Fisher's exact test (Additional file 2: Table S2).
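The read-count comparison in the exploratory method can be sketched as follows. This is an illustration under stated assumptions: the text does not say whether the test was one- or two-sided, so a one-sided test (enrichment of mutant reads in the sample) is assumed here; the read counts are invented; and the test is written out with the hypergeometric distribution so that the example needs only the standard library.

```python
from math import comb


def fisher_exact_greater(alt_case, ref_case, alt_ctrl, ref_ctrl):
    """One-sided Fisher's exact test on a 2x2 table of read counts.

    Returns the probability of seeing at least `alt_case` mutant reads in the
    sample if mutant reads were distributed at random between the sample and
    the pooled comparison reads (hypergeometric null).
    """
    n_case = alt_case + ref_case              # reads in the inspected sample
    n_alt = alt_case + alt_ctrl               # mutant reads overall
    n_total = n_case + alt_ctrl + ref_ctrl    # all reads
    denom = comb(n_total, n_case)
    upper = min(n_case, n_alt)
    numer = sum(
        comb(n_alt, k) * comb(n_total - n_alt, n_case - k)
        for k in range(alt_case, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 4 mutant reads out of 2,000 in the sample versus
# 0 mutant reads out of 40,000 pooled reads from controls and other plasmas.
p_example = fisher_exact_greater(alt_case=4, ref_case=1996, alt_ctrl=0, ref_ctrl=40000)
```

Under these invented counts the p-value falls well below the 0.05 threshold used in the study, so such a variant would be retained.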

Statistical analyses and data visualization

We performed statistical analyses and plotted data with R (https://www.R-project.org/). Fisher's exact test or the Chi-square test was applied, as appropriate, both to test associations between clinicopathological variables and plasma sequencing data and in the sequencing data analyses. The Wilcoxon test was applied to test for differences in sequencing coverage between cases and controls (Additional file 1: Figure S1). The threshold for statistical significance was established at p < 0.05. Sensitivity, specificity and PPV values were calculated using the caret v6.0.93 package. The oncoplot function from the maftools[15] v2.12.0 package was used to plot mutations and clinicopathological data.

The genetic landscape in tumours and ctDNA of localized BC patients

A total of 75 early-stage BC patients were recruited for the study after obtaining a suspicious mammogram result (BIRADS 4C/5). For all of them, a blood sample was taken prior to any medical intervention. In 71 cases, a diagnostic pre-treatment core needle solid biopsy was also available. These BC patients were recruited between 2016 and 2018 and remain in clinical follow-up, with a median follow-up of 4.36 years (Table 1).

Table 1

Clinicopathological characteristics of the localized BC patients included in the study.
Clinical characteristics	n (%)
Diagnostic age (years)
30–50	13 (17.6)
> 50	61 (82.4)
Tumour type
IDC	59 (79.7)
DCIS	5 (6.8)
ILC	3 (4.1)
PC	1 (1.4)
TC	3 (4.1)
MC	3 (4.1)
Tumour size
< 2cm	32 (43.2)
2-5cm	37 (50.0)
> 5cm	5 (6.8)
Tumour grade
I	15 (20.3)
II	37 (50.0)
III	22 (29.7)
Axillary lymph node
Positive	28 (37.8)
Negative	38 (51.4)
Unknown	8 (10.8)
Estrogen receptor
Positive	66 (89.2)
Negative	7 (9.5)
Unknown	1 (1.4)
Progesterone receptor
Positive	56 (75.7)
Negative	17 (23.0)
Unknown	1 (1.4)
HER2 status
Positive	6 (8.1)
Negative	67 (90.5)
Unknown	1 (1.4)
BIRADS category
4/B/C	40 (54.1)
5C	33 (44.6)
Unknown	1 (1.4)
Clinical relapse
Yes	8 (10.8)
No	64 (86.5)
Unknown	2 (2.7)
IDC, invasive ductal carcinoma; ILC, invasive lobular carcinoma; DCIS, ductal carcinoma in situ; PC, papillary carcinoma; TC, tubular carcinoma; MC, mucinous carcinoma.

A custom capture panel composed of the exonic regions of 33 genes involved in BC pathogenesis (see materials and methods) was employed to characterize the mutational landscape of 71 pre-treatment solid biopsies and 75 plasma samples taken from the corresponding patients before any procedure; 4 plasma samples had no matching tumour sample and 1 tumour sample had no corresponding plasma. Firstly, the tumour DNA (N = 71) was sequenced using the Agilent SureSelect^XTHS technology, following protocol recommendations as previously reported[16]. Tumour sequencing was performed at 15,483X median coverage (Additional file 1: Figure S2). Subsequent bioinformatic processing utilizing UMIs to minimize sequencing errors provided a final median coverage of 1,698X (Additional file 1: Figure S2). Amongst the captured regions, only 3 were covered at less than 100X in more than 10% of the sequenced bases (Additional file 2: Table S3). Amongst these regions, only one presented mutations in the TCGA BC database in 0.09 and 0.27% of the total samples (Additional file 2: Table S3). In addition, all genes presented homogeneous coverage across samples (Additional file 1: Figure S3). Next, custom filtering was performed using information from public genomic databases to identify somatic mutations (see materials and methods). Overall, 61 mutations were identified in 40/71 (56.33%) of the tumour samples. Amongst them, 33 were located in the PIK3CA gene (54.09%), 12 in TP53 (19.67%) and 4 in GATA3 (6.55%) (Table 2, Additional file 1: Figure S4, Additional file 2: Table S4), representing the most frequently mutated genes in our tumour set.

Table 2

Mutations detected in tumour and plasma samples.
Sample	Gene	Nucleotide change	Aa change	Tumour VAF (%)	Plasma: caller detected, No (N)/Yes (Y)	Plasma: manually detected, No (N)/Yes (Y)	Plasma VAF (%)
001MS	PIK3CA	c.G3145C	p.G1049R	14.6	N	N	-
002MS	CDH1	c.C2245T	p.R749W	5.3	N	N	-
	TP53	c.G524A	p.R175H	54.5	N	Y	0.2
	PIK3CA	c.G1252A	p.E418K	35.0	N	N	-
	PIK3CA	c.A3140T	p.H1047L	34.4	N	N	-
007MS	PIK3CA	c.A3140G	p.H1047R	31.2	Y	N	0.4
009MS	TP53	c.C637T	p.R213X	45.2	Y	N	0.8
010MS	PIK3CA	c.A3140G	p.H1047R	15.0	N	N	-
014MS	PIK3CA	c.A3140G	p.H1047R	25.9	N	N	-
015MS	GATA3	c.922-3_922-2delCA	p.X308_splice	22.4	N	Y	0.08
015MS	TP53	c.A377G	p.Y126C	58.2	N	Y	0.24
016MS	TP53	c.G743T	p.R248L	37.9	Y	N	4
	SMAD4	c.C725G	p.S242X	23.7	Y	N	3.2
	PIK3CA	c.A3140T	p.H1047L	35.1	Y	N	4.6
017MS	PIK3CA	c.A3140G	p.H1047R	27.3	N	N	-
021MS	TP53	c.376-2A>G	p.X126_splice	33.4	Y	N	1.8
022MS	KDM6A	c.C1747T	p.Q583X	12.5	N	N	-
023MS	TP53	c.G524A	p.R175H	63.0	N	N	-
023MS	PIK3CA	c.G1633A	p.E545K	39.3	N	N	-
030MS	GATA3	c.922-3_922-2delCA	p.X308_splice	38.7	N	N	-
030MS	PIK3CA	c.G1633A	p.E545K	77.0	N	N	-
031MS	PIK3CA	c.G1633A	p.E545K	15.7	N	N	-
032MS	PIK3CA	c.G353A	p.G118D	6.9	N	N	-
	PIK3CA	c.G2908A	p.E970K	14.7	N	N	-
	PIK3CA	c.A3140G	p.H1047R	11.4	N	N	-
	PIK3CA	c.A3140T	p.H1047L	3.0	N	N	-
033MS	TP53	c.A503T	p.H168L	37.6	Y	N	0.37
035MS	PIK3CA	c.G1633A	p.E545K	34.8	N	N	-
036MS	NF1	c.3478delG	p.G1160Vfs*6	5.5	N	N	-
036MS	PIK3CA	c.A3140T	p.H1047L	31.5	N	N	-
039MS	AKT1	c.G49A	p.E17K	33.7	N	N	-
039MS	NCOR1	c.G6751T	p.G2251C	10.3	N	N	-
040MS	PIK3CA	c.G1093A	p.E365K	21.7	N	N	-
040MS	PIK3CA	c.G1624A	p.E542K	40.0	N	N	-
044MS	KRAS	c.G35C	p.G12A	29.3	Y	N	0.97
044MS	TP53	c.G587C	p.R196P	51.2	Y	N	1.2
045MS	AKT1	c.G49A	p.E17K	6.9	N	N	-
047MS	PIK3CA	c.A3140G	p.H1047R	7.2	N	N	-
052MS	PIK3CA	c.A1637G	p.Q546R	19.8	N	N	-
052MS	PIK3CA	c.A3073G	p.T1025A	21.6	N	N	-
056MS	PIK3CA	c.G1624A	p.E542K	17.9	N	N	-
057MS	PIK3CA	c.G1633A	p.E545K	32.1	N	N	-
060MS	MAP3K1	c.813_814del	p.R273Sfs*27	11.6	N	N	-
064MS	PIK3CA	c.T1035A	p.N345K	34.9	N	N	-
065MS	GATA3	c.922-3_922-2delCA	p.X308_splice	23.3	N	N	-
065MS	PIK3CA	c.A3140G	p.H1047R	25.9	N	N	-
066MS	ERBB2	c.G2305T	p.D769Y	23.4	N	N	-
066MS	PIK3CA	c.G1624A	p.E542K	27.5	N	N	-
067MS	TP53	c.A842C	p.D281A	51.8	Y	N	0.31
067MS	PIK3CA	c.G3145C	p.G1049R	86.2	Y	N	0.32
079MS	TP53	c.C742T	p.R248W	9.1	N	Y	0.05
079MS	PIK3CA	c.A1637G	p.Q546R	11.9	N	N	-
080MS	PIK3CA	c.G1633A	p.E545K	26.4	N	N	-
081MS	PTEN	c.T406C	p.C136R	54.0	Y	N	3.3
081MS	TP53	c.G743A	p.R248Q	52.9	Y	N	1.5
093MS	PIK3CA	c.A1634G	p.E545G	29.1	N	N	-
095MS	PIK3CA	c.A3140T	p.H1047L	52.0	N	N	-
099MS	PIK3CA	c.A3140T	p.H1047L	28.6	N	N	-
101MS	SF3B1	c.A2098G	p.K700E	20.9	N	N	-
104MS	TP53	c.A715G	p.N239D	22.2	N	N	-
107MS	GATA3	c.922-3_922-2delCA	p.X308_splice	40.1	N	Y	0.03
Aa, amino acid; VAF, variant allele frequency.

To investigate the concordance between the mutations found in tumours and in plasma, the custom capture panel was also applied to plasma DNA (N = 75). Plasma sequencing reached 17,704X median coverage (Additional file 1: Figure S2). In total, 74 plasma samples from the patients were successfully sequenced, 4 of them without tumour tissue available; one plasma sample failed in the sequencing process. After UMI processing, the median coverage was 2,525X (Additional file 1: Figure S3). Amongst the sequenced gene regions, 3 presented low coverage and all genes showed homogeneous coverage (Additional file 1: Figure S3, Additional file 2: Table S3). Amongst these low-coverage regions, mutations were observed in 2 of them in the TCGA BC database, identified in 0.09% and 0.27% of the total samples (Additional file 2: Table S3). After bioinformatic analyses using an established mutation caller (see materials and methods), 13/61 (21.31%) of the mutations identified in tumours were also found in the corresponding plasma samples; TP53 (7 mutations, 53.84%) and PIK3CA (3 mutations, 23.07%) were the most frequently mutated genes (Fig. 2, Additional file 1: Figure S5, Table 2 and Additional file 2: Table S5).

Additionally, all mutations previously identified in tumours were manually inspected in the plasma sequencing raw data. Aligned data were used to identify supporting reads for the variant alleles using the IGV software (see materials and methods). Mutations found in at least 2 reads with different genomic coordinates passed to the next analysis step, as previously recommended[16]. To consider the variants as valid, a Fisher's exact test was applied using sequencing data from 22 healthy control plasma samples and from patients' plasma samples without the mutation (Additional file 2: Table S2). In total, 5 mutations from 4 different patients were rescued from plasma sequencing using manual inspection (see materials and methods, Table 2 and Additional file 2: Table S5). Amongst them, 3 mutations were located in the TP53 gene and 2 in GATA3. Interestingly, the recovery by manual inspection of the 2 GATA3 deletions, which had robust sequencing statistics, illustrates the difficulties some callers have in identifying indels. Considering the variants detected both by the caller and by manual inspection, 18/61 (29.50%) variants found in tumour tissue were also discovered in plasma samples (Fig. 2, Table 2 and Additional file 2: Table S5).

Panel utility for BC detection using a non-tumour informed pipeline and association with clinicopathological variables

To investigate the capacity of our next generation sequencing (NGS) pipeline to non-invasively detect BC after suspicious mammograms, a non-tumour informed bioinformatic analysis was developed. In this analysis, the somatic mutation information from solid biopsies was disregarded and only Mutect2 was employed to detect mutations in plasma samples, requiring a minimum of 1 read per UMI family and applying no filter on variant allele frequencies (VAFs) (see materials and methods). Variants were considered as shed by the tumour if i) they affected exonic regions, ii) they were annotated in the COSMIC, TCGA BC and TCGA (all cancer types) databases and iii) there were variant-supporting reads aligned at 2 or more different genomic coordinates, as manually visualized using the IGV software. Following these criteria, 25/74 (33.78%) individuals presented tumour mutations detected in their plasma (Fig. 2); 16 of the mutations were not observed in the previous tumour-informed analysis (Additional file 2: Table S6). Amongst them, a new mutation was observed in the TP53 gene in sample 081MS, different from the one detected in the tumour sequencing (Table 2, Fig. 2, Additional file 2: Table S6). Additionally, ctDNA mutations were found in 8 plasma samples whose corresponding tumour biopsies showed no mutations (Fig. 2, Additional file 2: Table S6). Finally, a mutation was found in one plasma sample with no tumour tissue available (Additional file 2: Table S6). Overall, amongst the 25 different plasma mutations, TP53 (13 mutations, 52%), PIK3CA (3 mutations, 12%) and GATA3 (3 mutations, 12%) were the most frequently affected genes (Table 2, Additional file 2: Table S6).
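The three acceptance criteria can be written as a small predicate. This is a sketch only: the record layout, key names and function name are hypothetical, criterion ii) is read here as requiring annotation in all three resources, and in the study the fragment coordinates were checked by eye in IGV rather than programmatically.

```python
def is_tumour_derived(variant):
    """Apply criteria i)-iii) to one plasma variant call.

    `variant` is a hypothetical dict with keys:
      exonic               - True if the variant affects an exonic region
      in_cosmic            - annotated in COSMIC
      in_tcga_bc           - annotated in TCGA breast cancer
      in_tcga_pancancer    - annotated in TCGA across all cancer types
      supporting_fragments - list of (start, end) coordinates of the
                             fragments carrying the variant
    """
    distinct_fragments = set(map(tuple, variant["supporting_fragments"]))
    return bool(
        variant["exonic"]
        and variant["in_cosmic"]
        and variant["in_tcga_bc"]
        and variant["in_tcga_pancancer"]
        and len(distinct_fragments) >= 2
    )


# Hypothetical calls: the second is supported only by fragments with
# identical coordinates, so it fails criterion iii).
accepted = is_tumour_derived({
    "exonic": True, "in_cosmic": True, "in_tcga_bc": True,
    "in_tcga_pancancer": True,
    "supporting_fragments": [(7674200, 7674360), (7674180, 7674330)],
})
rejected = is_tumour_derived({
    "exonic": True, "in_cosmic": True, "in_tcga_bc": True,
    "in_tcga_pancancer": True,
    "supporting_fragments": [(7674200, 7674360), (7674200, 7674360)],
})
```

Requiring distinct fragment coordinates guards against counting PCR duplicates of a single original molecule as independent evidence.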

Then, 22 plasma samples from healthy individuals were sequenced with the same sequencing panel, protocol conditions and coverage as the plasma samples from patients (Wilcoxon test p-value = 0.7112) (Additional file 1: Figure S6). After applying the same bioinformatic pipeline as for BC cases, mutations were found in the plasma of 3/22 (13.63%) controls (see materials and methods, Additional file 1: Figure S6). One mutation affected the MAP3K1 gene (p.N1125D) and has been described in the COSMIC database in one breast cancer tumour sample; one mutation was located in the ERBB2 gene (p.V842I) and has been observed substantially more frequently in colon and endometrial cancers; and an additional one was found in the SMAD4 gene (p.R361H), which is also remarkably frequent in colon adenocarcinoma and pancreatic cancer (Additional file 2: Table S7).

Considering our findings, the employment of the custom capture panel together with an ultra-deep sequencing and a custom non-tumour informed bioinformatic analysis led to a sensitivity of 31.08% (95% CI: 20.83–42.90%), a specificity of 86.36% (95% CI: 65.09–97.09%) and a positive predictive value (PPV) of 88.46% (95% CI: 71.75–95.86%) for breast cancer detection in our cohort.
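The point estimates follow from a 2×2 table of test result against disease status. The sketch below shows the arithmetic only; the study used the caret package in R. The cell counts are an assumption: they are not stated in the text and were chosen as the values that reproduce the reported percentages given 74 sequenced patient plasmas and 22 controls.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and positive predictive value from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among cases
        "specificity": tn / (tn + fp),  # negatives among controls
        "ppv": tp / (tp + fp),          # cases among positives
    }


# Assumed counts (back-calculated, not taken from the text):
# 23 true positives and 51 false negatives among 74 cases,
# 3 false positives and 19 true negatives among 22 controls.
metrics = diagnostic_metrics(tp=23, fn=51, fp=3, tn=19)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

Note that PPV depends on the ratio of cases to controls in the cohort, so the value obtained in a case-enriched series would not carry over directly to a screening population.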

The association of clinicopathological variables with mutation detection in plasma was also investigated. In detail, ctDNA positivity in plasma, the median allele frequency (AF) of the mutations, the number of mutations per sample and the presence of TP53 mutations were studied for their association with clinical characteristics (Additional file 2: Table S8). Overall, higher median AF was associated with higher tumour grade (p = 0.0463), the presence of more than 1 mutation in plasma was associated with likelihood of clinical relapse (p = 0.0237) and TP53 mutations in plasma were more frequently observed in hormone receptor (HR)-negative tumours (estrogen receptor (ER)-negative, p = 0.0316; progesterone receptor (PR)-negative, p = 0.0257). Additionally, the association between clinical relapse and plasma mutations with high median AF, defined as mutations with > 0.05% in AF, was close to significance (p = 0.059) (Additional file 2: Table S8). Of note, 38.35% of the patients included herein were asymptomatic and diagnosed by the BC early detection program. Amongst them, 28.57% presented plasma mutations, a percentage similar to the 33.33% of symptomatic women with mutations.

In this study, we described the utility of a novel custom capture panel used together with ultra-deep sequencing to detect ctDNA in pre-treatment plasma samples from localized BC patients. We aimed to study i) the correlation of detected variants between tumour tissue and plasma and ii) the efficacy of the panel to detect ctDNA as a biomarker for BC in undiagnosed patients. To our knowledge, this is the first time that a similar technology has been employed in plasma samples from early BC patients both to correlate genetic landscapes between tumour and plasma and to detect BC in women with suspicious mammograms. Previous studies have tried to use amplification-based NGS technologies alone[6] or in combination with other blood-circulating components, with limited results in BC[8]. In addition, the methodology used herein has demonstrated its capability to detect minute amounts of mutant DNA, although it had never been employed in plasma samples from localized cancers to date[16].

Firstly, we performed ultra-deep sequencing in tumour DNA and the corresponding plasma to correlate the mutational landscape. We observed a concordance of 29.50% between the mutations found in tissue and plasma. Previous studies have shown similar results using amplification methodologies but studying a remarkably smaller number of genes, limiting the tumour genetic information inferred from them[17]. In addition, we have developed a custom bioinformatic pipeline to detect ctDNA mutations in plasma missed by an automatic variant caller. The same technology and sequencing depth have been tested previously demonstrating a robust variant identification around a VAF of 0.15% and less efficient detection of variants down to 0.075%[16]. Herein, we increased the detection sensitivity by identifying variants below 0.075% using a different sequencing platform and a custom bioinformatic pipeline (Table 2, materials and methods). Importantly, we also found mutations in 8 plasma samples whose corresponding tumours bore no detectable mutations (Additional file 2: Table S6). This observation highlights the tumour heterogeneity as well as the commonly mentioned liquid biopsy’s capacity to provide a more complete tumour genetic landscape as compared to solid biopsy, which is limited by the tumour tissue captured by core needles[18, 19].

In addition, we explored the clinical validity of the panel in detecting BC in women with suspicious BIRADS 4c and 5 lesions in the mammograms. We developed a non-tumour informed pipeline using the plasma DNA sequencing of our series of patients as well as 22 plasma samples from women who enrolled into the study with suspicious mammograms but were eventually not diagnosed with BC. We observed high specificity (86.36%) but relatively low sensitivity (31.08%) in identifying individuals affected by BC. These findings highlight the difficulties in detecting ctDNA in localized BC even in pre-treatment blood samples with a demonstrated limit of detection down to 3 mutant molecules in 10,000 wild-type (Table 2). Concordant results were reported in other studies utilizing different technologies such as droplet-digital PCR[3, 10, 20]. However, the high specificity observed using our methodology, with a remarkably high PPV of 88.46%, remains noteworthy. Another study showed the possibility of detecting ctDNA in localized BC at lower sensitivity but requiring tumour information to design patient-specific NGS panels[7]. Another recent study explored methodologies for BC screening using liquid biopsy and NGS panels together with UMIs. Importantly, tumour genetic information was also necessary therein to design patient-specific panels and the authors only detected ctDNA in 14.1% of the pre-treatment plasma samples from early BC patients[6]. Similarly, a seminal study investigated the utility of a pan-cancer high-sensitivity NGS technology, together with circulating biomarkers, for the early detection of 8 tumour types. Strikingly, the sensitivity to detect localized BC in this study was similar to the one shown herein, but it required the addition of other circulating biomarkers such as proteins[8]. Moreover, it is important to highlight that the set of localized BC patients included in the mentioned study had higher tumour grade than the ones studied here. This might have enhanced the probability of ctDNA detection in the plasma samples.
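A back-of-the-envelope calculation helps to show why an allele frequency of 0.03% sits close to a physical sampling limit. The sketch below is illustrative and rests on assumptions that are not part of the study: a haploid human genome is taken to weigh about 3.3 pg, every input molecule is assumed to be converted into the library and captured, and the number of mutant copies in the input is modelled as Poisson-distributed. The function names are hypothetical.

```python
from math import exp

HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome (pg)


def expected_mutant_copies(input_ng, vaf):
    """Expected number of mutant genome copies in `input_ng` of cfDNA at `vaf`."""
    genome_copies = input_ng * 1000 / HAPLOID_GENOME_PG  # ng -> pg -> copies
    return genome_copies * vaf


def prob_at_least_one(input_ng, vaf):
    """Poisson probability that the input contains at least one mutant copy."""
    return 1 - exp(-expected_mutant_copies(input_ng, vaf))


# Median and minimum plasma DNA inputs reported for BC patients, at AF = 0.03%.
median_copies = expected_mutant_copies(39.78, 0.0003)
median_prob = prob_at_least_one(39.78, 0.0003)
minimum_prob = prob_at_least_one(5.01, 0.0003)
```

Under these assumptions the median input carries only a handful of mutant copies at this allele frequency, and the lowest input is more likely than not to contain none, so part of the limited sensitivity may reflect sampling rather than sequencing.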

Importantly, we could test the association between mutations in plasma and a multitude of the patients' clinicopathological characteristics (Additional file 2: Table S8). We observed statistically significant associations between higher median AF and higher tumour grade, and between plasma TP53 mutations and HR-negative tumours. Of note, we could associate the presence of more than 1 mutation in plasma with the likelihood of clinical relapse, in part thanks to the long clinical follow-up of the patients included in this study. Moreover, it is also important to highlight the observation of a trend in the association between median AF and the patients' clinical relapse (Additional file 2: Table S8). This is one of the first studies suggesting that pre-treatment plasma sequencing could provide information about the clinical outcome in localized BC patients. However, studies including a higher number of patients are required to validate this finding. In addition, the median AF of plasma variants was associated with tumour stage, a finding previously shown[17, 21]. Moreover, the association between more frequent TP53 mutations and HR-negative patients has also been previously demonstrated in plasma sequencing from BC patients[22, 23], where the authors pointed to certain implications for the response to anti-HER2 treatments.

Taken together, our NGS plasma-only workflow showed enhanced capacity to detect ctDNA in localized BC patients at the very first diagnostic stages, improving detection sensitivity and adding evidence that ctDNA could help in the diagnostic process of the asymptomatic population. This is supported by the similar percentages of plasma mutation identification in symptomatic and asymptomatic women, 33.33% and 28.57% respectively. In this regard, we developed a custom bioinformatic pipeline to identify plasma mutations without tumour information, demonstrating high PPV and suggesting that similar approaches could be tested as a screening tool for BC. We also demonstrated that, by sequencing early BC patients' plasma DNA, it is feasible to obtain important information about the disease as well as to predict the clinical outcome in these patients.

BC, breast cancer; ctDNA, circulating tumour DNA; AF, allele frequency; UMIs, unique molecular identifiers; IHC, immunohistochemistry; HER2, human epidermal growth factor receptor 2; HR, hormone receptor; ER, estrogen receptor; PR, progesterone receptor; cfDNA, circulating cell-free DNA; ddPCR, droplet-digital PCR; NGS, next-generation sequencing; VAFs, variant allele frequencies; PPV, positive predictive value.

Ethics approval and consent to participate

The study was approved by the local ethics committee and was performed according to the Good Clinical Practice and the Declaration of Helsinki guidelines. Informed consent was obtained from all women, who were recruited at the “Hospital Universitario Virgen de la Victoria de Málaga” and at the “Hospital Clínico Universitario de Valencia”, before performing any assessment required as per protocol.

Consent for publication

Not applicable

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests

The authors declare no conflict of interest.

Funding information

Comino-Mendez’s contract is funded by the Spanish Association Against Cancer Scientific Foundation (AECC). This study was supported by the “Consejería de Salud y Familias – Junta de Andalucía” (PI-0291-2019), by “Fundación Unicaja”, which funds Alba-Bernal’s contract, and by the Andalusia-Roche Network in Precision Medical Oncology, which also funds Quirós-Ortega’s contract. Carbajosa-Antona’s contract is funded by the “Ayudas María Zambrano para la atracción de talento internacional – Universidad de Málaga”.

The above-mentioned funding bodies had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Authors’ contributions

BJ-R, EA and IC-M conceptualized and supervised the study. EA, BJ-R, MA, LV, GD-C, CH, BB, AJ-P, AL and JP recruited patients and collected and evaluated plasma and tumour samples. IC-M, AA-B, EL-L, MEQ-O, GC, AG-A, MIQ-O, MDR-D and JG-V G-S performed the experiments and analyzed the results. IC-M, EA, JP, BJ-R, AA-B, EL-L and GC designed and prepared the figures and tables. EL-L and GC performed the bioinformatic analyses. EA, IC-M, BJ-R and JP wrote the manuscript with important input from all authors. All authors reviewed and agreed with the content of the manuscript. All authors have read and approved the final version of the manuscript.

Acknowledgments

The authors are thankful to all women who participated in the study. We thank Veronika Mancikova for editing the manuscript.

Gerlinger M, Rowan AJ, Horswell S, Math M, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A et al: Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 2012, 366(10):883–892.
Tie J, Cohen JD, Lahouel K, Lo SN, Wang Y, Kosmider S, Wong R, Shapiro J, Lee M, Harris S et al: Circulating Tumor DNA Analysis Guiding Adjuvant Therapy in Stage II Colon Cancer. N Engl J Med 2022, 386(24):2261–2272.
Garcia-Murillas I, Chopra N, Comino-Mendez I, Beaney M, Tovey H, Cutts RJ, Swift C, Kriplani D, Afentakis M, Hrebien S et al: Assessment of Molecular Relapse Detection in Early-Stage Breast Cancer. JAMA Oncol 2019, 5(10):1473–1478.
Dawson SJ, Tsui DW, Murtaza M, Biggs H, Rueda OM, Chin SF, Dunning MJ, Gale D, Forshew T, Mahler-Araujo B et al: Analysis of circulating tumor DNA to monitor metastatic breast cancer. N Engl J Med 2013, 368(13):1199–1209.
Bettegowda C, Sausen M, Leary RJ, Kinde I, Wang Y, Agrawal N, Bartlett BR, Wang H, Luber B, Alani RM et al: Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci Transl Med 2014, 6(224):224ra24.
Page K, Martinson LJ, Hastings RK, Fernandez-Garcia D, Gleason KLT, Gray MC, Rushton AJ, Goddard K, Guttery DS, Stebbing J et al: Prevalence of ctDNA in early screen-detected breast cancers using highly sensitive and specific dual molecular barcoded personalised mutation assays. Ann Oncol 2021, 32(8):1057–1060.
McDonald BR, Contente-Cuomo T, Sammut SJ, Odenheimer-Bergman A, Ernst B, Perdigones N, Chin SF, Farooq M, Mejia R, Cronin PA et al: Personalized circulating tumor DNA analysis to detect residual disease after neoadjuvant therapy in breast cancer. Sci Transl Med 2019, 11(504).
Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, Douville C, Javed AA, Wong F, Mattox A et al: Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 2018, 359(6378):926–930.
Wolff AC, Hammond ME, Schwartz JN, Hagerty KL, Allred DC, Cote RJ, Dowsett M, Fitzgibbons PL, Hanna WM, Langer A et al: American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. J Clin Oncol 2007, 25(1):118–145.
Garcia-Murillas I, Schiavon G, Weigelt B, Ng C, Hrebien S, Cutts RJ, Cheang M, Osin P, Nerurkar A, Kozarewa I et al: Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci Transl Med 2015, 7(302):302ra133.
Pereira B, Chin SF, Rueda OM, Vollan HK, Provenzano E, Bardwell HA, Pugh M, Jones L, Russell R, Sammut SJ et al: The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat Commun 2016, 7:11479.
Wang K, Li M, Hakonarson H: ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010, 38(16):e164.
Grossman RL, Heath AP, Ferretti V, Varmus HE, Lowy DR, Kibbe WA, Staudt LM: Toward a Shared Vision for Cancer Genomic Data. N Engl J Med 2016, 375(12):1109–1112.
Robinson JT, Thorvaldsdottir H, Wenger AM, Zehir A, Mesirov JP: Variant Review with the Integrative Genomics Viewer. Cancer Res 2017, 77(21):e31-e34.
Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP: Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res 2018, 28(11):1747–1756.
Mansukhani S, Barber LJ, Kleftogiannis D, Moorcraft SY, Davidson M, Woolston A, Proszek PZ, Griffiths B, Fenwick K, Herman B et al: Ultra-Sensitive Mutation Detection and Genome-Wide DNA Copy Number Reconstruction by Error-Corrected Circulating Tumor DNA Sequencing. Clin Chem 2018, 64(11):1626–1635.
Rodriguez BJ, Cordoba GD, Aranda AG, Alvarez M, Vicioso L, Perez CL, Hernando C, Bermejo B, Parreno AJ, Lluch A et al: Detection of TP53 and PIK3CA Mutations in Circulating Tumor DNA Using Next-Generation Sequencing in the Screening Process for Early Breast Cancer Diagnosis. J Clin Med 2019, 8(8).
Alba-Bernal A, Lavado-Valenzuela R, Dominguez-Recio ME, Jimenez-Rodriguez B, Queipo-Ortuno MI, Alba E, Comino-Mendez I: Challenges and achievements of liquid biopsy technologies employed in early breast cancer. EBioMedicine 2020, 62:103100.
Dang DK, Park BH: Circulating tumor DNA: current challenges for clinical utility. J Clin Invest 2022, 132(12).
Olsson E, Winter C, George A, Chen Y, Howlin J, Tang MH, Dahlgren M, Schulz R, Grabau D, van Westen D et al: Serial monitoring of circulating tumor DNA in patients with primary breast cancer for detection of occult metastatic disease. EMBO Mol Med 2015, 7(8):1034–1047.
Zhou Y, Xu Y, Gong Y, Zhang Y, Lu Y, Wang C, Yao R, Li P, Guan Y, Wang J et al: Clinical factors associated with circulating tumor DNA (ctDNA) in primary breast cancer. Mol Oncol 2019, 13(5):1033–1046.
Yi Z, Ma F, Rong G, Guan Y, Li C, Xu B: Clinical spectrum and prognostic value of TP53 mutations in circulating tumor DNA from breast cancer patients in China. Cancer Commun (Lond) 2020, 40(6):260–269.
Liu B, Yi Z, Guan Y, Ouyang Q, Li C, Guan X, Lv D, Li L, Zhai J, Qian H et al: Molecular landscape of TP53 mutations in breast cancer and their utility for predicting the response to HER-targeted therapy in HER2 amplification-positive and HER2 mutation-positive amplification-negative patients. Cancer Med 2022, 11(14):2767–2778.
