Patient material
In this study, 23 non-BRCA1/BRCA2 patients from families with a strong history of breast cancer, previously included in a study predicting BRCAness (12), were selected on the basis that matched tumour and blood samples were available. Inclusion criteria were 1) a pedigree indicating monogenic inheritance of breast cancer predisposition, or 2) presence of ovarian cancer in pedigrees with breast cancer cases, or 3) a very young age at diagnosis of breast cancer (<30 years). Furthermore, four BRCA1 and three BRCA2 patients carrying a pathogenic BRCA1/BRCA2 variant with unknown family history were selected as controls for BRCAness classification. All tumour tissues were fresh-frozen primary breast tumours collected between 1982 and 2008 in Odense and had been stored in the tumour biobanks of the Department of Pathology, Odense University Hospital, and the Danish Breast Cancer Cooperative Group (DBCG). Immunohistochemistry (IHC) data on estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status were received from DBCG. Where ER, PR or HER2 status had not been determined by the pathological review, it was estimated from the gene expression levels of ESR1, PGR and ERBB2, respectively. PAM50 subtypes were also assigned to all samples from the gene expression data (Supplementary Table S1).
Family risk from BOADICEA breast cancer estimation model
The Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) was used to validate the increased risk of breast cancer in the patients based on their family history (14). Five patients did not show increased risk of breast cancer according to BOADICEA but were still included due to either early-onset breast cancer, bilateral breast cancer, multiple breast or ovarian cancers in the family, or a combination of those (Supplementary Figure S1, Supplementary Table S1).
Whole-genome sequencing (WGS)
Sample preparation was performed using the Illumina TruSeq Nano protocol with a 550 bp insert length to strengthen detection of structural variants. Samples were sequenced on an Illumina NovaSeq 6000 with 2 × 150 bp paired-end reads. The average sequencing coverage was 50.2X for tumour samples and 38.5X for normal samples (Supplementary Table S2).
Gene expression
Gene expression analysis was performed using a customized version of the Agilent SurePrint G3 Human GE 8x60K Microarray, and raw data were pre-processed as previously described (11). Microarray data have been deposited in the Gene Expression Omnibus (GSE49481).
Alignment of WGS data
The paired-end reads resulting from the sequencing were aligned to the human reference genome (GRCh37) using BWA-MEM v0.7.17. The specific version used can be found in the cgpmap-3.0.4 docker image (https://dockstore.org/containers/quay.io/wtsicgp/dockstore-cgpmap:3.0.4).
Processing of WGS data
The whole-genome sequencing data were processed using the same bioinformatic pipeline as in Nik-Zainal et al. (20).
CaVEMan (Cancer Variants Through Expectation Maximization: http://cancerit.github.io/CaVEMan/) was used for calling somatic and germline single nucleotide variants (SNVs). A lightly modified version of Pindel 2.0 (http://cancerit.github.io/cgpPindel/) was used for calling somatic and germline insertions and deletions (indels).
BRASS (BReakpoint AnalySiS: https://github.com/cancerit/BRASS) was used to detect rearrangements and other structural variants.
The Battenberg algorithm (https://github.com/cancerit/cgpBattenberg) was used for the detection of copy number variation in matched tumour-normal samples.
The specific versions of the tools used are found in the cgpwgs-2.1.0 docker image (https://dockstore.org/containers/quay.io/wtsicgp/dockstore-cgpwgs:2.1.0).
Filtering variants
Germline variants
Germline variants were filtered using a candidate gene list of 170 pathogenic and likely pathogenic germline variants associated with hereditary cancer (21). Filters were then applied to keep only frameshift, splice-site, and nonsynonymous variants with strong bioinformatic prediction and with a frequency <0.01 according to gnomAD and ExAC (22). The variants were evaluated using the variant databases ClinVar and HGMD, and six missense variant predictors implemented in VarSeq. Loss-of-function (protein-truncating) and splice variants, variants with strong bioinformatic prediction, and variants in genes associated with breast cancer risk with an odds ratio above two (23) were selected for further investigation.
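The gene, consequence and frequency filters described above can be sketched as a simple predicate. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, its field names, the toy gene set and all frequencies are hypothetical, the candidate list is read here as a set of gene symbols, and the frequency threshold is assumed to apply to both gnomAD and ExAC. The "strong bioinformatic prediction" criterion is not modelled.

```python
from dataclasses import dataclass

# Consequence classes retained by the filter described in the text.
KEPT_CONSEQUENCES = {"frameshift", "splice_site", "nonsynonymous"}
MAX_POPULATION_FREQUENCY = 0.01  # threshold stated in the text


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative only."""
    gene: str
    consequence: str
    gnomad_af: float  # population allele frequency, 0.0 if absent
    exac_af: float


def passes_germline_filter(variant, candidate_genes):
    """Return True if the variant survives the gene, consequence and frequency filters."""
    return (
        variant.gene in candidate_genes
        and variant.consequence in KEPT_CONSEQUENCES
        and variant.gnomad_af < MAX_POPULATION_FREQUENCY
        and variant.exac_af < MAX_POPULATION_FREQUENCY
    )


# Toy input; genes are taken from the text, frequencies are invented.
candidate_genes = {"CHEK2", "TP53", "MSH6"}
variants = [
    Variant("CHEK2", "frameshift", 0.002, 0.003),   # kept
    Variant("MSH6", "synonymous", 0.0001, 0.0001),  # wrong consequence class
    Variant("TP53", "nonsynonymous", 0.05, 0.04),   # too common
    Variant("TTN", "frameshift", 0.0, 0.0),         # not a candidate gene
]
kept = [v for v in variants if passes_germline_filter(v, candidate_genes)]
```

In this toy input only the CHEK2 frameshift variant remains in `kept`; the database and predictor review described in the text would follow as a separate, manual step.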
Somatic variants
Somatic variants were filtered using the default settings of the tools in the bioinformatic pipeline. Somatic driver mutations were identified by filtering the list of somatic variants for the driver genes previously identified in 560 breast cancers using identical criteria for reporting a driver event as in (20).
Polygenic risk score
We applied the 313-SNP polygenic risk score (PRS313) developed for breast cancer risk prediction (13), as incorporated in the latest version of BOADICEA (14), to predict the risk of developing breast cancer for each individual in our cohort under the assumption that they had not already developed breast cancer.
Mutational signatures
We applied a mathematical model (10) implemented in the Signature Tools Lib R package (24) (https://github.com/Nik-Zainal-Group/signature.tools.lib) to fit the substitution and rearrangement signatures imprinted in the breast cancer genomes. First, a catalogue of substitutions and rearrangements was created for each sample. Each catalogue was then fitted, using bootstrapping for robustness, to the twelve substitution and six rearrangement signatures previously identified (20).
Stratification of tumours using unsupervised hierarchical clustering
Unsupervised hierarchical clustering with Euclidean distance and Ward’s linkage criterion (ward.D2 in the statistical programming language R) was used to stratify the breast cancer tumours. We incorporated both substitution and rearrangement signatures in the clustering. To make the two signature types comparable, we normalised them to correct for the fact that cancer genomes often carry more substitution than rearrangement signatures, which would otherwise give higher weight to the rearrangement signatures in the clustering. Proportions of signatures were normalised by dividing all substitution and rearrangement signature proportions by the highest proportion identified in their respective mutation categories.
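The normalisation step can be illustrated with a small sketch. It assumes that "the highest proportion identified" means the largest value observed for any signature of a category in any sample; the sample identifiers, signature names and proportions are hypothetical. The clustering itself (performed in R with ward.D2 according to the text) is not reproduced; the sketch stops at the Euclidean distance that such a clustering would operate on.

```python
import math


def normalise_by_category_max(samples, signature_names):
    """Divide every proportion by the largest proportion seen for any
    signature of this category in any sample (assumed interpretation)."""
    highest = max(sample[name] for sample in samples.values() for name in signature_names)
    if highest == 0:
        # No signature of this category detected anywhere; keep zeros.
        return {sid: {name: 0.0 for name in signature_names} for sid in samples}
    return {
        sid: {name: sample[name] / highest for name in signature_names}
        for sid, sample in samples.items()
    }


# Hypothetical signature proportions for two tumours.
substitution = {
    "T1": {"SBS_a": 0.50, "SBS_b": 0.25},
    "T2": {"SBS_a": 0.10, "SBS_b": 0.40},
}
rearrangement = {
    "T1": {"RS_a": 0.80},
    "T2": {"RS_a": 0.20},
}

sub_norm = normalise_by_category_max(substitution, ["SBS_a", "SBS_b"])
rea_norm = normalise_by_category_max(rearrangement, ["RS_a"])

# Concatenate both categories into one feature vector per tumour.
features = {
    sid: list(sub_norm[sid].values()) + list(rea_norm[sid].values())
    for sid in substitution
}

# Euclidean distance between the two tumours on the normalised scale.
distance = math.dist(features["T1"], features["T2"])
```

After normalisation the largest value in each category equals 1, so both categories span the same range before distances are computed.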
BRCAness: HRDetect and our RNA classifier
The HRDetect model for detection of BRCA1/BRCA2-deficient tumours (7) was applied to the patient cohort. The HRDetect model incorporates information from substitution and rearrangement signatures, the HRD score and deletions with microhomology, and computes the probability of each tumour being BRCA1/BRCA2-deficient. We used the HRDetect model implemented in the Signature Tools Lib R package (24).
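To illustrate how several genomic features can be combined into a single probability, the sketch below assumes a logistic form for the model. The feature names, weights, intercept and input values are all hypothetical placeholders; they are not the published HRDetect coefficients and the function is not the Signature Tools Lib implementation.

```python
import math


def logistic_probability(features, weights, intercept):
    """Logistic model: P = 1 / (1 + exp(-(intercept + sum_i w_i * x_i)))."""
    linear = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical weights and intercept, NOT the published HRDetect values.
weights = {
    "substitution_signature": 1.0,
    "rearrangement_signature": 1.0,
    "hrd_score": 0.5,
    "microhomology_deletions": 2.0,
}
intercept = -3.0

# Hypothetical, already standardised feature values for two tumours.
deficient_like = {
    "substitution_signature": 1.5,
    "rearrangement_signature": 1.0,
    "hrd_score": 2.0,
    "microhomology_deletions": 1.5,
}
proficient_like = {
    "substitution_signature": -0.5,
    "rearrangement_signature": -0.5,
    "hrd_score": -1.0,
    "microhomology_deletions": -0.5,
}

p_high = logistic_probability(deficient_like, weights, intercept)
p_low = logistic_probability(proficient_like, weights, intercept)
```

With these placeholder values the first tumour receives a probability close to 1 and the second a probability close to 0, which is the kind of separation such a model is intended to produce.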
We included the BRCAness classification from our in-house RNA classifier published in an earlier study (12). The RNA classifier was developed to classify basal-like and LumB subtype tumours: basal-like tumours can be classified as either BRCA1-like or non-BRCA1-like, and LumB subtype tumours can be classified as either BRCA2-like or non-BRCA1/BRCA2-like. Other subtypes are not yet supported. Molecular subtypes were identified using PAM50 as previously described (11).
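The subtype-dependent label sets of the RNA classifier can be summarised as a lookup. This sketch covers only the routing by PAM50 subtype, not the classifier itself; the subtype keys and the function name are hypothetical.

```python
# Possible output labels per PAM50 subtype, as described in the text.
LABELS_BY_SUBTYPE = {
    "Basal": ("BRCA1-like", "non-BRCA1-like"),
    "LumB": ("BRCA2-like", "non-BRCA1/BRCA2-like"),
}


def possible_labels(pam50_subtype):
    """Return the label pair for a supported subtype, or None if unsupported."""
    return LABELS_BY_SUBTYPE.get(pam50_subtype)


supported = possible_labels("LumB")
unsupported = possible_labels("LumA")  # subtype outside the classifier's scope
```

Returning `None` for unsupported subtypes keeps tumours outside the classifier's scope visibly unclassified instead of forcing a label.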
Detection of promoter methylation
Promoter methylation of the breast cancer predisposition genes BRCA1 and BRCA2 in the patients was assessed in an earlier study using MLPA (12).
Whole-genome profiles and heatmap figures
Breast cancer whole-genome profiles were created using the Signature Tools Lib R package (24) and are presented in Supplementary Figure S3. Heatmaps and stacked figures (Figures 1-3 and Supplementary Figure S1) were created using the ComplexHeatmap R package (25).
Additional information about variant interpretation
We identified very few rare germline variants in known breast cancer candidate genes. In one family, a well-known pathogenic mutation in CHEK2 (26-28) was found together with a high PRS, resulting in a predicted lifetime risk of 57%. In another family, a missense TP53 germline variant, previously shown to be deleterious in a functional assay (29) and accompanied by a somatic second hit in TP53, is very likely to explain the extremely early-onset breast cancer at the age of 29 years. The effect of the loss-of-function mutations identified in the candidate genes FANCD2, RAD51D, SLX4 and MSH6 is less clear.
These variants included loss-of-function variants in CHEK2, FANCD2, RAD51D and SLX4. In addition to the deleterious variant in CHEK2, we identified in another patient an in-frame CHEK2 deletion of unknown significance, c.246_260delCCAAGAACCTGAGGA, previously shown to have intermediate functional impact (27). In another family, two affected members both carried an MSH6 missense variant of unknown significance (VUS), c.1813A>G, p.Thr605Ala, predicted to be deleterious by 5 of 6 bioinformatic predictors. No MMR signatures were identified, indicating that the variant might not be pathogenic.
FANCD2 and SLX4 are well-established Fanconi anemia genes, as are several other breast cancer genes. Nevertheless, mutations in these genes are expected to have low penetrance for breast cancer (23, 30-34). In combination with other genetic risk factors, e.g. a high PRS, such mutations might explain the strong familial phenotype. However, the contribution from the PRS estimated by BOADICEA was minor. Since the included families had pedigrees indicative of a strong pattern of inheritance, other as yet unknown genetic risk factors are likely to play a role in these families.
Our study also indicates that tumours with pathogenic mutations in TP53 and CHEK2, which are associated with DNA-damage signalling and detection of double-stranded breaks, were not classified as BRCA1/BRCA2-deficient by either of the prediction models tested. This confirms findings from earlier studies (7, 35).
The only tumour with a high HRDetect score and no clear BRCA1/BRCA2 inactivating mechanism (germline variant or methylation) had a somatic VUS in BRCA1. The variant is located in exon 11, which, although it contains more than half of the coding region of BRCA1, does not contain reported pathogenic germline missense mutations. The low allele frequency and a high copy-number level in the BRCA1 region call into question whether the variant is causal for the high HRDetect score. Our finding of low BRCAness measured by HRDetect among non-BRCA1/BRCA2 familial cancers indicates a low false positive rate for classification of VUS in this clinically relevant patient group. Although HRDetect has strong potential for classification of VUS, our case illustrates that caution should be taken with this approach.