Altered cfDNA Fragmentation Profile in Hypomethylated Regions as Diagnostic Marker in Breast Cancer

Background: Breast cancer, the most common malignancy in women, has been shown to have both altered plasma cell-free DNA (cfDNA) methylation and altered fragmentation profiles. Nevertheless, detecting both simultaneously for breast cancer diagnosis has never been reported. Moreover, although the fragmentation pattern of cfDNA is determined by nuclease digestion of chromatin, the structure of which may be affected by DNA methylation, whether cfDNA methylation and fragmentation are biologically related remains unclear. Methods: Improved cfMeDIP-seq was used to characterize both cfDNA methylation and fragmentation profiles in 25 plasma samples from healthy individuals and patients with breast cancer. The feasibility of using the cfDNA fragmentation profile in hypo- and hypermethylated regions as a diagnostic marker for breast cancer was evaluated. Results: The mean size of cfDNA fragments ranging from 100 to 220 base pairs (bp) increased from 170.06 bp (Input libraries) to 173.04 bp (IP libraries) in healthy individuals, an increase that was not observed in patients with breast cancer (170.51 to 170.71 bp). Furthermore, the mean size of cfDNA fragments mapped to hypomethylated regions decreased more in patients with breast cancer (4.60 bp; from 172.33 bp in hypermethylated regions to 167.73 bp in hypomethylated regions) than in healthy individuals (2.87 bp; from 174.54 bp in hypermethylated regions to 171.67 bp in hypomethylated regions). The feasibility of using abnormality of the short cfDNA fragments ratio in hypomethylated genomic regions for diagnosis of breast cancer was evaluated in the validation cohort: 7 out of 11 patients were detected as having breast cancer (63.6% sensitivity), whereas no healthy individuals were misclassified (100% specificity). Conclusion: We identified enriched short cfDNA fragments after 5mC-immunoprecipitation (IP) in patients with breast cancer, and demonstrated that the enriched short cfDNA fragments might originate from hypomethylated genomic regions. Furthermore, we demonstrated the feasibility of using a differentially methylated regions (DMRs)-dependent cfDNA fragmentation profile for breast cancer diagnosis.

were recently shown to be shorter than noncancer-derived cfDNA fragments, which leads to the aberrant size distribution of cfDNA fragments in patients with cancer [3,11,18,19]. Furthermore, genome-wide cfDNA fragmentation profiling was reported to achieve 70% sensitivity at 95% specificity as a biomarker for breast cancer diagnosis [3].
These studies suggested that both altered methylation and aberrant fragmentation are present in cancer-derived cfDNA. Conceptually, approaches that detect these changes simultaneously can better differentiate the origin of cfDNA, and thus improve cancer detection efficacy. However, whether methylation and fragmentation of cfDNA are biologically related or occur independently has not been reported and needs to be investigated.
Since cfDNA originates from nuclease digestion of chromatin during multiple cellular processes, including apoptosis, necrosis and active cellular secretion [20], the fragmentation pattern of cfDNA should be closely related to the accessibility of chromatin, which may be affected by epigenetic modification, nucleosome position and the location of the transcription machinery [19][20][21][22][23]. Therefore, we hypothesized that the methylation profile of cfDNA, which has implications for chromatin remodeling, should be related to the fragmentation profile of cfDNA.
In this study, we used the improved cfMeDIP-seq approach to investigate whether the differential methylation of cfDNA in patients with breast cancer is related to the cfDNA fragmentation profile (Fig. 1). We further evaluated the possibility of detecting both methylation and fragmentation of cfDNA to improve the efficacy of breast cancer diagnosis.

Sample collection and cfDNA extraction
Blood samples from patients with breast cancer in the discovery cohort were obtained at the time of treatment at Shenzhen University General Hospital. Blood samples from patients with breast cancer in the validation cohort were obtained at Huazhong University of Science and Technology Union Shenzhen Hospital at the time of diagnosis, before tumor resection or therapy. Blood samples from healthy individuals in the discovery cohort and validation cohort were obtained at the time of routine screening at Shenzhen University General Hospital and The Third People's Hospital of Shenzhen, respectively. This study was approved by the Institutional Review Boards of Shenzhen University General Hospital and Huazhong University of Science and Technology Union Shenzhen Hospital according to established ethical guidelines as outlined in the Declaration of Helsinki. All patients signed an informed consent document approved by the Institutional Review Board before entering the study. Clinical characteristics of all participants in this study are listed in Table S1.
All blood samples were collected in tubes containing EDTA as anticoagulant and processed immediately for plasma isolation. In general, whole blood was first centrifuged at 1000g for 10 min at 4 °C to separate plasma from cellular components, followed by centrifugation at 16000g for 10 min at 4 °C to further purify the plasma. The purified plasma was then stored at -80 °C. cfDNA was extracted from plasma using the MiniMax™ High Efficiency Cell-Free DNA Isolation Kit (Apostle, A17622-250) according to the manufacturer's instructions. The concentration and quality of cfDNA were assessed with the Qubit dsDNA HS Assay kit (Thermo Fisher Scientific, Q32854) and the Bioanalyzer 2100 (Agilent Technologies).
cfMeDIP-seq library construction and sequencing
cfDNA was used for cfMeDIP-seq library preparation with the method described previously, with the following modifications [6,24].
(1) ~10 to 20 ng cfDNA was ligated with a pool of eight unique paired-end Illumina adapters with 8-bp molecular barcodes instead of the NEBNext adapters (NEBNext Multiplex Oligos for Illumina kit, New England BioLabs) (table S6), and the ligation was conducted using the KAPA Hyper Prep kit (KAPA Biosystems, KK8504) according to the manufacturer's instructions.
(4) Input and IP libraries were sequenced at around 0.5× and 5× coverage, respectively.
The specificity of the immunoprecipitation reaction and the fold enrichment ratio in sequencing IP libraries were evaluated using the MagMeDIP kit (Diagenode, C02010021) according to the manufacturer's instructions.

Data processing and analysis
Raw reads of cfMeDIP-seq Input and IP libraries were processed according to the following steps. (1) Each read was labelled with the molecular barcode identified in the leading 8-bp sequences of the R1 and R2 reads, with 1 mismatch allowed, and the molecular barcode sequence was then removed from the raw reads. (2) Illumina sequencing adapters and low-quality sequences were removed with Cutadapt (version 2.10) and Trimmomatic (version 0.39), respectively. (3) Read pairs with an insert size of less than 20 bp were excluded from further analysis. (4) The filtered reads were aligned against the human reference genome (version hg19) using BWA (version 0.7.17-r1188). (5) Only properly paired and uniquely mapped read pairs with a MAPQ score above 13 were kept, and PCR duplicates, defined as read pairs having the same genomic start, end and molecular barcode, were removed. The remaining mapped read pairs in SAM files were converted to BAM format using SAMtools (version 1.7) for further analysis.
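Steps (1) and (5) above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `ReadPair` record, the function names and the first-match rule for barcodes are assumptions made for illustration, and the "uniquely mapped" criterion is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one aligned read pair (illustration only)."""
    chrom: str
    start: int
    end: int
    barcode: str
    mapq: int
    proper_pair: bool


def match_barcode(prefix, known_barcodes, max_mismatches=1):
    """Return the first known barcode within max_mismatches of the prefix.

    Assumption: ties between several close barcodes are resolved by list
    order; the original pipeline may handle them differently.
    """
    for barcode in known_barcodes:
        if len(barcode) != len(prefix):
            continue
        mismatches = sum(a != b for a, b in zip(barcode, prefix))
        if mismatches <= max_mismatches:
            return barcode
    return None


def filter_and_deduplicate(pairs, min_mapq=13):
    """Keep proper pairs with MAPQ above min_mapq, one per duplicate group.

    Duplicates share chromosome, start, end and molecular barcode.
    """
    seen = set()
    kept = []
    for pair in pairs:
        if not pair.proper_pair or pair.mapq <= min_mapq:
            continue
        key = (pair.chrom, pair.start, pair.end, pair.barcode)
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept
```

Including the barcode in the duplicate key means that two distinct cfDNA molecules with identical genomic coordinates are both retained as long as they carry different barcodes.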
cfDNA fragment size analysis
To calculate the fragment size of cfDNA, the BAM file obtained above was first processed with the R package GenomicAlignments (version 1.24.0), and a GRanges object was then generated for calculating the fragment size of each cfDNA molecule with the R package GenomicRanges (version 1.40.0). Density plots illustrating the size distribution of cfDNA fragments were generated with the R package ggplot2 (version 3.3.2). Short cfDNA fragments were defined as having lengths between 100 bp and 150 bp, and long fragments as having lengths between 151 bp and 220 bp, according to a previous study [3]. The short fragments ratio was calculated as the count of short cfDNA fragments mapped to the investigated regions or genomic windows divided by the count of long cfDNA fragments mapped to the same regions or windows in the sequencing library. The input-adjusted short fragments ratio was calculated by dividing the short fragments ratio in the investigated regions or genomic windows by the short fragments ratio across the whole human reference genome (version hg19) in the corresponding Input library. Genome-wide cfDNA fragmentation profiles in Input and IP libraries for participants in the discovery cohort were calculated without GC adjustment according to the methods reported in a previous study [3].
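The two ratios defined above can be written as a short Python sketch. The function names are hypothetical, and fragment lengths are assumed to be supplied as plain lists of integers; the study itself derived them in R from GRanges objects.

```python
def short_fragment_ratio(lengths):
    """Count of short fragments (100-150 bp) divided by long ones (151-220 bp).

    Fragments outside 100-220 bp are ignored.
    """
    short = sum(1 for n in lengths if 100 <= n <= 150)
    long_ = sum(1 for n in lengths if 151 <= n <= 220)
    if long_ == 0:
        raise ValueError("no long fragments: ratio is undefined")
    return short / long_


def input_adjusted_ratio(region_lengths, input_genome_lengths):
    """Ratio in a region divided by the genome-wide ratio of the Input library."""
    return (short_fragment_ratio(region_lengths)
            / short_fragment_ratio(input_genome_lengths))


# Toy lengths, not data from the study.
region = [90, 120, 140, 160, 170, 180, 200, 230]
input_library = [120, 160, 170, 180, 200]
print(short_fragment_ratio(region))
print(input_adjusted_ratio(region, input_library))
```

Raising an error when a window contains no long fragments is a choice made for this sketch; the original analysis may treat such windows differently.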

Identification of differentially methylated regions (DMRs)
For each sample, we computed cfDNA fragment counts per 10-kb non-overlapping window across the human reference genome (version hg19) and filtered out windows with mean counts of less than 10. The R package DESeq2 (version 1.28.1) with default parameters was then used for calling DMRs at padj < 0.05. Hypermethylated and hypomethylated regions were defined as the genomic windows with log2FoldChange > 1 and log2FoldChange < -1, respectively, in patients with breast cancer compared with healthy individuals, and were illustrated in a volcano plot and a heatmap with the R packages ggplot2 (version 3.3.2) and pheatmap (version 1.0.12). Density plots were generated with the R package ggplot2 (version 3.3.2) to show the fragment size distribution of the cfDNA mapped to hypermethylated and hypomethylated regions. Differentially methylated 10-kb windows were further selected for fragmentation analysis according to the following criteria: (1) the selected genomic windows should have at least 20 unduplicated cfDNA fragments in all samples, including patients with breast cancer and healthy individuals; (2) the selected genomic windows should have an input-adjusted short fragments ratio of less than 10 in every sample investigated. For lung cancer samples from another study [25], the same data processing and analysis were used without the deduplication step, and DMRs were called at p-value < 0.05 and |log2FoldChange| > 1.
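The thresholds above can be summarised in a small Python sketch that operates on values already exported from DESeq2. The function names are hypothetical, and treating a missing padj as non-significant is an assumption of this sketch.

```python
def classify_window(padj, log2_fold_change, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Label a 10-kb window as 'hyper', 'hypo' or 'none' (patients vs. healthy)."""
    if padj is None or padj >= padj_cutoff:
        return "none"
    if log2_fold_change > lfc_cutoff:
        return "hyper"
    if log2_fold_change < -lfc_cutoff:
        return "hypo"
    return "none"


def passes_window_filters(fragment_counts, adjusted_ratios,
                          min_fragments=20, max_ratio=10):
    """Apply selection criteria (1) and (2) across all samples for one window.

    fragment_counts: unduplicated fragment count per sample.
    adjusted_ratios: input-adjusted short fragments ratio per sample.
    """
    enough_fragments = all(c >= min_fragments for c in fragment_counts)
    ratios_in_range = all(r < max_ratio for r in adjusted_ratios)
    return enough_fragments and ratios_in_range
```

A window is kept for fragmentation analysis only when `classify_window` returns `"hyper"` or `"hypo"` and `passes_window_filters` returns `True`.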

Diagnostic model for breast cancer detection
To distinguish patients with breast cancer from healthy individuals using fragmentation profiles in DMRs, we calculated the median input-adjusted short fragments ratio in each differentially hypomethylated 10-kb window across healthy individuals (n = 8) as a baseline profile. We then evaluated the Pearson correlation between the fragmentation profile of each participant in the validation cohort and the baseline profile. The cut-off threshold was determined as the correlation value that classified healthy individuals and patients with breast cancer at maximum specificity and sensitivity. Receiver operating characteristic (ROC) curves were used to evaluate the classifier for predicting breast cancer with the R package pROC (version 1.16.2).
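The correlation-based classifier described above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the code used in the study: each profile is assumed to be a list of input-adjusted short fragments ratios over the same ordered set of hypomethylated windows, and the function names and the `threshold` argument are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def baseline_profile(healthy_profiles):
    """Per-window median across healthy individuals."""
    return [median(window) for window in zip(*healthy_profiles)]


def classify(profile, baseline, threshold):
    """Call 'cancer' when the correlation to the baseline is below threshold."""
    r = pearson(profile, baseline)
    label = "cancer" if r < threshold else "healthy"
    return label, r
```

Because the Pearson correlation is insensitive to the overall scale of a profile, this classifier responds to window-to-window deviations from the healthy pattern rather than to a uniform shift in the short fragments ratio.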

Results
Altered cfDNA fragmentation profile upon 5mC-immunoprecipitation (IP)
As it has been reported that cancer-derived cfDNA fragments may have altered methylation and smaller size [6,11], we focused on cfDNA fragments ranging from 100 to 220 base pairs (bp) to investigate whether the release of cancer-derived cfDNA is related to DNA methylation. In a preliminary analysis in the discovery cohort, cfDNA extracted from the plasma of 3 healthy individuals (H1, H2 and H3) and 3 breast carcinoma patients (P1, P2 and P3) in the recovery period with low tumor burden was used for cfMeDIP-seq library construction with some modifications (fig. S1, A-E, and table S1). Both Input and IP libraries were sequenced with paired-end reads at around 0.5× and 5× coverage, respectively (table S2). Interestingly, we observed a decrease in short cfDNA fragment (100-150 bp) density and in the short fragments ratio (defined as the ratio of short cfDNA fragments to long cfDNA fragments (151-220 bp)) in IP libraries compared with the corresponding Input libraries for healthy individuals (Fig. 2, A-C and G), whereas these phenomena were not seen in patients with breast cancer (Fig. 2, D-F and H). Furthermore, the mean cfDNA fragment size increased from 170.06 bp (Input libraries) to 173.04 bp (IP libraries) in healthy individuals, which again was not observed in cancer patients (170.51 to 170.71 bp) (fig. S2, A and B). To examine differences between healthy individuals and cancer patients, the percentage change of the short fragments ratio from the IP library to the corresponding Input library was calculated. We found that patients with breast cancer had significantly smaller changes than healthy individuals (fig. S2, C-E and table S3).
To examine the variation of the short fragments ratio across the human genome, genome-wide cfDNA fragmentation profiles in both Input (Fig. 2I, upper panel) and IP (Fig. 2I, middle panel) libraries were plotted in 5-Mb windows for participants in the discovery cohort according to the method described previously [3]. Changes in the cfDNA fragmentation profile (IP - Input) due to 5mC-IP were calculated by subtracting the short fragments ratio in Input libraries from the short fragments ratio in IP libraries in each 5-Mb genomic window (Fig. 2I, lower panel). Smaller changes of the short fragments ratio between the IP library and the Input library were observed in almost all genomic windows across the human genome for patients with breast cancer.
Overall, these results suggested that more cancer-derived short cfDNA fragments than noncancer-derived short cfDNA fragments were enriched during the 5mC-IP reaction. Therefore, we hypothesized that the enrichment of short cfDNA fragments in cancer patients might be due to differences in methylation profiles.

Relationship Between Methylation And Fragment Size In cfDNA
To examine the origins of the enriched short cfDNA fragments in patients with breast cancer, we first identified 2,211 differentially methylated regions (DMRs) between cfDNA of patients and healthy individuals (1,241 hypermethylated and 970 hypomethylated in patients at padj < 0.05 and |log2FoldChange| > 1, with each region representing a 10-kb genomic window) (Fig. 3, A and B, and table S4). We further evaluated the DMRs-dependent cfDNA fragmentation pattern in IP libraries and found that cfDNA released from hypomethylated regions had a higher short fragments ratio than cfDNA from hypermethylated regions in both patients and healthy individuals (Fig. 3C). Analysis of the percentage change of the short fragments ratio in hypomethylated regions compared with hypermethylated regions showed that patients with breast cancer had a greater increase of the short fragments ratio in hypomethylated regions than healthy individuals (Fig. 3D), which indicated that the enriched cancer-derived short cfDNA fragments might be mainly released from hypomethylated regions.
In accordance with the increased short fragments ratio in hypomethylated regions, the size distribution of cfDNA fragments mapped to hypomethylated regions was shifted towards smaller sizes compared with cfDNA fragments mapped to hypermethylated regions, and this shift was greater in patients with breast cancer (Fig. 4, A-F). Moreover, the mean size of cfDNA fragments mapped to hypomethylated regions decreased more in patients with breast cancer (4.60 bp; from 172.33 bp in hypermethylated regions to 167.73 bp in hypomethylated regions) than in healthy individuals (2.87 bp; from 174.54 bp in hypermethylated regions to 171.67 bp in hypomethylated regions). Collectively, these findings again indicated that, in contrast to healthy individuals, patients with breast cancer had enriched short cfDNA fragments during the 5mC-IP reaction, which might mainly originate from hypomethylated genomic regions.
To further confirm the origin of short cfDNA fragments, the size of cfDNA fragments in patients with lung cancer from another study was also investigated [25]. As expected, patients with lung cancer had a higher percentage change of the short fragments ratio in hypomethylated regions compared with hypermethylated regions (fig. S3, A and B).

Breast Cancer Diagnostic Accuracy In Validation Cohort
To verify whether the findings obtained from the discovery cohort could be applied to the diagnosis of breast cancer, we performed cfMeDIP-seq on cfDNA extracted from 11 patients with breast cancer (P4-P14) and 8 healthy individuals (H4-H11) in the validation cohort (Table S1). None of the patients with breast cancer had undergone previous treatment, and all were confirmed through biopsy. Similarly, increased short cfDNA fragment density in IP libraries of patients with breast cancer was observed (fig. S6, A and B, and fig. S7). Within the 731 identified DMRs, a higher percentage change of the short fragments ratio as well as a greater shift of the size distribution of cfDNA fragments in hypomethylated regions compared with hypermethylated regions were also found for patients with breast cancer (fig. S8, A-D, fig. S9, and table S5).
Subsequently, we assessed whether the DMRs-dependent cfDNA fragmentation profile could differentiate cancer patients from healthy individuals in the validation cohort. An abnormal input-adjusted short fragments ratio in specific hypomethylated genomic windows was present in most of the patients with breast cancer, whereas the ratio remained consistent in healthy individuals (fig. S10 and fig. S11).
We then developed an approach called 'correlation assessment of DMRs-dependent cfDNA fragmentation profile' to evaluate the abnormality of the short fragments ratio in 72 frequently altered hypomethylated genomic windows, each of which had at least 20 unduplicated cfDNA fragments in all samples and an input-adjusted short fragments ratio of no more than 10 in any sample. Correlation analysis of the input-adjusted short fragments ratio of each participant against the median input-adjusted short fragments ratio of healthy individuals in the 72 hypomethylated windows was performed. Healthy individuals had a higher correlation, with an average of 0.83, whereas patients with breast cancer had a lower correlation, with an average of 0.68 (Fig. 6A). Using the correlation value as a classifier for detecting participants as being healthy or having cancer, at a threshold of 0.72, we detected 7 out of 11 patients as having breast cancer (63.6% sensitivity), whereas no healthy individuals were misclassified (100% specificity) (Table 1). Receiver operating characteristic analysis for the detection of patients with cancer yielded an area under the curve (AUC) value of 0.909 (95% confidence interval, 0.771-1.000) (Fig. 6B). Taken together, DMRs-dependent cfDNA fragmentation profiling could distinguish patients with breast cancer from healthy individuals.
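The sensitivity and specificity reported above follow directly from the validation cohort counts (11 patients, 7 of them detected; 8 healthy individuals, none misclassified). A minimal Python check, with a hypothetical helper name:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts from the validation cohort at the 0.72 correlation threshold.
sens, spec = sensitivity_specificity(tp=7, fn=4, tn=8, fp=0)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```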

Discussion
Genome-wide DNA methylation alteration has been demonstrated to be present in neoplastic tissue and to lead to changes in chromatin structure [26,27], which is the direct source of the cfDNA released into plasma. However, it is still unknown to what extent DNA methylation may affect the initiation and release of cfDNA. In this study, we not only showed that cfDNA fragment size is associated with cfDNA methylation, but also suggested that the DMRs-dependent cfDNA fragmentation profile might provide an alternative approach for breast cancer diagnosis with high efficacy.
Although cancer-derived cfDNA has recently been shown to be smaller than noncancer-derived cfDNA [11], the cause of this shortening remains unclear. Differences in nucleosome wrapping and in the mode of action of nucleases during apoptosis are considered to determine the size of cfDNA fragments in plasma [28]. As nucleosome compaction and rigidity decrease upon DNA demethylation [29,30], hypomethylated regions in the genome should in theory be more vulnerable to nuclease digestion during apoptosis. In accordance with this hypothesis, our results showed that cfDNA originating from hypomethylated regions in patients with breast cancer tends to have a significantly smaller size compared with healthy individuals, which might be the result of excessive nuclease digestion of the DNA wrapped in nucleosomes as nucleosome compaction decreases in hypomethylated regions (Fig. 7). Furthermore, global DNA hypomethylation present in white blood cells of patients with breast cancer could lead to genome instability and opening of chromatin [31,32], and thus aggravate nuclease digestion (Fig. 7). In contrast to its variation in patients with breast cancer, the cfDNA fragmentation profile in hypomethylated regions was relatively consistent in healthy individuals. We found that the short fragments ratio of cfDNA mapped to both hypermethylated and hypomethylated regions changed less in healthy individuals, and we suppose that this phenomenon is an indicator of genome integrity and stability.
Hypomethylation in promoter regions of oncogenes has been found to occur in breast carcinomas [33,34]; therefore, aberrant short cfDNA fragments might partially originate from certain oncogenes. Indeed, previous studies suggested that short cfDNA fragments harbor footprints of transcription factors [19]. In this study, cfDNA mapped to the TRAF3IP3, PTPRN2 and GALNT9 gene loci in hypomethylated regions was found to have a significantly increased short fragments ratio in patients with breast cancer. Increased expression of these three genes during tumor growth has been reported previously, which might indicate demethylation in their promoter regions [35][36][37][38] and thus lead to the release of short cfDNA fragments due to excessive digestion by nucleases. In addition, most of the genomic windows in hypomethylated regions that had an altered short fragments ratio in patients with breast cancer were found to colocalize with the histone modification marker H3K27ac (data not shown); therefore, DNA modification might also work in conjunction with histone modification in determining the fragmentation pattern of cfDNA in breast cancer.
This study showed that detection of early-stage breast cancer with high sensitivity and specificity could be achieved by characterizing the fragmentation profile of cfDNA in DMRs. As genome-wide fragmentation profiles varied only slightly among participants in the validation cohort, differentiating patients with breast cancer from healthy individuals on that basis was difficult. Besides, although various DMRs were identified, cancer-related DMRs still need to be discriminated from DMRs that reflect individual variation. Nevertheless, detecting the DMRs-dependent cfDNA fragmentation profile not only allows a precise focus on the genomic regions that might lead to aberrant cfDNA release, but also helps to evaluate the diagnostic value of each DMR. If the variation of DMRs-dependent cfDNA fragmentation in patients with breast cancer can be confirmed in a larger population-based cohort study in the future, it could be used as a companion approach to routine diagnostic methods for detecting patients with breast cancer.
Aberrant epigenetic modifications, including changes in DNA methylation, histone modifications and chromatin remodeling, are considered to occur at a very early stage in neoplastic development and cancer initiation [39][40][41][42], and hypomethylated intergenic and intronic regions have further been demonstrated to appear early in the transition from normal to neoplastic cells [26,43,44]. The release of short cfDNA fragments from hypomethylated regions should therefore also occur at an early stage of cancer development, which could enable early and real-time monitoring of breast cancer development through the DMRs-dependent cfDNA fragmentation profile.
Chromatin remodeling involves the assembly of nucleosomes and the regulation of DNA accessibility, which may differ depending on the tissue investigated. By calculating the cfDNA short fragments ratio in DMRs, it may be possible to infer the original chromatin structure profile and thus help to inform the tissue of origin, which has been demonstrated to be feasible in another study [45]. For example, the altered cfDNA fragmentation profile at the TRAF3IP3, PTPRN2 and GALNT9 gene loci, together with their upregulated expression, could reflect the chromatin changes that accompany the development of breast carcinomas. In the future, the DMRs-dependent cfDNA fragmentation profile should be further characterized together with chromatin changes in multiple cancer types to validate the results obtained in this study.

Conclusions
To summarize, by analyzing cfDNA methylation and fragment size simultaneously, this study revealed that short cfDNA fragments possibly originate from hypomethylated DNA regions in patients with breast cancer, and demonstrated the feasibility of using a DMRs-dependent cfDNA fragmentation profiling method for detecting breast cancer. Several limitations should also be taken into consideration. The population in this study was relatively small. To avoid misinterpretation, it should also be noted that cfDNA samples in the discovery cohort were from patients in the recovery period and undergoing treatment, whereas cfDNA samples in the validation cohort were from patients at the time of diagnosis. In searching for differential methylation profiles between patients with breast cancer and healthy individuals, we defined 10-kb genomic windows with padj < 0.05 and |log2FoldChange| > 1 as DMRs, which might not be the most appropriate selection threshold. With more participants, as well as patients with multiple cancer types, the identification of DMRs for calculating the cfDNA fragment ratio will need further evaluation.

Figures

Figure 1
Schematic representation of the improved cfMeDIP-seq approach used in this study. Plasma was collected from patients with breast cancer and healthy individuals. cfDNA was extracted and subjected to adapter ligation and 5mC-immunoprecipitation (IP) for sequencing library construction. cfDNA methylation and fragmentation profiles were identified by analyzing the NGS data.

Breast, patients with breast cancer; * represents P value < 0.05.

DMRs-dependent cfDNA fragmentation profiles. (A) Input-adjusted short fragments ratios are shown in 10-kb windows in hypermethylated and hypomethylated regions for both patients with breast cancer (purple) and healthy individuals (black). (B) Distribution of the cfDNA fragmentation profile mentioned above across the human genome. The input-adjusted short fragments ratio in each 10-kb window was calculated by dividing the short fragments ratio in that window by the short fragments ratio in the corresponding Input library. Differentially methylated 10-kb windows were selected for representation according to the following criteria: (1) hypermethylated 10-kb windows have padj < 0.05 and log2FoldChange > 1; (2) hypomethylated 10-kb windows have padj < 0.05 and log2FoldChange < -1; (3) the selected windows should have at least 20 deduplicated cfDNA fragments in all samples, including patients with breast cancer and healthy individuals; (4) the selected windows should have an input-adjusted short fragments ratio of less than 10 in every sample analyzed. Hyper, hypermethylated genomic regions; Hypo, hypomethylated genomic regions.

Figure 6
Detection of breast cancer using the DMRs-dependent cfDNA fragmentation profile. (A) Input-adjusted short fragments ratios are depicted for hypomethylated genomic windows; each individual profile is colored according to its Pearson correlation to the healthy median across the genomic windows. (B) Receiver operating characteristic curve for breast cancer detection using correlation assessment of the DMRs-dependent cfDNA fragmentation profile. AUC = 0.909; 95% CI (0.771-1.000). Healthy, healthy individuals; Breast, patients with breast cancer.