Human mammary gland carcinoma cell line MCF-7 was obtained from ATCC. MCF-7 were grown in DMEM(Gibco,11995065) supplemented with 10% FBS (Gibco,10099141), 0.01mg/ml insulin(), and 1% penicillin-streptomycin(Gibco, 15140122). The cell line was regularly checked for mycoplasma infection (Yeasen, 40612ES25).
Nuclei isolation and MTase treatment
Cells were grown to 70–80% confluency, and were collected by TrypLE (Gibco,12604013). After 300xg centrifuge for 5 minutes, nuclei were isolated with lysis buffer (100 mM Tris–HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1 mM EDTA, 0.5% CA630) for 5 minutes on ice. Nuclei were centrifuged at 300xg in wash buffer (100 mM Tris–HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1 mM EDTA) at 4 degree, and washed twice for 5 minutes and counted.
1x10^6 intact nuclei were subjected to an m6A methylation reaction mixture containing 1x Cutsmart buffer (NEB), 200U of non-specific adenine methyltransferase M.EcoGII (NEB, M0603S), 300mM sucrose, and 96 uM S-adenosylmethionine (NEB, B9003S) in 500ul volume. The reaction mixture was set up at a 37-degree thermomixer with shaking at 1000rpm for 30 minutes. S-adenosylmethionine was replenished at 640uM every 7.5 minutes at 7.5, 15, and 22.5 minutes into the reaction mixture. The reaction was stopped by adding an equal volume of stop buffer (20 mM Tris-HCl pH 7.4, 600 mM NaCl, 1% SDS, 10 mM EDTA). No methylation controls were treated in the same conditions without adding M.EcoGII in the reaction mixture. The samples were then treated with 20ul of Proteinase K (20mg/ml) at 55 degrees overnight, and the DNA was extracted with phenol: chloroform extraction and ethanol precipitation.
ecDNA isolation, purification, and sequencing
ecDNA was isolated by Circle-Seq(Møller 2020) method, which digested linear DNA with modifications. Briefly, 10ug of M.EcoGII treated DNA was subjected to a reaction mixture containing 1x plasmid-safe reaction buffer, 20U plasmid-safe ATP-dependent DNase (Lucigen, E3101K), 1mM ATP, and nuclease-free water was supplemented to a final volume of 100ul. The reaction mixture was incubated at 37 degrees for 7 days. Every 24 hours, the reaction mixture was replenished by adding 20U plasmid-safe ATP-dependent DNase, 1mM ATP, and 0.4ul 10X plasmid-safe reaction buffer. Digested ecDNA was purified with 1.8X AMpure XP beads (Beckman Coulter).
Purified ecDNA was prepared for nanopore sequencing by ligation kit LSK-SQK108(ONT). The samples were 10kb by Covaris G tubes, end-repaired and dA-tailed using NEBnext Ultra II end-repair module (NEB), followed by clean-up using 1.8X AMpure XP beads. Sequencing adaptors and motor proteins were ligated to end-repaired DNA fragments using blunt/TA ligase master mix (NEB), followed by clean-up using 0.4x AMpure XP beads. 1ug adaptor-ligated samples per flow cell were loaded onto PRO-002 flowcells and run on PromethION sequencers for up to 72h. Data were collected by MinKNOW v.1.14.
Base-calling and Linear DNA methylation calling
Reads from the ONT data were processed using Megalodon V2.2.9, which used Guppy base caller to base-calling, and Guppy model config res_dna_r941_min_modbases-all-context_v001.cfg released into the Rerio repository was used to identify DNA m6A methylation. Megalodon_extras was used to get per read modified_bases from the Megalodon basecalls and mappings results. To further explore the accurate threshold of methylation probability, a control sample with almost no m6A methylation was used as background noise, and the Gaussian mixture model was used to fit the methylation probability distribution generated by Megalodon.
ONT Reads meet the following conditions were defined as ecDNA molecules performed by the inner mappy/minimap2 aligner (Li 2018). (1) One segment (> 1kb) of an ONT read was mapped to the genome at one site, and another segment (> 1kb) was mapped to the genome at another site. (2) Two segments were mapped to the same chromosome. (3) Two segments were mapped to the same strand of the genome. (4) Two segments in a pair showed outward orientation.
Nanopore ecDNA methylation calling
Due to ecDNA special structure, the m6A calling cannot be successfully performed by aligning to the reference genome, especially for junctional regions. The custom python script was used to assemble ecDNA reference genome sequences according to the table generated from the previous step. Considering that the read length might be longer than the ecDNA reference, the ecDNA reference was subsequently preprocessed by adding 10M N to the ends to increase the mapping efficiency. The downstream step is performed in a similar way as linear DNA methylation calling.
Annotation and methylation configuration
TES, TTS, CDS, and other gene elements were downloaded from UCSC Table Browser, And the gene elements were processed into 50bp bin for downstream analysis. Linear DNA and ecDNA were also processed to the size of 50bp bin and sliding for 5bp. The accessibility score over multi base-pair windows was calculated as methylation ratio = m6A bases in all covered reads under bin/ adenosine bases in all covered reads under the bin.
RNA-seq data analysis
The RNA-seq data of MCF-7 was downloaded from the Gene Expression Omnibus (GEO) repository database with the accession number GSE71862. The gene expression was divided into three categories: high, medium, and low, representing 25%, 25%-75%, and 75% gene expression rank, respectively.
To evaluate co-accessibility patterns along the genome, we applied COA as follows. Each chromosome in the genome was split into windows of size w. For each such window ( i, i + w), we identified another window (j,j + w) such that the span (i,j,w) was covered by ≥ N reads. For each single spanning molecule k, accessibility scores (A) in each bin were then aggregated and binarized as described above. The local co-accessibility matrix between two windows was calculated: