Materials.
Q5® High-Fidelity DNA Polymerase, Q5® Hot Start High-Fidelity DNA Polymerase and Taq DNA Polymerase were purchased from New England Biolabs (NEB). The Eastep Gel and PCR Cleanup Kit was purchased from Promega. The oligo pool was synthesized by Twist Bioscience. All primers were synthesized by GENEWIZ and are listed in Table S1.
DNA Master Oligo Pool.
The synthesized oligo pool was resuspended in 1× TE buffer to a final concentration of 2 ng/µL. DNA master pool 1, which contains 11,520 DNA strands and was used in this study, was previously prepared in our laboratory as follows. PCR was performed in a 50 µL reaction containing 10 ng ssDNA pool, 4 µM forward primer (F1 or F2), 4 µM reverse primer (R1 or R2), 10 µL 5× Q5 Reaction Buffer, 0.2 mM dNTPs and 0.5 µL Q5 High-Fidelity DNA Polymerase. Thermocycling conditions were: 5 min at 98°C; 10 cycles of 10 s at 98°C, 30 s at 54°C and 30 s at 72°C; and a final 5 min extension at 72°C. The product was purified with the Eastep Gel and PCR Cleanup Kit according to the manufacturer's instructions and eluted in 50 µL DNase/RNase-free water. This library served as the master pool for deep replication.
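As a sanity check on the reaction composition above, pipetting volumes for one 50 µL reaction follow from the dilution relation C1·V1 = C2·V2. The Python sketch below assumes hypothetical stock concentrations of 100 µM for each primer and 10 mM for the dNTP mix (neither is stated in this protocol); the 2 ng/µL pool, 10 ng input, 4 µM primers, 0.2 mM dNTPs, 5× buffer and 0.5 µL polymerase come from the text. The helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A reagent added from a stock to reach a final concentration."""
    name: str
    stock: float  # stock concentration (same unit as `final`, or fold for buffers)
    final: float  # desired final concentration in the reaction


def volume_for(stock: float, final: float, total_ul: float) -> float:
    """Stock volume (uL) giving `final` in `total_ul`, from C1*V1 = C2*V2."""
    if final > stock:
        raise ValueError("final concentration exceeds stock concentration")
    return final * total_ul / stock


def reaction_setup(total_ul, components, template_ng, template_ng_per_ul, fixed_ul):
    """Return pipetting volumes (uL), topping up with water to `total_ul`."""
    volumes = {c.name: volume_for(c.stock, c.final, total_ul) for c in components}
    volumes["template"] = template_ng / template_ng_per_ul
    volumes.update(fixed_ul)  # components added by volume, e.g. polymerase
    used = sum(volumes.values())
    if used > total_ul:
        raise ValueError(f"components need {used:.1f} uL, more than {total_ul} uL")
    volumes["water"] = total_ul - used
    return volumes


# Values from the protocol; primer and dNTP stock concentrations are assumptions.
setup = reaction_setup(
    total_ul=50.0,
    components=[
        Component("5x Q5 Reaction Buffer", stock=5.0, final=1.0),  # fold
        Component("dNTPs (mM)", stock=10.0, final=0.2),  # assumed 10 mM stock
        Component("forward primer (uM)", stock=100.0, final=4.0),  # assumed 100 uM stock
        Component("reverse primer (uM)", stock=100.0, final=4.0),  # assumed 100 uM stock
    ],
    template_ng=10.0,
    template_ng_per_ul=2.0,  # oligo pool resuspended at 2 ng/uL
    fixed_ul={"Q5 High-Fidelity DNA Polymerase": 0.5},
)

for name, ul in setup.items():
    print(f"{name:35s} {ul:5.2f} uL")
```

With these assumed stocks, the template contributes 5 µL, each primer 2 µL, the buffer 10 µL and the dNTPs 1 µL, leaving 29.5 µL of water. A more dilute primer stock (for example 10 µM) would need 20 µL per primer and overflow the 50 µL reaction, which the function reports as an error.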
Deep replication by PCR amplification.
Deep replication was performed in 50 µL reactions containing 0.5 µL Q5 High-Fidelity DNA Polymerase or Q5 Hot Start High-Fidelity DNA Polymerase, 10 ng DNA master pool, 4 µM forward primer (F1-1 or F2), 4 µM reverse primer (R1 or R2) and 10 µL 5× Q5 Reaction Buffer. Thermocycling conditions were: 5 min at 98°C; 10 or 30 cycles of 30 s at 98°C, 30 s at 54°C and 10 s at 72°C; and a final 5 min extension at 72°C. The 60-cycle PCR was performed as two consecutive 30-cycle reactions: the amplicons from the first 30-cycle reaction were purified and used as the template for a second 30-cycle reaction under the same thermocycling conditions. PCR products were purified with the Eastep Gel and PCR Cleanup Kit, eluted in 50 µL DNase/RNase-free water and sequenced on an Illumina HiSeq 4000 platform. Reactions with Taq DNA Polymerase contained 0.5 µL Taq DNA Polymerase, 10 ng DNA master pool, 4 µM forward primer F1-1, 4 µM reverse primer R1 and 5 µL 10× Taq Reaction Buffer in a 50 µL reaction. The Taq thermocycling protocol was: 5 min at 95°C; 10 or 30 cycles of 30 s at 95°C, 30 s at 54°C or 67°C and 15 s at 68°C; and a final 5 min extension at 68°C.
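Why cycle number matters for deep replication can be illustrated with an idealized exponential model in which each oligo i is copied with a fixed per-cycle efficiency E_i, so its abundance scales as (1 + E_i)^n. This is a simplification (no plateau, no polymerase errors, no purification loss), and the efficiencies below are hypothetical rather than measurements from this study. The sketch also checks that, under this model, two consecutive 30-cycle PCRs give the same relative abundances as a single 60-cycle PCR, because renormalizing between rounds does not change the shares.

```python
def amplify_shares(shares, efficiencies, cycles):
    """Relative abundance of each oligo after `cycles` of idealized PCR.

    Oligo i is multiplied by (1 + E_i) every cycle. Plateau effects, polymerase
    errors and losses during purification are ignored (modeling assumptions).
    """
    grown = [s * (1.0 + e) ** cycles for s, e in zip(shares, efficiencies)]
    total = sum(grown)
    return [g / total for g in grown]


# Hypothetical per-cycle efficiencies for three oligos (not measured values).
efficiencies = [0.95, 0.90, 0.80]
start = [1 / 3, 1 / 3, 1 / 3]  # equimolar template

after_10 = amplify_shares(start, efficiencies, 10)
after_30 = amplify_shares(start, efficiencies, 30)
# 60 cycles run as two consecutive 30-cycle PCRs; the purified product of the
# first round is the template of the second, so shares are carried over.
after_60 = amplify_shares(after_30, efficiencies, 30)

# Increment as in the analysis section: depth after n cycles / depth after 10.
increment_30 = [a / b for a, b in zip(after_30, after_10)]
increment_60 = [a / b for a, b in zip(after_60, after_10)]

for i, (i30, i60) in enumerate(zip(increment_30, increment_60)):
    print(f"oligo {i}: increment 30/10 = {i30:.3f}, 60/10 = {i60:.3f}")
```

Small per-cycle differences compound: in this toy model the most efficient oligo gains share and the least efficient one loses share with every additional cycle.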
Preparation of 10 DNA sequences (five with high GC content and five with low GC content).
The 10 DNA sequences were individually synthesized as miniplasmids by GENEWIZ and amplified by PCR in 50 µL reactions containing 10 ng DNA template, 4 µM forward primer F1-1, 4 µM reverse primer R1, 10 µL 5× Q5 Reaction Buffer, 0.2 mM dNTPs and 0.5 µL Q5 High-Fidelity DNA Polymerase. Thermocycling conditions were: 5 min at 98°C; 30 cycles of 30 s at 98°C, 30 s at 54°C and 10 s at 72°C; and a final 5 min extension at 72°C. The products were purified with the Eastep Gel and PCR Cleanup Kit according to the manufacturer's instructions and eluted in 50 µL DNase/RNase-free water.
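The high- and low-GC groups can be illustrated with a simple GC-content calculation. The Python sketch below is not the get_GC.pl script used in the analysis; it computes GC content as the fraction of G and C bases and splits a set of sequences into the highest- and lowest-GC groups. The four short sequences are hypothetical placeholders, not the ten strands used here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def split_by_gc(seqs: dict, n_each: int):
    """Return (highest-GC names, lowest-GC names), `n_each` of each."""
    ranked = sorted(seqs, key=lambda name: gc_content(seqs[name]))
    return ranked[-n_each:], ranked[:n_each]


# Hypothetical toy sequences; the real strands would be split five and five.
toy = {
    "s1": "GCGCGCAT",
    "s2": "ATATATGC",
    "s3": "GGGCCCGA",
    "s4": "AAATTTGA",
}
high_gc, low_gc = split_by_gc(toy, 2)
for name in sorted(toy):
    print(name, round(gc_content(toy[name]), 3))
print("high GC:", high_gc, "low GC:", low_gc)
```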
qPCR of the ten DNA sequences.
Standard curves for the TaqMan assays were prepared in 50 µL reactions containing 1 pg, 10 pg, 100 pg, 1 ng or 10 ng of top 1 or bottom 1, 0.4 µM forward primer F1-1 (Top1-F or Bottom1-F), 0.4 µM reverse primer R1 (Top1-R or Bottom1-R), 0.2 µM TaqMan probe T1 (T1-1), 0.2 µM TaqMan probe T2, 5 µL 10× Taq Reaction Buffer, 0.2 mM dNTPs and 2 µL Taq DNA Polymerase. Reactions were run on a QuantStudio 6 qPCR System (Thermo Fisher Scientific) with the following program: 5 min at 95°C; 40 cycles of 30 s at 95°C, 30 s at 54°C and 20 s at 68°C, with fluorescence measured at each cycle.
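A standard curve of this kind is usually analyzed by regressing Ct on log10 of the input amount; the slope then gives the amplification efficiency as E = 10^(-1/slope) - 1, with a slope near -3.32 corresponding to about 100% efficiency. The Python sketch below fits that line by ordinary least squares. The Ct values are hypothetical, chosen only to illustrate the calculation for the 1 pg to 10 ng dilution series described above, and the fitting approach is a generic one rather than the QuantStudio software's own analysis.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    Returns (slope, intercept, r_squared, efficiency), where efficiency is
    10 ** (-1 / slope) - 1 (1.0 means perfect doubling every cycle).
    """
    x = [math.log10(q) for q in quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, ct_values))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ct_values)
    r_squared = 1.0 - ss_res / ss_tot
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency


# Dilution series from the protocol (1 pg to 10 ng, in pg); Ct values are hypothetical.
quantities_pg = [1, 10, 100, 1_000, 10_000]
ct = [30.1, 26.8, 23.4, 20.1, 16.8]

slope, intercept, r2, eff = fit_standard_curve(quantities_pg, ct)
print(f"slope={slope:.3f} intercept={intercept:.2f} R^2={r2:.4f} efficiency={eff:.1%}")
```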
qPCR of the ten strands was performed in 50 µL reactions containing 1 ng of each strand (10 ng total), 0.4 µM forward primer F1-1, 0.4 µM reverse primer R1, 0.2 µM TaqMan probe T1, 0.2 µM TaqMan probe T2, 5 µL 10× Taq Reaction Buffer, 0.2 mM dNTPs and 2 µL Taq DNA Polymerase. Reactions were run on the QuantStudio 6 with the program described above.
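Once standard curves are available, the amount of each target in a mixed sample can be estimated by inverting the fitted line, quantity = 10^((Ct - intercept)/slope). The Python sketch below does this for the top 1 and bottom 1 assays and compares the estimates with the expected 1:1 input. All slopes, intercepts and Ct values are hypothetical placeholders, not results from this study.

```python
def quantity_from_ct(ct, slope, intercept):
    """Invert a standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standard-curve parameters (quantity in pg) for the two assays.
top1_curve = {"slope": -3.32, "intercept": 30.0}
bottom1_curve = {"slope": -3.35, "intercept": 30.4}

# Hypothetical Ct values measured in the mixed ten-strand sample.
top1_pg = quantity_from_ct(20.0, **top1_curve)
bottom1_pg = quantity_from_ct(20.4, **bottom1_curve)
observed_ratio = top1_pg / bottom1_pg
expected_ratio = 1.0  # 1 ng of each strand was loaded

print(f"top 1 = {top1_pg:.0f} pg, bottom 1 = {bottom1_pg:.0f} pg")
print(f"observed/expected top1:bottom1 = {observed_ratio / expected_ratio:.2f}")
```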
qPCR of deep replication products.
Mixtures of the ten oligos at different ratios were amplified for 10, 30 and 60 PCR cycles. Three mixtures (10 ng total each) were prepared: 0.25 ng, 0.45 ng, 0.5 ng, 0.6 ng and 0.7 ng of strands 1–5 with 2.5 ng, 2 ng, 1.5 ng, 1 ng and 0.5 ng of strands 6–10 (top 1:bottom 1 = 1:10); 1 ng of each strand (top 1:bottom 1 = 1:1); and 2.5 ng, 2 ng, 1.5 ng, 1 ng and 0.5 ng of strands 1–5 with 0.25 ng, 0.45 ng, 0.5 ng, 0.6 ng and 0.7 ng of strands 6–10 (top 1:bottom 1 = 10:1). Each mixture was amplified in a 50 µL reaction containing 4 µM forward primer F1-1, 4 µM reverse primer R1 and 10 µL 5× Q5 Reaction Buffer, using the same thermocycling conditions as the deep replication reactions described above. The 10-, 30- and 60-cycle PCR products were purified with the Eastep Gel and PCR Cleanup Kit and eluted in 50 µL DNase/RNase-free water. Then 10 ng of each purified PCR product was analyzed by qPCR in a 50 µL reaction containing 0.4 µM forward primer Top1-F or Bottom1-F, 0.4 µM reverse primer Top1-R or Bottom1-R, 0.2 µM TaqMan probe T1-1 or T2, 5 µL 10× Taq Reaction Buffer, 0.2 mM dNTPs and 2 µL Taq DNA Polymerase. Reactions were run on the QuantStudio 6 with the program described above.
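The three template mixtures can be checked arithmetically: each should total 10 ng and give the stated top 1:bottom 1 mass ratio. The Python sketch below uses exact fractions to avoid rounding; it assumes that top 1 is strand 1 and bottom 1 is strand 6, which is consistent with the stated ratios but is not spelled out in the text.

```python
from fractions import Fraction as F

# Masses (ng) from the protocol, listed for strands 1-5 or 6-10.
small = [F("0.25"), F("0.45"), F("0.5"), F("0.6"), F("0.7")]
large = [F("2.5"), F("2"), F("1.5"), F("1"), F("0.5")]

# Masses of strands 1-10 in each mixture, keyed by the stated top1:bottom1 ratio.
mixtures = {
    "1:10": small + large,
    "1:1": [F(1)] * 10,
    "10:1": large + small,
}


def check_mixture(masses_ng, top_index=0, bottom_index=5):
    """Return (total mass, top 1 / bottom 1 mass ratio).

    Assumes top 1 is strand 1 (index 0) and bottom 1 is strand 6 (index 5).
    """
    total = sum(masses_ng)
    return total, masses_ng[top_index] / masses_ng[bottom_index]


for label, masses in mixtures.items():
    total, ratio = check_mixture(masses)
    print(f"{label}: total = {float(total)} ng, top1/bottom1 = {ratio}")
```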
Sequencing on an Illumina HiSeq 4000 platform.
Sample collection and preparation. DNA degradation and contamination were monitored on 2% agarose gels. DNA purity was checked using a NanoPhotometer spectrophotometer (IMPLEN, CA, USA). DNA concentration was measured with the Qubit DNA Assay Kit on a Qubit 2.0 Fluorometer (Life Technologies, CA, USA).
Library preparation for sequencing. A total of 700 ng DNA per sample was used as input for library preparation. Sequencing libraries were generated using the NEBNext® Ultra DNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, the DNA was purified using the AMPure XP system (Beckman Coulter, Beverly, USA). After adenylation of the 3′ ends of the DNA fragments, NEBNext Adaptors with a hairpin loop structure were ligated to prepare for hybridization. DNA fragments of the desired length were then selected by electrophoresis. The size-selected, adaptor-ligated DNA was treated with 3 µL USER Enzyme (NEB, USA) at 37°C for 15 min. Finally, the products were purified (AMPure XP system) and library quality was assessed on an Agilent Bioanalyzer 2100 system.
Clustering and sequencing. The index-coded samples were clustered on a cBot Cluster Generation System using the HiSeq 4000 PE Cluster Kit (Illumina) according to the manufacturer's instructions. After cluster generation, the libraries were sequenced on an Illumina HiSeq 4000 platform, generating 150 bp paired-end reads.
Bioinformatic and statistical analysis.
Paired-end reads were merged using PEAR to obtain full-length reads, which were then aligned to the designed reference sequences using BLAST. The coverage per million reads and the number of oligos at each coverage value were obtained using Valid_Coverage_Number.pl, and the frequency of each coverage value was calculated by dividing its oligo number by the sum of these numbers (Fig. 2A). The Gini coefficient was calculated in R (Fig. 2B), and GC content was obtained using get_GC.pl (Fig. 2C).

The read number of each oligo was also obtained using Valid_Coverage_Number.pl, and the depth of each oligo was calculated by dividing its read number by the total read number; the resulting depth distribution is shown in Fig. 4A. The increment was calculated by dividing the depth of each oligo after 60 or 30 cycles by its depth after 10 cycles (Fig. 3A). To relate coverage to GC content, oligos were ranked by coverage in ascending order, the top 1% (115 oligos) and bottom 1% (115 oligos) were selected from each of the 10-, 30- and 60-cycle reactions, and the average GC content of each subset was calculated (Fig. 3B). Similarly, oligos were ranked by increment in ascending order, the top 1% (115 oligos) and bottom 1% (115 oligos) were selected for the 30-cycle/10-cycle and 60-cycle/10-cycle increments, and the average increment of each subset was calculated (Fig. 3C).

Secondary structures and Gibbs free energies of the oligonucleotides were calculated using NUPACK (http://www.nupack.org) (Fig. 4B), and k-mers were analyzed using kmer.pl (Figure S1). These data were used to plot the distributions of GC content, Gibbs free energy and increment (Fig. 5).
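The per-oligo calculations described above (depth, increment, Gini coefficient and the top/bottom 1% selections) can be sketched in Python as follows. This is an illustrative re-implementation, not the Valid_Coverage_Number.pl script or the R code used for the figures; all function names are hypothetical, the read counts and GC values are simulated placeholders rather than data from this study, and the Gini coefficient uses the standard formula for values sorted in ascending order, G = 2·Σ i·x_i / (n·Σ x_i) - (n + 1)/n.

```python
import random


def depth(read_counts):
    """Per-oligo depth: reads assigned to each oligo / total reads."""
    total = sum(read_counts.values())
    return {oligo: n / total for oligo, n in read_counts.items()}


def increment(depth_n, depth_10):
    """Depth after n cycles divided by depth after 10 cycles, per oligo."""
    return {o: depth_n[o] / d for o, d in depth_10.items() if d > 0 and o in depth_n}


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly uniform)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def extremes(metric):
    """Bottom 1% and top 1% of oligos after ranking `metric` in ascending order."""
    ranked = sorted(metric, key=metric.get)
    k = len(ranked) // 100  # 11520 oligos -> 115
    return ranked[:k], ranked[-k:]


def mean_of(values_by_oligo, oligos):
    """Average of a per-oligo value over a subset of oligos."""
    return sum(values_by_oligo[o] for o in oligos) / len(oligos)


# Simulated placeholder data for 11,520 oligos (NOT results from this study).
rng = random.Random(0)
oligos = [f"oligo_{i}" for i in range(11520)]
gc = {o: rng.uniform(0.35, 0.65) for o in oligos}
reads_10 = {o: rng.randint(50, 150) for o in oligos}
reads_60 = {o: rng.randint(10, 300) for o in oligos}

d10, d60 = depth(reads_10), depth(reads_60)
inc_60 = increment(d60, d10)
low_cov, high_cov = extremes(d60)
low_inc, high_inc = extremes(inc_60)

print("Gini, 10 cycles:", round(gini(d10.values()), 3))
print("Gini, 60 cycles:", round(gini(d60.values()), 3))
print("mean GC, bottom/top 1% by depth:",
      round(mean_of(gc, low_cov), 3), round(mean_of(gc, high_cov), 3))
print("mean increment, bottom/top 1%:",
      round(mean_of(inc_60, low_inc), 3), round(mean_of(inc_60, high_inc), 3))
```

With 11,520 oligos, integer division gives 115 oligos per 1% tail, matching the subset size used for Figs. 3B and 3C.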