Antibodies and Staining solution
Antibodies and reagents used were APC anti-human CD3 Antibody (1:20, BioLegend, 317318), 7-AAD Staining Solution (1:50, Abcam, ab228563).
Cell culture
All established cell lines were purchased from the National Collection of Authenticated Cell Cultures (Shanghai, China). Cells were cultured at 37 °C in an atmosphere of 5% (v:v) carbon dioxide in DMEM (NIH-3T3, HeLa, and HEK293T) or RPMI (K562, Jurkat) supplemented with 10% fetal bovine serum.
Indexed Tn5 Transposome assembly
The assembly of the i7-only Tn5 transposome was performed as per the manufacturer’s instructions for TruePrep Tagment Enzyme (Vazyme, S601-01) reagent. TN5_A_ME and TN5_R2_index (sequences are provided in Supplementary Table 1) were synthesized and purified by high-performance liquid chromatography (Sangon Biotech (Shanghai)) and dissolved into Annealing Buffer (Vazyme, S601-01) at a final stock concentration of 10 µM. For the annealing reaction, oligonucleotides were mixed at a 1:1 molar ratio at 10 μM and mix them thoroughly and anneal samples using the following thermocycling parameters: 75 °C for 15 min; 60 °C for 10 min; 50 ℃ for 10 min; 40 °C for 10 min; 25 °C for 30 min. After this step oligonucleotide cassette can be aliquoted and frozen at -20 °C for future transposome assemblies. To assembly the Tn5 transposase, we mixed 7 µl of oligonucleotide cassette from the previous step with 4 µl of TruePrep Tagment Enzyme (Vazyme, S601-01), and 39 ul coupling buffer (Vazyme, S601-01), mixed well and then incubated for 1h at 30 ˚C in a thermocycler. The resulting 50 µl of assembled transposome can be stored at -20 ˚C for at least one month.
Assays of Tn5 in vitro tagmentation on RNA/DNA hybrids
Messenger RNA preparation. Total RNA was extracted from HEK293T cells with TRIzol (Invitrogen, 15596026), according to the manufacturer’s instructions. The resulting total RNA was treated with DNase I (NEB, M0303) to avoid genomic DNA contamination. Phenol/chloroform extraction and ethanol precipitation were then performed to purify and concentrate total RNA. For mRNA isolation, mRNA Capture Beads (Vazyme, N401-01) were used according to the manufacturer’s instructions. Cre mRNA was prepared by in vitro transcription of synthetic Cre DNA with HiScribe T7 ARCA mRNA Kit (NEB, E2060S) according to the manufacturer’s instructions, then treated with DNase I (NEB) to digest DNA template.
mRNA/DNA hybrids preparation. Hela mRNA and Cre mRNA were reverse transcribed into RNA/DNA hybrids separately by Maxima H Minus Reverse Transcriptase (Invitrogen, EP0752), according to the manufacturer’s protocol. mRNA/DNA hybrids products were purified with 0.6 X Agencourt AMPure XP SPRI beads (Beckman, A63881), according to the manufacturer’s protocol.Tn5 in vitro tagmentation on RNA/DNA hybrids. For testing whether or not the Tn5 transposase also stayed bound to its RNA/cDNA substrate after transposition, we added a transposition mix (4ul 5x Reaction buffer from TruePrep ® Tagment EnzymeTn5 transposase kit (Vazyme, S601-01), H2O, 2ul 10uM assembled transposome) to 50ng Hela and Cre RNA/DNA hybrids products (in a final volume of 20 µl) respectively and mixed gently by pipetting. Each RNA/DNA hybrid performed 3 tubes of the test. The transposition tubes were incubated at 55 °C for 10 min in a thermocycler with a heated lid. For 3 tubes:
- One of the transposition products (10ul) was added with 10 µl of 40 mM EDTA (pH8.0) and then stored on ice;
- One was added with 2 µl of 1% SDS and incubated at RT for 10 min and then stored on ice;
- The third one was added with 2 µl of 1% SDS and purified by using 1.2X Agencourt Agencourt AMPure XP SPRI beads to remove Tn5 and excess free adaptors, and eluted in 15ul nuclease-free water. Then added 25ul NEBNext High-Fidelity 2x PCR Master Mix (NEB, M0541S), 5ul 10uM S-P7 (CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-s-T), and 5ul nuclease-free water to the transposed RNA/DNA hybrids. The linear amplification PCR was performed in a thermocycler as follows:72 ˚C for 3 min, 98 ˚C for 45 s, 11 cycles [98 ˚C for 20 s, 67 ˚C for 30 s, 72 ˚C for 1 min], 72 ˚C for 1 min, storage at 4 ˚C. Then the PCR product was cleaned with 0.8x AMPure XP SPRI beads (Beckman Coulter, A63881), eluting in 20 µl of EB buffer.
PAGE analysis. The quality of the complexes was assessed on an 8% TBE gel (Invitrogen, EC62155BOX) according to the manufacturer’s instructions.
FIPRESCI methods
note:
- Sequences of oligos or primers used below are all provided in Supplementary Table 1.
- The round1 barcode information involved in each FIPRESCI experiment mentioned in the manuscript is provided in Supplementary Table 2.
Single cell suspension preparation. The single-cell suspension was further processed into nuclei or permeabilized cells.
Nuclei isolation. Cells (30,000-3,000,000) were washed with 3-10 ml of ice-cold 1x PBS (hyclone, SH30256.01) and centrifuged at 300xg for 5 minat 4 ˚C. The cell pellets were re-suspended in 100-1000 uL chilled lysis buffer (10 mM Tris-HCl, pH 7.4 (Sigma, T2194), 10 mM NaCl (Sigma, 59222C), 3 mM MgCl2 (Sigma, M1028), 0.1% NP40 (Sigma, 74385), 0.1% Tween-20 (Thermo, 28320), 0.01% digitonin (Thermo, BN2006), 1 U/µl Sigma Protector RNase inhibitor (Sigma, 3335399001) with 1% BSA (Sigma, A1933)), and pipette mix 10X. After incubation for 3-5 min on ice, add 1-10 ml chilled Wash Buffer ((10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 1% BSA) to the lysed cells. Pipette mix 5x, then centrifuged at 300 g for 5 min at 4 °C and resuspended in 5-500 µl of ice-cold PBS-BSA- RNase inhibitor (1x PBS supplemented with 1% w/v BSA (Sigma, A1933) and 1 U/µl Sigma Protector RNase inhibitor (Sigma, 3335399001)) and concentration was determined by using a LUNA™ Automated Cell Counter.
Cell permeabilization. Cells (30,000-3,000,000) were washed with 3-5 ml of ice-cold 1x PBS (Hyclone # SH30256.01) and centrifuged at 300xg for 5 min at 4 ˚C. The cell pellets were re-suspended in 100ul 1x PBS and fixed in 900 ul of ice-cold methanol (Fisher Scientific #M/4000/17) at -20 ˚C for 10 min. After two additional washes (centrifugation: 300xg, 5 min, 4 ˚C) with 1 ml of ice-cold PBS-BSA- RNase inhibitor and filtered through a cell strainer (40 µM or 70 µM depending on the cell size). Permeabilized cells were resuspended in 5-500 µl of ice-cold PBS-BSA-SUPERase and counted by using a LUNA Automated Cell Counter (Luna-FL).
Reverse Transcription. For 100,000 cells or nuclei (7 µl) were added with 3ul 25uM RT primers (containing RT PolyT Primer (NEB, S1327S), or RT random Primer (Thermo Scientific, SO142), or their mix), incubated for 5 min at 55 ˚C to resolve RNA secondary structures, then placed immediately on ice to prevent their re-formation. Then a mix of 40ul RT reaction buffer, containing 10 µl 5x Reverse Transcription Buffer, 2.5 µl of 100 mM DTT (Thermo Fisher Scientific, P2325), 2.5 µl of 10 mM dNTPs (Vazyme, P031-01), 2.5 µl of RNaseOUT RNase inhibitor (40 U/ml, Thermo Fisher Scientific, 10777019), 3.5 µl of Maxima H Minus Reverse Transcriptase (200 U/ml, Thermo Fisher Scientific, EP0753), and 21.5 µl nuclease-free water were added. The reverse transcription was incubated as follows (with heated lid set to 60 ˚C): 50 ˚C for 10 min, 3 cycles of [8 ˚C for 12 s, 15 ˚C for 45 s, 20˚C for 45 s, 30 ˚C for 30 s, 42 ˚C for 2 min, 50 ˚C for 3 min], 50 ˚C for 5 min, store at 4 ˚C.
i7 only transposome tagmentation. Cells or nuclei were then distributed into one or multiple 96-plates (18ul transposition mix: 4ul 5x Reaction buffer from TruePrep ® Tagment EnzymeTn5 transposase kit (Vazyme, S601-01), 12 ul H2O, 2ul 10uM indexed transposome was already added to each well). For each well, 2,000-4,000 cells or nuclei (1-2 µL) were added and pipetted to mix thoroughly, then incubate in a thermomixer at 1000rpm for 30min at 37℃. Tagmentation was stopped by adding 5 µl of 0.5M EDTA to quench TN5, and the sample was held on ice for 10 min.
Sample pooling. Then pooled the cells or nuclei together, added 240 μl 10% BSA (for a final 1% BSA) to prevent cells or nuclei from clumping during the following centrifugation step. Wells were washed with ice-cold 1x PBS containing 1% BSA, which was transferred to the same tube for maximum recovery. Samples were centrifuged at 600g for 5 min, removed supernatant, and washed pellets twice with 0.5 ml of ice-cold PBS-BSA-RNase-inhibitor, then resuspended in 20ul of PBS-BSA-RNase-inhibitor.
GEM generation & cDNA template switch. Cells or nuclei pools were then processed on a single 10x microfluidics lane. Sequencing libraries were prepared using a custom protocol based on the 10x Genomics Single Cell 5' RNA Reagent Kits (10x Genomics, Pleasanton, California) workflows. Briefly, the 10x workflow was followed up until single-cell Gel Bead-In-Emulsions (GEM) generation. The aimed target cell recovery for each library was based on customer requirements (according to our test, the targeted nuclei recovery rate was above 58% ). In brief, cellular suspensions were loaded on the sample chip in the Chromium Controller instrument (10X Genomics) according to the manufacturer’s instruction (for Chromium Next GEM Single Cell V(D)J Reagent Kits v1.1 refer to User Guide: CG000207 Rev E, or Chromium Next GEM Single Cell 5' v2 refer to User Guide: CG000331 Rev A ) to generate GEMs. After the Chromium controller completed running, transferred 100ul GEMs into the tube strip on ice with the pipette tips against the sidewalls of the tubes, then incubate in a thermomixer for 30 min at 25 ˚C, 90 min at 42 ˚C, 10 min at 53 ˚C.
GEM clean-up. The emulsion was broken and purified according to the manufacturer’s instructions (10X Genomics). Briefly, 125 µl Recovery Agent (10x Genomics catalog no. 220016) was added to each sample (post GEM-RT incubation) at room temperature for 2min. Then 125 µl of the pink oil phase was removed by pipetting. The remaining sample was mixed with 200 µl of Dynabead Cleanup Master Mix (per reaction: 182 µl of Cleanup Buffer (10x Genomics, 2000088), 8 µl of Dynabeads MyOne Silane (10x Genomics, 2000048), 5 µl of Reducing Agent B (10x Genomics, 2000087) and 5 µl of nuclease-free water). After 10 min of incubation at room temperature, samples were washed twice with 300 µl of freshly prepared 80% ethanol (Merck, 603-002-00-5) and eluted in 35.5 µl of EB Buffer (Qiagen, 19086) containing 0.1% Tween (Sigma, P7949-500ML) and 1% v/v Reducing Agent B. Bead clumps were sheared with a 10 µl pipette or needle. 35 µl of the sample was transferred to a new tube.
cDNA enrichment. 35 µl of the above product were mixed with 65 µ PCR reaction mix, containing 50 µl of NEBNext High-Fidelity 2x PCR Master Mix (NEB, M0541S), 0.5 µl of 100 µM S-P5 primer (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTC), and 14.5 µl of nuclease-free water. 5’-end cDNA was amplified in a thermocycler as follows:72 ˚C for 3 min, 98 ˚C for 45 s, 13-16 cycles (based on input cell number of [98 ˚C for 20 s, 67 ˚C for 30 s, 72 ˚C for 1 min], 72 ˚C for 1 min, storage at 4 ˚C. Then the cDNA enrichment reaction was cleaned with 0.8x AMPure XP SPRI beads (Beckman Coulter, A63881), eluting in 40 µl of EB buffer.
Library preparation and sequencing. 20ul of enriched cDNA products were then mixed with 80 µl library reaction mix, containing 50 µl of 2X KAPA HiFi HotStart Ready Mix (Kapa, KK2602), 0.5 µl of 100 µM S-P5 primer, 5 µl of 10 µM S-P7-index (CAAGCAGAAGACGGCATACGAGAT[NNNNNN]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-s-T), and 24.5 ul of nuclease-free water. The reaction was amplified in a thermocycler as follows: 98 ˚C for 45 s, 16 cycles of [98 ˚C for 20 s, 54 ˚C for 30 s, 72 ˚C for 20 s], 72 ˚C for 1 min, storage at 4 ˚C. libraries were purified with 0.75x AMPure XP SPRI beads and eluted with 24.5ul nuclease-free water. The final libraries were quantified using Bioanalyzer High-Sensitivity DNA chip (Agilent, 5067-4626) and Qubit HS assay (ThermoFisher Scientific, Q32854) and then sequenced in NovaSeq 6000 (Illumina, San Diego, CA) or MGISeq-T7 (MGI Tech Co., Ltd., China) with a 150-bp paired-end read length, targeting a depth of 10,000–50,000 reads per cell.
Species mixing controls
Human (Jurkat) and mouse (NIH-3T3) cell lines were processed using the FIPRESCI protocol as described above. Briefly, methanol-fixed whole cells were prepared separately for each of the cell lines. Prepared Jurkat and NIH/3T3 cells were mixed at a 1:1 ratio and performed reverse transcription. After RT reaction, cells were tagmented Tn5 on a 96-well plate loaded with indexed oligonucleotides (containing unique sets of round1 indices) (Supplementary Table 2). For 10x Genomics Chromium processing, 15300 cells per microfluidic channel were loaded to generate single-cell Gel Bead-In-Emulsions (GEM) using Chromium Next GEM Single Cell V(D)J Reagent Kits v1.1 according to User Guide: CG000207 Rev E.
scRNA-seq and snRNA-seq experiment for FIPRESCI
For the proof-of-concept scRNA-seq and snRNA-seq experiments, HEK293 cells, K562 cells, and Hela cells were prepared nuclei and methanol-fixed, permeabilized whole cells separately for each of the cell lines. scRNA-seq and the snRNA-seq experiment were processed separately using the FIPRESCI protocol as described above. Briefly, methanol-fixed whole cells or nuclei were performed reverse transcription separately for each of the cell lines. After RT reaction, different cell lines were loaded into TN5 wells (containing unique sets of round1 indices) as Supplementary Table 2 described. For 10x Genomics Chromium processing, 100,000 cells or nuclei were loaded per microfluidic channel to generate single-cell Gel Bead-In-Emulsions (GEM) using Chromium Next GEM Single Cell V(D)J Reagent Kits v1.1 according to User Guide: CG000207 Rev E.
Alternative buffers for tagmentation activity of FIPRESCI
For testing the tagmentation activity of Tn5 on RNA/DNA hybrids within permeabilized cells or nuclei, we performed a set of experiments with different 16 reaction buffers and variations according to the previously published studies (as described in the manuscript). Hela and HEK293 methanol-fixed permeabilized cells were used separately to test the performance of each tagmentation buffer (tagmentation buffers used in the manuscript are provided in Supplementary Table 3). Briefly, methanol-fixed Hela or HEK293 cells were performed reverse transcription separately. After RT reaction, different cell lines were loaded into TN5 wells containing different tagmentation buffers and unique sets of round1 indices as Supplementary Table 2 described. For 10x Genomics Chromium processing, 15,300 HEK293 cells and 15,300 Hela cells were loaded per microfluidic channel to generate single-cell Gel Bead-In-Emulsions (GEM) using Chromium Next GEM Single Cell V(D)J Reagent Kits v1.1 according to User Guide: CG000207 Rev E.
E10.5 mouse embryo nuclei with FIPRESCI readout
For the collection of E10.5 mouse embryos, 8-10 week-old female C57BL/6 mice (purchased from Beijing Vital River Laboratory Animal Technology, Beijing, China) were mated to 10-12 week-old male PWK/PhJ mice (purchased from Jackson Laboratory) naturally, and the first day that the vaginal plug was observed was considered as E0.5. We collected 2 embryos. Embryos were washed twice with DPBS and were cut into small pieces. Then tissues were digested with 1 mg/ml type II collagenase (Gibco, 17101015) and 1 mg/ml type IV collagenase (Gibco, 17104019) at 37 °C for 40min.
Dissociated cells were centrifuged at 300 g for 5 min at 4 °C, then re-suspended in 1 mL of cold DPBS with 0.1% BSA. After passing through a 40 um strainer (Biologix), cells were washed twice, centrifuged at 300 g for 5 min at 4 °C, re-suspended in cold DPBS with 0.1% BSA at a density of 1×105 cells/ml, and stored on ice before sc-RNA-Seq and nuclei isolation.
To isolate nuclei, the cell pellets were re-suspended in 200 uL chilled lysis buffer (10 mM Tris-HCl, ph 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP40, 0.1% Tween-20, 1mM DTT, 1 U/µl Protector RNase inhibitor (Sigma, 3335402001), 0.01% digitonin and supplemented with 1% BSA), and pipette mix 10X. After incubation for 5 min on ice, add 1 ml chilled Wash Buffer (10 mM Tris-HCl, ph 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 1mM DTT, 1 U/µl Protector RNase inhibitor (Sigma, 3335402001), 1% BSA) to the lysed cells. Pipette mix 5x, nuclei were collected by centrifugation (500g, 5 min, 4 °C), and fixed in 1 ml of ice-cold 1× PBS containing 2% formaldehyde (Thermo Fisher Scientific, 28908) for 10 min on ice. Fixed nuclei were collected (500g, 5 min, 4 °C), the pellet was resuspended in 1.5 ml of ice-cold PBS-BSA-SUPERase (1× PBS supplemented with 1% w/v BSA (Sigma, A8806-5) and 1% v/v SUPERase-In RNase Inhibitor (Thermo Fisher Scientific, AM2696)) and nuclei were collected (500g, 5 min, 4 °C). After one more wash with 1.5 ml of ice-cold PBS-BSA-SUPERase, fixed nuclei were resuspended in chilled Diluted Nuclei Buffer (10x Genomics, PN-2000153) based on the number of cells used for isolation and assuming ~50% nuclei loss during cell lysis, and transferred to a 1.5 ml tube. If cell debris and large clumps are observed, pass through a cell strainer.
Fixed nuclei were then processed using the FIPRESCI protocol as described above. Briefly, 400,000 fixed nuclei were used for the reverse transcription reaction. After RT reaction, nuclei were gently pipetted then loaded into TN5 wells (containing unique sets of round1 indices). After tagmentation, stopping the reaction, pooling the sample, and washing, about 200,000 nuclei were remaining. For 10x Genomics Chromium processing, all the remaining nuclei were loaded to 10x Genomic microfluidic channel for generating single-cell Gel Bead-In-Emulsions (GEM) using Chromium Next GEM Single Cell 5' v2 according to User Guide: CG000331 Rev A.
Preparation of Human T cell for FIPRESCI
Whole blood was obtained from fourteen donors, including two healthy people and twelve patients who were pathologically diagnosed with cancer (including 5 cancer types: endometrial cancer, breast cancer, liver cancer, pancreatic donors, and stomach cancer) were enrolled in this study. These samples were collected from the Chinese People's Liberation Army General Hospital. The available clinical characteristics of twelve patients are summarized in Supplementary Table 4.
The fresh peripheral blood was collected in EDTA anticoagulant tubes and processed immediately. For each donor, Peripheral blood mononuclear cells (PBMCs) were isolated according to the manufacturer’s instructions for Ficoll-Paque PLUS (GE Healthcare, 17-1440-02). Briefly, 2 ml of fresh peripheral blood were combined, diluted in 2ml 1× PBS, and carefully added to a 15ml tube containing 2 ml of Ficoll. The tubes were centrifuged for 20 min at 1000g (minimum acceleration and deceleration). The interphase was carefully collected and washed with PBS and subsequently centrifuged for 10 min at 250g at room temperature. Isolated PBMCs were gently suspended with cryopreservation medium CELLBANKER2 (AMSBIO, 11891), and dispensed the cell suspension in 100ul aliquots to cryopreservation vials. Store the vials directly at -80℃ before further processing.
After isolating PBMCs from all samples, we used FACS to sort T cells for sequencing. Single cell suspensions were subjected to antibody staining with anti- CD3 and 7-AAD for FACS sorting performed on a BD Aria SORP instrument. FACS gates were drawn to include only live single cells based on 7-AAD. Further gates were drawn to arrive at CD3 +. Based on FACS analysis, live single T cells were sorted into 1.5 mL tubes (Eppendorf). The sorted cells were placed on ice until all samples were prepared.
Human T cells were then processed using the FIPRESCI protocol as described above. Briefly, methanol-fixed whole cells for reverse transcription were prepared separately for each of the donors (added 15,000 cells per donor). After RT reaction, different donors’ cells were loaded into TN5 wells (containing unique sets of round1 indices) as Supplementary Table 2 described. For 10x Genomics Chromium processing, 80,000 cells per microfluidic channel were loaded to generate single-cell Gel Bead-In-Emulsions (GEM) using Chromium Next GEM Single Cell 5' v2 according to User Guide: CG000331 Rev A.
Single cell VDJ enrichment from T cell cDNA product of FIPRESCI
VDJ libraries were enriched from the pan-cancer T cell cDNA product of FIPRESCI by using two consecutive nested PCRs and VDJ library preparation according to the manufacturer’s protocol (Chromium Single Cell V(D)J Reagent Kits, 10X Genomics). The final libraries were quantified using Bioanalyzer High-Sensitivity DNA chip (Agilent, 5067-4626) and Qubit HS assay (ThermoFisher Scientific, Q32854) and then sequenced in NovaSeq 6000 (Illumina, San Diego, CA) with a 150-bp paired-end read length, targeting a depth of 5,000 reads per cell.
Bulk ATAC-seq
10,000 Hela nuclei were used to generate bulk ATAC-seq library with TruePrep DNA Library Prep Kit V2 for Illumina kit (vazyme, TD501) according to the manufacturer’s protocol.
Processing of FIPRESCI data
FASTQ files generated from FIPRESCI were demultiplexed according to round1 barcodes, any mismatch wasn’t allowed in this step. After demultiplexing, FASTQ files with the same round1 barcode can be regarded as the output of standard 10X genomics single-cell 5-end RNA-seq experiments. Thus, reads containing the same round1 barcode were used as inputs for 10X Genomics software Cell Ranger (version 6.0.2, parameters: --chemistry=”fiveprime”, --include-introns). To generate the final gene expression matrix, we merged Cell Ranger output filtered_feature_bc_matrices from different round1 barcodes by Seurat (version 4.0.5)[25] “merge” function. Round1 barcodes were added to their corresponding round2 (10X genomics) barcodes through this step, forming cell barcodes. Therefore, cell barcodes and the final gene expression matrix were obtained. Finally, we saved that gene expression matrix as a final matrix in .rds format for downstream analysis.
Species Mixture FIPRESCI data analysis
We used Seurat (version 4.0.5) for the Species Mixture FIPRESCI data analysis. Firstly, cells whose UMI<1000 were filtered. To normalize gene expression levels, gene expression values for each cell were divided by total counts for that cell and multiplied by 10000 then log transformed. PCA (Principal Component Analysis) was performed after scaled and centered data. Dimensionality reduction method UMAP (Uniform Manifold Approximation and Projection) was performed by Seurat function “RunUMAP” with top 20 PCs and other default parameters. To distinguish mouse cell or human cell from mouse and human mixture data. We defined a cell as a mouse cell if its mm10 UMI counts percentage > 90%, a cell as a human cell if its mm10 UMI counts percentage < 10%.
Nuclei and permeabilized prepared three cell lines FIPRESCI data analysis
We analyzed nuclei or permeabilized prepared three cell line FIPRESCI data independently by Seurat (version 4.0.5). Similarly to species mixture FIPRESCI data analysis, after normalization and PCA, the top 10 PCs were used as input for the dimensional reduction method tSNE (t-distributed Stochastic Neighbor Embedding). For comparative analysis of the two kinds of cell preparation, firstly we selected the top 3000 highly variable genes from each cell preparation condition. Secondly, we used intersected highly variable genes as common features. A gene’s expression profile within one cell line of one cell preparation condition was calculated as the mean value of UMI counts. Finally, Spearman correlation was used to evaluate similarity across different cells prepared three cell line FIPRESCI profile. To identify differentially expressed genes, we performed a Wilcox test and calculated the average log2 fold change value between two groups of cells using the Seurat function “FindMarkers”.
Sensitivity analysis across different RT primer and cell preparation conditions
For sensitivity analysis across 6 different RT primer and cell preparation conditions, we selected 6 different round1 barcodes (each round1 barcode represented one RT primer and cell preparation condition) and their corresponding reads files. For each condition, we randomly downsampled read pairs and ensured that mean read pairs per cell are 10k, 20k, 30k, 40k, and 50k by seqtk (parameters: -s100) https://github.com/lh3/seqtk . Then, we mapped each downsampled read pair to the genome independently and calculated the average detected gene numbers per cell.
TSS enrichment and enhancer enrichment analysis for RT primer condition FIPRESCI data
We used deeptools (version 2.29.0)[35] and bedtools (version 3.3.0)[36] for reads enrichment analysis. Firstly, bam files were removed PCR duplication by Picard (version 2.21.6) https://broadinstitute.github.io/picard/. Then, we used the deeptools command “bamCoverage” (parameters: --normalizeUsing RPKM, --smoothLength=200) to generate a reads coverage profile across the genome. For TSS enrichment analysis, we used the deeptools command “computeMatrix reference-point” (parameters:–referencePoint TSS, -a 2000 and -b 2000) to calculate reads coverage around TSS. For enhancer enrichment analysis, we used macs2 (version 2.2.7.1, parameters: --shift -100, --extsize 200, --nomode and -q 0.01) [37] to call ATAC peaks from bulk Hela ATAC-seq data. Then, enhancers were defined as ATAC peaks whose distance to all TSS is greater than 2000 bp. We used deeptools “computeMatrix reference-point” command (parameters: –referencePoint center, -a 2000 and -b 2000) to calculate reads coverage around the enhancer center. Finally, we used the deeptools command “plotHeatmap” and “plotProfile” to visualize the results.
Analysis of peripheral T cell gene expression profile from FIPRESCI
We used Seurat (version 4.0.5) to analyze the peripheral T cell gene expression profile from FIPRESCI. Firstly, cells with mitochondria ratio > 5% and gene number detected <50 were filtered. To annotate cell types, we performed label transfer based on a public well-annotated CITE-seq PBMC atlas[25] by Seurat function “FindTransferAnchors” (parameters: normalization.method=”SCT”, reference.reduction=”spca”, dim=”1:30”) and Seurat function “MapQuery” (parameters: “ reduction.model=”wnn.umap”, reference.reduction=”spca”, refdata=”celltype.l2”). After label transfer, we filtered cells whose prediction score is smaller than 0.4, then we visualized remained cells embedding on reference UMAP by setting a parameter: reduction=”ref.umap” in Seurat function “DimPlot”. After annotation and embedding, we calculated each cancer donor’s cell types proportion changes compared with healthy donors. To identify differentially expressed genes, we performed a Wilcox test and calculated the average log2 fold change value between two groups of cells using the Seurat function “FindMarkers”. To find other differential features, we also imputed surface protein expression levels for our peripheral T cell data based on the CITE-seq PBMC atlas by Seurat function “MapQuery” ( parameters: refdata = list (predicted_ADT = “ADT”)).
Analysis of Treg sub-populations in peripheral T cells
We mainly used Seurat (Version 4.0.5) to analyze Treg sub-populations in peripheral cells. We extracted Treg expression matrix (UMI count matrix) from label transferred PBMC matrix described before. Then, we normalized and scaled the gene expression matrix by corresponding Seurat function with default parameters. After PCA, a shared nearest neighbor graph was constructed by Seurat function “FindNeighbors” with top 20 PCs. Finally, unsupervised clustering was applied on Treg cells with Louvain algorithm resolution=0.4, and so we got five Treg sub-population. We used R pacakage ComPlexHeatmap[38] to show five sub-population over donors.
To indentify which sub-population is at early stage. We implemented CytoTRACE[29] to Treg gene expression matrix to calculate CytoTRACE score for each cells. A cell with higher CytoTRACE score means less differentiated or at an earlier stage.
Analysis of TCR profile from FIPRESCI.
FASTQ files generated from the FIPRESCI TCR profile were used as input for 10X Genomics software Cell Ranger vdj (version 6.0.2) with default parameters. We only kept barcodes that satisfied the following criteria: 1. Corresponding cells in T cell gene expression profile data must be successfully annotated (annotation procedure was described before); 2. Round2 barcodes must be unique among annotated cells, that’s because reads in the FIPRESCI TCR profile only contained round2 (10X Genomics) barcodes. After raw sequencing data processing, we used scRepertoire (version 1.3.2)[39] for downstream analysis. In briefly, we calculated unique clonotype numbers by scRepertoire function “quantContig” (parameter: cloneCall=”gene+nt”). We also looked at the length distribution of the CDR3 sequences in the TRA and TRB chain by using function lengthContig with respective default parameters. Gene usages in both chains were investigated by function vizGenes with respective default parameters. Function clonalProportion helped us to see top clonotypes (ranked by frequency of occurrence) proportion within one sample. Clonotype diversity were calculated by function clonalDiversity (parameters: cloneCall = “gene+nt”, n.boots=1000(for Treg sub-population, n.boots=100)). Combined with the R package “circlize” (version 0.4.13)[40], we used scRepertoire function getCirclize to visualize shared clonotypes across different groups of cells.
Co-clustering of peripheral TCR and NeoTCR.
We downloaded NeoTCR CDR3 amino acid sequence from recent work[41]. We firstly filter cells who only detected single chain of CDR3 amino acid sequence, and then we used software GIANA[42] to cluster our peripheral TCR and NeoTCR CDR3 amino acid sequence (single chains) by default parameters (except –M TRUE –v FALSE). Since GIANA can embed TCR CDR3 seqeuence into a 96-dimention space, we calculated pair-wise pearson distance in the space. Finally, we visualized the TCR clusters and relationships between TCR clusters by R package ape (Version 5.6-2) function nj with default parametes and the pair-wise distance matrix.
ATAC peak signal prediction model building.
Hela FIPRESCI uniquely barcoded cells (UBCs) (for processing easily) were used as training data. Followings are training procedures. Firstly, To equip Hela FIPRESCI cells with ATAC peak signals, we matched Hela FIPRESCI cells with ATAC-seq cells from public Hela scATAC-seq data by OptMatch. We called two cells was a cell pair if they were matched cells. A cell pair was regarded as one pseudo cell that had two modalities: RNA profile and ATAC profile. Secondly, to extract eRNA information from FIPRESCI data, FIPRESCI R1 reads whose distance to all TSS was greater than 1Kb or overlapped with any exon were filtered. Remained reads were overlapped with bins from the scATAC-seq bin matrix to get read counts at each peak. Thirdly, we filtered bins occurring in fewer than 0.1% cells or more than 10% of cells. Bins passed filter criterion were used as complements to RNA modality in cell pairs. Finally, we employed BABEL to train an ATAC signal prediction model on cell pairs.
Computing the correlation of predict peak signal and bulk peak signal.
We used the ATAC signal prediction model described before to predict ATAC signals from FIPRESCI unpaired Hela cells (3,154/5,539). We can get a cell verse peak probability matrix P, P(i,j) indicated peak i’s accessible possibility in cell j. The sum of probability values of all cells for each peak was used as the predicted peak signal.
Dividing the whole genome as bins at 500bp, we applied the same coordinate to scATAC peaks and bulk peaks. To reduce multiple overlaps when coordinate shift, bedtools was implemented to compute the intersection between predicted peaks and whole genome bins (bedtools -f=0.5), then we got 327,516 bins as bin set A, while from bulk ATAC-seq peaks and whole genome bins we got 23,284 bins as bin set B (bedtools -f=0.2). The common bin number in Set A and Set B is 14,676. After ranking overlapped predicted peak bins signals, we computed the correlation between the top 6,000 predicted peak signals and experimental bulk peak signals.
Analysis of E10.5 mouse embryo gene expression profile
We used Seurat (version 4.0.5)[25] to analyze the E10.5 mouse embryo gene expression profile from FIPRESCI snRNA-seq. Firstly, cells with mitochondria ratio > 5% and gene number detected <50 were filtered. After normalization and PCA, the top 20 PCs were used to construct a KNN graph by Seurat function “FindNeighbors”. Then, the Louvain algorithm was applied to clustering with resolution=0.4. Differential expressed genes for each cluster were identified by Seurat function “FindAllMarkers”. (parameters: only.pos = TRUE) and other default parameters. UMAP was performed for visualization.
For integration analysis, we downloaded a public scRNA-seq profiled mouse organogenesis atlas [22]. For convenient calculation, we used randomly sampled 20 thousand annotated E10.5 mouse embryo cells from the atlas as reference. Before label transfer, the normalization method SCtransform[43] was applied to our FIPRESCI data for reducing the impact of sequencing depth. Then, we performed label transfer by Seurat function “FindTransferAnchors” (parameters: normalization.method=”SCT” , reference.reduction=”pca” ,dim=”1:20”) and Seurat function “TransferData”. After label transfer, we filtered cells whose prediction score is smaller than 0.4.
Trajectory analysis of E10.5 mouse brain
We mainly used monocle3 (version 1.0.0) [22] to analyze the E10.5 mouse brain trajectory. We extracted brain cells from whole E10.5 mouse embryo FIPRESCI data. Then we implemented Scanpy (version 1.8.2) to construct a PAGA (Partition-based Graph Abstraction) graph. UMAP embedding of those cells was renewed by Scanpy function “Scanpy.tl.umap” (parameters: init_pos=”paga”). Monocle3 was applied to the renewed UMAP embedding cells. Trajectory was learned by monocle3 function “learn_graph” (parameters: close_loop=FALSE) and other default parameters. With the prior knowledge that Neural Tube cells are the earliest cells, we programmatically determined root principal node in trajectory: firstly, grouping cells according to which trajectory graph node they are nearest to, then calculating what fraction of the earliest cells (Neural Tube) at each node, node who is most heavily occupied by the earliest cells is selected as root (source code is available at monocle3 tutorial pages). Beginning with the root node, pseudo time was assigned to each cell by monocle3 function “order_cells” with default settings.
Trajectory graph correlated genes were identified by monocle3 function “graph_test” with default settings. Then we selected genes with q value smaller than 0.01 after “graph_test”. Gene modules were identified using those genes by monocle3 function “find_gene_modules” (parameters: resolution=0.1, partition_qval=0.01).
Focused on Inhibitory neurons trajectory, we selected Inhibitory neuron trajectory related trajectory nodes, graphs, and cells to obtain an Inhibitory neuron trajectory sub-graph from the whole graph. We also identified trajectory graph correlated genes by monocle3 function “graph_test”. Genes with q value <0.05 and Moran’s index > 0.05 after “graph_test” are selected as graph correlated genes. Then K-mean clustering (k=5) was performed on those genes (gene verse pseudo-time matrix). Finally, we use ComplexHeatmap (version 2.6.2) to show those graph correlated gene expression’s dynamic changes along pseudo time.
TSS analysis of E10.5 mouse brain from FIPRESCI
We used the Paraclu peak caller[44] to identify TSS. For unsupervised clusters, we first grouped unique mapping reads according to clusters. We call peaks by Paraclu peak caller per clusters , Then we merged peaks from all clusters by bedtools(Version v2.30.0), and filter peaks whose width is greater than 300 bp. Finally peaks whose distance to any annotated TSS(from mm10 reference genome) is smaller than 100 bp would survive. Those survived peaks are regarded as TSS peaks. Reads mapping those TSS peaks can be regarded using the corresponding TSSs.
For inhibitory neuron trajectory TSS analysis, we firstly grouped unique mapping reads by equally 10 bins of pseudo time. Then we merged first 3 bins (in pseudo time increasing order) as early stage of trajectory, 4th to 7th bins as medium stage and 8th to 10th bins as later stage. We used same peak calling strategy described before, the only difference is we used 3 groups of reads (early, medium and later) rather than 15 unsupervised clusters.
To control for differences in overall gene expression, we computed the relative usage for each TSS by its reads proportion of all TSS reads in corresponding gene. This usage proportion would be converted to an integer by multiple scale factor 100 and then ceiling to nearest integer. Within a cluster, we treated a gene’s TSS reads count value as missing value if the sum of its TSSs’ reads is smaller than 0.5 quantile or great than 0.99 quantile over the cluster’s gene TSSs reads count distribution. We compared the relative usage for the set of TSSs in a given gene between each pair of samples using Fisher’s exact test implemented using the R function fisher.test. the test was performed on the 2 (samples used for comparison) by n (TSSs within one gene) table. We then took the minimum Benjamini-Hochberg corrected P value across all comparisons (for the multiple comparisons per gene), reported as the P value for differential usage across the samples within a gene. Within one TSS, the variance between 2 samples who used top 2 most proportion of that TSS was reported as the variance of the TSS.