Cell lines and cell culture
HCT116 cells were cultured in McCoy’s 5A medium (Gibco, 16600-082) supplemented with 10% fetal bovine serum (FBS) (Eurobio, CVFSVF0001) at 37°C in a humidified 5% CO2 atmosphere. The HCT116:TIR1 cell line constitutively expressing TIR1 was described in ref. 29 and the DIS3-AID cell line was described in ref. 55. For construction of the XRN1-AID cell line, an HCT116 tet-on TIR1 cell line was created using the protocol and plasmid from Natsume et al. (2016), following the protocol described in Eaton et al. (2018). Briefly, 250,000 HCT116 cells were seeded per well in a 6-well plate one day before transfection. 1 µg of the all-in-one plasmid (containing Cas9 and the sgRNA targeting the stop codon of XRN1) and the homology-directed repair template (containing three tandem FLAG and miniAID epitopes and a neomycin selection marker) was used for transfection with jetPRIME (Polyplus), following the manufacturer’s guidelines. The selection marker was separated from the tag by a P2A site for cleavage during translation (Kim et al. 2011). Medium was changed after 24 hours and, after 48 hours, cells were re-plated into 100 mm dishes in medium containing 700 µg/ml neomycin. Resistant colonies were picked after 7 days. Correct genomic insertion of the tags was assayed by PCR and by Western blot. For the DIS3-AID set of experiments, auxin was used at a concentration of 500 µM for 1 hour unless stated otherwise; for XRN1-AID, because TIR1 expression is inducible, 2 µg/ml doxycycline (dox) and 500 µM auxin were added for 24 h. The KMM-1 and KMS27 (JCRB) cell lines were grown in RPMI 1640 medium (Gibco, 21875-034) supplemented with 10% FBS. The suspension part of the culture was passaged by dilution, while the adherent cells of KMM-1 were trypsinized and combined with the suspension cells at each passage.
CRISPR clone construction
The gRNA was cloned into the pX459 plasmid (Addgene 62988) using two rounds of PCR amplification of the gRNA insert with flanking regions containing AflII and XbaI cloning sites (primers listed in Table S9). Next, both the PCR product and the plasmid were digested and ligated overnight using T4 ligase (NEB). The correct insert was verified by Sanger sequencing. For the homology repair template, a plasmid with XRN1 homology arms (500 nt flanking the stop codon on both sides, synthesized by GenScript) and the 3xFLAG 3xminiAID neomycin-containing vector constructed in Eaton et al. (2018) were used to amplify the respective fragments with primers listed in Table S6, for cloning with the NEBuilder® HiFi DNA Assembly kit. The resulting vector was verified by Sanger sequencing and linearized before being introduced into cells.
Cell fractionation and RNA-seq library preparation
After thawing, cells were passaged at least 2 times before seeding 2.5 x 10^6 cells into T150 flasks and grown to 80-90% confluency. For XRN1-AID, auxin and doxycycline were added 24 h before reaching this confluency, while for DIS3-AID auxin was added for 1 h on the last day. For each condition, cells were grown in 4 flasks, of which 1 was used for protein extraction and as a fractionation control and the 3 others for the RNA replicates. Cells were washed with cold PBS and scraped in PBS on ice. After gentle centrifugation, the cell pellet was resuspended in buffer containing α-amanitin, RNase (SUPERase•In) and protease (cOmplete) inhibitors and a low level of detergent to lyse the cell membrane while keeping the nuclei intact. This was followed by centrifugation, collection of the cytoplasmic fraction, lysis of the nuclei and separation of chromatin, as described in Gagnon et al. 2014 (ref. 103), with a change to Tris pH 7.4 for the HLB and NLB buffers. The quality of the separation of the fractions was tested by Western blot and by qPCR (Fig. S1C-D). 400 ng of cytoplasmic RNA was used for library preparation with the TruSeq Stranded Total RNA kit (Illumina) after depletion of ribosomal RNAs, or with Stranded mRNA Prep (Illumina). Libraries were sequenced on a NovaSeq 6000 System. Between 54 and 112 million reads uniquely mapped to the human genome version 38 were used to further analyse the changes in cytoplasmic RNA levels (Table S5).
After 1 h or 24 h of depletion, triplicate cell cultures for each condition were subjected to fractionation and RNA extraction. Following quality control of the purity of the fractions at the RNA and protein levels, total RNAs of the cytoplasmic fractions from the control cell lines and from DIS3-AID and XRN1-AID cells, with or without auxin (and doxycycline for XRN1-AID), were sequenced (CYTO-seq).
RNA extraction and qPCR
RNA was extracted using the miRNeasy Mini Kit (Qiagen, 217004) following the manufacturer’s instructions. An additional DNase (Qiagen) treatment was performed following the manufacturer’s instructions for samples sent for sequencing. For testing the cell fractionation, the same volume of RNA was taken for the reverse transcription reaction. For other experiments, 500 ng of RNA was used for first-strand cDNA synthesis using SuperScript II RT (Invitrogen) following the manufacturer’s instructions, with 100 ng of random primers (Invitrogen) used for total cDNA synthesis; primers are listed in Supplementary Table S6. Real-time qPCR was performed using SYBR Green (Roche) on a LightCycler 480.
Protein extraction
Apart from the cell fractionation experiment, protein extracts were prepared from cells grown in p100 plates, washed once with ice-cold PBS, and lysed in 500 µl RIPA lysis buffer (ThermoScientific, 89900) supplemented with 10 µl/ml of a protease and phosphatase inhibitor cocktail (ThermoScientific). The lysed cells were scraped and collected into an Eppendorf tube, sonicated for 8 min in 30 sec ON / 30 sec OFF cycles (Diagenode) and centrifuged for 10 min at 13,000 rpm at 4°C. Protein concentration was measured using the Pierce BCA Protein Assay kit (ThermoScientific). Antibodies used for Western blot are listed in Table S7.
Ribo-seq
For each of the cell lines, two vials were thawed and cultured separately to obtain two independent biological replicates. Cells were passaged at least two times before seeding 1 x 10^6 cells into p100 dishes. Two days later, on the day of the experiment, media were aspirated from the plates and fresh media with or without auxin were added for 1 h. Afterwards, media were aspirated, and dishes were flash frozen in liquid nitrogen and stored at -80°C until further use. Cells were gently thawed on ice and 250 µl of lysis buffer (10 mM Tris pH 7.5; 10 mM (CH3COO)2Mg; 100 mM KCl; 1% Triton; 2 mM DTT) containing cOmplete protease inhibitor cocktail (Roche) and 2 U of Murine RNase Inhibitor (NEB) was added to each dish. Cells were collected with a scraper and then centrifuged for 3 minutes at maximum speed at 4°C. An aliquot of each lysate was collected for RNA extraction and the rest was flash frozen and stored at -80°C. Cell lysates were digested with RNase I (Ambion) (1 U per 15 units of absorbance) and Turbo DNase (Invitrogen) (1 U per 50 µg of material) for 1 h at 25°C, loaded onto a 24% sucrose cushion and centrifuged for 90 min at 110 krpm in a TLA110 rotor at 4°C. Next, monosomes were rinsed 2 times with 500 ml of lysis buffer, then resuspended in 500 ml of lysis buffer and digested using 5 U of RNase I (Ambion) per unit of absorbance at 260 nm for 1 h at 25°C, followed by addition of 500 U of SUPERase•In RNase inhibitor (Invitrogen). RNA was extracted with acid phenol at 65°C and chloroform, and precipitated with ethanol and 0.3 M sodium acetate pH 5.2. Resuspended RNA was loaded on a 17% polyacrylamide (19:1) gel with 7 M urea and run in 1x TAE buffer for 6 h at 100 V. RNA fragments corresponding to 28-34 nt were recovered from the gel and precipitated in ethanol with 0.3 M sodium acetate pH 5.2 in the presence of 100 mg glycogen. rRNA was depleted using riboPOOL (siTOOLs), adjusting the initial volume to a 40 µl reaction and adding 1 µl of Murine RNase Inhibitor (NEB) to each sample before and after the hybridization step.
RNA was precipitated and its concentration was measured by Qubit (Thermo Fisher Scientific). 10 ng of each sample was used for library preparation with the D-Plex Small RNA-seq Kit (Diagenode) following the manufacturer’s instructions, with 10 cycles of PCR amplification in the last step, followed by DNA purification with the Monarch PCR & DNA Cleanup Kit (NEB) and a final cleanup with AMPure XP beads. Library molarity was analyzed using a TapeStation and an equimolar pool of libraries was sequenced on a NovaSeq 6000 system (Illumina) with 10% PhiX.
MS IP
For immunopeptidomic experiments, the DIS3-AID cell line was expanded in T300 flasks to 80-90% confluency and auxin was added for 4 h. Next, cells were washed 3 times in ice-cold 1x PBS and collected by scraping, followed by centrifugation at 4°C. The cell pellets were flash frozen in liquid nitrogen and stored at -80°C until further use. Two biological replicates were pooled, giving 600 million cells that were used for immunopeptidomic analysis. Cells were resuspended in lysis buffer at 4°C. After homogenization, the lysates were cleared by centrifugation and pre-absorbed on empty protein A-agarose columns. After this preclearing step, the MHCI complexes were pulled down using the clone W6/32 anti-MHCI antibody coupled to protein A-agarose columns. After multiple washing steps, the MHC complexes were eluted and dissociated in mild acid elution buffer. Immune peptides were separated from other eluted components by reverse-phase separation prior to loading on Evotips for chromatographic separation on an Evosep One LC system. The Evosep One LC system was coupled to a Bruker timsTOF Pro mass spectrometer equipped with the Bruker Captive Spray source. The 30 SPD method was used. The Endurance Column (15 cm x 150 μm ID, 1.9 μm beads; EV1106, Evosep) was connected to a Captive Spray emitter (ZDV) with a diameter of 20 μm (1865710, Bruker) (both from Bruker Daltonik GmbH, Bremen). The timsTOF Pro was calibrated according to the manufacturer’s guidelines. The source parameters were: capillary voltage 1500 V, dry gas 3.0 l/min and dry temperature 180°C. The temperature of the ion transfer capillary was set at 180°C. The column was kept at 40°C. The Parallel Accumulation–Serial Fragmentation (PASEF) DDA method was used to select precursor ions for fragmentation with 1 TIMS-MS scan and 10 PASEF MS/MS scans, as described by Meier et al. (2018). The TIMS-MS survey scan was acquired between 0.70 and 1.45 Vs/cm² and 100–1,700 m/z with a ramp time of 100 ms.
The 10 PASEF scans contained on average 12 MS/MS scans per PASEF scan with a collision energy of 10 eV. Precursors with 1–5 charges were selected with the target value set to 20,000 a.u. and the intensity threshold to 2,500 a.u. Precursors were dynamically excluded for 0.4 s. The timsTOF Pro was controlled by the OtofControl 6.0 software (Bruker Daltonik GmbH). Ten PASEF scans can contain up to 12 MS/MS scans per PASEF scan. Data were analyzed with MSFragger, integrated in the FragPipe pipeline (ref. 80). The built-in “Nonspecific-HLA” workflow was used. Precursor and fragment mass tolerances were set at 20 ppm. Oxidation (M), N-terminal acetylation, pyroglutamine, pyroglutamate and cysteinylation were set as variable modifications. The standard database search validation tools from the Nonspecific-HLA workflow were also used, including MSBooster, a rescoring tool that uses deep learning predictions of retention time (RT) and fragment intensities (ref. 104). The FDR of peptide and PSM matches was set to 1%, with a probability above 0.9. The UniProt human reference proteome database was concatenated with the custom lncRNA ORF database. The query list included all the DIST-ORFs predicted to be actively translated in DIS3-depleted cells in the Ribo-seq experiment, combined with the Ribotricer prediction of ORFs based only on the lncRNA-specific sequences for the remaining 638 lncRNAs that were upregulated in the cell line and MM data and not detected in the Ribo-seq experiment.
smiFISH
For smiFISH experiments, cells were passaged at least 2 times after thawing. Two days before the experiment, cells were seeded on coverslips coated with 0.01% poly-L-lysine (Sigma) to reach 40-60% confluency on the day of the experiment. On the day of the experiment, cells were incubated with fresh media with or without auxin for 1 h, followed by two washes in 1x PBS and fixation with 4% PFA (Electron Microscopy Sciences) for 20 min. Next, the slides were washed twice with PBS, followed by addition of cold 70% EtOH, and stored at 4°C before hybridization with smiFISH probes. The smiFISH protocol was performed as described in ref. 53. The smiFISH probes were designed using Oligostan (ref. 53). 24 specific probes passing the filters of the Oligostan design tool were used for hybridization with fixed cells (listed in Table S6). Images were taken using a Zeiss Axioimager Z1/Apotome equipped with a 63x objective and a CCD camera (Axiocam MRm). The images were processed and analyzed using the FISH-quant v2 Python packages (ref. 105) for cell segmentation and spot detection. For each image, a set threshold (for ZNF674-AS1) or automatic thresholding (for CTD-2371O3.3) was used to account for variable background levels. Two independent replicates of the experiment were performed with the DIS3-AID cell line incubated with or without auxin for 1 h. For each replicate, at least 90 cells were quantified.
Impact of DIS3 mutations on protein structure/activity
Initially, the atomic structures available for the DIS3 protein were analyzed, mapping the positions of the altered residues with respect to the defined catalytic active sites (ref. 73). Then a systematic analysis of the structural alterations observed for DIS3 in patients was performed, combining the Missense3D software with the structural analysis of the available atomic structures of the protein (refs. 100, 106). The atomic model (PDB 6D6Q; ref. 100) was visualized using UCSF Chimera (ref. 101). Only mutations affecting the activity of the protein, and not those truncating it, were retained. The mutations and the prediction results are listed in Table S6.
Data analysis
Quality control and alignment of RNA-seq and Ribo-seq data:
The Curie bioinformatics platform performed the quality control of the RNA-seq and Ribo-seq data using MultiQC. Reads were aligned to the human genome hg38 using STAR 2.6.1a, with the following parameters: --outSAMstrandField intronMotif --outSAMattributes All --outSAMtype BAM SortedByCoordinate --alignIntronMax 1000000 --outFilterMismatchNmax 999 --seedPerReadNmax 100000 --outFilterMultimapNmax 20. Multi-mapped reads were filtered out and, for Ribo-seq alignments, only reads of length 25 to 34 nucleotides not falling into rRNA were kept for downstream analysis.
Transcriptomic profiling
Transcriptome assembly was performed on each RNA-seq alignment using Scallop v0.10.5, with the following parameters: --min_transcript_length_base 200 --library_type first --min_splice_bundary_hits 5 --min_transcript_coverage 10 --min_single_exon_coverage 20 (scheme depicted in Fig. S6A). The GTF annotations obtained for all samples were merged with cuffmerge from Cufflinks v2.2.1 (ref. 107), then genes whose exons overlap the GENCODE v26 annotation were filtered out with BEDTools v2.29 (ref. 108). The result was finally combined with GENCODE v26 (https://www.gencodegenes.org/human/release_26.html) and the eRNA annotation retrieved from ref. 50 for downstream analysis.
Enhancer RNAs
For eRNAs, the published HCT116 annotation of conservative eRNAs was used (ref. 50). In addition to the eRNAs located outside annotated genes, we analysed how many of these eRNAs fall into regions of the upregulated genes defined in CYTO-seq. In total, 362 GENCODE-annotated RNAs (including 298 lncRNAs and 27 PCGs) and 342 Scallop-annotated lncRNAs overlap with the upregulated eRNAs (by at least 1 nt). For further analysis, those RNAs were classified as eRNAs.
Premature RNAs analysis
We performed separate quantifications on the exon and intron annotations of the GENCODE-annotated RNAs. Genes for which only the first exon and intron were differentially expressed (refs. 13, 47) were selected as PT. Among the upregulated transcripts in our DIS3-AID-dependent list, 380 lncRNAs and 240 mRNAs fall into the PT category (Fig. S1D and S1E).
Recursive analysis
For a gene to be retained in the final list, it had to be differentially expressed in the same direction both in control + versus tagged cells + and in tagged cells – versus tagged cells +, provided that it was not differentially expressed in control cells – versus control cells + (where + indicates addition of auxin). For the XRN1-AID dataset, the first two comparisons (control + versus clone + and clone – versus clone +) were performed for each clone independently and the results were intersected; in addition, genes differentially expressed for both clones only in the comparison of clone – versus clone +, but not in control cells – versus control cells +, were retained in the list (Fig. S6B).
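The selection rule for a single tagged line can be sketched as a set operation. This is a minimal Python illustration, not the code used in the study: the function name, the dictionary inputs (gene to log2 fold change, restricted to genes already called differentially expressed) and the convention that both fold changes are oriented the same way are assumptions made for the example.

```python
def recursive_filter(ctrl_plus_vs_tagged_plus, tagged_minus_vs_plus, ctrl_minus_vs_plus):
    """Return genes retained by the recursive comparison.

    Each argument maps gene -> log2 fold change for genes called
    differentially expressed in that comparison (hypothetical inputs).
    Fold changes are assumed to be oriented so that a positive value
    means "higher in auxin-treated tagged cells".
    """
    retained = set()
    for gene, lfc in tagged_minus_vs_plus.items():
        # Drop genes that respond to auxin in control cells.
        if gene in ctrl_minus_vs_plus:
            continue
        other = ctrl_plus_vs_tagged_plus.get(gene)
        # Keep genes changed in the same direction in both comparisons.
        if other is not None and (other > 0) == (lfc > 0):
            retained.add(gene)
    return retained


# Toy example with made-up gene names and values.
example = recursive_filter(
    {"geneA": 2.0, "geneB": -1.0, "geneC": 1.5},
    {"geneA": 1.2, "geneB": 0.9, "geneC": 2.1},
    {"geneC": 1.0},
)
```

In this toy input, geneB is dropped because its direction differs between the two comparisons, and geneC is dropped because it also changes in control cells upon auxin addition.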
Refining TSS and TES
To better define the start and end of candidates that do not fully follow the standard annotation, we used a sliding window approach of 200 nt. For a given gene, the sliding window is run on the metatranscript of exons only (successively from the start, to determine the new start, and from the end, to determine the new end); if the first 200 nt have non-null read coverage values across DIS3-AID+ samples, the original start or end is kept; otherwise, the window is slid until a minimum coverage of 5 reads inside the window is found, then the most upstream new start or the most downstream new end is reported. If no new start or new end can be reported, we use the same method at the gene level (the new start or new end can be in the introns in this case). Refined candidates with a length of less than 200 nt are re-processed with the sliding window approach at the gene level, this time to find the new start and new end simultaneously (as the steps above can fail for candidates with short exons and read coverage spanning the introns). After start and end refining, candidates with a length below 200 nt are removed.
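The core of the sliding-window step can be illustrated on a single per-nucleotide coverage vector. The sketch below is a simplified Python illustration under stated assumptions, not the published implementation: the function names are hypothetical, coverage is taken as one list of values along the metatranscript, and a window is considered to pass when every position in it has at least 5 reads (the text leaves the exact criterion open). The gene-level fallback and the joint start/end search are not shown.

```python
def refine_start(coverage, window=200, min_cov=5):
    """Return the index of the refined start, or None if none is found.

    coverage: per-nucleotide read coverage along the metatranscript
    (assumed input format). Index 0 is the annotated start.
    """
    if len(coverage) < window:
        return None
    # Original start is kept if the first window is fully covered.
    if all(c > 0 for c in coverage[:window]):
        return 0
    # Otherwise slide one nucleotide at a time and report the most
    # upstream window in which every position reaches min_cov reads
    # (assumed interpretation of "minimum coverage of 5 reads").
    for i in range(1, len(coverage) - window + 1):
        if all(c >= min_cov for c in coverage[i:i + window]):
            return i
    return None


def refine_end(coverage, window=200, min_cov=5):
    """Same procedure run from the 3' end; returns the last retained index."""
    start_from_end = refine_start(coverage[::-1], window, min_cov)
    if start_from_end is None:
        return None
    return len(coverage) - 1 - start_from_end
```

With a toy window of 3 nt, `refine_start([0, 0, 6, 6, 6, 6], window=3)` moves the start to index 2, while a vector whose first window is fully covered keeps index 0.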
Transcript bidirectionality analysis
The division of DISTs into upstream antisense (ua) RNAs, convergent (con) RNAs and bidirectional eRNAs was performed as defined in ref. 50. For eRNAs, all pairs in which at least one eRNA was upregulated were taken, resulting in 777 bidirectional eRNAs.
Metagenes
To select the parts of interest (exons and introns), we first created metatranscripts as follows: the exons from all the transcripts of each gene were merged with BEDTools v2.29 (ref. 108) to obtain non-redundant exon segments, and these exons were numbered; introns were inferred from these exons. The first exon, first intron, second exon, last intron and last exon were selected for each gene. The alignment files were converted to BigWig files with RPM normalization (reads per million mapped reads) using UCSC tools (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/), and for each part and each gene the read coverage was extracted. The obtained signal was scaled to 100 positions using the R base approx function (which computes interpolations), with the following parameters: method="linear", ties="ordered". The average value was computed at each of the 100 positions, and the result was plotted using the R packages rtracklayer (ref. 109) and ggplot2 (ref. 110).
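The scaling and averaging steps can be illustrated with a short sketch. The analysis itself used the R approx function; the Python below is only an illustrative equivalent of linear interpolation onto a fixed number of equally spaced positions, with hypothetical function names and plain lists as the assumed input format.

```python
def rescale(signal, n=100):
    """Linearly interpolate a coverage vector onto n equally spaced positions."""
    if not signal or n < 2:
        raise ValueError("need a non-empty signal and n >= 2")
    if len(signal) == 1:
        return [float(signal[0])] * n
    step = (len(signal) - 1) / (n - 1)
    scaled = []
    for k in range(n):
        pos = k * step
        lo = min(int(pos), len(signal) - 1)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        # Weighted mean of the two neighbouring original positions.
        scaled.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return scaled


def metagene(profiles, n=100):
    """Average the rescaled profiles of several genes position by position."""
    scaled = [rescale(p, n) for p in profiles]
    return [sum(column) / len(scaled) for column in zip(*scaled)]
```

For instance, `rescale([0, 10], 3)` gives three positions spanning the original two, and `metagene` averages any number of parts of different lengths once they share the same 100-position scale.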
Active ORFs detection
Ribotricer 1.3.1 (ref. 111) was used to detect long and short actively translated ORFs. The first step consisted of generating all candidate ORFs from the combined GTF annotation (GENCODE v26, Scallop and eRNAs) and the hg38 genome, using the command “prepare-orfs” with the following parameters: --start_codons "ATG, CTG, TTG, GTG, AGG, ACG, AAG, ATC, ATA, ATT" --min_orf_length 30 --longest. The resulting table was converted into GFF by custom command lines. The second step consisted of using the command “detect-orfs” to determine translating ORFs among the candidate ORFs for each Ribo-seq alignment. The GFF files from all samples were concatenated and redundancies were then filtered out based on the following GFF columns: chromosome, start, end, strand, attributes. For downstream analysis, ORFs with at least 9 non-zero coverage leading codons were considered strongly active. As the ORF prediction is based only on the annotation, we further tested whether lncRNA ORFs spanning an exon-exon junction also had this junction present in the RNA-seq data in the DIS3-depleted conditions. To ensure that each ORF is translated from DIS3-sensitive transcript isoforms, we quantified the RNA-seq data on the ORF annotation and applied the same recursive analysis as for RNA-seq; only the ORFs differentially expressed in the RNA-seq data for both the gene and the ORF annotation were retained as lncRNAs predicted to be translated upon DIS3 depletion. We observed a Ribo-seq peak at the snoRNA location in snoRNA host genes; thus, to remove bias that could come from snoRNAs, we eliminated genes whose names contain SNHG, SNO, SNU or RNU, as well as one eRNA that also showed a high snoRNA peak. For Ribo-seq, we obtained in total between 3 and 79 million reads per sample (Table S8).
Due to the variable number of reads between replicates, we selected ORFs that were assigned as actively translated in at least one replicate and at least detected in the second replicate of each condition. To identify ORFs in lncRNAs specifically associated with ribosomes upon DIS3 depletion, we compared the lists of all predicted ORFs in lncRNAs for all 4 conditions, eliminating any ORFs overlapping snoRNAs and PCGs to remove any ambiguity. P-site offset and periodicity were assessed based on 5'-end read coverage in RPM normalization, computed from the alignment file of each sample and stored in BigWig format using BEDTools v2.29 and UCSC tools. The mean 5'-end read coverage for selected ORFs was then displayed over 50 nucleotides around their start or stop codon, using the R packages rtracklayer and ggplot2.
Read counting and differential expression
Read counting at the gene, exon and ORF levels was performed on the combination of the human gene annotation GENCODE v26, eRNAs and new transcripts from Scallop, using featureCounts v2.0.0 (ref. 112) with the parameters -s 2 -O 9 (raw counts for Ribo-seq for each of the gene features are presented in Fig. S3B and Table S5). Analyses for Fig. 1 were done using the GENCODE annotation for normalization of DESeq2 data for unreferenced genes. For further quantifications, the modified annotation including eRNA and Scallop transcripts was used. The conditions were compared using DESeq2, with the following parameters: betaPrior=FALSE, independentFiltering=F, cooksCutoff=F. Only the features with adjusted p-value <= 0.05, read counts >= 20 and abs(log2FoldChange) >= 0.585 were retained as differentially expressed. Heatmaps of expression were obtained using the R package ComplexHeatmap from Bioconductor (ref. 113).
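The retention criteria can be written as a simple predicate. The following Python sketch is illustrative only (the analysis was run in R with DESeq2); the function name is hypothetical, and treating "read counts" as a single per-feature value is an assumption, since the text does not specify how the count is summarised across samples. The fold-change threshold of 0.585 on the log2 scale corresponds to roughly a 1.5-fold change.

```python
import math


def is_differentially_expressed(padj, read_count, log2fc,
                                max_padj=0.05, min_count=20, min_abs_lfc=0.585):
    """Apply the three thresholds stated in the text to one feature.

    padj may be None (e.g. an NA adjusted p-value), in which case the
    feature is not retained.
    """
    if padj is None:
        return False
    return (padj <= max_padj
            and read_count >= min_count
            and abs(log2fc) >= min_abs_lfc)


# 0.585 is close to log2(1.5), i.e. about a 1.5-fold change.
fold_change_equivalent = 2 ** 0.585
```

Because the thresholds are inclusive, a feature sitting exactly on all three cut-offs (padj 0.05, 20 reads, log2 fold change of -0.585) is retained.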
HLA binding prediction
HLA alleles from MM patients were determined using the tool seq2HLA (ref. 76). ORF candidates were converted into peptides using the SeqinR R package, and only those with a length >= 8 amino acids were kept. The lists of peptides and HLA alleles were given to the tool NetMHCpan-4.1 to determine the binding status. Only the peptides with an eluted ligand rank <= 0.5 were kept for downstream analysis.
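The two filters described here (peptide length and eluted ligand rank) can be sketched as follows. This Python snippet is a hedged illustration, not the authors' pipeline: the function name and the tuple layout (peptide, allele, rank) are assumptions, and the peptides and ranks in the example are made up. It does not call NetMHCpan; it only filters a table of predictions that is assumed to have been parsed already.

```python
def filter_binders(predictions, min_length=8, max_el_rank=0.5):
    """Keep predictions that pass both the length and the rank thresholds.

    predictions: iterable of (peptide, hla_allele, el_rank) tuples
    (hypothetical layout for already-parsed prediction output).
    """
    kept = []
    for peptide, allele, el_rank in predictions:
        if len(peptide) >= min_length and el_rank <= max_el_rank:
            kept.append((peptide, allele, el_rank))
    return kept


# Made-up peptides, alleles and ranks, for illustration only.
toy_predictions = [
    ("MKLAVSTYL", "HLA-A*02:01", 0.12),   # kept
    ("MKLAVST", "HLA-A*02:01", 0.10),     # too short
    ("MPRTEQWLLK", "HLA-B*07:02", 1.80),  # rank above threshold
]
toy_binders = filter_binders(toy_predictions)
```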
Patient data and GTEx analysis:
The data concerning patient samples were previously published (ref. 74). New counting on the raw reads was performed using kallisto (ref. 75). All further analysis followed the standard DESeq2 protocol. The extended ORF annotation was used for kallisto indexing to quantify expression in both our MM dataset and GTEx tissue samples. We queried both the Ribo-seq-identified DIST-ORF candidates and the DIST-ORFs with peptides detected in the immunopeptidomic analysis. We used the 90th percentile values and TPM ≤ 1 as the threshold for defining ORFs specifically expressed in MM, excluding testis.
Code availability:
Specific code used for analysis performed in the paper can be found at: https://github.com/MorillonLab/DIS3_analysis
Data availability:
Sequencing datasets generated in this study, including CYTO-seq and Ribo-seq, were deposited in the GEO database under the accession numbers GSE188282, GSE188195 and GSE233699.