Data download
The TCGA dataset used in this study, including the RNA-seq BAM files, the raw gene count data (htseq-count files), and the annotated somatic simple nucleotide variation files (MuTect2 VCF) of patients with colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ), was accessed through dbGaP accession number phs000178.v11.p8 (ref. 48) and downloaded using gdc-client v1.6.0. The clinical overall survival (OS) information was obtained from Liu et al.80. The RNA-seq fastq files of normal and tumor tissues from another 18 CRC patients were downloaded from https://www.ncbi.nlm.nih.gov/geo under the accession number GSE50760 (ref. 81). The RNA-seq fastq files of the 59 colorectal cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) were downloaded from https://www.ebi.ac.uk/ under the accession number PRJNA523380 (ref. 49), and the corresponding germline-filtered CCLE merged mutation calls were acquired from https://portals.broadinstitute.org/ccle/data. The previously published RNA-seq and ChIP-seq raw fastq files generated with HCT116 cells or mouse primary colon epithelial cells were downloaded from https://www.ncbi.nlm.nih.gov/geo under the accession numbers GSE71514 and GSE101966 (refs. 42, 50).
RNA-seq analysis
Raw reads were first cleaned using trim_galore v0.6.0 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with default parameters. The reads from each RNA-seq sample were then mapped to the hg38 or mm9 genome assembly downloaded from UCSC, using STAR v2.5.3a (ref. 82). The key alignment parameters were as follows: “--outFilterMismatchNoverLmax 0.04 --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 500 --outMultimapperOrder Random --outSAMmultNmax 1”; the parameters “--outFilterMultimapNmax 500” and “--outMultimapperOrder Random” ensured that multi-mapped reads were retained but each was assigned to only one randomly chosen position. Gene expression was quantified using featureCounts v1.6.5 (ref. 83) from the subread-1.6.5 package based on the hg38 RefSeq gene annotation file. Repeat expression was quantified using featureCounts v1.6.5 (“featureCounts -M --fraction”) based on repeat annotation files downloaded from https://genome.ucsc.edu/cgi-bin/hgTables. Principal component analysis was conducted with the functions “vst” and “plotPCA” from the R package DESeq2 v1.22.2 (ref. 84). Differential expression analysis was performed based on the negative binomial distribution using the functions “DESeq” and “results” from DESeq2. The heatmap of differentially expressed genes or repeats was created using the R package pheatmap v1.0.12. The KEGG enrichment analysis was performed using the function “enrichKEGG” from the R package clusterProfiler v3.10.1 (ref. 85). Venn diagrams were prepared with the R packages Vennerable and venn.
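The repeat quantification step can be illustrated with a minimal Python sketch of fractional counting followed by CPM normalization. It assumes that each reported alignment of a read with n alignments contributes 1/n to the repeat family it overlaps; the function names, the input format and the toy reads are hypothetical and are not taken from the original pipeline. The CPM here is normalized by the sum of the counts passed in, which is a simplification.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Sum fractional counts per feature.

    `alignments` is a list of (feature, n_alignments) pairs (hypothetical
    input format): the feature overlapped by the reported alignment and the
    total number of alignments found for that read.
    """
    counts = defaultdict(float)
    for feature, n_alignments in alignments:
        # A read with n alignments contributes 1/n (assumed behaviour).
        counts[feature] += 1.0 / n_alignments
    return dict(counts)


def counts_per_million(counts):
    """Scale counts so that they sum to one million."""
    total = sum(counts.values())
    return {feature: c / total * 1e6 for feature, c in counts.items()}


# Toy example: one unique read and three multi-mapped reads.
toy = [("HERVH-int", 1), ("HERVH-int", 4), ("L1", 4), ("HERVK-int", 2)]
counts = fractional_counts(toy)
cpm = counts_per_million(counts)
```

In this toy input the uniquely mapped read counts fully, whereas a read with four alignments adds only 0.25 to its assigned family.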
Survival analysis
The curated clinical endpoint results (OS event and OS event times) of the 628 patients in the TCGA-COREAD dataset were obtained from Liu et al.80. Only patients in stage II or later according to the American Joint Committee on Cancer (AJCC) pathologic tumor staging system were included. The 493 CRC patients were classified into HERVH-high (145 patients with HERVH-int CPM>8430.797) and HERVH-low groups (348 patients with CPM<8430.797), and the survival curves of the two groups were compared using the log-rank test from the function “survdiff” in the R package survival v2.44-1.1.
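The grouping and comparison described above can be sketched in Python as follows. This is an illustrative two-group log-rank test written from the standard formula, not the “survdiff” implementation itself; the function names and the toy follow-up times are hypothetical, and only the CPM cutoff comes from the text.

```python
import math

HERVH_CUTOFF = 8430.797  # CPM cutoff stated in the text


def split_by_cutoff(cpm_by_patient, cutoff=HERVH_CUTOFF):
    """Return (high, low) lists of patient IDs based on HERVH-int CPM."""
    high = [p for p, v in cpm_by_patient.items() if v > cutoff]
    low = [p for p, v in cpm_by_patient.items() if v < cutoff]
    return high, low


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p value) with 1 d.f.

    `events_*` hold 1 for an observed death and 0 for censoring.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data: group A dies early, group B dies late (hypothetical values).
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
high, low = split_by_cutoff({"P1": 9000.0, "P2": 120.5, "P3": 8431.0})
```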
Integration analysis of whole-exome sequencing (WXS) and RNA-seq
WXS files (MuTect2 VCF) and RNA-seq data from 516 patients in TCGA-COREAD were analyzed (Fig. S1A). All the somatic mutational information was included regardless of their classification. For each gene, we classified the patients into WT or mutation group, and then calculated the Log2 FoldChange between these two groups using the expression values (CPM) of HERVK-int and HERVH-int. p values were calculated by Wilcoxon test.
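For a single gene, the comparison between the mutation and WT groups can be sketched in Python as below. The text does not state how the group-level expression was summarized, so the use of group means is an assumption. The rank-sum p value uses a plain normal approximation without tie or continuity correction, so it will differ in detail from the output of a statistics package. All names and values are hypothetical.

```python
import math


def log2_fold_change(mutant_cpm, wt_cpm):
    """log2 of the ratio of group means (use of means is an assumption)."""
    mean_mut = sum(mutant_cpm) / len(mutant_cpm)
    mean_wt = sum(wt_cpm) / len(wt_cpm)
    return math.log2(mean_mut / mean_wt)


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney) U and a two-sided p value.

    The p value comes from a normal approximation without tie or
    continuity correction (a simplification for illustration).
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Toy HERVH-int CPM values for one gene (hypothetical numbers).
mutant = [8.0, 16.0]
wild_type = [2.0, 4.0]
lfc = log2_fold_change(mutant, wild_type)
u_stat, p_value = rank_sum_test(mutant, wild_type)
```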
ATAC-seq and ChIP-seq analyses
Raw reads were cleaned using trim_galore. The reads were then aligned to the hg38 genome assembly using Bowtie2 v2.3.5.1 (ref. 86), with the default parameters that look for multiple alignments but only report the one with the best mapping quality. Duplicate reads were then removed using MarkDuplicates from the gatk package v4.1.4.1. Replicate samples were merged using samtools v1.10 (ref. 87). Peak calling was performed using MACS2 v2.2.6 (ref. 88) (parameters: -g hs --keep-dup 1 --broad-cutoff 0.01). Peaks near active HERVH loci were identified using bedtools v2.26.0 (ref. 89). For ATAC-seq, bigwig tracks were generated using bamCoverage from the python package deeptools (parameters: --skipNAs --normalizeUsing CPM)90. For ChIP-seq, bigwig tracks were generated using bamCompare from deeptools (parameters: --skipNAs --scaleFactorsMethod readCount --operation log2 --extendReads 200). Negative values were set to zero. ATAC-seq and ChIP-seq profiles were created by computeMatrix and plotProfile in deeptools. IGV v2.4.13 was used to visualize the bigwig tracks91.
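The ChIP-seq track calculation (log2 ratio of ChIP over input after read-count scaling, with negative values set to zero) can be sketched in Python as follows. This is not the deeptools code: the scaling rule (the larger library is scaled down to the size of the smaller one) and the pseudocount of 1 are assumptions about how such a ratio is commonly computed, and the function name and toy coverage values are hypothetical.

```python
import math


def log2_ratio_track(chip_cov, input_cov, chip_reads, input_reads,
                     pseudocount=1.0):
    """Per-bin log2(ChIP/input) with negative values clipped to zero.

    `chip_cov` and `input_cov` are per-bin coverage lists of equal length;
    `chip_reads` and `input_reads` are the library sizes used for scaling.
    """
    smallest = min(chip_reads, input_reads)
    chip_scale = smallest / chip_reads      # assumed read-count scaling
    input_scale = smallest / input_reads
    track = []
    for chip, ctrl in zip(chip_cov, input_cov):
        ratio = (chip * chip_scale + pseudocount) / (
            ctrl * input_scale + pseudocount)
        # Negative log2 ratios (input > ChIP) are set to zero.
        track.append(max(math.log2(ratio), 0.0))
    return track


# Two toy bins with equal library sizes: one enriched, one depleted.
track = log2_ratio_track([7.0, 0.0], [1.0, 3.0], 100, 100)
```

Clipping at zero keeps only enrichment over input, so depleted bins appear flat in the resulting track.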
Cell culture and cell line generation
The cell lines used in this study, including HCT116, DLD1, SW480, LS174T, SW620, HT29, HCT8, RKO, CRL1790/841, NCM460, and 293T, were cultured in RPMI 1640 or DMEM medium containing 10% FBS and incubated at 37 °C with 5% CO2 in a humidified incubator. To generate ARID1A KO cell lines, the indicated cells were transfected with LentiCRISPR-V2 plasmid carrying sgARID1A (Supplementary Table 8) using Lipofectamine 2000 (Invitrogen), and further selected by 1 mg/mL puromycin (Selleck, s7417) for 3 days. The cells were then plated at single-cell density in 100 mm petri dishes, and the emerged individual clones were picked and replated into 24-well plates. The loss of ARID1A expression was confirmed by western blot.
Organoid culture
The CRC organoids were generated as previously described92. All the human tissue related experiments were approved by the Medical Ethics Committee of Central South University, and informed consent was obtained from the patients. From the resected colon segment, the tumor tissues as well as normal tissues were isolated and stored in ice-cold RPMI 1640 supplemented with 1% Penicillin-Streptomycin. The tissues were then washed in ice-cold DPBS supplemented with 1% Penicillin-Streptomycin and cut into 1-3 mm³ cubes. After centrifuging at 200 g for 5 min, the supernatant was removed and the pellet was resuspended in collagenase IV (Gibco, 17104019) supplemented with 10 µM ROCK inhibitor Y-27632 dihydrochloride (Merck Millipore, SCM075). The tissues were digested at 37 °C for 1 hour and mixed every 10-15 min by pipetting, washed with 10 mL advanced DMEM/F12 (Thermo Fisher Scientific, 12634-010) supplemented with Y-27632, and then centrifuged at 200 g for 5 min at 4 °C. The pellet was resuspended in DMEM/F12 supplemented with Y-27632 and filtered through a 60 µm cell strainer. After centrifugation at 200 g for 5 min at 4 °C, the supernatant was discarded and the pellet was resuspended in 70% Matrigel (Corning, 356231). 30 µL of the Matrigel mixture was plated on the bottom of 24-well plates, and 500 µL organoid medium (Accurate International Biotechnology, M102-50) was added to each well after incubation at 37 °C with 5% CO2 for 30 min. The organoid medium was changed every 2-3 days, and the organoids were passaged after 7 days of culture.
Cell growth assays
For cell viability assays, cells were plated into 96-well plates at a density of 2000-5000 cells per well after infection with lentiviruses expressing shGFP or shERVH. The cells were kept for another 7 days, and the viability was measured daily using MTT (Sigma, M5655) as previously described93. For chemosensitivity assays, the cells were seeded in 96-well plates and treated with the compounds at the indicated concentrations for 72 hours, and then the cell viability was measured. For colony formation assays, the cells were seeded at a density of 1000-2000 cells per well in 6-well plates after infection with lentiviruses expressing shGFP or shERVH. The cells were allowed to grow for 10-14 days and then fixed for 10 min in 50% (v/v) methanol containing 0.01% (w/v) crystal violet.
Tumor sphere formation
The 6-well plates were coated with 12 mg/mL poly-hydroxyethylmethacrylate (polyHEMA, Sigma-Aldrich, P3932) in 95% ethanol. The indicated cells were digested by TrypLE, and approximately 1000 cells were suspended in 50% Matrigel (Corning, 356231) and plated in the precoated 6-well plates. The 6-well plates containing the cells were incubated at 37℃ for 30 min, and then 2 mL of phenol red-free DMEM/F12 (GIBCO, 21041) containing 1× B27 supplement (Invitrogen, 12587) and 20 ng/mL rEGF (Sigma Aldrich, E-9644) was added into each well. The culture medium was changed every 2-3 days, and the number of tumor spheres in each well was counted after 12 days.
Xenograft tumors
Female BALB/c nude mice aged 4-5 weeks were purchased from Hunan SJA Laboratory Animal Co., Ltd. (Changsha, China). A total of 5×10⁵ of the indicated cells were suspended in 100 µL DPBS and injected subcutaneously into the flank of nude mice. The tumors were measured twice weekly with an electronic caliper, and the volumes were calculated using the formula: 0.5×(length × width²). All the animal experiments were approved by the Medical Ethics Committee of Central South University, and conducted according to the Guidelines of Animal Handling and Care in Medical Research in Hunan Province, China.
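As a worked example of the volume formula, the following short Python sketch computes the tumor volume from caliper measurements. The function name and the use of millimetres (giving mm³) are assumptions; the text does not state the measurement units.

```python
def tumor_volume(length, width):
    """Tumor volume as 0.5 x length x width squared.

    With length and width in mm (assumed units), the result is in mm^3.
    """
    return 0.5 * length * width ** 2


# A tumor measuring 10 mm x 6 mm (hypothetical values).
volume = tumor_volume(10.0, 6.0)
```

For these toy measurements the formula gives 0.5 × 10 × 36 = 180 mm³.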
RNA interference
The siRNA oligos were synthesized by GenePharma (Shanghai GenePharma Co., Ltd.), and the sequences are listed in Supplementary Table 8. Cells were transfected with the indicated siRNA using Lipofectamine 2000. After 48 hours, the cells were harvested and the efficiency of silencing was verified by qPCR. For shRNA, shRNA oligos were synthesized by Tsingke (Tsingke Biotechnology Co., Ltd.) and cloned into the pLKO.1 TRC Cloning vector (Supplementary Table 8). The shRNA and packaging vectors (pMD2.G and psPAX2) were transiently co-transfected into 293T cells using polyethylenimine (Sigma, P3143), and the resulting lentivirus particles were harvested and precipitated with PEG8000. The target cells were treated with lentivirus particles and 8 µg/mL polybrene for 24 hours, and the efficacy of shRNA interference was determined by qPCR.
HERVH knockdown in organoids
The organoids cultured in Matrigel were washed once with DPBS, and digested with TrypLE for 5 min at 37 °C. During the digestion, the Matrigel was disrupted by repeated pipetting. When cell clumps containing 2-10 cells were observed, 10 mL of advanced DMEM/F12 was added before centrifugation at 200 g for 5 min. The supernatant was removed and the cells were resuspended in organoid medium supplemented with 8 µg/mL polybrene. The cells were then split equally into 2 wells of a 24-well plate precoated with polyHEMA, and 50 µL of lentivirus carrying shGFP or shERVH was added. After spin infection at 2000 rpm for 1 hour, the cells were incubated at 37 °C with 5% CO2 for 4 hours. The cells were then resuspended in 10 mL of advanced DMEM/F12 and centrifuged at 200 g for 5 min. The pellet was resuspended in 100 µL of 70% Matrigel, and 10 µL of the mixture was plated per well into a prewarmed 96-well plate. The organoids were cultured for 10-14 days and the medium was changed every 2-3 days.
Western blot
Cells were washed twice with cold DPBS and then lysed in 2× Laemmli buffer (2% SDS, 20% glycerol, and 125 mM Tris-HCl, pH 6.8) supplemented with 1× protease inhibitor cocktail (Sigma, P8340). The cell lysate was scraped and sonicated, and the protein concentration was determined by BCA assay. The protein was separated by SDS-PAGE and transferred onto a nitrocellulose membrane. The membrane was then blocked with 5% non-fat milk for 1 hour at room temperature, and incubated with the indicated primary antibody overnight at 4 °C with shaking. The membrane was washed three times and incubated with secondary antibodies (1:5000, Thermo Fisher Scientific) for 2 hours. The signal was then detected with ECL substrates (Millipore, WBKLS0500). Dilutions of primary antibodies were: rabbit anti-ARID1A/BAF250A Ab (1:1000, Cell Signaling, 12354S), rabbit anti-BRD4 Ab (1:1000, Active Motif, 39909), mouse anti-α-Tubulin Ab (1:3000, Cell Signaling, 3873s). Primary antibodies used in this study are listed in Supplementary Table 11.
RNA-seq and qPCR
The RNA of the treated cells was extracted with TRIzol (Life Technologies, 87804) according to the manufacturer’s protocol. Total RNA was made into libraries for sequencing using the mRNA-Seq Sample Preparation Kit (Illumina) and sequenced on an Illumina HiSeq platform (Novogene, Tianjin, China). The sequencing data were deposited in the GEO database (accession number GSE). For RT-qPCR, RNA was extracted with TRIzol and reverse transcribed to cDNA using the PrimeScript RT reagent Kit (Takara, RR037A). The cDNA was then used as the template, and qPCR was performed using the SYBR Green qPCR Master Mix (SolomonBio, QST-100) on the QuantStudio 3 Real-Time PCR system (Applied Biosystems). Primers used in qPCR are listed in Supplementary Table 8.
Chromatin immunoprecipitation
The indicated cells in 100 mm petri dishes were cross-linked with 1% formaldehyde for 10 min at room temperature, and quenched with 125 mM ice-cold glycine. The cells were then rinsed twice with 5 mL ice-cold 1× PBS, and harvested by scraping using a silicon scraper. After spinning at 1350 g for 5 min at 4 °C, the supernatant was discarded, and the pellet was resuspended in Lysis Buffer I (50 mM HEPES-KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100 and 1× protease inhibitors) and incubated at 4 °C for 10 min with rotation. After spinning at 1350 g for 5 min at 4 °C, the pellet was resuspended in Lysis Buffer II (10 mM Tris-HCl, pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA and 1× protease inhibitors), incubated for 10 min at room temperature, and spun at 1350 g for 5 min at 4 °C. The pellet was again resuspended in Lysis Buffer III (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.1% Na-Deoxycholate, 0.5% N-lauroylsarcosine and 1× protease inhibitors) and transferred into Covaris microTUBEs. The DNA was sonicated to 200 bp fragments using Covaris S220 (duty cycle: 10; intensity: 4; cycles/burst: 200; duration: 200 s). After quenching the SDS by 1% of Triton X-100, the lysate was spun at 20,000 g for 10 min at 4 °C. 50 µL of supernatant from each sample was reserved as input, and the remaining lysate was incubated overnight at 4 °C with the magnetic beads bound with ARID1A (CST, 12354S), ARID1B (Santa Cruz, sc-32762X), SMARCA4 (Abcam, ab110641) or H3K27ac (Abcam, ab4729) antibody, respectively. The beads were washed three times with Wash Buffer (50 mM Hepes-KOH, pH 7.6, 500 mM LiCl, 1 mM EDTA, 1% NP-40, 0.7% Na-deoxycholate), and washed once with 1 mL TE buffer containing 50 mM NaCl. The DNA was eluted with 210 µL of Elution Buffer (50 mM Tris-HCl, pH 8.0, 10 mM EDTA, 1% SDS). The cross-links were reversed by incubation at 65 °C overnight.
200 µL of TE buffer was added to each tube, and the RNA was degraded by incubation with 16 µL of 25 mg/mL RNase A at 37℃ for 60 min. The protein was degraded by adding 4 µL of 20 mg/mL proteinase K and incubating at 55 °C for 60 min. The DNA was then purified by phenol:chloroform:isoamyl alcohol extraction, and resuspended in 50 µL ddH2O. The fragments of HERVH DNA were detected by qPCR (Supplementary Table 8).
The RNAscope™ in situ hybridization (ISH)
The colon cancer tissue array (HCol-Ade180Sur) was purchased from Shanghai Biochip Co. Ltd (Shanghai, China). The RNAscope analysis with probes targeting the HERVH-gag sequence was performed using the RNAscope Multiplex Fluorescent Reagent Kit v2 (ACD bio, 323100) according to the manufacturer’s protocol. The HERVH consensus sequence used for probe design is listed in Supplementary Table 9. Following the RNAscope staining, the tissue array was imaged with an LSM880 confocal microscope (Zeiss).
RNA-FISH combined with immunofluorescence
RNA-FISH combined with immunofluorescence was performed as previously described58. Cells cultured on poly-L-lysine-coated coverglasses were fixed with 10% formaldehyde in DPBS for 10 min. After three washes in DPBS, cells were permeabilized with 0.5% Triton-X100 for 10 min. The cells were then washed three times in DPBS and blocked with 4% Bovine Serum Albumin for 30 min. The cells were incubated with the indicated primary antibody diluted in DPBS overnight, washed three times in DPBS, and incubated again with the secondary antibody for 1 hour. After washing twice with DPBS, the cells were fixed again with 10% formaldehyde in DPBS for 10 min. Following two washes with DPBS, the cells were further washed in Wash Buffer I (20% Stellaris RNA FISH Wash Buffer A (Biosearch Technologies, Inc., SMF-WA1-60), 10% Deionized Formamide (Invitrogen, AM9342) in RNase-free water) for 5 min. The RNA probe (Stellaris) in hybridization buffer was added to the cells and incubated at 37℃ for 16 hours. After washing with Wash Buffer I at 37℃ for 30 min, the cells were stained with 1 µg/mL DAPI for 5 min. The cells were then washed with Wash Buffer B (Biosearch Technologies, Inc., SMF-WA1-60) for 5 min, and rinsed once in water before mounting with SlowFade Diamond Antifade Mountant (Invitrogen, S36963). The sequence of the RNA probe (Stellaris) was listed in Supplementary Table 10.
RNA-FISH and immunofluorescence with organoids
After dissolving the Matrigel with ice-cold cell recovery solution (Corning, 354253), the organoids were placed on a poly-L-lysine-coated glass slide for 30 min. The organoids attached to the slide were fixed with 10% formaldehyde for 45 min at 4 °C, and washed three times with DPBS. The organoids were then permeabilized with 0.5% Triton-X100 for 15 min and washed twice with DPBS. After one wash with Wash Buffer A for 5 min, the organoids were hybridized with the RNA-FISH probe overnight at 37 °C. After one wash with Wash Buffer A for 30 min at 37 °C, the organoids were stained with 1 µg/mL DAPI in Wash Buffer A for another 30 min, and washed twice with Wash Buffer B for 30 min. The organoids were rinsed with ddH2O and mounted with SlowFade Diamond Antifade Mountant (Invitrogen, S36963). The images were taken with an LSM880 confocal microscope (Zeiss).
The immunofluorescence of organoids was performed as previously described94. The organoids cultured in 96-well plate were washed once with DPBS without disrupting the Matrigel, and then 200 µL of ice-cold cell recovery solution (Corning, 354253) was added and incubated at 4 °C for 1 hour with shaking at 60 rpm. After the Matrigel was dissolved, the organoids were rinsed out using ice-cold PBS with 1% BSA and spun down at 70 g for 3 min at 4 °C. The pellet of organoids was resuspended in 1 mL of 10% formaldehyde in DPBS, and incubated at 4 °C for 45 min. 9 mL of ice-cold PBT (0.1% Tween 20 in DPBS) was added and incubated at 4 °C for 10 min. The organoids were then spun down at 70 g for 5 min at 4 °C, resuspended in 200 µL ice-cold OWB (0.1% Triton X-100, 0.2% BSA in DPBS), and transferred into 24-well plate precoated with polyHEMA. Following incubation at 4 °C for 15 min, 200 µL of the indicated primary antibody diluted in OWB was added and incubated overnight at 4 °C with shaking at 60 rpm. The next day, 1 mL of OWB was added into each well. After all the organoids were settled at the bottom of the well, the OWB was removed with just 200 µL left in each well. The organoids were washed three times with 1 mL of OWB and incubated at 4 °C for 2 hours with shaking at 60 rpm. The OWB was removed with just 200 µL left in each well, and then 200 µL of secondary antibody diluted at 1:200 in OWB was added and incubated overnight at 4 °C with shaking at 60 rpm. After the incubation, the organoids were washed once with OWB, and 200 µL of 2 µg/mL DAPI in OWB was added and incubated at 4 °C for 30 min. The organoids were then washed two times with OWB, transferred to 1.5-mL Eppendorf tube, and spun down at 70 g for 3 min at 4 °C. The OWB was removed as much as possible without touching the organoids, and the organoids were resuspended with fructose-glycerol clearing solution (60% glycerol and 2.5 M fructose in ddH2O). 
A 1×2 cm rectangle was drawn in the middle of a slide, and 3 layers of sticky tape were placed on both sides of the rectangle. The organoids were transferred into the middle of the rectangle, and a coverslip was placed on top. The images were taken with an LSM880 confocal microscope (Zeiss).
Fluorescence Recovery After Photobleaching (FRAP)
The treated cells were plated into 35 mm glass bottom confocal dishes (NEST, 801001), and the FRAP experiment was performed on the Zeiss LSM880 Airyscan confocal microscope with a 63x Plan-Apochromat 1.4 NA oil objective. The Zeiss TempModule system was used to control the temperature (37 °C), the humidity and the CO2 (5%) of the imaging system. After imaging for 3 frames, the cells were photo-bleached using 100% laser power with the 488 nm laser (iterations: 50, stop when intensity drops to 50%). The cells were then imaged again every two seconds. The images were analyzed and measured with ZEN 2 blue edition (Zeiss).
Code availability
All custom scripts are available from the authors upon request.