2.1 Colorectal cancer (CRC)-associated SNPs and germline genotype data
CRC-associated SNPs were extracted from the National Human Genome Research Institute (NHGRI) GWAS database, which included 50 CRC risk loci (Supplementary Table S1). Datasets for CRC germline genotypes, ancestry, expression profiles, methylation, somatic copy number aberrations, and germline copy number aberrations were downloaded from The Cancer Genome Atlas (TCGA) portal. SNP loci with minor allele frequency (MAF) > 0.05 from TCGA (subjects) and HapMap cell lines (controls) were loaded into EIGENSTRAT, and the top two principal components were retrieved (Figure S1). We calculated the gene-based somatic copy-number measure as the average of the segmented copy-number scores across the genomic interval between the transcription start and end sites (Figure S2A). CpG methylation status was determined by discretizing the CpG methylation value with cut-off values of 0.2, 0.4, 0.6, 0.8, and 1.0 (Figure S2B). Finally, we calculated the expression level of each gene as a transcripts per million (TPM) value.
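The two gene-level covariates described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names are hypothetical, the copy-number average is assumed to be weighted by the length of overlap between each segment and the gene, and each methylation cut-off is assumed to be an inclusive upper bin edge.

```python
from bisect import bisect_left

# Cut-off values stated in the text, treated here as inclusive upper bin edges
# (an assumption; the text does not say which side of a cut-off is inclusive).
METHYLATION_CUTOFFS = [0.2, 0.4, 0.6, 0.8, 1.0]


def methylation_level(beta):
    """Map a CpG methylation value in [0, 1] to a discrete level 0..4."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("methylation value must lie in [0, 1]")
    return bisect_left(METHYLATION_CUTOFFS, beta)


def gene_copy_number(segments, tx_start, tx_end):
    """Average segmented copy-number score over a gene body.

    segments: iterable of (start, end, score) tuples with half-open coordinates.
    The average is weighted by overlap length (an assumption); None is
    returned when no segment overlaps the gene.
    """
    weighted_sum = 0.0
    covered = 0
    for start, end, score in segments:
        overlap = min(end, tx_end) - max(start, tx_start)
        if overlap > 0:
            weighted_sum += overlap * score
            covered += overlap
    if covered == 0:
        return None
    return weighted_sum / covered
```

For a gene spanning positions 50 to 150 and two segments scored 0.5 (positions 0 to 100) and -0.1 (positions 100 to 300), each segment covers half of the gene, so the gene-level score is the mean of the two segment scores.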
2.2 Association analysis and eQTL analysis
eQTL analysis was performed according to the flowchart described in Li et al. (13). Briefly, the expression data for each gene were adjusted for somatic copy-number effects and CpG methylation status using a multivariate linear model. The P value corresponds to the regression coefficient obtained by regressing the residual expression levels on germline genotype. We performed cis-eQTL analysis between 656 857 SNP loci and the corresponding mRNA transcripts. We then excluded 243 359 SNP loci with MAF < 0.05, along with their genes with absent calls > 90% and false discovery rate (FDR) > 0.1. Of the 50 SNPs from the NHGRI GWAS database, 18 were present in the TCGA germline genotype data. For variants not directly genotyped, we used proxy SNPs (the nearest SNP with linkage disequilibrium > 0.5). SNAPinfo software was used to obtain pairwise linkage disequilibrium between SNPs (13). In the cis-eQTL analysis, we evaluated the association between the genotype of a given SNP locus and the transcripts located within ±1 Mb. For each risk SNP locus, its relationship with transcripts was also evaluated at a genome-wide level. For each target gene, regions 1-50 kb on either side of the transcription start site were considered putative enhancer regions. These regions were overlapped with ENCODE DNaseI hypersensitivity data from the HCT116 cell line and then analyzed for transcription factor (TF) DNA binding motif enrichment. A hypergeometric test was used for the overlap analyses, with a significance level of P < 0.05. TFs that satisfied the above criteria were considered candidates for trans-acting risk SNPs.
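The residual-based regression and the hypergeometric overlap test described above can be sketched as follows. This is one reading of the description, not the pipeline used in this study: all function names are hypothetical, genotypes are assumed to be coded as allele counts (0, 1, 2), and only the regression coefficient is returned (the P value would additionally require the t distribution of that coefficient).

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares fit of y on an intercept plus the covariate columns.

    Returns (coefficients, residuals); coefficients[0] is the intercept.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]


def residual_eqtl_slope(expression, copy_number, methylation, genotype):
    """Step 1: remove copy-number and methylation effects from expression.
    Step 2: regress the residual expression on germline genotype."""
    _, residuals = ols([copy_number, methylation], expression)
    beta, _ = ols([genotype], residuals)
    return beta[1]


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        math.comb(successes, k) * math.comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return hits / total
```

In the overlap setting, `population` would be the number of candidate regions, `successes` the number overlapping DNaseI hypersensitive sites, `draws` the number of regions carrying a given motif, and `observed` the number in both sets; this mapping is an illustrative assumption.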
2.3 Ancestry verification samples, tissue samples and TMAs
A total of 130 CRC patients from Shandong Provincial Hospital were used for ancestry verification. Germline genotypes were measured in patients’ peripheral blood samples and matched tumor samples. Another 24 fresh tissue samples were obtained from the Department of Gastrointestinal Surgery of Shandong Provincial Hospital, comprising 12 colon adenocarcinomas and 12 paired adjacent normal colon tissues. None of the patients had received preoperative radiotherapy or chemotherapy. All procedures were performed in accordance with the International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS) and the Declaration of Helsinki. We obtained written informed consent from all patients. Tissue microarrays (TMAs), purchased from Molbase Co. Ltd. (Shanghai, China), consisted of 75 pairs of human colon adenocarcinoma (COAD) and adjacent colon tissue.
2.4 Immunohistochemistry staining
Immunohistochemical (IHC) staining was performed using the Power-Vision two-step tissue staining kit (ZSGB-BIO, Cat. PV-6001, Beijing, China). After deparaffinization and rehydration, tissue slides were incubated with 3% H2O2 for 10 minutes at room temperature. A 10 mmol/L EDTA solution was used for antigen retrieval. Slides were incubated with primary antibodies overnight at 4℃ and then with secondary antibodies at 37℃ for 30 minutes. After washing, slides were stained with DAB-H2O2 and counterstained with hematoxylin. IHC results were evaluated using H-scores by three researchers independently (Table S2).
2.5 Cell lines and cell culture
The human colon cancer cell lines HCT116, HT29, LOVO, SW480 and SW620 were used, with the CCD-18Co cell line as the control. All cell lines were obtained from the American Type Culture Collection (ATCC, USA). HCT116, HT29 and CCD-18Co cells were cultured in DMEM/high glucose medium (Hyclone, Cat.SH30022.01B, USA), LOVO cells in F-12K medium (Gibco, Cat.21127022, USA), and SW480 and SW620 cells in L-15 medium (Gibco, Cat.11415064, USA). Each medium was supplemented with 10% fetal bovine serum (Hyclone, Cat.SH30087.01, USA) and 1% penicillin-streptomycin (Hyclone, Cat.SH30010, USA). Cells were cultured at 37℃ in an atmosphere containing 5% CO2.
2.6 RNA interference and overexpression of CCDC12
Three CCDC12 short interfering RNAs (siRNAs) were purchased from RiboBio (Cat. 140901180506, Shanghai, China) and transfected into SW480 and LOVO cells using Lipofectamine™ RNAiMAX reagent (Invitrogen, Cat.13778075, USA) for 24 hours at 37℃. A lentivirus carrying puromycin resistance and expressing green fluorescent protein, designed and synthesized by Genechem (Shanghai, China), was used to overexpress CCDC12 in HCT116 cells and in CCDC12-knockdown SW480 cells (SW480-KD, transfected with siCCDC12). Seventy-two hours after transfection, cells were exposed to puromycin for 48 hours for selection.
2.7 Overexpression of Snail in SW480 cells
A lentivirus expressing SNAI1 and red fluorescent protein was used to infect SW480 cells. Seventy-two hours later, red fluorescence was confirmed under a fluorescence microscope (Olympus, IX71, Japan), and the cells were then used for subsequent experiments.
2.8 Western blotting
Proteins extracted from cells and tissues were separated by 10% SDS-PAGE and then transferred onto a 0.45 μm Immobilon-P Transfer Membrane (Millipore, Cat.IPVH00010, USA) using the wet transfer method. Membranes were incubated with primary antibody overnight at 4℃ and then with the corresponding secondary antibody for 1 hour at room temperature. Bands were visualized using an ECL kit (Millipore, Cat.WBKLS0500, USA) and the Amersham Imager 680 system. Primary antibodies used for western blotting are listed in Table S2.
2.9 Real-time Quantitative Polymerase Chain Reaction (RT-PCR)
Total RNA was extracted with an RNA isolation kit (Invitrogen, USA) and reverse transcribed to cDNA using the Reverse Transcription System (Promega, USA). Quantitative RT-PCR was performed with SYBR Green qPCR SuperMix (Invitrogen, USA) on an ABI PRISM® 7500 Sequence Detection System according to the manufacturer’s instructions. 18S rRNA was used as the internal reference control. Primer sequences were as follows: 18S rRNA forward 5’-CCTGGATACCGCAGCTAGGA-3’, reverse 5’-GCGGCGCAATACGAATGCCCC-3’; CCDC12 forward 5’-CTGACTGGGACCTCAAGAGA-3’, reverse 5’-CCTTTCAGCCTTTCACGGAT-3’; Snail forward 5’-GAGGCGGTGGCAGACTAGAGT-3’, reverse 5’-CGGGCCCCCAGAATAGTTC-3’.
2.10 Colony-forming and MTS assays
For the colony-forming assay, 100 cells in logarithmic growth phase were resuspended in 300 μl medium and seeded into a 6-well plate. After colony formation, cells were stained with 1% crystal violet solution for 20 minutes. The number of colonies was counted under a microscope.
For the MTS assay, 1×10⁴ cells were seeded into a 96-well plate. The CellTiter 96® AQueous One Solution Cell Proliferation Assay (Promega, Cat.G3582, USA) was used to measure cell proliferation according to the manufacturer’s instructions. OD was measured at 490 nm using a Multiscan MK3 microplate reader.
2.11 Wound-healing assays
Cells were seeded into a 6-well plate and cultured until 95% confluent. The cell monolayer was scratched with a pipette tip across the middle of the well. Cell migration was measured every 6 hours using Image Pro-Plus 6.0, and the migration rate was calculated as (Distance at 0 h − Distance at time t) / Distance at 0 h.
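As a worked illustration of the migration-rate formula, the following minimal sketch computes the rate for a series of wound widths measured every 6 hours. The function name and the example widths are hypothetical and are not data from this study.

```python
def migration_rate(distance_0h, distance_t):
    """Fraction of the initial wound width closed at time t.

    Implements (Distance at 0 h - Distance at time t) / Distance at 0 h.
    Both distances must use the same unit (for example, pixels or micrometres).
    """
    if distance_0h <= 0:
        raise ValueError("initial wound width must be positive")
    return (distance_0h - distance_t) / distance_0h


# Hypothetical wound widths at 0, 6, 12 and 18 hours (arbitrary units).
widths = [800.0, 600.0, 400.0, 200.0]
rates = [migration_rate(widths[0], w) for w in widths]
```

A rate of 0 means the wound is unchanged and a rate of 1 means it has closed completely.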
2.12 Cell invasion assays
Transwell chambers (BD, Cat.353097) coated with Matrigel (BD, Cat.356234, USA) were used for invasion assays. A 100 μl cell suspension (1×10⁵ cells) in serum-free medium was placed in the upper chamber. The bottom chamber contained medium with 20% serum. After incubation for 24 hours at 37℃, cells were fixed with 4% paraformaldehyde for 15 minutes and then stained with crystal violet solution. Cells that had passed through the membrane were observed under a microscope.
2.13 Apoptosis assays
The Annexin V-FITC apoptosis detection kit (Keygen, Cat.KGA106, Jiangsu, China) was used according to the manufacturer’s instructions. Annexin V-FITC reagent (1.25 μl) was added to 500 μl of cell suspension (1×10⁶/ml) and incubated for 15 minutes at room temperature in the dark. After centrifugation at 1 000×g for 5 minutes, the supernatant was removed and cells were resuspended in 0.5 ml pre-cooled binding buffer. Then, 10 μl propidium iodide (PI) was added, and cells were incubated in the dark before being read on the BD FACSCalibur CellSorting System.
2.14 Cell cycle analysis
A Cell Cycle Detection Kit (Keygen, Cat.KGA511, Jiangsu, China) was used for cell cycle analysis. RNase A (5 μl, 10 mg/ml) was added to cells and incubated at 37℃ for 1 hour. Afterward, 50 μg/ml PI and 0.2% Triton X-100 were added and incubated at 4℃ in the dark for 30 minutes. The BD FACSCalibur CellSorting System was used to measure cell cycle phases. A total of 2-3×10⁴ cells were counted and analyzed using ModFit software.
2.15 Xenograft mouse models
Four-week-old BALB/c nude mice were purchased from Charles River Laboratories (Beijing, China) and fed an ordinary diet. Xenograft tumors were established by subcutaneous injection of 200 μl cell suspension (5×10⁵ cells) into the underarms or backs of nude mice. The tumor volume was calculated using the following formula: Tumor volume (mm³) = (long diameter × short diameter²)/2. Mice were euthanized 30 days after inoculation, and tumors were removed for subsequent analysis.
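The tumor volume formula can be checked with a short sketch. The function name is hypothetical, diameters are assumed to be in millimetres, and rejecting a short diameter larger than the long diameter is an added safeguard rather than part of the stated protocol.

```python
def tumor_volume_mm3(long_diameter_mm, short_diameter_mm):
    """Tumor volume (mm^3) = (long diameter x short diameter^2) / 2."""
    if long_diameter_mm <= 0 or short_diameter_mm <= 0:
        raise ValueError("diameters must be positive")
    if short_diameter_mm > long_diameter_mm:
        raise ValueError("short diameter cannot exceed long diameter")
    return long_diameter_mm * short_diameter_mm ** 2 / 2


# Hypothetical caliper readings: 10 mm long axis, 6 mm short axis.
example_volume = tumor_volume_mm3(10.0, 6.0)
```

Because the short diameter is squared, an error in measuring the short axis affects the estimated volume more than the same error in the long axis.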
2.16 4-plex Isobaric Tags for Relative and Absolute Quantitation (iTRAQ) assays
Proteins were extracted from CCDC12-overexpressing HCT116 cells and control HCT116 cells. The Bradford method was used to determine the total protein content. Proteins were reductively alkylated using DTT and TEAB. After enzyme digestion, proteins were acidified with 0.1% formic acid (FA). The components were fractionated by high-pH C18 chromatography (chromatograph: Thermo DIONEX Ultimate 3000 BioRS; analytical column: Durashell C18, 5 μm, 100 Å, 4.6×250 mm), and the isolated fractions were analyzed by LC-MS/MS (Thermo Fisher Q-exactive HF-X; AB SCIEX analytical column: 75 μm inner diameter, packed with 3 μm, 120 Å ChromXP C18, 10 cm long; Eksigent ChromXP Trap Column: 3 μm C18-CL, 120 Å, 350 μm×0.5 mm). The results were analyzed through Cluster of Orthologous Groups of proteins (COG) analysis, Gene Ontology (GO) analysis, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment.
2.17 Statistical Analysis
All statistical analyses were performed using R 3.5.1. Data were expressed as mean ± SD. One-way analysis of variance and Student’s t test were used to analyze differences among groups. The χ² test or linear correlation was used to determine the correlation between CCDC12 expression and clinicopathological features. The Kaplan-Meier method was used to generate survival curves, which were compared with the log-rank test. Thresholds of MAF > 0.05 and FDR < 0.1 were applied, and a two-sided P value < 0.05 (α = 0.05) was considered statistically significant.