mRNA expression analysis
Oncomine (http://www.oncomine.org) is a large cancer microarray database and web-based data-mining platform that covers thousands of samples for differential gene expression analysis[15]. The mRNA expression of EPB41L1, as well as its DNA copy number variations (CNVs), was compared between colorectal cancer tissues and normal tissues using the Oncomine database. This analysis was based on a series of COAD studies, including TCGA Colorectal 2, Kurashina Colon, and Ki Colon[16, 17]. We also performed a meta-analysis of related colorectal cancer studies to further confirm EPB41L1 expression levels. Student’s t-test was performed to assess whether EPB41L1 expression was higher in cancer tissues than in normal tissues. The screening thresholds were set to a p-value of 1E-4 and a fold change of 2.
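The screening rule above can be illustrated with a minimal sketch. It is not the Oncomine implementation. It assumes log2-scale expression values, a pooled-variance one-sided Student's t-test, and a fold change computed from the difference of group means; the function names and the numerical integration of the t density are illustrative choices.

```python
import math
from statistics import mean, variance


def t_sf(t, df, steps=20000):
    """One-sided tail P(T > t) of Student's t, via Simpson integration of the density."""
    if t < 0:
        return 1.0 - t_sf(-t, df, steps)
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so P(T > t) = 0.5 - integral from 0 to t.
    return max(0.0, 0.5 - total * h / 3)


def student_t(tumor, normal):
    """Pooled-variance t statistic and degrees of freedom (tumor minus normal)."""
    n1, n2 = len(tumor), len(normal)
    pooled = ((n1 - 1) * variance(tumor) + (n2 - 1) * variance(normal)) / (n1 + n2 - 2)
    t = (mean(tumor) - mean(normal)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def passes_screen(tumor_log2, normal_log2, p_cut=1e-4, fc_cut=2.0):
    """Apply the p-value and fold-change thresholds (assumed: log2 input, one-sided test)."""
    t, df = student_t(tumor_log2, normal_log2)
    p = t_sf(t, df)
    fold_change = 2 ** (mean(tumor_log2) - mean(normal_log2))
    return p < p_cut and fold_change >= fc_cut, p, fold_change
```

Because the tail probability is obtained by subtraction, very small p-values are only approximate; the sketch is intended for comparison against the 1E-4 threshold, not for reporting exact values.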
Meanwhile, two microarray datasets, GSE41328 and GSE81558, which include 10 and 23 colorectal cancer patients, respectively, were analyzed with GEO2R for external validation[18]. The normalized expression matrix of each microarray dataset was downloaded directly from the dataset, and the probes were annotated using the corresponding annotation files from the dataset.
Expression of EPB41L1 in various COAD sub‑groups
UALCAN (http://ualcan.path.uab.edu) is a comprehensive and interactive web resource that performs in-depth analyses of TCGA gene expression data using TCGA level 3 RNA-seq and clinical data. It allows users to analyze the relative expression of a given gene across tumor and normal samples, and in various tumor sub-groups based on individual cancer stage, tumor grade, race, body weight, and other clinicopathologic features[19].
Prognostic analysis of differentially expressed EPB41L1 gene in COAD patients
The association between the differential expression status of EPB41L1 and patient survival was examined using TCGA data via The Human Protein Atlas (https://www.proteinatlas.org/)[20]. Kaplan–Meier (KM) survival plots were drawn to assess the prognostic value of EPB41L1, which was significantly up- or downregulated at the mRNA level in COAD patients. For validation, Kaplan–Meier survival curves were also obtained from the UALCAN database. The Kaplan–Meier plots show the effect of gene expression on patient survival, and the significance of the survival difference was assessed by the log-rank test.
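The survival comparison described above can be sketched as follows. This is a minimal illustration of the Kaplan–Meier estimator and a two-group log-rank test, not the code used by The Human Protein Atlas or UALCAN. It assumes events are coded as 1 (death) and 0 (censored); the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In practice, patients would first be split into high- and low-expression groups (for example at a chosen expression cut-off, which is an assumption here), and each group's follow-up times and event indicators would be passed to `log_rank`.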
EPB41L1 mutation in COAD
The cBioPortal for Cancer Genomics (http://cbioportal.org) is a web resource for exploring and analyzing multidimensional cancer genomics datasets[21, 22]. We used cBioPortal to analyze EPB41L1 mutations in the CPTAC-2 Prospective whole-exome sequencing dataset of 110 COAD tumor samples[23]. The OncoPrint, a graphical summary of genetic alterations across a set of tumor samples, was used to display the alterations in EPB41L1.
Prediction of genes co-expressed with EPB41L1 in COAD
The LinkedOmics database (http://www.linkedomics.org) is a publicly available portal that contains multi-omics and clinical data for 32 cancer types from the TCGA project and allows users to analyze these data comprehensively[24]. The differentially expressed genes related to EPB41L1 were screened from the TCGA COAD cohort (n = 379) using the LinkFinder analytical module of the LinkedOmics database. Correlations were assessed using the Spearman correlation coefficient.
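As an illustration of the correlation measure named above, the following minimal sketch computes the Spearman coefficient as the Pearson correlation of average ranks. It is not the LinkFinder implementation; the function names are hypothetical, and both input vectors are assumed to be non-constant expression values measured over the same samples.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rank_coexpressed(target, candidates):
    """Sort candidate genes (name -> expression vector) by |rho| with the target."""
    scores = {name: spearman(target, vec) for name, vec in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Here `target` would hold the EPB41L1 expression vector and `candidates` the vectors of other genes across the same samples.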
GO and KEGG Pathway Enrichment Analysis
The differentially expressed genes related to EPB41L1 from the LinkedOmics database were annotated using the Database for Annotation, Visualization and Integrated Discovery (DAVID) version 6.8 (https://david.ncifcrf.gov/), a bioinformatics resource that provides functional annotation tools for investigators to understand the biological meaning of large gene lists[25, 26]. Gene Ontology (GO) [containing cellular component (CC), biological process (BP), and molecular function (MF)] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the co-expressed genes, and of the genes in the most highly connected module identified by the MCODE plugin, were also performed with the DAVID tool. Results were considered significant at P < 0.05 by Fisher’s exact test.
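The enrichment test can be illustrated with a minimal sketch of a one-sided Fisher's exact test, written as a hypergeometric tail probability. This is not DAVID's code, and DAVID's own score may be computed with a more conservative variant, so values need not match its output. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p(hits, list_size, term_size, background_size):
    """One-sided Fisher's exact p-value: P(X >= hits) under the hypergeometric model.

    hits            -- genes in the query list annotated to the term
    list_size       -- total genes in the query list
    term_size       -- background genes annotated to the term
    background_size -- total genes in the background
    """
    max_hits = min(list_size, term_size)
    denom = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / denom


def significant_terms(term_counts, list_size, background_size, alpha=0.05):
    """Keep terms with p < alpha; term_counts maps term -> (hits, term_size)."""
    results = {}
    for term, (hits, term_size) in term_counts.items():
        p = enrichment_p(hits, list_size, term_size, background_size)
        if p < alpha:
            results[term] = p
    return results
```

No multiple-testing correction is applied in this sketch, in line with the P < 0.05 criterion stated above.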
Establishment of interactive network and modules
Protein interactions were retrieved from the STRING database (https://string-db.org/). We retained co-expressed genes with interaction scores greater than 0.4 to establish a protein–protein interaction (PPI) network[27]. The PPI network was then visualized in Cytoscape version 3.7.2[28]. Densely interconnected regions of the network were identified and clustered into hub gene modules using the Molecular Complex Detection (MCODE) plugin version 1.6.1 in Cytoscape, with the degree cut-off, haircut on, k-core, node score cut-off, and max depth set as 2, 0.1, 2, 0.2, and 100, in preparation for the subsequent analysis[29].
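The network construction step can be sketched as follows. This is only an illustration of the score filter and of a k-core reduction; it does not reproduce the MCODE clustering algorithm or the STRING API. It assumes interactions are supplied as (protein, protein, score) triples with scores on a 0 to 1 scale, and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score=0.4):
    """Build an undirected graph from edges whose score exceeds min_score."""
    graph = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def k_core(graph, k=2):
    """Repeatedly remove nodes with fewer than k neighbours; return what remains."""
    core = {node: set(nbrs) for node, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        low_degree = [node for node, nbrs in core.items() if len(nbrs) < k]
        for node in low_degree:
            # Detach the node from its remaining neighbours before dropping it.
            for nbr in core.pop(node):
                core[nbr].discard(node)
            changed = True
    return core


def degrees(graph):
    """Number of interaction partners per protein, highest first."""
    return sorted(((n, len(nbrs)) for n, nbrs in graph.items()),
                  key=lambda item: item[1], reverse=True)
```

The 2-core computed here corresponds only loosely to the k-core setting of 2; MCODE additionally scores nodes by local density and grows modules from seed nodes, which this sketch omits.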
Statistical analysis
The t-test was used for differential expression analysis, and the log-rank test was used to assess the statistical significance of survival differences between groups. Differences were considered significant when P < 0.05.