Microarray data acquisition
The GSE20711(6), GSE61304(7), GSE139038(8), GSE124646(9), GSE33447(10) and GSE5764(11) gene expression profile matrix files were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). GSE20711 is based on the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform and contains 2 normal breast tissues and 88 breast cancer tissues. GSE61304 is based on the same GPL570 platform and contains 4 normal breast tissues and 58 breast cancer tissues. GSE139038 is based on the GPL27630 Print_1437 platform and contains 24 normal breast tissues and 41 breast cancer tissues. GSE124646 is based on the GPL96 platform and contains 10 normal breast tissues and 10 breast cancer tissues. GSE33447 is based on the GPL14550 platform and contains 8 normal breast tissues and 8 breast cancer tissues. GSE5764 is based on the GPL570 platform and contains 20 normal breast tissues and 12 breast cancer tissues.
Identification of robust DEGs
We downloaded the series matrix files of the datasets from GEO. The R package “limma”(12) was used to normalize the data and identify differentially expressed genes (DEGs) in each dataset. We then used robust rank aggregation (RRA) to integrate the results of the six datasets and identify the most significant DEGs(13). The P value of each gene indicated its ranking in the final gene list, and genes with adjusted P < 0.05 were regarded as significant DEGs in the RRA analysis.
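The rank aggregation step can be illustrated with a minimal Python sketch of the rho score that underlies RRA. It is not the R code used in this study: the function names are hypothetical, the input is assumed to be one ranked gene list per dataset (best gene first), genes missing from a list are assigned the worst normalized rank of 1, and the score is Bonferroni-corrected by the number of lists.

```python
from math import comb


def beta_score(r, k, n):
    """Probability that the k-th smallest of n uniform(0, 1) ranks is <= r."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks):
    """Smallest order-statistic probability, corrected by the number of lists."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    rho = min(beta_score(r, k, n) for k, r in enumerate(ranks, start=1))
    return min(rho * n, 1.0)


def aggregate(ranked_lists):
    """Score every gene across all ranked lists; lower scores rank first."""
    genes = set().union(*ranked_lists)
    scores = {}
    for gene in genes:
        normalized = []
        for ranked in ranked_lists:
            if gene in ranked:
                normalized.append((ranked.index(gene) + 1) / len(ranked))
            else:
                normalized.append(1.0)  # assumption: absent genes get the worst rank
        scores[gene] = rho_score(normalized)
    return sorted(scores.items(), key=lambda item: item[1])
```

A gene that ranks near the top of most lists receives a small score, whereas a gene ranked highly in only one list is penalized, which is why the method is robust to noise in individual datasets.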
Functional enrichment analyses
DAVID 6.8 (https://david.ncifcrf.gov/) is a commonly used database for gene enrichment and functional annotation analysis. It integrates biological data and analysis tools to provide systematic and comprehensive annotations of biological functions for large-scale gene or protein lists. We used DAVID to perform GO and KEGG pathway enrichment analyses on the identified DEGs and downloaded the results for subsequent analysis. Cytoscape 3.6.1 was then used to visualize the KEGG results as a network. P < 0.05 was considered statistically significant.
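The statistical idea behind this kind of over-representation analysis can be sketched with a one-sided hypergeometric test. This is an illustration only: DAVID reports a more conservative modified Fisher's exact P value (the EASE score), so the values below would not match its output exactly, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(overlap, list_size, term_size, background_size):
    """P(at least `overlap` genes of a DEG list fall in a term by chance).

    overlap         -- DEGs annotated to the GO term or KEGG pathway
    list_size       -- total number of DEGs submitted
    term_size       -- background genes annotated to the term
    background_size -- total number of background genes
    """
    max_overlap = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

A term would then be called enriched when this P value falls below the chosen threshold, here 0.05.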
PPI network analysis
Studying the interaction network between proteins helps to identify core regulatory genes; what we are actually interested in is "gene interaction". The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; https://string-db.org/) is a tool for analyzing interactions between proteins. We used STRING to construct the PPI network of the DEGs in order to understand the relationships between the different genes. Cytoscape software was then used to screen hub genes according to node degree.
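Screening by degree amounts to counting how many distinct interaction partners each gene has in the network. A minimal sketch, assuming the PPI network has been exported as a list of gene pairs; the function names and the placeholder gene labels are hypothetical, and this is not the Cytoscape procedure itself.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per gene in an undirected network."""
    degree = Counter()
    seen = set()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # ignore self-loops and duplicated edges
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(edges, n=10):
    """Return the n genes with the highest degree as (gene, degree) pairs."""
    return node_degrees(edges).most_common(n)


# Placeholder edges, not real interactions
example_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_B"),  # duplicate of the previous edge, counted once
]
print(top_hubs(example_edges, n=1))  # [('GENE_A', 3)]
```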
Prognosis and methylation analyses
UALCAN is a comprehensive, user-friendly and interactive web resource for analyzing cancer OMICS data. It provides easy access to published cancer OMICS data (TCGA and MET500) and enables users to identify biomarkers or to perform in silico validation of potential genes of interest. It provides graphs and plots describing gene expression and patient survival information based on gene expression, and it allows users to evaluate gene expression in molecular subtypes of breast and prostate cancer and to evaluate epigenetic regulation of gene expression by promoter methylation and its correlation with gene expression. UALCAN also supports pan-cancer gene expression analysis. These resources allow researchers to collect valuable information and data about genes/targets of interest(14). We used this website to compare the methylation levels of hub genes between breast cancer tissues and paracancerous normal tissues.
Analysis of gene expression and tumor-infiltrating immune cells
To investigate the correlation between the expression of the selected hub genes and tumor-infiltrating immune cells (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells), we used the online tool TIMER (https://cistrome.shinyapps.io/timer/)(15, 16), which contains 10,897 samples from diverse cancer types available in the TCGA database.
Ethical statement
The study was approved by the Ethics Committee of Zhuzhou Central Hospital and conducted in accordance with the Declaration of Helsinki. Prior to the start of the study, all participants gave written informed consent.
Tissue samples and clinical data
A total of 51 breast cancer tissues (age, 45±0.26 years; male/female patient ratio, 1/60) and 32 non-tumor breast tissues (age, 47±0.73 years; n=31 female patients) were collected from the Zhuzhou Central Hospital (Hunan, China) between February 2015 and July 2019. Patients with diabetes, nephritis, or cardiovascular disease were excluded. Patient information was obtained from medical records. The present study was approved by the Ethics Committee of Zhuzhou Central Hospital. Written informed consent was obtained from all of the participants.
Cell culture
The human breast cancer cell line MCF-7 and the human breast epithelial cell line MCF10A were obtained from the American Type Culture Collection (Manassas, VA, USA). Cells were cultured in high-glucose Dulbecco's modified Eagle's medium (Invitrogen; Thermo Fisher Scientific, Inc., Waltham, MA, USA) containing 10% fetal bovine serum (FBS; Gibco; Thermo Fisher Scientific, Inc.) and maintained in a humidified atmosphere of 5% CO2 in air at 37℃.
RNA extraction, real-time PCR and RT-PCR
Total RNA was isolated from tissue samples using TRIzol reagent (Invitrogen) according to the manufacturer’s protocol. The cDNA was synthesized from the total RNA using a Reverse Transcription System (Fermentas, Glen Burnie, MD, USA) according to the manufacturer’s instructions. GAPDH was amplified in parallel as an internal control. The expression level of each gene was quantified by measuring the cycle threshold (Ct) values and normalized relative to that of GAPDH using the 2^(-ΔΔCt) method. The primers used in the reaction were as follows:
COL11A1: forward, 5’-TAACATCGCTGACGGGAAGTG-3’; reverse, 5’-CCGTGATTCCATTGGTATCAACA-3’.
SFRP1: forward, 5’-ACGTGGGCTACAAGAAGATGG-3’; reverse, 5’-CAGCGACACGGGTAGATGG-3’.
MMP1: forward, 5’-CTCTGGAGTAATGTCACACCTCT-3’; reverse, 5’-TGTTGGTCCACCTTTCATCTTC-3’.
WIF1: forward, 5’-CTGATGGGTTCCACGGACC-3’; reverse, 5’-AGAAACCAGGAGTCACACAAAG-3’.
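The normalization to GAPDH described above follows the standard comparative Ct arithmetic, which can be written out as a short sketch. The function name and the Ct values in the example are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct method.

    delta Ct       = Ct(target) - Ct(reference gene, e.g. GAPDH)
    delta delta Ct = delta Ct(sample) - delta Ct(control)
    fold change    = 2 ** -(delta delta Ct)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the tumor sample than in the control, with equal GAPDH Ct values.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression. The calculation assumes roughly 100% amplification efficiency for both the target and the reference gene.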
Western blot
Protein was extracted from the indicated cells using RIPA lysis buffer. Protein concentrations were determined using a BCA Protein Assay kit (Thermo Fisher Scientific, Rockford, IL, USA). A total of 60 μg of protein was separated on a 10% SDS-PAGE gel and transferred onto polyvinylidene difluoride membranes (Millipore, Billerica, MA, USA), which were blocked in 5% nonfat milk for 1 h and then incubated with the corresponding primary antibodies overnight at 4℃. The following antibodies were used: rabbit polyclonal anti-COL11A1 (ab64883; 1:500 dilution), rabbit polyclonal anti-SFRP1 (ab4193; 1:500 dilution), rabbit polyclonal anti-MMP1 (ab137332; 1:500 dilution), rabbit polyclonal anti-WIF1 (ab186845; 1:500 dilution), and rabbit polyclonal anti-β-Tubulin (1:3000 dilution) from Proteintech (Wuhan, China). After three washes with 1×TBST for 8 min each, the membranes were incubated with the corresponding secondary antibodies for 1 h at 37℃ and washed three more times with 1×TBST. The bands were then visualized using an ECL kit (Millipore). Signals were quantified with ImageJ software and normalized to β-tubulin.