1. Gene expression data download
In this study, we downloaded expression profiles and clinical data for 33 tumor types, including breast cancer (1217 samples in total: 1104 tumor samples and 113 adjacent normal tissue samples), from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) [12]. We also downloaded the GSE42568 [13] dataset from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) [14], which comprises 121 samples (104 breast cancer tissues and 17 normal tissues).
2. Screening of differentially expressed mRNA
2.1 TCGA analysis
We used R [15] with the “limma” package [16] to screen the samples for differentially expressed mRNAs, with thresholds of |log2FC| > 2 and adjusted p-value < 0.05. One of the differentially expressed genes was selected as the target gene for further study. Because TCGA contains relatively few normal samples, we used the GEPIA website [17] to add normal samples from the GTEx database [18] to the differential analysis of the target gene.
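The threshold step above can be illustrated with a short sketch. It is written in Python for illustration and is not the limma script used in this study; the names `GeneResult`, `screen_degs` and `direction` and the example rows are hypothetical. It assumes a results table that already holds a log2 fold change and an adjusted p-value for each gene.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fc: float   # log2 fold change, tumor vs. normal
    adj_p: float     # multiple-testing-adjusted p-value


def screen_degs(results, fc_cutoff=2.0, p_cutoff=0.05):
    """Keep genes with |log2FC| > fc_cutoff and adjusted p < p_cutoff."""
    return [r for r in results
            if abs(r.log2_fc) > fc_cutoff and r.adj_p < p_cutoff]


def direction(result):
    """Label a retained gene as up- or down-regulated in tumor tissue."""
    return "up" if result.log2_fc > 0 else "down"


# Hypothetical rows for illustration only (not results from this study).
table = [
    GeneResult("GENE_A", 3.1, 0.001),
    GeneResult("GENE_B", -2.4, 0.020),
    GeneResult("GENE_C", 1.5, 0.001),   # fails the fold-change cutoff
    GeneResult("GENE_D", 2.8, 0.200),   # fails the adjusted-p cutoff
]

degs = screen_degs(table)
for r in degs:
    print(r.gene, direction(r))
```

Both conditions must hold for a gene to be retained, so a large fold change with a non-significant adjusted p-value is discarded, as is a significant but small change.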
2.2 GEO array analysis
We loaded the “impute” package [19], input the GSE42568 series matrix file, and identified differentially expressed genes with a Bayesian test, using thresholds of |logFC| > 1 and adjusted P < 0.05. We then extracted the expression of the target gene in normal and tumor tissues and drew a box plot with GraphPad Prism 9 [20] for visualization.
3. Pan-cancer differential analysis
Using R with the limma package, we read the gene expression file of each tumor type in a loop. Only tumor types with more than 5 samples in the normal group were retained, and wilcox.test was applied to compare the expression of the target gene between normal and tumor samples across the 33 cancer types. Finally, the results were drawn as a box plot.
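As a rough illustration of this loop, the following Python sketch filters cohorts by normal-group size and then compares normal and tumor expression with a rank-sum test. It is not the R script used in this study. The function names, the cohort labels and the expression values are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, so it will not reproduce the output of R's wilcox.test exactly.

```python
import math

MIN_NORMAL = 5  # cohorts need more than this many normal samples


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation).

    Tied values receive their average rank. No tie or continuity
    correction is applied; this is a simplification for illustration.
    """
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def pan_cancer_test(cohorts):
    """Return {cancer type: p-value} for cohorts that pass the size filter."""
    results = {}
    for cancer, groups in cohorts.items():
        if len(groups["normal"]) <= MIN_NORMAL:
            continue  # too few normal samples, skip this tumor type
        results[cancer] = rank_sum_p(groups["normal"], groups["tumor"])
    return results


# Hypothetical expression values for illustration only.
cohorts = {
    "TYPE_A": {"normal": [1.0, 1.2, 0.9, 1.1, 1.3, 0.8],
               "tumor": [2.0, 2.5, 1.9, 2.2, 2.8, 2.4, 2.1]},
    "TYPE_B": {"normal": [1.0, 1.1],
               "tumor": [2.0, 2.1, 2.3]},
}

results = pan_cancer_test(cohorts)
for cancer, p in results.items():
    print(cancer, round(p, 4))
```

The size filter runs before the test, so a cohort with too few normal samples never contributes a p-value to the final plot.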
4. miRNA data download
We downloaded Isoform Expression Quantification data from TCGA, predicted the miRNAs that bind to the mRNA using starbase [21], and downloaded all the data.
5. Screening of miRNA co-expressed differently with mRNA
We loaded the limma, ggpubr, ggExtra and reshape2 R packages [22–24] and set corFilter < -0.1 and pvalueFilter < 0.01. The inputs were the gene and miRNA expression files from TCGA and the miRNA list file from starbase. R scripts were used to draw the scatter plots and box plots of the related miRNAs.
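The correlation filter can be sketched as follows. This Python example is an illustration, not the R script used in this study. It assumes that the thresholds mean a correlation coefficient below -0.1 with a p-value below 0.01, and it assumes Pearson correlation, since the correlation method is not stated here. The p-value uses a large-sample normal approximation to the t statistic, and all names and expression values are hypothetical.

```python
import math

COR_FILTER = -0.1     # keep miRNAs with correlation below this value
PVALUE_FILTER = 0.01  # and a p-value below this value


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p-value from the t statistic, treated as standard normal.

    Reasonable only for large n; an exact test would use the
    t distribution with n - 2 degrees of freedom.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def negatively_coexpressed(mrna, mirnas):
    """Return names of miRNAs passing both the correlation and p filters."""
    kept = []
    for name, expr in mirnas.items():
        r = pearson_r(mrna, expr)
        if r < COR_FILTER and approx_p(r, len(mrna)) < PVALUE_FILTER:
            kept.append(name)
    return kept


# Hypothetical expression values for illustration only.
mrna = [1, 2, 3, 4, 5, 6, 7, 8]
mirnas = {
    "miR_A": [8.1, 7.2, 5.9, 5.1, 4.2, 2.8, 2.1, 0.9],  # falls as mRNA rises
    "miR_B": [1, 2, 3, 4, 5, 6, 7, 9],                  # rises with mRNA
    "miR_C": [3, 1, 4, 1, 5, 9, 2, 6],                  # weak positive trend
}

kept = negatively_coexpressed(mrna, mirnas)
print(kept)
```

Requiring a negative coefficient reflects the expectation that a miRNA suppresses its target mRNA, so only inversely related pairs are carried forward.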
6. lncRNA data download
We predicted the lncRNAs that bind to the miRNA using starbase and downloaded all the data. All downloaded files were used for differential expression and survival analysis.
7. Screening of lncRNA co-expressed differently with miRNA
We loaded the limma, ggpubr, ggExtra and reshape2 R packages and set corFilter > 0.1 and pvalueFilter < 0.001. The inputs were the miRNA and mRNA expression files from TCGA and the lncRNA list file from starbase. R scripts were used to draw the scatter plots and box plots of the related lncRNAs.
8. Survival analysis
In this study, we used the Kaplan-Meier method to analyze survival in relation to the mRNA, miRNA and lncRNA, and drew the corresponding survival curves. We loaded the limma, survival [25] and survminer [26] packages and read the survival data and expression data. The best cut-off was chosen to divide the samples into high- and low-expression groups, the survival difference between the two groups was compared, and a survival curve was drawn. The survival curves for the miRNA and lncRNA were drawn in R, and the survival curve for the mRNA was drawn through the GEPIA website.
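The grouping and estimation steps can be sketched in a few lines. This Python example is only an illustration of the Kaplan-Meier product-limit estimator and of splitting samples at a cut-off; it is not the survival/survminer code used in this study. The cut-off is taken as a given value rather than optimized, and all names and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each sample
    events: 1 if death was observed, 0 if the sample was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every sample that shares this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored samples leave the risk set
    return curve


def split_by_cutoff(expression, cutoff):
    """Return (high, low) lists of sample indices relative to the cut-off."""
    high = [i for i, e in enumerate(expression) if e > cutoff]
    low = [i for i, e in enumerate(expression) if e <= cutoff]
    return high, low


# Hypothetical data for illustration only.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
expression = [7.9, 3.1, 8.4, 2.2, 2.9]

curve = kaplan_meier(times, events)
high, low = split_by_cutoff(expression, cutoff=5.0)
high_curve = kaplan_meier([times[i] for i in high], [events[i] for i in high])
low_curve = kaplan_meier([times[i] for i in low], [events[i] for i in low])
print(curve, high_curve, low_curve)
```

Censored samples do not lower the curve, but they shrink the risk set, which makes each later death count for a larger drop in the estimate.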
9. Correlation between a single gene and immune cells
The target gene was obtained from the preceding differential expression analysis. To explore the correlation between gene copy number and immune cells, and between gene expression and immune cells, we used the TIMER database [27]. For copy number, we selected sCNA, entered the target gene and the target immune infiltrating cells, selected the high-copy option, and submitted the query to obtain the correlation between gene copy number and immune cells. For gene expression, we also used the TIMER database, selecting ‘gene’ and entering the target gene and the target immune infiltrating cells. A correlation coefficient greater than 0.15 was considered to indicate a positive correlation.
10. Correlation analysis between a single gene and immune checkpoints
We selected CD274, CD276, CTLA-4 and PDCD1 as immune checkpoint genes for analysis. On the TIMER website, we selected ‘Gene’, entered the target gene and each immune checkpoint gene in turn, obtained the result graph, verified it through the GEPIA website, and saved the result. A correlation coefficient greater than 0.15 was considered to indicate a positive correlation.