Data collection and Data preprocessing
We collected two breast cancer datasets in the present study with complete expression profiles, copy number and PAM50 molecular subtype data from Breast Invasive Carcinoma (TCGA, PanCancer Atlas, train dataset) and Breast Cancer (METABRIC, Nature 2012 & Nat Commun 2016, validation set). The data from TCGA batch-corrected gene expression profile, curated clinical information, and unified mutation and segmented data were download from PanCanAtlas (https://gdc.cancer.gov/about-data/publications/pancanatlas). As was described by Guan et al[17], quartile normalization was applied to group sample using “normalizeBetweenArrays” of R packages limma (Version 3.48.1) and Log2(TPM+0.5) transformation was conducted for subsequent data analysis.
As for the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium), the corresponding data were processed to extract from cBioPortal for Cancer Genomics[18] (cBioPortal, http://www.cbioportal.org), and was exported in “mRNA expression z-scores relative to all samples (log microarray)” format. Specifically, genes with zero expression were excluded in all samples and all gene names were remapped to official gene symbols according to the Multi-symbol checker tools (https://www.genenames.org/tools/multi-symbol-checker/).
The stringent inclusion and exclusion criteria were developed for the current study based on the objective of the study and bias reduction as follows. We only retained the “primary breast invasive ductal carcinoma” histopathological subtype and extracted luminal A molecular subtype from “subtype” field of TCGA and “PAM50 + Claudin-low subtype” field of METABRIC.
Copy number alteration analysis of MDM4
The cBioPortal database[18] (https://www.cbioportal.org) developed by research team from Memorial Sloan-Kettering Cancer Center, was user-friendly and public available versatile online platform with exploring and visualizing multi-dimensional cancer genomics, cancer transcriptomics.
As was mentioned above, we performed the corresponding search expression patterns to filter eligible samples based on cBioPortal Onco Query Language (OQL) as follows: TP53: MUT=R175 R245 R248 R249 R273 R282; MDM2: CNA >= GAIN; MDM4: CNA >= GAIN; Moreover, the oncoprint, mutual exclusivity and co-occurrence of MDM4-related genes were analyzed accordingly. Besides, relations between the CNV of MDM4 and expression level of MDM4 were visualized by Vioplot and correlation plot with R package ggstatsplot (Version 0.8.0).
mRNA expression profile analysis of MDM4
UALCAN platform[19] (http://ualcan.path.uab.edu/index.html) designed by researchers from the University of Alabama Birmingham, aims to provide frequently updated and interactive bioinformatics platform for analyzing cancer multi-omics data. Besides, bc-GenExMiner v4.7[20] (Breast Cancer Gene-Expression Miner v4.7) online data set (http://bcgenex.centregauducheau.fr) was used to finely validate the findings above.
Gene set enrichment analysis (GSEA)
Gene set enrichment analysis (GSEA), which showed a significant improvement in the use of gene expression profile data, was performed by R package fgsea (Version 1.18.0) with parameters: minSize= 5, maxSize= 1500, nperm=15000 using MSigDB C7: immunologic signature gene sets (5219 gene sets available, V7.4), C8: cell type signature gene sets (671 gene sets available, V7.4) and CP: canonical pathways (2922 gene sets available, V7.4). The Gene-pathway data file is available for download on publicly available MSigDB website[21]: http://www.gsea-msigdb.org/gsea/msigdb/index.jsp. Gene sets with a false discovery rate (FDR) value < 0.05 was considered as significantly enriched.
Retrieving immune gene signatures of biomarkers for ICIs therapy
Increasingly the literature [22-24]suggests that response of cancer immune check-point therapy was predicted by a series of biomarkers, including PD1/PD-L1, DNA damage response related gene, major histocompatibility complex (MHC), IFN responses and so on.
Determining gene module related to biomarker of ICIs response using WGCNA algorithm
WGCNA methods [25]was performed to identify gene modules that are significantly related to biomarker of ICIs response in MDM4 gain/amplification luminal A type BC. Firstly, we pre-processed the input data, including remove the outlier sample and gene zero variance. Then, optimal soft threshold for correlation matrix, adjacency matrix and topology overlap matrix was calculated and network was constructed using the blockwiseModules function with the following parameters: networkType = "signed", TOMType = "signed", corType = "bicor", minModuleSize = 30, mergeCutHeight = 0.25, minKMEtoStay = 0.8. The heatmap of module-trait associations was generated using the "labeledHeatmap" function. The statistical computing and visualization operations are carried out with functions from the R packages WGCNA (Version 1.70-3). We extracted modules with the highest correlation of gene module related to biomarker of ICIs response.
GO and KEGG enrichment analysis
Metascape[26] (http://metascape.org/gp/index.html), was applied to perform the Gene Ontology (GO), KEGG (Kyoto Encyclopedia of Genes and Genomes), and transcript factor enrichment analysis for gene module from WGCNA. Enriched categories with a corrected p-value < 0.05 cutoff were considered statistically significant.
Gene Set Variation Analysis
The input expression matrix and reference gene set were integrated to calculate gene signature score by R package gene set variation analysis (GSVA, Version 1.40.1) with default parameters based on single-sample gene-set enrichment analysis (ssGSEA).
Consensus clustering for immune subtypes
Unlike other clustering algorithm represented by K-means, consensus clustering analysis does not require a priori specifying the number of clusters. Breast Invasive Carcinoma (TCGA, PanCancer Atlas, training dataset) and Breast Cancer (METABRIC, Nature 2012 & Nat Commun 2016, validation set) were grouped into several subgroups according the mRNA expression levels of immune-related gene signatures (IRGs) using consensus clustering analysis by R package ConsensusClusterPlus (Version 1.56.0), respectively. Subsequently, principal component analysis (PCA) was carried out from results of consensus clustering analysis with the R package PCAtools (Version 2.4.0).
Predicting response to ICIs therapy for immune subtypes
More recent studies[27-29]indicated that ICIs therapy have proven effective in the treatment of numerous cancers, especially melanoma[30, 31] and non-small cell lung cancer[32]. However, immunotherapy seems to be less effective in breast cancer compared with another tumor. Besides, there are fewer clinical trials of ICIs in human breast cancer. Fortunately, TIDE algorithm[33](http://tide.dfci.harvard.edu/) was computationally designed to predict ICIs response.
Statistical analysis
Most of the statistical analysis were performed using R software version 4.1.0 (R Core Team, 2021). MDM4 mRNA expression in different MDM4 copy number status was were analyzed using one-way ANOVA (Welch ANOVA in cases of unequal variance) followed by the Games-Howell post hoc test. Correlation between the MDM4 gene copy number alteration and expression level of its was performed using the Spearman's rank correlation coefficient. All the P values were for a two-side test and considered statistically significant when P <0.05.