Transcriptome Data acquisition and processing
The transcriptome data, mutation data, and clinical data of BRCA were acquired from the The Cancer Genome Atlas (TCGA) databases, comprising a total of 1168 samples. Among these, 111 samples were classified as normal, while 1057 samples were categorized as tumour samples. Additionally, the Gene Expression Omnibus (GEO) expression profiles, specifically GSE42568, GSE20685 and GSE96058, were downloaded from the GEO database. Additional table S1 detailed the characteristics of samples derived from different cohorts (TCGA, GSE20685, GSE42568, GSE96058). To account for any discrepancies between the TCGA and GEO expression profiles, the "sva" package was employed for batch adjustment.
ScRNA-seq data acquisition and processingThe scRNA-seq data was obtained from the GSE161529 and GSE176078 database. Quality control of the scRNA-seq data was conducted using the "seurat" and "singleR" R packages. Cells with less than 15% expression of both mitochondrial and ribosomal genes were retained, along with genes whose expression levels ranged from 200 to 10,000 and were expressed in at least three cells. The remaining cells were normalized using a linear regression model with the "Log-normalisation" technique. Additionally, the "FindVariableFeatures" function was employed to identify 2000 hypervariable genes. The data was scaled using the "ScaleData" function, followed by the identification of the top 15 principal components (PCs) through t-distributed stochastic neighbor embedding (t-SNE) analysis to identify significant clusters.
Acquisition of the mitochondria and lysosome related genes
Mitochondria-related genes were derived from the intersection of MitoProteome and MitoCarta3.0 databases (Additional Fig. 1A), while the genes related to lysosomes were retrieved from the Gene Ontology (GO) database, resulting in a total of 872 lysosome-related genes. These genes are listed in Additional Table S2.
AUCell
The "AUCell" R package was utilized to compute activity scores for mitochondria and lysosomes in each cell lineage. The gene expression of each cell was subsequently ranked based on the area under the curve (AUC) value of mitochondria and lysosome related genes, enabling estimation of the proportion of highly expressed gene sets. Subsequently, the cells were stratified into high and low-AUC groups using the median score. The "FindAllMarkers" function was employed to analyze the differences between these high and low groups.
Weighted co-expression network (WGCNA) analysis
The "WGCNA" R package was employed to examine the interconnectivity among distinct gene sets and the correlation between the phenotype and various gene sets. The construction of the gene co-expression network entailed the utilization of weighted expression correlation. Subsequently, hierarchical cluster analysis was conducted based on weighted correlation, resulting in the identification of diverse gene modules. Subsequently, the phenotypic traits under investigation were presented for weighted analysis, wherein the correlation and reliability of all genes within each gene module and phenotypic traits were computed. The core module, deemed the most pertinent and significant, was subsequently identified. Genes originating from the blue and green modules were then designated as hub mitochondria/lysosome regulators.
Establishment of prognostic signature derived from Machine-learning
To assess the association between MLRGs and prognosis, we utilized the TCGA cohort as the training set and the GEO cohort as the testing set. We employed ten machine learning algorithms, namely CoxBoost, stepwise Cox, Lasso, Ridge, elastic net (Enet), survival support vector machines (survival-SVMs), generalized boosted regression models (GBMs), supervised principal components (SuperPC), partial least Cox (plsRcox), and RSF, to construct a prognostic model that is both accurate and comprehensive. The construction process of the prognostic model proceeded as follows: initially, prognostic mlMSGs were chosen through univariate Cox analysis in both the TCGA and GEO cohorts. Subsequently, 101 algorithms, resulting from pairwise combinations of 10 machine learning algorithms, were employed to develop the most precise and comprehensive model with the highest C-index performance. Lastly, the C-index of each validation cohort was computed, and the optimal model was determined based on the highest average C-index value.
Superiority and validity evaluation of the prognostic model
ROC curves were created to evaluate the prediction accuracy of the model over 1,3, and 5 years. The risk scores of the samples in the training and validation sets were computed, and subsequently, these samples were categorized into high- and low-risk groups based on their respective risk scores. The Kaplan-Meier method was employed to generate survival curves, and the statistical significance was determined through log-rank tests. Furthermore, a total of 18 prognostic models pertaining to breast cancer were obtained, and the C-index of each model within each cohort was calculated to assess the prognostic predictive capability of the entire signature. To evaluate the clinical utility of the model, we conducted a comprehensive analysis of clinical data obtained from samples in the TCGA and GEO cohorts. This analysis encompassed various factors such as Age, ER, PR, Grade, HER2, Stage, N, and Lymph node status. Furthermore, we compared the prognostic predictive capabilities of these clinical variables with those of risk scores, as measured by the C-index.
Mutational landscape analysis
The Mutation Annotation Format (MAF) was acquired using the "maftools" R package to illustrate the mutation landscape between high- and low-risk groups. Subsequently, we examined the associations of gene mutations through co-occurrences.
Analysis of immune-omics molecular characterization
The infiltration of various immune cells was compared between the high and low risk groups in the immune-related database, utilizing the IOBR package. Specifically, the distribution of M0, and M2 cells between these groups was examined. ImmuneCell AI was used to compare the differences in immune infiltration between high and low-risk groups. At the same time, the wilcoxTest was used to compare the differences in immune function between high and low-risk groups, and further compare the differential expression of ICDs and HLA family genes in high and low-risk groups. Additionally, the anti-cancer immune status of tumor tissue and normal tissue was inferred through the analysis of the tumor immune cycle across seven stages. This study aims to analyze the current state of anti-cancer immunity by examining the various stages of the Cancer-Immunity Cycle, which include the release of cancer cell antigens (Step 1), the presentation of cancer antigens (Step 2), the priming and activation of immune responses (Step 3), the migration of immune cells to tumor sites (Step 4), the infiltration of immune cells into tumors (Step 5), the recognition of cancer cells by T cells (Step 6), and the subsequent elimination of cancer cells (Step 7). The relative abundance of stromal cells, immune cells, and tumor cells was assessed in high- and low-risk groups using the "estimate" R package. Additionally, the immune checkpoint conditions and immune function of these two groups were analyzed and visualized using the "ggplot2" R package.
Drug sensitivity analysis
The R package "oncoPredict" was utilized to predict drug sensitivity by analyzing gene expression levels. The calculation of half-maximal inhibitory concentrations (IC50) for chemotherapeutic drugs was performed using the aforementioned "oncoPredict" R package. TIDE and ImmuneCell AI analyses were performed to assess the immunotherapy sensitivity of the high and low-risk groups.
Cell culture
HBL-100 human normal breast epithelial cells were provided by the Cell Resource Center of Shanghai Life Sciences Institute, as well as MDA-MB-231, HCC1806, MCF-7, and BT-474 human breast cancer cell lines. DMEM or RPMI-1640 (Gibco BRL, USA) was used to culture these cell lines. They were growth at 37°C with 5% CO2 and 10% fetal bovine serum (FBS) (Gibco BRL, USA).
Quantitative real-time PCR (qRT-PCR)
Total RNA was isolated from breast cancer cells using TRIzol reagent (Invitrogen, 15596018) following the manufacturer's instructions. The concentration and quality of the total RNA were assessed using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Subsequently, the total RNA was reverse-transcribed using a PrimeScript RT reagent kit (Takara Bio Inc., Japan). The resulting cDNA was combined with SYBR Green qPCR Master Mix (Takara Bio), primer, and Diethyl Pyrocarbonate (DEPC; Beyotime Institute of Biotechnology, China) water to achieve a final volume of 10 µL. The experiment was conducted in 96-well plates utilizing the LightCycler 480® Real Time PCR System (Roche Diagnostics, Switzerland). Each reaction was replicated at least three times. The mRNA expression levels were standardized against the levels of glyceraldehyde 3-phosphgate dehydrogenase (GAPDH). The detection of nonspecific amplifications was monitored through the analysis of melting curves. The primer sequences were listed in Additional table S3.
Small-interfering RNA (siRNA) transfection
The control used in this study was a non-specific scramble siRNA, while the experimental siRNA (RiboBio, Guangzhou, China) was transfected into breast cancer cells with 80% confluency using Lipofectamine 3000 (Invitrogen, CA, USA) as per the manufacturer's instructions. Following transfection, the cells were incubated for 48 hours in a culture incubator.
Colony formation
A total of 1×103 cells were transfected and subsequently cultured in 6-well plates for a duration of two weeks. Following the formation of cell clones, the cells were rinsed and fixed using a 4% paraformaldehyde (PFA) solution for a period of 15 minutes. Subsequently, these cells were stained with Crystal violet (Solarbio, China) for a duration of 20 minutes.
Pan-cancer analysis
The biogenic analysis tool sxdyc.com was utilized to investigate the differential expression of SHMT2 in normal and cancerous tissues, as well as its associations with stemness score, immune infiltration, immune inflammation, ICB, and TMB.
CCK8 assay
Cells were initially plated onto 96-well plates at a density of 2×103 and allowed to incubate overnight. Subsequently, the cells were subjected to varying durations of incubation (1, 2, 3, 4, 5, 6 days) at 37°C and 5% CO2. Following this incubation period, 10 µL of the CCK8 labeling reagent (0.5 mg/mL, Dojindo, Japan) was added to each well, and the cells were further incubated for an additional 2h at 37°C and 5% CO2. The absorbance of the cells at 450 nm was then measured using an enzyme-labeled meter (Thermo Scientific, Shanghai, China).
Migration and invasion experiments
Chambers with a diameter of 8µm (CORNING) were employed in the cell migration and invasion assay, either with or without matrigel (CORNING). Approximately 5x104 MCF-7 cells were seeded into the upper chamber and incubated in serum-free media containing varying peptide concentrations or an Akt inhibitor. Subsequently, DMEM with 10% fetal bovine serum was added to the lower chamber. Following a 48-hour incubation period at 37ºC with 95% air and 5% CO2, the membranes were fixed using a 4% paraformaldehyde solution for 20 minutes. Subsequently, the membranes were stained using a crystal violet solution for 15 minutes. The cells present on the upper chambers were gently wiped using a cotton swab. The average counts of migrative and invasive cells were assessed in five randomly selected fields under a light microscope at a magnification of 100x.
Statistical analysis
R (4.3.1) and Strawberry Perl (v5.32.1) were used to generate all the results and figures. The specific research methods and R packages used are described above. Three independent experiments were performed to record the data as mean ± standard deviation (SD). Student’s t-tests were utilized for comparisons between high- and low-groups. (*P < 0.05, **P < 0.01, ***P < 0.001).