Machine learning and single-cell sequencing reveal the potential regulatory factors of mitochondrial autophagy in the progression of gastric cancer

As an important regulatory mechanism to remove damaged mitochondria and maintain the balance between internal and external cells, mitochondrial autophagy plays a key role in the progression and treatment of cancer Onishi (EMBO J 40(3): e104705, 2021). The purpose of this study is to comprehensively analyze the role of mitochondrial autophagy-related genes in the progression of gastric cancer (GC) by RNA sequencing (RNA-seq) and single-cell RNA sequencing (scRNA-seq). GSE26942, GSE54129,GSE66229,GSE183904 and other data sets were obtained by GEO databases. Using support vector machine recursive feature elimination (SVM-RVF) algorithm and random forest algorithm, the mitochondrial autophagy-related genes related to gastric cancer were obtained, respectively. After that, the model was constructed and the inflammatory factors, immune score and immune cell infiltration were analyzed. Furthermore, according to the scRNA-seq data of 28,836 cells from 13 GC samples, 18 cell clusters and 7 cell types were identified by scRNA-seq analysis. The expression level and signal pathway of related genes were verified by cell communication analysis. Finally, the regulatory network of cells was analyzed by SCENIC. MAP1LC3B, PGAW5, PINK1, TOMM40 and UBC are identified as key genes through machine learning algorithms. CXCL12-CXCR4, LGALS9-CD44, LGALS9-CD45 and MIF (CD74 + CD44) pathways may play an important role in endothelial cells with high score scores of T cells and monocytes in tumor environment. CEBPB, ETS1, GATA2, MATB, SPl1 and XBP1 were identified as candidate TF with specific regulatory expression in the GC cell cluster. The results of this study will provide implications for the study of the mechanism, diagnosis and treatment of mitochondrial autophagy in GC.


Introduction
Gastric cancer is a global health problem, and more than 1 million people are newly diagnosed with gastric cancer every year.Despite a decline in global morbidity and mortality over the past 50 years, gastric cancer remains the third leading cause of cancer-related death.Improving the survival rate of GC is the main clinical goal at present (Ajani et al. 2022).Heterogeneity in tumor microrings is considered to be one of the main determinants of tumor metastasis, drug resistance and recurrence.Seeking the fine characteristics of tumor heterogeneity and understanding the molecular mechanism of GC are the core work of developing potential drug targets for GC therapy (Li et al. 2022).
Mitochondrial autophagy is the effective degradation, removal and recycling of dysfunctional and damaged mitochondria through autophagy (Onishi 2021;Abate et al. 2020;Ashrafi and Schwarz 2013;Youle 2019).In the process of cancer progression, mitochondrial autophagy has been proven to contribute to carcinogenesis, inhibition of iron cell apoptosis, maintenance of cancer stem cells, tumor immune escape, drug resistance and so on (White et al. 2015;Liu et al. 2020;Smith and Macleod 2019;Li et al. 2019;DeVorkin et al. 2019;Mocholi et al. 2018).In addition, mitochondrial autophagy has been found to promote the plasticity of cancer stem cells through metabolic remodeling to better adapt to the tumor microenvironment (Katajisto et al. 2015).Therefore, it is necessary to find a target that can affect mitochondrial autophagy and regulate the progression of gastric cancer.
Single-cell RNA sequencing (scRNA-seq) is an important tool for analyzing the characteristics of various cell types in and around tumors (Suvà and Tirosh 2019).It provides high-throughput and high-resolution transcriptome analysis of individual cells, and enables a deeper understanding of the diversity of cell states and the heterogeneity of cell populations (González-Silva et al. 2020).At the same time, scRNA-seq contributes to the analysis of cellular communication, which is the basis for tissue development and homeostasis.Where the tissue function is appropriate, cells communicate through physical interactions and form a cell network by exchanging metabolites and ligand-receptor signals to draw a comprehensive map of multicellular ecosystems in the tumor microenvironment (TME) (Jin et al. 2021).SCENIC established a potential transcriptional regulatory network to analyze the cell-specific transcription factors that support the development of GC (Aibar et al. 2017).
Because of the important role of mitochondrial autophagy in the development of cancer, this paper comprehensively analyzed the role of mitochondrial-related genes in gastric malignant tumors.We use SVM-RVF algorithm and random forest algorithm to construct a clinical prediction model from GEO data and analyze the correlation between the model and tumor microenvironment, immune infiltration and treatment.The purpose of this study is to provide a prognostic model for the clinical diagnosis and treatment of gastric cancer, and to explore the potential mechanism of mitochondrial autophagyrelated genes affecting the progression of gastric cancer.

Data collection and preprocessing
Gene expression data and corresponding clinical information GSE26942, GSE54129, and GSE66229 dataset (Oh et al. 2018), including 749 GC patients, were retrieved from the Gene Expression Synthesis (GEO) database (http:// www.ncbi.nlm.nih.gov/ geo/.The batch effects were removed by the "combat" function of the "sva" Package of Limma was used to normalize the quantile of gene expression data (Ritchie et al. 2015).To ensure the accuracy and stability of the model, 29 human-related and verified mitochondrial autophagy-related genes were obtained and provided in Supplementary Table S1.

GSVA analyzes differences in autophagy levels between cancerous and normal tissues
The mitochondrial autophagy gene set was used as a whole, and the "R package of GSVA" was used to analyze the difference in mitophagy levels between gastric cancer and normal tissues.

Identification of differentially expressed and prognostic genes
A machine learning technology called SVM-RFE(R package "e1071") (Sanz et al. 2018), whose working principle is support vector machine, finds the best variable by deleting the feature vectors generated by SVM-RFE to further screen genes related to mitochondrial autophagy in gastric cancer.In the junior queue, use R.4.0.3 the latest versions of "stringr" and "limma" software packages identified the differentially expressed mitochondrial autophagy-related genes between tumor and adjacent tissues: through, the prognosis model was constructed.For ranking the important indicators, the mean decrease Gini (MDG) in the random forest algorithm was applied to quantify which index contributes most to the classification accuracy.Here, we used GSE26942, GSE54129 and GSE66229 to develop the random forest model with the specific parameters of max features: 1.auto, n estimators: 500 min, sample leaf, and the number of variables tried at each split.2. The top 10 screened-out hub genes were used for subsequent research and experimental verification.The top10 gene is selected to intersect with the gene set obtained by the random forest algorithm (Sapir-Pichhadze and Kaplan 2020).Then, we used logistic regression to build a predictive nomogram related to mitochondrial autophagy genes for the gene sets that intersected, used bootstrap to internally verify the data set, and used the ROC curve and DCA decision curve to evaluate the accuracy of the model.

Unsupervised clustering distinguishes different clinical characteristics of patients
The ConsenusClusterPlus package was used to analyze the cluster analysis of the above 749 GC samples.According to the optimal CDF curve and triangular area diagram (see supplementary figure S2, S3), three molecular subtypes clus-ter1, cluster2 and cluster3 were produced.The comprehensive evaluation was divided into three groups to evaluate the clinicopathological characteristics of different subgroups of patients.

scRNA-seq sample processing and analysis
scRNA-seq data for 13 GC samples are also downloaded from the GEO database (item number GSE183904) (Kumar et al. 2022).First, use "seuratRpackage" to process scRNA-seq data.The percentage of genes was calculated by PercentageFeatureSet function, and then the data were qualitatively controlled.The standard was that the number of genes expressed in each cell was more than 200 but less than 2500, and then the logarithmic normalization method was used to merge and standardize the sample data.The first 2000 highly variant genes were screened by FindVariableFeature function.Scaling analysis and principal component analysis were carried out according to the RunPCA function of highly variant genes.Then, the cells are clustered by FindNeighbors and FindCluster functions.The first 40 principal components are selected and the dimension is further reduced by t-distribution random neighborhood embedding (t-SNE).
The marker genes in each cluster were identified by Fin-dAllMarkers.After that, cellular communication in tumor microenvironment was evaluated by CellChat (R packet "CellChat").CellChat can use network analysis and pattern recognition technology to predict the main output and input of cell signals, as well as how cells and signals coordinate their tasks.

Single-cell regulatory network reasoning and clustering (SCENIC) analysis revealed the gene regulatory network
To further analyze the regulators of transcription factors, we used "SCENIC R package" to analyze the activation regulators of each subgroup.In particular, a co-expression network of TF and gene sets is constructed using "runGenie R package".Then use the."RcisTarget R package" analyzes the potential direct binding targets of transcription factors.Then, use the Motif dataset (hg19-tss-centered-10kb7species.mc9nr.feather) to construct rules for each TF in the scenic spot.The co-expression genes of each TF were constructed with "GENIE3Rpackage", and then the "runSCENIC" program was used to assist the generation of regulators (GRN).Finally, "AUCell R package" calculated the regulator activity score of each cell.For visualization, we map the regulator activity (AUC) score to TSNE maps and heatmaps (Wickham 2009).

Analysis of mitochondrial postural levels in gastric cancer tissues and normal tissues
To clearly demonstrate the specific process of this study, we have summarized the process of bioinformatics analysis, as shown in the figure (Fig. 1).Taking the mitochondrial autophagy gene set as a whole, through the analysis of "R package of GSVA", it was found that the level of mitochondrial autophagy was significantly up-regulated in gastric cancer tissues (Fig. 2).

The selection of hub genes based on mitochondrial autophagy gene set
The expression of the mitochondrial autophagy gene set in gastric cancer and normal tissues was drawn by heatmap and boxplot (Fig. 3a, b).The first 10 hub genes (MAP1LC3B, PGAM5, UBC, TOMM40, PINK1, ULK1, SQSTM1, MFN2, RPS27A, TOMM70A)) were selected in the analysis of SVM-RFE algorithm.The first 10 hub genes (MAP1LC3B, PGAM5, UBC, TOMM40, PINK1, ULK1, SQSTM1, MFN2, RPS27A, TOMM70A) are selected in the random forest algorithm(Fig.2C).Combining the results of the two machine algorithms, screened MAP1LC3B (OR = 17.41),PGAM5 (OR = 3.29), UBC (OR = 11.62),TOMM40 (OR = 1.08) and PINK1 (OR = 0.25), and drew a forest map (Fig. 3d).We think that they may be key mitochondrial autophagy-related genes affecting gastric cancer.Then, the predictive nomogram of five genes was constructed by logistic regression (Fig. 3e).The ROC curve showed that (AUC = 0.797) and bootstrap internal verification showed that our model was predictable for the prognosis of GC patients (Fig. 3f, g).The calibration curve is proved to have high accuracy of the actual observation results, and the DCA decision curve also has certain advantages in the surface model (Fig. 3h, I).

Immune infiltration analysis
According to the three methods in the TIMER database, the immune cell infiltration in gastric cancer TME regulated by five genes was calculated and the heat map was drawn.According to the CIBERSORT algorithm, MAP1LC3B, PGAM5, UBC and TOMM40 are closely related to the activity of M0 macrophages, M1 macrophages and CD4 memory T cells, while PINK1 is related to the decrease of M0 macrophages, M1 macrophages and CD4 memory T cells.On the contrary, MAP1LC3B, PGAM5, UBC and TOMM40 were negatively correlated with plasma cell activity, while PINK1 was closely related to increased plasma cell infiltration (Fig. 4a).According to the results of the inflammatory factor algorithm, the infiltration of IFNG, ILA1, CSF3, TNF and IL11 was closely related to the high expression of MAP1LC3B, PGAM5, UBC and TOMM40, and negatively correlated with the expression of PINK1.On the contrary, TGF β 3 was closely related to the high expression of PINK1, and negatively correlated with the expression of MAP1LC3B, PGAM5, UBC and TOMM40 (Fig. 4C).According to the results of MCPcounter, the infiltration of myeloiddendriticcell, endothelial cells and fibroblasts may be negatively correlated with the expression of MAP1LC3B, PGAM5, UBC and TOMM40, but positively correlated with the expression of PINK1, which may suggest that the infiltration of myeloiddendriticcell, endothelial cells and fibroblasts play a central role in mitochondrial autophagy in GC (Fig. 4b, d).

The relationship between prognosis-related immune genotyping and different clinicopathological characteristics in patients with GC
The comprehensive evaluation was divided into three groups to evaluate the clinicopathological characteristics of different subgroups of patients (Fig. 5a).Compared with the clinicopathological features of the three clusters in the GEO data set, cluster2 and cluster3 compared to the cluster1stage grade, compared with cluster1 and clus-ter2, the proportion of cluster3 males was significantly higher, and the age was relatively older.In addition, we also observed that the disease probability of cluster2 population was lower than that of the other two groups (Fig. 5b, c), and drew the box line of expression level for the three types of typing, and used inflammatory factors and MCPcounter to evaluate the immune infiltration of the three types (Fig. 5d, e, f).

Single-cell transcriptome map of gastric cancer
To study the cell diversity in gastric cancer, scRNA-seq analysis was performed on cancer and paracancerous  samples from 13 patients with GC.After the preliminary quality control assessment described above, a total of 28,836 cells were analyzed and single-cell transcripts were obtained (Fig. 6a, b).
After the standardization of gene expression, cells are clustered by FindNeighbors and FindCluster functions.The first 40 principal components were selected, and the dimension was further reduced by t-distribution random neighborhood embedding (umap), and seven main isolated cell groups were identified (Fig. 6c).Subsequently, the expression levels of the five core genes selected above were determined (Fig. 6d).It was found that five genes were obviously expressed in all 7 kinds of cells, which may indicate that they play an important role in the prognosis of GC.Using the AddMouduleScore function (R packet "Seurat") to calculate a score based on the five genes in the predictive model, it was found that the expression of the five core genes in gastric cancer and paracancerous tissues was the most significant difference in endothelial cells (Fig. 7a).

Complex cell-cell communication networks in the TME
To understand the tumor microenvironment of GC, CellChat (R packet "CellChat") was used to evaluate the cellular communication in the tumor microenvironment.In the output of ligand receptor results, the thermograph function is used to analyze the interaction between immune cells.According to the score of endothelial cells, we divided them into high, medium and low groups.The quantitative analysis of interaction showed that there was no significant difference between the score high endothelial cell group and score low endothelial cell group and other cells.However, the interaction weight between the score high endothelial cell group and T cells, NK cells and monocytes was stronger than that of the score low epithelial cells group (Fig. 7a).The number and weight of interactions showed that the score-high endothelial cell group had more interaction with T cells, NK cells and monocytes than the low endothelial cell group (Fig. 7b).We found that CXCL12-CXCR4, LGALS9-CD44, LGALS9-CD45 and MIF (CD74 + CD44) pathways may play an important role in endothelial cells with high score scores of T cells and monocytes in tumor environment (Fig. 7c, d).

Identification of transcription factor-target gene network
To study the activity of TF in seven cell subsets, SCENIC clustered the regulatory network of single-cell transcription factors.SCENIC remodeling regulators (transcription factors and their target genes) evaluate the activity of these found regulators in individual cells and use these cell activity patterns to find meaningful cell clusters.Regulator activity can also distinguish different cell subsets.Therefore, GC regulator activity is binarized and matched with cell subsets (Fig. 8a-c).Next, CEBPB, ETS1, GATA2, MATB, SPl1 and XBP1 were identified as candidate TF with specific regulatory expression in the GC cell cluster, and the corresponding motifs were listed (Fig. 8d).

Discussion
The role of mitochondrial autophagy in development and disease has been deeply studied.Cancer can use autophagy-mediated circulation to maintain mitochondrial function and energy homeostasis to meet the metabolic needs of growth and proliferation (White et al.The number of cell-to-cell interactions and the total interaction strength by R package "CellChat".c Pathway analysis between cells by R package "CellChat".d Heatmap of cell-to-cell pathway analysis 2015).However, the exact mechanism of the role of mitochondrial autophagy in gastric cancer is still unclear.Therefore, there is an urgent need to reveal the potential molecular mechanism of the effect of mitochondrial autophagy on tumor heterogeneity in GC TME with high cell resolution.Therefore, we used machine learning to construct a clinical prediction model based on mitochondrial autophagy and screened the core genes.In this study, we combined RNA-seq and scRNA-seq data to explore the role of mitochondrial autophagy-related genes in different cell subsets of GC and the relationship between them and analyzed the regulatory network of cells by SCENIC.
Five key mitochondrial autophagy genes were identified by SVM-RVF and random forest machine learning algorithms to establish prognostic models, including MAP1LC3B, PGAW5, PINK1, TOMM40 and UBC.MAP1LC3B has been shown to play a role in mitochondrial autophagy, helping to regulate the quantity and quality of mitochondria by eliminating mitochondria to basal levels to meet cell energy needs and preventing excessive ROS production (Wei et al. 2017).PGAM5 has been shown to play an important role in peptide-threonine dephosphorylation and necrotizing apoptosis (Wang et al. 2012).PINK1 encodes a serine/threonine protein kinase located in mitochondria, which is thought to protect cells from stressinduced mitochondrial dysfunction (Bernardini et al. 2017;Kim et al. 2008).TOMM40 has been shown to enhance protein transmembrane transporter activity (Humphries et al. 2005).Complex with BCAP31 mediates the translocation of mitochondrial membrane respiratory chain NADH dehydrogenase (complex I) from cytoplasm to mitochondria (Namba 2019).UBC is phosphorylated at Ser-65 by PINK1 during mitochondrial autophagy.Phosphorylated ubiquitin specifically binds and activates Parkin (PRKN), triggering mitochondrial autophagy (Kazlauskaite et al. 2014;Kane et al. 2014;Koyano et al. 2014;Wauer et al. 2015).According to the analysis of immune infiltration results, five signatures may play a key role in T cells and endothelial cells.After our work, these five core genes may play an important role in the occurrence and development of gastric cancer.
According to the scRNA-seq results, Mitochondrial autophagy in endothelial cells may play a key role in the progression of gastric cancer.We found that the five genes screened were highly expressed in TME of gastric cancer, while being more pronounced in T cells and endothelial cells.This is consistent with the results of Bulk RNA seq analysis.Endothelial cells have been shown to play an important role in the occurrence and development of cancer, including providing nutrition and oxygen to TME, activating tumor metastasis and immunomodulatory function (Butler et al. 2010;Wang et al. 2017;Tokumoto et al. 2015).In the study, we found that T cells, NK cells and monocytes all communicate extensively with tumor endothelial cells with high scores.In addition, cellphone showed that ligand-receptor pairs, including CXCL12-CXCR4, LGALS9-CD44 and LGALS9-CD45 and MIF (CD74 + CD44) mediate the communication between cell subtypes and endothelial cells in TME.Previous studies have suggested that signals mediated by CXCL12 and its receptor CXCR4 can participate in the formation of immunosuppressive TME and promote cancer progression by enhancing tumor angiogenesis (Gil et al. 1950).Compared with the results of scRNA-seq, the difference in endothelial cell subsets is the most significant and has extensive communication with T cells.It can be speculated that the genes related to mitochondrial autophagy can promote cancer progression through CXCL12-CXCR4 and participate in the formation of immunosuppressive TME.MIF is considered as an upstream regulator of innate immunity.Studies have shown that tumor cells regulate the expression of angiogenesis genes through MIF/CD74/MAPK axis in vitro, which is consistent with our results (Klemke et al. 2021).However, the role of mitochondrial autophagy in this process still needs further discussion.
To identify the cell-specific gene regulatory network, we analyzed TF at the single-cell level.In general, each subtype of endothelial cells, epithelial cells, monocytes, NK cells, smooth muscle cells, B cells and T cells shows different TFs characteristics.For NK cells exhibited a unique TF gene signature, such as GATA2 and FOSB, studies have shown that GATA2 deficiency is the cause of severe NK cell dysfunction (Mace et al. 2013).ETS1 was found to be mainly concentrated in endothelial cells and T cells, which is consistent with previous studies (Wang et al. 2022;Kim et al. 2019).In addition, CEBPB was found concentrated in smooth muscle cells and monocytes, XBP1 was enriched in B cells, and MAFB and SPL1 were concentrated in monocytes.To sum up, mitochondrial autophagy-mediated cell subtypes can regulate different TF regulatory networks to remodel and reprogram TME.

Conclusion
To sum up, we analyzed the RNA-seq data, identified several mitochondrial autophagy genes that are most critical to the progress of GC, analyzed their effects on immune infiltration in GC TME and clinicopathological characteristics, and constructed a more accurate prediction model.Using scRNAseq, we deeply analyzed seven kinds of cell clusters in TME and revealed their association and TF activity.This not only emphasizes the heterogeneity of mitochondrial autophagyrelated genes in TME but also provides a direction for the study of the GC regulation mechanism.However, as a preliminary study, our scRNA-seq depth is low and the samples are insufficient, and the conclusion still needs to be verified by further experiments.

Fig. 2 aFig. 3 aFig. 4 a
Fig. 2 a The "sva R package" is used to combine three datasets.b Boxplot of differences in mitophagic levels between gastric cancer tissue and normal tissue

Fig. 5 a
Fig. 5 a Consensus clustering matrix for k = 3. b Heatmap of the clinicopathological features of mitochondrial autophagy genes.c Clinicopathological characteristics of three clusters in the GEO dataset, including stage, gender, chemotherapy, and prevalence probability.

Fig. 6 a
Fig. 6 a Scatterplot of cell distribution of normal and cancer samples.b TSNE map of single cell data by R package "Seurat".c 7 cells were classified by cell type indicated by the genetic marker.d Expression

Fig. 7 a
Fig. 7 a Mitoscore of cancer and normal tissues in seven cell subsets.b The number of cell-to-cell interactions and the total interaction strength by R package "CellChat".c Pathway analysis between cells by R package "CellChat".d Heatmap of cell-to-cell pathway analysis

Fig. 8 a
Fig. 8 a Heatmap of regulon activity analyzed by SCENIC with default thresholds for binarization.b Heatmap shows the average regulatory activity of transcription factors in different cell types, with color bonds from blue to red indicating low to high relative expression levels.c Heatmap of the average regulatory activity of tran-