ARGs acquisition
A total of 135 differentially expressed genes (DEGs) associated with anoikis were identified through the TCGA-BC and GEO databases (Figure 1A, Figure 1B). In total, 640 ARGs were available from the GeneCards and Harmonizome websites. Based on univariate Cox regression analysis, 42 of the 135 DEGs were associated with survival (Cox p < 0.05 and KM p < 0.05). The forest plot shows the association of these 42 DEGs with the prognosis of BC patients (p < 0.05, Figure 1C). Except for 12 genes (PIK3R1, CCND1, NTRK2, FOXA1, KRT14, TP63, SPP1, BUB3, NAT1, LAMB3, LAMA3, and MUC1), the remaining 30 genes were linked to poor prognosis. The network diagram illustrates the relationships among the expression levels of these 42 genes (Figure 1D). We downloaded copy number variation (CNV) data from the TCGA database to investigate copy number changes in the ARGs and the chromosomal location of each gene. The frequency plot showed that 27 genes, including CCND1, BIRC5, S100A11, MUC1, LAMB3, CENPF, and TP63, had a higher frequency of copy number gain than of loss; conversely, 13 genes, including BC2, KRT14, YAP1, and BUB3, had a higher frequency of copy number loss than of gain. The circle diagram showed that MMP13 and YAP1, which mainly showed "loss", and CCND1 and KIF18A, which mainly showed "gain", were all located on chromosome 11 (Figure 1E and Figure 1F).
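The gain-versus-loss comparison described above can be sketched as follows. This is only an illustration: the encoding of copy number calls (+1 for gain, -1 for loss, 0 for no change), the gene labels, and the toy values are all assumptions and are not taken from the TCGA data.

```python
from collections import Counter


def cnv_frequencies(calls):
    """Return (gain_freq, loss_freq) for one gene.

    `calls` holds one copy number call per sample, using a hypothetical
    encoding: +1 = gain, -1 = loss, 0 = no change.
    """
    n = len(calls)
    if n == 0:
        raise ValueError("no samples for this gene")
    counts = Counter(calls)
    return counts[1] / n, counts[-1] / n


def classify(calls_by_gene):
    """Label each gene by whichever of gain or loss is more frequent."""
    result = {}
    for gene, calls in calls_by_gene.items():
        gain, loss = cnv_frequencies(calls)
        if gain > loss:
            label = "gain"
        elif loss > gain:
            label = "loss"
        else:
            label = "balanced"
        result[gene] = (gain, loss, label)
    return result


# Toy calls for two placeholder genes (invented for illustration).
toy_calls = {
    "GENE_A": [1, 1, 0, -1, 1, 0, 0, 1],
    "GENE_B": [-1, -1, 0, 0, 1, -1, 0, 0],
}

summary = classify(toy_calls)
for gene, (gain, loss, label) in summary.items():
    print(f"{gene}: gain={gain:.3f} loss={loss:.3f} -> {label}")
```

Genes labelled "gain" here correspond to those whose gain frequency exceeds the loss frequency in a frequency plot such as Figure 1E.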
Consensus Clustering of the 42 ARGs in BC
Consensus clustering was performed on the 42 ARGs that were significant in the univariate Cox analysis (p < 0.05) using the "ConsensusClusterPlus" R package. When k = 2, the cohort was effectively divided into 2 subtypes (Figure 2A). OS analysis showed a significant difference in prognosis between the 2 subtypes (p < 0.001, Figure 2B). The clustering was checked using PCA, t-SNE, and UMAP; samples of subtype A and subtype B were separated in the plots, indicating that the two subtypes could be distinguished according to ARG expression (Figure 2C). According to the heat map of ARG expression and the corresponding clinicopathological characteristics of the two subtypes, high ARG expression in subtype A was associated with a better prognosis for BC patients, whereas high ARG expression in subtype B was associated with a worse prognosis. To characterize the differences between subtypes A and B beyond the overall distribution of the 42 ARGs, differential enrichment of GO and KEGG pathways between the two subtypes was assessed with GSVA (Figure 2D).
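The OS comparison between subtypes is based on survival curves. A minimal sketch of the Kaplan-Meier estimator is given below, assuming follow-up times and event indicators (1 = death observed, 0 = censored) are available per patient. The toy values are invented, and the significance test between the curves (the reported p value) is not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Toy follow-up data for two hypothetical subtypes (not from the cohort).
curve_a = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 0, 0])
curve_b = kaplan_meier([2, 3, 3, 7, 9], [1, 1, 1, 0, 1])

for name, curve in (("A", curve_a), ("B", curve_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

In this toy example the curve for subtype B drops faster than that for subtype A, which is the kind of separation a plot such as Figure 2B displays.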
Immune infiltration and differential gene expression in subtypes A and B
Box plots showed significant differences in the level of immune cell infiltration between the two subtypes (Figure 3A). CD56dim natural killer cells, mast cells, neutrophils, and eosinophils were upregulated in subtype A, whereas activated B cells, activated CD4 T cells, and activated CD8 T cells were upregulated in subtype B. Interestingly, the infiltration level of most immune cells was higher in subtype A than in subtype B. Differential expression between subtypes A and B was then analyzed. The heat map showed that genes such as BUB3, CCND1, SLC39A6, NAT1, and FOXA1 were upregulated in subtype A, whereas genes such as CD24, CDC25C, BUB1, PBK, CDK1, and MAD2L1 were upregulated in subtype B (Figure 3B); the box plots show the ARGs that were differentially expressed between subtypes A and B (Figure 3C). GO and KEGG enrichment analyses of these differential genes revealed a number of associations, including "INTRACILIARY TRANSPORT" in the Biological Process (BP) class, "CILIARY TRANSPORT" in the Cellular Component (CC) class, and "CHANNEL INHIBITOR ACTIVITY" in the Molecular Function (MF) class. The KEGG results showed that these genes were associated with "CELL_CYCLE" and "GRAFT_VERSUS_HOST_DISEASE" (Figure 3D).
ARGs determination and validation
LASSO regression and Cox analysis were performed on these 42 ARGs (Figure 4A, Figure 4B). By multivariate Cox analysis, 10 ARGs were identified as independent prognostic factors, of which CD24, PHLDA2, SLC2A1, YAP1, CDC25C, and EDA2R were high-risk genes and SLC39A6, LAMB3, BAK1, and PIK3R1 were low-risk genes (Figure 4C). According to the KM curves, patients in the high-risk group of the TCGA-BC cohort had a worse outcome (Figure 4D). The Sankey diagram shows the association among ARG clusters, risk, and survival status (Figure 4E). The risk difference analysis (Figure 4F) shows that the risk score differed between the two subtypes (p < 0.05) and was higher in subtype B.
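A risk score of this kind is commonly computed as a weighted sum of gene expression values, with the Cox coefficients as weights. The sketch below illustrates that idea only. The coefficients, the expression values, and the use of a median cutoff to define the high-risk and low-risk groups are all assumptions made for illustration; only the signs of the coefficients follow the high-risk and low-risk labels reported above.

```python
from statistics import median

# HYPOTHETICAL coefficients (the fitted values are not given in the text).
# Positive weight = high-risk gene, negative weight = low-risk gene.
HYPOTHETICAL_COEFS = {"CD24": 0.25, "YAP1": 0.5, "PIK3R1": -0.5}


def risk_score(expression, coefs):
    """Weighted sum of expression values over the genes in the signature."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def split_by_median(scores):
    """Assign 'high' to scores above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression values for four hypothetical patients.
toy_expression = {
    "P1": {"CD24": 4.0, "YAP1": 2.0, "PIK3R1": 1.0},
    "P2": {"CD24": 1.0, "YAP1": 1.0, "PIK3R1": 4.0},
    "P3": {"CD24": 2.0, "YAP1": 2.0, "PIK3R1": 2.0},
    "P4": {"CD24": 8.0, "YAP1": 4.0, "PIK3R1": 2.0},
}

scores = {pid: risk_score(expr, HYPOTHETICAL_COEFS)
          for pid, expr in toy_expression.items()}
groups = split_by_median(scores)

for pid in toy_expression:
    print(pid, scores[pid], groups[pid])
```

With a three-gene toy signature the score is easy to check by hand; the same function applies unchanged to a 10-gene signature once real coefficients are supplied.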
Immune infiltration in different risk groups
To display the proportions of different immune cells, the BC samples were first sorted by risk score from low to high (Figure 5A). The proportion of macrophages increased with the risk score (R = 0.15, Figure 5B). The 10 genes used to calculate the risk score were correlated with numerous immune cells; SLC39A6 and PIK3R1 in particular were closely associated with mast cell infiltration (Figure 5C). Infiltration of CD4 T cells and mast cells was higher in the low-risk group (Figure 5D). This suggests that reduced mast cell infiltration may contribute to the poor prognosis of BC. In addition, the stromal score and immune score were estimated from the expression profiles in the high-risk and low-risk groups (Figure 5E).
Establishment of a prognostic Nomogram for BC patients
Multivariate Cox analysis identified age, T1, T2, N1, N2, and risk score as independent prognostic predictors in BC in the TCGA population (p < 0.05, Figure 6A). Gender, age, T stage, N stage, and risk score were subsequently incorporated into the nomogram (Figure 6B). Calibration plots were created to assess the agreement between the OS predicted by the model and the actual OS. The results showed that the nomogram predictions were accurate (Figure 6C). Time-dependent ROC curves were used to assess the accuracy of the model in predicting OS in BC patients. The risk score performed well in predicting OS in the TCGA cohort (AUCs for 1-year, 3-year, and 5-year OS: 0.719, 0.719, and 0.709, respectively; Figure 6D). Decision curve analysis (DCA) showed that the risk score was a good predictor of survival in BC patients at one, three, and five years (Figure 6E).
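To make the AUC values above concrete, the sketch below computes a simplified AUC at a fixed time horizon. It is a deliberately naive version of a time-dependent ROC analysis: patients who died by the horizon are treated as cases, patients followed beyond the horizon as controls, and patients censored before the horizon are dropped. Proper time-dependent ROC methods handle censoring more carefully, so this is not the procedure used for Figure 6D. All values are invented.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Naive AUC of a risk score for death by `horizon`.

    Cases    : event observed at or before the horizon.
    Controls : still under follow-up after the horizon.
    Patients censored before the horizon are excluded (simplification).
    """
    cases, controls = [], []
    for score, t, event in zip(scores, times, events):
        if t <= horizon and event == 1:
            cases.append(score)
        elif t > horizon:
            controls.append(score)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # AUC = probability that a case scores higher than a control
    # (ties count as one half).
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy cohort: risk score, follow-up in years, event indicator.
toy_scores = [2.5, 1.0, 1.2, 1.8, 0.9, 0.1]
toy_times = [0.5, 2.0, 4.0, 0.8, 0.6, 6.0]
toy_events = [1, 1, 0, 1, 0, 0]

auc_1y = auc_at_horizon(toy_scores, toy_times, toy_events, horizon=1)
auc_3y = auc_at_horizon(toy_scores, toy_times, toy_events, horizon=3)
print(f"1-year AUC: {auc_1y:.3f}, 3-year AUC: {auc_3y:.3f}")
```

An AUC of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation of cases from controls.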
Correlation analysis of ARGs and TME
We downloaded the BC single-cell dataset EMTAB8107 from the TISCH database (http://tisch.comp-genomics.org/) and examined the expression of the 10 ARGs in the TME. The EMTAB8107 dataset contains 19 cell populations and 11 intermediate cell types, and the plots show their distribution and numbers (Figure 7A). CD24 and SLC39A6 were mainly expressed in malignant cells. PHLDA2 and YAP1 were mainly expressed in myofibroblasts. In contrast, PIK3R1 and BAK1 were expressed more uniformly across cell types (Figure 7B, Figure 7C).
RT-PCR validation of prognosis-related ARGs
To further confirm ARG expression, we measured the expression levels of these 10 genes in a normal cell line (MCF-10A) and a BC cell line (MDA-MB-231). RT-PCR showed that 8 ARGs (YAP1, PIK3R1, BAK1, PHLDA2, EDA2R, CD24, SLC2A1, and CDC25C) were strongly expressed in BC cells, while 2 ARGs (SLC39A6 and LAMB3) were only moderately expressed (Figure 8). These results suggest that these genes may be important biomarkers associated with prognosis in BC patients.