1. Identification of DEGs of MCF7 cells among 0.1% hypoxia or normoxia microenvironment using scRNA-seq data
Quality control of scRNA data of MCF7 cells subjected to 0.1% hypoxia or normoxia microenvironment was shown in Fig. 1A, plotting the range of single-cell RNA amounts, total gene numbers, and the proportion of mitochondria sequenced per cell. Scatterplot of mitochondrial gene proportions in relation to sequencing data and scatterplot of genes in relation to sequencing data was shown in Fig. 1B. 13186 genes and 1,500 highly variable genes found in the scRNA data between cells were list in Fig. 1C, and the top 10 variable genes were labeled. And these variable genes were divided into 20 different components by PCA, of which statistically significant components were further used for tSNE dimensionality reduction (Fig. 2A-D). After tSNE dimensionality reduction, MCF7 cells were divided into 4 clusters. According to the experimental design, cluster 0 and cluster 3 from 0.1% hypoxic microenvironment, cluster 1 and cluster 2 from normoxic microenvironment (Fig. 3A-B). Subsequently, 329 DEGs between hypoxia and normoxia clusters were identified, the violin plot shows the top 10 up-regulated and top 10 down-regulated differentially expressed genes between these two clusters. (Fig. 3C).
2. Functional analysis of hypoxia related DEGs
GO function and KEGG pathway analysis was performed on 329-hypoxia associated DEGs. In the biological processes, the hypoxia related DEGs were mainly enriched in response to oxygen levels, response to hypoxia, response to decreased oxygen levels et al. In the cellular component, chromosomal region, chromosome, telomeric region; nuclear chromosome, telomeric region et al. In the molecular functions, the hypoxia related DEGs were mainly enriched in ATPase activity, DNA-binding transcription factor binding, nuclear receptor binding et al. (Fig. 4A). In the KEGG pathways, the hypoxia related DEGs were mainly enriched in Pathways of neurodegeneration-multiple diseases, Parkinson disease, Alzheimer disease et al. (Fig. 4B). We used GSVA enrichment analysis to investigate the differences in biological behavior between hypoxic and normoxic clusters. As shown in Fig. 4C, in the hypoxic state, such as hypoxia, glycolytic pathways are activated, while oxidative phosphorylation and DNA repair pathways are inhibited, and this analysis is also consistent with experimental studies.
3. Construction and validation of hypoxia related signature in breast cancer patients
Total 1099 BRCA tumor samples downloaded from TCGA database were included in the analysis by removing the tissue information of 113 normal samples. A matrix of 329 depleted hypoxia-related DEGs was extracted for subsequent analysis. The results of univariate analysis indicated that 22 prognostic genes were statistically correlated with OS in BRCA patients, of which, 16 genes (BAMBI, BNIP3, CBX5, GRB10, HSPB8, KRT80, LSS, NDRG1, P4HA2, PGK1, SLC7A5, SRD5A3, ST3GAL1, TUBA1B, TXNRD1, XPOT) were considered as dangerous genes, 6 genes (CYBA, DUS1L, ERRFI1, FAU, PSME2, STC2) were thought as protective genes (Fig. 5A). LASSO regression analysis was done for the selection of the ideal genome, and 14 candidate prognostic genes were screened, including BAMBI, CBX5, CYBA, DUS1L, ERRFI1, GRB10, HSPB8, KRT80, LSS, P4HA2, PGK1, PSME2, SRD5A3, STC2(Fig. 5B-C). Finally, according the result of multivariate Cox regression analysis, 4 genes (ERRFI1, HSPB8, PGK1, STC2) was thought as independent prognostic factors in BRCA patients, and prognostic models were constructed (Fig. 5D). The risk score was obtained below: Risk score = (-0.3240× ERRFI1) + (0.1613× HSPB8) + (0.5816× PGK1) + (-0.1245 × STC2) (Fig. 4). Risk scores were used to classify TCGA-BRCA patients into high-risk and low-risk groups. Kaplan-Meier survival analysis showed a negative correlation between risk score and patient survival time (Fig. 6A). As the risk score increased, there were significantly more deaths in BRCA patients (Fig. 7A-C). And the area under the curves (AUCs) in ROC cures was 0.747, demonstrates positive predictive accuracy of this prognostic model (Fig. 6B). We also validated the predictive power of this prognostic feature with the GSE20685 dataset containing 327 breast cancer samples. As well, the number of people in the high-risk group who were in the death state was significantly higher (Fig. 7D-F). The accuracy of this model was confirmed in validation set with an AUC of 0.674 in the ROC curve (Fig. 6C-D).
4. Construction of Nomogram based on risk score and clinical characteristics
Samples were excluded for lack of any clinical features, at last, total of 705 cases of clinical information data were included. Clinical characteristic of 705 BRCA tumor samples in the TCGA database were presented by Table 1. The clinicopathological factors such as age, histological type (Infiltrating Ductal Carcinoma, Infiltrating Lobular Carcinoma and other), T stage (T0, T1, T2 and T3), N stage (N0, N1, N2, and N3), M stage (M0 and M1) were included. The results of univariate Cox regression analysis indicated that age, T stage, N stage, and M stage were correlated with OS in BRCA patients (Fig. 8A). After multivariate Cox regression analysis, it was determined that risk score, age, N stage, and M stage in clinical characteristics, were significantly associated with OS in BRCA patients (Fig. 8B). Depending on the results of the multifactor Cox regression analysis, age, T stage, N stage, M stage and risk score were integrated into the OS competing nomograms of BRCA patients (Fig. 9A). The calibration plot for the probability of OS at 3-, 5-, and 10-years showed promising line of agreement between the prediction using nomograms and actual observed survival (Fig. 9B-D).
Table 1
Clinical characteristic of breast invasive carcinoma (BRCA) patients in the TCGA database.
Characteristics
|
|
Total
|
%
|
Age at diagnosis(y)
|
≥ 65
|
186
|
26.38
|
|
< 65
|
519
|
73.62
|
Histological type
|
IDC
|
530
|
75.18
|
|
ILC
|
112
|
15.89
|
|
Other
|
63
|
8.94
|
T stage
|
T1
|
184
|
26.10
|
|
T2
|
432
|
61.28
|
|
T3
|
74
|
10.50
|
|
T4
|
15
|
2.13
|
N stage
|
N0
|
360
|
51.06
|
|
N1
|
224
|
31.77
|
|
N2
|
83
|
11.77
|
|
N3
|
38
|
5.39
|
M stage
|
M1
|
695
|
98.58
|
|
M0
|
10
|
1.42
|
IDL, Infiltrating Ductal Carcinoma; ILC, Infiltrating Lobular Carcinoma. |
5.The correlation of risk scores and immune cell infiltration in TCGA-BRCA database
Immune cell infiltration was computed using the CIBERSORT algorithm for 1099 BRCA tumor samples obtained from the TCGA database. As the risk score increased, B cell memory, B cell naive, Monocyte, T cell follicular helper, Macrophage M1, T cell CD8+, Myeloid dendritic cell resting was negatively activated, whereas NK cell resting, Macrophage M0, Macrophage M2 were positively activated (Fig. 10).