3.1 DE-FRGs and DE-FRGs were associated with prognosis in LUAD
A total of 48 FRGs associated promisingly with prognosis (Table S1 and Fig. S1) and 272 ERGs associated promisingly with prognosis (Table S2 and Fig. S2) were screened in LUAD samples of TCGA-LUAD. Afterwards, 40 prognostic-related DE-FRGs (Fig. 1a) and 222 prognostic-related DE-ERGs (Fig. 1b) were significantly different between LUAD and Control cohorts. And there were 193 relationship pairs from 28 DE-FRGs and 74 DE-ERGs, like TFAM-EIF2S1 (r = 0.52) (Fig. 1c). The PCA results showed that the explanatory degree of PC1 for ferroptosis level was 21.01% and that of PC2 was 9.16%, and the explanatory degree of PC1 for PCA of EMT level was 17.03% and that of PC2 was 8.43% (Fig. 1d). Then, we scored the ferroptosis level and EMT level combined with gene expression according to the explanatory degrees of PC1 and PC2, and obtained the correlation between the scores of ferroptosis and EMT levels (R = 0.92, P < 0.05) (Fig. 1e). The K-M survival curves showed that the group with the worst survival status was the FPI high + EPI high group among the 4 subgroups (Fig. 1f). And the LUAD patients in the other 3 groups were combined into the “Others” group, and the survival status between the FPI high + EPI high group and the “Others” group were different notably (Fig. 1g).
3.2 There were 7 candidate genes by MR analysis
MR analysis of the 40 prognostically relevant DE-FRGs as exposure factors and LUAD as outcome yielded 3 genes with significant causality (P < 0.05 in IVW): TFAM, SLC7A11, ALOX15 (Table 1). Next, MR analysis of 222 prognostically relevant DE-ERGs as exposure factors LUAD as an outcome yielded 4 genes that were significantly causally associated with LUAD (P < 0.05 in IVW), namely: GATA1, ALAD, PDCD4, NDRG2 (Table 2). The above 7 genes were used as candidate genes for further analysis.
Table 1 Information for three differentially expressed ferroptosis-related genes (DE-FRGs) causally associated with LUAD
Exposure
|
Outcome
|
Method
|
nsnp
|
b
|
se
|
P-value
|
Hor
|
eqtl-a-ENSG00000108064
(TFAM)
|
LUAD
ieu-a-965
|
IVW
|
3
|
0.050
|
0.003
|
1.51E-50
|
0.986
|
eqtl-a-ENSG00000151012
(SLC7A11)
|
4
|
0.460
|
0.125
|
0
|
0.271
|
eqtl-a-ENSG00000161905
(ALOX15)
|
3
|
-0.090
|
0.019
|
1.97E-06
|
0.700
|
Abbreviations: Hor: Horizontal Pleiotropy; IVW: Inverse variance weighted.
Table 2 Information for four differentially expressed epithelial-mesenchymal transition (EMT)-related genes (DE-ERGs) causally associated with LUAD
Exposure
|
Outcome
|
Method
|
nsnp
|
b
|
se
|
P-value
|
Hor
|
eqtl-a-ENSG00000102145
(GATA1)
|
LUAD
ieu-a-965
|
IVW
|
4
|
0.116
|
0.053
|
0.028
|
0.960
|
eqtl-a-ENSG00000148218
(ALAD)
|
4
|
0.144
|
0.070
|
0.041
|
0.911
|
eqtl-a-ENSG00000150593
(PDCD4)
|
4
|
0.116
|
0.046
|
0.012
|
0.885
|
eqtl-a-ENSG00000165795
(NDRG2)
|
6
|
-0.086
|
0.031
|
0.006
|
0.921
|
Abbreviations: Hor: Horizontal Pleiotropy; IVW: Inverse variance weighted.
3.3 A risk model was constructed and validated
Totally 6 model genes (TFAM, ALOX15, GATA1, ALAD, PDCD4 and NDRG2) were verified and selected through PH test (P > 0.05) and LASSO depended on 7 candidate genes (Fig. 2a). Subsequently, a risk model was structured and risk score was computed (Fig. 2b) on the basis of 6 model genes expression in TCGA-LUAD. The AUC values of ROC were all above 0.6 at 1, 3, and 5 years, suggesting that these 6 model genes were valuable predictors of survival status (Fig. 2c). Meanwhile, the K-M curves showed a significant difference between high and low risk teams of LUAD patient survival rates (P < 0.0001), with patients in high risk team having a worse survival status (Fig. 2d). Additionally, the risk curve was charted based on risk score and expression of model genes in two risk teams were different (Fig. 2e-f).
The risk model was validated by GSE11969. The AUC values in GSE11969 were also exceeded 0.6 at 1, 3 and 5 years, revealing that risk model predicted fairly well (Fig. S3a). K-M curves showed that LUAD patients had a higher survival probability in low risk team, consistent with training set (Fig. S3b). With both risk teams, a heatmap of the expression of the model's genes as well as the distribution of risk score and survival time were plotted (Fig. S3c-d).
3.4 Stage, N, T, and risk score were identified as independent prognostic factors
Stage, N, T, and risk score were screened out as independent prognostic factors via univariate and multivariate cox regression (Fig. 3a-b). Independent prognostic factors were scored separately and survival at 1, 3, and 5 years was predicted based on the total points, and the results were displayed in the nomogram (Fig. 3c). The calibration, ROC (AUC > 0.6) and DCA curves suggested that nomogram had favored prediction accuracy (Fig. 3d-f).
3.5 GSVA, GO function and KEGG pathway enrichment analysis
Analysis of GSVA enrichment showed that the high and low risk teams were highly enriched in a total of 138 pathways (Fig. S4a). Based on the t-value of GSVA, the top 10 up-regulated (e.g. ‘pyrimidine metabolism’, ‘cell cycle’, ‘basal transcription factors’) and down-regulated (e.g. ‘GnRH signaling pathway’, ‘calcium signaling pathway’, ‘aldosterone regulated sodium reabsorption’) pathways in terms of absolute t-value from largest to smallest were shown (Fig. S4b). According to the gene expression levels between both risk teams of LUAD samples in TCGA-LUAD dataset, a total of 557 DEGs were screened out, of which 194 were up-regulated and 363 were down-regulated. DEGs was mainly enriched in GO entries such as ‘microtubule−based movement’, ‘mitotic nuclear division’ and ‘nuclear division’ in BP, ‘microtubule’, ‘axoneme’ and ‘ciliary plasm’ in CC, and ‘microtubule motor activity’, ‘cytoskeletal motor activity’, and ‘tubulin binding’ in MF (Fig. S4c). KEGG results indicated that 5 pathways were enriched in ‘motor proteins’, ‘cell cycle’, ‘hematopoietic cell lineage’, ‘arachidonic acid metabolism’ and ‘progesterone-mediated oocyte maturation’ (Fig. S4d).
3.6 Six model genes were significantly causally associated with LUAD
The GWAS IDs of model genes and LUAD were listed in Supplementary Table 3. Following screening, 3 IVs for TFAM, 3 IVs for ALOX15, 4 IVs for GATA1, 4 IVs for ALAD, 4 IVs for PDCD4, 6 IVs for NDRG2 were obtained to MR analysis. The IVW method revealed a causal relationship between TFAM (P < 0.05, OR = 1.05), ALOX15 (P < 0.05, OR = 0.91), GATA1 (P = 0.03, OR = 1.12), ALAD (P = 0.04, OR = 1.15), PDCD4 (P = 0.01, OR = 1.12), and NDRG2 (P = 0.01, OR = 0.92) with LUAD (Table 3). Scatter plots for TFAM, GATA1, ALAD and PDCD4 had positive slopes, suggesting that they were risk factors, while ALOX15 and NDRG2 had negative slopes, indicating that they were protection factors for LUAD (Fig. S5). The results of forest plots were consistent with the previous results (Fig. S6). The symmetrical distribution of the samples along both sides of the IVW line in the funnel plot revealed that MR analysis results conformed to the second law of MR grouping (Fig. S7).
Table 3 Mendelian randomization (MR) analysis for causal relationship of six model genes and LUAD
Exposure
|
Outcome
|
Gene name
|
Method
|
nsnp
|
OR
|
95%CI
|
P-value
|
eqtl-a-ENSG00000108064
|
LUAD
|
TFAM
|
MR Egger
|
3
|
1.05
|
0.88-1.26
|
0.68
|
Weighted median
|
3
|
1.05
|
0.95-1.16
|
0.33
|
Inverse variance weighted
|
3
|
1.05
|
1.04-1.06
|
0.00
|
Simple mode
|
3
|
1.06
|
0.87-1.28
|
0.64
|
Weighted mode
|
3
|
1.05
|
0.95-1.16
|
0.43
|
eqtl-a-ENSG00000161905
|
LUAD
|
ALOX15
|
MR Egger
|
3
|
0.94
|
0.81-1.08
|
0.52
|
Weighted median
|
3
|
0.91
|
0.82-1.01
|
0.09
|
Inverse variance weighted
|
3
|
0.91
|
0.88-0.95
|
0.00
|
Simple mode
|
3
|
0.80
|
0.62-1.03
|
0.22
|
Weighted mode
|
3
|
0.92
|
0.83-1.02
|
0.25
|
eqtl-a-ENSG00000102145
|
LUAD
|
GATA1
|
MR Egger
|
4
|
1.09
|
0.35-3.41
|
0.90
|
Weighted median
|
4
|
1.10
|
0.79-1.53
|
0.59
|
Inverse variance weighted
|
4
|
1.12
|
1.01-1.24
|
0.03
|
Simple mode
|
4
|
1.09
|
0.71-1.66
|
0.73
|
Weighted mode
|
4
|
1.09
|
0.73-1.62
|
0.72
|
eqtl-a-ENSG00000148218
|
LUAD
|
ALAD
|
MR Egger
|
4
|
1.17
|
0.83-1.65
|
0.46
|
Weighted median
|
4
|
1.16
|
0.91-1.47
|
0.24
|
Inverse variance weighted
|
4
|
1.15
|
1.01-1.33
|
0.04
|
Simple mode
|
4
|
0.99
|
0.67-1.46
|
0.95
|
Weighted mode
|
4
|
1.18
|
0.90-1.53
|
0.31
|
eqtl-a-ENSG00000150593
|
LUAD
|
PDCD4
|
MR Egger
|
4
|
1.09
|
0.75-1.58
|
0.69
|
Weighted median
|
4
|
1.12
|
0.93-1.35
|
0.21
|
Inverse variance weighted
|
4
|
1.12
|
1.03-1.23
|
0.01
|
Simple mode
|
4
|
1.21
|
0.92-1.58
|
0.26
|
Weighted mode
|
4
|
1.13
|
0.93-1.37
|
0.31
|
eqtl-a-ENSG00000165795
|
LUAD
|
NDRG2
|
MR Egger
|
6
|
0.91
|
0.70-1.17
|
0.50
|
Weighted median
|
6
|
0.91
|
0.75-1.10
|
0.33
|
Inverse variance weighted
|
6
|
0.92
|
0.86-0.98
|
0.01
|
Simple mode
|
6
|
0.89
|
0.65-1.20
|
0.47
|
Weighted mode
|
6
|
0.90
|
0.74-1.11
|
0.37
|
Abbreviations: OR: Odds ratio; 95%CI: Confidence interval.
In this study, there was no heterogeneity for TFAM (Q_pval = 1.00), ALOX15 (Q_pval = 0.88), GATA1 (Q_pval = 0.94), ALAD (Q_pval = 0.79), PDCD4 (Q_pval = 0.85), and NDRG2 (Q_pval = 0.99) as exposure factors, respectively (Table 4). Horizontal pleiotropy showed that no horizontal pleiotropy between 6 exposure factors and LUAD (P > 0.05, Table 5). Furthermore, there were no points of serious bias in the results of LOO analysis, indicating that the results were reliable (Fig. S8). So 6 exposure factors were causally associated with the occurrence of LUAD.
Table 4 The heterogeneity for causal relationship of six model genes and LUAD
Exposure
|
Outcome
|
Gene name
|
Method
|
Q
|
Q- Pvalue
|
eqtl-a-ENSG00000108064
|
LUAD
|
TFAM
|
MR Egger
|
0.01
|
0.93
|
Inverse variance weighted
|
0.01
|
1.00
|
eqtl-a-ENSG00000161905
|
LUAD
|
ALOX15
|
MR Egger
|
0.00
|
0.99
|
Inverse variance weighted
|
0.26
|
0.88
|
eqtl-a-ENSG00000102145
|
LUAD
|
GATA1
|
MR Egger
|
0.41
|
0.81
|
Inverse variance weighted
|
0.41
|
0.94
|
eqtl-a-ENSG00000148218
|
LUAD
|
ALAD
|
MR Egger
|
1.05
|
0.59
|
Inverse variance weighted
|
1.07
|
0.79
|
eqtl-a-ENSG00000150593
|
LUAD
|
PDCD4
|
MR Egger
|
0.78
|
0.68
|
Inverse variance weighted
|
0.81
|
0.85
|
eqtl-a-ENSG00000165795
|
LUAD
|
NDRG2
|
MR Egger
|
0.64
|
0.96
|
Inverse variance weighted
|
0.65
|
0.99
|
Table 5 Horizontal pleiotropy for causal relationship of six model genes and LUAD
Exposure
|
Outcome
|
Gene name
|
egger_intercept
|
P-value
|
eqtl-a-ENSG00000108064
|
LUAD
|
TFAM
|
0.00
|
0.99
|
eqtl-a-ENSG00000161905
|
LUAD
|
ALOX15
|
-0.01
|
0.70
|
eqtl-a-ENSG00000102145
|
LUAD
|
GATA1
|
0.00
|
0.96
|
eqtl-a-ENSG00000148218
|
LUAD
|
ALAD
|
0.00
|
0.91
|
eqtl-a-ENSG00000150593
|
LUAD
|
PDCD4
|
0.01
|
0.88
|
eqtl-a-ENSG00000165795
|
LUAD
|
NDRG2
|
0.00
|
0.92
|
3.7 Model genes were mainly enriched in NK cells, T cells, Endothelial cell clusters, etc.
After quality control (QC) of single-cell data GSE131907, a total of 71,451 cells and 29,634 genes from 21 samples were screened (Fig. 4a). Subsequently, 2000 highly variable genes (Fig. 4b) and 20 PCs (dims = 20) were utilized for further analyses (Fig. 4c-d). And the Sankey diagrams of cell flow at different resolutions and t-SNE graph showed that there were 11 clusters in LUAD and Control samples (Fig. 4e-f). According to marker genes in 11 cell clusters, 8 cell types were identified, namely MAST cell, B cell, Endothelial cell, Fibroblast, NK cell, T cell, Epithelial cell and Macrophages (Fig. 4g-h).
Expression differences of the six model genes were analyzed between the Control and LUAD cohorts. Notably, there was a significant variation observed in T cells in all the model genes (Fig. 5). Furthermore, the model genes’ expression distribution indicated that GATA1 was predominantly highly expressed in MAST cells, PDCD4 was mainly highly expressed in NK cells and T cells, ALAD was primarily highly expressed in Epithelial cells, and TFAM, ALOX15, and NDRG2 were expressed in all 8 cell types (Fig. 6). In addition, model genes were predicted to be enriched in cell clusters such as NK cells, T cells, and endothelial cells according to AUCell scores (Fig. S9).