3.1 Single-cell analysis of differentially expressed genes in tumor cells before and after neoadjuvant therapy
Firstly, we selected tumor cells before and after neoadjuvant chemotherapy in patients with lung adenocarcinoma (LUAD) and esophageal squamous carcinoma (ESCC) treated with cisplatin-containing neoadjuvant chemotherapy, respectively, and performed differential analysis. These differential genes were intersected to obtain 485 genes expressed differentially in tumor cells before and after neoadjuvant treatment for LUAD and ESCC. The differential analysis of GO functional enrichment of these 485 genes revealed that they mainly focused on cell adhesion molecule binding, enzyme inhibitor activity, cadherin binding, and KEGG signaling pathway enrichment, which mainly focused on metabolites, degenerative changes, carcinogenesis, etc. KEGG signaling pathway enrichment revealed that they mainly focus on metabolites, degenerative changes, carcinogenesis, etc.
3.2 Combined analysis of neoadjuvant chemotherapy differential genes and cisplatin resistance-associated genes
In our previous study, we performed a multi-omics analysis of cisplatin-resistant and sensitive cell lines, in which we obtained a relevant differential mRNA set. We intersected this mRNA set and the 485 differential genes obtained previously and reached the result of 64 critical genes associated with neoadjuvant chemotherapy efficacy. These 64 genes were eliminated one by one. Finally, 12 genes with consistent expression trends in both LUAD, ESCC residual tumor cells after neoadjuvant chemotherapy and cisplatin-resistant cell lines were found (Fig. 2) (Supplement Table 1).
3.3 Validation of Neoadjuvant Chemotherapy Score (NCS) in LUAD, ESCC single-cell sequencing samples before and after NACT as well as CCLE database
We reintroduced the 12 genes, CAV2, PHLDA1, DUSP23, VDAC3, DSG2, SPINT2, SPATS2L, IGFBP3, CD9, ALCAM, PRSS23, PERP, constructed as Neoadjuvant Chemotherapy Score into the single cell sequencing data of LUAD and ESCC patients before and after neoadjuvant therapy as well as the CCLE database. It was not difficult to find that the NCS of tumor cells before neoadjuvant therapy was significantly higher than that of tumor cells after neoadjuvant therapy in LUAD and ESCC (Fig. 3A-F). The NCS in the low IC50 group, which is the cisplatin-sensitive group, was higher than that in the high IC50 group in the CCLE database (Fig. 3G).
3.4 Application of 12 genes and NCS in LUAD and ESCC single-cell sequencing samples
Analyses were then performed on single-cell sequencing samples obtained from LUAD patients. Tumor cells were selected to detect the expression of the above 12 genes, and it was not difficult to find that CAV2, PHLDA1, and VDAC3 were generally highly expressed in tumor cells. (Supplement Fig. 1). NCS was then constructed using the AddModuleScore function in R to show the overall expression of the 12 genes in different cell subgroups in the overall LUAD, ESCC and the corresponding paraneoplastic tissue samples (Fig. 4A, 4E). The samples were also classified by tumor and normal tissue. It was not difficult to find that the Neoadjuvant Chemotherapy Score, compared to the expression of the 12 genes, was significantly higher in tumor tissue than in normal tissue (Fig. 4B, 4D, 4F, 4H). Further, in tumor cells, we applied the Neoadjuvant Chemotherapy Score. The tumor cells were divided into 2 groups according to the mean values to further investigate their differences (Fig. 4C, 4G).
3.5 Single-cell analysis of differentially expressed genes in tumor cells grouped by NCS
Then we applied the NCS on single-cell sequencing samples obtained from LUAD and ESCC patients and divided tumor cells into two groups. In LUAD, we then performed differential analysis between these two groups and obtained a total of 796 differentially expressed genes (p < 0.001), of which 119 had significant fold change (|logFC| > 0.5). 77 of these 119 genes were significantly up-regulated in the low NCS group, including VDAC3, CAV3, PHLDA1, CD9, etc., and 42 were significantly downregulated, including LRRK2, SPINK13, TANC2, etc. (Fig. 5A). The GO functional enrichment analysis of these 119 significantly differentially expressed genes revealed that they were mainly concentrated in cadherin binding, actin binding, peptidase regulator activity, endopeptidase inhibitor activity, etc. (Fig. 5D), and the KEGG signaling pathway enrichment revealed that they were mainly concentrated in fluid shear stress and atherosclerosis, proteoglycans in cancer, transcriptional misregulation in cancer, etc. (Fig. 5F). We then performed further differential analysis by GSVA at the overall level of gene function and signaling pathways. We found that seleno amino acid metabolism, nonsense mediated decay nmd, ribosome, response of EIF2AK4 GCN2 to amino acid deficiency were significantly up-regulated in the high NCS group. In contrast, ATF2 targets, response to oxidized phospholipids, and response to methotrexate were significantly downregulated (Fig. 5C).
Similarly, in ESCC, we obtained 1676 differential genes, 65 significantly up-regulated by KRT17, NFKBIA, IER3 and 14 significantly down-regulated by KRT13, KRT15, SPRR3, etc. (Fig. 5B). Different GO relied on receptor-ligand activity, extracellular matrix, etc. (Fig. 5E). The KEGG signaling pathways were enriched in Cytokine-cytokine receptor interaction, Viral protein interaction with cytokine and cytokine receptor, IL-17 signaling pathway, etc. (Fig. 5G). GSVA found that Tumor microenvironment, Epithelial cells, as well as Barrett's esophagus and esophagus cancer were significantly up-regulated in the high NCS group, while TP73 and TP63 targets, Response to metal ions, Response to THC were significantly downregulated (Fig. 5H)
And then, we performed LASSO regression analysis on the expression data of the intersection of two sets of differential genes and obtained the coefficients and binomial deviation curves in LASSO regression with log(λ) (Fig. 5I, 5J). The predictive effects of the models of the obtained characteristic variables were different for different λ taking values. The 22 characteristic variables, CD24, D9, DUSP23, KRT19, LRRK2, MAL2, PERP, SPINT2, TACSTD2, TSPAN13, C8orf4, FMO5, CAPN8, MDK, SLC20A1, SPINK13, TANC2, CLDN3, RGMB, S100P, MUC4, and C19orf33 that minimized the mean squared deviation achieved by the whole model were selected to construct the prediction model. We then formed a generalized linear equation by glm function for these 22 genes. 17 indicators were obtained to be enrolled and a model: score = (-0.17) *CD24 + (-0.91) *CD9 +(-0.76) * DUSP23 + (-1.43) *KRT19 + 0.11*LRRK2 + (-0.58) *MAL2 + (-0.87) *PERP + (-1.02) *SPINT2 + (-0.24) *PHLDA1 + 0.15 *C8orf4 + (-0.09) *CAV2 + 0.19 *SLC20A1 + 0.24 *SPINK13 + 0.1*TANC2 + 0.37*CLDN3 + 0.08 *RGMB + (-0.07) *VDAC3, was constructed to predict the high and low NCS (Fig. 5K).
3.6 Survival analysis of 12 genes among LUAD patients with chemotherapy records
Based on the TCGA and GEO datasets, we first performed a K-M survival analysis for each gene in LUAD patients with chemotherapy records. As described in Supplement Fig. 1, these genes, CAV2, PHLDA1, ALCAM, CD9, IGBP3, and VDAC3, were significantly associated with prognosis (p < 0.05). Except for ALCAM, the prognosis of patients with high expression of the other five genes was worse. The most significant association with prognosis was observed for CAV2, PHLDA1 and VDAC3.
We further wanted to validate and explore the probable mechanism in all LUAD patients. We then used these 12 genes for clustering analysis in LUAD patients in the TCGA database. We found that at k = 2 these genes could clearly classify patients into two subtypes (Fig. 6A-D). There was a significant survival difference between these two subtypes (Fig. 6E), so we could tentatively conclude that in LUAD, by the expression of these 12 genes, we could get a platinum-containing adjuvant chemotherapy insensitive subtype, suggesting that patients in this subtype may not benefit from neoadjuvant or adjuvant chemotherapy.
Furthermore, cox and nomogram analyses were performed based on these 12 genes. Univariate analysis (Fig. 6F) revealed that ALCAM, CAV2, VDAC3, PHLDA1, DUSP23, and SPATS2L were significant predictors of the prognosis of LUAD patients, Multivariate Cox proportional hazard analysis (Fig. 6G) demonstrated ALCAM, CAV2, CD9, VDAC3, PHLDA1 were independent prognostic factors for survival in LUAD patients. A nomogram relating to 5 independent risk factors (ALCAM, CAV2, CD9, VDAC3, PHLDA1), which were concluded from MVA. The Points could calculate 1-year, 3-year and 5-year overall survival (OS) at the model's top (Fig. 6H). The internal evaluation was performed (Fig. 6I) with the same TCGA database, the C-index of which was 0.652 (0.603-1). In general, LUAD patients with a lower expression of CAV2, VDAC3, and PHLDA1 had longer predicted survival time.
3.7 Expression of PHLDA1, CAV2, VDAC3 is associated with the sensitivity of cisplatin in A549, PC9 and TE1 Cells.
Next, we validated whether the three genes most relevant to prognosis were related to cisplatin resistance at the level of in vitro experiments. We selected two LUAD cell lines, A549 and PC9, and one ESCC cell line TE1 for the experiments. First, PHLDA1, CAV2, VDAC3 and control (NC) siRNAs were transfected in the 3 cell lines to knock down the expression of these three genes (Fig. 7A-C), respectively. We then performed CCK8 and EdU assays on knockdown PHLDA1, CAV2 and VDAC3 (selected siRNA sequences with the highest knockdown efficiency and compared with knockdown NC cells) to detect changes in their sensitivity to cisplatin, respectively. As shown in Fig. 7D and 7E, the sensitivity of A549, PC9 and TE1 cells to cisplatin increased after the knockdown of these three genes.