3.1. LND is a critical factor in determining the prognosis of ESCC
In total 8716 eligible esophageal cancer patients were analyzed. The value of consecutive lymph nodes was determined and stratified using R code. As list in Table 1, the average ages at the patients alive or dead were 62.21 ± 9.46 and 64.15 ± 10.02, respectively. Most of the patients in the present cohort were married, white, male, received surgery, radiation therapy and chemotherapy. In the LND (Stratified) < 0.12 (69.0%) subgroup, nearly half of patients were alive (2701, 45.2%), whereas in the LND (Stratified) ≥ 0.12 (31.0%) subgroup, majority of patients were dead (2279, 83.2%). The selection strategy for eligible patients was shown in Fig. 1A.
Table 1
Variables | Alive | Dead | p-value |
Total | 3162(36.3%) | 5554(63.7%) | |
Age | 62.21 ± 9.46 | 64.15 ± 10.02 | |
Sex | | | |
Male | 2597(35.9%) | 4631(64.1%) | 0.136 |
Female | 565(38.0%) | 923(62%) | |
Marital status | | | |
Married | 2150(37.4%) | 3606(62.6%) | < 0.001 |
Unmarried | 455(37.6%) | 756(62.4%) | |
SWD | 444(30.3%) | 1020(69.7%) | |
Unknown | 113(39.6%) | 172(60.4%) | |
Race | | | |
White | 2854(36.6%) | 4951(63.4%) | < 0.001 |
Black | 134(28.4%) | 338(71.6%) | |
Other | 160(37.9%) | 262(62.1%) | |
Unknown | 14(82.4%) | 3(17.6%) | |
First malignant primary indicator | | | |
Yes | 2718(37.0%) | 4622(63.0%) | 0.001 |
No | 444(32.3%) | 93267.7%) | |
LND | | | |
< 0.12 | 2701(45.2%) | 3275(54.8%) | < 0.001 |
≥ 0.12 | 461(16.8%) | 2279(83.2%) | |
Surgery | | | |
Yes | 3084(37.9%) | 5045(62.1%) | < 0.001 |
No/Unknown | 78(13.3%) | 509(86.7%) | |
Radiation | | | |
Yes | 1809(34.8%) | 3388(65.2%) | 0.001 |
No/Unknown | 1353(38.4%) | 2166(61.6%) | |
Chemotherapy | | | |
Yes | 1990(34.9%) | 3719(65.1%) | < 0.001 |
No/Unknown | 1172(39.0%) | 1835(61.0%) | |
AJCC | | | |
I | 1203(56.8%) | 914(43.2%) | < 0.001 |
II | 1166(37.6%) | 1935(62.4%) | |
III | 653(23.9%) | 2078(76.1%) | |
IV | 140(18.3%) | 627(81.7%) | |
N | | | |
N0 | 2204(47,7%) | 2415(52.3%) | < 0.001 |
N1 | 753(20.4) | 2940(79.6%) | |
N2 | 161(55.9%) | 127(44.1%) | |
N3 | 44(37.9%) | 72(62.1%) | |
According to the univariate cox analysis, we found that age, marital status, race, sex, surgery, radiation, chemotherapy, LND consequent and AJCC stage had significant impact on the survival of esophageal cancer patients. We further included all of these variables in the multivariate cox regression. We found the age of diagnosis was a significant variable in mutivariate analysis, age at diagnosis (hazard ratio [HR] = 1.020, 95% confidence interval [CI] = 1.017–1.023, p < 0.001). Thus, age might be an important factor affecting the esophageal cancer patients' survival. Indeed, marital status (unmarried vs married: HR = 1.141, 95%CI = 1.053–1.238, p = 0.001; separated, widowed or divorced [SDW] vs married: HR = 1.161, 95%CI = 1.081–1.246, p < 0.001), race (black vs white, HR = 1.278, 95%CI = 1.142–1.430, p < 0.001), first malignant primary indicator (no vs yes: HR = 1.150, 95%CI = 1.069–1.237, p < 0.001), surgery (no/unknown vs yes: HR = 1.696, 95%CI = 1.531–1.879, p < 0.001), chemotherapy (no/unknown vs yes: HR = 1.345, 95%CI = 1.257–1.439, p < 0.001), AJCC stage (II vs I: HR = 2.164, 95%CI = 1.983–2.362, p < 0.001; III vs I: HR = 3.366, 95%CI = 3.039–3.727, p < 0.001; IV vs I: HR = 3.779, 95%CI = 3.332–4.286, p < 0.001) were significantly correlated with the risk of esophageal cancer. R software was used to determine the following optimal cutoffs for the LND stratified: <0.12, ≥ 0.12 (Fig. 1B). Importantly, LND stratified was also related to an increased risk of esophageal cancer (LND ≥ 0.12 vs LND < 0.12: HR = 1.629, 95%CI = 1.521–1.745, p < 0.001)(Table 2).
Table 2
Univariate and multivariate cox regression
Variables | Univariate analysis | Multivariate analysis |
HR | 95%CI | p-value | HR | 95%CI | p-value |
Age at diagnosis | 1.017 | 1.014–1.020 | < 0.001 | 1.020 | 1.017–1.023 | < 0.001 |
Marital status |
Married | Reference | Reference |
Unmarried | 1.061 | 0.981–1.148 | 0.137 | 1.141 | 1.053–1.238 | 0.001 |
SWD | 1.222 | 1.140–1.310 | < 0.001 | 1.161 | 1.081–1.246 | < 0.001 |
Unknown | 1.033 | 0.886–1.203 | 0.680 | 1.141 | 0.979–1.330 | 0.092 |
Race |
White | Reference | Reference |
Black | 1.292 | 1.157–1.442 | < 0.001 | 1.278 | 1.142–1.430 | < 0.001 |
Other | 1.004 | 0.887–1.137 | 0.948 | 0.948 | 0.836–1.074 | 0.378 |
Unknown | 0.216 | 0.070–0.671 | 0.008 | 0.221 | 0.071–0.686 | 0.009 |
Sex |
Male | Reference | Reference |
Female | 0.970 | 0.904–1.041 | 0.398 | | ... | |
First malignant primary indicator | | | | | | |
Yes | Reference | | | Reference | | |
No | 1.223 | 1.140–1.313 | < 0.001 | 1.150 | 1.069–1.237 | < 0.001 |
Surgery |
Yes | Reference | Reference |
No/Unknown | 2.840 | 2.590–3.114 | < 0.001 | 1.696 | 1.531–1.879 | < 0.001 |
Radiation |
Yes | Reference | Reference |
No/Unknown | 0.775 | 0.734–0.819 | < 0.001 | | ... | |
Chemotherapy |
Yes | Reference | Reference |
No/Unknown | 0.762 | 0.720–0.806 | < 0.001 | 1.345 | 1.257–1.439 | < 0.001 |
LND (Stratified) |
< 0.12 | Reference | Reference |
≥ 0.12 | 2.648 | 2.507–2.797 | < 0.001 | 1.629 | 1.521–1.745 | < 0.001 |
AJCC Stage |
I | Reference | Reference |
II | 1.982 | 1.831–2.144 | < 0.001 | 2.164 | 1.983–2.362 | < 0.001 |
III | 3.541 | 3.271–3.833 | < 0.001 | 3.366 | 3.039–3.727 | < 0.001 |
IV | 4.502 | 4.062–4.989 | < 0.001 | 3.779 | 3.332–4.286 | < 0.001 |
We further stratified the esophageal cancer patients according to LND stratification. We first analyzed all of the variables both in the univariate cox regression analysis and multivariate cox regression. As listed in (Table 3).
Table 3
Multivariate cox regression analysis in the LND Stratified
Variables | LND (Stratified) < 0.12 | LND (Stratified) ≥ 0.12 |
HR | 95%CI | p-value | HR | 95%CI | p-value |
Age at diagnosis | 1.025 | 1.021–1.029 | < 0.001 | 1.012 | 1.008–1.016 | < 0.001 |
Marital status |
Married | Reference | Reference |
Unmarried | 1.098 | 0.986–1.222 | 0.087 | 1.169 | 1.034–1.322 | 0.013 |
SWD | 1.126 | 1.027–1.236 | 0.012 | 1.179 | 1.056–1.317 | 0.003 |
Unknown | 1.148 | 0.949–1.389 | 0.156 | 1.181 | 0.912–1.530 | 0.207 |
Race |
White | Reference | Reference |
Black | 1.423 | 1.233–1.643 | < 0.001 | | | |
Other | 1.084 | 0.927–1.268 | 0.312 | | ... | |
Unknown | 0.228 | 0.057–0.911 | 0.037 | | | |
Sex |
Male | Reference | Reference |
Female | | ... | | | ... | |
First malignant primary indicator | | | | | | |
Yes | | Reference | | | Reference | |
No | 1.278 | 1.166–1.401 | < 0.001 | | ... | |
Surgery |
Yes | Reference | Reference |
No/Unknown | 1.747 | 1.412–2.163 | < 0.001 | 1.669 | 1.484–1.877 | < 0.001 |
Radiation |
Yes | Reference | Reference |
No/Unknown | 0.787 | 0.695–0.892 | < 0.001 | | ... | |
Chemotherapy |
Yes | Reference | Reference |
No/Unknown | 1.299 | 1.141–1.479 | < 0.001 | 1.821 | 1.599–2.074 | < 0.001 |
AJCC Stage |
I | Reference | |
II | 1.931 | 1.758–2.120 | < 0.001 | | Reference | |
III | 3.129 | 2.789–3.510 | < 0.001 | 1.497 | 1.332–1.682 | < 0.001 |
IV | 2.985 | 2.471–3.605 | < 0.001 | 1.766 | 1.533–2.035 | < 0.001 |
In the LND < 0.12 cohort, age at diagnosis (HR = 1.025, 95%CI = 1.021–1.029, p < 0.001), marital status (unmarried vs married was no significant difference whereas widowed or divorced [SDW] vs married: HR = 1.126, 95%CI = 1.027–1.236, p = 0.012), race (black vs white, HR = 1.423, 95%CI = 1.233–1.643, p < 0.001), first malignant primary indicator (no vs yes: HR = 1.278, 95%CI = 1.166–1.401, p < 0.001), surgery (no/unknown vs yes: HR = 1.747, 95%CI = 1.412–2.163, p < 0.001), chemotherapy (no/unknown vs yes: HR = 1.299, 95%CI = 1.141–1.479, p < 0.001) and AJCC stage (II vs I: HR = 1.931, 95%CI = 1.758–2.120, p < 0.001; III vs I: HR = 3.129, 95%CI = 2.789–3.510, p < 0.001; IV vs I: HR = 2.985, 95%CI = 2.471–3.602, p < 0.001) were significantly correlated with the risk of esophageal cancer.
In the LND ≥ 0.12 cohort, age at diagnosis (HR = 1.012, 95%CI = 1.008–1.016, p < 0.001), marital status (unmarried vs married: HR = 1.169, 95%CI = 1.034–1.322, p = 0.013; separated, widowed or divorced [SDW] vs married: HR = 1.179, 95%CI = 1.056–1.317, p = 0.003), surgery (no/unknown vs yes: HR = 1.669, 95%CI = 1.484–1.877, p < 0.001), chemotherapy (no/unknown vs yes: HR = 1.821, 95%CI = 1.599–2.074, p < 0.001) and AJCC stage (III vs II: HR = 1.497, 95%CI = 1.322–1.682, p < 0.001; IV vs II: HR = 1.766, 95%CI = 1.533–2.035, p < 0.001) were significantly increased the risk of esophageal cancer. Patients in LND < 0.12 subgroup were in a lower burden of deaths and hazard ratio than in LND ≥ 0.12 subgroup deaths and hazard ratio during the study period (Fig. 1C).
3.2. Identification of Biomarkers of Lymph Node Metastasis in Esophageal Squamous Cell Carcinoma
In this study, we conducted a comprehensive analysis using advanced computational methods to examine the key roles and prognostic significance of genes between patient in N0 or NX (lymph node metastasis). We obtained ESCC datasets from TCGA, comprising 77 ESCC tumor samples. Data processing and identification of differentially expressed genes (DEGs) were performed using R software packages. Out of 16,190 genes analyzed, 382 genes met the screening criteria of this study (P < 0.05, qValue < 0.2, |log2FC>|1.0), including 168 upregulated genes and 214 downregulated genes.. (Fig. 2A). To investigate the function of the identified genes, we performed functional enrichment analysis on the differentially expressed genes (DEGs). The results indicated that pathways related to glutathione metabolism were enriched in both Gene Ontology Biological Processes (GO BP) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis (Fig. 2B and 2C). Following that, we developed a genetic risk score model for predicting prognosis in lymph node metastasis samples of esophageal squamous cell carcinoma (Fig. 2D). This model was constructed using 26 candidate hub genes associated with prognosis, identified through a unique stepwise Cox regression analysis. We performed further analysis using multiple stepwise Cox regression to assess their impact on patient survival status, seven of these hub genes were found to be independent predictors of lymph node metastasis in ESCC (Fig. 2E). The survival analysis was performed to assess the predictive ability of the prognostic model, which indicated that patients in the high-risk subgroup exhibited significantly poorer overall survival (OS) compared to those in the low-risk subgroup (Fig. 2F). Subsequently, a total of 77 LUAD patients were divided into train or test groups, each group was stratified into low-risk and high-risk subgroups based on the median risk score. Consistent with previous findings, the high-risk subgroup demonstrated significantly reduced OS relative to the low-risk subgroup (Fig. 2G and 2J). To further validate the prognostic efficacy of the seven biomarkers, a time-dependent ROC analysis was conducted, revealing moderate diagnostic performance (Fig. 2H and 2K). An expression heatmap and survival status plot of patients, incorporating seven genes, were generated for both subgroups (Fig. 2I and 2L). These findings underscore that the prognostic model possesses robust sensitivity and specificity.
3.3. The transcription factors CTCFL and AHR collaboratively regulate Glutathione S-Transferase (GSTs)-mediated lymph node density.
To investigate the regulatory mechanisms underlying lymph node metastasis in esophageal cancer, we delved into the downstream pathways associated with this process. Among the seven identified hub genes, CTCFL and AHR were notable as transcription factors, implying that lymph node metastasis in esophageal squamous cell carcinoma patients may be driven by transcriptional regulatory mechanisms. (Fig. 3A). Notably, CTCFL and AHR, known for their elevated expression across various tumor types (Fig. 3B). Besides, analysis demonstrated a significant positive correlation in patients with esophageal squamous cell carcinoma (ESCC) (Fig. 3C). Given the consistent expression and functional patterns of CTCFL and AHR, we endeavored to elucidate their intrinsic relationship. To investigate potential interactions among these two transcription factors promoting lymph node density (LND), we employed a protein-protein interaction docking prediction model merging the three-dimensional structures of Alpha Fold2 and ZDOCK docking. Our findings revealed a stable interaction between CTCFL and AHR, with a binding free energy of 32.8 kcal/mol. This stability is facilitated by five hydrogen bond connections and one halogen bond connection between the two proteins (Fig. 3D).
Following that, we conducted Co-IP experiments to validate the interaction between CTCFL and AHR (Fig. 3E). Subsequently, immunofluorescence staining confirmed the colocalization of CTCFL and AHR, a finding further supported by overexpression using distinct tags (Fig. 3F). It appears that these two factors interact to execute specific biological functions. To delve into the regulatory role of their interaction in the glutathione metabolism pathway, we considered the close relationship between esophageal squamous cell carcinoma's lymph node density and GSTs expression.
Hence, we conducted CUT&Tag sequencing analysis on CTCFL, AHR, and their combined form to investigate their role in activating downstream genes within this pathway. The data revealed that CTCFL or AHR individually can bind to the transcription initiation regions of GSTA1, GSTA2, GSTM3, and GSTM4 (Fig. 3G). Surprisingly, upon interaction between CTCFL and AHR, there was a notable enhancement in their binding to the promoter regions of GSTs (Fig. 3G).
3.4. CTCFL and AHR condensates enrich the transcription apparatus to promote GSTs transactivation.
It's notable that nuclei in ESCC cells displayed a droplet-like distribution of co-localized CTCFL and AHR staining, suggesting the self-aggregation capability of these proteins. To delve into the structural basis of CTCFL and AHR condensates, we employed IUPred2 and PONDR to analyze their amino acid sequences. The results uncovered a classic intrinsically disordered region (IDR) from amino acids 37–252 within CTCFL protein, but the IDRs of AHR is not obvious (Fig. 4A and 4B). To validate the formation of CTCFL and AHR condensates, we conducted overexpression and purification of EGFP-tagged CTCFL and mCherry-tagged AHR proteins in HEK293T cells. Fluorescence phase contrast microscopy revealed that at 37°C, CTCFL exhibited protein concentration-dependent condensation-forming ability, whereas AHR did not. However, the fusion proteins of CTCFL and AHR exhibited droplet-forming ability (Fig. 4C).
To assess whether CTCFL or AHR demonstrate liquid-like properties within cells, we conducted in vivo experiments using the fluorescence recovery after photobleaching (FRAP) assay. The results indicated rapid recovery of CTCFL and the fusion proteins of CTCFL and AHR in solution, while AHR protein alone did not exhibit this ability. This suggests the significance of the intrinsically disordered region (IDR) domain for CTCFL's phase separation potential. Moreover, the fusion of CTCFL with AHR enabled the latter to recover after bleaching, emphasizing the collaborative role in this process (Fig. 4D). The findings above indicate that CTCFL and AHR interact through phase separation, with AHR's phase-separation ability contingent on CTCFL. Additionally, the fusion protein of CTCFL and AHR notably enhances their capacity to transcribe downstream genes.
We next employed siRNAs to downregulate CTCFL, AHR, and both of two TFs in KYSE150 esophageal squamous cell carcinoma cells. The combined Wound Healing and Transwell experiments demonstrated that CTCFL along with AHR collectively governs the migration capability of esophageal cancer cells (Fig. 4E and 4F).
3.5. GSTs impact LND in esophageal squamous cell carcinoma by controlling Reactive Oxygen Species (ROS) levels.
The results above indicate that CTCFL undergoes liquid-liquid phase separation, leading to the formation of droplets with AHR and the aggregation into a CTCFL-AHR complex. AHR likely functions as a transcriptional regulator of CTCFL, enhancing its transcriptional activity. For instance, AHR facilitates CTCFL in transcribing GSTs (Fig. 5A).
To validate the regulatory role of CTCFL and AHR in the transcription of GSTs, we employed siRNA to knock down CTCFL, AHR, and the combination of CTCFL with AHR in KYSE150 cells. The results revealed that reducing CTCFL expression led to decreased expression levels of GSTA1, GSTA2, GSTM3, and GSTM4 within the GST family, resulting in varying degrees of transcriptional inhibition. Knocking down AHR individually showed minimal impact. However, when both transcription factors were downregulated, there was a dramatically inhibiting in GST expression levels (Fig. 5B). The protein expression levels of GSTs corresponded with the observed transcriptional trends (Fig. 5C). To delve into the mechanism by which GSTA1, GSTA2, GSTM3, and GSTM4 regulate lymph node density (LND), we constructed a protein-protein interaction (PPI) network using Cytoscape software. This network integrated 1065 nodes sourced from the STRING database. Leveraging the MODE tool, we analyzed the co-expression network to identify potential key modules. The primary module obtained genes in pivotal subgroups enriched in processes related to glutathione metabolism and transferase pathways (Fig. 5D), Suggesting interactions between GSTA1, GSTA2, GSTM3, and GSTM4. To validate their direct interactions, we investigated the relationships between these four transsulfurases in KYSE150 cells. GSTA1-GSTM3 and GSTA2-GSTM4 exhibited co-localization with each other (Fig. 5E and 5F). Indeed, these proteins demonstrated potential co-localization and played essential roles in the transsulfurization pathway of glutathione synthesis.
Based on previous bioinformatics analyses, glutathione metabolism was notably enriched in the high LND group. GSTs serve as pivotal roles in this metabolic pathway. The levels of glutathione (GSH) and its biosynthetic capacity are influenced by the enzymatic activity of GSTs, consequently impacting tumor malignancy through the regulation of ROS levels.24–26 Therefore, we assessed glutathione (GSH) levels (Fig. 5G) and reactive oxygen species (ROS) levels (Fig. 5H) in KYSE 150 cells by overexpressing GSTA1, GSTA2, GSTM3, and GSTM4 in AHR, CTCFL, AHR combined with CTCFL knockdown cell lines, respectively. The findings revealed that knockdown of AHR and CTCFL significantly reduced GSH levels while increasing ROS levels. Supplementing any of these four enzymes individually could partially restore GSH content and reduce ROS levels. However, supplementing a single type of GST alone was insufficient to normalize ROS levels, suggesting a comprehensive regulation of ROS levels by GSTs.
These results highlight the role of CTCFL-AHR in regulating GST expression in esophageal cancer cells and mediating ROS changes, which in turn influence the lymph node density of ESCC.
3.6. CTCFL and AHR collaboratively activate GSTA1 and GSTA2, promoting LND of ESCC in mice.
To confirm lymph node density (LND) in esophageal cancer relies on CTCFL, AHR and GSTs in vivo, we established a spontaneous ESCC model in p53-/-, ED-L2cre mice and generated organoids from mouse esophageal tumor tissues (Fig. 6A and 6B). These were then compared with organoids from low-LND mice (Fig. 6C). Remarkably, the organoids from high-LND mice exhibited clear growth advantages (Fig. 6D), accompanied by a significant upregulation in protein expression levels of CTCFL, AHR and GSTs (Fig. 6E). These indicate that the transcriptional factors CTCFL and AHR regulate GSTs expression, thereby influencing the lymph node density in ESCC.