Differential expressed homeobox-containing genes in lung adenocarcinoma
We gathered 238 human protein-coding homeobox-containing genes to investigate expression patterns in five different lung adenocarcinoma (LUAD) clinical sample datasets: TCGA-LUAD (N = 59, T = 517), GSE18842 (N = 45, T = 46), GSE19188 (N = 65, T = 91), GSE32863 (N = 58, T = 58), GSE40419 (N = 77, T = 87) (N: Normal; T: Tumor) (Figure 1A-E). Following intersection of the differentially expressed genes (DEGs) from each dataset (log2FC > 0.4 or < -0.4, adjusted P.Value < 0.05), seven significantly up-regulated candidates were found in tumor samples including PITX1(Paired-like homeodomain) 1, SIX1/4 (Sine oculis homeobox), BARX1 (Homeobox protein BarH-like), HOXA10(Homeobox protein Hox-A), HOXB7(Homeobox protein Hox-B) and SATB2(Special AT-rich sequence-binding protein) (Figure 1F). Seven others were significantly down-regulated including IRX2 (Iroquois homeobox), PKNOX2 (PBX/knotted 1 homeobox), MEIS1/2 (Meis homeobox), HLX (H2.0-like homeobox), HHEX (Hematopoietically-expressed homeobox protein) and HOXA5 (Figure 1B). According to the classification of homeobox genes, PITX1 belongs to the PRD (Paired Domain) class, SIX1/4 to the SINE class, SATB2 to the CUT class, BARX1, HHEX, HLX1, HOXA5, HOXA10, HOXB7 to the ANTP class, IRX2, PKNOX2, and MEIS1/2 to the TALE (Three Amino Acid Loop Extension) class. Figure 1F depicted the log2FC heatmap of 14 differentially expressed homeobox genes in these datasets.
Prognostic significance of homeobox-containing gene in LUAD
The univariate Cox proportional hazards regression analysis was then used to examine the relationship between the gene expression value of these DE homeobox genes and the Overall Survival (OS) in 515 TCGA LUAD patients. In univariate cox analysis result, we discovered that 7 genes (PITX1, SIX1, IRX2, HOXA10, HOXB7, PITX1, SATB2, PKNOX2) were significantly associated with LUAD patients OS (P-Value < 0.05). As shown in Figure 2A, the survival prognosis forest map of univariate Cox analysis, IRX2, PKNOX2 and SIX1 are protective factors (Hazard Ratio <1, P-Value < 0.05) while others were risk factors (Hazard Ratio >1, P-Value < 0.05). Subsequently, we performed lasso regression and all the seven genes had coefficients that were not zero with 1000 repeats (Figure 2B and 2C). Risk score was calculated by the lasso coefficients: (0.0261 × HOXA10) + (-0.0365 × HOXC6) + (0.0357 × PITX1) + (-0.0857 × SIX1) + (-0.0581 × IRX2) + (-0.0732 × PKNOX2) (Figure 2D). The lasso Cox regression risk score results could be used to predict the OS (HR = 2.63 [1.80-3.81], P-Value = 5.145e-07) (Figure 2E), Disease-specific survival (DSS) (HR = 2.63 [1.8-3.8], P-Value = 1.974e-05) (Figure 2E) and Progression-Free interval (PFI) in TCGA LUAD cohort (Figure 2E-F). The ROC curves were then used to assess the prognostic accuracy of the model. The AUC was 0.714 (1-year), 0.653 (3-year), and 0.636 (5-year) in TCGA-LUAD dataset (Figure 2H). In validation dataset GSE37745, the AUC was 0.583 (1-year), 0.605 (3-year), and 0.702 (5-year) (Figure 2I).
Prognostic significance of IRX2 in LUAD
We then aimed to investigate the prognostic significance of IRX2 in lung adenocarcinoma patients. Cox analysis results suggested that IRX2 using age, gender, pathological stage, pathological M and T is also an independent risk factor in univariate (HR = 0.84 [0.770-0.932], P-Value < 0.001) and multivariate analysis (HR = 0.876 [0.794-0.967], P-Value < 0.001) (Figure 2A). Furthermore, we used Kaplan-Meier Plotter for lung cancer to confirmed the prognosis significances of IRXs. We found that expression levels of IRX2 (228462_at) were significantly associated with OS and FP (Free progression) in LUAD (HR = 0.50 [0.39-0.64], P-Value = 2.1e-07, HR = 0.51 [0.37-0.70], P-Value = 3.3e-05) in Kaplan-Meier Plotter for lung cancer (Figure 4B). Also, we showed in TCGA LUAD RNA-seq datasets, IRX2 expressions were also associated with OS (HR = 0.55 [0.41-0.74], P-Value = 5.0e-05) (Figure 4B). According to the evidence, IRX2 could be used as a diagnostic homeobox gene for LUAD. However, although IRX2 is downregulated both in LUAD and LUSC (lung squamous cell carcinoma tumor) (Figure 4A), but it is a risk factor for LUSC (lung squamous cell carcinoma) (HR = 1.48 [1.09-2.02], P-Value = 0.012) in the Kaplan-Meier plotter (Figure 4A). This could be used to demonstrate the differences in molecular expression levels between LUAD and LUSC. Furthermore, we also further performed univariate Cox analysis associated with the DSS (Disease-specific survival) and PFI (Progression-free interval). IRX2 was the most significant DE homeobox gene associated with OS (hazard ratio = 0.90 [0.85-0.95], P-Value = 0.00011), DSS (hazard ratio = 0.88 [0.82-0.94], P-Value = 0.00028), and PFI (hazard ratio = 0.93 [0.88-0.98, P-Value = 0.0064) in LUAD (Figure 2B and 2C).
Loss of IRX2 expression in lung adenocarcinoma
We further validated the IRX2 down-regulations in GEPIA using TCGA-LUAD (n = 483) and LUSC (n = 486) with matched TCGA normal and GTEx data (n = 347) (Figure 4A). In the RNA-seq of blood platelets from non-small cell lung carcinoma patients (n = 60) and healthy donors (n = 59) from the E-GEOD-68086 dataset, IRX2 mRNA level was decreased in cancer patients (log2FC = 2.70, P-Value = 2.76e-07) (Figure 4B). Meanwhile, protein level for IRX2 in LUAD was also suppressed. According to the proteomic expression profile provided by ULCAN based CPTAC (Clinical Proteomic Tumor Analysis Consortium)-LUAD dataset, the total protein level of IRX2 was decreased in tumor samples (n = 111) (P-Value = 8.067e-04) (Figure 4C). According to the individual cancer stage, the IRX2 protein level was significantly lower in Stage 1 (P-Value = 4.920e-02) and Stage 2 (P-Value = 8.504e-03) but not Stage 3 (P-Value = 5.451e-02) compared to normal tissues (Figure 4D). Furthermore, IRX2 protein was also lost in LUAD samples with higher tumor grade, Normal vs Grade 2 (P-Value = 3.696e-02), Normal vs Grade 3 (P-Value = 1.294e-04), Grade 1 vs Grade 2 (P-Value = 2.068e-02) and Grade 2 vs Grade 3 (P-Value < 1e-12) (Figure 4E). Furthermore, we used the IHC (Immunohistochemistry) data from HPA (Human Protein Atlas) database to explore the expression patterns of IRX2 protein. It was weakly expressed in macrophages but not detected in alveolar cells in normal lungs (Figure 4F, upper panel). Its expression was not detected in 7 of 11 HPA lung cancer samples (Figure 4F, lower panel). In the bulk tissue gene expression profile from GTEx dataset, however, IRX2 expression (ENSG00000170561.12) is enriched in normal lung (Figure S2A). In HPA single cell type specificity analysis result, the IRX2 expression was enriched in groups including alveolar cell type1 and type2 (Figure S2B). Taken together, these findings suggested that the IRX2 protein was enrich in normal lung but frequently reduced in LUAD.
IRX2 promoter methylation level increased in LUAD
Gene hypermethylation can silence gene expression and regulate biological processes, particularly tumor suppressor genes in cancer tissues. We calculated the methylation levels of 11 CpG sites (cg09524455, cg26504021, cg19679633, cg02135861, cg13702053, cg21093166, cg11552694, cg08204280, cg11793269, cg08235864, cg04992127) around the IRX2 genomic promoter region in the TCGA-LUAD dataset using the UALCAN database. The higher average methylation levels were found in primary tumor (n = 473) than normal samples (n = 32) (P.Value = 1.624e-12) (Figure 5A). More importantly, the IRX2 promoter region has a high methylation level in individual cancer stages when compared to normal tissues (Normal vs Stage 1: P.Value = 1.623e -12; Stage 2: P.Value = 6.447e-12; Stage 3: P.Value = 2.483e-07; Stage 4 : P-Value = 9.143e-03) (Figure 5B). Using LinkedOmics database, we found that patients with high levels of IRX2 promoter methylation have a worse prognosis than those with low levels of IRX2 promoter methylation (HR = 1.239, P.Value = 0.0109) (Figure 5C). Using Pearson's correlation analysis, we discovered that IRX2 mRNA expression levels were negatively correlated with its methylation level in both the TCGA- (r = -0.673, P.Value = 2.339e-61) and CTPAC-LUAD datasets (r = -0.482, P.Value = 1.916e-07) (Figure 5D and 5E). These findings suggest that IRX2 hypermethylation is a risk factor in LUAD patients and may be linked to dysregulations of IRX2 mRNA expression.
Gene enrichment analysis and IRX2 regulatory co‑expression network
In the LinkeDomics database, we discovered that 10530 genes have a positive correlation with IRX2, while 9668 genes have a negative correlation with IRX2 (P.Value 0.05). The top 20 most positive and negative genes associated with IRX2 are depicted as heat maps and volcano plot in Figures 6A and 6B. IRX2 expression has a positive correlation with genes such as C5orf38 (r = 0.889, P.Value = 8.17E177), SUSD2 (r = 0.529, P.Value = 1.53E38), and GPR116 (r = 0.523, P.Value = 1.30E37), but a negative correlation with genes such as UCK2 (r = -0.456, P.Value =1.24E29), CENPA (r= -0.434, P.Value =4.19E−25), CTSL2 (r = -0.429, P.Value = 1.61E−24). We discovered that the mRNA expression of the tumor suppressor gene SUSD2 (Sushi Domain Containing 2) was highly expressed in IRX2 high-group (Figure 5B). We then confirmed a positive correlation with IRX2 in various LUAD datasets, including TCGA (r = 0.49, P.Value = 1.9e-05), GSE18842 (r = 0.64, P.Value = 1.2e-11), GSE32863 (r = 0.44, P.Value = 8.0e-07) and GSE40419 (r = 0.54 P.Value = 1.2e-13) (Figure S3A-D). We divided the tumor samples of TCGA-LUAD into two groups (high and low) according to the expression levels of IRX2. A total of 842 DEGs (373 up and 469 down). Following that, we performed GO: BP (biological pathway) analysis and discovered that they were related to antimicrobial humoral immune response and cilium movement. These findings suggested that IRX2 might play a role in lung immune response and normal psychological functions (Figure 6C). The GSEA results, on the other hand, suggested that it was positively correlated with the cell cycle and some tumorigenesis pathways (Figure 6D).
Subtype expression and clinical relevance of IRX2 in LUAD
In TCGA-LUAD and GSE36471, we discovered that IRX2 was expressed in the bronchioid subtype and not in the magnoid or squamoid subtypes (Figure 6A and 6B). Bronchioid subtype patients have a better prognosis in both TCGA-LUAD datasets (P.Value = 0.0473) (Figure 7C) in GSE36471 (Wilkerson et al.). Meanwhile, in TCGA LUAD cohort, IRX2 expression was significantly higher in patients who had a complete response (CR) compared to those who had progressive disease after therapy (P.Value < 0.05) (Figure 7D). However, there was no difference in IRX2 mRNA levels between the groups with stable disease and those with progressive disease (P.Value > 0.05) (Figure 5D). The expression of IRX2 was found to be lower in the N3 stage when compared to the N1 stage in TCGA cohort (Figure 7E). Because the N stage in the TNM staging system stands for lymph nodes invasiveness, these findings suggested that decreased IRX2 expression may be related to tumor invasion. We also found additional evidence in a lung cancer cell line. When the expression profiles of poorly invasive CL1-0 and highly invasive CL1-5 lung adenocarcinoma cell lines were compared, it was discovered that IRX2 was significantly decreased in the CL1-5 cell line (log2FC = 2.13, FDR = 2.37e-05) from dataset GSE42407 (Figure 7F). These findings suggested that IRX2 may be involved in the invasion and mobility of tumor cancer cells. We further calculated the immune cell subtype correlations with IRX2 and discovered that IRX2 expression is differed in immune and stromal score (Figure S4A) and negative associated with Macrophage M1 cells (Figure S4B-D).