Prediction of the diagnostic and prognostic values of homeobox family in lung cancer using bioinformatics

TINGTING HU Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University PanJun Gao Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University Min Jiang Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University Zheng Li Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University Dan Xu Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University Jing Jing Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University JING WANG (  jingw_xj@163.com ) Traditional Chinese Medicine Hospital A liated to Xinjiang Medical University


Introduction
Lung cancer is a frequently diagnosed malignancy and is the leading cause of cancer-related deaths worldwide [1,2]. Based on the global cancer statistics, 1,824,700 new lung cancer cases were identi ed, causing approximately 1,589,900 deaths in 2012 [3]. Non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) are the two types of lung cancer, and NSCLC accounts for approximately 85% of all lung cancer cases [4]. Despite advancements in diagnostic and treatment methods, the 5-year overall survival rate for patients with lung cancer still remains less than 15% [5]. Therefore, development of e cient diagnosis markers and drug targets should be put on the agenda to improve the prognosis of patients with lung cancer.
The development of high-throughput molecular tools including microarrays and large-scale databases of serial analysis of gene expression and expressed sequence tags have identi ed that lots of genes are deregulated in lung cancer and closely implicated in the carcinogenesis [6,7]. However, it still inevitably encounter many obstacles in lung cancer research due to some of the limitations in microarray technology, such as small sample size, measurement error and insu cient information [8]. Therefore, it is of importance to further identify genes which are valuable in early diagnosis and treatment of lung cancer.
The homeobox (HOX) genes family is a group of transcription factors which includes 39 members in mammals. Although the expression of HOX genes is typically suppressed in adults, they are necessary for body repair and various homeostatic cellular processes [9,10], including wound healing and vascularization [11], hematopoiesis [12], and female reproductive tract development and fertility [13], as well as cell survival and differentiation [14]. Noticeably, accumulated evidences have identi ed that HOX genes are deregulated in many kinds of cancers including lung cancer, where the deregulation of HOX exerts an oncogenic or a suppressive role in carcinogenesis [15][16][17], suggesting a vital role of HOX genes play in cancer progression.
In this study, we carried out the bioinformatics to integrate the data of lung cancer in UCSC-Xena database, aiming to further identify the genes related to the prognosis and progression of lung cancer.
Our results revealed that 8 genes (HOXA11/A13/B9/B13/C11/C13/D11/D13) of HOX family might play an important role in lung cancer progression with high value in the diagnosis and prognostic prediction in lung cancer. The obtained results might be bene cial to the diagnostics and therapeutics of lung cancer.

Data collection
Raw RNA microarray data 'TCGA Lung Cancer (LUNG)' was obtained from the database of UCSC-Xena, which was obtained by using the sequencing platform of IlluminaHiSeq https://xenabrowser.net/datapages/ . A total of 1006 lung cancer samples and 110 normal lung samples were included in this database, among which 521 cases were at stage I, 284 cases were at stege II, 168 cases were at stage III and 33 cases were at stage IV. The process analysis diagram was shown in Figure 1.

Identi cation of differentially expression genes (DEGs)
The DEGs between lung cancer tissue and normal lung tissue specimens were identi ed by using the R software (http://www.R project. org/version 3.4.2) limma (version 3.32.3) package [18]. Once P value < 0.05 and the absolute value of Fold Chang (FC) >1, genes were considered as DEGs. R software ggplot2 package (http:// cran.r-project.org/web/packages/gplots/index.html) was used to build Volcano plot ltering and Hierarchical clustering to seek DEGs with statistical signi cance among atherosclerotic plaque samples and the normal arterial intima specimens [19].

Functional enrichment analysis
Gene ontology (GO) analysis, covering biological processes (GO-BP), cellular component (GO-CC), and molecular functions (GO-MF) three domains were used to evaluate the enriched function of the total DEGs using R software clusterPro ler. Fisher's exact test was applied to detect the overlap among the DEGs and the GO annotation list beyond that which would be expected by chance. The P value < 0.05 was considered to be signi cantly enriched among DEGs.
Moreover, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was also executed to predict the functional enrichment of the DEGs by using Cytoscape plug-in BinGO and Database for annotation, visualization, and integration [20].
The 'Broad Institute Web Platform for Genome Networks' (GeNets, http://apps.broadinstitute.org/genets) was used to predict the gene-gene interaction of the DEGs.
Gene expression pro ling interactive analysis (GEPIA) GEPIA (http://gepia.cancer-pku.cn) [21] is a newly developed interactive web server which provides help in analyzing the RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA and the GTEx projects. GEPIA offers several customizable function analysis, such as tumor/normal differential expression analysis, survival analysis, pro ling based on cancer types or pathological stages, similar gene detection and correlation analysis. As such, we applied GEPIA to analyze the different expression levels of HOX genes in different stages of lung cancer tissue, and to explore the relationship between the expression levels of HOX genes and the overall survival (OS), progression-free survival (FP), and post progression survival (PPS) for patients with lung cancer.
In detail, the prognostic value of HOXs mRNA expression was evaluated using an online database, Kaplan-Meier Plotter (www.kmplot.com), from which we can obtain the information about gene expression data and survival of 2,437 clinical lung cancer patients (http://kmplot.com/analysis/index.php?p=service&canc er=lung). To analyze the OS, FP, and PPS, lung cancer samples were divided into two groups (high vs. low expression) based on the median expression level, followed by being assessed using a Kaplan-Meier survival plot with the hazard ratio (HR) and a 95% con dence intervals (CI) and a log rank test. In addition, the receiver operating characteristic analyses (ROC) and logistic regression were carried out to assess discriminatory accuracy of lung cancer samples from the healthy individuals.
Gene set enrichment analysis (GSEA) GSEA (http://www.broadinstitute.org/gsea) help investigators to interpret chip results more easily and reasonably. In the present study, GSEA was performed on the DEGs using two gene sets, KEGG gene set and the GO gene set. The false discovery rates (FDRs) <25% for the gene sets were considered as signi cantly enriched gene sets.

Results
Identi cation of the DEGs among the lung cancer tissues and normal lung tissues To further search genes related to the occurrence and development of lung cancer, we rst identi ed the DEGs among the different stages of lung cancer cases and the normal lung tissue samples. Compared with the normal lung tissues, 365 genes were found to be upregulated and 40 genes were downregulated in the stage I of lung cancer cases ( Figure 2A); 404 genes were found to be upregulated and 65 genes were downregulated in the stage II of lung cancer cases ( Figure 2B); 409 genes were found to be upregulated and 69 genes were downregulated in the stage III of lung cancer cases ( Figure 2C); and 293 genes were highly expressed and 37 genes were lowly expressed in the stage IV of lung cancer cases ( Figure 2D). The total DEGs were also demonstrated by using the clustering heat map between stage I-IV of lung cancer cases and the normal lung tissues ( Figure 2E). In addition, we also applied the Venn image to integrate the DEGs among the different stages of lung cancer tissues and normal lung tissues. The results showed that a total of 241 DEGs were found between stage I-IV of lung cancer cases and the normal lung tissues ( Figure 2F). In this process, we found 8 genes in the HOX family, including HOXA11, HOXA13, HOXB13, HOXB9, HOXC11, HOXC13, HOXD11 and HOXD13 were all signi cantly upregulated in all stages of lung cancer as compared to the normal lung tissues ( Table 1), suggesting that HOX family might play an important role in the progression of lung cancer.

GO analysis of the DEGs
To predict the potential roles of the DEGs in the progression of lung cancer, GO analysis was performed on the 241 DEGs. GO analysis showed that the 241 DEGs were mainly enriched in xenobiotic glucuronidation, regionalization, pattern speci cation process and cellular glucuronidation in the term of biological processes ( Figure 3A-B); GABA receptor complex, GABA-A receptor complex, chloride channel complex in term of cellular component ( Figure 3C-D); and extracellular ligand-gated ion channel activity, GABA receptor activity and glucuronosyltransferase activity in molecular function term ( Figure 3E-F). The HOX family was mainly enriched in the term of proximal promoter DNA-binding transcription activator activity, RNA polymerase II-speci c, regionalization, pattern speci cation process, epidermis development and proximal/distal pattern formation.

KEGG and GeNets analysis of the DEGs
The KEGG analysis carried out by the cytoscape plug-in BinGO showed that the DEGs were enriched in ascorbate and aldarate metabolism, pentose and glucuronate and nicotine addiction pathways ( Figure   4A). In addition, from the GeNets analysis, we found that the 241 DEGs were enriched in 11 communities, among which HOXC11 and HOXD11 played a role in community 6 while HOXA13 played its role in community 2 ( Figure 4B).
GEPIA of the expression patterns of HOX family in different stages of lung cancer Using GEPIA dataset, we compared the mRNA expression patterns of HOX factors between lung cancer tissues and normal lung tissues. The results demonstrated that the expression levels of HOXA11, HOXA13, HOXB9, HOXB13, HOXC11, HOXC13, HOXD11 and HOXD13 were all obviously higher in lung adenocarcinoma and squamous cell lung carcinoma tissues than in lung tissues ( Figure 5A-B). We also analyzed the expression pro les of HOXs in different stages of lung cancer. HOXC11 signi cantly varied, whereas HOXA11, HOXA13, HOXB9, HOXB13, HOXC13, HOXD11 and HOXD13 groups did not signi cantly differ in different stages of lung cancer ( Figure 5C).
The relationship between the expression patterns of HOXs and the prognosis of patients with lung cancer The Kaplan-Meier curve and log-rank test analyses revealed that the increased HOXA11/A13/B9/B13/C11/C13/D11/D13 mRNA levels were signi cantly associated with the shorter FP; HOXA11/A13/B9/B13/C11/D11/D13 mRNA levels were signi cantly associated with shorter the OS, and HOXD11 mRNA levels were signi cantly associated with PPS (P < 0.05) in patients with lung cancer ( Figure 6). These results indicated that HOXs have a high value in predicting the prognosis of lung cancer.

Prediction of the diagnostic value of HOXs in lung cancer
The ROC analysis showed that the 8 HOXs used to separate different stages of lung cancer patients from healthy controls had high speci city and sensitivity (Figure 7). In detail, the area under curve (AUC) of HOXA11 to distinguish lung cancer at stage I, stage II, stage III and stage IV form healthy controls were In addition, the GESA KEGG pathway analysis showed that 31 gene sets were signi cantly enriched at nominal P value < 5% in stage II of lung cancer as compared to normal lung tissues (19 up, 12 down) ( Figure 9A-C); 22 gene sets were signi cantly enriched at nominal P value < 5% in stage II of lung cancer as compared to stage I lung tissues (13 up, 9 down) ( Figure 9D-F) 6 gene sets were signi cantly enriched at nominal P value < 5% in stage III of lung cancer as compared to stage II lung tissues (3 up, 3 down) ( Figure 9G-I); and 5 gene sets were signi cantly enriched at nominal P value < 1% (4 up and 1 down) in stage IV lung cancer tissues compared to stage III lung cancer ( Figure 9J-L).

Discussion
Up to now, the application of high-throughput techniques in lung cancer generated a larger amount of data. Based on these data, many genes have also been identi ed. But it remains incompletely clear that the roles of these genes play in lung cancer progression. This study integrated the database of lung cancer from UCSC-Xena, which contains 1006 lung cancer samples and 110 normal lung samples, and a total of 241 DEGs were identi ed between lung cancer cases (stage I-stage IV) and the normal lung tissues. Among the 241 DEGs, 8 genes from the HOX family got us attention, including HOXA11, HOXA13, HOXB13, HOXB9, HOXC11, HOXC13, HOXD11 and HOXD13, which were all signi cantly upregulated in all stages of lung cancer as compared to the normal lung tissues.
There are 39 HOX genes in mammals, which are divided into four linear clusters, Hox A, B, C and D with 9 to 11 genes in each cluster. The four clusters are located on different chromosomes, 7p15.3 for HOXA, 17q21.3 for HOXB, 12q13.3 for HOXC and 2q31 for HOXD in humans. In this study, we revealed two HOXA genes (HOXA11 and HOXA13), two HOXB gene (HOXB9 and HOXB13), two HOXC genes (HOXC11 and HOXC13) and two HOXD genes (HOXD11 and HOXD13) were signi cantly overexpressed in all stages of lung cancer tissues compared with the normal lung tissues, as demonstrated by both the UCSC-Xena and GEPIA databases. Noticeably, the majority of identi ed HOX genes have been known to be functionally crucial in lung cancer progression [22]. HOXA11 and HOXA13 belong to the HOXA cluster of HOX gene family. HOXA11, also known as HOX1I, HOX1 or radioulnar synostosis with amegakaryocytic thrombocytopenia 1, plays a vital role in several kinds of cancers, such as gastric cancer [23], renal cell carcinoma [24] and glioblastoma [25]. Yang et al. [26] reported that HOXA11 was overexpressed in the lung adenocarcinoma samples. Zhang et al. [27] also found that HOXA11 was signi cantly overexpressed in lung adenocarcinoma samples as compared to the adjacent non-cancer tissues with the area under curves (AUC) of 0.955, suggesting that HOXA11 has a high clinical value in lung cancer diagnosis. The expression pattern of HOXA13 has been reported to be strongly implicated in the progression of gastric cancer [28], prostate cancer [29] and ovarian cancer [30]. Deng at al. [31] found that HOXA13 was overexpressed in lung adenocarcinoma based on data from The Cancer Genome Atlas, Oncomine and reverse transcription-quantitative polymerase chain reaction (RT-PCR). Ectopic expression of the long non-coding HOTTIP facilitated the progression of non-small cell lung cancer (NSCLC) via decreasing HOXA13 expression [32], indicating that HOXA13 might function as a tumor suppressor in NSCLC. HOXB9, also known as a direct transcriptional target of WNT/transcription factor 4 (TCF4), is closely implicated in the modulation of several cellular activities, such as cell proliferation, differentiation and apoptosis. HOXB9 served as an oncogene in breast cancer via inducing epithelial-mesenchymal transition (EMT) [33]. HOXB9 was overexpressed in lung cancer, and the high expression level was closely associated with the poor prognosis of lung adenocarcinoma [34]. Similarly, the high expression of HOXB13 predicted a poor prognosis in lung adenocarcinoma, and upregulation of HOXB13 increased the expression of several metastasis-and drug-resistance-related genes, including ABCG1, EZH2, and Slug [35], suggesting that HOXB13 plays a crucial role in the metastasis and drug-resistance of lung cancer.
HOXC11 was found to be strongly related to the clinical prognosis of patients with renal cancer, cervical cancer, and breast cancer [36-38], and HOXC11 upregulation signi cantly promoted renal cancer cell proliferation [38]. In NSCLC, HOXC11 negatively regulated by miR-1197 reversed miR-1197 role in promoting cell proliferation and migration [39]. HOXC13 was reported to function as an oncogenic role in lung adenocarcinoma via modulation of the expression of CCND1 and CCNE1 [40]. Moreover, Sharpe et al. [41] reported that POU2F1 promoted the progression of head and neck squamous cell carcinoma via downregulating HOXD10 and HOXD11. However, HOXD11 role in lung cancer still remains unknown. The methylation status of HOXD13 is a potential marker and may be utilized for the diagnosis and therapy of lung adenocarcinoma [42].
In conclusion, this study identi ed 8 genes (HOXA11/A13/B9/B13/C11/C13/D11/D13) of HOX family overexpressed in lung cancer and they might play crucial roles in lung cancer progression with high value in the diagnosis and prognostic prediction in lung cancer. Nevertheless, their roles in lung cancer progression still need to be veri ed using the fundamental experiments.   Figure 1 The process analysis diagram of the present study.