Systematic Profiling of Invasion-related Genes Signature Predicts Prognostic Features and Identificates Molecular Subtypes of Lung Adenocarcinoma


 Background: Due to the high heterogeneity of lung adenocarcinoma (LUAD), molecular subtype based on gene expression profiles is of great significance for diagnosis, and prognosis prediction in patients with LUAD.Methods: Invasion-related genes were obtained from the CancerSEA database, and LUAD expression profiles were downloaded from The Cancer Genome Atlas. The ConsensusClusterPlus was used to obtain molecular subtypes based on invasion-related genes. The limma software package was used to identify differentially expressed genes (DEGs). A multi-gene risk model was constructed by Lasso-Cox analysis. A nomogram was also constructed based on risk scores and meaningful clinical features.Results: 3 subtypes (C1, C2, C3) based on the expression of invasion-related genes were obtained. C3 had the worst prognosis. A total of 669 DEGs were identified among the subtypes. Pathway enrichment analysis results showed that the DEGs were mainly enriched in the cell cycle, DNA replication, the p53 signaling pathway, and other tumor-related pathways. A 5-gene signature (KRT6A, MELTF, IRX5, MS4A1, CRTAC1) was identified by using Lasso-Cox analysis. The training, validation, and external independent cohorts proved that the model was robust and had better prediction ability than other lung cancer models. The gene expression results showed that the expression levels of MS4A1 and KRT6A in tumor tissues were higher than in normal tissues, while CRTAC1 expression in tumor tissues was lower than in normal tissues. At the same time, the 5 genes were significantly expressed in pan-cancer immune subtypes. Gene set enrichment analysis showed that MS4A1, KRT6A, and CRAT1 genes were both enriched in the HALLMARK_IL2_STAT5_SIGNALING pathway, and IRX5 and MELTF gene were both enriched in the HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION pathway. Conclusion: The 5-gene signature prognostic stratification system based on invasion-related genes could be used to assess prognostic risk in patients with LUAD.


Introduction
Lung cancer is the most common cause of cancer-related death worldwide (Ferlay et al.,, 2015). About 85% of lung cancers are non-small cell lung carcinomas (Zheng, 2016), which can be divided into 3 histological subtypes, with lung adenocarcinoma (LUAD) being the most common (Travis, 2011). Major risk factors for LUAD include smoking, genetic factors, diet, alcohol consumption, and exposure to ionizing radiation and environmental pollutants (Malhotra et al.,, 2016, Pesch et al.,, 2012. As the early symptoms of LUAD are not obvious, most patients with LUAD are diagnosed with advanced stages, though metastasis occurs earlier, and the 5-year overall survival (OS) rate is less than 20% (Lin et al.,, 2016). Although progress has been made in terms of early diagnostic methods, chemotherapy, radiotherapy, and surgical diagnosis and treatment options in recent years, the prognoses of patients with LUAD remain poor (Qi et al.,, 2016).
At present, lung cancer treatment mainly depends on histological type and clinical stage, but due to the high heterogeneity of LUAD, even patients with the same histological type and clinical stage of LUAD have different prognoses, so the classi cation of LUAD based on high-throughput sequencing data is of great signi cance for individualized and accurate LUAD treatment.
In recent years, the use of high-throughput sequencing technology to detect a large number of gene expression changes, combined with the use of bioinformatics methods to systematically analyze tumorrelated genes and their regulatory mechanisms, has become an important research means in functional genomics, and it has been widely used to screen potential tumor biomarkers (Zheng et  In this study, we identi ed molecular subtypes of LUAD based on tumor invasion-related genes by using gene expression data from public databases, such as The Cancer Genome Atlas (TCGA) and GEO, for the rst time. We evaluated the relationships between the molecular subtypes and prognosis and clinical features. The prognostic risk model based on differentially expressed genes (DEGs) between the LUAD subtypes could be used to evaluate LUAD prognosis. In addition, the nomogram we constructed could be used to help clinical decision-making and prognosis judgment.

Methods And Materials
Data download and preprocessing RNA sequencing data and clinical follow-up information for LUAD were downloaded from the TCGA database. The GSE31210 chip dataset containing survival time information was downloaded from the GEO database.
The TCGA-LUAD RNA sequencing data were preprocessed as follows: 1) the samples with no clinical follow-up information were removed; 2) the samples with no survival time information were removed; 3) the samples with no status information were removed; 4) the Ensemble IDs was transformed into gene symbols; and 5) the median expression of multiple gene symbols was obtained.
The GEO data were preprocessed as follows: 1) the samples with no clinical follow-up information were removed; 2) the samples with no survival time or status information were removed; 3) the probes were converted into gene symbols; 4) the probes were mapped to multiple genes, and the probes were deleted; and 5) the median expression of multiple gene symbols was obtained.
After preprocessing, there were 500 TCGA-LUAD samples and 126 GSE31210 dataset samples. The clinical statistics of the samples can be found in Supplementary Table 2.

Consistent clustering
The expression levels of 97 invasion-related genes were extracted from the TCGA expression pro les, and genes related to LUAD prognosis were obtained by univariate Cox analysis using the coxph function in R (p < 0.01). ConsensusClusterPlus (V1.48.0) was used to cluster the samples consistently according to signi cant genes from the univariate Cox analysis (parameters: reps = 100, pItem = 0.8, pFeature = 1, distance = Minkowski). Pam and Minkowski distances were used as the clustering algorithm and distance measure, respectively.

Identi cation of differentially expressed genes
The DEGs of different molecular subtypes were calculated by using the limma package in R (Ritchie et al.,, 2015). The DEGs were ltered according to the threshold of FDR < 0.01 and | log2fc | > 1, and then volcano maps were plotted.

GO and KEGG enrichment analyses
The results of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses using the DAVID database showed that p < 0.05 was statistically signi cant. The results were visualized by using the ggplot2 package in R (V3.5.3).

Calculation of immune scores
Stromal, immune, and estimate scores were calculated by using the ESTIMATE package in R. Ten immune cells were evaluated by using MCPcounter, and 28 immune cells were evaluated by using singlesample gene set enrichment analysis (ssGSEA) with the GSVA package in R (Charoentong et al.,, 2017).

Construction of prognostic risk model
The 500 samples in the TCGA dataset were divided into training and validation cohorts. To avoid the in uence of random assignment bias on the stability of subsequent modeling, all samples were put back into the random grouping 100 times in advance, and the group sampling was carried out according to the training cohort-to-validation cohort ratio of 1:1. The nal training cohort contained 250 samples, and the nal validation cohort contained 250 samples.

Univariate and multivariate Cox regression analyses
The survival coxph function in R was used to analyze the DEGs among the molecular subtypes and the survival data through univariate Cox proportional hazard regression. We selected p < 0.001 as the lter threshold, and genes related to prognosis were obtained. The R package glmnet was used to perform Lasso regression on the DEGs and prognosis-related genes to compress the risk model to reduce the number of genes (Kukreja, 2006). The step method in the stats package in R starts from the most complex model and reduces the number of parameters by deleting 1 variable in turn. The smaller the step value, the more superior the model. This means that the tting degree of the model is better with fewer parameters. The number of genes in the risk model was further reduced by using the Akaike information criterion (AIC) algorithm.

Gene set enrichment analysis
To observe the relationships between risk scores and biological functions in different samples, the gene expression pro les of these samples were selected for ssGSEA using the GSVA R software package. The Clinical expression of genes in the Oncomine Oncomine (http://www.oncomine.org) is a gene chip-based database and integrated data-mining platform. In this study, we set the screening criteria as follows: 1) cancer type: LUAD; 2) analysis type: cancer vs normal analysis; and 3) threshold criteria: p < 0.05, fold change > 1.5, and gene rank = top 10%.

Immunohistochemistry and protein level validation
The Human Protein Atlas (HPA) provides information on the tissue and cell distributions of 26,000 human proteins. We explored protein levels relating to the 5 genes in normal lung and tumor tissues.

Pan cancer analysis of hub genes
The transcriptome data of 33 cancers from the UCSC Xena database were downloaded, and normal patients were removed, the ggplot2 package were used to plot the prognostic forest of KRT6A, MELTF, IRX5, MS4A1, CRTAC1 genes in 33 cancer types. For LUAD, we analyzed the expression differences of these 5 genes among different immune subtypes. According to the expression levels of 5 genes, LUAD patients were divided into high and low groups, and the clusterPro lter package was used for GSEA analysis.

Study ow chart
To make the research easier for readers to understand, we drew a methodology ow chart ( Supplementary Fig. 1).

Molecular typing of LUAD based on invasion-related genes
Through univariate Cox analysis of the 97 invasion-related genes in the TCGA expression pro le, 19 genes were found to be associated with LUAD prognosis (p < 0.01; Supplementary Table 3). Consistent cluster analysis showed that the samples could be clustered at k = 3 (Fig. 1A). The expression levels of the invasion-related genes in the 3 subtypes are shown in Fig. 1B. These levels were different among the C1, C2, and C3 subtypes. Most of the genes were highly expressed in the C3 subtype and lowly expressed in the C2 subtype. We further analyzed the relationships between the 3 subtypes and prognosis. The results showed that there were signi cant differences between the 3 subtypes. The prognoses of patients with the C2 subtype were the best, and those of patients with the C3 subtype were the worst (log-rank p < 0.05; Fig. 1C, D).

Identi cation and functional analysis of DEGs among subtypes
The DEGs between C1, C2 and C3 were identi ed by using the limma package in R. The volcano map of the DEGs between C1 and C3 is shown in Supplementary Fig Clinical correlations of molecular subtypes and comparison with existing subtypes The distributions of different clinical features among the C1, C2, and C3 subtypes were compared. The results showed that there were signi cantly more C2 patients than C1 and C3 patients in the T1, N0, and Stage I samples, while there were signi cantly fewer C2 patients than C1 and C3 patients in the T2, N1, and Stage II samples ( Fig. 1E-G). The number of survivors in the C2 group was signi cantly higher than in the C1 and C3 groups (Fig. 1H). These results con rmed that patients with the C2 subtype had the best prognoses.
Previous studies have analyzed 33 cancers in the TCGA database. These studies clustered non-blood tumors into 6 immune subtypes based on the distributions of various features, such as macrophages, immune-in ltrating lymphocytes, transforming growth factor-beta response, interferon-γ response, and wound healing; these subtypes include C1 (wound healing), C2 (INF-γ predominance), C3 (in ammation), C4 (lymphocyte depletion), C5 (immunological silencing), and C6 (transforming growth factor-beta predominance), among which C1 and C6 have been associated with poor prognosis (Thorsson et al.,, 2018). By comparing the molecular subtypes with these immune subtypes, it was found that most LUAD patients in the TCGA dataset belonged to the C1, C2, and C3 immune subtypes (about 89.5%), and there were no patients with the C5 immune subtype in the LUAD TCGA dataset (Fig. 1I). By comparing the distributions of the molecular and immune subtypes, it was found that patients with the C3 molecular subtype showed the highest proportion of the C2 immune subtype, reaching 54% (Fig. 1J). The proportion of the C2 immune subtype among the C2 molecular subtype was lower, and the proportion of the C3 immune subtype was higher than that of the C3 molecular subtype. The survival curve analysis results showed that there were signi cant differences in OS among the immune subtypes (p < 0.05; Fig. 1K). These results suggested that the prognosis of the C3 immune subtype was better than that of the C2 immune subtype.

Comparison of immune scores among subtypes
The relationships between the molecular subtypes of the TCGA dataset and immune scores were identi ed by using the ESTIMATE software package in R, MCPcounter, and the ssGSEA method in the GSVA package. The results showed that there were signi cant differences in immune scores among the different subtypes ( Fig. 2A-C). The heat map of immune scores among the 3 subtypes is shown in Fig. 2D.

Construction of risk model
The 500 samples in the TCGA dataset were grouped according to the training set-to-validation set ratio of 1:1, and the univariate Cox proportional hazards regression model method was used to evaluate the 669 DEGs between the molecular subtypes. A total of 29 genes were found to be associated with prognosis (Supplementary Table 4). Lasso regression was used to further compress the 29 genes. The trajectory of each independent variable is shown in Supplementary Fig. 3A. As lambda decreased, the number of independent variable coe cients tending to 0 increased. We used 10-fold cross-validation to build the model and analyzed the con dence interval (CI) under each lambda ( Supplementary Fig. 3B). When lambda equaled 0.003518527, the model was considered optimal, and 12 genes (KRT6A, MELTF, IL20RB, PLEK2, LOXL2, IRX5, SLC16A11, FAM189A2, ITGA6, PKP2, MS4A1, CRTAC1) were selected as target genes. The AIC algorithm was used to further compress these 12 target genes, and 5 target genes (KRT6A, MELTF, IRX5, MS4A1, CRTAC1) were nally obtained.
Risk scores were further converted into Z-scores. Samples with scores > 0 were divided into the high-risk group, and samples with scores < 0 were divided into the low-risk group. A total of 119 samples were divided into the high-risk group, and 131 samples were divided into the low-risk group. The survival curve results showed that there was a signi cant difference in prognosis between the 2 groups (p < 0.0001; Fig. 3A).
The risk score distributions of the samples were calculated according to expression levels and then plotted (Fig. 3B). The survival times of the samples with high risk scores were signi cantly shorter than those of the samples with low risk scores, suggesting that samples with high risk scores had worse prognoses. The timeROC software package in R was used to analyze the prognostic classi cation e ciency of risk scores. The model had a large area under the curve (AUC) at 1, 3, and 5 years; the 1-year AUC was 0.72, and the 5-year AUC was 0.74 (Fig. 3C).

Veri cation of risk model robustness in internal and external datasets
The robustness of the model was veri ed by the internal dataset (TCGA validation set and all datasets) and external dataset (GSE31210 dataset). In all datasets, the same model and coe cients as those in the training set were used. The survival curve showed signi cant differences between the high-and low-risk groups in the veri cation set and all datasets ( Fig. 3D and Fig. 3G).The risk score of each sample was calculated according to gene expression, risk score distributions were plotted in TCGA internal validation set and all datasets in Fig. 3E and Fig. 3H. The classi cation e ciencies of prognosis prediction at 1, 3, and 5 years in the TCGA testing cohort and entire TCGA cohort are shown in Fig. 3F and Fig. 3I, respectively. The 1-year AUC reached 0.73 in both datasets.
Z-score transformation of risk scores was performed in GSE31210 dataset. Samples with risk scores > 0 after Z-score transformation were divided into the high-risk group, and samples with risk scores < 0 after Z-score transformation were divided into the low-risk group. This resulted in 94 samples in the high-risk group and 132 samples in the low-risk group. The survival curve showed a signi cant difference between the high-and low-risk groups (p = 0.0028; Fig. 3J).
The risk score distribution of the samples in the GSE31210 cohort was consistent with that of the training set (Fig. 3K). Receiver operating characteristic (ROC) analysis showed that the 1-year AUC reached 0.79 (Fig. 3L).
Relationships among risk scores, clinical features, and molecular subtypes Survival analysis of different clinical subgroups was carried out based on risk scores. The results showed that the 5-gene signature could signi cantly distinguish age, sex, tumor/node/metastasis (TNM) stage, stage, recurrence, chemotherapy, and smoking status (current smoker, never smoked, reformed smoker) samples into high-and low-risk groups (p < 0.05; Fig. 4A-T). The 5-gene signature could not divide the M1 samples into 2 groups with signi cant prognostic difference, which may be due to the small number of M1 samples (p > 0.05; Fig. 4J). In general, our model could be used as a prognostic marker for different clinical subgroups if the sample size was appropriate.
Risk score distributions in terms of different clinical features were also assessed. The results showed that there were no signi cant differences in terms of age or stage (p > 0.05; Fig. 5B Fig. 5A, C, D, F, G). We also compared risk scores among the 3 subtypes (C1, C2, C3). The results showed that the risk scores of C3 subtype samples with poor prognosis were signi cantly higher than those of C2 subtype samples with good prognosis (Fig. 5H), which further suggested that high risk scores were associated with poor survival outcomes.

Relationships between risk scores and pathways
To observe the relationships between the risk scores and biological functions of different samples, GSEA was used to calculate the scores of different functions for each sample as well as correlations between these functions and risk scores. Correlation scores > 0.4 were considered to show positive correlations. Nine pathways were positively correlated with risk scores, and 10 pathways were negatively correlated with risk scores (Supplementary Fig. 4A Fig. 5A, B). A nomogram was constructed based on the signi cant variables of multiple factors (Fig. 6A), and the results showed that risk scores had the greatest effect on survival prediction, suggesting that the 5-gene signature was a good predictor of survival. Furthermore, by using calibration curves to evaluate the accuracy of the model (Fig. 6B), it was observed that the calibration curves at 1, 3, and 5 years were close to the standard curve, suggesting that the model had good prediction performance. Moreover, decision curve analysis was used to evaluate the model's reliability (Fig. 6C), and the results showed that the bene ts of risk scores and the nomogram were signi cantly higher than those of the extreme curve, and the effect of the nomogram was higher than the effects of T stage, N stage, and risk scores, which were close to the extreme curve, suggesting that risk scores and the nomogram had good clinical applicability.

Comparison of risk model with other models
To prove the superiority of our model, 3 risk models, including an 8-gene signature ( To make the models comparable, we calculated the risk score of each LUAD sample in the TCGA dataset by the same method, and we evaluated the ROC curve of each model. Z-score transformation of risk scores was performed. The samples with risk scores > 0 after Z-score transformation were divided into the high-risk group, and samples with risk scores < 0 after Z-score transformation were divided into the low-risk group. The survival curves were plotted. The results showed that all 3 models could signi cantly classify the high-and low-risk groups into prognostic categories (Fig. 6E, G, I). However, the AUCs of the ROC curves of the 3 models were lower than those of the 5-gene signature at 1, 3, and 5 years in the TCGA dataset (Fig. 6D, F, H). These results showed that our model had good clinical predictive power.

Expression of 5 genes in 33 pan-cancers
The box diagram showed that MS4A1 was signi cantly highly expressed in LUAD, HNSC, and kidney renal clear cell carcinoma, while in bladder carcinoma, colon adenocarcinoma, KICH, and READ tumors, MS4A1 was signi cantly lowly expressed (Fig. 7A). Compared with normal samples, KRT6A and MELTF showed signi cantly high expression in most cancer types, including LUAD (Fig. 7B, C), while CRTAC1 was expressed lowly in most cancer types, including LUAD (Fig. 7D, IRX5 was signi cantly highly expressed in breast cancer, CHOL, colon adenocarcinoma, kidney renal papillary cell carcinoma, liver hepatocellular carcinoma, and READ tumors, while in KICH, kidney renal clear cell carcinoma, lung squamous cell carcinoma, and PRAD, IRX5 was signi cantly lowly expressed (Fig. 7E).

Analysis of prognosis of genes in pan-cancer
We found that the MS4A1 gene is a protective gene in more than half of the tumors, which means, patients with high expression of MS4A1 have a better prognosis. KRT6A and MELTF genes are risk genes in most tumors including LUAD. CRTAC1 and IRX5 genes are signi cant protective genes in LUAD, while CRTAC1 gene is a high-risk gene in LUSC (Fig. 8A-E).

Clinical validation and gene set enrichment analysis of 5 genes
The results showed that in the Oncomine database, CRAT1 was lowly expressed in 12 LUAD studies, MS4A1 and MELTF were highly expressed in 1 LUAD study, KRT6A was highly expressed in 2 LUAD studies, and IRX5 showed no signi cant expression in any LUAD study (Fig. 9A-E). In The Human Protein Atlas database, the immunochemistry results of the 5 genes were analyzed, but only 4 genes (MS4A1, KRT6A, MELTF, CRTAC1) had protein expression data. The results showed that the expression levels of MS4A1, KRT6A, and MELTF in tumor tissues were higher than in normal tissues, while CRTAC1 expression in tumor tissues was lower than in normal tissues (Fig. 9F-I). At the same time, we found that the expression levels of 5 genes were signi cantly expressed in pan-cancer immune subtypes (Fig. 9J). Gene set enrichment analysis showed that MS4A1, KRT6A, and CRAT1 genes were both enriched in the HALLMARK_IL2_STAT5_SIGNALING pathway, and IRX5 and MELTF gene were both enriched in the HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION pathway. In addition, the KRT6A gene and CRAT1 gene were also enriched in the HALLMARK_KRAS_HALLMARK_UPRAS pathway ( Fig. 9K-O).

Discussion
In this study, we rst genotyped the 500 LUAD samples of the TCGA dataset based on 97 invasion-related genes, and we divided these samples into 3 subtypes, among which there were signi cant differences in prognosis. The C3 subtype had poor prognosis, and this was closely related to the pathways of tumorigenesis and development. A total of 669 DEGs were identi ed, and 5 target genes, including KRT6A, MELTF, IRX5, MS4A1, and CRTAC1, were obtained by using Lasso regression and the AIC algorithm. secreted by chondrocytes is the glycosylated extracellular matrix molecule of human articular cartilage (Steck et al.,, 2007). At present, there have been no studies of MELTF and CRTAC1 in LUAD, but such studies may provide new ndings for prognostic markers of LUAD. We plan to further verify the mechanism of MELTF and CRTAC1 in LUAD.
We Z-scored the risk scores and divided the samples whose risk scores were > 0 into the high-risk group and those whose risk scores were < 0 into the low-risk group. The results showed that the high-risk score samples had signi cantly shorter survival times than the low-risk score samples. By analyzing the relationships between risk scores and pathways, we found that the tumor-related pathways of KEGG_P53_SIGNALING_PATHWAY, KEGG_CELL_CYCLE, KEGG_MISMATCH_REPAIR, and KEGG_DNA_REPLICATION increased with increased risk scores. The main ways to repair DNA include base excision repair, mismatch repair, nucleotide excision repair, and homologous recombination repair. According to the signi cant clinical characteristics in the univariate and multivariate regression analyses, T stage, N stage, and risk scores were used to construct the nomogram. Calibration and decision curve analysis curves suggested that the model had good prediction performance. Both the internal and external datasets also con rmed that the 5-gene signature was robust, and it could perform well in the independent dataset (GSE31210). Our model performed better than other models of LUAD. One advantage of our model is that targeted sequencing based on particular genes reduces health care costs signi cantly compared to whole-genome sequencing. Second, we selected invasion-related genes as the target genes, which is very important for the early diagnosis and prognosis prediction of LUAD. More importantly, in the routine clinical diagnosis and treatment process, patients' treatment plans and prognoses are largely dependent on pathological stage, the determination of which currently depends on the anatomic location of LUAD, so the biological heterogeneity of patients with LUAD is not currently being fully re ected. The nomogram we constructed can make up for this de ciency and provide a basis for the individualized treatment of patients with LUAD.
Gene expression was explored by using the Oncomine, GEO, and HPA databases. The results showed that the expression levels of MS4A1 and KRT6A in tumor tissues were higher than in normal tissues, while CRTAC1 expression in tumor tissues was lower than in normal tissues.
Our study has some limitations. First, the population in the TCGA database is predominantly white and Black, and our results need to be validated in other racial groups. Second, the construction of the alignment map was done retrospectively, so our results need to be further validated in multicenter clinical trials and prospective studies. In the future, we will explore whether other regression modeling methods can further improve the prediction accuracy of the model.

Conclusion
In conclusion, we identi ed molecular subtypes of LUAD based on tumor invasion-related genes, and we developed a 5-gene signature prognostic hierarchical system. We recommend the use of this classi er as a molecular diagnostic test to assess the prognostic risk of LUAD.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and material
Publicly available datasets were analyzed in this study. This data can be found here: TCGA.

Competing interests
The authors declare that the research was conducted in the absence of any commercial or nancial relationships that could be construed as a potential con ict of interest.

Funding
Not applicable.

Authors' contributions
All authors contributed to the literature investigation, data collection, writing the manuscript, providing useful discussion of its content, and undertaking reviews or revising the manuscript before submission.         Expression of 5 genes in 33 pan-cancers The box diagram showed that MS4A1 was signi cantly highly expressed in LUAD, HNSC, and kidney renal clear cell carcinoma, while in bladder carcinoma, colon adenocarcinoma, KICH, and READ tumors, MS4A1 was signi cantly lowly expressed ( Figure 7A).
Compared with normal samples, KRT6A and MELTF showed signi cantly high expression in most cancer types, including LUAD ( Figure 7B, C), while CRTAC1 was expressed lowly in most cancer types, including LUAD ( Figure 7D, IRX5 was signi cantly highly expressed in breast cancer, CHOL, colon adenocarcinoma, kidney renal papillary cell carcinoma, liver hepatocellular carcinoma, and READ tumors, while in KICH, kidney renal clear cell carcinoma, lung squamous cell carcinoma, and PRAD, IRX5 was signi cantly lowly expressed ( Figure 7E).