Clinical Prognosis of Keratin Family Genes in Patients with LUAD: A Study Based on the TCGA Database

Background: Malignant tumors are a leading cause of death worldwide, and lung cancer accounts for the most cancer deaths. The incidence and mortality of lung cancer are increasing year by year. This study aims to elucidate the potential prognostic value of keratin (KRT) gene family members in patients with lung adenocarcinoma (LUAD). Materials and methods: RNA sequencing data for LUAD tumors and paired normal tissues were obtained from The Cancer Genome Atlas (TCGA) database. Multivariate Cox proportional hazards regression analysis was used to evaluate the prognostic value of KRT family member genes. The screened variables were then used to construct a risk score. Time-dependent ROC curves were used to evaluate the predictions. Finally, nomograms were used to assess individualized prognostic risk. Results: Among the differentially expressed genes, 14 KRT genes were significantly dysregulated between LUAD tumors and adjacent non-cancerous tissues. Receiver operating characteristic (ROC) curve analysis confirmed that these 14 KRT genes can serve as potential diagnostic markers for lung adenocarcinoma. Multivariate Cox regression analysis showed that six KRT genes were related to the prognosis of lung cancer. After variable screening with the multivariate Cox regression model, the final results showed that KRT8 and KRT6A were independent risk factors for the prognosis of lung adenocarcinoma. Conclusion: KRT8 and KRT6A can be used as prognostic markers of LUAD. High expression of KRT8 and KRT6A suggests a poor prognosis for LUAD patients.


Introduction
Malignant tumors are the leading cause of death worldwide, and lung cancer causes the most deaths among them (1,2). According to global statistics for 2020, there were about 2,206,771 new cases of lung cancer, ranking second in incidence among all cancers, and 1,796,144 deaths from lung cancer, ranking first in mortality among all cancers. Its incidence and mortality are increasing year by year (3). Lung cancer is broadly divided into non-small cell lung cancer and small cell lung cancer, and the two main pathological types of non-small cell lung cancer are adenocarcinoma and squamous cell carcinoma. In recent years, the incidence of lung adenocarcinoma (LUAD) has increased year by year and has exceeded that of lung squamous cell carcinoma (LUSC). The occurrence and development of LUAD are closely related to genetic factors, the environment and other external factors. With the rise of next-generation sequencing (NGS) technology, more valuable information can be obtained at a deeper level of molecular biology, which provides more clues for the diagnosis and prognosis of lung cancer and for the discovery of new therapeutic targets.
Whole-genome sequencing data combined with bioinformatics analysis is an effective way to explore underlying molecular mechanisms. The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov) is an open-access project for mapping 38 human cancer genomes using large-scale genome sequencing, and it includes complete RNA sequencing (RNA-seq) data for LUAD (1,4,5). Many studies have reported that keratin (KRT) family member genes are abnormally regulated in a variety of cancers and can be used as biomarkers for cancer diagnosis and prognosis (6-8). Keratins in particular may be new markers for the occurrence and development of lung adenocarcinoma. Keratins belong to the intermediate filament (IF) protein family and are expressed in all epithelial cells. As cytoskeletal proteins, IF proteins play an important role in maintaining the structural integrity of cells and tissues. They form chemically stable, long, unbranched filaments with a diameter of about 10 nm. Keratins are classified into type I acidic keratins and type II basic keratins, which provide key structural support under mechanical and non-mechanical stress. The human keratin gene family consists of 54 genes with different functions (6,9,10). Previous studies have shown that multiple KRT family genes can be used as biomarkers for the diagnosis and prognosis of epithelial malignancies, such as colorectal cancer, breast cancer, lung cancer, intrahepatic cholangiocarcinoma, pancreatic cancer and gastric cancer (7,8,11,12). Previous studies have also reported that some KRT family genes can be used as prognostic indicators of LUAD (13-15). However, to the best of our knowledge, no comprehensive and systematic analysis of KRT family genes in LUAD has been reported, and their potential molecular mechanism still needs further study. The purpose of this study is to elucidate the potential molecular mechanism of KRT family member genes and determine their prognostic value in LUAD.

Results
Screening and co-expression analysis of differential genes
RNA sequencing data of LUAD patients were downloaded from the TCGA database, including 535 tumor tissue samples and 59 adjacent or normal tissue samples. A total of 501 LUAD patients had complete clinical outcome parameters and RNA-seq data and were included in the further analysis.
The differentially expressed genes belonging to the KRT gene family were then screened out. A total of 14 KRT genes were found to be significantly dysregulated between LUAD tumors and adjacent non-cancerous tissues. The fold changes in expression and the heat maps are shown in Fig. 1 and Fig. 2A. One KRT gene was down-regulated, while the other genes were up-regulated. The specific expression levels are shown in Fig. 3. Co-expression analysis showed co-expression relationships among these genes (Fig. 2B).

Prognostic screening of KRT genes
Univariate survival analysis of clinical parameters against overall survival (OS) time in patients with lung adenocarcinoma showed that tumor stage was significantly correlated with OS. The 14 KRT genes were associated with the diagnosis of lung adenocarcinoma (Table 1). Multivariate Cox regression analysis showed that six KRT genes were correlated with the prognosis of lung cancer after adjustment for age, gender and stage (Table 2: KRT86, KRT81, KRT8, KRT18, KRT19 and KRT6A). ROC curve analysis confirmed that these 14 KRT genes can be used as potential diagnostic markers for lung adenocarcinoma (Fig. 4).
After the prediction model was established and the patients' risk scores were entered into a multivariate Cox regression, the risk score was found to be a significant independent risk factor (HR = 2.359, 95% CI 1.728-3.222, p < 0.001) (Fig. 5).
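As a reading aid, the hazard ratio reported for the risk score can be related back to the Cox coefficient through HR = exp(β), with a 95% CI of exp(β ± 1.96 × SE). The Python sketch below back-calculates β and SE from the reported values and rebuilds the interval as a consistency check. It assumes a Wald-type interval with z = 1.96, which the text does not state explicitly, so the derived β and SE are illustrative rather than values reported by the study.

```python
import math

# Reported values for the risk score (HR and 95% CI taken from the text).
hr, ci_low, ci_high = 2.359, 1.728, 3.222

# Assumption: Wald-type interval, CI = exp(beta +/- Z * SE).
Z = 1.96

# The Cox coefficient is the natural log of the hazard ratio.
beta = math.log(hr)

# The width of the interval on the log scale equals 2 * Z * SE.
se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z)

# Rebuild the interval from beta and SE to check internal consistency.
rebuilt_low = math.exp(beta - Z * se)
rebuilt_high = math.exp(beta + Z * se)

print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"rebuilt 95% CI = {rebuilt_low:.3f}-{rebuilt_high:.3f}")
```

If the rebuilt interval matches the reported one to about two decimal places, the reported HR sits at the geometric center of its interval, as expected for a Wald interval.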
Kaplan-Meier (K-M) curve analysis showed that patients with a high risk score had an increased risk of death (log-rank p = 0.004, adjusted HR = 1.378, 95% CI 1.013-1.875; Fig. 6A-D). The time-dependent ROC curve shows that the risk score has some predictive value for all-cause death in patients with lung adenocarcinoma: its AUC remains roughly stable at about 0.6 and does not change noticeably over time (Fig. 6E).

Stratification and joint effect analysis
The relationship between clinical parameters and the prognostic gene signature can be further studied through nomogram, stratification and joint effect analyses. For example, male patients older than 65 years with stage III disease are very likely to die of the disease, and their 1-year survival rate is almost zero. Nomograms constructed with risk scores and clinical LUAD parameters showed that prognostic markers based on KRT gene expression were more accurate than other parameters (Fig. 7).

Discussion
Keratins (KRT) constitute the majority of intermediate filament (IF) proteins. They are an important component of the cytoskeleton and participate in many cellular processes, including mitosis, differentiation and apoptosis. Keratins are expressed in all epithelial cells and play an important role in protecting epithelial cells from mechanical and non-mechanical damage. Keratins are divided into type I acidic keratins and type II basic keratins. KRT8 and KRT6A are type II keratins (7,9,16,17). A growing number of studies have shown that KRT family genes, including KRT8 and KRT6A, can be used as markers for tumor diagnosis and prognosis (7).
Our result is not surprising, because previous studies have confirmed the value of KRT genes in the diagnosis of lung adenocarcinoma (13,14,18). KRT8 is the main component of cytoskeletal intermediate filaments and is mainly expressed in epithelial tissues (13). Studies have found that KRT8 is expressed in most cancers, such as breast cancer (12), renal cell carcinoma (19), gastric cancer (11) and lung cancer (20). Compared with normal lung tissue, the expression of KRT8 was significantly increased in LUAD and LUSC, and LUAD patients with high KRT8 expression had significantly lower overall survival (OS) and recurrence-free survival (RFS) (13). Studies have shown that KRT8 is an important marker in the development of primordial germ cells (PGCs) and that KRT is necessary for PGC migration. Compared with germ cells in the gonads, migrating PGCs in mice express high levels of KRT8. Experiments have also shown that the lack of KRT8 severely impairs human PGC migration (21). Our study shows that KRT8 is highly correlated with the prognosis of lung cancer patients. Whether high expression of the KRT8 gene is related to the metastasis and differentiation of lung cancer cells in lung cancer tissues therefore needs further research.
In addition, KRT6A is an important component in the formation of the nail bed, filiform papillae and oral mucosal epithelium (18). KRT6A is expressed in non-keratinized stratified squamous epithelial cells. In the pseudostratified epithelium of the respiratory mucosa, KRT6A is highly up-regulated. In tumors, KRT6A is strongly expressed in squamous metaplasia at different sites, and low expression can occasionally be found in adenocarcinoma (6). Previous studies have found that KRT6A can be used as a marker for the origin of breast cancer (22). Mutation of KRT6A can cause glutamate to be replaced by lysine in KRT6A, resulting in pachyonychia congenita (23,24). KRT6A can also change the subtype of tumor-associated macrophages in pancreatic ductal adenocarcinoma by participating in the modification of tumor-associated macrophages (25). Some studies have also confirmed that high expression of KRT6A in LUAD can promote the proliferation, migration and colony formation of lung adenocarcinoma cells, and that KRT6A can be used as a prognostic marker of LUAD (18,26). However, this view is controversial. Xiao Jian and colleagues reached a different conclusion: their research suggests that KRT6A protein can inhibit the proliferation, migration and invasion of lung adenocarcinoma cells, and that high expression of KRT6A seems to be related to a good prognosis in lung adenocarcinoma patients (27). Studies have also shown that high expression of KRT6A in patients with triple-negative breast cancer after adjuvant chemotherapy suggests a better prognosis (28). Although this study is limited to cytological tests, without further animal or mechanistic research, it may suggest that KRT6A has a potential beneficial effect. We therefore hope that more mechanistic research will clarify the role of KRT6A in lung adenocarcinoma.
It is worth mentioning that this gene signature is newly proposed in this study. We found that the gene signature constructed from the combination of these two genes is valuable for the diagnosis of lung cancer. We also predicted the prognosis of patients according to their gene expression. However, our research has several limitations. First, the clinical information of many patients in the collected data is incomplete, including clinical examination results, drug dosage and treatment methods. Second, the function of the KRT8 and KRT6A genes and their mechanism in the occurrence and prognosis of lung adenocarcinoma have not been further clarified. Therefore, clinical data from larger samples and further basic experiments are needed to study the mechanism and verify these findings.

Conclusions
Our study concluded that KRT8 and KRT6A could serve as prognostic markers for LUAD, and that high expression of KRT8 and KRT6A suggests a worse prognosis for LUAD patients.

Data sources
The RNA sequencing data set and clinical data of patients were collected from the TCGA database (https://portal.gdc.cancer.gov/). The RNA sequencing data were imported with default settings, and all values were normalized to FPKM values. This study does not include any experiments conducted by any author on human participants or animals. Since all data sets included in the current study were downloaded from the TCGA database, and the acquisition and application of the data were in accordance with the relevant provisions of the TCGA database, no additional ethical review approval was required.
Screening and co-expression of KRT differential genes
R software (version 4.0.3) was used to screen the gene expression data for differential expression. The screening criteria were log2(FC) > 1 or log2(FC) < -1, with a mean FPKM > 1. The KRT differential genes were then selected from the differentially expressed genes. The heat map of differential expression was generated with the online tool Heatmapper (http://www.heatmapper.ca/expression/). The ggcorrplot package was used to further display the differentially expressed genes, and the selected KRT differentially expressed genes were analyzed for co-expression. A co-expression relationship was considered statistically significant at p < 0.05 by Pearson correlation test.
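The screening thresholds and the Pearson co-expression measure described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow. The fold change is computed here as a simple ratio of group means, which is an assumption because the text does not say which differential expression method was used. The gene names and FPKM values are hypothetical toy data, and the p-value of the Pearson test is omitted.

```python
import math
from statistics import mean

def passes_screen(tumor, normal, lfc_cutoff=1.0, min_mean_fpkm=1.0):
    """True if |log2(FC)| > lfc_cutoff and the overall mean FPKM > min_mean_fpkm."""
    tumor_mean, normal_mean = mean(tumor), mean(normal)
    overall_mean = mean(list(tumor) + list(normal))
    # Reject low-expression genes and avoid log2 of zero.
    if overall_mean <= min_mean_fpkm or tumor_mean <= 0 or normal_mean <= 0:
        return False
    # Assumption: fold change is the ratio of mean tumor to mean normal FPKM.
    log2_fc = math.log2(tumor_mean / normal_mean)
    return abs(log2_fc) > lfc_cutoff

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: gene -> (tumor FPKM values, normal FPKM values).
expr = {
    "GENE_UP": ([8.0, 10.0, 12.0], [2.0, 3.0, 1.0]),
    "GENE_DOWN": ([1.0, 2.0, 3.0], [8.0, 10.0, 6.0]),
    "GENE_FLAT": ([5.0, 6.0, 7.0], [5.0, 7.0, 6.0]),
    "GENE_LOW": ([0.4, 0.6, 0.5], [0.1, 0.1, 0.1]),
}

selected = [gene for gene, (t, n) in expr.items() if passes_screen(t, n)]
r = pearson_r(expr["GENE_UP"][0], expr["GENE_DOWN"][0])
print(selected, round(r, 3))
```

In this toy example the flat gene fails the fold-change threshold and the low-expression gene fails the mean FPKM threshold, so only the up- and down-regulated genes are retained.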

Prognostic and diagnostic value of KRT differential genes in lung adenocarcinoma
The screening criteria for clinical data were as follows: (1) the follow-up time must not be 0; (2) patients must have sequencing data from tumor tissue. If a patient had sequencing data from multiple tumor samples, one initially sampled, non-metastatic tumor sample was randomly retained for the prognostic analysis. The ROC curve module of Stata software was used to evaluate the diagnostic value of the KRT differential genes in lung adenocarcinoma. The survival package on the R platform was used to evaluate the prognostic value of KRT family members by multivariate Cox proportional hazards regression analysis. Age, gender and tumor stage were included as covariates, and p < 0.05 was considered statistically significant. Finally, the genes with a p value below 0.05 in this multivariate Cox analysis were entered together into a multivariate Cox proportional hazards regression analysis. After adjustment for age, gender and tumor stage, the variables were screened by a backward stepwise regression model based on the maximum likelihood method.
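The sample screening rules above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' procedure. It assumes that tumor samples are identified by the two-digit sample type code in the fourth field of the TCGA barcode, where "01" denotes a primary solid tumor. The patient identifiers, barcodes and follow-up times are hypothetical, and the fixed random seed is an added assumption to keep the random choice reproducible.

```python
import random
from collections import defaultdict

PRIMARY_TUMOR = "01"  # TCGA sample type code for a primary solid tumor

def sample_type(barcode):
    """Two-digit sample type code from the fourth barcode field."""
    return barcode.split("-")[3][:2]

def select_samples(follow_up_days, barcodes, seed=0):
    """Keep one primary tumor sample per patient with non-zero follow-up."""
    rng = random.Random(seed)  # assumption: seeded for reproducibility
    by_patient = defaultdict(list)
    for barcode in barcodes:
        if sample_type(barcode) == PRIMARY_TUMOR:
            patient = "-".join(barcode.split("-")[:3])
            by_patient[patient].append(barcode)
    chosen = {}
    for patient, days in follow_up_days.items():
        candidates = sorted(by_patient.get(patient, []))
        # Criterion 1: follow-up must not be 0 (missing values are also dropped).
        # Criterion 2: at least one primary tumor sample must exist.
        if days and days > 0 and candidates:
            chosen[patient] = rng.choice(candidates)
    return chosen

# Hypothetical toy inputs.
follow_up_days = {
    "TCGA-AA-0001": 420,
    "TCGA-AA-0002": 0,
    "TCGA-AA-0003": 900,
    "TCGA-AA-0004": 365,
}
barcodes = [
    "TCGA-AA-0001-01A", "TCGA-AA-0001-01B", "TCGA-AA-0001-11A",
    "TCGA-AA-0002-01A",
    "TCGA-AA-0003-11A",
    "TCGA-AA-0004-01A", "TCGA-AA-0004-06A",
]

selected = select_samples(follow_up_days, barcodes)
print(selected)
```

In this toy example the second patient is excluded for zero follow-up and the third for lacking a tumor sample, while the metastatic sample ("06") of the fourth patient is never a candidate.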

Construction of the prognostic gene signature model
The genes with p < 0.05 in multivariate Cox regression analysis were used to construct gene signatures.
The following formula was used: risk score = expression of KRT a × β a + expression of KRT b × β b + … + expression of KRT n × β n. Each regression coefficient (β) was obtained from the multivariate Cox proportional hazards regression analysis, so the risk score is a linear combination of gene expression values weighted by these coefficients. Patients were classified as low-risk or high-risk according to the median risk score. Time-dependent ROC curves were drawn with the survivalROC package (https://cran.r-project.org/web/packages/survivalROC/index.html) to evaluate the predictions. Finally, nomograms were used to assess the individual prognostic risk score.
Figure legend: Prognostic risk score model of the two KRT family genes in LUAD patients, who were divided into two groups. From top to bottom: (A) risk score; (B) distribution of patient survival status; (C) heat map of expression of the two prognostic KRT family genes between the low- and high-risk groups; (D) Kaplan-Meier curve for the high- and low-risk groups; (E) ROC curve for predicting survival of LUAD patients according to the risk score. AUC, area under the curve.
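The risk score construction and median split described in this section can be sketched in Python as follows. The coefficients and expression values are hypothetical placeholders, not the fitted values from this study. Assigning patients whose score equals the median to the low-risk group is also an assumption, since the text does not specify how ties are handled.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Linear combination of gene expression values weighted by Cox coefficients."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(scores.values())
    # Assumption: scores equal to the median are assigned to the low-risk group.
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}

# Hypothetical coefficients (placeholders, not the study's fitted values).
coefficients = {"KRT8": 0.2, "KRT6A": 0.1}

# Hypothetical expression values for four illustrative patients.
patients = {
    "P1": {"KRT8": 5.0, "KRT6A": 2.0},
    "P2": {"KRT8": 7.0, "KRT6A": 4.0},
    "P3": {"KRT8": 3.0, "KRT6A": 1.0},
    "P4": {"KRT8": 8.0, "KRT6A": 6.0},
}

scores = {p: risk_score(expr, coefficients) for p, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

With positive coefficients, as in this toy example, higher expression of either gene raises the risk score, which matches the direction of the reported association between high KRT8 and KRT6A expression and poorer prognosis.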