Development of a Coagulation-Related Genes Model for Prognostication, Immune Response, and Treatment Prediction in Lung Adenocarcinoma

doi:10.21203/rs.3.rs-3333569/v1

Background:

Lung adenocarcinoma (LUAD) is the most prevalent form of lung cancer worldwide, yet the diagnosis and prognosis of LUAD patients remain poor. Studies have shown that patients with tumors tend to have abnormal coagulation factors. Therefore, the objective of this study is to develop a biomolecular model focusing on coagulation-related factors in LUAD.

Methods:

In this study, we obtained gene expression and clinical information for LUAD patients from The Cancer Genome Atlas (TCGA) database and coagulation-related genes from The Molecular Signatures Database (MSigDB), and thereby identified differentially expressed coagulation-related genes. A predictive model was constructed through LASSO Cox regression. The risk score from the model was used to divide patients into a high risk set and a low risk set. Additionally, we verified the accuracy of the prognostic model through a range of methods. Finally, we applied the tumor immune dysfunction and exclusion (TIDE) algorithm to assess immune escape and immunotherapy response in relation to coagulation-related genes.

Results:

We developed a four-gene prognostic model to estimate the survival of patients with LUAD. High risk patients exhibited lower overall survival (OS) rates than low risk patients. Kaplan-Meier (K-M) curves, progression-free survival (PFS) curves, ROC curves, principal component analysis (PCA), and nomograms were used to verify the accuracy of the model. Furthermore, the combination of high risk and low tumor mutation burden (TMB) was associated with poorer survival in patients with LUAD. TIDE analysis revealed a higher likelihood of immune evasion in individuals classified as high risk.

Conclusion:

The prognosis model can accurately predict the prognosis of LUAD patients and provide ideas for future immunotherapy.

Biological sciences/Cancer

Biological sciences/Computational biology and bioinformatics

Biological sciences/Immunology

Health sciences/Oncology

Coagulation

LUAD

TCGA

immunotherapy

risk score

tumor immune microenvironment

The prognosis of lung adenocarcinoma patients is very poor
Coagulation genes are associated with tumorigenesis and progression
Construct coagulation-related gene models to predict the occurrence and prognosis of lung adenocarcinoma patients
Gene mutation rate and drug sensitivity analysis were carried out for patients in high-risk and low-risk groups

Lung cancer is a prevalent malignancy and the foremost cause of cancer-related mortality worldwide (1). Among its histological subtypes, lung adenocarcinoma (LUAD) currently exhibits the highest incidence and prevalence. Despite significant advances in surgery, chemotherapy, and targeted therapy for LUAD, the prognosis of patients with this condition remains suboptimal (2). Hence, there is a pressing need to improve the management of LUAD through rational treatment strategies. Tumor risk score signatures have recently emerged as a non-invasive approach for assessing patient survival and predicting prognosis, and they are progressively finding applications in clinical practice (3). Consequently, the development of key gene models is of considerable importance for the diagnosis and treatment of lung adenocarcinoma. The coagulation system serves as an intrinsic defense mechanism that can be activated via either the extrinsic pathway (tissue factor pathway) or the intrinsic pathway. Notably, tumor cells have been observed to express procoagulant factors, including tissue factor, initiating coagulation cascades that culminate in thrombin production (4). Patients with tumors typically exhibit a hypercoagulable state (5). Extensive experimental evidence supports the notion that individuals with malignancies often experience chronic hypercoagulation and hyperfibrinolysis (6). Cancer patients frequently exhibit various coagulation abnormalities, which underlie the heightened risk of thrombosis and bleeding (7). One study reported that cancer patients, particularly those with a favorable prognosis, survived significantly longer when given anticoagulant therapy (8).
Several biomarkers associated with coagulation disorders have been linked to prognosis in diverse cancer types (9-11). Furthermore, studies have highlighted the substantial involvement of the coagulation cascade in shaping the tumor immune microenvironment (TME) (12). Consequently, the impact of coagulation on tumor biology has become a topic of considerable research interest. Nonetheless, the precise role of coagulation in LUAD remains inadequately elucidated. This study aims to construct a gene model incorporating coagulation-related genes specific to lung adenocarcinoma, with the primary objective of offering novel insights into the diagnosis and prognosis of patients with this condition. To achieve this, we identified coagulation-related genes exhibiting significant differential expression through differential gene analysis. Subsequently, a prediction model comprising four genes (MMP1, MMP10, CTSV, F2) was constructed using univariate Cox regression analysis and LASSO analysis. Patients were stratified into high risk and low risk groups based on their risk scores, revealing substantial differences in tumor prognosis, immune cell infiltration, and response to immunotherapy between the two groups. Collectively, these findings support the potential use of coagulation-related genes as a novel avenue for diagnosing and treating lung adenocarcinoma. The flowchart illustrating the methodology employed in this study is depicted in Fig 1.

Data collection

Transcriptomic data for lung adenocarcinoma and the related clinical data were downloaded from the TCGA data portal (https://gdc-portal.nci.nih.gov/). The dataset included a total of 600 tumor and normal samples. After excluding samples with missing prognostic information or with zero or negative follow-up values, 507 samples were retained for follow-up studies.

Acquisition of Differentially Expressed Coagulation-Related Genes in Lung Adenocarcinoma

We acquired a total of 138 coagulation-related genes (CRGs) from The Molecular Signatures Database (MSigDB). To identify differentially expressed genes (DEGs), we standardized the data using the limma package and compared normal lung samples with samples from LUAD patients in the TCGA database, applying thresholds of |log2FC| > 1.5 and adjusted P < 0.05. The DEGs were visualized as volcano plots using the "ggplot2" package. The DEGs of LUAD were then intersected with the CRGs to obtain the differentially expressed coagulation-related genes of LUAD.
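The thresholding and intersection steps described above can be sketched in a few lines. This is only an illustrative sketch in Python rather than the R workflow the study used: the gene table, its fold changes and adjusted P values, and the small stand-in gene set are all invented for the example, and `select_degs` is a hypothetical helper name.

```python
# Hypothetical differential-expression results: gene -> (log2 fold change, adjusted P).
# These numbers are made up for illustration and are not taken from the study.
deg_table = {
    "MMP1": (4.2, 1e-30),
    "F2": (2.1, 3e-8),
    "DPP4": (-1.8, 2e-6),
    "GAPDH": (0.2, 0.4),
    "SFTPC": (-3.5, 1e-20),
}

# Stand-in for the 138-gene coagulation set (hypothetical subset).
crg_set = {"MMP1", "F2", "DPP4", "PLAT"}

LOG2FC_CUTOFF = 1.5   # |log2FC| > 1.5, as stated in the text
ADJ_P_CUTOFF = 0.05   # adjusted P < 0.05, as stated in the text


def select_degs(table, lfc_cutoff=LOG2FC_CUTOFF, p_cutoff=ADJ_P_CUTOFF):
    """Return the genes passing both the fold-change and adjusted-P thresholds."""
    return {
        gene
        for gene, (log2fc, adj_p) in table.items()
        if abs(log2fc) > lfc_cutoff and adj_p < p_cutoff
    }


degs = select_degs(deg_table)
coag_degs = sorted(degs & crg_set)                     # DEGs that are also CRGs
up = [g for g in coag_degs if deg_table[g][0] > 0]     # higher in tumor
down = [g for g in coag_degs if deg_table[g][0] < 0]   # lower in tumor
```

Keeping the thresholds as named constants makes it easy to see how sensitive the final gene list is to the chosen cutoffs.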

Enrichment Analysis

After identifying the differentially expressed coagulation genes, we conducted an enrichment analysis using the "clusterProfiler" R package. The analysis covered the three Gene Ontology (GO) categories, namely biological process (BP), cellular component (CC), and molecular function (MF), as well as Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. This analysis aimed to explore the underlying molecular mechanisms associated with these coagulation-related genes. Furthermore, we performed the same enrichment analysis on the model genes using GO, KEGG, and the single-sample Gene Set Enrichment Analysis (ssGSEA) algorithm. The analysis was conducted separately for the high risk and low risk groups. Statistical significance was defined as P and FDR values < 0.05.

Establishment of a Coagulation-Related Gene Risk Model

We identified coagulation-associated genes significantly associated with OS (p < 0.05) using the survival R package. This analysis involved univariate Cox regression and calculation of the hazard ratio (HR). We randomly divided the TCGA data into a train set (n = 254) and a test set (n = 253). Next, we used the glmnet package to further narrow down the range of genes through LASSO with Cox regression. The risk model was then constructed using the train set, and we validated it using both the total set and the test set. The risk score formula is presented below:

$$\text{Risk score} = \sum_{i=1}^{n} \text{Coef}(i) \times \text{Expr}(i)$$

Coef(i) and Expr(i) in the formula represent the regression coefficient from the multivariate Cox regression analysis for each mRNA and the normalized expression level of each mRNA, respectively, and n is the number of genes in the model. Based on the risk score, we categorized the total set, test set, and train set into high risk and low risk groups. To assess the difference in overall survival between the high risk and low risk groups in the train set, we used the "survival" and "survminer" packages to generate K-M curves. Next, we used the pheatmap package to visualize the expression patterns of the model genes in the high risk and low risk groups. Additionally, we examined the protein expression of the four CRGs in both normal and LUAD tissues using immunohistochemistry (IHC) data from the Human Protein Atlas database (HPA, https://www.proteinatlas.org/). Finally, we applied the "ggplot2" package to conduct principal component analysis (PCA) and assessed the accuracy and feasibility of the model by analyzing how well the high risk and low risk groups separated.

The Construction of the Nomogram

Based on TNM classification, sex, and stage, we used the "rms" package to construct a nomogram survival model. The risk score, evaluated by univariate and multivariate Cox regression analysis, was included as an independent factor. To assess the accuracy of the nomogram, we used calibration curves. By incorporating the risk score and clinical data of lung adenocarcinoma patients, we aimed to predict the 1-year, 3-year, and 5-year survival rates for diverse patient groups.

Immune-Related Analysis and Tumor Mutation Burden Analysis

In R, we conducted a tumor microenvironment analysis based on the OS of LUAD patients using the limma and estimate packages. Additionally, we used the reshape2 and ggpubr packages to calculate the differences between high risk and low risk groups within the overall sample. To assess the variations in immune function among LUAD patients belonging to the high risk and low risk groups, we used CIBERSORT. The results were visualized using the pheatmap package, and the relationship between risk score and TMB was compared using the maftools package. Furthermore, we used the survival package to evaluate the association between TMB and patient survival. Statistical significance was defined as a p-value less than 0.05.

Cell culture and Real-time quantitative PCR (RT-qPCR)

Human LUAD cells (A549, H1299) and human normal bronchial epithelial cells (BEAS-2B) were purchased from Pricella Company in Wuhan, China. Cells were cultured in RPMI 1640 medium supplemented with 10% Gibco fetal bovine serum and 1% penicillin mixture. The cells were maintained in sterile conditions with a 5% CO2 atmosphere at 37°C. The medium was changed every 2-3 days.

RNA was extracted from both normal lung cells and LUAD cells using the Trizol method and then converted into cDNA using reverse transcription kits. The mRNA expression of coagulation-related genes (CRGs) was compared by RT-qPCR. The SYBR Green qPCR Master Mix (No ROX) was used for the RT-qPCR analysis according to the manufacturer's instructions. ACTIN was used as the internal reference gene. The primer pairs are listed in Table 1.

Identification of Coagulation-Related Genes in LUAD Patients

Figure 1 presents the flowchart outlining the entire study. We obtained gene expression data, clinical information, and mutation data for lung adenocarcinoma patients and normal tissues from the TCGA database. By intersecting the 138 coagulation-related genes with the lung adenocarcinoma genes in the TCGA database and filtering on gene expression levels, we identified 51 DEGs (FDR < 0.05, |log2FC| > 1.5), comprising 37 upregulated and 14 downregulated genes. The expression patterns of these differential genes in normal tissue and LUAD tissue are presented in the heatmap and volcano plot (Fig 2a and 2b). Finally, all of these DEGs in LUAD were CRGs (Fig 2c).

Enrichment Analysis of Coagulation-Related Genes

To elucidate the underlying mechanisms associated with the identified 51 genes, we conducted Gene Ontology (GO) analysis. The results, depicted in a bar plot, demonstrated that coagulation-related genes are primarily enriched in processes such as blood coagulation, coagulation, hemostasis, wound healing, and regulation of body fluid levels. In the Cellular Component (CC) category, these genes showed enrichment in collagen-containing extracellular matrix, secretory granule lumen, cytoplasmic vesicle lumen, and vesicle lumen. In terms of Molecular Function (MF), they were mainly associated with serine-type endopeptidase activity, serine-type peptidase activity, serine hydrolase activity, and endopeptidase activity (Fig 2d). Subsequently, we performed KEGG analysis, which revealed that these coagulation-related genes were predominantly enriched in complement and coagulation cascades as well as in Coronavirus disease - COVID-19 pathways (Fig 2e). Collectively, these findings support the association between the coagulation process and the development of lung adenocarcinoma. Moreover, our risk model, based on coagulation genes, demonstrates a relationship with coagulation metabolism.

Construction of a Coagulation Metabolism-Related 4-Gene Risk Model

Through expression difference analysis, we identified 51 genes showing differential expression. Univariate Cox analysis (p < 0.05) was performed on these 51 genes, revealing that 13 genes were significantly associated with overall survival (OS). Among them, 12 genes (MMP1, MMP10, CTSV, F2, FGA, KLK8, GDA, HRG, HNF4A, F2RL2, PROZ, APOC3) were identified as high risk genes, while one gene (DPP4) was classified as a low risk gene (Fig 3a). Subsequently, using the glmnet package in R, we conducted LASSO regression analysis on these 13 genes in the training set to identify the most informative genes for prognostic purposes. As a result, a 4-gene model consisting of MMP1, MMP10, CTSV, and F2 was obtained (Fig 3b, c). The risk score formula for this model is as follows: risk score = (0.0837569167784308 * MMP1) + (0.0878985239886888 * MMP10) + (0.159310542027166 * CTSV) + (0.258137401792528 * F2).
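As a worked illustration of the risk score formula above, the weighted sum can be written directly. The coefficients are the ones quoted in the text; the expression values in `example_patient` are invented, and `risk_score` is a hypothetical helper name, not a function from the study's R code.

```python
# Coefficients of the four-gene model as quoted in the text.
COEF = {
    "MMP1": 0.0837569167784308,
    "MMP10": 0.0878985239886888,
    "CTSV": 0.159310542027166,
    "F2": 0.258137401792528,
}


def risk_score(expression):
    """Weighted sum of normalized expression values over the four model genes."""
    return sum(COEF[gene] * expression[gene] for gene in COEF)


# Made-up normalized expression values for a single hypothetical patient.
example_patient = {"MMP1": 2.0, "MMP10": 1.0, "CTSV": 3.0, "F2": 0.5}
score = risk_score(example_patient)
print(f"risk score = {score:.4f}")
```

Because all four coefficients are positive, higher expression of any model gene can only raise the score, which is consistent with their description as high risk genes.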

We randomly divided the patients with lung adenocarcinoma from the TCGA database into a train set and a test set, and analyzed them alongside the total set. Using the risk score formula, we categorized the patients in each set into high risk and low risk groups. In the total set, there were 253 patients in the high risk group and 254 patients in the low risk group, whereas the train set consisted of 127 patients in the high risk group and 127 patients in the low risk group. Similarly, the test set comprised 126 patients in the high risk group and 127 patients in the low risk group. We compared different clinical information across the total set, train set, and test set (Table 2).

Table 1: Primer sequences for RT-qPCR.
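The split and stratification described above can be sketched as follows. The cutoff rule is an assumption: the text does not state how the threshold was chosen, although the reported group sizes are consistent with a per-set median split. Patient IDs and scores are randomly generated placeholders, and `split_cohort` and `stratify` are hypothetical helper names.

```python
import random
import statistics


def split_cohort(ids, n_train, seed=0):
    """Shuffle patient IDs reproducibly and cut them into train and test lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def stratify(scores):
    """Label each patient 'high' if the score exceeds the set's median, else 'low'.

    Assumption: a median cutoff computed within each set; the source does not
    specify the rule actually used.
    """
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


patients = [f"P{i:03d}" for i in range(507)]       # 507 placeholder patient IDs
train_ids, test_ids = split_cohort(patients, n_train=254)

score_rng = random.Random(1)
scores = {pid: score_rng.random() for pid in patients}   # placeholder risk scores

groups_total = stratify(scores)
groups_train = stratify({p: scores[p] for p in train_ids})
groups_test = stratify({p: scores[p] for p in test_ids})

n_high_total = sum(g == "high" for g in groups_total.values())
n_high_train = sum(g == "high" for g in groups_train.values())
n_high_test = sum(g == "high" for g in groups_test.values())
```

With an odd number of patients the median itself falls in the low group under this rule, which is why an odd-sized set yields one more low risk than high risk patient.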

Genes	Forward (5′-3′)	Reverse (5′-3′)
ACTIN	CCTTCCTGGGCATGGAGTC	TGATCTTCATTGTGCTGGGTG
MMP10	GAGTTTGACCCCAATGCCAG	TCTTCCCCCTATCTCGCCTA
CTSV	TCAGGCAGATGATGGGTTGC	GCCCAACAAGAACCACACTG
F2	CACGGCTACGGATGTGTTCT	AGTTCGTACCCAGACCCTCAG

Table 2: Clinical characteristics of the 3 data sets randomly generated from the TCGA database

Clinical Features	Type	Total set n= 507	Train set n= 254	Test set n= 253	P-value
Age	<= 65	239 (47.14)	120 (47.24)	119 (47.04)	1
Age	>65	258 (50.89)	130 (51.18)	128 (50.59)	1
Gender	FEMALE	272 (53.65)	135 (53.15)	137 (54.15)	0.8912
Gender	MALE	235 (46.35)	119 (46.85)	116 (45.85)	0.8912
Stage	Stage I	272 (53.65)	127 (50)	145 (57.31)	0.3882
	Stage II	120 (23.67)	65 (25.59)	55 (21.74)
	Stage III	81 (15.98)	45 (17.72)	36 (14.23)
	Stage IV	26 (5.13)	13 (5.12)	13 (5.14)
T	T1	169 (33.33)	80 (31.5)	89 (35.18)	0.6613
	T2	271 (53.45)	141 (55.51)	130 (51.38)
	T3	45 (8.88)	21 (8.27)	24 (9.49)
	T4	19 (3.75)	11 (4.33)	8 (3.16)
N	N0	327 (64.5)	155 (61.02)	172 (67.98)	0.2544
	N1	95 (18.74)	52 (20.47)	43 (17)
	N2	71 (14)	42 (16.54)	29 (11.46)
	N3	2 (0.39)	1 (0.39)	1 (0.4)
M	M0	338 (66.67)	170 (66.93)	168 (66.4)	0.9886
M	M1	25 (4.93)	12 (4.72)	13 (5.14)	0.9886

By examining the K-M curves, we found that in the total set, train set, and test set, the OS of the high risk group was significantly lower than that of the low risk group (Fig 4a, c, e). Likewise, progression-free survival (PFS) was significantly lower in the high risk group than in the low risk group (Fig 4b, d, f). The risk heatmaps displayed a gradual increase in the levels of MMP1, MMP10, CTSV, and F2, which are considered high risk genes, across the total set, train set, and test set (Fig 5a, d, g). Furthermore, the relationship between the risk score and survival time in the high risk and low risk groups was plotted, showing that patient survival time decreased significantly as the risk score increased (Fig 5b, e, h and Fig 5c, f, i).
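The K-M curves referred to above rest on the product-limit estimator, which multiplies the conditional survival fractions at each event time. The following is a minimal pure-Python sketch; the survival times and event flags are invented toy values, and the study itself generated its curves with the R survival and survminer packages.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving   # events and censored patients both leave the risk set
    return curve


# Toy cohort: five patients, two of them censored (event flag 0).
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Running the same function separately on the high risk and low risk patients would give the two curves that a K-M plot overlays; comparing them formally requires a test such as the log-rank test, which this sketch does not implement.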

Independent Analysis of Prognostic Factors by Predictive Models

The study employed univariate Cox regression analysis (Fig 6a) and multivariate Cox regression analysis (Fig 6b) to determine whether the risk score could serve as a prognostic factor independent of other clinical traits. The results of univariate Cox regression analysis in the population sample revealed several significant factors: Stage (P < 0.001, Hazard ratio = 1.595), T (P < 0.001, Hazard ratio = 1.618), M (P = 0.034, Hazard ratio = 1.858), N (P < 0.001, Hazard ratio = 1.732), and risk score (P < 0.001, Hazard ratio = 1.517), all of which were considered high risk factors. Moreover, based on the results of multivariate Cox regression analysis in the population sample, T (P = 0.010, Hazard ratio = 1.362) and risk score (P < 0.001, Hazard ratio = 1.415) were identified as independent risk factors, and both were classified as high risk factors. To assess the accuracy of the risk model, the ROC curve was used. The area under the curve (AUC) was 0.687 for 1 year, 0.634 for 3 years, and 0.622 for 5 years (Fig 6c). Furthermore, for one-year survival prediction, the AUC of the risk score (0.687) was higher than that of Age (0.490), Gender (0.550), T (0.652), M (0.501), and N (0.666). The risk model therefore showed higher accuracy than these individual clinical traits (Fig 6d).

The Principal Component Analysis (PCA) and the Construction of Predictive Nomograms

PCA was used to examine the distribution of high risk and low risk groups in lung adenocarcinoma patients. Differences between the groups were observed for all genes (Fig 6e), coagulation-related genes (Fig 6f), and model-building genes (Fig 6g), with the most pronounced differences observed for the model-building genes, which showed the highest discriminatory power. Subsequently, a nomogram model was constructed incorporating age, sex, TNM stage, T stage, N stage, M stage, and risk score of lung adenocarcinoma patients (Fig 6h). This model aimed to predict the 1-year, 3-year, and 5-year survival rates (Fig 6i). These results support the accuracy of the signature model and its ability to predict patient survival from various clinical characteristics.

Relationship between Different Clinical Traits and Risk Scores

To gain deeper insights into the relationship between the different clinical traits and risk scores, we constructed heat maps to illustrate the association between the expression of model genes and various clinical traits and pathological features (Fig 7a). Furthermore, we compared the risk scores across the different clinical traits and confirmed their higher accuracy in distinguishing clinical stages (stage I-II or stage III-IV) and lymph node involvement (N0 or N1-N3) (Fig 7b-m).

Effects of different clinical features on Kaplan–Meier survival curves between high risk and low risk groups: age (b, c), sex (d, e), stage (f, g), T (h, i), N (j, k), M (l, m).

GO, KEGG, GSVA Enrichment Analysis

Based on the aforementioned results, it can be inferred that the gene model constructed using the four genes exhibits a reliable predictive value for distinguishing between high risk and low risk groups. GO enrichment analysis revealed that, in terms of MF, the coagulation-related genes were primarily enriched in sulfur compound binding, glycosaminoglycan binding, serine hydrolase activity, serine-type peptidase activity, and serine-type endopeptidase activity. In terms of CC, these genes were mainly enriched in the collagen-containing extracellular matrix, apical part of the cell, vesicle lumen, cytoplasmic vesicle lumen, and secretory granule lumen. Additionally, BP analysis showed enrichment primarily in epidermis development, negative regulation of proteolysis, skin development, and antimicrobial humoral response (Fig 8a, b). Subsequently, KEGG analysis was conducted, indicating that these coagulation-related genes were predominantly enriched in Neuroactive ligand-receptor interaction (Fig 8c, d). To further validate the distinction between high risk and low risk groups, GSVA enrichment analysis was performed. The enrichment pathways were compared between the two groups, and five pathways per group were selected based on p-value ranking. The results demonstrated that the cell cycle, neuroactive ligand-receptor interaction, pentose and glucuronate interconversions, porphyrin and chlorophyll metabolism, and starch and sucrose metabolism were primarily enriched in the high risk group. Conversely, allograft rejection, asthma, autoimmune thyroid disease, ribosome, and systemic lupus erythematosus were primarily enriched in the low risk group (Fig 8e, f).

Immune-related Differences

In terms of TME scores, including StromalScore, ImmuneScore, and ESTIMATEScore, no significant differences were observed between the high risk and low risk groups (Fig 9a). However, distinct variations were observed in the composition of immune cell infiltrates between the two groups, including B cells naive, T cells CD4 memory resting, NK cells resting, Macrophages M0, Dendritic cells resting, Mast cells resting, and Mast cells activated (Fig 9b, c). Furthermore, an analysis of immune function between the high risk and low risk groups revealed notable disparities in B cells, CCR, HLA, iDCs, Macrophages, Mast cells, MHC class I, NK cells, Parainflammation, T cell co-stimulation, Type I IFN Response, and Type II IFN Response (Fig 9d).

TMB and TIDE

Initially, we obtained somatic mutation data from the TCGA database and examined the disparities in somatic mutations between the low risk group (n=245) and the high risk group (n=251). Notably, the high risk group exhibited a higher mutation frequency than the low risk group for most of the frequently mutated genes, as shown by the following percentages: TP53 (low risk 40%, high risk 51%), TTN (low risk 40%, high risk 47%), MUC16 (low risk 40%, high risk 39%), CSMD3 (low risk 36%, high risk 40%), RYR2 (low risk 34%, high risk 37%), LRP1B (low risk 32%, high risk 32%), ZFHX4 (low risk 27%, high risk 35%), USH2A (low risk 28%, high risk 33%), KRAS (low risk 25%, high risk 31%), XIRP2 (low risk 20%, high risk 26%), FLG (low risk 21%, high risk 25%), SPTA1 (low risk 18%, high risk 26%), NAV3 (low risk 19%, high risk 21%), ZNF536 (low risk 19%, high risk 21%), COL11A1 (low risk 18%, high risk 20%) (Fig 10a, b). Subsequently, we calculated the tumor mutation burden (TMB), which differed significantly between the high risk and low risk groups (Fig 10c). Furthermore, we categorized patients into a high TMB group and a low TMB group, with the former exhibiting a higher OS rate than the latter (Fig 10d). Interestingly, the high risk, low TMB group displayed the poorest prognosis (Fig 10e). To assess immune evasion, we used TIDE scores, where higher scores indicate a stronger immune evasion mechanism. In our study, the TIDE score was significantly higher in the high risk group than in the low risk group (Fig 10f). However, further investigation is required to fully understand the differences in immunotherapy response between the high risk and low risk groups. Additionally, by comparing drug susceptibility, we identified clear discrepancies between the high risk and low risk groups.
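The per-gene mutation frequencies quoted above can be tabulated to see where the two groups actually differ. The percentages are copied from the text; the variable names are illustrative, and the sketch only computes simple differences in percentage points without any statistical test.

```python
# Mutation frequency (%) per gene as quoted in the text: (low risk, high risk).
mutation_pct = {
    "TP53": (40, 51), "TTN": (40, 47), "MUC16": (40, 39), "CSMD3": (36, 40),
    "RYR2": (34, 37), "LRP1B": (32, 32), "ZFHX4": (27, 35), "USH2A": (28, 33),
    "KRAS": (25, 31), "XIRP2": (20, 26), "FLG": (21, 25), "SPTA1": (18, 26),
    "NAV3": (19, 21), "ZNF536": (19, 21), "COL11A1": (18, 20),
}

# Difference in percentage points (high risk minus low risk) for each gene.
diff = {gene: high - low for gene, (low, high) in mutation_pct.items()}

higher_in_high_risk = [g for g, d in diff.items() if d > 0]
not_higher = [g for g, d in diff.items() if d <= 0]
largest_gap = max(diff, key=diff.get)

print(f"{len(higher_in_high_risk)} of {len(diff)} genes are mutated more often in the high risk group")
print("not higher in the high risk group:", not_higher)
print("largest gap:", largest_gap, f"(+{diff[largest_gap]} percentage points)")
```

Whether any of these gaps is statistically significant cannot be judged from the percentages alone; that would require the underlying counts and a test such as Fisher's exact test.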

Drug Susceptibility Analysis

We compared drug sensitivity between the high risk and low risk groups and observed distinct differences in IC50 values between the two groups. Specifically, the high risk group was more sensitive to BI-2536 (p=0.00011) (Fig 11a), Dasatinib (p=3.3e-06) (Fig 11b), PD0325901 (p=7.3e-08) (Fig 11c), SCH772984 (p=3.4e-16) (Fig 11d), and Trametinib (p=5.2e-07) (Fig 11e). Conversely, the low risk group was more sensitive to BMS-754807 (p=9.9e-08) (Fig 11f), Doramapimod (p=4e-10) (Fig 11g), JAK1_8709 (p=0.00059) (Fig 11h), Ribociclib (p=2.5e-05) (Fig 11i), and SB216763 (p=1.4e-05) (Fig 11j).

Verification of the expression level of CRGs

By evaluating the expression of MMP1, MMP10, CTSV, and F2 in normal tissues and lung adenocarcinoma (LUAD), we confirmed that these genes are indeed high risk genes (Fig 12a-h). To further validate our findings, we compared the protein expression encoded by these four genes between LUAD and normal tissues using the Human Protein Atlas (HPA) database. Consistent with the mRNA expression levels, the protein expression levels of MMP10, CTSV, and F2 were significantly higher in lung adenocarcinoma than in normal tissues (Fig 12i-k). However, protein expression of MMP1 was not observed. To further validate the accuracy of the CRGs diagnostic model, we conducted additional RT-qPCR experiments to verify mRNA expression in normal lung cells and lung adenocarcinoma cells. The RT-qPCR results revealed significantly higher mRNA expression levels of MMP10, CTSV, and F2 in A549 and H1299 cells (lung adenocarcinoma cell lines) than in BEAS-2B cells (a human normal bronchial epithelial cell line) (Fig 12l-n). These results are consistent with our bioinformatics analysis based on the TCGA database.

Currently, both in China and worldwide, the incidence and mortality rate of lung cancer remain alarmingly high. Among the various types of lung cancer, lung adenocarcinoma is particularly prevalent and associated with a dismal prognosis (13). Chest CT is currently the primary screening tool for diagnosing lung cancer and holds significant diagnostic value; however, it still carries a false positive rate. Despite advancements in the diagnosis, prognosis, and treatment options for lung cancer patients, such as immunotherapy and targeted therapy, the overall survival of patients has not improved significantly. Hence, the construction of clinical models for the diagnosis and prognosis of LUAD patients is important. These models aim to enhance diagnostic accuracy and increase the life expectancy of patients (14). The coagulation system is a dynamic process that maintains a delicate balance between coagulation and bleeding under normal physiological conditions. However, it undergoes alterations in disease states (15). Since the 1960s, hyperfibrinogenemia and hypercoagulability have been linked to rapidly growing tumors (16). A study involving 1,961 cancer patients reported an increased incidence of hyperfibrinogenemia (17). The systemic activation of hemostasis and thrombosis has been extensively implicated in cancer pathogenesis, progression, and metastasis (18, 19). There is a reciprocal relationship between cancer and hemostasis, bearing implications for cancer biology and cancer-associated thrombosis, with a particular focus on tissue factor (20). The coagulation system plays a pivotal role in the occurrence and development of lung cancer, contributing to tumor establishment, cell migration, vascular invasion, extravasation, and distant metastasis in lung adenocarcinoma (PMC4821869).

In this study, we initially identified 138 coagulation-related genes from the "Hallmark_coagulation" pathway in lung adenocarcinoma patients through GSVA analysis. We identified 51 differentially expressed genes based on gene expression variations, and then employed univariate Cox regression analysis and LASSO regression analysis to construct a risk gene model consisting of MMP1, MMP10, CTSV, and F2, all of which are classified as high risk genes. MMP1 is responsible for degrading stromal collagens, thereby enhancing the ability of neoplastic cells to traverse the basal membrane of both endothelium and vascular endothelium (21). MMP10 has been demonstrated to play a significant role in the activation of pro-MMPs (22) and is highly expressed in epithelial tumors such as bladder transitional cell cancer, gastric cancer, esophageal cancer, non-small cell lung cancer (NSCLC), and skin cancer (23-26). CTSV (Cathepsin V/CTSL2), a cysteine proteinase, can degrade certain components of the extracellular matrix and has been associated with tumor cell malignancy and prognosis in breast cancer patients (27). F2, also known as coagulation factor II or thrombin factor 2, is involved in the generation of the activated serine protease thrombin through proteolytic cleavage. Elevated levels of thrombin not only enhance blood coagulation but also promote tumor growth and metastasis. Consequently, thrombin and factors contributing to thrombin production serve as potential targets for the treatment of cancer and cancer-associated thrombosis (28). These newly identified coagulation-related genes provide valuable insights for improved diagnosis of lung adenocarcinoma and guidance for subsequent treatments. Once the model was established, we classified patients into high risk and low risk groups based on their respective risk scores.
To validate the diagnostic and therapeutic potential of these genes for lung adenocarcinoma patients in different risk groups or with varying clinical traits, we conducted K-M curve analysis, receiver operating characteristic (ROC) analysis, PCA, C-index assessment, survival analysis, nomogram construction, and heatmap analysis.

TP53 is the most frequently mutated tumor suppressor gene in human malignancies, particularly in non-small cell lung cancer. In our cohort of 227 lung adenocarcinoma patients, the mutational differences between the high risk and low risk groups mainly involved TP53. TP53 mutations are often indicative of extensive tumor invasion and poor prognosis (29), and TP53 mutation status has been associated with patient survival in various tumor types (30). Similarly, TTN plays a significant role in the development and progression of multiple tumors. In lung adenocarcinoma, the lncRNA TTN-AS1 has been shown to drive invasion and migration of tumor cells by modulating the miR-4677-3p/ZEB1 axis (31). Shen et al. demonstrated that TTN-AS1 acts on the miR-376a-3p/PUM2 axis to promote endometrial cancer cell growth, suggesting it as a potential therapeutic target for endometrial cancer (32). Fu et al. found that TTN-AS1 acts as an oncogene in osteosarcoma by targeting miR-134-5p and upregulating the malignant brain tumor domain-containing 1 gene (MBTD1), thereby promoting osteosarcoma cell growth (33). In bladder cancer, downregulating TTN-AS1 has been reported to inhibit the proliferative capacity of tumor cells (34). These findings are consistent with our results, and TP53 and TTN mutations may provide valuable insights for future immunotherapy approaches.

Immunotherapy is an emerging treatment option for lung adenocarcinoma. In our study, we observed clear differences in immune cell populations between the high risk and low risk groups, specifically in naive B cells, resting memory CD4 T cells, resting NK cells, M0 macrophages, resting dendritic cells, resting mast cells, and activated mast cells. The TIDE score is a predictive tool for immunotherapy response and for the tumor's capacity for immune evasion. Patients in the high risk group had significantly higher TIDE scores than those in the low risk group, indicating a greater propensity for immune evasion and potentially worse treatment outcomes.

Tumor mutation burden (TMB) has garnered significant attention in recent years. In our study, patients with a high TMB had significantly better overall survival (OS) than those with a low TMB. Notably, patients with both a high risk profile and a low TMB had markedly worse OS.

Several studies have highlighted the effectiveness of different drugs in various cancer types. BI 2536 induces gasdermin E-dependent pyroptosis in ovarian cancer, inhibits proliferation, arrests the cell cycle, induces apoptosis, and leads to the accumulation of CD8 T cells at tumor sites (35). Dasatinib blocks lung cancer cell proliferation in a dose-dependent manner and suppresses LIMK1 activity by directly targeting LIMK1 (36). PD0325901, an ERK inhibitor, enhances the efficacy of PD-1 inhibitors in non-small cell lung carcinoma (37). SCH772984 effectively inhibits MAPK signaling and cell proliferation in models resistant to BRAF or MEK inhibitors, as well as in tumor cells resistant to concurrent treatment with both (38). Trametinib (GSK1120212), an oral inhibitor selective for MEK1 and MEK2, has been FDA-approved for the treatment of metastatic melanoma in combination with a BRAF inhibitor (39). BMS-754807 inhibits IGF-IR/IR and AKT phosphorylation and synergistically enhances the cytotoxic effects of carboplatin or cisplatin, particularly in lung cancer cells expressing high levels of IGF-IR (40). Doramapimod, an inhibitor of the MAPK-P38 pathway, disrupts the function of ZCCHC14 by inhibiting P38 signaling, thereby hindering tumor proliferation and invasion (41). High risk bladder cancer has also been reported to be sensitive to JAK1_8709 (42). In breast cancer, the effect of CDK4/6 inhibitors such as ribociclib can be enhanced by combining them with drugs that block downstream estrogen-dependent stimulation of cancer cells (43). Inhibition of estrogen pathways through endocrine therapy results in downregulation of cyclin D1 and decreased complexation of CDK4 and CDK6 (44). SB216763, an inhibitor of glycogen synthase kinase 3 (GSK3), inhibits phospholipid synthesis in breast and colorectal cancer cells (45). Although these drugs have shown promise, further investigation is needed to elucidate their specific mechanisms of action in lung cancer.

In this study, we used bioinformatics to construct a model of four coagulation-related genes for predicting the survival of lung adenocarcinoma patients. Our study has several limitations. First, the sample size was limited, and we did not perform external validation using the GEO or ICGC databases. Second, our analysis relied on public databases, and clinical samples were not available for further investigation. Future studies will address these limitations and explore these research areas in greater depth.
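
The combined effect of risk group and TMB discussed earlier in this section can be sketched as a simple four-way stratification of the cohort. The example below is only an illustration. It assumes that both cutoffs are cohort medians, which this section does not state, and that values strictly above the median count as "high". The function name `stratify_by_risk_and_tmb` and the toy values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def stratify_by_risk_and_tmb(risk_scores: Sequence[float],
                             tmb_values: Sequence[float]) -> List[str]:
    """Assign each patient a combined risk/TMB label.

    Assumption: both cutoffs are the cohort medians, and values strictly
    above the median are labelled 'high'.
    """
    if len(risk_scores) != len(tmb_values):
        raise ValueError("risk_scores and tmb_values must have the same length")
    risk_cutoff = median(risk_scores)
    tmb_cutoff = median(tmb_values)
    labels = []
    for risk, tmb in zip(risk_scores, tmb_values):
        risk_label = "high-risk" if risk > risk_cutoff else "low-risk"
        tmb_label = "high-TMB" if tmb > tmb_cutoff else "low-TMB"
        labels.append(f"{risk_label}/{tmb_label}")
    return labels


# Toy cohort (hypothetical values, not taken from TCGA-LUAD).
toy_risk = [0.2, 0.8, 0.5, 1.1]
toy_tmb = [12.0, 3.0, 1.5, 9.0]
for label in stratify_by_risk_and_tmb(toy_risk, toy_tmb):
    print(label)
```

Each of the four resulting groups could then be passed to a survival comparison, with the high-risk/low-TMB group being the one reported above to have the poorest OS.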

This article presents a four-gene model based on coagulation-related genes in lung adenocarcinoma to predict the clinical prognosis of patients. Built from public bioinformatics data, the model showed good predictive accuracy in our analyses and offers a new perspective for the diagnosis, prognosis, and treatment of lung adenocarcinoma.

Lung adenocarcinoma (LUAD), Molecular Signatures Database (MSigDB), least absolute shrinkage and selection operator (LASSO) Cox regression, Kaplan-Meier (K-M) curves, progression-free survival (PFS), principal component analysis (PCA), tumor mutation burden (TMB), tumor immune dysfunction and exclusion (TIDE), overall survival (OS), tumor immune microenvironment (TME), The Cancer Genome Atlas (TCGA), Gene Ontology (GO) terms, biological process (BP), cellular component (CC), molecular function (MF), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, single-sample Gene Set Enrichment Analysis (ssGSEA), hazard ratio (HR), immunohistochemistry (IHC), Human Protein Atlas database (HPA), real-time quantitative PCR (RT-qPCR), coagulation-related genes (CRGs), differentially expressed genes (DEGs)

Acknowledgements

The authors thank the TCGA and HPA databases for providing data support.

Author Contributions

All authors contributed to the study conception and design. LM Z and J L were responsible for the conception and design of this work. J L and XD G interpreted the data and clinical information. J L performed the bioinformatic analysis. J L, XM S, and L L performed the experiments. J L, JH W, H L, and HQ H were major contributors to drafting the manuscript. LM Z reviewed and revised the manuscript. All authors read and approved the final manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability

The datasets generated and/or analyzed during the current study are available in the TCGA database (TCGA-LUAD).

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Chen W, Zheng R, Baade PD, Zhang S, Zeng H, Bray F, Jemal A, Yu XQ, He J. (2016); Cancer statistics in China, 2015. CA Cancer J Clin. 66(2):115-32.
Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ Jr, Wu YL, Paz-Ares L. (2017); Lung cancer: current therapies and new targeted treatments. Lancet. 389(10066):299-311.
Senan S, Paul MA, Lagerwaard FJ. (2013); Treatment of early-stage lung cancer detected by screening: surgery or stereotactic ablative radiotherapy? Lancet Oncol. 14(7):e270-4.
Hamza MS, Mousa SA. (2020); Cancer-Associated Thrombosis: Risk Factors, Molecular Mechanisms, Future Management. Clin Appl Thromb Hemost. 26:1076029620954282.
Khorana AA, Mackman N, Falanga A, Pabinger I, Noble S, Ageno W, Moik F, Lee AYY. (2022); Cancer-associated venous thromboembolism. Nat Rev Dis Primers. 8(1):11.
Khorana AA, Mackman N, Falanga A, Pabinger I, Noble S, Ageno W, Moik F, Lee AYY. (2022); Cancer-associated venous thromboembolism. Nat Rev Dis Primers. 8(1):11.
Korte W. (2000); Changes of the coagulation and fibrinolysis system in malignancy: their possible impact on future diagnostic and therapeutic procedures. Clin Chem Lab Med. 38(8):679-92.
Tieken C, Versteeg HH. (2016); Anticoagulants versus cancer. Thromb Res. 140 Suppl 1:S148-53.
Kawai K, Watanabe T. (2014); Colorectal cancer and hypercoagulability. Surg Today. 44(5):797-803.
Tikhomirova I, Petrochenko E, Malysheva Y, Ryabov M, Kislov N. (2016); Interrelation of blood coagulation and hemorheology in cancer. Clin Hemorheol Microcirc. 64(4):635-644. Swier N, Versteeg HH. (2017); Reciprocal links between venous thromboembolism, coagulation factors and ovarian cancer progression. Thromb Res. 150:8-18.
Ma M, Cao R, Wang W, Wang B, Yang Y, Huang Y, Zhao G, Ye L. (2021); The D-dimer level predicts the prognosis in patients with lung cancer: a systematic review and meta-analysis. J Cardiothorac Surg. 16(1):243.
Saidak Z, Soudet S, Lottin M, Salle V, Sevestre MA, Clatot F, Galmiche A. (2021); A pan-cancer analysis of the human tumor coagulome and its link to the tumor immune microenvironment. Cancer Immunol Immunother. 70(4):923-933.
Neal JW, Gainor JF, Shaw AT. (2015); Developing biomarker-specific end points in lung cancer clinical trials. Nat Rev Clin Oncol. 12(3):135-46.
Duncan MW. (2009); Place for biochemical markers in early-stage lung cancer detection? J Clin Oncol. 27(17):2749-50.
Palta S, Saroa R, Palta A. (2014); Overview of the coagulation system. Indian J Anaesth. 58(5):515-23.
Brugarolas A, Elias EG. (1973); Incidence of hyperfibrinogenemia in 1961 patients with cancer. J Surg Oncol. 5(4):359-64.
Brugarolas A, Elias EG. (1973); Incidence of hyperfibrinogenemia in 1961 patients with cancer. J Surg Oncol. 5(4):359-64.
Langer F, Bokemeyer C. (2012); Crosstalk between cancer and haemostasis. Implications for cancer biology and cancer-associated thrombosis with focus on tissue factor. Hamostaseologie. 32(2):95-104.
Lima LG, Monteiro RQ. (2013); Activation of blood coagulation in cancer: implications for tumour progression. Biosci Rep. 33(5):e00064.
Lima LG, Monteiro RQ. (2013); Activation of blood coagulation in cancer: implications for tumour progression. Biosci Rep. 33(5):e00064.
Fakhoury H, Noureddine S, Chmaisse HN, Tamim H, Makki RF. (2012); MMP1-1607(1G>2G) polymorphism and the risk of lung cancer in Lebanon. Ann Thorac Med. 7(3):130-2.
Rodríguez JA, Sobrino T, Orbe J, Purroy A, Martínez-Vila E, Castillo J, Páramo JA. (2013); proMetalloproteinase-10 is associated with brain damage and clinical outcome in acute ischemic stroke. J Thromb Haemost. 11(8):1464-73.
Gobin E, Bagwell K, Wagner J, Mysona D, Sandirasegarane S, Smith N, Bai S, Sharma A, Schleifer R, She JX. (2019); A pan-cancer perspective of matrix metalloproteases (MMP) gene expression profile and their diagnostic/prognostic potential. BMC Cancer. 19(1):581.
Xu J, E C, Yao Y, Ren S, Wang G, Jin H. (2016); Matrix metalloproteinase expression and molecular interaction network analysis in gastric cancer. Oncol Lett. 12(4):2403-2408.
Shi X, Chen Z, Hu X, Luo M, Sun Z, Li J, Shi S, Feng X, Zhou C, Li Z, Yang W, Li Y, Wang P, Zhou F, Gao Y, He J. (2016); AJUBA promotes the migration and invasion of esophageal squamous cell carcinoma cells through upregulation of MMP10 and MMP13 expression. Oncotarget. 7(24):36407-36418.
Justilien V, Regala RP, Tseng IC, Walsh MP, Batra J, Radisky ES, Murray NR, Fields AP. (2012); Matrix metalloproteinase-10 is required for lung cancer stem cell maintenance, tumor initiation and metastatic potential. PLoS One. 7(4):e35040.
Toss M, Miligy I, Gorringe K, Mittal K, Aneja R, Ellis I, Green A, Rakha E. (2020); Prognostic significance of cathepsin V (CTSV/CTSL2) in breast ductal carcinoma in situ. J Clin Pathol. 73(2):76-82.
Reddel CJ, Tan CW, Chen VM. (2019); Thrombin Generation and Cancer: Contributors and Consequences. Cancers (Basel). 11(1):100.
Mogi A, Kuwano H. (2011); TP53 mutations in nonsmall cell lung cancer. J Biomed Biotechnol. 2011:583929.
Donehower LA, Soussi T, Korkut A, Liu Y, Schultz A, Cardenas M, Li X, Babur O, Hsu TK, Lichtarge O, Weinstein JN, Akbani R, Wheeler DA. (2019); Integrated Analysis of TP53 Gene and Pathway Alterations in The Cancer Genome Atlas. Cell Rep. 28(5):1370-1384.e5.
Zhong Y, Wang J, Lv W, Xu J, Mei S, Shan A. (2019); LncRNA TTN-AS1 drives invasion and migration of lung adenocarcinoma cells via modulation of miR-4677-3p/ZEB1 axis. J Cell Biochem. 120(10):17131-17141.
Shen L, Wu Y, Li A, Li L, Shen L, Jiang Q, Li Q, Wu Z, Yu L, Zhang X. (2020); LncRNA TTN-AS1 promotes endometrial cancer by sponging miR-376a-3p. Oncol Rep. 44(4):1343-1354.
Fu D, Lu C, Qu X, Li P, Chen K, Shan L, Zhu X. (2019); LncRNA TTN-AS1 regulates osteosarcoma cell apoptosis and drug resistance via the miR-134-5p/MBTD1 axis. Aging (Albany NY). 11(19):8374-8385.
Xiao H, Huang W, Li Y, Zhang R, Yang L. (2021); Targeting Long Non-Coding RNA TTN-AS1 Suppresses Bladder Cancer Progression. Front Genet. 12:704712.
Huo J, Shen Y, Zhang Y, Shen L. (2022); BI 2536 induces gasdermin E-dependent pyroptosis in ovarian cancer. Front Oncol. 12:963928.
Zhang M, Tian J, Wang R, Song M, Zhao R, Chen H, Liu K, Shim JH, Zhu F, Dong Z, Lee MH. (2020); Dasatinib Inhibits Lung Cancer Cell Growth and Patient Derived Tumor Growth in Mice by Targeting LIMK1. Front Cell Dev Biol. 8:556532.
Luo M, Xia Y, Wang F, Zhang H, Su D, Su C, Yang C, Wu S, An S, Lin S, Fu L. (2021); PD0325901, an ERK inhibitor, enhances the efficacy of PD-1 inhibitor in non-small cell lung carcinoma. Acta Pharm Sin B. 11(10):3120-3133.
Morris EJ, Jha S, Restaino CR, Dayananth P, Zhu H, Cooper A, Carr D, Deng Y, Jin W, Black S, Long B, Liu J, Dinunzio E, Windsor W, Zhang R, Zhao S, Angagaw MH, Pinheiro EM, Desai J, Xiao L, Shipps G, Hruza A, Wang J, Kelly J, Paliwal S, Gao X, Babu BS, Zhu L, Daublain P, Zhang L, Lutterbach BA, Pelletier MR, Philippar U, Siliphaivanh P, Witter D, Kirschmeier P, Bishop WR, Hicklin D, Gilliland DG, Jayaraman L, Zawel L, Fawell S, Samatar AA. (2013); Discovery of a novel ERK inhibitor with activity in models of acquired resistance to BRAF and MEK inhibitors. Cancer Discov. 3(7):742-50.
Zeiser R, Andrlová H, Meiss F. (2018); Trametinib (GSK1120212). Recent Results Cancer Res. 211:91-100.
Franks SE, Jones RA, Briah R, Murray P, Moorehead RA. (2016); BMS-754807 is cytotoxic to non-small cell lung cancer cells and enhances the effects of platinum chemotherapeutics in the human lung cancer cell line A549. BMC Res Notes. 9:134.
Shi X, Han X, Cao Y, Li C, Cao Y. (2021); ZCCHC14 regulates proliferation and invasion of non-small cell lung cancer through the MAPK-P38 signalling pathway. J Cell Mol Med. 25(3):1406-1414.
Ling Y, Li J, Zhou L. (2023); Smoking-related epigenetic modifications are associated with the prognosis and chemotherapeutics of patients with bladder cancer. Int J Immunopathol Pharmacol. 37:3946320231166774.
Braal CL, Jongbloed EM, Wilting SM, Mathijssen RHJ, Koolen SLW, Jager A. (2021); Inhibiting CDK4/6 in Breast Cancer with Palbociclib, Ribociclib, and Abemaciclib: Similarities and Differences. Drugs. 81(3):317-331.
Scott SC, Lee SS, Abraham J. (2017); Mechanisms of therapeutic CDK4/6 inhibition in breast cancer. Semin Oncol. 44(6):385-394.
Phyu SM, Tseng CC, Smith TAD. (2019); CDP-choline accumulation in breast and colorectal cancer cells treated with a GSK-3-targeting inhibitor. MAGMA. 32(2):227-235.
