DOI: https://doi.org/10.21203/rs.3.rs-2162636/v1
Lung cancer is one of the leading causes of death in patients with tumor around the world. Lung adenocarcinoma (LUAD) is most frequent histological subtype in lung cancer. Immune therapy has now become an effective method of treating LUAD. Tumor mutation burden (TMB) shows predictive biomarker potential for identification of cancer patients responding to immune checkpoint inhibitors. The mutation rate of lung adenocarcinoma was the third in various cancers. However, it is not clear whether heterogeneous genetic mutations are associated with TMB and immunity in the patients with LUAD. In our study, First, somatic mutation data of LUAD were downloaded from International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) datasets, and found that 88 same common mutated genes were involved in two cohorts including TNN, TP53, MUC16, CSMD3, RYR2, ZFHX4, LRP1B, USH2A, KRAS, XIRP2, FLG, SPTA1, and so on. Among them, Collagen XXII (COL22A1) and Dynein Axoneme Heavy Chain 8 (DNAH8) mutations were correlated with higher TMB and showed a poorer clinical outcome. Then, In the basis of CIBERSORT algorithm as well as Gene set enrichment analysis (GSEA), we found that COL22A1 or DNAH8 mutation participated in the activation or raising process of immune-related signaling pathways and enhanced antitumor immune reaction. To sum up, COL22A1 or DNAH8 are most common mutated in LUAD, and their mutations are related with higher TMB and poorer prognosis as well as promotes antitumor immunity, which may regard as a biomarker to predict immune reaction.
Worldwide, lung cancer remains the leading cause of cancer incidence and mortality, with 2.1 million new lung cancer cases and 1.8 million deaths predicted in 2018, representing close to 1 in 5 (18.4%) cancer deaths(1). Moreover, lung cancer is the leading cause of death in China(1). In the past 25 years, for unknown reasons, lung adenocarcinoma (LUAD) has replaced lung squamous cell carcinoma (LUSC) as the most frequent histological subtype(2). It is more common in women than in men and is more likely to occur in younger people than other types of lung cancer and to present at more advanced stages of disease(3). Therefore, it is very urgent to find effective biomarkers and treatment methods for lung adenocarcinoma. Therapeutic options for LUAD include surgery, radiation, chemotherapy, targeted therapy, and immunotherapy or a combination of these treatments(4).
In the last decade, Targeted therapy based on related driving gene mutations improves the clinical efficacy of some specific LUAD patients (5), however, the survival rate of five years is still below 20%(6). In the last few years, the use of immune checkpoint inhibitors (ICIs) in the treatment of the LUAD has completely changed the treatment of patients with advanced LUAD. And the survival rate of some patients has improved considerably(7). The most widely studied immune checkpoint is programmed death protein 1 (PD-1)/programmed death receptor ligand 1 (PD-L1), whose inhibitors have been approved by United States Food and Drug Administration (FDA) for the treatment of advanced LUAD(8). PD-L1 expression is used as a biomarker for PD-1/PD-L1 inhibitor therapy that predicts therapeutic response(9), however, Only 25–30% of patients receiving checkpoint therapy show long-term responses, which may not be completely determined by PD-L1 expression(10). Therefore, it is absolutely necessary to develop new biomarkers to determine the response rate of patients with immune checkpoint blockade (ICB).
Tumor mutation burden (TMB) is a emerging predictive biomarker for ICB treatment, defined as the total number of mutations per coding area of a tumor genome (11), which was not related to the expression of PD-L1(12). A comprehensive analysis of 27 cancer types observed positive correlation between TMB and benefit of ICB treatment(13). Moreover, previous studies have shown that LUAD has a third high mutation rate and high antigenicity in all cancers(14, 15).
In the present study, firstly, we used The Cancer Genome Atlas (TCGA) dataset and International Cancer Genome Consortium (ICGC) dataset to identify somatic mutations in LUAD patients in the United States and China. After that, we found the common mutation genes in the two groups, and studied the relationship between these gene mutations and TMB and prognosis. Finally, we studied whether genetic factor variation is related to immune response. The results of this study may provide a new biomarker for TMB and possible immunotherapy for the patients with LUAD.
Firstly, we obtained the mutation data of 538 American LUAD samples from TCGA, and calculated the cumulative mutation frequency of each gene and sorted it in descending sequence. Figure 1A showed the top 100 of frequent mutation gene with a high mutation frequency and a somatic mutation pattern for the top 100 genes. The top 5 frequent mutation genes were TP53 (48%), TTN (43%), MUC16(42%), RYR2(37%), and CSMD3 (36%). In the same way, the top 100 mutated genes were also recognized in Chinese samples from ICGC database. We observed similar results. Genetic mutation was also occurred commonly in Chinese samples, and TTN (50%), TP53 (44%), MUC16 (42%), CSMD3 (40%), and RYR2(38%) were the five highest mutation rates among Chinese samples (Fig. 1B).
Then, in order to obtain the same genes that are universal mutated in both cohorts. We intersected the 100 genes with the highest frequency of mutations in the two groups. The 88 high mutation rates genes owned by 11 TCGA queues are also covered by ICGC queues, including TNN, TP53, MUC16, CSMD3, RYR2, ZFHX4, LRP1B, USH2A, KRAS, XIRP2, FLG, SPTA1, and so on (Fig. 2B). To further studied if these 88 common mutation genes were related to TMB, according to the status of 88 gene mutation, LUAD patients from TCGA group were sorted into mutation group and wild group. Moreover, the TMB scores of 88 commonly mutated genes were calculated in TCGC LUAD patients, and the average value of TMB is 5.55 per Mb ranging from 0.26 to 48.31 per Mb. By analysing the data combined with TMB expression matrix and genetic mutation matrix, we revealed that TMB value of all of the 88 genes in mutation group were significantly altered compared with wild group (P < 0.05) (Fig. 2A).
Previous studies reported that TMB is associated with the high-risk subtype of LUAD patients(16). Therefore, considering the association established between 88 common mutation genes and TMB expression, we conjectured that these genes may be related to clinical results. To this end, the LUAD patients in TCGA database were divided into wild-type and mutant groups based on the situation of gene mutation, and Kaplan Meier analysis was carried out combined with the survival data of patients. Our results illustrated that only COL22A1 or DNAH8 mutation were statistically associated with poorer prognosis (p < 0.05) (Fig. 3), which is consistent with the previous research results. In the basis of this results, univariate and multivariate Cox regression analyses were used to further investigate whether mutations in COL22A1 or DNAH8 were independent prognostic factors in patients with lung adenocarcinoma. As shown in Fig. 4, in univariate and multivariate overall survival analyses of the Cox proportional hazards model, COL22A1 or DNAH8 mutations were still significantly associated with overall survival in patients with LUAD, after correction for common clinical information and TMB scores.
Then, we identified the enrichment signaling pathway related to the mutation of COL22A1 or DNAH8 gene. The results of GSEA showed that the following signaling pathways were statistically significant enriched in the COL22A1 mutant group, including cell cycle, NOTCH signaling pathway, ERBB signaling pathway, chronic myeloid leukemia, ubiquitin mediated proteolysis, homologous recombination, adipocytokine signaling pathway, etc. (Fig. 5A). Significantly enriched signaling pathways in the COL22A1 wild group included drug metabolism cell pigment P450, histidine metabolism, complement pathway, ribosome, bile acid synthesis, glycosaminoglycan degradation, etc. (Fig. 5A). It is well known that TMB helps to screen beneficiants and predictive immunotherapy effects. In view of the correlation between COL22A1 gene mutation with TMB, it is speculated that the COL22A1 gene mutation may be related to the immune response. As shown in Fig. 5A, we observed some immune related signaling pathways, including cell cycle, NOTCH signaling pathway, ERBB signaling pathway, chronic myeloid leukemia, ubiquitin mediated proteolysis enriched in LUAD patients with COL22A1 mutation, however, no signaling pathway associated with immune response was found in patients with wild-type COL22A1 lung adenocarcinoma. Using the same method, the results of DNAH8 were similar to those of COL22A1. Cell cycle, transcription factors, base excision repair, homologous recombination, mismatch repair, protein ubiquitination, DNA replication, RNA degradation related signaling pathways were statistically significant enriched in the DNAH8 mutant group (Fig. 5B). Significantly enriched signaling pathways in the DNAH8 wild group included nitrogen metabolism, renal tubular bicarbonate recovery, linoleic acid metabolism, primary bile acid biosynthesis, polysaccharide biosynthesis, etc. (Fig. 5B). In contrast, the DNAH8 mutant group had more immune-related signaling pathways than the wild-type group.
First, we calculated the infiltration degree of 22 immunocytes in each of the 538 LUAD samples by CIBERSORT deconvolution algorithm. The results showed that the proportion of immune cells varied greatly in different LUAD patients, and macrophages and T cells had higher proportion in all samples (Fig. 6A). Next, in all lung adenocarcinoma samples, we investigated the association of COL22A1 or DNAH8 mutations with 22 types of immune cells, respectively. We found that the correlation between T cells regulatory (Tregs) and COL22A1 mutation was the most significant (Fig. 6B), and that the association between B cells and DNAH8 mutation was the most significant. (Fig. 6C). At last, correlation analysis showed that activated memory CD4 T cells were positively correlated with CD8 T cells, and negatively correlated with macrophages and T cells regulatory (Fig. 6D). In addition, from the perspective of follicular helper T cells, it had the strongest positive correlation with CD8 T cells and the strongest negative correlation with neutrophils (Fig. 6D).
In the present study, the characteristics of LUAD somatic mutations were studied in 538 samples from the United States and 507 samples from China. Next, we found that COL22A1 or DNAH8 mutation frequently occur in both ICGC and TCGA sample cohorts. Subsequently, COL22A1 or DNAH8 mutation was recognized to be related to higher TMB and poorer patient clinical results. Furthermore, in COL22A1 or DNAH8 mutation LUAD patients, signaling pathways about immune response were significantly abundant. Finally, COL22A1 or DNAH8 mutation LUAD samples emerge ed that the most infiltrating cells are Tregs, NK cells, B cells, and T cells. which is consistent with previous studied evidence that the anti-tumor immune reaction was related to these immune cells and immune pathways (17–20).
COL22A1 is a kind of micro collagen, which belongs to the family of fibronectin collagens with interrupted triple helices(21). Its biological function has been poorly understood. Firstly, Studies have shown that COL22A1, markers of early articular chondrocytes, is co-expressed with Lgr5 + cells before cavitation and serves as an important lineage marker indicating the progression to articular chondrocytes(22). In addition, in cancer, collagen is the main component of the extracellular matrix, which plays an important role in the occurrence and development of tumor. The upregulation of COL22A1 mRNA may play a key role in the progression of head and neck squamous cell carcinoma (HNSCC) and provide useful information for prognostic prediction of HNSCC patients(23). A network science date illustrated that COL22A1 was one of the four susceptibility genes prioritized by all described indicators as the most influential on colorectal cancer (CRC) among 512 top-ranked single-nucleotide polymorphisms(24). Finally, there were data suggested that mutations in COL22A1 could be one of the risk factors for intracranial aneurysms in humans(21). DNAH8, a protein coding gene, which is on chromosome 6q21.2, a cancer-associated amplicon(25), and is primarily expressed in prostate and testis(26). Its expression in primary tumor is higher than that in normal prostate, and it is further increased in metastatic prostate cancer (26). Although the role of DNAH8 has been poorly documented, phylogenetic sequence analysis suggested that DNAH8 is an axonemal dynein and likely an outer-arm axonemal dynein (data not shown)(26).In the present study, we revealed that COL22A1 or DNAH8 mutations indicated a poorer prognosis and were associated with increased TMB. In addition, TMB represents the mutation accumulation of somatic in tumors, high TMB helps to expose more new antigens, which is likely to cause T cell-dependent immune reaction(27). So that, we hypothesized that COL22A1 or DNAH8 mutations might promote anti-tumor immune response.
Recently, more and more studies have focused on the relationship between immune cell infiltration and cancer progression and prognosis. Human regulatory CD4 T cells (Tregs) are a subset of adaptive lymphocytes well characterized for their immunosuppressive functions and maintenance of immunological tolerance(28). Increased infiltration of Tregs is associated with poor prognosis in several kinds of cancer, including breast, lung, and kidney cancer (18, 29, 30). In our research, we also found that LUAD samples with COL22A1 mutation had higher Tregs loads, which means a poor prognosis that in line with our results. In addition, we observed that DNAH8 mutation was associated with increased B cells infiltrations. B cells are recognized as the main effector cells of humoral immunity which suppress tumor progression by secreting immunoglobulins, promoting T cell response, and killing cancer cells directly(31). Upon entering the local microenvironment, tumor antigens released from lung cancer cells aid in B cell aggregation and trigger B cell-mediated antigen presentation, which facilitate the maintenance of B cell activation and proliferation(32). In addition, a higher percentage of follicular helper T cells were uncovered in COL22A1 or DNAH8 mutant group. Daniel et al. revealed that immunocheckpoint therapy induces activation of t-follicular helper cells of B cells to promote anti-tumor response in mouse models of triple-negative breast cancer. Therefore, our results suggested that COL22A1 mutation induced the altered of tumor infiltration immune cells can help the patients with LUAD anti-tumor immunity and DNAH8 mutation induced the altered of tumor infiltration immune cells can suppress the patients with LUAD anti-tumor immunity.
There are some limitations in this study. In the absence of clinical data from ICGC database samples, we were unable to determine if COL22A1 or DNAH8 mutations are also related to tumor immunity and prognosis in the patients with LUAD in China. Moreover, tumor immunotherapy is a very complex topic, including immune cells, cytokines, immune microenvironment, tumor-related gene mutations and antigens, Etc; while this study is all informatics analyses and further experimental validations are needed.
To sum up, our research found that COL22A1 and DNAH8 were often mutated in LUAD patients, and their mutations were associated with higher TMB and predict a worse prognosis. Besides, COL22A1 and DNAH8 mutations were associated with immune-related signaling pathway and anti-tumor immune response, respectively. Each of them may play an important role in immunotherapy of LUAD patients.
Somatic gene mutations and clinical data for United States LUAD samples derived from TCGA (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga). Somatic mutation data for Chinese LUAD samples was acquired from ICGC (http://icgc.org). The data was extracted by Perl and organized to be analyzed by R. The samples included in this study all have complete clinical data. Samples with missing data such as gender, age, TNM stage or survival information were excluded.
TMB is considered to be the gross number of bases of mutations in per megabase, and only mutations causing the amino acid changes were counted. We Calculate the amount of TMB in each TCGA LUAD patients with TMB formula (33).
We adopted R software (v4.0.2) for all bioinformatics analysis. Perl language was used to extract 100 genes with the highest mutation frequency from TCGA and ICGC databases. Mutations in these genes are visualized by R package "GenVisR"(34). We selected the R software package "Venn" to cross the genes in the two databases to get the same genes with high mutation frequency. The R package "ggpubr" was used to evaluate and visualize the relationship between these cross-mutated genes and TMB. In GSEA software (v4.1.0), the mutation and expression matrix data of COL22A1 and DNAH8 were used to perform GSEA analysis(35). The genetic set “c2.cp.kegg.v7.4.symbols.gmt” was downloaded as the gene sets database, which was used for pathway enrichment analysis. We obtained the Normalized Enrichment Score (NES) by setting the permutation value at 1000, and obtained significant enrichment signaling pathways by excluding the normal p-value > 0.05 pathways. CIBERSORT is a calculation method of 22 kinds of immunocytocytes assessment based on transcription group data in tumor tissue(36). CIBERSORT deconvolution algorithm was used, and the filtering condition was set to P < 0.05 to obtain the proportion matrix data of immunocytocytes for each tumor sample. The visualization of matrix data is realized by R software package "corrplot". TCGA samples were divided into wild group and mutation group according to COL22A1 or DNAH8 status. The immune cell infiltration in the mutant group was analyzed by the R software package "limma", and visualized by R software package "vioplot".
Statistical analysis is completed by R (V4.0.2). Kaplan-Meier survival analysis was used to analyze the survival curves, and logarithmic rank test was used to evaluate. Prognostic risk factors were determined by univariate and multivariate Cox regression analysis. We analyze the correlation between TMB and mutant genes by the Mann-Whitney U test. Two tailed p-value < 0.05 was regarded as statistically significant in all comparisons.
LUAD Lung adenocarcinoma
TMB Tumor mutation burden
ICGC International Cancer Genome Consortium
TCGA The Cancer Genome Atlas
COL22A1 Collagen XXII
DNAH8 Dynein Axoneme Heavy Chain 8
GSEA Gene set enrichment analysis
LUSC lung squamous cell carcinoma
ICIs immune checkpoint inhibitors
PD-1 programmed death protein 1
PD-L1 programmed death receptor ligand 1
FDA Food and Drug Administration
ICB immune checkpoint blockade
Tregs regulatory T cells
CRC colorectal cancer
NES Normalized Enrichment Fraction
Ethics approval and consent to participate: Not applicable
Availability of data and materials: All data are obtained from the analysis of available data in the database
Acknowledgements: Not applicable
Authors' contributions: C.Wang designed the research, C.Zhang and X. Song completed data analysis and manuscript writing. All authors have read and approved the final manuscript.
CONFLICTS OF INTEREST: The authors declare no conflict of interest.
FUNDING: This work was supported by the National Natural Science Foundation of China (28051495).