Integrative omics analysis identies subtypes with therapeutic implications in lung adenocarcinoma harboring KEAP1/NFE2L2

Backgrounds: Lung adenocarcinoma is one of the most common malignant tumors, in which KEAP1-NFE2L2 pathway is altered frequently. The biological features and intrinsic heterogeneities of KEAP1/NFE2L2-mutant lung adenocarcinoma remain unclear. Methods: Multiplatform data from The Cancer Genome Atlas (TCGA) were adopted to identify two subtypes of lung adenocarcinoma harboring KEAP1/NFE2L2 mutations. Bioinformatics analyses, regarding immune microenvironment, methylation level and mutational signature, were performed to characterize intrinsic heterogeneities. Meanwhile, initial results were also validated by using common lung adenocarcinoma cell lines, which revealed consistent features of KEAP1/NFE2L2-mutant subtypes. Furthermore, cell line samples were adopted for drug sensitivity screening based on public datasets. Results: Two mutant subtypes (P1 and P2) of patients were identied in TCGA. P2 patients had signicantly heavier smoking levels and worse survival compared with P1 patients. The P2 subset was characterized by active immune microenvironment and more smoking-induced genomic alterations, including methylation and somatic mutations. Validations of the corresponding features in mutant cell lines were achieved to some degrees. Several compounds which were sensitive to mutant subtype of lung adenocarcinoma were identied, such as inhibitors of PI3K/Akt and IGF1R signaling pathways. Conclusions: KEAP1/NFE2L2 mutant lung adenocarcinoma showed potential heterogeneities. The intrinsic heterogeneities of KEAP1/NFE2L2 were associated with immune microenvironment and smoking-related genomic aberrations.


Introduction
Lung cancer is the leading cause of cancer-associated morbidity and mortality worldwide, among which lung adenocarcinoma accounts for the highest proportion with increasing incidence [1][2][3][4]. Previous studies promoted a paradigm shift regarding classifying lung tumors based on the signi cant genomic alterations for therapeutic targets, such as epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) [5][6][7]. The Kelch-like ECH-associated protein 1 (KEAP1) and the nuclear factor erythroid-2-related factor 2 (NFE2L2) mutations were found in more than 20% patients with non-small cell lung cancer, which represented one of the most important genomic subtypes [8,9]. Moreover, the genomic alterations of KEAP1 or NFE2L2 were reported to play crucial roles in lung adenocarcinoma [10][11][12].
Abnormal regulations of reactive oxygen species contribute to the occurrence and development of malignancies [13]. The KEAP1 and NFE2L2 are the two main components in the stress response pathways. KEAP1 mediates the degradation of NFE2L2 to act as an adaptor protein of the Cullin 3 (CUL3) E3 ubiquitin ligase so as to maintain the redox homeostasis. In the presence of oxidative stress, the inactivation of KEAP1 results in the release, accumulation and nucleus translocation of NFE2L2 to counteract the damage [14,15]. Therefore, the KEAP1/NFE2L2 mutations, representing the dysfunctional activations of the stress response pathway, have been found in many common malignant tumors, including lung adenocarcinoma [16][17][18]. Nevertheless, the biological features and clinical implications of KEAP1/NFE2L2 mutations remain elusive and contradictory. Rizvi et al revealed potential associations between the response of immune checkpoint inhibitors and KEAP1-mutant non-small cell lung cancer. On the contrary, Hellyer et al suggested that KEAP1/NFE2L2 mutations might represent a mechanism of intrinsic resistance to EGFR-tyrosine kinase inhibitor therapy [19]. Chemoresistance was also reported to be associated with KEAP1/NFE2L2 mutations [20,21].
In our study, multiplatform data from The Cancer Genome Atlas (TCGA) was adopted to identify two subtypes of lung adenocarcinoma harboring KEAP1/NFE2L2 mutations. Bioinformatics analyses regarding immune microenvironment and methylation level were performed to characterize potential mutant subgroups. The initial results were also validated by using common lung adenocarcinoma cell lines, which revealed consistent features of KEAP1/NFE2L2-mutant subtypes. Furthermore, cell line samples were adopted for drug sensitivity screening based on public datasets. Potential drugs which were sensitive to each mutant subtype of lung adenocarcinoma were explored and identi ed.

Patient cohort and cell lines data
Level 3 RNA sequencing data, DNA methylation data (Illumina In nium HumanMethylation 450K BeadChip), miRNA expression data and clinical information of patients with lung adenocarcinoma were downloaded from TCGA (https://protal.gdc.cancer.gov/). Somatic mutation data were selected based on previous studies by comprehensive analyses accounting for variance and batch effects [22]. Copy number variations (CNV) were estimated using the GISTIC2 method from the University of California Santa Cruz Xena website (https://xena.ucsc.edu). Patients with KEAP1/NFE2L2 mutations were selected as the main study cohort. Patients with missing data types were excluded.
RNA sequencing data, miRNA expression levels, copy number values and gene mutation status of common lung adenocarcinoma cell lines were downloaded from the Cancer Cell Line Encyclopedia (CCLE, https://portals.broadinstitute.org/ccle). Also, DNA methylation levels (Illumina In nium HumanMethylation 450K BeadChip) of selected cancer cell lines were acquired from the Gene Expression Omnibus (GEO, (https://www.ncbi.nlm.nih.gov/geo) (GSE68379). The drug sensitivity data of selected cancer cell lines were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC, https://www.cancerrxgene.org/). Histological information of each cell line was con rmed based on GDSC, CCLE and Cellosaurus database [23,24]. Cell lines with unknown data types were removed. Similarly, lung adenocarcinoma cell lines with KEAP1/NFE2L2 mutations were identi ed by following the above rules. Data processing and clustering For the DNA methylation data, probes in sex chromosomes or overlapping single nucleotide polymorphisms were removed. Cross-reactive probes were also excluded according to Chen et al [25]. The frequencies of six base substitutions (C > A, C > G, C > T, T > A, T > C, and T > G) were calculated. For some datasets, features or probes with more than 20% missing values were deleted. The k-nearest neighbor algorithm was adopted to impute the remaining missing data.
All ve data types (RNA sequencing, DNA methylation, miRNA level, copy number and base substitution) were integrated using the similar network fusion (SNF) method for lung adenocarcinoma patients and cell lines. The SNF method fused all ve datasets into one by creating a similarity matrix for each data type. A non-linear method based on the theory of message-passing was adopted to iteratively update and converge datasets. Afterwards, consensus clustering was performed to identify distinct KEAP1/NFE2L2 mutated subgroups of lung adenocarcinoma patients and cell lines [26].
Bioinformatics analyses to characterize KEAP1/NFE2L2-mutant subgroups Mutant subgroups were preliminarily characterized by subjecting clusters for both patients and cell lines to Gene Set Enrichment Analysis (GSEA) using Hallmark, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) (MSigDB v7.0) gene sets [27]. Normalized enrichment score > 1, nominal P-value < 0.05, and false discovery rate Q-value < 0.25 were used as screening thresholds for GSEA. Moreover, we also explored potential concurrent mutations with KEAP1/NFE2L2-mutant subsets in lung adenocarcinoma patients.
The features of tumor microenvironment in KEAP1/NFE2L2-mutant lung adenocarcinoma were evaluated according to several previous studies. Saltz et al proposed a leukocyte fraction by estimating tumorin ltrating leukocytes on hematoxylin and eosin stained slides using deep learning techniques [28]. We also used the "Estimation of STromal and Immune cells in MAlignant Tumours using Expression data (ESTIMATE)" method for the assessment of tumor microenvironment [29]. Li et al developed a public resource (Tumor IMmune Estimation Resource, TIMER) to study tumor-in ltrating immune cells by computational approaches based on RNA sequencing [30]. The levels of speci c immune cell in ltration, like CD8 + T cell and macrophage, between mutant subgroups were also compared. Furthermore, we compared the number of immunogenic mutations and non-synonymous mutations per sample strati ed by the KEAP1/NFE2L2 mutant status.
The global methylation levels (β value) between KEAP1/NFE2L2 mutant patient and cell line subsets were compared to investigate epigenomic alterations and potential clinical associations. Next, a list of smoking-related DNA methylation probes was obtained from a previous study conducted by Vaz [31].The union of all reported probes was considered and their levels strati ed by the mutant subsets were compared. Somatic mutation status of KEAP1/NFE2L2-mutant patients was analyzed to extract mutational signatures using the SignatureAnalyzer [32]. Similarities were compared based on previously reported thirty mutational signatures in the Catalogue Of Somatic Mutations In Cancer (COSMIC, https://cancer.sanger.ac.uk/cosmic) to identify the potential clinical associations and etiologies.
Cancer-associated drug sensitivity data of lung adenocarcinoma cell lines were also downloaded from two sub-datasets of GDSC. Drug samples that were tested in < 50% cell lines were excluded. The natural log value of the tted half-maximal inhibitory concentration [LN(IC50)] of each drug was adopted to select caner-associated drugs which were speci cally sensitive to mutant subtypes (C1 and C2). The criteria for KEAP1/NFE2L2-mutant speci c drugs were as follows: LN(IC50) C1 or C2 < LN(IC50) C2 or C1 , P < 0.05; LN(IC50) C1 or C2 < LN(IC50) WT

Statistical analysis
All statistical analyses in this study were conducted using R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) and IBM SPSS Statistics 22.0 (IBM, Inc., NY, USA). Comparisons of immunological features and drug sensitivities were performed using the Kruskal-Wallis H test and Mann-Whitney U test. Baseline characteristics and co-mutations were compared by the chi-square test. Survival curves were estimated and compared following the Kaplan-Meier method and the log-rank test. A twotailed P-value less than 0.05 was considered statistically signi cant.

Results
Identi cation of subtypes of KEAP1/NFE2L2-mutant lung adenocarcinoma As previously stated in the Methods section, we integrated ve data subtypes and clustered 89 KEAP1/NFE2L2-mutant lung adenocarcinoma patients into two subgroups (P1 and P2 groups, Fig. 1A). Similarly, two subtypes were identi ed in 20 lung adenocarcinoma cell lines harboring KEAP1/NFE2L2mutations (C1 and C2 groups, Fig. 1C). Clustering with two classes in both patients and cell line samples showed the highest silhouette values (silhouette = 0.93 and 0.83, Fig. 1B and 1D).
Clinicopathological differences of the KEAP1/NFE2L2-mutant subtypes A signi cant difference was found in the smoking status of patients in P1, P2 and wild-type groups (P = 0.033, Table 1). The P2 group consisted of the highest proportions of current smokers and reformed smokers for ≤ 15 years, while P1 groups consisted of more reformed smokers ≥ 15 years (Table 1). No signi cant difference of pathological stage was found among patients with P1, P2 and KEAP1/NFE2L2 wild-type lung adenocarcinoma (P = 0.233, Table 1). Mutant samples contained a signi cantly higher proportion of female patients (P = 0.003, Table 1). Survival analysis showed no signi cant difference in the overall survival between subgroups of KEAP/NFE2L2-mutant and wild-type lung adenocarcinoma (P = 0.212, Fig. 2A). However, the P2-mutant subgroup was associated with a signi cantly worse survival than the P1 subgroup (P = 0.020, Fig. 2B). * Samples with unknown information were removed when comparisons were conducted among groups.
Basic biological features of KEAP1/NFE2L2-mutant subtypes GSEA was performed in KEAP1/NFE2L2-mutant subtypes in both patients and cell line cohorts. As shown in Fig. 3A and 3B, P2 and C2 subtypes were both enriched in pathways, such as KRAS signaling, IL2/STAT5 signaling, apoptosis, and interferon alpha and gamma response. GSEA showed similarities between P2 and C2 subtypes, validating the integration and clustering to some degree.
Moreover, both P2 and C2 subtypes were associated with regulations of immune-related pathways, such as activations of T cells and macrophages (Supplement Fig. 1A an 1B). The results revealed that P2 and C2 subgroups displayed active immune pathways compared with P1 and C1 subgroups.
The P2 subgroup was found associated with higher proportions of TP53 (P < 0.001), PCLO (P = 0.011), NF1 (P = 0.029) and PTPRT (P = 0.040) mutations, while the P1 subgroup may have more patients with STK11 (P = 0.008) mutation (Supplement Table 1). However, we did not validate the mutational associations in lung adenocarcinoma cell lines owing to the small sample size.

Smoking-related genomic features of the KEAP1/NFE2L2-mutant subtypes of lung adenocarcinoma
First, the methylation levels were compared across mutant subgroups. 84,700 and 64,204 differentially hypermethylated probes were found in the P1 and P2 groups, respectively (Fig. 5A). Meanwhile, 8,981 hypermethylated probes were found in the C1 group, while 5,933 hypermethylated probes were found in the C2 group (Fig. 5B). Next, unique smoking-related probes were extracted according to Vaz et al [31].
Both P2 and C2 groups displayed a similar trend of hypermethylation compared with the P1 and C1 groups ( Fig. 5C-D). The results suggested that smoking-related epigenomic alterations might play essential roles in KEAP1/NFE2L2-mutant subgroups. The epigenomic similarities also con rmed a potential resemblance between patient and cell line mutant subsets.
Second, we assessed the somatic mutational patterns of all lung adenocarcinoma patients and obtained four distinctive signatures (Supplement Fig. 2A). Among them, signature 2 subgroup (W2) was highly similar to Signature 4 and 29 of thirty known somatic mutational signatures in the COSMIC database, which were closely associated with smoking and tobacco chewing (coe cient of cosine similarity = 0.805 and 0.740). Then, we compared the normalized activities of the identi ed W2 mutational signature between KEAP1/NFE2L2-mutant subgroups. We found that P2 subset had signi cantly higher activities of W2 signature than P1 subset (Supplement Fig. 2B, P = 0.004), which further indicated possible different roles of smoking in the mutant subgroups.
Screening for compounds with potential sensitivity to the KEAP1/NFE2L2-mutant subtypes After characterizing the clinical and biological features of the mutant subtypes, possible cancerassociated drugs which were sensitive to each subtype were explored. More than 400 drugs or compounds were tested on KEAP1/NFE2L2-mutant or wild-type lung adenocarcinoma cell lines in GDSC. This study aimed to target cancer-associated drugs or compounds with potential speci c sensitivity to C1 or C2. 38 drugs, which were potentially sensitive to the C2 KEAP1/NFE2L2-mutant subtype, were discovered (Supplement Table 2). Although the criteria were adjusted, only one C1-speci c compound was identi ed (Supplement Table 2).
C2-speci c drugs were found to be mainly composed of the following types. First, inhibitors of the PI3K/Akt signaling pathways, such as afuresertib, AZD8186 and AMG-319 might be sensitive to the C2 KEAP1/NFE2L2-mutant subgroup compared with the C1 and wild-type groups ( Fig. 6 and Supplement Table 2). Second, inhibitors of IGF1R signaling, such as BMS-536924, linsitinib and NVP-ADW742, showed better e cacy in the C2 subset ( Fig. 6 and Supplement Table 2). Moreover, drugs that target Wnt and MAPK/Erk signaling pathways were more toxic to the C2 group ( Fig. 6 and Supplement Table 2). In addition, chemotherapy drugs, such as docetaxel, epothilone B and vinorelbine were found to preferentially kill the cells in the C2 subgroup ( Fig. 6 and Supplement Table 2). Nevertheless, only one compound (EHT-1864) was found that might be sensitive to the C1 subset compared with C2 and wildtype cell lines ( Fig. 6 and Supplement Table 2). The selected compound, EHT-1864, is an inhibitor of Rac1, Rac2 and Rac3 and mediated the reorganization of actin cytoskeleton.

Discussion
The KEAP1/NFE2L2 mutations were observed in many common malignant tumors, including lung adenocarcinoma [8,9,16,18], which might de ne a molecular subset of rapidly progressing tumor [35]. In this study, the multiplatform data from TCGA were adopted to identify subsets of lung adenocarcinoma with KEAP1/NFE2L2 mutations. Clinicopathological and bioinformatics analyses, such as immune microenvironment and methylation level, were performed to further explore the intrinsic heterogeneities of KEAP1/NEFE2L2-mutant disease. Moreover, cell line samples were used for drug sensitivity screening based on public datasets. In addition, CUL3 mutation was not included as the genomic signature in this study. CUL3 belonged to the ubiquitin-proteasome system, which was involved in many oncogenic processes, and could not be considered as a speci c KEAP1/NFE2L2 pathway component [36].
Variations in the KEAP1-NFE2L2 pathway were detected in more than 20% patients with lung cancer, which represented one of the major molecular subtypes [8,9]. Goeman et al revealed that KEAP1/NFE2L2 mutations represented a negative factor of survival, which de ned a rapidly progressing molecular subtype [35,37]. The mutant type showed heterogeneities, and one subset was associated with signi cantly worse survival. Cai et al performed a similar study and divided KEAP1/NFE2L2-mutant lung adenocarcinoma into three subsets based on gene pro ling. The present study integrated multi-omics datasets, such as somatic mutation, methylation, and miRNA, to cluster into two subsets. P2/C2 subset displayed active immune pathways compared with the P1/C1 subgroups. Clinical features, somatic mutation signatures and methylation levels showed potential associations with patients' smoking history.
Previous studies demonstrated that smoking led to signi cant nuclear translocation of NFE2L2, which might be potentially fatal in smoking-related lung tumorigenesis [38,39]. Moreover, aberrant KEAP1/NFE2L2 functions could also be an advantageous strategy of the tumor to protect itself against oxidative stress [39]. These ndings might also be potential evidence of distinct KEAP1/NFE2L2 subtypes.
Furthermore, drug sensitivities of cell lines from public datasets were analyzed and several subgroupspeci c drugs were discovered in our study. Best et al observed that synergy between KEAP1/NFE2L2 and PI3K pathways promoted lung cancer progression with the altered immune milieu, which supported the compound screening results of inhibitors of PI3K/Akt pathways in this study [10]. Several studies revealed possible associations between the two pathways [40,41]. The pathway analyses of this study also revealed that PI3K/Akt pathway was enriched in the P2 subgroup. Vartanian et al identi ed alternative pathways critical for NFE2L2-dependent growth in KEAP1-mutant cell lines, including IGF1R [33]. The ndings in this study suggested that inhibitors of IGF1R signaling were effective in the C2 subtype. Only one alternative compound existed, which inhibited Rac signaling to mediate the actin cytoskeleton. Wu et al demonstrated that KEAP1 stabilized F-actin cytoskeleton structures and inhibited focal adhesion, thereby restraining migrations and invasions of lung cancers [34]. KEAP1/NFE2L2/CUL3 represented a mechanism of resistance to tyrosine kinase inhibitor in patients with EGFR-mutant nonsmall cell lung cancer [19]. Most identi ed compounds in our study were sensitive to the C2 subgroup which represented a subset with a worse prognosis. However, only one compound showed better e cacy to the C1 group with a revised statistical threshold, revealing di culties in selecting appropriate drugs. However, the intrinsic differences in immune in ltrations suggested distinct immunotherapy strategies, especially developing drugs for the C2/P2 group. Also, concurrent alterations, like STK11 and TP53, could also be potential targets in KEAP1/NFE2L2-mutant diseases.
There were also limitations that should be mentioned in this study. First, it had a small sample size of mutant cell lines and patients. The study explored intrinsic heterogeneities of KEAP1/NFE2L2-mutant lung adenocarcinoma. However, further studies are required to better characterize and precisely differentiate each mutant subtype. Although LN(IC50) was adopted from GDSC to measure compound sensitivities, more experiments should be conducted to test drug e cacy.

Conclusion
Two subtypes of KEAP1/NFE2L2-mutant lung adenocarcinoma were identi ed based on both patient and cell line samples, and genomic and clinicopathological features of KEAP1/NFE2L2 mutations were characterized. The intrinsic heterogeneities of KEAP1/NFE2L2 mutations was found to be associated with immune microenvironment and smoking-related genomic aberrations.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data could be downloaded from public databases (TCGA, GEO, CCLE, GDSC and XENA databases) and previous literatures in the reference. The datasets used and/or analyzed during the current study available from the corresponding author on reasonable request.

Competing interests
None.