Integrative Bioinformatic and Functional Analyses of Potential Genes in Female Patients with Non-Small-Cell Lung Cancer

[Abstract] Objective: To explore the pathogenesis and prognostic biomarkers of non-small-cell lung cancer (NSCLC) in female patients through bioinformatic analysis and functional prediction of potential NSCLC-associated genes. Methods: Data for female patients with NSCLC were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified using GEO2R. The DAVID online database was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, and the STRING online tool was used to perform protein-protein interaction (PPI) analyses. Next, the MCODE plug-in was used to screen the key DEGs; the Oncomine database was used to analyse IL6, EGF and MMP9 expression in NSCLC tissues and normal lung tissues, and the Kaplan-Meier plotter was used to perform prognostic analyses of the key DEGs. Finally, RT-PCR was used to verify the expression of the key DEGs in NSCLC cells. Results: A total of 500 DEGs were screened, and the functional enrichment analysis showed that these genes were mainly involved in cell proliferation, cell migration, and regulation of vasculature development. The KEGG analysis showed that the pathways were primarily related to ECM-receptor interaction, protein digestion and absorption, and leukocyte transendothelial migration. Three key DEGs were obtained by the PPI network analysis: IL6, EGF and MMP9. The expression of IL6 was low in NSCLC tissues, while that of EGF and MMP9 was high. IL6 and EGF may be biomarkers for predicting the prognosis of female patients with NSCLC. Compared with the human bronchial epithelial cell line (16HBE), EGF and MMP9 were highly expressed in NSCLC cells. Conclusion: In female patients, IL6, EGF and MMP9 may be research targets for characterizing the pathogenesis of NSCLC, and IL6 and EGF may be biomarkers for predicting the prognosis of this cancer.


Introduction
Lung cancer has become a major public health problem worldwide, and smoking is considered an important cause [1][2]. In the last 10 years, the proportion of lung cancer cases in women has gradually increased relative to that in men, and the vast majority of female lung cancer patients have no history of smoking. Studies have shown that 15% of male and 53% of female lung cancer cases are not related to smoking, and NSCLC has a higher incidence and worse prognosis in women [3][4]. At present, female NSCLC patients represent a special and large group of cancer patients. Environmental factors and genetic susceptibility may be the main contributors to the risk of NSCLC in females [5]. In this study, we downloaded gene expression microarray data of female patients with NSCLC from the Gene Expression Omnibus (GEO) and used GEO2R to compare NSCLC tissue with normal lung tissue, screen out DEGs, and evaluate their clinical value in relation to disease severity and prognosis. The purpose of this study was to further investigate the molecular mechanism of NSCLC in females and to identify biomarkers for early diagnosis, precise treatment and prognosis.

Chip data sources and DEG screening
The GSE19804 and GSE31210 chip data sets were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Both data sets were generated on the GPL570 platform. NSCLC tissues were compared with normal lung tissues using GEO2R in the GEO database, and the resulting DEGs were screened. The array data of GSE19804 included 60 female NSCLC tissue samples and 60 healthy female lung tissues [6]. GSE31210 contained 246 samples, including 121 female NSCLC samples and 9 normal female lung tissues [7].

Enrichment analysis of GO and KEGG
GO and KEGG analyses and functional annotation of the DEGs were carried out using the DAVID online database. The main biological processes and tumour-related signalling pathways associated with the DEGs were analysed to obtain a more comprehensive understanding of their mechanisms and functions.
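The statistic behind GO and KEGG term lists is typically an over-representation test. The Python sketch below computes a one-sided hypergeometric P value for a single term. It is only an illustration: DAVID reports its own modified Fisher exact statistic (the EASE score), and the function name and the counts in the usage line are hypothetical, not values from this study.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int,
                           selected: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    background: number of genes in the reference set
    annotated:  genes in the reference set that carry the term
    selected:   size of the DEG list
    overlap:    DEGs that carry the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    # Sum the probability mass of every outcome at least as extreme as `overlap`.
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(background=20000, annotated=150,
                                 selected=500, overlap=15)
print(f"enrichment P = {p_value:.3g}")
```

In practice the P values for all tested terms would also need a multiple-testing correction before terms are reported.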

DEG expression level and prognosis of female NSCLC patients
STRING (https://string-db.org/) is an online resource for analysing protein-protein interaction (PPI) networks. A PPI network of the DEGs was constructed using this website, with a combined score > 0.4 as the screening condition. The MCODE plug-in was used to screen out the most critical genes in the PPI network. The log-rank test and Cox proportional hazards regression were used to compare the overall survival (OS) of the two groups and thereby verify the relationship between the expression of these key genes and the prognosis of female NSCLC patients. The statistical analyses were performed using R software (version 3.3.2), with the significance level set at a two-sided P value of < 0.05.
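To make the survival comparison concrete, the following Python sketch implements a two-group log-rank test from scratch. It is not the R code used in the study; the function name, the toy follow-up times, and the idea that patients are split into high- and low-expression groups are assumptions made for illustration.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times
    events_*: 1 if the event (death) was observed, 0 if censored
    Returns (chi-square statistic, two-sided P value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = at_risk_a + at_risk_b
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        # Expected deaths in group A if both groups shared one hazard.
        observed_minus_expected += deaths_a - deaths * at_risk_a / n
        if n > 1:
            variance += (deaths * (at_risk_a / n) * (at_risk_b / n)
                         * (n - deaths) / (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Toy data (hypothetical): group A fails early, group B fails late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

A Cox proportional hazards model would additionally yield a hazard ratio, which this sketch does not attempt.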

Oncomine datasets
The GSE19804 and GSE31210 datasets were downloaded from GEO. The relative expression level of key DEGs was extracted from each cohort. An unpaired t test was used to evaluate the differential expression of key DEGs between NSCLC tissues and adjacent normal tissues. The statistical analyses were performed using R software (version 3.3.2) with the significance level set as a two-sided P value of < 0.05.
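As a rough illustration of the unpaired comparison, the Python sketch below computes Welch's t statistic and its approximate degrees of freedom for two independent groups. The text does not say whether equal variances were assumed, so the Welch form is an assumption here, and the function name and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical log2 expression values for one gene.
tumour = [7.1, 6.8, 7.4, 6.9, 7.0]
normal = [8.2, 8.5, 8.1, 8.4, 8.0]
t_stat, dof = welch_t(tumour, normal)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Converting the statistic into a two-sided P value requires the Student t distribution, which a statistics package such as R provides directly.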
The relative expression levels of key DEGs in different NSCLC datasets were analysed by searching the Oncomine bioinformatics gene database (https://www.oncomine.org/). The specific screening conditions were as follows: (1) gene: enter key DEG name; (2) analysis type: select cancer vs normal analysis; (3) cancer type: select NSCLC; (4) demographics: select female; and (5) thresholds: set P-value to 0.05, fold change to 2, and gene rank to 10%.
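The three numeric thresholds in condition (5) can be expressed as a simple filter. Oncomine is a web interface, so the Python sketch below is only a stand-in for results exported from it: the record type, its field names, the sign convention for fold change, and the example rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OncomineHit:
    dataset: str
    gene: str
    p_value: float
    fold_change: float        # assumed convention: negative means under-expressed in cancer
    gene_rank_percent: float  # position of the gene in the dataset ranking, in percent


def passes_thresholds(hit, p_max=0.05, fold_min=2.0, rank_max=10.0):
    """Apply the P value, fold change and gene rank cut-offs to one record."""
    return (hit.p_value <= p_max
            and abs(hit.fold_change) >= fold_min
            and hit.gene_rank_percent <= rank_max)


# Hypothetical records, not taken from the study.
hits = [
    OncomineHit("dataset_1", "MMP9", 1e-6, 4.2, 3.0),
    OncomineHit("dataset_2", "IL6", 2e-4, -3.1, 5.0),
    OncomineHit("dataset_3", "EGF", 0.20, 1.3, 40.0),
]
kept = [h.gene for h in hits if passes_thresholds(h)]
print(kept)
```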

Cell lines
The human bronchial epithelial cell line (16HBE), the lung adenocarcinoma cell line (PC9) and the lung squamous cell carcinoma cell line (SK-MES-1) were purchased from the Institute of Biochemistry and Cell Biology of the Chinese Academy of Sciences. Table S1.

Five hundred common DEGs were found to be associated with NSCLC in females
The results showed that GSE19804 contained 1160 DEGs and GSE31210 contained 780 DEGs. After removing duplicate and incomplete data, a total of 500 common DEGs were found (Figure 1).
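Deriving a common DEG list from two datasets amounts to cleaning each list and taking the set intersection. The Python sketch below shows one way to do this; the function names and the short example lists are hypothetical, and the cleaning rule (dropping missing or blank symbols) is an assumption about what "incomplete data" means.

```python
def clean_symbols(symbols):
    """Drop missing or blank gene symbols and collapse exact duplicates."""
    cleaned = set()
    for symbol in symbols:
        if symbol is None:
            continue
        symbol = symbol.strip()
        if symbol:
            cleaned.add(symbol)
    return cleaned


def common_degs(degs_a, degs_b):
    """Return the sorted gene symbols that appear in both DEG lists."""
    return sorted(clean_symbols(degs_a) & clean_symbols(degs_b))


# Hypothetical DEG lists standing in for the two GEO2R outputs.
degs_first = ["IL6", "EGF", "", None, "IL6", "TP53"]
degs_second = ["MMP9", "EGF", "IL6 "]
shared = common_degs(degs_first, degs_second)
print(shared)
```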

GO and KEGG analyses of the biological functions of the 500 DEGs
clusterProfiler was used to analyse all 500 DEGs. In the biological process category, the DEGs were mainly enriched in endothelial cell proliferation, ameboidal-type cell migration, urogenital system development, regulation of vasculature development, and regulation of epithelial cell proliferation. In the molecular function category, the DEGs were mainly enriched in glycosaminoglycan binding, extracellular matrix structural constituent, and amide binding (Figure 2A-C). The results of the KEGG enrichment analysis showed that the DEGs were mainly enriched in ECM-receptor interaction, malaria, protein digestion and absorption, and leukocyte transendothelial migration (Figure 2D).

A PPI network was constructed, and three key DEGs were screened out
The 500 common DEGs were imported into STRING for analysis to obtain a PPI network. The PPI network was composed of 482 nodes and 1678 interactions. Finally, three genes located in key positions were screened out: interleukin-6 (IL6), epidermal growth factor (EGF) and matrix metalloproteinase 9 (MMP9). The corresponding degrees of IL6, EGF and MMP9 were 75, 58 and 42, respectively (Figure 3).
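The degree of a node is the number of interaction partners it retains after the score filter. The Python sketch below counts degrees from an edge list using the combined score > 0.4 cut-off given in the methods. It does not reproduce the MCODE clustering step, and the function name, the 0 to 1 score scale, and the example edges are assumptions for illustration.

```python
from collections import Counter


def node_degrees(edges, min_score=0.4):
    """Count interaction partners per protein.

    edges: iterable of (protein_a, protein_b, combined_score) tuples,
           with scores assumed to lie between 0 and 1.
    """
    kept_pairs = set()
    for a, b, score in edges:
        if a != b and score > min_score:
            # frozenset makes (a, b) and (b, a) the same undirected edge.
            kept_pairs.add(frozenset((a, b)))

    degrees = Counter()
    for pair in kept_pairs:
        for protein in pair:
            degrees[protein] += 1
    return degrees


# Hypothetical edges, not taken from the study's network.
example_edges = [
    ("IL6", "EGF", 0.90),
    ("IL6", "MMP9", 0.70),
    ("EGF", "MMP9", 0.30),  # below the cut-off, dropped
    ("EGF", "IL6", 0.95),   # same pair as the first edge, counted once
]
degrees = node_degrees(example_edges)
print(degrees.most_common())
```

Ranking nodes by degree is only one notion of a "key position"; cluster-based methods weigh local network density as well.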

Expression of key DEGs and their relationship with prognosis
We used the GEPIA tool to verify the expression of the three key genes in NSCLC tissues and normal lung tissues based on the GEO database (Figure 4A-F) and further verified the correlation between the expression levels of the three key genes and the prognosis of female NSCLC patients using the GSE31210 chip data. The survival analysis showed that IL6 and EGF were associated with the prognosis of NSCLC in female patients (Figure 5A-C).

Oncomine database analysis of IL6, EGF and MMP9 expression in NSCLC tissues and normal lung tissues
An analysis of eight datasets found that the expression of IL6 in NSCLC tissues was lower than that in normal lung tissues, and the expression of EGF and MMP9 in NSCLC tissues was higher than that in normal lung tissues (Figure 6A-C).

Expression of key DEGs in NSCLC cells
The expression of the three key genes in NSCLC cells was assessed using RT-qPCR. The expression of IL6 in NSCLC cells was lower than that in the 16HBE cell line, while the expression of EGF and MMP9 in NSCLC cells was higher than that in the 16HBE cell line (Figure 7A-C).

Discussion
With the continuous development of medical care, the proportion of female patients with lung adenocarcinoma (LAD) among new lung cancer cases is increasing significantly [8]. At present, the specific molecular mechanism of NSCLC in females has not been fully elucidated, although it has been shown to involve the abnormal expression, inactivation, and mutation of many genes and a variety of related signalling pathways [9].
With the development of second-generation sequencing technology, the use of bioinformatic methods to analyse treatment targets in female patients with NSCLC has become a research hotspot [10]. In this study, GEO2R was used to screen the GSE19804 and GSE31210 microarray expression datasets of female NSCLC patients from the GEO database, and a total of 500 DEGs were identified.
The functional and pathway enrichment analysis of these DEGs showed that they were mainly involved in cell proliferation, cell migration, and regulation of vasculature development. The KEGG analysis showed that they were mainly related to ECM-receptor interaction, protein digestion and absorption, leukocyte transendothelial migration and other signalling pathways, and these findings facilitated a further exploration of the mechanisms. Three key genes (IL6, EGF and MMP9) closely related to the occurrence and development of NSCLC in females were screened, which provides potential targets for follow-up studies of NSCLC treatment.
IL-6 can regulate tumour activity in two ways. Its main cancer-promoting effect is to produce Th17 cells and inhibit dendritic cell and CD8+ T cell activity, whereas it suppresses cancer by inhibiting regulatory cells [11]. The intricate interaction between IL-6-mediated tumour immunosuppression and metabolic disorders has also been observed in colorectal and pancreatic cancer [12]. In this study, we found that the expression of IL-6 in NSCLC tissues and serum was lower than that in normal tissues and serum. Therefore, according to our preliminary results, we speculate that IL-6 plays a two-way regulatory role in tumour immunity. The low expression level of IL-6 in the serum of patients with NSCLC may be one of the important factors promoting the occurrence and development of NSCLC, although its specific mechanism remains to be clarified.
Epidermal growth factor (EGF) is a low-molecular-weight peptide of 53 amino acid residues and a potent mitogen.