Prognostic value of immune-related genes and comparative analysis of immune cell inltration in lung adenocarcinoma of different genders

Lung adenocarcinoma (LUAD) is one of the most important subtypes of lung cancer. Compared with male LUAD patients, female patients have a higher incidence, but better long-term survival rate, with unknown reasons. In this study, we aimed to explore the effect of gender on immune cell inltration of LUAD tumor microenvironment (TME), and tried to clarify the reasons for the different clinical characteristics of male and female LUAD patients, by conducting a comparative analysis of the TME in female and male LUAD patients.


Background
As the most commonly diagnosed cancer, lung cancer is the leading cause of cancer death with a 5-year overall survival rate of 7.5% in males and 8.5%in females [1,2]. Among of all histologic subtypes of lung cancer, lung adenocarcinoma (LUAD) is the most prevalent, accounting for more than 40% of lung cancer incidence [3]. There are gender differences in the occurrence and progression of LUAD, but so far the reason is unclear. This study is the rst to compare the differences in tumor-in ltrating immune cells (TIICs) in TME between female LUAD patients and male LUAD patients, to explore the in uence of gender factors on LUAD.
Recent studies have shown that components in TME, including immune cells, endothelial cells, mesenchymal cells, in ammatory mediators and extracellular matrix, are closely related to the occurrence, progression and prognosis of cancer [4][5][6][7][8][9]. As the two main components, TIICs and stromal cells are regarded as valuable for the effect of tumor immunotherapy [10][11][12][13][14]. It is reported that immune cells, such as T and B cells, macrophages, dendritic cells, monocytes, etc., interact with tumor cells and are correlated to tumor cell proliferation, metastasis and prognosis [15][16][17]. Regulated by Tregs and M2 macrophages, tumor-in ltrating lymphocytes (TILs) participate in the clinical course of various cancers [18][19][20][21][22]. Donnem and his colleagues found that the density of CD8 (+) TIL is an independent prognostic factor in non-small cell lung cancer (NSCLC) [23]. In the study of colorectal cancer, it was found that the level of DNA microsatellite instability is related to the density of TIL. High-level of microsatellite instability is characterised by the presence of TILs and a favourable prognosis [24]. Further more, TME also has an impact on tumor-related gene expression and clinical prognosis [25][26][27][28]. These ndings indicate that TME is closely related to tumor progression, and this will provide a potential cure for cancer.

Estimation of STromal and Immune cells in MAlignant Tumours using Expression data (ESTIMATE) is
created by Yoshihara and his colleagues, which has been used to score the microenvironment of various types of cancer [29]. For example, the ESTIMATE algorithm has been applied to evaluate immune and stromal scores to predict the proportion of immune cells, stromal cells and other non-tumor cells in TME of lung cancer [30], breast cancer [31], uveal melanoma [32], gastric cancer [33], colorectal cancer [34].
As a clinical variable, gender plays a signi cant role in tumor progression, which should be considered as an important factor in pathogenesis of cancer. Women tend to have lower mortality lower rates of lung cancer than men. In our study, we used gene expression pro le data from The Cancer Genome Atlas (TCGA) database to comparatively analyze the difference of TIICs and tumor-related genes between female LUAD patients and male LUAD patients based on ESTIMATE algorithm, in order to explore the causes of higher mortality of male patients with LUAD than female patients.

Methods And Materials
Data Mining from the public Database Gene expression data of female (n=304) and male (n=247) patients with LUAD were downloaded from the data portal of TCGA (https://tcga-data.nci.nih.gov/tcga/), which included complete clinical information (age, TNM stage, survival and outcome). Estimate score, immune score and stromal score were calculated by ESTIMATE algorithm. 232 female and 188 male LUAD patients data from GSE72094 was downloaded using Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo) to serve as validation sets.
Identi cation of differentially expressed genes Differentially expressed genes (DEGs) in high-and low-score groups were screened by using the "limma" package [35]. A fold change of >1 and an adjusted p<0.05 was set as the cutoff criteria. Additionally, "heatmap" package was used to construct heat maps.

Survival analysis
The correlation of estimate, immune and stromal scores with overall survival of LUAD patients was plotted by "survival" and " survminer" packages.

Functional analysis and Cox analysis of DEGs
Functional analysis of DEGs in female and male LUAD patients was conducted using "clusterPro ler", "org.Hs.eg.db", "enrichplot" and "ggplot2" packages for GO terms and KEGG pathways. The p value < 0.05 and q value < 0.05 was considered to be statistically signi cant. At the same time, we performed univariate regression analysis on DEGs to screen prognostic factors that regulate TME and tumor immunity.

PPI Analysis
The protein-protein interaction (PPI) networks was analyzed by an online tool named Search Tool for the Retrieval of Interacting Genes (STRING) [36]. Con dence Score≥0.9 was set as the cutoff value. We analyzed the connectivity degree of each DEGs using STRING database and reconstructed the networks via Cytoscape software. TME analysis CIBERSORT is a deconvolution algorithm that has been widely used to calculate the numbers of each type of TIICs and analyze the correlation between gene expression and TIICs in TME [37][38][39][40]. The "corrplot" package was used to conduct correlation-based heatmaps.

Statistical analysis
Kaplan-Meier plots were used to analyze and visualize the associations of estimate, immune, and stromal scores and DEGs with prognosis. The correlations of estimate, immune, and stromal scores with age and TNM stage were estimated via Wilcoxon test or Mann-Whitney U tests. The estimate, immune, and stromal scores of female LUAD patients were compared with that of male LUAD patients using GraphPad Prism 8. Functional analysis, Cox analysis, survival, and TME analysis were conducted using R version 3.6.2. P < 0.05 was considered to be statistically signi cant. among male patients with LUAD. The estimate, immune and stromal scores of women were higher than those of male (p<0.05) ( Figure 1A). The samples were divided into high-and low-score groups by the medium value. Up-regulated genes and down-regulated genes were showed in Figure 1B. By interaction of DEGs in immune and stromal groups, 269 up-regulated genes and 35 down-regulated genes were screened in female patients, and 340 up-regulated genes and 28 down-regulated genes were screened in male patients ( Figure 1C).

Associations of the scores with age, TNM stage and prognosis in LUAD patients of different genders
To explore the relationship between estimate, immune or stromal scores and age, TNM stage or survival, the samples were divided into high-score and low-score groups. The analysis indicated that for female patients with LUAD, T1 stage had higher estimate scores than that of T3 stage (p=0.01), and N1 stage had higher estimate score than that of N0 stage (p=0.015) and N2 stage (p=0.018) ( Figure 2A). However, in male patients with LUAD, estimate score had no obvious relationship with the T or N stage ( Figure 2D). Whether it was a male or female patient with LUAD, high immune score was related to older age ( Figure  2B, 2E). Immune score of T1 stage was higher than that of T3 stage in female patients ( Figure 2B), but in male patients, immune score of T1 stage was higher than that of T2 and T4 stage ( Figure 2E). In addition, N1 stage had higher immune scores than that of N0 stage (p=0.038) and N2 stage (p=0.011) ( Figure 2B) for female patients, and clinical stage I had higher immune scores than that of stage III for male patients (p=0.037) ( Figure 2E). For female patients with LUAD, the stromal scores of N1 stage were higher than that of N0 (p=0.016) ( Figure 2C), and stromal scores of M0 stage or clinical stage I were higher than that of M1 stage (p=0.044) or clinical stage IV (p=0.024) ( Figure 2F). To study the correlations of overall survival with estimate, immune or stromal scores, we performed Kaplan-Meier survival analysis ( Figure 2G). The results indicated that high estimate score had better overall survival for female patients with LUAD (p=0.039), but they had no signi cant relationship with prognosis of male patients (p=0.039). The overall survival was not associated with immune or stromal score whether it was a male patient or a female patient (p>0.05).

GO and KEGG analysis of immune-related DEGs
To further explore the functions of immune-related DEGs, we performed GO and KEGG analysis on these genes. The top 5 enriched BPs in female patients with LUAD based on GO analysis were leukocyte proliferation, lymphocyte proliferation, mononuclear cell proliferation, T-cell activation, and lymphocyte differentiation ( Figure 3A), and the top 5 enriched BPs in male patients with LUAD based on GO analysis were T-cell activation, leukocyte migration, leukocyte proliferation, lymphocyte proliferation, and regulation of immune effector process ( Figure 3E). Circular plot demonstrated the functional interactions between the BP, CC or MF pathways and genes as extracted from GO ( Figure 3B, 3F). Moreover, the top 5 enriched BPs in female patients with LUAD based on KEGG were viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, chemokine signaling pathway, hematopoietic cell lineage, and staphylococcus aureus infection ( Figure 3C), and top 5 enriched BPs in male patients with LUAD based on KEGG were cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, hematopoietic cell lineage, chemokine signaling pathway, and graft-versus-host disease ( Figure 3G). Circular plot demonstrated the functional interactions between the BP, CC or MF pathways and genes as extracted from KEGG ( Figure 3D, 3H).

PPI network and COX regression analysis of immune-related DEGs
The STRING tool was used to plot PPT networks of immune-related DEGs, which were regenerated by Cytoscape ( Figure 4A, 4B). The top 30 genes with most number of adjacent nodes in female and male patients with LUAD were showed in Figure 4C and 4D. We performed univariate Cox regressiom analysis to evaluate the prognostic value of immune-related DEGs in female ( Figure 4E) and male ( Figure 4F) patients with LUAD (genes with p<0.05 displayed in forest plot). Two-way venn diagram comparing the key genes in female ( Figure 4G) and male ( Figure 4H) cohorts. CCR2, LCP2, and PTPRC were selected as key prognostic factors of female patients with LUAD. BTK and CCR2 were selected as key prognostic factors of male patients with LUAD.
The expression level of these key TIICs-related genes and their prognostic value in LUAD patients We further revealed the expression level of these key TIICs-related genes and their prognostic value in LUAD patients. According to the result, the expression level of CCR2 in tumor tissues of female LUAD patients was not signi cantly different from that in normal tissue (p>0.05) ( Figure 5A and 5B), but female LUAD patient with high level of CCR2 exhibited a better overall survival (p=0.001) ( Figure 5C). The expression level of LCP2 in tumor tissues of female LUAD patients was lower than that in normal tissues (p<0.05) ( Figure 5D), yet there was no difference in the expression of LCP2 in paired tumors and adjacent normal tissues (p>0.05) ( Figure 5E). Interestingly, female LUAD patient with a high level of LCP2 exhibited a better overall survival (p=0.033) ( Figure 5F). The expression level of PTPRC in tumor tissues of female LUAD patients was lower than that in normal tissues (p<0.05) ( Figure 5G and 5H), and female LUAD patient with high level of PTPRC exhibited a better overall survival (p=0.006) ( Figure 5I). The expression level of BTK in tumor tissues of male LUAD patients was lower than that in normal tissues (p<0.05) ( Figure 5J and 5K), and male LUAD patient with high level of BTK exhibited a better overall survival (p=0.035) ( Figure 5L). The expression level of CCR2 in tumor tissues of male LUAD patients was lower than that in normal tissues (p=0.035) ( Figure 5M), yet there was no difference in the expression of CCR2 in paired tumors and adjacent normal tissues (p>0.05) ( Figure 5N). Interestingly, male LUAD patient with a high level of CCR2 exhibited a better overall survival (p=0.011) ( Figure 5O). Associations of these key genes expression with age and TNM stage in LUAD patients of different genders were showed in Figure  S1.
CIBERSORT for estimating TIICs components in female and male LUAD microenvironment  Figure 6C). As shown in Figure 6D for male LUAD patients, there was a signi cant negative correlation between CD8 T cells and memory resting CD4 T cells(Cor=-0.41), and also a signi cant positive correlation between CD8 T cells and memory activated CD4 T cells (Cor=0.47).

Difference analysis of TIICs in LUAD tumor and adjacent normal tissues
To compare the differences in immune cell in ltration between female and male patients with LUAD, we analyzed the differences in the distribution of TIICs in the TME of female and male patients with LUAD ( Figure 7).

Compared with paracancerous tissues in female patients, 8 TIICs (memory B cells, plasma cells, memory activated CD4 T cells, follicular helper T cells, regulatory T cells, gamma delta T cells, M1
macrophages, and resting dendritic cells) had a higher proportion in cancerous tissues, and 6 TIICs (memory resting CD4 T cells, resting NK cells, monocytes, M2 macrophages resting mast cells, eosinophils, and neutrophils) had a lower proportion in cancerous tissues (p<0.05) ( Figure 7A). Similarly in male LUAD patients, 5 TIICs (plasma cells, memory activated CD4 T cells, follicular helper T cells, regulatory T cells, and M1 macrophages) accounted for a higher proportion in cancerous tissues than that in paracancerous tissues, and the proportion of 4 TIICs (resting NK cells, monocytes, resting mast cells, and neutrophils) in cancerous tissues was lower than that in paracancerous tissues (p<0.05) ( Figure  7B). Female patients with LUAD had a higher proportion of memory B cells, while the percentage of T cells CD4 naïve and resting NK cells was lower in female patients ( Figure 7C).
In order to further research the in uence of the proportions of TIICs on the prognosis of female and male LUAD patients, univariate Cox regression analysis was conducted on the 22 TIICs for female and male patients respectively. The proportion of activated dendritic cells was identi ed as risk factor for female LUAD patients (Table S1). The proportion of gamma delta T cells, activated NK cells and activated mast cells were identi ed as risk factors for male LUAD patients (Table S2).

Impact of the key identi ed genes on TIICs
We further analyzed the effects of the main differentially expressed genes in female ( Figure S2) and male ( Figure S3 Figure S3A). The relative proportions of CD8 T cells (p = 0.008), CD4 memory activated T cells (p < 0.001), monocytes (p = 0.006), and M1 macrophages (p < 0.001) were signi cantly upregulated in high CCR2 expression group compared with the low CCR2 expression group ( Figure S3B). Gamma delta T cells (p = 0.031) and M0 macrophages (p < 0.001) were present in lower proportions in the high CCR2 expression group than in the others ( Figure S3B).

Validation of TCGA results with GEO database
In order to verify the prognostic value of the identi ed genes from TCGA, we used GSE72094 as a validation cohort. Patients were divided into high expression groups and low expression groups respectively according to gene expression. We rst compared the survival curves of female LUAD patients and male LUAD patients, which showed that the overall survival rate of female patients is better than that of male patients ( Figure 8A). Kaplan-Meier survival curves further con rmed that male LUAD patients with high expression levels of CCR2 ( Figure 8B) and BTK ( Figure 8C) were present in a better overall survival than those with low expression levels of CCR2 and BTK. Similarly, Kaplan-Meier survival curves further veri ed that female patients with high expression levels of CCR2 ( Figure 8D) LCP2 ( Figure 8E), and PTPRC ( Figure 8F) were present in a better overall survival than those with low expression levels of CCR2 LCP2, and PTPRC.

Discussion
LUAD is the most common subtype of lung cancer. The incidence and mortality of LUAD in female and male patients are different. The prognosis of male LUAD patients is usually worse than female patients, but the cause is currently unknown. Utilizing TCGA database, this study is the rst to compare the differences in the in ltration of immune cells in TME between female and male patients with LUAD, which provided in-depth insights for clarifying the reasons for differences in the prognosis of patients of different genders.
The ESTIMATE algorithm has been widely used in recent years to evaluate the immune score and stromal score, which can help scholars understand the TME of LUAD in depth. Our results indicated that the estimate score ranged from − 2358. 46  In general, average estimate, immune and stromal scores of female patients with LUAD were higher than those of male patients. Interestingly, except that female patients with high estimate score had better overall survival than that with low estimate score, there was no signi cant difference in overall survival among other high-score and low-score female or male patients. Early research showed that immune-hot tumors are de ned as those in which many immune cells such as T cells, lymphocytes et al. have a high proportion, and these in ltrating immune cells improve e ciency of tumor to response to treatment of immune checkpoint inhibitors [41][42][43]. More immune-hot tumor means higher immune score in the tumor microenvironment. However, cancer patients with higher immune score do not always mean more immune-hot tumor. In this study, female patients presented higher immune score compared with male patients with LUAD. As showed in Fig. 7, female patients with LUAD had a higher proportion of memory B cells compared with male patients, while the percentage of naïve CD4 + T cells and resting NK cells were lower in female patients, and most immune cells, including naïve B cells, plasma cells, CD8 + T cells, resting/activated memory CD4 + T cells, M0/M1/M2 macrophages, resting/activated dendritic cells, follicular helper T cells, regulatory T cells, gamma delta T cells, activated NK cells, monocytes, resting/activated mast cells, eosinophils, and neutrophils were not signi cantly different between female and male patients with LUAD. Hence, although female and male LUAD patients have differences in immune cell in ltration of tumor microenvironment, more study has to be conducted to verify tumor sensitivity to immunotherapy.
In order to further study the difference of immune-related genes between female and male LUAD patients, we divided female (male) LUAD patients into high-and low-score groups. A totall of 304 DEGs were identi ed in female patients and 368 DEGs were identi ed in male patients (Fig. 1). GO and KEGG analysis for these DEGs indicated that although there were differences in the signaling pathways enriched in female and male patients with LUAD, they were all closely related to tumor immunity (Fig. 3). We further used PPI and Cox methods to screen out the most critical prognostic immune-related genes in female and male LUAD patients. The results showed that CCR2, LCP2, and PTPRC were screened as key prognostic factors of female patients, and BTK and CCR2 were screened as key prognostic factors of male patients with LUAD (Fig. 4).
Next, we compared the in ltration of immune cells in female and male patients with LUAD. Our results indicated 8 TIICs had a higher proportion in cancerous tissues compared with paracancerous tissues in female patients (Fig. 7A), and 5 TIICs accounted for a higher proportion in cancerous tissues than that in paracancerous tissues in male patients (Fig. 7B). Recent studies demonstrate that patients with B cells rich in TME have a better prognosis and immunotherapy response [44][45][46]. Memory B cells are the basis for humans to have long-lasting immunity. They respond to reencountered antigens by forming germinal centers (GC) and rapidly producing antibodies [47]. It is reported that high density of tumor-in ltrating memory B cells is closely related to superior survival [48][49][50]. In our study, the density of memory B cells in TME of female patients with LUAD was signi cantly higher than that in TME of male patients (Fig. 7C), which may explain that female patients with LUAD have a better prognosis than male patients.

Conclusion
We presented a detailed analysis of the tumor microenvironment immune cell in ltration in female and male patients with LUAD. We found differences in the in ltration of immune cells, the expression of prognostic immune-related genes and related signaling pathways in the tumor microenvironment of female and male patients with LUAD. More importantly, female patients with LUAD had a higher proportion of memory B cells compared with male patients, which provides a reliable theoretical basis for explaining the better prognosis of female LUAD patients than that of male patients. Although this study can provide us with insight into the tumor microenvironment of patients with lung adenocarcinoma of different genders, some potential limitations are still to be addressed. For example, the reason that the proportion of memory B cells in the tumor microenvironment of female LUAD patients was signi cantly higher than that of male LUAD patients was not known, and the study of prognostic immune-related genes lacked biochemical experiments to verify. Further in vivo and in vitro studies on immune cells and genes are urgently needed to clarify the difference between female and male lung adenocarcinoma and provide a theoretical basis for providing effective and personalized tumor immunotherapy.

Availability of data and materials
All the data were collected from TCGA and GEO database. The data sets used in this study are available on request from the corresponding author.
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.