A Novel Gene Expression Signature-Based on B-Cell Proportion to Predict Prognosis of Patients with Lung Adenocarcinoma

Background: This study aimed to develop a reliable immune signature based on B-cell proportion to predict the prognosis and benet of immunotherapy in LUAD. Methods: The proportion of immune cells in the TCGA-LUAD dataset was estimated using MCP-counter. The Least Absolute Shrinkage and Selector Operation was used to identify a prognostic signature and validated in an independent cohort. We used quantitative reverse transcription-polymerase chain reaction (qRT-PCR) data and formalin-xed paran-embedded (FFPE) specimens immunohistochemistry to illustrate the correlation between prognostic signature and leukocyte migration. Results: We found that the relative abundance of B lineage positively correlated with overall survival. Then, we identied a 13-gene risk-score prognostic signature based on B lineage abundance in the testing cohort and validated it in a cohort from the GEO dataset. This model remained strongly predictive of prognoses across clinical subgroups. Further analysis revealed that patients with a low-risk score were characterized by B-cell activation and leukocyte migration, which was also conrmed in FFPE specimens by qRT-PCR and immunohistochemistry. Finally, this immune signature was an independent prognostic factor in the composite nomogram of clinical characteristics. Conclusions: In conclusion, the 13-gene immune signature based on B-cell proportion may serve as a powerful prognostic tool in LUAD.


Introduction
Lung cancer is one of the malignant tumors with the highest morbidity and mortality worldwide; the overall 5-year survival rate is about 20% (1). Nonsmall cell lung cancers (NSCLCs) represent 85% of lung tumors. They encompass multiple cancer types, such as lung adenocarcinomas (LUADs), lung squamous cell carcinomas (LUSCs), and large-cell cancers. Among them, LUADs and LUSCs are the largest NSCLC subgroups (2). Therefore, effective treatments for LUAD have always been the focus of research. Over the past 10 years, the understanding of the immune system and its role in the development and progression of cancer has continued to deepen, leading to remarkable progress in the eld of cancer immunotherapy (3). Immunotherapy has been widely used in the rst-line and second-line treatments of NSCLC(4-6), which has inspired people's enthusiasm for elucidating the prognostic and pathophysiological effects of the tumor microenvironment (TME). The TME, including cancer-associated broblasts (7,8), extracellular matrix (9), epithelial cells (10), myeloid cells (11), and tumor-in ltrating lymphocytes (12), affects the malignant progression and immune response of lung cancer. T cells and B cells are important components of tumor-in ltrating immune cells. The research on the functions and mechanisms of T cells is relatively comprehensive; however, the research on B cells is still insu cient.
Tumor-in ltrating B cells have emerged as key players in the TME. Chen and colleagues performed a single-cell RNA-seq analysis of cells isolated from patients with NSCLC and identi ed two major subtypes of B cells, namely the naïve-like and plasma-like B cells (13). They found that the naïve-like B cells suppressed growth, while the plasma-like B cells promoted cell growth in the advanced stage of NSCLC, but inhibited cancer cell growth in the early stage of NSCLC. Wang and colleagues conducted a comprehensive genomic landscape of 149 NSCLC cases and revealed that highly clustered EGFR mutations were associated with in ammatory tumor-in ltrating B lymphocytes, which was also con rmed in the TCGA dataset (14). Tumor-in ltrating B cells also served as local antigen-presenting cells by providing secondary stimulation to Immune in ltrating cells (TILs). Bruno and colleagues demonstrated that tumor-in ltrating B cells e ciently presented antigens to CD4 + TILs and identi ed three CD4 + TIL responses to tumor-in ltrating B cells, which were categorized as activated, antigenassociated, and nonresponsive (15). Hence, a new role was suggested for tumor-in ltrating B cells in their interplay with CD4 + TIL in the TME. Whether tumor-in ltrating B cells have protumor or antitumor effect is still controversial.
Considering the important roles of B cells in the TME, which constitutes a potential novel therapeutic in NSCLC immunotherapy, a novel gene expression signature needed to be urgently developed based on Bcell proportion in patients with LUAD. Therefore, a prognosis signature based on B-cell proportion was established, which was a robust prognostic biomarker and predictive factor that could be used in the clinic.

Materials And Methods
2.1 RNA-sequencing data used to assess the abundance of immune-in ltrating cells The gene expression data (work ow type: HTSeq-Counts) and the corresponding clinical information from the Cancer Genome Atlas (TCGA) website (https://gdc.cancer.gov/) were downloaded using the "TCGAbiolinks" R package (Version 2.14.1). Entrez IDs were converted into gene symbols using the Bioconductor package "org.Hs.eg.db" (Version 3.10.0). Genes with low expression were removed from the pro le. The abundance of immune-in ltrating cells in each sample was assessed with the MCPcounter(16), which provided the abundance score for eight immune populations (T cells, CD8 + T cells, cytotoxic lymphocytes, natural killer cells, B lineages, monocytic lineage, myeloid dendritic cells, and neutrophils) and two stromal populations (endothelial cells and broblasts). The assessment of these cell subpopulations was based on the analysis of gene expression of cell markers. The MCP-counter signature composition of B lineages was as follows: BANK1, CD19, CD22, CD79A, CR2, FCRL2, IGKC, MS4A1, and PAX5. The transcripts of other cell subpopulations were published by the algorithm's author. All cell subpopulation abundances were normalized using the Z score.
2.2 Differential expression analysis and construction of the B-lineage-associated risk signature Differential expression analysis was performed using the "DESeq2" R package (Version 1.26.0) with the standard comparison mode between the two experimental conditions. LASSO algorithm, using the R package "glmnet" (Version 3.0), was built to construct a B-lineage-associated risk signature. The "survival" R package (Version 3.5) was used to select the optimal cutoff value and plot Kaplan-Meier survival curves. The "timeROC" R package (Version 0.4) was used to conduct a time-dependent receiver operating characteristic (ROC) curve analysis.

Microarray data
The transcript expression matrixes from GSE31908, GSE29013, and GSE30219 based on the GPL570 platform, including 131 patients with LUAD, were downloaded from the Gene Expression Omnibus (GEO) database. In these matrixes, the gene expression data for three matrixes were subjected to log2 transformation. The scale method of the "limma" R package (Version 3.42.2) was used to normalize the data.

Patients with LUAD from Cancer Hospital A liated to Nanjing Medical University
A total of 12 patients who underwent surgery without neoadjuvant chemotherapy and were diagnosed with LUAD at Cancer Hospital A liated to Nanjing Medical University (Nanjing, China) were included. The Nanjing cohort consisted of formalin-xed para n-embedded (FFPE) specimens collected from patients who underwent radical surgery between 2018 and 2020. Each patient underwent a standard radical surgical procedure, and all specimens were evaluated by expert pathologists according to eighth edition of the Union for International Cancer Control Tumor-Node-Metastasis (TNM) grading system. All patients underwent regional lymphadenectomy, and the existence of Tumor-Lymph Node-Metastasis (TNM) was pathologically examined. Total RNA was extracted from 4-µm-thick FFPE specimens by manual microdissection using an RNeasy FFPE Kit (Qiagen, Hilden, Germany

Function enrichment and gene interaction analyses
Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses were performed using "clusterPro ler" R package (Version 3.11) based on differentially expressed genes (absolute value of logFC > 1.5; P value < 0.01). GeneMANIA (https://genemania.org/) was used to nd other genes related to a set of input genes using a very large set of functional association data. Association data included protein and genetic interactions, pathways, co-expression, co-localization, and protein domain similarity.

Development and validation of the nomogram
Univariate and multivariate Cox analyses were performed to assess the independent prognostic ability of B-lineage-associated risk signature using "survival" R package (Version 3.5). Then, a concise nomogram of predicting the OS of LUAD was established using R package "rms" (Version 2.10), including four factors. In addition, the predictive accuracies of the nomogram and separate prognostic factors were compared using ROC analyses.

Statistical analysis
Statistical analyses were performed using R (Version 3.6.3) and GraphPad Prism 8. The Wilcoxon ranksum test and Student t test were used to determine differences in comparison of two groups. All statistical tests were two-tailed with a statistical signi cance level set at 0.05 in this study.

A landscape of immune in ltration of patients with LUAD
Immune cell in ltration and TME are vital in tumor immunity. In recent studies, MCP-counter was widely used to quantify the relative abundance of immune cell subpopulations through the expression of multiple immune-in ltrating cell markers, yielding robust results. A total of 400 patients from TCGA-LUAD were screened as TCGA cohort because of having relatively complete demographic information, clinical information, and survival information. MCP-counter was used to visualize the relative abundance of multiple immune-in ltrating cell subpopulations with Z score (Fig. 1A). The signi cant difference in the relative abundance of immune cell subpopulations between different samples was clearly observed. Among these immune cell subpopulations, patients were divided into low expression of B lineage (100 patients) and high expression of B lineage (300 patients). The relative abundance of B lineage signi cantly positively correlated with OS (Fig. 1B). However, MCP-counter was used to quantify the relative abundance of B lineage based on 400 samples in the TCGA cohort. It is di cult for individuals to quantify the abundance of B lineage. Therefore, a B-lineage-associated risk signature was constructed to assess the OS of patients with LUAD and guide individualized diagnosis and treatment strategies. The study design is shown in Fig. 1C.

Construction of a B-lineage-associated risk signature
RNA expression pro les of 400 samples in the TCGA cohort were used to construct a B-lineageassociated risk signature predicting the OS of patients with LUAD; the gene expression of total RNA is shown in Figure S1A. In the TCGA cohort, differently expressed genes were detected between low expression of B lineage (70 patients) and high expression of B lineage (330 patients), revealing that 1057 genes were statistically signi cantly differentially expressed, including 991 downregulated genes and 66 upregulated genes (Fig. S1B). Moreover, cluster analysis of these 1057 signi cantly differentially expressed genes in the TCGA cohort was performed (Table S1). A signi cant clustering was observed between the 2 groups of samples, con rming that these 1057 genes were statistically signi cantly differently expressed (Fig.S1C).
Furthermore, we ltered the differentially expressed genes with absolute value of logFC > 3 and P value < 0.01, including 95 downregulated genes and 1 upregulated genes, to identify genes with large differences between groups as B-lineage-associated risk signature candidate genes (Table S2). Further, 13 genes, including FDCSP, FCER2, CNR2, MS4A1, FCRL1, BLK, TNFRSF13B, CD19, FCRLA, CR2, GH1, KRT20, and ALB, with nonzero regression coe cients with 10-fold cross-validation were found to have maximum prognostic value according to LASSO Cox regression analysis ( Fig. S1D and S1E). Finally, a 13-gene Blineage-associated risk signature was constructed, and the risk score of this risk signature was calculated using the following formula: The coe cient of each gene is shown in Table S4. Based on the B-lineage-associated risk signature, patients were divided into two subgroups. The optimal cutoff value was determined using the "surv_cutpoint" function of the "survminer" R package; the optimal cutoff value was − 16.2 (Fig.S1F). The cutoff value in the TCGA cohort served as the cutoff value to assign patients into high-risk and low-risk groups across all patients with LUAD in the following analysis.

Evaluation and validation of the prognostic ability of Blineage-associated risk signature
The prognostic capability of the B-lineage-associated risk signature in the TCGA cohort was evaluated. Due to the limited patient records available, the duration of treatment is unknown. However, the diversity of treatment methods in patient samples re ects the challenging variables that clinicians encounter when treating LUAD. The Kaplan-Meier analysis demonstrated that the patients in the high-risk group correlated with worse outcome (Fig. 2A).  Fig. 2C).
The risk signature was assessed in an independent validation cohort, meta-GEO cohort, including GSE29013, GSE30219, and GSE31908, to validate the robust ability of the B-lineage-associated risk signature to predict OS. The cohort contained 131 patients with LUAD having survival information and RNA expression pro les. The Kaplan-Meier analysis yielded consistent results that patients in the high-risk group were positive related to worse clinical outcome (Fig. 2D). The risk signature distribution, OS, and 13-gene expression pro le are shown in Fig. 2E. that the B-lineage-associated risk signature was independently involved in predicting prognosis, without being manipulated by any clinical subgroup. The Kaplan-Meier survival analysis between high-risk and low-risk groups in these subgroups illustrated that risk signature was still a robust and independent marker for predicting OS in patients with different sexes (Fig. S2A and S2B), different ages ( Fig. S2C and  S2D), and different TNM stages ( Fig. S2E and S2F).

Difference in immune status and immunotherapy bene ts between high-risk and low-risk patients
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of differently expressed genes were performed to further explore the pathways of statistically differently expressed genes between patients with low expression of B lineage and patients with high expression of B lineage and also understand the potential biological processes affecting the prognosis of patients (absolute value of logFC > 1.5, P value < 0.01, 1057 genes, Fig. S3A and S3B). The GO analysis comprised enriched biological processes, cellular components, and molecular function. These genes were enriched in several immune-related biological processes, including B-cell activation, leukocyte migration, humoral immune response, and regulation of B-cell receptor signaling pathway. In molecular function, these genes were enriched in immunoglobulin binding. The KEGG analysis re ected that these genes were enriched in cytokine-cytokine receptor interaction. These results indicated that differently expressed genes between patients with low expression of B lineage and patients with high expression of B lineage were related to the tumor immune process, suggesting that tumors evade immune surveillance and cause different clinical outcomes due to differences in immune cell activation in the TME and differences in antigen presentation and signal transduction,. Moreover, GeneMANIA online tool was used to observe the interaction of 13 risk signature-related genes (Fig. S3C). The result re ected that these genes were coexpressed and had close physical interactions. In addition, these genes had close relationship with several chemokine-related genes and B-cell surface marker-related genes. Moreover, we got consist result in the correlationship of 13 risk signature-related genes expression, which revealed a extensive coregulation relationship (Fig. S3D). This nding suggested that these genes might have originated from the same underlying structure and might affect the B-cell activation and antigen presentation process, thereby affecting the prognosis of patients with LUAD.
Recent studies have shown that the difference between the TME and immune cell in ltration can signi cantly change the recognition of tumor antigens, presentation of signals, and killing of tumor cells by the patient immune system. The expression of immune cell markers was detected to examine the relative abundance of tumor suppressor immune cell subpopulations in the TME between high-risk and low-risk groups. The tSNE plot revealed that two groups of patients in the TCGA cohort were divided into two different principal component clusters based on mRNA expression (Fig. 3A). The expression levels of MS4A1 (encoding CD20, marker of B cells, Fig. 3B), IGHG1 (marker of plasma cells, Fig. 3C), and CD19 (encoding CD19, marker of B cells, Fig. 3D) were signi cantly high in the low-risk groups, indicating higher abundance of the tumor suppressor B lineages. The expression levels of CD8A (encoding CD8, marker of CD8 + T cells, Fig. 3E), CD4 (encoding CD4, marker of CD4 + T cells, Fig. 3F), and CD3D (encoding CD3, marker of T cells, Fig. 3G) were also signi cantly high in the low-risk groups, indicating higher abundance of the tumor suppressor T-cell subpopulation.

Meanwhile, immune checkpoints inhibitors have been shown to exert antitumor effects by reversing
tumor-immunosuppressive effects. The present study further investigated the relationship between the Blineage-associated risk signature and the expression of immune checkpoints, including CTLA4, TIGIT, PDCD1, TIM3, and LAG3 ( Fig. 3H-3L). These results showed that these immune checkpoints were highly expressed in low-risk patients. In addition, patients were divided into high-risk and low-risk groups by qRT-PCR in 16 patients with LUAD from Jiangsu Cancer Hospital (The primers shown in Table S4).
Immunohistochemical analysis was performed on the surface proteins of the immune cells and important immune checkpoints on para n-embedded pathological sections of these patients, including CD20, CD19, CD8, CD4, CD3, and PD1 (Fig. 4A). Consistent results were obtained at the protein and RNA levels (Fig. 4B). Moreover, We continue to explore whether immunotherapy can bene t in the "immune activation" low-risk group. Unsurprisingly, B-lineage-associated risk signature and immunotherapy bene t showed a correlation trend (Fig. 4C) in GSE135222. These results indicated that high-risk and low-risk groups distinguished by the B-lineage-associated risk signature had different immune activation and e ciency of immune checkpoints axis, leading to different bene t from immunotherapy.

Construction and validation of a nomogram based on the B-lineage-associated risk signature
The predictive ability of the B-lineage-associated risk signature for prognosis was evaluated through univariate Cox regression analysis (P value < 0.05). Then, multivariate Cox regression was used to evaluate the B-lineage-associated risk signature score and several other clinical data, including TNM stage, lymph node invasion and age, as independent prognostic factors (P value < 0.05, Fig. 5A). A nomogram that integrated the B-lineage-associated risk signature and other independent clinical factors (lymphatic invasion, age, and TNM staging) was constructed to provide clinicians with a quantitative approach to predict the prognosis of patients with LUAD (Fig. 5B). Time-dependent ROC curve analysis was used to compare the predictive accuracy between the nomogram, B-lineage-associated risk signature, age, and TNM stage in 1, 2, 5, and 7 years (Fig. 5C-5F). The nomogram model suggested higher prognostic accuracy for 1-, 2-, 5-, and 7-year OS with a larger AUC.

Discussion
LUAD is a malignant tumor with a high incidence worldwide. Early assessment of patient prognosis and effective immunotherapy biomarkers are very important. Traditional classi cation methods, including the TNM staging system, cannot cover the heterogeneity in molecular biology. Meanwhile, the research on the heterogeneity of the TME has become a hot issue in the eld of tumor malignant progression, patient prognosis, and tumor immunotherapy (17)(18)(19)(20)(21)(22). Evaluating the prognosis of patients with LUAD from the perspective of molecular biology and TME is very meaningful for individualized diagnosis and treatment.
In this study, a novel B-lineage-associated risk signature was constructed, which was signi cantly related to OS and tumor immune microenvironment. A large number of clinical trials have shown that the combination of immune checkpoint inhibitors and chemotherapy can signi cantly improve the progression-free survival of patients with advanced NSCLC compared with conventional chemotherapy alone (25,26).However, only part of patients can achieve a long-term, effective immune response from immunotherapy, and therefore a new immunotherapy strategy and research perspective is necessary (27).  (29). B cells and their related pathways work together to promote the aggregation, maturation, and maintenance of tertiary lymphatic structures (TLS) (30). TLS is the lymphocyte aggregate formed in the chronic in ammatory response and similar in structure to the secondary lymphoid organs (31). TLS is de ned as a CD20 + B-cell follicle surrounded by a CD3 + T-cell aggregate of DC-LAMP + mature dendritic cells (30,(32)(33)(34). In many solid tumors including NSCLC, TLS is associated with improved prognosis and immune response (34,35). A total of 13 B-cellassociated transcripts were screened using bioinformatics methods, revealing that they had a strong interaction and co-regulation relationship. Hence, it was possible that they were from the same structure in the sample.
In conclusion, the B-lineage-associated risk signature is a promising biomarker that divides patients into two subgroups with completely different clinical prognosis and immune status. It provides a view of the transcriptome level and TME to clarify the mechanism underlying different prognosis and e cacy of LUAD after immunotherapy.

Declarations
Availability of data and material The data sets supporting the conclusion of this study are available from the "TCGAbiolinks" R package (Version 2.14.1; https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html) and GEO database (https://www.ncbi.nlm.nih.gov/gds).

Ethics approval and consent to participate
This study was approved by the Regional Ethics Committee at Nanjing Medical University. The experiments were undertaken with the understanding and written consent of each patient.
Ethic approval and consent to participate The study was approved by the Regional Ethics Committee at Nanjing Medical University. The experiments were undertaken with the understanding and written consent of each patients. The study methodologies conformed to the standards set by the Declaration of Helsinki.

Con icts of interest
The authors declare that the research was conducted in the absence of any commercial or nancial relationships that could role as a potential con ict of interest.

Consent for publication
Written informed consent for publication was obtained from all participants.      year in TCGA cohort.