The Airway Microbiota of Non-Small-Cell Lung Cancer Patients and Its Relationship to Tumor Stage and EGFR Mutation

DanHui Huang, Jing He, Xiaofang Su, YaNa Wen, ShuJia Zhang, LaiYu Liu, Haijin Zhao, CuiPin Ye, JianHua Wu, Shaoxi Cai (Southern Medical University Nanfang Hospital); Hangming Dong (dhm@smu.edu.cn), Southern Medical University, https://orcid.org/0000-0002-9573-0721

cancer remained in its infancy, and deeper knowledge of the interplay between lung microbiota and lung cancer with different clinical parameters remains to be explored. TNM stage remains the most important prognostic factor in predicting recurrence rates and survival time. The 5-year survival rate of lung cancer is significantly affected by tumor anatomic stage, ranging from 87%-97% in stage I to 10%-23% in stage IV [12]. Through the analysis of 165 cases of normal tissue adjacent to lung cancer, an early study found that α diversity was higher and genus Thermus was more abundant in late-stage (stage IIIB and IV) than in early-stage disease [13], suggesting that lung microbiota participates in the development of different stages of lung cancer. However, that study included only 7 subjects in stage IIIB and 7 subjects in stage IV. Therefore, more studies of the association between tumor anatomic stage and lung microbiota should be conducted to find more potential bacterial markers linked with the stepwise progression of lung cancer from early stage to late stage.
Lung cancer staging traditionally relies on the TNM staging system. Since the stage of lung cancer is associated with lung microbiota, the association between the N and M classifications and lung microbiota should be explored in detail. Previous studies suggested that specific genera might be involved in the metastasis of lung cancer [13,14]. In vivo mechanistic investigations found that certain species might contribute to the development of extrathoracic or intrathoracic metastasis by enhancing the adhesion of lung cancer cells or by regulating the lung immune system [15][16][17]. Therefore, it is plausible to hypothesize that the lung microbiota is relevant to the N and M classifications.
Epidermal growth factor receptor (EGFR) is a paramount therapeutic target for the treatment of lung cancer. Tyrosine kinase inhibitors (TKIs), which target the kinase domain of EGFR, are especially effective in NSCLC patients whose tumors harbor activating mutations in the tyrosine kinase domain of the EGFR gene. Bacteria that carry genotoxic markers can promote the accumulation of genetic lesions and initiate cancer development [18]. Current evidence suggests that some pathogens might play a role in driving EGFR gene mutation. A retrospective study found that lung adenocarcinoma patients who had tuberculosis lesions had a higher probability of having EGFR gene mutations [19]. Another early study demonstrated an association between human papillomavirus and EGFR gene mutation in lung cancer patients [20]. Conversely, EGFR mutation might also regulate the lung microbiome, since EGFR plays a role in maintaining the airway epithelial barrier via activation of Claudin 1, a tight junction protein [21].
However, the association between EGFR gene mutation and lung microbiota remains unknown. Given the evidence above, it is plausible that lung microbiota is connected with EGFR gene mutation among NSCLC patients.
In this study, we used next-generation sequencing to identify airway microbiota in spontaneous sputum of NSCLC patients, aiming to characterize airway microbiota in NSCLC patients with different tumor stages (including tumor stage and TNM classification) and EGFR gene mutation status.

Patients and samples
The study was approved by the Ethics Committee of Nanfang Hospital, Southern Medical University. Newly diagnosed NSCLC patients were prospectively enrolled in this study at Nanfang Hospital, Southern Medical University between April 2017 and September 2019. The inclusion criteria were as follows: pathologically diagnosed NSCLC; aged 30-80; no prior anti-tumor therapy such as surgery, radiotherapy, chemotherapy, targeted therapy or immunotherapy; no evidence of community-acquired pneumonia, acute exacerbation of chronic obstructive pulmonary disease, bronchiectasis with infection, acute bronchitis or asthma; no fever and no purulent or gray sputum; no history of other malignant diseases or multiple primary lung cancer. We administered a questionnaire and reviewed the electronic medical records to obtain demographic and clinical data including age, sex, smoking status, antibiotic usage, TNM stage, systemic or pulmonary comorbidities and tumor EGFR mutation. Tumor anatomic stage and TNM classification were based on the NCCN Clinical Practice Guidelines for NSCLC (Version 2020. V1). EGFR mutation was detected using ARMS technology in the pathology department of Nanfang Hospital.
Participants were asked to rinse their mouths before sampling. The first mouthful of phlegm in the morning was collected within 24 hours of hospitalization, stored at -20℃ within 2 hours and then transferred to -80℃ within 1 week.

DNA extraction, 16S rRNA amplification, 16S rRNA sequencing
Sputum samples kept on dry ice were transferred to Sagene Biotechnology Company, GuangZhou. DNA was extracted from samples using the Hipure Bacterial DNA kit (Mageon, China) using standard techniques.
The V3-V4 region of the 16S rRNA gene was amplified using specific primers (16S_341F: 5'-CCTAYGGGRBGCASCAG-3'; 16S_806R: 5'-GGACTACNNGGGTATCTAAT-3'). PrimeSTAR HS DNA Polymerase was used for the PCR reaction. The concentration and length of the PCR products were checked by 1% agarose gel electrophoresis. Samples with a bright main band were used for further experiments. Sequencing libraries were constructed using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina® sequencing (New England Biolabs, United States). The quality of the library was evaluated with a Qubit® 2.0 Fluorometer (Thermo Scientific) and an Agilent Bioanalyzer 2100 system. Sequencing was conducted to generate 250-bp paired-end reads using an Illumina HiSeq 2500 sequencer according to the manufacturer's instructions.
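The two primers above contain IUPAC ambiguity codes (Y, R, B, S, N), so each primer stands for a small pool of concrete oligonucleotides. The following is a minimal sketch, not part of the study's pipeline, that expands a degenerate primer into a regular expression and counts how many concrete sequences it represents. The function names `primer_to_regex` and `degeneracy` and the test read are hypothetical and introduced only for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Compile a degenerate primer into a regex matching every concrete variant."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))

def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

# Primer sequences as given in the text (5' to 3').
FWD_341F = "CCTAYGGGRBGCASCAG"
REV_806R = "GGACTACNNGGGTATCTAAT"

if __name__ == "__main__":
    print(degeneracy(FWD_341F), degeneracy(REV_806R))
    # Hypothetical read start used only to exercise the matcher.
    print(bool(primer_to_regex(FWD_341F).fullmatch("CCTACGGGAGGCAGCAG")))
```

A matcher of this kind is typically used when trimming primer sequences from reads; the sketch only checks exact matches and does not tolerate sequencing errors.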

Microbiota analysis
Raw data were filtered using QIIME2 [22] to eliminate reads with adapter contamination and low quality, yielding clean reads. Clean sequences were clustered at 97% identity into operational taxonomic units (OTUs) using UPARSE [23]. The representative sequence of each OTU was taxonomically annotated against the Greengenes database [24].
We uploaded the OTU data to an online microbiome data analysis platform (MicrobiomeAnalyst) (https://www.microbiomeanalyst.ca/) to compare microbial community structure at the α-diversity and β-diversity levels. For α diversity, we evaluated the Chao1 value, Simpson index and Shannon index. β diversity was estimated using Bray-Curtis distance and visualized by principal coordinate analysis (PCoA). Differential taxa were identified by LEfSe (linear discriminant analysis (LDA) effect size) analysis on an online platform (GALAXY) (http://huttenhower.sph.harvard.edu/galaxy). PICRUSt2 was used to predict the functional profile of microbial communities based on the 16S rRNA sequences [25]. Metabolic function predictions were based on the MetaCyc database [26]. Differentially present pathways between groups were analyzed with Welch's t test using STAMP [27]. Network analysis at the genus level was carried out with SparCC [28]. Correlations with P value ≤ 0.05 and SparCC correlation scores ≥ 0.5 or ≤ -0.5 were included in network inference.
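The diversity measures named above can be illustrated on a vector of OTU counts. The sketch below uses common textbook definitions and is not the MicrobiomeAnalyst implementation: it assumes the natural logarithm for Shannon, the 1 - Σp² form for Simpson, and the bias-corrected form of Chao1. The platform's exact conventions are not stated in the text, and all counts in the usage example are invented.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson diversity in the 1 - sum(p_i^2) form (an assumed convention)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same OTU order."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

if __name__ == "__main__":
    # Invented OTU counts for two samples, for illustration only.
    sample_a = [10, 10, 10, 10]
    sample_b = [5, 1, 1, 2, 0]
    print(shannon(sample_a), simpson(sample_a), chao1(sample_b))
    print(bray_curtis([10, 0], [0, 10]))
```

Bray-Curtis ranges from 0 (identical composition) to 1 (no shared OTUs), which is why it is a natural input for PCoA and PERMANOVA.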

Statistical analysis
SPSS software (V 23.0) was used for statistical analysis. Continuous variables were compared between two groups by the Mann-Whitney U test or the independent t test. Categorical variables were compared by the chi-square test, the continuity-adjusted chi-square test, or Fisher's exact test. A P value < 0.05 was considered statistically significant.
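As an illustration of one of the tests listed above, the following sketch computes a two-sided Fisher's exact test for a 2×2 table directly from the hypergeometric distribution. It is not the SPSS procedure used in the study. It assumes the common two-sided definition (summing the probabilities of all tables no more likely than the observed one) and requires Python 3.8+ for `math.comb`. The function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table at least as extreme as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

if __name__ == "__main__":
    # Invented counts, e.g. rows = two patient groups, columns = yes / no.
    print(fisher_exact_two_sided(3, 1, 1, 3))
    print(fisher_exact_two_sided(4, 0, 0, 4))
```

The exact test is usually preferred over the chi-square approximation when expected cell counts are small, which is the situation for several subgroup comparisons of this size.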

Subjects' clinical characteristics and sputum microbiota in NSCLC
Spontaneous sputum samples were initially collected from 116 NSCLC patients. After careful assessment, 85 patients who met the eligibility criteria were finally included in further analysis. The procedure of patient recruitment and exclusion is shown in Figure 1. The average number of trimmed sequence reads across the 85 subjects was 33271 (7869, 44193). An OTU rarefaction curve was constructed to evaluate sequencing depth (Supplementary figure S1). The result indicated that the sequencing depth of the sputum samples was sufficient to reach a reliable estimate of microbiome structure. The median age of all patients was 59.21±8.75 years. The clinical characteristics of the 85 patients are listed in supplementary table S1. Among the 85 patients, 66 (78%) had adenocarcinoma, 18 (21%) had squamous cell carcinoma, and 1 had an unidentified type of NSCLC. 13 (15%) patients were in tumor stage I, 9 (11%) were in stage II, 11 (13%) were in stage III and 40 (47%) were in stage IV.

The Association between sputum microbiota and NSCLC clinical stage
Stage III and stage IV lung cancer patients are on a continuum with respect to tumor burden. It is well accepted that a great number of lung cancer patients with anatomical stage III also harbor micrometastases. A previous study found that the α diversity of microbiota in non-malignant tissues adjacent to stage IIIB lung tumors was similar to that of stage IV [13], suggesting that the airway microbiota may be similar between stage III and stage IV lung cancer patients. To evaluate the similarity of microbiota between stage III and stage IV patients, we compared the sputum microbiota between these 2 groups via diversity analysis and differential analysis. Among the 11 NSCLC patients in stage III, 4 patients were in stage IIIA, 5 patients were in stage IIIB, and 2 patients were in stage IIIC. Baseline information, including age (independent samples t test, P=0.21), BMI (Mann-Whitney U test, P=0.536), smoking status (continuity-adjusted chi-square test, P=0.249), antibiotic treatment before sampling (continuity-adjusted chi-square test, P=0.121), and pathological type (continuity-adjusted chi-square test, P=0.191), was comparable between the groups. Chao1, Simpson index, and Shannon index were selected to estimate the α diversity of the lung microbiome community. α diversity was similar between stage III and stage IV patients (Mann-Whitney U test, P=0.519 for Shannon; P=0.783 for Chao1; P=0.261 for Simpson index) (Supplementary figure S3 A-C). Bray-Curtis distance was used to estimate the β diversity of the lung taxonomic community structure in different groups. The result showed no significant difference in taxonomic structure between stage III and stage IV patients (PERMANOVA test, P=0.905) (Supplementary figure S3 D). LEfSe analysis was conducted to identify whether differential taxa existed between stage III and stage IV patients.
Only genus Paludibacter was found to be significantly different between the 2 groups (Supplementary figure S3 E). The relative abundance of Paludibacter was only 0.01% in stage III and 0.05% in stage IV patients. Taken together, the results above suggested that the sputum microbiome of stage III and stage IV patients was similar.
Since the sputum microbiome of stage III and stage IV was similar, we divided the lung cancer patients into 2 groups, stage I and stage II (Early stage, ES) and stage III and stage IV (Advanced stage, AS), and evaluated the microbiota difference between these 2 groups. Baseline information, including demographic and clinical characteristics, was comparable between the AS and ES groups (supplementary table S2). The relative abundance at the phylum level and genus level of the ES and AS groups were shown in Figure  For the β diversity, Bray-Curtis distance based on genus level was calculated. The result showed a significantly different taxonomic structure between patients in the ES group and the AS group (genus level, PERMANOVA test, P=0.045) (Figure 3D).
Differential analysis using LEfSe identified that phylum Firmicutes and genera Peptoniphilus, Granulicatella, Hylemonella, Actinobacillus, SMB53 and Gemella were significantly enriched in the ES group, while phylum Actinobacteria and genus Actinomyces were significantly enriched in the AS group (Figure 4A). The relative abundances of phyla Firmicutes and Actinobacteria and genera Granulicatella, Actinomyces and Actinobacillus were ≥ 0.1% and are shown in Figure 4 B, C.
Functional analysis based on the MetaCyc database identified 29 differentially abundant pathways (Figure 4 D). The 3 largest pathways with a higher proportion in ES patients were anhydromuropeptides recycling, gondoate biosynthesis (anaerobic) and L-lysine biosynthesis II. Among the differentially abundant pathways with higher relative abundance in the AS group, incomplete reductive tricarboxylic acid (TCA) cycle, NAD salvage pathway I and phosphopantothenate biosynthesis I were the top 3. Genus Actinomyces was positively correlated with the NAD salvage pathway (Spearman rank correlation, P value < 0.0001, r=0.547).
Co-abundance analysis based on SparCC was conducted. The sputum microbiota structure of ES lung cancer patients was more complex and better organized than the taxonomic structure inferred for patients in the AS group (Figure 5 A, B). The network of the ES group was composed of 33 genera, while the network inferred for the AS group was composed of 19 genera. The number of inter-genus correlations was 78 in the ES group but only 44 in the AS group. Co-occurrence between genus Streptococcus and other genera (Porphyromonas, Prevotella, Capnocytophaga, Veillonella, Atopobium, Actinomyces, Rothia, Granulicatella) was found exclusively in the AS group. Co-occurrence between Actinomyces and genera Rothia and Atopobium was present in both groups, while co-occurrence between Actinomyces and genera Granulicatella, Veillonella, Prevotella and Streptococcus was exclusive to the AS group.

The role of sputum microbiota on NSCLC intrathoracic metastasis and lymph node metastasis
Since tumor stage is associated with organ metastasis and lymph node metastasis, a further analysis was conducted to explore the linkage between sputum microbiota and these clinical parameters.
Previous mouse studies suggested that the homeostasis of commensal lung microbiota may affect intrathoracic metastasis [17] and extrathoracic metastasis [15], and that these 2 phenomena may depend on different mechanisms. Thus, we hypothesized that the airway microbiota associated with intrathoracic metastasis (ipsilateral or contralateral lung metastasis or pleural metastasis) differs from that associated with extrathoracic metastasis. Among the 85 NSCLC patients, 15 had intrathoracic metastasis without extrathoracic metastasis (Intra group), only 3 patients had extrathoracic metastasis without intrathoracic metastasis, and 28 had neither intrathoracic nor extrathoracic metastasis (Non_M group). We further explored the characteristics of sputum microbiota among Intra and Non_M patients. Baseline information was comparable between the Intra and Non_M groups (Supplementary table S3 Figure S5 D). The 3 largest pathways with a higher proportion in Intra patients were incomplete reductive TCA cycle, tetrapyrrole biosynthesis II (from glycine), and tetrapyrrole biosynthesis I (from glutamate). Among the differentially abundant pathways with higher relative abundance in the Non_M group, purine ribonucleosides degradation, lactose and galactose degradation I, and L-lysine biosynthesis II were the top 3. Genus Peptostreptococcus was positively correlated with the incomplete reductive TCA cycle (Spearman rank correlation, P value=0.017, r=0.5779).
Next, we explored the association between sputum microbiota and lymph node metastasis. Compared with LNM_N, genera Parvimonas and Pseudomonas were significantly increased in LNM_Y, while phylum Proteobacteria and genera Neisseria, Actinobacillus and Eikenella were significantly decreased in LNM_Y (Supplementary Figure S7 A). The relative abundance of all the above-mentioned differential taxa except genus Eikenella was ≥0.1%. The relative abundance of each differential genus and phylum is shown in Supplementary Figure S7 B, C. Functional profile prediction based on the MetaCyc database identified 23 differential metabolic pathways (Supplementary Figure S7 D). L-valine biosynthesis, L-isoleucine biosynthesis I (from threonine), and L-isoleucine biosynthesis II were the top 3 differential pathways that were more abundant in the LNM_Y group. Anhydromuropeptides recycling, 8-amino-7-oxononanoate biosynthesis I, biotin biosynthesis I and ppGpp biosynthesis were the top differential pathways enriched in the LNM_N group. Genus Pseudomonas was associated with L-valine biosynthesis (Spearman rank correlation, P value=0.012, r=0.468).

The Association between sputum microbiota and NSCLC EGFR gene mutation
Among the 65 lung adenocarcinoma patients, 44 patients with EGFR mutation test results were available for subgroup analysis. Of these, 21 were EGFR mutation-positive (EGFR+) and 23 were EGFR mutation-negative (EGFR-). Patients with EGFR mutation were more likely to be never-smokers (Fisher exact test, P=0.036) and female (Fisher exact test, P=0.031). Other baseline information, including age, BMI, tumor stage, and antibiotic usage, was comparable between the 2 groups (Supplementary table S5). α diversity was similar between EGFR+ and EGFR- (P=0.1054 for Chao1; P=0.1532 for Simpson index; P=0.0820 for Shannon) (Supplementary Figure S8 A-C). Bray-Curtis distance was used to estimate β diversity of the bacterial community composition in different groups. The result showed no association between EGFR mutation and airway taxonomic structure (PERMANOVA test, P=0.212) (Supplementary Figure S8 D).
LEfSe analysis identified that EGFR mutation was associated with significantly enriched levels of phyla Bacteroidetes and Tenericutes and genera Sharpea, Prevotella, Porphyromonas, Parvimonas, Desulfovibrio, Mycoplasma, Actinobacillus, Dialister, and Eikenella (Figure 6 A). A subgroup analysis limited to non-smoker subjects was conducted. The result similarly showed that phylum Bacteroidetes and genera Parvimonas and Actinobacillus were associated with EGFR mutation (Figure 6 B). The relative abundances of genera Parvimonas and Actinobacillus and phylum Bacteroidetes were ≥0.1% and are shown in Figure 6 C, D. PICRUSt2 prediction based on MetaCyc identified that superpathway of L-aspartate and L-asparagine biosynthesis, preQ0 biosynthesis and queuosine biosynthesis were the 3 most significantly abundant pathways in the EGFR mutation non-smoking group, and that L-isoleucine biosynthesis II, L-isoleucine biosynthesis III and superpathway of branched amino acid biosynthesis were the top 3 pathways significantly enriched in the EGFR negative non-smoking group (Figure 6 E).

Discussion
Growing evidence suggests that the development of cancer is affected by human commensal microbiota through inflammation, immunity, and metabolic pathways [29]. Recently, various studies identified alterations of airway microbiota among NSCLC patients [5,7,9,11,30-34]. The interplay between microbiota and lung cancer is complex. However, only a few studies have focused on the association between airway microbiota and tumor clinical parameters, including tumor anatomic stage, metastasis, and gene mutation. In this study, we characterized the sputum microbiota of NSCLC patients in early stage (stage I and stage II) and advanced stage (stage III and stage IV). Further, we explored the association between sputum microbiota and tumor N stage and intrathoracic metastasis. In addition, we investigated the linkage between EGFR mutation in lung adenocarcinoma and sputum microbiota.
Using 16S rRNA sequencing to profile the sputum microbiota in NSCLC patients, we found that the most abundant phylum and genus in NSCLC sputum samples were Firmicutes (40%) and Streptococcus (21%), consistent with 2 previous studies that analyzed sputum microbiota in lung cancer patients [11,33]. TNM stage is the most important factor in predicting NSCLC survival time [35]. The stepwise development of NSCLC from early stage to late stage is the result of various genetic and epigenetic alterations [36,37], which may be associated with alteration of airway microbiota. The lung cancer staging system is categorical; however, stage III and stage IV lie on a continuum with respect to tumor burden [38]. A great proportion of stage III patients have occult metastasis. The difference between stage III and stage IV lung cancer patients lies in the tumor burden at distant sites rather than at local regional sites [38]. Among the 11 stage III NSCLC patients enrolled in this study, 7 (63%) patients were in stage IIIB or IIIC. We found that α diversity and β diversity were not significantly different between stage III and stage IV patients, suggesting that the sputum microbiota might not sensitively reflect the tumor burden at distant sites. Similarly, Yu et al collected tissues adjacent to tumors from lung cancer patients and found that α diversity was similar among NSCLC patients in stage IIIB and stage IV [13]. However, considering that stage III NSCLC is a heterogeneous disease, the difference in sputum or lung tissue microbiota between stage III and stage IV lung cancer should be examined in a larger-scale study in the future.
Compared with ES patients, we found a significant reduction of α diversity in AS patients. A significant decrease of α diversity in lung cancer patients compared with healthy or non-malignant controls was evident in several studies, among which 2 studies used sputum samples [39,40], 1 study used protected brush samples [41] and 1 study used surgical lung tissues [7]. Taken together, these results suggest that the reduction of α diversity might be a potential marker indicating the development and progression of lung cancer. β diversity was significantly different between ES and AS lung cancer patients in our study, indicating that the taxonomic community structure differed during the progression of lung cancer. The results of the genus network analysis also supported this difference in community structure. The SparCC results indicated that the sputum microbiota structure of ES lung cancer patients was more complex and better organized than the taxonomic structure inferred for AS patients.
We reported differentially abundant taxa between NSCLC patients in the AS and ES groups. More precisely, phylum Firmicutes and genera Granulicatella and Actinobacillus were significantly enriched in the ES group, while phylum Actinobacteria and genus Actinomyces were significantly enriched in the AS group. Granulicatella has been previously identified as a member of the normal bacterial flora of the respiratory tract [42] and was implicated in clinical infections such as sinusitis [43]. A study that enrolled female lung cancer patients in China found significantly enriched genus Granulicatella in sputum samples of lung cancer patients compared with healthy controls [11]. Another pilot study using metagenomic sequencing technology identified Granulicatella adiacens, a species belonging to genus Granulicatella, in the sputum of lung cancer patients compared with benign diseases [4]. Taken together, our result and the previous studies mentioned above suggest that genus Granulicatella might play a role in the early development of NSCLC. Actinobacillus is a common member of the human oral commensal microbiota. Previous studies found that Actinobacillus might influence the production of inflammatory cytokines [16] and was associated with COPD [44]. COPD is a widely recognized risk factor for lung cancer. Chronic inflammation is a key feature of COPD and could be a potential driver of lung cancer development [45]. Thus, genus Actinobacillus might serve as a link between COPD and lung cancer. It is plausible that colonization by Actinobacillus leads to chronic inflammation of the lung and enhances the initiation and early development of lung cancer. An early study identified genus Actinomyces as a common anaerobe colonizing the airway of lung cancer patients [46]. It is interesting to note that in our study the co-occurrence of Actinomyces and genus Veillonella existed exclusively in the AS group.
Thus, in AS lung cancer patients, an increase of genus Actinomyces could be accompanied by an increased abundance of genus Veillonella. A previous study found that the lower airway of lung cancer patients was enriched for genus Veillonella, which was further found to be associated with upregulation of the ERK and PI3K signaling pathways [47]. Activation of the PI3K and ERK pathways is recognized to be involved in lung cancer metastasis [48].
In addition, we found that genus Actinomyces was positively related to the NAD salvage pathway, which was significantly enriched in AS patients. Cancer cells have enhanced glycolysis for sustaining rapid proliferation. Increased NAD levels enhance glycolysis and fuel cancer cells, and are associated with cancer cell survival and enhanced invasion capacity [49,50]. In fact, the rate-limiting enzyme nicotinamide phosphoribosyltransferase is frequently amplified in several cancer cells [51]. Thus, in addition to its possible indirect influence on cancer-related signaling pathways, genus Actinomyces might enhance lung cancer progression partly via enhanced NAD production.
Lung microbiota was reported to influence the proliferation or metastasis of intrathoracic cancer via regulation of the immune system [17,52]. In this study, we reported that intrathoracic metastasis was associated with enriched sputum genus Peptostreptococcus and decreased Streptococcus. Peptostreptococcus was associated with colon cancer progression [53,54]. However, its relationship with lung cancer remains largely unknown. We noticed that members of genus Peptostreptococcus are obligate anaerobes. It has been suggested that tumor microenvironment conditions such as hypoxia may enhance tumor invasion and metastasis [55]. Therefore, it is plausible to speculate that the anoxic lung tumor condition, which can facilitate intrathoracic metastasis, may favor the growth of some obligate anaerobes, such as genus Peptostreptococcus. It is of interest that the incomplete reductive TCA cycle of sputum microbiota was significantly enriched in the Intra group and was positively related to genus Peptostreptococcus. The reductive TCA cycle, which exists in anaerobes including some deeply rooted bacteria, is one alternative strategy for fixing CO2 [56]. During this reaction, oxaloacetate is finally produced [57] and may participate in the TCA cycle in cancer cells. Current evidence demonstrates that certain cancer cells, including lung cancer with specific genome subtypes [58,59], rely heavily on the TCA cycle for energy production [60]. A recent study reported that an enhanced TCA cycle might promote lung metastasis of certain cancers [61].
In the absence of distant metastasis, the spread of lung cancer to a regional lymph node affects clinical treatment options and prognosis. In this study, we found that α diversity and β diversity were similar between LNM_Y and LNM_N, which indicated that the sputum taxonomic structure did not vary with lymph node metastasis. LEfSe analysis revealed that genera Parvimonas and Pseudomonas were positively associated with lymph node metastasis, while genera Neisseria and Actinobacillus were negatively associated with lymph node metastasis. Genus Pseudomonas showed a correlation with adenocarcinoma [62]. A clinical study identified that genus Pseudomonas was positively associated with matrix metalloproteinase in lung transplant patients [63], which was associated with metastasis and invasiveness of cancer cells [64]. Genus Neisseria was found to be negatively associated with lymph node metastasis. A previous study discovered that, compared with healthy controls, the relative abundance of salivary Neisseria was significantly decreased among lung cancer patients, which suggested that it might play a protective role in lung cancer progression [65]. Metabolic function prediction identified that L-valine biosynthesis and L-isoleucine biosynthesis were increased in the sputum microbiota of LNM_Y patients. Valine and isoleucine are branched-chain amino acids, which play a critical role in the regulation of energy homeostasis, nutrition metabolism, immunity and disease in humans [66].
They can act as signaling molecules regulating the metabolism of glucose and lipid and protein synthesis, and serve as potential biomarkers in cancer [66]. Since genus Pseudomonas was positively associated with L-valine biosynthesis, it is plausible that Pseudomonas might supply valine to lung cancer cells and enhance their proliferation and invasiveness.
EGFR mutation is a strong prognostic factor among lung adenocarcinoma patients. The present data showed that certain sputum bacteria had a close link with EGFR mutation in lung adenocarcinoma. In both the overall analysis and the subgroup analysis limited to non-smoker subjects, the relative abundance of phylum Bacteroidetes and genera Parvimonas and Actinobacillus was positively associated with EGFR mutation. Increased EGFR signaling was identified as relevant to airway mucin production and epithelial cell repair [67], and thus may influence the abundance of phylum Bacteroidetes, genus Parvimonas and genus Actinobacillus. On the other hand, other evidence suggests that some specific bacteria, such as genus Parvimonas, may cause EGFR mutation. Several lines of evidence suggest that Parvimonas micra, a member of genus Parvimonas, is enriched in patients with colon cancer [68,69]. Interestingly, an in vitro study demonstrated that infection with Parvimonas micra could enhance the ability of human inflammatory cells to generate reactive oxygen species and caused DNA damage in human cells [70], which could cause oncogene mutation and carcinogenesis.
Our study provided novel insight into the association between sputum microbiota, its predicted metabolic function and lung cancer stage, intrathoracic metastasis, lymph node metastasis and EGFR mutation.
However, there are some limitations in our study. Firstly, the number of patients enrolled in this study is not large enough, so there may be heterogeneity. Secondly, sputum cannot serve as a surrogate for lung cancer tissue, and caution is needed when inferring intratumor microbiota from our results. Thirdly, the specific bacterial genera found to distinguish lung cancer patients by important clinical parameters were not tested in validation cohorts, which may lead to false positives and reduced reliability. Fourthly, the study is cross-sectional and only describes the phenomenon from a microbiological perspective. The mechanisms involved and the causal relationships need further exploration.

Conclusions
Collectively, the present data showed associations between important clinical parameters of lung cancer and airway microbiota. The taxonomic structure differed between patients in early stage and advanced stage. Tumor stage, intrathoracic metastasis, lymph node metastasis, and EGFR mutation were associated with alteration of specific airway genera and predicted metabolic function of sputum microbiota. Our study suggests that airway microbiota might participate in various pathophysiological processes that are importantly related to lung cancer development. Further large-scale, multi-omics studies are needed to better understand the role of microbiota in the development and progression of lung cancer, which could pave the way for new therapeutic options and biomarkers.

Figure 6 (partial caption). Differentially abundant genera Actinobacillus and Parvimonas between EGFR- non-smoker lung adenocarcinoma and EGFR+ non-smoker lung adenocarcinoma; (E) Differential predicted metabolic function based on the MetaCyc database between EGFR- non-smoker lung adenocarcinoma and EGFR+ non-smoker lung adenocarcinoma. *P<0.05, P was calculated using the Mann-Whitney test.
