Microbiota Core in Central Lung Cancer with Streptococcal Enrichment as a Possible Diagnostic Marker

Background: Dysbiosis in the respiratory tract has scarcely been explored in cancer. We aimed to define the bacterial and fungal microbiota of the bronchi in cancer versus a healthy state, to define the microbiota core, and to correlate the bronchial microbiota with that of saliva and feces; and to detect markers for the early diagnosis of central lung cancer. For this purpose, twenty-five patients with central lung cancer and sixteen healthy controls without antimicrobial intake during the previous month were recruited. Bacterial and fungal distribution was determined by massive sequencing in bronchial biopsies. Complex computational analysis was performed to define, for the first time, the lung microbiota core. Results: A greater abundance of Streptococcus, Rothia, Gemella and Lactobacillus distinguished the saliva of patients. The affected and contralateral bronchi of these patients had almost identical microbiota dominated by Streptococcus, whereas Pseudomonas was the predominant genus in controls. Oral and pulmonary ecosystems were significantly more similar in patients, probably due to microaspirations. Streptococcal bronchial abundance differentiated patients from controls in a ROC curve analysis (90.9% sensitivity, 83.3% specificity, AUC = 0.897). The mycobiome of controls (Candida) was significantly different from that of patients (Malassezia), with the cancer-affected bronchi similar to the patients' saliva but different from their contralateral bronchi. Conclusions: The bronchial microbiota in central lung cancer is highly enriched with Streptococcus, and its composition differs significantly from that of healthy subjects. Alterations are not restricted to the tumor tissue and seem to be the consequence of microaspirations from the oral cavity. These findings could be useful in the screening and even the diagnosis of this pathology.


Background
Culture-independent techniques have revealed the composition of a stable microbiota within the distal airways [1]; and although normality criteria have not yet been established, atypical compositions linked to certain respiratory diseases such as cystic fibrosis, COPD or asthma have been detected [2].
Dysbiosis is typically detected in the tissues surrounding tumors [3], but this trait has been poorly explored in the respiratory airway tissues, and only in surgical samples [4][5][6][7][8]. An understanding of the microbiota is necessary to decipher its possible causal role in cancer, and also to elucidate the prognosis and response to immunomodulatory therapies, because microorganisms and/or their metabolites shape the local microenvironment, influence the immune response, and impact the final battle against cancer [9]. Finally, the presence and/or abundance of specific bacteria could be used as a marker, as occurs with Streptococcus gallolyticus subsp. gallolyticus in colorectal cancer.
The aims of the present study were to: i) define the bacterial and fungal microbiota of central lung cancer, in relation to the corresponding contralateral bronchus and compared with healthy controls, also defining the core of those microbiotas; ii) correlate the pulmonary microbiota in lung cancer with the salivary and fecal compartments; iii) detect possible markers for the early diagnosis of central lung cancer.

Methods

Patients and samples
Twenty-five patients (24 men, mean age 68 years) diagnosed with central lung cancer, directly visible and biopsied by bronchoscopy, were recruited (Table 1). The exclusion criteria were the intake of antibiotics, prebiotics or probiotics, or systemic corticosteroids during the previous 4 weeks; the presence of acute infection; radiotherapy or chemotherapy in the last year; and immunodeficiency. Tumors were histologically classified as non-small cell lung cancer (n = 18, including 10 squamous, 4 adenocarcinoma, and 4 undifferentiated) or small cell lung cancer (n = 7). Each patient contributed 4 samples: i) saliva collected from a rinse with sterile distilled water prior to the bronchoscopic procedure; ii) biopsies of the affected bronchus; iii) biopsies of the contralateral bronchus; and iv) a fecal sample (provided by 18 of the 25 patients). Bronchoscopies were performed, as usual, by the nasal route (or by the oral route if this was not possible) with local instillation of lidocaine. Biopsies for microbiota determination were obtained from the contralateral, non-tumor-affected tissue before the tumor was sampled.
Simultaneously, 16 healthy controls were included, and each contributed the following: i) an oral microbiota sample and ii) a single biopsy of their healthy bronchi. Controls had no respiratory symptoms (except 2 with chronic cough) and underwent bronchoscopy for non-cancer-related indications (benign tracheal stenosis 9, fake haemoptysis 3, chronic cough 2, control of a previous endobronchial hamartoma resection 1, and dyspnea 1); all of them had normal spirometry results. All samples were frozen at -80 ºC immediately after collection.

Table 1 Demographic and clinical data of patients with lung cancer and healthy controls, differentiating between the entire control population (n = 16) and those included in the analysis of bacteria and fungi (n = 12). Statistically significant values that can differentiate control cases are highlighted in bold.

Microbiota composition
Samples were slowly defrosted at -20 ºC for 24 h and at 4 ºC for another 24 h, to prevent bacterial death and DNA fragmentation. Total DNA was obtained with the QIAamp kit (Qiagen) from the biopsies, from the pellet of saliva after centrifugation, and from 200 µl aliquots of a solution of 0.5 g of feces in 5 ml of water. Bacterial composition was determined by PCR amplification of the 16S rDNA V3-V4 region using published primers [10] in all samples, whereas the mycobiome was only analyzed in bronchial and saliva samples of the 16 controls and of a subset of 6 patients by amplification of the ITS-1 region [11].
PCR products were pooled equally and submitted to massive sequencing (

Bioinformatics for bacterial community characterization
Raw sequence data (FASTQ files) from 16S rDNA sequencing were demultiplexed and quality assessed using the q2-demux plugin. Then, denoising, filtering, and chimera removal were performed with DADA2 pipeline [12] (via q2-dada2 plugin), thus identifying all amplicon sequence variants (ASVs) [13] and their relative abundance in each sample. To minimize the number of spurious ASVs, those unique sequences with a total abundance lower than 7 reads across all samples were filtered out [14].
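The abundance filter described above can be illustrated with a minimal sketch. It is not the QIIME 2 command used in the study; the table layout (ASV identifier mapped to per-sample read counts), the function name and the toy values are hypothetical and serve only to show the rule "discard ASVs with fewer than 7 reads in total across all samples".

```python
from typing import Dict

# Hypothetical layout: ASV id -> {sample id -> read count}
FeatureTable = Dict[str, Dict[str, int]]


def filter_low_abundance_asvs(table: FeatureTable, min_total: int = 7) -> FeatureTable:
    """Keep only ASVs whose read count summed over all samples is >= min_total."""
    return {
        asv: counts
        for asv, counts in table.items()
        if sum(counts.values()) >= min_total
    }


# Toy data (invented for illustration, not study data)
toy_table: FeatureTable = {
    "ASV_1": {"saliva_01": 120, "bronchus_01": 45},
    "ASV_2": {"saliva_01": 3, "bronchus_01": 2},  # total 5: removed
    "ASV_3": {"saliva_01": 0, "bronchus_01": 7},  # total 7: kept
}

filtered = filter_low_abundance_asvs(toy_table)
print(sorted(filtered))
```

Note that the threshold applies to the total across samples, so an ASV that is rare in every individual sample can still be retained if it recurs often enough.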
ASVs were first aligned and then used to construct a phylogenetic tree with the align-to-tree-mafft-fasttree pipeline [15][16] from the q2-phylogeny plugin. ASVs were taxonomically classified using the classify-sklearn naïve Bayes taxonomy classifier (via the q2-feature-classifier plugin) [17] against the Silva 132 database [18]. Sequences not assigned to any taxon, or classified as Eukaryote or Archaea, were filtered out. Diversity analysis was done using the q2-diversity plugin, after samples were normalized via rarefaction (subsampled without replacement). Diversity analysis comprised alpha diversity metrics (Chao1, Shannon index, and Faith-pd [19], which measure the degree of diversity of a microbiota) and beta diversity metrics (unweighted UniFrac [20] and weighted UniFrac [21], which measure microbiota composition differences between samples while up-weighting differences in the phylogenetic distance between ASVs). Unweighted UniFrac reports differences in the presence or absence of ASVs, while weighted UniFrac also reports differences in the abundance of ASVs.
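As an illustration of the normalization and two of the alpha diversity metrics named above, the following sketch rarefies a vector of ASV counts without replacement and computes the Shannon index and Chao1. It is independent of q2-diversity; the base-2 logarithm for Shannon and the bias-corrected form of Chao1 are assumptions made here, and the function names and counts are hypothetical. Faith-pd is omitted because it requires the phylogenetic tree.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def rarefy(counts: Sequence[int], depth: int, seed: int = 0) -> List[int]:
    """Subsample `depth` reads without replacement from a vector of ASV counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one entry per read, labelled by ASV index.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = Counter(random.Random(seed).sample(reads, depth))
    return [picked.get(i, 0) for i in range(len(counts))]


def shannon(counts: Sequence[int], base: float = 2.0) -> float:
    """Shannon index H = -sum(p_i * log(p_i)) over ASVs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Toy sample (invented counts, not study data)
sample = [50, 30, 20, 0]
rarefied = rarefy(sample, depth=40)
print(rarefied, shannon(rarefied), chao1([10, 1, 1, 2, 0]))
```

Rarefying every sample to the same depth before computing these indices prevents samples with more reads from appearing more diverse simply because they were sequenced more deeply.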

Bioinformatics for fungal community characterization
Raw sequence data from ITS1 sequencing were demultiplexed and quality assessed using the q2-demux plugin. Then, the q2-itsxpress plugin [22] was used to quality filter and trim the ITS region from sequences. After that, denoising, merging, and ASV calling were done with the DADA2 pipeline (via the q2-dada2 plugin), thus identifying all ASVs and their relative abundance in each sample. Very low abundance ASVs (total n < 7) were filtered out, as was done for bacteria. ASVs were taxonomically classified using the classify-sklearn naïve Bayes taxonomy classifier against the UNITE 7.2 database [23]. Diversity analysis was done using the q2-diversity plugin, after samples were rarefied (subsampled without replacement). Diversity analysis comprised alpha diversity metrics (Chao1 and Shannon index) and a beta diversity metric (Bray-Curtis distance [24], which measures non-phylogenetic microbiome composition differences between samples). The 50 samples used for mycobiome determination by ITS1 amplification and massive sequencing yielded 6,034,038 reads, and the negative control yielded 5,036 reads. After quality filtering, trimming, merging, and discarding very low frequency sequences, 3,535,951 reads of 515 ASVs were finally obtained. One saliva sample from a control, whose read count was very low, was not included in downstream analyses due to sampling depth requirements (> 1000 reads). Rarefaction was set to 13,571 sequences per sample, since that sampling depth conserved all samples and reached a plateau according to alpha rarefaction plot inspection. The negative control was fully inspected to identify which taxa were present.
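The Bray-Curtis distance used for the fungal beta diversity can be written in a few lines. The sketch below is a plain re-implementation for illustration, not the q2-diversity code; the function name and the count vectors are hypothetical, and both samples are assumed to have been rarefied to the same depth beforehand.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i).

    Returns 0.0 for identical profiles and 1.0 when no ASV is shared.
    """
    if len(a) != len(b):
        raise ValueError("both samples must be described over the same ASVs")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one sample must contain reads")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Toy count vectors over the same three ASVs (invented values)
affected_bronchus = [10, 0, 5]
saliva = [5, 5, 5]
print(bray_curtis(affected_bronchus, saliva))
```

Unlike the UniFrac metrics used for bacteria, this distance ignores the phylogenetic relatedness of the ASVs and depends only on their counts.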

Statistical analysis
Statistical differences in mean alpha diversity metrics between patients and controls were calculated by Kruskal-Wallis test [25]. Differences in microbiota composition between samples were assessed and plotted by performing Principal Coordinates Analysis (PCoA) based on the beta diversity metrics.
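For readers unfamiliar with the Kruskal-Wallis test, the following sketch computes the H statistic with tie correction from raw group values. It is a pure-Python illustration, not the q2-diversity implementation; the function names and the Shannon values are hypothetical. The p-value helper is deliberately restricted to the two-group case (e.g., patients versus controls), where H follows a chi-square distribution with one degree of freedom and the tail probability equals erfc(sqrt(H/2)).

```python
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def two_group_p_value(h: float) -> float:
    """Chi-square (1 degree of freedom) tail probability; valid for two groups only."""
    return math.erfc(math.sqrt(h / 2.0))


# Toy Shannon values (invented, not study data)
patients = [4.1, 4.5, 3.9]
controls = [2.8, 3.0, 3.2]
h_stat = kruskal_wallis_h([patients, controls])
print(h_stat, two_group_p_value(h_stat))
```

Because the test uses ranks rather than raw values, it makes no normality assumption, which suits diversity indices measured in small groups.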
Permutational multivariate analysis of variance (PERMANOVA) [26] was performed to determine which factors explained statistically significant variance in microbiota composition. All statistical tests were conducted via the q2-diversity plugin from QIIME2. To determine which specific taxa explained beta diversity differences, differential abundance analyses were performed only for variables that yielded statistically significant differences in the beta diversity analysis. For that purpose, linear discriminant analysis effect size (LEfSe) was used for testing taxonomic comparisons [27].

Sample filtering
According to the 16S rDNA sequencing throughput analysis, three samples from affected bronchi (patients 23, 24, and 25) and four samples from control bronchi (controls 2, 3, 7, and 16) were excluded from downstream analysis since they did not reach the minimal sequencing depth requirement (set at > 1000 reads per sample).

Alpha diversity
Stools and saliva had similar alpha-diversity values, while bronchi were significantly more diverse according to Faith's PD index, which considers not only the richness and relative abundance of ASVs but also their phylogenetic distance (Fig. 1). The Chao1 and Shannon diversity indices were significantly higher in patients' bronchi than in controls' (p < .001), while the Faith's PD values were similar. The diversity of saliva was comparable between patients and controls.

Microbiota composition
The bacterial phyla distribution is shown in Fig. 2. Patients' saliva presented a higher density of and Leptotrichia (1-2%). The remaining genera represented less than 1% of the total microbiota abundance.
The core microbiota composition of saliva from patients and controls was highly similar, although significant differences in bacterial proportions were detected by LEfSe and weighted UniFrac distance (PERMANOVA p < .005), but not by the unweighted UniFrac distance analysis (Fig. 3), emphasizing that those differences are linked to the relative abundance of common taxa and not to the presence or absence of specific taxa. PCoA within the major genera (limited to 90% of the total composition)

Bronchi
More than 450 bacterial genera were identified among the bronchial microbiota, with statistically significant differences in their composition or distribution among patients and controls, while the microbiota of the cancer-affected bronchus was almost identical to its contralateral counterpart (Fig. 4). Particularities depending on the histological type of cancer were not detected.
Lung cancer-related genera were Streptococcus (19% affected bronchus and 24% contralateral), Prevotella ( computational tools revealed a more complex ecosystem than feces (Fig. 4), also defining for the first time the core lung microbiota.

Correlation between oral, fecal and bronchial microbiota
The distance between the oral, fecal and bronchial microbiotas of each individual was assessed by pairwise analysis based on weighted UniFrac distance. The result of this analysis is shown in Fig. 5 and revealed that the saliva and lung ecosystems were significantly closer (p < .001) in patients with lung cancer.
Streptococcal abundance characterizes the microbiota of patients
Streptococcus belongs to the Firmicutes phylum, and its abundance was consistently the factor best able to distinguish the microbiota of patients from that of controls. This genus has a complex taxonomy and comprises numerous groups and species; we compared by phylogeny the 95 ASVs detected in at least two samples, also considering the exchange between ecosystems (Fig. 6). Feces were the most remote niche, with saliva and bronchi also distant. It is important to note that > 95% of the ASVs detected in the patients' bronchi were also present in their saliva, whereas only 36% of those present in saliva were also present in the lung.
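The asymmetry in ASV sharing reported above (most bronchial ASVs also found in saliva, but not the reverse) is a directional set overlap. A minimal sketch, with a hypothetical function name and invented ASV identifiers, shows how the two percentages are obtained from the same pair of sets:

```python
from typing import Set


def shared_fraction(source: Set[str], other: Set[str]) -> float:
    """Fraction of the ASVs in `source` that are also detected in `other`."""
    if not source:
        raise ValueError("source niche contains no ASVs")
    return len(source & other) / len(source)


# Toy ASV sets (invented identifiers, not study data)
bronchus_asvs = {"ASV_1", "ASV_2", "ASV_3"}
saliva_asvs = {"ASV_1", "ASV_2", "ASV_3", "ASV_4", "ASV_5", "ASV_6"}

print(shared_fraction(bronchus_asvs, saliva_asvs))  # bronchus -> saliva
print(shared_fraction(saliva_asvs, bronchus_asvs))  # saliva -> bronchus
```

The denominator changes with the direction of the comparison, which is why a small niche nested inside a larger one yields a high value in one direction and a low value in the other.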

Mycobiome composition
Fungal characterization through ITS1 sequencing was performed in the 16 controls (saliva and bronchus) and a subset of 6 patients (saliva, affected and contralateral bronchi). Most of the fungal reads (n = 3,246) corresponded to the Collybia and Amphinema genera from the Agaricomycetes class.
Further inspection of the data demonstrated that those genera were absent in almost all samples, ruling out a cross-contamination event. Significant differences in the Chao1 index between the affected, contralateral, and control bronchi, and between patients' and controls' saliva, were not observed (Fig. 7). The Shannon index was significantly higher in affected bronchi from patients than in bronchi from controls (p < .03), in accordance with the results obtained for bacteria.
A beta diversity analysis of fungal communities was performed by computing Bray-Curtis distances, which were then used to build a PCoA plot (Fig. 7). In terms of fungal composition, affected bronchi from patients were similar to their saliva (p > .05), but different from their contralateral bronchi (p < .006) and from the bronchi (p < .001) and saliva (p < .007) of controls. This trend supports the aforementioned bacterial findings, which also suggest an interconnection between both anatomical ecosystems in lung cancer patients. A differential abundance analysis between affected bronchi from patients and bronchi from controls showed an enrichment of Malassezia in patients, whereas controls had a higher abundance of Candida (Fig. 7).

Discussion
The role of the microbiota in carcinogenic processes has not yet been elucidated and probably differs by tumor type and location. However, recent data clearly indicate that the microbiota contributes to the prognosis of cancer and determines the response to treatments, particularly to the new immunomodulatory therapies [9,28]. Criteria for the normal composition of the lung microbiota have not yet been established, but the available data indicate that its composition in cancer patients differs considerably from that of healthy individuals [4][5][6][7][8].
Sampling in the respiratory system requires invasive methods, and here we have characterized for the first time, as far as we know, the respiratory microbiota surrounding central lung cancer by direct sampling of the tumor tissue by bronchoscopy. Our results reinforce the previously known particularities of the lung microbiota, which considerably differ from oral and stool microbial communities [29]. One important concern is the theoretical risk of contamination with the upper microbiota during the bronchoscopy, but our results allowed us to rule out significant contamination, as other authors had previously suggested [30][31][32]. Data analysis by various methodologies highlighted the particularities of the microbiota associated with cancer, but also defined the respiratory microbiota core in healthy conditions. Moreover, we considered strict inclusion criteria to avoid the possible bias of antibiotic or corticosteroid therapy, and the bacterial exchange between the anatomically separated niches such as saliva and feces. Finally, the mycobiome composition of central lung cancer was studied.
Significant differences in the lung ecosystem have been described based on health status or a lung cancer diagnosis in sputum [33], bronchoalveolar lavage [5], protected specimen brushing [6], cytological brushing [34], and surgical tissue [7,8,35]. The major contribution of our work is the study of the microbiome surrounding central cancer via direct sampling, but our results are not necessarily applicable to the distal airway. Interestingly, this central ecosystem was more diverse than the fecal or oral compartments. This finding contrasts, at least in part, with decreases in the microbiota biomass from upper to lower tract described in healthy people [32].
Low biodiversity is usually observed in various pathologies, including cancer, but we found significantly higher alpha-diversity values in cancer than in the control group. Other authors have published analogous [8] and opposing results [5,6,35]. Moreover, advanced cancer stages [36] and reduced recurrence-free survival and disease-free survival [35] have been associated with higher values of alpha diversity. Eighty-eight percent of our patients were at tumor stage III or IV, and their survival during follow-up was only 198 days. Furthermore, other factors such as environmental exposure, residence in high-population density areas [4,36], and pack-years of tobacco smoking can increase the biodiversity of the lung microbiota, whereas chronic bronchitis reduces it [36]. All our patients had been smokers, while most of the controls (56%) had not (Table 1).
In terms of beta diversity, the tissues involved in cancer and the contralateral bronchus had almost identical compositions, as has been previously published [6,34,35], especially Pseudomonas, has been also previously corroborated [5,6,37]. Higher concentrations of Streptococcus, Blautia, Akkermansia, and Rothia were observed in patients, but Streptococcus was consistently the major marker linked to lung cancer. This finding has been previously reported in saliva [5], sputum [33,38], bronchoalveolar lavage [5], lung tissue [7], and protected specimen brushing [6,34]. The exhaustive analysis of the obtained ASVs allows us to suggest that the streptococcal variants present in lung tissue are similar to those found in saliva or feces, but the short length of the 16S rDNA amplicon sequences (460 bp) precludes us from making robust assumptions in this regard. New studies including Streptococcus cultures and molecular characterization of the species are needed to decipher whether oral lineages are different from those found in the lungs or feces and thus establish whether there are any markers truly associated with lung cancer, as occurs with Streptococcus gallolyticus subsp. gallolyticus and colorectal carcinoma [39].
There is increasing evidence of a link between Streptococcus and lung cancer. Recently, by combining microbiome and transcriptomic signatures, Tsay et al. [34] detected Streptococcus and Veillonella enrichment in the lower airways together with upregulation of the ERK and PI3K pathways, an early event that contributes to cell proliferation, survival and tissue invasion. The major question that remains to be answered is whether the abundance of streptococci is a cause or a consequence of the tumor process, as has been questioned for tumors in other locations [40]. Streptococcus is a natural inhabitant of the oral cavity, which is connected to the lower respiratory tract by the larynx and trachea. The ASV analysis allowed us to separate the oral streptococcal population from those found in the lung or gut, although oral/lung bacterial exchange could occur via microaspirations [6,34,37].
Microaspiration events are common, but their frequency is significantly increased in chronic inflammatory airway diseases [37], inducing inflammation through the elevation of Th17 lymphocytes and the expression of inflammatory cytokines (such as IL-1α, IL-1β and IL-17). The alteration of the IL-23/IL-17 axis is well known in the pathogenesis of both autoimmune diseases and tumors. Recently, it has been described that Streptococcus mitis induces IL-1β, IL-6 and IL-23 transcription and Th17 responses capable of releasing the potentially proinflammatory and protumoral IL-17 [41]. S. mitis also leads to neutrophil recruitment and inflammation, macrophage chemotaxis, a higher secretion of the immunological inhibitory cytokine IL-10, and increased expression of the immune checkpoint PD-L1, facilitating cancer development and expansion [41]. Lung-resident γδ T cells, and their protective or pro-tumorigenic roles in cancer, have been recently discovered [42]. A study has provided evidence that the local lung microbiota (one of the most common genera was Streptococcus) can provoke inflammation and tumor cell proliferation via the activation of lung-resident γδ T cells, which release IL-17 after IL-1β and IL-23 induction in myeloid cells by this local microbiota [43]. Our results demonstrated a global streptococcal enrichment in patients with cancer that affected more than just the respiratory tract, supporting the idea that microorganisms can orchestrate the balance between tumor-promoted inflammation and anti-tumor immunity depending on the specific microenvironment [43].
Streptococcal relative abundance in bronchial biopsies was a good predictor of lung cancer, but unfortunately this was not reproducible in saliva. ROC curves suggested the contralateral bronchi as the best sample (90.9% sensitivity and 83.3% specificity; AUC = 0.897). Other authors found similar results in protected specimen brushing samples (87.5% sensitivity and 55.6% specificity, AUC = 0.693) [6]. The proportional abundance of Streptococcus should be validated in the early stages of lung cancer with subsequent follow-up to corroborate this link. Along these lines, the intestinal enrichment of Streptococcus should be exhaustively explored to identify lung cancer markers in feces.
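To make the ROC quantities above concrete, the sketch below computes sensitivity and specificity at a given abundance cut-off and the AUC as the probability that a randomly chosen patient has a higher Streptococcus relative abundance than a randomly chosen control (ties counted as one half). The function names, the cut-off and the abundance values are hypothetical and do not reproduce the study data or the software used for the published curves.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    cases: Sequence[float], controls: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify a sample as 'cancer' when its abundance is >= threshold."""
    true_positives = sum(1 for x in cases if x >= threshold)
    true_negatives = sum(1 for x in controls if x < threshold)
    return true_positives / len(cases), true_negatives / len(controls)


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for p in cases:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy Streptococcus relative abundances (invented values)
patient_abundance = [0.30, 0.22, 0.18, 0.05]
control_abundance = [0.04, 0.10, 0.20, 0.02]

print(sensitivity_specificity(patient_abundance, control_abundance, threshold=0.15))
print(auc(patient_abundance, control_abundance))
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes discrimination over all possible cut-offs; this is why both are reported together.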
The intestinal and the respiratory ecosystems harbor a diverse and abundant microbiota, but some particularities distinguish the two ecosystems. Food intake favors a higher rate of microbial reproduction, increasing the total mass, which is significantly reduced after defecation. On the contrary, nutritional sources for bacteria in the airway are limited to mucus and cellular debris, while clearance is carried out by the ciliary system and the immune system, particularly by macrophages. Food is an important mode of entry of foreign microorganisms into the gut ecosystem, as is inspired air into the airway. Alien microorganisms can lead to overestimation of the real diversity of the ecosystem, but we have implemented new analytic strategies to define the lung microbiota core, which had not previously been defined. Interesting findings of our work include the elevated alpha-diversity of the bronchial microbiota in comparison with saliva or feces, and the dominance of Pseudomonas in healthy individuals. The pulmonary presence of this genus is linked to cystic fibrosis, in which it is the major pathogen that decreases respiratory function during pathogenic colonization. However, the lack of respiratory symptoms argues against a pathogenic role of Pseudomonas in healthy individuals, although more studies are needed along that line. Predator bacteria have been classically described in environmental ecosystems but scarcely detected in human samples [44][45]. Their low representation in the total ecosystem could prevent their detection by -omic strategies. This is an important field of research, which could represent an ecological key to modulating the microbiota composition.
The main limitation of our work is that we cannot estimate the effect of lung cancer risk factors, mainly tobacco (all patients had been smokers, but only about half of the controls had) and COPD (12/25 patients), on the bronchial microbiota. However, it has not yet been established whether tobacco has a significant influence on lung microbiota composition, and some important studies have shown contradictory results [36,46,47].
Whereas severe COPD has been linked with significant alterations in the lung microbiota composition [48][49], mild and moderate COPD (92% of our COPD patients) has been associated with Streptococcus enrichment [50]. This finding might explain the association of mild/moderate COPD and lung cancer [51], although further studies are needed to confirm this association.
Our study has several additional limitations, including a small number of patients, all of them from the same hospital, and a lack of other -omic analyses based on genetic expression; in addition, we cannot rule out the possibility that the differences were influenced by tobacco exposure. On the other hand, our strengths include performing the first study of the microbiota combined with the mycobiome of bronchial tissue obtained directly from the tumor and the contralateral bronchi (not adjacent to a resected tumor), as well as analyzing the connected ecosystems, including saliva and feces.
The mycobiome results were consistent with those obtained for bacteria. The fungal community was slightly richer and more diverse in patients than in controls, although the contralateral bronchus was more similar to controls than to the affected counterpart. The mycobiome of saliva and the affected bronchus from patients matched perfectly, but differed in controls, again suggesting that in patients with cancer the bronchial microbiota is the result of a continuous exchange with that of the oral niche.
In terms of taxonomy, affected bronchi from patients had an enrichment of the Basidiomycota phylum with higher populations of Malassezia genus, whereas the enriched taxon in healthy individuals was the Ascomycota phylum and the genera Candida and Saccharomyces, as previously described [41,52]. Although the public databases are increasing exponentially, it is important to note that a major limitation to describing the mycobiome is the lack of available taxonomic records. As far as we know, this is the first description of the lung mycobiome.

Conclusion
In summary, patients with central lung cancer have a bronchial microbiota that differs significantly from that of healthy controls, is not restricted to tumor-involved tissue, and is probably conditioned by continuous microaspiration events from the oral cavity more than by the carcinogenic process. Lung cancer was associated with a considerable enrichment of Streptococcus, and we propose that this feature could be used for the screening, diagnosis and prognosis of this pathology. The lung mycobiota of patients differs considerably from that of healthy individuals, and in patients there are dissimilarities between the affected and the contralateral bronchi. An innovative bioinformatics strategy used in this study has allowed us to define the bronchial core microbiota of healthy individuals, which is dominated by a non-pathogenic colonization of Pseudomonas.

Ethics approval and consent to participate
The Ethics Committee "CEIC Aragón" approved this project in 2016 with the reference 15/2016, and all participants signed the informed consent after receiving all the information from the clinical researchers.

Consent for publication
Not applicable.

Availability of data and materials
All generated data are publicly available.

Competing interests
RdC is the recipient of a Vertex grant IIS-2017-106179 for cystic fibrosis research. The remaining authors declare that they have no competing interest.

Funding
This work was partially supported by project PI17/00115, whose recipient is RdC. MPA was supported by the

Saliva and lung microbiota relatedness. Weighted UniFrac distance calculated between saliva and lung microbiota in the control group (left) and in the patient group (right). Note that the distance is significantly higher in the control group than in the patient group (p < .001), pointing to a closer connection between the two ecosystems in patients than in controls.