Microbiome in Acute Exacerbation and Stable Phase of COPD: A Descriptive and Comparative Study of 16s rRNA Sequencing and Metagenomic Sequencing

Background: The role of bacteria in different courses of chronic obstructive pulmonary disease (COPD) has not been very claried. In this study, sputum samples from patients at different courses of COPD were collected to analyze the differences of the structure and function of respiratory microbiome in different courses of COPD. Results: Our study involved 38 patients with acute exacerbation of COPD (AECOPD). Among them, 42 sputum samples were collected from the acute exacerbation of COPD (Of these 38 patients, 4 patients also collected sputum from the second acute exacerbation phase), and 12 sputum samples were collected from the stable phase of COPD (6 of the above 38 patients). Firmicutes, Proteobacteria, Bacteroidetes and Actinobacteria dominated the phylum in all the COPD samples, while Streptococcus, Neisseria, Haemophilus and Prevotella is the top 4 species in the cohort. Salmonella, Salmonella, Pseudomonas and Proteobacteria achieved higher abundance in AECOPD group. Samples from AECOPD patients showed higher α diversity. Moreover, genes annotated to K07481(the function is related to transposase) and K16087(the function is related to hemoglobin/transferrin/lactoferrin receptor protein) in stable subgroup showed higher abundance than that in acute exacerbation subgroup. Compared to samples from stable COPD, Proteobacteria contributed the most antibiotic resistance genes. Conclusions: There are differences in the function and metabolism of respiratory microbiome in patients with different course of COPD.


Background
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death in the world, causing about 32 million deaths worldwide in 2017 with the expectation to reach about 44 million by 2040 [1,2].
The incidence of COPD is about 10.1% worldwide, and the number of premature deaths caused by COPD increased by 13.2% from 2007 to 2017 [2]. The latest epidemiological study in China shows that the prevalence rate of COPD in China increases signi cantly with age, with 21.2% between 60 and 69 years old, 35.5% in people over 70 years old [3]. COPD ranks third among the 15 diseases with the highest disease burden in the elderly [4]. In 2017, the case fatality rate of COPD in China was 68/100,000, resulting in 945,000 deaths, which was also the third cause of death [4].
The main feature of acute exacerbations of chronic obstructive pulmonary disease (AECOPD) is that the respiratory symptoms of patients deteriorate and need to clinical treatment [5]. Since bacteria can be cultured in respiratory samples from AECOPD patients, it has always been believed that bacteria maybe an important cause of AECOPD, such as bacterial colonization [6][7][8][9]. Studies based on culture techniques usually focus on several known species with respiratory pathogenicity, such as Haemophilus in uenzae, Moraxella catarrhalis, Streptococcus pneumoniae, or Pseudomonas aeruginosa (symptoms tend to be more severe). Colonization of these or other types of bacteria can also be found in patients with stable Page 4/25 COPD. These phenomena suggest that the role of bacteria in the deterioration of COPD may be more than infection [10].
The development of Next Generation Sequencing technology provides an opportunity for in-depth study of lung microbiome in healthy people and patients. Some studies based on 16 s rRNA sequencing have found that COPD patients developed unique bacterial community compared with healthy subjects, suggesting that changes in lung microbiome may be related to increased airway in ammation and disease progression [11][12][13]. Shotgun metagenomic sequencing can be used to sequence microorganisms in samples at the whole genome level, which is more comprehensive and achieve better classi cation and analysis than 16S rRNA sequencing. Assembling shorter sequencing fragments into larger fragments can increase the ability to distinguish between different microorganisms, improve the resolution of classi cation, and reach the level of species or strains [14,15]. In addition, by allocating microbial genes in the metabolic pathway, we have the ability to assess the potential metabolic function of microorganisms and interactions between microorganisms and between microorganisms and humans [16,17]. This study used both 16 s rRNA sequencing and metagenomics to analyze the sputum samples of AECOPD patients and analyze the role of respiratory microbes in AECOPD.

Lung microbiome description, gene prediction and species annotation
Of the 54 sputum samples, 38 were collected from the rst acute exacerbation (period A). There were 12 stable period sputum samples (period B) from 6 patients (7 sputum samples were collected in the second month of the stable phase, 4 sputum samples were collected in the 6th month of the stable phase, and 1 sputum was collected in the 12th month of the stable phase). There were 4 sputum samples (period C) from 4 patients during the second acute exacerbation (2 samples were collected on the rst day of admission and 2 samples were collected on the third day of admission) ( Table 1).
We obtained 310,383 ORFs (Open Reading Frames), with a total length of 173.46 Mbp, GC content of 44.78%, from which 115,179 were assigned to genes. In the ORFs that can be annotated to the NR database (Version: 2018.01) from NCBI, 94.28% were assigned at the kingdom level, 91.89% at the phylum level, 90.35% at the class level, 87.81% at the order level, 85.51% at the family level, 82.71% at the genus level, and 47.85% at the species level Additional le: Figure S1 . Firmicutes, Proteobacteria, Bacteroidetes and Actinobacteria were the top 4 phylums. Streptococcus, Neisseria, Haemophilus and Prevotella were top 4 genus in the cohort, in which many members in these four genus were natural inhabitants in the human up respiratory tract( Figure 1)[18].
Difference of microbiome between stable and acute exacerbation stage of COPD Our results show that the sputum microbiome composition of patients with acute exacerbation of COPD is different from that of patients at stable stage. At the phylum level, the relative abundances of Firmicutes and Proteobacteria in stable patients were 15.80% and 23.82%, respectively, while those in patients with acute exacerbation were 13.50% and 13.05% respectively, with no statistical difference between the two stages. At the genus level, the relative abundance of Haemophilus in stable patients was 10.32%, while that in patients with AECOPD was 2.71% ( Figure 2, Additional le: Table S1, Additional le:   Table S2), with no statistical difference between the two. The results of LEfSe analysis showed that in period A group, Salmonella, Salmonella enterica, Pseudomonas and Proteobacteria achieved high abundance. While in period B, Haemophilus in uenzae, Pasteurellales were more abundant ( Figure S2). In addition, period A cohort showed higher α diversity than period B group ( Figure 3).

Analysis of gene function and resistance genes
In order to further evaluate the function of genes, genes obtained by sequencing were annotated with the following functional database (KEGG, eggnog and CAZy database) by DIAMOND software. There are 201,130 (64.80%) genes that can match the KEGG database, of which 108,824 (35.06%) can match 4,562 KEGG ortholog group (KO) 193014 (62.19%) genes can be assigned to the eggNOG database,8984 (2.89%) genes can be matched the CAZy database. ( Figure S1). We found that the relative abundance of genes annotated to K07481 and K16087 in period B subgroup was higher than that in period A or C subgroup. The relative abundance of the former in period B subgroup was 0.119%, and that in period A and C subgroup was 0.07% and 0.017%, respectively. The relative abundance of the latter in period B subgroup was 0.135%, while that in period A and C subgroups was 0.046% and 0.006%, respectively ( Figure S3, Additional le: Table S3). The results of resistance gene annotation showed that 275 genes could be found in the CARD database, including 198 kinds of ARO (the Antibiotic Resistance Ontology) Compared with patients with stable COPD, Proteobacteria harbored more ARO genes (44%) in patients with acute exacerbation, suggesting that drug resistance of Proteobacteria may play an important role in the process of acute exacerbation of COPD. After assigning ARO to species, in subgroup period A, 44% of ARO belonged to Proteobacteria, 13% to Firmicutes, 4% to Actinobacteria, and 4% to Bacteroidetes. In period B subgroup, 44% of ARO belonged to Proteobacteria, 13% to Firmicutes, 3% to Actinobacteria, and 3% to Bacteroidetes (Fig. 4).

Metabolic pathway analysis
In this study, there was no difference in the number of genes in different metabolic pathways between period A and period B groups. However, it was found that the relative abundance of genes annotated to the metabolic pathway of purine metabolism (ko00230), ATP binding-cassette (ABC) transporters (ko02010), two-component signal transduction system (ko02020), and ribosome (ko03010) function, the homologous recombination function (ko03440) are higher in period B group than that in period A group ( gure 5).

Discussion
The microbiome of respiratory plays an important role in the development of COPD and AECOPD. We found that the relative abundance of Proteobacteria, in sputum microbiome of patients with AECOPD decreased signi cantly compared with patients in stable stage, while the relative abundance of Firmicutes did not change signi cantly [19]. In addition, we noted that in stable patients, the relative abundance of Haemophilus in uenzae was 4.38%, while in patients with AECOPD, the number was only 0.67%.
The function of K07481 is mainly related to transposition. A transposon is a mobile element in the genome that can transfer itself from one location of the genome to another, which is the main source of mutation in the genome. The function of K16087 is associated with hemoglobin, transferrin and lactoferrin receptor proteins. It is suggested that the function of K16087 is related to the iron uptake of bacteria in the host. Iron is involved in many important biological activities, including respiration, tricarboxylic acid cycle, gene regulation, DNA biosynthesis and so on. Microbes in the host tend to obtain iron from transferrin, lactoferrin or hemoglobin [20]. The results of this study showed that the relative abundance of genes annotated to K07481 and K16087 increased signi cantly in the period B subgroup, suggesting that there was a difference in the interaction between airway microorganisms and hosts between patients with stable COPD and patients with acute exacerbation of COPD.
ATP binding cassette transporter is one of the most active intracellular transport systems, which exists widely in bacteria, archaea and eukaryotes. Its main role is to combine ATP hydrolysis with the active transport of a variety of substrates (such as ions, sugars, lipids, proteins and drugs, etc.), and is related to many important biological functions of prokaryotes and eukaryotes [21]. The two-component signal transduction system enables bacteria to sense, respond to, and adapt to changes in the environment or intracellular state [22]. Homologous recombination is a process of gene exchange between homologous DNA sequences and leads to gene conversion or crossover. It is important for the repair of DNA damage, such as single-strand gap, double-strand break (DSB) and so on, and helps to maintain the integrity of the genome [23]. The relative abundance of genes annotated to the above functions decreased in patients with acute exacerbation of COPD, which may re ect the result of complex interaction between airway microbial community and airway environment.
Our research also has some shortcomings. First of all, the sample size of this study is small, including only 38 samples from AECOPD patients; second, this study collects sputum collected by patients through spontaneous expectoration, not directly from the lower respiratory tract; in fact, there has been some debate about the similarities and differences between sputum and other invasive lower respiratory tract samples, such as BAL or bronchial protective brushes. Some studies have found that compared with the samples obtained by bronchial biopsy, there is an enrichment of oral ora in sputum samples. however, further analysis found that these samples are very similar in terms of the abundance of oral ora [24]. Therefore, we believe that although there are differences in microbial composition between samples from the upper respiratory tract and the lower respiratory tract, sputum does re ect the microbial composition of the lower respiratory tract to some extent, especially in terms of abundance [25].

Conclusions
In conclusion, we showed the taxonomy and functional changes of respiratory microbiome in patients with acute exacerbation and stable stage of COPD, which partly revealed the effects of microbial interaction on host respiratory homeostasis and pathogenicity, and further deepened the understanding of the relationship between respiratory tract microbiome and COPD.

Subjects and samples
A total of 54 sputum samples were collected from 38 patients with AECOPD. Hospitalized patients for AECOPD based on the 2016 GOLD guideline were involved in this study. All patients provide informed consent. The sputum was collected by the method of spontaneous expectoration during the hospitalization. Details for sputum collection and processing, shotgun metagenome sequencing, sequence assembly and gene annotation, see supplementary.

Bioinformatics analysis
In the aspect of 16s rRNA sequencing, barcodes and sequencing primers were trimmed before merged.
Paired-end reads were assembled using FLASH [26], and quality-trimmed using QIIME [27] with default settings. These trimmed sequences were then chimera ltered, singletons discarded, and clustered into operational taxonomic units (OTUs) using a 97% similarity threshold with UPARSE[28]. We used mothur to annotate taxonomic information for the representative sequence of each OTU by comparison to the SILVA_128[29] reference database.
In terms of metagenomic, all samples were paired-end sequenced on the Illumina platform (insert size 350 bp, read length 150 bp). After quality control, the reads aligned to the human genome with SOAP2[30] were removed. The remaining high-quality reads were used for further analysis. The assembly of reads were executed using SOAPdenovo [31].Unused reads from each sample were assembled using the same parameters. Genes (minimum length of 100 nucleotides) were predicted on scaftigs (i.e., continuous sequences within scaffolds) longer than 500 bp using MetaGeneMark (prokaryotic GeneMark.hmm version2.10). Then, a non-redundant gene catalogue was constructed with CD-HIT[32] using a sequence identity cut-off of 0.95, with a minimum coverage cut-off of 0.9 for the shorter sequences. To determine the abundance of genes, reads were realigned to the gene catalogue with SOAP2[31] using parameters:m 200 -x 400 -s 119. Only genes with ≥2 mapped reads were deemed to be present in a sample. The abundance of genes was calculated by counting the number of reads and normalizing by gene length. The constructed nonredundant gene catalogue was aligned to functional database include KEGG database (http://www.kegg.jp/kegg/),eggNOG database (http://eggnogdb.embl.de/#/app/home) CAZy database (http://www.cazy.org/) by DIAMOND [33]. Each protein was assigned to the database by the highest scoring annotated hit(s) containing at least one HSP scoring over 60 bits. Statistics of the relative abundance of different functional hierarchy calculated by summing relative abundance annotated to the same feature.  Tables   Table 1. Patients and samples information table  Description: NA means data missing. GOLD classi cation: GOLD1:FEV 1 (% predicted)≥80%, GOLD2: 50%≤FEV 1 (% predicted)≤79%, GOLD3: Histogram of relative abundance of the top ten species of phylum(a) and genus(b).

Figure 1
Histogram of relative abundance of the top ten species of phylum(a) and genus(b).

Figure 1
Histogram of relative abundance of the top ten species of phylum(a) and genus(b).    Box chart for analysis of differences between groups of α diversity index. Description: The ACE estimator( gure 3a) and the Chao1 estimator( gure 3b) are used to evaluate community richness of microbiome, the goods coverage estimator is used to evaluate sequencing depth( gure 3c).

Figure 3
Box chart for analysis of differences between groups of α diversity index. Description: The ACE estimator( gure 3a) and the Chao1 estimator( gure 3b) are used to evaluate community richness of microbiome, the goods coverage estimator is used to evaluate sequencing depth( gure 3c).

Figure 3
Box chart for analysis of differences between groups of α diversity index. Description: The ACE estimator( gure 3a) and the Chao1 estimator( gure 3b) are used to evaluate community richness of microbiome, the goods coverage estimator is used to evaluate sequencing depth( gure 3c).    Functional Box Diagram for Metastat Analysis of functional differences between groups(KEGG). Description: The horizontal axis in the picture is the sample grouping, and the longitudinal axis is the relative abundance of the corresponding function. The horizontal line represents two groups with signi cant differences, and the absence of them means that there is no difference in this function between the two groups. "*" means that there is a signi cant difference between the two groups (q value < 0.05), and "**" means that there is a very signi cant difference between the two groups ((q value < 0.01). The Metastats method is used to take hypothetical test of the functional abundance data between groups to get the p value, and through the correction of the p value, the q value is obtained.

Figure 5
Functional Box Diagram for Metastat Analysis of functional differences between groups(KEGG).
Description: The horizontal axis in the picture is the sample grouping, and the longitudinal axis is the relative abundance of the corresponding function. The horizontal line represents two groups with signi cant differences, and the absence of them means that there is no difference in this function Page 24/25 between the two groups. "*" means that there is a signi cant difference between the two groups (q value < 0.05), and "**" means that there is a very signi cant difference between the two groups ((q value < 0.01).
The Metastats method is used to take hypothetical test of the functional abundance data between groups to get the p value, and through the correction of the p value, the q value is obtained.

Figure 5
Functional Box Diagram for Metastat Analysis of functional differences between groups(KEGG).
Description: The horizontal axis in the picture is the sample grouping, and the longitudinal axis is the relative abundance of the corresponding function. The horizontal line represents two groups with signi cant differences, and the absence of them means that there is no difference in this function between the two groups. "*" means that there is a signi cant difference between the two groups (q value < 0.05), and "**" means that there is a very signi cant difference between the two groups ((q value < 0.01).
The Metastats method is used to take hypothetical test of the functional abundance data between groups to get the p value, and through the correction of the p value, the q value is obtained.