Structure of the citrus phyllosphere microbiome
To explore variation in the phyllosphere microbiome when facing the invasion of pathogens, we generated a microbial community profile for each sample using 16S and ITS metabarcoding data. In total, we obtained 6,770,888 sequences from phyllosphere samples. After denoising, removing low-quality and chimeric sequences with DADA2 , a range of amplicon sequence variants (ASVs) were obtained for the four sample types epiphytic bacteria, epiphytic fungi, endophytic bacteria, and endophytic fungi, with 2590, 664, 472 and 245 ASVs respectively based on 20 replicates of each (Additional file 2: Table S1).
In this study, the Richness index (Observed ASVs) and Evenness index (Pielou's Evenness) were used to assess the alpha diversity for each of the four microbiomes. Our results showed that the phyllosphere microbiomes from different ecological niches (epiphytes and endophytes) have different patterns of microbial diversity. Specifically, epiphytes on diseased leaves had greater microbial richness for both bacteria and fungi (Kruskal-Wallis; p < 0.05) than those on healthy leaves. However, no significant difference was found between diseased and healthy leaves in their endophytes (Fig. 2 and Additional file 2: Table S3). Interestingly, almost half of the total bacterial ASVs (48.6%) were found only in the epiphytic microbiome of the diseased leaves while only 14.4% were unique to the endophytes (Fig. 3c and Additional file 1: Fig. S2c). Most of the shared bacterial ASVs (75.0%) between the healthy and diseased leaves in their epiphytes belonged to the following top five classes: Alphaproteobacteria (26.8%), Gammaproteobacteria (20.8%), Bacteroidia (13.3%), Bacilli (8.0%) and Actinobacteria (6.3%). In contrast, 45.7% of the unique bacterial ASVs in the epiphytic microbiome of the diseased leaves belonged to these top five classes of Alphaproteobacteria (12.4%), Gammaproteobacteria (11.6%), Thermoleophilia (9.8%), Subgroup 6 (5.9%) and Actinobacteria (5.9%). In the endophytic microbiomes, no significant difference of microbial richness was found between the healthy and diseased leaves. However, the community evenness of both bacterial and fungal communities in the diseased leaves were significantly lower (Kruskal-Wallis; p < 0.01) than those in the healthy leaves (Fig. 2 and Additional file 2: Table S3). Interestingly, most of the differential ASVs in the endophytes of diseased leaves were those that showed depleted distributions, contrary to that in the epiphytic microbiome (Fig. 3a, d and Additional file 1: Fig. S2a, d). Furthermore, there was no significant difference in the Evenness index of epiphytes between the healthy and diseased leaves for either bacteria or fungi (Fig. 2 and Additional file 2: Table S3).
Unconstrained principal coordinate analysis (PCoA) of the Bray-Curtis matrix was used to investigate patterns of separation between microbial communities, and differential microbial analysis was used to assess the enrichment at ASVs and genus level. Our analyses found that the epiphytic and endophytic microbiomes from leaves have different microbial community assemblies. Specifically, PCoA indicated that the bacterial communities between the diseased and healthy leaves differed significantly in both the epiphytic (PERMANOVA; P < 0.05) and endophytic (PERMANOVA; P < 0.01) habitats. In contrast, the fungal community assemblies did not show significant differences between the diseased and healthy leaves in the epiphytic habitat, while their endophytic communities exhibited significant differences (PERMANOVA; P < 0.001) (Fig. 2 and Additional file 2: Table S3). Moreover, a higher percentage of bacterial ASVs than that of the fungal ASVs showed significant differences in epiphytes between the diseased and healthy groups (Fig. 3a, d). Among them, 118 bacterial ASVs (22% of total ASVs) were significantly different (Wilcoxon test; P < 0.05), among which 109 were enriched and 9 were depleted in the diseased leaves (Fig. 3a). By contrast, 27 fungal ASVs (13% of total ASVs) were significantly different (Wilcoxon test; P < 0.05), among which 19 were enriched and 8 were depleted in the disease leaves (Fig. 3d).
Further taxonomic analysis of 16S rRNA profiling at the genus level showed that the enriched bacteria in the epiphytic habitat of the diseased leaves belonged to Alphaproteobacteria, Gammaproteobacteria and the other 8 classes (Additional file 2: Table S5). In comparison, all enriched bacteria in the endophytic habitat of diseased leaves belonged to Alphaproteobacteria (Additional file 2: Table S5). Moreover, Sphingomonas exhibited significant enrichment in both the epiphytic and endophytic habitats (Fig. 3b and Additional file 1: Fig. S2b). Interestingly, while enriched in the endophytic habitat, Methylobacterium was depleted in the epiphytic habitat of the diseased leaves (Fig. 3b and Additional file 1: Fig. S2b). Overall, the differentially distributed bacteria were mainly attributed to the following two functional categories: "plant growth promoting" (e.g. Bacillus, Methylobacterium and Sphingomonas) [7–9] and "plant metabolism-related" (e.g. Methylocella and Zymobacter) (Additional file 2: Table S5) [10, 11].
In contrast, taxonomic analysis of ITS profiling at the genus level showed that the enriched fungi in both the epiphytic and endophytic habitats of the diseased leaves belonged to four classes: Sordariomycetes, Eurotiomycetes, Dothideomycetes and Agaricomycetes (Additional file 2: Table S5). Specifically, the genus Colletotrichum exhibited significant enrichment in both the epiphytic and endophytic habitats of the diseased leaves (Fig. 3e and Additional file 1: Fig. S2e). In contrast, Alternaria and Nigrospora exhibited significant depleting in both the epiphytic and endophytic habitats (Fig. 3e and Additional file 1: Fig. S2e). Functionally, most of the differentially distributed fungi belonged to "potential plant pathogens" (e.g. Colletotrichum, Alternaria and Nigrospora) (Additional file 2: Table S5) [12–14]. Of these, Colletotrichum  and Alternaria  contain known pathogens of citrus. Interestingly, 5/6 depleted fungi in the epiphytic habitat of diseased leaves also belonged to "potential plant pathogens" (Additional file 2: Table S5).
A large number of "Bacteria-Bacteria" interactions in epiphytes were intensified in diseased leaves
To explore whether changes in microbial community assemblies were accompanied by changes in microbial interactions, we performed co-occurrence network analysis to uncover potential associations and complexity of connections within the phyllosphere microbiome in response of disease. This was followed by estimating the topological properties of the co-occurrence networks to identify differences between the diseased and healthy groups. The co-occurrence networks were analyzed by Spearman correlations based on quantitative data of ASVs.
Our co-occurrence network analysis revealed increased complexity in the diseased condition compared with the healthy condition (Fig. 4 and Additional file 1: Fig. S3). A previous study showed that highly connected networks can form and strengthened when microbiomes face environmental disturbance, such as pathogen invasion . Furthermore, the number of negative edges (negative interactions) of the networks in the disease group were higher than that in the healthy group (both in epiphytes and endophytes), suggesting that pathogen invasion intensified the microbial competition within both niches (Fig. 4 and Additional file 1: Fig. S3). Interestingly, in addition to increasing the interaction between the bacteria and fungi, a large number of "Bacteria-Bacteria" positive interactions in epiphytes were also intensified in the diseased leaves, indicating that the epiphytic bacteria might strengthen interactions in response to the disease (Fig. 4 and Additional file 2: Table S6).
Our topological analysis of networks provided further insights into the variation of interactions. In epiphytes, the network of the diseased group presented a higher number of connections per node (average degree = 6.20) than that in the healthy group (4.39), indicating a highly connected microbial community in the diseased leaves. Furthermore, when nodes from the fungi were removed, the number of connections per node among bacteria increased in the epiphytic habitat of the diseased leaves (average degree = 7.01) while that in the healthy group decreased (3.13). The results of this analyses further indicated intensified "Bacteria - Bacteria" associations in the epiphytic habitat of diseased leaves (Additional file 2: Table S6).
Numbers of metabolism-related and glycoside hydrolase genes were enriched in the epiphytic microbiome of diseased leaves
To further explore the role of phyllosphere microbiome when facing pathogen invasion, especially on its impact on the interactions in the epiphytic habitat (Fig. 4), we performed shotgun metagenomic sequencing of the epiphytic microbiome. The combined datasets of the diseased and healthy groups contained a total of 133.8 Gbp of raw reads. Quality filtering and removal of plant-derived reads resulted in 79.4 Gbp microbial reads. In total, 7,621,585 putative protein-coding genes were predicted from the assembly. After clustering sequences at 95% identity, 5,592,809 non-redundant genes were generated (Additional file 2: Table S2).
Of the genes retrieved from the shotgun metagenomic data, 28 to 58% were assigned to a known function (Additional file 2: Table S2). Functional annotations were obtained for approximately 53% of non-redundant genes by aligning to the KEGG database and generated 15,423 KOs (GhostKOALA Score > 10). Based on KEGG level 1 pathway annotations, KOs could be classified into metabolism (33.7%), genetic information processing (14.6%), cellular processes (11.2%), environmental information processing (10.2%) and organismal systems (9.7%) (Additional file 2: Table S7). Differential enrichment analysis (DESeq2; Fold Change > 2 or < -2, P < 0.05, FDR < 0.1) of the epiphytic microbiomes between the diseased and healthy leaves revealed that 1283 KOs were enriched in the diseased samples, whereas 498 KOs were depleted (Fig. 5a). Furthermore, we found that the enriched KOs belonged mainly to metabolism-related pathways. Specifically, a large number of enriched KOs were involved in pathways of unclassified-metabolism (131 KOs), carbohydrate metabolism (124 KOs), amino acid metabolism (77 KOs), metabolism of cofactors and vitamins (56 KOs), energy metabolism (50 KOs) and xenobiotics biodegradation and metabolism (32 KOs), which represent 6 of top 10 pathways with enriched KOs (Fig. 5b and Additional file 2: Table S8). Consistent with network inferences (Fig. 4), this provides further evidence that the metabolism-based interactions of epiphytic microbes were dramatically enhanced when infected with melanose disease. In addition to metabolism-related pathways, enriched KOs in the diseased group also include those involved in "Plant-Microbe and "Microbe-Microbe" signal interactions, including the pathways of ABC transporters (61 KOs), two-component system (56 KOs), quorum sensing (19 KOs) and bacterial secretion system (16 KOs) (Additional file 2: Table S9). Together, these results indicated that increased epiphytic microbiome-mediated molecular communications among microbes and/or between microbes and plants in response to disease.
For more detailed resolution of specific functions associated with metabolism-related pathways, we searched for carbohydrate-active enzymes (CAZy) within the shotgun metagenomic data (Fig. 6 and Additional file 2: Table S10). A total of 428 CAZy gene families were identified and annotated with auxiliary activities (AA), glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL), and carbohydrate esterases (CE), as well as carbohydrate-binding modules (CBM). Differential enrichment analysis (DESeq2; Fold Change > 2 or < -2, P < 0.05, FDR < 0.1) of the epiphytic microbiomes between the diseased and healthy groups revealed that 35 CAZy gene families were enriched in the diseased group, whereas 5 CAZy gene families were depleted. Among them, most of the enriched CAZy gene families belonged to glycoside hydrolases (25 CAZy gene families). Many of these glycoside hydrolase families are associated with degradation of fungal cell wall components, including glucans (GH5_13, GH5_27, GH13_30, GH13_39 and GH74), mannans (GH5_13, GH5_27, GH99, GH130 and GH164) and chitosanase (GH5_13, GH5_27 and GH8), suggesting the epiphytic microbiome might be involved with disease suppression .
Taxonomic diversity of the phyllosphere microbiome
In this study, we used shotgun metagenomic data from the diseased group to compare microbial diversity between epiphytes and endophytes. Taxonomic annotation was assigned at reads level using NCBI nr database. Our analyses revealed that the epiphytic and endophytic niches had different microbial assemblies, suggesting their potential functional differences (Fig. 7).
Specifically, bioinformatic analyses of the shotgun metagenome sequences of the epiphytic microbes revealed that 66.7, 33.1, 0.2 and 0.1% of the reads corresponded to Bacteria, Fungi, Archaea and Viruses, respectively. By contrast, Bacteria (95.2%) were the predominant group among the endophytic microbes, with small fractions of Fungi (4.4%), Archaea (0.3%), and Viruses (0.2%). The dominance of bacteria in endophytes was largely attributed to Acinetobacter (47.2%). The shared bacteria between epiphytes and endophytes in the top 10 genera by relative abundance including Acinetobacter, Bacillus, Mesorhizobium, Paenibacillus, Pseudomonas and Streptomyces. In contrast, the shared fungi in the top 10 genera by relative abundance including Aspergillus, Rachicladosporium, Fusarium and Colletotrichum (Fig. 7). Moreover, 0.29% of the fungal sequence reads in the epiphytic habitat corresponded to the pathogen-related sequences (annotated as Diaporthe sp.), compared to 0.18% in the endophytic habitat.
De novo assembly of epiphytic bacterial genomes
Next, we attempted to identify epiphytic bacterial strains that might contribute to microbiome-related disease responses by binning genomes using the metagenome-assembled genome (MAG) extraction approach (Fig. 8a). After refinement and re-assembly, 24 bins (draft genomes) had completeness > 70% and contamination < 10% (Additional file 1: Fig. S4-6). Twelve of the 24 bins were enriched in the diseased leaves (Fold Change > 2), with bin 1 (Methylobacteriaceae) being the most abundant in epiphytes (Additional file 2: Table S11). Interestingly, 4/5 Sphingomonadaceae bins (bin 3, bin 11, bin 13 and bin 19) were enriched in the diseased leaves (Fig. 8b), consistent with our metabarcoding data (Fig. 3b), and suggesting a potential role for Sphingomonadaceae bacteria in host response to pathogen invasion on the citrus leaves. All four bins of Sphingomonadaceae were enriched by > 25-fold in diseased leaves over the healthy group (Additional file 2: Table S11). Among these four bins, bin 13 had the highest quality (completeness = 90.52%, contamination = 7.46%, N50 = 16,164). The assembled genome was comprised of 260 contigs with a total length of 2.62 Mb. We named this genome CPG1, short for citrus phyllosphere genome 1. We determine the taxonomy of CPG1 using Taxator-tk  and PhyloPhlAn , both annotated the genome to Sphingomonas sp. (Additional file 2: Table S11).
To better understand the putative functions of Sphingomonadaceae in the phyllosphere, we selected CPG1 as a representative genome for subsequent analyses based on the phylogenetic tree (Fig. 8b). As reported in previous studies, Sphingomonas may be involved in inducing plant defence signaling pathways and in inhibiting the growth and virulence of pathogens, which could help reduce host plant damages in the phyllosphere . Here, analyses of the binned genomes revealed that microbes enriched in diseased leaves contained more gene families associated with fungal cell wall-degrading enzymes (CWDEs), similar to results in shotgun reads (Fig. 6). Specifically, we found that CPG1 contains 12 CAZy protein family related to fungal CWDEs, suggesting that the enrichment of the genus Sphingomonas in the diseased leaves is likely associated with antifungal functions. Furthermore, glycoside hydrolases (GH99, GH74 and GH164) were not found in any of these 24 bins, which related to the degradation of glucans or mannans. The results suggest that the enrichment of CWDEs genes in diseased leaves was likely driven by multiple microbes, emphasizing the overall role of the microbial community in response to plant diseases (Additional file 2: Table S11).