Structure and function of rhizosphere and root endophyte microbial communities associated with healthy and root rot diseased Panax notoginseng

Background: Panax notoginseng (Burkill) F. H. Chen is a Chinese medicinal plant of the Araliaceae family commonly used to treat cardiovascular and cerebrovascular diseases in Asia and elsewhere. To meet increasing market demand for Chinese herbal medicine, most P. notoginseng is cultivated artificially, and the crop is vulnerable to various plant diseases. Root rot disease, in particular, causes substantial yield reduction and economic losses. We used high-depth next-generation sequencing to analyze the rhizosphere and root endophyte microbial communities of P. notoginseng, to compare these two communities between healthy and root rot diseased plants, and to clarify the relationship between these communities and root rot disease. Results: The P. notoginseng rhizosphere microbial community was more diverse than the root endophyte community, and the difference in functional pathways between healthy and diseased plants was greater in the root endophyte community than in the rhizosphere community. Multi-database annotation showed that the largest number of endophytic bacteria occurred in the roots of diseased plants. The number of carbohydrate-active enzyme (CAZy) database families was also higher in diseased roots. The RND antibiotic efflux function was higher in the healthy samples. Variovorax paradoxus and Pseudomonas fluorescens were highly abundant in the healthy and diseased root endophyte communities, respectively. Ilyonectria mors-panacis and Pseudopyrenochaeta lycopersici were most abundant in the diseased samples. In addition, complete genomes of two unknown Flavobacteriaceae species and one unknown Bacteroides species were obtained by binning analysis. Conclusions: The rhizosphere and root endophyte microbial communities of healthy and root rot diseased P. notoginseng differed markedly in diversity and functional pathways.
The higher mapping values obtained for the diseased samples reflected the occurrence of root rot disease at the molecular level. Variovorax paradoxus and Pseudomonas fluorescens may act as antagonistic bacteria against root rot in P. notoginseng, whereas Ilyonectria mors-panacis and Pseudopyrenochaeta lycopersici appear to be P. notoginseng root rot pathogens. Our study provides a theoretical basis for understanding the occurrence of root rot in P. notoginseng and for further research on potential biological control agents.

rot infection of P. notoginseng, the content of phenolic acid in the soil is lower than that associated with healthy P. notoginseng. Phenolic acid has been found to inhibit the growth of pathogenic bacteria while also stimulating the production of ferric acid; thus, adjusting the phenolic acid content can effectively prevent root rot in P. notoginseng [20].
Metagenomics is the study of the DNA of an entire microbial community sampled at one point in time; it can provide evidence of all microorganisms in the environment, especially those that are difficult to cultivate [21]. Recently, an increasing number of researchers have applied metagenomics to disease-related problems in plants such as potato [22], pea [23,24], citrus [25], and tomato [26]. Metagenomics can be used to detect and discover pathogenic bacteria, and thus offers an effective approach to the prevention and control of plant diseases. Researchers have also used metagenomics to investigate changes in soil microorganisms under continuous cropping and to alleviate the problems caused by continuous cropping obstacles [27].
Metagenomics has played an important role in research on medicinal plant diseases. The continuous cropping obstacles affecting P. ginseng have been a research focus but remain challenging. With the rapid development of high-throughput sequencing, researchers have studied root rot disease in P. notoginseng using amplicon sequencing of markers such as 16S rRNA and ITS, but mostly with a focus on fungi. The fungi, endophytes, and rhizosphere microorganisms of other medicinal plants of the Araliaceae, such as P. ginseng and P. quinquefolius, have also been documented [28][29][30][31][32]; however, there are few reports on the endophytes and rhizosphere microorganisms of P. notoginseng. In this study, we used high-throughput sequencing to study the rhizosphere soil and root endophytes of healthy and root rot diseased P. notoginseng plants, and to analyze and compare the microbial community structure and function of healthy and diseased plants. We expected that examining the correlation between disease and microbial community characteristics would provide a theoretical basis for the prevention and treatment of root rot in P. notoginseng. Consequently, the findings of this study should have important theoretical and practical implications for the sustainable development of the P. notoginseng industry.

Results
Metagenomic sequencing data quality control and saturation verification
A total of 333.90 Gb of data were produced from the 12 root endophyte samples, an average of 27.83 Gb per sample. Illumina NovaSeq 6000 sequencing of the 12 rhizosphere soil samples produced a total of 173.80 Gb, an average of 14.48 Gb per sample. Quality assessment of the raw data showed Q30 values above 93% for all samples. Quality control was then performed on the raw data to remove host contamination and obtain clean data. After removal of P. notoginseng host sequences, each rhizosphere soil sample and each root endophyte sample had an average of 13.90 Gb and 12.66 Gb of clean data, respectively (Table 1 and Supplementary Table 1). Based on the chao1 and observed-species alpha diversity dilution (rarefaction) curves of the rhizosphere soil and root endophyte samples (Supplementary Fig. 1), the data were saturated at 500,000 sequences (0.075 Gb) in Supplementary Fig. 1A and B, and at 1,000,000 sequences (0.15 Gb) in Supplementary Fig. 1C and D. The sequencing depth of every sample therefore reached saturation, allowing us to proceed with further analysis.
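The saturation check described above can be outlined in code. The following is a minimal Python sketch, not the pipeline used in this study (the text does not name the tool): it subsamples reads at increasing depths and records observed richness and the bias-corrected Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton taxa. The helper `reads_to_gb` assumes 150-bp reads, which we infer from 500,000 sequences corresponding to 0.075 Gb; all function names are hypothetical.

```python
import random
from collections import Counter


def reads_to_gb(n_reads, read_len=150):
    """Convert a read count to gigabases (assumes a fixed read length)."""
    return n_reads * read_len / 1e9


def observed(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefaction_curve(counts, depths, seed=0):
    """Subsample reads without replacement at each depth and return
    (depth, observed, chao1) tuples; a plateau suggests saturation."""
    rng = random.Random(seed)
    # One taxon index per read, so sampling reads respects abundances.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            break  # cannot subsample deeper than the sample itself
        sub = list(Counter(rng.sample(pool, depth)).values())
        curve.append((depth, observed(sub), chao1(sub)))
    return curve
```

Given a per-taxon read-count vector, the richness values returned for increasing depths should flatten once sequencing is deep enough; fixing `seed` makes the subsampling reproducible.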

Microbial community characteristics between healthy and diseased Panax notoginseng
Kraken2 is a high-precision metagenomic sequence classifier based on k-mer matching that can rapidly assign sequencing reads to taxa [33]. In this study, we analyzed the species diversity of each group and plotted alpha diversity as box plots (Fig. 1B); box plots of the other diversity indices are shown in Supplementary Figs. 2 and 3. Box plots of the top five genera in each group are shown in Fig. 1C. Overall richness was higher in the rhizosphere soil than in the root endophytes. The number of operational taxonomic units (OTUs) was higher in the rhizosphere soil of diseased plants than in that of healthy plants, whereas among root endophytes, healthy plants had more OTUs than diseased plants (Fig. 1B). Bradyrhizobium was more abundant in healthy than in diseased rhizosphere soil (Fig. 1C). In the healthy rhizosphere soil, the three most abundant genera were Bradyrhizobium, Micromonospora, and Streptomyces. Genus-level differences between healthy and diseased plants were more pronounced in the root endophytes. The three most abundant genera in healthy root endophytes were Lysobacter, Pseudomonas, and Paraburkholderia, and those in diseased root endophytes were Pseudomonas, Lelliottia, and Variovorax. Lysobacter and Paraburkholderia were more abundant in healthy than in root rot diseased samples, whereas Pseudomonas and Lelliottia were more abundant in diseased root endophyte samples (Fig. 1C).
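To illustrate how k-mer-based classification of the kind Kraken2 performs works, the sketch below maps each reference k-mer to the lowest common ancestor (LCA) of the taxa containing it, then assigns a read to the hit taxon whose root-to-leaf path collects the most k-mer hits. It is a simplified teaching example under stated assumptions: Kraken2 itself uses minimizers, a compact hash table, the NCBI taxonomy, and an optional confidence threshold, none of which are reproduced here. The toy taxonomy and any reference sequences are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); Kraken2 uses the NCBI taxonomy.
PARENT = {
    "root": None,
    "Pseudomonas": "root",
    "P_fluorescens": "Pseudomonas",
    "P_putida": "Pseudomonas",
    "Variovorax": "root",
}


def lineage(taxon):
    """Return [taxon, parent, ..., root]."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa (root is always shared)."""
    ancestors = set(lineage(a))
    return next(t for t in lineage(b) if t in ancestors)


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(references, k):
    """Map every k-mer to the LCA of all reference taxa containing it."""
    db = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read, db, k):
    """Pick the hit taxon whose lineage collects the most k-mer hits
    (root-to-leaf scoring); return None if no k-mer matches."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None  # unclassified read
    return max(hits, key=lambda t: sum(hits[a] for a in lineage(t)))
```

Shared k-mers resolve to a higher rank (here the genus), while k-mers unique to one species pull the read down to that species when they dominate the path score.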
We used MetaPhlAn2 [34] within the HUMAnN2 [35] pipeline to analyze the composition of the microbial communities (including bacteria, archaea, eukaryotes, and viruses) and obtained a species composition table for the rhizosphere soil and root endophyte samples (Supplementary Table 2). From this table, we used GraPhlAn [36] to construct species composition circle diagrams for the rhizosphere soil and root endophyte samples (Fig. 2). We selected the 100 most abundant species in the table and classified them by phylum. As the colors in the circle diagrams indicate, there were four main phyla: Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria; species not belonging to these four phyla were grouped as Others. Proteobacteria made up the largest proportion in both the rhizosphere soil and root endophyte samples. The relative proportions of the phyla in the rhizosphere soil and root endophyte samples, respectively, were: Actinobacteria, 14% and 7%; Bacteroidetes, 10% and 7%; Firmicutes, 20% and 6%; Proteobacteria, 50% and 77%; and Others, 6% and 3% (Fig. 2). Species composition heat maps and histograms show genus-level differences between samples of different health status (Fig. 3). A total of 87 genera were detected in the rhizosphere soil samples, and the seven most abundant were Rhodopseudomonas, Actinoplanes, Burkholderia, Caulobacter, Ktedonobacter, Mesorhizobium, and Granulicella (Fig. 3A, C). The bacterial flora of diseased and healthy rhizosphere soil showed opposite trends at the Lijiang (LJ) and Qiubei (QB) sampling sites: Rhodopseudomonas was more abundant in the healthy rhizosphere soil at the LJ site (LJTC.2), but more abundant in the root rot diseased rhizosphere soil at the QB site (QBTB.1).
Ktedonobacter was found only in the rhizosphere soil from QB, where it was more abundant in the root rot diseased samples. In total, 108 genera were detected in the root endophyte samples (Fig. 3B, D). The eight most abundant were Pseudomonas, Burkholderia, Variovorax, Afipia, Agrobacterium, Sphingobium, Rhodanobacter, and Janthinobacterium. Pseudomonas was highly abundant in the root rot diseased root endophyte samples from LJ and in the healthy root endophyte samples from QB, except for one pair of samples (QBGB.3 and QBGC.3). Burkholderia was more abundant in healthy than in root rot diseased root samples. Variovorax and Afipia were highly abundant in the healthy root endophyte samples from LJ. Overall, Agrobacterium was more abundant in diseased than in healthy root endophyte samples, whereas Sphingobium showed the opposite trend. We used LEfSe to analyze differences in species abundance among groups, with linear discriminant analysis (LDA) to estimate the effect size of each differentially abundant feature [37]. Eight species differed significantly in the root endophyte and rhizosphere soil samples (LDA score > 4; Fig. 4).
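The phylum-level summary in Fig. 2 rolls the 100 most abundant species up to four main phyla plus Others. A minimal Python sketch of that roll-up follows. It assumes percentages are computed over the top-N species only and uses Bacteroidetes as the phylum name; the species-to-phylum mapping and abundances supplied to it are hypothetical inputs, not data from this study.

```python
from collections import defaultdict

MAIN_PHYLA = ("Actinobacteria", "Bacteroidetes", "Firmicutes", "Proteobacteria")


def phylum_composition(species_abundance, species_to_phylum, top_n=100):
    """Roll the top-N species up to phylum and return percentages.

    Species whose phylum is not in MAIN_PHYLA (or is unknown) are
    pooled into 'Others', as in the circle diagrams.
    """
    top = sorted(species_abundance.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(abundance for _, abundance in top)
    composition = defaultdict(float)
    for species, abundance in top:
        phylum = species_to_phylum.get(species, "Others")
        composition[phylum if phylum in MAIN_PHYLA else "Others"] += abundance
    return {phylum: 100.0 * value / total for phylum, value in composition.items()}
```

Lowering `top_n` changes the denominator, so the same species can yield different phylum percentages depending on how many species are retained.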

Functional composition and functional pathways
We used HUMAnN2 [35] to determine functional composition and functional pathways, to stratify the pathways, and to produce a functional composition table. We then used STAMP [38] to screen for significantly different pathways in this table (95% confidence interval, P < 0.05; Fig. 5). There were 16 significantly different pathways in the rhizosphere soil. CDP-diacylglycerol biosynthesis I/II, the superpathway of L-threonine biosynthesis, and the superpathway of L-isoleucine biosynthesis I were significantly up-regulated in the diseased rhizosphere soil samples, whereas L-lysine biosynthesis III and the pentose phosphate pathway were significantly up-regulated in the healthy rhizosphere soil (Fig. 5A). There were 12 significantly different functional pathways in the root endophyte samples. The superpathway of L-phenylalanine biosynthesis, 4-amino-2-methyl-5-phosphomethylpyrimidine biosynthesis, and pyridoxal 5'-phosphate biosynthesis I were significantly up-regulated in the diseased root endophyte samples, whereas L-valine biosynthesis, L-isoleucine biosynthesis I, and pyruvate fermentation to isobutanol were significantly up-regulated in the healthy root endophyte samples (Fig. 5B).
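STAMP offers several two-group tests, and the text does not state which one was used. Assuming Welch's t-test (unequal variances), the sketch below computes the t statistic, the Welch-Satterthwaite degrees of freedom, and a two-sided P value using only the Python standard library; the t-distribution tail is obtained by integrating its density with Simpson's rule. It is illustrative only and is not STAMP's code; with SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution, computed in log space."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) = 1 - 2 * integral of the density on [0, |t|] (Simpson's rule)."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3)


def welch_test(a, b):
    """Return (t, df, two-sided P) for two samples of pathway abundances."""
    t, df = welch_t(a, b)
    return t, df, two_sided_p(t, df)
```

A pathway would then be called significant when the returned P value is below 0.05, matching the threshold stated above.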
Function annotation results of healthy and root rot diseased Panax notoginseng plants
We assembled the clean data de novo, evaluated contig quality, removed redundancy, and quantified the contigs to obtain a non-redundant contig set for functional annotation. We obtained 11.22 Gb of contigs with an average contig N50 of 1,287 bp (Supplementary Table 4). We then compared these contigs against the eggNOG [39] database, integrated the annotation results, and counted the annotated Clusters of Orthologous Groups (COG) proteins in each group. The COG annotations fell into 24 functional categories (Supplementary Table 5).
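For reference, N50 is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal Python sketch follows; since the reported 1,287 bp is described as an average N50, it was presumably computed per assembly and then averaged, which is our reading rather than a stated procedure.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of all assembled bases
            return length
    return 0  # empty assembly
```

Sorting in descending order and stopping at the half-way point is the standard definition; comparing `2 * running` with `total` avoids floating-point division.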
The KEGG ortholog (KO) annotation results for each group were counted (Supplementary Table 6). The TC group had a total of 7,898 KOs, the TB group 8,078, the GC group 9,886, and the GB group 10,736. Thus, root endophytes had more KOs than rhizosphere soil, and diseased groups had more KOs than healthy groups. The four groups shared 6,469 core KOs, accounting for 54.4% of all KOs (Supplementary Fig. 5). In addition, each group had unique KOs: 106 (0.9%) in TC, 124 (1.0%) in TB, 435 (3.7%) in GC, and 921 (7.7%) in GB. The GO term annotation results fall into three categories: molecular function (MF), cellular component (CC), and biological process (BP). The top 50 GO terms of each group were classified and tested for enrichment, and the identified GO terms belonged to the BP category (Supplementary Fig. 6). GO terms were more strongly enriched in the root endophytes (Supplementary Fig. 6C and D) than in the rhizosphere soil (Supplementary Fig. 6A and B). The three most enriched terms in the rhizosphere soil were biological process, metabolic process, and organic substance metabolic process (Supplementary Fig. 6A and B). Carboxylic acid metabolic process and oxoacid metabolic process were enriched in the root rot diseased soil group but did not appear among the top 20 terms of the healthy rhizosphere soil (Supplementary Fig. 6A and B). Macromolecule metabolic process, cellular aromatic compound metabolic process, and organic cyclic compound metabolic process were significantly enriched in healthy root endophytes but did not appear among the top 20 terms of the diseased root endophytes (Supplementary Fig. 6C and D).
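The core and unique KO counts are set operations over the four groups' KO lists. In the sketch below, percentages are taken relative to the union of all KOs; this is our assumption, though it agrees with the reported figures (6,469 / 0.544 ≈ 11,890 KOs in the union, and 921 / 11,890 ≈ 7.7%). Group names follow the text; any KO IDs passed in are hypothetical.

```python
def ko_overlap(groups):
    """groups maps a group name (e.g. "TC") to its set of KO IDs.

    Returns the size of the union, the core KOs shared by all groups,
    and the KOs unique to each group, each with its percentage of the union.
    """
    union = set().union(*groups.values())
    core = set.intersection(*groups.values())

    def pct(n):
        # Share of the union of all KOs, rounded to one decimal place.
        return round(100 * n / len(union), 1)

    unique = {}
    for name, kos in groups.items():
        others = set().union(*(s for other, s in groups.items() if other != name))
        unique[name] = kos - others
    return {
        "union": len(union),
        "core": (len(core), pct(len(core))),
        "unique": {name: (len(u), pct(len(u))) for name, u in unique.items()},
    }
```

This is also the computation behind a four-set Venn diagram such as Supplementary Fig. 5.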
We used Metascape [40] to perform functional enrichment analysis of these genes and displayed the results as a network diagram to clarify the relationships between pathways and biological processes (Supplementary Fig. 7). Most terms in each group were strongly correlated (Supplementary Fig. 6). Some terms, such as anatomical structure homeostasis, were highly enriched but not strongly connected to other terms (Supplementary Fig. 6A). Independent terms included diseases of metabolism, glycoside compound metabolic process, and cellular amino acid biosynthetic process (Supplementary Fig. 6C).
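To our understanding, Metascape builds its term network by linking enriched terms whose gene memberships have a kappa similarity above 0.3. The sketch below assumes that rule and computes Cohen's kappa between two terms' gene sets over a gene universe; it is not Metascape's implementation, and any term names or gene sets passed in are hypothetical. Terms with no edge above the threshold correspond to the "independent terms" described above.

```python
from itertools import combinations


def kappa(term_a, term_b, universe):
    """Cohen's kappa between two terms' gene memberships over a gene universe."""
    n = len(universe)
    both = len(term_a & term_b)
    only_a = len(term_a - term_b)
    only_b = len(term_b - term_a)
    neither = n - both - only_a - only_b
    p_obs = (both + neither) / n  # observed agreement
    # Expected agreement if memberships were independent.
    p_exp = ((both + only_a) * (both + only_b) + (only_b + neither) * (only_a + neither)) / n ** 2
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)


def term_network(terms, universe, threshold=0.3):
    """Edges between terms (name -> gene set) whose kappa exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(terms), 2)
        if kappa(terms[a], terms[b], universe) > threshold
    ]
```

Kappa is used rather than raw overlap because it discounts the agreement expected by chance, so two small terms that share few genes are not linked merely because both exclude most of the universe.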
The CAZy database comprises five enzyme classes and one class of associated modules, and each class is subdivided into families. Glycoside hydrolases (GHs), which mainly hydrolyze and rearrange glycosidic bonds, comprise 167 families in total [41]. We found 123 GH families in the TC group, 127 in TB, 126 in GC, and 131 in GB (Supplementary Table 7); diseased samples thus had slightly more GH families than healthy samples. Similarly, there are 110 families of glycosyltransferases (GTs), which are mainly involved in the formation of glycosidic bonds. There were 70 and 75 GT families in TC and GC, respectively, and 71 and 83 in TB and GB, respectively. In addition, the database contains 40 polysaccharide lyase (PL) families, 17 carbohydrate esterase (CE) families, 16 auxiliary activity (AA) families, and 86 carbohydrate-binding module (CBM) families. These families were also slightly more numerous in the diseased groups than in the healthy groups (Supplementary Table 7). The number of sequences matched to the CAZy database followed the same upward trend: the TC group had the fewest reads matched, the GB group had the most, and the GT family had the highest matched ratio, followed by the GH family (Supplementary Fig. 8).
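Counting distinct families per CAZy class, as summarized in Supplementary Table 7, amounts to grouping annotation labels by class prefix. The sketch below assumes labels of the form `GH5` or `GH5_2`, with subfamily suffixes folded into the parent family; this label format is our assumption rather than one stated in the text.

```python
import re
from collections import defaultdict

# CAZy class prefixes followed by a family number; "GH5_2" is folded into "GH5".
CAZY_FAMILY = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)")


def families_per_class(labels):
    """Count distinct CAZy families per class from annotation labels."""
    families = defaultdict(set)
    for label in labels:
        m = CAZY_FAMILY.match(label)
        if m:  # skip labels that are not CAZy family IDs
            families[m.group(1)].add(m.group(1) + m.group(2))
    return {cls: len(fams) for cls, fams in families.items()}
```

Applying this to each group's labels separately gives per-group family counts (for example, GH families in TC versus GB) that can be compared directly.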
The Resfam database analysis identified 15 mechanism classifications with 94 functional descriptions (Supplementary Fig. 9). The TB group had the most resistance genes (106,554), an average of 17,759 per sample, followed by TC (89,250), GB (84,135), and GC (57,122) (Supplementary Table 8). Overall, the rhizosphere soil contained more resistance genes than the root endophytes, and diseased groups contained more than healthy groups. The three most common resistance gene functions in the TC, TB, GC, and GB groups, respectively, were consistently ABC transporter (33.77%, 34.27%, 31.50%, and 36.23%), Gene modulating resistance (20.24%, 20.56%, 21.60%, and 20.12%), and RND antibiotic efflux (17.96%, 16.79%, 16.73%, and 14.92%). The ABC transporter, RND antibiotic efflux, and Gene modulating resistance functions were higher in the root rot diseased groups than in the corresponding healthy groups (TC and GC) (Supplementary Table 8).
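Because the Resfam results are reported as totals plus percentage shares, approximate absolute counts per function can be recovered by multiplication. The sketch below uses only the numbers given above, plus the assumption that each group contains six samples (three replicates at each of two sampling sites), which matches 106,554 / 6 = 17,759. Since the percentages are rounded, the derived counts are approximate.

```python
# Totals and percentage shares as reported in the text.
TOTALS = {"TC": 89_250, "TB": 106_554, "GC": 57_122, "GB": 84_135}
SHARES = {
    "ABC transporter": {"TC": 33.77, "TB": 34.27, "GC": 31.50, "GB": 36.23},
    "Gene modulating resistance": {"TC": 20.24, "TB": 20.56, "GC": 21.60, "GB": 20.12},
    "RND antibiotic efflux": {"TC": 17.96, "TB": 16.79, "GC": 16.73, "GB": 14.92},
}
SAMPLES_PER_GROUP = 6  # assumed: 3 replicates x 2 sampling sites (LJ, QB)


def approx_counts(function):
    """Approximate gene counts per group for one function (shares are rounded)."""
    return {group: round(TOTALS[group] * pct / 100) for group, pct in SHARES[function].items()}


def per_sample_mean(group):
    """Mean number of resistance genes per sample in a group."""
    return TOTALS[group] / SAMPLES_PER_GROUP
```

For RND antibiotic efflux, the share is smaller in TB than in TC (16.79% versus 17.96%), yet `approx_counts` gives a larger absolute count for TB because the diseased group has more resistance genes overall. Share-based and count-based comparisons can therefore point in different directions.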

Metagenomic data binning
After the clean data were assembled, the non-redundant contigs were binned to recover individual microbial genomes. We obtained nine bins in the TC group and 24 bins in the TB group; two bins in the TB group had > 99% completeness: Thaumarchaeota (99.51% completeness; genome length 1.40 Mb) and Clostridium (99.01% completeness; genome length 6.31 Mb) (Supplementary Fig. 10 and Supplementary Table 9). A total of 41 bins were obtained in the GC group, two of which were 100% complete: Flavobacteriaceae (genome length 5.00 Mb) and Bacteroidetes (genome length 6.34 Mb). The GB group yielded 91 bins, one of which was 100% complete (Flavobacteriaceae, genome length 4.3 Mb). Similarity comparison showed that the two 100% complete Flavobacteriaceae genomes represented two different species within the family Flavobacteriaceae. The GC content results showed that the abundance of orders was greater in the rhizosphere soil than in the root endophytes (Fig. 6), consistent with the higher microbial diversity of the rhizosphere soil (Fig. 1B).
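The text reports completeness for each bin; contamination is also commonly used to grade metagenome-assembled genomes. The sketch below filters bins by the > 99% completeness cutoff used here and, as an additional assumption not taken from this study, assigns tiers following the MIMAG standard's completeness and contamination thresholds (high: > 90% complete and < 5% contamination; medium: >= 50% complete and < 10% contamination). The full MIMAG high-quality tier also requires rRNA and tRNA genes, which this sketch ignores, and any contamination values supplied are hypothetical.

```python
def near_complete(bins, min_completeness=99.0):
    """Names of bins whose completeness exceeds the cutoff.

    bins maps a bin name to (completeness %, contamination %).
    """
    return sorted(name for name, (completeness, _) in bins.items() if completeness > min_completeness)


def mimag_tier(completeness, contamination):
    """Completeness/contamination tier after MIMAG; rRNA/tRNA checks omitted.

    Anything not meeting the medium thresholds, including highly
    contaminated bins, is returned as "low".
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"
```

Combining both checks guards against a bin that looks complete only because it contains sequences from more than one genome.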
The abundance of each recovered genome in each sample was determined by bin quantification, and the results are displayed as a heat map (Supplementary Fig. 10). Bin 2, classified only to the domain Bacteria, was abundant in sample QBTC.3, and Sphingomonadaceae (bin 13) was abundant in sample QBGC.1. Alphaproteobacteria (bin 3) were abundant in the LJ diseased root endophyte samples (Supplementary Fig. 10 and Supplementary Table 9). We also visualized the results with Krona (Supplementary Fig. 11).
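The text does not describe bin quantification in detail. One common approach, assumed here, is to map reads back to each bin, convert mapped reads to mean depth (mapped bases divided by bin length), and normalize depths within a sample. The 150-bp read length and all bin names and counts supplied to the function are assumptions.

```python
def bin_relative_abundance(mapped_reads, bin_lengths, read_len=150):
    """Return each bin's share of total mean depth within one sample.

    mapped_reads: bin -> number of reads mapped to that bin
    bin_lengths:  bin -> genome length of the bin in bp
    """
    # Mean depth = mapped bases / bin length, so long genomes are not favoured.
    depth = {b: mapped_reads.get(b, 0) * read_len / bin_lengths[b] for b in bin_lengths}
    total = sum(depth.values())
    if total == 0:
        return {b: 0.0 for b in depth}
    return {b: d / total for b, d in depth.items()}
```

When all reads share one length, `read_len` cancels in the normalization; it matters only if absolute depths are reported. The resulting per-sample shares are the kind of values a heat map such as Supplementary Fig. 10 would display.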

Discussion
Microbial communities in different ecological niches of root rot diseased Panax notoginseng
The shapes of the dilution curves differed clearly between the rhizosphere soil and root endophyte samples (Supplementary Fig. 1). Among the rhizosphere soil samples (Supplementary Fig. 1A and B), only the dilution curve of healthy rhizosphere soil changed noticeably in shape, whereas the curves of the diseased rhizosphere soil samples were uniform; the root endophyte dilution curves were more distinct from one another (Supplementary Fig. 1C and D). Research on poplar suggests that marked changes in the OTU dilution curves of root endophytes may result from the uneven distribution of microbial communities between roots and other plant parts [42]. In addition, comparison of alpha diversity (Fig. 1B) showed that the microbial diversity of the rhizosphere soil was higher than that of the root endophytes. Previous studies have likewise shown that rhizosphere soil microbial communities are more diverse than root endophyte communities [43,44], and that there is clear compartmental separation between the rhizosphere soil and root endophytes of plants [45][46][47][48][49][50][51]. Our results reflect this compartment-specific community composition (Fig. 3A and B).
Previous studies have shown that root endophytic bacteria are recruited from the rhizosphere soil [52,53], which may explain why microbial diversity is higher in the rhizosphere soil than in the root endophyte samples. Our findings support this view.
Bacterial community differences between healthy and diseased Panax notoginseng plants
We compared bacterial communities between healthy and diseased samples and found that Rhodopseudomonas was the most abundant genus in the rhizosphere soil (Fig. 3A and C), especially in the healthy samples from the LJ site. A compound produced by Rhodopseudomonas has been reported to promote plant growth, and Rhodopseudomonas can induce resistance in plants [54]. In tobacco, Rhodopseudomonas induces resistance to tobacco mosaic virus [55]. Pseudomonas, the most abundant genus in the root endophyte samples in our study, was more abundant in the diseased samples from LJ (Fig. 3B and D). In previous studies on P. notoginseng, Pseudomonas was predicted to be the pathogen causing root rot disease [69], but it was identified only at the genus level. We identified Pseudomonas to the species level and found that the most abundant species was P. fluorescens. Pseudomonas fluorescens is known to protect many crop plants against soil-borne diseases caused by phytopathogens [70,71]. As a biological control agent, P. fluorescens can effectively reduce root-knot nematode infestation of cucumber and replace chemical nematode control, increasing cucumber yield while reducing cultivation costs [72]. In in vitro screening, P. fluorescens showed antagonistic activity against pea root rot, with high biological control activity [73]. A recent report found that plants recruit Pseudomonas to their roots to resist pathogens, and that Pseudomonas reduced disease incidence in Arabidopsis [74].
We suggest that, in the root rot diseased samples, Pseudomonas may have been recruited by rhizosphere exudates to resist plant pathogens, resulting in its higher abundance. In our study, Burkholderia was more abundant in the healthy than in the diseased root endophyte samples. This finding is consistent with previous studies [65-68] and suggests that Burkholderia is beneficial for P. notoginseng. Variovorax has been reported to be harmful to the growth of P. notoginseng and to reduce its yield [75]. In our results, however, Variovorax was more abundant in most of the healthy root endophyte samples. We identified Variovorax to the species level and found that the most abundant species was V. paradoxus.
Studies have shown that Variovorax can participate in plant-microbe interactions by manipulating plant hormone levels to balance root growth. Variovorax paradoxus can significantly inhibit potato tuber soft rot and is considered a new type of biological control agent for this disease [76]. In our study, V. paradoxus was highly abundant in healthy root endophyte samples, suggesting that it may promote the growth of P. notoginseng and may be an effective biological control agent for root rot disease.
Eukaryotic microbial communities associated with healthy and diseased Panax notoginseng plants
Ilyonectria mors-panacis had the highest relative abundance in the diseased root endophyte samples (Fig. 3E), corroborating a previous finding that I. mors-panacis is the main pathogen of P. notoginseng root rot [9]. Ilyonectria mors-panacis is also the main pathogen of P. ginseng root rot [77,78]. Pseudopyrenochaeta lycopersici is another fungal plant pathogen that causes severe root rot and root knot disease [79]. In our study, P. lycopersici had more reads in the diseased samples, except for QBTC.3. Pseudopyrenochaeta lycopersici has not previously been reported in association with root rot diseased P. notoginseng, and we infer that it may be a pathogen of P. notoginseng root rot. Arthrobotrys oligospora ATCC 24927, a nematode-trapping fungus, is often used as a biological control agent against plant- and animal-parasitic nematodes [80]. This fungus had relatively high read counts in the diseased samples in our study. Acrobeloides nanus, a bacterivorous nematode, was also mainly associated with diseased samples, which could account for the high occurrence of the nematophagous A. oligospora ATCC 24927 in these samples.
The interaction of functional genes and pathways helps improve plant resistance
Amino acids play an important role in plant growth. They also play a vital role in pathogen infection because they are indispensable nitrogen sources for many pathogens. Pathogen infection alters the expression of genes involved in amino acid metabolism and transport [81, 82], so regulating amino acid content is critical to plant growth and defense. Increased expression of UMAMIT (Usually Multiple Acids Move In and out Transporters) amino acid transporters has been shown to induce plant resistance to pathogens [83]. In addition, the existence of amino acid-sensing mechanisms in plants indicates that altering amino acid levels, by changing their metabolism or transport, may trigger a defense response. For example, overexpression of the cationic amino acid transporter CAT1 helps plants develop resistance to Pseudomonas, and resistance-related genes are up-regulated [84]. Our study found higher amino acid transport and metabolism in healthy than in diseased rhizosphere soil (Supplementary Table 5). Based on previous research, we infer that up-regulated amino acid transport and metabolism in soil, together with enhanced transport activity, may help resist pathogen invasion and improve the resistance of P. notoginseng. In a study of resistance signaling in tomato against root-knot nematode, the most differentially expressed protein group belonged to energy production and conversion; because the defense response consumes energy, this process may play a key role in plant disease resistance [85].
In our study, energy production and conversion in diseased roots was three times that in healthy roots, which may be closely related to the disease resistance of P. notoginseng (Supplementary Table 5).
Antibiotic resistance-related genes can help Panax notoginseng resist root rot disease
The misuse of antibiotics has contributed to the widespread development of antimicrobial resistance among clinically significant bacterial species, so understanding the mechanism of antibiotics in P. metabolism, and is also the key transport family to the defensive process [88][89][90]. The proportion of ABC transporters was higher in both groups of diseased samples than in the corresponding groups of healthy samples, which may be related to the disease resistance of P. notoginseng. Many studies have reported that genes modulating resistance help plants inhibit pathogenic bacteria and improve plant resistance [91][92][93][94][95]. The RND antibiotic efflux mechanism exports a variety of antibiotics, and inhibiting these pumps is one way to address antibiotic resistance [96][97][98]. Our study showed that the RND antibiotic efflux mechanism was greater in diseased samples than in healthy samples. We speculate that RND efflux was active in the diseased samples and exported antibiotics, and that the resulting reduction of antibiotics in the diseased samples may weaken resistance to pathogens. This phenomenon may induce antibiotic efflux in the root environment, making it easier for root rot to invade.

Panax notoginseng root rot CAZy genes
According to the CAZy comparison results, GHs made up the largest proportion in each group, and their proportion was higher in diseased than in healthy samples. In a study of peanut stem rot caused by fungi, GHs were the most numerous among the pathogen's secreted proteins [99]. In addition, research on soybean seed rot detected 149 plant cell wall-degrading enzymes, most of which were GHs [100]. Our results agree with these reports. The higher number of GHs in the diseased samples suggests that the root rot-affected parts of P. notoginseng secrete more GH family enzymes. Regarding GTs, a study of wheat infected with Fusarium graminearum found that GTs showed an upward trend [101]; this is consistent with our results and suggests that root rot infection of P. notoginseng may increase GTs. In short, an increase in the number of GH and GT families may indicate a tendency of P. notoginseng to develop root rot (Supplementary Fig. 8 and Supplementary Table 7).

Conclusions
In this study, we used second-generation high-throughput sequencing to analyze the rhizosphere soil and root endophyte microbial communities of P. notoginseng from different geographical locations. The structural variability of the root endophyte microbial community was higher than that of the soil, but the diversity of the microbial community was higher in the rhizosphere soil than in the root endophytes. Rhodopseudomonas, Actinoplanes, Burkholderia, V. paradoxus, and P. fluorescens may help P. notoginseng resist root rot, whereas Ilyonectria mors-panacis and P. lycopersici appear to be pathogens of P. notoginseng root rot. Up-regulated amino acid transport and metabolism in soil may help resist pathogens and improve the resistance of P. notoginseng. The ABC transporter and Gene modulating resistance genes may improve the disease resistance of P. notoginseng. In addition, an increase in the number of GT and GH families may be a molecular manifestation of P. notoginseng root rot. This study demonstrated the microbial and functional diversity of the rhizosphere microbial community of P. notoginseng and provides useful information for understanding the microbial community in P. notoginseng root rot. The results provide a basis for more comprehensive studies of the molecular mechanism of P. notoginseng root rot and of other plant rhizosphere microbial communities, and lay a foundation for the development of biological control agents and related molecular monitoring methods.
In total, twenty-four healthy and diseased rhizosphere soil and root endophyte samples were collected. The samples were divided into four groups: healthy rhizosphere soil (TC), diseased rhizosphere soil (TB), healthy root endophytes (GC), and diseased root endophytes (GB), with three replicates in each group at each sampling site.
The diseased root endophyte samples were collected from P. notoginseng plants with root rot (the aboveground parts of diseased plants were wilted and yellowed, and the roots were necrotic), whereas the healthy endophyte samples were collected from plants showing no signs of disease (Fig. 1A). Both diseased and healthy plants were gently lifted from the soil and shaken to remove soil and other debris attached to the root surface. The roots were then carefully rinsed with sterile water until the root surfaces were free of debris, cut with a sterile scalpel into 50 ml sterile centrifuge tubes, and immediately frozen in liquid nitrogen. To collect rhizosphere soil, each diseased or healthy plant was gently removed from the soil, and a standard soil ring knife was used to sample rhizosphere soil from a depth of 20 cm; the soil was placed into 50 ml sterile centrifuge tubes and stored in liquid nitrogen until use. For each diseased plant, a healthy plant within a 1 m radius was sampled as a control. At each sampling site, six samples were taken from diseased plants (three rhizosphere and three endophyte samples) and six from healthy plants (three rhizosphere and three endophyte samples).
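The group codes and replicate structure can be made explicit as a sample sheet, which is convenient for later group-wise analyses. This is a hypothetical sketch: the site labels, field names, and function name are illustrative, and the number of sites is a parameter because the text does not name the sites. With two site labels it produces 24 rows, matching the stated total.

```python
from itertools import product

# Group codes exactly as defined in the text.
GROUPS = {
    ("rhizosphere", "healthy"): "TC",
    ("rhizosphere", "diseased"): "TB",
    ("endophyte", "healthy"): "GC",
    ("endophyte", "diseased"): "GB",
}


def sample_sheet(sites, replicates=3):
    """One row per sample: site, compartment, health status, group code, replicate."""
    rows = []
    for site, (compartment, status), rep in product(sites, GROUPS, range(1, replicates + 1)):
        rows.append({
            "site": site,
            "compartment": compartment,
            "status": status,
            "group": GROUPS[(compartment, status)],
            "replicate": rep,
        })
    return rows


rows = sample_sheet(["site1", "site2"])  # hypothetical site labels
print(len(rows))
```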

Metagenomic sequencing
A Qubit 2.0 fluorometer (Thermo Fisher Scientific Inc., Waltham, MA, USA) was used to accurately quantify the DNA concentration. After the DNA samples passed quality control, they were randomly fragmented using a Covaris Focused-ultrasonicator (model; Covaris, Inc., Woburn, MA, USA), and the library was prepared through end repair, A-tailing, sequencing adapter ligation, purification, and PCR amplification. After construction, the library was diluted, and the insert size was checked using an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA). Once the insert size met expectations, the effective concentration of the library was accurately quantified by qPCR to ensure library quality. Qualified libraries were pooled onto flow cells according to their effective concentrations and target data volumes. After cluster generation on the cBot, sequencing was performed on a NovaSeq 6000 high-throughput sequencing platform (Illumina, Inc., San Diego, CA, USA). Each root endophyte sample yielded approximately 20-30 Gb of sequencing data, and each rhizosphere soil sample approximately 10-30 Gb.
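Pooling "according to the effective concentration and target data volume" amounts to loading each library in molar proportion to the data it should receive. The sketch below shows that arithmetic under stated assumptions: 660 g/mol per base pair of double-stranded DNA (a common approximation), a hypothetical loading amount in fmol per Gb, and hypothetical function names; it is not the sequencing provider's actual protocol.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA (approximation)


def mass_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (e.g. from a Qubit reading) to molarity in nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def pooling_volumes(libraries, fmol_per_gb):
    """libraries: {name: (effective_conc_nM, target_gb)}.

    Each library contributes moles in proportion to its target data volume.
    Because 1 nM equals 1 fmol/ul, volume (ul) = fmol required / concentration (nM).
    """
    return {
        name: target_gb * fmol_per_gb / conc_nM
        for name, (conc_nM, target_gb) in libraries.items()
    }


# Hypothetical qPCR concentrations (nM) and target data volumes (Gb).
print(pooling_volumes({"GB1": (2.0, 20.0), "TC1": (4.0, 20.0)}, fmol_per_gb=1.0))
```

A library at half the concentration needs twice the volume to contribute the same number of molecules, so the more dilute library gets the larger volume.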

Data quality control
FastQC was used to assess the quality of the raw reads of the 24 samples and to generate quality evaluation reports [102]. The Kneaddata tool (http://huttenhower.sph.harvard.edu/kneaddata) was used for quality control and for removing reads derived from the P. notoginseng host genome. Kneaddata relies on Trimmomatic [103] to remove primers, adapters, and low-quality sequences (parameters: sequence quality ≥ 20, minimum sequence length ≥ 50 bp). Bowtie2 [104] was then used to align the reads against the P. notoginseng genome [105] and retain only non-host reads, yielding the clean reads (Table 1). The clean reads were used for downstream analyses.
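The quality thresholds above (quality ≥ 20, length ≥ 50 bp) can be illustrated with a simplified sliding-window trimmer. This is a sketch, not Trimmomatic's exact implementation: the window size of 4 and Phred+33 encoding are assumptions, and the read is truncated at the start of the first window whose mean quality falls below the threshold, then discarded if it is shorter than the minimum length.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq, qual, window=4, min_q=20, min_len=50):
    """Trim at the first window whose mean quality is below min_q.

    Returns (seq, qual) for the kept prefix, or None if fewer than min_len
    bases survive.
    """
    scores = phred_scores(qual)
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        if sum(scores[start:start + window]) / window < min_q:
            cut = start
            break
    if cut < min_len:
        return None
    return seq[:cut], qual[:cut]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
read = "A" * 60
print(sliding_window_trim(read, "I" * 55 + "#" * 5))
```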
Analysis of the microbial diversity of Panax notoginseng root rot
HUMAnN2 [35] was used to calculate the species and functional composition. Because HUMAnN2 does not use paired-end information, the quality-controlled paired-end reads were concatenated into a single input file, and all other parameters were left at their defaults. MetaPhlAn2 [34] was used to calculate the species composition: reads were aligned with Bowtie2 [104] against the nucleotide database supplied with HUMAnN2, generating a species abundance table for each group. GraPhlAn was used to visualize the top 100 species and to compare species composition between groups. Finally, LEfSe was used for the differential species analysis [37].
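Selecting the top 100 species for GraPhlAn requires keeping only species-level rows of the merged abundance table and ranking them. A minimal sketch, assuming the table is held as a dictionary from MetaPhlAn2-style clade names (ranks joined by "|", species rank prefixed "s__") to per-sample relative abundances, and ranking by mean abundance across samples; the ranking criterion and function name are assumptions.

```python
def top_species(table, n=100):
    """Return the n species-level clades with the highest mean abundance.

    table: {clade_name: [relative abundance per sample]}, with clade names
    such as 'k__Bacteria|...|s__Variovorax_paradoxus'. Rows whose last rank
    is not 's__' (genus, family, or strain-level 't__' rows) are skipped.
    """
    species = {
        name: values
        for name, values in table.items()
        if name.split("|")[-1].startswith("s__")
    }
    ranked = sorted(species, key=lambda k: sum(species[k]) / len(species[k]), reverse=True)
    return ranked[:n]


# Hypothetical, heavily truncated lineages.
table = {
    "k__Bacteria|g__Variovorax|s__Variovorax_paradoxus": [5.0, 7.0],
    "k__Bacteria|g__Pseudomonas": [20.0, 20.0],
    "k__Bacteria|g__Pseudomonas|s__Pseudomonas_fluorescens": [1.0, 3.0],
}
print(top_species(table, n=1))
```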
LEfSe applies the Kruskal-Wallis rank-sum test and the Wilcoxon rank-sum test, followed by linear discriminant analysis (LDA), and uses the logarithmic LDA score and associated P-values to identify species that differ significantly between groups (species with an LDA score > 4 were considered significantly different) [107].
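The first LEfSe step, the Kruskal-Wallis test, can be sketched for a two-group comparison such as healthy versus diseased. This is an illustration of that single step only, not LEfSe itself (the Wilcoxon and LDA steps are omitted); the function names are hypothetical. With two groups the H statistic follows a chi-square distribution with 1 degree of freedom, whose survival function is erfc(sqrt(H/2)).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and its chi-square (1 df) p-value."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Chi-square survival function with 1 df: P(X > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


# Hypothetical abundances of one species in three healthy vs three diseased samples.
print(kruskal_wallis_two_groups([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

For completely separated groups of three, H is about 3.86 and p is about 0.0495, just under the conventional 0.05 cutoff, which shows how little power three replicates per group provide.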
Kraken2 is a high-precision, k-mer-based metagenomic sequence classifier [33]. We used Kraken2 for read-level species annotation because it can rapidly assign taxonomy to sequencing reads. Reads were compared against Kraken2's standard library (human, bacteria, archaea, viruses, and vectors) together with the fungi, protozoa, and plasmid databases. software was used to identify the functional pathways that differed significantly between groups (P < 0.05), and a histogram was drawn (Fig. 5).
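To make the idea of k-mer-based read classification concrete, the sketch below indexes the k-mers of a few reference sequences and assigns a read to the taxon with the most matching k-mers. It is deliberately simplified and is not Kraken2's algorithm: Kraken2 uses minimizers, canonical k-mers, and a lowest-common-ancestor taxonomy tree, whereas here k-mers shared by several taxa are simply marked ambiguous and reverse complements are ignored. All names and sequences are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the single taxon containing it.

    K-mers found in more than one taxon are marked ambiguous (None) rather
    than being assigned a lowest common ancestor as Kraken2 would do.
    """
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            if km in index and index[km] != taxon:
                index[km] = None
            else:
                index.setdefault(km, taxon)
    return index


def classify(read, index, k, min_hits=1):
    """Majority vote over the read's unambiguous k-mer hits."""
    votes = Counter(index[km] for km in kmers(read, k) if index.get(km) is not None)
    if not votes:
        return "unclassified"
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits >= min_hits else "unclassified"


refs = {"taxA": "ACGTACGTGGCC", "taxB": "TTTTGGGGCCCC"}  # toy references
index = build_index(refs, k=4)
print(classify("ACGTACGT", index, 4), classify("GGGGCCCC", index, 4))
```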
Assembly results of metagenomic data
Each sample was assembled with MEGAHIT [112]. The contigs of each sample were pooled by group, and contig quality was assessed with QUAST [113]. Genes were predicted with MetaProdigal [114], and the predicted genes were clustered with cd-hit-est [115] to remove redundancy and construct a non-redundant gene set (similarity ≥ 95%, coverage ≥ 90%), which was used for subsequent analyses. The nucleotide sequences were quantified with Salmon [116] using the --meta option for metagenomic data, so that all reads could be assigned to genes, yielding the abundance of each gene in each sample.
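N50 is one of the standard contiguity metrics QUAST reports when assessing assemblies like these. Under the usual definition, it is the length of the contig at which the cumulative length of contigs sorted longest first reaches half of the total assembly length. A minimal sketch of that definition (the function name is hypothetical):

```python
def n50(lengths):
    """Length of the contig at which the running total, taken longest first,
    first reaches half of the total assembly length; 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Here the total is 54, and the three longest contigs (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.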
Multi-database annotation of P. notoginseng root rot
The nucleotide sequences were translated into protein sequences and aligned against the eggNOG database [39] using Diamond [111]. The eggNOG database integrates annotation information such as GO, COG, and KO. We performed multiple sequence alignment between the queries and the eggNOG database to determine conserved sites and analyze their evolutionary relationships [39]. We compiled the candidate gene names predicted for each sequence by eggNOG and removed duplicates; these eggNOG annotations were used for subsequent analysis. Metascape [40] was used to perform a functional enrichment analysis of these genes to clarify the relationships between pathways and biological processes (Supplementary Fig. 6). Similarly, Diamond [111] was used to align the protein sequences against the carbohydrate-active enzymes database (CAZy) [117]. The CAZy database describes the enzyme families involved in complex carbohydrate metabolism, which allowed us to compare carbon source metabolism between species. By aligning the protein sequences against the ResFam database with Diamond [111,118], we predicted putative resistance genes and inferred their mechanisms and functions in each sample.
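Each Diamond search above yields several candidate hits per query protein, and annotation usually keeps one best hit per query. A minimal sketch, assuming Diamond's default 12-column tabular output (outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and a hypothetical e-value cutoff of 1e-5; the paper does not state its filtering thresholds.

```python
# Default column order of Diamond/BLAST tabular output (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-5):
    """Keep the highest-bitscore subject per query among hits passing the e-value cutoff."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        fields = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        evalue, bitscore = float(fields["evalue"]), float(fields["bitscore"])
        if evalue > max_evalue:
            continue
        query = fields["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (fields["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits against a CAZy-style protein database.
hits = [
    "gene1\tCAZyA|GH5\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t200.0",
    "gene1\tCAZyB|GT2\t60.0\t250\t100\t0\t1\t250\t1\t250\t1e-30\t150.0",
    "gene2\tCAZyC|CE1\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.01\t40.0",
]
print(best_hits(hits))
```

Only gene1 survives here: its GH5 hit outscores the GT2 hit, and gene2's only hit fails the e-value cutoff.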

Binning analysis of metagenomic data
We used MetaWRAP [119] to recover draft genomes of individual strains. MetaWRAP integrates three popular binning tools, of which we used MaxBin2 [120] and MetaBAT2 [121]. The non-redundant contigs of each group were binned, and the resulting bins were refined by evaluating and combining the results of the two binners. During refinement, only bins that CheckM [122] estimated to have a completeness of more than 70% and a contamination of less than 5% were retained.
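The refinement criterion above is a simple filter on CheckM's estimates. A sketch of that filter, assuming the completeness and contamination values (in percent) have been parsed from the CheckM output into a dictionary keyed by bin name; the bin names and values are hypothetical, and the inequalities are strict to match "more than 70%" and "less than 5%".

```python
def passing_bins(checkm, min_completeness=70.0, max_contamination=5.0):
    """checkm: {bin_name: (completeness_pct, contamination_pct)}.

    Returns the sorted names of bins with completeness above the minimum and
    contamination below the maximum.
    """
    return sorted(
        name
        for name, (completeness, contamination) in checkm.items()
        if completeness > min_completeness and contamination < max_contamination
    )


checkm = {
    "bin.1": (95.2, 1.1),
    "bin.2": (68.0, 0.5),   # too incomplete
    "bin.3": (88.0, 7.5),   # too contaminated
    "bin.4": (70.0, 2.0),   # exactly 70% is not "more than 70%"
}
print(passing_bins(checkm))
```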
After refinement, the Blobology module was used to visualize the GC content and abundance of contigs, with taxonomy assigned by comparison against the NCBI nt and taxonomy databases. Salmon [116] was used to quantify the bins and calculate the abundance of each bacterial genome in each sample. Each contig in a bin was then annotated, and the taxonomy of each bin was estimated. Finally, Krona was used to visualize the bin annotation results [123].
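The two per-contig quantities behind these steps are GC content and coverage. A minimal sketch of both, in which the per-bin abundance is assumed to be the length-weighted mean of its contigs' abundances; that is a common convention but an assumption here, and the function names are hypothetical. Ambiguous bases such as N are excluded from the GC denominator.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def bin_abundance(contigs):
    """contigs: list of (length_bp, abundance) pairs for one bin.

    Returns the length-weighted mean abundance, so long contigs count more.
    """
    total_length = sum(length for length, _ in contigs)
    return sum(length * abundance for length, abundance in contigs) / total_length


print(gc_content("ATGC"), bin_abundance([(1000, 2.0), (3000, 6.0)]))
```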

Declarations
Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
The sequencing data for all samples have been deposited to China National GeneBank DataBase (CNGBdb) with the accession number CNP0001803. The authors declare that all data necessary for confirming the conclusions presented in the article are fully represented within the article and its additional files.

Competing interests
The authors declare that they have no conflict of interest.

Note: Raw reads are the data volume of paired-end sequencing, and Clean reads are sequences that are still paired after quality control and de-hosting.