High-throughput sequencing-based analysis of the composition and diversity of endophytic bacterial community in seeds of upland rice

Upland rice is an ecotype crop resulting from the long-term domestication and evolution of rice in dry land without a water layer. Its stems and leaves are generally thick and luxuriant, and its leaves are typically broad and light. Its root system is well developed, with abundant root hairs, and both the osmotic pressure of the roots and the cell sap concentration of the leaves are high, making the plant drought-resistant, heat-resistant, and efficient at absorbing water. This study aims to reveal the "core flora" of endophytes in upland rice seeds by examining their diversity and community structures, and to clarify the influence of the soil environment on the formation of these endophyte communities by comparison with soil microorganisms from upland rice habitats. High-throughput sequencing on the Illumina HiSeq 2500 platform was used to investigate the structure and diversity of endophytic bacterial communities in upland rice varieties collected from different locations, together with soil samples from the common planting site. In total, 42 endophytic OTUs were shared by all 14 samples. At the phylum level, Proteobacteria was the dominant phylum in every sample (93.81–99.99%). At the genus level, Pantoea (8.77–87.77%), Pseudomonas (1.15–61.58%), Methylobacterium (0.40–4.64%), Sphingomonas (0.26–3.85%), Microbacterium (0.01–4.67%), and Aurantimonas (0.04–4.34%) were dominant genera present in all the upland rice seeds tested and represent the core microflora of upland rice seeds. This study provides a basis for the isolation, screening, functional evaluation, and application of functional microorganisms from upland rice to improve its agronomic traits, and offers a reference for research on plant–microbe interactions.


Communicated by Erko Stackebrandt.
Zhishan Wang and Yongqiang Zhu have contributed equally to this work.

Electronic supplementary material
The online version of this article (https://doi.org/10.1007/s00203-020-02058-9) contains supplementary material, which is available to authorized users.

Introduction
Endophytes are microorganisms that live in the tissues and organs of healthy plants during some or all stages of their life cycle without causing substantial damage to the host, establishing a harmonious relationship with it. Endophytes usually enter the plant through the roots, but they can also enter through aboveground parts. Once inside the root system, endophytes can colonize adjacent plant tissues. Therefore, endophytes are ubiquitous in both the aboveground and underground parts of the host plant, and even in seeds, where they can have a positive impact on plant development (Zinniel et al. 2002; Chebotar et al. 2015). Endophytes use multiple mechanisms to benefit their hosts, improving growth and health and effectively protecting them from pathogens (Agler et al. 2016; Liu et al. 2017a, b; Xu et al. 2019; Afzal et al. 2019; Sánchez-Cruz et al. 2019). Under diverse environmental conditions, the communication and interaction between endophytes and plants are stronger than those between plants and rhizosphere bacteria (Coutinho et al. 2015). Several studies have shown that plant variety, genotype, and geographical location significantly affect the establishment of microbial diversity and community structure in plants (Liu et al. 2017a; Edwards et al. 2015). Although endophytes were discovered more than 100 years ago, their existence was ignored until the 1930s. As a new microbial resource, endophytes have since attracted increasing attention. In recent years especially, the interaction between endophytes and plants has become a research focus in plant science, agronomy, thermatology, and ecology (Vandenkoornhuyse et al. 2015).
As the reproductive units of gymnosperms and angiosperms, seeds can survive in a dormant state for decades; when the external environment becomes suitable, they germinate and grow rapidly into new plants (Nelson 2004; Steinbrecher and Leubner-Metzger 2017). Besides the basic nutrients needed for development, seeds have been shown to harbor a considerable endophytic and epiphytic microflora, whose composition and vertical transmission may directly or indirectly affect seed germination, plant growth, and plant health (Links et al. 2014; Haimin et al. 2018; Nelson 2017; Shade et al. 2017; Truyens et al. 2015). Truyens et al. (2015) found that most endophytic bacteria in seeds belong to the class γ-Proteobacteria, followed by the phyla Actinobacteria, Firmicutes, and Bacteroidetes. Microorganisms such as Bacillus, Paenibacillus, Pseudomonas, Micrococcus, Staphylococcus, Acinetobacter, and Pantoea are readily detected in and isolated from various plant seeds. Compared with the rhizosphere and foliar microbial communities, however, seed endophytes remain poorly understood.
Advances in biotechnology have changed the methods used to study endophytes in plant seeds. Earlier techniques include the conventional culture-dependent method (Liu et al. 2009; Jiang et al. 2013), 16S rDNA clone libraries (Zou et al. 2012; Chen et al. 2014; Haimin et al. 2018), and denaturing gradient gel electrophoresis (DGGE) (Hardoim et al. 2012). High-throughput sequencing is now widely used to analyze the endophytic community structure and diversity of plant seeds. Liu et al. (2017a) used this technology to evaluate endophytic community structure and diversity in a variety of corn seeds, inferring the influence of endophytes on the environmental adaptability and vertical transmission of different corn varieties and providing a basis for future corn breeding. Rybakova et al. (2017) analyzed the microbial composition of Brassica napus seeds by high-throughput sequencing and found that the microbial structure depended mainly on the cultivar, which affected the interaction between symbionts and pathogens. Wang et al. (2016) used the same method to give a preliminary picture of the core Actinobacteria in rice roots, stems, and grains.
Upland rice is an ecotype crop that results from the long-term domestication and evolution of rice in dry land without a water layer. Compared with traditional paddy rice, upland rice is more drought-tolerant, adapts to arid climates, and can significantly reduce labor and production costs (Kumar and Ladha 2011; Xia et al. 2019). To date, however, research on upland rice has focused mainly on functional genes, genetic diversity, and growth-promoting microorganisms (Lyu et al. 2014; Tuhina-Khatun et al. 2015; Ferrari et al. 2018; Braga et al. 2018; de Sousa et al. 2018). Few studies, in China or abroad, have examined upland rice endophytes, and fewer still have used high-throughput sequencing.
Therefore, to better understand the community structure and diversity of endophytic bacteria in upland rice seeds, the "core flora" of these endophytes needs further exploration; the results can also help determine whether the soil environment is related to the formation of the endophytic bacterial community in upland rice seeds. In this study, 14 upland rice varieties were grown together at the Sanya Division Farm in Hainan Province, and their seeds, together with soil samples from the site, were analyzed by high-throughput sequencing on the Illumina HiSeq 2500 platform and related bioinformatics analysis. Notably, 11 of the 14 varieties were collected from various regions of China, while the remaining three were imported varieties, so the samples are both rare and highly representative.

Upland rice seed sampling
The 14 upland rice seed samples used in this research, collected from different places, were provided by the Hunan Hybrid Rice Research Center. Detailed information about the seeds is given in Table 1. All varieties were planted together on the Sanya Division Farm in Hainan Province, and seeds were collected for processing after maturity.

Sample surface sterilization and treatment
One replicate of each sample was collected in this study. First, the husks of each upland rice seed sample were removed with a small sheller. Then, under aseptic conditions, the following steps were performed in order: the husked seeds were washed three times with sterile water; 2.5 g of seeds were placed in a clean, sterile 50 mL tube containing 25 mL of phosphate buffer (per liter: 7.15 g of NaH₂PO₄·2H₂O, 22.04 g of Na₂HPO₄·12H₂O, and 200 µL of Silwet L-77) (Liu et al. 2019); and the seeds were sonicated twice with a Scientz-IID ultrasonic processor (NingBo Scientz Biotechnology Co., Ltd., China) at low power (237.5 W; 950 W × 25%) in an ice bath for 5 min (alternating thirty 2-s bursts and thirty 2-s rests) (Liu et al. 2019). To verify surface sterilization, surface-sterilized seeds were pressed onto LB medium (LUQIAO) with sterile tweezers and incubated at 30 °C for 72 h.
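The buffer recipe and power setting above can be checked with simple arithmetic. The sketch below converts the stated g/L amounts into molar concentrations, assuming standard molar masses for the hydrated salts (156.01 g/mol for NaH₂PO₄·2H₂O and 358.14 g/mol for Na₂HPO₄·12H₂O; the text gives masses only), expresses the Silwet L-77 addition as % (v/v), and reproduces the 950 W × 25% calculation. The function names are illustrative, not part of any published protocol.

```python
# Worked arithmetic for the phosphate buffer and sonicator setting.
# Assumption: standard molar masses of the hydrated salts (not stated in the text).
MOLAR_MASS_G_PER_MOL = {
    "NaH2PO4.2H2O": 156.01,
    "Na2HPO4.12H2O": 358.14,
}

# Grams of each salt per litre of buffer, as given in the protocol.
GRAMS_PER_LITRE = {
    "NaH2PO4.2H2O": 7.15,
    "Na2HPO4.12H2O": 22.04,
}


def millimolar(salt: str) -> float:
    """Concentration of a salt in mmol/L from its g/L amount and molar mass."""
    return GRAMS_PER_LITRE[salt] / MOLAR_MASS_G_PER_MOL[salt] * 1000.0


def percent_v_v(volume_ul: float, total_ml: float) -> float:
    """Volume percentage of an additive given in µL within a solution given in mL."""
    return (volume_ul / 1000.0) / total_ml * 100.0


def effective_power(max_watts: float, fraction: float) -> float:
    """Sonicator output when run at a fraction of its rated power."""
    return max_watts * fraction


if __name__ == "__main__":
    for salt in GRAMS_PER_LITRE:
        print(f"{salt}: {millimolar(salt):.1f} mM")
    # Silwet L-77: 200 µL per 1000 mL of buffer
    print(f"Silwet L-77: {percent_v_v(200, 1000):.2f} % (v/v)")
    print(f"Sonication power: {effective_power(950, 0.25)} W")
```

With these molar masses the buffer works out to roughly 45.8 mM NaH₂PO₄ and 61.5 mM Na₂HPO₄ (about 107 mM total phosphate) with 0.02% (v/v) Silwet L-77.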

DNA extraction
Five grams of surface-sterilized upland rice seeds from each sample were frozen in liquid nitrogen and quickly ground into a fine powder in a pre-cooled sterile mortar. DNA was then extracted using the FastDNA® SPIN Kit for Soil (MP Biomedicals, Solon, OH, USA) following the manufacturer's instructions.

Sequence data processing
Paired-end FASTQ files were assembled using Mothur (version 1.39.0) (Schloss et al. 2011). Briefly, paired reads were assembled after removing raw reads with ambiguous bases or low quality, i.e., reads shorter than 50 bp, reads with an average Q score < 25, and reads not exactly matching the primer (pdiffs = 0) and barcode (bdiffs = 0). The high-quality sequences were aligned to the SILVA reference database (v119) (Quast et al. 2013), and chimeric sequences were removed with the chimera.uchime command. The reads were then classified and clustered into operational taxonomic units (OTUs) at a 97% identity threshold.
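The screening criteria and the 97% OTU threshold can be illustrated with a short sketch. This is not Mothur's implementation: it assumes reads are already assembled and aligned to a common length, mirrors bdiffs = 0 and pdiffs = 0 with exact string matches, and swaps in simple greedy centroid clustering (a UCLUST-style heuristic) purely for illustration; Mothur's own clustering algorithms differ. All function names are hypothetical.

```python
from statistics import mean

MIN_LENGTH = 50  # reads shorter than 50 bp are discarded
MIN_MEAN_Q = 25  # reads with an average Phred score below 25 are discarded


def passes_filters(seq, quals, barcode, primer, expected_barcodes, expected_primer):
    """Hypothetical read screen applying the criteria listed in the text."""
    if len(seq) < MIN_LENGTH:
        return False
    if "N" in seq.upper():                 # ambiguous base call
        return False
    if mean(quals) < MIN_MEAN_Q:
        return False
    if barcode not in expected_barcodes:   # exact match, like bdiffs = 0
        return False
    return primer == expected_primer       # exact match, like pdiffs = 0


def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering (illustrative stand-in for Mothur's methods).

    Unique sequences are visited in order of decreasing abundance; each joins
    the first centroid it matches at >= threshold identity, otherwise it
    founds a new OTU. Returns (centroid, read count) pairs.
    """
    counts = {}
    for s in seqs:
        counts[s] = counts.get(s, 0) + 1
    ordered = sorted(counts, key=lambda s: -counts[s])
    centroids, sizes = [], []
    for s in ordered:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                sizes[i] += counts[s]
                break
        else:
            centroids.append(s)
            sizes.append(counts[s])
    return list(zip(centroids, sizes))
```

For example, a 100-bp read differing from a centroid at 2 positions (98% identity) joins that OTU, while one differing at 5 positions (95%) founds a new OTU.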

Data statistics
Community richness, evenness, and diversity indices (Shannon, Simpson, ACE, Chao, and Good's coverage) were calculated using Mothur. PCoA and NMDS were both performed in Mothur based on the thetayc (θYC) distance matrix. A t test (95% confidence interval) was used to determine whether the means of the indices differed significantly, with p < 0.05 considered significant. Taxonomy was assigned with the online RDP classifier (Wang et al. 2007) at the default confidence threshold (80%) against the Ribosomal Database Project (Cole et al. 2009). Differences in genus- and family-level abundances between samples were analyzed with Metastats (White et al. 2009). Spearman correlation coefficients between pairs of variables were calculated using the R function cor.test.
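The ordination step can be sketched as follows: compute a Yue–Clayton (Mothur's thetayc) dissimilarity between every pair of samples, then run classical principal coordinates analysis on the resulting matrix. This sketch assumes the textbook definitions, namely D = 1 − Σpᵢqᵢ / (Σpᵢ² + Σqᵢ² − Σpᵢqᵢ) on relative abundances and Gower double-centering, with percent variation explained computed relative to the positive eigenvalues; it uses NumPy and is not Mothur's code. NMDS and the t tests are not shown.

```python
import numpy as np


def thetayc_distance(a, b):
    """Yue-Clayton dissimilarity (1 - theta_YC) between two OTU count vectors."""
    p = np.asarray(a, dtype=float)
    q = np.asarray(b, dtype=float)
    p = p / p.sum()  # relative abundances (not in place, so caller data is untouched)
    q = q / q.sum()
    shared = np.dot(p, q)
    return 1.0 - shared / (np.dot(p, p) + np.dot(q, q) - shared)


def distance_matrix(samples):
    """Symmetric matrix of pairwise thetayc distances."""
    n = len(samples)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = thetayc_distance(samples[i], samples[j])
    return d


def pcoa(d, k=2):
    """Classical PCoA: first k coordinates per sample and % variation per axis."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered (Gower) matrix
    eigvals, eigvecs = np.linalg.eigh(b)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals[eigvals > 1e-12]
    coords = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0, None))
    explained = 100.0 * eigvals[:k] / positive.sum()
    return coords, explained
```

In this scheme, the PC1 and PC2 percentages reported for Fig. 1 would correspond to the first two entries of `explained`.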

Quality control of sequencing data
The raw high-throughput sequencing data were deposited in the NCBI database under BioProject accession PRJNA661136. After quality control, sequences were assigned to 14 sample files based on barcode and forward-primer information, yielding a total of 629,504 high-quality sequences, with an average of 44,964 sequences per sample (Supplementary Table S1); all samples were normalized to the minimum sample size of 28,388 reads. The endophytic bacterial diversity and richness of sample 19H005 were the greatest, while richness was lowest in sample 19H010 (Table 2). The rarefaction and rank-abundance curves also showed that the diversity and richness of sample 19H010 were markedly lower than those of the other samples (Supplementary Figs. S1 and S2).
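Normalizing to the smallest library (28,388 reads) and computing the indices listed in the methods can be sketched as below. The sketch assumes random subsampling without replacement and textbook formulas: Shannon H′ with natural logarithms, Simpson's D computed without replacement (higher values mean lower diversity), bias-corrected Chao1, ACE with a rare-OTU cutoff of 10 reads, and Good's coverage 1 − F₁/N. Mothur's implementations may differ in detail (for example, in the Chao1 variant), so treat this as an illustration rather than a reproduction of Table 2.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample OTU counts to a fixed depth without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = Counter(rng.sample(pool, depth))
    return [picked.get(otu, 0) for otu in range(len(counts))]


def alpha_diversity(counts, rare_threshold=10):
    """Sobs, Shannon, Simpson, bias-corrected Chao1, ACE and Good's coverage."""
    abund = [n for n in counts if n > 0]
    total = sum(abund)
    s_obs = len(abund)
    freq = Counter(abund)                      # freq[k]: OTUs seen exactly k times
    f1, f2 = freq.get(1, 0), freq.get(2, 0)

    shannon = -sum(n / total * math.log(n / total) for n in abund)
    simpson = sum(n * (n - 1) for n in abund) / (total * (total - 1))
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    goods = 1 - f1 / total

    rare = {k: v for k, v in freq.items() if k <= rare_threshold}
    s_rare = sum(rare.values())
    s_abund = s_obs - s_rare
    n_rare = sum(k * v for k, v in rare.items())
    if s_rare == 0:
        ace = float(s_abund)
    elif f1 == n_rare:
        ace = float("nan")                     # sample coverage estimate would be zero
    else:
        c_ace = 1 - f1 / n_rare
        gamma2 = max(
            s_rare / c_ace * sum(k * (k - 1) * v for k, v in rare.items())
            / (n_rare * (n_rare - 1)) - 1,
            0.0,
        )
        ace = s_abund + s_rare / c_ace + f1 / c_ace * gamma2
    return {"sobs": s_obs, "shannon": shannon, "simpson": simpson,
            "chao1": chao1, "ace": ace, "goods_coverage": goods}
```

Under this assumption, each sample would be passed through `rarefy(counts, 28388)` before `alpha_diversity` so that all indices are compared at equal sequencing depth.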

The diversity of endophytic bacteria in upland rice seeds
Forty-two endophytic OTUs were shared by all 14 seed samples, and each sample also contained unique OTUs. The proportions of unique OTUs were 51.28%, 33.88%, 40.49%, 53.69%, 46.98%, 46.83%, 50.38%, 55.38%, 45.57%, 40.96%, 48.15%, 57.22%, 43.89%, and 51.90% in samples 19H001, 19H002, 19H004, 19H005, 19H009, 19H010, 19H016, 19H020, 19H023, 19H025, 19H026, 19H027, 19H028, and 19H031, respectively, suggesting that upland rice genotype affects the endophyte composition of seeds to some extent. PCoA results reflect the similarity of endophytic community structures among the upland rice genotypes. As shown in Fig. 1, the samples separated from one another along both the PC1 (10.09%) and PC2 (9.20%) axes, and likewise in NMDS (Fig. 2), but the separation distances were small, indicating that the endophytic community structures of the 14 upland rice seed samples differed, but not significantly.
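The shared-OTU count and the per-sample unique-OTU percentages above are set operations on presence/absence data. The sketch below assumes a hypothetical input format (a mapping from sample name to `{otu_id: read_count}`) and counts an OTU as present when it has at least one read.

```python
def shared_and_unique(otu_tables):
    """Core OTUs present in every sample, and per-sample % of unique OTUs.

    otu_tables: hypothetical mapping of sample name -> {otu_id: read count}.
    """
    present = {name: {otu for otu, n in table.items() if n > 0}
               for name, table in otu_tables.items()}
    core = set.intersection(*present.values())
    unique_pct = {}
    for name, otus in present.items():
        # OTUs observed in any other sample
        others = set().union(*(o for other, o in present.items() if other != name))
        unique_pct[name] = 100.0 * len(otus - others) / len(otus)
    return core, unique_pct
```

The same arithmetic links the 42 shared OTUs to the 11,485 OTUs reported in the Discussion: 42 / 11,485 ≈ 0.37%.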

Endophytic bacterial community structures in upland rice seeds
As shown in Fig. 3, Proteobacteria was the dominant phylum, present in different proportions in every upland rice seed sample. The abundance distribution of bacterial endophytes differed somewhat among samples, in line with the heat map in Fig. 4. The relative abundance of Proteobacteria ranged from 93.81 to 99.99%, and that of Actinobacteria ranged from 0.06 to 5.47% in all samples except 19H002. At the genus level, each sample also had dominant endophytes, mainly Pantoea (8.77–87.77%), Pseudomonas (1.15–61.58%), Methylobacterium (0.40–4.64%), Sphingomonas (0.26–3.85%), Microbacterium (0.01–4.67%), and Aurantimonas (0.04–4.34%) (Fig. 5). The dominant genera of each sample are listed in Table 3; their abundance distribution differed distinctly among samples. Taken together, the common dominant genera in upland rice seeds were Pantoea, Pseudomonas, Methylobacterium, Sphingomonas, Microbacterium, Aurantimonas, Agrobacterium, Curtobacterium, Erwinia, Buttiauxella, and Magnetospirillum.
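Ranges such as Pantoea (8.77–87.77%) are the minimum and maximum of per-sample relative abundances, restricted to taxa detected in every sample. A minimal sketch, assuming genus-level read counts per sample in a hypothetical `{sample: {taxon: count}}` layout:

```python
def relative_abundance(table):
    """Convert one sample's {taxon: read count} into percentages."""
    total = sum(table.values())
    return {taxon: 100.0 * n / total for taxon, n in table.items()}


def abundance_ranges(tables):
    """Min-max relative abundance (%) of each taxon detected in every sample."""
    rel = [relative_abundance(t) for t in tables.values()]
    common = set.intersection(*({k for k, v in r.items() if v > 0} for r in rel))
    return {taxon: (min(r[taxon] for r in rel), max(r[taxon] for r in rel))
            for taxon in sorted(common)}
```

Taxa missing from any sample are excluded, which matches the idea of "dominant genera that coexisted in all the upland rice seeds".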

Discussion
Rice, the world's most important cereal crop, is grown in four ecosystems: irrigated, rainfed lowland, deepwater, and rainfed upland. Although much rice cultivation relies on the deepwater and irrigated ecosystems, drought has always been one of the most damaging stresses, causing severe losses in annual rice output (Luo 2010; Guimarães et al. 2016). Compared with traditional rice cultivation, the upland rice ecosystem offers greater water savings and climate adaptability and can significantly reduce greenhouse gas emissions. However, because of unreliable water supply, weed infestation, diseases and pests, and a lack of suitable cultivars, its productivity reaches only about half of its production potential. Consequently, upland rice accounts for only about 10% of the global rice production area and is concentrated mainly in mountainous regions of Asia, Africa, and Latin America, where it grows under aerobic, rainfed conditions (Galinato et al. 1999; Bernier et al. 2008; Kumar and Ladha 2011; Bridhikitti and Overcamp 2012). Research on upland rice is therefore of great significance for rice breeding and drought resistance. Besides genetics and the external environment, endophytes are an essential consideration, and recent reports show that upland rice endophytes significantly affect plant health and growth. This study therefore used high-throughput sequencing to explore the endophytic bacteria in seeds of upland rice from various regions and to reveal their community structure and diversity, offering a new perspective and a basis for follow-up research. Endophytic bacteria are common in many plants, but most studies of endophytic bacteria in rice seeds have so far focused on paddy rice, and few have examined upland rice seeds.
In this research, 629,504 high-quality sequences and 11,485 endophytic bacterial OTUs were obtained by high-throughput sequencing, reflecting the composition of upland rice seed endophytes. The major microbial groups were similar among the 14 upland rice seed samples, which shared 42 endophytic OTUs. Previous reports have shown that Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria are common in plant seeds (Truyens et al. 2015; Hardoim et al. 2012), and Proteobacteria was likewise the dominant phylum of endophytic bacteria in upland rice seeds in this study.
The overlapping areas of a Venn diagram are often used to represent the microbiota shared by multiple samples at a particular ecological scale (Vandenkoornhuyse et al. 2015), but a Venn diagram could not be drawn here because of the large number of samples. The Venn data for the 14 genotypes (Supplementary Table S2) showed that 0.37% of endophytic OTUs were shared by all samples, and these shared OTUs accounted for > 2.04% of the total OTU abundance in each genotype. The primary shared genera were Pantoea, Pseudomonas, Methylobacterium, Sphingomonas, Microbacterium, Aurantimonas, Agrobacterium, Curtobacterium, Erwinia, Buttiauxella, and Magnetospirillum. Pantoea (8.77–87.77%) was the most dominant genus shared by all 14 samples, followed by Pseudomonas (1.15–61.58%), Methylobacterium (0.40–4.64%), Sphingomonas (0.26–3.85%), Microbacterium (0.01–4.67%), and Aurantimonas (0.04–4.34%). Pantoea has been reported as a dominant genus in many plants, including rice, Salvia miltiorrhiza, maize, turfgrass, mulberry, and hemp (Chen et al. 2018, 2020; Liu et al. 2017a; Xu et al. 2019; Scott et al. 2018), and Pantoea species are often reported to promote plant growth. For example, Pantoea agglomerans C1 shows high biotechnological potential as a plant growth-promoting bacterium (PGPB) in soils polluted with heavy metals (Luziatelli et al. 2020). Pantoea agglomerans Pa not only considerably promotes the growth of wheat seedlings, increasing chlorophyll content, lowering proline accumulation, and favoring K+ accumulation in inoculated plants, but also significantly relieves salt stress while displaying plant growth-promoting (PGP) activity. Pantoea sp. strain 1.19, isolated from the rice rhizosphere, effectively promotes the growth of both legumes and non-legumes (Megías et al. 2017).
In addition, Nascimento et al. (2020) found that Pantoea phytobeneficialis MSR2, a member of a rare group of Pantoea strains, possesses a remarkable number of plant growth-promotion genes, including genes involved in nitrogen fixation, phosphate solubilization, 1-aminocyclopropane-1-carboxylic acid deaminase activity, indoleacetic acid and cytokinin biosynthesis, and jasmonic acid metabolism.
Other dominant genera, such as Pseudomonas, Methylobacterium, Sphingomonas, Microbacterium, Aurantimonas, Agrobacterium, Curtobacterium, Erwinia, Buttiauxella, and Magnetospirillum, are also common endophytic bacterial groups in plant seeds. Pseudomonas, Sphingomonas, and Microbacterium, for example, have been isolated or detected as dominant endophytic genera in the seeds of rice, peanut, browntop millet, marama bean, and other plants (Zhang et al. 2018; Jiang et al. 2013; Sobolev et al. 2013; Verma and White 2018; Chimwamurombe et al. 2016). Many species of Pseudomonas, Sphingomonas, and Microbacterium reportedly promote plant growth by improving nitrogen fixation or by secreting secondary metabolites with antibacterial and growth-promoting activity. For example, the nitrogen-fixing strain Pseudomonas stutzeri A15, isolated from the rhizosphere and inner tissues of rice, significantly promotes the growth of rice seedlings under greenhouse conditions, and its nitrogen-fixing effect is superior to that of ordinary chemical nitrogen fertilizers (Pham et al. 2017). Pseudomonas chlororaphis and Pseudomonas aurantiaca, isolated from cacti, cotton, and grasses, produce secondary metabolites and promote plant growth (Shahid et al. 2017). Chu et al. (2019) found that Pseudomonas strain PS01, isolated from the corn rhizosphere, increases the germination rate of Arabidopsis seeds under high salt stress. Sphingomonas trueperi strains NNA-14, NNA-19, NNA-17, and NNA-20, isolated from the rhizosphere soil, stems, and roots of giant reed and switchgrass, have also been shown to promote plant growth.
Of these, NNA-14 significantly increased the root length, specific surface area, and fine root number of corn and increased its N, Ca, S, B, Cu, and Zn contents, while NNA-19 significantly increased the root dry weight, root tip number, and calcium content of wheat. Sphingomonas sp. cra20, which can improve the growth rate of Arabidopsis under drought stress, promotes the developmental plasticity of the Arabidopsis root system and changes the microbial community structure of the rhizosphere. In addition, Liu et al. (2015) isolated two yellow bacterial strains, NBD5T and NBD8, from the nori branch and identified them as a new species of Sphingomonas, providing valuable biological resources for further study of the relationship between Sphingomonas and plant growth. Volatiles from root-associated bacteria of the genus Microbacterium have been shown to promote the growth of different plant species without direct or long-term contact between the bacteria and the plants, possibly through the regulation of sulfur and nitrogen metabolism (Cordovez et al. 2018). Moreover, secondary metabolites produced by some Microbacterium species significantly inhibit certain pathogenic bacteria in host plants (Lopes et al. 2015; Mannaa et al. 2017; Savi et al. 2019). Methylobacterium is frequently isolated from plants such as rice, barley, and legumes (Chen et al. 2019; Tani et al. 2015; Andrews and Andrews 2017), and several Methylobacterium species have been reported to act as PGPB and as Hg-resistant bacteria (Durand et al. 2017; Antunes et al. 2017; Grossi et al. 2020). In addition, Tani et al. (2015) found that Methylobacterium can not only use methanol emitted by plants as a carbon and energy source but also promote plant maturation and increase grain size. Selection pressure is critical to achieving growth promotion.
Some Buttiauxella species can improve plant growth and cadmium accumulation while exhibiting strong radiation tolerance (Beblo-Vranesevic et al. 2018). Notably, many Curtobacterium species can improve the drought resistance and metal tolerance of plants (Silambarasan et al. 2019; Bourles et al. 2019). Species of Sphingomonas, Curtobacterium, and Buttiauxella may therefore be relevant to the drought resistance of upland rice. The genera Aurantimonas and Erwinia have also been identified in plants (Liu et al. 2016; Li et al. 2018; Borruso et al. 2017). Some Aurantimonas species display antibacterial activity, and Aurantimonas aggregata sp. nov. has previously been isolated from an extreme environment (Pereira et al. 2017; Li et al. 2017). Erwinia amylovora often causes fire blight, resulting in considerable economic losses in apple and pear (Borruso et al. 2017).
Although few studies have examined the endophytes of upland rice seeds, research on rice seed endophytes is relatively mature. Liu et al. (2019) used high-throughput sequencing to investigate the diversity and community structure of endophytic bacteria in the super hybrid rice "Shenliangyou 5814" and its parents, revealing that the dominant shared genera were Pantoea, Methylobacterium, Sphingomonas, Rhizobium, Microbacterium, and Pseudomonas. Zhang et al. (2018) used the same method to study seeds of five rice genotypes and found that the common dominant genera were Pantoea, Acinetobacter, Xanthomonas, Bacillus, Flavobacterium, Stenotrophomonas, Neorhizobium, and Pseudomonas. Walitang et al. (2019) also reported species of Herbaspirillum, Microbacterium, Curtobacterium, Stenotrophomonas, Xanthomonas, and Enterobacter, suggesting that these may be part of a core endophytic microflora that is prevalent and dominant in rice seeds. Comparing these results with ours reveals both shared and unique dominant genera among the endophytes of upland rice seeds. Pantoea was the most dominant genus, further suggesting evolutionary homology between upland rice and rice at the level of plant endophytes. This points to an adaptive process in the evolution of plant–microbe interactions: upland rice and rice may have retained the basic microorganisms needed for survival and development during evolution, preserving some of them to assist with environmental adaptation.

Conclusion
Pantoea, Pseudomonas, Methylobacterium, Sphingomonas, Microbacterium, and Aurantimonas were the major core endophytic bacteria in the fourteen upland rice seed samples in this study. The endophytic diversity and community structure of the 14 samples were not significantly different, although relative abundances differed. To our knowledge, this is the first study to use high-throughput sequencing to explore the diversity and community structure of endophytic bacteria in upland rice seeds, and it found a strong similarity between the dominant bacteria of upland rice seeds and those of rice seeds. In addition, two genera reported to confer resistance to radiation, drought, and metals, Curtobacterium and Buttiauxella, were identified in upland rice seeds for the first time.