Composition and Diversity of Endophytic Bacterial Community in Seeds of Upland Rice Resources from Different Origin Habitats in Yunnan Province of China

Upland rice is strongly drought tolerant and widely adaptable. Cultivating high-yield, high-quality upland rice can help ease the tension between food shortage, water shortage, and population growth worldwide, and is of great significance to the sustainable development of agriculture. This study aims to reveal the "core microbiota" of the endophytic bacteria in upland rice seeds from Yunnan Province of China by examining their diversity and community structure. Correlation analysis with habitat environmental factors was then used to assess the effects of climate and altitude on the structure and diversity of the endophytic bacterial community in upland rice seeds. High-throughput sequencing on the Illumina MiSeq platform was used to investigate the structure and diversity of endophytic bacterial communities in seeds of 12 upland rice varieties from different areas of Yunnan Province. In total, 39 endophytic OTUs (0.68%) were found in all samples. At the phylum level, Proteobacteria was the most dominant phylum in the 12 seed samples (66.92–99.98%). At the genus level, Pantoea (9.75–99.24%), Pseudomonas (0.11–37.24%), Curtobacterium (0.01–19.90%), Microbacterium (0.01–14.95%), Methylobacterium (0.40–5.86%), Agrobacterium (0.01–4.53%), Sphingomonas (0.04–1.56%), Aurantimonas (0.01–1.45%) and Rhodococcus (0.11–1.09%) were the dominant genera present in all the upland rice seeds tested, and they represent the core microbiota of upland rice seeds. Environmental factors such as temperature, precipitation and altitude strongly influence the structure of the endophytic bacterial community in upland rice seeds. This study is of great significance for exploring the relationship between upland rice and its endophytic bacteria and for tapping drought-tolerant bacterial resources to improve the yield of local upland rice.


Introduction
Plant endophytes are an important microbial resource. They live in various tissues and organs of healthy plants during some or all stages of the plant life cycle without causing symptoms of infection. Through long-term interaction, plants and endophytes have formed a symbiotic unit that has become an important part of plant evolution. Many existing studies show that plants harbor large numbers of endophytic bacteria, which settle in the internal tissues of the host plant and form a series of mutually beneficial, symbiotic relationships (Hassani et al. 2018). Many studies have also found that the establishment of plant endophyte diversity and community structure is closely related to plant variety, genotype, growth environment and geographical location (Edwards et al.).
Rice is the main food crop in China, and China produces more rice than any other country in the world. However, the shortage of water resources in China has a great impact on rice cultivation, which has pushed researchers to step up their work on upland rice. Upland rice is an ecotype that is more tolerant and resistant to drought than paddy rice. It is mainly planted in the Yellow-Huaihe River Basin of China and in places where water resources are insufficient or unevenly distributed in space and time.
In-depth study of endophytic bacteria in upland rice seeds is not only conducive to improving yield, but also of great help in exploring the mechanism of drought tolerance. Using the Illumina MiSeq platform, we analyzed the diversity and community structure of endophytic bacteria in seeds of 12 upland rice varieties from Yunnan Province of China to explore their similarities and differences. On this basis, combined with the temperature, humidity and altitude of the origin area of each variety, we studied the effects of environmental factors on the diversity and community structure of endophytic bacteria in upland rice seeds. In addition, combined with our previous studies on the diversity and community structure of endophytic bacteria in upland rice seeds from different areas, these results allow us to further clarify the "core microbiota" of upland rice seeds.

Materials And Methods
The source of upland rice seeds
The 12 upland rice seed samples were provided by the Hunan Hybrid Rice Research Center. The origin areas and related information of all samples are shown in Fig. 1 and Table 1. The samples were transferred to sterile bags, sealed and stored at 4 ℃ until use.

Sample surface sterilization and treatment
Three replicates of each sample were selected in this study. Firstly, the husks of each upland rice seed sample were removed by a small sheller. Then, under aseptic conditions the following operations were performed in the order listed: husked seeds were washed three times with prepared sterile water; 5 g of seeds were placed in a clean and sterile 50-mL tube containing 25

DNA extraction
Five grams of surface-sterilized upland rice seeds from each sample were frozen in liquid nitrogen and quickly ground into a fine powder with a pre-cooled sterile mortar. DNA was then extracted using the FastDNA® SPIN Kit for Soil (MP Biomedicals, Solon, OH, USA) following the manufacturer's instructions.

Amplicon library preparation and sequencing
All PCR amplifications were performed using TransStart FastPfu DNA Polymerase (TransGen, Beijing, China). For rice seeds, primers 799F (5'-AACAGGATTAGATACCCTG-3') and 1492R (5'-GGTTACCTTGTTACGACTT-3') were used for the first-round amplification (a 4-7 bp barcode was added to the 5' end of primers 968F and 1378R). The 750 bp fragment amplified from endophytic bacteria was then excised and used as the template for the second-round amplification of the V6-V8 region (968F: 5'-AACGCGAAGAACCTTAC-3' and 1378R).
Briefly, paired sequence reads were assembled after removing raw reads with ambiguous bases or low quality, i.e. read length < 50 bp, average Q score < 25, or no exact match to the primer (pdiffs = 0) and barcode (bdiffs = 0). The high-quality DNA sequences were aligned to the SILVA reference database (v119) (Quast et al. 2013), and chimeric sequences were removed using the chimera.uchime module. The reads were then classified and grouped into OTUs (operational taxonomic units) at a threshold of 97% identity.
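The read-level filters listed above can be illustrated with a short sketch. It is not the pipeline used in this study. The function names `mean_phred` and `passes_quality_filter` are hypothetical, and the Phred+33 quality encoding and the check that the primer sits at the start of the read are assumptions made for illustration.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string (Phred+33 is assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_quality_filter(seq: str, quality: str, primer: str,
                          min_len: int = 50, min_q: float = 25.0) -> bool:
    """Return True if a read survives the filters named in the text.

    Rejected: reads shorter than 50 bp, reads with ambiguous bases,
    reads with an average Q score below 25, and reads that do not start
    with an exact copy of the primer (the pdiffs = 0 condition).
    """
    seq = seq.upper()
    # Length check first, so an empty quality string never reaches mean_phred.
    if len(seq) < min_len or len(seq) != len(quality):
        return False
    # Any character outside A/C/G/T counts as an ambiguous base.
    if any(base not in "ACGT" for base in seq):
        return False
    if mean_phred(quality) < min_q:
        return False
    return seq.startswith(primer.upper())


# Hypothetical read that starts with the 968F primer given in the text.
PRIMER_968F = "AACGCGAAGAACCTTAC"
read = PRIMER_968F + "ACGT" * 10
print(passes_quality_filter(read, "I" * len(read), PRIMER_968F))
```

Barcode matching (bdiffs = 0) would follow the same exact-match pattern and is left out to keep the sketch short.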

Data statistics
Community richness, evenness and diversity indices (Shannon, Simpson, ACE, Chao and Good's coverage) were calculated using mothur. Both PCoA and NMDS were performed in mothur based on the tayc matrix. The t-test (with 95% confidence intervals) was used to determine whether the means of the evaluation indices differed significantly, and p < 0.05 was considered significant. Taxonomy was assigned using the online RDP Classifier (Wang et al. 2007) with default parameters (80% threshold) based on the Ribosomal Database Project (Cole et al. 2009). Differences in genus and family abundance between samples were analyzed with Metastats (White et al. 2009). Spearman correlation coefficients between pairs of variables were calculated using the R command "cor.test". RDA at the genus level was performed with the "vegan" package in R, with average annual precipitation, average annual temperature and altitude as the explanatory variables.
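For readers unfamiliar with these indices, the sketch below computes four of them from a vector of OTU read counts for one sample. It uses common textbook forms (Shannon with natural logarithm, Simpson's dominance index, bias-corrected Chao1 and Good's coverage); whether these match the exact variants used by mothur in this study is an assumption, and ACE is omitted for brevity. The function name `alpha_diversity` and the example counts are hypothetical.

```python
import math
from typing import Dict, Sequence


def alpha_diversity(counts: Sequence[int]) -> Dict[str, float]:
    """Alpha diversity indices for one sample, given reads per OTU."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)                      # total reads in the sample
    if n < 2:
        raise ValueError("at least two reads are required")
    s_obs = len(counts)                  # observed OTUs
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons

    # Shannon index: H = -sum(p_i * ln p_i)
    shannon = -sum((c / n) * math.log(c / n) for c in counts)
    # Simpson index: D = sum(n_i * (n_i - 1)) / (N * (N - 1))
    simpson = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    # Bias-corrected Chao1 richness estimate
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    # Good's coverage: share of reads that belong to non-singleton OTUs
    coverage = 1 - f1 / n

    return {"sobs": float(s_obs), "shannon": shannon, "simpson": simpson,
            "chao1": chao1, "coverage": coverage}


# Hypothetical OTU counts for a single sample.
print(alpha_diversity([10, 5, 2, 1, 1, 1]))
```

Higher Shannon values and lower Simpson values both indicate a more diverse community, which is why the two are usually read together.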

Sequence accession numbers
The raw high-throughput sequencing data were submitted to the NCBI database with Accession number SRR13319808-SRR13319843 and BioProject number PRJNA688367.

Results
Diversity analysis of endophytic bacteria in upland rice seeds
According to the barcode and front-end primer information, the quality-controlled sequences were divided into 36 groups of sequence files, and a total of 2,089,709 high-quality sequences were obtained, with an average of 58,047 sequences per sample (Supplementary Table S1). Because of the large sample size, we used the summed values of the replicate samples for the final calculations. The original diversity data are shown in Supplementary Table S2. Based on the distances between sequences, the 16S rRNA gene sequences obtained were clustered into OTUs for species classification at the 97% similarity level. A total of 5,704 OTUs were generated from all samples, with the number of OTUs in each sample ranging from 322 to 1,527 ( 19H012, 19H013, 19H014, 19H015, 19H017, 19H018, 19H019, 19H022, 19H024, 19H029 and 19H032, respectively, which indicated that the endophytic bacteria in the seeds of different upland rice varieties were different. The α diversity indices of the samples include the ACE, Chao, Shannon and Simpson values; ACE and Chao are used to assess sample richness, and Shannon and Simpson are used to assess sample diversity. In general, the samples differed somewhat in diversity and richness. Samples 19H011, 19H015, 19H017, 19H018 and 19H024 differed significantly from the other samples in ACE, Chao, Shannon and Simpson values (Table 2). The ACE, Chao and Shannon values of sample 19H017 were significantly higher than those of the other samples, and the values of sample 19H013 were the lowest, which indicated that the diversity and richness of endophytic bacteria were highest in sample 19H017 and lowest in sample 19H013. These results are consistent with the rarefaction curves and rank abundance curves of all samples at OTU = 0.03 (Supplementary Fig. S1 and Fig. S2).

Bacterial endophytes community compositions and structures of upland rice seeds
The endophytic bacterial community composition of the 12 upland rice seed samples from different areas of Yunnan Province of China is shown at the phylum level in Fig. 2. The community had low diversity at the phylum level and consisted mainly of Proteobacteria and Actinobacteria. Proteobacteria was the main group, with a relative abundance between 66.92% and 99.98% across samples, followed by Actinobacteria, with a relative abundance ranging from 0.01% to 32.21%. At the genus level, the endophytic bacteria in all upland rice seed samples covered 148 genera. The main genera with high relative abundance were Pantoea, Pseudomonas, Curtobacterium, Microbacterium, Methylobacterium, Agrobacterium, Sphingomonas, Aurantimonas and Rhodococcus, with proportions of 9.75-99.24%, 0.11-37.24%, 0.01-19.90%, 0.01-14.95%, 0.40-5.86%, 0.01-4.53%, 0.04-1.56%, 0.01-1.45% and 0.11-1.09%, respectively (Fig. 3). Table 3 lists in detail the dominant genera and their proportions in each upland rice seed sample, which indicated significant differences in the abundance distribution of endophytic bacteria among the samples. The classification of samples at the 97% sequence similarity level (genus, top 10) in Fig. 4 also showed that the abundance distribution of endophytic bacteria differed among seed samples at the genus level.
To explore the differences in the community structure of endophytic bacteria among the 12 upland rice seed samples, PCoA (Principal Co-ordinates Analysis) and NMDS (Non-metric Multidimensional Scaling) were used to draw two-dimensional distribution diagrams of the seed samples (Fig. 5 and Fig. 6). The distance between samples in the two-dimensional diagram reflects the degree of community structure similarity: the closer two sample points are, the more similar their community structures. The PCoA and NMDS results showed that samples 19H019 and 19H029 were close to each other, and that samples 19H012, 19H015, 19H022 and 19H024 were close to one another, which indicated that the endophytic bacterial community structures were similar within each of these two groups.
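Both ordinations start from a matrix of pairwise dissimilarities between samples. The sketch below builds such a matrix under the assumption that the "tayc" matrix named in the methods corresponds to the Yue and Clayton theta dissimilarity (the `thetayc` calculator in mothur); the source does not confirm this. The function names and the toy count vectors are hypothetical.

```python
from typing import List, Sequence


def thetayc_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Yue-Clayton dissimilarity (1 - theta) between two OTU count vectors.

    With relative abundances p and q:
    theta = sum(p*q) / (sum((p - q)**2) + sum(p*q)).
    """
    if len(a) != len(b):
        raise ValueError("both samples must cover the same OTUs")
    total_a, total_b = sum(a), sum(b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs at least one read")
    pa = [x / total_a for x in a]
    pb = [x / total_b for x in b]
    shared = sum(x * y for x, y in zip(pa, pb))
    squared_diff = sum((x - y) ** 2 for x, y in zip(pa, pb))
    return 1.0 - shared / (squared_diff + shared)


def distance_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise dissimilarities between samples."""
    k = len(samples)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = thetayc_distance(samples[i], samples[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy OTU tables for three samples (rows) over two OTUs (columns).
for row in distance_matrix([[1, 1], [1, 0], [0, 3]]):
    print(row)
```

A value of 0 means identical relative abundance profiles and a value of 1 means no shared OTUs, so samples that plot close together in PCoA or NMDS are those with small entries in this matrix.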
Analysis of environmental factors affecting the community structure and diversity of endophytic bacteria in upland rice seeds
To further explore the impact of environmental factors in the origin areas on the endophytic bacterial community structure and diversity of the seed samples, we collected data on the temperature, precipitation and altitude of the corresponding areas in Yunnan Province of China. The specific data are shown in Table 4 and Fig. 1B. We then used RDA (Redundancy Analysis) to show the relationship between environmental factors, sample distribution and the main dominant bacteria, taking average annual precipitation, average annual temperature and altitude as variables (Fig. 7). In the RDA diagram, the overall effect of an environmental factor on endophytic bacteria in seeds is indicated mainly by the length of its arrow, while the degree of its influence on each genus is reflected by the cosine of the angle between them. Temperature, precipitation and altitude all have great effects on endophytic bacteria in upland rice seeds, with precipitation and altitude being the main influencing factors (Fig. 7). The main dominant genus, Pantoea, was significantly positively correlated with precipitation, temperature and altitude, and its correlation with altitude was the strongest. In contrast, the other dominant bacteria were negatively correlated with the environmental factors. The proportion of other bacteria was low, and they were not analyzed further.
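The way an RDA diagram is read, as described above, comes down to two vector quantities: the length of an arrow and the cosine of the angle between two arrows. The sketch below shows both calculations. The coordinates are hypothetical values chosen for illustration and are not taken from Fig. 7; the function names are likewise assumptions.

```python
import math
from typing import Sequence


def arrow_length(v: Sequence[float]) -> float:
    """Euclidean length of an arrow on the RDA1-RDA2 plane."""
    return math.hypot(v[0], v[1])


def cosine_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two arrows.

    Values near +1 suggest a positive association, values near -1 a
    negative association, and values near 0 little association.
    """
    norm = arrow_length(u) * arrow_length(v)
    if norm == 0:
        raise ValueError("zero-length arrows have no direction")
    return (u[0] * v[0] + u[1] * v[1]) / norm


# Hypothetical biplot coordinates (RDA1, RDA2), for illustration only.
altitude = (0.8, 0.3)
genus_a = (0.6, 0.2)    # points the same way as altitude
genus_b = (-0.5, -0.1)  # points the opposite way

print(arrow_length(altitude))
print(cosine_between(altitude, genus_a))
print(cosine_between(altitude, genus_b))
```

In this toy case the first genus has a cosine close to +1 with altitude and the second a cosine close to -1, which corresponds to the positive and negative correlations described in the text.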

Discussion
Food supply bears directly on the national economy and people's livelihoods, and food security is an important part of national security. The three major food crops in the world are wheat, rice and corn; wheat has the largest sown area, the largest yield and the widest distribution, while rice ranks second. According to statistics, rice is cultivated in 122 countries, with a perennial cultivation area of 140-150 million hectares. However, droughts caused by persistent climate instability and unpredictable rainfall patterns have had a significant impact on rice cultivation, especially in sub-Saharan Africa and Southeast Asian countries, which makes upland rice cultivation a promising way to address this problem.
In addition to Pantoea, the main shared genus of endophytic bacteria, several dominant genera, including Pseudomonas, Curtobacterium, Microbacterium, Methylobacterium, Agrobacterium, Sphingomonas, Aurantimonas and Rhodococcus, are also shared by all 12 upland rice seed samples, which may be due to the conservation of plant seeds. Interestingly, this result is basically consistent with our previous studies on endophytic bacteria in upland rice seeds, which again strongly indicates that these bacteria form the core endophytic bacterial community of upland rice seeds. Many species of these genera have also been reported to contribute to promoting plant growth and improving plant nitrogen fixation, resistance to harmful bacteria, cold resistance, drought resistance and tolerance to heavy metals. For example, Kumawat et al. (2019) found that the synergism of Pseudomonas aeruginosa and Bradyrhizobium sp. can improve plant growth, nutrient acquisition and soil health in soybean.
Another study found that the endophytic bacterium Pseudomonas stutzeri A15, isolated from rice, had strong nitrogen fixation ability and, after inoculation of rice, performed significantly better than chemical nitrogen fertilizer (Pham et al. 2017). The researchers also found that . Therefore, it is meaningful to explore drought tolerance and its mechanism from the perspective of endophytic bacteria in upland rice seeds. Notably, unlike the other dominant endophytic genera, Rhodococcus is rarely reported as a dominant genus among plant endophytes. Its ability to degrade a variety of toxic chemicals and to produce bioactive substances is the most frequently reported, and it has attracted increasing attention. Reports on Rhodococcus also show that some of its species not only produce cytokinin but can also colonize the root system and increase the plant biomass in polychlorinated biphenyl (PCBs)
Through this study, we also found that although endophytic bacterial diversity, abundance and community structure differed significantly among some of the 12 upland rice samples from origin areas in Yunnan Province of China, the differences among others were very small. The PCoA and NMDS results showed that all samples could be separated in the PC1-PC2 or NMDS1-NMDS2 coordinate system (Fig. 5 and Fig. 6), and comparison of the diversity indices showed differences among the upland rice varieties (Table 2). This further shows that differences in variety and genotype do have a certain effect on the diversity and community structure of endophytic bacteria in upland rice seeds.
A large number of studies on the community structure and diversity of endophytic bacteria in plant seeds have also found that although different varieties of plant seeds share dominant endophytic bacterial groups, they differ significantly in endophytic bacterial community structure and diversity. However, some upland rice seeds show no significant differences in endophytic bacterial diversity and community structure, and they lie particularly close together in the PCoA and NMDS results (Fig. 5 and Fig. 6). Therefore, the diversity and community structure of endophytic bacteria in upland rice seeds should be affected not only by variety
To further explore the factors influencing endophytic bacterial community structure and diversity in the upland rice seed samples, we investigated and compared environmental factors such as temperature, precipitation and altitude of the origin areas in Yunnan Province of China. These environmental factors differed among the origin areas, and RDA further showed that temperature, precipitation and altitude had great effects on endophytic bacteria. Among them, altitude at the origin areas had the greatest influence on Pantoea, the main dominant genus in upland rice seeds. Thus, the community structure and diversity of endophytic bacteria in upland rice seeds are affected not only by upland rice varieties and genotypes, but also by environmental factors such as temperature, precipitation and altitude. We can also conclude that the community structure and composition of endophytic bacteria in upland rice seeds are shaped jointly by variety, genotype and environment, rather than by a single factor.
In nature, plants live in association with microorganisms, so plant breeding is in effect the cultivation of symbionts formed by plants and microorganisms (Wang et al. 2015). Upland rice is drought resistant and drought tolerant, and its symbiotic microorganisms should have corresponding drought tolerance traits to adapt to the local environment. Using high-throughput sequencing technology to explore the community structure and diversity of endophytic bacteria in upland rice seeds from origin areas in Yunnan Province of China is of great significance for the subsequent discovery of drought-tolerant bacterial resources and the improvement of local upland rice yield. In addition, the mechanism of drought tolerance at the microbial level in upland rice can be revealed by comparing the microbial differences between upland rice and paddy rice, and this study lays a foundation for such work.

Conclusion
Exploring the endophytic microbial community structure and diversity of upland rice seeds is the basis for understanding the synergistic effects of endophytic bacteria in upland rice and the new functions and substances produced by that synergy. Pantoea, Pseudomonas, Curtobacterium, Microbacterium, Methylobacterium, Agrobacterium, Sphingomonas, Aurantimonas and Rhodococcus served as the major core endophytic bacteria in the twelve upland rice seed samples in this study. Overall, endophytic bacterial community structure and diversity differed among the seed samples, but the differences among some samples were not significant. These differences are related not only to the varieties of the samples themselves, but also to local temperature, precipitation, altitude and other environmental factors.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Compliance with ethical standards
This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest
The authors declare that they have no competing interests.

Tables
Table 1 Statistical table of
Table 4 Environmental data statistics of origin areas of upland rice

This map has been provided by the authors.

Figure 2
Relative abundance of shared/unshared phyla in each upland rice seed sample. The abscissa represents the sample name, the ordinate represents the relative abundance of species, each color represents one species, and the corresponding rectangular height represents the relative abundance of species. When judging the relative abundance of a species in a sample, you only need to look at the length of the color rectangle and do not need to accumulate the heights of other colors below the rectangle.

Figure 3
Relative abundance of shared/unshared genera in each upland rice seed sample. The abscissa represents the sample name, the ordinate represents the relative abundance of species, each color represents one species, and the corresponding rectangular height represents the relative abundance of species. When judging the relative abundance of a species in a sample, you only need to look at the length of the color rectangle and do not need to accumulate the heights of other colors below the rectangle.

Figure 4
Classification of samples at the level of 97% sequence similarity (genus, top 10). The horizontal axis represents the name of the sample, and the vertical axis represents the name of the endophytic bacteria in the sample. In the figure, the color gradients corresponding to relative abundance from 0 to 1 are white, light blue and dark blue. The darker the blue, the higher the relative abundance of species.

Non-metric multidimensional scaling (NMDS). Each point represents a sample, and the closer two points are, the smaller the difference in community composition between the two samples.