Development of SSR markers and genetic diversity evaluation of Mycocentrospora acerina causing round spot of Panax notoginseng in Yunnan Province, China

Background: Sanqi round spot, which is caused by Mycocentrospora acerina, is a destructive disease that limits the production of Panax notoginseng in Yunnan Province, China. However, the disease has not been studied comprehensively. Results: In the current study, we identified polymorphic M. acerina microsatellite markers using CERVUS 3.0 and compared the genetic diversity of isolates from P. notoginseng round spot using simple sequence repeat (SSR) markers and polyacrylamide gel electrophoresis. Thirty-two SSR markers with good polymorphism were developed using MISA and CERVUS 3.0. The genetic diversity of 187 M. acerina isolates was evaluated using 14 representative SSR primers. The polymorphic information content values of the 14 loci ranged from 0.813 to 0.946, and a total of 264 alleles were detected at the 14 microsatellite loci. The average expected heterozygosity was 0.8967. Conclusion: The 14 SSR primers can be used in diversity analysis and identification of M. acerina and its closely related species. The genetic diversity of M. acerina in Yunnan Province does not reflect geographic specificity.

In actual research, the pathogen samples collected within a limited time and a certain space are usually analyzed as one population [20]. The genetic structure of a plant pathogen population reflects the evolutionary potential and evolutionary history of the pathogen [21]. Therefore, the ultimate goal of studying pathogen population genetics is to determine the factors that play a major role in the population evolution of pathogenic fungi, and to understand how these evolutionary factors interact.
Throughout the history of agricultural production, plant pathogens have had important economic and social impacts on humans, and knowledge of their genetic diversity helps people understand and manage agricultural ecosystems. Pathogenic fungal populations change continually and adapt to changes in control methods and in their living environment. Eventually, the genetic structure of the pathogen population changes, causing plants to lose resistance [22]. The speed of strain evolution is mainly reflected by the amount of genetic variation in the pathogen population, which helps in judging how long disease prevention measures in agriculture will remain effective. Pathogen populations with complex genetic structures can often adapt faster to host disease resistance or to fungicides. Therefore, understanding the variation and distribution of the genetic structure of phytopathogenic fungal populations has important guiding significance for disease resistance breeding, the rational deployment of disease resistance genes, and the rational use of fungicides in production. The rapid development of DNA-based molecular biology has made it possible to accurately detect genetic variation in plant pathogen populations [23].
Simple sequence repeat (SSR) is the simplest among numerous molecular marker methods used to evaluate levels of diversity in species. With the publication of genomic databases, SSR is convenient in the study of genetic diversity [24,25,26].
Pathogenic fungal populations exhibit variations across different locations as adaptations to changes triggered by control methods and diversity in their living environments [27]. The genetic structure of a pathogenic fungus population can also change and lead to loss of resistance in host plants [28]. Development of molecular markers for M. acerina could offer a more comprehensive genetic basis for M. acerina studies, which would enhance efforts to control Sanqi round spot. Therefore, understanding the variation in genetic structure and distribution of plant pathogenic fungi populations could enhance our understanding of the distribution of genetic information and facilitate the formulation of appropriate disease control strategies [29,30].
In this study, we analyzed the SSR characteristics in the M. acerina genome and developed SSR primers from M. acerina. Moreover, the effectiveness of the primers in analyzing population genetic diversity structure in M. acerina was analyzed.

Results
Isolation and identification of strains
All the isolates were identified based on their colonies and conidia (Figure 1A, B), and symptoms on inoculated plants were similar to those of plants growing in the field (Figure 2C, D). Phylogenetic trees, constructed based on internal transcribed spacer (ITS1/4) sequences, showed that the isolates were grouped with M. acerina (Figure 3).

Genomic SSR analysis
A total of 8250 microsatellite sequences with 1 to 6 base repeats were obtained from the M. acerina genome. The average length of SSRs was 26 bp. SSR lengths varied among repeat types.
Among the SSRs, there were 3379 mono-nucleotide repeats, which accounted for 40.96% of the total repeats. Among the mono-nucleotide repeats, 113 were repeated more than 30 times. In addition, there were 2137 tri-nucleotide SSRs (25.90%) and 179 penta-nucleotide repeats, which accounted for the lowest proportion (2.17%), with hexa-nucleotide repeats accounting for 3.71% of the total repeats. The maximum repeat numbers of the six SSR types were 81, 40, 155, 217, 72, and 144 (Figure 4).
Such SSRs with abundant repeats are beneficial to the development of molecular markers.
Among the mono-nucleotide SSRs of M. acerina, poly-T or poly-A repeats predominated: there were 2449 of them, accounting for 72.47% of the mono-nucleotide repeats. There were 930 repeats with C or G bases, far fewer than those with T or A bases. Among the di-nucleotide repeats, there were four types of SSR sequences: AC, AG, AT, and CG. AG/CT repeats accounted for 49.84% of all the di-nucleotide repeats, followed by AC/GT and AT/AT. CG/CG type repeats were the least common (9%). Among the tri-nucleotide repeats, AAC/GTT, AAG/CTT, and AAT/ATT were the main types. Overall, AAG repeats were the most abundant, accounting for 22.28%, and CCG repeats were the least abundant (3.89%) (Figure 5).
The numbers and proportions of tetra-, penta-, and hexa-nucleotide repeats in the M. acerina genome are listed in Supplementary Tables 1, 2, and 3. There were 28 types of tetra-nucleotide repeats, among which ATCC had the highest content (12.48%) and CCCG had the lowest content (0.16%). In addition, there were 61 types of penta-nucleotide repeats, with AACAC and AATCC accounting for the largest proportions. Hexa-nucleotide repeats had the most types (103), of which AACCCT was the most common (44).

Polymorphism of SSR Primers
Thirty-two pairs of primers had highly polymorphic loci (PIC > 0.5) and could be used as SSR markers in the construction of an M. acerina genetic map and for genetic diversity analyses. PIC is commonly used to assess the degree of gene variation. A locus can be considered a highly polymorphic marker when its PIC value exceeds 0.5. According to the SSR primer data (Table 2), the average PIC was 0.6492, the average allele number per locus was 5.147, the average proportion of locus types was 1.00, and the average expected heterozygosity (He) was 0.7212.
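As an illustration of the two statistics reported above, the following minimal Python sketch computes PIC and He from the allele frequencies at a single locus. It assumes the standard Botstein-type PIC formula and the uncorrected form He = 1 − Σp²; CERVUS may apply a sample-size correction to He, so values need not match its output exactly. The function names and the example frequencies are hypothetical and are not taken from this study.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without sample-size correction (assumption)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


# Hypothetical allele frequencies at one locus (must sum to 1).
example_freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(example_freqs), 4))
print(round(pic(example_freqs), 4))
```

Under these formulas PIC is never larger than He, and a locus with the hypothetical frequencies above would pass the PIC > 0.5 threshold used here for a highly polymorphic marker.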

Diversity between populations
Nei's genetic diversity (0.0896) and Shannon's information index (0.1712) were the highest in the Honghe population (HH), followed by the Puer population (LC), and the lowest in Lijiang, at 0.0842 and 0.143, respectively (Table 3). The genetic diversity in different populations was relatively low.
The average observed allele number (Na) was 2.00, the average effective allele number was 1.11, Nei's gene diversity (h) was 0.0908, and the average Shannon diversity index was 0.1761. In addition, the total genetic diversity (Ht) was 0.0909, the intrapopulation genetic diversity (Hs) was 0.0884, and the genetic differentiation index (Gst) was 0.0277, which indicates that only 2.77% of the genetic variation occurred among populations. The estimated level of gene flow (Nm) was 17.5757, which indicates that there were numerous genetic changes attributed to gene flow among regions, and that gene flow was not the primary factor influencing genetic diversity in the population.
Among the six populations, the populations from Kunming and Honghe had the greatest genetic similarity (0.9988) and the smallest genetic distance (0.0012). In addition, Lijiang and Lancang had the lowest genetic similarity (0.9931) and the largest genetic distance (0.007) (Table 4). Generally, the genetic distances between the populations above were small, and their genetic similarity coefficients were close to 1, which indicated that the genetic relationships between strains in each population were close, and that there was low genetic differentiation among different populations.
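The relationship between the statistics in this paragraph can be checked with a short Python sketch. It assumes Nei's definition Gst = (Ht − Hs)/Ht and the estimate Nm = 0.5(1 − Gst)/Gst that population genetics packages such as POPGENE commonly report; the software and formula actually used are not stated in this section, so both are assumptions. The function names are hypothetical.

```python
def gst(ht, hs):
    """Coefficient of gene differentiation: share of total diversity among populations."""
    return (ht - hs) / ht


def gene_flow(gst_value):
    """Nm estimated from Gst as 0.5 * (1 - Gst) / Gst (assumed formula)."""
    return 0.5 * (1.0 - gst_value) / gst_value


# Values reported in the text, rounded to four decimals.
ht, hs = 0.0909, 0.0884
print(round(gst(ht, hs), 4))       # recomputed from rounded Ht and Hs
print(round(gene_flow(0.0277), 2)) # recomputed from the rounded Gst
```

Recomputing from the rounded inputs gives values close to the reported Gst (0.0277) and Nm (17.5757); the small differences are consistent with rounding of Ht, Hs, and Gst.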
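The genetic similarity and genetic distance values quoted above agree with Nei's standard relationship D = −ln(I), where I is the genetic identity between two populations. The Python sketch below applies that relationship to the two reported similarity values; that the authors used Nei's measure is an assumption based on this numerical agreement, and the function name is hypothetical.

```python
import math


def nei_distance(identity):
    """Nei's standard genetic distance from genetic identity: D = -ln(I)."""
    return -math.log(identity)


# Similarity values reported for Kunming/Honghe and Lijiang/Lancang.
for identity in (0.9988, 0.9931):
    print(identity, round(nei_distance(identity), 4))
```

Because I is close to 1 for every pair of populations, D is approximately 1 − I, which is why the distances are all of the order of 0.001 to 0.007.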

Cluster analysis
Cluster analysis revealed that the maximum similarity coefficient was 0.97 and the minimum was 0.83 among 187 M. acerina strains collected from 12 counties of 6 prefectures (cities) in Yunnan Province (Figure 6).

Discussion
The biology of M. acerina and round spot of P. notoginseng in Yunnan
In the 20th century, there were many studies on M. acerina, mainly focusing on its host diversity and transmission methods [13,17,31]. In the 21st century, there have been few studies on M. acerina. Sébastien Louarn studied the influence of M. acerina on the polyacetylenes and 6-methoxymellein in organic and conventionally cultivated carrots (Daucus carota) during storage [32].
Since the discovery in 1997 that M. acerina can infect P. notoginseng, an economically important Chinese medicinal crop in Yunnan Province, our laboratory (Key Laboratory of Agricultural Biodiversity and Pest Control of the Ministry of Education) has examined the biological characteristics of M. acerina and studied its spread in the field extensively. M. acerina is a low-temperature-loving fungus: its optimum growth temperature is 14-22℃, and its conidia lyse when the temperature exceeds 32℃. The most recently measured size of M. acerina conidia is (137.36 ~ 486.24 µm) × (4.35 ~ 16.46 µm) (n = 100), and a single conidium can cause infection (Supplementary Figs. 1 and 2). M. acerina causes initial infection through chlamydospores that persist in the soil, and spreads in the field through conidia produced on the surface of infected leaves, causing re-infection. Conidia are mainly spread by rain splash. P. notoginseng has a serious continuous cropping problem: a field cannot be replanted for at least 10 years. Therefore, although it was planted only in Wenshan in the 1990s, cultivation has now spread to Kunming, Honghe, Lijiang, and Jianshui. Wherever P. notoginseng is planted, round spot disease follows. It is not known whether this is because M. acerina was already present locally or because the pathogen spreads via various media. The results of this study indicate that the genetic distances between the M. acerina populations in different regions are small and the similarity is high, which may indicate frequent exchange of M. acerina between regions, for example through cross-regional transportation of seedlings and dissemination via other media. In addition, the prevention and treatment of P. notoginseng round spot is mainly concentrated in the rainy season (June to September). Using facility cultivation to install a rain-proof film in the rainy season can prevent P. notoginseng round spot and reduce the use of chemical pesticides (Supplementary Fig. 3). The reduction in chemical pesticide use can reduce the survival pressure on M. acerina, which can also affect the genetic relationships between populations in different regions.

Features of SSR loci in M. acerina
A total of 8250 repeats were obtained from the screened SSR sequences, which indicated that the number of SSRs in the M. acerina genome was high compared with some eukaryotes [33,34]. The analysis of microsatellite sequences in M. acerina could enhance our understanding of its genome structure, especially the composition of non-coding regions, and of the mechanisms of pathogenicity and heredity in M. acerina at the genome level. Among all SSR types, A and T are abundant, which is consistent with the SSR loci results in most eukaryotic genomes, probably due to the transformation of methylated C residues into T residues [35]. According to Velascor [36], the presence of a large number of short repeat sequences indicates that a species has a high mutation frequency, while species with high proportions of long repeat motifs generally have relatively short evolutionary times or low mutation frequencies. A large number of short mono-, di-, and tri-nucleotide repeats were observed in the genome of M. acerina, suggesting that M. acerina had a relatively high mutation frequency or a relatively short evolutionary time [37].
With advancements in genome sequencing technologies, molecular marker studies have become more cost-effective [38].
Based on genomic data, we obtained 8250 SSRs, which accounted for 0.55% of the whole genome sequence. By comparison, SSR sequences accounted for 0.27% of the whole-genome sequence in Fusarium graminearum [39] and 0.21% in Sphacelotheca reiliana [40]. In this study, more than 100 pairs of primers were designed, of which only 32 pairs were polymorphic, probably because most of the selected primers were located in the coding regions of the genome, with only a few located in the non-coding regions. Studies have demonstrated that SSRs in coding regions often exhibit low polymorphism, and SSR markers should be designed within non-coding regions as far as possible, because coding regions are under much greater selection pressure than non-coding regions and are relatively conserved in the course of species evolution, while non-coding regions are more likely to evolve or mutate [41,42].

Genetic diversity of M. acerina in Yunnan
In the current study, the PIC of polymorphic loci ranged from 0.53 to 0.8, which was high compared with the PIC in other eukaryotes. For example, PIC ranged from 0.3 to 0.4 in Dactylis glomerata L. [43], 0 to 0.756 in Magnaporthe oryzae [44], and 0.305 to 0.726 in Panonychus citri [45]. According to the results based on primer polymorphism, 14 SSR loci were used to analyze the genetic diversity of M. acerina populations. The PIC of the screened primers was high; however, after population analysis, the genetic diversity of M. acerina did not reflect geographic specificity. A potential reason is that the SSR primer loci are within the coding regions of the genome, which are highly conserved [46]. Judicious selection of primers could improve the accuracy of results. The genetic diversities of Pyricularia oryzae Cav. and Puccinia striiformis f. sp. tritici in Yunnan Province have been reported to be high [47,48]. Therefore, the genetic diversity obtained for M. acerina in this study could be due to the single genetic background of M. acerina as a quarantine pest in China [49] or its stable survival in areas with highly homogenous ecological environments for prolonged periods.
The automatic nucleic acid protein analyzer used in this study is a novel instrument that can be applied in population genetics analyses. Compared with polyacrylamide gel electrophoresis, its operation is relatively simple and time saving, and it can simulate the electrophoretogram and directly read bands, which facilitates analysis procedures. In addition, it can be used directly for DNA and protein sample analysis [50,51].
Some of the primary factors influencing the evolution of the population genetic structures of pathogenic fungi include population size, reproductive mode, and genetic drift [52,53]. In this study, 189 M. acerina strains were selected, and the population size was moderate. The mode of reproduction of M. acerina in the field is asexual [54], which, to a certain extent, is not conducive to its genetic variation and the evolution of its populations. Pathogenic fungi are small individuals and easily experience genetic drift by natural or artificial means, and genetic drift is generally considered to hinder the evolution of organisms [55].
In the current study, there were no significant correlations in genetic diversity among strains from different geographical sources. Continuous selection and mutation of pathogenic genes will lead to homozygous individual genes, thus reducing the genetic diversity level of the M. acerina population. These factors can partly explain why the isolates from the same region did not all cluster into one group, and why some isolates from different regions had very high genetic similarity coefficients. Another possible explanation is that the host P. notoginseng is grown only in Yunnan and Guangxi, China, and Wenshan in Yunnan is its place of origin. The other sampling sites in this paper have begun planting P. notoginseng only gradually over the past 5-6 years. The current results show that, owing to the low genetic diversity of M. acerina populations, the disease can be effectively prevented in this area through timely removal of diseased leaves, rain-proof cultivation, and alternating use of chemicals [56].

Conclusions
In this study, we developed 14 SSR primers for M. acerina that can be used in diversity analysis and identification of M. acerina and its closely related species. The genetic diversity of M. acerina in Yunnan Province does not reflect geographic specificity.

Methods
Strain isolation and observation
M. acerina strains were collected from six major P. notoginseng production regions in Yunnan Province: Honghe, Wenshan, Qujing, Kunming, Lijiang, and Puer (Table 1). The geographical distribution of the samples is shown in Figure 8. M. acerina isolates were obtained using tissue isolation [57]. Tissue from the junction between healthy and diseased tissue was washed with sterilized water, immersed in alcohol (75%) for 2-3 min, washed again with sterilized water, and dried on sterilized filter paper. Afterward, the samples were transferred onto PDA medium and cultivated at 20°C for 4 d. The isolates were then verified according to Koch's postulates and purified by hyphal tipping. M. acerina was identified based on morphological characteristics and ITS sequences. PCR was performed using a T1 thermocycler (Biometra, Germany), with initial denaturation at 94°C for 5 min, followed by 35 cycles of 94°C for 45 sec, 60°C for 45 sec, and 72°C for 90 sec, and a final extension at 72°C for 10 min. Amplification products were separated by electrophoresis on 1% agarose gels in 0.5× TAE buffer, using a 2000-bp DNA ladder as a DNA molecular weight marker. The PCR products were sequenced at Kunming Shuoqing Biological Engineering Technology Co. Ltd. Molecular Evolutionary Genetics Analysis (MEGA 5.1) was used to construct a phylogenetic tree based on the neighbor-joining method.
Simple sequence repeat screening
SSRs were screened using MISA (http://pgrc.ipk-gatersleben.de/misa/) based on the whole genome data of M. acerina [59,60]. MISA is a script written in Perl that identifies SSRs from genome FASTA files [61]. It is simple to run offline and has modest hardware requirements, and it has become the tool of choice for most SSR researchers. In the SSR parameter settings, the minimum numbers of repeats for hexa-, penta-, tetra-, tri-, di-, and mono-nucleotide motifs were set to five, five, five, five, six, and ten, respectively.
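The effect of these repeat thresholds can be illustrated with a minimal Python sketch based on regular expressions. It is not MISA and does not reproduce its handling of compound or interrupted microsatellites; it only finds perfect tandem repeats that meet the minimum repeat numbers given above. The names MIN_REPEATS and find_ssrs and the example sequence are hypothetical.

```python
import re

# Minimum repeat numbers per motif length, as stated in the text:
# mono 10, di 6, tri 5, tetra 5, penta 5, hexa 5.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT"),
            # so each tract is reported under its smallest motif length.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequence: (A)12, (AG)7, and (ACG)4 separated by spacers.
example = "GG" + "A" * 12 + "CCGT" + "AG" * 7 + "TTC" + "ACG" * 4
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

In this example the (ACG)4 tract is not reported because it falls below the minimum of five repeats for tri-nucleotide motifs, whereas the mono- and di-nucleotide tracts exceed their thresholds.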
The genome (Accession: PRJNA809504) used in this study was sequenced using an Illumina HiSeq 2500. Sequencing generated two sets (long and short) of 101-bp paired-end short-read data. A total of 20.502 Gb of raw M. acerina sequence was produced and then quality filtered to remove linker sequences, low-quality bases, and low-complexity sequences. The final assembly was obtained with a K-mer parameter of 75 after automatic assembly and gap filling using SOAPdenovo software (BGI, Shenzhen, China). The assembled size of the M. acerina genome is 39 Mb in total. N50 is an index of the continuity of a genome assembly: contigs are sorted by length from largest to smallest, and N50 is the length of the contig at which the cumulative length first reaches half of the total assembly length. The larger the N50 value, the better the continuity of the assembled contigs. In this study, the contig N50 of the M. acerina genome assembly was 151 kb and the scaffold N50 was 567 kb, indicating that the assembly quality is credible.
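The N50 calculation described above can be sketched in a few lines of Python. This is a generic illustration of the standard definition and not the code used by the assembler; the function name n50 and the contig lengths in the example are hypothetical.

```python
def n50(lengths):
    """Length of the contig at which the cumulative sum, taken from the
    longest contig downward, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (in kb) for a toy assembly.
print(n50([80, 70, 50, 40, 30, 20]))
```

The same function applies to scaffold lengths, which is how a contig N50 and a scaffold N50 can be reported separately for one assembly.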
Primer design
SSR primers for the whole genome of M. acerina were designed on the PRIMER 3.0 website (http://Frodo.wi.mit.edu/primer3) [62] based on the screened SSR results. The primers were synthesized by Shuoqing Biological Engineering Technology Co. Ltd.
For the initial screening, 118 SSR primer pairs were designed and used to amplify 24 isolates from different sources in 20-μL PCR mixtures. All the amplification products were separated on 2.5% agarose gels in 0.5× TAE buffer, using a 100-bp DNA ladder as a DNA molecular marker (Raju, Sheshumadhav and Murthy, 2008). The primers that produced clear, specific bands of the target size (100~500 bp) were then tested using an automatic electrophoresis apparatus (Qsep 100TM). Finally, the polymorphism of primers was assessed using CERVUS 3.0.

Figure 1 The geographical locations of six Mycocentrospora acerina populations in Yunnan (green region).