Comparative transcriptomes and genetic diversity assessment of two Sichuan peppers (Zanthoxylum bungeanum and Zanthoxylum armatum)

Background: Species of Zanthoxylum (Sichuan pepper) are diverse and valuable economic trees that have been cultivated across Asia for of years for their medicinal properties. The current lack of transcriptome and EST-SSR data for these species represents a constraint with regards to identifying increasingly more diverse cultivars in China. This study was conducted to compare the transcriptome profiles of two Zanthoxylum species (Z. bungeanum and Z. armatum) and develop the EST-SSR markers to assess the genetic diversity in Sichuan pepper. Results: A total of 36.76 G high-quality clean data were screened for subsequent analysis. For Z. bungeanum and Z. armatum, 64,944 and 75,669 unigenes were obtained, respectively. After comparing different databases, we found that the highest number of unigenes (43,3198 and 49,638 for Z. bungeanum and Z. armatum, respectively) were annotated using the NR database. In addition, six pairs of EST-SSR markers were selected to determine the genetic diversity of 125 samples of three Zanthoxylum species. A population clustering dendrogram indicated that there was an average of four samples per population, with the number of allelic genes (Na) ranging from 1 to 1.4510 (average of 1.1462). The number of effective allelic genes (Ne) ranged from 1 to 1.2744 (average of 1.0971), whereas genetic diversity (Nei) ranged from 0 to 0.1486 (average of 0.05512) and Shannon genetic diversity (I) ranged from 0 to 0.2177 (average 0.08123). In individual clustering dendrograms, the 125 samples were divided into five cohorts. Cohort I contained Z. bungeanum and its varieties, cohort II included Z. armatum and its varieties, cohort III contained mostly Z. piperitum and two other Z. bungeanum samples, and cohorts IV and V each contained only two samples, with genetic distances of 0 to 1.14 between them. Conclusion: In this study, we analyzed the comparative transcriptome data of two Zanthoxylum species, for which we assessed genetic diversity using EST-SSR billion reads of high quality were maintained, and 36.76 G high-quality clean data were used for subsequent analyses. We used six pairs of EST-SSR primers to determine the genetic diversity of 125 samples. Our results indicated that these EST-SSR primers were able to identify Z. bungeanum , Z. armatum, and Z. piperitum . According to the population dendrogram constructed in this study, there were an average of four samples per population. The number of allelic genes (Na ) ranged from 1 to 1.4510 (average 1.1462), the number

27.03% of biological process and 17.48% among all the annotated isogenes. The category "cell" accounted for the highest number of isogenes (7,949) annotated as cellular component, accounting for 22.4% of cellular components and 8.91% of all annotated isogenes. Within the category molecular function, "binding" accounted for most (12,873) of the annotated isogenes and accounted for 46.03% of isogenes assigned to molecular function and 14.43% of all annotated isogenes.
For Z. armatum, a total of 137,567 isogenes were annotated with respect to GO terms (Additional file 4: Fig. S2). In detail, 66,368, 31,517, and 17,540 isogenes were annotated as biological process, cellular component, and molecular function, respectively. With regards to biological process, the highest number of isogenes were assigned "metabolic process," accounting for 26.43% and 12.75% of isogenes annotated as biological process and all annotated isogenes, respectively. For cellular component, the highest number of annotated isogenes (8,840) were assigned to "cell," accounting for 22.28% and 6.43% of isogenes annotated at cellular component and total annotated isogenes, respectively. In terms of molecular process, 14,252 isogenes were assigned to "binding," accounting for 45.22% of isogenes annotated as molecular process, and 10.36% of all annotated isogenes.

KEGG pathways
KEGG pathways consist of cellular processes (cohort A), environmental information processing (cohort B), genetic information processing (cohort C), metabolism (cohort D), and organismal systems (cohort E). As shown in Fig. 2a. 25,520 isogenes were annotated to one of these five pathways. Of those isogenes annotated, the highest number (12,085) were assigned to cohort A and accounted for 47.36% all isogenes. A further 4,879 isogenes were mapped to cohort B (19.12%), whereas 3,908 (15.31%), 2,242 (8.79%), and 2,406 (9.43%) isogenes were assigned to cohorts E, C, and D, respectively. Among those isogenes annotated to cohort A, the largest number (4,374,36.19%) were annotated as "global and overview maps," whereas for cohort B, the largest number of isogenes (2,345,48.06%) were annotated as "translation." In cohort E, "endocrine system" accounted for the highest number of annotated isogenes (826, 21.14%), whereas for cohort C, the highest number of isogenes (1,518,67.71%) were annotated as "signal transduction." In cohort D, the highest number of isogenes (942, 39.15%) were annotated as "transport and catabolism." Figure 2b shows the results of KEGG pathway analysis for Z. armatum. In total, 27,917 isogenes were annotated to the aforementioned five pathways. Of these, the highest number (13,964,50.02%) were annotated to cohort A, whereas 4,925 isogenes (17.64%) were annotated to cohort B. Cohort E ranked 3 rd , with 3,953 (14.16%) of the annotated isogenes, whereas, 611 (9.35%) and 2,464 (8.33%) isogenes were annotated to cohorts D and C, respectively. In detail, 5,137 isogenes were classified in "global and overview maps" which accounted for 36.79% of the isogenes annotated to cohort A. For cohort B, the highest number of isogenes (2,381,48.35%) were assigned to "translation," whereas in cohort E, "endocrine system" accounted for the highest number (855, 21.63%) of annotated isogenes.
For cohort C, "signal transduction" accounted for the highest number (1,635,66.36%) of annotated isogenes, whereas the highest number of isogenes in cohort D (1,078,41.29%) were annotated as "cell growth and death" These findings indicate that Z. bungeanum and Z. armatum differ only in respect to the highest number of isogenes annotated to pathways in cohort D.

Unigene annotation in different databases
A comparison of 64,944 unigenes using different databases (NR, COG, KEGG, and GO) indicated that, for Z. bungeanum (Fig. 2c), the highest number of unigenes (43,318, 66.70% of the total) were annotated using the NR database, whereas the COG databases annotated the fewest unigenes (4,153, 45.83%). Among the total 64,944 unigenes, 2,143 were assigned by all four databases, and 15,700 unigenes were annotated by three databases, amongst which the NR, KEGG, and GO databases accounted for the highest number of unigene annotations (14,201). Furthermore, 14,901 unigenes were annotated by two databases. In detail, 11,401 unigenes were annotated by databases NR and GO, which accounted for the highest percentage. Some unigenes were only annotated by a single database, with the NR and COG databases annotating the highest (43,318) and lowest (4,153) numbers of these unigenes, respectively. For Z. armatum (Fig. 2d), 75,669 unigenes were assembled.
According to the blast results obtained using the aforementioned four databases, the highest number of unigenes (49,368, 65.60%) were annotated using the NR database, whereas the fewest unigenes (6,130,8.10%) were annotated using the COG database. Furthermore, 2,658 and 17,193 unigenes were annotated by all four and three databases, respectively. Among these, the largest number of 6 unigenes (15,035) were annotated collectively by NR, KEGG, and GO. We also found that 17,905 unigenes were annotated simultaneously by two databases, NR and GO accounting for the highest number (13,249) of annotated unigenes. Among the unigenes that were only annotated by a single database, the NR and COG databases annotated the highest (12,734) and lowest (6,130)  For Z. armatum (Table 1), we assembled 75,669 unigenes, with a total length of 58,975,053. We detected 15,096 SSR loci within 12,612 unigenes. The percentage incidence and occurrence of these SSRs was 16.67% and 19.95%, respectively. The mean distance between two SSR loci was 3.91 kb, and we found that 2,101 unigenes harbored more than two SSR loci. Among all SSR repeats, the highest number of loci (7,830) were mononucleotides, followed in numerical abundance by trinucleotides (3,711), dinucleotides (3,169), tetranucleotides (223), hexanucleotides (112), and pentanucleotides (51). In summary, the number of mononucleotide SSR loci in Z. armatum (7,830)exceeded that of all other repeat types combined (7,266) and also the number of mononucleotide loci in Z. bungeanum. In addition, the numbers for each type of SSR repeat and SSR 7 loci in Z. armatum exceeded those in Z. bungeanum.
Statistical analyses of numbers and distribution indicated that the percentage incidence of SSR loci in Z. bungeanum was 19.63% and the average distribution distance was 4.24 kb. Among these loci, mononucleotides had the highest incidence (9.69%), followed by trinucleotides (4.76%) and dinucleotides (4.61%). In terms of the average distance between loci, the distribution distance between mononucleotides was the shortest (8.59 kb), followed by trinucleotides (17.49 kb) and dinucleotides (18.07 kb). For Z. armatum, the percentage incidence of SSR was 19.95% and the mean distance between loci was 3.91 kb. Mononucleotides were found to have the highest percentage incidence (10.35%), followed by trinucleotides (4.90%) and dinucleotides (4.19%). In terms of the average distance between loci, we found that the shortest mean distance occurred between mononucleotides (7.53 kb), whereas those between trinucleotide and dinucleotides were 15.90 kb and 18.61 kb, respectively. These findings accordingly indicate that the density of SSRs in Z. armatum is greater than that in Z. bungeanum, although both these species show similar SSR distributions and frequencies. Furthermore, we also compared the SSR distribution distances in the two Zanthoxylum species with those in 15 species of woody and herbaceous plants (Additional file 5: Fig. S2). We accordingly found that the lowest average distance between SSRs (1.61 kb) occurred in Camellia sinensis, followed by the 1.68 kb in Rosa roxburghii and the 2.59 kb in Hevea brasiliensis. For Z. armatum and Z. bungeanum, the average distribution distances of SSRs were 3.91 kb and 4.24 kb, respectively, which places these two species at 4 th and 5 th among the 17 analyzed species.
Consequently, the frequencies of SSR in Z. armatum and Z. bungeanum were not low; moreover, the SSRs in these two species tend to be abundant, indicating the potential for further discoveries.

Primer polymorphisms
In our previous study, we randomly selected 120 primer pairs for screening, among which, six pairs were used in the present study to analyze 125 samples of Zanthoxylum species based on capillary electrophoresis technology. Four of these primer pairs were derived from Z. bungeanum, whereas the other two werefrom Z. armatum. Using these primers pairs, we succeeded in amplifying 51 allelic loci with the mean number of 8.5 polymorphic loci. The average number of amplified allelic genes (Na) 8 was 2, the average number of effective alleles (Ne) was 1.3093, the averagefor Nei's genetic diversity was 0.1916, and the Shannon information index (I) was 0.3037. On the basis of these values, particularly those for Na and Ne, there initially appeared to be a low genetic diversity among the 125 samples.  On the basis of the population dendrogram shown in Fig. 4, the following information and results can be obtained. Firstly, we successfully clustered the Zanthoxylum samples and found that the genetic distances ranged from 0.0050 to 0.4122, with an average distance of 0.1878. Secondly, EST-SSR primers were able to distinguish and identify each of the 125 Zanthoxylum samples. Thirdly, the populations were divided into four cohorts. Cohort I almost exclusively comprised Z. bungeanum cultivars, whereas cohort II contained Z. armatum varieties, cohort III included 'JLHJ-ZB' and 'PTSJ-ZP' (Z. piperitum), and cohort IV contained only a single sample 'MXDHP-ZB' (Z. bungeanum). Not only were Z. bungeanum varieties readily detected, but it was also possible to determine that the samples originated from Longnan City, Gansu Province. Therefore, we suspect that genetic distances relate to 9 this area. However, the members of cohort II were all Z. armatum varieties, which, although originating from a range of different localities,show very close genetic distances, and accordingly, do not display regionalism. The results obtained for cohort III and cohort IV were notable in that cohort III contained 'PTSJ-ZP,' which is a variety of Z. piperitum and would have been expected to have clustered with 'CCSJ-ZP' and 'LJSJ-ZP'. We did, nevertheless, succeed in classifying a number of other varieties, including 'MXDHP-ZB,' 'HYDHP-ZB,' 'HCDHP-ZB,' 'Yuexi Gongjiao,''TJ-ZA','JYQHJ-ZA,' and 'JYQHJ-ZA'." In the present study, however, the EST-SSR markers could not unambiguously authenticate Zanthoxylum varieties, thereby indicating that this technique is probably more applicable for the identification of species rather than varieties. Thirdly, the results indicated that clusters do not closely reflect regional diversity. Finally, the data obtained tend to indicate that Z. bungeanum and Z. armatum have low intraspecific variation and genetic differentiation. Accordingly, it would appear that the Zanthoxylum species examined in the present study do not show high genetic diversity. piperitum andtheoreticallyshould be cluster together, although we observed that only 'CCSJ-ZP' and 'LJSJ-ZP' clustered. The genetic distance between 'CCSJ-ZP' and 'PTSJ-ZP' was 0.2134, whereas that between 'CCSJ-ZP' and 'LJSJ-ZP' was 0.2439, and that between 'PTSJ-ZP' and 'LJSJ-ZP' was 0.2429. All of these values exceeded the average genetic distance of 0.1884, thereby indicating that the genetic distances among these cultivars were relatively larger. Moreover, based on molecular marker analysis, 'Putao Shanjiao' probably differs from 'Chaochang Shanjiao' and 'Liujin Shanjiao,' which should be investigated further.
Sichuan pepper is generally characterized by producing only female flowers and thus reproduction occurs almost entirely through apomixis. In contrast, however, sample 'X-ZB' produces only male flowers, whereas three 'HCDHP-ZB' samples have been exposed to irradiation in space. According to dendrogram constructed in the present study, these samples were assigned into cohort I but did not show any consistent patterns. Although a wild population of Z. armatum (ZA-WP) and the cultivar 'TJ-ZA' clustered together in cohort II, there was a large genetic distance between the wild and cultivated populations, indicating their dissimilarity. Finally, 'YXGJ-ZB' which clustered in cohort I, appeared to be relatively distinct from other Z. bungeanum varieties. To date, there have been no reports on the differentiation between 'YXGJ-ZB' and other Z. bungeanum varieties, and therefore it is conceivable that there are indeed differences between them which could be confirmed on the basis of morphology evidence.

Individual genetic diversity assessment of whole samples
Using the UPGMA method to cluster the 125 Zanthoxylum samples (Fig. 5), we found that the genetic distances among these 125 samples ranged from 0 to 2.3811, with an average of 0.5915. The samples could be clustered into five cohorts. Cohort I contained Z. bungeanum and its varieties. the EST-SSR primers used in this study were able to identify different Zanthoxylum species, they were unable to clearly distinguish varieties clearly. Secondly, Z. armatum 'Tengjiao' in cohort II had small genetic distances and, given that these samples were collected from different areas, this could indicatethat the genes of Z. armatum 'Tengjiao' are stable. Thirdly, samples 123-125 ('Hancheng Dahongpao', which has been exposed to irradiation in space) had small genetic distances and showed no obvious differences compared with unirradiated 'Hancheng Dahongpao', indicating that at the gene level, the samples that had been exposed to irradiation could not be distinguished based on molecular marker analysis using EST-SSRs. Moreover, cohort IV contained only two samples, namely samples 10 ('Yuexi Gongjiao') and 115 ('Putao Shanjiao'). Theoretically, 'Yuexi Gongjiao' should be classified in cohort I and 'Putao Shanjiao' should be classified in cohort III because they differ in term of morphological characteristics. Although we found that 'Jinyang Qinghuajiao' exhibited obvious differences from Z. armatum, they are considered very similar to each other in practical cultivation.
Finally, we found that wild Z. armatum samples from Ya'an City and Mian Yang City in Sichuan Province have small genetic distances, which indicates that they are very similar, whereas genetic distances between these samples and Z. armatum 'Tengjiao' were relatively larger, indicating that cultivated varieties Z. armatum show certain genetic differencescompare with wild Z. armatum. Although species of Zanthoxylum are economically important trees in China [ 15], there have been relatively few studies on the transcriptomes and genetic diversity of these species [16]. Moreover, to date, there have been limited whole-genome sequencing studies, and as such, reference genes are lacking. Hence, the development of analytic methods based on functional gene, SSR, and SNP techniques would be highly desirable, both from the perspective of advancing scientific knowledge and from the potential practical implications. Given the current paucity of relevant data, in the present study, we used the Illumina-Hiseq sequencing platform to sequence the genomes of Z.
bungeanum and Z. armatum, and accordingly obtained a large quantity of precise data that provided an excellent basis for the screening using SSR markers and detection of genetic diversity. Following quality control of the raw data, we obtained in excess of 0.245 billion high-quality sequences. In total, we generated 36.76 G of high-quality clean data, with accuracy Q20 and Q30 accuracies exceeding 97% and 93%, respectively. Additionally, regardless of the source of raw data or filterable data, data sizes per sequencing sample were greater than 30 M. Overall, we found that transcriptome results for each sample were reliable and very good, with all samples providing valuable data for subsequent screening using SSR markers and genetic diversity analyses. On the basis of analysis using GO, KEGG, and other databases, we annotated the unigenes of Z. bungeanum and Z. armatum according to different functions and metabolic pathways. For Z. bungeanum, 89,198 unigenes were mapped using the GO database, with percentages relative to the total number of unigenes ranging from 8.91% to 17.48%. Specifically, 57,683 unigenes were assigned to biological process with the highest percentage of 17.48%. In Z. armatum, 13,7567 unigenes were annotated using the GO database, with percentages relative to the total number of unigenes of between 6.43% and 12.75%. Among these, 66,368 were mapped to biological process with the highest percentage of 12.75%. On the basis of KEGG database analyses, a total of 25,520 unigenes were assigned to different pathways, with percentages ranging from 9.43% to 47.36%. Among these, 12,520 unigenes were annotated to cohort 13 A (cellular processes) with percentages reaching 47.36%. For Z. armatum, 27,917 unigenes were assigned to different KEGG pathways with percentages ranging from 8.83% to 50.02%. Among these, the highest number (13,964) of unigenes were mapped to cohort A. Annotation using the GO database is used to indicate the functions of gene [17], whereas KEGG-based analysis can be used to assess gene product functions and associated metabolic pathways [18]. In the present study, we found the numbers of annotated unigenes and annotation percentage were similar for Z. bungeanum and Z. armatum, in which these data proved to have consanguinity and similarity.

EST-SSR validation
One advantage of using the RNA-seq technique to obtain EST-SSR primers is that these markers possess better polymorphism and specificity than do common primers. For classification and identity, it is preferable to utilize EST-SSR due to their cDNA origin and conserved nature [19], which are Zanthoxylum. In the present study, we performed transcriptome analyses for two of the main commercially cultivated Zanthoxylum species in China, namely, Z. bungeanum and Z. armatum. On the basis of pre-screening for EST-SSR markers, we selected four and two EST-SSR pairs for detection of genetic diversity in Z. bungeanum and Z. armatum, respectively. We accordingly found that all of six pairs of SSRs exhibited excellent polymorphism in at least three species of Zanthoxylum, which is crucial with respect further characterization of Zanthoxylum.

Genetic diversity assessment
At the species level, EST-SSRs have been shown to be accurate and produce clear results with respect to the identity and genetic diversity of crop plants such as Triticum [22], Hordeum vulgare [23], and Citrus [24]. However, at the cultivar level, this technique often fails to yield clear results, as exemplified by the study on Zanthoxylum by Zhao et al. [25]. Nevertheless, the findings of the present study indicate that at the level of both species and variety, there appears to be relatively low genetic diversity among the three species Z. bungeanum, Z. armatum, and Z. piperitum. Moreover, we were unable to detect any substantial regional variation, which can probably be attributed to the parthenocarpic mode of reproduction, which is a notable characteristic of these species and has been widely investigated in other species [26]. Apomixis is considered a primary factor contributing to this category of result, and because reproduction in most Zanthoxylum species occurs via apomixis, this is probably a key factor associated gene stability and low genetic diversity. According to characteristics of genetic diversity and market requirements of Zanthoxylum in China, the range of requirements has increased, as has the scales of cultivation. As a consequence of differences in climatic conditions in China, it has been found thatdifferences in Z. bungeanum cultivated in north and Z. armatum cultivated in south is becoming increasing more obvious [4]. Hence, as Z. bungeanum, Z. armatum, and their cultivars are the main commercially grown Sichuan pepper in China, considerable efforts are being made to maintain the abundant diversity of Zanthoxylum with respect to characteristic such as aroma, stingless and pungent.
In the population dendrogram constructed in this study, we found that 'HYDHP-ZB' showed the lowest average genetic distance with other populations (0.1122), whereas in contrast, 'MXDHP-ZB' has the highest average genetic distances with other populations (0.2664). Zanthoxylum is cultivated in numerous areas of China, and the main and traditional production areas where there has been a long history of Zanthoxylum cultivation are assumed to be sites of cultivation origin. As reported in the literature, the utilization of Zanthoxylum in China can be traced back to the 11 th century B.C., at which time these trees were planted in Sichuan Province (27). Thereafter, species of Zanthoxylum were introduced into other areas. Therefore, the genome of these species has retained similarity with those in the north, as the species has spread to other regions and different cultivars have been bred.
Consequently, this may provide a suitable explanation for the minimum mean genetic distances of 'HYDHP-ZB' with other Zanthoxylum samples. 'PTSJ-ZP,' which is a cultivar of Z. piperitum, not only clustered with 'CCSJ-ZP' and 'LJSJ-ZP' in the population dendrogram but also in the individual dendrogram. Although many Zanthoxylum species reproduce by vegetative propagation, cross-pollination has been observed in Z. piperitum. Therefore, although 'Putao Sanjiao,' like 'CCSJ-ZP' and 'LJSJ-ZP,' is a cultivar of Z. piperitum, it may show a certain degree of genetic variation from these other cultivars. Furthermore, given that only a single sample of 'PTSJ-ZP' was collected in the present study, whereas we collected three samples each of 'CCSJ-ZP' and 'LJSJ-ZP', further studies will be necessary to confirm the findings in this regard, Although both 'HCDHP-ZB' and 'X-ZB' (which produces only male flowers) were exposed to irradiation in space, they did not show any abnormal clustering compared with the other Z. bungeanum samples. Consequently, it would seem that neither extraterrestrial radiation nor the production of male flowers has a significant impact on the genetic structure of these species, or simply that we were unable to detect differences based on SSR analysis.
With respect differences between wild and cultivated populations of Zanthoxylum, we found that wild

Conclusions
In this study, we obtained transcriptome data for Z. bungeanum and Z. armatum. More than 300 million reads per sample, a total of 0.258 billion reads, in excess of 4 G clean bases and base pair reads of up to 38.76 G were obtained regardless of whether Z. bungeanum or Z. armatum was used as a source of raw data. After quality control, 0.245 billion reads of high quality were maintained, and 36.76 G high-quality clean data were used for subsequent analyses. We used six pairs of EST-SSR primers to determine the genetic diversity of 125 samples. Our results indicated that these EST-SSR primers were able to identify Z. bungeanum, Z. armatum, and Z. piperitum. According to the population dendrogram constructed in this study, there were an average of four samples per population. The number of allelic genes (Na) ranged from 1 to 1.4510 (average 1.1462), the number of effective allelic genes (Ne) ranged from 1 to 1.2744 (average 1.0971), genetic diversity (Nei) ranged from 0 to 0.1486 (average 0.05512), and Shannon's genetic diversity (I) ranged from 0 to 0.2177 (average 0.08123). Generally, the 125 samples were clustered successfully, and overall we found that the Zanthoxylum samples examined in this study show a relatively low level of genetic diversity. In line with expectations, different populations and individuals were divided according to species, although they did not appear to show any regional variation. Notably, however, the EST-SSR primers used in this study were not effective in distinguishing between the various cultivars of   Table S1). In addition, 6 pairs of EST-SSR primer which have been screened by our research [11] (Additional file 2: Table S2).

RNA isolation and transcriptome sequencing
RNAs were isolated from of three fresh leaves of Z. bungeanum and Z. armatum according to the instructions of an RNAprep Pure Plant Kit. Samples of leaves (50-100 mg) were ground to a powder in liquid nitrogen. Following the addition of β-mercaptoethanol, 500 µL of lysate was placed in tubes and rotated to mix. The lysates were then centrifuged at 12,000 rpm for 2 min and the resulting supernatants transferred to a fresh RNase-free tube, to which was added a 0.4 volumes of absolute ethanol. Following precipitation, the solution was transferred to an adsorption column followed by centrifugation at 12,000 rpm for 15 s. The resulting supernatant was discarded and 350 µL of deproteinized solution was added to the pellet followed by centrifugation for 15 s at 12,000 rpm.
Thereafter, 80 µL of DNase I was added to the column and left to stand for 15 min. A 350-µL volume of deproteinized solution was then added to the column, followed by centrifugation for 15 s at 12,000 and the resulting supernatant was discarded. The column was then washed with 500 µL of buffer (in ethanol), followed by centrifugation for 15 s at 12,000 rpm. After a further centrifugation for 2 min at 12,000 rpm, the column was placed in a fresh RNase-free tube, followed by the addition of 50 µL of RNase-free H 2 O and centrifuged for 1 min at 12,000 rpm to obtain the RNA solution. The RNA product was checked by 1% agarose gel electrophoresis at a voltage of 120 V for 15 min. The concentration and purity of the isolated RNA were determined using a BioTek-Epoch microplate reader (BioTek Instruments, Inc., Beijing), and samples were sent to the Biozeron Company (Beijing) for sequencing.
Trinity software was used to assemble reads based on the following steps: (1) discard adaptor sequences in reads and non-relevant reads; (2) discard low-quality bases at the at 3ʹ terminus; (3) discard reads with an N percentage greater than 10%; and (4) discard reads with lengths of less than 70 bp after the removal of adapters.

DNA isolation
Fresh leaf tissues (100 mg) of 125 Zanthoxylum samples were placed in mortar with liquid nitrogen and ground to a powder. The powder was placed in centrifuge tubes, to which buffer LP1 and 6 µL RNase A were added, followed by vortex mixing for 1 min. Thereafter, 130 µL of buffer LP2 was added followed by centrifugation for 5 min at 12,000 rpm. The resulting supernatant was mixed with 1.5 volumes of buffer LP3 followed by rotation mixing for 15 s. The solution and sediment were subsequently transferred to an adsorption column, which was centrifuged at 12,000 rpm for 30 s, after which the supernatant was discarded. Thereafter, 600 µL of buffer PW was added to the column, followed by centrifugation for 30 s at 12,000 rpm. The resulting supernatant was discarded, and the process repeated. Following a further centrifugation at 12,000 rpm for 2 min, the supernatant was discarded, and the column was dried at room temperature for 2 min. Having dried, the column was placed into a fresh 1.5-mL tube, followed by the addition of 75 µL TE eluant. After standing at room temperature(20-30℃) for 2 min, the column was centrifuged for 2 min at 12,000 rpm to obtain a DNA solution.

PCR amplification using SSR primers
The 20 µL reaction mixtures used for PCR amplification(Biorad C1000 Touch) comprised the following The Msatcommander, software program MISA (http://pgrc. ikp-gatersleben. de/misa) was used to detect SSR loci and primers were designed using Primer 5.0. Initially, primers were designed from the 0-1000 bp sequences upstream and downstream of the detected SSR. Secondly, SSRs sequences repeated at least 11 for mononucleotides and six times for dinucleotides to assign, respectively, whereas SSRs sequences at least five repeats to assign for trinucleotides, tetranucleotides, pentanucleotides and hexanucleotides. Finally, the length of amplification products should be 100-280 bp and annealing temperatures for the designed primers ranged from 55°C to 65°C.

Assessment of capillary electrophoresis fluorescence labeling data
In this study, we used a 3730XL sequenator for capillary electrophoresis fluorescence labeling.
Formamide and markers were initially mixed at ratio of 100:1 by volume, and 9 µL of PCR product was transferred into capillary electrophoresis fluorescence detection system . The PCR product was then diluted 10 times and 1 µL added. Fragment (plant) analysis software was used to assess the raw data through comparing the molecular weights of PCR products molecular markers sizes.

Treatment of genetic diversity data
Raw data were analyzed using Gene Marker 2.2. Peak sizes were compared with markers to determine the sizes of fragments. SSRs are dominant markers that enable the sizes of products to be established. If different samples contain the same products, they are assumed to be homologous. For all data, the results were converted into a data matrix comprising "0" and "1" indicating no production and production, respectively. NTSYSpc 2.11 software was used to calculate the genetic distance of populations and UPGMA was used to analyze clusters and construct evolutionary trees. Moreover, indicators of genetic diversity, including Percentage of Polymorphic Loci (PPL); Nei's Gene Diversity ( H) [12], and Shannon's information index ( I) [13], were determined using POPGENE32. Nei's standard genetic distance was determined using the following formula:  Table S1. Information on 125 Zanthoxylum samples Additional file 2: Table S2. Information on six pairs of EST-SSR primers and genetic diversity results Additional file 3: Table S3. Transcriptome raw and clean data Additional file 4: Figure S1. GO (Gene Ontology) annotation for Zanthoxylum bungeanum and

Zanthoxylum armatum
Additional file 5: Figure S2. Comparison    Map showing the location of sites at which 125 Zanthoxylum samples were collected. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors. Capillary electrophoresis results of the primer pair ZB23 and ZA51.

Figure 4
Population clustering dendrogram based on genetic distance (Coefficient number: 0 represents the minimum genetic distance) Figure 5 Individual clustering dendrogram based on genetic distance (Coefficient number: 0 represents the minimum genetic distance)

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.
Additional file 3 Table S3.doc Additional file 1 Table S1.doc Additional file 2 Table S2.doc