Gene Ontology (GO) annotation covers three aspects: cellular component, molecular
function, and biological process. For Z. bungeanum, we annotated 89,198 isogenes (Additional file 4: Fig. S1). Among these, 57,683, 35,489, and 27,967 isogenes were annotated as
biological process, cellular component, and molecular function, respectively. Specifically,
15,590 isogenes were assigned to “metabolic process,” the most frequently
assigned category, which accounted for 27.03% of isogenes annotated as biological process and 17.48% of
all annotated isogenes. The category “cell” accounted for the highest number of
isogenes (7,949) annotated as cellular component, corresponding to 22.40% of isogenes annotated as cellular component and 8.91% of all annotated isogenes. Within
the molecular function aspect, “binding” accounted for the highest number of isogenes (12,873), corresponding to
46.03% of isogenes assigned to molecular function and 14.43% of all annotated isogenes.
For Z. armatum, a total of 137,567 isogenes were annotated with GO terms (Additional
file 4: Fig. S2). In detail, 66,368, 31,517, and 17,540 isogenes were annotated as
biological process, cellular component, and molecular function, respectively. With
regard to biological process, the highest number of isogenes were assigned to “metabolic
process,” accounting for 26.43% and 12.75% of isogenes annotated as biological process
and all annotated isogenes, respectively. For cellular component, the highest number
of annotated isogenes (8,840) were assigned to “cell,” accounting for 22.28% and 6.43%
of isogenes annotated as cellular component and all annotated isogenes, respectively.
In terms of molecular function, 14,252 isogenes were assigned to “binding,” accounting
for 45.22% of isogenes annotated as molecular function and 10.36% of all annotated
isogenes.
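The two percentages quoted for each top GO category are simple ratios: the category count divided by the aspect total, and the category count divided by all annotated isogenes. The following minimal sketch recomputes them from the Z. bungeanum counts given in the text; the function and variable names are illustrative and are not taken from the annotation pipeline used in the study.

```python
# Counts reported in the text for the Z. bungeanum GO annotation.
TOTAL_ANNOTATED = 89_198
ASPECT_TOTALS = {
    "biological process": 57_683,
    "cellular component": 35_489,
    "molecular function": 27_967,
}
# Most frequent category within each aspect: (category name, isogene count).
TOP_CATEGORY = {
    "biological process": ("metabolic process", 15_590),
    "cellular component": ("cell", 7_949),
    "molecular function": ("binding", 12_873),
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def go_shares() -> dict:
    """Map each aspect to (share within the aspect, share of all annotated isogenes)."""
    return {
        aspect: (share(count, ASPECT_TOTALS[aspect]), share(count, TOTAL_ANNOTATED))
        for aspect, (_, count) in TOP_CATEGORY.items()
    }


if __name__ == "__main__":
    for aspect, (within, overall) in go_shares().items():
        name = TOP_CATEGORY[aspect][0]
        print(f"{name}: {within}% of {aspect}, {overall}% of all annotated isogenes")
```

Because one isogene can carry terms from more than one aspect, the three aspect totals need not sum to the number of annotated isogenes.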
KEGG pathways consist of cellular processes (cohort A), environmental information
processing (cohort B), genetic information processing (cohort C), metabolism (cohort
D), and organismal systems (cohort E). As shown in Fig. 2a, 25,520 isogenes of Z. bungeanum were annotated
to one of these five pathway categories. Of the annotated isogenes, the highest number (12,085)
were assigned to cohort A, accounting for 47.36% of all annotated isogenes. A further 4,879 isogenes
were mapped to cohort B (19.12%), whereas 3,908 (15.31%), 2,242 (8.79%), and 2,406
(9.43%) isogenes were assigned to cohorts E, C, and D, respectively. Among those isogenes
annotated to cohort A, the largest number (4,374, 36.19%) were annotated as “global
and overview maps,” whereas for cohort B, the largest number of isogenes (2,345, 48.06%)
were annotated as “translation.” In cohort E, “endocrine system” accounted for the
highest number of annotated isogenes (826, 21.14%), whereas for cohort C, the highest
number of isogenes (1,518, 67.71%) were annotated as “signal transduction.” In cohort
D, the highest number of isogenes (942, 39.15%) were annotated as “transport and catabolism.”
Figure 2b shows the results of KEGG pathway analysis for Z. armatum. In total, 27,917 isogenes were annotated to the aforementioned five pathway categories. Of
these, the highest number (13,964, 50.02%) were annotated to cohort A, whereas 4,925
isogenes (17.64%) were annotated to cohort B. Cohort E ranked third, with 3,953 (14.16%) of the annotated isogenes, whereas 2,611 (9.35%) and 2,464 (8.83%)
isogenes were annotated to cohorts D and C, respectively. In detail, 5,137 isogenes
were classified in “global and overview maps” which accounted for 36.79% of the isogenes
annotated to cohort A. For cohort B, the highest number of isogenes (2,381, 48.35%)
were assigned to “translation,” whereas in cohort E, “endocrine system” accounted
for the highest number (855, 21.63%) of annotated isogenes. For cohort C, “signal
transduction” accounted for the highest number (1,635, 66.36%) of annotated isogenes,
whereas the highest number of isogenes in cohort D (1,078, 41.29%) were annotated
as “cell growth and death.” These findings indicate that Z. bungeanum and Z. armatum differ only with respect to the most frequently annotated pathway in
cohort D.
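The cohort counts can be checked for internal consistency by confirming that they sum to the reported total and reproduce the reported percentages. The sketch below does this for the Z. bungeanum values given in the text; the helper names are illustrative only.

```python
# Isogene counts per KEGG cohort for Z. bungeanum, as reported in the text.
KEGG_COUNTS = {"A": 12_085, "B": 4_879, "C": 2_242, "D": 2_406, "E": 3_908}
REPORTED_TOTAL = 25_520


def cohort_percentages(counts: dict) -> dict:
    """Return each cohort's share of all annotated isogenes, in percent."""
    total = sum(counts.values())
    return {cohort: round(100 * n / total, 2) for cohort, n in counts.items()}


def ranked_cohorts(counts: dict) -> list:
    """Return cohort labels ordered from most to fewest annotated isogenes."""
    return sorted(counts, key=counts.get, reverse=True)


if __name__ == "__main__":
    print("sum matches reported total:", sum(KEGG_COUNTS.values()) == REPORTED_TOTAL)
    print(cohort_percentages(KEGG_COUNTS))
    print(ranked_cohorts(KEGG_COUNTS))
```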
2.2.3 Unigene annotation in different databases
A comparison of the annotation of 64,944 unigenes against different databases (NR, COG, KEGG, and GO)
indicated that, for Z. bungeanum (Fig. 2c), the highest number of unigenes (43,318, 66.70% of the total) were annotated
using the NR database, whereas the COG database annotated the fewest unigenes (4,153,
6.39%). Among the total 64,944 unigenes, 2,143 were annotated by all four databases,
and 15,700 unigenes were annotated by three databases, amongst which the NR, KEGG,
and GO databases accounted for the highest number of unigene annotations (14,201).
Furthermore, 14,901 unigenes were annotated by two databases. In detail, 11,401 unigenes
were annotated by databases NR and GO, which accounted for the highest percentage.
Some unigenes were only annotated by a single database, with the NR and COG databases
annotating the highest (43,318) and lowest (4,153) numbers of these unigenes, respectively.
For Z. armatum (Fig. 2d), 75,669 unigenes were assembled. According to the BLAST results obtained
using the aforementioned four databases, the highest number of unigenes (49,368, 65.60%)
were annotated using the NR database, whereas the fewest unigenes (6,130, 8.10%) were
annotated using the COG database. Furthermore, 2,658 and 17,193 unigenes were annotated
by all four databases and by three databases, respectively. Among the latter, the largest number of
unigenes (15,035) were annotated collectively by NR, KEGG, and GO. We also found that
17,905 unigenes were annotated by two databases, with NR and GO accounting
for the highest number (13,249) of these unigenes. Among the unigenes that were
only annotated by a single database, the NR and COG databases annotated the highest
(12,734) and lowest (6,130) numbers, respectively.
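The counts above correspond to the regions of a four-set Venn diagram. As a minimal sketch of how such counts can be derived, the following example tallies how many databases annotate each unigene and extracts the unigenes shared by a given combination of databases only. The unigene IDs are hypothetical toy values, not data from the study, and the function names are illustrative.

```python
from collections import Counter


def overlap_profile(annotations: dict) -> Counter:
    """Count unigenes by the number of databases that annotate them.

    `annotations` maps a database name to the set of unigene IDs it annotated.
    """
    hits = Counter()
    for ids in annotations.values():
        hits.update(ids)  # each ID gains one hit per database containing it
    return Counter(hits.values())  # {number of databases: number of unigenes}


def shared_only_by(annotations: dict, databases: list) -> set:
    """Unigenes annotated by every database in `databases` and by no other."""
    inside = set.intersection(*(annotations[d] for d in databases))
    outside = set().union(
        *(ids for name, ids in annotations.items() if name not in databases)
    )
    return inside - outside


# Hypothetical toy IDs for illustration only.
toy = {
    "NR": {"u1", "u2", "u3", "u4", "u5"},
    "GO": {"u1", "u2", "u3"},
    "KEGG": {"u1", "u2"},
    "COG": {"u1", "u6"},
}

if __name__ == "__main__":
    print(overlap_profile(toy))
    print(shared_only_by(toy, ["NR", "KEGG", "GO"]))
    print(shared_only_by(toy, ["NR", "GO"]))
```

Counting "exactly these databases" rather than "at least these databases" keeps the regions disjoint, so the region counts sum to the number of annotated unigenes.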
Figure 2. a: KEGG pathway classification of isogenes for Zanthoxylum bungeanum; b: KEGG pathway classification of isogenes for Zanthoxylum armatum; c: Unigenes of Z. bungeanum annotated by four databases; d: Unigenes of Z. armatum annotated by four databases.
2.3 SSR characteristics of Z. bungeanum and Z. armatum
For Z. bungeanum, we assembled 64,944 unigenes with a total length of 54,073,890 bp, in which a total of 12,746 SSR loci were identified in 10,595 unigenes (Table 1). The percentage incidence of SSRs (number of unigenes with SSR loci/total
number of unigenes) was 16.31%, whereas the percentage occurrence (number of SSRs/total
number of unigenes) was 19.63%. The average distance between SSR loci was 4.24 kb.
In 1,805 unigenes, more than two SSR loci were detected. In terms of SSR repeat type,
mononucleotides accounted for the highest number (6,296), whereas 3,092 and 2,992
SSRs comprised trinucleotides and dinucleotides, respectively. In contrast, we detected
only 214, 105, and 47 tetranucleotide, hexanucleotide, and pentanucleotide SSRs,
respectively.
For Z. armatum (Table 1), we assembled 75,669 unigenes, with a total length of 58,975,053 bp. We detected
15,096 SSR loci within 12,612 unigenes. The percentage incidence and occurrence of
these SSRs were 16.67% and 19.95%, respectively. The mean distance between two SSR
loci was 3.91 kb, and we found that 2,101 unigenes harbored more than two SSR loci.
Among all SSR repeats, the highest number of loci (7,830) were mononucleotides, followed
in numerical abundance by trinucleotides (3,711), dinucleotides (3,169), tetranucleotides
(223), hexanucleotides (112), and pentanucleotides (51). In summary, the number of
mononucleotide SSR loci in Z. armatum (7,830) exceeded that of all other repeat types combined (7,266) and also the number of mononucleotide
loci in Z. bungeanum. In addition, the total number of SSR loci and the number of each repeat type in Z. armatum exceeded those in Z. bungeanum.
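The three summary statistics follow directly from the definitions given above: incidence is the number of SSR-containing unigenes divided by the total number of unigenes, occurrence is the number of SSR loci divided by the total number of unigenes, and the average distance is the total unigene length divided by the number of SSR loci. The sketch below recomputes them from the reported counts; the class and attribute names are illustrative, and the Z. armatum total length is assumed to be in bp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SsrSummary:
    unigenes: int           # total assembled unigenes
    total_length_bp: int    # summed unigene length (bp)
    ssr_loci: int           # SSR loci identified
    unigenes_with_ssr: int  # unigenes containing at least one SSR locus

    @property
    def incidence_pct(self) -> float:
        """Unigenes with SSR loci / total unigenes, in percent."""
        return round(100 * self.unigenes_with_ssr / self.unigenes, 2)

    @property
    def occurrence_pct(self) -> float:
        """SSR loci / total unigenes, in percent."""
        return round(100 * self.ssr_loci / self.unigenes, 2)

    @property
    def mean_distance_kb(self) -> float:
        """Total sequence length / SSR loci, in kb."""
        return round(self.total_length_bp / self.ssr_loci / 1000, 2)


# Values reported in the text.
Z_BUNGEANUM = SsrSummary(64_944, 54_073_890, 12_746, 10_595)
Z_ARMATUM = SsrSummary(75_669, 58_975_053, 15_096, 12_612)

if __name__ == "__main__":
    for name, s in (("Z. bungeanum", Z_BUNGEANUM), ("Z. armatum", Z_ARMATUM)):
        print(name, s.incidence_pct, s.occurrence_pct, s.mean_distance_kb)
```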
Statistical analyses of numbers and distribution indicated that the percentage occurrence
of SSR loci in Z. bungeanum was 19.63% and the average distribution distance was 4.24 kb. Among these loci, mononucleotides
had the highest occurrence (9.69%), followed by trinucleotides (4.76%) and dinucleotides
(4.61%). In terms of the average distance between loci, the distribution distance
between mononucleotides was the shortest (8.59 kb), followed by trinucleotides (17.49
kb) and dinucleotides (18.07 kb). For Z. armatum, the percentage occurrence of SSRs was 19.95% and the mean distance between loci was
3.91 kb. Mononucleotides were found to have the highest percentage occurrence (10.35%),
followed by trinucleotides (4.90%) and dinucleotides (4.19%). In terms of the average
distance between loci, we found that the shortest mean distance occurred between mononucleotides
(7.53 kb), whereas those between trinucleotide and dinucleotides were 15.90 kb and
18.61 kb, respectively. These findings accordingly indicate that the density of SSRs
in Z. armatum is greater than that in Z. bungeanum, although both these species show similar SSR distributions and frequencies. Furthermore,
we also compared the SSR distribution distances in the two Zanthoxylum species with those in 15 species of woody and herbaceous plants (Additional file 5:
Fig. S2). The lowest average distance between SSRs (1.61
kb) occurred in Camellia sinensis, followed by Rosa roxburghii (1.68 kb) and Hevea brasiliensis (2.59 kb). For Z. armatum and Z. bungeanum, the average distribution distances of SSRs were 3.91 kb and 4.24 kb, respectively,
which places these two species fourth and fifth among the 17 analyzed species. Consequently, SSRs are relatively frequent and abundant in Z. armatum and Z. bungeanum, indicating
the potential for further discoveries.
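The per-repeat-type frequencies and spacings use the same denominators as the overall statistics: the total number of unigenes for the percentage, and the total unigene length for the average distance. The following sketch applies this to the Z. bungeanum counts reported in the text; the dictionary keys and function name are illustrative.

```python
# Z. bungeanum values reported in the text.
UNIGENES = 64_944
TOTAL_LENGTH_BP = 54_073_890
LOCI_BY_TYPE = {
    "mono": 6_296,
    "di": 2_992,
    "tri": 3_092,
    "tetra": 214,
    "penta": 47,
    "hexa": 105,
}


def per_type_stats(loci_by_type: dict, unigenes: int, total_length_bp: int) -> dict:
    """Map repeat type to (loci per 100 unigenes, mean spacing in kb)."""
    return {
        repeat: (
            round(100 * n / unigenes, 2),
            round(total_length_bp / n / 1000, 2),
        )
        for repeat, n in loci_by_type.items()
    }


if __name__ == "__main__":
    stats = per_type_stats(LOCI_BY_TYPE, UNIGENES, TOTAL_LENGTH_BP)
    # List repeat types from most to least densely spaced.
    for repeat, (pct, kb) in sorted(stats.items(), key=lambda kv: kv[1][1]):
        print(f"{repeat}: {pct}% of unigenes, one locus per {kb} kb")
```

Since the sequence length is fixed, a repeat type with more loci necessarily has a shorter average spacing, which is why the two rankings mirror each other.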
In our previous study, we randomly selected 120 primer pairs for screening, six of
which were used in the present study to analyze 125 samples of Zanthoxylum species by capillary electrophoresis. Four of these primer pairs
were derived from Z. bungeanum, whereas the other two were from Z. armatum. Using these primer pairs, we succeeded in amplifying 51 allelic loci, with a mean
of 8.5 polymorphic loci per primer pair. The average number of observed alleles (Na) was 2, the average number of effective alleles (Ne) was 1.3093, the average Nei’s genetic diversity was 0.1916, and the average Shannon information index (I) was 0.3037. On the basis of these values, particularly those for Na and Ne, genetic diversity among the 125 samples initially appeared to be low.
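For orientation, the per-locus indices can be computed from allele frequencies using their standard textbook definitions: effective number of alleles Ne = 1/Σp², Nei's gene diversity H = 1 − Σp², and Shannon's index I = −Σ p ln p. The sketch below assumes these definitions; the text does not state which software or estimators were used, and the example frequencies are hypothetical.

```python
import math


def _check(freqs) -> None:
    """Validate that allele frequencies are non-negative and sum to 1."""
    if any(p < 0 for p in freqs) or not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def effective_alleles(freqs) -> float:
    """Effective number of alleles, Ne = 1 / sum(p_i^2)."""
    _check(freqs)
    return 1.0 / sum(p * p for p in freqs)


def nei_diversity(freqs) -> float:
    """Nei's gene diversity, H = 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def shannon_index(freqs) -> float:
    """Shannon's information index, I = -sum(p_i * ln p_i)."""
    _check(freqs)
    return -sum(p * math.log(p) for p in freqs if p > 0)


if __name__ == "__main__":
    # Hypothetical two-allele locus with a rare second allele.
    locus = [0.9, 0.1]
    print(effective_alleles(locus), nei_diversity(locus), shannon_index(locus))
```

Under these definitions, Ne approaches the observed allele number only when alleles are equally frequent, so an Ne well below Na points to skewed allele frequencies.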
Fig. 3. Capillary electrophoresis results of the primer pairs ZB23 and ZA51.
2.5 Genetic diversity assessment
2.5.1 Population genetic diversity
Using the UPGMA method for cluster analysis of Sichuan pepper populations, we found
that the observed number of alleles (Na) ranged from 1 to 1.45 (average 1.15), and that the effective number of alleles
(Ne) ranged from 1 to 1.27 (average 1.10). Nei’s genetic diversity ranged from 0 to 0.15 (average 0.055), whereas Shannon’s information index (I) ranged from 0 to 0.22 (average 0.081). The number of polymorphic loci ranged from
0 to 23 (average 7.46), and the percentage polymorphism ranged from 0% to 45.10% (average 14.60%)
(Table 2). In detail, for Z. bungeanum, Z. armatum, and Z. piperitum, the average Na was 1.16, 1.17, and 1.04, respectively; the average Ne was 1.11, 1.11, and 1.10, respectively; the average Nei’s genetic diversity in each case was 0.06; and the values of Shannon’s I were 8.14, 8.44, and 2.00, respectively. These observations indicate that
different Zanthoxylum species have similar genetic diversity and that the level
of genetic diversity among these Zanthoxylum species is relatively low.
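The upper bound of the percentage polymorphism can be reproduced from the locus counts. The sketch below assumes that the 51 allelic loci amplified by the six primer pairs form the denominator, which is an inference from the reported values rather than something the text states explicitly; the function names are illustrative.

```python
TOTAL_LOCI = 51    # allelic loci amplified by the six primer pairs (from the text)
PRIMER_PAIRS = 6


def loci_per_primer_pair(total_loci: int = TOTAL_LOCI,
                         primer_pairs: int = PRIMER_PAIRS) -> float:
    """Mean number of amplified loci per primer pair."""
    return total_loci / primer_pairs


def percent_polymorphic(polymorphic_loci: int, total_loci: int = TOTAL_LOCI) -> float:
    """Polymorphic loci as a percentage of all scored loci (assumed denominator)."""
    if not 0 <= polymorphic_loci <= total_loci:
        raise ValueError("polymorphic_loci must lie between 0 and total_loci")
    return round(100 * polymorphic_loci / total_loci, 2)


if __name__ == "__main__":
    print(loci_per_primer_pair())     # mean loci per primer pair
    print(percent_polymorphic(23))    # population with the most polymorphic loci
    print(percent_polymorphic(0))     # population with no polymorphic loci
```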
The population dendrogram shown in Fig. 4 supports the following observations. Firstly, we successfully clustered the Zanthoxylum samples and found that the genetic distances ranged from 0.0050 to 0.4122, with an
average distance of 0.1878. Secondly, the EST-SSR primers were able to distinguish and
identify each of the 125 Zanthoxylum samples. Thirdly, the populations were divided into four cohorts. Cohort I almost
exclusively comprised Z. bungeanum cultivars, whereas cohort II contained Z. armatum varieties, cohort III included ‘JLHJ-ZB’ and ‘PTSJ-ZP’ (Z. piperitum), and cohort IV contained only a single sample, ‘MXDHP-ZB’ (Z. bungeanum). Not only were Z. bungeanum varieties readily detected, but it was also possible to determine that the samples
originated from Longnan City, Gansu Province. We therefore suspect that genetic distances
are related to this area. In contrast, the members of cohort II were all Z. armatum varieties which, although originating from a range of different localities, showed very small genetic distances and accordingly did not display regionalism. The
results obtained for cohorts III and IV were notable in that cohort III contained
‘PTSJ-ZP,’ which is a variety of Z. piperitum and would have been expected to cluster with ‘CCSJ-ZP’ and ‘LJSJ-ZP.’ We did,
nevertheless, succeed in classifying a number of other varieties, including ‘MXDHP-ZB,’
‘HYDHP-ZB,’ ‘HCDHP-ZB,’ ‘Yuexi Gongjiao,’ ‘TJ-ZA,’ and ‘JYQHJ-ZA.’ In the
present study, however, the EST-SSR markers could not unambiguously authenticate Zanthoxylum varieties, indicating that this technique is probably more applicable to
the identification of species than of varieties. Fourthly, the results indicated
that clusters do not closely reflect regional diversity. Finally, the data
tend to indicate that Z. bungeanum and Z. armatum have low intraspecific variation and genetic differentiation. Accordingly, it would
appear that the Zanthoxylum species examined in the present study do not show high genetic diversity.
Fig. 4. Population clustering dendrogram based on genetic distance
(Coefficient number: 0 represents the minimum genetic distance)
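UPGMA builds a dendrogram by repeatedly merging the two closest clusters and recomputing distances to the merged cluster as a size-weighted average. The following is a minimal pure-Python sketch of that procedure under the usual definition of the method; the population labels and distances are hypothetical toy values, and the text does not specify the software actually used to build the dendrograms.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster `labels` by UPGMA.

    `dist` maps a pair (a, b) of labels to their distance. Returns the tree as
    nested tuples and the list of merges as (cluster_a, cluster_b, height),
    where height is half the distance at which the two clusters were joined.
    """
    sizes = {label: 1 for label in labels}  # active clusters and their leaf counts
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        d_ab = d[frozenset((a, b))]
        new = (a, b)
        # Distance from the merged cluster to each other cluster is the
        # average of the old distances, weighted by cluster size.
        for c in sizes:
            if c == a or c == b:
                continue
            d[frozenset((new, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, d_ab / 2))
    (tree,) = sizes
    return tree, merges


# Hypothetical toy distances between four populations, for illustration only.
toy_labels = ["P1", "P2", "P3", "P4"]
toy_dist = {
    ("P1", "P2"): 0.10, ("P1", "P3"): 0.30, ("P1", "P4"): 0.40,
    ("P2", "P3"): 0.30, ("P2", "P4"): 0.40, ("P3", "P4"): 0.20,
}

if __name__ == "__main__":
    tree, merges = upgma(toy_labels, toy_dist)
    print(tree)
    for a, b, height in merges:
        print(a, b, round(height, 4))
```

Because every merge depends on averaged distances to all members of a cluster, a population can end up outside the cluster of its own species if it is, on average, distant from that cluster's members.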
Several results among the data presented in Fig. 4 and Table 3 are worthy of further
investigation. Firstly, the genetic distances within
populations ranged from 0.0050 to 0.4122, whereas those between populations ranged
from 0.1122 to 0.2664, with an average of 0.1886. Further, the coefficient of genetic
similarity ranged from 0.7667 to 0.8930 (mean 0.8299). Among the samples collected,
‘HYDHP-ZB’ has the minimum genetic distance (0.1122) from other populations, whereas
‘MXDHP-ZB’ has the maximum genetic distance (0.2664). ‘MXDHP-ZB’ is a variety of Z. bungeanum but was classified in a cluster separate from those of cohort I, which contained varieties
of Z. bungeanum. This apparent discrepancy could be attributable to chance and error as a consequence
of using only a single sample of ‘MXDHP-ZB.’ Secondly, in cultivation ‘MXDHP-ZB’ has been
found to differ morphologically from Z. bungeanum, and the present results confirm these differences at the molecular level.
‘CCSJ-ZP,’ ‘LJSJ-ZP,’ and ‘PTSJ-ZP’ are cultivars of Z. piperitum and theoretically should cluster together, although we observed that only ‘CCSJ-ZP’ and ‘LJSJ-ZP’ clustered. The genetic distance between ‘CCSJ-ZP’ and ‘PTSJ-ZP’ was 0.2134, whereas
that between ‘CCSJ-ZP’ and ‘LJSJ-ZP’ was 0.2439, and that between ‘PTSJ-ZP’ and ‘LJSJ-ZP’ was 0.2429. All of these values
exceeded the average genetic distance of 0.1884, indicating that the genetic
distances among these cultivars were relatively large. Moreover, based on molecular
marker analysis, ‘Putao Shanjiao’ probably differs from ‘Chaochang Shanjiao’ and ‘Liujin
Shanjiao,’ which should be investigated further.
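The comparison against the average distance can be made explicit with a small check. The sketch below uses the three pairwise distances and the average quoted in the text; the variable and function names are illustrative.

```python
# Pairwise genetic distances among the Z. piperitum cultivars, from the text.
PAIRWISE = {
    ("CCSJ-ZP", "PTSJ-ZP"): 0.2134,
    ("CCSJ-ZP", "LJSJ-ZP"): 0.2439,
    ("PTSJ-ZP", "LJSJ-ZP"): 0.2429,
}
AVERAGE_DISTANCE = 0.1884  # average genetic distance quoted in the text


def excess_over(distances: dict, reference: float) -> dict:
    """Return how far each pairwise distance lies above the reference value."""
    return {pair: round(value - reference, 4) for pair, value in distances.items()}


if __name__ == "__main__":
    for pair, excess in excess_over(PAIRWISE, AVERAGE_DISTANCE).items():
        print(pair, f"+{excess}")
```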
Sichuan pepper generally produces only female flowers, and thus
reproduction occurs almost entirely through apomixis. In contrast, sample
‘X-ZB’ produces only male flowers, and three ‘HCDHP-ZB’ samples had been exposed
to irradiation in space. According to the dendrogram constructed in the present study,
these samples were assigned to cohort I but did not show any consistent patterns.
Although a wild population of Z. armatum (ZA-WP) and the cultivar ‘TJ-ZA’ clustered together in cohort II, there was a large
genetic distance between the wild and cultivated populations, indicating their dissimilarity.
Finally, ‘YXGJ-ZB,’ which clustered in cohort I, appeared to be relatively distinct
from other Z. bungeanum varieties. To date, there have been no reports on the differentiation between ‘YXGJ-ZB’
and other Z. bungeanum varieties; it is therefore conceivable that there are indeed differences between
them, which could be confirmed on the basis of morphological evidence.
2.5.2 Individual genetic diversity assessment of whole samples
Using the UPGMA method to cluster the 125 Zanthoxylum samples (Fig. 5), we found that the genetic distances among these samples ranged
from 0 to 2.3811, with an average of 0.5915. The samples could be clustered into five
cohorts. Cohort I contained Z. bungeanum and its varieties. Cohort II included Z. armatum and its varieties. Cohort III mainly contained Z. piperitum in addition to two Z. bungeanum samples. Cohorts IV and V each contained only two samples, with
cohort IV containing sample 59 [‘Laiwu Huajiao’ (Z. bungeanum)] and sample 68 [‘Hancheng Dahongpao’ (Z. bungeanum)], and cohort V containing sample 10 [‘Yuexi Gongjiao’ (Z. bungeanum)] and sample 115 [‘Putao Shanjiao’ (Z. piperitum)]. The construction of individual dendrograms produced the following results and
associated problems. Firstly, at the individual level, clustering successfully
separated Z. bungeanum, Z. armatum, and Z. piperitum and their varieties. The results generally indicate that the genetic diversity of
Zanthoxylum species is not high, which is possibly related to the apomictic mode of reproduction,
which generally confers stable phenotypes. Additionally, we found that the EST-SSR primers
were more effective in differentiating species than varieties, because samples within
Z. bungeanum and within Z. armatum showed very small genetic distances, indicating little difference among varieties.
Examples include samples 73 (‘Maowen Dahongpao’), 87 (‘Wuxuan No.1’), 89 (‘Wuxuan No.2’),
and 92 (‘Wuxuan No.3’) in cohort I, and samples 12 (‘Jinyang Qinghuajiao’), 19 (‘Jiuye
Qinghuajiao’), 34 [Z. armatum ‘Tengjiao’ (Hongya)], 35 [Z. armatum ‘Tengjiao’ (E’mei)], 43 [Z. armatum ‘Tengjiao’ (E’mei)], 44 [Z. armatum ‘Tengjiao’ (Danleng)], 51 (‘Jinyang Qinghuajiao’), and 53 (‘Jinyang Qinghuajiao’)
in cohort II. Thus, although the EST-SSR primers used in this study were able to
identify different Zanthoxylum species, they were unable to clearly distinguish varieties. Secondly, Z. armatum ‘Tengjiao’ samples in cohort II had small genetic distances and, given that these samples
were collected from different areas, this could indicate that the genes of Z. armatum ‘Tengjiao’ are stable. Thirdly, samples 123-125 (‘Hancheng Dahongpao’, which has
been exposed to irradiation in space) had small genetic distances and showed no obvious
differences compared with unirradiated ‘Hancheng Dahongpao’, indicating that at the
gene level, the samples that had been exposed to irradiation could not be distinguished
based on molecular marker analysis using EST-SSRs. Moreover, cohort V contained only
two samples, namely samples 10 (‘Yuexi Gongjiao’) and 115 (‘Putao Shanjiao’). Theoretically,
‘Yuexi Gongjiao’ should be classified in cohort I and ‘Putao Shanjiao’ should be classified
in cohort III because they differ in terms of morphological characteristics. Although
we found that ‘Jinyang Qinghuajiao’ exhibited obvious differences from Z. armatum, they are considered very similar to each other in practical cultivation. Finally,
we found that wild Z. armatum samples from Ya’an City and Mianyang City in Sichuan Province have small genetic
distances, which indicates that they are very similar, whereas genetic distances between
these samples and Z. armatum ‘Tengjiao’ were relatively large, indicating that cultivated varieties of Z. armatum show certain genetic differences compared with wild Z. armatum.
Fig. 5. Individual clustering dendrogram based on genetic distance (Coefficient number:
0 represents the minimum genetic distance)