Transcriptome-derived microsatellite markers for population diversity analysis in Archidendron clypearia (Jack) I.C. Nielsen

The woody leguminous genus Archidendron F. Mueller is an important source of herbal medicines for treating upper respiratory tract infection, acute pharyngitis, tonsillitis, and gastroenteritis. However, genomic resources, including transcriptomic sequences and molecular markers, remain scarce in the genus. Transcriptome sequencing, genic microsatellite marker development, and population diversity analysis were conducted in Archidendron clypearia (Jack) I.C. Nielsen. Flower and flower bud transcriptomes were de novo assembled into 173,172 transcripts, with an average transcript length of 1597.3 bp and an N50 length of 2427 bp. A total of 34,701 microsatellite loci were identified from 26,716 (15.4 %) transcripts. Primer pairs were designed for 718 microsatellite loci, of which 456 (63.5 %) were polymorphic. Of the 456 polymorphic markers, 391 (85.7 %) and 402 (88.1 %) were transferable to A. lucidum (Benth.) I.C. Nielsen and A. multifoliolatum (H.Q. Wen) T.L. Wu, respectively. Using a subset of 15 microsatellite markers, relatively high genetic diversity was detected over two A. clypearia populations, with an overall mean expected heterozygosity (He) of 0.707, indicating the need for conservation. Relatively low differentiation between the two populations was revealed despite their distant separation (about 700 km), with the overall inbreeding coefficient of sub-population relative to the total population (Fst) being 8.7 %. This study represents the first attempt to conduct transcriptome sequencing, SSR marker development, and population genetics analysis in the medicinally important genus Archidendron. Our results will offer valuable resources and information for further genetic studies and practical applications in Archidendron and related taxa.


Introduction
Archidendron F. Mueller is a recently renewed woody genus (tribe Ingeae, subfamily Caesalpinioideae DC., family Leguminosae Juss. or Fabaceae Lindl.) of approximately 100 species, which takes over the two earlier genera Cylindrokelupha Kosterm. and Paralbizzia Kosterm. and most species of Pithecellobium Martius [1,2]. Archidendron species occur widely in tropical Asia [1], ranging from India and southern China to New Guinea (http://www.asianplant.net/). Their leaves, twigs, and branches contain diverse components such as polyphenols, flavonoids, lignans, and terpenoids that have antiviral, antibacterial, anti-allergic, and/or antioxidant functions, and have long been used as herbal resources for treating upper respiratory tract infection, acute pharyngitis, tonsillitis, and gastroenteritis [3]. Of the genus, Archidendron clypearia (Jack) I.C. Nielsen (syn. Pithecellobium clypearia Benth) represents the most important species for medicinal and industrial applications [4]. Molecular marker technology provides powerful tools for a range of applications, such as population diversity investigation and variety fingerprinting. However, to our knowledge, no molecular marker has so far been developed for the genus Archidendron. Though eight microsatellite (or simple sequence repeat, SSR) markers were reported in the closely related genus Pithecellobium [5,6], their transferability to Archidendron has not yet been investigated. Moreover, A. clypearia has become a species with extremely small populations due to human activities [7], and its genetic diversity remains to be investigated, especially using molecular markers.
Dandan Li and Mei Li contributed equally to this work. * Siming Gan, siminggan@caf.ac.cn
Here we present the first report on transcriptome-derived SSR markers in the genus Archidendron. Transcriptome sequencing (or RNA sequencing, RNA-seq) has emerged as an innovative tool for generating large expressed sequence tag (EST) data sets, comprehensive transcriptome profiling, and molecular marker development in many organisms [8]. SSR markers are well suited to many applications owing to their co-dominance, multi-allelism, high reproducibility, and abundance within the genome [9,10]. Further, transcriptome-derived SSR markers represent functionally transcribed genomic loci and are most likely transferable across related species [10]. Thus, the objectives of this study were to develop polymorphic EST-SSR markers in A. clypearia, test their cross-species transferability, and investigate the genetic diversity of and differentiation between two A. clypearia populations.

Plant material, RNA, and DNA isolation
A mature tree of A. clypearia growing in Huolushan Forest Park (23° 10′ 51′′ N, 113° 23′ 26′′ E), Guangdong, China was sampled for flowers and flower buds for RNA isolation and sequencing. Leaves were sampled from this tree and from two other mature A. clypearia trees in Erlongshan Ecological Park (EEP population, 23° 21′ 08′′ N, 113° 44′ 08′′ E), Guangdong, China for screening primer pairs for effective polymerase chain reaction (PCR). In addition, leaves were collected from 16 additional mature trees in EEP (totaling 18 presumably unrelated trees at least 50 m apart), which were included for estimation of marker polymorphism and population diversity. Besides the EEP population, leaves were sampled from 16 unrelated trees (at least 50 m apart) in Jianfengling National Forest Park (JNFP population, 18° 44′ 39′′ N, 108° 49′ 55′′ E), Hainan, China for genetic diversity and differentiation analyses. As A. clypearia is characterized by extremely small populations [7], these two populations are the largest we have found in China.
Total RNA was isolated from flower and flower bud samples using the EASYspin Plus Plant RNA Kit (Aidlab Biotechnologies, Beijing, China). Genomic DNA was extracted from leaf samples using a modified cetyltrimethyl ammonium bromide method [11].

RNA sequencing, de novo assembly of transcriptomes, and SSR identification
A cDNA library was constructed per sample using a VAHTS® Stranded mRNA-seq Library Prep Kit for Illumina (Vazyme Biotech Co., Nanjing, Jiangsu Province, China) and sequenced on a NovaSeq 6000 system (Illumina Inc., San Diego, CA, USA) using paired-end 150 bp read chemistry. Raw reads were filtered to remove adaptors and low-quality reads (≥ 5.0 % uncertain nucleotides, ≥ 20.0 % low-quality bases with Q ≤ 20, or final read length ≤ 85 bp) using Trimmomatic 0.39 [12]. The clean reads of the two transcriptomes were de novo assembled into contigs using Trinity 1.6 [13] under default parameters. To reduce assembly redundancy, identical or nearly identical contigs were clustered to define the final transcripts using Cd-hit-est [14] with an identity threshold of 95.0 %.
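The three read-level thresholds quoted above can be illustrated with a minimal sketch. This is not how Trimmomatic is implemented or invoked; the function name `passes_filter` and its parameters are hypothetical, and the sketch assumes that a read is discarded when any one of the three conditions is met.

```python
def passes_filter(seq, quals, max_n_frac=0.05, max_lowq_frac=0.20,
                  q_cutoff=20, min_len=85):
    """Return True if a read survives the three thresholds quoted in the text.

    seq   -- read sequence as a string
    quals -- per-base Phred quality scores (same length as seq)
    Assumption: failing any single criterion discards the read.
    """
    length = len(seq)
    # Reads with final length <= 85 bp are discarded.
    if length <= min_len:
        return False
    # Reads with >= 5.0 % uncertain nucleotides (N) are discarded.
    if seq.upper().count("N") / length >= max_n_frac:
        return False
    # Reads with >= 20.0 % bases of quality Q <= 20 are discarded.
    low_quality = sum(1 for q in quals if q <= q_cutoff)
    if low_quality / length >= max_lowq_frac:
        return False
    return True
```

For example, a clean 150 bp read with all qualities at Q30 is kept, whereas the same read with 8 uncertain bases (5.3 %) is dropped.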

Marker amplification and genotyping
All primer pairs were screened for effective PCR against an equimolar DNA mixture of the three mature trees. Routine PCR (10 µL) was performed as described earlier [11] with a primer-specific annealing temperature (Tm; 58 or 56 ℃). The effective primer pairs each with an amplicon less than 500 bp [the maximal size of the internal size standard GeneScan™ 500 LIZ® (Applied Biosystems, Foster City, CA, USA) in SSR detection] were included in polymorphism estimation and cross-species transferability investigation. Fifteen EST-SSR markers (Table 1) were selected to genotype the 16 individuals of the JNFP population. PCR and SSR detection were performed following a fluorescent-dUTP-based SSR genotyping protocol [11]. Briefly, the PCR reaction and program were basically the same as described above except for the addition of 10 pmol fluorescein-12-dUTP (Fermentas International Inc., Burlington, Ontario, Canada), the reduction of each dNTP to 25 µM, and the replacement of 35 cycles with 20 touchdown cycles of 94 ℃ for 30 s, 68-58 ℃ or 66-56 ℃ for 30 s with a decrease of 0.5 ℃ per cycle, and 72 ℃ for 1 min, followed by 26 normal cycles. SSR genotyping was performed on an ABI 3130xl using GeneMapper 4.1 (Applied Biosystems). SSR markers were named with the prefix ARCeSSR (Archidendron EST-SSR) and a sequential three-digit number as the suffix.
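The touchdown program described above can be written out as a per-cycle list of annealing temperatures. The following is a minimal sketch; the function name `touchdown_schedule` is hypothetical, and it assumes that the first touchdown cycle runs at the upper temperature (68 or 66 ℃) and that the 26 normal cycles run at the lower one (58 or 56 ℃).

```python
def touchdown_schedule(start_c, end_c, step_c=0.5,
                       touchdown_cycles=20, normal_cycles=26):
    """List the annealing temperature (in degrees C) of every PCR cycle.

    Assumption: cycle 1 anneals at start_c, each later touchdown cycle is
    step_c lower, and all normal cycles anneal at end_c.
    """
    temps = [start_c - step_c * i for i in range(touchdown_cycles)]
    temps += [end_c] * normal_cycles
    return temps


schedule = touchdown_schedule(68.0, 58.0)
print(len(schedule), schedule[0], schedule[19], schedule[20])
```

Under these assumptions, 20 touchdown cycles with a 0.5 ℃ decrement span 68.0 down to 58.5 ℃, so the first normal cycle continues the series at 58.0 ℃, for 46 cycles in total.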

Data analysis
For each EST-SSR marker, the number of alleles (Na), number of effective alleles (Ne), allele size range (ASR), and fixation index (F) were calculated over the 18 trees of the EEP population using GenAlEx 6.5 [17]. The polymorphism parameters including observed heterozygosity (Ho), expected heterozygosity (He), and polymorphic information content (PIC) were also estimated across the EEP population using GenAlEx 6.5 [17].
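To make the per-locus parameters concrete, the sketch below computes them from a list of diploid genotypes using common textbook definitions: Ne = 1/Σp², He = 1 − Σp², F = 1 − Ho/He, and PIC = 1 − Σp² − ΣΣ 2pᵢ²pⱼ² (summed over allele pairs i < j). These formulas are assumptions for illustration; the software cited above may apply sample-size corrections or other variants, and the function name `locus_diversity` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes):
    """Summarise one SSR locus from diploid genotypes such as (100, 102).

    Uses uncorrected textbook formulas (an assumption, see lead-in).
    """
    alleles = [a for genotype in genotypes for a in genotype]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]

    homozygosity = sum(p * p for p in freqs)          # sum of p_i^2
    he = 1.0 - homozygosity                           # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # PIC subtracts 2 * p_i^2 * p_j^2 for every pair of distinct alleles.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {
        "Na": len(freqs),
        "Ne": 1.0 / homozygosity,
        "Ho": ho,
        "He": he,
        "F": 1.0 - ho / he if he > 0 else float("nan"),
        "PIC": pic,
    }


print(locus_diversity([(100, 102), (100, 102), (100, 100), (102, 102)]))
```

With these definitions, F is negative whenever Ho exceeds He, that is, when heterozygotes are in excess relative to Hardy-Weinberg expectation.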
For the two populations EEP and JNFP, Hardy-Weinberg equilibrium (HWE) was tested using GenAlEx 6.5 [17], with manual Bonferroni correction. Linkage disequilibrium (LD) between a pair of markers was evaluated with 100,000 permutations using TASSEL 3.0 [18]. For each marker, null allele frequency (NAF) was calculated using CERVUS 3.0.7 [19]. Number of private alleles (Npa) being present in only one population, number of diagnostic alleles (Nda) being present in only one population at a frequency greater than 10.0 %, and inbreeding coefficients of individuals relative to the sub-population (Fis) and to the total population/species (Fit) and of sub-population to the total population/species (Fst) were also calculated using GenAlEx 6.5 [17]. Evidence for a recent bottleneck was investigated by sign [...] [20]. In addition, allele frequencies were assessed for deviation from a normal L-shaped distribution, being indicative of possible bottleneck through loss of rare alleles.

Table 1 The diversity parameters of 15 expressed sequence tag (EST) derived simple sequence repeats (EST-SSR) markers over the two populations of A. clypearia (Jack) I.C. Nielsen
Na number of alleles, Ne number of effective alleles, Ho observed heterozygosity, He expected heterozygosity, Npa number of private alleles being present only in one population, Nda number of diagnostic alleles being present in only one population at a frequency greater than 10.0 %, Fis inbreeding coefficient of individuals relative to the sub-population, Fit inbreeding coefficient of individuals relative to the total population/species, Fst inbreeding coefficient of sub-population relative to the total population/species, HWE Hardy-Weinberg equilibrium, NAF null allele frequency (being close to zero in absence of a null allele with negative and positive symbols implying excess of heterozygotes and homozygotes, respectively)
*** departure at 0.001 significance level with Bonferroni correction. Primer sequence and annealing temperature (Tm, 58 ℃) for all the 15 EST-SSR markers were supplied in Supplementary Table S3
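The definitions of private and diagnostic alleles given in the Data analysis text translate directly into a small counting routine. The sketch below is illustrative only; the function names and the toy genotypes in the usage lines are hypothetical, and the 10.0 % threshold is applied as a strict "greater than", as worded in the text.

```python
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies at one locus from diploid genotypes."""
    counts = Counter(a for genotype in genotypes for a in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def private_and_diagnostic(pop_a, pop_b, threshold=0.10):
    """Count private (Npa) and diagnostic (Nda) alleles in each population.

    A private allele occurs in only one population; a diagnostic allele is
    a private allele whose frequency exceeds the threshold.
    """
    freqs_a, freqs_b = allele_freqs(pop_a), allele_freqs(pop_b)
    result = {}
    for name, own, other in (("A", freqs_a, freqs_b), ("B", freqs_b, freqs_a)):
        private = {a: f for a, f in own.items() if a not in other}
        diagnostic = [a for a, f in private.items() if f > threshold]
        result[name] = {"Npa": len(private), "Nda": len(diagnostic)}
    return result


# Toy data (hypothetical allele sizes, not taken from the study).
pop_a = [(100, 102), (100, 102), (100, 104), (100, 100), (100, 100), (100, 100)]
pop_b = [(100, 106), (106, 106), (100, 100)]
print(private_and_diagnostic(pop_a, pop_b))
```

In the toy data, alleles 102 and 104 are private to the first population but only 102 (frequency 2/12) passes the 10.0 % threshold, so that population has Npa = 2 and Nda = 1.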

Results and discussion
The sequencing statistics of the A. clypearia transcriptomes are listed in Supplementary Table S1. After assembly redundancy reduction, a final set of 173,172 transcripts was retained, with an average transcript length of 1597.3 bp and an N50 length of 2427 bp. These length values were markedly higher than those of other medicinal leguminous plants, e.g. an average transcript length of 626 bp and an N50 length of 987 bp in Mucuna pruriens (L.) DC. [21].
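For readers unfamiliar with the N50 statistic reported above, a minimal sketch of its usual definition follows: the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. The function name `n50` and the toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 weights long transcripts more heavily than a simple mean does, it is typically larger than the average transcript length, as with the 2427 bp versus 1597.3 bp reported here.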
Out of the 718 primer pairs designed, 644 (89.7 %) amplified effectively. After excluding 70 effective primer pairs each with an amplicon larger than 500 bp, 456 of the remaining 574 were polymorphic (Supplementary Table S3, Table S4). The polymorphism estimates are generally higher than those reported in other legumes, e.g. PIC and He being 0.24 and 0.41 in M. pruriens [21] and 0.1956 and 0.1081 in Cyamopsis tetragonoloba (L.) Taub. [23].
The 15 polymorphic EST-SSR markers resulted in a total of 146 alleles over the two populations, with 6-16 alleles per marker (mean 9.7; Table 1). Only two markers showed significant deviation from HWE expectation after Bonferroni correction (Table 1). Only 11 pairs of markers showed significant LD (P < 0.01; Supplementary Fig. S1). Locus NAF, Ho, and He ranged from −0.077 to 0.098 (mean −0.034), 0.500 to 1.000 (mean 0.825), and 0.494 to 0.902 (mean 0.772), respectively (Table 1). Population Ho and He were 0.815 and 0.696 for EEP and 0.838 and 0.719 for JNFP (mean 0.826 and 0.707), respectively (Table 2). The genetic diversity is relatively high as compared to other legumes, e.g., He being 0.41 in M. pruriens [21] and 0.1081 in C. tetragonoloba [23]. This result indicates that both populations should be conserved to maintain the genetic diversity of the species. Additionally, the higher Ho than He estimates and the generally negative NAF indicate a certain degree of heterozygote excess.
The indices F and Fis suggested certain degree of inbreeding, with general mean of F and Fis being −0.168 (Table 2) and −0.171 (Table 1), respectively.

Conclusions
This study represents the first attempt to conduct transcriptome sequencing, SSR marker development, and population genetics analysis in the medicinally important genus Archidendron. High-quality flower and flower bud transcriptomes (52,895,540 raw reads, 15.87 Gb in total) were generated and de novo assembled in A. clypearia. A relatively large number (456) of EST-SSR markers were developed in A. clypearia, and high cross-species transferability was revealed in A. lucidum (85.7 %) and A. multifoliolatum (88.1 %). The two A. clypearia populations analyzed need to be conserved considering their relatively high genetic diversity, although population differentiation was relatively low. These results will offer valuable resources and information for further population genetics analysis and breeding applications in Archidendron and related taxa.

Acknowledgements
The authors would like to thank Yaqin Wang and Jiabin Lv for valuable assistance in microsatellite marker experiments and data analyses. We are also grateful to Jingjing Yan for kind help with collection of JNFP samples.