Development of nuclear and chloroplast polymorphic microsatellites for Crossostephium chinense (Asteraceae)

Crossostephium chinense is a traditional Chinese medicinal herb that is also often cultivated as an ornamental plant. Previous studies on this species mainly focused on its chemical composition; it has rarely been represented in genetic studies, and thus genomic resources remain scarce. Both chloroplast and nuclear polymorphic microsatellites of C. chinense were screened from genome skimming data of two individuals. Sixty-four and 63 cpSSR markers were identified from the two chloroplast genomes of C. chinense. A total of 133 polymorphic nSSRs were developed. Ten nSSRs were randomly selected to test their transferability across 35 individuals from three populations of C. chinense, and 20 individuals each of Artemisia stolonifera and A. argyi. The selected loci amplified successfully in C. chinense and partially in both Artemisia species. The number of alleles varied from two to nine. The observed heterozygosity and expected heterozygosity per locus ranged from 0.000 to 0.286 and from 0.029 to 0.755, respectively. In this study, we developed polymorphic cpSSR and nSSR markers for C. chinense based on genome skimming sequencing. These genomic resources will be valuable for population genetics and conservation studies in C. chinense and Artemisia.


Introduction
Crossostephium chinense (L.) Makino (Asteraceae) forms a monotypic genus [1], or it is alternatively classified as Artemisia chinensis L. and placed in Artemisia subg. Pacifica C.R. Hobbs & B.G. Baldwin [2]. Whole plants of C. chinense are used in traditional Chinese medicine [3] and are also cultivated for ornamental purposes [4,5]. In China and Vietnam, an infusion of the leaves is used in traditional medicine to treat congestion, cough, and irregularity of the menstrual cycle. In the Philippines and Thailand, an infusion of leaves and branch tips is considered carminative and useful as an emmenagogue [6]. In the wild, populations of C. chinense are restricted to the southern region of China (Zhejiang, Fujian, Guangdong, and Taiwan provinces), the Ryukyus and Minami Iwo Island of Japan, and the Batanes of the Philippines [1,6-9]. This pattern is congruent with the results of ecological niche modeling [2] and with the 215 occurrences available in the GBIF database (GBIF, https://doi.org/10.15468/39omei). The narrow distribution range may be a consequence of its coastal habitat, which is limited to rocks or raised coral outcrops [6,7], as well as its physical adaptation to regional microclimates [2]. In our observations, each site forms an isolated population on an island or in a coastal region. WFO has red-listed C. chinense as a threatened species [10], which has raised public awareness of the importance and conservation of its rare populations. Previous studies of C. chinense mainly focused on its phytochemical composition [3,11,12]. Genetic studies using chloroplast (cp) and nuclear DNA markers of C. chinense have been carried out only in part; in studies devoted to Asteraceae systematics, this species has always been treated as an outgroup for resolving other clades [13]. The scarcity of genetic information may hinder its effective utilization and protection, so further studies are needed.
Chloroplast genomes, which possess an intermediate nucleotide substitution rate, are more conserved than nuclear and mitochondrial genomes [14]. In addition, the non-recombinant nature and generally uniparental inheritance of the cp genome have led to its increasing use as a tool to understand evolutionary history [15], as well as a genetic resource for developing abundant molecular markers such as chloroplast hotspot regions and chloroplast SSRs (cpSSRs) [16]. Unlike cpSSRs, nuclear SSRs (nSSRs) are highly polymorphic, codominant, and biparentally inherited, so they are widely applied in the evaluation of genetic variation [17], the construction of genetic linkage maps [18], and the conservation of genetic resources [19]. The increased availability of genome data at a reasonable cost, coupled with the development of bioinformatics tools [20], has enabled large-scale investigations of molecular markers compared with traditional screening for polymorphic SSR markers [21]. These tools include bioinformatics pipelines such as CandiSSR, which detects candidate polymorphic SSRs from next-generation sequencing data [21].
The de novo or reference-guided assembled chloroplast genomes of two C. chinense accessions are available on NCBI; nevertheless, the microsatellites of C. chinense have never been studied. Thus, this paper specifically aims to use the genome skimming data of C. chinense to develop cpSSR and nSSR markers. A set of randomly selected nSSR markers was further used to validate cross-amplification in 35 individuals collected from three populations of C. chinense, as well as in 20 individuals each of Artemisia stolonifera Maxim. and A. argyi Lévl. et Van.

Plant material and DNA extraction
Fresh young leaves of 35 individuals were sampled from three populations of C. chinense for assessment and validation of genetic markers (Table S1). Genomic DNA was extracted using Plant DNAzol Reagent (LifeFeng, Shanghai) following the manufacturer's protocol. Plant tissues were pulverized in liquid nitrogen and then transferred to a 1.5 ml centrifuge tube containing 1 ml Plant DNAzol. The mixture was incubated in a water bath at 65 ℃ for 30 min, then centrifuged at 12,000 rpm for 5 min. After the addition of 750 μl chloroform, the contents of the vials were mixed, and the mixture was centrifuged for a second time at 12,000 rpm for 5 min. The supernatant was carefully transferred into a new 1.5 ml centrifuge tube, and the genomic DNA was precipitated from the supernatant with 550 μl ice-cold isopropyl alcohol. The precipitated DNA pellet was washed with 1000 μl of 70% ethanol following final centrifugation at 12,000 rpm for 10 min. The ethanol was discarded and the pellet was dried at 37 ℃ for 20 min. The dry pellet was dissolved in 80 μl of deionized water and stored at -20 ℃.

Polymorphic nuclear SSRs (nSSRs) development and validation
The genome skimming data of two C. chinense individuals (C. chinense-ZJWZ, collected from Dongtou, Zhejiang, China; C. chinense-JPBB, collected from Motobu, Okinawa, Japan) that we obtained previously [22] were used to develop polymorphic nuclear SSR markers. The raw data were filtered and assembled into contigs using the CLC de novo assembler beta 4.06 (CLC Inc., Aarhus, Denmark). The chloroplast and mitochondrial contigs of both C. chinense assemblies were removed using BLAST (NCBI BLAST v2.2.31) by comparison with the chloroplast sequence of C. chinense (NCBI accession number: MH708561) and the mitochondrial sequence of Helianthus annuus (NCBI accession number: CM007908). Then, the software CandiSSR [21] was used to identify polymorphic nSSR markers for C. chinense. The search in CandiSSR was performed with a flanking sequence length of 80, a BLAST identity cutoff of 95, a BLAST e-value cutoff of 1e−10, and a BLAST coverage cutoff of 95. For each target nSSR, primers were designed automatically within the pipeline using the Primer3 package [26].
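The three BLAST cutoffs reported above can be illustrated with a short Python sketch. This is not CandiSSR's own code: the `Hit` record, its field names, and the example loci are hypothetical, and coverage is assumed here to mean aligned length as a percentage of query length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit for a flanking sequence (hypothetical record layout)."""
    query_id: str
    identity: float   # percent identity of the alignment
    align_len: int    # alignment length in bp
    query_len: int    # full length of the query flanking sequence in bp
    evalue: float


def passes_cutoffs(hit, min_identity=95.0, max_evalue=1e-10, min_coverage=95.0):
    """Apply the identity, e-value, and coverage cutoffs given in the text."""
    # Assumption: coverage = aligned length as a percentage of query length.
    coverage = 100.0 * hit.align_len / hit.query_len
    return (hit.identity >= min_identity
            and hit.evalue <= max_evalue
            and coverage >= min_coverage)


# Made-up hits, each failing at most one criterion.
hits = [
    Hit("locus1", 98.0, 158, 160, 1e-50),  # passes all three cutoffs
    Hit("locus2", 93.0, 160, 160, 1e-60),  # identity too low
    Hit("locus3", 99.0, 120, 160, 1e-40),  # coverage too low (75%)
    Hit("locus4", 97.0, 160, 160, 1e-5),   # e-value too high
]
kept = [h.query_id for h in hits if passes_cutoffs(h)]
```

Only loci whose flanking sequences pass all three cutoffs would be retained as candidates for primer design.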
Ten of the developed polymorphic nSSR markers were randomly selected to test amplification in 35 individuals of C. chinense collected from three populations at different localities (Table S1). These ten nSSR markers were also tested for transferability to Artemisia stolonifera and A. argyi (n = 20 each; Table S1). PCR amplifications were performed in a final volume of 10 μL, which contained 1 μL of genomic DNA, 5 μL of 2 × Taq MasterMix (CWBIO, China), and 0.1 μM each of forward and reverse fluorescently labeled universal primers (FAM, HEX, TAMRA; Table 1). The PCR conditions consisted of an initial denaturation at 94 °C for 1 min, followed by 28 cycles of denaturation at 94 °C for 30 s, annealing at 50-59 °C for 30 s, and extension at 72 °C for 30 s, and a final extension at 72 °C for 5 min. Fragment lengths of PCR products were analyzed on an ABI PRISM 3720xl Genetic Analyzer (Applied Biosystems). Genotypes were scored using the software GeneMarker v2.2.0 (SoftGenetics, LLC, State College, PA, USA). We estimated genetic diversity parameters, namely the number of alleles and the observed and expected heterozygosity, using CERVUS v3.0 [27]. Deviations from Hardy-Weinberg equilibrium were tested with GENEPOP v4.2 [28].
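The diversity parameters named above can be sketched for a single locus as follows. This is a minimal Python illustration, not the CERVUS implementation. It assumes diploid genotypes scored as pairs of allele sizes and uses Nei's unbiased estimator for expected heterozygosity; whether this matches the CERVUS output exactly is an assumption. The genotypes are made up.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (number of alleles, observed H, expected H) for one locus.

    genotypes: list of (allele1, allele2) tuples for diploid individuals.
    Expected heterozygosity uses Nei's unbiased estimator (an assumption
    about what the software in the text reports).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele counts over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    # Nei's unbiased expected heterozygosity with small-sample correction.
    h_exp = (total / (total - 1)) * (1.0 - sum_sq)
    return len(counts), h_obs, h_exp


# Hypothetical fragment sizes (bp) for four individuals at one locus.
example = [(100, 100), (100, 102), (102, 102), (100, 100)]
n_alleles, h_obs, h_exp = locus_diversity(example)
```

An observed heterozygosity well below the expected value, as in this toy example, is the pattern usually described as an excess of homozygotes.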

Chloroplast genome markers (cpSSRs) development
Sixty-four cpSSR markers were identified from the C. chinense-ZJWZ cp genome (NCBI accession number: MH708561). Among them, 51 markers were located in the large single copy (LSC) region, whereas the small single copy (SSC) and inverted repeat (IR) regions contained seven and six markers, respectively (Fig. 1a-A). Nine cpSSRs each were present in the protein-coding regions and in the introns, whereas 46 were identified in the intergenic spacer regions (Fig. 1a-B). By repeat type, 48 cpSSRs are mononucleotides, 11 are dinucleotides, and five are tetranucleotides (Fig. 1a-C). For C. chinense-JPBB (NCBI accession number: MH708560), 63 cpSSR markers were detected. Fifty cpSSRs were located in the LSC region, whereas seven and six were located in the SSC and IR regions, respectively. Among these markers, 47 are mononucleotides, 11 are dinucleotides, and five are tetranucleotides. The distribution and types of cpSSRs in C. chinense-JPBB are shown in Fig. 1b. All types of repeats were ATC-rich (Fig. 1). Comparative analyses between the two C. chinense chloroplast genomes showed that nine cpSSR loci are polymorphic, of which eight are located in the intergenic regions and one in the coding region (Table 1).
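How repeats of this kind can be located and classified by motif length is sketched below in Python. The text does not state which tool or thresholds were used for cpSSR detection, so the minimum repeat numbers (10, 5, 4, and 3 for mono- to tetranucleotides) and the function name `find_ssrs` are assumptions for illustration only. The scan is simplified and non-overlapping within each motif length.

```python
import re

# Assumed minimum repeat counts per motif length; not taken from the source.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, m in sorted(min_repeats.items()):
        # A motif of length k followed by at least m - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, m - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those are reported under the shorter motif length.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence: an 11-bp polyA run and a six-copy AT repeat.
toy = "CCG" + "A" * 11 + "CGC" + "AT" * 6 + "GGC"
ssrs = find_ssrs(toy)
```

Counting the hits by motif length and by genome region would give tallies of the kind reported above.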

Nuclear microsatellite markers (nSSRs) development and validation
A total of 133 polymorphic nSSR markers were generated for C. chinense using the following screening criteria: (1) similarity of flanking sequences ≥ 90%, and (2) at least one primer pair could be designed for the locus (Table S2). Among them, di-, tri-, tetra-, penta- and hexanucleotides account for 57.10%, 39.90%, 1.50%, 0.75% and 0.75%, respectively (Fig. 2). The ten primer pairs selected for cross-amplification successfully amplified the nSSR loci in 35 C. chinense individuals collected from three populations (Table 3). The number of observed alleles varied from 2 to 9 per locus, while HO and HE ranged from 0.000 to 0.286 and from 0.029 to 0.755, respectively (Table 4). For Artemisia stolonifera and A. argyi, the number of observed alleles ranged from 1 to 6, and HO and HE varied from 0.000 to 0.450 and from 0.000 to 0.565, respectively (Table 4). In addition, four loci (CC19, CC32, CC55, and CC66) showed significant deviations from Hardy-Weinberg equilibrium in C. chinense due to the presence of excess homozygotes.

Discussion
Even though C. chinense has been widely studied for its economic value [4,5] and conservation strategies, this study is the first detailed survey of its genomic resources. C. chinense-ZJWZ and -JPBB, whose chloroplast genomes are 151,024 bp and 151,097 bp long, respectively, share identical gene content [22], which may explain the low level of divergence. Nevertheless, the two sequences harbored 19 highly variable loci (Table 2) that might serve as potential mutational hotspots. Comprehensive mutational hotspot markers screened from the whole chloroplast genome are useful for identifying polymorphic sites that elucidate evolutionary history and help resolve controversial phylogenetic relationships, hybridization issues, and biogeography [29,30]. For instance, 20 mutational hotspots each were recognized for Artemisia scoparia [31], A. maritima, and A. absinthium [32]. Meanwhile, Kim et al. [33] compared 21 Artemisia species (32 accessions) and suggested that accD and ycf1 may represent potential markers to be tested for the whole Asteraceae. Recognition of these two markers is in line with several other studies on genera in Asteraceae, where one or both markers appeared in the list of suggested markers [34,35]. The marker ycf1 in this study represented one of the 19 divergent hotspots although it possessed a lower nucleotide diversity (Pi = 0.0004), implying the potential application of this marker at higher taxonomic levels. Overall, nucleotide diversity was low (Table 2), likely because comparative analyses were made using samples from two localities only. Highly divergent hotspots are usually identified between different species [33-35].
Fig. 2 The distribution of polymorphic nuclear simple sequence repeats (nSSRs) for Crossostephium chinense
Table 3 Characteristics of the ten selected polymorphic nuclear microsatellite markers for Crossostephium chinense
Furthermore, chloroplast genomes of the same species are relatively conserved and therefore show less pronounced polymorphism. Thus, the mutational hotspot regions listed in this study, although low in nucleotide diversity, could still be applied in interpopulation genetic and phylogeographic studies to test biogeographic hypotheses. The nine polymorphic cpSSRs observed between the genomes of C. chinense-ZJWZ and -JPBB are mononucleotide tandem repeats with intraspecific variation, and polyA (polyadenine) is the most frequent motif, found in six primer sets (Table 1). Overall, the repeat length varies between 10 and 12 nucleotides, consisting of either polyA or polyT. Among the studied Asteraceae, loci with abundant A/T content were also identified in A. scoparia [31]. Moreover, mononucleotide SSRs were also the most frequently identified repeats in Artemisia species [31,32], although multiple-nucleotide SSRs are sometimes present at the lowest frequency. A pattern of continuous polyT (polythymine) repeats followed by polyA occurred in atpA-trnR (Table 1). Among the cpSSR markers, three primer sets, rpoC2-rps2 (cpSSR2), atpA-trnR (cpSSR4), and ycf1 (cpSSR9), also fall within the suggested mutational hotspots (Table 1). Repeat sequences have been shown to be crucial in chloroplast genome arrangement and sequence variation [30]. Furthermore, repeat sequences that vary between lineages could be used as microsatellite markers for genetic diversity and population genetics studies of plant species [31,36].
This study is the first to use genome skimming data with CandiSSR to identify suitable polymorphic nSSRs for C. chinense. The significant deviations from Hardy-Weinberg equilibrium at four loci (CC19, CC32, CC55, and CC66) may not result solely from the presence of excess homozygotes; other factors, including the Wahlund effect, inbreeding, null alleles, and sampling effects, are also potential causes of the deviation [37,38]. Ten selected nSSRs were tested for transferability. All ten nSSRs amplified successfully in the three populations of C. chinense, whereas only four markers were applicable to Artemisia stolonifera and A. argyi (Table 4). Verifying the transferability of markers to other plant species could further contribute to the understanding of phylogenetic relationships [39]. Therefore, the transferability of the remaining 123 nSSR markers from this study should be tested in the future. nSSRs have been developed in Asteraceae for Chresta [40] and Solidago [41], as well as to study the hybridization of two Tithonia species [42]. nSSRs have also been applied in other plant groups, for example in transferability tests in Sanguinaria [43] and in genetic structure studies in Salix [44], Euptelea [45], and Engelhardia [46]. The 133 successfully developed polymorphic nuclear microsatellite markers can be further applied to reveal genetic diversity and population structure, and to develop effective conservation and management strategies for C. chinense.
Table 4 Characteristics of the selected ten polymorphic nuclear microsatellite markers in three populations of C. chinense and two species of Artemisia
A = number of alleles per locus; HE = expected heterozygosity; HO = observed heterozygosity; N = number of individuals sampled
a Locality and voucher information are available in Table S1
b Significant deviations from Hardy-Weinberg equilibrium at *P < 0.05, **P < 0.01, and ***P < 0.001, respectively

Conclusions
In summary, 133 polymorphic nuclear microsatellite markers were developed successfully and can be applied to reveal the genetic diversity, population structure, and possible intra- and inter-population gene flow of C. chinense. They could also support effective conservation and management strategies for C. chinense. Moreover, our study confirms the transferability of some of the developed nSSRs to other species and their applicability to other taxa of Artemisia.