Development of Genomic Resources from Crossostephium chinense (Asteraceae) Based on Genome Skimming Data

Crossostephium chinense is a traditional Chinese medicinal herb that is also often cultivated as an ornamental plant. Previous studies on this species mainly focused on its chemical composition; it has rarely and only marginally been represented in genetic studies, so knowledge of its genetic background is limited and genomic resources remain scarce. To develop both chloroplast and nuclear polymorphic microsatellites for C. chinense, potential microsatellites were screened from genome skimming data of two individuals of C. chinense. Sixty-four and 63 cpSSR markers were identified from the two chloroplast genomes of C. chinense. This is the first study to employ genome skimming data together with CandiSSR, and a total of 133 polymorphic nSSRs were developed. Ten nSSRs were randomly selected to test their transferability across 35 individuals from three populations of C. chinense, and 20 individuals each of Artemisia stolonifera and A. argyi. Cross-amplification was successful for C. chinense and partially successful for both Artemisia species. The number of alleles varied from two to nine. The observed heterozygosity and expected heterozygosity per locus ranged from 0.000 to 0.286 and from 0.029 to 0.755, respectively. These genomic resources will be valuable for population genetics and conservation studies in C. chinense and Artemisia.


Introduction
Crossostephium chinense (L.) Makino (Asteraceae) forms a monotypic genus [1], or it is alternatively classified as Artemisia chinensis L. and placed in Artemisia subg. Pacifica C.R. Hobbs & B.G. Baldwin [2]. Whole plants of C. chinense are commonly used in traditional Chinese medicine [3] and are widely cultivated for ornamental purposes [4,5]. In the wild, populations of C. chinense are restricted to southern China (Zhejiang, Fujian, Guangdong, and Taiwan) and the Ryukyus of Japan [6]. This pattern is congruent with the results of ecological niche modeling [2] and with the 215 occurrences available in the GBIF database (GBIF, https://doi.org/10.15468/39omei). The narrow distribution range may be a consequence of its coastal habitat, which is limited to raised coral outcrops [6], as well as its physical adaptation to regional microclimates [2]. Personal observations indicate that each site forms an isolated population, especially on islands and in coastal regions. The listing of C. chinense as a threatened species by WFO [7] has raised public awareness of the importance and conservation of these rare populations. Previous studies of C. chinense mainly focused on its phytochemical composition [3,8,9]. Genetic studies using chloroplast and nuclear DNA fragments of C. chinense have been carried out only in part, and in studies devoted to Asteraceae systematics the species has always been treated as a boundary species for clarifying other clades [10]. This scarcity of genetic information may hinder its effective utilization and protection, so further studies are needed.
Chloroplast genomes possess an intermediate nucleotide substitution rate and are more conserved than nuclear and mitochondrial genomes [11]. In addition, their non-recombinant nature and generally uniparental inheritance have led to the increasing use of the cp genome as a tool for understanding evolutionary history [12], and as a genetic resource for developing abundant molecular markers such as chloroplast hotspot regions and chloroplast SSRs (cpSSRs) [13]. Unlike cpSSRs, nuclear SSRs (nSSRs) are highly polymorphic, codominant, and biparentally inherited, so they are widely applied to the evaluation of genetic variation [14], the construction of genetic linkage maps [15], and the conservation of genetic resources [16]. The wide availability of genomic data at reasonable cost, coupled with a series of bioinformatics tools [17], has overcome the limitations that traditional polymorphic SSR marker screening imposed on large-scale investigations of molecular markers [18]. These tools include bioinformatics pipelines such as CandiSSR, which detects candidate polymorphic SSRs from next-generation sequencing data [18].
De novo or reference-guided assemblies of the chloroplast genomes of two C. chinense accessions are available on NCBI; nevertheless, the microsatellites of C. chinense have never been studied. This paper therefore aims to use the genome skimming data of C. chinense to develop cpSSR and nSSR markers. A set of randomly selected nSSR markers was further used to validate cross-amplification in 35 individuals collected from three populations of C. chinense, as well as 20 individuals each of Artemisia stolonifera and A. argyi.

Materials And Methods
Plant material and DNA extraction
Fresh young leaves were sampled from a total of 35 individuals from three populations of C. chinense for assessment and validation of the genetic markers (Table S1). Genomic DNA was extracted using Plant DNAzol Reagent (LifeFeng, Shanghai) following the manufacturer's protocol. After isolation, the material was frozen until downstream analyses.

Polymorphic nuclear SSRs (nSSRs) development and validation
The genome skimming data of two C. chinense individuals that we obtained previously were used to develop polymorphic nuclear SSR markers [19]. The raw data were filtered and assembled into contigs using the CLC de novo assembler beta 4.06 (CLC Inc., Aarhus, Denmark). Chloroplast and mitochondrial contigs from both C. chinense assemblies were removed by BLAST searches (NCBI BLAST v2.2.31) against the chloroplast sequence of C. chinense (NCBI accession number: MH708561) and the mitochondrial sequence of Helianthus annuus (NCBI accession number: CM007908). CandiSSR [18] was then used to identify polymorphic nSSR markers for C. chinense, with the flanking sequence length set to 80, the BLAST identity cutoff set to 95, the BLAST e-value cutoff set to 1e-10, and the BLAST coverage cutoff set to 95. For each target nSSR, primers were designed automatically within the pipeline using the Primer3 package [23].
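The organelle-filtering step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' script: it assumes the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`), and the function names, the contig IDs, and the identity and e-value thresholds are all hypothetical, since the text does not state the cutoffs used for removing chloroplast and mitochondrial contigs.

```python
def organelle_contig_ids(blast_tabular_lines, min_identity=90.0, max_evalue=1e-10):
    """Return IDs of contigs with a qualifying hit to an organelle reference.

    Expects BLAST tabular lines (outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The default thresholds are illustrative assumptions.
    """
    flagged = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        identity = float(fields[2])
        evalue = float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            flagged.add(query_id)
    return flagged


def keep_nuclear_contigs(contigs, flagged_ids):
    """Drop flagged contigs from a {contig_id: sequence} dictionary."""
    return {cid: seq for cid, seq in contigs.items() if cid not in flagged_ids}


# Toy example with made-up contigs and hits against the two reference accessions.
hits = [
    "contig_1\tMH708561\t99.5\t1200\t6\t0\t1\t1200\t500\t1699\t0.0\t2150",
    "contig_3\tCM007908\t85.0\t300\t45\t0\t1\t300\t10\t309\t1e-50\t250",
]
contigs = {"contig_1": "ACGT", "contig_2": "GGCC", "contig_3": "TTAA"}
flagged = organelle_contig_ids(hits)
nuclear = keep_nuclear_contigs(contigs, flagged)
```

In this toy input only `contig_1` passes both thresholds, so it is the only contig removed before the remaining contigs would be passed on to SSR detection.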
Ten of the developed polymorphic nSSR markers were randomly selected to test their transferability across 35 individuals of C. chinense collected from three populations at different localities (Table S1). These ten nSSR markers were also tested for cross-amplification in Artemisia stolonifera and A. argyi (n = 20 each; Table S1). PCR amplifications were performed in a final volume of 10 μL, which contained 1 μL of genomic DNA, 5 μL of 2× Taq MasterMix (CWBIO, China), and 0.1 μM each of the forward and reverse fluorescently labeled universal primers (FAM, HEX, TAMRA; Table 1). The PCR program consisted of an initial denaturation at 94°C for 1 min, followed by 28 cycles of denaturation at 94°C for 30 s, annealing at 50-59°C for 30 s, and extension at 72°C for 30 s, with a final extension at 72°C for 5 min. Fragment lengths of the PCR products were analyzed on an ABI PRISM 3720xl Genetic Analyzer (Applied Biosystems). Genotypes were scored using GeneMarker v2.2.0 (SoftGenetics, LLC, State College, PA, USA). Deviations from Hardy-Weinberg equilibrium were tested in GENEPOP v4.2 [24]. Genetic diversity parameters, including the number of alleles and the observed and expected heterozygosity, were estimated using CERVUS v3.0 [25].
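For readers unfamiliar with the diversity parameters listed above, the following Python sketch shows how the number of alleles, observed heterozygosity (Ho), and expected heterozygosity (He) can be computed for a single diploid locus. It is an illustration only: the genotypes are invented, the function name is hypothetical, and He is computed with Nei's unbiased estimator, which is an assumption and not necessarily the exact calculation performed by CERVUS.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (number of alleles, Ho, He) for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples; None marks a missing genotype.
    He uses Nei's unbiased estimator: (2n / (2n - 1)) * (1 - sum(p_i ** 2)).
    """
    scored = [g for g in genotypes if g is not None]
    n = len(scored)
    if n == 0:
        raise ValueError("no scored genotypes at this locus")
    allele_counts = Counter(allele for g in scored for allele in g)
    total_alleles = 2 * n
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in scored if a != b) / n
    # Expected heterozygosity from allele frequencies, with small-sample correction.
    sum_p_squared = sum((c / total_alleles) ** 2 for c in allele_counts.values())
    he = (total_alleles / (total_alleles - 1)) * (1 - sum_p_squared)
    return len(allele_counts), ho, he


# Invented fragment sizes (bp) for four individuals at one locus.
example = [(150, 150), (150, 152), (152, 152), (150, 150)]
n_alleles, ho, he = locus_diversity(example)
```

A locus where Ho falls well below He, as in several loci reported here, is the pattern that a Hardy-Weinberg test then evaluates formally.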

Chloroplast genome markers (cpSSRs) development
Sixty-four cpSSR markers were identified from the C. chinense-ZJWZ cp genome. Among them, 51 were located in the LSC region, whereas the SSC and IR regions contained seven and six, respectively (Figure 1a; Table 1).
Nineteen SNPs, also referred to here as mutational hotspots, were detected from the pairwise alignment of the two C. chinense chloroplast genomes. These include nine SNPs in intergenic regions, two in introns, and eight in coding sequences (Table 2). All SNP markers were located in the large and small single-copy regions (LSC and SSC). Each region contained one substitution type, except atpA-trnR, ndhA, and ycf1, which each contained two. Among the six substitution types, T to G and A to C had the highest frequencies. Nucleotide diversity was low across the sixteen SNP markers, ranging from 0.0002 (rpoC2) to 0.0160 (atpA-trnR; Table 2).
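The pairwise comparison summarized above can be illustrated with a short Python sketch. It assumes two sequences that are already aligned to equal length; the function name and the toy sequences are hypothetical. With only two sequences, nucleotide diversity (Pi) for a region reduces to the proportion of compared sites that differ, which is what the sketch computes.

```python
def pairwise_snps(seq_a, seq_b):
    """List SNPs between two aligned sequences and return (snps, pi).

    snps: list of (1-based alignment position, "X>Y" substitution type).
    pi:   differing sites divided by compared sites; columns with gaps or
          ambiguous bases are skipped and not counted as compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    compared = 0
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not a comparable site
        compared += 1
        if a != b:
            snps.append((pos, f"{a}>{b}"))
    pi = len(snps) / compared if compared else 0.0
    return snps, pi


# Toy 10-bp alignment containing one T>G and one A>C substitution.
snps, pi = pairwise_snps("ATGCTTGACA", "ATGCGTGACC")
```

Applied per region (for example one intergenic spacer at a time), the same calculation gives region-level values comparable in kind to those reported in Table 2.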

Nuclear microsatellite markers (nSSRs) development and validation
A total of 133 polymorphic nSSR markers were generated for C. chinense, screened using the criteria of similarity < 90% and no available markers designed (Table S2). The standard deviation of these markers ranged from 0.5 to 2.5. Among them, di-, tri-, tetra-, penta-, and hexanucleotide repeats accounted for 57.10%, 39.90%, 1.50%, 0.75%, and 0.75%, respectively (Figure 2). The ten primer pairs selected for cross-amplification successfully amplified the nSSR loci in the 35 C. chinense individuals collected from three populations (

Discussion
This is the first detailed study of the genomic resources of C. chinense, even though conservation strategies should already have been implemented given its economic importance [4,5]. The chloroplast genomes of C. chinense-ZJWZ and -JPBB, with lengths of 151,024 bp and 151,097 bp, share identical gene content [19], so divergence hotspots could not be detected in C. chinense. Nevertheless, the two sequences harbor 19 variable loci (Table 2) that might serve as potential mutational hotspots. Comprehensive screening of mutational hotspot markers across the whole chloroplast genome is useful for identifying polymorphic sites that help to elucidate evolutionary history and to resolve controversial phylogenetic relationships, hybridization issues, and biogeography [26,27]. For instance, 20 mutational hotspots each were recognized for Artemisia scoparia [28], and for A. maritima and A. absinthium [29]. Meanwhile, Kim et al. [30] compared 21 Artemisia species (32 accessions) and suggested that accD and ycf1 may be potential markers to be tested across the whole of Asteraceae. The recognition of these two markers seems to be in line with several other studies on genera of Asteraceae, in which one or both markers appeared among the suggested loci [31,32]. The marker ycf1 is included among the 19 divergent hotspots, although it has a low nucleotide diversity (Pi = 0.0004), which implies the potential application of these markers across Asteraceae species. Overall, nucleotide diversity in C. chinense is low (Table 2), likely because the comparative analysis was made within a single species sampled from two localities. Highly divergent hotspots are usually identified between closely related species [30,31,32]. Furthermore, chloroplast genomes of the same species are relatively conserved and show less remarkable polymorphism. Thus, the mutational hotspot regions listed in this study, despite their low nucleotide diversity, could still be applied in inter-population genetic and phylogeographic studies to test biogeographic origin.
The nine polymorphic cpSSRs observed between the chloroplast genomes of C. chinense-ZJWZ and -JPBB are mononucleotide tandem repeats showing intraspecific variation, with polyA (polyadenine) being the most common motif, present in six primer sets (Table 1). Overall, the repeats vary between 10 and 12 nucleotides in length and consist of either polyA or polyT. Among reported Asteraceae, loci rich in A/T were also found in A. scoparia [28]. Moreover, mononucleotide SSRs were also the most frequently identified type in Artemisia species [28,29], although SSRs with longer motifs are sometimes present at low frequency. A run of polyT (polythymine) followed by polyA occurs in atpA-trnR (Table 1). Among the cpSSR markers, three primer sets, rpoC2-rps2 (cpSSR2), atpA-trnR (cpSSR4), and ycf1 (cpSSR9), were also identified as mutational hotspots (Table 1). Repeat sequences have been shown to be crucial in chloroplast genome arrangement and sequence variation [27]. Further, the variability of repeat sequences between lineages makes them useful as microsatellite markers for genetic diversity and population genetics studies of plant species [28,33].
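Mononucleotide repeats of the kind described above can be located with a simple pattern scan, sketched below in Python. The minimum length of 10 is taken from the repeat lengths reported here and is only an assumed threshold, because the search settings used to detect the cpSSRs are not given; the function name and the toy sequence are hypothetical.

```python
import re


def find_mono_ssrs(sequence, min_repeats=10):
    """Return (1-based start, base, run length) for each mononucleotide run.

    A run qualifies when the same base (A, C, G, or T) occurs at least
    min_repeats times in a row.
    """
    # One captured base followed by at least (min_repeats - 1) copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_repeats - 1))
    return [
        (match.start() + 1, match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]


# Toy sequence: an 11-bp polyA, a 10-bp polyT, and a 6-bp polyA that is too short.
toy = "GGC" + "A" * 11 + "CGC" + "T" * 10 + "GCG" + "A" * 6
ssrs = find_mono_ssrs(toy)
```

Running the same scan on both chloroplast genomes and comparing run lengths at matching loci is one simple way to flag candidate polymorphic cpSSRs for primer design.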
In this study, the use of genome skimming data with CandiSSR represents the first attempt in Asteraceae to identify suitable polymorphic nSSRs for C. chinense. The estimates of expected heterozygosity showed significant deviation at four loci (CC19, CC32, CC55, and CC66), which may not be due solely to an excess of homozygotes. Other factors, including the Wahlund effect, inbreeding, null alleles, and sampling effects, are also potential causes of the deviation [34,35]. The ten selected nSSRs transferred successfully across the populations of C. chinense, whereas only four nSSRs were applicable to Artemisia stolonifera and A. argyi (

Conclusions
In summary, 133 polymorphic nuclear microsatellite markers were successfully developed and can be applied to reveal the genetic diversity, population structure, and possible intra- and inter-population gene flow of C. chinense. They could also support effective conservation and management strategies for C. chinense. Moreover, our study confirms that the nSSRs are transferable across species and applicable to Artemisia, which implies their potential use in robust genetic studies.

Table 4. Characteristics of the selected ten polymorphic nuclear microsatellite markers in three populations of C. chinense and two species of Artemisia.