Molecular Insights Into the Genetic Diversity and Population Structure of Artemisia annua L. as Revealed by Insertional Polymorphisms

The study made use of 32 IRAP primers tested on 118 sweet wormwood accessions (Table 1) for initial screening.
Among these primers, 12 single primers (Ty1-Copia, LTR6149, LTR6150, LTR1, Nikita, Sukkula, Tnt1.OL16, ToRTL1, 3′LTR, Tnt-1, 5′LTR1, 5′LTR2) and 20 primer combinations (3′LTR/LTR1, 3′LTR/Tnt-1, Bare1/LTR2, Nikita/LTR1, Nikita/Sukkula, Nikita/Ty1-Copia, Tnt-1/Sukkula, Tnt-1/5′LTR2, Tnt1.OL16/Ty1-Copia, Tnt1.OL16/Nikita, Tnt1.OL16/LTR6150, Tnt1.OL16/5′LTR1, Tnt1.OL16/5′LTR2, Tnt1.OL16/Sukkula, ToRTL1/Sukkula, ToRTL1/5′LTR2, LTR6150/5′LTR1, LTR6150/5′LTR2, 5′LTR1/3′LTR, 5′LTR1/5′LTR2) produced 849 distinguishable, scored loci, of which 819 loci (95.80%) were polymorphic, and 14 primers, LTR6149, LTR1, Sukkula, Tnt1.OL16, 3′LTR, Tnt-1, 3′LTR/LTR1, 3′LTR/Tnt-1, Bare1/LTR2, Nikita/LTR1, Nikita/Sukkula, Tnt-1/Sukkula, Tnt1.OL16/Ty1-Copia


Abstract

Background
Retrotransposons (RTNs) are a main source of variation in plant genomes; they copy and paste themselves into different genomic locations by converting an RNA intermediate back into DNA via reverse transcription.
For that reason, they are widely used in plants as molecular markers for DNA fingerprinting, genetic mapping, and the assessment of genetic variability.

Results
Inter-retrotransposon amplified polymorphisms (IRAPs) were used to measure genetic variability and structure in a collection of 118 sweet wormwood (A. annua) accessions, amplifying and scoring 849 loci with 32 IRAP primers derived from Rosaceae, Gramineae, and Solanaceae retroelements. The single IRAP primer Tnt1.OL16 produced the largest number of markers. The percentage of polymorphic loci (PPL), mean expected heterozygosity (He), number of effective alleles (Ne), and Shannon's information index (I) in the studied collection were 95.80%, 0.30, 1.48, and 0.46, respectively. AMOVA and structure analysis showed no significant genetic structure; however, the NJ dendrogram grouped the four populations into three clusters and depicted a relatively higher level of genetic variation within each population. These clusters were approximately congruent with the corresponding geographical distributions.

Conclusions
In conclusion, low genetic diversity was detected in Iranian sweet wormwood; it could be broadened through the introduction of appropriate exotic or improved germplasm to reduce the effects of inbreeding depression.

Background
Medicinal sweet wormwood (Artemisia annua L., 2n = 18), known by the Persian vernacular name "Dermaneh", is a fast-growing, branched plant species that reaches about 150 cm in height [1]. It is one of 35 species of the genus Artemisia (family Asteraceae) that are widely distributed throughout Iran [2][3][4], and it is highly heterozygous [5]. It is also found, to varying extents, in Australia and in Central, South East, and West Asian countries [6,7].
The plant is a source of artemisinin, an endoperoxide sesquiterpene lactone with anti-malarial properties, at concentrations of 0.01-2% of leaf dry weight [8]. Dried leaves of the plant are also a source of essential oil (1.4-4.0%) and other active metabolites such as sesquiterpene lactones, flavonoids, polyalkynes, antioxidants, and coumarins [9].
Accurate estimation of genetic differences in a specific germplasm is important for the efficient utilization, survival, evolution, and conservation of crop genetic resources in improvement programs [10,11]. For strategic crop improvement programs, information about inter- and intra-population genetic variation in existing A. annua germplasm is crucial for its management and conservation and for the development of new cultivars with higher artemisinin content and better quality of other secondary metabolites [12,13]. Genetic diversity among medicinal plants can be assessed using molecular (DNA), phenotypic (morphological), and biochemical markers [14][15][16], but morphological markers are influenced by environmental factors, are controlled by epistatic and pleiotropic gene effects, and may show little variation [14]. In contrast, molecular markers are not influenced by the environment and overcome these disadvantages. Molecular markers are being developed and used in molecular biology, biotechnology, and genetics for backcross studies, phylogenetics, population genetics, marker-assisted selection, mapping of desired genes, analysis of genetic relationships, construction of linkage maps, and the detection and exploitation of DNA polymorphism.
Any type of molecular marker needs to be selected carefully, because markers differ in their methodologies, applications, and principles, and researchers must weigh these differences before choosing one. DNA-based molecular markers open new possibilities for genome genotyping and provide useful molecular evidence for evaluating genetic diversity. Several studies have measured the genetic diversity and structure of Artemisia populations worldwide during the last ten years, using RAPD [15][16][17][18], AFLP [1], ISSR [18][19][20][21][22], SSR [23], and EST-SSR markers [23,24]. So far, only one study has evaluated the genetic diversity of A. annua in Iran, using four AFLP markers on notably small native populations collected from different areas of northern Iran [1].
Retrotransposon (RTN) marker systems have not previously been used to evaluate sweet wormwood accessions in Iran. It is important to evaluate the population structure and genetic diversity of these exceptionally rare resources. Therefore, this study was carried out to facilitate the efficient utilization and conservation of Iranian sweet wormwood genotypes by determining their genetic diversity and population structure, along with their relationships at the species level. Many features of RTNs, such as their ubiquity, activity, abundance, and general dispersion in the plant genome, offer an excellent foundation for the development of molecular marker systems [25,26]. Different types of DNA molecular markers have been developed from RTNs, which are among the main sources of genetic diversity. Among them, inter-RTN amplified polymorphism (IRAP) reveals polymorphisms by amplifying the segments between two nearby long terminal repeats (LTRs) using primers complementary to the 3′ end of an LTR sequence [25]. Although IRAP is an RTN-based marker system with a high level of polymorphism, it has rarely been used to investigate genetic diversity in medicinal plants such as Crocus [27], Leonurus cardiaca L.
We used the IRAP marker system to detect the activity of LTR RTN families in Iranian sweet wormwood, using elements originally isolated from barley, tobacco, and apple. The aim of the research was to find suitable RTN markers for assessing diversity and to characterize the polymorphism of their integration events, singly and in combination, among 118 sweet wormwood accessions from four populations originating from different eco-geographical locations of Iran.

Retrotransposon insertional polymorphism in sweet wormwood genome and IRAP analysis
The study made use of 32 IRAP primers tested on 118 sweet wormwood accessions (Supplementary Fig. 2). The lowest values of the Ne, He, and I parameters were obtained for the primer combination ToRTL1/Sukkula. The number of scored loci per primer ranged from 12 to 60, with a mean of 26.53 loci per primer. Further details of the primers are shown in Table 1. Supplementary Fig. 2 shows the banding pattern of the Tnt1.OL16 and LTR1 primers in some sweet wormwood accessions.
RTN activity and a comparison of these families are shown in Table 2. To further elucidate the population structure, distance- and model-based cluster analyses were performed. Neighbor-Joining (NJ) clustering was used to examine the pattern of differences among the 118 accessions. Applied to the IRAP data, the unweighted Neighbor-Joining algorithm with the number-of-differences evolutionary distance coefficient grouped the 118 accessions of the four populations into three major clusters (Fig. 1). In the resulting tree, the accessions were grouped mainly by geographical origin, with minor mixing. To resolve the pattern of variation, principal coordinate analysis (PCoA) was used to further elucidate relationships among the accessions and to assess population subdivision.
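The number-of-differences coefficient used for the NJ tree can be illustrated with a minimal sketch. It assumes each accession is a list of 0/1 band scores over the same loci; the accession names and scores below are hypothetical, and the tree building itself (done in MEGA in this study) is not reproduced here.

```python
def number_of_differences(profile_a, profile_b):
    """Count the loci at which two binary band profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def distance_matrix(profiles):
    """Return pairwise number-of-differences distances keyed by accession pair."""
    names = list(profiles)
    return {
        (m, n): number_of_differences(profiles[m], profiles[n])
        for m in names
        for n in names
    }


# Hypothetical band scores for three accessions at four loci.
example_profiles = {
    "acc_A": [1, 0, 1, 1],
    "acc_B": [1, 1, 1, 0],
    "acc_C": [1, 0, 1, 1],
}
example_distances = distance_matrix(example_profiles)
```

A symmetric matrix of this kind is the input a Neighbor-Joining routine would cluster; accessions with identical profiles have distance zero.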
The PCoA bi-plot showed no distinct clustering pattern for the 118 sweet wormwood accessions studied (Fig. 2). The first three axes accounted for 27.00, 19.13, and 15.65% of the genetic variation, respectively, together explaining 61.78% of the total variation. The Mazandaran population was separated from the Golestan population on axis 1. The accessions of the Gilan population were scattered over a large area. The second coordinate could not clearly separate the accessions of Mazandaran, Gilan, Golestan, and East Azerbaijan. In combination, coordinates 1 and 2 could not discriminate all three clusters of the NJ dendrogram. The color codes of the accessions in the two-dimensional PCoA plot were in accordance with the population groups identified by the "structure" analysis (Fig. 3). Because only 61.78% of the variation was captured by the first three coordinates, the sweet wormwood germplasm was also analyzed using the model-based method implemented in the software "structure". Accessions within a cluster are represented by a single colour, while accessions with two different colours indicate admixed forms. The maximal ΔK was detected at K = 2, followed by K = 7. The ΔK value decreased with increasing K, with no ΔK peak at K > 7 (Fig. 3A). The estimated ΔK value was 222.23 for the 118 sweet wormwood accessions, which indicated two subpopulations. More subgroups were observed at K = 7 (Fig. 3C), where the admixture of genotypes confirmed the PCoA results. Most sweet wormwood accessions showed admixture of the four populations in the structure analysis at K = 7 (Fig. 3C).

RTN activity and insertional polymorphism in Iranian A. annua genome
The main purpose of this study was to build a basis for the breeding of A. annua and to determine the level of variation in Iranian populations of sweet wormwood, given the lack of available data on the subject. Transposable elements can spread through a genome by self-duplication and may affect the adaptation and evolutionary potential of their hosts through events such as insertion mutations, gene interruption, increased chromosomal rearrangements, and altered gene expression [34]. Therefore, transposable element-based marker systems can provide reliable information on genotype identification and performance. Recently, Sorkheh et al. [33] showed that molecular genetic diversity measured by RTNs was higher than that measured by amplified fragment length polymorphism (AFLP) in 18 wild almond species.
To date, IRAP markers have not been used to measure genetic diversity in Iranian A. annua accessions. In this study, 32 IRAP primers that amplified polymorphic, clearly separated banding patterns were used to study genetic diversity among 118 sweet wormwood accessions. The single IRAP primers Ty1-Copia, LTR6149, LTR6150, LTR1, Nikita, Sukkula, Tnt1.OL16, ToRTL1, 3′LTR, Tnt-1, 5′LTR1, and 5′LTR2 generated scorable banding patterns, indicating the presence and insertional activity of these elements in the A. annua genome (Table 1). The large number of bands produced by the IRAP primer Tnt1.OL16 supports the view that LTR families tend to form clusters in the A. annua genome [35]. The primers Bare1 and LTR2 alone were monomorphic or did not generate any bands, suggesting infrequent insertion or absence of these retroelements in the A. annua genome; however, Bare1 generated much more polymorphism in combination with LTR2, suggesting that these two RTNs are inserted near or into each other in the A. annua genome. The insertion of Bare1 near LTR2 and other RTNs has been reported in the wild almond genome [33]. The applicability of barley RTNs for genome analysis among closely related genera, across species lines, and sometimes even between plant families has been demonstrated previously [27,31,33,36,37]. Sorkheh et al. [33] indicated that LTR primers designed from barley produced a high level of insertional polymorphism in wild almond and that the elements were active throughout the evolution of Prunus species. The presence and insertional polymorphism of the Nikita, Sukkula, and Bare-1 elements in Hypericum perforatum have also been documented [31]. Du et al. [38] successfully used RTN primers derived from Gramineae to identify and characterize 13 fruit crops and reported that RTN sequences isolated from one species can be used in another plant species.
The high percentage of polymorphism detected by the IRAP markers (95%) suggests that the RTN families used in our study are active in the Iranian sweet wormwood genome. Several studies describe marked variation in RTN content under abiotic and biotic stress conditions, which increased genome size [39,40]. Sorkheh et al. [33] used the same primers to describe relationships and genetic diversity across wild almond species from west, north-west, and central Iran and reported that barley RTN-based primers contributed more markedly to revealing genetic diversity among the Iranian species.

Genetic characterization and relationship of sweet wormwood accessions
The mean values of expected heterozygosity (He), Shannon's information index (I), and number of effective alleles (Ne) (0.21, 0.32, and 1.33, respectively) in the current study are comparable to those reported by Gaafar et al. [22], Huang et al. [20], and Huang et al. [41]. In a study of genetic diversity among 290 Chinese A. halodendron genotypes, Huang et al. [20] reported a high level of genetic variation (I = 0.323, Ne = 1.39, and Nei's gene diversity h = 0.22). In another study, Huang et al. [41] reported average values of I = 0.32, He = 0.21, and Ne = 1.33 in Chinese materials. In a study of the genetic diversity of Egyptian A. judaica using ISSR markers [22], Nei's gene diversity ranged from 0.000 to 0.189 with an average of 0.139, and I ranged from 0.000 to 0.278 with an average of 0.199. These relatively low values indicate moderate insertional activity of the RTNs used, significant inbreeding in the populations studied, and the nature of the floral biology of A. annua, all of which hinder the evaluation and selection of A. annua. In this study, the four populations, regardless of geographic region, tended to have very similar genetic relationships, indicating limited germplasm variation or regional interpopulation hybridization. Another reason for the low level of variability could be human practices such as seed trading. In addition, the strict selection pressure imposed by genetic improvement erodes genetic diversity fully or partially.
AMOVA revealed that 94% of the variation was within populations and 6% among populations; it was concluded that the analyzed sweet wormwood germplasm is only weakly genetically structured. In A. halodendron, Huang et al. [20] reported greater genetic differences within populations (90%) than among populations (10%). Similarly, Huang et al. [41] reported 9% of the variation among populations compared with 91% within populations. The low level of variability among populations can be explained by the populations sharing nearly the same gene pool and experiencing high connectivity, so that genetic exchange within populations was higher, resulting in wide within-population variation. Therefore, genetic variation between and within populations of a medicinal plant species can be very important for breeding and conservation.
The PCoA (Fig. 2) and Bayesian analysis (Fig. 3) results also support weak divergence among these accessions. However, cluster analysis based on the NJ algorithm (Fig. 1) clearly discriminated the accessions into three main groups in concordance with the geographical distribution of the genotypes. Most of the accessions from the Mazandaran, Gilan, and Golestan populations aggregated in clusters I, II, and III, respectively. In addition, two accessions from East Azerbaijan were located in cluster II. In an analysis of the genetic diversity of Judean wormwood populations collected from Egypt, Gaafar et al. [22] also found clear clustering by geographical origin using ISSR markers. Population structure analysis grouped the genotypes into two subpopulations at K = 2, with further subdivision at K = 7, indicating that the accessions were genetically derived from seven subpopulations. No association with place of origin was detected among the accessions, even though they were collected from separate and distinct geographical regions of Iran. Population genetic structure is affected by many factors, including habitat fragmentation, breeding system, gene flow, and the mechanism of seed dispersal [42]. If gene flow (Nm) < 1, populations are highly susceptible to genetic drift [43]. The average Nm value of natural A. annua is 3.987, which was taken to indicate that genetic variation in A. annua is due to genetic drift among populations and that the influence of gene flow within populations was non-significant.
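To make the relation between the AMOVA partition and Nm concrete, the sketch below uses Wright's island-model approximation, Nm ≈ (1 − Fst) / (4 Fst), with the among-population share of variance standing in for Fst. This is an assumption for illustration: the text does not state which estimator produced the reported Nm of 3.987, and the function names are hypothetical.

```python
def among_population_fraction(among_pct, within_pct):
    """Fraction of total variance found among populations (a PhiPT-like value)."""
    total = among_pct + within_pct
    if total <= 0:
        raise ValueError("variance percentages must sum to a positive value")
    return among_pct / total


def gene_flow_nm(fst):
    """Wright's island-model approximation: Nm = (1 - Fst) / (4 * Fst)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# AMOVA partition reported in the text: 6% among, 94% within populations.
phi = among_population_fraction(6.0, 94.0)
nm_estimate = gene_flow_nm(phi)
```

With the 6%/94% partition this gives an Nm of roughly 3.9, close to but not identical with the reported 3.987, which is consistent with the published value having been derived from a slightly different differentiation estimate.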

Conclusion
This study evaluated the population structure, genetic diversity, and relationships of A. annua germplasm across Iran using 18 IRAP primers. RTN-based markers are relatively new and are yet to be fully exploited. This approach can provide comprehensive information about the level of genetic diversity and population structure of A. annua, which could be useful for the conservation and management of A. annua genetic resources. The single IRAP primer Tnt1.OL16 made the most marked contribution to defining the genetic diversity of the studied taxon and its segregation. These results show that the genetic diversity of Iranian sweet wormwood is low, which makes it a distinct, uniform population for the extraction of artemisinin. However, low genetic diversity could lead to inbreeding depression and a number of related problems. Therefore, it is important to broaden the genetic base of sweet wormwood germplasm in Iran, either with material from a place with significant genetic variation, to carry out extended breeding studies, or through the introduction of exotic germplasm from a place with an appropriate eco-geographical background or source population, as the plant is a highly heterozygous, cross-pollinating species with very little self-pollination. This will conserve the plant and reduce the loss of genetic diversity, mitigating the effects of inbreeding depression on these genotypes.

[...] Fig. 1. The seeds were sown in pots of 1-3 cm diameter filled with a mixture of peat and vermiculite in a greenhouse at 25±1°C. Above-ground leaves and stems were collected after about 4 weeks of growth and immediately frozen in 2 mL tubes using liquid nitrogen. Total genomic DNA was extracted from each sample using the CTAB method [44].
The quality and quantity of the DNA were measured using a spectrophotometer (Bio-Photometer 6131, Eppendorf, Germany) [...] non-denaturing polyacrylamide gel and checked using EtBr staining on a Gel-Scan 3000 electrophoresis system (Corbett, Sydney, Australia). The Thermo Scientific GeneRuler 100 bp Plus DNA Ladder (100 to 3000 bp) was used to estimate the size of the amplified fragments.

Data analysis
The absence (0) or presence (1) of clear and distinguishable amplified fragments was scored for IRAP analysis. Only well-separated, bright bands were scored. To compare the extent of activity, variability, and discriminating power of each RTN family, the mean expected heterozygosity (He), number of effective alleles (Ne), and Shannon's information index (I) were calculated for each RTN family, together with the percentage of polymorphic bands (PPB), the number of bands with > 5% frequency, the number of private bands, and the number of less common bands with < 25% and 50% frequency. AMOVA (analysis of molecular variance) was carried out to partition the total genetic variation within and among origins based on the IRAP data. Principal coordinate analysis (PCoA) was computed from genetic distances based on the binary data described above, as implemented in GenAlEx 6.4 [49]. A cluster analysis was used to generate a dendrogram with the Neighbor-Joining (NJ) algorithm and the number-of-differences evolutionary distance coefficient in MEGA 4.0 [50].
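The diversity parameters named above can be sketched for a 0/1 band matrix as follows. The sketch assumes the usual treatment of dominant markers under Hardy-Weinberg equilibrium, in which the null-allele frequency is q = sqrt(1 − band frequency) and p = 1 − q, so that He = 2pq, Ne = 1 / (p² + q²), and I = −(p ln p + q ln q). Whether the software used here applied exactly these formulas is an assumption, and the function names and toy matrix are hypothetical.

```python
import math


def locus_stats(band_scores):
    """Return (He, Ne, I) for one dominant locus scored 0/1 across accessions."""
    band_freq = sum(band_scores) / len(band_scores)
    q = math.sqrt(1.0 - band_freq)  # null (band-absent) allele frequency under HWE
    p = 1.0 - q
    he = 2.0 * p * q
    ne = 1.0 / (p * p + q * q)
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0.0)
    return he, ne, shannon


def summarize(matrix):
    """Average He, Ne, I over loci and report percent polymorphic loci (PPL).

    matrix: list of rows, one row of 0/1 scores per accession.
    """
    loci = list(zip(*matrix))  # transpose: one tuple of scores per locus
    stats = [locus_stats(col) for col in loci]
    n_loci = len(loci)
    polymorphic = sum(1 for col in loci if 0 < sum(col) < len(col))
    return {
        "He": sum(s[0] for s in stats) / n_loci,
        "Ne": sum(s[1] for s in stats) / n_loci,
        "I": sum(s[2] for s in stats) / n_loci,
        "PPL": 100.0 * polymorphic / n_loci,
    }


# Toy matrix: four accessions (rows) scored at two loci (columns).
toy_matrix = [[1, 1], [1, 1], [1, 1], [0, 1]]
toy_summary = summarize(toy_matrix)
```

In the toy matrix the first locus is polymorphic (band frequency 0.75) and the second is monomorphic, so half of the loci count toward PPL and the monomorphic locus contributes He = 0, Ne = 1, and I = 0 to the means.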
The population structure of the selected accessions was analyzed with the Bayesian clustering approach implemented in Structure 2.3.1 [51]. Ten independent replications were run for each number of subpopulations (K) from 1 to 10, with the burn-in period and the number of MCMC iterations both set to 100,000, using an admixture model and correlated allele frequencies. The optimal K value (taken as the true number of clusters) was determined from the posterior probability [ln P(D)] and the ad hoc statistic ΔK, based on the rate of change in ln P(D) between successive K values [52], using the software Structure Harvester. Inferred ancestry estimates of the genotypes (Q-matrix) were extracted for the selected subpopulations [51].

Figure 1 Grouping of 118 sweet wormwood (A. annua) accessions based on IRAP data using the Neighbor-Joining clustering algorithm with the number-of-differences evolutionary distance coefficient.
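The ΔK statistic used to choose K in the Data analysis section can be sketched as follows. The sketch assumes the common formulation in which ΔK is the absolute second-order difference of the mean ln P(D) across successive K, divided by the standard deviation of ln P(D) over the replicate runs at that K. The ln P(D) values below are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def delta_k(lnpd_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnpd_by_k: dict mapping K to a list of ln P(D) values from replicate runs.
    """
    means = {k: mean(v) for k, v in lnpd_by_k.items()}
    sds = {k: stdev(v) for k, v in lnpd_by_k.items()}
    result = {}
    for k in sorted(lnpd_by_k):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            # Absolute second-order rate of change of the mean ln P(D).
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical ln P(D) values from two replicate runs at K = 1, 2, 3.
example_runs = {
    1: [-100.0, -102.0],
    2: [-50.0, -52.0],
    3: [-48.0, -50.0],
}
example_delta_k = delta_k(example_runs)
best_k = max(example_delta_k, key=example_delta_k.get)
```

Because the statistic needs both neighbouring K values, it is undefined at the smallest and largest K tested, which is why ΔK cannot by itself support K = 1.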