Novel repetitive sequences decipher the evolution and phylogeny in Carthamus L

Repetitive sequences are ubiquitous features of eukaryotic genomes and contribute up to 70-80% of nuclear genomic DNA. They are known to influence genome evolution and organization and play an important role in genome remodelling. The widespread distribution and sufficient conservation of repeats reinforce the value of repetitive DNA sequences as markers of evolutionary processes. Phylogeny reconstruction based on repetitive DNA is consistent in resolving expected phylogenetic and evolutionary relationships. In the present study, we report the isolation and characterization of four novel repetitive sequences (pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I) from Carthamus tinctorius. A detailed phylogenetic analysis of 18 taxa belonging to 7 species of Carthamus was also carried out with pCtHaeIII-I and pCtHaeIII-II; it clearly indicated concerted evolution while delineating the phylogenetic relationships among the 18 taxa studied. This understanding can assist marker-assisted genetic improvement/enhancement programmes in this crop species.


Background
The genus Carthamus (Asteraceae) includes about 25 species and subspecies distributed from Spain and North Africa through the Middle East to northern India. Carthamus tinctorius is grown commercially for its edible seed oil and the orange-red dye carthamine, and also has applications in herbal medicine.
The genus Carthamus consists of taxa with five gametic numbers (n=10, 11, 12, 22, and 32), including diploid as well as polyploid species. Most of the wild species of the genus, as well as the cultivated safflower, are diploids with 2n=2x=20, 22 and 24, whereas the polyploid taxa exhibit 2n=4x=44 and 2n=6x=64 chromosomes. The taxonomic classification of the genus has seen many revisions (refer to Table 1, Mehrotra et al.,
Total genomic DNA of 18 taxa of Carthamus was extracted from young, tender leaves by a modified CTAB method as detailed by Porebski et al. (1997). The quality and quantity of the DNA were determined by agarose gel electrophoresis (AGE) on a 0.8% agarose gel.
Total genomic DNA (100 mg/sample) of Carthamus tinctorius was digested separately with the restriction endonucleases HaeIII (at 37 °C) and TaqI (at 65 °C). The digested DNA was fractionated overnight at 15 V on a 0.85% agarose gel prepared in 1X TAE buffer (0.04 M Tris-acetate, 1 mM Na2EDTA). Fluorescein-labelled, HindIII-digested lambda DNA served as the molecular size marker. After electrophoresis, the gel was photographed under UV light (Fig. 1).

Colony Hybridization, Southern Hybridization and Dot-Blot
The HaeIII- and TaqI-digested DNA fragments of size <1000 bp were separately eluted from the gel and purified using an agarose gel extraction kit (Qiagen) according to the manufacturer's instructions. The purified DNA fragments were then cloned, transformed into E. coli strain DH5α, and plated on 2% Luria broth (1.5% agar) medium with 100 mg/ml ampicillin.
The DNA library thus obtained was used for colony hybridization purpose.
For preparing colony lifts, petri dishes with bacterial colonies were precooled for at least 30 min at 4 °C. A nylon membrane disc matching the size of the petri dish was allowed to sit on the agar surface. The disc position was marked at several points with a pin to ensure correct orientation of the colonies in subsequent manipulations. The membrane was removed from the petri dish after 1 min, in one continuous movement, using blunt-ended forceps, and placed colony side up on Whatman No. 1 paper for drying.
DNA was liberated from the colonies, denatured and fixed to the membrane by placing the membrane discs, colony side up, on a series of 3 MM paper pads saturated with the respective solutions. Membrane discs were placed first on denaturation buffer for 2-5 min and then twice on neutralization buffer for 3 min each. Finally, the membrane was washed vigorously in 2X SSC to remove proteinaceous debris. The membrane disc, DNA side up, was then transferred to a pad of 3 MM paper and air dried. The DNA was fixed to the membrane by baking for 2 h at 80 °C, and the membrane was stored in saran wrap until further use. Total genomic DNA of Carthamus tinctorius was used as the probe for colony hybridization. Labelling of the probe and the subsequent hybridization reactions were performed according to Mehrotra et al. (2013).
The colonies with bright signals were expected to harbor repetitive sequences and were sequenced. The identified repetitive sequences were designated as pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I.
Total genomic DNAs (1µg/sample) of 18 taxa of Carthamus were separately digested with Hae III and Taq I restriction endonucleases, separated in 0.8 % agarose gels, and then transferred onto a nylon membrane (Hybond N+, Amersham, UK) by alkaline transfer method (Reed and Mann, 1985). Total genomic DNAs (50ng/sample) were also dot-blotted onto a nylon membrane (Hybond N+, Amersham, UK) and allowed to dry at room temperature. The membrane was then baked at 80 °C for 1 h. Finally, the membrane was rinsed in 2× SSC and was wrapped in saran wrap and stored wet at 4 °C until further use.
Hybridizations were performed using the four identified repetitive sequences separately as probes, according to Mehrotra et al. (2013).

PCR Amplification, Cloning and Sequencing
Each of the four repetitive sequences (pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I) was amplified by PCR in each of the 18 taxa of Carthamus using primers designed by the authors from the C. tinctorius sequences with Primer 3 software. Primer sequences are detailed in Table 6. Amplification and cloning of the repetitive sequences were performed according to Mehrotra et al. (2013). The amplified products were cloned into the pGEM-T Easy vector (Promega Co., USA) in E. coli strain DH5α, and positive clones (4 clones of each sequence in each taxon) were sequenced at the DNA Sequencing Facility, University of Delhi (South Campus), India.

DNA Sequence Analysis
Cloned sequences were analyzed for homology to known nucleotide sequences in the databases (GenBank, EMBL) using BLAST from NCBI and the PlantSat database (http://w3lamc.umbr.cas.cz/PlantSat). Sequences have been submitted to GenBank under the accession numbers KX986356-KX986359. Dot-matrix analyses of self-comparisons of the repetitive sequences were done at different matrix stringencies using the MegAlign application from the Lasergene '99 software package. The repetitive sequences were also analyzed using a predictive model of sequence-dependent DNA bending. The bendability/curvature propensity plot was calculated according to Goodsell and Dickerson (1994), Burkner et al. (1995) and the consensus bendability scale.
The values of the curvature are presented as the deflection angle per 10.5 residue helical turn (1°/bp). The maximum curvature peak is localized within the monomer satellite DNA consensus sequence. DNA Motif search was performed from Prosite documentation using MOTIF tool of Genome Net database. Retroposon finder was used to search for any retroposons in the sequences.
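The idea behind a dot-matrix self-comparison can be illustrated with a short sketch. This is not the MegAlign implementation; the window size, the match threshold and the toy sequence are illustrative assumptions. A tandemly repeated unit shows up as matches lying off the main diagonal, at an offset equal to the repeat length.

```python
def dot_matrix(seq_a, seq_b, window=8, min_matches=8):
    """Return (i, j) pairs where a window of seq_a matches a window of seq_b.

    A pair is reported when at least `min_matches` of the `window` aligned
    positions are identical (a simple stand-in for "matrix stringency").
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Hypothetical toy sequence: an 8-bp unit repeated twice (not from the study).
toy = "ACGTTGCA" * 2
dots = dot_matrix(toy, toy)

# Offsets of matches above the main diagonal reveal the repeat period.
off_diagonal_offsets = sorted({j - i for i, j in dots if j > i})
print(off_diagonal_offsets)
```

Lowering `min_matches` relative to `window` relaxes the stringency and lets diverged copies of a repeat appear as well.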

Phylogenetic Analysis
For sequence data, alignment was done with the Clustal X program (Saitou and Nei, 1987; Thompson et al., 1997) using default settings, with a fixed gap penalty of 6.66 and a DNA transition weight of 0.5 in the multiple alignment parameter option. The presence of phylogenetic signal was assessed by likelihood mapping analysis (LMA), which is based on quartet analysis, using TreePuzzle-5.0 software (Strimmer and Haesler, 1997). Neighbor joining and maximum parsimony methods were used to create phylogenetic trees from the aligned sequence data matrix using PAUP*4.0b8 (Swofford, 2002). Gaps were treated as missing data. Given the large size of the data set, heuristic searches used the tree bisection-reconnection (TBR) option with MULPARS and ACCTRAN optimization. Branch support was assessed using 100 bootstrap replicates with 10 random additions per replicate using TBR and MULPARS. A 50% majority-rule consensus tree was calculated from the most parsimonious trees using the CONTREE command in PAUP.

Results
Restriction digestion of total genomic DNA of Carthamus tinctorius with HaeIII and TaqI, separately, showed a smear with some prominent bands between 1000 and 500 bp (Fig. 1). Colony hybridization of the HaeIII and TaqI libraries of C. tinctorius with total genomic DNA of C. tinctorius revealed colonies with faint and bright signals. Among these, four colonies with bright signals were considered to harbor repetitive sequences and were selected for further analysis. These clones were designated pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I (Fig. 2). Each repetitive sequence was analyzed separately.

Restriction Analysis of Genomic DNA
Hybridization of HaeIII-digested total genomic DNA of 18 taxa of Carthamus with pCtHaeIII-I revealed regular periodicity of the hybridization bands in a typical ladder pattern, with the smallest visible band at 340 bp (Fig. 3a). The hybridization profile of pCtHaeIII-II for the 18 taxa showed homogeneous presence of bands at the 284 bp and 568 bp positions (Fig. 4a). Hybridization with pCtHaeIII-III in the 18 taxa showed homogeneous presence of a prominent band at 158 bp and a faint band at the 316 bp position (Fig. 5a).
Hybridization signals of longer DNA fragments were gradually less pronounced. Southern hybridization with pCtTaqI-I in the 18 taxa showed homogeneous presence of a single prominent band at 362 bp (Fig. 6a). In each case, the hybridization pattern did not change with increasing amounts of enzyme or with longer incubation times, suggesting that digestions were complete and that the multiple bands were due to alterations in restriction sites.
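The band sizes reported above are consistent with monomer and dimer units of a tandem repeat (568 = 2 × 284 bp and 316 = 2 × 158 bp). The short sketch below makes this arithmetic explicit; the function names are illustrative, and exact integer multiples are assumed, whereas band sizes estimated from a gel would need a tolerance.

```python
def ladder_sizes(monomer_bp, max_multimer=4):
    """Expected band sizes (bp) for monomer, dimer, ... of a tandem repeat."""
    return [monomer_bp * n for n in range(1, max_multimer + 1)]


def multimer_order(band_bp, monomer_bp):
    """Return n if band_bp is exactly n x monomer_bp, otherwise None."""
    n, remainder = divmod(band_bp, monomer_bp)
    return n if remainder == 0 and n > 0 else None


# Monomer sizes reported in the text for the three HaeIII repeats.
for name, monomer in [("pCtHaeIII-I", 340), ("pCtHaeIII-II", 284), ("pCtHaeIII-III", 158)]:
    print(name, ladder_sizes(monomer, max_multimer=3))

print(multimer_order(568, 284))  # dimer of pCtHaeIII-II
print(multimer_order(316, 158))  # dimer of pCtHaeIII-III
```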
The dot-blots of total genomic DNA of 18 taxa of Carthamus, probed separately with pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I, showed strong signals (Figs. 3b, 4b, 5b, 6b), suggesting the repetitive nature of these sequences in all the taxa.

Sequence Analysis and Characterization
The repetitive sequences pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I were found to be 340bp, 284bp, 158bp and 362bp in length respectively (Table 1). The sequences did not show any significant similarity to the previously reported sequences when subjected to homology searches in GenBank, EMBL, DDBJ and PDB databases using BLAST.
pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I had around 28%, 45%, 51% and 39% GC content, respectively. For each sequence, base changes were analyzed among the clones of each taxon separately. Almost all the changes were single base pair substitutions, in which transitions and transversions occurred evenly. The base changes did not seem to be clustered in any restricted regions. Microsatellites were also evident in the four sequences (Table 5). The sequences showed the presence of GG, GA, and AG nearest neighbours. Sequence analysis also revealed frequent occurrence of GGT and GTT trinucleotides and the presence of poly-A tracts and the pentanucleotide CAAAA (or its inverse complement TTTTG). A perfect polyadenylation signal, AATAAA, was present in pCtHaeIII-I and pCtHaeIII-III.
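The quantities described above (GC content, motif occurrences and inverse complements) are simple to compute. The following is a minimal sketch, not the software used in the study; the helper names and the toy sequence are hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Inverse complement of a DNA sequence (A<->T, C<->G, reversed)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """0-based start positions of every (possibly overlapping) motif hit."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


# Hypothetical toy sequence, chosen only to contain the motifs named in the text.
toy = "AATAAAGGTCAAAATTTTG"

print(round(gc_content(toy), 1))      # GC percentage of the toy sequence
print(reverse_complement("CAAAA"))    # the inverse complement named in the text
print(find_motif(toy, "AATAAA"))      # polyadenylation-signal-like motif
print(find_motif(toy, "CAAAA"), find_motif(toy, "TTTTG"))
```

The AT content follows directly as 100 minus the GC percentage.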

Phylogenetic Analysis
Detailed phylogenetic analysis of the genus Carthamus was carried out with pCtHaeIII-I and pCtHaeIII-II repetitive sequences using Carthamus arborescens (2n=24) as outgroup.
The monomer units of pCtHaeIII-I and pCtHaeIII-II of 18 taxa of Carthamus were separately cloned and sequenced. Four randomly selected clones for pCtHaeIII-I and pCtHaeIII-II were sequenced for each of the 18 diploid (2n = 20, 24) and polyploid (2n = 44, 64) taxa.
Interclonal sequence variation of each sequence ranged from 2-5% within each taxon. All phylogenetic reconstructions showed that repeat types in each taxon were more closely related to one another than to repeat types of the other taxa. Therefore, a consensus sequence was obtained for each repeat in each of the 18 taxa separately and was used for further phylogenetic analysis. The two sequences were analyzed separately for phylogeny.
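How the consensus sequences were derived is not detailed here. One common approach, shown below purely as an illustrative assumption, is a column-wise majority rule over the aligned clones of a taxon, with ties reported as the ambiguity code N. The four toy clones are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_clones, ambiguous="N"):
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Columns where two or more characters tie for the highest count are
    written as `ambiguous`.
    """
    if len({len(s) for s in aligned_clones}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_clones):
        counts = Counter(column)
        best = max(counts.values())
        winners = [base for base, n in counts.items() if n == best]
        consensus.append(winners[0] if len(winners) == 1 else ambiguous)
    return "".join(consensus)


# Four hypothetical aligned clones from one taxon (toy data).
clones = ["ACGTAC", "ACGTAC", "ACTTAC", "ACGTAA"]
print(majority_consensus(clones))
```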

pCtHaeIII-I
The length of the amplified pCtHaeIII-I repetitive sequence in the taxa surveyed varied from 306 bp to 320 bp. Intertaxa sequence divergence in Carthamus averaged 8.59%. The average sequence divergence within the lanatus complex was 6.48%. The consensus tree and the NJ tree shared similar topologies (Fig. 3i,j). The parsimony analysis of pCtHaeIII-I resulted in the strict consensus tree (Fig. 3j; Table 2). Within the ingroup, 287 indels were present ranging from 1 to 15. One indel, a 1 bp deletion, separated the polyploid taxa from the diploid taxa. Another indel, a 1 bp deletion, was present in 6 Carthamus taxa (C. tinctorius tinctorius, C. tinctorius inermis, C. oxyacantha, C. palaestinus, C. glaucus and C. arborescens). There were 5 synonymous substitutions within the ingroup.
Likelihood mapping analysis of the pCtHaeIII-I sequence data revealed that 81.8% of all quartets fell within the three regions representing a well-resolved phylogeny, 4.9% were unresolved and 13.4% showed star-like evolution. The percentage of well-resolved quartets was thus high for this sequence data set (Fig. 3h).
The inter taxa genetic similarity indices ranged from 0.8492 between C. tinctorius inermis and C. lanatus creticus to 0.9965 between C. oxyacantha and C. tinctorius inermis, with a mean value of 0.854 (Table 3). Based on pCtHaeIII-I repetitive sequence of the 18 taxa of

pCtHaeIII-II
The length of amplified pCtHaeIII-II repetitive sequence in the taxa surveyed, varied from pCtHaeIII-II. C. arborescens showed highest divergence. The average sequence divergence within the lanatus complex was 16.59%. The consensus tree had a topology very similar to that of the NJ tree (Fig. 4i,j). The parsimony analysis of pCtHaeIII-II resulted in the strict consensus tree (Fig. 4j) having a length of 220 steps, with a consistency index (CI) of 0.7000, a CI excluding uninformative characters of 0.6489, a homoplasy index (HI) of 0.3000, an HI excluding uninformative characters of 0.3511, and a retention index of 0.8226. There were 100 parsimony-informative sites (Table 2). Within the ingroup, 44 indels were present ranging from 1 to 5. One indel, a 1 bp deletion in all polyploid taxa, separated them from the diploid taxa. There were 4 synonymous substitutions within the ingroup.
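The tree statistics reported for pCtHaeIII-II are linked by two standard identities: the consistency index is the minimum possible number of steps divided by the observed tree length, and the homoplasy index is one minus the consistency index. The sketch below checks the reported values against these identities. The minimum step count of 154 is not stated in the text; it is inferred here from CI × tree length and should be read as an assumption.

```python
import math


def consistency_index(min_steps, observed_steps):
    """CI = minimum possible number of steps / observed tree length."""
    return min_steps / observed_steps


def homoplasy_index(ci):
    """HI = 1 - CI."""
    return 1.0 - ci


tree_length = 220
ci_all, ci_informative = 0.7000, 0.6489  # values reported in the text

# Inferred (not reported) minimum number of steps implied by CI and length.
implied_min_steps = round(ci_all * tree_length)

print(implied_min_steps)
print(round(homoplasy_index(ci_all), 4))          # compare with reported HI
print(round(homoplasy_index(ci_informative), 4))  # compare with reported HI excl.
```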
Likelihood mapping analysis of the pCtHaeIII-II sequence data revealed that 93.6% of all quartets fell within the three regions representing a well-resolved phylogeny, 3.0% were unresolved and 3.4% showed star-like evolution (Fig. 4h). The percentage of well-resolved quartets was higher in the pCtHaeIII-II sequence data than in pCtHaeIII-I.
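To put these percentages in context, the number of distinct quartets that can be drawn from 18 taxa is C(18, 4). The sketch below computes this number and tabulates the reported percentages for both repeats. It assumes that every possible quartet was evaluated, which is not stated in the text, so the derived counts are approximate.

```python
from math import comb

n_taxa = 18
n_quartets = comb(n_taxa, 4)  # all possible groups of four taxa

# Percentages reported in the text: (well resolved, unresolved, star-like).
reported = {
    "pCtHaeIII-I": (81.8, 4.9, 13.4),
    "pCtHaeIII-II": (93.6, 3.0, 3.4),
}

print("possible quartets:", n_quartets)
for name, (resolved, unresolved, star) in reported.items():
    approx_resolved = round(n_quartets * resolved / 100)
    total = resolved + unresolved + star  # close to 100 up to rounding
    print(name, "approx. resolved quartets:", approx_resolved, "sum of %:", round(total, 1))
```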
The intertaxa genetic similarity indices ranged from 0.713 between C. lanatus turkestanicus and C. species 5 to 0.9965 between C. tinctorius inermis and C. glaucus; C. palaestinus and C. glaucus; and C. species 4 and C. lanatus creticus, with a mean value of 0.8543 (Table 4). Based on pCtHaeIII-II repetitive sequence analysis, the neighbour joining (NJ) tree yielded two distinct clades (Fig. 4i). The first clade included all the polyploid taxa with 2n = 44 and 64 (lanatus complex). The second clade resolved into two subclades.
C. glaucus anatolicus and C. boisserii, with a bootstrap value of 77%, grouped with C. species 5 with a bootstrap confidence of 58%. The diploid taxa grouped with 100% bootstrap confidence. C. glaucus and C. tinctorius inermis, with a bootstrap value of 61%, were strongly allied with C. palaestinus with 95% bootstrap confidence. All the taxa of the lanatus complex and the unverified polyploid Carthamus taxa intermingled with each other.
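The formula behind the similarity indices is not given in the text. A simple and widely used measure, shown here only as an assumed illustration, is one minus the proportion of differing sites (the p-distance) between two aligned sequences, with gapped positions skipped in line with the treatment of gaps as missing data. The toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def similarity_index(seq_a, seq_b):
    """Similarity as 1 - p-distance (an assumed definition, see lead-in)."""
    return 1.0 - p_distance(seq_a, seq_b)


# Hypothetical aligned consensus sequences from two taxa (toy data).
taxon_1 = "ACGTTGCA-ACGT"
taxon_2 = "ACGTAGCATACGT"
print(round(similarity_index(taxon_1, taxon_2), 4))
```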

Screening of the Four Repetitive Sequences in Various Angiosperms
Dot-blots of taxa other than Carthamus did not show any signals with pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I (Figs. 3f, 4f, 5f, 6f). Moreover, no amplification product was obtained with primers designed from any of the four sequences (Figs. 3g, 4g, 5g, 6g).

Discussion
Repetitive sequences have proven successful in resolving species relationships and understanding genome evolution in various angiosperms (Dodsworth et al., 2015).
Repetitive sequences have not been extensively studied in the family Asteraceae. There are only two major reports of repetitive sequences in Asteraceae. Subtribe Centaureinae (of tribe Cardueae), which comprises the genus Centaurea and the Carthamus complex, has proved an excellent model group for analyzing the evolution of satellite repetitive DNA (Bosque et al., 2013, 2014). Detailed phylogenetic and evolutionary studies have been reported for the HinfI satellite DNA of Centaurea and related species (Suarez-Santiago et al., 2007; Bosque et al., 2013). Phylogenetic studies have also been reported in the genus Carthamus with the KpnI satellite repeats (Mehrotra et al., 2013).
The present study reports four novel repetitive sequences, pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I which were isolated by screening of DNA libraries of HaeIII and TaqI digested DNA with total genomic DNA of C. tinctorius. Plasmids which gave strong hybridization signals were expected to harbor repetitive DNA sequences and served as sources of DNA probes in further hybridization experiments after sequencing.

Organization of Repetitive Sequences
To understand the organization of the identified repetitive sequences, Southern hybridization was carried out with the repetitive DNA clones as probes against total genomic DNA of C. tinctorius digested with the corresponding restriction endonuclease (HaeIII/ TaqI Vicia faba (VfB) (Frediani et al., 1999), Brassica nigra (pBN-4 and pBNE8) (Kapila et al., 1996) and Gossypium (Zhao et al., 1998). It has been suggested that dispersed repeats can also contribute to alterations in the amount of nuclear DNA. Dispersed repeats are known to be involved in recruitment of genes, repair of chromosomal, and induction of favorable mutants (Martignetti and Brosius, 1993; Teng et al., 1996; Zeyl et al., 1996). Similar analysis of 18 taxa of Carthamus with the repetitive sequences as probes revealed that these sequences are present homogeneously in all the taxa studied and produce a pattern similar to that in C. tinctorius (Figs. 3a, 4a, 5a, 6a). The observed tandem repetitive pattern of pCtHaeIII-I, pCtHaeIII-II and pCtHaeIII-III is probably the result of mutation, methylation, or both, which might have altered the restriction endonuclease recognition sequence.
The typical unit sizes of plant satellite repeats are 150-180 bp or 300-360 bp (Hemleben et al., 1982;Lin et al., 1999;Heslop-Harrison, 2000). Dimerization and formation of complex higher order repeats is a molecular feature typical for satellite DNA and has been observed in many plant species, such as Pennisetum (Ingham et al., 1993), Avena (Grebenstein et al., 1996), and Arabidopsis thaliana (Simoens et al., 1988). Such repetitive unit sizes could be favored by evolution because they might correspond to the length of the DNA strand wrapped around the nucleosome core (Fischer et al., 1994;Vershinin & Heslop-Harrison, 1998). The presence of multimers of repeat units may be due to loss of restriction enzyme sites, due to mutation events or methylation (Kulikova et al., 2004).
Loss (or alteration) of restriction sites resulting in a ladder pattern has also been reported in two dimer sequences of radish satellite DNA and in canrep sequences in Brassica (Grellet et al., 1986; Xia et al., 1993).
Repetitive sequences may be species, genus, or family specific or may even be widespread among a taxonomic class or kingdom (Mehrotra et al., 2014). However, the repetitive families analyzed in the present study show specificity to a single genus (Figs. 3f,g; 4f,g; 5f,g; 6f,g), indicating that new or diverged sequences have appeared and amplified during speciation. This evolution of tandem repeats during speciation is a characteristic of many tandem array families in plants (Heslop-Harrison, 2000), and the rapid amplification of homogeneous repeat units is followed sequentially by mutation and independent amplification of coexisting sequence variants (Nijman and Lenstra, 2001).

Characteristics of Repetitive Sequences
The repetitive DNA sequences pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I reported in the present study share features that are common to most repetitive sequence families in plants. The sequences showed poly-A and poly-T tracts scattered randomly along their length; such tracts are reported to be typical structures of bent DNA, which may cause intrinsic binding of DNA molecules and may possibly form the heterochromatin (Koo and Crothers, 1988; Macas et al., 2000; Mehrotra et al., 2013). The AT content of pCtHaeIII-I, pCtHaeIII-II, pCtHaeIII-III and pCtTaqI-I was 72%, 55%, 49% and 61%, respectively.
Sequence analysis revealed the presence of direct, inverted, mirror and complementary repeats and microsatellites within the pCtHaeIII-I sequence. The other three sequences showed only some direct repeats and microsatellites (Table 5). The presence of these internal repeats is a characteristic feature of diverse plant satellite families, suggesting that the repetitive units are formed by amplification of smaller repeats. These regions are reported to be preferential sites for DNA alterations and potential substrates for homologous recombination (Gordenin et al., 2003; Linares et al., 1998; Vershinin et al., 1994, 1995, 2001).
The four repetitive sequences showed high frequencies of GG, AG and GA nearest neighbours, which are characteristic of repetitive DNA families (Blake et al., 1997) and are involved in repair of heteroduplex products of unequal cross-over (Smith, 1976; Friedberg et al., 1995). The presence of GGT and GTT trinucleotides in the monomers of the four repetitive sequences is reported to aid in de novo synthesis of telomeres (Tsujimoto, 1993). The pentanucleotide CAAAA in pCtHaeIII-I and pCtTaqI-I, which is supposed to be involved in a breakage-reunion mechanism of repeat sequences during array evolution (Appels et al., 1986; Katsiotis et al., 1998; Macas et al., 2002), may provide specific structural properties required for the amplification and maintenance of satellite DNA in the genome and may also act as a hotspot for transposon insertions (Appels and Peacock, 1971; Appels et al., 1986; Katsiotis et al., 1998; Macas et al., 2000; Ansari et al., 2004). The polyadenylation signal, AATAAA, present in pCtHaeIII-I and pCtHaeIII-III is known to influence the transmission rate of the chromosome to descendants (Murphy and Karpen, 1995). The curvature-propensity values of the four repetitive sequences ranged between 7 and 12 (Figs. 3e, 4e, 5e, 6e), implying that the repeats are possibly curved and are responsible for tight compaction of heterochromatin (Mehrotra et al., 2013).

Phylogenetic Analysis
A detailed phylogenetic analysis was carried out with two tandem repetitive sequences (pCtHaeIII-I, and pCtHaeIII-II) in all the 18 taxa of Carthamus. Likelihood mapping analysis of the two sequences revealed that pCtHaeIII-II sequence data shows higher percentage (93.6%) of quartets within the three regions representing a well resolved phylogeny (Figs. 3h, 4h) and has a higher value of Parsimony Informative Characters. A high value of consistency index excluding uninformative characters of 0.7455, and a low value of homoplasy index of 0.1931 suggests that pCtHaeIII-II sequence is phylogenetically more informative. All phylogenetic reconstructions showed that repeat types in each taxon were more closely related to one another than to repeat types of the other taxa supporting their concerted evolution.
The present sequence assays indicated that C. palaestinus, C. oxyacantha, C. tinctorius tinctorius and C. tinctorius inermis are closely related. The grouping of C. oxyacantha and C. palaestinus with the two varieties of C. tinctorius in the repetitive sequence based dendrograms (Figs. 3i,j; 4i,j) strengthens the conclusion that these species are closely related, hence supporting the earlier views (Sehgal et al., 2009; Mehrotra et al., 2013).
Moreover, pCtHaeIII-I based phylogeny also suggests that C. palaestinus is involved in the ancestry of C. tinctorius tinctorius and C. oxyacantha is the probable ancestor of C. tinctorius inermis. According to Imrie and Knowles (1970), C. tinctorius and C. oxyacantha have evolved concurrently from C. palaestinus through adaptive radiation. C. tinctorius is the product of selection by man in an agricultural environment whereas C. oxyacantha is a weed of disturbed areas.
The cladograms of pCtHaeIII-I and pCtHaeIII-II studied in detail (Figs. 3i,j; 4i,j) showed two major evolutionary lines in the genus Carthamus, with C. arborescens considered as the third lineage, as supported by Sehgal et al. (2009). The first lineage included the diploid taxa with 2n=24 and the taxa with 2n=20 (C. glaucus anatolicus and C. boisserii), and the other included the polyploid taxa with 2n=44 and 64. The repetitive sequence pCtHaeIII-II showed distinct and better resolution between the diploid taxa with 2n=24 and the taxa with 2n=20 than pCtHaeIII-I. The present analysis revealed that none of the x=12 taxa grouped with the polyploids. However, according to previous reports based on molecular markers such as RAPD, ISSR, PCR-RFLP of chloroplast DNA, ITS and ETS sequence data, nuclear SACPD and the chloroplast trnL-trnF IGS region, and also the repetitive sequences pCtKpnI-I and pCtKpnI-II, one of the lineages included all the diploid taxa with 2n=24 and the other included the taxa with 2n=20 and the polyploid taxa with 2n=44 and 64 (Sasanuma et al., 2008; Sehgal et al., 2009; Mehrotra et al., 2013), which suggests that C. glaucus anatolicus and C. boisserii are likely to be involved in the ancestry of the polyploids. In contrast, the present analysis with pCtHaeIII-I and pCtHaeIII-II grouped the taxa with 2n=20 with the diploid taxa of Carthamus (2n=24), which could be due to coevolution of these two repetitive sequences in the two groups.
These two repetitive sequences seem to have originated and evolved before speciation.
According to previous studies, the first lineage comprised species of section I, consisting of diploid taxa, according to Ashri and Knowles (1960), or section Carthamus of Hanelt (1961). The second lineage comprised species from sections II, III and IV of Ashri and Knowles (1960) or sections Lepidopappus and Atractylis of Hanelt (1961). According to Sehgal et al. (2009), the genus Carthamus should be divided into two sections, i.e. section Carthamus and a combined section of Lepidopappus and Atractylis, taking C. arborescens as the outgroup. C. arborescens has been placed as the most divergent taxon and comprises the third section, Thamnacanthus (Sehgal et al., 2009).
Our study showed separate clades for diploid and polyploid taxa of Carthamus. Our study has also been successful in assigning the unverified taxa sent by USDA to different phylogenetic groups and in resolving several taxonomic considerations. Five unverified taxa, not given any name by USDA, were included in the present study, of which four had 2n=6x=64 and the remaining one had 2n=2x=24. The four taxa with 2n=64 (C. species 2, C. species 3, C. species 4 and C. species 5) clustered with the polyploid taxa, and the remaining taxon with 2n=24 (C. species 1) clustered with the diploid taxa of Carthamus.
The phylogeny constructed on the basis of the repetitive sequences, pCtHaeIII-I, and pCtHaeIII-II is more or less consistent with the evolutionary tree reconstructed from molecular markers (Sehgal et al., 2009;Sasanuma et al., 2008) and repetitive sequences (Mehrotra et al., 2013) except for the placement of taxa with 2n=20 with diploid taxa (2n=24). The results presented here indicate that analysis of the distribution and sequences of repetitive DNA is a valuable part of genome analysis and evolution.
The repetitive sequences pCtHaeIII-I and pCtHaeIII-II analyzed in Carthamus species clearly indicated concerted evolution while delineating phylogenetic relationships among the 18 taxa studied. This understanding can assist marker-assisted genetic improvement/enhancement programmes in this crop species. These novel repetitive sequences could be analyzed further using in situ hybridization to elucidate genome evolution of the various taxa of the genus Carthamus.

' AATAAAATTACAATAGGGTTGCAAATG 3' 27
pCtHaeIII-I R 3' TTTGGACCCAAAAGTTTTTAATTG 5' 23
pCtHaeIII-II F 5' CCTCAACTATAGCGAGCTCTTTG 3' 23
pCtHaeIII-II R 3' CCTGTCTGATGGCTATCATCG 5' 21
pCtHaeIII-III F