A novel TE-like DNA fragment discovered in the coding sequence of a gene that encodes for 5-formyltetrahydrofolate in Mt. Hermon population of wild emmer wheat.
As part of a study [29] that aimed to identify and characterize polymorphic insertions of miniature inverted-repeat transposable elements (MITEs) in five wild emmer wheat populations, we have discovered a short DNA insertion (307 bp in length) in exon 6 of a gene that encodes 5-formyltetrahydrofolate (TRIDC2AG023940, EnsemblPlants) in two accessions of Mt. Hermon population wild emmer populations (Fig 1 top). Genome-specific primers were designed from intron 5 upstream to a MITE insertion, termed Fortuna, and from exon 6 of the gene (Fig 1, Additional file 1: Table S1). The expected size of the full site was 543 bp. The site-specific PCR experiment showed that Fortuna element is present in all accessions of these five populations (Mt. Hermon, Amiad, Tabgha, Jaba and Mt. Amasa), yet in two accessions of the Mt. Hermon population, a higher band of 850 bp was amplified instead of the expected full or empty site (Fig 1 bottom).
Sequence analysis of the 307 bp insertion (Fig 2) revealed that this sequence did not hit any known sequence from the database including genomic and transcriptomic databases (https://plants.ensembl.org/Triticum_aestivum/Info/Annotation/ [30], http://plants.ensembl.org/Triticum_dicoccoides/Info/Annotation/ [4]), and repeat databases such as ITMI [31], the Triticeae repeats database (http://botserv2.uzh.ch/kelldata/trep-db/index.html), and GIRI database [32] (http://www.girinst.org/censor/index.php) that yielded no hits to annotated transposable elements. Interestingly, the 307 bp sequence was flanked by a 9 bp sequence (CCAAGAACT) at both ends resembling a target-site duplication (TSD) that can be generated as a result of transposable elements insertions (Fig 2). The fact that the 9 bp site was found in only one copy in the gene lacking the new insertion indicates the 307 bp sequence was integrated in a TE-like manner, thus creating TSD. Furthermore, the 307 bp insertion does not possess terminal-inverted repeat sequences (TIRs) that would allow us to assign this sequence to one of the TIR DNA super-families. In addition, it does not possess characteristic features of SINEs such as a Pol-III promotor or a poly-A tail.
To test whether other copies of the TE-like insertion appear in Triticum-Aegilops genomes, MAK software was used to retrieve TE-like insertion sequences from the recently available genome drafts. We have found one insertion in T. urartu (AA genome), 3 insertions in Ae. tauschii (DD genome), 12 insertions (8 in subgenome A and 4 in subgenome B) in T. turgidum ssp. dicoccoides (wild emmer, AABB genome), 6 insertions (4 in subgenome A and 2 in subgenome B) in T. turgidum ssp. durum (durum, AABB genome) and 11 insertions (3 in subgenome A, 4 in subgenome B and 4 in subgenome D) in T. aestivum (bread wheat, AABBDD genome). Sequences of all retrieved insertions and their molecular charactrization, including sub-genome and chromosmal locations can be found in Additional file 1: Table S1. Note that this insertion was found to be unique to Triticum-Aegilops group as it was not found in other plant genomes.
A sequence logo was created from the flanking sequences of the 33 full-length 307 bp insertions retrieved from the 5 genome drafts using WebLogo software [33] (http://weblogo.threeplusone.com/ , Additional file 4: Fig S1). While 16 of the 33 insertions showed clear TSD, the remaining 17 insertions did not show notable TSD (Additional file 1: Table S1). The sequence logo demonstrated a certain sequence preference for specific nucleotides at positions 4 and 9, however the 9-bp duplicated sequences were not highly conserved. The lack of detectable TSDs for 17 insertions can be explained by mutations /rearrangements around the insertions that might have disrupted the formerly identical sequences flanking each insertion. Note that over 50% (9 out of 17) of the insertions lacking TSD are located in genome BB, which is known to be very dynamic [20, 21]. To this end, the fact that clear TSDs of conserved length were detected in ~ 50% of the insertions supports the hypothesis of a transposon-like behaviour, therefore we named this TIR-less, minature element “Mariam”. Note that multiple sequence alignment analysis revealed high sequence conservation (over 90%) of the 33 full-length Mariam elements retreived from the five genome drafts. The high sequence conservation of Mariam among diploid and polyploid species might strongly indicate a recent prolifiration of this element in wheat.
Insertional polymorphism of Mariam in Triticum-Aegilops group and within wild emmer populations
As mentioned above, the existence of Mariam elements in T. urartu and Ae. tauschii, and in the three sub-genomes of bread wheat suggests this element was probably present in all diploid progenitors of the wild emmer and bread wheat and was amplified in the polyploid species. Of the 7 Mariam insertions found in A and B sub-genomes of T. aestivum, 6 were common (monomorphic insertions) to wild emmer. In addition, all 3 Mariam insertions found in Ae. tauschii were common to the D sub-genome of T. aestivum. Finally, the insertion found in T. urartu was common to wild emmer and to T. aestivum. The dynamics of Mariam elements in wild emmer wheat and in bread wheat accessions were assessed using site-specific PCR analysis on DNA isolated from 45 wild emmer accessions (collected from 5 different goegraphically isolated populations; Mt. Hermon, Amiad, Tabgha, Jaba and Mt. Amasa – 9 accessions from each population) and 8 bread wheat accessions (see Additional file 2: Table S2). Overall, primers were designed from flanking sequences of 10 Mariam insertions mapped to A and B sub-genomes of wild emmer and/or bread wheat (Additional file 3: Table S3).
The results of the PCR analysis demonstrated high insertional polymorphism levels of Mariam based on presence (full site) vs. absence (empty site) among wild emmer wheat accessions. Only one of the examined insertions, A5-6 (Table 1), was present in all accessions of wild emmer and bread wheats (monomorphic insertion), while 6 insertions (A4-4 (Fig 3), A6-2, A7-5, B3-4, B7-4, B7-6 (Additional file 4: Fig S2) were polymorphic in wild emmer wheat accessions in a population-specific manner and were not detected in the bread wheat accessions (Table 1). One insertion (B1-4, Table 1) was absent from all wild emmer wheat accessions but was present in all bread wheat accessions. In two additional cases (A4-1 (Additional file 4: Fig S2f), B1-4 (Additional file 4: Fig S2g)), insertions were polymorphic in both wild emmer wheat accessions and in bread wheat accessions. A4-1 insertion exists in only one of eight bread wheat accessions, while B1-4 insertion exists in seven of eight accessions.
The insertional pattern of Mariam (according to the results of the ssPCR analysis) was used to build a phylogenetic tree generated by hierarchical agglomerative clustering (see Fig 4). The phylogenetic tree showed significant (p-value<0.05) separation between Tabgha population (except for T3 accession) and other populations, which correlates with our previous finding according to Minos (a MITE family) insertional pattern that was assessed Transposon Display assay [29]. All T. aestivum accessions were clustered together, as well as Mt. Amasa and Amiad that were generally clustered by their population, except for two of their accessions. Jaba and Hermon populations showed high variability among accessions within population. These findings might indicate the population-specific dynamics of Mariam in wild emmer wheat and in bread wheat species.
Table 1. Insertional polymorphism of Mariam in wild emmer wheat and in bread wheat accessions based on ssPCR analysis.
Locus1
|
Presence (full site)/ absence (empty site) of Mariam in five wild emmer populations (9 accessions from each population) and in 8 bread wheat accessions2
|
Wild emmer populations (T. turgidum ssp. dicoccoides)
|
Bread wheat
(T. aestivum)
|
Mt. Hermon
|
Amiad
|
Tabgha
|
Jaba
|
Mt. Amasa
|
A2-MH
|
1
|
0
|
0
|
0
|
0
|
0
|
A4-1
|
0
|
1
|
1
|
0
|
0
|
1
|
A4-4
|
1
|
1
|
1
|
1
|
1
|
0
|
A5-6
|
2
|
2
|
2
|
2
|
2
|
2
|
A6-2
|
0
|
1
|
1
|
0
|
0
|
0
|
A7-5
|
0
|
1
|
1
|
0
|
1
|
0
|
B1-2
|
0
|
0
|
0
|
0
|
0
|
2
|
B1-4
|
1
|
1
|
2
|
1
|
1
|
1
|
B3-4
|
0
|
1
|
1
|
0
|
0
|
0
|
B7-4
|
0
|
1
|
1
|
1
|
0
|
0
|
B7-6
|
0
|
1
|
1
|
0
|
0
|
0
|
1The chromosome where the insertion was found, and an additional identifier of each insertion site.
2The presence/absence of a TE insertion in examined accessions of a given population (for T. dicoccoides) or a given species (T. aestivum): 0 – empty site in all accessions, 1 - full site in some accessions, 2 – full site in all accessions. See Fig 1, Fig 3 and Additional file 4.
Mariam insertions are associated with wheat genes
Sequence annotation of Mariam insertion sites revealed that 11 of the 33 Mariam insertions were located adjacent to other TE elements including class I and class II elements (Additional file 1: Table S1), while 12 insertions were located into or adjacent (up to 500 bp downstream or upstream) to genes, such as formyltetrahydrofolate cyclo-ligase, sucrose-phosphatase 3, soleucyl-tRNA synthetase, Serine/threonine-protein kinase, xyloglucanase and others. This is in addition to the previously described Mariam insertion (A2-MH) in exon 6 of the TRIDC2AG023940 gene encoding a predicted mitochondrial 5-formyltetrahydrofolate cyclo-ligase.
To test whether Mariam insertion in the gene coding sequence has any impact on the gene activity, we designed primers for RT-qPCR analysis from TRIDC2AG023940 gene and/or its homolog from the B genome. The relative expression level of each accession was normalized to the relative expression level of an arbitrary chosen accession (T1) to enable comparison between samples. The relative expression levels of TRIDC2AG023940 and/or its homolog from the B sub-genome were found to vary within and between populations of wild emmer wheat (Fig 5).
Amiad, Tabgha and Jaba populations demonstrated over 2-fold differences in expression levels of this gene between some accessions, while the variation within the Mt. Amasa population was less prominent. Mt. Hermon population demonstrated significantly lower expression level than Amiad Population (one-way ANOVA, F=4.3655, p-value=0.0169, Fig. 5). In addition, ~ 2-fold variation was seen within the different accessions in Mt. Hermon population, while the accessions that harbor Mariam insertion within the TRIDC2AG023940 gene showed the lowest expressions levels. For example, accession H11 shows reduction in the expression level of TRIDC2AG023940 gene vs. other accessions in Mt. Hermon population. These results might indicate the complex control of the expression of this gene, and do not allow validation or ruling out of the possible effect of this Mariam insertion on transcript quantity. However, in this case, the insertion was present in an exon and thus might either interfere with splicing or remain in the mature RNA. To test these possibilities, primers complementary to the predicted cDNA sequence (harbor Mariam insertion) were designed for RT-PCR analysis. The expression analysis was performed using cDNA of 9 accessions from the Mt. Hermon population as template. In all examined accessions where no insertion in exon 6 was detected (see Fig 1), a single clear band corresponding to the expected 626 bp size of the spliced product could be seen (Fig 6). In the H13 accession where the insertion in exon 6 was present at the DNA level (Fig 1), there was a single higher band, corresponding to the expected 942 bp size of a spliced product with an insertion in exon 6 (Fig 6). For validation, this higher band in H13 accession was extracted from the agarose gel, purified and sequenced, and found that this transcript is indeed harboring the full-length Mariam insertion. This experiment demonstrates that the insertion in exon 6 altered the mature mRNA sequence produced from the gene in a genome-specific manner.
According to the coding sequence (CDS) prediction for this gene in the EnsemblPlants database, CDS starts in exon 1 and ends in exon 7. Therefore, if the transcript is translated, the new insertion in exon 6 will interfere with translation. ORF analysis for cDNA with the new insertion predicted a stop codon at position 41 out of the 307 nucleotides of the new insertion, so that the protein product produced from the allele with the insertion in exon 6 would possess a shortened and altered C-terminus. A Blastp search of the translated cDNA of the TRIDC2AG023940 gene without the insertion (Fig 7a) and with the insertion in exon 6 (Fig 7b) in the NCBI database demonstrated that the insertion interferes with the C-terminal part of the cyclo-ligase domain.