Comparison of nutritional traits in flour of twenty-four teff varieties
To understand the nutritional potential of 24 teff varieties (Additional file 1) currently grown and consumed in Ethiopia, we measured the levels of a number of nutritional traits, compared the relationships between these traits, and used genomic data to examine the genetic relationships between the 24 varieties relative to these nutritional traits. The nutritional traits included elemental micronutrients (Figure 1 & Additional file 2), nitrogen as a proxy for protein content (Figure 2), phytate (Figure 2) and a range of phenolic compounds (Additional file 3).
Significant variation between the 24 teff varieties was found for all micronutrients tested (Additional file 2) (p val. < 0.001). Focusing on Zn and Fe, elements essential for human health (Figure 1), flour Zn concentrations ranged from 14.8 mg/kg (var. Etsub) to 29.2 mg/kg (var. Heber-1), and Fe concentrations ranged from 22.6 mg/kg (var. Abay) to 684.25 mg/kg (var. Yilmana) (Figure 1A & B). Positive correlations were found between Zn and Ca, Mg, P, S and, to a lesser extent, K, while negative correlations were found between Zn and Cu, Fe, Mo and Se. A positive correlation was also seen between Zn and Cd (Suppl. Figure 1). No correlations were found between Pb or Ti and the other elements measured.
Cd is a toxic element, detrimental to human health. High levels of Cd were seen in many of the teff samples tested, the highest being found in var. Wellenkomi at 40.7 µg/kg. Only one of the teff varieties, var. Magna, had Cd levels below the current EU limits for Cd in food products, being below 1 µg/kg (1 ppm) [49].
The N levels within the teff flours were measured as a proxy for protein content [50]. Significant differences were found between the teff varieties, with N values ranging from 1.3 g/100 g in var. Ambo Toke to nearly 2 g/100 g in var. Dagan (p val. < 0.001) (Figure 2A). Using the standard conversion factor of 5.95, this would give protein contents in the range of 7.79% to 11.71% [51]. A positive correlation was found between Zn content and N levels in the flours tested (R = 0.59; p val. 0.01; Figure 2B), but not between Fe and N (R = 0.0096; p val. 0.97; Figure 2C).
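The conversion from N to protein described above is a single multiplication. A minimal sketch, assuming N is expressed in g/100 g and using the factor of 5.95 quoted in the text; the function names are illustrative and not taken from the study:

```python
def nitrogen_to_protein(n_g_per_100g: float, factor: float = 5.95) -> float:
    """Estimate crude protein (% w/w) from total N (g/100 g)."""
    if n_g_per_100g < 0:
        raise ValueError("N content cannot be negative")
    return n_g_per_100g * factor


def protein_to_nitrogen(protein_percent: float, factor: float = 5.95) -> float:
    """Invert the conversion: N (g/100 g) implied by a protein percentage."""
    if protein_percent < 0:
        raise ValueError("protein content cannot be negative")
    return protein_percent / factor


# Back-calculate the N values implied by the protein range reported in the text.
for protein in (7.79, 11.71):
    n_value = protein_to_nitrogen(protein)
    print(f"{protein:.2f}% protein corresponds to {n_value:.2f} g N/100 g")
```

Back-calculated this way, the reported protein range corresponds to roughly 1.31 to 1.97 g N/100 g, consistent with the rounded N range given above.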
Significant differences were found in phytate levels between the 24 teff varieties (p val. < 0.01) (Figure 2D). Phytate levels ranged from 0.83 g/100 g in var. Magna to 2.56 g/100 g in var. Abay. No significant correlation was observed between phytate and Zn (R = 0.042; p val. 0.84; Figure 2E), but a significant negative correlation was found between Fe and phytate (R = -0.54; p val. 0.007; Figure 2F). There was no significant correlation between overall P levels, measured via ICP-MS, and phytate (R = 0.29; p val. 0.18).
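One way to check whether a correlation coefficient of this size is significant is to convert it to a t statistic with n - 2 degrees of freedom. The sketch below assumes each correlation was computed across n = 24 variety values (the text does not state n explicitly) and uses the two-sided 1% critical value for 22 degrees of freedom from standard t tables; the function name is illustrative:

```python
import math


def pearson_t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given Pearson r and sample size n."""
    if n < 3:
        raise ValueError("need at least three observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


N_VARIETIES = 24           # assumption: one value per variety
T_CRIT_1PCT_DF22 = 2.819   # two-sided alpha = 0.01, df = 22 (standard t table)

reported = {"Fe vs phytate": -0.54, "Zn vs phytate": 0.042, "P vs phytate": 0.29}
for label, r in reported.items():
    t = pearson_t_statistic(r, N_VARIETIES)
    verdict = "exceeds" if abs(t) > T_CRIT_1PCT_DF22 else "does not exceed"
    print(f"{label}: t = {t:.2f}, {verdict} the 1% critical value")
```

Under these assumptions only the Fe and phytate correlation exceeds the 1% critical value, in line with the p values reported above.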
Flour of the teff varieties was also screened for 19 phenolic compounds (Additional file 3). Significant differences (F p val. < 0.001) were found between the 24 teff varieties for 11 phenolic compounds; no significant differences between varieties were found for gallic acid. Seven phenolic compounds were not detected: cyanidin, cyanidin glycoside, delphinidin, delphinidin glycoside, flavone, pelargonidin and p-coumaric acid.
Kaempferol, quercetin, catechin and myricetin all belong to a group of phenolic compounds commonly known as flavonoids (Figure 3) [52]. The levels of kaempferol and quercetin were low in all the teff varieties, the majority of these compounds being present as kaempferol glycoside and quercetin glycoside (Figure 4). The levels of kaempferol glycoside ranged from 4.92 µg/g (var. Tseday) to 205.79 µg/g (var. Magna), and those of quercetin glycoside from 0.46 µg/g (var. Felagot) to 105.41 µg/g (var. Quncho). The levels of catechin ranged from 27.41 µg/g (var. Yilmana) to 183.19 µg/g (var. Tseday) (Figure 4). Catechin is derived from dihydroquercetin (Figure 3) and is a precursor of proanthocyanidins, which are thought to give rice its red colouration [53]. The levels of myricetin ranged from 0.82 µg/g (var. Were-kiyu) to 17.24 µg/g (var. Dima) (Figure 4).
The levels of t-cinnamic acid ranged from 1.51 µg/g in var. Baset to 24.65 µg/g in var. Dukem. Cinnamic acid is a precursor of ferulic acid, which in turn gives rise to vanillic acid. A strong positive correlation was found between cinnamic and ferulic acids, and a weaker positive correlation with vanillic acid. However, the levels of vanillic acid were low in all 24 teff varieties (Additional file 3, Additional file 4).
Protocatechuic acid and gallic acid are synthesised from a side branch of the shikimate pathway that leads to the synthesis of folates and aromatic amino acids, including phenylalanine (Figure 3) [53]. The levels of protocatechuic acid ranged from 6.78 µg/g (var. Areka-1) to 44.78 µg/g (var. Felagot) (Additional file 3). The levels of gallic acid were low in all 24 teff varieties, ranging from 0.12 µg/g (var. Dima) to 0.97 µg/g (var. Ambo Toke) (Figure 4).
Pearson’s correlation analyses were undertaken on the levels of each phenolic compound, phytate and Fe (Additional file 4). A strong positive correlation was seen between cinnamic acid and ferulic acid (r = 0.78), fitting with the biochemical pathway (Figure 3), in which cinnamic acid is a precursor of ferulic acid. Similarly, a positive association was seen between ferulic acid and vanillic acid (r = 0.51), vanillic acid sitting downstream of ferulic acid. Positive correlations were also seen between ferulic acid and myricetin (r = 0.61), which may relate to their common precursor cinnamic acid, and between cinnamic acid and both quercetin (r = 0.49) and quercetin glycoside (r = 0.50).
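The pairwise analysis above rests on Pearson's correlation coefficient. A minimal sketch of the calculation in plain Python; the compound values below are made-up placeholders for illustration, not measurements from Additional file 3 or Additional file 4:

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical levels (µg/g) for five varieties; NOT data from the study.
toy_levels = {
    "cinnamic acid": [2.0, 5.5, 9.0, 14.0, 21.0],
    "ferulic acid": [10.0, 14.0, 15.0, 22.0, 30.0],
    "vanillic acid": [0.4, 0.3, 0.6, 0.5, 0.8],
}

# Every unordered pair of compounds, as in a correlation matrix.
for a, b in combinations(toy_levels, 2):
    print(f"{a} vs {b}: r = {pearson_r(toy_levels[a], toy_levels[b]):.2f}")
```

Applied to the real per-variety values, the same loop would reproduce the pairwise r values summarised in the text.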
The precursors of kaempferol and quercetin, dihydrokaempferol and dihydroquercetin, exist in an equilibrium controlled by a flavonoid 3’-hydroxylase (F3’H) enzyme, which converts dihydrokaempferol to dihydroquercetin. This resulted in negative correlations between kaempferol and quercetin (r = -0.28) and between kaempferol glycoside and quercetin glycoside (r = -0.43), the majority of kaempferol and quercetin being present in the glycosylated state. This was reflected in positive correlations between kaempferol and quercetin glycoside (r = 0.62) and between quercetin and kaempferol glycoside (r = 0.45).
Catechin is derived from dihydroquercetin and therefore competes for synthesis with quercetin. Consequently, a negative correlation was observed between the levels of catechin and quercetin (r = -0.32). However, a positive correlation was seen between catechin and quercetin glycoside (r = 0.52). A positive correlation was also observed between catechin and kaempferol (r = 0.57) and a negative correlation with kaempferol glycoside (r = -0.51).
In addition to positive correlations with kaempferol (r = 0.62) and catechin (r = 0.52), positive correlations were observed between quercetin glycoside and other phenolic compounds, including ferulic acid (r = 0.65), myricetin (r = 0.56) and cinnamic acid (r = 0.50). These correlations may indicate a positive feedback mechanism operating through the biosynthetic pathways leading to quercetin glycoside synthesis.
A negative correlation was found between gallic acid and salicylic acid (r = -0.43), and between catechin and protocatechuic acid (r = -0.44), while a positive correlation was observed between protocatechuic acid and Fe content (r = 0.40). No significant correlations were found between phytate levels and any of the phenolic compounds measured in this study (Additional file 4).
Genetic relationship between the twenty-four teff varieties
All 24 teff varieties were sequenced to greater than 25X coverage. Sequences were compared to the reference var. Dabbi across the whole genome. The varieties differed from Dabbi by as few as 1.567 million SNPs in var. Areka-1 and as many as 2.372 million SNPs in var. Yilmana (Additional file 5). There was also considerable variation in INDELs, with var. Yilmana showing the fewest at 343,257 and var. Wellenkomi the most at 520,234. Overall, 3,193,582 unique SNPs and 897,272 unique INDELs with a minimum allele frequency of 0.1 were found within the 24 varieties tested. Considering that the whole teff genome is estimated to be 622 Mb in size, the 2.372 million variants identified in var. Yilmana relative to the reference var. Dabbi equate to roughly four variants in every kb of the genome.
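The density estimate above follows directly from the variant count and the genome size. A short sketch of the arithmetic, using only the figures quoted in the text (the function name is illustrative):

```python
def variants_per_kb(n_variants: int, genome_size_bp: int) -> float:
    """Average number of variants per 1,000 bp of genome."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_variants / genome_size_bp * 1000


GENOME_SIZE_BP = 622_000_000  # estimated teff genome size (622 Mb)

# SNP counts relative to var. Dabbi, as quoted in the text.
snp_counts = {"Areka-1": 1_567_000, "Yilmana": 2_372_000}

for variety, count in snp_counts.items():
    density = variants_per_kb(count, GENOME_SIZE_BP)
    print(f"var. {variety}: {density:.2f} variants per kb")
```

This gives about 3.8 variants per kb for var. Yilmana, which the text rounds to four, and about 2.5 per kb for var. Areka-1.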
To understand how the 24 teff varieties relate to each other, a phylogenetic tree analysis (Figure 5A), a principal component analysis (PCA; Figure 5B) and a structure analysis (Figure 5C) were carried out using the genome-wide SNP data. The phylogenetic tree and PCA separated the 24 teff varieties into three groups, whereas the structure analysis returned two groupings. In general, the teff varieties fell into similar groupings in the phylogenetic tree and the PCA, the exception being the varieties Negus and Kora, which fell on the same branch of the tree but in distinct PCA groups.
Genetic variation underlying differences in Zn levels
Zn concentrations were overlaid on the PCA to see if a relationship between flour Zn levels and the teff variety groupings could be identified (Figure 5B). As no specific association between the genetic grouping of teff varieties and Zn concentration was apparent, we chose to look at the genetic variation in specific gene families involved in Zn transport. Two gene families were selected: the ZIP (Zinc Iron Permease) family, which is involved in uptake of Zn from the soil, and the HMA (Heavy Metal Associated) family of transporters, which is involved in movement of Zn from roots to seeds.
Zinc Iron Permease transporter family in teff
To identify putative ZIP family members, the gene models of the Eragrostis tef var. Dabbi sequence in Ensembl Plants were searched using the Pfam domain PF02535 and InterPro ID IPR003689. This revealed 32 predicted genes in teff that contained these protein domains. As teff is a tetraploid species, these 32 potential ZIP family members are comparable to the 15 in Arabidopsis and 17 in rice [30,54]. However, some of the predicted genes could be pseudogenes. Only six of the 32 coding regions identified have a good ATG start codon, and two of these six putative ZIP transporters lacked a stop codon. However, as this incomplete sequence data could be due to gaps in the reference var. Dabbi teff sequence, we used all 32 putative ZIP sequences in subsequent analyses.
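The completeness check described above (an ATG start codon and a terminating stop codon) can be expressed as a simple test on each predicted coding sequence. A hedged sketch: the function name, model names and toy sequences are hypothetical, and it assumes the standard stop codons TAA, TAG and TGA; it is not the pipeline used in the study:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def check_cds(cds: str) -> dict:
    """Report basic completeness features of a predicted coding sequence."""
    seq = cds.strip().upper()
    return {
        "has_start": seq.startswith("ATG"),
        "has_stop": len(seq) >= 3 and seq[-3:] in STOP_CODONS,
        "in_frame": len(seq) % 3 == 0,  # length is a whole number of codons
    }


# Toy sequences for illustration only; NOT teff gene models.
toy_models = {
    "model_complete": "ATGGCTTCATGA",
    "model_no_start": "GCTTCAGGATAA",
    "model_no_stop": "ATGGCTTCAGGA",
}

for name, cds in toy_models.items():
    print(name, check_cds(cds))
```

A model failing either test is not necessarily a pseudogene; as noted above, gaps in the reference assembly or an incorrect gene model could produce the same result.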
Phylogenetic analysis of the translated amino acid sequences of the 32 teff ZIP transporters was performed with members of the ZIP families from rice and Arabidopsis (Figure 6). Teff ZIP proteins grouped more closely with rice ZIP proteins than with those of Arabidopsis, both teff and rice being monocots. In most cases there were two teff genes for every rice ZIP gene, which is expected as teff is a tetraploid species, but not all 32 teff ZIP genes showed clear associations with genes found in rice or Arabidopsis, suggesting some divergence. Rice ZIP genes associated with more than two teff genes included OsZIP8, which was associated with four teff genes. Only one of the 32 teff ZIP genes clustered with OsZIP3, while three teff genes clustered with OsIRT2, as well as OsZIP10.
Heavy Metal Associated transporter family in teff
To identify putative HMA family members, the gene models of the Eragrostis tef var. Dabbi sequence in Ensembl Plants were searched using the InterPro domain IPR027256. Fourteen HMA proteins were identified in teff, compared to eight in Arabidopsis and nine in rice. Phylogenetic analysis of the translated amino acid sequences of the 14 teff HMA transporters was performed with members of the HMA families from rice and Arabidopsis (Figure 7). As 14 teff genes is fewer than expected for this tetraploid species, based on conservation of each of the eight core genes found in both rice and Arabidopsis, the teff Dabbi reference sequence may be incomplete with regard to this family of Zn transporters. In addition, five of the 14 predicted HMA genes in teff did not contain an ATG start codon, suggesting that some sequence is missing or that the gene models are incorrect. Four rice HMA proteins, OsHMA2, OsHMA5, OsHMA6 and OsHMA9, each have two clear homologues among the predicted teff HMA genes, but no identified teff HMA gene appears to align with OsHMA3, a major gene involved in Cd tolerance and sequestration [55].
Identification of variants in ZIP and HMA proteins
Numerous SNPs were found in both the ZIP and HMA families of teff transporters (Additional file 6, Additional file 7). In the 32 EtZIP genes a total of 355 variants were identified in the coding regions relative to the reference var. Dabbi, including frameshift mutations in twelve of the 32 genes (3.4% of the total mutations) (Additional file 6). Most of the variants were located in introns (41.4%), followed by synonymous variants in the coding region (22.5%) and then non-synonymous mutations that result in a change in the amino acid sequence (14.6%). Many of the variants were conserved between the varieties, including a frameshift in both homologues of OsZIP9, with most of the varieties containing the mutated/truncated form of each gene.
In the EtHMA transporters 298 variants were identified relative to var. Dabbi (Additional file 7). Most of the variants were again found in introns or represented synonymous mutations in the coding sequence (48% and 16.4%, respectively). The third most common variants were non-synonymous mutations in the protein coding region (12.4%). Only two of the 14 teff HMA genes had frameshift variants: the predicted gene loci Et_s3091-2.42-1.mrna1 and Et_s3193-0.29-1.mrna1.
Genetic variation underlying differences in phenolic compounds
Flavonoid 3’-hydroxylase (F3’H) is a key enzyme in the conversion of dihydrokaempferol to dihydroquercetin [53,56]. To determine whether F3’H in teff was responsible for the differences seen between teff varieties in kaempferol glycoside and quercetin glycoside levels, the amino acid sequence of the rice F3’H gene (Os10g0320100) was used to identify possible orthologs in teff. Using an e-value cut-off of 1e-10, five putative orthologs of OsF3’H were identified in the teff reference sequence of var. Dabbi: Et_s9738-1.8-1.mrna1, Et_s9399-0.5-1.mrna1, Et_s3159-0.29-1.mrna1, Et_s6352-0.10-1.mrna1 and Et_s15942-0.0-1.mrna1 (Suppl. Figure 2). Alignment of the amino acid sequences of these five teff genes to their putative rice orthologs showed 58.74% to 78.57% identity. The two teff genes Et_s3159-0.29-1.mrna1 and Et_s15942-0.0-1.mrna1 clustered most closely with Os10g0320100 and were subject to further analysis. SNPs were identified in Et_s3159-0.29-1.mrna1 and Et_s15942-0.0-1.mrna1. Correlations between these SNPs and the levels of kaempferol glycoside and quercetin glycoside in the 24 teff varieties revealed a SNP in Et_s3159-0.29-1.mrna1 that was strongly correlated with kaempferol glycoside and quercetin glycoside levels (p val. < 0.001). This SNP is a T to G substitution in the second intron of Et_s3159-0.29-1.mrna1 and therefore does not directly change the coding sequence of the gene (Suppl. Figure 2). Teff varieties homozygous for the wild-type (WT) “T” allele had lower levels of kaempferol glycoside than teff varieties containing the mutant “G” allele, and vice versa, while heterozygous varieties had intermediate levels of the two glycosides (Figure 8). No further associations were found between the other phenolic compounds measured and SNPs in any of the F3’H-type genes in teff.
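The genotype comparison above amounts to grouping varieties by their genotype at the intronic SNP and comparing mean glycoside levels per group. A sketch with invented genotypes and values (not data from Figure 8); `mean_by_genotype` is an illustrative helper, not a function from the study:

```python
from collections import defaultdict
from statistics import mean


def mean_by_genotype(genotypes, values):
    """Mean trait value for each genotype class (e.g. TT, TG, GG)."""
    if len(genotypes) != len(values):
        raise ValueError("genotypes and values must have the same length")
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, values):
        groups[genotype].append(value)
    return {genotype: mean(vals) for genotype, vals in groups.items()}


# Invented example: six varieties, kaempferol glycoside in µg/g.
toy_genotypes = ["TT", "TT", "TG", "TG", "GG", "GG"]
toy_kaempferol_glycoside = [10.0, 14.0, 60.0, 70.0, 150.0, 170.0]

group_means = mean_by_genotype(toy_genotypes, toy_kaempferol_glycoside)
for genotype in ("TT", "TG", "GG"):
    print(f"{genotype}: mean = {group_means[genotype]:.1f} µg/g")
```

The toy values were chosen only to mirror the direction described in the text, with heterozygotes falling between the two homozygous classes.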