Whole-Genome Sequencing and Analysis of Nutritional Traits in a Core Set of Ethiopian Teff (Eragrostis tef) Varieties


Background: Teff (Eragrostis tef) is a tropical cereal domesticated and grown in the Ethiopian highlands, where it has been a staple food of Ethiopians for many centuries. Food insecurity and nutrient deficiencies are major problems in the country, so breeding for enhanced nutritional traits, such as Zn content, could help to alleviate malnutrition. Results: To understand the breeding potential of nutritional traits in teff, a core set of 24 varieties was sequenced, and their grain mineral content, phytate and protein levels, and a number of nutritionally valuable phenolic compounds were measured. Significant variation between varieties was found in all of these traits. Genome-wide sequencing of the 24 teff varieties revealed 3,193,582 unique SNPs and 897,272 unique INDELs relative to the teff reference var. Dabbi. Sequence analysis of two key transporter families involved in the uptake and transport of Zn by the plant identified 32 Zinc Iron Permease (ZIP) transporters and 14 Heavy Metal Associated (HMA) transporters in teff. Further analysis identified numerous variants, of which 14.6% of EtZIP and 12.4% of EtHMA variants were non-synonymous changes. Analysis of a key enzyme in flavonol synthesis, flavonoid 3'-hydroxylase (F3'H), identified a T-G variant in the teff homologue Et_s3159-0.29-1.mrna1 that was associated with the differences observed in kaempferol glycoside and quercetin glycoside levels. Conclusion: Wide genetic and phenotypic variation was found in 24 Ethiopian teff varieties, which would allow breeding gains in many nutritional traits of importance to human health.


Background
Teff (Eragrostis tef) is a tropical cereal that originated in the Ethiopian highlands, where it was domesticated and has been grown for thousands of years [1]. Globally, teff is a minor cereal crop in terms of both production and planted area. Ethiopia grows an estimated 90% of the annual global teff crop on about three million hectares, roughly a quarter of Ethiopia's grain-cultivated area [2]. Compared with other cereals, teff is considered a resilient crop: it can withstand adverse weather conditions, growing well at elevations between 1,800 and 2,200 m above sea level in regions with adequate rainfall [3]. Teff is a staple of Ethiopian diets, providing 11% of per capita caloric intake and two-thirds of the average Ethiopian's daily protein [4].
Teff is widely considered a healthy alternative to cereals such as wheat, maize and rice, as it does not contain gluten, is high in slowly digestible starch, and is rich in calcium (Ca) and polyphenols [5,6]. However, grain zinc (Zn) levels (in the range of 28-40 mg/kg) are often below the recommended level of 40 mg/kg needed to meet human nutritional requirements [1,7,8], and further fortification efforts are hampered by low Zn levels in many Ethiopian soils [1].
While breeding efforts to improve teff have been ongoing in Ethiopia since the mid-1950s, germplasm advancements have been slow, with only 20 new teff varieties released [3]. A better understanding of teff nutritional diversity and its genetic control could help drive nutritional gains for this crop. Recent advances in teff genetics include a fairly complete genome assembly, the release of gene models, and a number of RNA-Seq datasets, all of which are publicly available [9].
These resources allow knowledge of grain fortification gained in other species to be translated to teff, supporting more rapid progress in improving its nutritional potential [10-14].
Several gene families have been identified that support the accumulation and transport of various heavy metals within the plant. The Zinc Iron Permease (ZIP) family, the Heavy Metal Associated (HMA) and Metal Tolerance Protein (MTP) families, and the Natural Resistance-Associated Macrophage Protein (NRAMP) family of transporters have all been shown to be integral to micronutrient transport in plants [15-19]. The transport of Zn from soil to seed can involve a number of these transporters, with Zn accumulating primarily within the seed embryo and aleurone layer [20]. In rice, Zn uptake from the soil is thought to occur through the transporters OsZIP5 and OsZIP9 [21,22]. Others have suggested that many routes may be available for Zn uptake, as no single transporter is believed to be solely responsible for the uptake of Zn from the soil [23-27]. Several transporter families have also been implicated in the transport of Zn through the plant to the developing grain [16,28,29]. These include the HMA and ZIP families, both of which can transport a broad range of metals [16,17,24,30], including unwanted metals such as cadmium (Cd) and lead (Pb), which are detrimental to most plants and animals [15,17,24,30-32].
Other factors can also influence the Zn content of both leaves and grain. These include phytate, which readily binds cations such as calcium (Ca), iron (Fe) and Zn; phosphorus (P) and phytate levels have been found to correlate with grain Zn concentration in both wheat and rice [29,33]. In wheat, nitrogen (N) content has been found to correlate positively with grain Zn content [8]. Recent work has shown that macronutrients such as N, P and sulphur (S) can also influence overall micronutrient content [34]. A holistic approach to nutrient enhancement is therefore required to ensure an adequate Zn supply in the human diet.
In this study, we set out to understand the phenotypic variation in several nutritional traits among teff varieties commonly grown and consumed in Ethiopia. We used whole-genome sequencing and single nucleotide polymorphisms (SNPs) to determine the genetic relationships between these varieties. These genomic data were then used to look for relationships between the varieties and the nutritional traits, in order to identify the varieties with optimal nutritional potential for breeding. In addition, SNPs within candidate genes underlying nutritional traits of interest were examined to determine their association with the trait variation.

Plant material and flour preparation
Twenty-four teff varieties (Additional file 1) were grown in the main cropping season of 2018 to 2019 at the experimental station of Adet Agricultural Research Center, Ethiopia. Adet (11°28′N, 37°48′E; 2216 m a.s.l.) is located 42 km southwest of Bahir Dar, the capital city of the Amhara regional state. The agronomic and cultural practices recommended for teff production were followed, including the application of 40 kg N and 60 kg P2O5 per ha [35]. The soil at Adet is a brown Nitosol. The lines used in this study are publicly available in Ethiopia from Adet Agricultural Research Center. The voucher specimen ID numbers are listed in Additional file 1.
Teff whole grains were milled to flour using a laboratory hammer mill (Model: TPS-JXFM110, China) to pass a 0.5 mm sieve, packed in polyethylene bags and stored at room temperature until analysed.

Analyses of nutritional traits
Analyses of all nutritional traits were undertaken at NIAB, UK. Analyses included assessment of elemental concentrations, phenolic compounds, phytate and N content.
Elemental concentrations were assessed using inductively coupled plasma mass spectrometry (ICP-MS), which was used to quantify the elements Ca, Cu, Fe, K, Mg, Mn, Mo, P, S, Se, Ti and Zn, and the detrimental elements Cd and Pb. Approximately 0.2-0.3 g of flour was digested overnight in 5 mL of nitric acid (Sigma) in a 7 mL bijou tube. The digested sample was transferred to a 50 mL beaker and heated to 115°C to remove residual acid, after which 3 mL of H2O2 (Sigma) was added. After H2O2 reduction, the remaining powder was dissolved in 15 mL of ddH2O and the samples were analysed on a Thermo Fisher Scientific iCAP Q ICP-MS equipped with CCTED (collision cell technology with energy discrimination). Three independent technical replicates were run for each teff flour sample. Possible soil contamination was identified where flour samples had more than 100 mg/kg of Fe, Al and possibly Si, and/or more than 1 µg/g of Ti [36,37].
N content was determined by Dumas analysis. Flour samples were dried at 104°C for three hours, and 1 g of flour was loaded into a LECO TruMac N Dumas gas analyser according to the manufacturer's instructions to determine N content. Dumas analysis was performed on three technical replicate 1 g aliquots of flour from each variety.
Phenolic compounds were assessed using HPLC. Flour samples of approx. 2.5 g were extracted into 50 ml of ethanol-acetic acid (10% 1M acetic acid v/v) under reflux for 2 hours. Extracts were stored at -20 ºC until analysed. The extracts were prepared for chromatography by centrifugation for two minutes at 13000 rpm, followed by filtration through a 0.2 μm filter. The compounds were separated on a Dionex UltiMate 3000 HPLC system using a 150 mm x 4.6 mm, 5 μm, 100 Å Kinetex C18 column, with a 0.1% formic acid/acetonitrile gradient mobile phase running at 0.2 ml/minute (0.95:0.05 for 2 minutes, then 0.72:0.28 for 18 minutes, 0.00:0.10 for 28 minutes, and then held until 45 minutes). The column effluent was monitored with a PDA detector between 200 and 600 nm, with data recorded at 254 nm, 280 nm, 340 nm and 520 nm.
Phytate was measured using the commercial Megazyme Phytic Acid Assay Kit (Bray, Ireland), following the manufacturer's instructions with minor modifications. Approximately 100 mg of flour was digested in 1.8 mL of 0.66 M HCl in 2.2 mL tubes, which were placed overnight on a rotator mixer at a constant 20 rpm at room temperature. Three technical replicates were analysed for each flour sample.
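The contamination screen described above amounts to a threshold rule applied to the replicate-averaged ICP-MS values. The sketch below is a hypothetical, simplified version: it flags a sample when mean Fe exceeds 100 mg/kg or Ti exceeds 1 µg/g (equivalent to 1 mg/kg), it ignores Al and Si because they are not in the reported element panel, and the function names and example values are illustrative only.

```python
"""Hypothetical soil-contamination screen for ICP-MS flour data (a sketch)."""

FE_LIMIT_MG_PER_KG = 100.0  # Fe threshold quoted in the text
TI_LIMIT_MG_PER_KG = 1.0    # 1 ug/g is the same as 1 mg/kg


def mean_of_replicates(replicates):
    """Average technical replicates element by element (values in mg/kg)."""
    elements = replicates[0].keys()
    return {el: sum(rep[el] for rep in replicates) / len(replicates) for el in elements}


def possibly_soil_contaminated(sample):
    """True if Fe or Ti exceeds its threshold (simplified rule: Al and Si are ignored)."""
    return (sample.get("Fe", 0.0) > FE_LIMIT_MG_PER_KG
            or sample.get("Ti", 0.0) > TI_LIMIT_MG_PER_KG)


if __name__ == "__main__":
    # Made-up triplicate readings for one flour sample (mg/kg).
    reps = [{"Fe": 31.0, "Ti": 0.4, "Zn": 20.1},
            {"Fe": 29.5, "Ti": 0.5, "Zn": 19.8},
            {"Fe": 30.5, "Ti": 0.3, "Zn": 20.4}]
    mean = mean_of_replicates(reps)
    print(mean, possibly_soil_contaminated(mean))
```

Averaging before screening keeps a single noisy replicate from flagging a sample; screening each replicate separately would be an equally defensible choice.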

Genomic DNA isolation and sequencing
DNA was extracted from the flour of the 24 teff varieties using Qiagen's DNeasy Kit according to the manufacturer's instructions, including RNase treatment. The same flour samples were used for DNA extraction as for the nutritional trait analyses. DNA was shipped to Novogene (Cambridge, UK) for Illumina sequencing. Illumina libraries had an average insert size of 350 bp and were sequenced using paired-end technology. The estimated genome coverage of each teff variety was at least 25X, based on an estimated genome size of 622 Mb [9]. Raw reads for all teff varieties tested have been deposited with the ENA under the ArrayExpress accession E-MTAB-8827.
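The 25X coverage figure translates into a sequencing yield by simple arithmetic: coverage multiplied by genome size gives the total bases required. The sketch below makes this explicit. The 622 Mb genome size is taken from the text, but the 2 x 150 bp read length is an assumption for illustration, as the read length is not stated.

```python
import math

GENOME_SIZE_BP = 622_000_000  # estimated teff genome size used in the text (622 Mb)


def bases_required(coverage, genome_size_bp=GENOME_SIZE_BP):
    """Total sequenced bases needed for a given mean fold-coverage."""
    return coverage * genome_size_bp


def read_pairs_required(coverage, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Read pairs needed, assuming both mates are read_length_bp long."""
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_required(coverage, genome_size_bp) / bases_per_pair)


if __name__ == "__main__":
    print(bases_required(25))            # total bases for 25X (about 15.6 Gb)
    print(read_pairs_required(25, 150))  # assumes 2 x 150 bp reads (not stated in the text)
```

In practice, duplicate removal and unmapped reads lower the effective coverage, so delivered yields are usually somewhat above this minimum.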

Whole genome sequence assembly and variant calling
The sequencing service supplied paired-end reads that had already undergone quality checks, including the removal of over-represented sequences, adapters, and reads with low-quality base scores (at Q > 20); no further analysis using FastQC [38] was therefore required. The paired-end reads were mapped against the indexed reference sequence of teff var. Dabbi using BWA-MEM [9,39]. The output was piped to SAMtools, converted to BAM format and sorted. Picard was used to assign reads to a single read group (AddOrReplaceReadGroups) and to flag PCR duplicates (MarkDuplicates). A sequence dictionary of contig names and sizes for the reference genome was also created using Picard (CreateSequenceDictionary). Variants were identified using the Broad Institute Genome Analysis Toolkit (GATK4) [40]. GATK HaplotypeCaller was run with default parameters for each teff variety to call variants between the reference Dabbi genome sequence and the variety sequence, in both Variant Call Format (VCF) and Genomic VCF (-ERC GVCF) modes. Only variants with a quality score > 20 were considered. The variant SNPs were then analysed using the GFF3 gene annotation and bcftools csq [41] to determine whether each SNP was within a coding region and, if so, whether it was synonymous or non-synonymous.
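The per-variety workflow described above can be summarised as an ordered list of shell commands. The sketch below only builds and prints the commands and does not run them. The file names, thread count, read-group values (lib1, ILLUMINA, unit1), the reference-indexing steps and the bcftools csq phasing option (-p a, which treats unphased genotypes as given) are assumptions, not settings reported by the authors.

```python
"""Hypothetical per-variety command list mirroring the mapping and calling workflow.

Commands are only built and printed, never executed.
"""
import os


def variant_calling_commands(sample, ref_fa, reads_1, reads_2, gff3, threads=8):
    ref_dict = os.path.splitext(ref_fa)[0] + ".dict"  # GATK expects ref.dict beside ref.fa
    sorted_bam = f"{sample}.sorted.bam"
    rg_bam = f"{sample}.rg.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        # One-off reference preparation: BWA index, FASTA index, sequence dictionary.
        f"bwa index {ref_fa}",
        f"samtools faidx {ref_fa}",
        f"picard CreateSequenceDictionary R={ref_fa} O={ref_dict}",
        # Map paired-end reads and pipe straight into a coordinate sort.
        f"bwa mem -t {threads} {ref_fa} {reads_1} {reads_2} | samtools sort -o {sorted_bam} -",
        # Assign all reads to a single read group, then flag PCR duplicates.
        f"picard AddOrReplaceReadGroups I={sorted_bam} O={rg_bam} "
        f"RGID={sample} RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM={sample}",
        f"picard MarkDuplicates I={rg_bam} O={dedup_bam} M={sample}.dup_metrics.txt",
        f"samtools index {dedup_bam}",
        # Call variants in plain VCF mode and in GVCF mode.
        f"gatk HaplotypeCaller -R {ref_fa} -I {dedup_bam} -O {sample}.vcf.gz",
        f"gatk HaplotypeCaller -R {ref_fa} -I {dedup_bam} -O {sample}.g.vcf.gz -ERC GVCF",
        # Keep sites with QUAL > 20, then annotate coding consequences.
        f"bcftools view -i 'QUAL>20' -Oz -o {sample}.q20.vcf.gz {sample}.vcf.gz",
        f"bcftools csq -f {ref_fa} -g {gff3} -p a -Oz -o {sample}.csq.vcf.gz {sample}.q20.vcf.gz",
    ]


if __name__ == "__main__":
    for cmd in variant_calling_commands("variety01", "dabbi.fa", "variety01_1.fq.gz",
                                        "variety01_2.fq.gz", "dabbi.gff3"):
        print(cmd)
```

Generating the commands as data, not running them directly, makes it easy to inspect the workflow or hand it to a scheduler such as a shell script or workflow manager.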
PLINK2 was used to convert the VCF file into PLINK binary format (bed, bim and fam). A phylogenetic tree was created with SNPhylo using the biallelic SNPs identified across the 24 teff varieties [42]. A principal component analysis (PCA) plot was constructed from the same biallelic SNPs using the R package SNPRelate [43]. fastStructure (logistic prior, K = 5) was used to infer population structure within the 24 teff varieties using the same SNPs [44].
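The genotype PCA described above can be illustrated with a standard standardised-genotype PCA. The sketch below is not SNPRelate's code. It assumes a samples x SNPs matrix of alternate-allele counts (0, 1, 2) with no missing calls, centres each SNP on 2p, scales it by sqrt(2p(1-p)), and takes the leading singular vectors as principal components. The toy matrix is made up.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs matrix of alternate-allele counts (0, 1, 2).

    Assumes biallelic SNPs with no missing calls. Each SNP is centred on 2p and
    scaled by sqrt(2p(1-p)), where p is its alternate-allele frequency.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    polymorphic = (p > 0.0) & (p < 1.0)  # monomorphic SNPs carry no information
    g, p = g[:, polymorphic], p[polymorphic]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # per-sample PC scores


if __name__ == "__main__":
    # Two made-up groups of two samples; the last SNP is monomorphic and is dropped.
    toy = np.array([[0, 0, 2, 2, 0],
                    [0, 0, 2, 2, 0],
                    [2, 2, 0, 0, 0],
                    [2, 2, 0, 0, 0]])
    print(genotype_pca(toy))
```

Samples from the same genetic group share the same sign on PC1, which is the pattern read off a PCA plot such as Figure 5B. The overall sign of each component is arbitrary.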

Identification of zinc iron permease and heavy metal associated transporter family members in teff
The Eragrostis tef genome was searched using the BioMart tool in Ensembl Plants (http://plants.ensembl.org/biomart/martview/) to identify predicted genes that contain both the Pfam domain PF02535 and InterPro ID IPR003689, or InterPro ID IPR027256. PF02535 and IPR003689 identify ZIP transporter proteins, and IPR027256 identifies HMA transporter proteins. Candidate ZIP and HMA transporter proteins in Arabidopsis and rice were also identified using Ensembl BioMart. The amino acid sequences of these candidate genes were aligned using MUSCLE in MEGA X [45] with default parameters, and a phylogenetic tree was constructed using the Maximum Likelihood method with 50 bootstrap replicates.
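The domain criteria described above reduce to simple set logic: a ZIP candidate must carry both PF02535 and IPR003689, while IPR027256 alone marks an HMA candidate. The sketch below applies that logic to (gene ID, domain ID) pairs, such as might be exported from a BioMart query. It does not query BioMart itself, and the gene IDs are hypothetical.

```python
from collections import defaultdict

ZIP_DOMAINS = {"PF02535", "IPR003689"}  # both required for a ZIP candidate
HMA_DOMAIN = "IPR027256"                # sufficient for an HMA candidate


def domains_by_gene(rows):
    """Group (gene_id, domain_id) rows into {gene_id: set of domain_ids}."""
    table = defaultdict(set)
    for gene_id, domain_id in rows:
        table[gene_id].add(domain_id)
    return dict(table)


def classify_transporters(gene_domains):
    """Split genes into ZIP and HMA candidates from their domain annotations."""
    zip_genes = sorted(g for g, doms in gene_domains.items() if ZIP_DOMAINS <= set(doms))
    hma_genes = sorted(g for g, doms in gene_domains.items() if HMA_DOMAIN in doms)
    return {"ZIP": zip_genes, "HMA": hma_genes}


if __name__ == "__main__":
    rows = [("geneA", "PF02535"), ("geneA", "IPR003689"),  # both ZIP domains
            ("geneB", "PF02535"),                          # only one ZIP domain
            ("geneC", "IPR027256")]                        # HMA domain
    print(classify_transporters(domains_by_gene(rows)))
```

Requiring both ZIP identifiers, as the search above does, trades a little sensitivity for fewer false positives from partial domain hits.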

Statistical analysis
Micronutrient, N and phytate levels were tested for normality (Shapiro-Wilk test in R) and homogeneity of variance, and were found to be normally distributed. Two-way analysis of variance (ANOVA) was used to test for significant differences between teff varieties using the aov and TukeyHSD functions in R. A least significant difference at the 5% probability level was used as a post-hoc test to determine significance. Results were plotted using the R packages ggplot2 and ggpubr [46-48]. Correlations between micronutrients, N and phytate were calculated in R using Spearman rank correlation (cor.test), or the Kruskal-Wallis test for categorical variables [43].
All statistical analyses of phenolic compound levels were performed using Genstat v.16 (VSN International 2020). Phenolic compound levels were analysed by General Linear Regression, a modified two-way ANOVA approach, using a replicate-by-variety model. Only comparisons with an F probability < 0.001 were considered statistically significant. Linear relationships between phenolic compound levels were measured using the Pearson correlation coefficient in Excel (Microsoft). Boxplots were generated using R ggplot2 [47].
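Spearman's rank correlation, used above via R's cor.test, is the Pearson correlation of the ranked data, with tied values given the average of the ranks they span. The pure-Python sketch below shows that calculation without a p-value. The example Zn and N values are made up.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    zn = [15.0, 18.0, 21.0, 25.0, 29.0]  # made-up flour Zn values (mg/kg)
    n = [1.31, 1.45, 1.40, 1.80, 1.97]   # made-up N values (g/100 g)
    print(round(spearman_rho(zn, n), 3))
```

Because it works on ranks, Spearman's rho captures any monotonic relationship and is less sensitive to extreme values, such as a single very high Fe reading, than Pearson's r.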
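For a balanced design with one value per variety and replicate, a replicate-plus-variety model partitions the total sum of squares into variety, replicate and residual parts. The F statistic for variety is the variety mean square divided by the residual mean square. The sketch below computes this by hand. It is not Genstat's implementation and assumes a balanced table with no missing values and no variety-by-replicate interaction term. A p-value would come from the F distribution (for example `scipy.stats.f.sf(F, df_variety, df_resid)`), which is omitted here to keep the example dependency-free.

```python
def variety_f_test(table):
    """F statistic for variety in an additive variety + replicate model.

    table[v][r] is the value for variety v in replicate r (balanced, no missing data).
    """
    n_v, n_r = len(table), len(table[0])
    values = [y for row in table for y in row]
    grand = sum(values) / len(values)
    variety_means = [sum(row) / n_r for row in table]
    rep_means = [sum(table[v][r] for v in range(n_v)) / n_v for r in range(n_r)]

    ss_total = sum((y - grand) ** 2 for y in values)
    ss_variety = n_r * sum((m - grand) ** 2 for m in variety_means)
    ss_rep = n_v * sum((m - grand) ** 2 for m in rep_means)
    ss_resid = ss_total - ss_variety - ss_rep

    df_variety, df_resid = n_v - 1, (n_v - 1) * (n_r - 1)
    f_stat = (ss_variety / df_variety) / (ss_resid / df_resid)
    return {"F": f_stat, "df_variety": df_variety, "df_resid": df_resid}


if __name__ == "__main__":
    # Made-up phenolic levels (ug/g) for two varieties in three replicates.
    print(variety_f_test([[1.0, 2.0, 3.0], [4.0, 5.0, 7.0]]))
```

Including replicate as a factor removes replicate-to-replicate shifts from the residual, which makes the variety test more sensitive than a one-way ANOVA on the same data.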

Comparison of nutritional traits in flour of twenty-four teff varieties
To understand the nutritional potential of 24 teff varieties (Additional file 1) currently grown and consumed in Ethiopia, we measured the levels of a number of nutritional traits, compared the relationships between these traits, and used genomic data to examine the genetic relationships between the 24 varieties relative to these traits. The nutritional traits included elemental micronutrients (Fig. 1 and Additional file 2), nitrogen as a proxy for protein content (Fig. 2), phytate (Fig. 2) and a range of phenolic compounds (Additional file 3).
Significant variation between the 24 teff varieties was found for all micronutrients tested (Additional file 2; p < 0.001). Focusing on Zn and Fe, two elements essential for human health, flour Zn concentrations ranged from 14.8 mg/kg (var. Etsub) to 29.2 mg/kg (var. Heber-1), and Fe concentrations from 22.6 mg/kg (var. Abay) to 684.25 mg/kg (var. Yilmana) (Figure 1A and B). Zn was positively correlated with Ca, Mg, P, S and, to a lesser extent, K, and negatively correlated with Cu, Fe, Mo and Se. A positive correlation was also seen between Zn and Cd (Suppl. Figure 1). No correlations were found between Pb or Ti and the other elements measured.
Cd is a toxic heavy metal that is detrimental to human health. High levels of Cd were seen in many of the teff samples tested, with the highest level, 40.7 µg/kg, found in var. Wellenkomi. Only one variety, var. Magna, had Cd levels below the current EU limit for Cd in food products of 1 µg/kg (1ppm) [49].
N levels in the teff flours were measured as a proxy for protein content [50]. Significant differences were found between the teff varieties, with N values ranging from 1.3 g/100 g in var. Ambo Toke to nearly 2 g/100 g in var. Dagan (p < 0.001) (Figure 2A). Using the standard conversion factor of 5.95, this corresponds to a protein content of 7.79% to 11.71% [51]. A positive correlation was found between Zn and N levels in the flours tested (R = 0.59; p = 0.01; Figure 2B), but not between Fe and N (R = 0.0096; p = 0.97; Figure 2C).
Significant differences in phytate levels were found between the 24 teff varieties (p < 0.01; Figure 2D), ranging from 0.83 g/100 g in var. Magna to 2.56 g/100 g in var. Abay. No significant correlation was observed between phytate and Zn (R = 0.042; p = 0.84; Figure 2E), but a significant negative correlation was found between Fe and phytate (R = -0.54; p = 0.007; Figure 2F). There was no significant correlation between total P levels, measured by ICP-MS, and phytate (R = 0.29; p = 0.18).
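The protein estimate above is a single multiplication of Dumas N content by the conversion factor. The sketch below also inverts the conversion, showing that the upper value of 11.71% corresponds to about 1.97 g N/100 g, consistent with the "nearly 2 g/100 g" reported. The function names are illustrative.

```python
N_TO_PROTEIN_FACTOR = 5.95  # conversion factor quoted in the text [51]


def protein_percent(n_g_per_100g, factor=N_TO_PROTEIN_FACTOR):
    """Crude protein (%) from Dumas N content (g N per 100 g flour)."""
    return n_g_per_100g * factor


def n_from_protein(protein_pct, factor=N_TO_PROTEIN_FACTOR):
    """Inverse conversion: N (g/100 g) implied by a protein percentage."""
    return protein_pct / factor


if __name__ == "__main__":
    for n in (1.3, 1.5, 2.0):
        print(n, round(protein_percent(n), 2))
    print(round(n_from_protein(11.71), 3))  # N implied by the upper protein value
```

Because the factor is a constant, the protein and N values rank the varieties identically, so either can be used in the correlation analyses.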
The levels of t-cinnamic acid ranged from 1.51 µg/g in var. Baset to 24.65 µg/g in var. Dukem. Cinnamic acid is a precursor of ferulic acid, which in turn gives rise to vanillic acid. A strong positive correlation was found between cinnamic and ferulic acids, and a weaker positive correlation between cinnamic and vanillic acids. However, vanillic acid levels were low in all 24 teff varieties (Additional files 3 and 4).
Protocatechuic acid and gallic acid are synthesised from a side branch of the shikimate pathway that leads to the synthesis of folates and aromatic amino acids, including phenylalanine (Figure 3; [53]). Protocatechuic acid levels ranged from 6.78 µg/g (var. Areka-1) to 44.78 µg/g (var. Felagot) (Additional file 3). Gallic acid levels were low in all 24 teff varieties, ranging from 0.12 µg/g (var. Dima) to 0.97 µg/g (var. Ambo Toke) (Figure 4).
Pearson correlation analyses were undertaken on the levels of each phenolic compound, phytate and Fe (Additional file 4). A strong positive correlation was seen between cinnamic acid and ferulic acid (r = 0.78), fitting the biochemical pathway (Figure 3), in which cinnamic acid is a precursor of ferulic acid. Similarly, a positive association was seen between ferulic acid and vanillic acid (r = 0.51), vanillic acid lying downstream of ferulic acid. However, positive correlations were also seen between ferulic acid and myricetin (r = 0.61), which may reflect their common precursor cinnamic acid, and between cinnamic acid and both quercetin (r = 0.49) and quercetin glycoside (r = 0.50).
The precursors of kaempferol and quercetin, dihydrokaempferol and dihydroquercetin, exist in an equilibrium controlled by the enzyme flavonoid 3'-hydroxylase (F3'H), which converts dihydrokaempferol to dihydroquercetin. Consistent with this, negative correlations were found between kaempferol and quercetin (r = -0.28) and between kaempferol glycoside and quercetin glycoside (r = -0.43), with the majority of kaempferol and quercetin present in the glycosylated state. This was reflected in positive correlations between kaempferol and quercetin glycoside (r = 0.62) and between quercetin and kaempferol glycoside (r = 0.45).
Catechin is derived from dihydroquercetin and therefore competes with quercetin for this shared precursor. Consistent with this, a negative correlation was observed between catechin and quercetin levels (r = -0.32). However, a positive correlation was seen between catechin and quercetin glycoside (r = 0.52). A positive correlation was also observed between catechin and kaempferol (r = 0.57), and a negative correlation between catechin and kaempferol glycoside (r = -0.51).
In addition to its positive correlations with kaempferol (r = 0.62) and catechin (r = 0.52), quercetin glycoside was positively correlated with other phenolic compounds, including ferulic acid (r = 0.65), myricetin (r = 0.56) and cinnamic acid (r = 0.50). These correlations may indicate a positive feedback mechanism operating through the biosynthetic pathways leading to quercetin glycoside synthesis.
A negative correlation was found between gallic acid and salicylic acid (r = -0.43), and between catechin and protocatechuic acid (r = -0.44), while a positive correlation was observed between protocatechuic acid and Fe content (r = 0.40). No significant correlations were found between phytate levels and any of the phenolic compounds measured in this study (Additional file 4).
To understand how the 24 teff varieties relate to each other, a phylogenetic tree analysis (Figure 5A), a principal component analysis (PCA; Figure 5B) and a structure analysis (Figure 5C) were carried out using the genome-wide SNP data. The phylogenetic tree and PCA separated the 24 teff varieties into three groups, whereas the structure analysis returned two groupings. In general, the teff varieties fell into similar groupings in the phylogenetic tree and the PCA, the exceptions being the varieties Negus and Kora, which fell on the same branch of the tree but in distinct PCA groups.

Genetic variation underlying differences in Zn levels
Flour Zn concentrations were overlaid on the PCA to see whether a relationship could be identified between Zn levels and the teff variety groupings (Figure 5B). As no specific association between the genetic grouping of teff varieties and Zn concentration was apparent, we chose to examine genetic variation in specific gene families involved in Zn transport. Two gene families were selected: the ZIP (Zinc Iron Permease) family, involved in the uptake of Zn from the soil, and the HMA (Heavy Metal Associated) family of transporters, involved in the movement of Zn from roots to seeds.

Zinc Iron Permease transporter family in teff
To identify putative ZIP family members, the gene models of the Eragrostis tef var. Dabbi sequence in Ensembl Plants were searched using the Pfam domain PF02535 and InterPro ID IPR003689. This revealed 32 predicted genes in teff containing these protein domains. As teff is a tetraploid species, which would be expected to roughly double a diploid gene count, these 32 potential ZIP family members are comparable to the 15 in Arabidopsis and 17 in rice [30,54]. However, some of the predicted genes could be pseudogenes. Only six of the 32 coding regions identified have a canonical ATG start codon, and two of these six putative ZIP transporters lacked a stop codon. However, as these incomplete sequences could be due to gaps in the var. Dabbi reference sequence, we used all 32 putative ZIP sequences in subsequent analyses.
Phylogenetic analysis of the translated amino acid sequences of the 32 teff ZIP transporters was performed together with members of the ZIP families from rice and Arabidopsis (Figure 6). Teff ZIP proteins grouped more closely with rice ZIP proteins than with those of Arabidopsis, consistent with both teff and rice being monocots. In most cases there were two teff genes for every rice ZIP gene, as expected for a tetraploid species, but not all 32 teff ZIP genes showed clear associations with genes found in rice or Arabidopsis, suggesting some divergence. Rice ZIP genes associated with more than two teff genes included OsZIP8, which was associated with four teff genes. Only one of the 32 teff ZIP genes clustered with OsZIP3, while three teff genes clustered with OsIRT2 and OsZIP10.
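The completeness checks described above (a canonical ATG start and a terminal stop codon) can be expressed as a few string tests on each predicted coding sequence. The sketch below assumes the standard nuclear genetic code (stop codons TAA, TAG, TGA) and uses hypothetical sequences. It also reports internal stop codons and frame length, two further checks that often reveal incorrect gene models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def check_cds(seq):
    """Report simple completeness checks for a predicted coding sequence."""
    s = seq.upper()
    codons = [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]
    return {
        "starts_with_atg": s.startswith("ATG"),
        "ends_with_stop": bool(codons) and len(s) % 3 == 0 and codons[-1] in STOP_CODONS,
        "internal_stop": any(c in STOP_CODONS for c in codons[:-1]),
        "length_multiple_of_3": len(s) % 3 == 0,
    }


if __name__ == "__main__":
    for cds in ("ATGAAATAA", "AAAGGGCCC", "ATGTAAGGGTGA"):  # hypothetical CDSs
        print(cds, check_cds(cds))
```

A model that fails these checks is not necessarily a pseudogene; as noted above, gaps in the reference assembly can produce the same pattern.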

Heavy Metal Associated transporter family in teff
To identify putative HMA family members, the gene models of the Eragrostis tef var. Dabbi sequence in Ensembl Plants were searched using the InterPro domain IPR027256. Fourteen HMA proteins were identified in teff, compared with eight in Arabidopsis and nine in rice. Phylogenetic analysis of the translated amino acid sequences of the 14 teff HMA transporters was performed together with members of the HMA families from rice and Arabidopsis (Figure 7). As 14 teff genes is fewer than expected for this tetraploid species (two copies of each of the eight core genes conserved in both rice and Arabidopsis would give 16), the teff Dabbi reference sequence may be incomplete with regard to this family of Zn transporters. In addition, five of the 14 predicted HMA genes in teff did not contain an ATG start codon, suggesting that some sequence is missing or that the gene models are incorrect. Four rice HMA proteins, OsHMA2, OsHMA5, OsHMA6 and OsHMA9, each have two clear homologues among the predicted teff HMA genes, but no identified teff HMA gene appears to align with OsHMA3, a major gene involved in Cd tolerance and sequestration [55].

Identification of variants in ZIP and HMA proteins
Numerous SNPs were found in both the ZIP and HMA teff transporter families (Additional files 6 and 7). Across the 32 EtZIP genes, a total of 355 variants were identified relative to the reference var. Dabbi, including frameshift mutations in twelve of the 32 genes (3.4% of all variants) (Additional file 6). Most variants were located in introns (41.4%), followed by synonymous variants in the coding region (22.5%) and then by non-synonymous mutations that change the amino acid sequence (14.6%). Many of the variants were shared between varieties, including a frameshift in both homologues of OsZIP9, with most of the varieties carrying the mutated/truncated form of each gene.
In the EtHMA transporters, 298 variants were identified relative to var. Dabbi (Additional file 7). Most of the variants were again found in introns or were synonymous mutations in the coding sequence (48% and 16.4%, respectively). The third most common class of variants was non-synonymous mutations in the protein-coding region (12.4%). Only two of the 14 teff HMA genes had frameshift variants: the predicted gene loci Et_s3091-2.42-1.mrna1 and Et_s3193-0.29-1.mrna1.
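The percentages reported above are simple tallies of variant consequence classes. The sketch below assumes that one consequence label per variant has already been extracted, for example from the consequence annotation written by bcftools csq. The parsing step is not shown, and the labels in the example are hypothetical.

```python
from collections import Counter


def consequence_percentages(consequences):
    """Percentage of variants in each consequence class, most common first."""
    counts = Counter(consequences)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.most_common()}


if __name__ == "__main__":
    labels = ["intron", "intron", "synonymous", "missense", "frameshift", "intron"]
    for kind, pct in consequence_percentages(labels).items():
        print(f"{kind}: {pct:.1f}%")
```

Where a variant overlaps several transcripts, a rule for choosing one label per variant (for example, the most severe consequence) is needed before tallying, otherwise the percentages will not sum to 100.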

Genetic variation underlying differences in phenolic compounds
Flavonoid 3'-hydroxylase (F3'H) is a key enzyme in the conversion of dihydrokaempferol to dihydroquercetin [53,56]. To determine whether F3'H in teff was responsible for the differences in kaempferol glycoside and quercetin glycoside levels between teff varieties, the amino acid sequence of the rice F3'H gene (Os10g0320100) was used to identify possible orthologs in teff. Using an E-value cut-off of 1e-10, five putative orthologs of OsF3'H were identified in the teff reference sequence of var. Dabbi.
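The ortholog search described above applies an E-value cut-off to sequence-similarity hits. The text does not name the search tool, so the sketch below assumes BLAST tabular output (-outfmt 6, default 12 columns, E-value in column 11). It treats the 1e-10 cut-off as a strict "less than" threshold, which is also an assumption. Subject IDs in the example are hypothetical.

```python
def hits_below_cutoff(blast_tabular_lines, cutoff=1e-10):
    """Subject IDs with at least one hit below the E-value cut-off, best hit first.

    Expects BLAST tabular output (-outfmt 6) with the default 12 columns,
    where column 11 (index 10) is the E-value.
    """
    best = {}
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue < cutoff:
            best[subject_id] = min(evalue, best.get(subject_id, float("inf")))
    return sorted(best, key=best.get)


if __name__ == "__main__":
    lines = [
        "query\tsubjA\t80.0\t300\t50\t2\t1\t300\t1\t300\t1e-50\t400",
        "query\tsubjB\t30.0\t100\t60\t5\t1\t100\t1\t100\t1e-5\t45",
        "query\tsubjC\t60.0\t250\t90\t3\t1\t250\t1\t250\t2e-20\t150",
    ]
    print(hits_below_cutoff(lines))
```

Keeping only the best hit per subject avoids counting one gene several times when it has multiple high-scoring segments.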
Correlations between SNPs in the five putative F3'H orthologs and the levels of kaempferol glycoside and quercetin glycoside in the 24 teff varieties revealed a SNP in Et_s3159-0.29-1.mrna1 that was strongly correlated with kaempferol glycoside and quercetin glycoside levels (p < 0.001). This SNP is a T-to-G substitution in the second intron of Et_s3159-0.29-1.mrna1 and therefore does not directly change the coding sequence of the gene (Suppl. Figure 2). Teff varieties homozygous for the wild-type (WT) "T" allele had lower levels of kaempferol glycoside, and higher levels of quercetin glycoside, than varieties homozygous for the mutant "G" allele, while heterozygous varieties had intermediate levels of the two glycosides (Figure 8). No further associations were found between the other phenolic compounds measured and SNPs in any of the F3'H-type genes in teff.
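One common way to test a SNP against a trait, and to capture the intermediate heterozygote pattern described above, is additive coding. Each variety is coded by its number of "G" alleles (0, 1 or 2), and trait means or a least-squares slope are computed on that dosage. The text does not specify the exact test used, so the sketch below is illustrative only. The genotypes and glycoside values are hypothetical.

```python
from collections import defaultdict

# Additive coding: number of "G" alleles at the SNP ("T" is the wild-type allele).
G_DOSAGE = {"TT": 0, "TG": 1, "GT": 1, "GG": 2}


def mean_by_genotype(genotypes, values):
    """Mean trait value for each G-allele dosage class."""
    groups = defaultdict(list)
    for gt, value in zip(genotypes, values):
        groups[G_DOSAGE[gt]].append(value)
    return {dose: sum(v) / len(v) for dose, v in sorted(groups.items())}


def additive_effect(genotypes, values):
    """Least-squares slope of trait value on G-allele dosage."""
    x = [G_DOSAGE[gt] for gt in genotypes]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(values) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, values))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


if __name__ == "__main__":
    gts = ["TT", "TT", "TG", "GG", "GG"]
    kaempferol_glycoside = [1.0, 1.2, 2.1, 3.0, 2.8]  # hypothetical levels (ug/g)
    print(mean_by_genotype(gts, kaempferol_glycoside))
    print(additive_effect(gts, kaempferol_glycoside))
```

A positive slope corresponds to higher kaempferol glycoside with each additional "G" allele. Heterozygote means lying between the two homozygote means are what an additive model expects.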

Discussion
Teff is a staple crop for many Ethiopians; however, because of naturally low Zn levels in the soils on which it is grown, the grain does not meet the Zn needs of those who consume it [57]. Zn is essential for a healthy immune system and stimulates the activity of many different enzymes. The low level of Zn in teff has therefore led to many Ethiopian children suffering from Zn deficiency [58]. Fe is also an essential element, required for normal blood cell function [59], but it has low bioavailability, as the small intestine does not readily absorb large amounts of Fe. However, a number of compounds, including phytate, kaempferol and quercetin, have been shown to influence Fe absorption in the human gut [60-62].
To understand the breeding potential of teff for Zn and Fe content, as well as several other nutritional factors, grain from 24 teff varieties was assayed for a range of nutritional traits. These 24 varieties showed significant variation in all the micronutrients tested, with a twofold difference in Zn content. However, even var. Heber-1, with 29.2 mg/kg of Zn, had levels below the recommended 40 mg/kg of Zn in flour [8]. This may be rectified by Zn fertilization of the soil, but future work would be required to understand how each variety responds to Zn soil supplements, given the complexity of Zn uptake and transport to the grain.
No correlation was found between grain Zn and Fe levels, most likely because different regulatory mechanisms and transporters are involved in moving each ion through the plant [16,28]. The absence of an identified homologue of OsHMA3, a gene associated with Cd tolerance and sequestration in rice, may also help explain the high levels of Cd seen in many of the 24 teff flours. It should also be noted that a homologue of OsZIP1 (Et_s3548-1.55-1.mrna1), which is involved in the efflux of excess heavy metals, was one of the twelve teff genes with a frameshift variant, and this variant was present in all the teff varieties. This suggests that several genes that sequester Cd in the roots, keeping it away from the grain, may be absent from teff or non-functional in the modern teff varieties assayed in this study [63].
Sequencing of the 24 teff varieties enabled the identification of several variants that could be used to develop markers for future breeding. Overall, there was a large amount of genetic variation between the varieties relative to the reference var. Dabbi, although most of these variants were in non-coding regions or were synonymous mutations, as seen in the analysis of the ZIP and HMA transporter families. This level of variation is not unexpected, as similar levels have been observed in a core set of rice varieties that were recently sequenced [64].
In rice, double mutants of OsZIP5 and OsZIP9 show severe Zn deficiency symptoms, suggesting that these two genes are the major route for Zn into the rice plant [21,22]. However, mutants of either gene alone do not show major effects on Zn uptake. In teff, mutations in two homologues of OsZIP9, including frameshifts, as well as a frameshift variant in a homologue of OsZIP5 in some varieties, suggest that Zn uptake, while compromised by these mutations, is not wholly dependent on the homologues of OsZIP5 and OsZIP9. Other transporters may be the major route for Zn into teff plants, and these other routes might also contribute to the high levels of Cd seen in many of the teff varieties. It has been shown that genes involved in Mn and Fe uptake can also transport Cd in vivo, whereas high-affinity Zn transporters do not appear to have this capability [15,27,65-67].
Comparison of other nutritional traits, including nitrogen (as a proxy for protein), phytate and a number of phenolic compounds, also showed significant variation between the 24 teff varieties. The variation in phenolic profiles is important, as some phenolics have been found to alter Fe bioavailability and may therefore offer a more effective strategy than Fe fortification for improving nutritional outcomes. Phytate is often reported to inhibit Fe bioavailability, so the negative correlation between Fe and phytate levels in teff could prove beneficial. Future research is required to determine whether elevated levels of phenolics that stimulate or inhibit Fe uptake in the human gut can help alleviate the anaemia seen in Ethiopian children [69]. Nevertheless, the SNP in the teff F3'H gene Et_s3159-0.29-1.mrna1, which explains a large proportion of the variation in kaempferol glycoside and quercetin glycoside levels, suggests that this gene might be a breeding target to improve Fe bioavailability in teff, as kaempferol glycoside is known to promote Fe absorption while quercetin glycoside inhibits Fe absorption in cell assays [60]. In addition, phenolic compounds have been found to influence other agronomic traits, including tolerance to both biotic and abiotic stress [14,60,61,68].
Given the considerable variation seen for the nutritional traits assessed in this study, breeders are in a good position to breed for enhanced nutritional value in teff. The genomic sequence information collected can be used to identify and develop markers linked to target genes and traits. We have yet to test the heritability of these traits and their stability across growing locations and seasons, but marker-assisted selection, particularly within target genes, now provides a feasible approach to breeding for these nutritional traits in teff.

Conclusion
For many subsistence farming families in Ethiopia, teff is a major crop and source of calories. Yet, as shown in this study, Zn levels are usually below recommended levels, while Cd levels exceed EU limits. However, considerable phenotypic and genetic variation exists for a range of nutritional traits and for the genes regulating their levels in planta. This provides considerable potential to determine the relationships between these nutritional phenotypes and to identify allelic variants that would allow the breeding of new teff varieties with optimal nutritional potential. Permission to collect and analyse the milled seeds/flour of the teff varieties documented in this work was obtained from the Ethiopian government before the research was conducted and the samples were sent.

Abbreviations
Ethics approval and consent to participate: All the methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication:
The manuscript was written by MM and LB and subsequently revised by all authors. All authors have read and approved the final manuscript.

Figure 2. Relationship between Zn, Fe, nitrogen and phytate levels in 24 teff varieties. Nitrogen levels (A). Correlation between Zn and nitrogen content (B). Correlation between Fe and nitrogen content (C). Phytate levels (D). Correlation between Zn and phytate content (E). Correlation between Fe and phytate content (F).
