Novel allelic variations in Tannin1 and Tannin2 contribute to tannin absence in sorghum

Sorghum is an important food crop that is also widely used for brewing, feed, and bioenergy. Certain sorghum genotypes accumulate high concentrations of condensed tannins in their seeds. These tannins are beneficial, for example by protecting grain from bird pests, but they also impair grain quality and digestibility. We previously identified Tannin1 and Tannin2, each with three recessive causal alleles, as regulators of tannin absence in sorghum. In this study, by characterizing 421 sorghum accessions, we identified three additional novel recessive alleles in these two genes. The tan1-d allele contains a 12-bp deletion at position 659 and the tan1-e allele contains a 10-bp deletion at position 771 in Tannin1. The tan2-d allele contains a C-to-T transition that introduces a premature stop codon before the bHLH domain of Tannin2; this allele was predominantly selected in China. We further developed KASP assays targeting these recessive alleles to efficiently genotype large populations. This study provides new insights into sorghum domestication and a convenient tool for breeding programs.


Introduction
Sorghum (Sorghum bicolor (L.) Moench), domesticated in Africa (Rooney and Waniska 2000), is one of the leading cereal crops worldwide owing to its adaptability to diverse regions and its versatile end uses as food, feedstock, beverages, and brooms (Hao et al. 2021; Boyles et al. 2019). In recent years, sweet sorghum has been proposed as a potential source of biofuel for clean energy (Xin et al. 2021). In addition, compared with other common cereal crops, sorghum is a rich source of bioactive phenolic compounds, most notably 3-deoxyanthocyanidins and condensed tannins (Xiong et al. 2019).
Condensed tannins have been identified in sorghum genotypes with a pigmented testa. Owing to their effective radical-scavenging ability (Dykes and Rooney 2007), condensed tannins confer many beneficial attributes, such as UV protection, defense against pathogens, and improved plant health. Unlike other major crops, in which tannins have essentially been removed by strong artificial selection, both tannin and nontannin sorghums exist among domesticated varieties (Wu et al. 2012). In Africa and Asia, the use of sorghum for food and alcoholic beverages depends on the tannin content of the grain. On the other hand, tannins in sorghum can be anti-nutritional because they bind proteins, carbohydrates, and minerals, which decreases the nutritional quality of feedstock. As a result, almost all sorghum cultivars grown for feedstock in the United States and Australia are nontannin types (de Morais Cardoso et al. 2017).
Tannin presence is regulated by two major loci (B1 and B2) with a duplicate recessive interaction: a homozygous loss-of-function mutation in either gene causes the absence of condensed tannins. In genotypes with high tannin levels, tannins can also be found in the pericarp owing to an additional dominant locus, S (Earp et al. 2004; Dykes et al. 2005; Rhodes et al. 2014). The genes underlying B1 and B2 have been identified as Tannin1 (Tan1, Sobic.004G280800), encoding a WD40 protein, and Tannin2 (Tan2, Sobic.002G076600), encoding a bHLH transcription factor. A total of six recessive alleles, three in each gene, had previously been uncovered in natural populations (Wu et al. 2012, 2019). The coevolutionary relationships among sorghum, humans, and avian herbivore pests suggest bidirectional selection for both tannin and nontannin sorghum in Africa during domestication (Wu et al. 2019).
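The duplicate recessive interaction can be illustrated with a minimal Python sketch. It assumes a simplified model in which each locus has only two allele classes ("+" for a functional allele, "-" for any recessive loss-of-function allele such as tan1-a or tan2-d) and it ignores the S locus and quantitative variation in tannin level; the function names are hypothetical. Under these assumptions, selfing a plant heterozygous at both loci gives the classic 9:7 tannin:nontannin F2 ratio expected for duplicate recessive epistasis.

```python
from collections import Counter
from itertools import product

# Simplified allele classes (an assumption for illustration):
#   "+" = functional Tan1/Tan2 allele, "-" = any recessive loss-of-function allele.


def has_tannin(tan1, tan2):
    """Return True if tannins are expected under duplicate recessive epistasis.

    tan1 and tan2 are 2-tuples of alleles. Tannins require at least one
    functional allele at BOTH loci; homozygosity for recessive alleles at
    either locus is sufficient for the nontannin phenotype.
    """
    return "+" in tan1 and "+" in tan2


def f2_phenotype_counts():
    """Enumerate the 16 gamete combinations from selfing a Tan1/tan1 Tan2/tan2 plant."""
    gametes = list(product("+-", repeat=2))  # (Tan1 allele, Tan2 allele)
    counts = Counter()
    for egg, pollen in product(gametes, repeat=2):
        tan1 = (egg[0], pollen[0])
        tan2 = (egg[1], pollen[1])
        counts["tannin" if has_tannin(tan1, tan2) else "nontannin"] += 1
    return counts


if __name__ == "__main__":
    print(f2_phenotype_counts())  # Counter({'tannin': 9, 'nontannin': 7})
```

Under the same simplified model, a cross in which one parent is homozygous recessive at only one gene and the other parent is wild-type at both would segregate at a single locus, so a 3:1 tannin:nontannin F2 ratio would be expected instead of 9:7.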
Here, we characterized the genetic basis of tannin presence and absence in a large sorghum population comprising many landraces collected from China and India. We identified three new recessive alleles that contribute to the nontannin phenotype, one of which (tan2-d) was specifically selected in China. We further developed Kompetitive Allele-Specific PCR (KASP) assays for all nine recessive alleles to facilitate screening for these alleles in sorghum germplasm and their use in marker-assisted selection in sorghum breeding.

Plant materials
The 421 diverse sorghum accessions were compiled from germplasm collected by the Center for Crop Germplasm Resources, Institute of Crop Science, Chinese Academy of Agricultural Science (http://www.cgris.net/), or by the Crop Research Institute, Shandong Academy of Agricultural Science (Supplementary Table S1). A small subset of accessions with known Tan1 and Tan2 alleles was used to develop the KASP assays (Supplementary Table S2). Three F2 populations were developed from the crosses M6693×AiJiaoNuo, Tx615B×Jin5-0, and GA1×DaLuoChui to evaluate these KASP assays.

Phenotyping for tannin presence and tannin content measurement in sorghum grain
The presence of tannins in sorghum grain was determined with the bleach test and by visual assessment (5-10 seeds per accession) after removal of the dorsal-side pericarp (Dykes et al. 2019; Morris et al. 2013).
Tannin content was determined using a high-throughput 96-well HCl-vanillin assay (Herald et al. 2014). Ten kernels from each accession were ground into a fine powder using a Tissue Grinder (Coyote Bio G200, China). Approximately 0.1 g of sorghum flour was extracted in 1 mL of 70% acetone at 50 °C for 60 min, with three replications per sample. The extracts were centrifuged at 8000 rpm for 4 min, and 40 µL of supernatant was pipetted into a 96-well plate. Then 200 µL of vanillin reagent (a 1:1 (v/v) mixture of 1.0% vanillin in methanol and 30% HCl in methanol) was added to each well. Plates were incubated for 30 min at 30 °C, and absorbance was read on a microplate reader (CLARIOstar Plus, Germany) with methanol as the blank. Tannin content was quantified against a catechin standard curve spanning 0-1.0 mg mL−1.
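The conversion from plate readings to the tannin levels reported below (mg/100 mg) is a linear standard-curve calculation. The sketch assumes a linear absorbance response across the 0-1.0 mg mL−1 catechin range, blank-corrected readings, and the 1 mL extraction volume described above; the standard and sample absorbances are invented numbers, and the function names are hypothetical. Because about 100 mg of flour is extracted in 1 mL, a catechin-equivalent concentration in mg mL−1 maps almost one-to-one onto mg/100 mg, but passing the actual weighed mass keeps the result exact.

```python
from statistics import mean


def fit_standard_curve(conc_mg_per_ml, absorbance):
    """Ordinary least-squares line: absorbance = slope * concentration + intercept."""
    mx, my = mean(conc_mg_per_ml), mean(absorbance)
    sxx = sum((x - mx) ** 2 for x in conc_mg_per_ml)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_mg_per_ml, absorbance))
    slope = sxy / sxx
    return slope, my - slope * mx


def tannin_mg_per_100mg(absorbance, slope, intercept, flour_mg, extract_ml=1.0):
    """Catechin equivalents per 100 mg flour for one blank-corrected well."""
    conc = (absorbance - intercept) / slope       # mg catechin per mL of extract
    return conc * extract_ml / flour_mg * 100.0   # mg per 100 mg of flour


# Hypothetical catechin standards (mg/mL) and their blank-corrected absorbances.
standards = [0.0, 0.25, 0.5, 0.75, 1.0]
readings = [0.05, 0.35, 0.65, 0.95, 1.25]
slope, intercept = fit_standard_curve(standards, readings)

# Hypothetical triplicate extractions of one accession: (absorbance, flour mass in mg).
replicates = [(0.65, 100.0), (0.62, 98.0), (0.68, 102.0)]
values = [tannin_mg_per_100mg(a, slope, intercept, m) for a, m in replicates]
print(round(mean(values), 3))
```

Averaging the three per-replicate values, rather than averaging absorbances first, keeps small differences in weighed flour mass from biasing the result.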

DNA extraction, sequencing, and data analysis
Seeds from each accession were grown in trays in a greenhouse at 23 °C under a 16 h/8 h day/night cycle. At the third-leaf stage, leaf tissue was collected in 96-deep-well plates, freeze-dried for 3 d (Scientz-18N, China), and ground to a fine powder using a Tissue Grinder (Coyote Bio G200, China). DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) protocol. Gene-specific primers based on the Tx623 reference genome sequence were designed using Primer3 (Primer3 Input version 0.4.0). Sanger sequencing reads were analyzed using DNASTAR (http://www.dnastar.com).

Characterization of tannin phenotypes in sorghum grains
We assessed the presence of tannins in a diverse panel of 421 sorghum accessions from Asia and North America (Supplementary Table S1). Of these accessions, 236 had tannins in the grain, as indicated by black-red staining after bleach treatment. Among the tannin-containing accessions, tannin levels ranged from 0.11 to 2.46 mg/100 mg, with a mean of 0.86 mg/100 mg (Supplementary Table S1).
We then screened the 421 accessions with the six KASP assays (Supplementary Table S4). Five of the six recessive alleles were segregating in the panel; tan2-c was not detected. As expected, all 236 tannin accessions were wild-type at all five segregating sites. Among the nontannin accessions, 150 carried a recessive allele in either Tan1 or Tan2, and 9 carried recessive alleles in both genes. In this panel, tan1-b (34.6%) and tan1-c (29.7%) were the most frequent alleles. The remaining 26 nontannin accessions were wild-type at all six sites in Tan1 and Tan2, suggesting that additional recessive alleles in these two genes may be responsible for their tannin absence (Supplementary Fig. 1).
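The accounting in this paragraph can be expressed as a small classification rule over per-accession KASP results. In this sketch, which illustrates the logic rather than reproducing the analysis actually used, each accession is reduced to its bleach-test phenotype plus two flags for whether it is homozygous for any screened recessive allele at Tan1 or at Tan2; the record format and category names are hypothetical. The reported counts partition the panel exactly (236 + 150 + 9 + 26 = 421), and the "unexplained" class is the one that motivated resequencing.

```python
from collections import Counter


def categorize(has_tannin, recessive_tan1, recessive_tan2):
    """Assign an accession to one of the panel categories used in the text.

    recessive_tan1 / recessive_tan2: True if the accession is homozygous for
    any screened recessive allele at that gene (hypothetical input format).
    """
    if has_tannin:
        # Under duplicate recessive epistasis a tannin accession should be
        # wild-type at every screened site; anything else is a conflict to recheck.
        return "conflict" if (recessive_tan1 or recessive_tan2) else "tannin"
    if recessive_tan1 and recessive_tan2:
        return "nontannin, both genes"
    if recessive_tan1 or recessive_tan2:
        return "nontannin, one gene"
    return "nontannin, unexplained"  # candidates for novel alleles


def summarize(records):
    """Tally categories for an iterable of (has_tannin, rec_tan1, rec_tan2) tuples."""
    return Counter(categorize(*r) for r in records)


# Counts reported for the 421-accession panel.
reported = {
    "tannin": 236,
    "nontannin, one gene": 150,
    "nontannin, both genes": 9,
    "nontannin, unexplained": 26,
}
assert sum(reported.values()) == 421
```

For example, summarize([(True, False, False), (False, False, True)]) tallies one "tannin" accession and one "nontannin, one gene" accession.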

Novel recessive alleles in Tannin1 and Tannin2
To identify these potential new causal variants, we sequenced the coding regions of Tan1 and Tan2 in the 26 nontannin accessions. In total, we identified three novel recessive alleles: tan1-d, tan1-e, and tan2-d. The causal mutation of tan1-d, identified only in GanShuaZao, is a 12-bp (TCGTCTACGAGA) deletion at position 659 nt (220 aa), which removes four amino acids between the second and third WD40 repeat domains of Tan1. The tan1-e allele, identified in two accessions, is a 10-bp (CGACATACGT) deletion at position 771 nt (257 aa) that causes a frameshift at the end of the third WD40 repeat domain, introduces a premature stop codon in the fourth WD40 repeat domain, and truncates the C-terminal region by 58 amino acids (Fig. 2A). The predicted protein structures of both tan1-d and tan1-e indicate disruption of the WD40 protein structure compared with wild-type Tan1 (Fig. 2A). The tan2-d allele, present in the other 24 accessions, contains a C-to-T transition at position 7923 nt (456 aa) of the Tan2 gene. This transition introduces a premature stop codon just before the bHLH domain, resulting in the loss of 222 amino acid residues, including the entire bHLH domain (Fig. 2B).
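The amino acid positions and predicted consequences quoted above follow from simple coding-sequence arithmetic, sketched below under the assumption that the Tan1 nucleotide positions are 1-based coordinates within the coding sequence (659 nt then maps to codon 220 and 771 nt to codon 257, as stated). An indel whose length is a multiple of three is in-frame, so the 12-bp tan1-d deletion removes four codons, whereas the 10-bp tan1-e deletion shifts the reading frame. For tan2-d, position 7923 nt cannot be a coding coordinate for residue 456, so it appears to be a position in the genomic sequence including introns; the sketch therefore only checks which codons a C-to-T transition on the coding strand can convert into a stop codon. Function names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_number(cds_pos):
    """1-based codon (amino acid) number containing 1-based CDS nucleotide cds_pos."""
    return (cds_pos - 1) // 3 + 1


def indel_effect(length_bp):
    """Classify a coding indel by whether it preserves the reading frame."""
    if length_bp % 3 == 0:
        return f"in-frame ({length_bp // 3} codons)"
    return "frameshift"


def c_to_t_stop_sites():
    """List (codon, offset) pairs where a single C-to-T change creates a stop codon."""
    sites = []
    for codon in ("".join(bases) for bases in product("ACGT", repeat=3)):
        for offset, base in enumerate(codon):
            if base == "C" and codon[:offset] + "T" + codon[offset + 1:] in STOP_CODONS:
                sites.append((codon, offset))
    return sites


print(codon_number(659), indel_effect(12))  # 220 in-frame (4 codons)
print(codon_number(771), indel_effect(10))  # 257 frameshift
print(c_to_t_stop_sites())                  # [('CAA', 0), ('CAG', 0), ('CGA', 0)]
```

Only the first codon position qualifies: a C-to-T change in a CAA, CAG, or CGA codon yields TAA, TAG, or TGA, respectively.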
To facilitate screening for these novel alleles in marker-assisted selection and breeding, we developed KASP assays for tan1-d, tan1-e, and tan2-d. As in our previous KASP analyses (Fig. 1B), all three assays showed clear separation between wild-type and mutant alleles (Fig. 1A&B).
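KASP genotypes are read from the relative signals of two allele-specific fluorophores (commonly FAM and HEX), and codominant calls from an F2 population can then be checked against the 1:2:1 ratio expected for a single locus. The sketch below illustrates both steps under stated assumptions: signals are already background-normalized, FAM is arbitrarily assigned to the wild-type allele, the cutoffs (min_signal, hom_cut) are hypothetical rather than the values used for these assays, and all readings and counts are invented. In practice, genotypes are usually called by clustering in the instrument or analysis software; the fixed-threshold rule only shows the idea. For a 1:2:1 test the chi-square statistic has 2 degrees of freedom, whose upper-tail probability is exactly exp(-chi2/2).

```python
import math
from collections import Counter


def call_genotype(fam, hex_signal, min_signal=0.2, hom_cut=0.8):
    """Call one sample from normalized FAM/HEX signals (hypothetical cutoffs).

    Assumes FAM reports the wild-type allele and HEX the mutant allele.
    """
    total = fam + hex_signal
    if total < min_signal:
        return "no call"
    fam_fraction = fam / total
    if fam_fraction >= hom_cut:
        return "wild-type"
    if fam_fraction <= 1 - hom_cut:
        return "mutant"
    return "heterozygous"


def chi_square_1_2_1(n_wt, n_het, n_mut):
    """Chi-square goodness of fit to a 1:2:1 ratio; returns (statistic, P)."""
    n = n_wt + n_het + n_mut
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip((n_wt, n_het, n_mut), expected))
    p_value = math.exp(-stat / 2)  # chi-square survival function for 2 df
    return stat, p_value


# Invented (FAM, HEX) readings for a few F2 plants.
wells = [(1.8, 0.1), (0.9, 1.0), (0.1, 1.7), (0.05, 0.05), (1.1, 0.8)]
calls = Counter(call_genotype(f, h) for f, h in wells)
print(calls)

stat, p = chi_square_1_2_1(30, 62, 31)  # invented counts summing to 123
print(f"chi2 = {stat:.3f}, P = {p:.3f}")
```

A P value well above 0.05 means the observed counts are consistent with a 1:2:1 ratio; it does not by itself prove that ratio.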

Table 1 Haplotypes of
[…] GA1×DaLuoChui, P = 0.68; AiJiaoNuo×M-6693, P = 0.83). The KASP assays indicated that Tx615B, GA1, and M-6693 carry the tan1-b, tan1-a, and tan2-d alleles, respectively, whereas Jin5-0, DaLuoChui, and AiJiaoNuo are wild-type at both Tan1 and Tan2. We next genotyped 123 (Tx615B×Jin5-0), 125 (GA1×DaLuoChui), and 120 (M-6693×AiJiaoNuo) offspring using KASP-tan1-b, KASP-tan1-a, and KASP-tan2-d, respectively. All three populations segregated 1:2:1, and heterozygous individuals were clearly distinguishable from both homozygous wild-type and mutant samples (Fig. 4). These results indicate that the Tan1 and Tan2 KASP assays can be used for marker-assisted selection of tannin or nontannin accessions at early breeding stages.

Novel Tan1 and Tan2 recessive alleles contributing to tannin absence
Owing to its diverse types and broad adaptation, sorghum is grown for versatile end uses such as food, feedstock, fuel, and fiber (Boyles et al. 2019). Unlike crops grown primarily for food or feedstock, sorghum improvement must take these end uses into account (Hao et al. 2021). Recently, extensive genomic resources, such as diverse germplasm panels, mutant populations, bioinformatic datasets, high-throughput phenotyping platforms, genome editing, and DNA marker technologies, have advanced rapidly (Xin et al. 2021; Liu et al. 2019; Char et al. 2020; Zhang et al. 2023; Gladman et al. 2022; Tao et al. 2021; Gloria et al. 2019; Silva et al. 2022). With these platforms, numerous genetic loci, genes, and variants have been identified by biparental QTL mapping, GWAS, RNA-seq, and genome resequencing (Adeyanju et al. 2015). However, molecular breeding in sorghum still lags behind that of crops such as rice and maize, and knowledge of causal alleles for important traits, and their application in breeding programs, has been limited to a few traits such as waxy starch, flowering time, photoperiod sensitivity, plant height, and brown midrib (Lu et al. 2013; Childs et al. 1997; Gloria et al. 2019). Tannins, among the most investigated phenolic compounds in sorghum, are of interest for their agronomic advantages, human health benefits, and anti-nutritional effects in feedstock. Sorghum tannin content is highly variable and has been shaped largely by the environment and by human demands. Tannin content is regulated by Tan1 and Tan2, and we previously identified three loss-of-function alleles in each gene that regulate tannin presence in sorghum grain. In this study, we identified three novel causal alleles, tan1-d, tan1-e, and tan2-d, that result in nontannin phenotypes in germplasm predominantly from Asia.

[…] (Wu et al. 2012, 2019; He et al. 2008; Yu et al. 2023) and regulatory networks involved in plant tannin metabolism have progressed rapidly, mainly through the analysis of seed-coat-color mutants of Arabidopsis and barley (Abrahams et al. 2002; He et al. 2008; Yu et al. 2023). In contrast, the molecular mechanism and signaling network of tannin remain unclear. Such studies are hindered by the low tannin levels in the major cultivars of most commercial crops. Both tannin and nontannin sorghums have been selected globally to meet various human needs; as such, developing tannin or tannin-free cultivars is a key objective of sorghum breeding. Although the molecular mechanism and signaling network of sorghum tannin remain unclear, Tannin1 and Tannin2, recently identified as genes that regulate tannin presence in sorghum grain with duplicate recessive epistasis (Wu et al. 2012, 2019), are promising targets for manipulating tannin levels. The nine KASP markers specific to Tannin1 and Tannin2 developed here should facilitate their deployment in sorghum tannin breeding programs.

Application of developed KASP markers in sorghum
We developed KASP assays to genotype nine recessive alleles that all result in nontannin phenotypes. The assays for three of these alleles (tan1-a, tan1-b, and tan2-d) were further tested in three F2 populations. The ability of the KASP assays to distinguish tannin and nontannin cultivars and to characterize haplotypes indicates that they can be incorporated into breeding programs to develop sorghum cultivars with different tannin contents for diverse end uses.

Conclusions
This study identified three novel loss-of-function alleles associated with the absence of tannins in sorghum. KASP assays for the six previously identified Tan1 and Tan2 loss-of-function alleles and the three novel alleles were developed and used to conduct a haplotype analysis of a diverse sorghum population. The assays revealed natural allelic variation associated with the absence of tannins. The techniques and findings of this study provide tools for tannin germplasm screening and sorghum molecular breeding.