Genome wide association analysis of acid detergent fiber content of 206 forage sorghum (Sorghum bicolor (L.) Moench) accessions

Forage sorghum (Sorghum bicolor (L.) Moench) is a high-quality C4 cereal crop that is widely cultivated in many countries and regions. Acid detergent fiber (ADF) is abundant in the stem and leaf tissues of plants and is the main substance affecting feed digestibility in livestock. It strongly affects the milk yield and quality of cows, dairy sheep and pigs. To better understand its genetic mechanism, we re-sequenced 206 forage sorghum germplasms from different regions of the world and identified 14,570,430 SNPs and 1,967,033 indels. Based on the SNP markers, we analyzed the population genetic structure and identified loci related to ADF content by genome-wide association analysis (GWAS). The genetic relationships among materials showed that Asian and American sorghum varieties, breeding lines and improved varieties were more diverse, while European varieties were more closely related to one another. These findings provide new clues and directions for further study. GWAS revealed 8 QTLs harboring 91 genes associated with ADF content. These genes were significantly enriched in 6 major genes involved in cell membrane material transport or enzymes, showing a regional distribution. The findings provide a basis for understanding the origin and spread of haplotypes related to ADF content. They will accelerate the study of genetic gain in ADF content and help breeders to improve the feed quality and stress tolerance of forage sorghum.


Research Council 1996; Vermerris 2011; Liu et al. 2018; Mundia et al. 2019). Forage sorghum is mainly used as fresh chop (Shoemaker and Bransby 2010), hay and silage for livestock, and is also widely used in fishery breeding (Ping et al. 2018). In recent years, many scientists have used traditional and biotechnological methods to breed multi-purpose forage sorghum varieties or hybrids with good stem quality, high genetic stability, high sugar content, high dry matter yield and low hydrocyanic acid content (Ping et al. 2018).
The content of acid detergent fiber (ADF) is one of the main indexes used to evaluate dietary fiber (Yu et al. 2017; Sara et al. 2019; Miranda et al. 2020), and ADF concentration has a large effect on nutrient digestibility and nitrogen balance. Feeding studies on pigs and cattle showed that the digestibility of conventional nutrients and the N balance of pigs were significantly negatively correlated with ADF level (Yu et al. 2017, 2018). Adding an appropriate amount of ADF can improve the production performance of pigs (Zhang et al. 2020; Yu et al. 2018). In barley (Hordeum vulgare L.) grains, ADF content is a major index of digestibility and has a negative impact on feed quality, especially for non-ruminant livestock and poultry (Han et al. 2003).
In recent years, whole-genome deep sequencing of sorghum populations has been increasingly used for gene mapping of target traits (Casa et al. 2008; Boddu et al. 2004; Bouche et al. 2017; Mathur et al. 2017). However, compared with other cereal crops such as maize (Ross et al. 2017; Jiao et al. 2017; Ma et al. 2021) and rice (Verma et al. 2021; Huang et al. 2010), molecular biology research on sorghum quality traits is relatively scarce. Two sweet sorghum restorer lines and one grain sorghum restorer line were re-sequenced, and 1500 genes that distinguish the two types of sorghum were identified (Zheng et al. 2011). Morris et al. (2013) analyzed the diversity of a sorghum panel of 971 accessions genotyped by genotyping-by-sequencing (GBS) and mapped several typical plant height and inflorescence structure loci and genes by GWAS. The genetic diversity of 48 forage sorghum germplasm resources was analyzed with 109 SSRs (Ping et al. 2018). Zhang et al. (2018) analyzed the narrow-sense genetic diversity of 21 introduced sorghum BMR male sterile lines based on SRAP. To further analyze the genetic diversity of forage sorghum, we re-sequenced 206 forage sorghum accessions from different geographical regions and explored the key genes related to ADF content by GWAS.

Plant materials
We selected 206 forage sorghum accessions, including 30 pairs of male sterile lines and 146 restorer lines. The panel comprises 89 Chinese core collection accessions, 4 landraces, and 113 other accessions from different sorghum-producing nations and areas, representing the genetic, geographic, and morphological diversity of forage sorghum. These materials are also the core breeding materials of our research team.
All 206 forage sorghum accessions were planted at the Dongbai experimental station, Jinzhong, Shanxi, China, from 2016 to 2018. We used a randomized block design with three replications. Plots were 5 m long and 2.6 m wide, with 6 rows per plot. At the end of August each year, whole plants of the 206 accessions were harvested and chopped into 2-3 cm fragments, grouped by accession number, and 500 g per variety was placed in parchment bags. The samples were first dried at 105 °C for 1 h; the oven temperature was then lowered to 65 °C and drying continued for 72 h. The dried samples were then sent to Fine Tune Bio Technology Limited for quality trait analysis. We tested 19 agronomic traits and 18 quality traits of the 206 forage sorghum samples, including acid detergent fiber.

Determination of ADF content
Near-infrared spectroscopy (NIRS) was used to analyze the nutritional components of the forage. A 100 g portion of each sample was further ground with a cyclone mill, passed through a 1 mm sieve, and scanned with a FOSS 5000 near-infrared analyzer (FOSS, Denmark) using the following working parameters: wavelength range 1100-2500 nm, 32 scans, spectral interval 2 nm. Dry matter (DM) and acid detergent fiber (ADF) in the samples were then obtained from the near-infrared rapid detection model for alfalfa from Cumberland Valley Analytical Services (CVAS).

DNA isolation and genome sequencing
DNA was extracted from forage sorghum by the CTAB method (Murray and Thompson 1980) for genome sequencing on the Illumina HiSeq system (Personal Bio, Shanghai, China). Briefly, after quality assessment, the genomic DNA was fragmented by ultrasonication and separated by electrophoresis, and DNA fragments of the desired length (400 bp) were gel-purified. A library with 400 bp inserts was constructed and sequenced by next-generation sequencing (NGS) on the Illumina HiSeq platform.

Population genetics analysis and GWAS
Based on SNPs among populations and linkage disequilibrium, markers or candidate genes closely related to the target traits were identified by association analysis of molecular markers and phenotypes using EMMAX (Kang et al. 2010). SNPs with -log10(P value) > 6 were considered significant, and the 100 kb upstream and downstream of each such SNP was taken as the candidate region for all binary-like quantitative traits; 91 genes fell within the candidate regions significantly associated with ADF. Principal component analysis (PCA) was performed with an R package.
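The thresholding step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the tuple layouts, the function names, and the decision to merge overlapping windows are assumptions made here for illustration only.

```python
from math import log10


def candidate_regions(snps, min_score=6.0, flank=100_000):
    """Return merged candidate windows per chromosome.

    snps: iterable of (chrom, position_bp, p_value) tuples (hypothetical layout).
    A SNP is kept when -log10(p) > min_score; its window spans `flank` bp
    on each side. Merging overlapping windows is an assumption of this sketch.
    """
    windows = {}
    for chrom, pos, p_value in snps:
        if p_value <= 0:
            continue  # skip invalid P values instead of taking their log
        if -log10(p_value) > min_score:
            start = max(1, pos - flank)
            windows.setdefault(chrom, []).append((start, pos + flank))

    merged = {}
    for chrom, spans in windows.items():
        spans.sort()
        regions = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= regions[-1][1]:  # overlaps the previous window
                regions[-1][1] = max(regions[-1][1], end)
            else:
                regions.append([start, end])
        merged[chrom] = [tuple(r) for r in regions]
    return merged


def genes_in_regions(genes, regions):
    """Return IDs of genes overlapping any candidate window.

    genes: iterable of (gene_id, chrom, start_bp, end_bp) tuples (hypothetical).
    """
    hits = []
    for gene_id, chrom, g_start, g_end in genes:
        for r_start, r_end in regions.get(chrom, []):
            if g_start <= r_end and g_end >= r_start:
                hits.append(gene_id)
                break
    return hits
```

Calling `candidate_regions` on EMMAX-style output and then `genes_in_regions` on a gene annotation would give a gene list comparable in kind to the 91 genes reported, although the exact count depends on the annotation and on how overlapping windows are handled.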
Genome-wide linkage disequilibrium (r²) values of all materials and each subgroup were calculated with TASSEL 3.0 (Bradbury et al. 2007).
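For readers unfamiliar with the r² statistic, the sketch below computes it for one SNP pair as the squared Pearson correlation of alternate-allele counts (0, 1 or 2). This is a common approximation for unphased genotypes and is an assumption here; it is not a description of how TASSEL computes LD internally. The function name and input encoding are hypothetical.

```python
def ld_r2(geno_a, geno_b):
    """Squared correlation between two SNPs coded as 0/1/2 allele counts.

    Accessions with a missing call (None) at either SNP are dropped.
    Returns NaN when fewer than two accessions remain or a SNP is monomorphic.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return (cov * cov) / (var_a * var_b)
```

Two SNPs with identical genotype vectors give r² = 1, and two SNPs whose genotypes vary independently give a value near 0.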

SNPs and Indels identified by re-sequencing
Whole-genome re-sequencing (WGRS) was performed on 206 samples. These included 89 Chinese core collection accessions, 4 landraces and 113 other accessions from different countries and regions, representing the genetic, geographical and morphological diversity of forage sorghum.
After aligning the reads to the reference genome and calling SNPs, we identified 14,570,430 SNPs with MAF ≥ 0.05 and 1,967,033 indels. In the principal component analysis based on all SNPs, the first and second principal components (PCs) explained 13.16% and 4.78% of the molecular variance, respectively (Fig. 1a). The genetic relationships among the 206 sorghum accessions reflected their diverse distribution; however, the genetic diversity of European varieties was relatively low compared with other regions (Fig. 1b).
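The MAF ≥ 0.05 criterion can be sketched as follows. The genotype encoding (alternate-allele counts per accession) and the function names are assumptions for illustration; the variant caller and filtering tool used in the study are not stated in this section.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of one biallelic SNP.

    genotypes: alternate-allele counts (0, 1 or 2) per diploid accession,
    with None for missing calls.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(snp_table, min_maf=0.05):
    """Keep SNPs whose minor allele frequency is at least `min_maf`.

    snp_table: dict mapping a SNP ID to its genotype list (hypothetical layout).
    """
    return {snp: genos for snp, genos in snp_table.items()
            if minor_allele_frequency(genos) >= min_maf}
```

With 206 diploid accessions and no missing calls, a SNP needs at least 21 copies of its minor allele (21/412 ≈ 0.051) to pass this filter.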
All materials can be divided into two categories by NJ tree analysis (Fig. 1c). Group I mainly includes most American accessions and group II includes sorghum breeding lines from all other regions. Accessions from all breeding lines were mainly but not completely divided into several subgroups or minor groups, which indicated that the genetic mixture of breeding lines and local varieties was widespread in sorghum breeding.
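The Fig. 1 caption states that the NJ tree was built from simple matching distance over all SNPs. A minimal sketch of that distance is given below, assuming genotypes are stored as strings such as "AA" or "AT" and that sites with a missing call are skipped; both choices, and the function names, are assumptions. The tree-building step itself is not shown.

```python
def simple_matching_distance(geno_1, geno_2):
    """Fraction of jointly called sites at which two accessions differ."""
    compared = [(a, b) for a, b in zip(geno_1, geno_2)
                if a is not None and b is not None]
    if not compared:
        return float("nan")
    mismatches = sum(1 for a, b in compared if a != b)
    return mismatches / len(compared)


def distance_matrix(genotypes):
    """Pairwise distance matrix for a dict of accession -> genotype list.

    Returns the sorted accession names and a square list-of-lists matrix
    in the same order, suitable as input to a neighbor-joining routine.
    """
    names = sorted(genotypes)
    matrix = [[simple_matching_distance(genotypes[x], genotypes[y])
               for y in names] for x in names]
    return names, matrix
```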
Linkage disequilibrium analysis
SNP data were used to analyze the genome-wide linkage disequilibrium (LD) patterns of the Asian, American, Asian × American, Asian × Australian and European germplasms. Within the same linkage group, the speed of LD decay can indicate to some extent whether a group has been under selection. In general, LD decays faster in wild populations than in domesticated populations, and faster in cross-pollinating crops than in self-pollinating crops. In maize, for example, LD decays within about 1 kb in landraces, 2 kb in inbred lines and 100 kb in commercial inbred lines. The decay rates of the Asia, America, Asia × America, Asia × Australia and Europe subgroups were very different, which may lead to different resolution of association mapping in different genome regions. The LD decay rate of European varieties was the lowest, which may be due to self-fertilization (Fig. 1d).

Fig. 1 Genetic structure of 206 sorghum accessions using SNPs detected in whole-genome resequencing data. a PCA, b genetic relatedness, c NJ tree constructed from simple matching distance of all SNPs, and d LD decay
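An LD decay comparison of this kind is usually summarized by averaging r² within distance bins and reading off where the curve falls to a chosen fraction of its starting value. The sketch below assumes a 10 kb bin size and uses half of the mean r² in the shortest-distance bin as the cutoff; both are illustrative conventions, not values taken from the study, and the function names are hypothetical.

```python
def ld_decay_curve(pairs, bin_size=10_000):
    """Mean r2 per physical-distance bin.

    pairs: iterable of (distance_bp, r2) for SNP pairs on the same chromosome.
    Returns a list of (bin_start_bp, mean_r2) sorted by distance.
    """
    sums = {}
    for distance, r2 in pairs:
        bin_start = (distance // bin_size) * bin_size
        total, count = sums.get(bin_start, (0.0, 0))
        sums[bin_start] = (total + r2, count + 1)
    return [(b, total / count) for b, (total, count) in sorted(sums.items())]


def half_decay_distance(curve):
    """First bin whose mean r2 is at most half that of the shortest-distance bin.

    Returns None if the curve is empty or never drops that far.
    """
    if not curve:
        return None
    cutoff = curve[0][1] / 2
    for bin_start, mean_r2 in curve:
        if mean_r2 <= cutoff:
            return bin_start
    return None
```

Running both functions separately for each subgroup would give one decay distance per subgroup; a larger distance corresponds to slower decay, as described for the European varieties.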

ADF content variation analysis between haplotypes
To further understand the origin and development of haplotypes, the geographical distribution of SNPs in the 8 QTLs across the 206 accessions was analyzed (Fig. 3). The allelic distribution for C2_S6746476 shows that the 5 accession groups contain all haplotypes from AA to TT. The ADF content of America × Asia accessions was significantly lower than that of other accessions, except in genotypes AA, CC, GG and TT; European accessions showed the opposite pattern (Fig. 3a). For C5_S5473787, ADF content in genotypes AA, CC, GG and TT was still high. Among these four genotypes, European sorghum had the lowest ADF content, whereas among the other genotypes it had the highest. Overall, the ADF content of different genotypes was similar (Fig. 3b). For C6_S60364789, the ADF content of Asia × Australia sorghum varied relatively widely, with both high-ADF and low-ADF varieties; some genotypes were even higher than European sorghum. America and America × Asia sorghum had the lowest ADF content (Fig. 3c). For C7_S60623290, the ADF content of all haplotypes was similar. In the AA, GG, CC and TT haplotypes, the ADF content of European sorghum was still the lowest, but in the other haplotypes the ADF content of Asia × Australia breeding lines exceeded that of European sorghum and was the highest among the five sorghum groups (Fig. 3d). For C8_S56789460, all haplotypes still fell into two groups: AA, CC, GG, TT and others. However, the ADF content of European sorghum was no longer the lowest in any haplotype, and the ADF content of Asia × Australia haplotypes showed little difference (Fig. 3e). For C9_S40246128, haplotypes AA, CC, GG and TT had almost the same ADF content in all regions, but the AA and TT haplotypes contained hardly any European sorghum (Fig. 3f).
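The comparison above repeatedly contrasts the homozygous genotypes (AA, CC, GG, TT) with all other genotypes within each region. A minimal sketch of that grouping is shown below; the record layout and function name are assumptions, and any ADF values passed in are the reader's own data, not values from this study.

```python
from collections import defaultdict

HOMOZYGOUS = {"AA", "CC", "GG", "TT"}


def mean_adf_by_group(records):
    """Mean ADF content per (region, genotype class).

    records: iterable of (region, genotype, adf_percent) tuples (hypothetical).
    Genotypes AA, CC, GG and TT are classed as "homozygous"; anything else
    is classed as "other".
    """
    sums = defaultdict(lambda: [0.0, 0])
    for region, genotype, adf in records:
        genotype_class = "homozygous" if genotype.upper() in HOMOZYGOUS else "other"
        entry = sums[(region, genotype_class)]
        entry[0] += adf
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Applying this to one lead SNP at a time reproduces the kind of per-region, per-class summary described for Fig. 3a-f.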

Genetic collinearity analysis
The 8 candidate regions for ADF content contained about 200,000 SNPs on chromosomes 2, 5, 6, 7, 8 and 9 (Table 1) and included 91 genes. Collinearity analysis showed that 44 of the 91 ADF-related loci were highly collinear with previously reported gene families (Table 2). SORBI_3006G272600 is highly collinear with OsAAE3 (Os04g0683700) (Kikuchi et al. 2003) in rice, which regulates rice blast resistance, floret development and ADF biosynthesis.

Discussion
Genetic improvement is an important way to improve crop quality traits. However, compared with rice (Huang et al. 2010) and maize (Jiao et al. 2014), the genetic resources available for improving forage sorghum quality traits are very limited. It is therefore of great significance to evaluate and improve the existing germplasm resources by studying the allelic variation of important traits of forage sorghum, such as biological yield and the quality of the edible parts (acid detergent fiber content) (Zheng et al. 2011; Ordonia et al. 2016). In this study, we characterized 14,570,430 SNPs and 1,967,033 indels, creating a valuable resource for both future gene-phenotype research and molecular breeding of Sorghum bicolor. The population structure and genetic relationships of the 206 forage sorghum accessions were analyzed with whole-genome SNPs, which will help guide the direction of forage sorghum breeding. The results showed that the genetic diversity of forage sorghum accessions from Asia and America was richer, while that of sorghum from Europe was relatively limited, which provides a clue for preliminarily determining the origin of breeding lines.
The results of genetic research on sorghum molecular biology are increasing year by year, which is bound to improve our understanding of allelic variation in sorghum genetic resources. In total, we identified 8 QTLs related to ADF content and observed the characteristics of multiple alleles for ADF content in sorghum varieties from different regions (Fig. 3a-f). All the genotypes can be divided into two groups. The ADF content of varieties with the AA, CC, GG and TT haplotypes was significantly higher than that of the other haplotypes. However, among these 4 haplotypes, the ADF content of European forage sorghum varieties was the lowest (Fig. 3a-f). From another point of view, among all haplotypes except AA, CC, GG and TT, the ADF content of European forage sorghum was the highest, and regardless of haplotype, the ADF content of European sorghum changed little. We speculate that the ADF content of forage sorghum in other regions has been reduced through extensive genetic improvement, while the ADF content of European sorghum is the highest because of its small number of samples and low degree of genetic improvement. This shows that the level of ADF content is not determined by QTL. Alleles for ADF content help to reduce genetic diversity and are found in specific genomic regions. We revealed that 44 of the 91 genes related to ADF content in sorghum were highly collinear with genes previously reported in rice and maize (Table 2). SORBI_3006G272600 is highly collinear with OsAAE3 (Os04g0683700) (Kikuchi et al. 2003) in rice, which regulates rice blast resistance, floret development, and ADF biosynthesis.

Fig. 3 The geographic distribution of alleles and ADF content variation among different haplotypes in each QTL.
This information will guide breeders to develop more accurate breeding strategies and further improve the quality of sorghum varieties. Forage sorghum has high biomass, good stress resistance, juicy stems and high nutrient utilization efficiency, and is one of the main biomass raw materials for animal husbandry. Previous results showed that ADF affects the absorption of feed nutrients by cattle, sheep, pigs and other livestock. A reasonable proportion of ADF in feed can improve the nutrient absorption rate of livestock and thus improve meat quality (Zhang et al. 2020; Yu et al. 2018). In forage sorghum breeding, varieties with moderate acid detergent fiber content are desired.

Table 2
Sorghum gene        Rice or maize homolog  Annotation
[...]               [...]                  Putative vesicle-associated membrane protein 725
SORBI_3002G068600   Os07g0194500           Probable prolyl 4-hydroxylase 6
SORBI_3002G068800   Os07g0194800           Hydroxyproline O-galactosyltransferase GALT6
SORBI_3002G068900   Os07g0195100           Ras-related protein RABE1c
SORBI_3002G069000   Os12g0265400           Glycosyl transferase, family 31 domain containing protein
SORBI_3002G069000   Os07g0195200           Hydroxyproline O-galactosyltransferase GALT6
SORBI_3005G054800   Os11g0169700           Aldehyde oxidase GLOX
SORBI_3005G055000   Os11g0169900           V-type proton ATPase 16 kDa proteolipid subunit-like
SORBI_3005G055300   Os03g0269100           Tropinone reductase homolog At5g06060
SORBI_3005G055400   Os11g0170000           Similar to Amidase family protein, expressed
SORBI_3005G055600   Os11g0170200           Protein of unknown function DUF869, plant family protein
SORBI_3006G272200   Os04g0683100           Pre-mRNA cleavage factor Im 25 kDa subunit 2
SORBI_3006G272400   Os04g0683400           Nuclear transcription factor Y subunit C-2
SORBI_3006G272600   Os04g0683700           Oxalate-CoA ligase
SORBI_3006G272700   Os10g0133166           Putative disease resistance protein RGA3
SORBI_3006G272800   Os04g0683800           FT-interacting protein 1
SORBI_3006G273000   Os04g0683900           AT-hook motif nuclear-localized protein 9
SORBI_3006G273400   Os04g0684500           White stripe leaf5
SORBI_3006G273700   Os04g0684800           Ubiquitin-conjugating enzyme E2 variant 1C
SORBI_3006G273800   Os04g0684900           Probable CCR4-associated factor 1 homolog 11
SORBI_3006G274000   Os04g0685000           Transcription repressor OFP13
SORBI_3006G274200   Os04g0685200           Hypothetical protein
SORBI_3006G274300   Os04g0685300           NDR1/HIN1-like protein 2
SORBI_3007G171000   Os08g0557600           Monodehydroascorbate reductase 4, cytosolic-like
SORBI_3007G171400   Os08g0557400           Putative low molecular weight protein-tyrosine-phosphatase slr0328
SORBI_3007G171500   Os05g0301700           Cytochrome c1-2, heme protein, mitochondrial
SORBI_3007G171600   Os08g0557200           Manganese-dependent ADP-ribose/CDP-alcohol diphosphatase-like
SORBI_3007G171800   Os08g0557100           28 kDa ribonucleoprotein, chloroplastic
SORBI_3007G172100   Os08g0556900           KDEL-tailed cysteine endopeptidase CEP1
SORBI_3007G172300   Os08g0556600           NADH dehydrogenase [ubiquinone] iron-sulfur protein 5-B
SORBI_3007G172400   Os08g0556550           Hypothetical conserved gene
SORBI_3007G172500   Os08g0556400           Probable protein S-acyltransferase 19
SORBI_3007G172600   Os08g0556200           Similar to Dihydroneopterin aldolase
SORBI_3007G172700   Os08g0556000           Similar to YTH domain protein 2
SORBI_3007G172800   Os08g0555700           Zinc finger protein 2
SORBI_3007G172900   Os08g0555800           Protein MULTIPOLAR SPINDLE 1
SORBI_3007G173000   Os08g0555200           Nonaspanin (TM9SF) family protein
SORBI_3007G173000   Os08g0554900           Transmembrane 9 superfamily member 8
SORBI_3007G173000   Os08g0555000           Transmembrane 9 superfamily protein member 2 precursor (p76)
SORBI_3002G068100   Zm00001d018852         DDT domain-containing protein DDR4
SORBI_3002G068100   Zm00001d039545         DDT domain-containing protein DDR4
SORBI_3002G068300   Zm00001d007767         Vesicle-associated membrane protein 722
SORBI_3002G068300   Zm00001d019091         Vesicle-associated membrane protein 722
SORBI_3002G068800   Zm00001d007764         Hydroxyproline O-galactosyltransferase GALT4
SORBI_3002G068900   Zm00001d019093         Small GTP-binding protein
The marker-trait associations (MTAs) and the materials with ideal alleles found in this study will help to accelerate the improvement of forage sorghum varieties and hybrids and bring ADF content into the ideal range. To better understand the potential regulatory mechanism of ADF content, further studies are needed to identify the genes underlying these and other MTAs, which will help speed up the breeding of new forage sorghum varieties. In conclusion, the genomic data of forage sorghum collected from all over the world, the inferences from the analysis and the MTAs related to agronomic traits provide valuable resources for promoting forage sorghum variety improvement. The study of candidate genes related to ADF content lays a foundation for applying functional genomics methods to verify the function and role of candidate genes in determining ADF content.
Authors' contributions
HN was mainly responsible for implementing the experimental design, data analysis and paper writing. YH helped to design the experimental scheme, JP provided financial support, and YW, XL and JC cooperated with HN in field investigation and sampling. All authors have read the manuscript and agree with all the conclusions and opinions.