One Way ANOVA analysis of eight morphological traits of grain
Morphology of grains and spikes in the three types of barley were shown in Fig. 1B and Fig. 1C, respectively. Both Wb-NE and Wb-T had brittle rachis while Cb-C had non-brittle rachis (Fig. 1C). One way ANOVA analysis of eight morphological traits of grain showed that Wb-NE had the highest grain perimeter (27.56 mm), the highest length-width ratio (4.05), the highest grain length (12.04 mm), the lowest grain width (3.00 mm), the lowest roundness of grain (0.25) and the lowest density factor (0.90 mg/mm2), compared with these traits in other two populations (Table 1). Grain area and grain diameter didn’t showed significant difference among the three populations (Table 1).
Table 1
Eight morphological traits of grain in three barley populations.
Populations
|
Grain area
(mm2)
|
Grain perimeter
(mm)
|
length-width ratio
|
Grain length
(mm)
|
Grain width
(mm)
|
Roundness of grain
|
Density factor
(mg/mm2)
|
Grain diameter
(mm)
|
Wb-NE
|
26.20a
|
27.56a
|
4.05a
|
12.04a
|
3.00b
|
0.25c
|
0.90b
|
5.74a
|
Wb-T
|
24.70a
|
24.66b
|
3.20b
|
10.51b
|
3.30a
|
0.31b
|
1.10a
|
5.57a
|
Cb-C
|
22.62a
|
22.05b
|
2.78b
|
9.20c
|
3.34a
|
0.35a
|
1.12a
|
5.35a
|
Wb-NE, wild barley of the Near East Fertile Crescent. Wb-T, wild barley of Tibetan Plateau. Cb-C, cultivated barley of China. |
Genetic diversity and evolutionary relationships of the three populations
RNA sequencing of 29 barley accessions generated a total of 185.60 gigabase (Gb) of high-quality cleaned sequences, with an average of 6.19 Gb per accession (Supplementary Data 1) and revealed 1,182,394 raw SNPs. After stringent quality filtering, we obtained 176,741 high-quality SNPs for subsequent analyses.
The results of genetic diversity analysis for the three populations were summarized in Table 2. The highest nucleotide diversity (π = 0.23036), highest Watterson’s estimator (θW = 0.21255), highest minor allele frequency (MAF = 0.1745) and greatest polymorphism information content (PIC =0.2176) were observed in Wb-NE, followed by the Wb-T population with π = 0.18906, θW = 0.15944, MAF = 0.1453 and PIC = 0.1796 (Table 2). The lowest in those indexes were found in Cb-C (Table 2).
Table 2
Estimate of genetic diversity per base pair.
Populations
|
π
|
θW
|
MAF
|
PIC
|
Wb-NE
|
0.23036
|
0.21255
|
0.1745
|
0.2176
|
Wb-T
|
0.18906
|
0.15944
|
0.1453
|
0.1796
|
Cb-C
|
0.11034
|
0.10183
|
0.112
|
0.1048
|
π, nucleotide diversity. θW, Watterson’s estimator. MAF, Minor allele frequency. PIC, Polymorphism information content. Wb-NE, wild barley of the Near East Fertile Crescent. Wb-T, wild barley of Tibetan Plateau. Cb-C, cultivated barley of China. |
An unrooted phylogenetic tree, PCA scatter plot and population structure (K = 2 ~ 4) were shown in Fig. 2. The unrooted phylogenetic tree showed an obvious separation of wild barleys (Hordeum vulgare ssp. spontaneum) and cultivated barleys (Hordeum vulgare ssp. vulgare) (Fig. 2A). The wild barley accessions were further divided into two clusters, one Wb-NE cluster and one Wb-T cluster (Fig. 2A). Two accessions (HS30 and HS31) of Wb-NE and one accession (Huangqingke) of Cb-C exhibited closer genetic relationships with Wb-T, suggesting that the diffusion process of barley were accompanied by gene flow. PCA scatter plot also displayed that cultivated accessions were clustered together and obviously separated from wild accessions (Fig. 2B). Furthermore, the highest genetic diversity was observed in Wb-NE because of the largest hollow ellipse, followed by Wb-T and Cb-C (Fig. 2B). An optimal population division appeared when the K values was 2 (cv error = 0.82356) (Supplementary Data 2). All accessions were separated into wild accessions and cultivated accessions. The results were consistent with the results of PCA and unrooted Neighbor-joining tree (Fig. 2A-B). When the K values was 3, Wb-T was separated from Wb-NE (Fig. 2C). The emergence of Wb-T might be due to the spread of Wb-NE accompanied by domesticated barley accessions from west to east. Then continuous gene flows took place between the adrift wild barley and cultivated barley during the process of diffusion or after settlement in Tibet.
In order to test evolutionary relationships of the three populations, we constructed rooted phylogenetic trees based 165,848 high-quality SNPs using the bulbous barley (Hordeum bulbosum) as an outgroup that was differentiated from barley about 5.80 MYA (million years) ago (http://www.timetree.org/). As a result of the bootstrap consensus tree, Wb-NE was located at the basal position on the rooted tree (node 1) (Fig. 3A), and Wb-T was derived from Wb-NE (node 2) (Fig. 3A), supporting that Wb-NE was ancestral to Wb-T. Cb-C was differentiated with Wb-T in node 3, which suggested Wb-T was ancestral to Cb-C (Fig. 3A). Evolutionary relationships revealed by the phylogenetic tree suggested that Wb-T might be evolved from Wb-NE and Cb-C evolved from Wb-T (Fig. 3B). The taxa of Wb-T are mixed with two Wb-NE accessions (HS30 and HS31) and a Cb-C accession (Huangqingke), suggested continuous gene flow accompanied by evolution process of barley.
Search of Identical-by-descent (IBD) blocks and test of gene flow
We tested the gene flow among populations. When K was 3, one Wb-T accession HS18 showed more genetic composition of Cb-C, while one Cb-C accession Huangqingke showed more genetic composition of Wb-T (Fig. 2C; Fig. 4A). These phenomena suggested ancient admixture in domestication process of barley. In order to gain further insights into the evolutionary relationships among the three populations and assess the admixture state, we searched for identical-by-descent (IBD) blocks. In the paired accessions within populations, as many as 30 of 45 pairs shared IBD blocks within Wb-T were found, followed by Cb-C with 16 of 45 pairs shared IBD blocks (Fig. 4B; Table 3), only 3 of 36 pairs shared IBD blocks within Wb-NE (Fig. 4B; Table 3). In the paired accessions between populations, 19 of 100 pairs shared IBD blocks were found between Wb-T and Cb-C, followed by 9 of 90 pairs shared IBD blocks between Wb-T and Wb-NE (Fig. 4B; Table 3). Only 1 of 90 pairs shared IBD blocks between Cb-C and Wb-NE (Fig. 4B; Table 3). These results suggested that Wb-T accessions were the most admixture due to gene flow within or between populations. Furthermore, the average length of shared IBD block between Cb-C and Wb-T (9.67 Mb) were longer than that between Wb-T and Wb-NE (8.04 Mb), suggesting that gene flow occurred earlier between Wb-NE and Wb-T than between Wb-T and Cb-C (Supplement Data 3).
Table 3
Accession pairs shared IBD blocks within and among populations.
Population
|
Cb-C
|
Wb-T
|
Wb-NE
|
Cb-C
|
30 (45)
|
|
|
Wb-T
|
19 (100)
|
30(45)
|
|
Wb-NE
|
1(90)
|
9(90)
|
3(36)
|
Wb-NE, wild barley of the Near East Fertile Crescent. Wb-T, wild barley of Tibetan Plateau. Cb-C, cultivated barley of China. |
The numbers in brackets denote all accession pairs within or between populations. |
Although the IBD analysis did not reveal conclusive evidence for gene flow (without statistics to participate in judgment), we used the results to guide further analyses. Specifically, we used the presence of between-populations IBD blocks (Fig. 4B) to identify probable cases of gene flow and then further examined these using the ABBA-BABA test [34]. We used HS18, a wild barley of Near East Fertile Crescent at the basal position in the rooted phylogenetic trees (Fig. 2A-B), as the outgroup. Three tree topological models of trios were simulated. They are (HS18,(Wb-NE,(Wb-T,Cb-C))), (HS18,(Wb-T,(Wb-NE,Cb-C))) and (HS18,(Cb-C,(Wb-NE,Wb-T))), respectively. As a result, the most suitable topological model of (HS18,(Wb-NE,(Wb-T,Cb-C))) was inferred (Fig. 4C), which indicated gene flow between Wb-NE and Cb-C because the D statistic (0.167869) was greater than 0 and the Z score (6.0212) was greater than 3 (Fig. 4D). The result suggests that Wb-NE accessions are possible genome donor of Cb-C. We also found gene flow between Wb-T and Cb-C from two other topological models (Fig. 4D), suggesting that Wb-T made genetic contributions to the genome of Cb-C.
Wb-T has made more genetic contribution than Wb-NE to Cb-C
Based on the results of population structure inference (Fig. 2C) and rooted phylogenetic tree (Fig. 3A), we deleted 4 accessions (including Huangqingke, HS30, HS31 and HS108) to minimize the effect of genetic drift between populations. The retained 25 accessions (9 accessions in Cb-C, 9 accessions in Wb-T and 7 accessions in Wb-NE, respectively) represented the three pure populations for further analysis. In order to clarify that Wb-T or Wb-NE has higher genetic contributions to Cb-C, the three pure populations were used to perform genomic similarity analysis. We used 150-kb windows and 75-kb overlapping slide windows to detect the potential similarity of genomic regions, resulting in the best coverage along the reference genome of barley. As a result, total 1155 windows with high similarity detected fell into the selection criteria, which account for ~ 3.58% of barley reference genome. The 1155 windows were showed in Circos diagrams (Fig. 5A-B). The numbers of windows with high similarity was 560 between Wb-T and Cb-C (Fig. 5A-B), and was 281 between Wb-NE and Cb-C (Fig. 5B). More windows (656) with high similarity between Wb-NE and Wb-T were detected (Fig. 5A), indicating a close genetic relationship between the two wild barley population. Compared with the numbers of windows between Wb-NE and Cb-C (281) (Fig. 5B), more windows were detected between Wb-T and Cb-C (560) (Fig. 5A-B), indicating that Wb-T has made more genetic contributions than Wb-NE to Cb-C.
There were only 122 unique genomic regions between Wb-NE and Cb-C (Fig. 5C). In contrast, 383 and 461 unique genomic regions were detected between Wb-T and Cb-C, and between Wb-T and Wb-NE, respectively (Fig. 5C). Meanwhile, unique genomic regions on each chromosome were summarized in Fig. 5D. Compared with the numbers of unique windows between Cb-C and Wb-NE, the numbers of unique windows between Cb-C and Wb-T exhibited overwhelming advantages on chromosome 1H (481.8%), 2H (260.0%), 5H (400.0%) and 7H (485.7%) (Fig. 5D).
Based on the results of unrooted and rooted phylogenetic trees, PCA, population structure, IBD blocks, the ABBA-BABA test, and genomic similarity, we proposed two evolutionary stage of the Cb-C domestication process, first from Wb-NE to Wb-T, and then from Wb-T to Cb-C. Wb-T played an important intermedia role in the domestication from Wb-NE to Cb-C, and has made more genetic contributions than Wb-NE to Cb-C. Furthermore, continuous gene flow in the domestication process has occurred.
Positive selection signals from Wb-NE to Wb-T and from Wb-T to Cb-C
The three pure barley populations described in genomic similarity analysis were used to calculate sliding windowed fixation index (Fst) between populations. The average Fst values was calculated in the genomic region from 48Mbs to 49Mbs on the 3H chromosome, which contains the two genes (Btr1 and Btr2) regulating brittle rachis. As a result, the lowest average Fst value of 0.1115 was observed between Wb-NE and Wb-T, suggesting weaker signals of selective sweeps in the region between the two wild barley populations. By contrary, strong signals of selective sweeps were detected in comparison between Wb-NE and Cb-C (Fst = 0.573) and between Wb-T and Cb-C (Fst = 0.456) in the same genomic region (Fig. 6).
Genomic regions with the Fst > 0.5 threshold values, were used to conduct selective sweep analysis and locate genes influenced by natural selection or artificial selection. As a result, 1090 windows under strong selective sweeps were detected in comparison between Wb-NE and Wb-T and contained 686 high-confidence genes (Supplementary Data 4). More windows (6546) and more high-confidence genes (3529) were detected between Wb-T and Cb-C (Supplementary Data 5). In comparison between Wb-T and Cb-C, distribution of selective genes on chromosome 1H ~ 7H were 479, 504, 615, 629, 506, 253 and 459, respectively, much less than those between Wb-NE and Wb-T (Table 4). 261 and 1390 high-confidence genes were KEGG annotated in comparison between Wb-NE and Wb-T, and between Wb-T and Cb-C, respectively (Supplementary Data 4; Supplementary Data 5).
Table 4
Distribution of selected genes on each chromosome.
Group
|
1H
|
2H
|
3H
|
4H
|
5H
|
6H
|
7H
|
Unknow
|
Total
|
Wb-NE vs Wb-T
|
122
|
60
|
64
|
85
|
60
|
18
|
244
|
33
|
686
|
Wb-T vs Cb-C
|
479
|
504
|
615
|
629
|
506
|
253
|
459
|
84
|
3529
|
Wb-NE, wild barley of the Near East Fertile Crescent. Wb-T, wild barley of Tibetan Plateau. Cb-C, cultivated barley of China. |
Three populations exhibited significant divergence in their metabolomes
To determine the difference in the metabolomes among the three populations, the seedling metabolomes of three populations including 25 accessions described in genomic similarity analysis were quantified using a non-targeted metabolomics approach (see methods for detail). In total, 8828 non-redundant metabolites were detected (Supplementary Data 6), including 328 structure-annotated metabolites (Supplementary Data 7). These annotated metabolites covered 12 metabolite classes, including 114 amino acids and derivatives, 54 sugar and derivatives, 37 carboxylic acides, 24 lipides, 21 glycosides, 17 nucleic acides and derivative, 15 amines, 11 vitamines, 9 hormones, 6 phenylpropanoides, 3 alkaloid and 17 unclassified metabolites (Fig. 7A; Supplementary Data 7).
Principal component analysis (PCA) and hierarchical cluster were performed based on converted contents of all metabolites. Interestingly, the 25 barley accessions were also separated into three clusters (Fig. 7B). Moreover, Wb-T resided between Wb-NE and Cb-C, indicating that metabolic divergence might have occurred during the evolution from Wb-NE to Wb-T and from Wb-T to Cb-C. Similar results of PCA were obtained by the hierarchical cluster tree (Fig. 7C).
Different metabolic sets were targeted during the evolution from Wb-NE to Wb-T and from Wb-T to Cb-C
We used a Qst-Fst comparison strategy (see methods for detail) to identify the specific metabolites that were targeted by selection or by neutral processes in domesticated stages from Wb-NE to Wb-T and from Wb-T to Cb-C, respectively. Qst and Fst were used to estimate the divergence for metabolic level and for molecular level between populations, respectively. At the molecular level, the average Fst from Wb-NE to Wb-T (0.23733) was much higher than that from Wb-T to Cb-C (0.14329), suggesting that greater genomic divergence appeared in the stage from Wb-T to Cb-C due to natural or artificial selection (Fig. 8A). The SNPs were considered to be selective with Fst > 0.326 in the stage from Wb-NE to Wb-T, and so did those in the stage from Wb-NE to Wb-T with Fst > 0.347 (Fig. 8B). Those SNPs that deviated from neutrality were excluded, and two sets of neutral SNPs including 51097 and 96208 SNPs was generated in domesticated stages from Wb-NE to Wb-T and from Wb-T to Cb-C, respectively (data not shown). At the metabolic level, 99% confidence intervals (CIs) of Qst at each metabolite was compared with Fst of neutral SNPs to determine divergent metabolites driven by selection. As a result, of the 8828 non-redundant metabolites (Supplementary Data 6), total 3851 metabolites showed significant differences in metabolite contents due to selection in the two domesticated stages (Fig. 8C; Fig. 8D), including 1850 metabolites from Wb-NE to Wb-T and 2766 metabolites from Wb-T to Cb-C (Fig. 8D; Supplementary Data 8; Supplementary Data 9). Among the 8828 metabolites that showed evidence of selection, only 765 (8.67%) metabolites were common targeted in both the two domesticated stages (Fig. 8C; Fig. 8D), indicating that different sets of metabolites were targeted during the domesticated stages from Wb-NE to Wb-T and from Wb-T to Cb-C.
Among the 328 structure-annotated metabolites (Supplementary Data 7), 132 metabolites (126 classified and 6 unclassified metabolites) showed divergence in the two domesticated stages (Supplementary Data 10; Supplementary Fig. 1), including 55 metabolites between Wb-NE and Wb-T, and 108 metabolites between Wb-T and Cb-C (Supplementary Data 11; Supplementary Data 12). To determine which class of metabolites was involved in specific domesticated stage, we analyzed the distribution of 11 metabolite classes from 126 classified metabolites. As a result, these 11 metabolite classes exhibited three types of evolutionary features (Fig. 8E).
Type Ⅰ of metabolite class only included alkaloids with 2 metabolites that were specially targeted in evolutionary process from Wb-NE to Wb-T (Fig. 8E). Two alkaloid metabolites, N-methyltyramine (M152T230) and Hordenine (M166T177), exhibited selective status between Wb-NE and Wb-T because their CIs of Qst were above 0.326, while exhibited neutral status between Wb-T and Cb-C (Fig. 8F). Type II of metabolite class was targeted in both two domesticated stages, including lipids, hormones, nucleic and derivate, amino acid and derivate, amines, carboxylic acids and sugar and derivate, with metabolite numbers of 17, 4, 12, 42, 8, 8 and 15, respectively (Fig. 8E). Type III metabolite class were mainly targeted in domesticated stage from Wb-T to Cb-C, including phenylpropanoids, glycosides, and vitamines (Fig. 8E). All of 5 metabolites (3 compounds) in phenylpropanoids class, exhibited selective stated between Wb-T to Cb-C because their CIs of Qst were above 0.347 (Fig. 8G). In phenylpropanoids class, only one metabolite (M147T299, 4-Hydroxycimamic) exhibited selective status in evolutionary stage from Wb-NE to Wb-T, but the other four metabolite showed neutral status in the stage (Fig. 8G).
Identification of key positive selective genes and SNPs that influence the metabolic divergence
In our study, we found that two alkaloid compounds (N-Methyltyramine and Hordenine) exhibited specific divergence in evolutionary stage from Wb-NE to Wb-T (Fig. 8F; Fig. 9A). Further, 5 of 261 positive selective genes (the Fst value > 0.5 of the window where the gene is located) in the evolutionary process were in the same regulatory pathway with the 2 divergent compounds (Table 5). The 5 selective genes regulate alcohol dehydrogenase, acylpyruvate hydrolase (FAHD1), aspartate aminotransferase (GOT1), aromatic-L-amino-acid/L-tryptophan decarboxylase (DDC) and 4-hydroxyphenylpyruvate dioxygenase (HPD), respectively. Among the 5 genes, HORVU1Hr1G010120, HORVU3Hr1G073220 and HORVU6Hr1G027650 were closer to N-Methyltyramine and Hordenine (Fig. 9A).
Table 5
Positive selective genes related to alkaloid compounds in evolutionary stage from Wb-NE to Wb-T
Gene ID
|
KEGG ID
|
Function
|
HORVU1Hr1G010130
|
K00001
|
alcohol dehydrogenase
|
HORVU4Hr1G013370
|
K01557
|
FAHD1, acylpyruvate hydrolase
|
HORVU3Hr1G073220
|
K14454
|
GOT1, aspartate aminotransferase
|
HORVU1Hr1G010120
|
K01593
|
DDC, aromatic-L-amino-acid/L-tryptophan decarboxylase
|
HORVU6Hr1G027650
|
K00457
|
HPD, 4-hydroxyphenylpyruvate dioxygenase
|
Since there are multiple SNPs in gene structure of HORVU6Hr1G027650 (Fig. 9B), it was used to further analyze the relationship with the two compounds. We found 19 SNPs in the structure of HORVU6Hr1G027650 (Fig. 9B), of which 13 SNPs were located in the coding region (8 synonymous mutations and 5 missense mutations) (Fig. 9C). In addition, a SNP named 6H:109670128 causes stop lost (Fig. 9C). Five missense mutations in the coding region of HORVU6Hr1G027650 detected in 16 accessions from Wb-NE and Wb-T produced four haplotypes (Fig. 9C-D). Hap1 to hap4 were detected in Wb-NE, while only hap4 was found in Wb-T (Fig. 9D). Moreover, hap1 contained the most accessions (accession number = 4) of Wb-NE, while hap4 contained all accessions (9 accessions) of Wb-T (Fig. 9D). The “t test” between the 4 Wb-NE accessions with hap1 and 9 Wb-T accessions with hap4, showed that both N-Methyltyramine and Hordenine in Wb-T exhibited significant more content than in Wb-NE (Fig. 9E-F).
We also found that three phenylpropanoids compounds (4-Hydroxycimamic, L-Phenylalanine and Apiin) exhibited specific divergence in domesticated stage from Wb-T to Cb-C (Fig. 8G; Fig. 10A). Further, 41 of 1390 positive selective genes (the Fst value > 0.5 of the window where the gene is located) in the domestication stage were related to the 3 divergent compounds (Supplementary Data13). There are four genes that directly regulate these three divergent compounds (especially 4-Hydroxycimamic and L-Phenylalanine). Among the four genes, both HORVU2Hr1G038110 and HORVU2Hr1G038120 regulate phenylalanine ammonia-lyase (PAL), HORVU3Hr1G080830 and HORVU4Hr1G072150 regulate trans-cinnamate 4-monooxygenase (C4H) and 4-coumarate-CoA ligase (4CL), respectively (Table 6).
Table 6
The key positive selective genes genes involved in phenylpropanoid metabolites
Gene ID
|
KEGG ID
|
Function
|
HORVU2Hr1G038110
|
K10775
|
PAL, phenylalanine ammonia-lyase
|
HORVU2Hr1G038120
|
K10775
|
PAL, phenylalanine ammonia-lyase
|
HORVU3Hr1G080830
|
K00487
|
C4H, CYP73A, trans-cinnamate 4-monooxygenase
|
HORVU4Hr1G072150
|
K01904
|
4CL, 4-coumarate-CoA ligase
|
Since there were multiple SNPs in gene structure of HORVU4Hr1G072150 (Fig. 10B), it was used to further analyze the relationship with L-Phenylalanine and 4-Hydroxycimamic. We found 6 SNPs in the structure of HORVU4Hr1G072150 (Fig. 10B), and all of them were located in the coding region. Among the six SNPs, four SNPs were missense mutations, while two were synonymous mutations (Fig. 10B). The four missense SNPs can be further from 16 haplotypes (Fig. 10C), but only three haplotypes (hap1, hap15 and hap16) were found in our 18 accessions containing Wb-T and Cb-C (Fig. 10D). Hap1 contained the all accessions (accession number = 9) of Cb-C and few accessions (accession number = 3) of Wb-T, while hap15 contained most accessions (accession number = 5) of Wb-T (Fig. 10D). Hap16 only contained 1 accession of Wb-T (Fig. 10D). For the 2 divergent compounds (L-Phenylalanine and 4-Hydroxycimamic), The “t test” between the 9 Cb-C accessions with hap1 and 5 Wb-T accessions with hap15 indicated that both L-Phenylalanine and 4-Hydroxycimamic exhibited more content in Wb-T than in Cb-C (Fig. 10E-F).