Isolation of TaGS3 in wheat
In wheat, three copies of TaGS3, located on chromosome arms 4AL (TraesCS4A02G474000), 7AS (TraesCS7A02G017700), and 7DS (TraesCS7D02G015000), were identified by searching the Chinese Spring RefSeq v1.0 genome database. Primers were designed from regions specific to each sequence. In wheat accession ‘Changzhi 6406’ (a cultivar that exhibits high grain weight), the primers amplified PCR products of 2445, 2393, and 2409 bp for TaGS3-4A (GenBank accession: KY888174), TaGS3-7A (GenBank accession: KY888186), and TaGS3-7D (GenBank accession: KY888197), respectively. The predicted coding sequences of the three genes were 513, 510, and 510 bp long, encoding putative proteins of 170, 169, and 169 amino acids, respectively. The exon–intron structure of TaGS3-4A, TaGS3-7A, and TaGS3-7D was similar to that of OsGS3, which consists of five exons and four introns (Figure 1). The similarity between the coding region of OsGS3 and those of TaGS3-4A, TaGS3-7A, and TaGS3-7D was 45.92%, 43.94%, and 44.87%, respectively, and the similarity between the deduced amino acid sequences was 45.26%, 44.83%, and 42.24%, respectively.
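The reported protein lengths follow from the coding-sequence lengths by simple arithmetic: a complete coding sequence of n bp contains n/3 codons, the last of which is a stop codon. The minimal sketch below checks this and also shows one common way to compute percent identity over a pairwise alignment. The function names and the toy aligned strings are illustrative assumptions; the text does not state which alignment tool or similarity definition was used.

```python
def protein_length(cds_bp: int) -> int:
    """Amino acids encoded by a complete CDS (stop codon excluded)."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return cds_bp // 3 - 1


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over aligned columns in which neither sequence has a gap ('-').

    This is one of several possible definitions (an assumption here).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# CDS lengths as reported in the text.
cds_lengths = {"TaGS3-4A": 513, "TaGS3-7A": 510, "TaGS3-7D": 510}
protein_lengths = {gene: protein_length(bp) for gene, bp in cds_lengths.items()}
print(protein_lengths)

# Hypothetical toy alignment, not real GS3 sequence.
toy_identity = percent_identity("ATGC-GTA", "ATGCAGTT")
print(round(toy_identity, 2))
```

Excluding gapped columns from the denominator is a design choice; counting gaps as mismatches would give lower values for the same alignment.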
Phylogenetic analysis of GS3 in plants
A phylogenetic tree was reconstructed from 14 deduced amino acid sequences of GS3 homologs from Aegilops tauschii, Hordeum vulgare, Panicum hallii, Setaria italica, Triticum aestivum, Zea mays, Triticum urartu, Sorghum bicolor, Glycine max, and Arabidopsis thaliana. A reported atypical Gγ domain gene, DEP1, was used as the outgroup. In the phylogenetic tree, the sequences resolved into two main groups comprising the GS3 orthologs and the DEP1 orthologs (Figure 2a). The GS3-orthologous group consisted of ten sequences that were all derived from monocotyledons, whereas the DEP1-orthologous group comprised sequences derived from both monocotyledons and dicotyledons.
In the GS3-orthologous group, the relationships among the genes were congruent with the species phylogeny, except that TaGS3-7A from wheat subgenome A was more distantly related to TuGS-7A from Triticum urartu than to GS3 from Hordeum vulgare. The overall similarity of the conserved domain among the sequences was 85.14%, and the domains were of similar length (60–66 amino acids) except for that of T. urartu (46 amino acids). In contrast, the length of the cysteine-rich region was variable and fell into three classes. Class I consisted of Triticeae species with a conserved length of 92–94 amino acids (Triticum urartu, with 85 amino acids, was an exception); Class II included Oryza species, of which the longest sequence was 150 amino acids; and Class III consisted of Panicoideae species with a conserved length of 111–116 amino acids. Comparison of the atypical Gγ subunit conserved domain among the GS3 ortholog sequences revealed that all sequences contained an identical amino acid fragment, “PRP--RLQLAVDALHR--FLEGEI” (“--” represents a non-conserved region, and the number of “-” characters does not reflect the number of amino acids).
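A conserved fragment of this kind can be searched for in a protein sequence with a regular expression. The sketch below is illustrative only: because the length of each non-conserved stretch is not specified, each “--” is modelled as a run of any residues of arbitrary length (an assumption), and the toy sequence is hypothetical rather than a real GS3 protein.

```python
import re

# Each "--" in the reported fragment is treated as a lazy run of any
# uppercase residues, since its true length is not given (assumption).
GS3_MOTIF = re.compile(r"PRP[A-Z]*?RLQLAVDALHR[A-Z]*?FLEGEI")


def find_motif(protein: str):
    """Return the (start, end) span of the first motif match, or None."""
    match = GS3_MOTIF.search(protein.upper())
    return match.span() if match else None


# Hypothetical toy sequences for demonstration only.
toy_with_motif = "MAAPRPKSPLRLQLAVDALHRQRAFLEGEIAA"
toy_without_motif = "MAAPRPKSPLAAAA"

print(find_motif(toy_with_motif))
print(find_motif(toy_without_motif))
```

In practice the gap lengths would be bounded (for example `[A-Z]{0,10}`) once the alignment shows how long the non-conserved stretches actually are.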
Polymorphism of TaGS3-4A and TaGS3-7A among wheat accessions
Sequences of TaGS3-4A and TaGS3-7A were obtained from ten wheat accessions. To avoid false variation generated by sequencing errors, only variants with a minor allele frequency > 0.1 were retained (i.e., a real SNP had to be detected at least twice among the ten sequences). In total, 17 and 18 variations were detected by multiple alignment for TaGS3-4A and TaGS3-7A, respectively. For TaGS3-4A, 3, 1, 6, 0, 3, 0, 0, 0, 0, 2, and 2 variations were located in the 5′ upstream region, first exon, first intron, second exon, second intron, third exon, third intron, fourth exon, fourth intron, fifth exon, and 3′ downstream region, respectively. For TaGS3-7A, 4, 0, 2, 0, 4, 0, 0, 1, 3, 3, and 1 variable sites were detected in the corresponding regions (Tables 2 and 3). Among these, three SNPs causing amino acid changes were identified in each gene: in TaGS3-4A they were located in the first exon (1) and fifth exon (2), and in TaGS3-7A they were located in the fourth exon (1) and fifth exon (2). The variant (GC/AT) at position 70 of TaGS3-4A was located in the atypical Gγ subunit domain; the GC allele was carried by 80% and 60% of the accessions in the heavy and light grain weight pools, respectively, which indicated that this locus was not associated with grain weight. In contrast, the frequency of allele A at position 1907 of TaGS3-7A (amino acid change: Ala/Thr) was 0.6 in the heavy grain weight accessions compared with zero in the light grain weight accessions. Therefore, this allele was considered more likely to be associated with grain weight in wheat.
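The filtering rule described above (a variant is retained only if its minor allele is seen at least twice among ten sequences) and the comparison of allele frequencies between pools can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the toy alignment, and the pool genotypes are hypothetical, and the pool genotypes were chosen only so that the output mirrors the reported frequencies of 0.6 and zero.

```python
from collections import Counter


def call_variants(alignment, min_minor_count=2):
    """Return {column_index: Counter} for alignment columns whose second most
    common allele is observed at least `min_minor_count` times."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    variants = {}
    for idx, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic column
        minor_count = counts.most_common(2)[1][1]
        if minor_count >= min_minor_count:
            variants[idx] = counts
    return variants


def allele_frequency(genotypes, allele):
    """Fraction of accessions in a pool that carry `allele`."""
    return genotypes.count(allele) / len(genotypes)


# Toy alignment of ten sequences (hypothetical): column 1 holds a singleton
# that is treated as a possible sequencing error; column 2 is a real variant.
toy_alignment = ["AAA"] * 6 + ["ACA"] + ["AAG"] * 3
variants = call_variants(toy_alignment)
print(sorted(variants))

# Hypothetical pool genotypes at one SNP.
heavy_pool = ["A", "A", "A", "G", "G"]
light_pool = ["G", "G", "G", "G", "G"]
print(allele_frequency(heavy_pool, "A"), allele_frequency(light_pool, "A"))
```

With ten sequences, requiring two observations corresponds to a minor allele frequency of 0.2, which satisfies the stated threshold of > 0.1.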
Molecular marker design and validation
Based on the results of SNP mining, a Kompetitive allele-specific PCR (KASP) marker was designed for the SNP at position 1907 (A/G). The calls for the TaGS3-7A-A allele of the potential light grain weight group clustered along the x-axis, whereas the calls for the TaGS3-7A-G allele of the potential heavy grain weight group clustered along the y-axis (Figure 3). Thus, the marker clearly distinguished the two alleles, and it was used to validate the allelic effects on grain weight in a natural population.
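The separation of KASP calls into clusters along the two axes can be illustrated with a simple angle-based classifier. This is only a sketch under stated assumptions: genotyping software normally performs the clustering, and the signal threshold, the angular window for heterozygotes, and the function name below are hypothetical values chosen for illustration. The axis assignment (TaGS3-7A-A along the x-axis, TaGS3-7A-G along the y-axis) follows the description above.

```python
import math


def kasp_call(x_signal, y_signal, min_signal=0.2, het_window=(30.0, 60.0)):
    """Classify one sample from normalized fluorescence of the two probes.

    x_signal: signal of the probe for the A allele (x-axis cluster).
    y_signal: signal of the probe for the G allele (y-axis cluster).
    min_signal and het_window are hypothetical thresholds.
    """
    if math.hypot(x_signal, y_signal) < min_signal:
        return "no call"  # too close to the origin, e.g. a no-template control
    angle = math.degrees(math.atan2(y_signal, x_signal))
    if angle < het_window[0]:
        return "TaGS3-7A-A"
    if angle > het_window[1]:
        return "TaGS3-7A-G"
    return "heterozygous"


# Hypothetical normalized signals for four samples.
samples = [(1.0, 0.1), (0.1, 1.0), (0.7, 0.7), (0.05, 0.05)]
calls = [kasp_call(x, y) for x, y in samples]
print(calls)
```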
The association between genotype and grain traits was analyzed with this marker in a mini-core collection (MCC) of 224 wheat accessions (Table S1). Based on the SNP calls from the KASP marker, significant differences in kernel width, kernel weight, and kernel thickness were observed between the TaGS3-7A-G and TaGS3-7A-A alleles, whereas no significant difference in kernel length was observed (Table 3). In addition, the TaGS3-7A-A allele, which was associated with superior traits, was rare, with a frequency of 13.107%, which was significantly lower than that of TaGS3-7A-G. However, the frequency of TaGS3-7A-A in cultivars (18.085%) was significantly higher than that in landraces (0.980%).
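A difference in a grain trait between two allele groups can be tested in several ways. The text does not state which test was used, so the sketch below substitutes a two-sided Monte Carlo permutation test on the difference in means, which needs only the standard library. The function name, the number of permutations, and the kernel weight values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Returns an estimated p-value with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical thousand-kernel weights (g) for accessions carrying each allele.
weights_allele_a = [40, 42, 41, 43, 44, 45]
weights_allele_g = [30, 31, 29, 32, 33, 28]
p_value = permutation_test(weights_allele_a, weights_allele_g)
print(p_value)
```

A permutation test makes no normality assumption, which can be useful when one allele group is small, as is the case for a rare allele.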
Distribution of the TaGS3-7A-A allele in the major wheat-production areas of China
In total, 238 wheat cultivars grown in the main wheat-production areas of China were analyzed to determine the frequency of the TaGS3-7A-A allele (Table S2). The TaGS3-7A-A allele was detected in 58.4% of all tested cultivars. Significant differences in the frequency of TaGS3-7A-A were observed among the five provinces. The TaGS3-7A-A allelic frequency exceeded 50% in all provinces except Henan (92.0% in Sichuan, 62.5% in Shandong, 60% in Hebei, and 52.6% in Shaanxi). The TaGS3-7A-A allele was detected in 50% of the cultivars in Shaanxi province. However, a significantly lower frequency of TaGS3-7A-A was detected among cultivars from Henan (28.23%) compared with the other provinces.
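Differences in allele frequency among regions are often assessed with a Pearson chi-square test on a table of counts. The text does not report per-province counts or name the test used, so the sketch below is illustrative only: the function name, the province labels, and the counts are hypothetical, and only the test statistic and degrees of freedom are computed.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Each row is one group; each column is one category of counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: (cultivars with TaGS3-7A-A, cultivars without).
toy_counts = {"Province X": (18, 2), "Province Y": (5, 15)}
statistic, dof = chi_square_statistic([list(c) for c in toy_counts.values()])
print(round(statistic, 2), dof)
```

The statistic would then be compared with the chi-square critical value for the given degrees of freedom (3.841 at the 0.05 level for one degree of freedom).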