Genome-wide Identification, Expression, and Characteristics of the Prolamin Superfamily in Thinopyrum elongatum and Its Kneading Performance in Common Wheat

Background: Prolamins, unique to Gramineae (grasses), play a key role in the human diet. Thinopyrum elongatum (also known as tall wheatgrass, rush wheatgrass, or Eurasian quackgrass) of Elytrigia is genetically well characterized, but little is known about its prolamin genes and their relationships with homologous loci in the genus Triticum. Results: In this study, a total of 19 α-gliadin, 9 γ-gliadin, 19 ω-gliadin, 2 high-molecular-weight glutenin subunit (HMW-GS), and 5 low-molecular-weight glutenin subunit (LMW-GS) genes were annotated in the Th. elongatum genome. Transcriptome data of Th. elongatum showed differences in expression level and pattern both within and between subfamilies. In addition, microsynteny and phylogenetic analyses revealed dynamic changes in the prolamin gene regions and the genetic affinities among Th. elongatum, T. aestivum, T. urartu, and Aegilops tauschii. The E genome, like the B genome, contained only one celiac disease (CD) epitope type, DQ8-glia-α1/DQ8.5-glia-α1, which provides a theoretical basis for CD research. Dough rheological tests of the T. aestivum-Th. elongatum disomic substitution (DS) lines 1E(1A), 1E(1D), and 3E(3A) showed much higher peak height values than those of their wheat parent. Conclusions: Overall, this study provides a comprehensive overview of the prolamin gene superfamily in Th. elongatum and suggests a promising use of this species in breeding improved wheat for the human diet.

gramineous crops have been published, but the relationships among their prolamin genes, including gene features and molecular characteristics, have not been well described [13][14][15][16]. Gliadin families have high pseudogene rates; in several diploid wheat species, 87% of 230 distinct α-gliadin gene sequences were estimated to be pseudogenes [1]. Hence, understanding the expression profiles of these genes is necessary to clarify how they act. However, current research is still insufficient, especially for species other than common wheat [17].
Gene clusters of a gene family are often prone to variation in copy number, sequence polymorphism, and gene expression [18]. Comparing these homologous regions between the genomes of related species provides insight into gene differentiation and local rearrangements [19]. For example, a comparison of the Gli-2 loci between T. dicoccoides Korn and T. aestivum cv. Chinese Spring (CS) showed large sequence differences between the two A genomes and a conserved region between the two B genomes [20]. To date, these loci have not been analyzed in detail in genomes other than A, B, and D. Prolamins can, to a certain extent, reflect the evolutionary relationships of Gramineae species [21].
Gluten is the most important source of wheat protein for humans. Unfortunately, gluten prolamins are also responsible for certain intolerances, among which celiac disease (CD) is one of the most common wheat-related disorders [22]. CD is a chronic immune-mediated enteropathy that occurs in genetically susceptible people and is triggered by gluten intake [23]. α-Gliadins are the main proteins causing CD [18]. A breeding effort has been proposed to develop wheat with fewer immunoreactive epitopes while retaining baking functionality [24]. Prolamin genes from wild relatives of wheat have been reported to have low gluten toxicity and could be used to improve wheat quality through distant hybridization [25].
In this study, prolamin genes were identified, and comparative analyses were carried out between Th. elongatum and three related species (T. aestivum, T. urartu, and Ae. tauschii) in terms of gene number, dynamic expression pattern, molecular characteristics, chromosome location, microsynteny, phylogenetic relationships, and CD epitope content. Mixogram tests of the disomic substitution (DS) lines 1E(1A) and 1E(1D) were also performed. The results provide evolutionary information on Triticeae and support the application of Th. elongatum prolamin genes in wheat breeding.

Results
Identification of the prolamin genes in Th. elongatum and error correction in the related species
A total of 19 α-gliadin genes, 9 γ-gliadin genes, 19 ω-gliadin genes, 2 HMW-GS genes, and 5 LMW-GS genes were identified and annotated in the Th. elongatum genome, whereas no β-gliadin genes were detected (Additional file 1: Table S1). In addition, some annotation errors were found in the related species.
For example, AET6Gv20125500.1 and AET6Gv20126100.1 were absent from the published CDS and protein files of Ae. tauschii. Checking the positions reported in that study, we found that their coordinates were incorrect [15]. All prolamin genes found in this study were manually checked and corrected according to their structural information (Additional file 1: Table S2) [8, 26-29]. Because complete sequences were lacking for some ω-gliadins and HMW-GSs, these genes were not investigated in detail.
Characteristics and sequence analysis of the prolamin gene superfamily
All prolamin genes of the four species were first named using abbreviations of the species names and their chromosomal locations. Then, the pI, MW, sequence length, and other characteristics of the putative functional prolamin genes were calculated (Additional file 1: Table S3). No obvious differences in prolamin characteristics were found between Th. elongatum and the other three species, except for Tel_LMW_glutenin_1E_1, an unusual prolamin gene with the highest MW and the longest protein sequence (Additional file 1: Table S3).
Figure 1A shows how prolamin gene numbers change among the species of different ploidy studied here. Only the HMW-GS subfamily, with one x-type and one y-type gene in each diploid genome, follows a conserved pattern during evolution [7]. In contrast, no specific pattern was observed among the gliadin subfamilies across species. Notably, among the three diploid species, Th. elongatum had the most genes in all three gliadin subfamilies, and its ω-gliadin gene number was about 3-5 times that of the other two diploid species. A common feature of the four species was that α-gliadin genes were the most numerous gliadin genes (tied with ω-gliadin genes in Th. elongatum, 19 each), followed by ω-gliadin genes and, finally, γ-gliadin genes.
To determine the proportion of prolamin genes that are effectively expressed in each species, the pseudogene rate of each gene family was calculated from the number of pseudogenes, identified by inferring protein sequences from gene sequences (Additional file 1: Table S3). Th. elongatum again had the highest pseudogene rates in the α-gliadin (84.2%) and γ-gliadin (77.8%) gene families, whereas no pseudogenes were found in the γ-gliadin and LMW-GS gene families of T. urartu (Fig. 1B). These pseudogene rates indicate that few α-gliadin and γ-gliadin genes are effectively expressed in Th. elongatum, which may be related to its smaller seeds.
Expression pattern and electrophoretic maps of prolamin genes in Th. elongatum
Transcriptome data can expand our knowledge of prolamin gene function. Genes with an average TPM value greater than 5000 coincided with the putative functional genes, further supporting our inference (Fig. 2A, Additional file 1: Table S4). These highly expressed genes were then examined to assess their contribution to grain development. Comparative analysis revealed that prolamin genes of the same subfamily in Th. elongatum differed in expression, which is one of the factors influencing grain prolamin content. For example, the expression of Tel_gamma_gliadin_1E_8 was about 5-6 times that of Tel_gamma_gliadin_1E_2. Only slight differences in expression were found within the α-gliadin, LMW-GS, and HMW-GS subfamilies. In addition, different prolamin subfamilies had their own specific expression patterns. In the α-gliadin and LMW-GS subfamilies, expression was higher at the half-grain stage than at the grain stage, indicating that expression of these genes first increased and then decreased [28]. In contrast, γ-gliadin expression was still increasing at the grain stage; we speculate that it will decline, rapidly or slowly, as gene activity falls with grain maturity. In the HMW-GS subfamily, x-type and y-type genes differed in expression: Tel_hmw_glutenin_1E_x was expressed more highly at the half-grain stage than at the grain stage, whereas the expression of Tel_hmw_glutenin_1E_y increased slightly from the half-grain to the grain stage. Among the expressed genes, Tel_gamma_gliadin_1E_8 had the highest expression, followed by three genes of the α-gliadin subfamily. Overall, the expression patterns of the prolamin gene subfamilies are complex and diverse.
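The screening and trend calls described above can be sketched as follows. This is a minimal Python sketch, not the authors' pipeline: the gene names and TPM values are hypothetical placeholders (the real values are in Additional file 1: Table S4), each gene is assumed to have one TPM value per stage (half-grain, grain), and the 5000 TPM cutoff is applied to the mean of the two stages.

```python
from statistics import mean

# Hypothetical (half-grain TPM, grain TPM) pairs; real values are in Table S4.
TPM = {
    "gene_a": (9200.0, 6100.0),
    "gene_b": (3000.0, 7800.0),
    "gene_c": (120.0, 80.0),
}
TPM_CUTOFF = 5000.0  # average-TPM threshold mentioned in the text


def trend(half_grain: float, grain: float) -> str:
    """Describe the change from the half-grain to the grain stage."""
    if grain > half_grain:
        return "increasing"
    if grain < half_grain:
        return "decreasing"
    return "flat"


def highly_expressed_trends(table, cutoff=TPM_CUTOFF):
    """Keep genes whose mean TPM exceeds the cutoff and report their trend."""
    return {
        gene: trend(*values)
        for gene, values in table.items()
        if mean(values) > cutoff
    }


def fold_difference(table, gene_high: str, gene_low: str) -> float:
    """Ratio of mean expression between two genes (e.g. 1E_8 versus 1E_2)."""
    return mean(table[gene_high]) / mean(table[gene_low])


print(highly_expressed_trends(TPM))
print(round(fold_difference(TPM, "gene_a", "gene_b"), 2))
```

Averaging the two stages before applying the cutoff is one possible reading of "average TPM"; a per-stage cutoff would be an equally defensible alternative.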
We next turned to experimental evidence for prolamin expression. In the gliadin electrophoretic map, with CS as the control, a total of 11 gliadin bands were obtained in Th. elongatum (Fig. 2B). In the ω-gliadin region, Th. elongatum had about 7 bands, considerably more than CS. Two bands were detected in the γ-gliadin region of Th. elongatum. Although two Th. elongatum bands were located in the β-gliadin region of CS, no β-gliadin genes were detected in its genome; we therefore consider these bands to be α-gliadins. In the glutenin subunit electrophoretic map, a total of 6 bands were obtained in Th. elongatum, compared with CS (Fig. 2C). One HMW-GS band (Ex) ran near subunit 8 of CS and the other ran lower, consistent with a previous study [29]. Both electrophoretic patterns show prolamin polymorphism between Th. elongatum and bread wheat, suggesting that Th. elongatum is rich in prolamin genes and could be used in products for human consumption.
Chromosomal location and duplication of prolamin genes
Consistent with previous studies, the γ-gliadin, ω-gliadin, and LMW-GS genes of the four species are located on the short arm, and the HMW-GS genes on the long arm, of the homologous group 1 chromosomes [10, 30, 31]. Most α-gliadins of the four species were located on the homologous group 6 chromosomes (Fig. 3) [11]. Interestingly, a new α-gliadin cluster containing four pseudogenes was detected on the short arm of chromosome 7E (109202365-109312140 bp). However, microsynteny analysis showed that the α-gliadin genes on chromosomes 6E and 7E are not paralogs. The origin of the 7E locus therefore remains unknown.
Both tandem and segmental duplication contribute to gene family expansion. In Th. elongatum, 73.7% of α-gliadin genes, 88.9% of γ-gliadin genes, and 50% of LMW-GS genes were tandem repeat genes. This suggests that the expansion of these three prolamin families in Th. elongatum occurred mainly through tandem duplication.
To better understand the evolution of the ω-gliadin, γ-gliadin, and LMW-GS gene subfamilies, the interval from 5417104 to 11909933 bp, spanning the first to the last prolamin gene on the short arm of chromosome 1E, was used for microsynteny analysis with the other genomes (Fig. 4). All corresponding regions of the analyzed chromosomes were located in collinear blocks, and two findings emerged: an inversion had occurred between the Glu-3 and Gli-1 loci on chromosome 1B of bread wheat, and an orthologous ω-gliadin gene located far from the other ω-gliadin genes was not detected in the A genome of T. urartu or in the D genomes of Ae. tauschii and common wheat. These changes in gene loci between related species provide evidence for the dynamic evolution of the prolamin gene superfamily.
The next part of the study examined the microsynteny of each prolamin subfamily between Th. elongatum and the other selected species (Additional file 2). Additional files 2A and 2B share a common feature: most prolamin genes in Th. elongatum are syntenic with only a few gliadin genes on the corresponding (co-originated) chromosomes of the selected species. Twelve α-gliadin genes of T. aestivum (A genome: 7; B genome: 5) were syntenic with Th. elongatum genes, which may reflect the large size of the α-gliadin subfamily (Additional file 2C). This suggests that these few genes are evolutionarily conserved and that subfamily expansion occurred after genome differentiation. Even though incomplete sequences were used, the HMW-GS genes of Th. elongatum were collinear with those on the corresponding chromosomes of the selected species, indicating that these genes existed before genome differentiation (Additional file 2D).
Evolutionary analysis of each prolamin subfamily in Th. elongatum and the three selected species
Using the four selected species, we constructed phylogenetic trees for four prolamin gene subfamilies to show evolutionary relationships within Triticeae and within each subfamily. The outgroups were GQ139526.1 (Psathyrostachys huashanica), X13508.1 (Hordeum vulgare), HQ293220.1 (Dasypyrum villosum), and FJ481574.2 (Eremopyrum triticeum), respectively. All 66 α-gliadin genes were clearly divided into 8 clades, supporting the notion that genes from the same genome generally group together (Fig. 5) [18]. The distribution of genes with known chromosome locations suggested that clades 1, 2, and 3 represented the A, B, and D genomes, respectively, and that clade 6, with only 2 genes, represented the E genome, which supports the view that T. urartu and Ae. tauschii contributed their genomes to common wheat. The clades representing the A and D genomes clustered together, indicating a close relationship between the D genome of Aegilops and the A genome of Triticum. These results also suggest that the E genome diverged earlier than the B genome, followed by the D and A genomes. Clades 4 and 5 clustered in a larger branch containing α-gliadin genes from the genomes of all four species. Clade 5-2 contained most of the α-gliadin genes of the E genome, whereas clade 5-1 consisted of some α-gliadin genes of the B and E genomes, again indicating a close relationship between the B and E genomes. Clade 4, which formed after the differentiation of clade 5 (A, B, D, and E genomes), was inferred to include some ancient genes of the four species.
Notably, the 4 genes on chromosome 7E formed a separate clade (clade 8) with a high bootstrap value and were closer to the root, suggesting that these genes evolved earlier than those on chromosome 6E and followed a different evolutionary trajectory at an early stage. Importantly, clade 2 of the HMW-GS subfamily showed genome relationships consistent with those of the α-gliadins: the HMW-GS genes of the E and B genomes evolved first, followed by those of the D and A genomes (Additional file 3C). In addition, the HMW-GS genes of the E genome were more closely related to those of the B genome of Triticum, whereas those of the A genome of Triticum were more closely related to those of the D genome of Aegilops (Additional file 3C). In the HMW-GS subfamily, all 12 genes were clearly divided into 2 clades, representing y-type HMW-GSs (clade 1) and x-type HMW-GSs (clade 2) (Additional file 3C) [14]. The topology of the y-type HMW-GSs differed somewhat from that of the x-type HMW-GSs, which may be related to sequence variation in the C-terminal region of the y-type HMW-GSs (Additional file 1: Table S6).
Because of their limited gene numbers, the LMW-GS and γ-gliadin subfamilies did not show genome relationships similar to those of the α-gliadin and x-type HMW-GS subfamilies, but a new classification method for the LMW-GS subfamily is proposed. All 28 LMW-GS genes were divided into four groups based on the type of amino acids at positions 21-29, corresponding to the traditional LMW-m (clades 1 and 2), LMW-s (clade 3), and LMW-i (clade 4) types (Additional file 3B and Additional file 4) [6, 32, 33]. Interestingly, the positions of the first and penultimate cysteines in the LMW-GS sequences were related to the classification of LMW-m (clades 1 and 2) and LMW-s (Additional file 4). Although the penultimate cysteine of clade 1 was at the same position, clade 1 was divided into 3 subclades according to the position of the first cysteine residue (Additional file 4), and subclades 1-1 and 1-2 were more closely related (Additional file 3B). Clade 2 was defined as an intermediate type between subclade 1-3 and clade 3 because its sequence from the first to about the 345th amino acid (including the first cysteine) was similar to that of subclade 1-3, whereas its remaining sequence, from the 350th to the last (about the 483rd) amino acid (including the penultimate cysteine), was similar to that of clade 3 (Additional file 4). Whether classified by the position of the third cysteine or by the type of amino acids at positions 21-29, the LMW-GSs of clade 4 belonged to the LMW-i type. We also found three unusual sequences in the E genome (Tel-LMW-glutenin-1E-2, -3, and -4) with an extra serine immediately before the stop codon (Additional file 4).
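The positional rules described above can be expressed as a small classifier. This sketch assumes a 20-residue signal peptide, so that residues 21-29 are the first nine residues of the mature protein, and assigns LMW-m, LMW-s, or LMW-i from the first residue of that window (methionine, serine, or isoleucine, following the traditional naming). The function names are hypothetical and the sequences in the usage lines are made-up fragments, not real LMW-GSs.

```python
SIGNAL_PEPTIDE_LEN = 20  # assumption: residues 21-29 start the mature protein
TYPE_BY_FIRST_RESIDUE = {"M": "LMW-m", "S": "LMW-s", "I": "LMW-i"}


def n_terminal_window(seq: str) -> str:
    """Return residues 21-29 (1-based) of a full-length LMW-GS sequence."""
    if len(seq) < SIGNAL_PEPTIDE_LEN + 9:
        raise ValueError("sequence shorter than 29 residues")
    return seq[SIGNAL_PEPTIDE_LEN:SIGNAL_PEPTIDE_LEN + 9]


def classify_lmw(seq: str) -> str:
    """Assign a traditional LMW-GS type from the first residue of the window."""
    return TYPE_BY_FIRST_RESIDUE.get(n_terminal_window(seq)[0], "unclassified")


def lmw_m_core_conserved(seq: str) -> bool:
    """Check the M, T and L positions (1, 3 and 9) of the nine-residue window."""
    window = n_terminal_window(seq)
    return window[0] == "M" and window[2] == "T" and window[8] == "L"


def cysteine_landmarks(seq: str):
    """1-based positions of the first and penultimate cysteines, or None."""
    cys = [i + 1 for i, aa in enumerate(seq) if aa == "C"]
    return (cys[0], cys[-2]) if len(cys) >= 2 else None


# Made-up sequences: a placeholder 20-residue signal peptide plus toy N-termini.
signal = "MKTFLIFALLAIAATSAIAQ"
for toy in ("METSHIPGLQQC", "SHIPGLERPSQC", "ISQQQQQQPC"):
    print(toy[:9], classify_lmw(signal + toy))
```

The cysteine helper only reports positions; grouping sequences by those positions, as done for clades 1-3, would still require comparing them against the alignment.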

Distribution of CD epitopes in α-gliadins of the four species
The most influential T-cell epitopes in CD patients are PFPQPQLPY (DQ2.5-glia-α1a), PYPQPQLPY (DQ2.5-glia-α1b), PQPQLPYPQ (DQ2.5-glia-α2), FRPQQPYPQ (DQ2.5-glia-α3), and QGSFQPSQQ (DQ8-glia-α1/DQ8.5-glia-α1), together with the highly toxic 33-mer peptide (LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF) [34, 35]. In our analysis, α-gliadin genes without clear chromosome coordinates were assigned to genomes according to their phylogenetic relationships (Fig. 5 and Additional file 1: Table S5). Consistent with a previous study [1], epitope content was genome-specific, and this also held for the E genome. About 86.7% of A-genome genes contained DQ2.5-glia-α1a or DQ2.5-glia-α3, whereas only 28.6% of B-genome genes contained an epitope, and only one type, DQ8-glia-α1/DQ8.5-glia-α1. The D genome had the greatest variety of epitopes, containing all six types. The E genome, like the B genome, contained only one epitope type, DQ8-glia-α1/DQ8.5-glia-α1, present in 66.7% of its α-gliadins. Also in line with previous reports, the 33-mer peptide was detected only in the D genome of bread wheat [1]. Regarding the copy number of each peptide, DQ2.5-glia-α1b and DQ2.5-glia-α2 were found only in the D genome and often occurred as multiple copies, especially DQ2.5-glia-α2, whereas the other short peptides occurred as single copies. These results provide a reference for breeding wheat with lower CD immunogenicity.
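Scanning deduced α-gliadin sequences for these epitopes is a substring search, but overlapping copies must be counted: within the 33-mer itself, for example, the three copies of DQ2.5-glia-α2 overlap. A minimal Python sketch, assuming protein sequences are one-letter strings; the helper names and the toy sequences in the checks are hypothetical, and "existence rate" is taken to mean the fraction of sequences carrying at least one of the named epitopes.

```python
import re

EPITOPES = {
    "DQ2.5-glia-α1a": "PFPQPQLPY",
    "DQ2.5-glia-α1b": "PYPQPQLPY",
    "DQ2.5-glia-α2": "PQPQLPYPQ",
    "DQ2.5-glia-α3": "FRPQQPYPQ",
    "DQ8-glia-α1/DQ8.5-glia-α1": "QGSFQPSQQ",
    "33-mer": "LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF",
}


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    # A zero-width lookahead lets consecutive matches share residues.
    return len(re.findall(f"(?={re.escape(motif)})", seq))


def epitope_profile(seq: str) -> dict:
    """Copy number of every epitope in one protein sequence."""
    return {name: count_overlapping(seq, motif) for name, motif in EPITOPES.items()}


def existence_rate(seqs, epitope_names) -> float:
    """Fraction of sequences that contain at least one of the named epitopes."""
    if not seqs:
        raise ValueError("no sequences given")
    hits = sum(
        1 for s in seqs
        if any(epitope_profile(s)[name] > 0 for name in epitope_names)
    )
    return hits / len(seqs)


print(epitope_profile(EPITOPES["33-mer"]))
```

A plain `str.count` would miss overlapping copies, which is why the lookahead form is used here.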
Electrophoretic maps and kneading performance of DS1E(1A) and DS1E(1D)
Through chromosome engineering, prolamins of Th. elongatum chromosome 1E were introduced into wheat by substituting chromosome 1A or 1D. In this part of the study, we aimed to determine how chromosome 1E affects quality when it replaces chromosome 1A or 1D of bread wheat. To confirm the identity of the materials, we analyzed the cytological characteristics of the substitution lines by fluorescence in situ hybridization (FISH). As expected, Fig. 6A … Through SDS-PAGE and A-PAGE, we show that 1Ex subunits and some gliadins from the E genome were expressed normally in DS1E(1A) and DS1E(1D) (Fig. 2B, 2D) [53]. Because its band was absent in DS1E(1A) and DS1E(1D), we infer that the 1Ey subunit of Th. elongatum is not expressed. Next, mixing parameters were measured to evaluate the effect of Th. elongatum prolamins on the kneading performance of common wheat. The width at 8 min and midline peak height of DS1E(1A) were much higher than those of CS, consistent with the results of Guo et al. (Fig. 6C and 6D) [53]. Interestingly, the protein content and water absorption of DS1E(1A) were similar to those of CS, whereas the protein content of DS1E(1D) was 3% higher than that of CS (Fig. 6C and 6E). There was no significant difference in width at 8 min between CS and DS1E(1D), but the peak heights of DS1E(1A) and DS1E(1D) both increased, indicating that introducing chromosome 1E enhanced dough kneading resistance (Fig. 6C and 6E). Examining the effects of other E-genome chromosomes on bread wheat quality, we found that DS3E(3A) had a greater width at 8 min and midline peak height than CS, suggesting that bread wheat chromosome 3A may carry regulatory sites related to flour mixing characteristics (Fig. 6B and 6F).
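Midline peak height, midline peak time, and width at 8 min are all read from the mixogram trace. The sketch below is a simplified stand-in for the mixograph software, not the AACC 54-40A procedure: it assumes the trace is a list of (time in minutes, curve height) samples, estimates the upper and lower envelopes with a sliding ±0.25-min window, and takes the midline as their average. The window size, step, and synthetic test curve are arbitrary illustrative choices.

```python
import math


def envelope(trace, center, half_window):
    """Highest and lowest curve values within center ± half_window minutes."""
    values = [v for t, v in trace if abs(t - center) <= half_window]
    if not values:
        raise ValueError(f"no samples within {half_window} min of {center} min")
    return max(values), min(values)


def mixogram_parameters(trace, half_window=0.25, step=0.05, width_time=8.0):
    """Midline peak time/height and curve width at `width_time` minutes."""
    t0, t1 = trace[0][0], trace[-1][0]
    n_windows = int((t1 - t0 - 2 * half_window) / step) + 1
    midline = []
    for i in range(n_windows):
        center = t0 + half_window + i * step
        top, bottom = envelope(trace, center, half_window)
        midline.append((center, (top + bottom) / 2))
    peak_time, peak_height = max(midline, key=lambda point: point[1])
    top, bottom = envelope(trace, width_time, half_window)
    return {
        "midline_peak_time_min": peak_time,
        "midline_peak_height": peak_height,
        f"width_at_{width_time:g}_min": top - bottom,
    }


def synthetic_trace(minutes=10.0, dt=0.01):
    """Toy curve: a midline that peaks at 3 min plus a ±5 oscillation."""
    trace = []
    for k in range(int(round(minutes / dt)) + 1):
        t = k * dt
        midline = 40 + 20 * math.exp(-(((t - 3) / 2) ** 2))
        trace.append((t, midline + 5 * math.cos(2 * math.pi * 25 * t)))
    return trace


print(mixogram_parameters(synthetic_trace()))
```

Real instruments typically smooth the curve before fitting envelopes, so absolute values from this sketch are not comparable with those in Fig. 6.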

Discussion
Prolamin gene families play important roles in flour viscoelasticity, nutritional quality, and CD epitope content. Genome-wide sequence searches are regarded as the most comprehensive approach for studying these gene subfamilies. Genome-wide identification of prolamin gene families has been carried out in T. aestivum, T. urartu, and Ae. tauschii, but a systematic comparative analysis is lacking [14-16, 37]. Knowledge of the prolamin gene families of Th. elongatum is still limited.
In this study, we identified 19 α-gliadins, 9 γ-gliadins, 19 ω-gliadins, 2 HMW-GSs, and 5 LMW-GSs in the Th. elongatum genome, and we summarized the corresponding genes from the genomes of the related species above. Although the genome assembly has limitations, these results provide at least a reference for the study of prolamin genes in a single germplasm [37].
Transcripts from different stages of grain development revealed complex differential expression of prolamin genes. In hexaploid wheat, LMW-GS expression was reported to peak at 10 days after anthesis and then decrease as seeds mature [28]. Another analysis showed a similar result: LMW-GS genes began to be expressed on day 5 after flowering, reached the highest level on day 14, and then decreased gradually as seeds ripened [17]. In Th. elongatum, the expression of three LMW-GS genes followed a trend consistent with these studies (Fig. 2A). α/β-Gliadin gene expression, in contrast, differs by genome: genes of the B and D genomes are early-expressed (highest level at 10 days after flowering), like LMW-GS genes, whereas those of the A genome are late-expressed (highest level at 20 days after flowering) [28]. In Th. elongatum, the expression trend of α-gliadin genes was synchronous with that of LMW-GS genes, suggesting that the α-gliadin genes of the E genome are early-expressed (Fig. 2A). In CS, although the expression levels of all these genes decreased to a very low level at 23-25 days post-anthesis (DPA), the two types of γ-gliadins showed different expression patterns: expression of type I decreased rapidly at 10-15 DPA, whereas expression of type II decreased slowly and gradually after peaking at 10 DPA [25]. In this study, γ-gliadin expression increased from the half-grain to the grain stage and peaked later than that of the α-gliadin and LMW-GS genes (Fig. 2A). Expression levels of different LMW-GS genes vary greatly, by almost tenfold [17]. In Th. elongatum, however, the largest difference, 5-6-fold, was found between two γ-gliadin genes (Fig. 2A).
Previously, the collinearity of genes at different loci has been studied in genome fragments of rice, maize, sorghum, barley, and wheat [38]. By overcoming the limitations of DNA markers based on genetic maps, such comparisons of smaller regions provide preliminary insights into the detailed composition and organization of many plant genomes [19]. Prolamin genes are concentrated in clusters on chromosomes, which facilitates comparison of homologous regions across species to elucidate their evolutionary characteristics. A 307-kb physical contig was compared between the A and B genomes of durum wheat and the D genome of Ae. tauschii; although gene collinearity appeared to be retained, four of six genes, including the two paralogous HMW-GS genes, were disrupted in the orthologous region of the A genome [39]. Another study inferred that considerable sequence changes caused rearrangements of prolamin genes in these genomic regions after the split of the two homoeologous wheat genomes [10]. In this study, homology across the whole Gli-1 and Glu-3 interval was shown between Th. elongatum and the other selected species (Fig. 4). The order of the ω-gliadin, γ-gliadin, and LMW-GS genes was maintained in the E genome of Th. elongatum and in the other genomes of common wheat (A and D subgenomes), T. urartu (A genome), and Ae. tauschii (D genome). However, an inversion occurred in the interval between the Gli-1 and Glu-3 loci on chromosome 1B of common wheat, revealing a dynamic change in this region (Fig. 4). As reported, the homoeologous genomes of wheat are not as well conserved as previously thought, largely because of differential insertion of transposable elements. In addition, an ω-gliadin locus homologous to that of Th. elongatum was detected only on chromosomes 1A and 1B of bread wheat (Fig. 4). We speculate that this locus was lost from chromosome 1D of bread wheat, from Ae. tauschii, and from chromosome 1A of T. urartu during evolution. These results lay the foundation for further study of the prolamin genes of Th. elongatum and their flanking regions.
The phylogenetic relationships of Hystrix, Leymus, and their relatives have been investigated using the Acc1 gene, with results consistent with morphological and cytological studies, indicating that Acc1 is a potentially valuable source for phylogenetic analysis in Triticeae [57]. Acc1 has also been applied successfully to the evolution of Triticum/Aegilops and of switchgrass (Panicum virgatum L.) [58, 59]. However, inferring evolutionary relationships among species usually requires multiple lines of evidence. Prolamin genes are also considered a resource for studying the evolutionary relationships of Gramineae species. For example, the evolution of LMW-m genes indicated a close relationship between the B and Ss genomes, supporting the view that the B genome originated from Ae. apetala [25]. Recently, an analysis of single-copy genes indicated that the E genome of Elytrigia is more closely related to the B genome of common wheat, and the A genome more closely related to the D genome [13]. In our study, this result is supported by the phylogenetic trees of the α-gliadin and x-type HMW-GS genes, respectively (Fig. 5 and S2C). In the α-gliadin subfamily, the genes on chromosome 7E appeared to diverge earlier than the other genes (Fig. 5). Because only the M, T, and L residues among the first nine N-terminal amino acids were conserved, we speculate that more LMW-m types will emerge with species diversification (Additional file 4). We also found that the clustering of LMW-GSs was related to the positions of the first and penultimate cysteines. Therefore, we proposed an improved method that divides LMW-GSs into four groups (see the evolutionary analysis in Results).
Based on the putative functional α-gliadin sequences of the four species, our results show that CD peptides are genome-specific, consistent with a previous study (Additional file 1: Table S6) [1]. Importantly, α-gliadins of the E genome contain only one type of CD peptide, which may be beneficial for low-CD breeding. Although many Triticeae germplasm resources have been screened for CD epitopes based on protein sequences, they have not yet been used in breeding wheat varieties. DS6E(6D), in which chromosome 6D, carrying many CD peptide types, is replaced by chromosome 6E, carrying few, has been verified cytologically (results not shown) and will be used to test for reduced CD immunogenicity in the future.

Conclusions
Unlike those of T. aestivum, T. urartu, and Ae. tauschii, the prolamin genes of the Th. elongatum genome had not been systematically annotated, and they remained poorly understood. In this work, we identified these genes and compared and characterized them alongside the genes we summarized from related crop genomes. We also described the different expression patterns of the prolamin gene subfamilies, located the genes on chromosomes, identified dynamic changes in the gene intervals, and elucidated the evolutionary relationships among these crop species. The pattern of many prolamin genes but few effectively expressed ones may be an evolutionary adaptation to small grain size. With few types of short CD peptides, the E genome may be a favorable genomic resource for breeding wheat with a lower risk of celiac disease. These results will promote research on the role of the prolamin gene superfamily in wheat quality and offer ideas for using wild resources to improve wheat quality.

Feature analysis
In this study, putative functional genes were deduced according to whether the prolamin amino acid sequences could produce proteins with a complete structure. The pseudogene rate of a gene subfamily was calculated as the number of pseudogenes divided by the total number of genes. Two genes were excluded because their sequences were incomplete. All putative functional prolamin sequences were submitted to Expasy (https://web.expasy.org/compute_pi/) to calculate molecular weight (MW) and theoretical isoelectric point (pI).
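The pseudogene-rate definition above is simple arithmetic. The sketch below applies it to counts back-calculated from the Th. elongatum rates reported in Results: 16 of 19 α-gliadin and 7 of 9 γ-gliadin genes reproduce 84.2% and 77.8%. These counts are an inference from the percentages and the gene totals, not values copied from Table S3.

```python
def pseudogene_rate(n_pseudogenes: int, n_total: int) -> float:
    """Pseudogene rate = number of pseudogenes / total number of genes."""
    if n_total <= 0 or not 0 <= n_pseudogenes <= n_total:
        raise ValueError("need total > 0 and 0 <= pseudogenes <= total")
    return n_pseudogenes / n_total


# (pseudogenes, total) back-calculated from the reported Th. elongatum rates.
COUNTS = {"alpha-gliadin": (16, 19), "gamma-gliadin": (7, 9)}

for family, (pseudo, total) in COUNTS.items():
    rate = pseudogene_rate(pseudo, total)
    print(f"{family}: {rate:.1%} pseudogenes, {total - pseudo} putative functional")
```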

Protein extraction and gel electrophoresis
The method for glutenins
A single milled seed was suspended in 0.5 mL of extraction mixture and incubated at 65 ºC for 2 hours. The extraction mixture contained 25 mL Tris-HCl (0.5 M, pH 6.8), 20 mL glycerol, 4 g sodium dodecyl sulfate (SDS), 30 mg bromophenol blue, and 1 g dithiothreitol (DTT), made up to 100 mL with deionized water. The glutenin subunits were separated by discontinuous sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The 100 mL 10% separating gel contained 40.7 mL deionized water, 33.3 mL 30% Acryl/Bis solution (29:1), 25 mL 4X separating gel buffer, 1 mL 10% (m/v) ammonium persulfate solution, and 800 µL TEMED. The composition of the 15% separating gel was calculated from that of the 10% gel. The 80 mL 5% stacking gel contained 45.6 mL deionized water, 13.6 mL 30% Acryl/Bis solution (29:1), 20 mL stacking gel buffer, 800 µL 10% (m/v) ammonium persulfate solution, and 65 µL TEMED. Gels were run at a constant current of 12 mA for 20 hours.
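The text states only that the 15% separating gel was calculated from the 10% recipe. One plausible way, sketched below as a hypothetical reconstruction rather than the authors' stated recipe, assumes that the buffer, ammonium persulfate, and TEMED volumes stay fixed, that the 30% Acryl/Bis volume scales linearly with the target percentage of a nominal 100 mL gel, and that the water is reduced by the same amount the stock increases.

```python
NOMINAL_VOLUME_ML = 100.0          # the 10% recipe is written for 100 mL
STOCK_PERCENT = 30.0               # 30% Acryl/Bis (29:1) stock
WATER_PLUS_STOCK_ML = 40.7 + 33.3  # water + stock in the 10% recipe
FIXED_ML = {                       # assumed unchanged between gel percentages
    "4X separating gel buffer": 25.0,
    "10% ammonium persulfate": 1.0,
    "TEMED": 0.8,
}


def separating_gel(gel_percent: float) -> dict:
    """Hypothetical scaling of the 10% separating-gel recipe (volumes in mL)."""
    stock_ml = NOMINAL_VOLUME_ML * gel_percent / STOCK_PERCENT
    water_ml = WATER_PLUS_STOCK_ML - stock_ml
    if water_ml < 0:
        raise ValueError(f"{gel_percent}% is not reachable with a 30% stock here")
    recipe = {"deionized water": water_ml, "30% Acryl/Bis (29:1)": stock_ml}
    recipe.update(FIXED_ML)
    return {name: round(volume, 1) for name, volume in recipe.items()}


for percent in (10, 15):
    print(percent, separating_gel(percent))
```

With these assumptions the function reproduces the published 10% volumes, which is the main check that the scaling rule is consistent with the stated recipe.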
The method for gliadins
A single milled seed was suspended in 0.3 mL 70% ethanol and incubated at 37 ºC for 2 hours. The acid gel contained 0.5 g ascorbic acid, 0.01 g FeSO4, 50 g acrylamide, and 2.5 g Bis-Tris, made up to a total volume of 500 mL with deionized water. The Coomassie brilliant blue staining solution contained 1 g Coomassie brilliant blue R-250, 100 mL glacial acetic acid, 250 mL isopropanol, and 650 mL deionized water. The destaining solution contained 50 mL anhydrous ethanol, 100 mL glacial acetic acid, and 850 mL deionized water. The gels were stained with the Coomassie brilliant blue staining solution for 4 hours and destained overnight.

Distribution and duplication of prolamin genes
The distribution of prolamin genes on chromosomes was visualized with MapChart [50]. Three criteria were used to identify tandem repeat genes [51, 52]: (a) the distance between two genes on the same chromosomal fragment was less than 100 kb; (b) the shorter aligned sequence covered more than 70% of the longer sequence; and (c) the similarity of the aligned sequences was greater than 70%. If A, B, and C are three adjacent genes and AB and BC are both tandem repeat pairs, A and C are also considered tandem repeat genes even if AC does not meet the criteria. Microsynteny analysis was conducted with the JCVI software using a C-score cutoff of 0.8 (cscore = 0.8) (https://github.com/tanghaibao/jcvi/wiki).
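The three tandem-duplication criteria and the transitive rule for adjacent genes can be sketched with a union-find over neighboring genes. Assumptions in this Python sketch: coordinates are in bp; pairwise alignment coverage and identity have already been computed (for example with BLAST) and are supplied as fractions; only neighboring genes on the same chromosome are compared; and the gene names and values in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # bp
    end: int    # bp


MAX_GAP_BP = 100_000   # criterion (a)
MIN_COVERAGE = 0.70    # criterion (b)
MIN_IDENTITY = 0.70    # criterion (c)


def is_tandem_pair(left: Gene, right: Gene, alignments: dict) -> bool:
    """Apply criteria (a)-(c) to two neighboring genes sorted by position."""
    if left.chrom != right.chrom or right.start - left.end >= MAX_GAP_BP:
        return False
    key = frozenset({left.name, right.name})
    coverage, identity = alignments.get(key, (0.0, 0.0))
    return coverage > MIN_COVERAGE and identity > MIN_IDENTITY


def tandem_groups(genes, alignments):
    """Group genes transitively: if AB and BC pass, A, B and C share a group."""
    parent = {g.name: g.name for g in genes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        for left, right in zip(chrom_genes, chrom_genes[1:]):
            if is_tandem_pair(left, right, alignments):
                parent[find(left.name)] = find(right.name)

    groups = {}
    for g in genes:
        groups.setdefault(find(g.name), []).append(g.name)
    return [sorted(members) for members in groups.values() if len(members) > 1]


genes = [Gene("A", "1E", 0, 1_000), Gene("B", "1E", 50_000, 51_000),
         Gene("C", "1E", 120_000, 121_000), Gene("D", "1E", 400_000, 401_000)]
alignments = {frozenset({"A", "B"}): (0.9, 0.9), frozenset({"B", "C"}): (0.8, 0.75),
              frozenset({"A", "C"}): (0.5, 0.6), frozenset({"C", "D"}): (0.9, 0.9)}
print(tandem_groups(genes, alignments))
```

In this toy example A and C end up in the same group even though the AC pair fails criteria (b) and (c), which is the transitive rule stated above; the proportion of tandem repeat genes in a family is then the number of grouped genes divided by the family size.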
Phylogenetic analysis and classification of each prolamin subfamily
Four phylogenetic trees were constructed using the CDSs of the conserved domains of the α-gliadin, γ-gliadin, LMW-GS, and HMW-GS genes (Additional file 1:

CD content detection
The key peptides involved in CD pathogenesis are DQ2.5-glia-α1a (PFPQPQLPY), DQ2.5-glia-α1b (PYPQPQLPY), DQ2.5-glia-α2 (PQPQLPYPQ), DQ2.5-glia-α3 (FRPQQPYPQ), and the 33-mer peptide (LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF). These peptide sequences were searched for in the deduced functional protein sequences, and their contents in each prolamin subfamily were determined for the genomes of the four species.
FISH
FISH was performed with the probes oligo-psc119.2-1 from Secale cereale and oligo-pta535-1 from T. aestivum. The volume of hybridization solution was prepared according to the number of samples. The two probes were mixed at a 1:1 ratio before hybridization. Other details followed the method of Han et al. [56].

Dough rheological properties test
Mature grains were milled into flour. The protein and water contents of the flour were determined with a DA7200 multi-function near-infrared analyzer. Following the AACC 54-40A method, the main mixing parameters, including mixing time, midline peak height, midline peak time, midline height at 8 min, and width at 8 min, were determined with a 10 g mixograph.
Microsynteny analysis between the E genome of Th. elongatum and other studied genomes. The relationships within each family are indicated by blue (ω-gliadin subfamily), red (γ-gliadin subfamily), and orange (LMW-GS subfamily) lines.