Alcohol dehydrogenase (ADH) gene family in wheat (Triticum aestivum): genome-wide identification, characterization, phylogenetic relationships and expression patterns

Alcohol dehydrogenase (ADH) plays important roles in plant survival under anaerobic conditions. Although the functions of ADH have been studied in other plants, the roles of the wheat TaADH family genes in response to abiotic stress remain unclear. A total of 22 ADH genes located on 14 chromosomes were identified by systematic screening of the wheat genome. Multiple sequence alignment and phylogenetic analysis showed that these genes contain the characteristic GroES-like and zinc-binding domains, belong to the medium-chain ADH type, and can be divided into three subfamilies. There are 17 segmental duplication gene pairs among TaADH family members in the wheat genome, and 9 collinear ADH gene pairs between the wheat and rice genomes. We speculate that these segmental duplication events may be the main reason for the expansion of the TaADH family. Ka/Ks analysis of 64 duplicated gene pairs showed that all Ka/Ks values were less than 1, indicating that TaADH sequences are relatively conserved and have not changed greatly during evolution. Promoter analysis showed that almost all of the upstream promoters of these genes contain anaerobic-inducible elements. Tissue-specific expression and expression profiling further demonstrated that TaADH genes respond to abiotic stress and may play an important role in waterlogging stress during the seed germination stage. These results will be helpful for further functional studies of the TaADH family genes.

In this study, we identified 22 ADH family genes in wheat through sequence alignment among the wheat whole genome (IWGSC RefSeq v1.0; IWGSC2018) and the Arabidopsis, rice and melon genomes, which greatly expanded the previously reported (NCBI) … pairs are under purifying selection. The Ks value was used to estimate the occurrence time of the duplication events.
The results showed that the 64 duplication events of TaADH family genes occurred between 1.55 and 16.42 Mya, including 30 duplicated gene pairs dated to 11.19–16.42 Mya and 12 dated to 7.73–9.56 Mya, all of which were earlier than the first genome duplication in wheat [34].
… relative humidity (16 h light/8 h dark cycles).

In this study, two wheat (Triticum aestivum L.) varieties, ‘Zhoumai 22’ (ZM22, waterlogging-intolerant) and ‘Bainong 607’ (BN607, waterlogging-tolerant), were selected as the materials. ZM22 is a commercial winter wheat cultivar in Henan Province, China. The BN207 and BN607 seeds used in this study were developed by Prof. Xingqi Ou from the School of Life Science and Technology, Henan Institute of Science and Technology, Xinxiang, China. All seeds were provided by the School of Life Science and Technology, Henan Institute of Science and Technology.

… endosperm … was treated with RNase-free DNase to avoid genomic DNA contamination. First-strand cDNAs were synthesized with the PrimeScript RT Reagent Kit with gDNA Eraser according to the manufacturer’s protocol. The qRT-PCR assays were performed with the PrimeScript RT Reagent Kit (Takara, Dalian, China). The 18S rRNA gene (AJ272181.1) was used as the reference gene. Data were analyzed with Opticon Monitor software (Bio-Rad). All qRT-PCR primers were designed using Primer 6.0 software; the primer sequences were as follows: … . Each … was performed with three replicates, and relative expression was calculated with the 2^−ΔΔCt method.

… six genes in wheat. Comparison and phylogenetic analysis with other species showed that these ADH family genes belong to the medium-chain ADH subgroup. In this study, the gene structures, promoters, and tissue-specific expression of the 22 TaADH genes, and their expression in response to waterlogging stress in two wheat varieties with different sensitivity, Bainong 607 and Zhoumai 22, were investigated. The information obtained from this study will improve our understanding of the gene functions of the ADH family.

Results
Genome-wide identification
A total of 22 TaADH genes were identified in the wheat genome by BLAST searches using the 26 ADH genes reported in muskmelon as queries, and by HMMER searches of the wheat genome database with the ADH domains (PF00107.26, PF08240.12, and PF13602.6). The genes were named TaADH1-TaADH22 according to their positions on the chromosomes (Table 1, Table S1). TaADH genes were distributed on all chromosomes except chromosomes 2 and 3, including 6 genes on chromosome 4 and one TaADH gene each on chromosomes 5, 6, and 7. The number of introns in the 22 TaADH genes ranged from 7 to 9, and the number of exons from 8 to 10. The encoded proteins ranged from 347 to 415 amino acids in length, with pI values from 5.68 to 8.2 and molecular weights between … . Subcellular localization prediction placed all of them in the cytoplasm.
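The HMMER half of this search can be illustrated with a minimal sketch. It assumes the HMMER3 `hmmsearch --domtblout` tabular layout (target name in the first column, query accession in the fifth, domain i-Evalue in the thirteenth) and keeps proteins with at least one hit to the three Pfam accessions named above. The `1e-5` cutoff, the toy protein IDs and both helper names are hypothetical, not parameters reported by the study. The second helper numbers genes by chromosome and then start coordinate, one plausible reading of how names such as TaADH1-TaADH22 are assigned.

```python
from collections import defaultdict

# Pfam families named in the text; version suffixes (".26", ".12", ".6") are ignored.
ADH_DOMAINS = {"PF00107", "PF08240", "PF13602"}


def adh_domain_hits(domtbl_lines, max_i_evalue=1e-5):
    """Map each target protein to the set of ADH Pfam domains it carries.

    Assumes HMMER3 --domtblout rows: column 0 = target name,
    column 4 = query (Pfam) accession, column 12 = domain i-Evalue.
    The E-value cutoff is an illustrative assumption.
    """
    hits = defaultdict(set)
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and commented header rows
        cols = line.split()
        target = cols[0]
        pfam_acc = cols[4].split(".")[0]  # "PF08240.12" -> "PF08240"
        i_evalue = float(cols[12])
        if pfam_acc in ADH_DOMAINS and i_evalue <= max_i_evalue:
            hits[target].add(pfam_acc)
    return dict(hits)


def name_by_position(genes, prefix="TaADH"):
    """Number genes in chromosome order, then by start coordinate.

    genes: iterable of (gene_id, chromosome, start) with wheat-style
    chromosome names such as "1A" or "4D" (single-digit groups sort correctly).
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    return {gene_id: f"{prefix}{i}" for i, (gene_id, _, _) in enumerate(ordered, start=1)}
```

Requiring both the GroES-like (PF08240) and a zinc-binding accession, rather than any one of them, would be a stricter variant of the same filter.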

Alignment and evolutionary analysis
Comparison of the protein sequences of the 22 TaADH genes showed that most residues were conserved. Pfam scanning showed that all of these sequences contained the characteristic ADH domains (GroES-like domain and zinc-binding domain) (Fig. 1A): the GroES-like domain spanned residues 35-164, and the zinc-binding domain spanned residues 206-340. However, the position of the zinc-binding domain in TaADH20-TaADH22 differed from that of the other genes (marked with a blue box). These results indicate that these genes belong to the ADH family. To examine their evolutionary relationships with ADHs of other plant species, namely Arabidopsis thaliana (7), Cucumis melo (13), Cucumis sativus (12), Glycine max (3), Hordeum vulgare (1), Lycopersicon esculentum (7), Oryza sativa (1) and Vitis vinifera (8), a phylogenetic tree was constructed with the neighbor-joining (NJ) method from a multiple sequence alignment of the 22 TaADH proteins and these ADH proteins (Fig. 1B). The predicted ADH genes were classified into three groups: short-chain ADH, medium-chain ADH and long-chain ADH. All 22 TaADH genes in wheat belonged to the medium-chain ADH type. Based on their evolutionary relationships, these genes can be divided into 3 subfamilies: Class I contained the largest number of TaADHs (15 genes, TaADH1-9 and TaADH11-15), followed by Class II (4 genes, TaADH10 and TaADH17-19) and Class III (3 genes, TaADH20-22).
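The neighbor-joining step can be sketched in a few lines. The function below is a minimal, self-contained NJ implementation: at each round it joins the pair of nodes minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is a row sum of the current distance matrix, and then recomputes distances to the new internal node. It is an illustration of the method only, not the software or settings used in the study; in practice the distances would be derived from the ADH protein alignment (for example, p-distances).

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree with the neighbor-joining (NJ) algorithm.

    names: taxon labels; dist: symmetric distance matrix (list of lists).
    Returns (newick, joins), where joins records (a, b, len_a, len_b)
    for every merge in the order performed.
    """
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}
        pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]]
        # Q criterion: join the pair with the smallest (n-2)d(a,b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        len_a = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"({a}:{len_a:g},{b}:{len_b:g})"  # new internal node, in Newick form
        joins.append((a, b, len_a, len_b))
        for c in nodes:
            if c not in (a, b):
                d[u, c] = d[c, u] = (d[a, c] + d[b, c] - d[a, b]) / 2
        d[u, u] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    x, y = nodes
    return f"({x},{y}:{d[x, y]:g});", joins
```

On the standard five-taxon textbook matrix (rows a-e: 0 5 9 9 8 / 5 0 10 10 9 / 9 10 0 8 7 / 9 10 8 0 3 / 8 9 7 3 0), the first join is (a, b) with branch lengths 2 and 3. Ties in Q are broken by node order here, which can change the topology when several pairs share the minimum.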

Conserved domain analysis
Analysis of the gene structures of the TaADH genes in the wheat genome showed that the Class I genes had 9 exons with a similar intron distribution, the Class II genes also had 9 exons with similar intron positions, and the Class III genes had 8 exons (Fig. 2A). To further clarify the protein structure of TaADH family members, we identified conserved motifs using MEME software (Fig. 2B and Fig. S1). Class I TaADH proteins contained 12 motifs (e.g., Motif 1-7-4-9-2-8-5-11-6-10-3-14), Class II proteins contained 11 motifs (e.g., Motif 1-7-4-9-2-8-5-11-6-10-3), and Class III proteins showed a different motif composition (e.g., Motif 1-4-7-12-5-13-3). To further analyze the functional domains of these proteins, we examined their domain architecture (Fig. 2C). The members of the wheat TaADH family have highly conserved functional domains: Class I proteins mainly contained the alcohol_DH_plant domain, Class II proteins mainly the alcohol_DH_class_… domain, and Class III proteins the Zn_ADH10 domain. Overall, these TaADH family members contain the typical alcohol_DH structural domain.
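The exon/intron counts and motif comparisons above reduce to simple interval and set operations. The sketch below is illustrative only: `introns_from_exons` assumes sorted-or-unsorted, non-overlapping, 0-based half-open exon intervals of one transcript (a hypothetical input, not the study's annotation data), and `motif_differences` compares the motif orders quoted above for Class I, II and III. Both helper names are hypothetical.

```python
def introns_from_exons(exons):
    """Return intron intervals between consecutive exons.

    exons: list of (start, end) half-open intervals on one transcript.
    A gene with k separated exons has k - 1 introns.
    """
    exons = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if next_start > prev_end]


def parse_motifs(pattern):
    """Turn a MEME motif order such as '1-7-4' into a list of motif numbers."""
    return [int(m) for m in pattern.split("-")]


def motif_differences(pattern_a, pattern_b):
    """Motifs present in pattern_a but not pattern_b, and vice versa."""
    a, b = set(parse_motifs(pattern_a)), set(parse_motifs(pattern_b))
    return a - b, b - a


# Motif orders as quoted in the text for each subfamily.
CLASS_I = "1-7-4-9-2-8-5-11-6-10-3-14"
CLASS_II = "1-7-4-9-2-8-5-11-6-10-3"
CLASS_III = "1-4-7-12-5-13-3"
```

With the quoted orders, Class I differs from Class II only by Motif 14, while Class III carries Motifs 12 and 13 and lacks several Class I motifs.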

Chromosome distribution and synteny analysis
The chromosomal distribution of TaADHs (Fig. 3A and 3B) showed that the 22 TaADH family members were distributed on 15 chromosomes, with 3 genes on each of chromosomes 4A, 4B and 4D; these genes show tandem duplication. To explore the collinear relationships of TaADHs within the wheat genome and between the wheat and rice genomes, collinearity analysis was carried out with MCScanX (Fig. 3A and 3B). We found 17 segmental duplication events among TaADH family members in the wheat genome, including three homologous gene pairs on chromosomes 1, 6 and 7, and a segmental duplication between TaADH15 and TaADH16 on chromosomes 5B and 5D. In addition, 7 segmental duplication events (TaADH5-TaADH8, TaADH6-TaADH9, TaADH7-TaADH10, TaADH5-TaADH11, TaADH6-TaADH12, TaADH8-TaADH11, TaADH9-TaADH12) occurred among chromosomes 4A, 4B and 4D and were associated with the tandem duplication events on these chromosomes. A total of 9 syntenic gene pairs were found between the wheat and rice genomes (Fig. 3B): TaADH6 corresponds to LOC_Os11g10510.1; TaADH8 and TaADH11 correspond to LOC_Os11g10480.1; TaADH17, TaADH18, and TaADH19 correspond to LOC_Os02g57040.1; and TaADH20, TaADH21, and TaADH22 correspond to LOC_Os08g01760.1.

Evolutionary analysis
The nonsynonymous substitution rate (Ka), synonymous substitution rate (Ks), and Ka/Ks ratio were calculated for 64 duplicated gene pairs to reveal the selection pressure acting on wheat TaADH family genes during evolution (Fig. 4A, Table S2). The Ka/Ks ratios of all these pairs were less than 1, indicating purifying selection and suggesting that TaADH sequences have remained highly similar and relatively conserved during evolution. The duplication events of TaADH genes can be divided into three periods (Fig. 4B, Table S2): 30 duplicated gene pairs arose about 11.19 to 16.42 million years ago (Mya), 12 pairs about 7.73 to 9.56 Mya, and the remaining 22 pairs less than 6 Mya, mostly before the wheat polyploidization events. Thus, although these gene sequences are conserved, the duplications that produced them occurred at different times.
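Two small calculations underlie this paragraph: classifying each pair by its Ka/Ks ratio, and converting Ks into an approximate age with T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The text does not state which λ was used, so the sketch below assumes λ = 6.5 × 10^-9, a value commonly applied to grasses; the resulting ages scale inversely with that assumption. The Ka and Ks values in the check are hypothetical.

```python
# Assumed synonymous substitution rate (per site per year), commonly used for
# grasses; the study does not report the rate it applied.
LAMBDA = 6.5e-9


def selection_mode(ka, ks):
    """Classify a duplicated gene pair by its Ka/Ks ratio."""
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_mya(ks, rate=LAMBDA):
    """Approximate duplication age T = Ks / (2 * rate), in million years."""
    return ks / (2 * rate) / 1e6
```

Under this assumed rate, a pair with Ks = 0.13 would be dated to about 10 Mya.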

Analysis of cis-regulatory elements of TaADH genes in wheat
To identify the cis-regulatory elements upstream of the TaADH genes, we selected the 2-kb promoter region upstream of the CDS of each TaADH gene and used TBtools software to predict and visualize the cis-acting elements (Fig. 5). The promoters contained a variety of cis-acting elements associated with 11 categories of responses (hormone response, anaerobic induction, defense and stress response, drought inducibility, light response, low-temperature response, etc.). Except for TaADH13, the promoters of all genes contain anaerobic response elements (AREs), and those of TaADH6 and TaADH9 contain as many as six AREs. The promoter of TaADH4 contains 8 abscisic acid-responsive elements (ABREs), and the promoter of TaADH3 contains as many as 14 methyl jasmonate (MeJA)-responsive elements (TGACG-motif and CGTCA-motif).
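Extracting the 2-kb promoter and scanning it for elements can be sketched as below. The helpers assume 0-based, half-open CDS coordinates on a chromosome sequence held in memory, and take the upstream region on the gene's own strand (reverse-complementing for minus-strand genes). Exact-string counting on both strands is a simplification of what dedicated element predictors do, the helper names are hypothetical, and the motif used in the check (the TGACG-motif core) is for illustration; this is not the study's TBtools workflow.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return `length` bp upstream of a CDS, written 5'->3' on the gene's strand.

    Coordinates are 0-based, half-open [cds_start, cds_end); the region is
    truncated at chromosome ends.
    """
    if strand == "+":
        return chrom_seq[max(0, cds_start - length):cds_start]
    # Minus strand: "upstream" lies after the CDS end on the forward strand.
    return reverse_complement(chrom_seq[cds_end:cds_end + length])


def count_motif(seq, motif):
    """Count exact (possibly overlapping) matches of `motif` on both strands."""
    seq = seq.upper()
    hits = 0
    for m in {motif.upper(), reverse_complement(motif.upper())}:
        hits += sum(1 for i in range(len(seq) - len(m) + 1) if seq.startswith(m, i))
    return hits
```

Using a set of the motif and its reverse complement avoids counting a palindromic site twice.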
Tissue-specific expression patterns of TaADH genes in different tissues and organs
FPKM values for the 22 TaADH genes were obtained from publicly available RNA-seq data sets of different T. aestivum tissues and organs from …, and heatmaps of relative expression levels were generated with the Heatmap tool. Transcript levels were examined in roots, leaves, stems, spikes, grains, and seedlings (Fig. 6). Apart from TaADH2, TaADH6 and TaADH7 were not expressed in leaves; TaADH6 showed the highest expression in grains and TaADH7 in stems. The expression pattern of TaADH4 was similar to that of TaADH6, with the highest expression in grains. TaADH1 and TaADH9 transcripts were detected only in grains and roots, whereas the expression pattern of TaADH15 was the opposite of that of TaADH1 and TaADH9.
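The heatmap and the constitutive versus specific labels used in this section can be sketched as follows. The log2(FPKM + 1) transform is a common choice for expression heatmaps, and the 1-FPKM cutoff for calling a gene "expressed" in a tissue is an assumption; the study states neither. The tissue names follow the text, but the FPKM values in the check are hypothetical.

```python
import math


def log2_fpkm(fpkm_by_tissue):
    """log2(FPKM + 1) per tissue, a common scale for expression heatmaps."""
    return {tissue: math.log2(value + 1) for tissue, value in fpkm_by_tissue.items()}


def expression_pattern(fpkm_by_tissue, threshold=1.0):
    """Label a gene 'constitutive' if FPKM >= threshold in every tissue.

    Returns (label, tissues_where_expressed). The threshold is an
    illustrative assumption, not a value reported in the study.
    """
    expressed = [t for t, v in fpkm_by_tissue.items() if v >= threshold]
    if not expressed:
        return "not expressed", expressed
    if len(expressed) == len(fpkm_by_tissue):
        return "constitutive", expressed
    return "specific", expressed
```

Changing the threshold can move a weakly expressed gene between the two labels, so any such call depends on the cutoff chosen.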
Expression of TaADH genes in wheat seeds under waterlogging treatment
To analyze the responses of two wheat varieties with different waterlogging tolerance during the germination stage, we analyzed the relative expression levels of the 22 members of the wheat TaADH family (Fig. 7). In seeds of the intolerant variety Zhoumai 22, the expression levels of seven TaADH genes (TaADH1/2, TaADH13, TaADH17, TaADH18, TaADH19, TaADH20) were significantly up-regulated at 24 hours after waterlogging treatment compared with the control. At 72 hours after germination, however, TaADH1/2, TaADH17, TaADH18, TaADH19, and TaADH20 showed no significant difference from the control, and only TaADH13 showed an upward trend. In seeds of Bainong 607, the expression levels of 14 genes (TaADH1/2, TaADH3-6, TaADH8-13, TaADH19, and TaADH20) were significantly up-regulated at 24 hours after waterlogging treatment compared with the control. At 72 hours after germination, TaADH1/2, TaADH3, and TaADH9 were still significantly up-regulated compared with the control, whereas the expression levels of TaADH5, TaADH6, TaADH14, and TaADH16 decreased. These results suggest that the difference between waterlogging-tolerant and intolerant varieties after waterlogging treatment is closely related to the early and rapid expression of TaADH genes.
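The relative expression values compared here come from qRT-PCR; the study uses 18S as the reference gene and the −ΔΔCt method. The minimal sketch below of the 2^−ΔΔCt calculation averages replicate Ct values and assumes roughly 100% amplification efficiency for target and reference. The function name and the Ct numbers in the check are hypothetical, not measurements from Bainong 607 or Zhoumai 22.

```python
from statistics import mean


def fold_change_ddct(target_treat, ref_treat, target_ctrl, ref_ctrl):
    """Relative expression (treatment vs control) by the 2^-ΔΔCt method.

    Each argument is a list of replicate Ct values.
    ΔCt  = mean Ct(target) - mean Ct(reference, e.g. 18S)
    ΔΔCt = ΔCt(waterlogged) - ΔCt(control)
    Assumes ~100% amplification efficiency for both genes.
    """
    d_ct_treat = mean(target_treat) - mean(ref_treat)
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    dd_ct = d_ct_treat - d_ct_ctrl
    return 2 ** (-dd_ct)
```

A target whose ΔCt drops by 2 cycles under waterlogging (ΔΔCt = −2) is thus reported as a 4-fold up-regulation.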

Discussion
In recent years, extreme weather has occurred more frequently with global warming. Flooding is one of the abiotic stresses faced by plants, and hypoxia is the first consequence of a flooded environment. Waterlogging-tolerant plants can often stimulate alcohol fermentation and reduce lactic acid fermentation to prevent cytoplasmic acidification. To avoid cytoplasmic acidosis, ethanol production is necessary for plants to survive under anaerobic conditions [17]. The role of alcohol dehydrogenase has been reported in many species [18][19][20]. Advances in sequencing technology have made the genome sequences of many species available, enabling genome-wide identification of plant gene families. ADH genes have been identified in various plants, including tomato [5], rice [21], barley [22], melon [7], and pear [8]. The publication of the wheat genome now makes it possible to systematically study the functions of wheat TaADH family members.

Structural characterization of TaADHs
A total of 22 TaADH genes were identified in the wheat genome database, including the 6 TaADH genes previously published in NCBI. TaADH family members in wheat contain 8-10 exons and 7-9 introns, which is consistent with the 8-9 introns of ADH genes reported by Strommer [23]. For example, ADH2/3 in barley and ADH in wheat contain 8 introns, whereas ADH1 and ADH3 in Arabidopsis thaliana contain 6 and 4 introns, respectively, and ADH in Chinese cabbage contains 5 introns [23]. The larger number of introns in common wheat (Triticum aestivum L.) may indicate that the TaADH family expanded significantly during the evolution from lower to higher plants, and an increased number of ADH members could enhance the ability of organisms to adapt to more complex environmental changes.
Plant ADHs are usually divided into short-chain and medium-chain ADHs. Kitaoka et al. [21] found only one short-chain alcohol dehydrogenase/reductase (SDR) in rice (OsMAS/SDR110C-MS1), which belongs to the short-chain ADH type, while long-chain ADHs have rarely been reported [24,25]. All 22 TaADH family members identified in this study belong to the medium-chain ADH type and share highly conserved functional domains (GroES-like domain and zinc-binding domain) (Fig. 1A), similar to the structure of pear PbrADHs reported by Qin et al. [8]. ADH is a member of the medium-chain dehydrogenase/reductase (MDR) protein superfamily. MDR proteins contain a typical GroES-like domain, named for its similarity to the GroES (chaperonin-10) molecular chaperone [26]. The association of Zn cofactors with a primitive MDR may have occurred in the early days of atmospheric oxygen, when Zn was likely a preferred cofactor because of its valence stability [27]. Phylogenetically, Vv-ADH2 in grape [28], CmADH1 in melon [29] and At-ADH1 in Arabidopsis thaliana [30] all cluster in a class of medium-chain zinc-binding ADHs (Fig. 1B). Therefore, the 22 TaADH genes in wheat belong to the typical ADH family.

Phylogenetic analysis and evolution of TaADHs
Gene duplication is an important evolutionary process in gene family expansion, and the duplicated genetic material provides an opportunity for functional differentiation. Functional differences caused by gene duplication are considered an important factor in speciation and environmental adaptation [31]. Therefore, gene duplication analysis can help us better understand the evolution of genes and species. Duplicated genes located on the same chromosome are referred to as tandem duplicates, whereas duplicates located on different chromosomes but derived from the same subgenome are referred to as segmental (fragment) duplicates [32]. There are 4 tandem and 17 segmental duplication events among members of the TaADH family in the wheat genome (Fig. 3A). Most of the segmental duplication events occurred between homologous chromosomes, such as those of groups 1, 4, 6, and 7. In addition, wheat chromosomes 4, 6 and 7 showed gene collinearity with rice chromosomes 2, 8 and 11, suggesting that polyploidization was the main driver of the expansion of the TaADH family and that TaADH family members are conserved on individual chromosomes.
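The tandem/segmental rule in this paragraph can be written as a tiny classifier. The sketch below applies the stated definitions to wheat chromosome names and additionally flags segmental pairs that fall within one homoeologous group (such as TaADH15 and TaADH16 on 5B and 5D). It is a hypothetical helper: tools such as MCScanX also use gene-order and distance criteria (for example, separating adjacent tandem duplicates from nearby proximal ones) that this simplification ignores, and the gene IDs in the check are made up.

```python
from collections import Counter


def duplication_type(chrom_a, chrom_b):
    """Label a duplicated pair from wheat chromosome names like '4A' or '5D'.

    Follows the rule stated in the text: same chromosome -> tandem,
    different chromosomes -> segmental (fragment). Segmental pairs within
    one homoeologous group (e.g. '5B' and '5D') are flagged as such.
    """
    if chrom_a == chrom_b:
        return "tandem"
    if chrom_a[:-1] == chrom_b[:-1]:  # '5B'[:-1] -> '5', the homoeologous group
        return "segmental (homoeologous)"
    return "segmental"


def tally_duplications(pairs, chrom_of):
    """Count duplication types for (gene_a, gene_b) pairs given gene -> chromosome."""
    return Counter(duplication_type(chrom_of[a], chrom_of[b]) for a, b in pairs)
```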
Ka, Ks, and Ka/Ks are widely used to trace the history of selection pressure on a gene or gene region [33]. In this study, we found 64 duplicated gene pairs in the TaADH family, all with Ka/Ks values less than 1 (Fig. 4B), indicating that these pairs are under purifying selection. Based on Ks, the duplication events were estimated to have occurred between 1.55 and 16.42 Mya, all of which were earlier than the first genome duplication in wheat [34]. The second event dates back to about 300,000 years ago, when hybridization between Triticum urartu (A subgenome) and Aegilops speltoides formed tetraploid wheat (T. dicoccoides, A and B subgenomes). About 8,000 years ago, hybridization between tetraploid wheat and Aegilops tauschii (D subgenome) produced hexaploid wheat (T. aestivum, A, B, and D subgenomes) [35]. These results indicate that most TaADH family genes in wheat have undergone strong purifying selection during evolution and retained their original functions after segmental duplication.
Expression profile analysis of TaADHs
Gene promoters are important factors in regulating gene expression patterns. They regulate gene expression at the transcriptional and post-transcriptional levels [36]. Cis-acting elements in a specific promoter region participate in tissue-specific expression under various environmental conditions, and the number of cis-acting elements in a gene's upstream region is positively correlated with its responsiveness to multiple stimuli [37]. Prediction of the cis-acting elements of wheat TaADH family members suggested that these genes may be regulated by a variety of biotic and abiotic factors (Fig. 5). Alcohol dehydrogenases catalyze the reversible conversion of aldehydes to the corresponding alcohols. They are involved in plant stress responses and are mainly responsible for ethanol production in anaerobic environments. ADHs are also widely involved in other stress, activator, and abscisic acid responses [38]. ADH has a good protective effect against hypoxia stress after flooding [39] and also functions in seed development and the aerobic metabolism of pollen [40]. Seeds contain many ADH isozyme genes, and studies tracking ADH activity during seed development have shown that these isozyme genes are active at different times. When seeds are under hypoxia stress at the germination stage, ADH activity may play a very important role in germination [23]. Three ADH genes (HvADH-1, HvADH-2, and HvADH-3) have been found in barley: HvADH-1 activity could be detected during aerobic growth, expression of HvADH-1 and HvADH-2 could be induced by hypoxia, and the expression level of HvADH-3 under hypoxia was significantly lower than that of the other two genes [41]. Proels et al. [42] found that HvADH-1 was the only constitutively expressed ADH gene in barley seedlings, whereas ADH-3 activity could not be detected in barley leaves under any conditions.
In this study, TaADH3, TaADH5, TaADH8, TaADH10-11, TaADH14, and TaADH16-22 were expressed in all parts of wheat, suggesting that these genes may also be constitutively expressed. Manangkil et al. [22] confirmed that the expression of ADH1 and ALDH2a in rice seedlings increased rapidly under submergence stress but decreased rapidly after submergence was removed, indicating that higher expression levels of ADH1 and ALDH2a might be one of the reasons why rice is more tolerant to submergence than other plants. When wheat seeds were exposed to an anaerobic environment (waterlogging treatment), TaADH expression varied with exposure time. In both the waterlogging-tolerant variety Bainong 607 and the intolerant variety Zhoumai 22, the expression levels of TaADH1/2, TaADH13, TaADH19, and TaADH20 at 24 hours after waterlogging treatment were significantly higher than those of the control (Fig. 7). The tissue expression analysis showed that the relative expression of these genes was highest in grains, with TaADH1 and TaADH13 specifically expressed in wheat grains, suggesting that these genes may be rapidly induced under anaerobic stress. After 72 hours of germination, the relative expression levels of TaADH1/2, TaADH3 and TaADH9 in Bainong 607 seeds were significantly up-regulated compared with the control, whereas in Zhoumai 22 seeds only the expression of TaADH13 increased (Fig. 7). These results suggest that TaADH1/2, TaADH3, and TaADH9 play important roles in the response to waterlogging stress and could serve as an important basis for screening waterlogging-tolerant wheat varieties.

Conclusions
A total of 22 TaADH genes were identified and analyzed in the wheat genome. These genes were distributed on 15 chromosomes, with 3 homologous genes on each of chromosomes 4A, 4B, and 4D, which showed tandem duplication events. Pfam domain analysis showed that all of these protein sequences contained the characteristic ADH domains (GroES-like domain and zinc-binding domain).
Phylogenetic analysis with other species showed that the 22 TaADH genes in wheat belong to the medium-chain ADH type and are grouped into 3 subfamilies. Collinearity analysis within the wheat genome and between the wheat and rice genomes identified 17 segmental duplication events among TaADH family members and 9 collinear ADH gene pairs between wheat and rice. Ka/Ks analysis and estimated divergence times showed that all 64 duplicated gene pairs had Ka/Ks values less than 1, indicating that strong purifying selection has acted on TaADH genes during evolution. Cis-acting element and tissue expression analyses showed that these genes are responsive to 11 kinds of stimuli. TaADH3, TaADH5, TaADH8, TaADH10-11, TaADH14, and TaADH16-22 were expressed in all parts of wheat (constitutive expression), whereas the other genes showed specific expression. Comparing the expression profiles of the waterlogging-tolerant wheat Bainong 607 and the waterlogging-intolerant wheat Zhoumai 22 at different germination stages after waterlogging treatment showed that TaADH1/2, TaADH3 and TaADH9 play an important role in the response to waterlogging stress and could serve as an important basis for screening waterlogging-tolerant wheat varieties. These results provide valuable information for further functional elucidation of TaADH genes in wheat.

Genome-wide identification of TaADH genes in wheat
The whole-genome data were downloaded from the wheat genome database …

Sequence alignment and evolutionary analysis of TaADH genes in wheat
The ADH protein sequences were downloaded from the genomes of Arabidopsis thaliana, Chinese Spring wheat, rice, and muskmelon. The protein sequences of these species were aligned with ClustalW2 (http://www.genome.jp/tools-bin/CLUSTORW), and the phylogenetic tree was constructed …

Figure: Exon-intron structures and motif compositions of TaADH proteins in wheat. The sequence information for each motif is provided in Fig. S1.

Figure: Distribution and statistical analysis of cis-acting elements in TaADH promoters. The color of each square depicts the number of predicted cis-acting elements in the promoter region.

Supplementary Files
This is a list of supplementary files associated with this preprint.