Identification and Chromosomal Distribution of UGT Genes in Q. robur
A total of 244 UGT genes were identified from the genomic sequencing data of Q. robur. Their chromosomal locations are shown in Fig. 1. Of these, 219 UGT genes were located on the 12 chromosomes, and the remaining 25 were located on unassembled chromosomal fragments. The numbers of UGT genes and of tandemly repeated UGT genes on each chromosome are shown in Fig. 2. Chromosome 11 carries the most UGT genes (39), and chromosome 5 carries the fewest (6). The location data show that the UGT genes of Q. robur are closely arranged on each chromosome, especially on chromosomes 10 and 11. Some of these tightly arranged UGT genes are tandem repeats, such as P0398930.2, P0398960.2, P0398970.2, and P0398980.2 on chromosome 4, whereas other closely packed genes are not, such as P03156000.2, P0315610.2, P0315640.2, P0315650.2, P0315660.2, and P0315680.2 on chromosome 8. There are 21 gene clusters on the Q. robur chromosomes, distributed across chromosomes 1, 2, 3, 4, 5, 6, 7, 10, and 11 (Fig. 1).
Collinearity analysis showed that only 318 of 28,488 genes were collinear (Fig. 3), accounting for 1.12% of all genes. The collinear genes include those encoding ribosomal subunit proteins, NADH plastid quinone oxidoreductase subunits, methyltransferases, receptor proteins, and disease resistance proteins. Among them, genes encoding TMV resistance proteins were the most numerous, with 32. Among the 244 UGT genes, only P0376460.2 and P0376630.2 on chromosome 12 were collinear.
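The proportions quoted above can be rechecked with a few lines of Python. This is only an arithmetic sketch using the counts reported in the text. The helper name `percent` is introduced here for illustration, and the UGT share (2 of 244) is derived from the reported counts rather than stated in the text.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts reported in the text.
collinear_genes = 318
total_genes = 28_488
collinear_ugts = 2   # P0376460.2 and P0376630.2
total_ugts = 244

collinear_pct = percent(collinear_genes, total_genes)
ugt_collinear_pct = percent(collinear_ugts, total_ugts)

print(f"Collinear genes: {collinear_pct:.2f}% of all genes")
print(f"Collinear UGTs:  {ugt_collinear_pct:.2f}% of UGT genes")
```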
Phylogenetic Analysis of UGT Sequences
A phylogenetic tree was constructed from 260 protein sequences: 244 Q. robur UGT protein sequences, 14 UGT sequences identified in A. thaliana, and 2 UGT sequences identified in Z. mays. The Q. robur UGTs were divided into 15 groups (Fig. 4). Q. robur has groups O and P, which are absent from A. thaliana, whereas group K has been lost in Q. robur. The two Z. mays UGTs clustered in groups O and P, respectively; in Q. robur, group O contains 3 UGTs and group P contains 13. Groups A, D, E, G, and L contain the most glycosyltransferases; group E is the largest, with 57 UGTs, and group N is the smallest, with only 1. P0481590.2 did not cluster with any group and forms a separate branch.
Conserved Sequence and Gene Structure Analysis
The structures of the Q. robur UGT genes are shown in Fig. 5b. Most UGT genes are shorter than 2,000 bp; the longest is 5,000 bp and the shortest is 1,000 bp. There were 73 UGT genes with complete UTR sequences, 15 genes with a 5' UTR, 22 genes with a 3' UTR, and 134 genes without UTRs; UGTs in groups L and G had almost no UTR sequences. UGTs in groups G, H, J, P, and L had long intron sequences, and no gene had more than 3 introns, while UGTs in groups A, B, C, D, E, O, and M had no introns. Homologous UGT genes within each group showed similar structures. In group L, however, P0159440.2, P0159420.2, P0159330.2, P0159450.2, P0159430.2, P0159410.2, P0159310.2, P0159300.2, P0159360.2, P0159340.2, P0159320.2, P0296290.2, P0159380.2, P0288720.2, P0249750.2, P0760750.2, and P0350810.2 differed greatly from the other UGTs in the group, containing short introns or no introns.
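The UTR tally above can be read as a four-way classification of each gene. The sketch below assumes that "complete UTR" means both a 5' and a 3' UTR are annotated, and that the 15 and 22 genes carry only a 5' or only a 3' UTR, respectively. That reading is an assumption, although it is consistent with the four counts summing to 244. The function `utr_category` and the toy gene names are hypothetical.

```python
from collections import Counter


def utr_category(has_5utr: bool, has_3utr: bool) -> str:
    """Classify a gene by which UTRs are annotated (assumed scheme)."""
    if has_5utr and has_3utr:
        return "both UTRs"
    if has_5utr:
        return "5' UTR only"
    if has_3utr:
        return "3' UTR only"
    return "no UTR"


# Counts reported in the text, mapped onto the assumed categories.
reported = {
    "both UTRs": 73,
    "5' UTR only": 15,
    "3' UTR only": 22,
    "no UTR": 134,
}
total_reported = sum(reported.values())
print(f"Genes accounted for: {total_reported}")

# Toy annotation (hypothetical genes) showing how such a tally could be built:
# gene name -> (has 5' UTR, has 3' UTR).
toy_annotation = {
    "geneA": (True, True),
    "geneB": (True, False),
    "geneC": (False, False),
    "geneD": (False, False),
}
tally = Counter(utr_category(*flags) for flags in toy_annotation.values())
print(dict(tally))
```

Under this reading the four categories are mutually exclusive, which is why their sum can be compared directly with the number of identified UGT genes.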
The average length of the UGT protein sequences is 433 amino acids, and the N50 is 459 amino acids. Ten conserved UGT motifs, named motifs 1 to 10, were identified with the online tool MEME; they include the PSPG motif (motif 1) and other conserved protein motifs. MEME analysis showed that the PSPG motif (motif 1), unique to the UGT gene family, is located near the C-terminus (carboxyl end) in Q. robur. Motif 6 was lost in group J UGTs, motifs 7 and 10 near the C-terminus were lost in group A UGTs, and motif 5 near the N-terminus was lost in some group P UGTs (Fig. 5a). The PSPG motifs of Q. robur (Fig. 6a) and A. thaliana (Fig. 6b) were constructed by extracting the corresponding region from all UGTs screened from the two species. The PSPG motif was found to be relatively conserved in both species.
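The text does not say how the N50 of the protein lengths was computed. A minimal sketch follows, assuming the standard definition: sort the lengths in descending order and report the length at which the running sum first reaches half of the total. The function name `n50` and the example lengths are hypothetical and are not the Q. robur data.

```python
from statistics import mean


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of the total length."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical protein lengths in amino acids (illustration only).
example_lengths = [300, 400, 450, 460, 500]

print(f"mean length: {mean(example_lengths)}")
print(f"N50:         {n50(example_lengths)}")
```

Because longer sequences contribute more to the running sum, the N50 is typically at or above the mean, which matches the relation between the reported values (459 versus 433 amino acids).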
Analysis of Cis-Acting Elements in UGT Gene Promoter Regions
The cis-acting elements in the promoter sequences of the Q. robur UGT gene family fall into five categories: plant hormone response, light response, stress resistance response, core promoter elements, and common cis-acting elements. Plant hormone response elements include auxin-responsive elements and cis-acting elements involved in gibberellin, abscisic acid, and MeJA responsiveness. Cis-acting elements related to stress resistance include those responsive to temperature, drought, and hypoxia, among others. Apart from core promoter elements and common cis-acting elements, light-responsive cis-acting elements are the most numerous, with 2,769; on average, there are about 10 light-responsive cis-acting elements per UGT gene. Resistance-related cis-acting elements are the fewest, with only 984. The promoter structures of the Q. robur UGT genes are shown in Fig. 7a. The Q. robur UGT genes P0057520.2, P0057600.2, P0057620.2, P0057610.2, P0057550.2, P0057640.2, and P0350810.2 have very similar promoters, which contain many cis-acting elements related to light response; these genes belong to group L of the Q. robur UGT family. The numbers of the different types of cis-acting elements in the promoter regions are displayed in Fig. 7b.
Expression Analysis of UGT Genes in Q. robur
Under T. viridana stress (Fig. 8a), UGT genes belonging to the same group showed similar expression patterns in the two Q. robur varieties. For example, in both varieties, expression of group E UGTs decreased and expression of group L UGTs increased under T. viridana stress, whereas expression of group D UGTs differed significantly between the two varieties. In the resistant variety, P0640850.2, P0703800.2, and P0031820.2 in group D were down-regulated. In the susceptible variety, the group D genes P0170550.2, P0703800.2, and P0031820.2 were up-regulated under stress, and P0640850.2 and P0170550.2 were down-regulated.
Under drought stress, UGTs in group D showed different expression trends: P0640850.2 and P0640800.2 were down-regulated, while P0571810.2 was up-regulated. Genes in group D thus show diverse expression patterns under different stress conditions (Fig. 8b).