Identification of pomelo UGTs
A total of 145 UGTs were identified in pomelo fruit, each containing a consensus sequence (PSPG box) at the C-terminus of the protein. These UGT genes encoded predicted proteins ranging from 144 to 680 amino acids (average 459 amino acids) (Table S1). The molecular weight ranged from 16.39 to 76.87 kDa (average Mw = 51.26 kDa), and the isoelectric point (pI) ranged from 4.82 to 9.18 (average pI = 5.81). Subcellular localization prediction indicated that 88 UGT members (61% of UGTs) were probably located in the cytoplasm, and 35 (24%) and 17 (12%) UGTs were most probably located in the plasma membrane and chloroplast, respectively. Only one UGT (Cg7g000340) was predicted to be located in the mitochondria, two (Cg6g025740 and Cg8g023190) in the nucleus, and two (Cg3g014800 and Cg3g014820) in the extracellular space (Fig. S1, Table S2).
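The localization percentages above follow directly from the reported counts. A minimal sketch of that arithmetic is given below; the variable and function names are illustrative and do not come from the original analysis.

```python
# Predicted subcellular localization counts, as reported in the text.
localization_counts = {
    "cytoplasm": 88,
    "plasma membrane": 35,
    "chloroplast": 17,
    "mitochondria": 1,
    "nucleus": 2,
    "extracellular": 2,
}


def percent_share(counts):
    """Return each compartment's share of the total, in percent."""
    n = sum(counts.values())
    return {name: 100 * count / n for name, count in counts.items()}


total = sum(localization_counts.values())
shares = percent_share(localization_counts)

for name, share in shares.items():
    print(f"{name:16s} {localization_counts[name]:3d}  {share:5.1f}%")
print("total UGTs:", total)
```

The six compartments sum to the 145 UGTs identified, so every gene is assigned to exactly one predicted location.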
Phylogenetic analysis of pomelo UGTs
To explore the evolutionary relationships of plant UGT families, a phylogenetic tree was constructed from the UGT protein sequences of pomelo and other plants, including Arabidopsis, citrus, maize, tomato, grapevine, peach, apple, kiwifruit and strawberry (Fig. 1). All UGT members were divided into 16 phylogenetic groups, including 14 conserved groups (A-N) identified in Arabidopsis, and two newly identified groups, O and P, found in other plants such as grapevine. Cm1_2RhaT (100% amino acid sequence identity to Cg1g023820) from pomelo and Cs1,6RhaT from sweet orange, both identified as flavonoid 7-O-UGTs [13, 15, 16], clustered in group A. CsUGT76F1, located in group H, was identified as being involved in the biosynthesis of flavonoid 7-O-glucosides and 7-O-rhamnosides in sweet orange. Arabidopsis UGT73C6 (flavonol-3-O-rhamnoside-7-O-glucosyltransferase) and strawberry FaGT7 (flavonol-3-O-glucosyltransferase) were located in group D. Other UGTs responsible for flavonol-3-O-glycosylation were located in group F, including UGT78D1 from Arabidopsis, and VvGT5 and VvGT6 from grapevine.
Three putative terpenoid UGTs were isolated from ‘Valencia’ sweet orange: CsUGT1 and CsUGT3 clustered in group L, while CsUGT2 clustered in group D. Several UGT73 (group D) and UGT71 (group E) family members function in the biosynthesis of anthocyanins and the glycosylation of volatile metabolites, including terpenoids [23, 24]. Some UGTs in group G have also been functionally characterized as participating in terpenoid glycosylation and affecting fruit flavor, such as kiwifruit AdGT4, grapevine VvGT14, and peach PpUGT85A2.
Distribution of plant UGTs in phylogenetic groups
The evolutionary pattern of the plant UGT gene family was analyzed by comparing the distribution of UGTs among the phylogenetic groups (Table 1). During the evolution of higher plants, five phylogenetic groups (A, D, E, G, and L) appeared to expand more than the others, although the number of genes in these groups varies widely among species. In pomelo, six phylogenetic groups (A, D, E, H, I, and L) expanded more than the other groups. There were only 9 pomelo UGTs in group G, more than in Arabidopsis (6 UGTs) but far fewer than in other plants, especially peach and apple, which had up to 34 and 40 UGTs, respectively, in group G. Pomelo UGTs in group I accounted for 12% of the total pomelo UGTs, a much higher proportion than in other plants (Fig. 2A). The proportion of pomelo UGTs in group H (about 12% of the total pomelo UGTs) was also much higher (1.5- to 12-fold) than in other fruits such as peach (Prunus persica), apple (Malus x domestica) and grapevine (Vitis vinifera) (Fig. 2A).
Notably, when the number of UGTs in each phylogenetic group was compared with that reported in Arabidopsis, the numbers in groups I and M were substantially higher in the other plants (Fig. 2B). Arabidopsis had only one UGT member in group I, while the other plants contained 5-17 members, with pomelo having the most (Table 1; Fig. 2B). In group M, the number of UGTs ranged from one in Arabidopsis to 14 in peach, with a 7-fold difference between Arabidopsis and pomelo. In contrast, the number of pomelo UGTs in some groups was reduced relative to Arabidopsis, including groups C, F, H and L.
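The comparison with Arabidopsis amounts to dividing each species' group count by the Arabidopsis count for the same group. The sketch below uses only the counts quoted in the text (the full data are in Table 1, which is not reproduced here); the function name `fold_vs_reference` is hypothetical.

```python
# UGT counts per phylogenetic group, limited to values quoted in the text.
# Peach and apple group G values are the reported upper figures ("up to").
group_counts = {
    "G": {"Arabidopsis": 6, "pomelo": 9, "peach": 34, "apple": 40},
    "I": {"Arabidopsis": 1, "pomelo": 17},
    "M": {"Arabidopsis": 1, "pomelo": 7, "peach": 14},
}


def fold_vs_reference(counts, reference="Arabidopsis"):
    """Return each species' count divided by the reference species' count."""
    ref = counts[reference]
    return {sp: n / ref for sp, n in counts.items() if sp != reference}


fold = {group: fold_vs_reference(c) for group, c in group_counts.items()}

for group, ratios in fold.items():
    for species, ratio in ratios.items():
        print(f"group {group}: {species} / Arabidopsis = {ratio:.1f}-fold")
```

With these inputs, pomelo group M comes out at 7-fold, matching the difference stated in the text.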
Chromosomal location of UGT genes in pomelo
To summarize the genomic distribution of pomelo UGT genes, their chromosomal locations were investigated based on the genome annotation information retrieved from the pomelo genomic databases (Fig. 3; Table S3). A total of 139 UGT genes were unevenly distributed across the 9 chromosomes of the pomelo genome. The remaining 6 UGT genes were localized on the unknown chromosome (chrUn), including CgUng002730 of group K, CgUng021570 of group I, and four UGT genes of group D. Chromosome 2 contained the most UGT genes (23 members), followed by chromosome 8 (21 members) and chromosome 6 (20 members). Only five members were located on chromosome 4, the fewest of any chromosome.
Because pomelo UGTs could be divided into 16 groups, the localization of these groups on the chromosomes was also examined (Fig. 3). The UGT genes of group E, the group with the most members (25 genes), were randomly distributed across eight chromosomes (chromosomes 1-7 and 9). For group I, chromosomes 3 and 5 each contained six UGT genes, chromosome 6 had three UGT genes, and the remaining two members were located on chromosome 8 and the unknown chromosome. Group M contained 7 UGT genes, five of which were located on chromosome 2 and two on chromosome 1.
Structural analysis of UGT genes in pomelo
To better explore the relationship between the structure and function of pomelo UGT genes, and to further clarify the evolutionary relationships within the UGT gene family, the exon/intron structure was analyzed. Among the 145 pomelo UGT genes, 70 UGTs (48%) had no introns and 63 UGTs (43%) contained one intron. Of the remainder, eight UGTs contained two introns, three UGTs contained three introns, and only one UGT contained eight introns (Table 2; Fig. S2). Among the UGT groups, group E contained the largest number of intronless genes (22 members), followed by 15 in group D and 13 in group A. All members of groups B, C and O had no introns. Most of the UGTs in groups H and I contained one intron, with 14 of the 17 members (82%) in each group.
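The intron distribution above can be cross-checked against the totals of 75 intron-containing genes and 96 introns quoted elsewhere in this section. The following is a minimal sketch of that tally; the dictionary name is illustrative.

```python
# Number of pomelo UGT genes with a given number of introns (from the text).
genes_by_intron_count = {0: 70, 1: 63, 2: 8, 3: 3, 8: 1}

# Total number of genes in the family.
total_genes = sum(genes_by_intron_count.values())

# Genes that contain at least one intron.
intron_containing = sum(
    n_genes for n_introns, n_genes in genes_by_intron_count.items() if n_introns > 0
)

# Total introns = sum over classes of (introns per gene x genes in class).
total_introns = sum(
    n_introns * n_genes for n_introns, n_genes in genes_by_intron_count.items()
)

print(total_genes, intron_containing, total_introns)
```

The three values agree with the figures given in the text: 145 genes, 75 of them with introns, and 96 introns overall.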
After all 75 intron-containing sequences were examined and their introns mapped onto the amino acid sequences, 10 independent intron insertion events were observed among the pomelo UGT gene family members (Fig. 4). Based on their positions in the protein sequences, these insertion events were numbered sequentially from I-1 to I-10. Intron 5 (I-5) was the most highly conserved and most widespread intron, present in 48 UGT members (64% of the intron-containing UGTs), but absent from groups E, L and M. All members of groups F, K, J, N and P, and most members of groups G, H and I, contained intron 5: eight of the nine UGTs in group G, 13 of 17 in group I, and 12 of 17 in group H. Intron 6 was mainly observed in group L.
Of the 96 introns identified in the UGT gene structures of pomelo, most were in phase 1 (61 introns, 64%), followed by phase 0 (27 introns, 28%) and phase 2 (8 introns, 8%) (Fig. 4). For the highly conserved intron 5, only one intron each was in phase 0 and phase 2, while phase 1 accounted for 96% of these introns. All instances of intron 4, and 7 of the 9 UGTs with intron 6, were in phase 0. These findings indicated that most of the highly conserved introns were ancient elements whose phases remained stable during evolution.
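Intron phase describes where an intron interrupts the reading frame: phase 0 falls between two codons, phase 1 after the first base of a codon, and phase 2 after the second. A minimal sketch of how phases can be derived from coding-segment lengths is shown below. It assumes the first segment starts at the start codon; the function name and the example lengths are hypothetical and are not taken from the original analysis.

```python
from itertools import accumulate


def intron_phases(cds_segment_lengths):
    """Return the phase (0, 1 or 2) of each intron in a gene.

    cds_segment_lengths lists the coding length of each exon in
    transcript order; an intron follows every segment except the last.
    The phase is the number of coding bases upstream of the intron, modulo 3.
    """
    upstream_lengths = accumulate(cds_segment_lengths[:-1])
    return [length % 3 for length in upstream_lengths]


# Hypothetical two-intron gene (segment lengths are illustrative only).
example_phases = intron_phases([301, 95, 1040])
print(example_phases)

# Phase tallies as reported in the text, converted to rounded percentages.
phase_counts = {0: 27, 1: 61, 2: 8}
total_introns = sum(phase_counts.values())
phase_percent = {
    phase: round(100 * n / total_introns) for phase, n in phase_counts.items()
}
print(total_introns, phase_percent)
```

A gene without introns has a single coding segment, so the function returns an empty list for it.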
Expression profiles in different fruit tissues during development and ripening
To profile the expression of the 145 pomelo UGT genes, transcript abundances of UGTs in different fruit tissues during development and ripening were analyzed using RNA-seq data (Fig. 5; Fig. 6). The four pomelo fruit tissues examined were flavedo, albedo, segment membrane (SM) and juice sacs (JS) (Fig. 5A, 5B). A total of 111 UGT genes (accounting for 84.1% of total pomelo UGTs) were expressed in all four fruit tissues. Additionally, four UGT genes (3%), three UGT genes (2.3%), and one UGT gene (0.8%) were specifically expressed in JS, SM, and flavedo, respectively, whereas no genes were specifically expressed in albedo (Fig. 5C).
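The percentages reported for shared and tissue-specific expression can be cross-checked by asking which denominator reproduces them. The sketch below assumes, which the text does not state, that all four percentages share a single denominator; the function name is hypothetical.

```python
# Gene counts and their reported percentages (from the text).
# "3%" is treated as 3.0 when compared at one decimal place.
reported = {111: 84.1, 4: 3.0, 3: 2.3, 1: 0.8}


def consistent_denominators(count_to_percent, candidates):
    """Return denominators for which every count rounds to its reported percentage."""
    matches = []
    for denominator in candidates:
        if all(
            round(100 * count / denominator, 1) == percent
            for count, percent in count_to_percent.items()
        ):
            matches.append(denominator)
    return matches


# Search from the largest count up to the full family size of 145 genes.
matches = consistent_denominators(reported, range(111, 146))
print(matches)
```

Under that assumption, the only matching denominator in this range is 132 rather than 145, which may indicate that the percentages refer to a subset of the family, for example genes with detectable expression in at least one tissue.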
For pomelo fruits at different developmental stages, nearly half (71 members) of the UGT genes showed their highest transcript level in the flavedo (Fig. 6). Among them were 13 UGTs from group E (52% of the members of that group), 10 from group D (56%), 8 from group H (47%), 9 from group I (53%), and 7 from group L (58%). The expression levels of 29 of these members were highest at the green stage (80 DAB), followed by 23 at the color break stage (140 DAB) and 19 at the mature stage (200 DAB). Cg1g023820 in group A, which has 100% amino acid sequence identity with Cm1_2RhaT from pomelo, a protein identified as a flavonoid 7-O-UGT [13, 15, 16], showed its highest transcript levels in flavedo at the color break stage (140 DAB) and mature stage (200 DAB). This finding was consistent with previous research.
Because group E contained the largest number of pomelo UGT genes, its expression patterns in different fruit tissues during development and ripening were analyzed further (Fig. 6). In flavedo, nine UGT genes showed the highest transcript abundance at the green stage (80 DAB), two at the color break stage, and two at the mature stage. Only one UGT gene, Cg2g037190, was most highly expressed in albedo, at the color break and mature stages, while two genes, Cg3g022390 and Cg6g012170, had their highest expression in the segment membrane (SM), and three genes were predominantly expressed in the juice sacs (JS).
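Assigning each gene to the tissue and stage where its transcript level peaks is a simple maximum over its expression profile. The sketch below illustrates the idea on made-up values; the gene names, the numbers, and the function `peak_condition` are all hypothetical and are not taken from the RNA-seq data.

```python
from collections import Counter

# Hypothetical expression values keyed by (tissue, days after bloom).
# These numbers are illustrative only, not measured data.
expression = {
    "geneA": {("flavedo", 80): 12.0, ("flavedo", 140): 30.5, ("albedo", 80): 4.1},
    "geneB": {("flavedo", 80): 55.2, ("SM", 140): 8.7, ("JS", 200): 1.3},
    "geneC": {("flavedo", 200): 2.0, ("albedo", 140): 3.4, ("JS", 200): 19.8},
}


def peak_condition(profile):
    """Return the (tissue, stage) key with the highest expression value."""
    return max(profile, key=profile.get)


# Peak tissue and stage for every gene.
peaks = {gene: peak_condition(profile) for gene, profile in expression.items()}

# Number of genes whose peak falls in each tissue.
tissue_tally = Counter(tissue for tissue, _stage in peaks.values())

for gene, (tissue, stage) in peaks.items():
    print(f"{gene}: highest in {tissue} at {stage} DAB")
print(dict(tissue_tally))
```

Tallying the peaks by tissue or by stage gives summaries of the same form as the per-tissue and per-stage counts reported above.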