Identification and genomic distribution of B3 superfamily in citrus
A total of 72 (CsB3) and 69 (CgB3) B3 superfamily TFs were identified in the sweet orange (Citrus sinensis) and pummelo (C. grandis) genomes, respectively (Additional file 1). B3 superfamily members were classified into the LAV, RAV, ARF and REM families, then systematically named according to their sequence similarity. In citrus, REM was the largest B3 family, accounting for 52.8% (38 CsREMs) and 55.1% (38 CgREMs) of the total B3 genes identified in sweet orange and pummelo, respectively (Additional file 1). ARFs constituted the second largest family, with 26.4% (19 CsARFs) and 24.6% (17 CgARFs) of the B3 genes in sweet orange and pummelo, respectively. The LAV and RAV families were much smaller, with 11.1% (8 CsLAVs) and 9.7% (7 CsRAVs) of B3 genes identified in sweet orange, and 11.6% (8 CgLAVs) and 8.7% (6 CgRAVs) of B3 genes identified in pummelo.
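The family proportions quoted above follow directly from the reported gene counts. As a minimal sketch (the dictionary layout and the function name family_shares are illustrative assumptions, not part of the original analysis), they can be recomputed as:

```python
# Family counts as reported in the text (Cs = sweet orange, Cg = pummelo).
counts = {
    "Cs": {"LAV": 8, "RAV": 7, "ARF": 19, "REM": 38},
    "Cg": {"LAV": 8, "RAV": 6, "ARF": 17, "REM": 38},
}


def family_shares(family_counts):
    """Return each family's share of the total as a percentage (1 decimal place)."""
    total = sum(family_counts.values())
    return {fam: round(100 * n / total, 1) for fam, n in family_counts.items()}


for species, fams in counts.items():
    # Print the superfamily total and the per-family percentages.
    print(species, sum(fams.values()), family_shares(fams))
```

The totals (72 and 69) and the rounded percentages can be checked against the values given in the paragraph.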
CsB3 TFs were distributed over eight of the nine sweet orange chromosomes; none of the CsB3 genes was located on chromosome 9 (Fig. 1A). The CsB3 gene density per chromosome was variable, with only three genes (4.2%; CsRAV5, CsARF11 and CsARF17) on chromosome 4, but up to 17 (23.6%) of the 72 members on chromosome 5. Relatively high densities of CsB3 genes were observed at the chromosome ends, with the highest density at the bottom of chromosome 5. It should be noted, however, that the chromosomal locations of 10 CsB3 genes could not be defined because the physical genome map of sweet orange is incomplete. The distribution and density of CgB3 TFs were also not uniform across the nine chromosomes of pummelo (Fig. 1B). Chromosome 8 contained the largest number of CgB3 genes, 19 (27.5%), whereas chromosome 1 contained only three (4.3%).
Orthologous genes of the B3 superfamily in sweet orange and pummelo were not consistently located on the same chromosomes. For example, CsLAV7 was on chromosome 1 of sweet orange (Fig. 1A), whereas its ortholog CgLAV7 was on chromosome 2 of pummelo (Fig. 1B). These different chromosomal locations of B3 TFs between citrus species indicated that genetic recombination has occurred extensively in citrus varieties. Among the identified B3 genes, a total of ten segmental duplication events and four tandem duplication events were identified in the sweet orange genome, whereas in the pummelo genome the corresponding numbers were eleven and nine, respectively (Fig. 1 and Additional file 2), indicating that segmental and tandem duplications may have contributed to the expansion of the citrus B3 superfamily. Segmentally duplicated gene pairs (average Ka/Ks=0.22, where Ka/Ks is the ratio of non-synonymous to synonymous substitution rates) appeared to have undergone more intense purifying selection than tandemly duplicated gene pairs (average Ka/Ks=0.52). The Ka/Ks ratios for the majority (82.4%) of the duplicated pairs were less than 0.5, suggesting that the citrus B3 superfamily has evolved mainly under purifying selection. However, two tandemly duplicated gene pairs (CgREM28-1/CgREM28-2 and CgREM6-1/CgREM29-2) seemed to be under neutral selection, as their Ka/Ks ratios were close to 1.0.
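The interpretation of Ka/Ks used above (well below 1 for purifying selection, close to 1 for neutral evolution, above 1 for positive selection) can be sketched as a small classifier. The function name selection_regime and the tolerance neutral_band are illustrative assumptions; the text does not state how close to 1.0 a ratio had to be to count as neutral.

```python
def selection_regime(ka_ks, neutral_band=0.1):
    """Classify a Ka/Ks ratio.

    neutral_band is an assumed tolerance around 1.0, chosen for illustration only.
    """
    if ka_ks < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if abs(ka_ks - 1.0) <= neutral_band:
        return "neutral"
    return "purifying" if ka_ks < 1.0 else "positive"


# Average ratios reported in the text for the two duplication modes.
for label, ratio in [("segmental", 0.22), ("tandem", 0.52)]:
    print(label, ratio, selection_regime(ratio))
```

Both reported averages fall in the purifying range under this scheme, with the lower segmental value corresponding to the stronger constraint.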
To further explore the relationship of B3 superfamily genes between citrus and other plant species, comparative syntenic analyses were conducted in a pairwise manner (Fig. 2). A total of 37 and 24 collinear B3 gene pairs were identified in the sweet orange/Arabidopsis and sweet orange/rice comparisons, respectively (Additional file 3); for the pummelo/Arabidopsis and pummelo/rice comparisons, the corresponding numbers were 39 and 24. The number of orthologous events of CsB3/CgB3-AtB3 was higher than that of CsB3/CgB3-OsB3, indicating that the divergence between citrus and Arabidopsis occurred after the divergence of rice from the common ancestor of dicotyledons. It was noteworthy that some B3 collinear gene pairs of citrus/Arabidopsis were anchored to highly conserved syntenic blocks containing up to 246 syntenic gene pairs, whereas none of the syntenic blocks of the citrus/rice comparisons contained more than 20 genes (Additional file 3). The high level of syntenic conservation between citrus and Arabidopsis indicated that B3 TFs in citrus might share similar structures and functions with their orthologs in Arabidopsis.
Characterization of B3 proteins in citrus
The amino acid length of the putative citrus B3 proteins varied widely, ranging from 93 to 1134 residues (Additional file 1). A few genes had short coding sequences and showed very low expression levels in all samples studied (RPKM<1 by RNA-Seq; RPKM: reads per kilobase per million mapped reads) (Fig. 3 and 4), indicating that they may be pseudogenes. The molecular weights and theoretical isoelectric points were also diverse (Additional file 1). The majority of B3 TFs contained only one B3 domain, except for some REM family members (Fig. 3D and 4D). A molecular modelling study was then undertaken using the known core structure of the B3 domain crystallized from AtFUS3 (Protein Data Bank code: 6j9b.2; Additional file 4). Our results showed that the modelled sequence had a high degree of sequence identity (88.46%) to the experimentally determined template structure, suggesting that a reliable model was generated. The amino acid sequence alignments showed that the B3 domain sequences were highly conserved in the LAV (overall GUIDANCE alignment score=0.984), RAV (overall GUIDANCE alignment score=0.906) and ARF families (overall GUIDANCE alignment score=0.998) (Additional file 5), whereas the B3 domains of the REM family exhibited a higher degree of divergence (overall GUIDANCE alignment score=0.772) (Additional file 6). A total of 20, 38, and 24 highly conserved amino acid residues were identical among the B3 domains of all the LAV, RAV, and ARF family members, respectively (Additional file 5). For REM family members, only a few conserved amino acid residues, including one proline (position 31, P), two tryptophans (positions 72 and 97, W), three glycines (positions 70, 96 and 109, G) and three phenylalanines (positions 34, 100 and 114, F), were observed in the B3 domains (Additional file 6), which indicated that the B3 domain might have evolved independently in the REM family.
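The RPKM<1 cut-off mentioned above rests on the standard RPKM normalisation, which scales a gene's read count by gene length and library size. A minimal sketch follows; the function name and the numbers in the usage lines are hypothetical and are not taken from this study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return 1e9 * read_count / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene in a 25-million-read library.
value = rpkm(500, 2000, 25_000_000)
print(value, "low expression" if value < 1 else "expressed")
```

Because the measure is normalised by length, a short coding sequence alone does not lower RPKM; the pseudogene suggestion in the text depends on short length and low RPKM occurring together.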
Phylogenetic analyses of B3 genes
To explore the phylogenetic relationships of the B3 superfamily, an unrooted phylogenetic tree was constructed from the B3 genes of citrus (sweet orange and pummelo) and the model plant Arabidopsis (Additional file 7). In most subgroups, internal nodes were supported by confidence values of at least 70%, indicative of good consistency in the topology. The tree is in general agreement with Arabidopsis B3 superfamily trees published previously [1, 4], which further corroborates its reliability. To test the reliability of the tree topology further, protein domain architecture (which was not used in the construction of the tree) was used to provide additional support for the proposed phylogeny. In addition to the B3 domain, other conserved motifs are highly clade specific (Fig. 3D). For example, the ARF and AUX/IAA motifs are specifically shared by the ARF family. The distribution of the CW-type zinc finger motif supports the grouping of CsLAV3/CgLAV3, CsLAV4/CgLAV4 and CsLAV5/CgLAV5 together. The presence of the AP2 domain is also largely clade dependent in the RAV family. The fine structure of the trees is also supported by intron/exon structure data, with a few minor exceptions (Fig. 3C and 4C). For example, all the coding sequences of the ARF genes were disrupted by 2 to 15 introns, while the RAV family members contained no more than one intron, except CgRAV5.
According to the classification criteria in Arabidopsis, we divided the members of the four major families into fourteen subgroups (Fig. 3A and 4A). The LAV family could be subdivided into two subgroups, i.e. the LEC2-ABI3 subgroup (I) and the VAL subgroup (II). Four CsLAVs in sweet orange (CsLAV1, CsLAV2, CsLAV6 and CsLAV8) and their counterparts in pummelo (CgLAV1, CgLAV2, CgLAV6 and CgLAV8) clustered with the Arabidopsis LEC2-ABI3 subgroup. The four citrus LAV genes of the VAL subgroup (CsLAV3/CgLAV3, CsLAV4/CgLAV4, CsLAV5/CgLAV5 and CsLAV7/CgLAV7), which had a conserved B3 domain and a CW-type zinc finger, clustered with three Arabidopsis VAL proteins (Fig. 3 and Additional file 7).
The RAV family was grouped into two main subgroups based on phylogenetic relationships. Subgroup I comprised three citrus RAV genes (CsRAV1/CgRAV1, CsRAV2/CgRAV2 and CsRAV4/CgRAV4) that clustered with four AtNGA genes and three AtRAV-like genes from the same branch (Fig. 3A and Additional file 7). These genes all had the conserved B3 domain and contained no more than one intron (Fig. 3C and 3D). Subgroup II comprised four CsRAV genes (CsRAV3, CsRAV5, CsRAV6 and CsRAV7) and three CgRAV genes (CgRAV3, CgRAV5 and CgRAV6), which featured a B3 domain with an upstream AP2 domain (Fig. 4D) and had no introns, except CgRAV5 (Fig. 3C).
Citrus ARF genes were classified into four major subgroups. Subgroups I and II belonged to the same branch, and contained six members (CsARF1/CgARF1, CsARF3/CgARF3, CsARF5/CgARF5, CsARF11/CgARF11, CsARF17/CgARF17 and CsARF18) and five members (CsARF2/CgARF2, CsARF7/CgARF7, CsARF8/CgARF8, CsARF15/CgARF15 and CsARF16/CgARF16), respectively (Fig. 3A and Additional file 7). Most of these genes were characterized by a B3 DNA-binding domain together with ARF and AUX/IAA domains (Fig. 3D). Subgroup III (CsARF4/CgARF4, CsARF6/CgARF6, CsARF10/CgARF10 and CsARF19) and Subgroup IV (CsARF9/CgARF9 and CsARF12/CgARF12 to CsARF14/CgARF14) had only the B3 and ARF domains.
As most of the REMs in citrus possessed multiple B3 domains and shared low sequence similarity (Fig. 4D and Additional file 6), the phylogenetic analyses were performed within each subgroup of the REM family. The first step of the phylogenetic analysis was a comparison of the AtREM sequences with the CsREM/CgREM sequences according to a previous study (Additional file 7). After this initial analysis, six common REM subgroups (REM I and REM VI to REM X) were identified between citrus and Arabidopsis, whereas REM V (AtREM5) was identified exclusively in Arabidopsis. The vast majority of subgroup I and subgroup II genes contained one B3 domain, and shared homology with the AtREM I and VII type genes, respectively (Fig. 4 and Additional file 7). Subgroup III and IV genes belonged to the AtREM IX and X types, respectively, and possessed only one B3 domain. Subgroup V (AtREM VI) and subgroup VI (AtREM VIII) contained several members, the majority of which had more than one B3 domain.
Expression profiles of B3 genes in different tissues and during somatic embryogenesis
To understand the tissue expression profiles of the B3 genes in citrus, we compared their transcript abundance based on previously published RNA-seq data from different tissues of sweet orange and pummelo, including leaf, fruit, embryogenic callus (EC), flower, ovule and seed (Fig. 3B and 4B). Many citrus B3 genes exhibited high transcript abundance in all tissues examined. However, the LEC2-ABI3 subgroup and two REM classes (REM IX type and REM X type) exhibited relatively low expression levels compared with other CsB3 genes. In addition, some of the B3 TFs exhibited tissue-specific expression. For example, CsLAV1/2/6/7, CsARF9/19 and CsREM3/4/6/7/9/13/14/17/27/28/29 showed the highest transcript abundance in EC, whereas CsREM24 was expressed predominantly in fruit. Some duplicated gene pairs also showed divergent expression profiles. For example, CgARF13 showed a low expression level (RPKM=2.76) in fruit, whereas its duplicate, CgARF14, was highly expressed (RPKM=56.13) in fruit. These results suggest that duplicated genes may evolve to have diverse functions. Some clustered citrus B3 genes, which were identified as orthologs between sweet orange and pummelo, also showed different expression profiles. For example, CgARF17 was mainly expressed in leaf (RPKM=59.06) and ovule (RPKM=57.40) of pummelo, whereas its ortholog in sweet orange (CsARF17) showed relatively low expression in all tissues studied, with RPKM values ranging from 4.16 to 7.57.
To explore the possible involvement of CsB3 genes in citrus somatic embryogenesis (SE), the expression profiles of 23 CsB3 genes were investigated by qRT-PCR in the six SE stages of ‘Valencia’ orange, a citrus variety with strong SE capability. These genes were selected based on either their relatively high transcript abundance in EC (RPKM > 10) or their specific accumulation in EC at a lower expression level (1 < RPKM < 10) according to the RNA-seq data. Based on their expression profiles, these genes could be classified into four types (Fig. 5). The expression of Type I genes was up-regulated during differentiation and peaked at the E2 stage (embryogenic callus induced for somatic embryos for 2 weeks; CsARF1, CsARF14, CsREM17 and CsREM18) or the E4 stage (embryogenic callus induced for somatic embryos for 4 weeks; CsLAV1, CsREM4, CsREM5, CsREM13 and CsREM29), was then down-regulated at the early embryo morphogenesis stage (GE, globular embryos), and showed another peak at the late embryo morphogenesis stage (CE, cotyledon embryos). Type II genes comprised five CsLAVs (CsLAV2, CsLAV3, CsLAV5, CsLAV6 and CsLAV7), one CsRAV (CsRAV3), two CsARFs (CsARF5 and CsARF19) and one CsREM (CsREM27); they were highly and specifically expressed at the CE stage, and some also showed high transcript abundance in one other stage. For Type III genes (CsLAV4, CsARF12 and CsREM6), mRNA abundance was down-regulated during the differentiation stages (E0-E4, embryogenic callus induced for somatic embryos for 0-4 weeks), but was higher at the subsequent stages of embryo morphogenesis (GE or CE). The expression of Type IV genes (CsARF7 and CsREM9) increased progressively throughout the whole SE process.
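The two selection criteria for the qRT-PCR candidates can be written as a simple filter over per-tissue RPKM values. This is only a sketch: the text does not define "specific accumulation in EC", so the code assumes it means that EC has the highest RPKM of all tissues, and the gene names and values below are hypothetical placeholders rather than data from this study.

```python
def is_candidate(rpkm_by_tissue, ec_key="EC"):
    """Apply the two RPKM-based criteria described in the text.

    Assumption: 'EC-specific' is taken to mean that EC has a strictly higher
    RPKM than every other tissue; the paper may have used another definition.
    """
    ec = rpkm_by_tissue[ec_key]
    if ec > 10:  # criterion 1: relatively high abundance in EC
        return True
    others = [v for tissue, v in rpkm_by_tissue.items() if tissue != ec_key]
    ec_specific = all(ec > v for v in others)
    return 1 < ec < 10 and ec_specific  # criterion 2: lower but EC-specific


# Hypothetical expression profiles (RPKM), for illustration only.
profiles = {
    "geneA": {"EC": 35.0, "leaf": 40.0, "fruit": 12.0},  # high in EC
    "geneB": {"EC": 4.5, "leaf": 0.3, "fruit": 0.8},     # low but EC-specific
    "geneC": {"EC": 4.5, "leaf": 9.0, "fruit": 0.8},     # low, not EC-specific
    "geneD": {"EC": 0.4, "leaf": 0.1, "fruit": 0.0},     # below both thresholds
}
selected = [gene for gene, profile in profiles.items() if is_candidate(profile)]
print(selected)
```

With these placeholder profiles, only the first two genes pass the filter.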
A total of 15 CsB3 genes that were preferentially expressed in EC were retrieved from the RNA-seq data, including five CsLAVs (CsLAV1 to CsLAV4 and CsLAV7), two CsARFs (CsARF12 and CsARF19) and eight CsREMs (CsREM4 to CsREM7, CsREM9, CsREM13, CsREM27 and CsREM29) (Fig. 3B, 4B and 6). Among their orthologs, eight (five CgLAVs, CgREM13, CgREM27 and CgREM29-1) were preferentially expressed in the ovules and/or seeds of pummelo (Fig. 6), suggesting that these genes may be associated with embryogenesis in vivo and in vitro. Meanwhile, eight B3 genes were identified in the genome of sweet orange but not in that of pummelo, namely CsRAV7, CsARF18, CsARF19, CsREM24, CsREM25, CsREM33, CsREM37 and CsREM38 (Fig. 6). Among them, CsARF19 (Cs7g02210) showed markedly higher expression levels (≥6-fold) in EC than in the other tissues (Fig. 3B), indicating its potential association with callus initiation, because empirically, EC can only be induced from the seeds of polyembryonic citrus genotypes. With the availability of the citrus genome sequences [46-50], two orthologs of CsARF19, MSYJ162170.1 (amino acid sequence identity of 99.36%) and Ciclev10030751m (amino acid sequence identity of 99.87%), were identified in Mangshan mandarin (C. reticulata, a wild mandarin) and Clementine mandarin (C. clementina, which is believed to be a chance hybrid of mandarin and sweet orange) [48, 50, 51], respectively, but not in Atalantia (Atalantia buxifolia, a primitive citrus), Ichang papeda (C. ichangensis, a wild citrus) or three genera related to citrus, viz. Hong Kong kumquat (Fortunella hindsii), trifoliate orange (Poncirus trifoliata) and citron (C. medica).