B. napus contains a large number of bHLHs
To identify bHLH encoding genes in the B. napus genome (Darmor–bzh), a preliminary repeated BLASTP search (e values of < 1.0) was performed using the representative sequences of Arabidopsis bHLH proteins as queries (Additional file 1: Table S1). To ensure the integrity of the bHLH protein data in B. napus, we referred to the method of Guo et al. [29] and searched the two sequenced B. napus cultivar genomes in GENOSCOPE (Darmor–bzh, http://www.genoscope.cns.fr/brassicanapus/)[30] and the NCBI database (ZS11, https://www. ncbi.nlm.nih.gov/genome/annotationeuk/Brassica_napus/101/)[31]. The sequence information of the candidate genes in these two cultivars were manually compared and corrected.
Originally, a large number of deduced amino acid sequences containing bHLH domains were obtained. Subsequently, redundant sequences were discarded, and the bHLH domains were verified in the remaining sequences by ExPASy. Consequently, 11 genes were excluded from our dataset as no bHLH domain was identified by ExPASy analysis. Meanwhile, the sequence information of 65 BnabHLHs from Darmor–bzh were corrected by the data from the ZS11 genome (Additional file 1: Table S1). Finally, a total of 602 BnabHLHs with relatively complete open reading frames [32] were obtained in this study, accounting for approximately 0.60% of the B. napus protein-coding genes. The corresponding proportion in wheat, rice, maize, and Arabidopsis are 0.55, 0.47, 0.59, and 0.61%, respectively [16]. The candidate BnabHLHs were then named according to their chromosomal distribution (Additional file 1: Table S1). Analysis of BnabHLH physicochemical properties showed that the BnabHLH proteins varied in length from 63 to 1440 aa; their molecular weight ranged from 6.9 (BnabHLH105) to 165 kDa (BnabHLH381); and their isoelectric points ranged from 4.36 (BnabHLH562) to 11.79 (BnabHLH023). Subcellular localization analysis demonstrated that all BnabHLHs were located in the nucleus (Additional file 1: Table S1).
For further comparative analysis across different species, we identified 245 bHLHs in the B. oleracea genome by the same method (Additional file 2: Table S2). Sequence information of the bHLHs in other species like Arabidopsis, B. rapa, tomato, potato, and rice were obtained from previously published studies [8-10, 16, 33].
Sequences characteristics of the bHLH domains of BnabHLH proteins
To investigate the aa sequence features, we performed multiple sequence alignment analysis of the 602 bHLH domains of candidate BnabHLHs. The results were visualized using Weblogo online software.
Our results showed that the length of the BnabHLH domains was approximately 55 aa, ranging from 39–57 aa. The bHLH domain was generally conserved in this gene family in B. napus, with ten residues being identified as having a conservation of more than 70% in the bHLH domains (Fig. 1a), including four located in the basic region, five in the two helix regions, and one in the loop region. Consistent with other studies [7-10], Leu-25 was the most conserved residue, with a conservation of almost 100% (Additional file 3: Table S3), indicating its essential role in bHLH proteins. Interestingly, Phe-30 was partly substituted by Ser in Arabidopsis, rice, and tomato, among others, however, no such substitution was observed in B. napus, B. oleracea, or B. rapa, (Additional file 3: Table S3), suggesting a higher conservation and/or close relationship in these three species.
To further characterize the BnabHLH sequence features, the criterion given by Massari and Murre [34] was used (Fig. 1). Our results showed that the 602 BnabHLHs were separated into two major categories according to their bHLH domain sequence profiles: 132 (21.93%) atypical BnabHLHs (non-DNA-binding proteins) and 470 (78.07%) typical BnabHLHs (Fig. 1). The latter category was further divided into three categories, including 300 (49.83%) G-box binding proteins, 103 (17.11%) E-box binding proteins, and 67 (11.13%) non-E-box binding proteins (Fig. 1). The sequences of the 132 atypical BnabHLHs were divergent in their basic region but were relatively conserved in their HLH region, especially in the loop region (Fig. 1d). A similar situation was found for the non-E-box binding proteins (Fig. 1c), suggesting its close relationship to the atypical BnabHLHs. In contrast, the residues in the basic region of the E-box/G-box DNA binding proteins were more conserved than the residues in the HLH region (Fig. 1b, c).
Protein structures of BnabHLHs were conserved in each subfamily
To determine the evolutionary relationship between the BnabHLHs from the Brassicease species, we constructed an NJ phylogenetic tree on the basis of the alignment of 772 bHLH domains from B. napus (602 and Arabidopsis (170).
The 772 bHLH proteins were divided into 35 subfamilies, which is the largest number of subfamilies found to date (Fig. 2a). Among these subfamilies, 28 were found previously [7], with seven (S33-S39) being newly identified in this study. Two previously reported subfamilies (S6 and S8) were not found in this study as they were only found in lower plants (moss and algae) [7]. Compared with the division of AtbHLHs, the S5 subfamily in B. napus was divided into S5 and S33; S17 was divided into S17 and S34; S21 was divided into S21 and S35; S24 was divided into S24, S36, and S37; S30 was divided into S30 and S38; whereas the orthologs of S39 in Arabidopsis were previously defined as orphan genes [7], which were defined as a new subfamily in this study. The distributions of BnabHLHs in the 35 subfamilies was biased, varying from two (S22 and S38) to 62 genes (S25). In addition, the BnabHLHs of different DNA binding types had a biased distribution tendency among different subfamilies as well, but the BnabHLHs in a given subfamily usually shared the same DNA binding type (Fig. 2b). A total of 11 subfamilies (S2, S3, S5, S7, S10, S11, S13, S14, S24, S25, and S26) contained G-box binding proteins; five subfamilies (S1, S9, S17, S27, and S37) contained E-box-binding proteins; three subfamilies (S20, S23, and S39) contained non-E-box-binding proteins; while seven subfamilies (S16, S21, S22, S33, S34, S35, and S38) contained non-DNA-binding proteins (Fig. 2b).
We subsequently used the MEME tool to discover the non-bHLH domains and explore their distribution patterns within each subfamily. A total of 27 conserved motifs of variable lengths (8–103 aa) was obtained from this analysis (Fig. 2c Additional file 4: Table S4). Motifs 1 and 2 were distributed in all BnabHLHs and consisted of the basic and two helix regions of the bHLH domains, respectively. The loop region was located between motif 1 and 2, indicating that this region was more variable than the basic and helix regions. Outside the bHLH domain, members of the same subfamily generally shared several of the same motifs. For example, all BnabHLHs in subfamily S9 contained motif 22; proteins in subfamily S26 all contained motif 5 (Fig 2b, Additional file 4: Table S4). Moreover, some motifs have been previously characterized and were defined as additional functional motifs. For instance, motifs 4, 8, and 20 were detected in many proteins within subfamilies S2 and S5 in various species, such as TabHLH239, AtMYC2, TabHLH184, and ZmbHLH103, which were significantly matched with an ACT domain that contributed to the recruitment of the C1 R2R3-MYB factor to the C1 binding sites located in the promoters of flavonoid biosynthetic genes [35]. Meanwhile, motif 6 in these two subfamilies was also conserved, and overlapped with the MIR and MYC_N domains that interact with jasmonate ZIM-domains (JAZs) [36]. Some motifs were subfamily-specific, however their functions are still unclear (Fig. 2c).
Intron insertion patterns of BnabHLHs were conserved within each subfamily
The intron and exon structure is an important clue to understand the gene evolutionary relationship and functional diversification within a gene family [37]. The intron and exon patterns of candidate BnabHLHs are determined by comparing their full length CDS and DNA sequences using the GSDS web server [38].
A total of eight intron insertion patterns (pattern a to k) were observed in the bHLH domains in B. napus, containing 0 to 2 intron insertion sites (Fig. 3). The nomenclature of the intron insertion patterns of BnabHLHs is referred to in the study by Carretero-Paulet et al. [7]. In this study, the previously defined pattern a and b (Carretero-Paulet et al., 2010) were characterized as the same type because they share the same insertion sites and phase, and therefore are uniformly named as pattern a. Similarly, pattern d and f (Carretero-Paulet et al., 2010) were uniformly named as f. The intron insertion positions were distributed across the basic and/or HLH regions in the bHLH domain. Among these insertions, those observed in the basic and loop regions were more conserved, while those in the helix regions were variable across different patterns. The intron insertion sites in the basic and helix regions were located at three highly conserved residues, Arg-11 (the E-box recognition site), Phe-21, and Lys-33 (Fig. 3). Furthermore, the intron insertion sites of most patterns were conserved, except pattern j (Fig. 3). Patterns a, c, e, and f were similar, thus they are likely to be homologous, whereas pattern f lacks the first intron as compared with pattern a, pattern e lacks the second intron, and pattern c has the second intron inserted at L-50 as compared with pattern e. A similar situation was observed in patterns h and i. Meanwhile, phylogenetic analyses showed a close relationship between intron insertion patterns a, c, e, and f, further confirming their close relationship. Moreover, patterns k and i appeared to be the ancestral types because they existed in members from algae [7]. Patterns a, f, and k were the three types that accounted for the majority of BnabHLHs (41.2%, 26.2%, and 20.1%, respectively). This trend is similar to the results in other species, such as Arabidopsis, rice, potato, poplar, and tomato [8-10, 16, 33]. Accordingly, these three patterns were obtained for many subfamilies while the remaining patterns (i.e., patterns c, e, h, i, and j) existed in only one or two subfamilies, indicating a different expansion trend.
The distribution of intron insertion patterns are generally conserved within most subfamilies. For example, members in subfamilies S12, S10, S11, S7, S9, S5, S33, S2, S1, S13, S23, and S38 contained pattern f, except for several genes that may have been absent due to incomplete genomic annotation information (Fig. 2b). The conservation of intron insertion patterns of BnabHLHs within each subfamily provides an independent criterion for the reliability of our phylogenetic analyses (Fig. 2b). Interestingly, the intron insertion patterns of BnabHLHs was almost the same as their orthologs in Arabidopsis. The only exception was in the S27 subfamily which contains pattern a for B. napus members while their homologs in Arabidopsis is pattern f [7]. We further compared these results to other species, e.g. rice [7], and found that it should be pattern a for the homologs in this subfamily, including Arabidopsis homologs (At080, At081, At122, At128, At129, and At130).
Overall, intron insertion patterns in BnabHLHs were conserved within most subfamilies and corresponded to AtbHLH orthologs as well. Moreover, the intron insertion sites in the basic and loop regions were more conserved than those in the helix regions.
Syntenic analyses revealed duplication events and the expansion mechanism of BnabHLHs
In this study, up to 602 BnabHLHs were identified, which is significantly higher than the gene number in lower plants, like Volvox carteri which only has three [6]. This indicates that there was a large-scale expansion of this gene family that occurred during plant evolution. To explore the expansion mechanism of this gene family in B. napus, the chromosomal locations and syntenic relationships of BnabHLHs were analyzed based on the genome information from Genoscope and CoGe databases [39].
Chromosomal location analysis showed that there are 294 and 306 BnabHLHs in An- and Cn-subgenomes, respectively, indicating no biased tendency between these two sub-genomes (Fig. 4a). The BnabHLHs are distributed on all of the 19 chromosomes. The An-subgenome has an average of 27.4 BnabHLHs on its 10 chromosomes with A10 having a minimum of 14 genes and A09 having a maximum of 41 genes. The average number of BnabHLHs in the Cn-subgenome was 29.9, with C06 containing a minimum of 19 genes and C03 having as many as 48 genes. Thus, the genes on each chromosome are uneven within both subgenomes.
Among the 602 BnabHLHs, 475 genes have syntenic relationships; 382 of which were inherited from B. rapa or B. oleracea genomes (Additional file 5: Table S5). In contrast, only 79 genes (7.8%) of 79 syntenic pairs originated from segmental duplication in the B. napus genome, and 34 genes (5.8%) from 28 syntenic pairs were from tandem duplications (Fig. 4b). These results provide an outstanding example of genome-wide allopolyploidization between B. rapa and B. oleracea that is the main contributor to the large BnabHLH expansion in B. napus. Moreover, we found that the genome-wide duplicated genes make up the largest number of BnabHLHs in the majority of subfamilies. However, segmental duplicated genes make up the biggest number BnabHLHs in subfamilies S10 and S25 (nine genes per subfamily), and the tandem duplicated genes have the largest number in the S12 subfamily (11 genes). Furthermore, the BnabHLHs with intron pattern f expanded most in B. napus (31 duplicates), contributing to the largest proportion of BnabHLHs (Additional file 1: Table S1).
Taken together, the main expansion mechanism of BnabHLHs is whole-genome duplication (allopolyploidization), while segmental and tandem duplication events preferentially occurred in certain subfamilies with specific intron patterns.
Expression profiles of BnabHLHs varied widely and were conserved within each subfamily
Gene expression patterns under different conditions can often give an indication of gene function. In order to explore the possible functions of the BnabHLHs, the temporal and spatial transcriptomes of the 602 BnabHLHs in 50 B. napus tissues of the root, leaf, flower, and seed at different developmental stages were characterized using RNA-seq (BioProject ID: PRJNA358784).
A total of 47 BnabHLHs (7.79%) were excluded from our analysis with FPKM < 1, which may be pseudogenes or expressed only at specific developmental stages or under special conditions. The remaining genes (555 genes) have relatively high confidence expression levels (FPKM ≥ 1), the majority of which showed preferential expression in one or a few tissues/organs. Few genes were constitutively expressed in all tissues or organs tested, suggesting that this gene family tends to play regulatory roles at specific developmental stages or tissues. The wide expression profile of BnabHLHs suggests that they possess diverse roles in B. napus. The expression patterns of BnabHLHs were summarized into seven main blocks (I to VII) (Fig. 5). The BnabHLHs in blocks I to VII show obvious tissue-specific expression patterns: the genes in block I were highly expressed in seed coats; genes in block II were highly expressed in the root and stem tissues; genes in block III were mainly expressed in the vegetative organs e.g., root, stem, leaf and silique pericarp; the genes in block IV were primarily expressed in organs at the seedling stage, such as in the germinating seed (24 to 72 h), root, and hypocotyl; the genes in block V were mainly expressed in the pistil, silique pericarp, seed coat, and seed tissues; the genes in block VI had higher expression levels in the pistil and inflorescence tip tissues; whereas the genes in block VII were mainly expressed in flower tissues including capillament, petal, and stamen. Overall, there were 153, 136, 129, 137, 149 and 145 BnabHLHs having relatively high expression levels in the stem, hypocotyl leaf, silique pericarp, flower and seed respectively, while up to 301 genes were highly expressed in the roots, indicating that BnabHLHs may possess some previously unknown functions in the roots.
Typically, members of a given subfamily generally exhibit the same/similar expression profile. For example, members of the S2 subfamily were mainly expressed in the root and leaf at seedling, budding, and flowering stages (Additional file 6: Table S6). Moreover, the bHLHs in the same subfamily likely process the same or similar expression profiles across different species, and thus may share conserved functions during evolution. For example, AtbHLH155/CPU and AtbHLH156/LHW in the S23 subfamily play an essential role in establishing vascular cells and the size of vascular initial population in root meristem [40]. The corresponding homologs in B. napus were also expressed in roots (Additional file 6: Table S6); their expression being relatively low in roots after germinating for 24 to 72 h but high in mature roots, which corresponds to their possible function in root vascular cells. Moreover, the expression profile of 28 pairs of tandem duplication genes, and 79 pairs of segmental duplication genes show similar expression patterns, indicating the functional redundancy of the duplicates (Additional file 5 and 6: Table S5 and S6).
Overall, the BnabHLHs were widely expressed but at different levels within B. napus. This was especially the case in the root, which provides an important clue as to the possible roles of this gene family. Moreover, BnabHLHs from a particular subfamily tend to possess conserved expression patterns and high structural similarity across different species, indicating a possible conservation of function during evolution.
Many BnabHLHs were induced in hormone-treated roots
As discussed above, several BnabHLHs were highly expressed in the roots of B. napus, suggesting that they play a role(s) in some root-related biological process(es). To further explore their functional characteristics in roots, a comprehensive expression analysis of candidate BnabHLHs in roots treated with five hormones (IAA, auxin; GA3, gibberellin; 6-BA, cytokinin; ABA, abscisic acid and ACC, ethylene) was performed, based on previously acquired RNA-seq data (BioProject ID PRJNA358784).
Our results showed that many BnabHLHs (221, 36.7%) responded to more than one hormone treatment in roots, and most of these genes were clustered in ten subfamilies (S2, S16, S18, S19, S20, S21, S25, S27, S35, and S39) taking up over 50% of the genes in each subfamily (Additional file 7: Table S7). Meanwhile, the genes in 14 subfamilies (S1, S4, S5, S7, S9, S10, S12, S13, S14, S15, S24, S26, S34, and S38) were partly induced by the five hormone treatments. Interestingly, BnabHLHs in subfamilies S16, S18, and S21 were upregulated in roots under all five hormone treatments, while BnabHLHs in subfamilies S7, S23, and S34 were all downregulated. Notably, BnabHLHs in S12, which had low or no expression in roots (Fig. 5) were highly expressed after the five hormone treatments (Additional file 7: Table S7). Conversely, some genes that were normally highly expressed in roots were unresponsive to the hormone treatments. Indeed, BnabHLHs in S3, S11, S17, S22, S23, S28, S30, S31, S33, S36, and S37 had hardly any response to the hormone treatments in roots (Additional file 7: Table S7). Overall, most BnabHLH genes responded to hormone treatments in roots, indicating that they have important roles in hormone response in B. napus roots.
To further verify the RNA-Seq results, five BnabHLHs that had high expression levels in roots (Additional file 6: Table S6) and obviously responded to hormone inductions (Additional file 7: Table S7) were selected to analyze their expression profiles under hormone inductions by qRT-PCR. Among them, three genes (BnabHLH033, BnabHLH041, and BnabHLH269) are orthologs of the Arabidopsis ILR3 gene which was demonstrated to participate in auxin-conjugate metabolism [41], and two genes (BnabHLH453 and BnabHLH126) are orthologs of the Arabidopsis MYC2 gene which is involved in ABA, JA, and light signaling pathways. As shown in Fig. 6, our results by qRT-PCR were similar to that observed in the RNA-seq analyses. These five BnabHLHs were positively induced by all five hormone treatments, with BnabHLH126 generally having a higher expression level than the others. Moreover, the genes from the same clade show similar expression patterns under certain hormone treatments. For example, BnabHLH033, BnabHLH041, and BnabHLH269 from the ILR3 clade in the S4 subfamily have similar expression patterns under IAA, ACC, and ABA treatments (Fig. 6a–c). However, their expression profiles are different under GA3 and 6-BA inductions, where BnabHLH033 was significantly downregulated while BnabHLH041 and BnabHLH269 were significantly upregulated (Fig. 6d–e). Similarly, the expression of BnabHLH453 and BnabHLH126 from the MYC2 clade in the S2 subfamily showed similar expression patterns under the five hormone treatments (Fig. 6). In addition, cis-acting elements analysis revealed that the promoter regions of these five BnabHLHs contain more than one cis-acting element that is related to hormone response (Additional file 8: Table S8). This further supports our above results.
Together, our expression profile analyses revealed that a large proportion of BnabHLHs were induced by more than one hormone treatment in roots, indicating that this gene family may be important in root development in B. napus. Moreover, our qRT-PCR experiments confirmed the hormone-induced expression characteristics of BnabHLH033, BnabHLH041, BnabHLH269, BnabHLH453, and BnabHLH126 which provides a valuable foundation for future functional research.