A large number of bHLHs were identified in Brassica napus
To identify bHLH encoding genes in B. napus genome (Darmor–bzh), a preliminary repeated BLASTP search (e values of < 1.0) is performed using the representative sequences of Arabidopsis bHLH proteins as queries (Additional file 1: Table S1). To ensure the integrity of the bHLH protein data in B. napus, we refer to the method of Guo et al. [32] and searched the two sequenced B. napus cultivar genomes in GENOSCOPE (Darmor–bzh, http://www.genoscope.cns.fr/ brassicanapus/) and NCBI database (ZS11, https://www. ncbi.nlm.nih.gov/genome/ annotationeuk/ Brassica_napus/101/). The sequence information of candidate genes in these two cultivars are manually compared and corrected.
Originally, a large number of deduced amino acid sequences containing bHLH domains are obtained. Then, redundant sequences are discarded, and the bHLH domains are verified in the remaining sequences by ExPASy. Consequently, 11 genes are excluded from our dataset as no bHLH domain is identified by ExPASy analysis. Meanwhile, the sequence information of 65 BnbHLHs from Darmor–bzh are corrected by the data from ZS11 genome (Additional file 1: Table S1). Finally, a total of 602 BnbHLHs with relatively complete open reading frame (ORF) are obtained in this study, account for approximately 0.60% of the B. napus protein-coding genes. The corresponding proportion in wheat, rice, maize and Arabidopsis are 0.55, 0.47, 0.59, and 0.61% respectively [13]. The candidate BnbHLHs are then named according to their chromosomal distribution orders (Additional file 1: Table S1). Physicochemical property analysis showed that the BnbHLH proteins (BnbHLHs) varied in length from 63 to 1440 amino acids (aa); their molecular weight ranged from 6.9 (BnbHLH105) to 165 kDa (BnbHLH381); and the isoelectric points are from 4.36 (BnbHLH562) to 11.79 (BnbHLH023). Subcellular localization analysis demonstrates that all BnbHLHs are located in nucleus (Additional file 1: Table S1).
For further comparative analysis across different species, we identified 255 bHLHs in B. oleracea genome by the same method (Additional file 2: Table S2). Sequence information of the bHLHs in other species like Arabidopsis, Brassica rapa, tomato, potato and rice are obtained from published researches [8-11, 13].
Sequences characterics of the bHLH domains of BnbHLH proteins
To investigate the sequence features, we perform multiple alignment analysis of the 602 bHLH domains of candidate BnbHLHs. The result is visualized using Weblogo online software.
Our result shows that the length of the bHLH domains of BnHLHs is approximately 55 amino acids, ranging from 39–57 aa. The bHLH domain is generally conserved in this gene family in B. napus, in which two typical conserved regions are included, namely the basic and HLH regions. Ten residues are identified with conservation of more than 70% consensus ratio in the bHLH domains (Fig. 1a), inclduing four are located in the basic region, five in the two helix regions and one in the loop region. Consistent with other studies [7-10, 33], Leu-25 is the most conserved residue, with a conservation of almost 100% (Additional file 3: Table S3), indicating its essential role for bHLH proteins. Interestingly, Phe-30 is partly substituted by Ser in Arabidopsis, rice, tomato etc, however, no such situation is observed in B. napus, B. oleracea and B. rapa, (Additional file 3: Table S3), suggesting a higher conservation and/or close relationship in these three species.
To further characterize the BnbHLHs sequence features, the criterion given by Massari and Murre [34] is used (Fig. 1). Our result shows that the 602 BnbHLHs are separated into two major categories according to the sequence profiles of the bHLH domains: 132 (21.93%) atypical BnbHLHs (non-DNA-binding proteins ), and 470 (78.07%) typical BnbHLHs (Fig. 1). And the latter is further consisted of three categories, including 300 (49.83%) G-box binding proteins, 103 (17.11%) E-box binding and 67 (11.13%) non-E-box binding proteins (Fig. 1). The sequences of the 132 atypical BnbHLHs are divergent in the basic region but are relatively conserved in the HLH region, especailly in the loop region (Fig. 1d). Similar situation is found for the non-E-box binding proteins (Fig. 1c), suggesting its close relationship to the atypical BnbHLHs. In contrary, the residues in the basic region of the E-box/G-box DNA binding types are more conserved than the HLH region (Fig. 1b, c).
Protein structures of BnbHLHs were conserved in each subfamily
To determine the evolutionary relationship of BnbHLHs among Brassicease species, we construct a NJ phylogenetic tree on the basis of the alignment of 1243 bHLH domains from B. napus (602), B. rapa (230), B. oleracea (244) and Arabidopsis (170).
The 1243 bHLH proteins are divided into 35 subfamilies, comprising the largest number to date (Fig. 2a). Among which, 28 subfamilies are consistent with the previous research thus their names keep the same [7]; seven subfamilies (S33-S39) containing bHLHs from these four species with a higher bootstrap value are newly identified in this study; while two previous reported subfamilies (S6 and S8) are not existed in this study as they were only found in lower plants (moss and algae) [7]. Compared with the division of AtbHLHs, S5 subfamily in B. napus is divided into S5 and S33; S17 is divided into S17 and S34; S21 is divided into S21 and S35; S24 is divided into S24, S36 and S37; S30 is divided into S30 and S38; whereas the orthologs of S39 in Arabidopsis are previously defined as orphan genes [7] which are defined as a new subfamily in this study. The distributions of BnbHLHs in the 36 subfamilies are biased, varying from two (S22 and S38) to 62 genes (S25). In addition, the BnbHLHs of different DNA binding types have a biased distribution tendency among different subfamilies as well, but the BnbHLHs in a given subfamily usually share the same DNA binding type (Fig. 2b). A total of 11 subfamilies (S2, S3, S5, S7, S10, S11, S13, S14, S24, S25 and S26) contain G-box binding proteins; five subfamilies (S1, S9, S17, S27 and S37) contained E-box-binding proteins; three subfamilies (S20, S23 and S39) contained non-E-box-binding proteins; while seven subfamilies (S16, S21, S22, S33, S34, S35 and S38) contained non-DNA-binding proteins (Fig. 2b).
To further discovery the non-bHLH domains and explore their distribution patterns within each subfamily, the MEME tool is applied. A total of 27 conserved motifs with variable length (8-103 amino acids) are obtained (Fig. 2c Additional file 4: Table S4). Among which, motif 1 and 2 are distributed in all BnbHLHs, and made up the basic and the two helix regions of the bHLH domains, respectively. The loop region is located between motif 1 and 2, indicating that this region is variable than the basic and helix regions. Outside the bHLH domain, members of the same subfamily generally share several same motifs. For example, all BnbHLHs in subfamily S9 contain motif 22; proteins in subfamily S26 all contain motif 5 (Fig2b, Additional file 4: Table S4). Moreover, some motifs have been characterized and were defined as additional functional properties, e.g. motifs 4, 8 and 20 were detected in many proteins in subfamilies S2 and S5 in various species, such as TabHLH239, AtMYC2, TabHLH184 and ZmbHLH103, which were found to be significantly matched with an ACT domain that contributed to the recruitment of the C1 R2R3-MYB factor to the C1 binding sites located in the promoters of flavonoid biosynthetic genes [35]. Meanwhile, motif 6 in these two subfamilies are also conserved, which overlapped with the MIR and MYC_N domains that can interact with JAZs (jasmonate ZIM-domain) [36]. Besides, some motifs are demonstrated to be subfamily-specific, yet their functions are still unclear (Fig. 2c).
Intron insertion patterns of BnbHLHs were conserved within each subfamily
The intron and exon structure is an important clue to understand the gene evolutionary relationship and functional diversification within a gene family [37]. The intron and exon patterns of candidate BnbHLHs are determined by comparing their full length CDS and DNA sequences using GSDS web server.
A total of eight intron insertion patterns (pattern a to k) are observed in the bHLH domains in B. napus, containing 0 to 2 intron insertion sites (Fig. 3). The nomenclature of the intron insertion patterns of BnbHLHs is referred to that of Carretero-Paulet et al. 2010 [7]. In this study, the previous defined pattern a and b (Carretero-Paulet et al., 2010) are characterized as the same type because they share the same insertion sites and phase, therefore are uniformly named as pattern a. Similarly, pattern d and f (Carretero-Paulet et al., 2010) are uniformly named as f. The intron insertion positions distribute across the basic and/or HLH regions in the bHLH domain. Among which, these in the basic and loop region are more conserved, while those in the helix regions are variable across different patterns. The intron insertion sites in the basic and helix regions are located at three highly conserved residues, Arg-11 (the E-box recognition site), Phe-21 and Lys-33 (Fig. 3). Further, the intron insertion sites of most patterns are conserved, excepting pattern j (Fig. 3). Among them, pattern a, c, e and f are similar, thus are likely to be homologous, where pattern f lacks the first intron as compared with pattern a, pattern e lacks the second intron while pattern c has the second intron inserted at L-50 as compares with pattern e. Similar situation is observed in pattern h and i. Meanwhile, phylogenetic analyses show a close relationship within intron insertion pattern a, c, e and f, further confirms their close relationship as well. Moreover, pattern k and i are indicated as the ancestral types because they are existed in members from algae [7]. Patterns f, a and k are the three types accounting for the majority of BnbHLHs (41.2%, 26.2% and 20.1% respectively). This trend is similar to the results in other species, such as Arabidopsis, rice, potato, poplar and tomato [8-11, 13]. Accordingly, these three patterns are obtained by many subfamilies while the remainings (pattern c, e, h, i and j) are existed in only one or two subfamilies, indicating a different expansion trend.
The distributions of intron insertion patterns are generally conserved within most subfamilies. For example, members in subfamilies S12, S10, S11, S7, S9, S5, S33, S2, S1, S13, S23 and S38 contain pattern f, excepting several genes that may attribute to incomplete genomic annotation information (Fig. 2b). The conservation of intron insertion pattern of BnbHLHs within each subfamily provides an independent criterion for the reliability of our phylogenetic analyses (Fig. 2b). Meanwhile, the intron insertion patterns of BnbHLHs is almost the same with their orthologs in Arabidopsis. The only exception is S27 subfamily which contains pattern a for B. napus members while their homologs in Arabidopsis is pattern f [7]. To confirm this result, we further compare the corresponding results in other species, e.g. rice [33]. And we confirm that it should be pattern a for the homologs in this subfamily, including Arabidopsis homologs (At080, At081, At122, At128, At129 and At130).
Overall, intron insertion patterns of BnbHLHs are conserved within most subfamilies and coordinated with these of AtbHLH orthologs as well. And the intron insertion sites in the basic and loop regions are more conserved than those in the helix regions.
Syntenic analyses revealed duplication events and expansion mechanism of BnbHLHs
In this study, up to 602 BnbHLHs are identified, while the gene number in lower plants are much lower, like Volvox carteri has only three [6]. This indicates a large scale expansion for this gene family has occurred during the evolution. To explore the expansion mechanism of this gene family in B. napus, the chromosomal locations and syntenic relationships of BnbHLHs are analysed based on the genome information from Genoscope and CoGe databases.
Chromosomal location analysis showed that there are 294 and 306 BnbHLHs in An- and Cn-subgenomes, respectively, indicating no biased tendency between these two sub-genomes (Fig. 4a). The BnbHLHs are distributed on all of the 19 chromosomes. The An-subgenome has an average of 27.4 BnbHLHs on its 10 chromosomes with A10 has a minimum number of 14 and A09 has a maximum of 41 genes. The average number of BnbHLHs in Cn-subgenome is 29.9, with C06 contains a minimum of 19 genes and the C03 has as many as 48 genes. Thus, the genes on each chromosome is uneven within both subgenomes.
Among the 602 BnbHLHs, 475 genes have syntenic relationships, and 382 of which are inherited from B. rapa or B. oleracea genomes (Additional file 5: Table S5). In contrast, only 79 genes (7.8%) of 79 syntenic pairs are originated from segmental duplication in B. napus genome, and 34 genes (5.8%) from 28 syntenic pairs are from tandem duplication (Fig. 4b). These results provide an outstanding example of genome-wide allopolyploidization between B. rapa and B. oleracea that mainly contributes to the large BnbHLHs expansion in B. napus (63.3%). Moreover, we find that the genome-wide duplicated genes take the largest number of BnbHLHs in majority subfamilies, while the segmental duplicated genes take the biggest amount in subfamilies S10 and S25 (nine genes each subfamily), and the tandem duplicated genes in S12 is the most (11 genes). Furthremore, the BnbHLHs with intron pattern f expands most in B. napus (31 duplicats), contributing to the largest proportion of BnbHLHs (Additional file 1: Table S1).
Taken together, the main expansion mechanism of BnbHLHs is whole-genome duplication (allopolyploidization), while segmental and tandem duplication events preferentially occurred in certain subfamilies with relative specific intron patterns.
Expression profiles of BnbHLHs are wide and conserved within each subfamily
Gene expression patterns are closely related to their functions. In order to explore important clues for gene functions, the temporal and spatial transcriptome of the 602 BnbHLHs in 50 B. napus tissues of root, leaf, flower, and seed at different development stage are characterized using the RNA-seq data (BioProject ID: PRJNA358784).
A total of 47 BnbHLHs (7.79%) are excluded from our analysis with FPKM<1, which may be pseudogenes or are expressed only at specific developmental stages or under special conditions. The remaining genes (555 genes) have relative high confidence expression levels (FPKM>1), and the majority of which showed preferential expressions in one or a few tissues/organs. Few genes are constitutively expressed in all tissues or organs tested, suggesting this gene family tends to play regulatory roles at specific developmental stages or tissues. The wide expression profile of BnbHLHs suggests their diverse roles in B. napus. The expression patterns of BnbHLHs can be summarized into six main blocks (I to VI) (Fig. 5). The BnbHLHs in blocks I, II and IV show obvious tissue-specific expression pattern; the genes in block I are specifically highly expressed in seed coat; these in block II are highly expressed in root and stem; the genes in block IV are mainly expressed in organs at the seedling stage, such as germinating seed (24 to 72h), root and hypocotyl; the genes in block VI are widely expressed in various tissues; while the genes in block III and V are mainly expressed in the tissues related to female reproduction organs, such as pistil, silique pericarp, seed coat and seed tissues. Overall, there are 180, 63, 119 and 106 BnbHLHs having relatively high expression levels in steam, leaf, seed and flower respectively, while up to 310 genes are highly expressed in roots, indicating a previously-ignored important roles of BnbHLHs in roots.
It is common that members of a given subfamily generally exhibit the same/similar expression profile. For example, members in S2 subfamily are mainly expressed in root and leaf at seedling, budding, and flowering stages (Additional file 6: Table S6). Moreover, the bHLHs in the same subfamily probably process the same or similar expression profiles across different species as well, thus may share conserved functions during evolution. For example, AtbHLH155/CPU and AtbHLH156/LHW in subfamily S23 play an essential role in establishing vascular cells and the size of vascular initial population in root meristem [38]. The corresponding homologs in B. napus are also expressed in roots (Additional file 6: Table S6). The homologs of AtbHLH155/CPU and AtbHLH156/LHW in B. napus have relatively low expression levels in root after germinating 24 to 72h but high expression levels in mature roots, which is corresponding to their possibly functions in root vascular cells. Moreover, expression profile of 28 pairs of tandem duplication genes, and 79 pairs of segmental duplication genes show similar expression patterns, indicating the functionally redundancy of duplicates (Additional file 5,6: Table S5, S6).
Overall, the BnbHLHs have a widespread expression patterns at different levels, especially in root, which provide an important clue for the possible roles of this gene family. Moreover, BnbHLHs in a certain subfamily tend to process conserved expression patterns and high structural similarity acorss different species, indicating a possible function conservation during evolution.
Plenty of BnbHLHs are induced by hormones treatments in root
As discussed above, plenty of BnbHLHs are highly expressed in roots, implying possible roles in root related biological processes. To further explore their functional characteristics in roots, a comprehensive expression analysis of candidate BnbHLHs in roots under five hormone treatments (auxin indole-3-acetic acid, IAA; abscisic acid, ABA; cytokinin 6-benzyladenine, 6-BA; ethylene precursor 1-aminocyclopropane-1-carboxylic acid, ACC; and gibberellic acid, GA) is performed, based on the RNA-seq data (BioProject ID PRJNA358784).
Our results show that many BnbHLHs (221, 36.7%) response to more than one hormone treatments in roots, and most of which are mainly clustered in ten subfamilies (S2, S16, S18, S19, S20, S21, S25, S27, S35 and S39) taking up over 50% genes in each subfamily (Additional file 7: Table S7). Meanwhile, the genes in 14 subfamilies (S1, S4, S5, S7, S9, S10, S12, S13, S14, S15, S24, S26, S34 and S38) are partly induced by the five hormones in roots. Interestingly, BnbHLHs in subfamilies S16, S18 and S21 are up-regulated in roots under all of the five hormones inductions, while BnbHLHs in subfamilies S7, S23 and S34 are all down-regulated instead. Notably, BnbHLHs in S12 with low or no expression levels in roots (Fig. 5) are highly expressed after the five hormone treatments (Additional file 7: Table S7), in contrary, some genesthat are normallyhighly expressed in roots are barely responded to hormone treatments. Meanwhile, BnbHLHs in S3, S11, S17, S22, S23, S28, S30, S31, S33, S36 and S37 hardly response to any hormone treatments in roots (Additional file 7: Table S7). Overall, most members of the B. napus bHLH gene family response to hormone treatments in roots, implying their important roles in hormone response in B. napus roots.
To further verify the results by RNA-Seq analysis, five BnbHLHs that have high expression levels in roots (Additional file 6: Table S6) and obviously respond to hormone inductions (Additional file 7: Table S7) are selected to analyse their expression profiles under hormone inductions (IAA, ABA, 6-BA, ACC and GA3) by qRT-PCR method. Among them, three genes (BnbHLH033, BnbHLH041 and BnbHLH269) are orthologs of Arabidopsis ILR3 gene which was demonstrated to participate in auxin-conjugate metabolism [39]; two genes (BnbHLH453 and BnbHLH126) are orthologs of Arabidopsis MYC2 gene which was involved in ABA, JA and light signalling pathways. As shown in Fig. 6, our results by qRT-PCRare similar to that of the RNA-seq analyses. These five BnbHLHs are positively induced by all of the five hormone treatments, with BnbHLH126 generally has relative higher expression levels than the others. Moreover, the genes from the same clade show similar expression patterns under a certain hormone induction. For example, BnbHLH033, BnbHLH041 and BnbHLH269 from ILR3 clade in S4 subfamily have similar expression patterns under IAA, ACC and ABA treatments (Fig. 6a-c). Meanwhile, their expression profiles are different under GA3 and 6-BA inductions, where BnbHLH033 is significantly down-regulated while BnbHLH041 and BnbHLH269 are significantly up-regulated (Fig. 6d-e). Similarly, the expressions of BnbHLH453 and BnbHLH126 from MYC2 clade in S2 subfamily show similar expression pattern under the five hormone treatments (Fig. 6). In addition, cis-acting elements analysis revealed that the promoter regions of these five BnbHLHs contain more than one cis-acting elements that are related to hormone response (Additional file 8: Table S8). This further supports our above results.
In a word, expression profile analyses reveals that a large proportion of BnbHLHs are induced by more than one hormone treatment in roots, indicating possibly roles of this gene family in root development in B. napus. qRT-PCR experiment confirms the hormone-induced expression characteristics of BnbHLH033, BnbHLH041, BnbHLH269, BnbHLH453 and BnbHLH126 which provides a valuable foundation for future functional research.