Genome-wide analysis of basic helix-loop-helix (bHLH) transcription factors in papaya (Carica papaya L.)

Background : As a superfamily of transcription factors (TFs), the basic helix-loop-helix (bHLH) proteins have been identified and functionally characterized in many plants. However, no comprehensive analysis of the bHLH family in papaya ( Carica papaya L. ) has been reported previously. Results: In this study, a total of 73 CpbHLH genes were found in papaya, and these genes were classified into 18 subfamilies based on phylogenetic analysis, with one orphans. Almost all of the CpbHLH in the same subfamily shared similar gene structures and protein motifs according to an analysis of exon/intron organizations and motif compositions. The number of exons in CpbHLH genes varied from 1 to 11 with an average of 5. The amino acid sequences of the bHLH domains were quite conservative, especially Leu-27 and Leu-63. Promoter cis -element analysis revealed that most of the CpbHLH genes contained cis -elements that can respond to various biotic/abiotic stress-related events. Gene ontology (GO) analysis revealed that Cp bHLH mainly functions in protein dimerization activity and DNA-binding, and most Cp bHLH proteins were predicted to localize in the nucleus. Abiotic stress treatment and quantitative real-time PCR (qRT-PCR) revealed some predicted CpbHLH genes that might be responsible for abiotic stress responses in papaya. Conclusions : A total of 73 bHLH transcription factors were identified from papaya, and their gene structures, conserved domains, sequence features, phylogenetic relationship, promoter cis -element, GO annotation and gene expression profiles responsible for abiotic stress were investigated. Our findings lay a foundation for further evolutionary and functional elucidation of Cp bHLHs. replicates. Gene-specific primers for qRT-PCR of bHLH CDSs -ΔΔCT TFs: transcription factors; bHLH: the basic helix-loop-helix; GO: gene ontology; qRT-PCR: quantitative real-time PCR; ORF: opening reading frame; PI: the theoretical isoelectric points; CDS: coding sequences; GSDS: gene structure display server; MEME: multiple EM for motif elicitation; SMART: a simple modular architecture research tool; MBS: MYB binding site involved in drought-inducibility; LTR: low temperature response elements; TC-richdefense and stress responsive elementsWUN-motifs: wound-responsive elements; GA: gibberellin; IAA: auxin; MeJA: methyl jasmonate; SA: salicylic acid; ABA: abscisic acid; ARE: anaerobic induction elements; BR: brassinosteroid.


Background
Since plants are unable to move, plant growth and development are regularly affected by abiotic and biotic stresses, which impair yields and result in losses to farmers. To optimize their growth and survival, plants must utilize a wide range of physiological and biochemical processes in their responses to diverse abiotic stresses by regulating the expression of a series of genes [1][2][3].
Transcription factors (TFs) play key roles in the stress regulation network and signaling pathways mainly by activating and repressing related downstream genes to regulate gene expression. Among them, basic/helix-loop-helix (bHLH) TFs are widely found in all eukaryotes and are the second largest TF family in plants [2][3][4][5]. Proteins of the bHLH superfamily in all eukaryotes are characterized by one highly conserved bHLH domain, which is approximately 50-60 amino acids in length and divided into two different functional regions: the basic region and the HLH region [6,7]. The basic region is located at the N-terminal end of the domain and consists of 10-15 amino acids. It is a DNA-binding region that enables bHLH TFs to bind to E-boxes (CANNTG) [8,9]. The HLH region, at the C-terminal end, is mainly composed of hydrophobic residues, containing two amphipathic α-helices linked by a loop region that has variable sequences and acts as a dimerization domain [7,10]. Outside of the two conserved regions, the rest of the bHLH protein sequences are usually very different [11].
In animals, the bHLH TFs can be divided into six groups (A to F) based on their phylogenetic relationships, functional properties and DNA-binding specificity [8]. These bHLH groups can be divided into several small subfamilies [12,13]. The bHLHs mainly function in sensing the external environment, cell cycle regulation and tissue differentiation [8,[14][15][16]. Compared to animals, the research on bHLH proteins in plants is limited: the number of groups has not been determined but is thought to cover 15-26 subfamilies [3,17]. Phylogenetic analyses of some atypical bHLH proteins have even extended that number to 32 [5]. With the availability of genome sequence data and the rapid development of molecular biology, increasing numbers of bHLH subfamily genes have been identified and characterized in a wide range of plant species, including Arabidopsis [6], peanut [18], apple [19], tomato [20], potato [21], peach [22], grapes[23], sweet orange [24], and bamboo [25]. The results from these studies have shown that bHLH TFs have versatile biological functions, such as regulating light morphogenesis [26,27], hormone signals [28,29], the developmental of root [30] and anther [31], regulating epidermal cell fate determination [32], participating in various biotic and abiotic stress responses [21,33,34], etc.
As an important and popular fruit, papaya is widely grown throughout the tropics and subtropics.
However, the production and quality of papaya were often threatened by various abiotic stresses, such as salt, drought, and cold. In recent years, some studies demonstrated that bHLH transcription factors play important roles in the stress-related regulation network and signalling pathways in many species. However, no systematic analysis of bHLH TFs has been performed in papaya previously. In this study, a total of 73 predicted bHLH genes were identified in papaya, and phylogenetic analyses were carried out to analyze the relationships among these genes. Meanwhile, gene structure, protein physicochemical properties and conserved motifs, the cis-element of the promoter region and gene ontology (GO) analysis were investigated. Furthermore, to analyze the functions of CpbHLHs responsible for responding to abiotic stresses, the expression profiles of 22 selected genes under salt, drought, ABA and cold stresses were investigated by using quantitative real-time PCR (qRT-PCR). We identified several candidate genes that might be responsible for abiotic stress responses. We completed the first comprehensive genome-wide analysis of the bHLH gene family in papaya, and our results provide information necessary for further functional research on the bHLH family in papaya.

Identification and characterization of CpbHLHs
A total of 105 putative bHLH transcription factors of papaya (C. papaya) were downloaded from the PlantTFDBv2.0 (http://planttfdb.cbi.pku.edu.cn/). To verify the reliability of these results, the 105 bHLH proteins sequences were filtered by Interproscan and SMART domain annotation, and a total of 73 predicted CpbHLH proteins were identified. They were named CpbHLH001 to CpbHLH073 at random except for 32 proteins that were explicitly excluded by Interproscan and SMART (Table S1).

Phylogenetic analysis, gene structure, conserved motifs analysis and multiple sequence alignment of CpbHLHs
To evaluate the evolutionary relationships of the papaya bHLH proteins, a neighbor-joining phylogenetic tree was generated using conserved bHLH domains from papaya, Arabidopsis and rice.
The phylogenetic tree showed that the 73 CpbHLH members were clustered into 18 subfamilies with one orphan (Fig.1A and Additional file 2), consistent with the earlier results showing that the bHLH subfamily in plants can be divided into 15-25 subfamilies [3]. Previous studies have named the bHLH subfamilies using English letters [7,19], Roman numerals [20,36], or Arabic numerals [6,37], In this 6 study, we adopted a nomenclature using Roman numerals. As shown in figure 1, the subfamily XII was the largest subfamily among all three species, and all of subfamilies include at least two species. In papaya, none of the bHLHs were grouped into Ⅳd, Ⅱ, ⅩⅤ, Ⅹ, ⅪⅤand ⅩⅢ subfamilies compared to rice and arabidopsis, which may be due to these bHLHs were losted during the process of evolution.
Exon/intron organization, as a type of structural divergence, plays an important role in the evolution of multiple gene families [38]. The annotation features of the CpbHLH genes were submitted to Gene Structure Display Server (GSDS) together to show their gene structures. As described in Additional file 1 and Fig subfamily are usually participated in the same signaling pathway or biological process, and the functions of these members are often partially or totally redundant [3]. For example, AtbHLH10, AtbHLH89 and AtbHLH91, corresponding rice orthologs OsbHLH141, OsbHLH142 are members of Ⅱsubfamily, they are all involved in the process of pollen development [7,39,40]. Especially in Arabidopsis, there is no obvious phenotype in single mutant of AtbHLH10, AtbHLH89 or AtbHLH91, only their various double or triple mutants showed the phenotype of pollen development deficiency [40]. In subfamily Ⅲb, OsbHLH001(OsICE2), OsbHLH002 (OsICE1), CpbHLH027, CpbHLH062, AtbHLH116 (ICE1) and AtbHLH33 (ICE2) were clustered within one clade. In previous studies, AtbHLH116(ICE1) and AtbHLH33(ICE2) and corresponding orthologs (OsbHLH001/OsICE2, OsbHLH002/OsICE1) have been reported to function in the stress of chilling [41][42][43][44][45]. And we also found transcripts of CpbHLH027 and CpbHLH062 were increased under chilling stress in this study, implying CpbHLH027 and CpbHLH062 involved in the process of chilling stress in papaya.
To further study the sequence characteristics of the bHLH domains at the amino acid level, we carried out a multiple sequence alignment of the 73 CpbHLH protein sequences (Fig.3). The result showed that the 73 CpbHLH proteins contained two conserved regions in the bHLH domains: the basic region plus helix 1 and the loop region plus helix 2 ( Fig. 3 and Table S2). Additionally, the MEME program was used to identify the conserved motifs shared by bHLH proteins by uploading the 73 amino acid sequences of the CpbHLH family [46]. Almost all of the sequences (except CpbHLH003) exhibited two highly conserved motifs: one is contains 29 amino acids, and the other consists of 21 amino acids, are shown in red and blue blocks, respectively ( Fig.2B and Fig.4A). Among the two motifs, motif 1 was composed of basic residues and helix 1, and motif 2 was composed of a loop and helix 2. And the space between motif 1 and 2 consists of a loop, which is variable in length in some bHLH proteins.
The sequence logos of motif 1 (in red) and motif 2 (in blue) are shown in Fig. 4A. The backbones of motif 1 and 2 are also conserved among plant species [10,20,47], which is believed to indicate that these highly conserved residues in bHLH domains are responsible for dimerization [10].
In addition to the two common conserved motifs shared by the CpbHLH proteins, some CpbHLHs that are mainly distributed into eight subfamilies (includingⅤa, Ⅴb, Ⅲf, Ⅳa, Ⅲb, Ⅲ , Ⅰb and Ⅰa subfamilies ) harbor another conserved motif (motif 3) with a length of 36 amino acids. This motif is indicated by the green blocks and the sequence logo is visualized as logo3 ( Fig. 2B and Fig. 4B). In previous reports, members of a given subfamily of the bHLH superfamily exhibited another conserved motif (conserved nonbHLH motif: motif 3) in plants [3]. However, in papaya, members of the bHLH proteins have the same motif that is distributed into eight subfamilies, not just one subfamily. In addition, among the 73 CpbHLHs, one protein (CpbHLH003) exhibited incomplete bHLH domains, whereas the remaining 72 bHLH proteins all presented complete bHLH domains. Similar observations have been made in other plant species, such as peach and blueberry [22,36].

Promoter analysis of bHLH genes in papaya
To further understand their functions and regulation patterns, cis-elements in CpbHLH gene promoter sequences were investigated. Regions of 2,000 bp upstream from the start codons of each CpbHLH gene were analyzed using PlantCARE. The results showed that the cis-elements could be divided into protein interactions [5]. In this study, 72 and 73 (out of 73) CpbHLH proteins were found to have Leu-27 (corresponding to Leu-27 in AtbHLHs) and Leu-63 (corresponding to Leu-73 in AtbHLHs), respectively ( Fig. 3 and Table S2).
Because of a lack of reported experimental data and databases, we used Arabidopsis as the reference species to perform a GO annotation of papaya bHLH proteins, and 54 of 73 CpbHLH proteins were obtained with results compared to Arabidopsis. We summarized the results in Fig. 6 and Additional file 5. The majority of CpbHLH proteins were involved in DNA binding. Almost all of the CpbHLH proteins (37, 68.5%) were predicted to localize in the nucleus, whereas the remaining CpbHLH proteins were located in other organelles, including plastids, the cytoplasm, and chloroplasts. Additionally, some CpbHLH proteins existed in multiple cellular components. For example, CpbHLH013 was located in three cellular components: chloroplasts, part of the cytoplasm, and the nucleus, which may reflect its multiple functions in various biological processes. The metabolic processes involved the greatest number of CpbHLH proteins (47, 87.0%). Biosynthetic processes and gene expression involved the second greatest number of CpbHLH proteins (46, 85.2%). In addition, bHLH proteins could respond to stimulus, morphogenesis, cell differentiation and the developmental process.

Expression analysis of bHLH superfamily genes under different abiotic stresses
The bHLH proteins have been characterized functionally in many plants with a vital role in the regulation of diverse biological processes, but little is known about their role in papaya. To analyze the functions of bHLHs responding to abiotic stresses, the expression profiles of 22 selected genes under salt, drought, ABA and cold stresses were investigated by using qRT-PCR (Table 2 and Fig. 7).
The results showed that 4 of 22 bHLH mRNAs were increased, and 3 bHLH mRNAs were reduced more downregulated. Under cold stress (4℃), there were 3 genes whose expression increased more than 1.5-fold, and 4 bHLH mRNAs were reduced more than 2-fold.  Interestingly, a few transcripts of bHLH responded to all or multiple stresses. For instance, bHLH056 was sensitive to all four stresses and was upregulated distinctly under the four stresses. The orthologue of bHLH056 in Arabidopsis is BEE1 (AtbHLH044) (Fig. 1), which has been functionally characterized in previous reports. At low temperatures, BEE1 is a positive regulator of flavonoid accumulation [49], which is consistent with our results. In addition, BEE1, BEE2 and BEE3 are functionally redundant positive regulators of BR (brassinosteroid) signaling, but these transcripts are repressed by ABA [29]. However, we found the transcription of bHLH056 was notably upregulated (>10-fold) under ABA treatment. More interestingly, bHLH042, it is an orthologue of BEE2 (Fig. S1), was distinctly repressed by ABA (approximately 4-fold). These results suggested that bHLH056 and bHLH042 may be provide different functionalities compared to Arabidopsis. Additionally CpbHLH027 was also upregulated distinctly under four stresses. In Arabidopsis, the orthologue of CpbHLH027 is AtbHLH116 (ICE1) (Fig. 1), which can be induced by Nacl, ABA and cold stresses, playing an important role in the cold-responsive signaling pathway via an ABA-independent pathway [41]. There are two orthologues of bHLH027 in rice, one ortholog is OsICE2/OsbHLH001, is induced by salt stress, and its overexpression can enhanced the tolerance to freezing and salt stress [44,45]. OsICE1/OsbHLH002 is another ortholog in rice, which is induced by chilling stress. OsbHLH002 can positively regulates cold signaling via targeting OsTPP1, which encodes a keyenzyme for trehalose biosynthesis [43]. These results implied bHLH027 plays essential roles in abiotic stresses in papaya. In addition, the transcript of CpbHLH062 was also increased under cold treatment, its orthologue is AtbHLH033/ICE2, which involving the cold response and the ABA pathway [42,50], implying the CpbHLH062 may involved the cold stress. bHLH053 was downregulated under salt, drought and ABA stresses. The orthologue of CpbHLH053 is AtbHLH129 (Fig. 1), which is a transcription repressor that negatively regulates the ABA response in Arabidopsis [51], implying CpbHLH053 may have the similar function with the AtbHLH129 in the process of ABA response.
We should also noticed a few bHLHs that showed distinct increases or decreases in their mRNA levels under different treatments while these bHLHs′ orthologues have not been reported in previous study.
For instance, CpbHLH050 is notably upregulated (>10-fold) under PEG treatment. CpbHLH046 is upregulated by PEG reatment, but sharply down regulated under ABA and cold treatments. implying these genes may have additional functions than response to drought by regulating root development.
We should also pay attention to these genes in the following research.

Discussion
Transcription factors (TFs) play key roles in the stress regulation network and signal pathways in plants. Basic/helix-loop-helix (bHLH) TFs are the second largest TFs family in plants and have been identified in many species [6,[18][19][20][21][22][23][24][25]. However, the bHLH TF family has not previously been reported in papaya (Carica papaya L.). In this paper, the number of 73 predicted bHLH genes were identified by bioinformatics methods in papaya. This TF family seemed to be one of the moderately sized families compared with other plant species, which might be because the papaya has a relatively small reference genome, the size is only 372 Mb[52]. Gene organization plays an important role in gene evolution. In this study, we found that bHLH genes showed diversities in their number introns, ranging  Table 2), showing four genes (HLH020/-027/-053/-056) involved in abscisic acid response. Another two genes (HLH020/-062) were also identified that were involved in abscisic acid response by GO annotation analysis and qRT-PCR (Additional file 5 and Table 2). We also identified a large number of cis-acting elements in bHLH genes that may respond to drought (MBS, 8.44%), which is also consistent with the qRT-PCR (including seven genes: bHLH027/-050/-056/-011/-068/-042/-053).
Other genes also had important elements, including LTR, TC-rich and WUN-motifs, which reflected 13 plant responses to low temperatures, defense stresses and wound-responsiveness, respectively.
These results implied bHLH genes may have a wide range of functions in papaya growth, disease resistance, and response to environmental conditions.
Many studies have shown that bHLH genes are involved in responses to various abiotic and biotic stresses. We also randomly selected 22 genes to investigate their expression profiles by using qRT-PCR under salt, drought, ABA and cold stresses ( Fig. 7 and Table 2). The results revealed some candidate CpbHLH genes that might be responsible for abiotic stress responses in papaya.

Conclusions
In conclusion, our study provided comprehensive information on the bHLH family in papaya, including phylogenetic relationships, gene structures, protein motifs, sequence features, promoter ciselements, GO annotations and gene expression profiles responsible for abiotic stress. These results of bHLH TFs will help to build a solid foundation for future exploration and potential improvements in 14 abiotic stress resistance, growth and development of papaya.

Phylogenetic tree building, motif identification and multiple sequence alignment
To investigate the phylogenetic relationship between bHLH proteins, protein sequences of papaya were pre-aligned using HMM align [56] and the pHMM HLH_ls.hmm from PFAM (https://pfam.xfam.org/family/PF00010) to identify the domains of bHLH TFs. Based on the manually aligned bHLH region of 158 bHLH proteins from Arabidopsis and 173 from rice [3], the identified bHLH domains were later aligned using MAFFT v7.305b [57]with default settings. Phylogenetic tree was constructed based on the neighbor-joining method using FastTree v2.1.11 [58]with default settings.
Bootstrapping with 1000 replicates was used to assess the statistical reliability of nodes in the tree.
Multiple sequence alignment based on protein sequences of these73bHLH TFs was generated by MAFFT v7.305b [57] with default settings.
The online tool Multiple EM for Motif Elicitation (MEME, version 5.02) was used to search for conserved motifs among the bHLH proteins (http://meme-suite.org/tools/meme) by uploading the protein sequences of the papaya bHLH superfamily. The parameter settings were as follows: zero or one, occurrence of a single motif per sequence; 3, maximum number of motifs found. All other parameters were set to the default values.

Promoter cis-acting Regulatory Element Analysis and Gene Ontology (GO) Annotation
The upstream 2000 bp genomic DNA sequences of the bHLH genes were downloaded and submitted to PlantCARE [59] to predict the putative cis-elements. The full-length protein sequences of papaya bHLH were blasted against Arabidopsis proteins with default parameters. The best hits were submitted to AgriGOv2.0 (http://systemsbiology.cau.edu.cn/agriGOv2/) for GO annotation [60]. GO terms include three aspects: biological process, cellular component and molecular function.

Plant materials, growth conditions and stress treatments
In this experiment, stems with axillary buds were selected as explants from two-year-old 'YiChiGua' papaya trees grown under standard field conditions in the Institute of Fruit Tree Research, Guangdong Academy of Agriculture Science, Guangzhou, China, and cultured in vitro to obtain the complete papaya seedlings with normal leaves and roots using tissue culture techniques. Healthy and uniform papaya seedlings were used for different treatments. For salt, drought and ABA treatments, seedlings were treated in MS liquid medium containing 200 mM NaCl, 25% PEG6000 (to mimic drought stress) and 100 μM ABA for 2 hours respectively, and then the roots were collected. For cold treatment, seedlings were subjected to 4℃ for 2 hours and the leaves were collected. All of the collected materials were immediately frozen in liquid nitrogen and stored at −80℃ for RNA isolation.
Untreated seedlings were used as the control groups. Three biological replications were carried out for each treatment.

RNA extraction and quantitative real-time PCR (qRT-PCR) analysis
Total RNA from papaya after different treatments was isolated using TRIzol reagent (Invitrogen). The

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
Putative bHLH transcription factors were downloaded from the PlantTFDBv2.0

Additional Files
Additional file 1: Detailed information of CpbHLH genes and CpbHLH proteins.
Additional file 2: bHLH subfamily members of papaya, rice and Arabidopsis.

Additional file 3: Promoter analysis of bHLH genes in papaya.
Additional file 4 and Additional file 5: GO annotation of bHLH proteins.
Additional file 6: Primers used for qRT-PCR in this study.      Quantitative RT-PCR analysis of 22 selected bHLH genes under cold stress condition (4℃).
The data are expressed as means ± SD of three independent biological determinations.
Untreated seedlings were used as the control groups. *P < 0.05 and **P < 0.01 (Student's t test) indicate significant differences between treated seedlings and control groups.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.