Genome-Wide Analysis of Basic Helix-Loop-Helix (bHLH) TFs in Sea Buckthorn (Hippophae rhamnoides)

The basic helix-loop-helix (bHLH) transcription factor gene family is one of the largest gene families and is extensively involved in plant growth, organ development, and stress responses. However, few studies have examined this gene family in sea buckthorn. In this study, we focused on 144 HrbHLH genes, exploring their DNA and protein sequences and physicochemical properties. According to their protein sequence similarities, we classified the genes into 15 groups with specific motif structures. To explore their expression, we performed gene expression profiling using RNA-Seq and identified 108 HrbHLH genes that were expressed in five sea buckthorn tissues: root nodule, root, leaf, stem and fruit. Furthermore, we found 11 HrbHLH genes whose expression increased during sea buckthorn fruit development. We validated the expression patterns of HrbHLH genes using reverse transcription quantitative real-time PCR.


Background
The basic helix-loop-helix (bHLH) transcription factor (TF) family is widespread in eukaryotes and plays an important role in plant growth and development [1]. However, limited studies are available in plants, especially in sea buckthorn. Since the first bHLH protein structure was analyzed [2], more than 100 bHLH proteins have been identified in various species. The bHLH TF family is named for its highly conserved bHLH domain, which consists of about 60 amino acids, including the basic region at the N-terminus and the helix-loop-helix (HLH) region at the C-terminus of the polypeptide chain [3]. The basic region contains about 15 amino acids and plays an important role in recognizing and binding to the target DNA. The HLH region, ranging from 40 to 50 amino acids in length, consists of two amphipathic alpha helices and a loop of variable length and sequence. The two amphipathic alpha helices allow bHLH proteins to form homodimers or heterodimers with other bHLH proteins. The bHLH TFs recognize an element called the E-box (5′-CANNTG-3′), the most common form of which is the G-box (5′-CACGTG-3′). Studies have shown that the nucleotides flanking the core element also affect specific binding. Previous studies indicated that the known bHLH proteins in animals can be divided into six groups (A-F) [1]. However, many of the bHLH proteins identified in plants belong to group B, most members of which are characterized by binding to the G-box [4].
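To make the E-box and G-box definitions concrete, the following minimal sketch scans a DNA string for the 5′-CANNTG-3′ consensus and flags the hits that are also G-boxes. The function name and the toy sequence are illustrative assumptions and are not part of the analysis reported here.

```python
import re

# E-box consensus 5'-CANNTG-3' (N = any base). A lookahead is used so that
# overlapping matches are also reported. The G-box CACGTG is one specific E-box.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (0-based position, motif, is_gbox) for every E-box found in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(1) == "CACGTG")
            for m in EBOX.finditer(seq)]


# Toy sequence (hypothetical) containing one G-box and one other E-box.
hits = find_eboxes("ttCACGTGaaCAGCTGcc")
for pos, motif, is_gbox in hits:
    print(pos, motif, "G-box" if is_gbox else "E-box")
```

Because both motifs are palindromic or near-palindromic, a real promoter scan would usually also report the strand; that is omitted here for brevity.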
Sea buckthorn (Hippophae rhamnoides) is a hardy, fast-growing, deciduous, spiny shrub naturally distributed in Asia and Europe [26][27][28]. Sea buckthorn fruit and leaves have high levels of vitamins C, E and A, organic acids, amino acids, fatty acids, carotenoids and flavonoids [29][30][31]. In this study, we focus on 144 bHLH genes in sea buckthorn in order to investigate their structures and functions, especially their differential expression patterns in different tissues and at different stages of fruit development.

Identification and Characterization of the Sea Buckthorn bHLH Gene Family
We downloaded the genome-wide protein sequences of sea buckthorn from the H. rhamnoides Information Archive database. The Hidden Markov Model (HMM) profile of the bHLH domain (PF00010) was downloaded from the Pfam database [35] and used as a query to scan the proteome file with HMMER software (version 3.1) at the default E-value [36]. The protein sequences of the bHLH genes reported in the HMMER results were extracted from the proteome file using TBtools (https://www.tbtools.com/). Redundant sequences were removed with the online ElimDupes software, and a few sequences with obvious errors were removed manually.
The GFF file, containing location data and annotation information for sea buckthorn genes, was downloaded from the H. rhamnoides Information Archive database. MapInspect software was used to map the sea buckthorn bHLH genes onto the chromosomes, and annotation data in the GFF file were displayed with the online tool WEGO 2.0 [37]. The lengths, masses, isoelectric point (pI) values, and charge at pH 7.0 of these bHLH protein sequences were determined with DNAstar software, and length distributions and functional annotations were analyzed with Blast2GO software [38].
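The redundancy-removal step can be illustrated with a short sketch that parses FASTA text and keeps only the first record for each distinct sequence. This is a stand-in under simple assumptions (exact, case-insensitive sequence identity), not the behaviour of ElimDupes itself; the record names and sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen, unique = set(), []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


# Hypothetical toy proteome: p2 duplicates p1 and is removed.
example = ">p1\nMKRAHL\nSEV\n>p2\nmkrahlsev\n>p3\nMKQQHL\n"
unique = drop_duplicates(read_fasta(example))
print([name for name, _ in unique])
```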

Multiple Sequence Alignment and Phylogenetic Analysis
A total of 144 predicted HrbHLH proteins, with amino acids spanning the bHLH core domain, were subjected to multiple sequence alignment using ClustalX 2.0 with the default parameters, and a further multiple sequence alignment was performed using ClustalW 2.0 [39]. The phylogenetic tree of the HrbHLH proteins was generated in MEGA 7.0 using the maximum likelihood method with the following settings: mode, "p-distance"; gap setting, "Complete Deletion"; and bootstrap replicates, 1,000 [40].
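For readers unfamiliar with the "p-distance" and "Complete Deletion" settings, the sketch below shows what they mean for a small alignment: every column containing a gap in any sequence is discarded, and the distance between two sequences is the proportion of remaining sites at which they differ. This is a plain re-implementation for illustration, not MEGA's code, and the three aligned peptides are hypothetical.

```python
def p_distance_matrix(aligned):
    """Pairwise p-distances after complete deletion of gap-containing columns."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # Complete deletion: keep only columns with no gap in any sequence.
    keep = [i for i in range(length) if all(s[i] != "-" for s in aligned)]
    n = len(aligned)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            diff = sum(aligned[a][i] != aligned[b][i] for i in keep)
            # p-distance = differing sites / compared sites
            d = diff / len(keep) if keep else float("nan")
            dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical toy alignment; column 5 (index 4) contains gaps and is dropped.
alignment = ["MKRA-L", "MKQATL", "MRQA-L"]
matrix = p_distance_matrix(alignment)
for row in matrix:
    print([round(x, 2) for x in row])
```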

Gene Structure and Conserved Motifs Analyses of HrbHLH Genes
Information about the gene structure (intron-exon) of each putative bHLH gene was obtained from the GFF file, downloaded from the GDR database. The schematic structures of these genes were drawn with the online Gene Structure Display Server (GSDS 2.0) [41]. Local MEME software was used to identify conserved motifs in the protein sequences with the following parameters: -protein, -oc m12, -mod zoops, -nmotifs 12, -minw 6, and -maxw 70 [42]. The results from these analyses of gene structure and conserved motifs were arranged according to the order shown on the phylogenetic tree.

3D Protein Homology Modeling and Protein Properties
First, a BLASTP search with the default parameters was performed against the Protein Data Bank (PDB) with all bHLH proteins to identify the best templates with similar sequences and known three-dimensional structures. The structures of the sea buckthorn bHLH proteins were then predicted using the 'intensive' mode of the Protein Homology/Analogy Recognition Engine (Phyre2) [43]. The theoretical isoelectric point (pI) and protein statistics were analyzed using ExPASy and the Sequence Manipulation Suite, respectively.

Expression Analysis of HrbHLH Genes in Sea Buckthorn
To evaluate the expression patterns of sea buckthorn bHLH genes, we downloaded Illumina RNA-seq data for different sea buckthorn tissues (root nodule, root, leaf, stem and fruit) from the H. rhamnoides Information Archive database. First, the raw data were split and converted to fastq format with the fastq-dump command of the NCBI SRA Toolkit. The quality of the fastq files was evaluated with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and low-quality reads were trimmed with a local Perl script. After a final quality check, the RNA-seq datasets were aligned to the whole sea buckthorn genome using TopHat2 [44]. Gene expression values were normalized as fragments per kilobase of exon model per million mapped reads (FPKM). An absolute log2 fold change ≥ 1 and an FDR ≤ 0.001 were used to identify differentially expressed bHLH genes. Heatmaps with hierarchical clustering were drawn with OmicShare tools (http://www.omicshare.com/tools).
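The normalization and filtering criteria above can be written out as a short sketch. The FPKM formula is the standard definition; the pseudocount added before taking the log ratio is an assumption made here to avoid division by zero and is not stated in the methods, and all input numbers are hypothetical.

```python
import math


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def is_differential(fpkm_a, fpkm_b, fdr,
                    lfc_cutoff=1.0, fdr_cutoff=0.001, pseudocount=1.0):
    """Apply the |log2 fold change| >= 1 and FDR <= 0.001 thresholds.

    The pseudocount is an illustrative assumption, not part of the stated method.
    """
    lfc = math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))
    return abs(lfc) >= lfc_cutoff and fdr <= fdr_cutoff


# Hypothetical gene: 500 fragments, 2,000 bp of exon, 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
print(value)
print(is_differential(3.0, 15.0, fdr=0.0005))
```

The FDR itself would come from a differential expression test across replicates; it is treated as a given input here.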

Plant Materials and Growth Conditions
H. rhamnoides L. subsp. mongolica Rousi was used in this study. All sea buckthorn plants were grown at the desert Forest Experimental Center in Inner Mongolia, China. The records of planting, preservation and identification of all plant materials are completed and archived by this center, and the management of all experimental materials in this center is supervised by the China National Laboratory Accreditation Committee (CNACL). Healthy fresh sea buckthorn fruits were harvested at 30, 37, 42, 49, 56 and 63 days post-anthesis, immediately placed in liquid nitrogen, and stored at −80 °C until use.

RNA Extraction and Quantitative Real-Time PCR (qRT-PCR) Analysis
Plant materials were harvested, frozen in liquid nitrogen, and then ground under RNase-free conditions. RNA was extracted with TRIzol reagent (Invitrogen), following the manufacturer's instructions, and then treated with DNase I at 37 °C for 30 min. The RNA (1 µg) was then reverse-transcribed using a PrimeScript First-strand cDNA Synthesis Kit (Takara, Dalian, China) according to the manufacturer's instructions. A 10-µL aliquot of cDNA was diluted to 100 µL with water, and 2 µL of the diluted cDNA was used for the analyses. For qRT-PCR, gene-specific primers for the selected HrbHLHs were designed and synthesized by Sangon Biotech (Shanghai, China) (product size 110-130 bp; Tm 59-61 °C; details are shown in Supplementary Table S1), and Hr18S was used as an internal control. All reactions were performed on an iCycler iQ5 system (Bio-Rad), using the SYBR Green Supermix Kit (Bio-Rad) according to the manufacturer's instructions. Expression levels of these genes were calculated as 2^(−ΔCT) values. In addition, relative expression levels for each gene along the time series were calculated as 2^(−ΔΔCT) values. At least three biological replicates were used for the fluorescence quantitative PCR reactions, with at least three technical replicates per biological replicate. Each biological replicate was a pooled sample of at least six plantlets.
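The two expression calculations can be sketched as follows, assuming the usual comparative Ct convention (a negative exponent, with the internal control Ct subtracted from the target Ct). The choice of calibrator sample and all Ct values below are hypothetical.

```python
def delta_ct_expression(ct_target, ct_reference):
    """Expression relative to the internal control: 2^(-dCt)."""
    return 2 ** -(ct_target - ct_reference)


def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change relative to a calibrator sample: 2^(-ddCt)."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: target gene vs. the 18S control at a later stage,
# with the first time point used as the calibrator.
level = delta_ct_expression(15.0, 12.0)
fold = relative_expression(24.0, 12.0, 26.0, 12.0)
print(level, fold)
```

In practice the Ct values entering these functions would be means over the technical replicates of each biological replicate.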

Identification, Chromosomal Locations, and Functional Annotation of Sea Buckthorn bHLH Genes
For genome-wide identification of HrbHLH genes, we used the HMM file as a query to search the sea buckthorn proteome. A total of 180 putative HrbHLH protein sequences were obtained. After the presence of the conserved bHLH domain was confirmed by SMART and CD-Search and redundant sequences were removed, we finally identified 144 bHLH proteins in the sea buckthorn genome. Based on their chromosomal locations, these sea buckthorn bHLH genes were named HrbHLH1 to HrbHLH144 (Fig. 1). Sequence analysis revealed that these HrbHLH proteins vary widely in length, with an average of 326.5 aa. The longest was HrbHLH76 (758 aa) and the shortest was HrbHLH129 (92 aa). Their predicted molecular weights ranged from 10.23 kDa to 81.93 kDa, with an average of 36.47 kDa. Their predicted pI values ranged from 4.50 to 10.61. Gene IDs, genomic positions, and annotation information are summarized for these HrbHLH proteins in Supplementary Table S1.
According to their genomic positions, the putative HrbHLH genes were found across all 12 chromosomes, ranging from 1 to 21 per chromosome (Fig. 1). The remaining four bHLH genes (HrbHLH1-HrbHLH4) were localized to unassembled genomic contigs and could not be mapped to any particular chromosome in the current assembly (Fig. 1). Chromosomes 3 and 12 have the most HrbHLHs (21 total), followed by chromosome 11 (17 genes) and chromosome 10 (16 genes). The sea buckthorn genome database provided GO and KEGG annotation information for these HrbHLH proteins. Among them, three HrbHLH genes (HrbHLH7, HrbHLH133, HrbHLH39) were related to organ development. In addition, two HrbHLH genes (HrbHLH120 and HrbHLH107) were enriched in the plant hormone signal transduction pathway.

Phylogenetic Analysis and Prediction of Conserved Motifs
The exact number of subgroups of plant bHLH proteins is unknown, but is thought to be 15-32. To evaluate the evolutionary relationships of our 144 proteins, we conducted a phylogenetic analysis based on full-length protein sequences. Applying the ML method, we assigned the proteins to 15 main groups (G1-G15) (Fig. 2). The groups contained approximately 9 members on average, ranging from 4 to 19. Group G15 had the fewest members: HrbHLH65, HrbHLH59, HrbHLH11, and HrbHLH45.
We identified 8 putative conserved protein motifs in HrbHLH family proteins using MEME online software (Motifs 1-8, Fig. 3). All HrbHLH proteins contained Motif 2, and all except those in groups G6 and G7 also contained Motif 1. All HrbHLHs in groups G6 and G7 contained Motif 8. Within each group, the composition of conserved motifs was similar for most proteins. For example, Motifs 1, 2, and 6 were identified in all 11 members of group G2, Motifs 1, 2, and 3 were identified in all 19 members of group G1, and Motif 4 was present in 11 members of group G2. The evolutionary relationships among these HrbHLH proteins were also assessed by analyzing their conserved motifs. The motif composition patterns tended to be consistent with our phylogenetic tree, being nearly identical among genes within the same group but varying greatly between groups.

Analysis of Gene Structures
The pattern of exon and intron positions can also provide important evidence to support phylogenetic relationships in a gene family. Gene structure analysis of the 144 HrbHLHs was performed to examine their exon-intron organization (Supplementary Figure S1). Among these 144 sea buckthorn genes, 7 bHLH genes (HrbHLH26, HrbHLH48, HrbHLH52, HrbHLH58, HrbHLH63, HrbHLH67, HrbHLH109) had no introns, accounting for 5% of all HrbHLH genes. All of these intronless genes clustered into group G3. In terms of intron number and exon length, phylogenetically related sea buckthorn bHLH genes exhibited closely related gene structures.

3D Homology Modeling of HrbHLH Proteins
To construct 3D homology models for the bHLH proteins, a BLASTP search was performed against the PDB, and 15 HrbHLH proteins with high homology were selected. Homology modeling was carried out in Phyre2, which aligns hidden Markov models via HMM-HMM search, using the detection rate method. The intensive mode was selected to increase alignment accuracy. In addition, Phyre2 integrates a new ab initio folding simulation termed Poing to model regions of proteins without significant homology to known structures. The 3D models of the 15 selected bHLH proteins were predicted at > 90% confidence, and the percentage of residues modelled ranged from 80 to 100 (Fig. 4). The secondary structures consisted predominantly of α-helices, with β-sheets occurring rarely. The predicted structures are therefore considered highly reliable and offer a preliminary basis for understanding the molecular function of HrbHLH proteins.

Expression Patterns of HrbHLH Genes in Different Tissues
To further understand the function of sea buckthorn bHLH proteins, the expression patterns of sea buckthorn bHLH genes among 21 samples, including 5 tissues, were analyzed using normalized RPKM data from RNA-seq. Supplementary Table S2 shows the expression profiles of all bHLH genes in the 21 sea buckthorn samples. Among the 144 HrbHLH genes, 122 HrbHLH mRNAs had an RPKM value greater than 1 in at least one of the 21 samples, while the remaining 22 HrbHLH genes were expressed at very low levels in all 21 samples. Among these HrbHLH genes, HrbHLH47, HrbHLH74, HrbHLH90 and HrbHLH131 showed the highest expression levels across the root nodule, root, leaf, stem and fruit tissues.
In particular, HrbHLH9, HrbHLH42, HrbHLH69, HrbHLH74, HrbHLH75, HrbHLH85, HrbHLH90 and HrbHLH131 were constitutively expressed at a relatively high level in all 21 samples, suggesting that these bHLH genes perform a variety of functions in different tissues (Fig. 5). Furthermore, four HrbHLH genes showed preferential tissue-specific expression: one gene in root nodule (HrbHLH35), two genes in fruit (HrbHLH43 and HrbHLH58), and one gene in stem (HrbHLH63) (Fig. 6). The specific accumulation of these bHLH transcripts in a particular tissue suggests that they may play conserved regulatory roles in discrete cells, organs, or conditions.
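The two expression categories used in this section can be expressed as a simple filter: a gene counts as expressed if its RPKM exceeds 1 in at least one sample, and as constitutively expressed if it exceeds the threshold in every sample. The sketch below applies this to a toy table; the gene names and values are hypothetical and do not come from Supplementary Table S2.

```python
def classify_expression(table, threshold=1.0):
    """Split genes into 'expressed in >= 1 sample' and 'expressed in all samples'.

    table maps gene name -> list of RPKM values, one per sample.
    """
    expressed = sorted(g for g, values in table.items() if max(values) > threshold)
    constitutive = sorted(g for g, values in table.items() if min(values) > threshold)
    return expressed, constitutive


# Hypothetical RPKM values for three genes across three samples.
toy_table = {
    "geneA": [0.2, 0.5, 0.9],    # below threshold everywhere
    "geneB": [0.0, 3.1, 0.4],    # expressed in one sample only
    "geneC": [12.0, 8.5, 20.3],  # above threshold in every sample
}
expressed, constitutive = classify_expression(toy_table)
print(expressed, constitutive)
```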

Expression Profiling of HrbHLH Genes during Sea Buckthorn Fruit Development
To analyze the expression patterns of HrbHLH genes across developmental stages, we used Fragments Per Kilobase per Million (FPKM) normalized data from RNA-seq. An additional file shows the expression profiles of HrbHLH genes at different stages of sea buckthorn fruit development. Among the 144 HrbHLH genes, 131 were expressed in at least one fruit development stage, while the remaining HrbHLH genes were not expressed in sea buckthorn fruit. Based on the gene expression data, we identified 11 HrbHLH genes (HrbHLH27, HrbHLH54, HrbHLH70, HrbHLH79, HrbHLH91, HrbHLH92, HrbHLH100, HrbHLH104, HrbHLH111, HrbHLH127, HrbHLH132) that showed gradually decreasing expression during fruit development and ripening, suggesting that they may function in fruit ripening. Among them, HrbHLH54, HrbHLH91 and HrbHLH92 were highly expressed in ripe sea buckthorn fruit, and the remaining bHLH genes had low expression levels in sea buckthorn fruit (Fig. 7). Furthermore, we identified 54 HrbHLH genes that showed gradually increasing expression during fruit development and ripening. Compared with the down-regulated bHLH genes, these up-regulated bHLH genes had lower expression levels. The expression levels of 16 HrbHLH genes were confirmed with qRT-PCR (Fig. 8). Further characterization of the 11 HrbHLHs is therefore highly important and will provide new insight into the molecular mechanism of fruit development and ripening.
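One simple way to label a gene's trajectory as gradually increasing or decreasing is to require that expression changes in the same direction between every pair of consecutive stages. The sketch below implements that rule as an assumption; the exact criterion used to call the trends reported here is not specified, and the FPKM series are hypothetical.

```python
def expression_trend(values, tol=0.0):
    """Label a stage-ordered expression series as increasing, decreasing or mixed.

    tol is the minimum change between consecutive stages that counts as a step.
    """
    if len(values) < 2:
        return "mixed"
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(step > tol for step in steps):
        return "increasing"
    if all(step < -tol for step in steps):
        return "decreasing"
    return "mixed"


# Hypothetical FPKM values at three consecutive fruit development stages.
print(expression_trend([1.2, 2.5, 5.1]))
print(expression_trend([9.0, 4.3, 1.1]))
print(expression_trend([1.0, 3.0, 2.0]))
```

A stricter analysis would also test whether the changes are statistically significant across replicates rather than relying on direction alone.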

Discussion
Sea buckthorn is an economically important species of the Elaeagnaceae family [29]. Its fruit is rich in nutrients, provides essential amino acids and has antioxidant activity [30]. With the completion of the sea buckthorn genome sequence, transcription factor gene families can now be identified and characterized at the whole-genome level. Studying the transcription factor families of sea buckthorn is of great significance for breeding excellent cultivars and improving fruit quality. However, no such detailed study has been carried out on the bHLH family in sea buckthorn.
The bHLH superfamily is the second-largest TF family across eukaryotic kingdoms [3]. The highly conserved bHLH domain comprises two functional regions: the basic region and the HLH region. The basic region is a DNA-binding interface located at the N-terminal end of the domain, whereas the HLH region serves as the dimerization domain and consists of two amphipathic α-helices linked by a loop [32]. Apart from the highly conserved bHLH domain, some other motifs also occur within the bHLH superfamily. Based on the highly conserved domains and their evolutionary relationships, this superfamily is mainly divided into 20-25 subfamilies in agronomically important plants. Extensive studies of bHLH superfamily genes have been reported in plant species for which genome sequences are available, including cucumber [5], Arabidopsis [7], apple [10], tomato [24] and Carthamus tinctorius [32]. Additionally, functional and structural characterizations of most plant bHLH proteins have been described in detail. These findings demonstrate that bHLH proteins are involved in various biochemical and physiological networks across the plant kingdom. Studies on bHLH proteins have revealed their involvement in iron uptake, responses to salt and drought stress, tanshinone biosynthesis, and petal growth. The most prominent studies of the bHLH family have demonstrated its association with anthocyanin biosynthesis and regulation in flowering plants and fruits [33].
Previous studies have suggested that bHLH TFs are usually divided into 25 subfamilies in many plants. In this study, we identified 144 HrbHLH genes in the sea buckthorn genome, fewer than the number of bHLH genes in the tomato and apple genomes. Based on phylogenetic analysis, predicted conserved protein motifs and intron-exon organization, all identified HrbHLHs were divided into 15 groups, which differs from other species [1,3]. These results indicate that the classification of sea buckthorn bHLHs differs from that of other plants. Furthermore, our intron-exon gene structure and conserved protein motif results strongly supported our classification of the HrbHLHs.
To further understand the function of sea buckthorn bHLH proteins, the expression patterns of sea buckthorn bHLH genes among 21 samples, including 5 tissues and 3 different developmental stages, were analyzed using normalized RPKM data from RNA-seq. Supplementary Table S1 shows the expression profiles of all bHLH genes in the 21 sea buckthorn samples. Among the 144 HrbHLH genes, 122 HrbHLH mRNAs had an RPKM value greater than 1 in at least one of the 21 samples, while the remaining 22 HrbHLH genes were expressed at very low levels in all 21 samples. Among these 122 HrbHLH genes, 46 had expression levels above 1 in all five tissues. In particular, HrbHLH131, HrbHLH9, HrbHLH42, HrbHLH69 and HrbHLH85 were constitutively expressed at a relatively high level in all 21 samples, suggesting that these five bHLH genes perform a variety of functions in different tissues [13]. Furthermore, four bHLH genes showed preferential tissue-specific expression: one gene (HrbHLH35) in leaf, two genes (HrbHLH43 and HrbHLH58) in fruit, and one gene (HrbHLH63) in stem. The specific accumulation of these bHLH transcripts in a particular tissue suggests that they may play conserved regulatory roles in discrete cells, organs, or conditions [20,34].
A growing body of research shows that bHLH genes play important roles in fruit development, including in tomato [11], peach [19], apple [20], almond [33] and Chinese jujube [34]. In this study, 91 HrbHLH mRNAs had an RPKM value greater than 1 in three fruit development stages. HrbHLH47 was highly expressed throughout fruit development. HrbHLH47 is the homolog of AtPRE5, a key factor regulating cell elongation and plant development. Homology comparison and protein interaction prediction also indicated that it might be involved in fruit enlargement [10][11]. The expression of HrbHLH90 was significantly higher at the early stage of fruit development, indicating that it too might be involved in fruit enlargement [26,33]. In addition, the expression of HrbHLH91 and HrbHLH92 was significantly higher at the ripening stage of fruit development.

Conclusions
Our study provides the first comprehensive and systematic genome-wide analysis of the sea buckthorn bHLH superfamily. A set of 144 genes was characterized and classified into 15 subfamilies. Protein motifs, compositions, and amino acid ratios were also thoroughly investigated. The gene expression analysis provides preliminary information for future investigation of the role of the sea buckthorn bHLH gene family in flavonoid metabolism. The current investigation lays the groundwork for functional characterization and confirmation of the sea buckthorn bHLH gene superfamily and improves our understanding of the bHLH gene superfamily in higher plants.

Consent to publish
Manuscript is approved by all authors for publication.

Availability of data and materials
The data used in this study were all downloaded from publicly available databases and analyzed with software. Because these data can be easily obtained, they usually do not need to be provided again to third parties. Any reader who needs the software analysis results can obtain them by contacting the corresponding author.

Figure 1 Distribution of 144 HrbHLH genes on the 12 sea buckthorn chromosomes.
Figure 8 Expression patterns of the HrbHLH candidate genes at different development stages of sea buckthorn fruit by qRT-PCR. S1-S5: five development stages of sea buckthorn fruit.
