Characteristics of the longan C3H gene family
Based on InterPro annotation files, 68 candidate CCCH zinc finger (Znf) family members were found in the longan genome database. After screening with the BLASTP program and CD-Search, a total of 49 non-redundant CCCH Znf genes were confirmed in longan and named DlC3H01 to DlC3H49. Gene characteristics, including the Arabidopsis ortholog locus, number of exons, CDS length, molecular weight (kD), isoelectric point (pI), number of CCCH motifs and subcellular localization, are shown in Table 1. Among the 49 DlC3H genes, DlC3H41 encoded the smallest protein (136 amino acids), whereas DlC3H27 encoded the largest (1811 amino acids). The number of exons ranged from 1 to 14, the molecular weight ranged from 14.46 kD (DlC3H41) to 198.20 kD (DlC3H27), and the pI ranged from 4.90 (DlC3H33) to 9.50 (DlC3H28). In addition, the number of CCCH motifs in DlC3H family members ranged from 1 to 6, the same range as in Arabidopsis and rice. Finally, the subcellular localization results showed that 9 DlC3H members were located in the cytoplasm, 35 were located in the nucleus, and the remaining members were secreted proteins.
Phylogenetic analysis and multiple sequence alignment of conserved motifs
A phylogenetic tree of longan and Arabidopsis C3H proteins was constructed by the maximum likelihood (ML) method based on full-length protein sequences. The phylogenetic analysis showed that the DlC3H and AtC3H gene families were divided into 3 clades containing 21, 39 and 55 members, respectively (Figure 1). DlC3Hs had 9, 19 and 21 members in the three clades, respectively. In the first clade, four AtC3H members were not classified, whereas all longan members were classified. These results indicated that DlC3Hs had three different evolutionary directions. For example, AtC3H51 (PEI1), a key protein for plant embryogenesis, clustered with DlC3H01, suggesting that the two may have similar functions. The longan CCCH Znf domains were then subjected to multiple sequence alignment according to the phylogenetic tree, with one AtC3H from each clade (AtC3H01, AtC3H51 and AtC3H08) selected as a representative. The results showed that the longan CCCH Znf domain sequences were highly conserved within each clade, with lengths ranging from 19 to 27 amino acids (Figure 2). Most domains belonged to the C-X8-C-X5-C-X3-H and C-X7-C-X5-C-X3-H types, suggesting that these two types evolved in parallel. Clade Ⅱ was the least conserved: it contained three further domain types, found in DlC3H15-1 (C-X9-C-X5-C-X3-H), DlC3H21 (C-X7-C-X4-C-X3-H) and DlC3H25 (C-X9-C-X4-C-X3-H).
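The C-Xn-C-Xm-C-X3-H spacing types named above can be illustrated with a short pattern search. The following is only a minimal sketch, not the procedure used in this study: the function name `find_ccch` and the toy sequences are hypothetical, the spacer ranges (n = 7 to 9, m = 4 to 5) simply cover the types reported here, and X is taken to be any residue.

```python
import re

# Generic C-X(n)-C-X(m)-C-X3-H pattern. The spacer ranges (n = 7-9, m = 4-5)
# are an assumption chosen to cover the motif types reported in the text.
# The lookahead lets motifs that overlap each other be reported separately.
CCCH_PATTERN = re.compile(r"(?=(C(.{7,9})C(.{4,5})C.{3}H))")


def find_ccch(protein_seq):
    """Return (1-based start, motif type, matched residues) for each hit.

    Only one decomposition is reported per start position (the first one
    the regex engine finds), which is enough for a quick survey.
    """
    hits = []
    for match in CCCH_PATTERN.finditer(protein_seq.upper()):
        n = len(match.group(2))  # residues between the 1st and 2nd cysteine
        m = len(match.group(3))  # residues between the 2nd and 3rd cysteine
        motif_type = f"C-X{n}-C-X{m}-C-X3-H"
        hits.append((match.start() + 1, motif_type, match.group(1)))
    return hits


# Toy sequences (not real DlC3H proteins) built to contain one motif each.
toy_x8_x5 = "MA" + "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H" + "GG"
toy_x7_x4 = "C" + "K" * 7 + "C" + "K" * 4 + "C" + "K" * 3 + "H"

for name, seq in [("toy_x8_x5", toy_x8_x5), ("toy_x7_x4", toy_x7_x4)]:
    for start, motif_type, residues in find_ccch(seq):
        print(name, start, motif_type, residues)
```

A pattern search of this kind only flags candidate spacings; domain boundaries in the alignment (Figure 2) would still need to be confirmed against a curated domain database.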
Gene structure and motif composition of DlC3Hs
The introns and exons of all 49 DlC3Hs were identified to better understand the evolution of the family. As shown in Figure 3B, the number of exons ranged from 1 to 14 (eight genes with one exon, seven with two, six each with three and four, two with five, one with six, nine with seven, three with eight, one with nine, three with ten, one with 11, one with 12 and one with 14). Genes in the same class usually had the same structure: for example, all class Ⅰ members except DlC3H08 contained one intron, and all class Ⅱe/f members except DlC3H26/15 had no introns. Intron structure was also highly consistent within each class. Although gene structure and intron phase were consistent with the phylogenetic relationships, the differences between classes were significant.
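As a quick consistency check on the exon-count distribution listed above, the tally can be summed to confirm that it accounts for all 49 genes. The sketch below assumes that the entry for three and four exons means six genes with three exons and six genes with four; the variable names are illustrative only.

```python
# Exon-count distribution as listed in the text:
# {number of exons per gene: number of DlC3H genes}.
# Assumption: six genes have three exons and six genes have four exons.
exon_distribution = {
    1: 8, 2: 7, 3: 6, 4: 6, 5: 2, 6: 1, 7: 9,
    8: 3, 9: 1, 10: 3, 11: 1, 12: 1, 14: 1,
}

# Number of genes covered by the tally.
total_genes = sum(exon_distribution.values())

# Total number of exons across the family under the same assumption.
total_exons = sum(exons * genes for exons, genes in exon_distribution.items())

# Single-exon genes are the intron-free members of the family.
intronless_genes = exon_distribution[1]

print("genes covered by the tally:", total_genes)
print("total exons:", total_exons)
print("genes without introns:", intronless_genes)
```

Under this reading the tally covers exactly 49 genes, which matches the family size reported above.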
Conserved motifs were identified with CDD. Compared with previous studies, the longan C3H family contained the largest number of motifs, with 25 types (Figure 3C). Only one C3H domain was observed in 16 DlC3Hs, and the remaining genes all possessed 2 to 5 domains. The clustered gene pairs (DlC3H23/27, DlC3H13/20, DlC3H45/10) had consistent motif compositions, indicating functional similarity in longan. In addition, some motifs were unique to one group; for example, motif 6, motif 7 and motif 23 were specific to classes Ⅲd, Ⅱa and Ⅱf, respectively. The differentiation of motifs among members reflected the functional diversification of DlC3Hs, although the functions of these motifs need further verification. Overall, DlC3H members with the same gene structure and motif composition clustered into the same branch of the phylogenetic tree, implying that the family is highly conserved.
Analysis of AS events of DlC3Hs in longan non-embryonic and embryonic cultures
Alternative splicing (AS) events of DlC3Hs were identified from the RNA-seq data of longan NEC, EC, ICpEC and GE. A total of 445 AS events, including alternative 3’ splice site (A3’S), alternative 5’ splice site (A5’S), intron retention (IR) and exon skipping (ES), were detected in 29 DlC3Hs. The types and statistics of AS events in these 29 DlC3Hs are shown in Table 2. A3’S events (26.17%) were more frequent than A5’S events (18.30%), and IR events were the most frequent (45.17%) (Table 2). This result is consistent with previous studies, which found IR to be the most frequent type of AS event in plants. Furthermore, the numbers of genes with A3’S, A5’S and IR events were basically the same (Table 2). In addition, as shown in Figure 4A, AS events might play a key role in longan somatic embryo morphogenesis. For example, at the EC stage, IR events decreased sharply and A3’S/A5’S events increased markedly, while ES events rose slightly at the ICpEC and GE stages. We also counted the number of AS events at the NEC, EC, ICpEC and GE stages. The results showed that AS events occurred most frequently at the NEC stage and least frequently at the EC stage (Figure 4B). This result suggested that AS events in DlC3Hs are related to longan somatic embryogenesis.
Stress- and hormone-related cis-elements in DlC3H promoters
To further explore the potential regulatory mechanisms of DlC3Hs under external stress, the promoter regions, defined as the 2 kb sequences upstream of the translation start sites of the DlC3H genes, were submitted to the PlantCARE database to search for cis-elements. A total of 559 cis-elements related to hormones and stress were detected in the DlC3H genes (Figure 7). All genes except DlC3H07, DlC3H40 and DlC3H49 contained at least one anaerobic induction element. Drought- and low-temperature-related cis-elements were present in 25 and 16 DlC3Hs, respectively, suggesting that the DlC3H family might respond to these abiotic stresses. In addition, 36 DlC3H genes contained 164 MeJA-responsive cis-elements and 31 DlC3Hs possessed 88 abscisic acid (ABA)-responsive elements, indicating that MeJA and ABA play key roles in the regulation of DlC3Hs. Furthermore, 34 auxin-responsive elements were found in 20 DlC3Hs, 38 gibberellin-responsive elements in 23 DlC3Hs, and 29 salicylic acid-responsive elements in 20 DlC3Hs. On the whole, the cis-element analysis suggested that the DlC3H family could be involved in abiotic stress and hormone responses.
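The promoter definition used here (2 kb upstream of the translation start site) can be made concrete with a small extraction routine. This is a hedged sketch only: the function names, the 1-based coordinate convention and the toy sequences are assumptions for illustration, and the text does not state which tool was actually used to extract the sequences submitted to PlantCARE.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (other symbols such as N are kept)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def upstream_promoter(chrom_seq, atg_pos, strand, length=2000):
    """Return up to `length` bases upstream of a translation start site.

    atg_pos is the 1-based forward-strand coordinate of the first base (A)
    of the start codon. For '-' strand genes that base is the highest
    coordinate of the codon, and the promoter lies at higher coordinates.
    The result is always given 5'->3' on the gene's own strand and is
    shorter than `length` if the chromosome end is reached.
    """
    if strand == "+":
        start = max(0, atg_pos - 1 - length)
        return chrom_seq[start:atg_pos - 1]
    if strand == "-":
        end = min(len(chrom_seq), atg_pos + length)
        return reverse_complement(chrom_seq[atg_pos:end])
    raise ValueError("strand must be '+' or '-'")


# Toy examples (hypothetical sequences, with a 4 bp window for readability).
plus_chrom = "GGGGG" + "TTTT" + "ATG" + "CCCCC"    # ATG starts at position 10
minus_chrom = "CCCCC" + "CAT" + "AACC" + "GGGGG"   # ATG on '-' strand, A at 8

print(upstream_promoter(plus_chrom, 10, "+", length=4))
print(upstream_promoter(minus_chrom, 8, "-", length=4))
```

Returning the sequence on the gene's own strand keeps cis-element positions comparable between genes regardless of orientation.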
Expression patterns of DlC3H genes after treatment with ABA, MeJA and their endogenous inhibitors
Based on the cis-element analysis above, 26 DlC3H members possessing MeJA- and ABA-responsive cis-elements were selected from the 49 DlC3H genes. qPCR was performed to analyze their expression patterns after treatment with identical concentrations of MeJA, ABA and their endogenous inhibitors. Under ABA treatment, 10 of the 26 DlC3Hs were up-regulated, 8 were down-regulated and 8 showed no change (Figure 6). STD is an inhibitor of endogenous ABA. Under STD treatment, 4 of the 26 DlC3Hs were up-regulated, 13 were down-regulated and 9 showed no change (Figure 6). Some DlC3Hs, such as DlC3H10/24/28/37/45/46, showed opposite trends under ABA and STD treatment (Figure 6). Most DlC3Hs were significantly up-regulated in response to MeJA. However, under SHAM treatment, the expression of DlC3Hs was almost unchanged compared with the control. In addition, several DlC3Hs (DlC3H09/24/26/28/30/33/37/46) were up-regulated under MeJA treatment and down-regulated under SHAM treatment (Figure 7). These results implied that DlC3Hs are involved in the ABA and MeJA signaling pathways.
Expression profiling of DlC3Hs with RNA-seq and qPCR in longan non-embryonic and embryonic cultures
The expression patterns of the longan CCCH family in the NEC, EC, ICpEC and GE transcriptomes were investigated (transcriptome data for DlC3H02, DlC3H08, DlC3H28, DlC3H29, DlC3H30 and DlC3H32 were absent). As shown in Figure 8, the 43 DlC3Hs were divided into 2 groups according to their expression. Genes in group Ⅰ were expressed at low levels at the NEC stage and at high levels from the EC stage to the GE stage, indicating that these genes are related to the embryonic stages of longan somatic embryogenesis. Moreover, 3 DlC3H genes were specifically expressed at the GE stage and 2 at the ICpEC stage. Twelve DlC3Hs that were highly expressed at the NEC and EC stages clustered in group Ⅱ. These results implied that genes highly expressed at a specific stage might be involved in morphogenesis at that stage.
To further examine whether the stage-specific expression of DlC3Hs could regulate longan somatic embryo morphogenesis at particular stages, 17 DlC3Hs that were highly expressed at a specific stage were selected for study. qPCR was then carried out to verify the expression patterns of these DlC3Hs during longan early SE. The results showed that only DlC3H01/07/14/16/38 were consistent with the transcriptome data. DlC3H05, DlC3H31, DlC3H39, DlC3H43 and DlC3H47 were down-regulated during longan SE, whereas DlC3H38 and DlC3H41 showed the reverse trend, suggesting that members of the DlC3H gene family may have different functions in embryonic development. Meanwhile, 6 DlC3Hs (DlC3H07/11/14/16/36/49) were highly expressed in EC, and most DlC3Hs were expressed at lower levels in ICpEC and GE than in NEC and EC (Figure 9).
Small RNAs involved in DlC3Hs transcription
Small RNAs (sRNAs) play an important role in plant growth and development. These regulatory sRNAs (mainly miRNAs and ta-siRNAs) negatively regulate gene expression at the post-transcriptional level by directing the cleavage of target transcripts (mRNAs) [30]. Li Yiqun reported that MulZF1, a zinc finger protein containing a CCCH domain, is the target gene of mul-miRn26 in Morus alba L. [31]. To understand whether DlC3Hs are regulated by sRNAs in longan, modified RLM-RACE was carried out to verify the cleavage sites of the 17 DlC3Hs that were highly expressed at a specific stage. As shown in Figure 10, cleavage fragments were detected for 6 of the 17 DlC3Hs (DlC3H01/03/05/11/19/39), and these 6 DlC3Hs had 1 to 5 cleavage sites each. The longan sRNA database was then used to predict the potential sRNAs that could cleave the 6 DlC3Hs. The results showed that the 14 cleavage sites of the 6 DlC3Hs were putative cleavage sites for 131 sRNAs (Figure 10, Additional files 2 to 7). This implied that sRNAs could be widely involved in DlC3H pathways. For example, the three cleavage sites of DlC3H01 could be targeted by 4, 5 and 17 sRNAs, respectively. Among these 26 sRNAs, 21 had been registered in the miRBase database, suggesting that miRNAs could regulate DlC3Hs during longan somatic embryogenesis. A large number of miRNAs of animal origin were also found among these sRNAs, indicating that the C3H family might be conserved between plants and animals in terms of the principle of miRNA formation. Furthermore, the remaining 5 sRNAs had no similar sequences in the miRBase database, and they had reliable E values (one with 1.5, one with 2.5 and three with 3.0). Thus, we speculated that they might be siRNAs or piRNAs.