Genome-wide investigation of CCCH zinc finger family in longan (Dimocarpus longan Lour.): characteristic identification and expression profiles in longan somatic embryo

Background: CCCH zinc finger (Znf) transcription factors (TFs), a novel type of Znf gene, regulate gene expression by binding to target mRNAs and play important roles in plant abiotic stress responses, growth and development. However, no genome-wide analysis or expression profiling of the CCCH (C3H) gene family has been reported in Dimocarpus longan, especially during the early stages of somatic embryogenesis. Longan is a tropical/subtropical fruit tree of great economic importance in Southeast Asia, and longan embryogenesis is the main factor affecting fruit quality and yield.
Results: In this study, a comprehensive analysis of the longan C3H (DlC3H) gene family was carried out. A total of 49 DlC3H genes were identified from the longan genome database and divided into 3 clades. Gene characteristics, phylogeny, gene structure and motif composition were analyzed. Analysis of alternative splicing (AS) events suggested that AS of DlC3H genes is related to the transition between longan non-embryonic and embryonic callus. Promoter analysis indicated that most DlC3H genes contain cis-elements associated with hormone and stress responses. Quantitative real-time PCR analysis indicated that 26 DlC3Hs, which possess MeJA- and ABA-responsive cis-elements, showed different expression patterns and may be involved in the ABA and MeJA signaling pathways. The expression profiles of 17 DlC3Hs were examined across four longan culture stages; only DlC3H01/07/14/16/38 were consistent with the transcriptome data. DlC3H07/14/16/36/49 were highly expressed in embryogenic callus (EC) and only DlC3H04/38 in globular embryos (GE), suggesting that they have different functions in embryonic development. Finally, sRNAs were verified to be involved in regulating 6 DlC3Hs.
Conclusion: This study provides the first systematic analysis of CCCH proteins in longan somatic embryos. In particular, CCCH genes may be involved in hormone and stress responses and in somatic embryogenesis. The results presented here may provide insight into the characteristics and functions of this family in somatic embryogenesis.


Background
Transcription factors (TFs) are widely distributed in plants and play an important role in plant growth, development and morphogenesis [1,2]. Zinc finger (Znf) transcription factors form one of the largest TF families, which includes the RING-finger [3], LIM [4], WRKY [5] and DOF [6] gene families; these regulate gene expression through DNA binding and protein binding. However, recent evidence suggests that CCCH (C3H) proteins, a novel type of Znf transcription factor, regulate gene expression by binding to the mRNA of their target genes [7][8][9]. Genome-based comprehensive analyses of the C3H Znf family have been performed in the dicotyledons Arabidopsis thaliana [9], Medicago truncatula [10], Populus trichocarpa [11], Clementine mandarin [12], Vitis vinifera [13] and Cicer arietinum [14], and in the monocotyledons Musa acuminata [15], Zea mays [16] and Oryza sativa [9]. These studies showed that the C3H Znf family is widely involved in biotic and abiotic stress pathways. C3H Znf members also have various functions. In Arabidopsis, ATSZF1 and ATSZF2 negatively regulate the expression of salt-responsive genes [17]. Cotton GHZFP1 can be induced by salt, drought and salicylic acid (SA), and transgenic tobacco expressing it shows resistance to fungal diseases [18]. Rice OsDOS controls leaf senescence through the jasmonate (JA) pathway [19]. Arabidopsis HUA1 is a confirmed regulator of flower morphogenesis [20]. Moreover, C3H Znf genes are essential regulators of plant somatic embryogenesis. Previous studies have reported that PEI1, an embryo-specific C3H Znf gene, directly regulates heart-shaped embryo development in Arabidopsis [21]. Cucumber CsSEF1 is important in controlling cell polarity and marks the cotyledon primordia and procambium tissues in later developmental stages [22]. These studies indicate that the C3H Znf family plays a significant role in plant abiotic stress responses, growth and development, and somatic embryo morphogenesis.
Sapindaceae plants are widely distributed in tropical and subtropical areas and include important tropical fruit trees such as longan, lychee and Nephelium lappaceum, as well as well-known oil plants such as Sapindus mukorossi and Xanthoceras sorbifolium. To date, draft genome sequences of many plants have been reported, which has greatly promoted research on those plants. Longan is an important fruit tree, and its fruit quality is closely tied to its economic value. Embryonic development is one of the main factors regulating longan fruit quality. Therefore, understanding the mechanism of longan embryonic development is critical for improving longan fruit quality.
Although CCCH Znf genes are of great significance in plants, no comprehensive analysis of the CCCH gene family in plant embryos has been performed. The longan genome sequence provides an opportunity to reveal the function of this family at the genome-wide level [23]. In this study, 49 DlC3Hs were identified from the longan genome database. We further analyzed their gene characteristics, phylogenetic relationships, gene structure, motif composition, alternative splicing events and promoter cis-elements. Additionally, the expression profiles of 26 DlC3Hs were determined by RT-qPCR to explore their responses to methyl jasmonate (MeJA), abscisic acid (ABA) and inhibitors of the endogenous hormones (salicylhydroxamic acid, SHAM, for MeJA; sodium tungstate dihydrate, STD, for ABA). According to the transcriptome data, 17 DlC3Hs were selected to analyze expression levels in longan embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), globular embryos (GE) and non-embryonic callus (NEC), and their cleavage sites were verified. Our preliminary results may provide valuable clues for studying the function of the CCCH Znf gene family in plant embryonic development.

Results
Analysis of the characteristics of the longan C3H gene family
According to InterPro annotation files, 68 candidate CCCH Znf family members were found in the longan genome database. BLASTP and CD-Search were then performed, and a total of 49 non-redundant CCCH Znf genes were confirmed in longan and named DlC3H01 to DlC3H49. Gene characteristics, including the Arabidopsis ortholog locus, number of exons, CDS length, molecular weight (kD), isoelectric point (pI), number of CCCH motifs and subcellular localization, are shown in Table 1. Among the 49 DlC3H proteins, DlC3H41 was the smallest with 136 amino acids, whereas DlC3H27 was the largest with 1811 amino acids. The number of exons ranged from 1 to 14, the molecular weight ranged from 14.46 kD (DlC3H41) to 198.20 kD (DlC3H27), and the pI ranged from 4.90 (DlC3H33) to 9.50 (DlC3H28). The number of CCCH motifs per DlC3H member ranged from 1 to 6, the same as in Arabidopsis and rice. Finally, subcellular localization prediction showed that 9 DlC3H members were located in the cytoplasm and 35 in the nucleus, and the rest were secreted proteins.
Phylogenetic analysis and conserved motif multiple sequence alignment
A phylogenetic tree of longan and Arabidopsis C3H proteins was constructed by the maximum likelihood (ML) method based on full-length protein sequences. The phylogenetic analysis showed that the DlC3H and AtC3H gene families were divided into 3 clades containing 21, 39 and 55 members, respectively (Figure 1). DlC3Hs had 9, 19 and 21 members in the three clades, respectively. In the first clade, four AtC3H members were not classified, whereas all longan members were classified. These results indicated that DlC3Hs had three different evolutionary directions. For example, AtC3H51 (PEI1), a key protein for plant embryogenesis, clustered with DlC3H01, suggesting that they have similar functions. The longan CCCH Znf domains were then aligned according to the phylogenetic tree, with one AtC3H from each clade (AtC3H01, AtC3H51 and AtC3H08) selected as a representative. The results showed that longan CCCH Znf domain sequences were highly conserved within each clade, with lengths ranging from 19 to 27 amino acids (Figure 2). Most domains belonged to the C-X8-C-X5-C-X3-H and C-X7-C-X5-C-X3-H types, suggesting that these two types evolved in parallel. Besides, the conservation of clade  was the worst. There were three different domain types in clade , belonging to DlC3H15-1 (C-X9-C-X5-C-X3-H), DlC3H21 (C-X7-C-X4-C-X3-H) and DlC3H25 (C-X9-C-X4-C-X3-H).
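The domain types named above can be recognised with a simple pattern search. The following is a minimal sketch, not the procedure used in this study: it assumes a regular expression that covers only the spacings mentioned in the text (X7 to X9, then X4 to X5, then X3), and the function name and toy sequence are hypothetical.

```python
import re

# Covers only the spacings named in the text: X7-X9, then X4-X5, then X3.
# A lookahead is used so that overlapping candidates are not skipped.
CCCH_PATTERN = re.compile(r"(?=(C(.{7,9})C(.{4,5})C(.{3})H))")


def find_ccch_motifs(protein):
    """Return (start, motif_type, matched_text) for each candidate CCCH motif."""
    hits = []
    for match in CCCH_PATTERN.finditer(protein.upper()):
        matched, x1, x2, x3 = match.groups()
        motif_type = f"C-X{len(x1)}-C-X{len(x2)}-C-X{len(x3)}-H"
        hits.append((match.start(), motif_type, matched))
    return hits


# Toy sequence for illustration only (not a real DlC3H protein).
toy = "MA" + "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H" + "GG"
for start, motif_type, matched in find_ccch_motifs(toy):
    print(start, motif_type, matched)
```

A pattern match of this kind only flags candidates; a domain database search such as the CD-Search step described in this paper remains the reference for confirming a CCCH domain.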
Gene structure and motif composition of DlC3Hs
The introns and exons of all 49 DlC3Hs were identified to better understand the evolution of DlC3Hs. As shown in Figure 3B, the number of exons among the 49 DlC3Hs ranged from 1 to 14 (eight genes with one exon, seven with two, six each with three and four, two with five, one with six, nine with seven, three with eight, one with nine, three with ten, one with 11, one with 12 and one with 14). Within the same class, genes usually had the same structure, such as class , in which all members except DlC3H08 contained one intron. All class e/f members had no introns, except DlC3H26/15. Besides, within the same class, the intron structure was highly consistent. Although gene structure and intron phase agreed with the phylogenetic relationships, the differences between classes were significant.
Conserved motifs were identified with CDD. Compared with previous studies, the longan C3H family contained the largest number of motifs, with 25 types (Figure 3C). Only one C3H domain was observed in 16 DlC3Hs; the remaining genes all possessed 2 to 5 domains. Clustered gene pairs (DlC3H23/27, DlC3H13/20, DlC3H45/10) had consistent motif composition, indicating functional similarity in longan. In addition, some motifs were unique to one group; for example, motif 6, motif 7 and motif 23 were specific to classes d, a and f, respectively. The differentiation of motifs between members reflected the functional diversification of DlC3Hs, although the function of these motifs needs further verification. Overall, DlC3H members with the same gene structure and motif composition clustered into one branch of the phylogenetic tree, implying that the family is highly conserved.
Analysis of the AS events of DlC3Hs in longan non-embryonic and embryonic cultures
According to RNA-seq analysis of longan NEC, EC, ICpEC and GE, the alternative splicing events of DlC3Hs were identified. A total of 445 AS events, including alternative 3' splice site (A3'S), alternative 5' splice site (A5'S), intron retention (IR) and exon skipping (ES), were detected in 29 DlC3Hs. The types and statistics of AS events in the 29 DlC3Hs are shown in Table 2. A3'S events (26.17%) were more frequent than A5'S events (18.30%), and IR events were the most frequent at 45.17% (Table 2). This result agrees with previous studies, which found IR to be the most frequent type of AS event in plants. Furthermore, the numbers of genes with A3'S, A5'S and IR events were basically the same (Table 2). In addition, as shown in Figure 4A, AS events might play a key role in longan somatic embryo morphogenesis. For example, in the EC stage, IR events decreased sharply and A3'S/A5'S events increased markedly, while ES events rose slightly in the ICpEC and GE stages. We also counted the number of AS events in longan NEC, EC, ICpEC and GE. The results showed that AS events occurred most frequently in the NEC stage and least frequently in the EC stage (Figure 4B). This result suggested that AS events in DlC3Hs are related to longan somatic embryogenesis.

Stress- and hormone-related cis-elements in DlC3H promoters
To further explore the potential regulatory mechanism of DlC3Hs under external stress, the promoter regions, defined as the 2 kb sequences upstream of the translation start sites of the DlC3H genes, were submitted to the PlantCARE database to search for cis-elements. A total of 559 cis-elements related to hormones and stress were detected in DlC3H genes (Figure 7). All genes except DlC3H07, DlC3H40 and DlC3H49 contained at least 1 anaerobic induction element. Meanwhile, drought- and low-temperature-related cis-elements were present in 25 and 16 DlC3Hs, respectively. This result showed that the DlC3H family might respond to these abiotic stresses. In addition, 36 DlC3H genes contained 164 MeJA-responsive cis-elements and 31 DlC3Hs possessed 88 abscisic acid-responsive elements, indicating that MeJA and ABA play a key role in DlC3H regulation. Furthermore, 34 auxin-responsive elements were found in 20 DlC3Hs, 38 gibberellin-responsive elements in 23 DlC3Hs, and 29 salicylic acid-responsive elements in 20 DlC3Hs. On the whole, the cis-element analysis suggested that the DlC3H family could be involved in abiotic stress and hormone responses.
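To put the hormone-related counts above on a common scale, each can be expressed as a share of the 559 detected elements. The sketch below is illustrative only: it uses just the counts quoted in this paragraph, and the function and variable names are hypothetical.

```python
# Counts quoted in the text for hormone-related elements; the total of 559
# also includes stress-related elements whose counts are not given there.
TOTAL_ELEMENTS = 559
element_counts = {
    "MeJA": 164,
    "ABA": 88,
    "auxin": 34,
    "gibberellin": 38,
    "salicylic acid": 29,
}


def percentage_share(count, total):
    """Share of one element class among all detected elements, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


shares = {
    name: percentage_share(count, TOTAL_ELEMENTS)
    for name, count in element_counts.items()
}

# Print the classes from most to least frequent.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {element_counts[name]:4d} {share:6.2f}%")
```

With these counts, MeJA-responsive elements account for roughly 29% of all detected elements, which makes them the largest hormone-related class in the list.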
Expression patterns of DlC3H genes after treatment with ABA, MeJA and inhibitors of the endogenous hormones
According to the cis-element analysis above, 26 DlC3H members possessing MeJA- and ABA-responsive cis-elements were selected from the 49 DlC3H genes. qPCR was performed to analyze their expression patterns after treatment with identical concentrations of MeJA, ABA and inhibitors of the endogenous hormones. Under ABA treatment, 10 of the 26 DlC3Hs were up-regulated, 8 were down-regulated and 8 were unchanged (Figure 6). STD is an inhibitor of endogenous ABA. Under STD treatment, 4 were up-regulated, 13 were down-regulated and 9 were unchanged (Figure 6). Some DlC3Hs showed opposite trends under ABA and STD treatment, such as DlC3H10/24/28/37/45/46 (Figure 6). Most DlC3Hs were significantly up-regulated in response to MeJA. However, under SHAM treatment, the expression of DlC3Hs was almost unchanged compared with the control. In addition, several DlC3Hs (DlC3H09/24/26/28/30/33/37/46) were up-regulated under MeJA treatment and down-regulated under SHAM treatment (Figure 7). These results imply that DlC3Hs are involved in the ABA and MeJA signaling pathways.
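The two gene lists above can be compared directly to see which members respond in opposite directions to both hormone/inhibitor pairs. The sketch below is a small illustration and not part of the original analysis; the helper `expand_gene_ids` is hypothetical and simply expands the slash shorthand used in this paper.

```python
def expand_gene_ids(shorthand, prefix="DlC3H"):
    """Expand e.g. 'DlC3H10/24/28' into ['DlC3H10', 'DlC3H24', 'DlC3H28']."""
    body = shorthand.strip()
    if not body.startswith(prefix):
        raise ValueError(f"expected shorthand to start with {prefix!r}")
    # Empty parts (from a doubled slash) are skipped.
    numbers = [part.strip() for part in body[len(prefix):].split("/") if part.strip()]
    return [f"{prefix}{number}" for number in numbers]


# Genes with opposite trends under ABA and STD, as listed in the text.
aba_vs_std = set(expand_gene_ids("DlC3H10/24/28/37/45/46"))
# Genes up-regulated by MeJA and down-regulated by SHAM, as listed in the text.
meja_vs_sham = set(expand_gene_ids("DlC3H09/24/26/28/30/33/37/46"))

shared = sorted(aba_vs_std & meja_vs_sham)
print(shared)
```

Genes that appear in both lists may be reasonable first candidates for follow-up, although the overlap by itself says nothing about mechanism.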
Expression profiling of DlC3Hs with RNA-seq and qPCR in longan non-embryonic and embryonic cultures
The expression patterns of the longan CCCH family in the longan NEC, EC, ICpEC and GE transcriptomes were investigated in this study (transcriptome data for DlC3H02, DlC3H08, DlC3H28, DlC3H29, DlC3H30 and DlC3H32 were absent). As shown in Figure 8, the expression of 43 DlC3Hs was divided into 2 groups. In group , genes were expressed at low levels in the NEC stage and at high levels from the EC stage to the GE stage, indicating that these genes are related to embryogenesis in longan somatic embryos. Moreover, 3 DlC3H genes were specific to the GE stage and 2 to the ICpEC stage. Twelve DlC3Hs were highly expressed in the NEC and EC stages and clustered in group . These results implied that genes highly expressed in a specific stage might be involved in morphogenesis at that stage.
To further confirm whether stage-specific expression of DlC3Hs could regulate longan somatic embryo morphogenesis at specific stages, 17 DlC3Hs that were highly expressed in a particular stage were selected for study. qPCR was then carried out to verify the expression patterns of these DlC3Hs in longan early somatic embryogenesis (SE). The results showed that only DlC3H01/07/14/16/38 were consistent with the transcriptome data. DlC3H05, DlC3H31, DlC3H39, DlC3H43 and DlC3H47 were down-regulated during longan SE, whereas DlC3H38 and DlC3H41 showed the reverse trend, suggesting that members of the DlC3H gene family may have different functions in embryonic development. Meanwhile, 6 DlC3Hs (DlC3H07/11/14/16/36/49) were highly expressed in EC, and most DlC3Hs were expressed at lower levels in ICpEC and GE than in NEC and EC (Figure 9).

Small RNAs involved in DlC3Hs transcription
Small RNAs play an important role in plant growth and development. These regulatory small RNAs (mainly miRNAs and ta-siRNAs, here and throughout) negatively regulate gene expression at the post-transcriptional level by directing the cleavage of target transcripts (mRNA) [30]. Li Yiqun reported that MulZF1, a zinc finger protein containing a CCCH domain, is the target gene of mul-miRn26 in Morus alba L. [31]. To understand whether DlC3Hs are regulated by sRNAs in longan, modified RLM-RACE was carried out to verify the cleavage sites of the 17 DlC3Hs that were highly expressed in a particular stage. As shown in Figure 10, fragments of 6 of the 17 DlC3Hs (DlC3H01/03/05/11/19/39) were detected. These 6 DlC3Hs had 1 to 5 cleavage sites each. The longan small RNA (sRNA) database was then used to predict the sRNAs that could cleave the 6 DlC3Hs. In total, the 14 cleavage sites of the 6 DlC3Hs were identified as putative cleavage sites for 131 sRNAs (Figure 10, Additional files 2 to 7). This implied that sRNAs could be widely involved in DlC3H pathways.
For example, the three cleavage sites of DlC3H01 could be matched by 4, 5 and 17 sRNAs, respectively. Among these 26 sRNAs, 21 had been registered in the miRBase database, suggesting that miRNAs could regulate DlC3Hs in longan somatic embryogenesis. A large number of miRNAs of animal origin were also found among these sRNAs, indicating that the C3H family might be conserved between plants and animals in terms of the principles of miRNA formation. Furthermore, the remaining 5 sRNAs had no similar sequences in the miRBase database but had reliable E values (one with 1.5, one with 2.5, three with 3.0). Thus, we speculated that they might be siRNAs or piRNAs.
We further studied the gene structure and motif composition of the longan CCCH Znf family. DlC3Hs within the same group share similar intron/exon composition, intron phase and conserved domains. This suggests that longan CCCH Znf members have been functionally conserved during evolution. In addition, there are a large number of conserved domains besides the CCCH Znf domain, which is a functional region of the protein. Fifteen conserved domains have been identified in Populus trichocarpa [11], 13 in Arabidopsis [9], 8 in Medicago truncatula [10], 6 in Vitis vinifera [13], 10 in maize [16] and 16 in Clementine mandarin [12]. There

DlC3H genes may respond to plant stress and hormones
The promoter, a non-coding region upstream of the coding gene, plays a key role in regulating gene expression.
In this study, a large number of cis-acting elements associated with biotic and abiotic stresses were identified, including MeJA (29.34%), ABA (15.20%), SA (5.19%), auxin (6.10%), GA (6.80%), drought (6.40%), anaerobic induction (27.01%) and low-temperature responsive elements. This suggests that the longan CCCH Znf family may be involved in these signaling pathways. Under MeJA and ABA treatment, longan CCCH Znf members increased or decreased their expression in response to hormone signals. These results are similar to previous findings in Arabidopsis [9][43], Populus trichocarpa [11], maize [16] and Medicago truncatula [10]. Although many studies have shown that CCCH Znf genes can respond to exogenous hormone signals, the effects of endogenous hormones on their expression remain unknown. Under treatment with inhibitors of endogenous MeJA and ABA, the effect of the inhibitors on some DlC3Hs was more obvious than that of the exogenous hormones. Besides, some DlC3Hs showed opposite expression trends in response to a hormone and its endogenous inhibitor, such as DlC3H10/24/28/37/45/46 for ABA and STD, and DlC3H09/24/26/28/30/33/37/46 for MeJA and SHAM. These results demonstrate the importance of the CCCH Znf transcription factor family in plant stress and hormone responses.

Potential roles of DlC3H genes during plant somatic embryogenesis
A comprehensive analysis of the CCCH Znf family in plant somatic embryos has not previously been reported. Combining longan transcriptome data with RT-qPCR analysis, only DlC3H01/07/14/16/38 were consistent with the transcriptome data. DlC3H07/11/14/16/19/36/38/49 were highly expressed in a single stage, suggesting that these members may participate in specific stages of longan somatic embryo morphogenesis. Most of them were highly expressed in EC, indicating that these DlC3Hs play an important role in the formation of longan embryogenic cells. Some DlC3Hs were up-regulated (DlC3H38/41) or down-regulated (DlC3H05/31/39/43/47) from NEC to GE, indicating that up- or down-regulation of these genes may promote the formation and differentiation of longan somatic cells. These results are similar to the functions of AtPEI1 [21] and CsSEF1 [22] in somatic embryogenesis. Moreover, sRNAs are endogenous non-coding small-molecule regulators, and many studies have shown that sRNAs are important for longan somatic embryogenesis [44][45][46]. While DlC3Hs are involved in plant growth and stress responses, they are also regulated by sRNAs: 14 cleavage sites in 6 DlC3Hs may be targeted by 131 putative sRNAs. Moreover, the large number of animal-derived sRNAs also reflects the conservation of the CCCH Znf family between animals and plants.

Conclusions
In conclusion, 49 DlC3H genes were identified in the longan genome and divided into 3 clades. The results of a comprehensive analysis demonstrate the importance of CCCH zinc finger genes in the regulation of plant somatic differentiation and in responses to hormones and stresses. A systematic and comprehensive analysis of the longan CCCH Znf family will aid further screening of DlC3Hs for functional characterization, and may contribute to improving longan fruit quality and breeding for stress resistance.

Plant materials and treatments
The 'HHZ' longan friable embryogenic callus (EC) preserved by the Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, was used in this experiment. For hormone treatments, the EC was

Gene sequence characteristics analysis
The ExPASy website (https://web.expasy.org/protparam/) was used to determine the sequence length, molecular weight and isoelectric point of longan CCCH protein members. Local BLAST was performed to predict homologs of longan CCCH genes in the Arabidopsis genome database. In addition, the subcellular location of longan CCCH proteins was predicted with LocTree3 (https://rostlab.org/services/loctree3/). The numbers of exons and CCCH Znf domains were obtained from the longan genome database and NCBI Conserved Domain Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Data on alternative splicing (AS) events of the 49 DlC3Hs were extracted from the longan non-embryonic callus, embryonic callus, incomplete compact pro-embryogenic culture and globular embryo transcriptomes (SRA050205).

Phylogenetic analysis and multiple alignment of CCCH domain
The CCCH protein sequences of Arabidopsis were downloaded from the PlantTFDB database (http://planttfdb.cbi.pku.edu.cn/). All protein sequences were aligned and a maximum likelihood (ML) phylogenetic tree was constructed with BioEdit software using default parameters and 1000 bootstrap replicates.
The conserved domain amino acid sequences of longan CCCH members and the selected Arabidopsis members were aligned with GeneDoc.

Cis-element analysis of DlC3H promoters
The 2 kb sequences upstream of the DlC3H coding sequences (CDS) were obtained with the TBtools Gtf/Gff3 sequence extractor. Undetermined bases (N) found in the promoters of DlC3H01, DlC3H12, DlC3H28 and DlC3H30 were deleted. The sequences were then submitted to the PlantCARE database to predict cis-elements.
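The promoter extraction step can be sketched in a few lines. This is not the TBtools implementation: it is a minimal illustration that assumes GFF3-style 1-based inclusive CDS coordinates and a chromosome held as a plain string, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string; N stays N."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bases upstream of the CDS, 5'->3' on the gene strand.

    cds_start and cds_end are 1-based inclusive coordinates (assumed GFF3 style).
    """
    if strand == "+":
        begin = max(0, cds_start - 1 - length)
        return chrom_seq[begin:cds_start - 1]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher plus-strand coordinates.
        return reverse_complement(chrom_seq[cds_end:cds_end + length])
    raise ValueError("strand must be '+' or '-'")


def strip_undetermined(seq):
    """Delete undetermined bases (N) before submitting a promoter sequence."""
    return seq.replace("N", "").replace("n", "")


# Toy chromosome for illustration only.
chrom = "AAACCCGGGTTTNACGT"
print(upstream_region(chrom, 10, 12, "+", length=5))
print(strip_undetermined(upstream_region(chrom, 4, 9, "-", length=5)))
```

Deleting N bases shortens the sequence and shifts the positions of downstream elements relative to the start codon, so reported cis-element coordinates for the four affected promoters should be read with that in mind.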

RNA extraction and expression levels analyses of DlC3Hs
Total RNA was extracted with the TRIzol Reagent kit according to the manufacturer's protocol. cDNA for quantitative PCR was synthesized with the PrimeScript RT Reagent Kit (Takara). Quantitative PCR was performed on a Roche LightCycler 480 instrument using SYBR Premix Ex Taq™ (Takara). Each 20 µL qPCR reaction contained 10 µL SYBR Premix Ex Taq™, 6.4 µL ddH2O, 1 µL of each primer (10 µM) and 2 µL cDNA. To obtain reliable results, three biological replicates and three technical replicates were performed. The reference genes FSD, EF-1α and EIF-4α [27, 28] were used as internal controls. Relative expression of DlC3Hs was calculated by the 2^-ΔCt method, and results are shown as mean and standard deviation (SD). All primers used in this study are listed in Additional file 1.
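The 2^-ΔCt calculation can be written out as follows. This is a minimal sketch under stated assumptions, not the authors' script: ΔCt is taken as the target Ct minus the arithmetic mean Ct of the three reference genes, each Ct is assumed to be already averaged over technical replicates, and all Ct values shown are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(target_ct, reference_cts):
    """2^-dCt, with dCt = Ct(target) - mean Ct of the reference genes (assumed)."""
    if not reference_cts:
        raise ValueError("at least one reference Ct is required")
    delta_ct = target_ct - mean(reference_cts)
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for one DlC3H gene in three biological replicates.
replicates = [
    {"target": 24.0, "refs": {"FSD": 20.0, "EF-1a": 21.0, "EIF-4a": 22.0}},
    {"target": 24.5, "refs": {"FSD": 20.5, "EF-1a": 21.0, "EIF-4a": 22.0}},
    {"target": 23.8, "refs": {"FSD": 19.9, "EF-1a": 21.2, "EIF-4a": 21.8}},
]

values = [
    relative_expression(rep["target"], list(rep["refs"].values()))
    for rep in replicates
]
print(f"mean = {mean(values):.4f}, SD = {stdev(values):.4f}")
```

Averaging reference Ct values arithmetically is equivalent to normalising against the geometric mean of the reference quantities, which is one common way to combine several internal controls.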

Small RNA cleavage verification of a subset of DlC3Hs
Small RNAs can regulate gene expression by directing the cleavage of target transcripts. To understand whether DlC3Hs are regulated by sRNAs in longan, 17 DlC3Hs that were highly expressed in a particular stage were chosen for cleavage site verification. A mixture of total RNA from longan EC, ICpEC and GE was used to synthesize cDNA for modified RLM-RACE following the GeneRacer™ Kit instructions. Two gene-specific primers for modified RLM-RACE were designed with DNAMAN. Then, the potential cleavage sites of a subset of DlC3Hs were predicted with psRNATarget against the longan sRNA database [29] using default parameters and a maximum expectation value of 3.5 (except DlC3H36). All primers used in this study are listed in Additional file 1.
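The expectation cutoff and the per-site sRNA counts reported in this paper can be illustrated with a small filter. The records below are hypothetical and the tuple layout is an assumption made for this sketch; it does not reproduce the psRNATarget output format.

```python
from collections import defaultdict

MAX_EXPECTATION = 3.5  # cutoff stated in the text

# Hypothetical predictions: (sRNA id, target gene, cleavage position, expectation).
predictions = [
    ("sRNA_a", "DlC3H01", 512, 1.5),
    ("sRNA_b", "DlC3H01", 512, 3.0),
    ("sRNA_c", "DlC3H01", 880, 4.0),
    ("sRNA_d", "DlC3H05", 301, 2.5),
]


def srnas_per_site(records, max_expectation=MAX_EXPECTATION):
    """Group sRNAs by (gene, cleavage position), keeping hits within the cutoff."""
    sites = defaultdict(list)
    for srna_id, gene, position, expectation in records:
        if expectation <= max_expectation:
            sites[(gene, position)].append(srna_id)
    return dict(sites)


for (gene, position), srnas in sorted(srnas_per_site(predictions).items()):
    print(gene, position, len(srnas), srnas)
```

Counting sRNAs per experimentally detected cleavage site in this way gives the kind of summary reported for DlC3H01, where each site is associated with several candidate sRNAs.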

Consent for publication
Not applicable.

Availability of data and materials
All data presented in this study are provided either in the manuscript or additional files.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
ZXL and YLL designed and coordinated the research, and helped to draft the manuscript. YLS participated in its design, carried out the experimental work and wrote the manuscript. MQJ helped to draft the manuscript. SQH, XDX and XL prepared the materials. YLL revised the paper. All authors read and approved the final version of the manuscript. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Figure 1 The phylogenetic tree of C3H genes from longan and Arabidopsis.
Figure 2 The multiple alignment of DlC3Hs and selected AtC3Hs ZF-CCCH domain amino acid sequences.