Basic Analysis of Calcium-dependent Protein Kinase Gene and Its Closely Related Gene Families in Solanum pennellii Genome

Calcium-dependent protein kinases (CDPKs) are the main Ca2+ sensors involved in the regulation of plant growth and development and in various stress responses. In this study, we identified 32 CDPK (SpCDPK) genes and 7 CDPK-related protein kinase (SpCRK) genes in the whole genome of Solanum pennellii, which were unevenly distributed on 12 chromosomes. Both the SpCDPK and SpCRK proteins contain an ATP-binding region and a Ser/Thr protein kinase region; the SpCDPK proteins also have EF-hand calcium-binding regions, which the SpCRK proteins lack. Phylogenetic analysis showed that the SpCDPK and SpCRK gene families could be divided into four subgroups, and that the Solanum pennellii proteins were more closely related to those of Arabidopsis thaliana than to those of rice. Further analysis revealed that the exon-intron structure and conserved motifs within each subgroup were basically the same, but that the cis-acting elements differed. This preliminary analysis of the SpCDPK and SpCRK gene families in Solanum pennellii provides basic data for further exploration of their molecular mechanisms. The physicochemical properties, subcellular localization and palmitoylation sites of the SpCDPK and SpCRK genes were analyzed; the Prosite tool of ExPASy was used to retrieve the EF-hand calcium-binding domain and protein kinase regions, and MapGene2Chrom was used to map the positions of the genes on the chromosomes.


Identification and biochemical characteristics of CDPK and CRK genes in Solanum pennellii
In this study, 39 non-redundant sequences were collected from the whole genome of Solanum pennellii, comprising 32 SpCDPK and 7 SpCRK genes. The SpCRK and SpCDPK proteins were designated SpCRK1-SpCRK7 and SpCDPK1-SpCDPK32, respectively, based on their accession locus. Analysis of the physicochemical properties showed that the SpCDPK proteins ranged from 511 to 607 aa in length, from 55.9 to 68.0 kDa in molecular weight and from 5.0 to 9.0 in isoelectric point. The corresponding ranges for the SpCRK proteins were 501-607 aa, 56.4-68.2 kDa and 6.0-9.3, respectively (Table 1).
The feature domain analysis showed that the SpCRK proteins carried the ATP-binding region and the Ser/Thr protein kinase region but lacked the EF-hand calcium-binding domain. All SpCDPK proteins except SpCDPK3, SpCDPK5, SpCDPK27 and SpCDPK31 contained the ATP-binding region, the Ser/Thr protein kinase region and four EF-hand calcium-binding domains. Subcellular localization analysis predicted that the proteins were located in different organelles (Table 1).
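The molecular weights reported above can be sanity-checked from sequence alone. The following is a minimal sketch, assuming the standard average residue masses and one water molecule per chain; the function name is hypothetical, and this is not necessarily how the values in Table 1 were computed.

```python
# Average residue masses (Da) for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Approximate protein molecular weight in kDa.

    Raises KeyError for characters outside the 20 standard residues.
    """
    total = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    return (total + WATER_MASS) / 1000.0


if __name__ == "__main__":
    # Glycylglycine, used here only as a small check of the arithmetic.
    print(f"GG: {molecular_weight_kda('GG') * 1000:.2f} Da")
```

With an average residue mass of roughly 110 Da, a protein of 511 aa comes out near 56 kDa, which is in line with the lower bound reported for the SpCDPK proteins.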

Protein sequence comparison of SpCRKs and SpCDPKs
To investigate the sequence characteristics of the SpCRK and SpCDPK proteins, we performed multiple sequence alignments of their protein sequences (Figure 2). As shown in Figure 2a, the SpCRK and SpCDPK proteins contained protein kinase domains. Two highly conserved regions, the protein kinase ATP-binding region (LGxGxFGxTxCGxACKxIxK) and the Ser/Thr protein kinase region (VxHDRLKPENFLx), were present in the protein kinase domain.
Analysis of the amino acid sequence of the ATP-binding region showed that, compared with the SpCDPK proteins, the first amino acid in the ATP-binding region of the SpCRK proteins was mutated from L (leucine) to I (isoleucine) or V (valine). The conserved F (phenylalanine) was mutated in SpCDPK15, the T (threonine) in SpCRK3 and SpCDPK15, both C (cysteine) residues in SpCRK1-4, SpCRK6, SpCRK7, SpCDPK14, SpCDPK16 and SpCDPK17, and only one C in SpCDPK3 and SpCDPK30. Among the G (glycine) residues in the ATP-binding region, only the fourth was mutated, and this occurred in 10 genes. The last amino acid in the ATP-binding region, K (lysine), was mutated in SpCDPK22 and SpCDPK30. A few proteins also had amino acid mutations in the Ser/Thr protein kinase region: two amino acid mutations were present in this region in SpCRK5, SpCDPK14, SpCDPK16, SpCDPK17 and SpCDPK32, while the region was intact in the other proteins. Therefore, in the SpCRK and SpCDPK families, the Ser/Thr protein kinase region was more conserved than the ATP-binding region.
As shown in Figure 2b, all SpCDPKs contained the EF-hand calcium-binding domain (DxD/NxGxE), while the SpCRKs lacked it. Significantly, the second D (aspartic acid) was mutated to G in the first EF-hand calcium-binding region of SpCDPK3, SpCDPK5 and SpCDPK27, which resulted in the loss of one EF-hand calcium-binding region. The second EF-hand calcium-binding region of SpCDPK31 was also not correctly formed, although it was consistent with the characteristic sequence pattern of the EF-hand region.
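The consensus patterns quoted above (LGxGxFGxTxCGxACKxIxK and DxD/NxGxE) can be turned into regular expressions to scan a sequence for exact matches. This is a minimal sketch, assuming "x" means any residue and "D/N" means either residue; the function name and the test sequences are made up for illustration, and Prosite itself uses richer profiles than a plain regex.

```python
import re


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Convert a consensus such as 'DxD/NxGxE' into a compiled regex."""
    parts = []
    i = 0
    n = len(consensus)
    while i < n:
        alternatives = consensus[i]
        i += 1
        # Collect alternatives written with a slash, e.g. "D/N".
        while i + 1 < n and consensus[i] == "/":
            alternatives += consensus[i + 1]
            i += 2
        if alternatives == "x":
            parts.append(".")               # any residue
        elif len(alternatives) == 1:
            parts.append(alternatives)      # one fixed residue
        else:
            parts.append("[" + alternatives + "]")
    return re.compile("".join(parts))


ATP_BINDING = consensus_to_regex("LGxGxFGxTxCGxACKxIxK")
EF_HAND = consensus_to_regex("DxD/NxGxE")

if __name__ == "__main__":
    # Synthetic fragments, not real SpCDPK or SpCRK sequences.
    intact = "MSLGAGAFGATACGAACKAIAKQQDADAGAE"
    for match in EF_HAND.finditer(intact):
        print(match.start(), match.group())
```

A fragment whose first residue is I instead of L no longer matches the ATP-binding pattern, and a fragment with G in place of the second D no longer matches the EF-hand pattern, which mirrors the substitutions described for the SpCRK proteins and for SpCDPK3, SpCDPK5 and SpCDPK27.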

Phylogenetic analysis of SpCRKs and SpCDPKs
The phylogenetic relationships among the CDPK and CRK family members of Solanum pennellii (32 SpCDPK and 7 SpCRK), Arabidopsis (34 CPK and 8 CRK) and rice (29 CPK and 5 CRK) were reconstructed using the neighbor-joining method in MEGA6.0 (Figure 3). As shown in Figure 3, the CDPK and CRK gene families were divided into four groups (I, II, III and IV) of similar size. Group I contained 10 CDPK from Solanum pennellii, 13 CPK from Arabidopsis and 8 CPK from rice. Group II included 13 SpCDPK, 10 AtCPK and 11 OsCPK. Group III comprised 6 Solanum pennellii CDPK, 8 Arabidopsis CPK and 8 rice CPK. The 20 CRK proteins from the three species were placed in group IV. In addition to the CRKs, group IV contained 3 CDPK of Solanum pennellii (SpCDPK14, SpCDPK16 and SpCDPK17), 3 CPK of Arabidopsis (AtCPK16, AtCPK18 and AtCPK28) and 3 CPK of rice (OsCPK4, OsCPK5 and OsCPK18). No other group contained both CDPK and CRK proteins. The dendrogram showed that the 39 proteins of Solanum pennellii were generally closer to the proteins of Arabidopsis than to those of rice, which indicated that Solanum pennellii and Arabidopsis were evolutionarily more closely related.

Gene structure analysis of SpCRK and SpCDPK genes
The gene structures of the SpCRK and SpCDPK genes were analyzed, and the results are shown in Figure 4. The SpCRK and SpCDPK gene families fell into 4 groups, consistent with the phylogenetic relationships in Figure 3. As shown in Figure 4, genes within the same group exhibited similar exon-intron organization. The SpCDPK genes of group I possessed 8-9 exons and 7-8 introns; group II genes had 7-8 exons and 6-7 introns; group III genes contained 7-9 exons and 6-8 introns. All SpCRK genes belonged to group IV. Except for SpCRK5 (4 exons and 3 introns), all SpCRK genes had 11 exons and 10 introns, whereas the SpCDPK genes of group IV contained 12 exons and 11 introns (Figure 4). These results showed that group IV genes had more introns and exons than those of the other three groups, indicating that the gene structure of group IV was more complex.
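In every count reported above, the number of introns is one less than the number of exons, because each intron is the gap between two consecutive exons. A minimal sketch of that relationship, assuming 1-based inclusive exon coordinates; the function name and the coordinates are hypothetical and do not come from the Solanum pennellii annotation.

```python
def intron_intervals(exons):
    """Return the gaps between consecutive exons as (start, end) tuples.

    `exons` is an iterable of (start, end) pairs with 1-based inclusive
    coordinates; overlapping exons are rejected.
    """
    ordered = sorted(exons)
    introns = []
    for (_, end_prev), (start_next, _) in zip(ordered, ordered[1:]):
        if start_next <= end_prev:
            raise ValueError("exons overlap")
        if start_next - end_prev > 1:  # skip exons that directly abut
            introns.append((end_prev + 1, start_next - 1))
    return introns


if __name__ == "__main__":
    # A made-up gene with 4 exons, the same count reported for SpCRK5.
    exons = [(1, 100), (201, 300), (451, 600), (701, 800)]
    introns = intron_intervals(exons)
    print(len(exons), "exons,", len(introns), "introns")
```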

Conserved motif analysis of SpCRK and SpCDPK proteins
The motif analysis showed that the SpCRK and SpCDPK family proteins had clear structural characteristics. Motifs 9 and 10 were annotated as the protein kinase ATP-binding region, motif 3 as the Ser/Thr protein kinase region, and motifs 5, 6, 7 and 8 as the EF-hand calcium-binding domain (Figure 6). Interestingly, as shown in Figure 5, the MEME analysis indicated that, except for SpCRK5, the other 6 SpCRK proteins included motifs 5, 6, 7 and 8; SpCDPK3, SpCDPK5 and SpCDPK27 included motif 5; and SpCDPK31 included motif 6. However, these EF-hand calcium-binding regions were not retrieved by Prosite (Table 1). These motifs may have been detected only as recurring sequences that cannot form a functional EF-hand calcium-binding structure. The SpCDPK proteins of groups I and II contained motifs 1-15, except for SpCDPK25 (group II). The SpCDPK proteins of group III contained motifs 1-12 and 15, except for SpCDPK31. All proteins of group IV contained motifs 1-4, 10-11 and 15. These results indicated that the identified proteins had the typical ATP-binding region, Ser/Thr protein kinase region and EF-hand calcium-binding region, and that each subgroup had similar motifs, which further supported the phylogenetic classification of the SpCRK and SpCDPK families.

Promoter region analysis of SpCDPK and SpCRK genes
The cis-acting elements in the upstream promoter regions of the SpCRK and SpCDPK genes were analyzed with PlantCARE. The promoter regions contained 627 cis-acting elements, which can be divided into five types: hormone-related elements, growth-related elements, stress-related elements, secondary metabolite-related elements and plant protein metabolism-related elements (Figure 7). The hormone-related elements included auxin responsive, gibberellin responsive, salicylic acid responsive, abscisic acid responsive and MeJA responsive elements. The growth-related elements included circadian control responsive, palisade mesophyll cell differentiation responsive, endosperm expression responsive, seed-specific regulation responsive, light responsive and meristem expression-related elements. The stress-related elements included low temperature responsive, drought responsive, anoxic induction and anaerobic induction elements. The secondary metabolite-related elements and plant protein metabolism-related elements were flavonoid biosynthetic gene regulatory elements and zein metabolism regulatory elements, respectively. Among all the elements, light responsive elements were the most numerous, with 356. The number and distribution of cis-acting elements of SpCDPK10 and SpCDPK11 were basically the same, and SpCDPK16 and SpCDPK17 showed a similar pattern. All elements of SpCDPK20 were located between 900 bp and 2000 bp upstream. The cis-acting elements of SpCDPK27 were mainly distributed at both ends of the promoter region (Figure 7). These results suggest that the SpCRK and SpCDPK genes play an important role not only in the growth and development of Solanum pennellii, but also in its response to biotic and abiotic stresses.
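The grouping of cis-acting elements into types amounts to a lookup followed by a tally. A minimal sketch, assuming each hit is recorded as a (gene, element, upstream position) tuple; the record layout, the partial category map and the sample hits are hypothetical and are not PlantCARE's actual output format. Only the totals 356 and 627 come from the text.

```python
from collections import Counter

# Partial element-to-type map following the grouping described in the text.
ELEMENT_TYPE = {
    "light responsive": "growth",
    "circadian control": "growth",
    "abscisic acid responsive": "hormone",
    "MeJA responsive": "hormone",
    "low temperature responsive": "stress",
    "drought responsive": "stress",
}


def tally_by_type(records):
    """Count cis-element hits per element type."""
    counts = Counter()
    for _gene, element, _position in records:
        counts[ELEMENT_TYPE.get(element, "unclassified")] += 1
    return counts


def within_window(records, start, end):
    """Keep hits whose upstream position (bp) lies in [start, end]."""
    return [r for r in records if start <= r[2] <= end]


# Share of light responsive elements among all elements, from the text.
LIGHT_SHARE = 356 / 627

if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [
        ("geneA", "light responsive", 950),
        ("geneA", "MeJA responsive", 1800),
        ("geneB", "drought responsive", 120),
        ("geneB", "light responsive", 1990),
    ]
    print(tally_by_type(hits))
    print(f"light responsive share: {LIGHT_SHARE:.1%}")
```

By this count, light responsive elements make up a little over half of all the elements identified.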

Discussion
In this study, a total of 7 CRK and 32 CDPK genes were retrieved from the whole-genome data of Solanum pennellii, numbers similar to those of the CRK and CDPK genes in Arabidopsis (8 CRK and 34 CDPK) and rice (5 CRK and 29 CDPK) [30][31][32][33]. The Solanum lycopersicum genome, however, was reported to contain only 29 CDPK genes and to lack CRK genes [34]. The SpCDPK and SpCRK proteins were longer and had higher molecular weights than those of Solanum lycopersicum (Table 1). Interestingly, 29 of the 39 proteins (32 SpCDPK and 7 SpCRK) were acidic, the opposite of what was found in tomato (Table 1) [34]. These results suggested that tomato evolved to eliminate redundant structures and create a simpler genetic structure for survival. Myristoylation and palmitoylation sites play a role in the binding of proteins to membranes [35]. The SpCDPKs and SpCRKs contained myristoylation or palmitoylation sites at the N terminus, indicating that they might be located on organelle membranes (Table 1).
The myristoylation site causes an irreversible, loose association between protein and membrane, while the palmitoylation site causes a reversible, stable association between protein and membrane [36]. The predicted subcellular localization of the SpCDPKs and SpCRKs was diverse, including the nucleus, endoplasmic reticulum, mitochondria, cytosol, chloroplast and peroxisomes, suggesting that they have varied functions (Table 1) [37,38]. The localization of CDPK and CRK proteins in cells was also found to change after mutation of the palmitoylation site or the myristoylation site [39,40].
The SpCDPK and SpCRK family members were unevenly distributed on the 12 chromosomes of wild tomato, and most of the family members were located near the ends of the chromosomes (Figure 1). The same pattern was found in studies of the CDPK gene family of Solanum lycopersicum [34].
Protein sequence alignment showed that the SpCDPK proteins had three domains: the ATP-binding region, the Ser/Thr protein kinase region and the EF-hand calcium-binding domain. The C-terminal EF-hand calcium-binding domain of the SpCRK proteins had degenerated, while their other structures were similar to those of the SpCDPK proteins (Figure 2). This result was also found in the conserved motif analysis (Figure 5 and Figure 6). Due to the degenerated EF-hand structure at the C-

Figure 2
Comparison of conserved domains of SpCRK and SpCDPK gene families. Colour bars mark sites with no less than 70% conservation.

Figure 3
Phylogenetic relationship among CRK and CDPK family members from Solanum pennellii, rice and Arabidopsis. To identify the species of origin of each CRK and CDPK, a species acronym was placed before the protein name: e.g., SpCRK indicates a CRK from Solanum pennellii, AtCRK a CRK from Arabidopsis and OsCRK a CRK from rice. The red triangles, blue dots and pink-bordered squares before the protein names indicate CRK and CDPK proteins from Solanum pennellii, Arabidopsis and rice, respectively.

Figure 4
Phylogenetic tree and exon-intron structure of SpCRK and SpCDPK genes. The phylogenetic tree was constructed using the full-length protein sequences of 7 SpCRK and 32 SpCDPK. Introns and exons of the SpCRK and SpCDPK family genes were grouped according to the phylogenetic classification. Upstream/downstream regions, exons and introns are represented by blue boxes, yellow boxes and black lines, respectively. The exons and introns of the SpCRK and SpCDPK genes were analyzed with the online tool GSDS.

Figure 5
Phylogenetic tree and conserved motifs of SpCRK and SpCDPK proteins. The phylogenetic tree was constructed using the full-length protein sequences of 7 SpCRK and 32 SpCDPK. The conserved motifs of the SpCRK and SpCDPK family proteins were grouped according to the phylogenetic classification. All motifs of the SpCRK and SpCDPK proteins with complete amino acid sequences were identified with MEME.

Figure 6
Motif LOGO. Motif 3 was annotated as the Ser/Thr protein kinase region, motifs 5, 6, 7 and 8 as the EF-hand calcium-binding domain, and motifs 9 and 10 as the protein kinase ATP-binding region.

Figure 7
The cis-elements in the promoter regions of SpCRK and SpCDPK genes. The phylogenetic tree was constructed using the full-length protein sequences of 7 SpCRK and 32 SpCDPK. The cis-elements in the promoter regions of the SpCRK and SpCDPK family genes were grouped according to the phylogenetic classification. All cis-elements in the 2000 bp upstream promoter sequence of the SpCRK and SpCDPK genes were identified online with PlantCARE.