Genome-wide identification of the LecRLKs in cucumber
We identified a total of 46 LecRLK genes in the cucumber genome using Pfam and SMART searches and named them CsLecRLKs (Table 1). The number of LecRLKs in cucumber is smaller than in Arabidopsis (75 LecRLK genes) or rice (173 LecRLK genes) (Vaid et al., 2012). Based on their extracellular lectin domains, the 46 CsLecRLKs were classified into 23 G-type, 22 L-type, and one C-type. The predicted molecular weights (MW) of the proteins ranged from 62.5 kDa (Csa1G056960) to 94.5 kDa (Csa3G733860), the isoelectric points (pI) ranged from 4.98 (Csa4G289630) to 9.51 (Csa1G056960), and CDS lengths ranged from 1,803 to 2,502 bp. Based on the predicted protein structures, most CsLecRLKs are likely localized to the plasma membrane; only Csa7G045520 was predicted to be extracellular. Further details of the CsLecRLKs, including gene length, CDS length, protein length, MW and pI, are listed in Table 1.
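Protein MW values like those in Table 1 are normally computed from the deduced amino acid sequence. The source does not name the tool it used; as a minimal sketch, assuming the usual approach of summing standard average residue masses and adding one water molecule for the free termini, the calculation looks like this (the helper names are hypothetical):

```python
# Standard average residue masses in daltons (free amino acid mass minus one H2O).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one H2O restored for the free N- and C-termini


def protein_mw_kda(seq: str) -> float:
    """Approximate average molecular weight (kDa) of a protein sequence.

    Raises KeyError for non-standard residue letters.
    """
    seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    mass = sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER
    return mass / 1000.0


def protein_length_from_cds(cds_len_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a complete CDS."""
    if cds_len_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = cds_len_bp // 3
    return codons - 1 if includes_stop else codons
```

For example, a complete 2,502 bp CDS that includes its stop codon encodes 833 residues. The isoelectric point needs a separate pKa-based calculation that is not shown here.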
Comparing the molecular weights of all 46 CsLecRLKs, we found that G-type CsLecRLKs (83.2 kDa) are generally heavier than L-type (62.5 kDa) and C-type (74.6 kDa) CsLecRLKs. This is probably mainly because, in addition to the lectin domain, G-type CsLecRLKs often contain EGF and PAN domains (Fig. 1). Signal peptides and transmembrane (TM) domains are critical for protein localization. Software prediction indicated that not every CsLecRLK has a signal peptide and a single TM domain. The absence of a signal peptide or TM domain would directly affect the subcellular localization of a protein (Table 1). The plasma membrane localization of most CsLecRLKs suggests that they act as receptors that sense extracellular signals and transmit them into the cell.
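The link between signal peptide and TM predictions and localization can be made explicit as a rule. The sketch below is a deliberately simplified, hypothetical rule, not the prediction software used in the study: one or more predicted TM helices implies a membrane protein, a signal peptide without a TM helix implies secretion, and anything else is left unresolved. The gene IDs in the test are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    gene_id: str
    has_signal_peptide: bool
    tm_count: int  # number of predicted transmembrane helices


def crude_localization(p: Prediction) -> str:
    """Hypothetical rule of thumb; real predictors weigh many more features."""
    if p.tm_count >= 1:
        return "membrane"  # does not distinguish plasma membrane from other membranes
    if p.has_signal_peptide:
        return "extracellular"  # signal peptide but no membrane anchor: likely secreted
    return "unresolved"  # neither feature detected


def summarize(preds):
    """Count proteins per crude localization class."""
    counts = {}
    for p in preds:
        loc = crude_localization(p)
        counts[loc] = counts.get(loc, 0) + 1
    return counts
```

A protein with a TM helix but no predicted signal peptide still lands in "membrane" here, which mirrors the observation that many CsLecRLKs lack a signal peptide yet are predicted to be membrane-localized.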
Phylogenetic analysis of the CsLecRLKs
We constructed an unrooted phylogenetic tree using MEGA 7.0.21 (Fig. 2). As expected, the tree divided the CsLecRLK family into three subgroups, L-type, G-type, and C-type, consistent with the domain-based classification of the family. The tree also indicated that the L-type and C-type CsLecRLKs are more closely related to each other. This differs from previous reports in Arabidopsis and rice, which revealed a closer relationship between G-type and L-type LecRLKs (Vaid et al., 2012). As shown in Fig. 3, the G-type and L-type CsLecRLKs could be further divided into four and three subgroups, respectively. The individual clades were supported by high bootstrap values.
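Bootstrap values like those mentioned above come from rebuilding the tree on alignments whose columns have been resampled with replacement. The source only states that MEGA 7.0.21 was used; the distance measure and tree-building method are not given. The sketch below assumes a simple p-distance and covers only the resampling and distance steps. Tree building on each replicate (for example by neighbor-joining) and counting how often a clade reappears are left out, and the toy alignment is hypothetical.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in sites if x != y)
    return diffs / len(sites)


def distance_matrix(aln: dict) -> dict:
    """Pairwise p-distances keyed by (name1, name2) in sorted order."""
    return {(i, j): p_distance(aln[i], aln[j]) for i, j in combinations(sorted(aln), 2)}


def bootstrap_replicate(aln: dict, rng: random.Random) -> dict:
    """Nonparametric bootstrap: resample alignment columns with replacement.

    The same column indices are used for every sequence, so sites stay aligned.
    """
    length = len(next(iter(aln.values())))
    if any(len(s) != length for s in aln.values()):
        raise ValueError("sequences must be aligned to equal length")
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}
```

The support for a clade is then the fraction of replicate trees (typically 1,000) in which that clade appears.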
Exon–Intron Structural Analysis of CsLecRLKs
The genomic and corresponding cDNA sequences of the CsLecRLKs were submitted together to GSDS (Gene Structure Display Server) to analyze their gene structures (Fig. 3). The genomic sequence lengths of CsLecRLKs ranged from 1,803 bp to 6,481 bp, and CDS lengths ranged from 1,674 bp to 2,502 bp. The number of exons varied from one to nine; 80% of CsLecRLKs had fewer than three exons, and Csa4G296230 contains three exons. All L-type CsLecRLKs contained only one or two exons, and the C-type CsLecRLK (Csa1G056960) contains four exons. The G-type CsLecRLKs contain one to nine exons; among them, Csa7G446780 has the most exons (nine).
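Exon counts and genomic versus CDS lengths like those above can also be derived directly from a genome annotation. The study used GSDS with genomic and cDNA sequences; the sketch below is a substitute that assumes GFF3 annotation lines with `exon` and `CDS` features whose `Parent` attribute names a single transcript. The function names and the two-transcript example are hypothetical.

```python
from collections import defaultdict


def gene_structure_stats(gff_lines):
    """Per-transcript exon count, genomic span and CDS length from GFF3 lines."""
    exons = defaultdict(list)
    cds_len = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        parent = fields.get("Parent")
        if parent is None:
            continue
        if ftype == "exon":
            exons[parent].append((start, end))
        elif ftype == "CDS":
            cds_len[parent] += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    stats = {}
    for tx, ex in exons.items():
        span = max(e for _, e in ex) - min(s for s, _ in ex) + 1
        stats[tx] = {"exons": len(ex), "genomic_span": span, "cds_len": cds_len[tx]}
    return stats


def fraction_below(stats, max_exons_exclusive=3):
    """Fraction of transcripts with fewer than `max_exons_exclusive` exons."""
    if not stats:
        return 0.0
    n = sum(1 for s in stats.values() if s["exons"] < max_exons_exclusive)
    return n / len(stats)
```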
Protein Domain and Motif Analysis of CsLecRLKs
We used the SMART program to predict the conserved domains present in CsLecRLKs. The C-type and L-type CsLecRLKs contained only the three basic domain categories: a lectin domain, a transmembrane domain and a kinase domain. Some G-type CsLecRLKs additionally contained two other domain categories, the PAN domain and the EGF domain. Among the G-type CsLecRLKs, ten proteins contain both PAN and EGF domains, five contain only a PAN domain, eight contain only an EGF domain, and one contains neither. Our results also suggest that a signal peptide is not essential for CsLecRLKs, as 25 CsLecRLKs lack one. In addition, 8 CsLecRLKs have more than two transmembrane domains.
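The type assignment and the G-type domain tally can be expressed as a small classification over each protein's predicted domain list. As a sketch, the code assumes simplified domain labels ("B_lectin" for G-type, "Lectin_legB" for L-type, "Lectin_C" for C-type, plus "PAN" and "EGF") chosen for illustration rather than the exact identifiers reported by SMART; the example proteins are hypothetical.

```python
from collections import Counter

# Illustrative mapping from lectin-domain label to LecRLK type.
LECTIN_TYPE = {"B_lectin": "G-type", "Lectin_legB": "L-type", "Lectin_C": "C-type"}


def lecrlk_type(domains):
    """Assign a LecRLK type from the extracellular lectin domain."""
    types = {LECTIN_TYPE[d] for d in domains if d in LECTIN_TYPE}
    if len(types) != 1:
        return "unclassified"  # no lectin domain, or conflicting lectin domains
    return types.pop()


def g_type_architecture(domains):
    """Describe the PAN/EGF combination of a G-type protein."""
    has_pan, has_egf = "PAN" in domains, "EGF" in domains
    if has_pan and has_egf:
        return "PAN+EGF"
    if has_pan:
        return "PAN only"
    if has_egf:
        return "EGF only"
    return "neither"


def tally(proteins):
    """proteins: mapping of protein ID -> list of predicted domain labels."""
    type_counts = Counter(lecrlk_type(d) for d in proteins.values())
    g_arch = Counter(g_type_architecture(d) for d in proteins.values()
                     if lecrlk_type(d) == "G-type")
    return type_counts, g_arch
```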
Ten conserved motifs were identified in CsLecRLKs using the MEME program and labelled Motif 1 to Motif 10 from the N- to the C-terminus. Details of the conserved motifs are shown in Figure 3. The motifs ranged from 15 to 60 residues in length, and each CsLecRLK contains 4 to 10 motifs. None of the motifs was present in all family members. Except for Motif 8 and Motif 9, which were found only in G-type CsLecRLKs, all motifs were present in all three types of CsLecRLKs. Using the CDD program (https://www.ncbi.nlm.nih.gov/cdd/), we found that six of these motifs correspond to different kinase domain regions (Supplementary Fig. 1), indicating that each CsLecRLK may contain multiple phosphorylation catalytic sites.
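Statements such as "no motif is present in all members" and "Motifs 8 and 9 occur only in G-type proteins" follow from simple set operations on a motif presence table, for example one parsed from MEME output. The table in the test is hypothetical and only illustrates the bookkeeping; the function names are also hypothetical.

```python
def motifs_in_all(presence):
    """Motifs shared by every protein. presence: protein ID -> set of motif numbers."""
    sets = list(presence.values())
    return set.intersection(*sets) if sets else set()


def type_specific_motifs(presence, protein_type, target):
    """Motifs found in proteins of the `target` type and in no protein of any other type."""
    in_target, elsewhere = set(), set()
    for pid, motifs in presence.items():
        (in_target if protein_type[pid] == target else elsewhere).update(motifs)
    return in_target - elsewhere
```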
Chromosomal Location and Gene Duplication of CsLecRLKs
We extracted the positions of the CsLecRLKs and the length of each chromosome from the cucumber genome annotation files with a series of Perl scripts, and drew a gene location map with MapChart. As shown in Figure 4, the CsLecRLKs were unevenly distributed across the seven cucumber chromosomes, and genes from the same subfamily on the same chromosome tended to cluster. The number of CsLecRLKs per chromosome varied widely: chromosome 3 contained the largest number (12), whereas chromosome 2 had only two.
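Per-chromosome counts can be read directly from the gene identifiers, assuming (as the IDs in this section suggest) that the digit after "Csa" encodes the chromosome in the cucumber annotation used. The study extracted positions with Perl scripts; this Python sketch is a substitute, and the test uses only a few IDs mentioned in this section rather than the full set of 46.

```python
import re
from collections import Counter

# Assumed ID format: Csa<chromosome digit>G<number>, e.g. Csa1G071170.
ID_PATTERN = re.compile(r"^Csa(\d)G\d+$")


def chromosome_counts(gene_ids):
    """Count genes per chromosome from IDs of the assumed form Csa<chr>G<number>."""
    counts = Counter()
    for gid in gene_ids:
        m = ID_PATTERN.match(gid)
        if m is None:
            raise ValueError(f"unexpected gene ID format: {gid}")
        counts[int(m.group(1))] += 1
    return counts
```

Rejecting IDs that do not match the pattern is a deliberate choice: a malformed identifier would otherwise be silently miscounted.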
During evolution, gene families can expand through tandem duplication and segmental duplication (Kent et al., 2003; Mehan et al., 2004). To explore whether the CsLecRLK gene family expanded through these two kinds of duplication, we analyzed the duplication events of CsLecRLK genes. Although many genes were clustered on the chromosomes, only Csa1G071170 and Csa1G071160 formed a tandemly duplicated pair, with a divergence time of about 38.606 million years ago (MYA). The other two duplicated pairs, Csa1G073890/Csa7G048050 and Csa3G734030/Csa4G296230, are located on different chromosomes and may have arisen through duplication or translocation of chromosome fragments during evolution; their divergence times were 30.96 and 32.35 MYA, respectively. Based on these results, tandem duplication may have contributed to the expansion of the CsLecRLK gene family.
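Divergence times for duplicated gene pairs are conventionally estimated from the synonymous substitution rate as T = Ks / (2λ), where Ks is the number of synonymous substitutions per synonymous site and λ is the clock rate. The source reports the times but not the Ks values or λ, so the sketch takes both as inputs; the λ of 6.5 × 10⁻⁹ substitutions per site per year used in the test is an illustrative assumption, not a value stated in the study.

```python
def divergence_time_mya(ks: float, rate_per_year: float) -> float:
    """Divergence time in million years, using T = Ks / (2 * lambda).

    ks: synonymous substitutions per synonymous site between the two paralogs.
    rate_per_year: assumed synonymous substitution rate (lambda) per site per year.
    The factor 2 accounts for substitutions accumulating on both lineages.
    """
    if ks < 0 or rate_per_year <= 0:
        raise ValueError("ks must be >= 0 and rate_per_year > 0")
    return ks / (2.0 * rate_per_year) / 1e6
```

Because T scales inversely with λ, the reported divergence times are only comparable across studies that use the same clock rate.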
Cis-acting Element Analysis of CsLecRLK Promoters
Different genes carry their own specific or shared cis-acting elements in their promoters, and trans-acting factors bind to these elements to regulate gene expression. Different cis-acting elements may respond to different biotic or abiotic stress signals that induce or repress gene expression. Therefore, analyzing the cis-acting elements in CsLecRLK promoters can help us further understand the functions of these genes. We used the PlantCARE website to analyze the 1,500 bp sequence upstream of the translation start site of each CsLecRLK and found 54 typical functional cis-acting elements (Supplementary Fig. 2), which could be divided into four types: light response, stress resistance, plant hormone and others. Among them, 24 elements were related to light response; 11 were related to hormones, including salicylic acid (SA), jasmonic acid (JA), ethylene, gibberellin and auxin; and 9 were abiotic stress elements. These results suggest that the CsLecRLK gene family may be mainly involved in stress resistance pathways in cucumber. There were also six development-related cis-acting elements, five of which were related to seed development, suggesting that this gene family may play a role in seed development. More details are given in Supplementary Table 1.
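Extracting the 1,500 bp upstream region has to respect strand. For a minus-strand gene, "upstream" lies at higher genome coordinates than the start codon, and the extracted sequence must be reverse-complemented. The sketch assumes 1-based coordinates for the A of the ATG and an in-memory chromosome string. The function names and the toy sequences in the test are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, start_codon_pos: int, strand: str, length: int = 1500) -> str:
    """Return `length` bp upstream of the translation start, oriented 5'->3' on the gene's strand.

    start_codon_pos: 1-based genomic coordinate of the A of ATG
    (on the minus strand this is the highest coordinate of the codon).
    Regions running off the chromosome end are truncated.
    """
    if start_codon_pos < 1:
        raise ValueError("start_codon_pos is 1-based")
    if strand == "+":
        end = start_codon_pos - 1  # 0-based exclusive end: the base just before the ATG
        return chrom_seq[max(0, end - length):end]
    if strand == "-":
        begin = start_codon_pos  # 0-based index of the first base beyond the codon
        return reverse_complement(chrom_seq[begin:begin + length])
    raise ValueError("strand must be '+' or '-'")
```

The returned sequence reads in the same direction as the gene, which is the orientation promoter-analysis tools such as PlantCARE expect.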
Expression Pattern Analysis of CsLecRLK genes
Little is known about the functions of LecRLKs in cucumber. As a first step toward understanding their potential functions, we used RNA-seq data from 10 cucumber tissues to investigate the expression of each CsLecRLK gene. Most CsLecRLKs were expressed at low levels, and some (Csa6G338050, Csa1G071160, and Csa3G115090) were barely expressed in any tissue.
Most CsLecRLKs were barely expressed in male flowers and unfertilized ovaries (Fig. 5A): in each of these two tissues, only 21 genes met the threshold used to define constitutive expression (FPKM ≥ 1; Tao et al., 2018). In each of the other tissues, at least 25 genes met this threshold.
Based on their expression levels in each tissue, the CsLecRLKs could be divided into three groups (Fig. 5A), with the breadth and level of expression decreasing from Group 1 to Group 3. Group 1 contained 7 genes with high expression in every tissue (average FPKM 18.01). Group 2 contained 12 genes with intermediate expression in every tissue (average FPKM 6.48). Group 3 included 27 genes expressed at low levels in every tissue (average FPKM 1.29). Notably, the C-type CsLecRLK (Csa1G056960) belonged to Group 1, and L-type CsLecRLKs generally had higher expression levels than G-type ones.
Thirty-six CsLecRLKs were expressed in all tissues (FPKM > 0 in all tissues; Tao et al., 2018), and 17 genes were constitutively expressed (FPKM ≥ 1 in all tissues). We then focused on genes with relatively high expression (FPKM ≥ 2 in all tissues; Tao et al., 2018) and selected five tissues (root, hypocotyl, cotyledon, true leaf and tendril) for cluster analysis (Fig. 5B). A total of 16 genes were expressed in all five tissues. Notably, two genes (Csa2G439210 and Csa3G115060) were expressed only in roots, one gene (Csa3G099580) only in tendrils, and two genes (Csa7G446780 and Csa7G067410) only in cotyledons.
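The all-tissue expression categories used above (expressed: FPKM > 0; constitutive: FPKM ≥ 1; relatively high: FPKM ≥ 2, following Tao et al., 2018) reduce to checks on each gene's lowest FPKM. The sketch applies them to a gene × tissue FPKM table. The tissue-specificity cut-off of 1 FPKM in the second function is a hypothetical choice, and the gene names and values in the test are made up.

```python
def classify_expression(fpkm):
    """fpkm: gene ID -> {tissue: FPKM}. Returns the genes passing each all-tissue threshold."""
    expressed, constitutive, high = set(), set(), set()
    for gene, by_tissue in fpkm.items():
        lowest = min(by_tissue.values())  # a gene passes only if its weakest tissue passes
        if lowest > 0:
            expressed.add(gene)
        if lowest >= 1:
            constitutive.add(gene)
        if lowest >= 2:
            high.add(gene)
    return {"expressed": expressed, "constitutive": constitutive, "high": high}


def tissue_specific(fpkm, tissue, min_fpkm=1.0):
    """Genes reaching min_fpkm in `tissue` but staying below it in every other tissue."""
    return {g for g, t in fpkm.items()
            if t.get(tissue, 0) >= min_fpkm
            and all(v < min_fpkm for k, v in t.items() if k != tissue)}
```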
Expression analysis of CsLecRLK genes in response to different treatments
Gene expression is not only spatiotemporally specific but can also be induced or repressed by hormones and stresses. Because most LecRLKs are membrane-localized receptors, they are likely to sense such stimuli early and relay signals into the cell. To examine how CsLecRLKs respond to short-term environmental changes, their expression patterns under hormone treatments (IAA, GA, ABA and NAA) and cold stress were analyzed by qRT-PCR. Most CsLecRLKs (31/46) responded to at least one treatment (fold change > 1 relative to the control, p < 0.05). Overall, there were 20 upregulation events and 38 downregulation events (p < 0.05). To present the results more clearly, the fold changes under the different treatments were displayed as a heatmap (Fig. 6) based on the qRT-PCR data. First, some CsLecRLKs (7/46) could be induced or repressed by more than three treatments; for instance, Csa1G071170 was induced by GA, IAA, NAA and ABA, whereas Csa4G005510 was repressed by all treatments except ABA. Second, different CsLecRLKs responded to different treatments. Cold stress affected the fewest CsLecRLKs: the expression of only four genes changed, and all four were downregulated. In contrast, NAA affected the most: 20 genes responded to NAA, of which 6 were upregulated and 14 downregulated. Under ABA treatment, 16 CsLecRLKs changed in expression, 8 upregulated and 8 downregulated. IAA and GA altered the expression of 9 and 8 genes, respectively: IAA upregulated 1 gene and downregulated 8, and GA upregulated 5 genes and downregulated 3.
Third, 14 CsLecRLKs showed different response patterns under different treatments; for example, Csa3G734030 was induced by NAA but repressed by ABA, and Csa1G071150 was upregulated under ABA treatment but downregulated under NAA and IAA treatments. These results indicate that individual CsLecRLKs have their own characteristic responses to hormones and stresses and may play important roles in sensing external stimuli. For example, although Csa1G071160 and Csa3G115090 were not expressed in roots, our experiment showed that they could be induced by NAA and ABA, respectively. Fifteen genes showed no significant expression change under any treatment: Csa1G056960, Csa7G029930, Csa5G550210, Csa4G296250, Csa7G048050, Csa1G073890, Csa4G289620, Csa3G115060, Csa3G099580, Csa1G071270, Csa6G516770, Csa2G439150, Csa1G605730, Csa6G338050 and Csa5G648630.
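The qRT-PCR summary in the two preceding paragraphs combines three steps: converting Ct values into fold changes, labelling each change as up or down, and grouping genes by response pattern. The source does not state how fold changes were computed. This sketch assumes the common 2^-ΔΔCt method with a single reference gene and roughly 100% amplification efficiency, uses a hypothetical fold-change cut-off, and omits the significance test. The response table in the test encodes only example genes described in the text. The ABA response of Csa4G005510 is left out because the text does not state it.

```python
def fold_change_ddct(ct_target_trt, ct_ref_trt, ct_target_ctl, ct_ref_ctl):
    """Relative expression (treated vs control) by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    d_ct_trt = ct_target_trt - ct_ref_trt  # normalize target to reference, treated sample
    d_ct_ctl = ct_target_ctl - ct_ref_ctl  # same normalization, control sample
    dd_ct = d_ct_trt - d_ct_ctl
    return 2.0 ** (-dd_ct)


def direction(fc, threshold=2.0):
    """Label a fold change; `threshold` is a hypothetical cut-off applied symmetrically."""
    if fc >= threshold:
        return "up"
    if fc <= 1.0 / threshold:
        return "down"
    return "unchanged"


def summarize_responses(responses, all_genes):
    """responses: gene -> {treatment: 'up' | 'down'} for significant changes only.

    Returns (genes responding to more than three treatments,
             genes both up- and downregulated by different treatments,
             genes with no significant response).
    """
    multi = {g for g, r in responses.items() if len(r) > 3}
    mixed = {g for g, r in responses.items() if {"up", "down"} <= set(r.values())}
    unresponsive = set(all_genes) - {g for g, r in responses.items() if r}
    return multi, mixed, unresponsive
```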