Genome-wide identification and Classification of the LRR-RLK genes in Liriodendron chinense.
We searched the annotated genes in the Liriodendron chinense genome resource for putative PKs and identified 1488 typical PKs (Supplementary File 1). After removing redundant, overlapping, and sequences lacking the LRR-RLK conserved domains, a total of 232 LchiLRR-RLK protein sequences remained. The obtained LchiLRR-RLK protein sequences carried an extracellular domain (ECD), a transmembrane domain, and an intracellular kinase domain (KD) (Fig. 1a). In addition, the LchiLRR-RLK ECD was branded by varying numbers of LRR. The obtained PKs were then classified and renamed into families and subfamilies based on the previous classifications in Arabidopsis and rice model plants (Supplementary File 2) [4]. LchiLRR-RLK protein classification showed 13 families, which were named in Roman numerals (I-XV) following previous publications. Interestingly, families XIV, IX, VII and other subfamilies present in A. thaliana were absent in L. chinense. Of the present families VIII and I were the largest subfamilies with 65 and 61 members, respectively. The other families consisted of not more than 25 members, of which Xa had the least members, with only 2 sequences. Furthermore, the protein sequence lengths varied, ranging between 100 and 1454 and the isoelectric point of the obtained LchiLRR-RLK protein sequences ranged from 4.77 to 10.41 (Supplementary File 2). Suggesting that the obtained LchiLRR-RLKs ranged from weakly acid to strong basic. All the proteins showed a cellular localization in the plasma membrane (Supplementary File 2).
Gene chromosomal location, duplications, and collinearity
To obtain further insights on the gene locations of the identified LchiLRR-RLK protein sequences, the TBtools software was used to map each gene location on the chromosome and contig (Fig. 1b; Supplementary File 2). Results showed that the LchiLRR-RLKs were unevenly distributed on 17 chromosomes and 24 contigs, each chromosome carried at least 12 and at most 27 LchiLRR-RLK genes. Additional information on the LchiLRR-RLKs is listed in the Additional file. Additional analysis of the obtained genes' collinearity exhibited a total of 67 paralogous gene pairs, constituting almost half of the total number (Fig. 1b; Supplementary File 3). This finding further suggested gene duplication events and gene family expansion within the LchiLRR-RLK gene family. Therefore, to understand the mode of gene expansion, we compared two main gene duplication events: tandem and segmental duplication events [33, 34] (Fig. 1b; Supplementary File 3). Results showed that of the total 67 duplicated gene pairs, 24 pairs were tandem arrays, contributing 35.82% (48/134) of the duplicated genes. In addition, most of the tandem duplications were obtained in the VIII and XII families contributing 67% (16/24) of the total tandem arrays. On the other hand, 43 gene pairs were by segmental duplications and contributed 64.18% (86/136) to the expansion of the LRR-RLK gene family expansion in L. chinense (Supplementary File 3). Most of the segmental duplications were obtained in the I and XI families contributing 23% (20/86) and 12% (10/86) of the total number, similar results were obtained in Rosaceae plant genomes [21].
The synonymous and nonsynonymous values and their ratios are used to estimate the selection pressure of a given protein or DNA experience. To have an insight into the source of duplicate genes, we calculated the synonymous and nonsynonymous values and their ratios (Supplementary File 3). A total of 138 linked genes were obtained, of which all the genes investigated for substitution mutation had a Ka/Ks ratio less than 1 (ka/ks < 1), signifying a purifying or stabilizing selection of the LchiLRR-RLK genes during the evolutionary process.
Motif, Gene Structure, and Domain Conservation analyses reveal conserved evolution.
To gain insight into the 232 LchiLRR-RLK protein functions, we computed the conserved motif numbers and arrangement (Fig. 2; Supplementary Fig. 1; Supplementary Fig. 2). We observed that the investigated proteins clustered based on their similar motif arrangements and possible phylogenetic relationships (Fig. 2a). In detail, most groups had at most 9 motifs present in each protein except for groups III and XII which had 10 to 15 motifs (Fig. 2b). In addition, we noticed that motifs 9 and 10 among others were to a greater extent present in the groups III, XI, and XII only; while motifs 5 and 2 were abundant in all groups, suggesting their full conservation.
Generally, the basic structure of the LRR-RLK gene comprises a PK domain and an LRR domain. We investigated the conserved domain in identified gene candidates. In this study, we showed that different subfamilies have different compositions of protein domains (Fig. 2c). However, the majority of the subfamilies carried a Pkinase domain and an LRRNT_2 (leucine-rich repeat N-terminal) (Fig. 2b; Supplementary File 1; Supplementary Fig. 1). We also observed that some subfamilies like the XI, II, and XIII had a mixture of both the Pkinase and the Pkinase_Tyr. The subfamily XI had the most conserved domains, carrying at most 6 conserved domains. Gene structure prediction is vital in comprehending the gene evolution of a gene family [26]. In this study, we analyzed the gene structures of 232 LchiLRR-RLK genes (Fig. 2d; Supplementary Fig. 1). Results showed that exon and intron numbers varied with gene sequences. Families Xb-2, Xa-1, VII-2, XII, XI-1, III, IX, and Xb-2 had exon ranges between 1–3, accompanied by 2 or 1 introns flanking the N- and C- terminals. While the rest of the gene families had numbers more than 3. In addition, gene family XIIIb had the greatest number of exons up to 30, without introns. Interestingly, the exon structures differed among the gene families, gene families with fewer exon numbers carried elongated exons, and intron structures were almost of similar sizes among the different gene families, except for the gene family Xb-3 which was flanked by an elongated intron in the C-terminal
Phylogenetic analysis of the LRR-RLK gene family
Systematic classification of a gene family based on the protein phylogeny facilitates the building of functional and genomic studies. In this study, 1032 LRR-RLK full protein sequences from five plant species, L. chinense, A. thaliana, O. sativa, S. moellendorfii, and P. patens, were used to construct a phylogenetic tree using the neighbor-joining tree (NJT) method in MEGA X (Fig. 3a). Results displayed a clustering of protein sequences into various families and subfamilies consistent with previous publications. We observed a total of 19 cluster groups that had diverged from 3 main branches, forming 14 families and 5 subfamilies Nonetheless, all the cluster groups were observed to have diverged from a common ancestral protein. A deeper analysis showed that one of the main branches carried most of the LRR-RLK groups, 11 in total, while the remaining had 1 and 6 groups. This fact suggests that the LRR-RLKs evolved mainly from a single ancestral protein that diversified possibly through speciation adaptation and other evolutionary measures. Comparisons of the protein quantities in various cluster groups showed that groups XIII and VIII had the most protein numbers, 148 and 187 protein sequences, respectively (Fig. 3b). In contrast, groups XV and II had the least number of protein sequences, 9 and 14, respectively. Additionally, some groups lacked representation of other plant species, for instance, the L. chinense LRR-RLKs were absent in groups, XI, XIIb, Xa, VI, and XIV. On the other hand, some groups were fully represented, suggesting similarity in functionality of LRR-RLKs from different plant species. In total, this finding suggests different degrees of gene expansions within LRR-RLK groups and that proteins clustered together may exhibit similar structures and functions.
Plant Synteny
Gene collinearity within different plant species genes may also reflect phylogenetic relations and possibly similar gene functions. In this research, we used the TBtools software to compute LRR-RLK gene collinearity between four plant species: A. thaliana, L. chinense, O. sativa, and P. trichocarpa (Fig. 3c). Results showed a dense linkage of several genes. Specifically, L. chinense had 55, 48, and 89 orthologous gene pairs with A. thaliana, O. sativa, and P. trichocarpa, respectively. Suggesting a closer evolutionary relationship between L. chinense and P. trichocarpa than any other plant, although recent research has suggested that O. sativa is evolutionarily closer to L. chinense. In addition, this result may suggest more similar biological functions between P. trichocarpa and L. chinense.
Cis-regulatory Elements
Evaluation of the cis-regulatory elements present in the promoter region regions is critical in the understanding of transcriptional regulation and gene function [35]. In this study, a 1.5kb region of the identified LchiLRR-RLK genes was considered a potential promoter region. A total of 5056 putative elements were identified and categorized into three response factors, growth and development, plant phytohormone, and biotic and abiotic responses using the Plant Care Online Database (Supplementary Fig. 1, Additional File 4). However, 24 representatives are shown in the manuscript for presentation purposes (Fig. 4). The cis-element abundancies were not consistent in all the 232 LchiLRR-RLKs. Nonetheless, we noted that distributions of the cis-elements followed a somewhat similar pattern within identical families and subfamilies.
Comparisons in the above-mentioned response factors showed an overrepresentation of the cis-elements in the biotic and abiotic responses constituting 55.22% of the total identified cis-elements. Suggesting that the LchiLRR-RLK genes are more invested in biotic and abiotic stress functional roles. Furthermore, the phytohormonal and growth and development responses constituted 16.8% and 28%, respectively. In-depth analysis showed that the LchiLRR-RLK families VIII and I had the most cis-element in all the response factors analyzed probably due to the fact they have many members present as compared to other families. Particularly, this research focused more on the abiotic stresses; therefore our analysis also exhibited several cis-regulatory elements involved in the abiotic stress responses including the DRE-core, LTR, STRE, MYB., etc. Specifically, the MYB cis-elements were present in all the LchiLRR-RLK genes, while the STRE and LTR elements were also present in almost all the LchiLRR-RLKs. Suggesting that LRR-RLKs in L. chinense respond to abiotic stresses including temperature and drought stresses.
Figure 4. Cis-regulatory element analysis. The total numbers of the identified putative cis-elements in the promoter of L. chinense LRR-RLK genes are shown in the boxes. Different colors show the total ranges of cis-elements present.
Protein Interaction and Protein Analyses
Protein-to-protein interaction (PPI) analysis is crucial in elucidating protein function and the impact of protein absence or presence. In this study, we investigated the protein interaction between various LRR-RLK proteins using the Online String database (Fig. 5a). We observed that the protein families were densely interconnected. Individual LRR-RLK gene groups interacted with various families probably for efficient biological functions. Suggesting that the LRR-RLK gene families in L. chinense interact for full protein function. In detail, most proteins were linked with the Lchi_IV-1 of group IV showing a possibility that
Lchi_IV-1 acts as a control hub group mediating several protein functions. Previously, Chen et al. [11] have shown that plants with numerous continuous LRRs and few insertion segments in the ectodomain tend to stack into super helical shapes for sensing various ligands in signal activations [11].
To gain insight into the protein structures of LchiLRR-RLK proteins, we searched for the homology models using the SWISS-model online tool (Fig. 5b). Previous research has established that LRR assembly structures are predictable due to the high conservation of the LRR repeats, with the “LxxLxLxxN” forming the inner side of the superhelix, while the “xLs/tG” form the plant-specific second β-sheet on the lateral side, and the remainder forming the backside [12, 20]. In this study, ten representative LchiLRR-RLK proteins showed different protein structures however, those from groups XI, XII, XIII, and XV portrayed a somewhat similar structure. Generally, the LchiLRR-RLKs had numerous LRRs that formed the superhelices and buried their hydrophobic patches inside (Fig. 5). Additionally, the conserved residues of the LRR backbone were more hydrophobic than the variable residues, nonetheless, the variable residues had lower hydrophilicity than we predicted to aid in the protein proper folding.
qPCR Expression Analysis
To understand the possible responses of the LchiLRR-RLK genes to three abiotic stresses, twenty LchiLRR-RLK genes were selected for qPCR analysis; and their expression patterns in response to cold, heat, and salt stresses were analyzed over three time points, 3hr, 24h, and 3 days; and compared against the control (0h) (Fig. 6). Generally, most of the analyzed LchiLRR-RLK genes showed significant gene expression trends as compared to the control (0h). Particularly, in cold stress (Fig. 6a), LchiLRR-RLKs clustered into five notable expression groups that exhibited a high to low-expression trend; of which most of the genes had a high upregulation. Noteworthy, three genes, Lchi_I-24/Xa-3/II-1 showed the highest upregulation from stress onset to termination as compared to the 0h. Interestingly, Lchi_VIII-61 clustered within the same group although it exhibited an alternating expression trend, which was marked by a downregulation at 3h, upregulation at 24h, and finally a downregulation at 3d. Nonetheless, this expression trend does not have any significant differences when compared to the 0h, signifying that these four genes respond positively to cold stress. The second and third gene cluster groups had seven LchiLRR-RLK genes that had extremely low expression trends marked by downregulations from stress onset to termination, suggesting that these genes respond negatively to cold stress. The remaining groups had a fairly upregulated expression, although some genes also exhibited an alternating expression trend. Concluding that these genes also respond positively to cold stress to a certain extent.
Similarly, in heat stress, the LchiLRR-RLK genes grouped into clusters according to their expression trends (Fig. 6b). Like cold stress, the first group genes including, Lchi_Xa-3, and others were characterized by the highest positive expression patterns. These exhibited upregulations from stress onset to termination at 3d as compared to the 0h. Suggesting that these four genes are highly responsive to cold stress. Likewise, the second and third groups had fairly upregulated expression patterns, also characterized by alternating gene regulation trends that had upregulations at 3h and downregulations at 24h and an upregulation at stress termination. Surprisingly, LchiVIII-61 and Lchi_XI(2)-8 had extremely low expression values at 3d. Lastly, the fourth and fifth groups had extremely low expression patterns from stress onset up to termination, however, Lchi_I-28 in the fifth group was highly expressed at 3d. Probably due to the fact it responds to heat stress after a long time of exposure.
In salt stress, LchiLRR-RLK genes clustered according to their similarities in response to stress (Fig. 6c). Generally, most of the analyzed genes had an extremely down-regulated expression at 3d, with a few in the second cluster group exhibiting an upregulation at 3d. These included, Lch_Xa-3/VIII-63/I-32/I-28, and others. This result can be concluded as thus, most LchiLRR-RLK genes are downregulated during the salt stress, and a few respond positively to the cold stress at long periods of exposure to stress. In summary, the abiotic stress analysis demonstrated that the LchiLRR-RLK genes respond to cold, heat, and salt stresses at varying extents, suggesting that these genes may regulate the abiotic stresses at different time points.