Genome-wide identification and evolutionary analysis of RLKs involved in response to Aluminum stress in peanut


Background: Peanut is an important cash crop whose yield is affected by soil acidification and pathogen infection. Receptor-like protein kinases (RLKs) play important roles in plant growth, development and stress responses. However, little is known about the number, location, structure, molecular phylogeny and expression of RLKs in peanut, and no comprehensive analysis of RLKs in the aluminum (Al) stress response of peanut has been reported. Results: A total of 1311 AhRLKs were identified in the peanut genome. The AhLRR-RLKs and AhLecRLKs were further divided into 24 and 35 subfamilies, respectively. The AhRLKs are unevenly distributed across all 20 chromosomes of peanut. Among them, 67.8% and 0.6% originated from tandem and segmental duplications, respectively. The Ka/Ks ratios of 94.9% (1290/1360) of AhRLKs were less than 1. Moreover, a total of 90 Al-responsive AhRLKs were identified by mining transcriptome data, and they were divided into 7 groups. Most Al-responsive AhRLKs that clustered together had similar motifs and evolutionarily conserved structures. The expression patterns of these genes in different tissues were further analyzed, and tissue-specific genes, including 14 root-specific Al-responsive AhRLKs, were found. In addition, all 90 Al-responsive AhRLKs, which were distributed unevenly among the AhRLK subfamilies, showed different expression patterns between two peanut varieties (Al-sensitive and Al-tolerant) under Al stress. Conclusions: In this study, we analyzed the RLK gene family using the peanut genome. Tandem duplication events were the main driving force for AhRLK evolution, and most AhRLKs were under purifying selection. A total of 90 genes were identified as Al-responsive AhRLKs, and their classification, conserved motifs, structure, tissue expression patterns and predicted functions were further analyzed and discussed, revealing their putative roles.
This study provides a better understanding of the structures and functions of AhRLKs as well as Al responsive AhRLKs.


Background
Aluminum (Al) is one of the most harmful factors for plant growth in acidic soils and can cause yield losses of 25-80%, depending on the crop [1,2]. Al signals induce a series of physiological events in plant cells. The most obvious symptoms of Al toxicity are the inhibition of cell elongation in the apical elongation region and the induction of programmed cell death (PCD) [3][4][5]. PCD is an active, orderly and genetically controlled form of cell death that occurs in plants throughout development and in response to environmental stresses [6]. Early studies found that Al treatment can enhance Fe2+-induced lipid peroxidation and PCD in tobacco cells [7]. Over the past decades, Al-induced PCD has been demonstrated in many plant species, including soybean (Glycine max) [8], maize (Zea mays) [9], barley (Hordeum vulgare) [10], tomato (Lycopersicon esculentum) [11] and peanut (Arachis hypogaea) [12]. Al-induced PCD is known to be mediated through two signal transduction pathways: a mitochondria-dependent pathway and a nucleus-dominated, mitochondria-independent pathway [5]. However, how the Al signal is perceived and transduced across the membrane is unknown. Plants use plasma membrane- and/or cell wall-localized receptors to sense environmental stimuli and efficiently transduce signals between cells, thereby modulating gene expression and/or enzyme activity as well as motility [13]. Receptor-like kinases (RLKs) play important roles in cell signal transduction and are involved in a variety of plant physiological processes, including self-incompatibility [14], environmental signal processing [15], organ shape and meristem activity [16], hormone signal transduction [17], PCD [18] and tolerance to oxidative stress [19]. RLKs sense and transduce signals through protein interaction and phosphorylation [20].
Based on the structure of the extracellular domain, RLKs have been classified into several families, such as S-RLK, LRR-RLK, EGF-RLK, LecRLK, TNFR-RLK and PR5K-RLK [21]. While many RLKs involved in environmental stress responses have been found, few have been reported to be involved in the Al stress response. WAK1, which mediates the interaction between the cell wall and the cytoplasm and may participate in cell elongation and morphogenesis [22], was the first RLK found to be involved in the Al stress response. Overexpression of WAK1 was reported to enhance Al tolerance in Arabidopsis [23]. These results suggest that RLKs play an important role in Al-induced PCD, but the mechanism by which RLKs regulate Al-induced PCD is unknown.
Peanut is an important oil crop worldwide. Al-dependent growth inhibition reduces peanut yield in acid soils. There has been no comprehensive analysis of the RLK gene family in peanut. In the present study, the recently released peanut whole-genome sequence data (http://peanutgr.fafu.edu.cn/index.php) were used to analyze the RLK gene family in peanut. A total of 1311 AhRLKs were identified. The LRR-RLKs and LecRLKs were further divided into 24 and 35 subfamilies, respectively, based on a phylogenetic analysis. The evolution and collinearity of AhRLKs were investigated, and the evolutionary patterns of the RLK gene family were examined by investigating gene duplication events in peanut. In addition, 90 AhRLKs responsive to Al stress were identified by transcriptomic analysis, and their expression profiles at different Al treatment time points were comprehensively determined. These results provide a basis for further research on the evolution and physiological functions of AhRLKs in response to Al stress in peanut.

Identification of RLKs in peanut
To identify the RLKs in peanut, we downloaded the publicly available peanut genome sequence data and used Arabidopsis RLK sequences as queries to perform a genome-wide similarity search. After filtering the sequences, a total of 1311 RLKs that contain at least one kinase domain were

Chromosomal location and gene duplication of AhRLKs
The physical positions of AhRLKs obtained from the "Peanut Genome resource" (http://peanutgr.fafu.edu.cn/) [26] were used to map them onto peanut chromosomes. Chromosome location information showed that the AhRLKs were unevenly distributed among the 20 chromosomes of peanut. Gene duplication events play an important role in the evolution of new protein functions and the expansion of genomes. Segmental duplication and tandem duplication are known to be the main causes of gene family expansion in plants [27]. Two or more AhRLKs located within 100 kb of each other on a chromosome were considered a tandem duplication cluster. The results showed that about 67.8% (889 out of 1311) of the genes were located in tandem duplication regions and constituted 397 clusters (Additional file 5). The largest tandem duplication cluster contained ten genes, while the smallest contained only two. Up to 67.2% (368/548) of AhLRR-RLKs and 70.1% (192/274) of AhLecRLKs were located in regions with tandem duplications. Segmental duplications produced a total of 4 putatively related gene pairs (0.6% of the total genes) (Fig. 4). To investigate the selection forces acting on individual AhRLKs, the ratio of the non-synonymous substitution rate to the synonymous substitution rate (Ka/Ks) was calculated. The AhRLKs in tandem duplication regions showed variable Ks values ranging from 0 to 1, most of which were between 0 and 0.06. The Ka/Ks ratios of 94.9% (1290/1360) of AhRLKs were less than 1 and those of 5% (68/1360) were greater than 1; 6 gene pairs had Ka/Ks ratios far greater than 1 (Ka/Ks >> 1), and for two gene pairs the Ka/Ks values could not be calculated (Fig. 5). In addition, we calculated the divergence time with the formula T = Ks/2r, in which r is the rate of divergence for nuclear genes from plants.
The r of dicotyledonous plants was taken to be 1.5 × 10^-8 synonymous substitutions per site per year, following Koch [28]. The results show that the tandem duplication events appear to have occurred relatively recently, 0-2 Mya (Additional file 6), indicating that these AhRLKs were generated by recent gene duplication events in Arachis hypogaea L.
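The Ka/Ks interpretation and the divergence-time formula T = Ks/2r described above can be illustrated with a minimal Python sketch. It assumes r = 1.5 × 10^-8 synonymous substitutions per site per year, which reproduces the reported 0-2 Mya range for Ks values of 0-0.06. The function names and the example Ka and Ks values are hypothetical and are not taken from the study's data or from KaKs_Calculator.

```python
# Assumed rate for dicots (synonymous substitutions per site per year).
R_DICOT = 1.5e-8


def selection_type(ka: float, ks: float) -> str:
    """Classify a duplicated gene pair by its Ka/Ks ratio."""
    if ks == 0:
        return "undetermined"  # the ratio cannot be calculated
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_time_mya(ks: float, r: float = R_DICOT) -> float:
    """Divergence time in million years (Mya) from T = Ks / (2r)."""
    return ks / (2 * r) / 1e6


if __name__ == "__main__":
    # Hypothetical (Ka, Ks) values, for illustration only.
    for ka, ks in [(0.01, 0.06), (0.05, 0.02), (0.0, 0.0)]:
        label = selection_type(ka, ks)
        print(f"Ka={ka} Ks={ks} -> {label}, T={divergence_time_mya(ks):.2f} Mya")
```

With these assumptions, a Ks of 0.06 corresponds to about 2 Mya, the upper bound of the range reported for the tandem duplicates.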

Phylogenetic analysis of Al-responsive AhRLKs
In a previous study, we performed a transcriptome analysis to identify differentially expressed genes (DEGs) and pathways between two peanut cultivars under Al stress (data unpublished). In this study, we scrutinized the transcriptome data to detect the AhRLKs involved in the Al response. A total of 90 AhRLKs were found to be Al-responsive genes, including 44 LRR-RLKs, 19 LecRLKs, 8 cysteine-rich RLKs (CRKs), 1 EGF, 2 proline-rich, 4 S-domain, 1 TMK, 1 RLCK and 1 LysM-domain RLKs, and 9 with no obvious domain (Additional file 2). To reveal the evolutionary relationships of these proteins, a phylogenetic tree was constructed using the maximum likelihood (ML) method (Fig. 6). Phylogenetic analysis of all 90 AhRLKs classified the Al-responsive AhRLKs into 7 groups, which included 48.9% LRR-RLKs, 21.1% LecRLKs and 8.9% CRKs, among others.
The phylogenetic tree showed that most of these genes belonged to the LRR-RLKs and LecRLKs, covering the main subfamilies of both families. Interestingly, these Al-responsive AhRLKs were evenly distributed across the LecRLK family but unevenly distributed across the LRR-RLK family, focused on

Characterization of the amino acid sequences and gene structure of Al stress related AhRLKs
As shown in Fig. 7, the 90 Al stress related AhRLKs were divided into 7 groups. The diversification of exons/introns has been reported to be an important factor in the evolution of certain gene families [29]. The exon/intron distribution of the AhRLKs was therefore analyzed. The results showed that 7.8% (7/90) of the Al stress related AhRLKs had no intron. One, two and three introns were found in 30% (27/90), 15.6% (14/90) and 1.1% (1/90) of the Al stress related AhRLKs, respectively. Meanwhile, 45.6% (41/90) of the genes had more than three introns. All genes in subgroups I, II and VII contain more than three introns. The majority of genes in subgroups III, IV and VI contain one or two introns. Moreover, to analyze the diversity of the Al stress related AhRLKs, the MEME tool was used to predict putative motifs in these proteins. A total of 5 different motifs were detected in the Al stress related AhRLKs and named motif 1 to motif 5 (Additional file 7). In total, 82.4% (14/17) of the genes in subgroup I, 70% (7/10) in subgroup II, 50% in subgroup III, 42.9% (6/14) in subgroup IV, 88.9% (8/9) in subgroup V, 75.8% (25/33) in subgroup VI and 33.3% (1/3) in subgroup VII share the same motif composition: motif 3-motif 4-motif 1-motif 2-motif 5.
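The intron-number percentages reported above can be reproduced from per-gene intron counts with a short tabulation. The sketch below is a hypothetical illustration in Python: the input list is rebuilt from the reported category sizes (7, 27, 14, 1 and 41 genes), with the value 4 used only as a placeholder for "more than three introns". It is not the study's actual per-gene data.

```python
from collections import Counter

BIN_LABELS = ("0", "1", "2", "3", ">3")


def intron_distribution(intron_counts):
    """Bin genes by intron number and return {label: (count, percent)}."""
    total = len(intron_counts)
    if total == 0:
        raise ValueError("no genes supplied")
    bins = Counter(str(n) if n <= 3 else ">3" for n in intron_counts)
    return {
        label: (bins[label], round(100 * bins[label] / total, 1))
        for label in BIN_LABELS
    }


if __name__ == "__main__":
    # Placeholder counts rebuilt from the reported category sizes (90 genes).
    counts = [0] * 7 + [1] * 27 + [2] * 14 + [3] * 1 + [4] * 41
    for label, (n, pct) in intron_distribution(counts).items():
        print(f"{label:>2} introns: {n:2d} genes ({pct}%)")
```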

Expression Profiles of Al-responsive AhRLKs in Different Tissues
To further understand the role of Al-responsive AhRLKs in peanut growth and development, their expression profiles in different organs, including leaves, stems, florescence, roots and root tips, were examined in a cultivated variety (A. hypogaea L.) using transcriptomic data (Fig. 8).

Expression patterns of Al stress related RLKs under Al stress
To further investigate the putative functions of Al stress related RLKs, the RNA-Seq datasets generated at different Al treatment time points were used to reveal the expression profiles of these genes under Al stress. The expression profiles of Al stress related RLKs are shown as histograms (Fig. 9). As shown in Fig. 9

It was shown that 548 LRR-RLKs were classified into 24 subfamilies (I to XXIV) based on their phylogenetic relationship with Arabidopsis, 2 times the number of Arabidopsis LRR-RLK genes (Fig. 1). In general, for most subfamilies the number of LRR-RLKs in peanut was twice that in Arabidopsis, except LRR-XII, LRR-XIV, LRR-XV and LRR-XVI, which had more than three times as many members as in Arabidopsis. Only one subfamily, LRR-V, had fewer members than in Arabidopsis. The number of LecRLKs was over 3 times the number of AtLecRLKs (Fig. 2). Some subfamilies in peanut, such as L-LecRK-VII, L-LecRKs-IX and G-LecRKs-VIa, were much larger than those of Arabidopsis, while others, including G-LecRKs-VIb, G-LecRKs-VIII, G-LecRKs-VII, G-LecRKs-X, G-LecRKs-III, L-LecRKs-VI, L-LecRKs-I, L-LecRKs-II, L-LecRKs-III and L-LecRKs-V, were not found in peanut (Table 1 and Table 2). Gene duplication is a main mechanism of evolutionary events [32]. About 67.8% of AhRLKs were located in regions with tandem duplications, revealing high tandem and low segmental duplication among AhRLKs (Additional file 5). A study of LRR-RLKs showed that tandem duplication makes a greater contribution to the birth of new genes [33], which suggests that the expansion of the LRR subfamilies may be caused by tandem duplication. About 67.2% (368/548) of LRR-RLKs and 70.1% (192/274) of LecRLKs were located in regions with tandem duplications. Segmental duplication is also an important driving force for the amplification of gene families.
However, our results revealed that only 0.6% (8 genes) of the AhRLKs originated from segmental duplication, which suggests that tandem duplication events were the main driving force for AhRLK evolution. In addition, the Ka/Ks ratios of 94.9% (1290/1360) of AhRLKs were less than 1, which suggests that most AhRLKs were under purifying selection (Fig. 5). There were 6 gene pairs whose Ka/Ks ratios were far greater than 1 (Ka/Ks >> 1), which indicates that these genes are under positive selection in peanut, are evolving rapidly and might be important for the evolution of peanut.
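The tandem-duplication proportions discussed here rest on the 100 kb clustering criterion. A minimal Python sketch of one way to apply such a criterion is given below. It assumes that "within 100 kb" means that neighbouring genes on the same chromosome start no more than 100 kb apart and that such neighbours are chained into one cluster. The gene identifiers, chromosome names and coordinates are hypothetical, and this is not necessarily the exact procedure used in the study.

```python
from collections import defaultdict

MAX_GAP_BP = 100_000  # 100 kb criterion (assumed: gap between neighbouring gene starts)


def tandem_clusters(genes, max_gap=MAX_GAP_BP):
    """Group genes into tandem clusters.

    genes: iterable of (gene_id, chromosome, start_position).
    Returns a list of clusters (lists of gene ids) with at least two genes.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, pos in genes:
        by_chrom[chrom].append((pos, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_pos = [], None
        for pos, gene_id in sorted(by_chrom[chrom]):
            if last_pos is not None and pos - last_pos <= max_gap:
                current.append(gene_id)  # extend the running cluster
            else:
                if len(current) >= 2:
                    clusters.append(current)
                current = [gene_id]  # start a new candidate cluster
            last_pos = pos
        if len(current) >= 2:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Hypothetical genes, for illustration only.
    example = [
        ("g1", "Chr01", 10_000), ("g2", "Chr01", 60_000),
        ("g3", "Chr01", 150_000), ("g4", "Chr01", 500_000),
        ("g5", "Chr02", 20_000), ("g6", "Chr02", 90_000),
    ]
    found = tandem_clusters(example)
    in_clusters = sum(len(c) for c in found)
    print(found, f"{100 * in_clusters / len(example):.1f}% of genes in clusters")
```

The share of genes falling in clusters of two or more members is the quantity that corresponds to the reported tandem-duplication percentage.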

Conservation of the AhRLKs in response to Al stress
In this study, a total of 90 AhRLKs were identified as Al-responsive genes and divided into 7 groups (Fig. 7) [34,35]. Most subgroups show a certain regularity in exon-intron structure. For instance, all genes in subgroups I, II and VII contain more than three introns. Members of the same subgroup had similar exon/intron organizations. Furthermore, 5 conserved motifs were identified in these AhRLKs, and the motif compositions among subgroups were consistent with the phylogenetic classification. These results indicate that members within each subgroup were more conserved during evolution.

Diverse roles of Al-responsive AhRLKs in different subgroups
To further understand the Al-responsive RLKs in peanut, we investigated the potential functions of each subgroup (Table 3). In subgroup I, PERK1 has been reported to regulate ABA signaling pathways and to modulate the expression of genes related to cell elongation and ABA signaling during root growth [36], implying that the genes in subgroup I are essential to plant signaling and growth. Inhibition of root elongation is known to be the primary symptom of Al toxicity, and the members of subgroup I may take part in the Al response by influencing cell elongation. The genes of known function in subgroup II were reported to play roles in plant signal transduction, plant growth and biotic stress responses. For instance, PXC1 and CRCK1 play a role in signal transduction [37,38], PRK1 is essential for the postmeiotic development of pollen [39], FLS2 is involved in preinvasive immunity against bacterial infection [40], and RCH1 is critical for resistance to the hemibiotrophic fungal pathogen Colletotrichum higginsianum [41]. In subgroup III, ANXUR1/ANXUR2 are involved in controlling pollen tube rupture during fertilization and in regulating signal transduction [42], and FERONIA is required for cell elongation during vegetative growth [43], suggesting that the genes in subgroup III might play an important role in plant morphology. In subgroup IV, TMK1 was reported to be an essential enzyme for DNA synthesis in bacteria [44], indicating that the genes of subgroup IV might play a critical role in the regulation of cell expansion and proliferation. The subgroup V gene RLK1 was reported to increase tolerance to salinity, heavy-metal stresses and Botrytis cinerea infection [45], suggesting that the genes of subgroup V are implicated in biotic and abiotic stress responses.
In subgroup VI, CRK5 was reported to respond to drought and salt stresses [46], and CRK45 was a potential positive regulator of ABA signaling in early seedling growth [47] and stomatal movement [48], indicating that the genes of subgroup VI are critical to abiotic stress responses and related to plant morphology. Among the reported genes in subgroup VII, GsSRK was a positive regulator of plant tolerance to salt stress [49] and SD1-29 improved plant resistance to bacteria [50], showing that the genes of subgroup VII have critical roles in biotic and abiotic stress responses. In general, Al-responsive AhRLKs in different subgroups take part in the Al response through different pathways. Subgroups I and II are related to signal transduction, subgroup II is implicated in biotic stress responses, subgroups III and VI play an essential role in plant morphology, subgroup IV plays a critical role in the regulation of cell expansion and proliferation, and subgroups V and VII are critical to biotic and abiotic stress responses (Table 3).
Note (Table 3): only the Al-responsive AhRLKs with characterized homologs are listed in the table.
The AtRLK gene family plays a role in plant growth and development processes [58]. As shown in the histograms in Fig. 8, the expression patterns of the Al-responsive AhRLKs exhibited tissue specificity. About 2.2% (2/90, AH07G04000.1 and AH16G09430.1) of Al-responsive AhRLKs were expressed in all four tested organs at high levels (value > 5), implying that these genes might play essential roles in plant growth and development. About 2.2% (2/90, AH16G41130.1 and AH07G24540.1) were expressed specifically and at a high level in aerial organs. About 8.9% (8/90, AH14G07810.1, AH03G21680.1, AH19G41030.1, AH13G57290.1, AH10G29990.1, AH08G20520.1, AH08G06390.1 and AH01G04120.1) were expressed specifically and at a high level in roots or root tips. The tissue specificity of these Al-responsive AhRLKs indicates their key roles in tissue development or tissue function. Additionally, 6 non-tissue-specific genes (AH07G04000.1, AH03G13700.1, AH10G03910.1, AH08G04680.1, AH08G04640.1 and AH16G09430.1) that were expressed at a particularly high level in roots are also worth attention. As shown in the histograms in Fig

Conclusions
Al-affected soils are widely distributed throughout the world and pose a great threat to agricultural production, yet there are few studies on RLKs under Al stress; research on peanut RLKs is therefore of great significance. In this study, a total of 1311 RLKs were identified in the peanut genome, 2 times the number of Arabidopsis RLKs, including 548 LRR-RLKs and 274 LecRLKs. The LRR-RLKs represented the largest RLK gene family identified in plants. These AhRLKs were unevenly distributed among the 20 chromosomes, chloroplasts and mitochondria of peanut. Compared with segmental duplication, tandem duplication might have played a more critical role in the expansion of AhRLKs. The tandem duplication events appear to have occurred relatively recently, 0-2 Mya, indicating that these AhRLKs were generated by recent gene duplication events in Arachis hypogaea L. In addition, estimation of Ka/Ks ratios for 1360 AhRLKs revealed that most AhRLKs were under purifying selection. Furthermore, we identified a total of 90 Al-responsive AhRLKs by mining a transcriptome database. These genes were divided into 7 groups. The exon/intron compositions and motif arrangements were considerably conserved among members of the same groups or subgroups. Analysis of transcriptome data revealed the tissue expression patterns of the 90 Al-responsive AhRLKs, and tissue-specific genes were found. Among them, the root-specific genes might play a key role in Al sensing and response in peanut. The close phylogenetic relationship between Al-responsive AhRLKs and characterized RLKs in the same subgroup provided insight into their putative functions. Overall, this systematic analysis provides valuable information for understanding the biological functions of AhRLK genes under Al stress in peanut.

The resources of peanut AhRLKs
The genome sequence, protein sequences and genome annotation of peanut were according to PEANUT

Multiple sequence alignments and phylogenetic tree construction of AhRLKs
The sequences of the AhLRR-RLKs and the 90 AhRLKs responsive to Al stress were aligned using ClustalX in MEGA 7 with default parameters [59]. Phylogenetic trees based on the multiple sequence alignments of peanut LRR-RLKs (Fig. 1), LecRLKs (Fig. 2) and the 90 Al-responsive AhRLKs (Fig. 6) were constructed with MEGA 7 using the ML method with 1000 bootstrap replicates, the Poisson model, uniform rates and partial deletion. Based on the multiple sequence alignments and the previously reported classification in Arabidopsis thaliana, the peanut RLKs were assigned to different subfamilies and subgroups [60][61][62].

Chromosomal locations and duplication analysis for peanut RLKs
The physical locations of AhRLKs on the chromosomes were obtained from the Peanut Genome Resource database (http://peanutgr.fafu.edu.cn/). All AhRLKs were mapped onto peanut chromosomes based on their physical positions, and the chromosomal location image was produced with the online software Map Gene 2 Chromosome v2 (MG2C: http://mg2c.iask.in/mg2c_v2.0/). RLKs clustered within 100 kb were regarded as tandem duplicated genes, following the criteria used for other plants in previous reports. The duplication events and syntenic analysis of AhRLKs were determined using Circos software [63]. The non-synonymous (Ka) and synonymous (Ks) substitution rates and their ratios were calculated with KaKs_Calculator 2.0 [64]. The divergence time was calculated with the formula T = Ks/2r, where the r of dicotyledonous plants was 1.5 × 10^-8 synonymous substitutions per site per year [65].

The gene structure, motif analysis and heatmap of AhRLKs in response to Al stress
The exon-intron structures of the 90 AhRLKs responsive to Al stress were determined based on alignments of their coding sequences with the corresponding genomic sequences, and diagrams were obtained from the online program Gene Structure Display Server (http://gsds.cbi.pku.edu.cn/) [66]. The MEME (Multiple Em for Motif Elicitation) tool was used to predict putative motifs of these proteins (http://meme-suite.org/). The heatmap and the combined display of the phylogenetic tree, gene structures and protein structures were generated using TBtools. The presence of signal peptides, kinase domains and transmembrane domains was predicted with SMART (http://smart.embl-heidelberg.de) [67]. The Amino acid residue base,