In-Silico Identication, Expressional Prole and Regulatory Network Analysis of Mitogen Activated Protein Kinase Kinase Kinase (MAPKKK) Gene Family in Tea (C. Sinensis)

Background: Mitogen activated protein kinase kinase kinase (MAPKKK) form the upstream component of MAPK cascade. It is well characterized in several plants such as Arabidopsis and rice however the knowledge about MAPKKKs in tea plant is largely unknown. Result: In the present study, MAPKKK genes of tea were obtained through a genome wide search using Arabidopsis thaliana as the reference genome. Among 59 candidate MAPKKK genes in tea, 17 genes were MEKK-like, 31 genes were Raf-like and 11 genes were ZIK- like. Additionally, phylogenetic relationships were established along with structural analysis which includes gene structure, its location as well as conserved motifs and functional domain signatures that were systematically examined and further, predictions were validated by the results. Also, on the basis of orthologous genes in Arabidopsis, functional interaction was carried out in C. sinensis. The expressional proles indicated major involvement of MAPKKKK genes from tea in response to various abiotic stress factors. Conclusion: The present study provides the targets for additional inclusive identication, functional study, and also might provide comprehensive knowledge for a better understanding of the MAPKKK cascade regulatory network in C. sinensis. classication of the identied MAPKKK genes into their subfamilies was backed by the results obtained from the above analyses made. The 59 genes were mapped onto their respective genomic scaffolds and a network of functionally interacting genes was constructed. Further, expression prole analyses were conducted to reveal the involvement of the tea MAPKKK genes in various tissues during development and also check the expression of these genes under various abiotic stress stimuli and plant hormonal treatment. These data will provide detailed information about the tea MAPKKK genes for further characterization of the MAPK signalling cascade and lay a concrete foothold for further exploration and research on C. sinensis.


Results
Identi cation of MAPKKK gene family in C. sinensis In order to identify the MAPKKK gene family in tea (C. sinensis), 415 known MAPKKK peptide sequences from Arabidopsis thaliana (80), Oryza sativa (75), Solanum lycopersicum (71), Solanum tuberosum (81), Capsicum annum (60) and Coffee canephora (48) were retrieved from their respective databases. MAPKKK gene family is divided into three other sub-families, which include MEKK-like, Raf-like and ZIK-like genes. To classify and categorize the MAPKKK genes in tea, BLASTp searches were conducted against the tea protein database, using the retrieved peptide sequences from Arabidopsis and rice as query sequences. For all BLASTp searches, e value and identity percentage were set to 1e-5 and 50% as threshold, respectively (Additional File 1: Supplementary  Table S1, Supplementary Table S2 and Supplementary Table S3). The identi ed tea peptides were again screened with a Hidden Markov Model (HMM) search to con rm the presence of serine/threonine-protein kinase-like domain (PF00069). The screened peptides were again subjected to self-BLAST to remove any chances of redundant data. The nal screened tea MAPKKK genes yielded 59 total potential genes, which included 17 MEKK-like, 31 Raf-like and 11 ZIK-like genes and were incorporated into the nal dataset.
The physicochemical properties of the identi ed tea MAPKKK protein sequences were evaluated using ExPASy ProtParam tool ( Table 1, Table 2 and Table 3).
The length and molecular weight of the 17 MEKK proteins ranged from 311 to 1191 amino acid residues and 34828.88 to 130956.46 kDa respectively (Table  1). For the Raf proteins, it ranged from 305 to 1436 amino acid residues and 35012.57 to 159263.21 kDa (Table 2), and for the ZIK proteins, it ranged from 300 to 831 amino acid residues and 34181.96 to 94422.51 kDa ( Table 3). The theoretical pI values ranged from 4.58 to 9.50 for MEKK, 4.88 to 9.61 for Raf and 5.14 to 6.33 for ZIK proteins, indicating that most of the MEKK and Raf proteins have a basic nature while the ZIK proteins being acidic. The grand average of hydropathy (GRAVY index) in all the extracted MEKK, Raf and ZIK were negative values, ranging from -0.605 to -0.060, -0.661 to -0. 182 and -0.582 to -0.350 respectively. This indicates that all the identi ed 59 tea MAPKKKs are hydrophilic in nature. 52 of the 59 putative tea MAPKKKs had instability index values above 40, while 6 Raf genes (TEA000933.1, TEA022171.1, TEA011280.1, TEA031223.1, TEA007232. 1 and TEA013875.1) and 1 ZIK gene (TEA020112.1) had instability index values less than 40 (Table 1, Table 2 and Table 3). This signi es the unstable nature of most of the identi ed tea MAPKKKs [14]. Subcellular localization of the peptides was predicted using the BaCelLo online server with 49 genes being localized in the nucleus, 9 genes in chloroplast and 2 genes in cytoplasm (Table 1, Table 2, and Table 3). TMHMM server v2.0 was employed to predict the presence of trans-membrane helices in the putative peptide sequences and one of the ZIK genes (TEA027328.1) had one trans-membrane helix (Additional File 3: Supplementary Fig. S1 1B). Unlike the MEKK tree, the Raf tree was divided into 11 different clades, with an uniform clustering of genes in all the clades. However, the Raf tree did not feature any orthologous gene pair. The NJ tree for ZIK was generated using 11 sequences from tea, 11 sequences from Arabidopsis, 10 sequences from rice, 10 sequences from tomato, 16 sequences from potato, 6 sequences from capsicum and 8 sequences from coffee (Fig. 1C). The ZIK tree was divided into 7 clades and had a uniform clustering of genes in all the clades with only clade E consisting of 2 genes each of Arabidopsis and rice. Similar to the Raf tree, the ZIK tree also did not feature any orthologous gene pair.

Domain analysis of tea MAPKKKs
The MAPKKK domain architecture as reported in known literature reveals that proteins belonging to Raf family in Arabidopsis and other plant species, possess a C-terminal kinase domain and a long N-terminal regulatory domain. The proteins belonging to ZIK family on the contrary possess a N-terminal kinase domain. The proteins belonging to the MEKK family however has less conserved domain architecture and is either located either at the N-or C-terminal or at a much central region of the protein [6].
Out of the 3 subgroups of plant MAPKKKs, the MEKK subfamily is fairly well known and characterized. Most MEKKs are known to be a part of the recognized MAP Kinase cascades, which activates the downstream MKKs. MEKK1 and MEKK2 from Arabidopsis, have been proven to play a signi cant role in plant innate immunity [15,16,17]. Similar to other plant MAPKKKs, 16 out of 17 members of MEKK subfamily in tea structured a characteristic conserved signature G(T/S)Px(W/Y/F)MAPEV, except TEA014429.1 ( Fig. 2A). Two of the most widely studied Arabidopsis Raf subfamily MAPKKKs, namely CTR1 and EDR1 are known to actively participate in ethylene mediated signalling and defense response mechanisms. All 31 members of the Raf subfamily in tea featured a conserved GTxx(W/Y) MAPE signature in its kinase domain with no exceptions (Fig. 2B). The ZIK-like MAPKKKs are also known by the name WNK or with no lysine (K). They are not proven to be involved with the phosphorylation of the MKKs in plants however, have speci c functions. Arabidopsis ZIK1 is known to phosphorylate APRR3 in-vitro, which is a putative component of the circadian clock in plants and is believed to be involved in signal transduction pathway, regulating its biological activity [18]. Another ZIK cascade, involving ZIK2, ZIK5 and ZIK8 in Arabidopsis is known to regulate the owering time by modulating the photoperiod [19]. The ZIK subfamily featured a characteristic GTPEFMAPE(L/V/M)(Y/F/L) conserved signature across all its 11 members in tea ( Fig. 2C) [5,6]. The presence of these distinctive conserved signatures across the tea MAPKKKs further con rms identity and the subfamily they belong. The largest subfamily was found to be the Raf subfamily with 31 members, while the smallest was found to be the ZIK subfamily with only 11 members. This result showed consistency when compared with known literature on other plant MAPKKKs.

Motif composition of tea MAPKKKs
To understand the evolution and comprehend sequential characteristics of the MAPKKK proteins in tea, a conserved motif search was carried out using the MEME suite (Fig. 3). Ten conserved motifs were identi ed in each of the three subfamilies. Almost all the tea MAPKKK proteins featured the protein kinase domain of motif 1, motif 2 and motif 3. Motif 4 was conserved across all the proteins with only one exception of TEA031230.1. Motif 5, motif 7 and motif 8 were only obtained for the ZIK subfamily with one exception of a MEKK-like TEA014429.1, which also featured motif 8. Motif 6 and motif 9 were harboured by almost all the protein sequences. However, motif 10 was only speci c to the MEKK and Raf subfamilies. Motif annotation revealed that motif 2 harboured a protein kinase ATP-binding site. Motif 6 contained a tyrosine kinase phosphorylation site. Motif 9 featured a serine/threonine protein kinase activation site (Additional File 3: Supplementary Fig. S4). The results suggested that proteins belonging to a same group harboured similar conserved motifs, further indicating that the classi cation of the tea MAPKKK subfamilies was backed by motif analyses.

Gene structure analysis of tea MAPKKKs
The intron-exon distribution pattern for tea MAPKKKs were analysed and visualised using the Gene Structure Display Server v2.0. Study of gene structure revealed differences in number of introns and exons, which contributes to variation in gene length. Introns or non-coding sequences are found abundantly within a genome and are regarded as an indicator of genome complexity [20,21]. Analysis of the intron patterns could help to comprehend and provide insights into the evolution, function and regulation of the genes [20,22,23,24,25]. The analysis of the intron-exon architecture in tea revealed signi cant variation in the number of introns and exons among the three subfamilies of MAPKKKs (Fig. 4). However, genes belonging to the same clades had similar Genomic distribution map and evolutionary pressure of tea MAPKKKs The tea MAPKKKs was mapped onto the genomic scaffolds to understand their distribution pattern. Due to the lack of chromosome-level assembly data in the TPIA database, the genes were mapped onto their respective scaffolds instead of the chromosomes. All 59 tea MAPKKKs were extensively distributed across 58 different genomic scaffolds. 17 MEKK genes were distributed across 17 different scaffolds (Fig. 5A). Similarly, 31 Raf genes were distributed across 31 genomic scaffolds (Fig. 5B). 11 ZIK genes were mapped onto 10 genomic scaffolds (Fig. 5C). Two ZIK genes namely, TEA013344.1 and TEA013346.1 were mapped on the same genomic scaffold 5883 and thus featured a duplication event. Additionally, both these genes possessed similar intron-exon architecture.
This result is conclusive evidence that duplication events were of signi cant importance and played a crucial role in the expansion of the MAPKKK genes in C. sinensis genome. Further, the ratio of non-synonymous substitution rates (K a ) and synonymous substitution rates (K s ) was evaluated to illuminate the mechanism of gene divergence and evolutionary pressure of the tea MAPKKKs (Additional File 2). The ratio determines the selective pressure acting on the respective proteins. If the K a /K s ratio is <1, it determines negative or purifying selection. If the K a /K s ratio is =1, it indicates neutral selection and if the K a /K s ratio is >1, it signi es positive selection [26].  Table S6). The K a /K s cumulative graphs of tea MAPKKKs were also generated (Additional File 3: Supplementary Fig. S5, Supplementary Fig. S6 and Supplementary Fig. S7). The results suggest strong positive selection pressures would have occurred, enabling different factors to regulate the MAPKKKs in C. sinensis.

Functional interaction network of tea MAPKKKs
For better understanding of the interactions of tea MAPKKKs in C. sinensis, an interaction network was constructed based on the orthologous genes in Arabidopsis, using the STRING server (Fig. 6). The functional interaction network of the genes has been built using that of Arabidopsis because tea database is not included in the STRING online server. TEA005306.1 in tea was found to be orthologous to AT5G55100 in Arabidopsis. This orthologous gene was identi ed using the TPIA database and AT5G55100 was used to build the interaction network. Additionally, tea proteins, homologous to the Arabidopsis proteins participating in the network were incorporated in the gure. These homologous proteins were designated as STRING proteins and were selected on the basis of high bit scores. Similarity searching programs such as BLAST produce accurate statistical estimates that help determine that protein sequences sharing substantial degree of similarities tend to have similar structures [27].  (Fig. 7C). Heat maps for all the 58 genes, representing the tissue speci c expression levels were also being generated (Additional File 3: Supplementary Fig. S8).

Abiotic stress induced differential expression levels of tea MAPKKKs
To check the effect of various abiotic stress tolerance levels of tea MAPKKKs, the expression data was retrieved from the TPIA database (Additional File 5: Supplementary  Table   S11) and expression graphs were generated (Fig. 8, Fig. 9, Fig. 10  genes were downregulated (Fig. 8B). Expression data of the ZIK genes revealed that under CA 1-6h, 7 out of 11 genes were upregulated and 4 genes were downregulated. CA 1-7d condition revealed that 5 genes were upregulated, 5 genes were downregulated and remaining 1 gene displayed no expression. Under CA 2-7d condition, 4 genes were upregulated, 6 genes were downregulated and 1 gene had no expression. Finally, under DA-7d, 8 genes showed upregulation and remaining 3 showed downregulation (Fig. 8C). Heat maps for the retrieved expression data were also generated (Additional File 3: Supplementary Fig. S9).
Further, expression levels of all tea MAPKKKs were checked under the effects of drought stress conditions. Drought stress levels are recorded in the TPIA database with respect to 25% polyethylene glycol (PEG) treatment and it includes 4 different stages: 1. 0h; 2. 24h; 3. 48h; and 4. 72h [30], where 0h was taken as the control. The expression levels of MEKK genes revealed that under PEG-N-24h condition, 12 genes were upregulated, 4 were downregulated and 1 gene did not show any expression. Under PEG-N-48h, 12 genes were upregulated, 4 were downregulated and 1 gene showed no expression. PEG-N-72h revealed 11 genes showing upregulation, 5 genes showing downregulation and 1 gene with no expression (Fig. 9A). Expression of Raf genes showed that under the PEG-N-24h condition, 11 genes were upregulated, 20 genes were downregulated. Under PEG-N-48h, 16 genes showed upregulation while the remaining 15 genes were downregulated. PEG-N-72h revealed that 15 genes were upregulated and 16 genes were downregulated (Fig. 9B). Finally, the expression data of ZIK genes revealed 10 out of 11 genes had different expression levels under the given conditions while 1 gene (TEA013344.1) had no data. Under the PEG-N-24h condition, expression data showed that only 1 gene was upregulated while the rest of the genes were downregulated. PEG-N-48h condition too revealed the same result with only 1 gene being upregulated. However, PEG-N-72h showed that 2 genes were upregulated and the rest of the genes were downregulated (Fig. 9C). Heat maps for the afore-mentioned data were also generated (Additional File 3: Supplementary Fig. S10).
The expression levels of the tea MAPKKKs under salt stress condition were studied. Similar to the drought stress parameters, the salt stress data in TPIA database is recorded based on treatment with 200 mM NaCl under 4 stages: 1. 0h; 2. 24h; 3. 48h; and 4. 72h where 0h was taken as the control. Analysis of the MEKK genes revealed that under NaCl-N-24h, 9 genes were upregulated and 8 genes were downregulated. For NaCl-N-48h condition, 9 genes showed upregulation and remaining 8 genes were downregulated. Expression levels under NaCl-N-72h revealed 5 genes being upregulated and the rest being downregulated (Fig. 10A). For the Raf genes, expression data concluded that under NaCl-N-24h condition, 15 genes were upregulated and 16 genes were downregulated. Under the NaCl-N-48h condition, 16 genes showed upregulation and 15 genes were downregulated. Expression levels under NaCl-N-72h showed that 8 genes were upregulated and remaining 23 were downregulated (Fig. 10B). For ZIK genes, 10 out of 11 genes had expression levels with 1 gene (TEA013344.1) showing no effect under the given conditions. Expression levels determine that under NaCl-N-24h condition, only 2 genes showed upregulation and the rest of the genes were downregulated. For NaCl-N-48h condition, only 1 gene was upregulated while the remaining 9 were downregulated. NaCl-N-72h condition too revealed a similar result with 2 genes being upregulated and remaining 8 being downregulated (Fig. 10C). Heat maps were generated for the above-mentioned data as well (Additional File 3: Supplementary Fig. S11). 48h_MeJA revealed that only 4 genes were upregulated and rest of the genes were downregulated (Fig. 11B). Treatment of the ZIK genes under the 12h_MeJA condition revealed that 4 genes were upregulated and 7 genes were downregulated. 24h_MeJA condition showed 5 genes being upregulated and remaining 6 being downregulated. 48h_MeJA condition concluded that 3 genes were upregulated and remaining 8 being downregulated (Fig. 11C). Heat maps for these data were also generated (Additional File 3: Supplementary Fig. S12).

Discussion
The MAPKKK-MAPKK-MAPK signalling cascade is a key component in response to various environmental stresses and plant developmental stages [5,17,31]. Investigation of the MAPKKK genes, which form a signi cant component of this core regulatory network in the pathway, would certainly aid to a better understanding of the signalling genes. Genome wide studies have previously identi ed MAPKKK genes in various plant species, which include 80 genes in Arabidopsis [16], 75 genes in Oryza sativa [6], 71 genes in Zea mays [22], 89 genes in tomato [9], 59 genes in Cucumis sativus [10], 150 genes in Glycine max [32], 77 genes in banana [33], 155 genes in Triticum aestivum [34], 62 genes in cassava [35] and 73 genes in Medicago truncatula [36]. However, tea plant has been explored the least and MAPKKK signalling genes have not been studied yet. This conducted study provided a comprehensive synopsis of the phylogenetic relationship, intron-exon architecture, motifs, functional domains, genomic distribution and expression patterns of the MAPKKK genes in tea. Herein, a grand total of 59 MAPKKK proteins were screened and identi ed from tea plant genome. The identi ed genes were then classi ed into 3 subfamilies based on their phylogenetic relationships (Fig. 1). Previous studies on MAPKKK gene families in Arabidopsis, cucumber and rice also yielded consistent results with respect to the classi cation of the identi ed genes into 3 subfamilies [6,10,16]. The classi cation of the MAPKKK genes was further backed by domain analyses, motifs and gene structure studies. All the identi ed tea MAPKKKs had their respective subfamily speci c domains. The MEKK subfamily genes featured a less conserved domain signature G(T/S)Px(W/Y/F) MAPE, with one exception (Fig. 2A). The Raf subfamily featured a conserved GTxx(W/Y)MAPE signature (Fig. 2B) and ZIK subfamily genes had a characteristic GTPEFMAPE(L/V/M)(Y/F/L) conserved signature (Fig. 2C). Motif analyses revealed that all MAPKKK proteins had protein kinase domains and proteins belonging to the same subfamily shared similar motifs (Fig. 3). This result is consistent to previous studies conducted on other plants like cucumber [10], Arabidopsis [16] and banana [33]. The intron-exon architecture of the tea MAPKKK genes revealed a signi cant variation in the number of introns and exons (Fig. 4). The analysis also proposed that genes belonging to the same subgroup featured similar intron-exon organisation. MEKK genes displayed exons ranging from 6 to 16 on an average. Highest number of exons found among the MEKK genes was 18. Raf genes had an average of 6 to 18 exons with the highest being a staggering 28 and ZIK genes had 7 to 10 exons on an average.
Raf subfamily thereby featured more number of introns than MEKK and ZIK subfamilies. Reports suggest that the rate at which introns are lost is faster compared to the rate at which introns are gained after segmental duplication [37]. This is a conclusive evidence to infer that Raf subfamily might contain the original set of genes, from which genes of other subfamilies have been derived. The MAPKKK genes also displayed a signi cant variation with respect to the number of UTR segments present. Most genes possessed both 5' and 3' UTRs while few had only the 5' UTR or 3' UTR segment. These variations of the gene structures in large scale suggest that the tea genome has been variable during its evolutionary history. Similar occurrence was also observed in plants like cassava [35], grapevine [38] and maize [22]. All the identi ed genes were mapped onto their respective scaffolds (Fig. 5). Duplication events were observed among the ZIK genes and further the ratio of non-synonymous substitution rates (K a ) and synonymous substitution rates (K s ) was evaluated which indicated strong positive selection pressures to have occurred, enabling different factors to regulate the MAPKKKs in C. sinensis genome. Functional interaction network was also constructed based on the tea orthologs in Arabidopsis (Fig. 6). The genes participating in the interaction network revealed genes responsible for various biological processes during developmental stages. Additionally, tea proteins homologous to the Arabidopsis proteins participating in the interaction network were also added. These homologous proteins were considered STRING proteins because of the fact that proteins possessing high sequence and structural similarity tend to have similar functions.
Tea plant is a major plantation crop grown all over the world due to its high commercial value. Leaves obtained from tea plant are rich sources of nutrients and tea is the second most important beverage in the world besides water. However, being a thermophilic crop, tea plant is largely affected by various environmental stress factors [29,33]. Plants undergo many biochemical, molecular and physiological changes to defend themselves against environmental damage by evolving numerous signalling pathways to react to the external stimuli into intracellular reactions [39,40]. MAPKKKs function at the highest level of the MAPK signalling cascade, helping with developmental and stress tolerance in plants. Previous studies have established the positive role of plant MAPKKKs in tolerance to various abiotic stress factors like cold, salt and drought [41,42,43,44]. Through this cascade, these external stress stimuli are being converted into cellular responses [44]. Reactive oxygen species (ROS) are oxygen derivatives, which are highly reactive by-products of the aerobic metabolism [50]. Plants consist of a complicated network of ROS scavenging antioxidant enzymes that help regulate the ROS levels under normal physiological conditions [49,50]. Although a shift from normal physiological conditions to adverse environmental conditions shifts the equilibrium, resulting in increased ROS production. ROS are highly toxic to the cellular machinery and increased ROS levels can lead to serious oxidative damage and cell death [49]. Studies have suggested that the MAPK signalling cascade comprising of the MAPKKK-MAPKK-MAPK module is stimulated when excess ROS levels are detected under multiple stress conditions such as salt stress, cold stress and drought stress [49,50]. In tomato, most of the MAPKKK genes were upregulated by drought, salt, cold, and heat stress conditions out of which SlMAPKKK51, SlMAPKKK53, and SlMAPKKK55 were upregulated more [9]. MAPKKK members of some plants including GhRaf19, AtMAPKKK18, AtRaf43, and DSM1 were found to be involved in plants resistance to various abiotic stress stimuli [40,41,42]. Gene expression analysis for the tea MAPKKK genes against cold stress demonstrated that TEA016319.1 among the MEKK genes, TEA008343.1 among the Raf genes and TEA010125.1 among the ZIK genes were expressed the most during the CA2-7d condition (Fig. 8). Expression data studied for drought stress tolerance revealed that TEA005122.1 among the MEKK genes, TEA000933.1 among the Raf genes and TEA022762.1 among the ZIK genes had the highest levels of expression at PEG-N-72 h condition ( Fig. 9). Salt stress tolerance data suggested that TEA005122.1 among the MEKK genes and TEA000933.1 among Raf genes displayed the highest levels of expression at NaCl-N-72 h condition. TEA020112.1 among the ZIK genes was upregulated the most under NaCl-N-72 h condition (Fig. 10). Expression data for treatment under Methyl jasmonate (MeJA) was also analysed. Data revealed that TEA028214.1 among the MEKK genes, TEA000933.1 among the Raf genes and TEA002087.1 among the ZIK genes were expressed the most under the 72_MeJA condition (Fig. 11). Collectively, these ndings suggest the involvement of a number of MAPKKK genes, being upregulated and expressed under the stress conditions. In general, this study provides a detailed and comprehensive analysis of the MAPKKK genes in tea. Further extensive studies needs to be conducted on MAPK genes in tea that could underwrite a better understanding of various functions of these set of genes in developmental processes and expression under various abiotic stress stimuli.

Conclusion
Mitogen activated protein kinases (MAPK) signalling cascade plays signi cant roles in different biological processes. The signalling components are linked to the upstream and downstream regulators by phosphorylation. There has been substantial development in identifying the different MAPKKK genes and understand their physiological roles in various plants. However, these genes had yet not been explored and studied in tea plant. In-silico genome wide analysis had identi ed 59 MAPKKK genes from C. sinensis genome. The classi cation of the identi ed MAPKKK genes in 3 subfamilies were conducted based on their phylogenetic relationships. The genes were further investigated under domains signatures, conserved protein motifs and intron-exon architecture. The classi cation of the identi ed MAPKKK genes into their subfamilies was backed by the results obtained from the above analyses made. The 59 genes were mapped onto their respective genomic scaffolds and a network of functionally interacting genes was constructed. Further, expression pro le analyses were conducted to reveal the involvement of the tea MAPKKK genes in various tissues during development and also check the expression of these genes under various abiotic stress stimuli and plant hormonal treatment. These data will provide detailed information about the tea MAPKKK genes for further characterization of the MAPK signalling cascade and lay a concrete foothold for further exploration and research on C. sinensis.

Methods
Identi cation of MAPKKK gene family in Tea  [54]. The retrieved Arabidopsis and rice MAPKKK sequences were used as query sequences to search against the tea plant genome database deploying the BLASTp algorithm with an e value set to 1e-5 and identity percentage of 50% as threshold. The obtained genes were then aligned and uploaded to SMART (http://smart.emblheidelberg.de/) [55] and Pfam web tool (https://pfam.xfam.org/) to con rm the existence of kinase domains. Self-BLAST of the identi ed sequences were done to remove any chances of redundancy and were considered potential C. sinensis MAPKKK genes. The physicochemical properties of the identi ed tea MAPKKK genes were predicted using ProtParam tool incorporated in ExPASy database (https://expasy.org/) [56]. Subcellular localization of the peptides was predicted using the BaCelLo (Balanced subcellular localization predictor) online server (http://gpcr.biocomp.unibo.it/bacello/index.htm) [57].

Intron exon structures and conserved motifs
The intron exon distribution pattern for tea MEKK, Raf and ZIK peptide sequences were analysed and visualised using the Gene Structure Display Server v2.0 (http://gsds.cbi.pku.edu.cn/) [63]. The full-length peptide sequences were uploaded to MEME suite (http://meme-suite.org/) [64] in-order to identify the conserved motifs.
Mapping of tea MAPKKK genes onto scaffolds and gene duplication TPIA database has incomplete genome assembly information. As a result, the tea MAPKKK genes were mapped onto their respective scaffolds using MapGene2chromosome web v2 (MG2C) software tool (http://mg2c.iask.in/mg2c_v2.0/) [65]. The genes were mapped according to their scaffold positional information available in TPIA database, which includes scaffold IDs for each gene, scaffold dimensions and the starting and ending position of the each gene on the scaffolds.

Expression pro les of tea MAPKKK genes
The tissue speci c expression pro les, which include expression levels in apical bud, ower, fruit, young leaf, mature leaf, old leaf, root, and stem were retrieved from TPIA database. Furthermore, the abiotic stress tolerance data (cold, drought, salt) under different parameters, were retrieved from TPIA database along with the expression data for treatment with methyl jasmonates (MeJA). This data was fed into GraphPad Prism 8    Functional interaction network of tea MAPKKK proteins. The interaction network was build according to the ortholog in Arabidopsis. TEA005306.1 in tea is orthologous to AT5G55100 in Arabidopsis. The orthologous protein (red) and homologous proteins (black) are shown within brackets.