Genome-Wide Identification and Characterization of Mitogen-Activated Protein Kinase (MAPK) Genes Reveals Their Potential in Enhancing Drought and Salt Stress Tolerance in Gossypium hirsutum

Sadau Bello Salisu (Sbsadau.ste@buk.edu.ng), Cotton Research Institute, https://orcid.org/0000-0001-6298-7873
Teame Gereziher Mehari, Cotton Research Institute
Adeel Ahmad, Cotton Research Institute
Sani Muhammad Tajo, Cotton Research Institute
Sani Ibrahim, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences
Muhammad Shahid Iqbal, Cotton Research Institute
Mohammed Elasad, Agricultural Cooperative College: Agricultural Cooperative University
Jingjing Zhang, Cotton Research Institute
Hengling Wei, Cotton Research Institute
Shuxun Yu, Cotton Research Institute


Abstract
Background: Cotton is a major source of natural fiber, is also regarded as a source of protein and edible oil, and is grown all around the globe. Unpredictable environmental stresses are becoming a significant threat to sustainable cotton production, ultimately leading to substantial, irreversible economic losses. Mitogen-activated protein kinases (MAPKs) are generally considered essential for recognizing environmental stresses by phosphorylating downstream signaling components.
Results: In the current study, we identified 74 MAPK genes across cotton: 41 from G. hirsutum, 19 from G. raimondii, and 14 from G. arboreum. The MAPK proteins were further examined to determine their physicochemical characteristics and other essential features.
To this end, characterization, phylogenetic relationship, chromosomal mapping, gene motif, cis-regulatory element, and subcellular localization analyses were carried out. Based on phylogenetic analysis, the MAPK family in cotton was categorized into clades A, B, C, D, and E. Seven GHMAPK genes (GH_A07G1527, GH_D02G1138, GH_D03G0121, GH_D03G1517, GH_D05G1003, GH_D11G0040, and GH_D12G2528) were selected, and their tissue-specific expression and their expression profiles under drought and salt stress were determined.
Conclusions: RNA sequencing and qPCR results showed that the genes were differentially expressed across both vegetative and reproductive plant parts. Similarly, the qPCR analysis showed that six genes were substantially upregulated under drought treatment, while all the genes were upregulated under salt treatment.

Background
Cotton (Gossypium spp.) has become increasingly important for plant research on polyploidization, phylogeny, cytogenetics, and genomics. It is regarded as one of the most important and most variable natural plants and has the highest commercial importance among crop plants (Kunbo et al., 2018). Cotton is mainly cultivated as a source of fiber, food, and feed. The genus Gossypium comprises over 50 species, of which the tetraploid Gossypium hirsutum L. is the most widely grown (C. Chen et al., 2018a). Gossypium hirsutum is a natural allotetraploid believed to have originated from hybridization between an A-genome species, possibly the African Gossypium herbaceum (A1) or the Asian cotton Gossypium arboreum (A2), and a D-genome species, possibly the American Gossypium raimondii. This tetraploid cotton accounts for around 90% of worldwide cotton production annually (Page et al., 2013).
Several biotic and abiotic factors, including drought, heat, waterlogging, and salinity, significantly impact cotton productivity, causing significant losses in the agricultural sector. Although breeding programs have made progress, conventional crop breeding techniques have limitations such as crossing barriers, long time requirements, and the transfer of genetic diseases. The cotton plant is prone to self-imposed stress due to its indeterminate growth habit; that is, it grows and expands until internal or external stresses impede growth and expansion (X. Zhang et al., 2014a).
High greenhouse gas emissions into the atmosphere and the associated air pollution are significant causes of heatwaves, floods, and drought stress. Drought stress can significantly impact crop production, and the magnitude and duration of the stress are also important factors. Availability of water is a critical factor in achieving long-term sustainability in crop production (Khan et al., 2018). Drought stress is a significant problem for cotton productivity because 50% of the global cotton supply comes from drought-challenged regions. Cotton crops require improved yields and yield stability in both standard and moisture-stressed environments (Tuteja, 2007). Drought stress influences the growth and productivity of cotton plants by inducing several morpho-physiological and biochemical changes. Physiological and metabolic processes such as photosynthesis, stomatal conductance, respiration, energy output, carbohydrate metabolism, and ultimately yield are impaired, even though cotton has various mechanisms to relieve and withstand water-deficit stress (Tian et al., 2019).
High salinity is among the most significant environmental stresses that plants experience. Roots are the first and most direct organs to detect the signal. From germination to boll formation, salt stress harms cotton physiology, and the tolerance mechanism is well described (Munns & Tester, 2008; Zhu, 2020). However, early growth responses to salinity may not be a reliable measure of salt tolerance across plant species. Screening for salt tolerance among different plants may use physiological parameters as stress tolerance indicators, while enzyme concentration could be used to assess salt tolerance in cotton. A few novel approaches, such as transcriptome profiling, methylation-sensitive amplified polymorphism analysis at the gene and cell levels, and genetic diversity assessment with various molecular markers, have revealed salt stress-induced epigenetic changes in cotton cultivars and their salt tolerance mechanisms (Dk et al., 2020).
Plants' adaptive responses to environmental changes triggered by external and internal influences primarily depend on their interpretation of external signals. Multiple signal transduction pathways are used to amplify these perceived signals. The MAPK cascade can be considered a standard signal regulation mechanism that transduces external stimuli into cells.
In the current research, 41 MAPK genes were identified in G. hirsutum, followed by 19 genes in G. raimondii and 14 genes in G. arboreum. The proteins encoded by the MAPK genes were studied further to determine their physicochemical characteristics, phylogenetic relationship, gene ontology, chromosomal mapping, and conserved motifs. Further experimental analysis of 7 selected GHMAPK genes was carried out for tissue-specific expression and for expression under drought and salt stresses to confirm their functions in cotton. The findings provide evidence that lays the groundwork for further research into the molecular and biological functions of MAPK in cotton.

Materials And Methods
Plant materials, growth condition, and stress treatments
TM-1, an upland cotton (G. hirsutum L.), was used to evaluate tissue/organ expression. Plants for vegetative tissues were grown at 25 °C under a 16 h light/8 h dark cycle in a controlled chamber, while plants for reproductive tissues were harvested from the field of the Cotton Research Institute, Chinese Academy of Agricultural Sciences, Anyang, Henan, China. Young leaves were collected at an early stage after planting; stems, true leaves, roots, and fibers were collected two weeks after planting from the growth chamber. At 10 days post-anthesis, flowers were harvested from the field. To determine GHMAPK gene function in cotton under abiotic stresses, the CCRI10 variety was used for salt stress, while the H177 variety was used for drought stress. CCRI10 and H177 seedlings were grown and harvested in a laboratory-controlled growth chamber at 25 °C with a 16/8 h light/dark cycle. Drought stress was induced in seedlings with 15% polyethylene glycol (PEG-6000), and salt stress was induced with 200 mM sodium chloride (NaCl) (Y. Li et al., 2013; Ma et al., 2020). Samples were collected at 0, 2, 4, 6, 12, 48, and 72 hours after treatment. Sampling for each treatment was carried out three times, and samples were immediately frozen in liquid nitrogen and stored at -80 °C.

Physicochemical properties analysis of MAPK genes in Cotton species
Protein sequences encoded by MAPK genes in Gossypium hirsutum (tetraploid cotton genome), Gossypium arboreum, and Gossypium raimondii were obtained from the online cotton database CottonFGD.
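As an illustration of one of the physicochemical properties reported in the Results, the grand average of hydropathy (GRAVY) can be computed as the mean Kyte-Doolittle hydropathy value over all residues of a protein. The snippet below is a minimal sketch under that definition; the text does not name the tool that was used, so this is not a reproduction of the original analysis, and the function names and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Characters that are not one of the 20 standard residues are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def is_hydrophilic(sequence: str) -> bool:
    """A negative GRAVY value is conventionally read as hydrophilic."""
    return gravy(sequence) < 0


# Hypothetical toy peptide, not a sequence from this study.
toy_peptide = "MKDETEYSRK"
print(round(gravy(toy_peptide), 3), is_hydrophilic(toy_peptide))
```

Under this convention a protein is read as hydrophilic when its GRAVY value is below zero and as hydrophobic when it is above zero.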
Sequence alignments, phylogenetic tree construction and collinearity
MAPK protein sequences of the three cotton species, G. hirsutum, G. arboreum, and G. raimondii, were downloaded from the online cotton database CottonFGD (http://www.cottonfgd.org/), and A. thaliana sequences were retrieved from Phytozome (https://phytozome.jgi.doe.gov/). The neighbor-joining (NJ) approach was used to investigate the evolutionary relationship. The software package MEGA 7.0 (www.megasoftware.net) was used to construct the phylogenetic tree with the Jones-Taylor-Thornton substitution model and 1000 replications. For gene collinearity, G. hirsutum protein sequences were used for BLAST searches against the G. raimondii and G. arboreum protein databases with an E-value threshold of <0.01 and a cutoff of ≥ 90 for significant hits. The gene IDs, GFF3 files, and linked files were used to construct the collinearity map using the TBtools software.
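The hit-filtering step of the collinearity search can be sketched as follows. This is a minimal illustration, assuming that the BLAST results are available in the standard 12-column tabular format and that the "≥ 90" cutoff refers to percent identity; the text does not state either point explicitly. The gene identifiers and alignment values in the example are hypothetical.

```python
import csv
import io

# Column order of standard BLAST tabular output (outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=0.01, min_identity=90.0):
    """Return (query, subject) pairs passing both cutoffs.

    max_evalue follows the E-value threshold given in the text;
    min_identity is an ASSUMED reading of the ">= 90" cutoff.
    """
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    kept = []
    for row in reader:
        if float(row["evalue"]) < max_evalue and float(row["pident"]) >= min_identity:
            kept.append((row["qseqid"], row["sseqid"]))
    return kept


# Hypothetical hits for illustration only.
example_rows = [
    ("query_gene_1", "subject_gene_1", "95.2", "380", "18", "0",
     "1", "380", "1", "380", "1e-150", "700"),
    ("query_gene_1", "subject_gene_2", "62.0", "120", "45", "1",
     "10", "129", "5", "124", "1e-20", "90"),
    ("query_gene_2", "subject_gene_3", "93.0", "40", "3", "0",
     "1", "40", "1", "40", "0.5", "25"),
]
example_hits = "\n".join("\t".join(fields) for fields in example_rows)

print(filter_hits(example_hits))
```

In this example only the first hit passes both cutoffs: the second fails on identity and the third on E-value.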
Motif identification, gene structural analysis, chromosomal mapping and promoter analysis of the cotton MAPK genes
MEME, an online tool, was used to determine the conserved motifs of the cotton MAPK genes. TBtools software was then used for motif visualization. The coding sequences (CDSs) were compared with the genomic sequences of the MAPK genes using an online gene structure tool (http://gsds.cbi.pku.edu.cn/). Chromosomal information was obtained by extracting the cotton GFF3 files from CottonFGD (http://www.cottonfgd.org/), and the gene IDs were then mapped using the TBtools software (Tamura et al., 2011). To examine the role of the regulatory regions of the GHMAPK genes in cotton, the 2000 bp sequence upstream of the start codon was retrieved and searched with the PlantCARE program (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Lescot et al., 2002).
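The promoter retrieval step can be sketched as a strand-aware coordinate calculation. The snippet below is a minimal illustration, assuming 1-based inclusive coordinates as used in GFF3 files and taking the region 2000 bp upstream of the start codon as described above; the function names and the short example sequence are hypothetical, and the actual extraction tool is not named in the text.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def promoter_interval(cds_start, cds_end, strand, length=2000, chrom_length=None):
    """Return 1-based inclusive (start, end) of the upstream region.

    On the '+' strand the region lies before cds_start; on the '-' strand
    the start codon sits at cds_end, so the region lies after cds_end.
    """
    if strand == "+":
        start = max(1, cds_start - length)  # clip at the chromosome start
        end = cds_start - 1
    elif strand == "-":
        start = cds_end + 1
        end = cds_end + length
        if chrom_length is not None:
            end = min(end, chrom_length)  # clip at the chromosome end
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end


def promoter_sequence(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Extract the upstream region, reverse-complemented for '-' strand genes."""
    start, end = promoter_interval(cds_start, cds_end, strand, length, len(chrom_seq))
    region = chrom_seq[start - 1:end]
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]
    return region


# Hypothetical coordinates for illustration.
print(promoter_interval(5001, 8000, "+"))
print(promoter_interval(5001, 8000, "-"))
```

Reverse-complementing the region for minus-strand genes keeps every promoter in the same 5' to 3' orientation relative to its gene, which is the orientation a motif search expects.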

Subcellular localization
For localization of the GHMAPK proteins, the protein sequences were downloaded from CottonFGD, and the prediction was carried out with the WoLF PSORT (https://wolfsort.hgc.jp) online server.
RNA isolation, cDNA synthesis, and qRT-PCR
RNA was isolated from the samples with an RNA extraction kit for polysaccharide- and polyphenolic-rich samples (Tiangen, China), following the kit protocol. The RNA was reverse-transcribed to cDNA using a PrimeScript™ RT reagent kit with gDNA Eraser, and qRT-PCR was then performed. Relative expression was calculated with the 2^(−ΔΔCT) method (Schmittgen & Livak, 2008). Each experiment was repeated three times, with three technical replicates.
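The 2^(−ΔΔCT) calculation can be sketched as below. This is a minimal illustration of the arithmetic only: the reference gene is not named in the text, so the "reference" values stand in for whichever internal control was used, and all CT values shown are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCT) method.

    Each argument is a list of CT values from technical replicates.
    dCT  = CT(target) - CT(reference), computed per condition
    ddCT = dCT(treated) - dCT(control)
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)


# Hypothetical CT values: three technical replicates per measurement.
fold_change = relative_expression(
    ct_target_treated=[22.0, 22.0, 22.0],
    ct_ref_treated=[18.0, 18.0, 18.0],
    ct_target_control=[24.0, 24.0, 24.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(fold_change)
```

Here the target amplifies two cycles earlier in the treated sample after normalization, so ΔΔCT = −2 and the fold change is 4; values above 1 indicate upregulation relative to the control.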
Expression Patterns of GHMAPK Genes in Different Tissues, Under Drought and Salt Stress, and Validation of RNA Sequencing Data
GHMAPK genes exhibit different expression across tissues and under stress treatments. RNA sequencing data for TM-1 were obtained from our lab. We analyzed the RNA sequencing data for tissue expression and for drought and salt stress. Samples for tissue expression were taken from cotyledon, leaf, stem, root, and fiber at 5 dpa, while samples for drought and salt stress were taken at 0, 2, 6, and 12 hours. PEG-6000 was used for drought induction, whereas sodium chloride (NaCl) solution was used as the salt treatment. The reads per kilobase per million mapped reads values were log-transformed, and the heatmap was constructed using the TBtools software package.
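The log transformation applied before heatmap construction can be sketched as follows. This is a minimal illustration, assuming a base-10 logarithm and a pseudocount of 1 so that zero expression values remain defined; the pseudocount is an assumption, since the text does not say how zeros were handled. The gene name and values are hypothetical.

```python
import math


def log10_transform(expression_by_gene, pseudocount=1.0):
    """Return log10(value + pseudocount) for every expression value.

    expression_by_gene maps a gene ID to a list of normalized
    expression values, one per sample or time point.
    """
    return {
        gene: [math.log10(value + pseudocount) for value in values]
        for gene, values in expression_by_gene.items()
    }


# Hypothetical expression values at four time points.
raw_values = {"hypothetical_gene": [0.0, 9.0, 99.0, 999.0]}
print(log10_transform(raw_values))
```

The transformation compresses the dynamic range so that a few highly expressed genes do not dominate the color scale of the heatmap.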

Results
Physicochemical traits of MAPK gene family
A total of 74 MAPK genes were detected in cotton: 41 in Gossypium hirsutum, 19 in Gossypium raimondii, and 14 in Gossypium arboreum. The proteins encoded by these MAPK genes were studied further to evaluate their physicochemical characteristics as well as other features. The encoded proteins ranged in length from 87 to 604 amino acids, with molecular masses from 9.988 to 71.503 kDa and isoelectric points (pI) from 5.672 to 10.23. All identified MAPK proteins had a grand average of hydropathy (GRAVY) < 0, implying that all MAPK proteins in cotton are hydrophilic.
The MAPK genes were grouped into clades A, B, C, D, and E based on the results of the phylogenetic tree (Fig 2.1). All the genes in clade A may carry the TDY phosphorylation site, clade B members the TEY motif, and clade C members the TDY motif; clade D members contain mostly the TDY motif with a few TEY motifs, and clade E members contain the TEY motif. There are 26 GhMAPKs, 9 GrMAPKs, 9 GaMAPKs, and 3 AtMAPKs that contain the TDY motif, while 15 GhMAPKs, 8 GrMAPKs, 6 GaMAPKs, and 5 AtMAPKs contain the TEY motif. This shows that the cotton MAPK genes carry more TDY motifs and fewer TEY motifs, while the AtMAPKs carry more TEY motifs and fewer TDY motifs. The cotton MAPK classification was consistent with previous findings (X. Zhang et al., 2014b). This indicates that the TEY motif may function more significantly in dicot plants than the TDY motif.
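The TEY/TDY classification above can be sketched as a simple pattern search. This is a deliberately simplified illustration: it reports the first T-E-Y or T-D-Y tripeptide found anywhere in the sequence, whereas a careful analysis would restrict the search to the kinase activation loop identified from an alignment. The function names and toy sequences are hypothetical and are not MAPK sequences from this study.

```python
import re
from collections import Counter

# T-x-Y activation motif where x is aspartate (D) or glutamate (E).
ACTIVATION_MOTIF = re.compile(r"T[DE]Y")


def classify_activation_motif(protein_seq):
    """Return 'TEY', 'TDY', or None for a protein sequence.

    Simplification: the first match anywhere in the sequence is used.
    """
    match = ACTIVATION_MOTIF.search(protein_seq.upper())
    return match.group(0) if match else None


def count_motifs(sequences):
    """Count motif classes over a dict of {protein_id: sequence}."""
    return Counter(classify_activation_motif(seq) for seq in sequences.values())


# Hypothetical toy sequences for illustration only.
toy_sequences = {
    "toy_protein_1": "MAAFTEYVVTRW",
    "toy_protein_2": "GGLTDYVATRWY",
    "toy_protein_3": "MKKLLA",
}
print(count_motifs(toy_sequences))
```

Tallying the classes per species in this way would yield counts of the kind reported above for the TDY and TEY groups.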
To identify collinear gene pairs, collinearity analysis was performed for the MAPK genes in the three cotton species with the circle gene viewer of the TBtools software package (C. Chen et al., 2018b). The analysis compared the physical maps of the genomes and subgenomes of Gossypium hirsutum, Gossypium raimondii, and Gossypium arboreum for associations among A vs. D, A vs. At, and D vs. Dt. Generally, most of the MAPK genes from tetraploid cotton showed high similarity to the D genome (G. raimondii) and the A genome (G. arboreum).
Gene structure was examined for the cotton MAPKs using the online tool at http://gsds.cbi.pku.edu.cn/. In G. hirsutum, 38 of the 41 genes were found to possess introns, and 3 were intronless. The most extended intron interruption was observed in GH_A11G0035 and GH_A12G2888 (Fig. 4). In G. arboreum, all the 19 genes possess introns, and the highest intron interruption was observed in Ga02G0944 (Fig. 4B). In G. raimondii, all genes also possess introns, and the highest intron interruption was found in Gorai.007G004400 and Gorai.008G249800, with 10 introns each (Fig 4C).

Chromosomal mapping of MAPK Genes in Gossypium species
The chromosome distribution was investigated in the three cotton species. Among the 41 G. hirsutum MAPK genes, 20 were located in the At subgenome and 21 in the Dt subgenome. With four genes each, chromosome A12 and its homologous D12 had the highest number of gene loci, followed by chromosome A05. Three genes were found in A01, A03, D03, and D05, D07, and D02 have two genes each, the remaining A02, A05, A07, A08, A09, A10, D04, D08, D09, and D10 have one gene each (Fig. 5A). In G. arboreum, the GaMAPK genes were mapped to A01, A02, A03, A05, A11, and A12. The highest number of gene loci was observed on chromosomes A03, A11, and A12, with three genes each (Fig. 5B).
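A per-chromosome tally of this kind can be sketched directly from the gene identifiers. The snippet below is a minimal illustration applied to the seven GHMAPK genes selected for expression analysis; it assumes that the identifiers follow the pattern GH_<subgenome><two-digit chromosome>G<number>, which is an assumption about the naming scheme rather than something stated in the text. The mapping reported above was derived from GFF3 annotation, not from identifiers.

```python
import re
from collections import Counter

# ASSUMED naming scheme: GH_ + subgenome (A or D) + chromosome + G + number.
GENE_ID = re.compile(r"^GH_([AD])(\d{2})G\d+$")


def chromosome_of(gene_id):
    """Return the chromosome label (for example 'A07') encoded in a gene ID."""
    match = GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"unrecognized gene ID: {gene_id}")
    return match.group(1) + match.group(2)


# The seven genes selected for expression analysis in this study.
selected_genes = [
    "GH_A07G1527", "GH_D02G1138", "GH_D03G0121", "GH_D03G1517",
    "GH_D05G1003", "GH_D11G0040", "GH_D12G2528",
]

per_chromosome = Counter(chromosome_of(g) for g in selected_genes)
per_subgenome = Counter(chromosome_of(g)[0] for g in selected_genes)
print(per_chromosome)
print(per_subgenome)
```

Read this way, six of the seven selected genes lie in the Dt subgenome and one in the At subgenome.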

Determination of cis-regulatory elements
Cis-regulatory elements are assumed to perform various functions based on their arrangement and location across promoters (Biłas 2016). We analyzed the promoter regions of the cotton MAPKs to determine their cis-elements. The 2 kb sequence upstream of the transcription start of each GHMAPK gene was used. The cis-elements identified in the GHMAPK promoter regions were categorized into five functions: hormone responsiveness, stress responsiveness, light responsiveness, cellular development, and binding site (Table S 2.2). For hormone responsiveness, the majority of GHMAPK genes contained ABRE (a cis-element involved in ABA responsiveness), the TGA element (auxin-responsive element), the ethylene-responsive element (ERE), the CA element (salicylic acid-responsive element), and the GATA-motif (cis-acting regulatory element involved in MeJA responsiveness). A few of the GHMAPK promoters have the GARE motif and P-box (gibberellin-responsive). Elements associated with stress include TC-rich repeats (defense and stress-responsive element), LTR (low temperature-responsive element), and WUN-motif (wound-responsive element). For light responsiveness, elements such as ATCT-motif, Box 4, GT1-motif, LAMP-element, TCT-motif, TCCC-motif, CHS-CMA1a, and TGACG-motif were found. Few cis-elements are involved in cellular development and binding sites. Several cis-regulatory elements related to enhancing tolerance to abiotic and biotic stresses in plants were found in the promoter sequences of the GHMAPK genes, indicating that these genes can be investigated as abiotic stress tolerance genes in cotton.

RNA Sequence Analysis and RT-qPCR confirmation of GHMAPK under drought and salt treatments
The RNA profile data of TM-1 were used; the raw values were log10-transformed, and a heat map was constructed. Seven GHMAPK genes were found to have different expression patterns across tissues, including young leaf, true leaf, cotyledon, stem, fiber, and root. Gene-specific qRT-PCR primers were designed (Table S23). Based on the RNA sequencing data as well as the qRT-PCR results, we determined that GH_A07G1527 and GH_D02G1138 are upregulated in all tissues, GH_D03G0121 shows upregulation in cotyledon, stem, and root, and GH_D03G1517 and GH_D05G1003 are upregulated in all tissues. Lastly, GH_D11G0040 shows no expression (Fig. 7A). The RNA sequencing and RT-qPCR expression results were correlated, with R² = 0.91 in drought, R² = 0.75 in salt, and R² = 0.66 in tissues.
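The agreement between RNA sequencing and RT-qPCR values can be quantified as sketched below. This is a minimal illustration, assuming that R² is the squared Pearson correlation coefficient between paired expression values; the text does not state how R² was computed, and the numbers in the example are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Squared Pearson correlation (coefficient of determination)."""
    return pearson_r(x, y) ** 2


# Hypothetical paired values: RNA-seq (log scale) vs. qPCR fold change.
rnaseq_values = [1.0, 2.0, 3.0, 4.0]
qpcr_values = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(rnaseq_values, qpcr_values), 2))
```

An R² close to 1 means the two platforms rank and scale expression changes similarly; lower values, as reported for tissues, indicate weaker agreement.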
To determine the roles of the GHMAPK genes under salt and drought stresses, the seven genes were analyzed in both the RNA sequencing data and the qRT-PCR results to detect their expression patterns after treatment. For drought treatment, GH_A07G1527 was upregulated at 6 hr and 12 hr, GH_D02G1138 was upregulated at 2 hr, 6 hr, and 12 hr, and GH_D03G0121 was upregulated at 12 hr. GH_D03G1517 and GH_D02G1138 were upregulated at 2 hr, 6 hr, and 12 hr, while GH_D11G0040 showed no expression. Lastly, GH_D12G2528 was upregulated at 12 hr (Fig. 7B). For salt treatment, GH_A07G1527 and GH_D02G1138 were upregulated at 2 hr, 6 hr, and 12 hr; GH_D03G0121 showed no expression in the RNA sequencing data and was upregulated at 12 hr post-treatment in qRT-PCR. GH_D03G1517 and GH_D02G1138 were upregulated at 2 hr, 6 hr, and 12 hr post-treatment. GH_D11G0040 showed no expression in the RNA sequencing data and was upregulated at 6 hr and 12 hr in qRT-PCR. Lastly, GH_D12G2528 was upregulated at 2 hr, 6 hr, and 12 hr (Fig. 7C).

Discussion
Plants are constantly exposed to different environmental stress conditions, for example, pathogen infection, salt, cold, drought, and oxidative stress. Such environmental stresses have adverse effects on plant development and productivity, resulting in significant losses of crop productivity (Tuteja, 2007).
In this study, 41 MAPK genes were found in G. hirsutum, 14 in G. arboreum, and 19 in G. raimondii. The three cotton species show almost identical physicochemical properties, with molecular weights ranging from 9.988 to 71.503 kDa and GRAVY values less than zero. A negative GRAVY value is a strong indication that a protein is hydrophilic. Previous research has confirmed that hydrophilic proteins are involved in tolerance to numerous abiotic stresses (Hanin et al., 2011). According to the phylogenetic analysis, cotton has more MAPK genes in clade A than in clade B, which agrees with previous reports in G. raimondii, Arabidopsis, rice, and poplar (Hamel et al., 2006; Ichimura et al., 2000).
To better understand the potential functions of GHMAPK in cotton under various environmental stresses, we examined the cis-element distribution in the promoter regions. The cis-regulatory elements observed serve various functions depending on their position, form, and orientation in the promoter. In this study, the cis-elements identified were classified into five groups (hormone responsiveness, light responsiveness, stress responsiveness, cellular development, and binding site). The prevalence of such elements across the promoter regions of these genes signifies their role in the growth and development of plants.
Both RNA sequencing and RT-qPCR validation analyses indicated that GHMAPK genes were upregulated under drought and salt stresses. Six genes were upregulated under drought treatment, whereas all seven genes were upregulated under salt treatment, and two were downregulated.
This is consistent with previous findings that various abiotic stresses upregulate GhMPK2 and GbMPK3 in cotton and possibly enhance drought and oxidative stress tolerance. The expression of most GhMAPKKKs has been found to increase in cotton when exposed to a water deficit, implying that these genes may be linked to cotton drought tolerance and response (Sadau et

Conclusion
Environmental challenges remain critical in crop production, even though significant efforts have been made to understand the genetic mechanisms underlying abiotic stress tolerance. Thus, water availability, temperature maintenance, and disease control are of paramount importance. Crops are generally exposed to multiple stresses, and hence the plant response to these stresses is the area that requires much more attention. This calls for an integrated approach to water-deficient lands, including mining critical genes to enhance stress tolerance through conventional breeding methods. This study was carried out to investigate MAPK genes from Gossypium hirsutum for enhancing tolerance to drought and salt stresses, based on a gene family analysis of Gossypium hirsutum, Gossypium arboreum, and Gossypium raimondii. Analysis of the cis-regulatory elements of the GHMAPK genes suggests that these genes may significantly affect abiotic stress tolerance. Analysis of RNA sequencing and RT-qPCR data revealed the upregulation of several genes across both vegetative and reproductive tissues. These genes are proposed as candidate genes for tolerance to drought and salt stresses in cotton because of their upregulation after treatment in the present study. This research lays the groundwork for further study of these genes to build more robust cotton genotypes that perform better under different environmental stresses, such as drought, cold, and salt stress.

Analysis of gene structure using Gene structure display server: (A) Gossypium hirsutum, (B) Gossypium raimondii, (C) Gossypium arboreum.