Genome-wide identification and characterization of glutathione S-transferase gene family in Musa acuminata L. AAA group and gaining an insight to their role in banana fruit development

Glutathione S-transferases (GSTs) are a multifunctional protein superfamily involved in diverse plant functions such as defense mechanisms, signaling, stress response, secondary metabolism, and plant growth and development. Although the banana whole-genome sequence is available, the distribution of GST genes on banana chromosomes, their subcellular localization, gene structure, evolutionary relationships, conserved motifs, and roles in banana are still unknown. A total of 62 full-length GST genes with the canonical thioredoxin fold were identified, belonging to nine GST classes, namely tau, phi, theta, zeta, lambda, DHAR, EF1G, GHR, and TCHQD. The 62 GST genes were distributed across 11 banana chromosomes. The MaGSTs were predominantly localized in the cytoplasm. Gene architecture showed conservation of exon numbers within individual GST classes. Multiple Em for Motif Elicitation (MEME) analyses revealed few class-specific motifs, and many motifs were found in all the GST classes. Multiple sequence alignment of banana GST amino acid sequences with rice, Arabidopsis, and soybean sequences revealed Ser and Cys as the conserved catalytic residues. Gene duplication analyses showed tandem duplication to be a driving force for GST gene family expansion in banana. Cis-regulatory element analysis showed the dominance of light-responsive elements, followed by stress- and hormone-responsive elements. Expression profiling was also performed using RNA-seq data. MaGSTs were observed to be involved in various stages of fruit development, and MaGSTU1 was highly upregulated. This comprehensive and organized study of the MaGST gene family provides groundwork for further functional analysis of MaGST genes in banana at the molecular level and for plant breeding approaches.


Introduction
Musa acuminata (L.), popularly known as "banana," belongs to the Musaceae family, which is distributed in hot, tropical, and subtropical regions worldwide. The presence of several vitamins (vitamin A; vitamins B1, B2, B3, and B6; and vitamin C) and minerals (especially potassium), together with a good quantity of starch and fiber, enhances the nutritional value of a banana. Owing to its antioxidant properties and its consumption, it could be a fascinating challenge for plant breeders to enhance its fruit quality and to develop crops more resistant to diverse biotic and abiotic stresses through transgenic approaches. The availability of the sequenced genome of M. acuminata (DH Pahang, AAA group) (D'Hont et al. 2012) provides an opportunity for researchers to search for and characterize diverse gene families that are functionally important. In recent studies, many gene families, such as the DCL, AGO, and RDR gene families (Ahmed et al. 2021); calcineurin B-like (CBL) genes (Xiong et al. 2021); the cellulose synthase-like (Csl) gene family (Yuan et al. 2021); the TCP gene family; and the aquaporin gene family (Hu et al. 2015a, b), have been identified and well characterized in M. acuminata.
Glutathione S-transferases (GSTs) are an inbuilt antioxidant enzymatic defense system of plants that works downstream of Cyt P450. The GST enzyme superfamily is primarily engaged in scavenging diverse chemical compounds present in the soil or in the environment by conjugating glutathione (GSH), a natural non-enzymatic antioxidant defense system in plants (Song et al. 2021), to a hydrophobic substrate, making the compound more hydrophilic so that it can be expelled from the cell through a vacuole (Basantani and Srivastava 2007). GSTs are key enzymes in plant growth and development, secondary metabolism, anthocyanin accumulation (Shao et al. 2021), signal transduction pathways (Nianiou-Obeidat et al. 2017), tetrapyrrole metabolism and retrograde signaling (Sylvestre-Gonon et al. 2020), and detoxification of reactive carbonyl species (RCS) (Mano et al. 2019), and are responsive to various biotic and abiotic stresses. GSTs are characterized by the presence of a canonical thioredoxin fold in the highly conserved N-terminal domain, which consists predominantly of α-helices and β-strands with a β1α1β2α2β3β4α3 topology and possesses a G-site for glutathione binding. The variable C-terminal domain consists entirely of α-helices and possesses an H-site for binding the secondary hydrophobic substrate (Vaish et al. 2020).
Genome-wide analyses have identified 39 GSTs in Cucumis melo var. saccharinus (Song et al. 2021), 92 GSTs in Medicago truncatula (Hasan et al. 2021), 32 GSTs in Cucurbita maxima, 330 GSTs in Triticum aestivum (Wang et al. 2019a, b), 82 GSTs in Raphanus sativus (Gao et al. 2020), 51 GSTs in chickpea (Ghangal et al. 2020), and 31 GSTs in Vigna radiata (Vaish et al. 2018). Although GSTs are functionally pivotal in plant growth and development and are responsive to diverse biotic and abiotic stresses, there is no information on the genome-wide identification and characterization of the GST gene family in M. acuminata. The publicly available whole-genome sequence of banana enables a genome-wide analysis of the GST gene family using integrated bioinformatics tools, which is a cost-effective, time- and labor-saving approach. The present study reports the banana GSTs' physicochemical characteristics, subcellular localization, chromosomal localization, gene duplication events, gene structure, predicted protein secondary structure, phylogenetic relationship with members of other taxa, abundance of cis-regulatory elements, 3D structure modeling, and expression levels during fruit development. This information about banana GSTs could be of potential importance for banana breeding programs in the future.

Mining of banana GSTs from Banana Genome Hub
The well-characterized GST protein sequences of Arabidopsis thaliana, Glycine max, and Oryza sativa were retrieved from The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org/) and the Rice Genome Annotation Project (RGAP) by their accession numbers, respectively. A pBLAST search against the genome of M. acuminata DH Pahang v4 (updated in September 2021) was performed on the Banana Genome Hub (https://banana-genome-hub.southgreen.fr/download) database (Droc et al. 2013), with an e value of 0.001. The identified sequences were subjected to the NCBI Batch Conserved Domain (CD) search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) (Marchler-Bauer et al. 2017), the SMART (Simple Modular Architecture Research Tool) database (http://smart.embl-heidelberg.de/) (Letunic and Bork 2018), and the Pfam search (http://pfam.xfam.org/search) online tool to find the key features of GSTs, i.e., the presence of a conserved N-terminal domain with the thioredoxin fold and a C-terminal domain. The full-length sequences containing both conserved domains were selected for further analysis and characterization.
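The two-step screening described above (an e-value cutoff on the BLAST hits, then a check for both conserved domains) can be sketched as follows. This is only an illustration under stated assumptions: the `BlastHit` record, the domain labels `GST_N` and `GST_C`, and the gene IDs are hypothetical, and treating the stated e value of 0.001 as an inclusive upper bound is an assumption; real BLAST, CD-search, SMART, and Pfam outputs use their own formats and names.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-3  # e value stated in the text; "<=" is an assumption

# Hypothetical domain labels; adapt to the names used by the real tool output.
N_DOMAIN = "GST_N"
C_DOMAIN = "GST_C"


@dataclass(frozen=True)
class BlastHit:
    """Minimal stand-in for one row of a tabular BLAST result."""
    subject_id: str
    e_value: float


def candidate_ids(hits):
    """Return subject IDs that have at least one hit at or below the cutoff."""
    return {h.subject_id for h in hits if h.e_value <= E_VALUE_CUTOFF}


def full_length_ids(candidates, domains_by_id):
    """Keep only candidates annotated with both N- and C-terminal domains."""
    required = {N_DOMAIN, C_DOMAIN}
    return sorted(
        sid for sid in candidates
        if required <= set(domains_by_id.get(sid, ()))
    )


# Toy data (made up for illustration only).
hits = [
    BlastHit("geneA", 1e-40),
    BlastHit("geneB", 5e-4),
    BlastHit("geneC", 0.5),    # fails the e-value cutoff
    BlastHit("geneA", 1e-10),  # duplicate subject, counted once
]
domains = {"geneA": ["GST_N", "GST_C"], "geneB": ["GST_N"]}

selected = full_length_ids(candidate_ids(hits), domains)
```

In this toy input, `geneB` passes the e-value filter but carries only the N-terminal domain, so it would be set aside as a partial sequence rather than a full-length GST.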

Chromosomal localization, evolutionary, and gene duplication analyses of MaGST genes
The genomic locations of the identified MaGST genes were retrieved from genomic data. The locations of these genes were depicted diagrammatically on their respective chromosomes using the TBtools software v0.667 (https://github.com/CJ-Chen/TBtools). The phylogenetic analysis was carried out using the GST amino acid sequences of M. acuminata, A. thaliana, O. sativa, G. max, Physcomitrella patens (a bryophyte), and Larix kaempferi (a gymnosperm). The amino acid sequences of P. patens and L. kaempferi were downloaded from the NCBI database using their accession numbers. The sequences were aligned using Clustal Omega, and the tree was constructed by the neighbor-joining (NJ) method using MEGA X software (Kumar et al. 2018). To assess the reliability of the constructed tree, bootstrap analysis was performed with 1000 replicates. Gene duplication events were analyzed by a pBLAST search of the MaGSTs against each other on NCBI pBLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins). MaGSTs exhibiting sequence similarity > 80% were assumed to be duplicated genes (Kong et al. 2013). A pair of homologous genes within a 100-kb region on the same chromosome was considered tandem duplicated (TD), while pairs located beyond the 100-kb region or on different chromosomes were designated as segmental duplicated (SD) genes (Holub 2001). The synonymous rate (dS), non-synonymous rate (dN), and evolutionary constraint (dN/dS) between the duplicated MaGST gene pairs were estimated using the PAL2NAL online tool (https://bio.tools/pal2nal) (Suyama et al. 2006) from their protein sequence alignment, performed on Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), and their respective mRNA sequences. The mode of selection between duplicated genes was identified through the dN/dS ratio. Values > 1, = 1, and < 1 were considered as positive, neutral, and purifying selection, respectively.
The divergence time (T; million years ago (Mya)) of each duplicated gene pair was calculated using the following formula: T = dS/(2λ), where T is the divergence time, dS is the number of synonymous substitutions per site, and λ is the fixed rate of 6.5 × 10⁻⁹ synonymous substitutions per site per year for monocotyledonous plants (Koch et al. 2000).
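The duplication rules and the divergence-time formula in this section can be illustrated with a short sketch. It is not the authors' pipeline: the function names are hypothetical, measuring the 100-kb distance between gene start coordinates is an assumption (the text does not say which coordinates were compared), and the tolerance used to call dN/dS "equal to 1" is likewise an assumption.

```python
LAMBDA = 6.5e-9             # synonymous substitutions per site per year (from the text)
TANDEM_WINDOW_BP = 100_000  # 100-kb window for tandem duplicates (from the text)


def duplication_type(chrom_a, start_a, chrom_b, start_b):
    """Return 'TD' for a pair on the same chromosome within 100 kb, else 'SD'.

    Assumption: distance is measured between gene start coordinates.
    """
    if chrom_a == chrom_b and abs(start_a - start_b) <= TANDEM_WINDOW_BP:
        return "TD"
    return "SD"


def selection_mode(dn, ds, tol=1e-9):
    """Classify selection from dN/dS: >1 positive, =1 neutral, <1 purifying."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the dN/dS ratio")
    ratio = dn / ds
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


def divergence_time_mya(ds, rate=LAMBDA):
    """T = dS / (2 * lambda), converted from years to million years (Mya)."""
    years = ds / (2.0 * rate)
    return years / 1e6


# Worked example with made-up values: dS = 0.13 gives
# 0.13 / (2 * 6.5e-9) = 1.0e7 years, i.e. about 10 Mya.
example_time = divergence_time_mya(0.13)
```

Because T scales linearly with dS, a pair with twice the synonymous divergence is dated twice as old under this fixed-rate assumption.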

Conserved motif and gene structure analyses
The amino acid sequences of Arabidopsis, rice, and Musa acuminata GSTs were used for conserved motif analysis. The conserved motifs of MaGSTs were identified using the Multiple Em for Motif Elicitation (MEME) program (http://meme-suite.org/) (Bailey et al. 2009). The parameters used for the analysis were 15 as the motif number and 6-50 as the motif width. The results were visualized with TBtools. The exon/intron organization of MaGSTs was analyzed with the online tool Gene Structure Display Server (GSDS) 2.0 (https://gsds.cbi.pku.edu.cn/) (Hu et al. 2015a, b) using their corresponding CDS sequences and genomic sequences, retrieved from the Banana Genome Hub (https://banana-genome-hub.southgreen.fr/download).

Protein sequence alignments of MaGST protein and prediction of catalytic residue position
The MaGST protein sequences were aligned with the protein sequences of A. thaliana, O. sativa, and G. max using Clustal Omega (Sievers et al. 2011). The protein alignments were then visualized through ESPript 3.0 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi) (Robert and Gouet 2014). The signature sequences and the conserved catalytic residues of MaGSTs were highlighted on the alignments.

Promoter analysis for cis-acting regulatory elements in MaGST genes
Two-kilobase-pair promoter regions upstream of the translation start codon (ATG) on MaGST genomic DNA sequences were extracted from JBrowse of M. acuminata DH Pahang v.4 using their locus IDs to analyze cis-acting regulatory elements. The extracted promoter sequences were analyzed online with the PlantCARE software (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Lescot et al. 2002) to identify diverse hormone-responsive, stress-responsive, and cellular development-responsive elements.
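Extracting a 2-kb region upstream of the ATG depends on the strand of the gene, which is easy to get wrong when done by hand. The following is a minimal sketch of the coordinate logic only; the function names and the coordinate convention (0-based forward-strand index of the base that corresponds to the A of the ATG) are assumptions for illustration, and the study itself obtained the sequences through JBrowse.

```python
PROMOTER_LEN = 2000  # 2-kb upstream region, as described in the text

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, atg_pos, strand, length=PROMOTER_LEN):
    """Return up to `length` bases upstream of the start codon, in gene orientation.

    Assumed convention: `atg_pos` is the 0-based forward-strand index of the
    base corresponding to the 'A' of ATG. For '-' strand genes this is the
    highest coordinate of the codon (forward strand reads 'CAT').
    The region is truncated if it runs past a chromosome end.
    """
    if strand == "+":
        begin = max(0, atg_pos - length)
        return chrom_seq[begin:atg_pos]
    if strand == "-":
        begin = atg_pos + 1
        return reverse_complement(chrom_seq[begin:begin + length])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosomes (made up): a '+' gene with ATG at index 5, and a
# '-' gene whose start codon appears as CAT at indices 3..5.
plus_example = upstream_region("AACCGATGTTT", 5, "+", length=3)
minus_example = upstream_region("TTTCATGGACC", 5, "-", length=3)
```

For the minus-strand toy case, reverse-complementing the whole sequence gives `GGTCCATGAAA`, so the three bases preceding the ATG are `TCC`, which is what the function returns.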

Expression profiling of MaGSTs using RNA-seq data
To explore the basal expression patterns of the MaGST genes during fruit development, RNA-seq data for 45 genes of M. acuminata DH Pahang (AAA group) were retrieved from Expression Atlas (https://www.ebi.ac.uk/gxa/experiments). The GST gene expression data from endocarp tissue at 0, 20, and 80 days after flowering were analyzed to elucidate the importance of MaGST genes in fruit development. The heatmap was drawn with the TBtools software based on the transcripts per million (TPM) values of 45 MaGST genes (Chen et al. 2018).

M. acuminata genome has 62 GST genes that belong to nine canonical GST classes
The pBLAST search of the M. acuminata DH Pahang v.4 genome at the Banana Genome Hub yielded a total of 62 GST genes. The identified GST protein sequences were validated for the presence of conserved N- and C-terminal domains through NCBI CD search, Pfam search, and SMART database search. Out of the 62 GSTs, 9 MaGST proteins possessed only the N-terminal domain and one contained only the C-terminal domain. Fifty-one full-length banana GST genes containing both domains were found. The identified MaGSTs belong to nine established classes, namely tau, phi, theta, zeta, lambda, DHAR, EF1G, GHR, and TCHQD. Elongation factor 1-gamma possesses an additional EF1G domain along with the canonical N- and C-terminal domains (Fig. 1). The plant-specific tau and phi GSTs were the most numerous in the banana genome, with 29 and 12 members, respectively (Table 1). The banana GST proteins were named MaGSTs, taking the prefix Ma from M. acuminata, following the nomenclature proposed by Dixon and Edwards (2010) for A. thaliana. The classes of MaGSTs were named MaGSTU, MaGSTF, MaGSTT, MaGSTZ, MaGSTL, MaDHAR, MaGHR, MaEF1G, and MaTCHQD. The members of each class were numbered in ascending order of their chromosomal location. All 62 MaGST genes were further subjected to bioinformatics characterization.

Sixty-two MaGST genes are clustered on 11 banana chromosomes and evolutionarily conserved, and tandem duplication was the driving force for MaGST gene family expansion
Based on the M. acuminata DH Pahang v.4 annotation, the 62 MaGST genes were assigned to the eleven chromosomes, with counts per chromosome ranging from 2 (Chr7) to 13 (Chr9). Chromosome 11 contains the highest number of MaGST genes (thirteen), followed by Chr1 and Chr5 with nine MaGST genes each. The chromosomal distribution of GSTs also showed a noticeable pattern: the genes were located mainly at the proximal or distal ends of the banana chromosomes, as depicted in Fig. 2. A phylogenetic tree was constructed using the GST protein sequences of M. acuminata, Arabidopsis, rice, soybean (angiosperms), P. patens (a bryophyte), and L. kaempferi (a gymnosperm). The different GST classes branched out into individual clades, with the members of each class clustering together. The two major clades were those of the plant-specific tau and phi GSTs, which formed two superclades containing smaller clades (Fig. 3). Members of an individual GST class from plants belonging to separate subgroups of the plant kingdom clustered together, indicating their divergent evolution from a common ancestor. The outcome suggested that plant GSTs might have evolved before the division of plants into individual groups such as bryophytes, pteridophytes, gymnosperms, and angiosperms, and that each GST class diverged prior to the monocot-dicot split. Additionally, the number of GSTs in each class expanded independently in a species-specific manner, irrespective of genome size. Moreover, the gene pairs arising from tandem and segmental duplications were located close together in the phylogenetic tree, showing their close relatedness. To elucidate the gene family expansion in MaGSTs, the duplication mechanism was analyzed. A total of 26 duplication events were noticed in banana GST gene family expansion and evolution.
Tandem duplication was found to play a major role, as 20 gene pairs were involved in tandem duplication events, creating 20 gene clusters on Chr1, Chr5, Chr8, Chr9, and Chr10. Among them, tau class genes were the most numerous with 18 duplicated gene pairs, followed by the MaGSTZ2/MaGSTZ3 and MaDHAR3/MaDHAR4 gene pairs, which were also tandem duplicated. Six gene pairs, MaGSTU9/MaGSTU15, MaGSTU9/MaGSTU16, MaGSTU12/MaGSTU14, MaGSTF1/MaGSTF10, MaGSTL2/MaGSTL4, and MaTCHQD1/MaTCHQD2, were segmental duplicated. The duplication events occurred mainly on banana Chr1, Chr5, Chr8, Chr9, and Chr10. Moreover, the dN/dS values of the duplicated genes were calculated and found to be less than 1, which is indicative of purifying selection. As an exception, the value of dN/dS for the gene pair MaGSTU22/MaGSTU20 was 1, showing positive selection. Finally, the divergence time was also calculated for these duplicated genes. The estimated divergence time of these gene pairs ranged from approximately 1.87 to 145.41 Mya (Table 2).

MaGST proteins are highly stable, hydrophilic, and found majorly in the cytoplasm
The protein length of MaGSTs ranged from 110 (MaGSTU22) to 619 (MaEF1G3) amino acids, with corresponding molecular weights of 12.54 kDa and 69.69 kDa, respectively. The isoelectric point (pI) ranged from 4.84 (MaGSTU19) to 9.32 (MaGSTF3). Among the 62 MaGSTs, 14 were basic and 48 were acidic in nature. The grand average of hydropathicity (GRAVY) values of most MaGST proteins across all classes were negative, indicating that most MaGST proteins were hydrophilic and interact well with water. The aliphatic index (AI) of MaGSTs ranged from 58.71 (MaGSTF11) to 108.54 (MaGSTU15). Most of the MaGSTs have an AI value below 100, and hence, they are hydrophilic in nature. The subcellular localization prediction showed that MaGSTs were predominantly localized in the cytoplasm, followed by the chloroplast, nucleus, mitochondria, plasma membrane, and extracellular space (Table 1, Fig. 4).
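For readers unfamiliar with these indices, the sketch below shows how GRAVY and the aliphatic index are conventionally computed from a protein sequence. It assumes the standard Kyte-Doolittle hydropathy scale and Ikai's aliphatic index formula; this section does not state which tool produced the reported values, so the code is an illustration of the definitions rather than a reproduction of the analysis, and the example sequences are made up.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai's aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def is_hydrophilic(seq):
    """A negative GRAVY value is read as overall hydrophilic character."""
    return gravy(seq) < 0


# Toy peptides (not MaGST sequences).
gravy_rk = gravy("RK")              # (-4.5 + -3.9) / 2
ai_avil = aliphatic_index("AVIL")   # 25 + 2.9*25 + 3.9*50
```

Note that the aliphatic index counts only Ala, Val, Ile, and Leu, so it reflects the relative volume of aliphatic side chains rather than hydropathy as such.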

MaGST proteins are characterized by the presence of many class-specific motifs, and gene architecture among tau, phi, zeta, lambda, and DHAR classes is highly conserved
The conserved motif analysis identified many class-specific motifs, and few motifs were found to be distributed among all the GST classes. The tau class MaGSTs possessed the highest number of motifs (eight), i.e., motifs 1, 2, 3, 4, 5, 7, 8, and 13, whereas MaDHAR possessed the fewest, i.e., motifs 4 and 6. Motif 4 was found across all the MaGSTs except the MaEF1G members. Similarly, motif 1 was present in all the MaGSTs except the MaDHAR and MaGHR members. Motif 11 was found only among MaGSTF members. Motifs 14 and 15 were found in MaGSTLs. Motif 12 was found only in the MaGSTZ and MaEF1G classes. Motif 10 was found in MaGSTF, MaGSTZ, and MaEF1G. Motif 6 was found in all the MaGST classes except tau and lambda. Motif 3 was found in MaGSTU, MaGSTT, and MaGSTZ members (Fig. 5).
MaGST genes possess two to ten exons. All the MaGSTU genes possess two exons/one intron except MaGSTU6/MaGSTU8/MaGSTU15/MaGSTU20/MaGSTU24/MaGSTU25. Except for MaGSTF1/MaGSTF4/MaGSTF5, all the phi members contain three exons/two introns. All the zeta genes possessed nine exons, and the DHAR members possessed six exons except MaDHAR2, which has five exons. MaGSTT1 and MaGSTT2 have nine and seven exons, respectively. In the lambda class, MaGSTL1/MaGSTL2/MaGSTL3 possessed nine exons whereas MaGSTL4 possessed eight exons. The two MaGHR genes possessed two and seven exons, respectively, whereas the three MaEF1G genes contained five, six, and nine exons. The numbers of exons were thus highly variable in the MaEF1G and MaGHR classes. Both MaTCHQD genes possessed two exons (Fig. 6). The conservation in the number of exons can be correlated with the expansion of the MaGST gene family.

Ser and Cys catalytic residues are highly conserved in MaGST classes
Multiple sequence alignment was performed by taking the GST protein sequences of M. acuminata, A. thaliana, G. max, and O. sativa to identify the conserved residues and catalytic residues among different GST classes. The position of catalytic residues and their signature sequences have been depicted in Fig. 7. The position of active site residue varied among the classes. Tau and phi GSTs possessed Ser active site residue at positions 17 and 12, respectively, whereas theta and zeta hold active site Ser residue at positions 13 and 41. Lambda and DHAR contained active site Cys residue at position 20. The GHR also possessed Cys active site residue at position 46.

MaGSTs are predominantly composed of α-helices
The secondary structure of plant GSTs is characterized by the dominance of the alpha helix, followed by coil, beta-strand, and beta-turn. The percentage of alpha helix was the highest in all the MaGSTs, especially the tau proteins, except MaGSTF4/MaGSTF11/MaGSTZ4/MaGSTL1 and MaDHAR2, in which the percentage of coil was the highest. MaGSTU11 contained 61.92% alpha helix, the highest among all the MaGSTs. Both MaGHRs also possessed a higher percentage of coil than alpha helix (Fig. 8; Table S1).

Phosphorylation, as a major post-translational modification in MaGSTs
Post-translational modification plays an important role in the structural modification and functioning of proteins. In the post-translational modification prediction for banana GSTs, 18 out of 62 MaGSTs possessed glycosylation sites (Table S2). The phosphorylation prediction revealed that Thr and Ser residues are highly phosphorylated, accounting for 48.54% and 47.43%, respectively, followed by Tyr at 27.86% (Fig. 9; Table S3).

Five types of functional cis-regulatory elements are present in the promoter region of MaGST genes
Cis-acting regulatory elements (CAREs) are found in the promoter region of target genes. They are short motifs of non-coding DNA, 5-20 bp in length, that bind transcription factors and regulate gene transcription. In addition to four core elements (AT~TATA box, CAAT box, TATA box, TATA), the current study identified 36 cis-elements, categorized on the basis of their importance in plant physiology as light-responsive elements, hormone-responsive elements, stress-responsive elements, cellular development-related elements, and other elements. The light-responsive elements were the most numerous, followed by stress- and hormone-responsive elements (Fig. 10). MaGSTU25 possessed the highest number of cis-elements (57), whereas MaGSTU7 possessed only 6 cis-elements in its promoter region. The MYB and MYC cis-elements were the most abundant, numbering 140 and 178, respectively. Abscisic acid-responsive elements were the next most abundant, numbering 127. The high number of stress-responsive elements (MYB and MYC) in the promoter region can be associated with their role in combating diverse stresses in banana plants by upregulating the MaGST transcripts and total enzyme activity.

RNA expression profiling of MaGSTs showed most of the MaGST genes get upregulated during fruit development stage
To predict the role of MaGSTs in fruit development, the expression levels of 54 MaGST genes expressed in endocarp tissue were analyzed at 0, 20, and 80 days after flowering (DAF) based on RNA-seq data. Among the banana GST tau members, MaGSTU1 was highly upregulated initially, downregulated at 20 DAF, and upregulated again at 80 DAF; it could therefore be involved in fruit ripening, a phase during which the activity of the ethylene signal transduction pathway is also significantly increased (Dhar et al. 2019). Likewise, the expression of MaGSTU12, MaGSTU14, and MaGSTU26 was also upregulated at 80 DAF, and they might be involved in fruit ripening. MaGSTU15, MaGSTU24, MaGSTU25, and MaGSTU29 were found [...] fruit development. MaGSTF2 and MaGSTF12 were found to be downregulated from 0 to 80 DAF, whereas the MaGSTF3, MaGSTF4, MaGSTF6, MaGSTF7, and MaGSTF8 genes were upregulated at 20 DAF. MaGSTF5 was upregulated at 80 DAF during fruit development. The expression levels of MaGSTZ genes were variable; i.e., MaGSTZ2 was upregulated from 0 to 20 DAF and then downregulated at 80 DAF, whereas MaGSTZ3 showed a low level of expression from 0 to 80 DAF. MaGSTZ4 showed increased expression at 0 DAF and was then downregulated from 20 to 80 DAF. MaGSTT2 showed a low level of expression in all the fruit developmental stages, whereas MaGSTT1 was found to be upregulated at 0 DAF and 20 DAF and highly expressed at 80 DAF. MaGSTL1 showed a low expression level, whereas MaGSTL3 and MaGSTL4 were highly expressed. All the MaGSTL, MaGHR, and MaEF1G3 genes showed high levels of expression at 20 DAF. The expression levels of MaDHAR2 and MaDHAR3 were high at 0 DAF and low at 80 DAF, whereas MaDHAR4 was highly expressed at 20 DAF (Figs. 11 and 12).

Genome-wide MaGST gene identification analysis and their physicochemical characterization
The GST gene family is well characterized and is involved in diverse plant activities. It is a functionally versatile protein family involved in plant growth and development, signal transduction pathways, retrograde signaling, biotic and abiotic stress management, tetrapyrrole signaling, hormone signaling, etc. To date, genome-wide identification and characterization of the GST gene family has been performed on a variety of plants, such as Hami melon (Song et al. 2021), melon, radish (Gao et al. 2020), and Medicago (Hasan et al. 2021). The comprehensive genome-wide identification of a gene family is more precise and significant when whole-genome information is used. In the current study, a genome-wide search of the GST gene family in banana led to the identification of 62 [...] (Lan et al. 2009), Capsicum (Islam et al. 2019), and Paeonia suffruticosa (Han et al. 2022). As in rice, barley, sweet potato, tomato, mung bean, soybean, and various other crops, the tau and phi GST genes were the most numerous, accounting for 29 and 12, respectively. The high number of tau and phi GST genes reflects their functional importance in plant growth and development.
Fig. 6 Gene structure analyses of banana MaGSTs. An exon-intron structure analysis was done using the GSDS tool
Tau and phi GSTs are also associated with the plant response to various abiotic and biotic stresses. The pI values of the GSTs ranged from 4.84 to 9.32. Among the 62 MaGST proteins, 48 were acidic, having a pI value below 7, and 14 were basic, having a pI value above 7. Mohanta et al. (2019) reported that plant proteomes from 145 species showed a pI range of 1.99 (epsin) to 13.96 (hypothetical protein) and that the molecular mass of plant proteins varied from 0.54 to 2236.8 kDa. The molecular weight and isoelectric point of a plant protein play a significant role in its biochemical functioning, and hence, it is important to know these physicochemical features in detail. The GRAVY values of MaGSTs were highly negative, indicating that the MaGST proteins were hydrophilic (González-Faune et al. 2021). A higher aliphatic index is related to the thermal stability of proteins owing to the occurrence of aliphatic amino acids. These results indicate the higher thermostability of the abovementioned MaGSTs (Hasan et al. 2021). The subcellular localization prediction of the 62 MaGSTs showed the dominance of these proteins in the cytoplasm, which indicates that they are soluble proteins, followed by the chloroplast, the nucleus, and mitochondria. The subcellular localization of a protein is directly related to its involvement in biological processes in a cell. Hence, it is important to localize a protein to better understand its role at the cellular level (Glory and Murphy 2007). In one report, which combined a C-terminal GFP fusion technique with confocal microscopy to visualize the fusions in Nicotiana benthamiana, the subcellular localization of 16 out of 21 P. patens GST (PpGST) proteins was confirmed to be cytosolic and nuclear (Liu et al. 2013).
Likewise, GSTs were also reported to be found in other cellular compartments such as mitochondria, chloroplast, and endoplasmic reticulum and also in the plasma membrane (Lallement et al. 2014a). Under oxidative stress or for ROS and xenobiotic detoxification, GST uses GSH to convert it into glutathione disulfide (GSSG) and this GSSG is renewed into GSH by glutathione reductase (Hasanuzzaman et al. 2017). In this way, GSTs maintain GSH:GSSG ratios in mitochondria because a high concentration of glutathione has been observed in this cellular compartment (Zechmann et al.

Evolutionary analysis of MaGST genes
Gene family expansion depends mainly upon gene duplication events, transposition, or splicing. Segmental and tandem duplications are the two major duplication types. In the current study, tandem duplication was the driving force for MaGST gene family expansion; among the duplicated genes, MaGST tau genes were the most frequently duplicated, and this class contributed the most to MaGST gene family expansion. This might be due to the high number of tau genes and their major role in the detoxification of reactive oxygen species, providing tolerance to plants against different stress stimuli. It can also be inferred that the tau GSTs of banana arose earlier than the other MaGST gene members. Twenty-eight gene pairs in melon, 9 gene pairs in Hami melon, and 11 gene pairs in apple were also found to have a tandem duplication pattern. According to Flagel et al. (2009), polyploidy can be a leading provider of duplicate genes by means of diverse duplication mechanisms and a source of evolutionary uniqueness in plant genomes. The ratio of non-synonymous and synonymous substitutions (Ka/Ks) for duplicated MaGST genes was found to be less than 1, signifying that the duplicated genes were under purifying selection pressure, which indicates that the removal of deleterious duplications increases the probability of fixation of novel duplicated genes (Tanaka et al. 2009). Interestingly, all the duplicated MaGST genes were primarily under strong purifying selection, like the GST genes of Gossypium species (Dong et al. 2016) and tomato (Islam et al. 2017). Phylogenetic analysis showed that the number of GSTs in each class expanded independently in a species-specific manner, irrespective of genome size (Hasan et al. 2021).
Fig. 10 Cis-acting regulatory elements predicted in the upstream promoter region of predicted MaGSTs. The scale represents the number of particular elements in the corresponding genes. Gray color is indicative of the absence of CAREs

Banana GST gene structure analysis
The gene structure of the GST gene family is highly conserved among different plant species. Exon numbers were conserved within common phylogenetic classes, and differing exon numbers can be correlated with different evolutionary patterns (Wang et al. 2019a, b). There are 2 exons in the tau and TCHQD classes, 3 in the phi class, 9 in the zeta class, 7 to 9 in the theta class, 6 in the DHAR class, and 8-9 in the lambda class. The MaGST genes of the tau, phi, theta, zeta, lambda, and DHAR classes contained a characteristic number of exons, reflecting their evolutionary conservation. The same numbers were also reported for wheat, radish, mung bean, apple, and melon GSTs. It has been reported that genes with fewer introns respond faster to stress stimuli; hence, tau and phi genes, which contain fewer introns, may respond quickly to stress (Jeffares et al. 2008).

Conserved motif analysis
MEME analyses identified class-specific motifs as well as motifs that were found in many MaGST classes. The diversity in the occurrence of motifs in individual classes was probably due to the GSTs' common functions, such as plant growth and development and stress management, as well as a few distinct roles in tyrosine metabolism and ascorbate metabolism (Vaish et al. 2020). Well-conserved signature motifs were identified in MaGSTs, i.e., W(A/V)S(P/M) in tau, (E/Q)SR(A/K/G)I in phi, SQPS/C in theta, SSCS/A in zeta, CPF/YA in lambda, CPFC/S in DHAR, and CPWA in GHR (Vaish et al. 2020); these motifs were also reported in tomato (Islam et al. 2017), Capsicum (Islam et al. 2019), Chinese cabbage (Du et al. 2018), and wheat (Wang et al. 2019a, b). The presence of these signature motifs in the individual classes of MaGSTs validates them as GST proteins and supports their involvement in diverse plant functions.

Cis-acting regulatory element analysis in the promoter region of banana GSTs
The promoter analyses revealed different CAREs related to hormonal, cellular, stress, and light response functions (Kaur et al. 2017). In addition to these elements, core promoter elements (AT ~ TATA box, CAAT box, TATA box, TATA) (Rahman et al. 2021) were also found. Light-responsive elements predominated and were found in the promoter region of all 62 MaGST genes. Photosynthetic reactions take place in the leaves in response to light; hence, this abundance may be related to an association of GST proteins with photosynthesis. According to Gallé et al. (2019), the function and expression of GSTs and the level of their substrate, GSH, also depend on the quality and intensity of light. The high number of light-responsive elements probably reflects this significant role of light in GST activity and expression. Plant growth regulators play an important role in plant growth and development, seed germination, fruit development, etc. Many regulatory elements, such as ERE motif, GARE motif, CGTCA element, TGACG element, ABRE, TGA element, Aux RR core, O2 site, and P box, were found in the promoter region of most of the MaGSTs; these can be responsive to diverse hormones such as ethylene, gibberellins, salicylic acid, methyl jasmonate, abscisic acid, and auxin. All these plant hormones play an important role in a range of plant metabolic processes. The presence of different defense- and stress-responsive elements such as STRE, DRE, LTR, MBS, W box, WRE3, WUN motif, as-1, and TC-rich repeats supports a role for MaGSTs in providing resistance against many biotic and abiotic stresses (Kaur et al. 2017). Additionally, circadian elements, which are involved in circadian control, were found in many MaGST genes. Although little information is available regarding the role of this element in plant metabolism, Alderete et al. (2018) analyzed the NtGST gene (phi) in tobacco seedlings for putative circadian regulation. Their results showed diurnal regulation, with increased expression at the end of the light phase and transcript levels decreasing in the dark period. Collectively, cis-acting regulatory elements are essential for regulating the expression, and thereby the efficient functioning, of a protein.
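
The element counts summarized for each promoter can be pictured with a small sketch that tallies occurrences of a few core sequences in an upstream region. This is a simplified stand-in and not the prediction tool used in the study. The core sequences shown (ACGTG for ABRE, CGTCA and TGACG for the two methyl jasmonate-related motifs) are the commonly cited cores and are treated here as assumptions; the promoter string is invented, and only the given strand is scanned.

```python
import re

# Commonly cited core sequences (assumed here; real tools use curated matrices)
CORE_ELEMENTS = {
    "ABRE": "ACGTG",
    "CGTCA-motif": "CGTCA",
    "TGACG-motif": "TGACG",
}


def count_elements(promoter):
    """Count occurrences (overlaps allowed) of each core on the given strand."""
    seq = promoter.upper()
    # A zero-width lookahead lets re.findall report overlapping matches too.
    return {
        name: len(re.findall(f"(?={core})", seq))
        for name, core in CORE_ELEMENTS.items()
    }


# Invented toy promoter, for illustration only
toy_promoter = "ttACGTGgcCGTCAaaTGACGttACGTGtc"
counts = count_elements(toy_promoter)
```

Applying such a tally to every gene's upstream region would give a gene-by-element count matrix of the kind a heat map like Fig. 10 displays, with zero counts corresponding to absent elements.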

Post-translational modification in MaGSTs
Post-translational modification (PTM) is a common event that plays an important role in protein functioning, and phosphorylation is among the most common PTMs. MaGSTs were therefore examined for predicted glycosylation and phosphorylation sites. Phosphorylation was the major post-translational modification in MaGSTs. Serine is the most common residue for phosphorylation, followed by tyrosine and threonine. Phosphorylation can modulate enzyme activity, and it has also been reported that phosphorylation is an essential controller of controllers. Casein kinase II, protein kinase C, and tyrosine kinases are involved in the regulation of tau, phi, and zeta GSTs. According to Puglisi et al. (2013), GSTs undergo post-transcriptional regulation mechanisms (phosphorylation) that are correlated with GST gene expression and covalent, post-translational regulation of the enzymatic activity.

Fig. 11 Expression profiling of MaGST genes based on RNA-seq data at fruit development stage. The heat map was generated by TBtools software. The scale represents a signal intensity of TPM values (converted to log10)

Expression profiling of banana GST under fruit development
The expression profiling of GST genes has been done in many plant species at various developmental stages and in different plant parts such as leaves, roots, pericarp, and endocarp (Dixon and Edwards 2010; Jain et al. 2010; Islam et al. 2017). In the current study, the MaGSTs were found to be involved in all stages of fruit development. MaGSTU1 was found to be highly upregulated and can therefore be a good candidate gene for molecular characterization.
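
The Fig. 11 caption states that TPM values were converted to log10 before the heat map was drawn. A minimal sketch of such a transform follows. The pseudocount of 1, added so that zero TPM values remain defined, is an assumption and is not stated in the source; the gene names and TPM values are invented.

```python
import math


def log10_tpm(tpm_by_gene, pseudocount=1.0):
    """Return log10(TPM + pseudocount) per gene and developmental stage."""
    return {
        gene: [math.log10(value + pseudocount) for value in values]
        for gene, values in tpm_by_gene.items()
    }


# Invented TPM values across three stages, for illustration only
toy_tpm = {"geneA": [0.0, 9.0, 99.0], "geneB": [999.0, 99.0, 9.0]}
scaled = log10_tpm(toy_tpm)
```

The log scale compresses the range so that a single highly expressed gene does not dominate the color scale of the heat map.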

Conclusion
In conclusion, in silico identification and characterization of the GST gene family in banana led to the identification of 62 GST genes. The identified MaGST genes can potentially be used for molecular and functional characterization in this agriculturally important crop. The individual MaGST genes can be cloned and characterized, and their expression can be studied in different tissues under normal developmental and diverse stress conditions. The results of the current study can be utilized in banana breeding programs for developing high-yielding and/or stress-tolerant varieties.