PHYLOGENETIC ANALYSIS OF MATE GENES IN WHEAT
Studies have shown that phylogenetic analyses of membrane transporters are generally not accurate enough to predict specific substrates (Santos et al., 2017). However, the phylogeny of the MATE family has proved relatively useful for predicting affinities for potential groups of molecules, such as organic acids (citrate), alkaloids (nicotine) and flavonoids (anthocyanin, proanthocyanidin etc.) (Santos et al., 2017). Multiple sequence alignments of MATE protein sequences have generally been carried out using ClustalX 2.1 with its default settings (Peng et al., 2012). Following this approach, we employed the Clustal-Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and phylogeny.fr:TreeDyn (http://www.phylogeny.fr/one_task.cgi?task_type=treedyn) tools to construct a multiple sequence alignment of the 44 identified full-length protein sequences. The complete multiple alignment profiles of the protein and nucleotide sequences were used to establish a phylogenetic tree. A Neighbour-joining (N-J) tree topology was employed to study the MATE gene family (Fig. 2). Here, we resolved seven major groups or families of MATE genes by increasing the scale from 0.1 to 0.12, and the results are categorised in Table 1.
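As an illustration of the neighbour-joining principle referred to above, the following minimal Python sketch computes pairwise p-distances from aligned sequences and joins taxa with the standard Q criterion. It is not the procedure run by Clustal-Omega or phylogeny.fr; the function names (p_distance, neighbor_joining) and the five-taxon distance table are hypothetical and serve only to show the calculation.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Run neighbour-joining on a symmetric distance table dist[a][b].

    Returns (joins, final_edge). joins is a list of
    (child_i, child_j, branch_i, branch_j, new_node) tuples; final_edge is
    (node_a, node_b, length) for the last two remaining nodes.
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of every node from all the others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        branch_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        new = f"node{len(joins) + 1}"
        d[new] = {}
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, branch_i, branch_j, new))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Hypothetical five-taxon distance table (not wheat MATE data).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {x: dict(zip(names, row)) for x, row in zip(names, matrix)}
joins, final_edge = neighbor_joining(names, dist)
```

In practice the distance table would be filled by applying a distance measure such as p_distance to every pair of aligned MATE sequences; dedicated phylogenetics software adds substitution models and bootstrap support that this sketch omits.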
An earlier genome-wide analysis of MATE genes in soybean (Liu et al., 2016) showed that 117 MATE genes could be classified into four primary clades or families, based on a maximum likelihood (ML) tree constructed with MEGA 6.0. In maize, a total of 49 ZmMATE genes were recognised and grouped into seven clusters in a Neighbour-joining tree built with MEGA 5.0 (Zhu et al., 2016). Similarly, to study the comparative evolutionary history and relationships of the MATE gene family among two species of cotton and Arabidopsis, a total of 196 putative MATE genes were analysed, of which 68 were from G. arboreum (GaMATE), 70 from G. raimondii (GrMATE) and 58 from Arabidopsis. Based on phylogenetic classification using the NJ method in MEGA 6.0, these MATE genes were categorised into three subfamilies, of which M1 was the largest with 124 genes, followed by M2 with 48 genes and M3 with 24 genes (Lu et al., 2018). From this, it may be concluded that the classification implemented in this study is consistent with prior findings.
Further, the multiple sequence alignment of the 44 genes, showing variation at the nucleotide level, is presented in Additional file 3. The nucleotide sequence alignment was performed using the Multalin tool (http://multalin.toulouse.inra.fr/multalin/), in which the 44 MATE genes are displayed in three colours, red, blue and black, signifying high consensus, low consensus and neutral positions, respectively. Here, we noticed deletion regions at several places in the nucleotide sequence alignment, which are highlighted in Fig. 3. These evolutionary footprints are of immense importance for understanding the basic structure of wheat genes. Similarly, according to Ma et al. (2011), multiple peptide sequence alignment of the plastocyanin-like domains (PCLDs) in rice and Arabidopsis identified the conserved amino acids involved in copper binding. In rice, all the MAPKKK (mitogen-activated protein kinase kinase kinase) genes, exclusively involved in signal transduction pathways, were classified under the Raf, ZIK and MEKK subfamilies, and the rice and Arabidopsis MAPKKKs of these subfamilies were analysed by creating multiple sequence alignments of their kinase domains with the Multalin program in order to detect specific conserved signature sequences. It was revealed that, in subfamily Raf, 43 MAPKKKs in rice and 48 in Arabidopsis, and in subfamily ZIK, about 10 MAPKKKs in rice, were identified, whereas in MEKK, 22 MAPKKKs from rice and 21 from Arabidopsis were found to be conserved (Rao et al., 2010). Also, in blueberry (Vaccinium corymbosum), out of a total of 33 MATE genes, the multiple sequence alignments of eight novel VcMATE protein sequences, along with selected MATE transporter orthologs, were analysed using ClustalX.
Here, it was found that VcMATE 2 shared the highest level of identity with known flavonoid transporters, while VcMATE 1 and VcMATE 4 exhibited the lowest similarity to the MATE-type flavonoid transporters (Chen et al., 2015).
It has been documented that cells communicate with each other through protein-protein interactions and carry out all the physiological processes of life through the interactions of various proteins (Szklarczyk et al., 2017). We constructed the association network of protein interactions among 43 of the 44 protein sequences, all belonging to Triticum aestivum, using the STRING online program (the remaining protein sequence, A0A3B6AXC7, is assigned to Triticum urartu and did not participate in the interaction). The primary interaction unit in STRING is the ‘functional association’, i.e., a link between two proteins that both contribute to a specific biological function (Szklarczyk et al., 2017). Here, the network (Fig. 4) was extended by an additional 20 proteins (through the MORE button in the STRING interface) to obtain a clearer picture of the interactions, and the confidence cut-off for screening interaction links was set to “medium confidence” (0.4).
Fig. 5 shows the results retrieved after entering the set of 43 protein sequences predicted to be involved in the efflux of toxic multidrug compounds from the cells and tissues of the wheat plant. The network statistics of the set of proteins identified in functional subsystems revealed 53 nodes, an average node degree of 1.55 and a PPI enrichment p-value of 3.9e-12. Further, we noticed that protein sequences from the same phylogenetic group interacted closely and clustered together in the association network.
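The average node degree reported above is simply twice the number of links divided by the number of nodes, so 53 nodes with an average degree of 1.55 correspond to roughly 41 links (53 × 1.55 / 2 ≈ 41). The short Python sketch below illustrates this relationship together with a confidence cut-off of 0.4. The function names and the toy network are hypothetical, and the scores are assumed to lie on a 0 to 1 scale; this is not STRING's own code.

```python
def filter_edges(scored_edges, cutoff=0.4):
    """Keep (protein_a, protein_b) pairs whose score reaches the cutoff."""
    return [(a, b) for a, b, score in scored_edges if score >= cutoff]


def average_node_degree(nodes, edges):
    """Mean number of links per node; equals 2 * len(edges) / len(nodes)."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sum(degree.values()) / len(degree)


# Hypothetical toy network, not the wheat MATE data.
nodes = ["P1", "P2", "P3", "P4"]
scored = [("P1", "P2", 0.9), ("P2", "P3", 0.45), ("P3", "P4", 0.2)]
edges = filter_edges(scored)  # the 0.2 link falls below the cut-off
mean_degree = average_node_degree(nodes, edges)
```

Nodes without any retained link still count in the denominator, which is why sparse networks can have an average degree well below 2.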
A similar approach was used in Brassica rapa, where proteins involved in the biosynthesis of camalexin (involved in resistance against Botrytis cinerea and Alternaria brassicae) were found to interact through functional protein association networks analysed with STRING database version 10.5. Here, the phytoalexin deficient 3 (PAD3) gene was identified as a key functional node, along with the CYP71A12 gene as a potential functional partner, with which all the other proteins were found to be associated (Gaur et al., 2018). In an in silico analysis of functional linkage among arsenic-induced MATE genes in rice, Seth et al. (2019) found that 37 MATE genes interacted at the protein level. Also, in rice (Oryza sativa ssp. japonica), around 30 genes were identified from the AbS (abiotic stress) responsive gene family, which is involved in stress-responsive signalling under abiotic conditions such as drought, submergence, cold, salinity and metal toxicity; of these 30 seed proteins, 22 were found to be extensively involved in a protein-protein interaction network, along with 34 additional derived neighbours, using STRING, showing closely related functional modules and the complexity of AbS (Muthuramalingam et al., 2017).
STRUCTURAL ANALYSIS OF MATE GENES IN WHEAT
The phases and structures of the exons and introns of the MATE genes were examined using Ensembl Plants (https://plants.ensembl.org/Triticum_aestivum/Transcript/Summary). This analysis gives more insight into the evolution of gene structures in wheat and provides detailed information on transcript and translation length, the number of coding exons, amino acids and base pairs, along with the transcript diagram. In total, 44 exon-intron transcripts were identified (Additional file 4). The maximum number of exons was 14, found in the genes TraesCS6A02G418800.1 and TraesCS6D02G407900.1, while the minimum, a single exon, was found in the gene sequences TraesCS1A02G188100.1.cds1, TraesCS5B02G562500.1.cds1, TraesCS6A02G256400.1.cds1 and TraesCS6D02G384300.1.cds1.
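Exon counts of the kind reported above can also be tallied from a genome annotation file. The Python sketch below assumes a GFF3 file in which every exon line carries a Parent attribute naming its transcript; the function name count_exons and the transcript identifiers tx1 and tx2 are hypothetical, and the example does not reproduce the Ensembl Plants output used in this study.

```python
from collections import Counter


def count_exons(gff3_text):
    """Count exon features per parent transcript in GFF3-formatted text."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # column 3 holds the feature type
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return counts


# Hypothetical annotation with two transcripts (tx1: 2 exons, tx2: 1 exon).
rows = [
    ["chr1", "src", "exon", "1", "100", ".", "+", ".", "Parent=tx1"],
    ["chr1", "src", "exon", "200", "300", ".", "+", ".", "Parent=tx1"],
    ["chr1", "src", "CDS", "1", "100", ".", "+", "0", "Parent=tx1"],
    ["chr1", "src", "exon", "500", "900", ".", "-", ".", "Parent=tx2"],
]
gff3_text = "##gff-version 3\n" + "\n".join("\t".join(r) for r in rows)
exon_counts = count_exons(gff3_text)
```

Taking the maximum and minimum of the resulting counts gives the extremes of the exon distribution for the gene set.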
Fan et al. (2014) suggested, on the basis of phylogenetic tree analysis, that MATE genes were mainly grouped into three subfamilies and that the intron-exon structures were subfamily-specific, indicating that the cotton MATE genes were considerably conserved and functionally diversified.
Protein structures generally contain conserved elements called motifs, which have a significant influence on protein function (Conklin, 1995). The function of a protein usually imposes tight constraints on the evolution of specific regions of its structure; residues directly or indirectly involved in a function are often clustered in a short sequence motif (signature, pattern, framework or fingerprint) that is conserved across the various proteins sharing that function (Manning et al., 1998). The online software Multiple EM for Motif Elicitation (MEME) was employed to analyse the motifs in the MATE proteins (Bailey et al., 2015). The conserved protein motifs in the wheat MATE protein sequences were identified and predicted using the MEME tool, with the maximum number of motifs set at 22. The motifs are shown as coloured boxes, each motif being represented by a number in its box, and are listed according to families 1 to 7 of the phylogenetic tree (Additional file 5). Zhu et al. (2016) observed that most closely related members within the same family share common motif compositions, indicating functional similarity. In this case, a maximum of 22 conserved motifs were identified, with differently coloured boxes serving as symbols for the different motif consensus sequences. The types and sequences of the protein motifs in families 1, 2 and 3 were significantly different from the rest. Further, we noticed that families 3 and 5 had 19 and 14 protein motifs, respectively, while the rest had 22. It may therefore be concluded that the interacting MATE genes or protein sequences within a family also have similar protein motifs.
An earlier study defined motifs as short DNA or protein sequences that contribute to the biological functions of the sequences in which they reside, making them one of the basic functional units of molecular evolution (Grant et al., 2011). Similar work was done by Liu et al. (2016) in a genome-wide analysis of MATE transporters in soybean using MEME, where a maximum of 12 conserved motifs were analysed; identical types of motif sequences were present in the first three families, whereas the fourth family differed significantly and had far fewer motifs.
IN SILICO EXPRESSION ANALYSIS AND THEIR HOMOEOLOGOUS CANDIDATES
Like other plants, wheat has MATE transporter proteins that are responsible for controlling different expression patterns and functions during vegetative growth, reproductive development and senescence, as well as in resistance to biotic and abiotic stresses. The possible roles of the 44 homologous MATE genes in wheat growth and development were examined by constructing heatmaps with the wheat expression browser (http://www.wheat-expression.com/). The MATE genes showed their highest levels of transcript accumulation in leaves, roots, shoots, flowers, grain, spikes etc., as shown in Fig. 5, indicating that they were involved in the development of all tissues or organs under normal conditions. The heatmaps of constitutive expression of the 44 MATE genes (Fig. 6) were generated under different biotic and abiotic stress conditions, including:
Phosphate starvation in roots and shoots
Heat and drought stress time course in seedlings
Spikes with water stress
Fusarium head blight infected spikelets
Stripe rust infected seedlings
Septoria tritici infected seedlings
Powdery mildew time course of infection in seedlings
PAMP inoculation of seedlings
Time course of spikelets inoculated with Fusarium head blight/ABA/GA
Coleoptile infected with Fusarium pseudograminearum (crown rot)
Zymoseptoria tritici infected seedlings
Seedlings treated with PEG (polyethylene glycol) to induce stress
Shoots from NILs segregating for crown rot resistance
The degree of expression of a gene in a tissue at a particular stage is depicted by its intensity in the heatmap, expressed in units of log2 tpm (transcripts per million).
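For readers who wish to relate the log2 values to raw transcript abundance, the conversion can be sketched in Python as follows. The text does not state whether a pseudocount is added before taking the logarithm; the pseudocount of 1 used here is an assumption (it keeps a value of 0 tpm at 0 on the log scale), and the function names are hypothetical.

```python
import math


def log2_tpm(tpm, pseudocount=1.0):
    """Log2-transform a tpm value; the pseudocount of 1 is an assumption."""
    if tpm < 0:
        raise ValueError("tpm cannot be negative")
    return math.log2(tpm + pseudocount)


def tpm_from_log2(value, pseudocount=1.0):
    """Invert log2_tpm to recover the raw tpm value."""
    return 2 ** value - pseudocount


# Under this assumption, a heatmap value of 7 corresponds to 127 tpm.
raw_tpm_at_seven = tpm_from_log2(7)
```

Because the scale is logarithmic, a difference of one unit between two genes corresponds to an approximately two-fold difference in transcript abundance.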
Heat maps of gene expression were obtained for these seven families, with all genes displayed according to their phylogenetic associations (Additional file 6). Family 1 contained two genes with constitutive, high expression, TraesCS5B02G326600.1 and TraesCS4A02G245300.1, with tpm values of 6.99 and 5.4 on a log₂ scale with a maximum of 7; they were highly expressed in the fifth leaf blade, spikes, rachis and anthers under disease infestation by Fusarium head blight and stripe rust, respectively. The genes TraesCS3B02G563400.1 and TraesCS5B02G245500.1 had the lowest expression in the group, showing little or no expression under high as well as intermediate levels of biotic and abiotic stress. Similarly, family 2 contained five genes showing higher expression mainly at reproductive stages in spikes and rachis and during Fusarium infestation. These were TraesCS2B02G296000.1, TraesCS2D02G277400.1, TraesCS5D02G378200.1, TraesCS5D02G378300.1 and TraesCS2B02G296100.1, with tpm values of 6.26, 6.35, 5.35, 6.45 and 5.13, respectively. Negligible expression was displayed by the gene TraesCS5D02G413800.1. In family 4, two of the three members were constitutively expressed, although at mid to low levels, in most tissues such as grain, spike, leaves and other shoot parts, mainly at the reproductive stage and during disease infestation; the exception was TraesCS1A02G188100.1, which showed no expression at all. Family 5 contains only one gene (TraesCS5B02G562500.1), which showed its maximum expression at 24 days after sowing, on the 10th day of phosphorus starvation, and under other abiotic stresses, particularly in roots. In family 6, only one of the two genes, TraesCS4B02G244400.1, was highly expressed, during the reproductive stage in anthers and spikelets, with a maximum tpm value of 4.52.
Unlike the other families, all the MATE gene members of families 3 and 7 were constitutively expressed, with varying transcriptional intensities and tpm values ranging from 3 to 4, at both vegetative and reproductive stages in various tissues of the plant under stress conditions.
As the products of all these MATE genes were located as integral components of the plasma membrane, they were exclusively involved in molecular functions such as solute-solute antiporter activity, xenobiotic transmembrane transporter activity and transmembrane transporter activity (Table 5). The results suggest that different MATE genes were expressed under various biotic and abiotic stress conditions, but the majority of the genes were expressed under biotic stress due to disease infestation (Table 4). The overall highest level of expression was shown by the gene TraesCS5B02G326600.1, belonging to family 1, during Fusarium head blight infestation by Fusarium graminearum 4 days after inoculation.
Lu et al. (2018) analysed the expression profiles of GaMATE and GrMATE genes belonging to two species of cotton, G. arboreum and G. raimondii. The study was carried out in root tissues in order to examine gene expression levels under the abiotic stress conditions of drought, salinity and Cd stress. Of the total MATE genes, GrMATE54, GrMATE53 and GaMATE21 were found to be highly expressed under all three abiotic stress conditions and were also involved in vacuolar sequestration and toxin efflux, while GrMATE34 and GaMATE54, which were significantly expressed under stress conditions, were involved in ABA transport. Similarly, in soybean, expression profiles of 113 of the 117 MATE genes were visualised as a heatmap using MeV 4.9; these genes were differentially expressed in nine tissues, viz., leaf, stem, flower, pod, seed, root, root hair, nodule and shoot apical meristem. The GmMATE genes exhibited tissue-specific expression: GmMATE107 and GmMATE27 showed the highest expression levels in roots, root hairs and nodules and the lowest in above-ground tissues. Similarly, GmMATE44, GmMATE81 and GmMATE36 showed high levels of expression in pods and developing seed, while GmMATE62 and GmMATE7 were expressed in leaf tissues (Liu et al., 2016). In Medicago truncatula, among the genes examined, UGT78G1, MaT4, MaT5 and MATE2 were found to be expressed in various parts of the plant such as leaves, roots and flowers, with MATE2 showing the highest expression in flowers, followed by roots, vegetative buds, leaves and seeds, and being associated with the transport of glycosylated flavonoids. It has been reported that the gene was exclusively involved in anthocyanin pigmentation, and thus lack of this pigment resulted in discolouration of leaves and flowers (Zhao et al., 2011).
Tiwari et al. (2014) also reported, in a genome-wide expression analysis of rice MATE genes, that two arsenic-responsive genes, OsMATE1 and OsMATE2, were taken for functional study in transgenic lines of Arabidopsis, where most of the effects were seen in leaf, seed and flower morphology, the pattern of rosette arrangement and flowering time. These OsMATEs were found to regulate plant growth and development in the transgenic lines, which, however, were more susceptible to biotic and abiotic stresses than the wild type.
Wheat is an allopolyploid with two (tetraploid wheat, with two homoeoalleles) or three (hexaploid wheat, with three homoeoalleles) homologous subgenomes. The high similarity in DNA sequence and function among the homoeoalleles of a gene in polyploid wheat makes gene cloning and functional analysis a challenging task (Zhang et al., 2018). Polyploidy, which arises from interspecific hybridisation or whole-genome duplication, is ubiquitous across the plant and fungal kingdoms, and the existence of closely related genes in polyploids, known as homoeologs, has promoted the domestication and adaptation of many major polyploid crops such as hexaploid bread wheat (Triticum aestivum; AABBDD subgenomes), cotton and coffee (González et al., 2018).
The probable reasons for the variation in gene density, or number of loci, among wheat chromosomes within homologous sets include the evolutionary history of the specific genomes, variation in the size of individual chromosomes and structural aberrations comprising unequal exchanges of genetic material within and among chromosomes (Qi et al., 2004). Understanding how interactions among these homoeologous genes affect gene expression will ultimately help to build strategies for crop improvement by targeting and manipulating individual or multiple homoeologs to quantitatively modulate trait responses (Borrill et al., 2015).
All the possible homologous genes for the 44 MATE genes were found with the help of the Ensembl Plants database, displayed in the form of ternary plots using the wheat expression browser, and listed in Table 3. Here, the ternary plot shows two homologous genes from different wheat cultivars, Azhurnaya (TraesCS1A02G188100.1) and Chinese Spring (TraesCS1B02G195900.1), of the Triticum aestivum gene TraesCS1D02G188200.1, as shown in Fig. 7. Their levels of expression indicate roles in the transport of alkaloids in tissues such as leaves, roots, rachis, spikes and coleoptiles under different biotic and abiotic stress conditions such as stripe rust, powdery mildew, heat and cold stress.
Similarly, Table 4 and Additional file 7 show that the homoeologous genes of the identified wheat MATE genes TraesCS1A02G305200.1, TraesCS2B02G247700, TraesCS2D02G277400.1, TraesCS3B02G298700.1, TraesCS4B02G244400.1, TraesCS5B02G326600.1 and TraesCS2B02G296000.1 exhibited low to medium levels of expression, ranging from 20–50%, under abiotic stress in the High level stress-disease category, where the maximum expression, 63.62%, was displayed by TraesCS3A02G265100.1, a homoeologous gene of the MATE gene TraesCS3B02G298700.1. Under biotic stress such as disease in the High level stress-disease category, the maximum expression among the homologous genes, 69.79%, was shown by the gene TraesCS5D02G355500.1.
Further, we noticed maximum expression levels, i.e., 100%, under stress-disease conditions. For example, the homoeologous genes TraesCS1B02G315900.1, TraesCS7D02G488000.1, TraesCS2A02G222300, TraesCS2D02G277400.1, TraesCS5B02G371200.1, TraesCS7D02G488000.1 and TraesCS2D02G277400, belonging to the MATE genes TraesCS1A02G305200.1, TraesCS7A02G500700.1, TraesCS2B02G247700, TraesCS2D02G277400.1, TraesCS5D02G378300.1, TraesCS7D02G488000.1 and TraesCS2B02G296000.1, respectively, were highly expressed during stripe rust mixture 6/14 days. In addition, some other homologous genes, TraesCS5B02G371200.1, TraesCS1A02G305200.1, TraesCS5D02G378200.1 and TraesCS3A02G499200, of the MATE genes TraesCS5D02G378300.1, TraesCS1A02G305200.1, TraesCS5D02G378200.1 and TraesCS3B02G562400.1, were also expressed during Zymoseptoria tritici inoculation 4 days, Septoria tritici 10 days, stripe rust pathogen 87/66 9 days and flg22 500 nM, respectively.
Moreover, under intermediate stress, the homologous genes showed higher levels of expression; for example, the homologous gene TraesCS2B02G247700 of the MATE gene TraesCS2B02G247700 exhibited a 94.10% expression level during stripe rust. Similarly, the homologous genes TraesCS5B02G371200.1, TraesCS5D02G378200.1 and TraesCS5D02G378200.1 of the MATE genes TraesCS5D02G378300.1, TraesCS5D02G378300.1 and TraesCS6B02G383300 were expressed under intermediate stress at PAMP chitin, Fusarium pseudograminearum study 2 and cold 2 weeks at 99.24%, 96.39% and 97.98%, respectively. The maximum level of expression under PAMP flg22 intermediate stress was shown by the homologous gene TraesCS3A02G499200 of the MATE gene TraesCS3B02G562400.1, with a value of 100%.
In this study, ternary plots of the respective homologous genes were obtained for 39 of the 44 MATE genes, showing relatively different expression levels under biotic and abiotic stress. Of these, 56% of triads (A, B and D homoeologs) showed balanced expression and 44% showed non-balanced expression, being more tissue-specific and more strongly expressed under stress conditions. Here, each circular dot signifies a gene triad, with A, B and D coordinates representing the relative contribution of each homoeolog to the overall triad expression.
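The ternary coordinates described above can be illustrated with a short Python sketch: the expression of each homoeolog is divided by the summed expression of the triad, and the resulting point is assigned to the nearest of seven idealised categories by Euclidean distance. The seven-category scheme and the nearest-category rule are assumptions introduced here for illustration, as the text does not state how balanced and non-balanced triads were delimited; the function names and example values are hypothetical.

```python
import math

# Idealised (A, B, D) contributions for each category (assumed scheme).
IDEAL_CATEGORIES = {
    "balanced": (1 / 3, 1 / 3, 1 / 3),
    "A dominant": (1.0, 0.0, 0.0),
    "B dominant": (0.0, 1.0, 0.0),
    "D dominant": (0.0, 0.0, 1.0),
    "A suppressed": (0.0, 0.5, 0.5),
    "B suppressed": (0.5, 0.0, 0.5),
    "D suppressed": (0.5, 0.5, 0.0),
}


def relative_contributions(a, b, d):
    """Return each homoeolog's share of the total triad expression."""
    total = a + b + d
    if total <= 0:
        raise ValueError("triad has no expression")
    return (a / total, b / total, d / total)


def classify_triad(a, b, d):
    """Assign a triad to the closest idealised category."""
    point = relative_contributions(a, b, d)
    return min(
        IDEAL_CATEGORIES,
        key=lambda name: math.dist(point, IDEAL_CATEGORIES[name]),
    )


# Hypothetical tpm values for three triads (A, B, D homoeologs).
examples = [(10, 10, 10), (20, 1, 1), (0, 5, 5)]
labels = [classify_triad(*triad) for triad in examples]
```

The share of triads labelled "balanced" by such a rule could then be compared with the 56% reported here, bearing in mind that the percentage depends on the classification rule chosen.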
Similar work was done by Medina et al. (2014) in Aspergillus flavus, where the effect of climate change on A. flavus and its aflatoxin B1 production was studied using ternary plot diagrams. In addition, in a study of the transcriptional landscape of polyploid wheat, ternary plots showed the relative expression abundance of 16,746 syntenic triads (50,238 genes) in hexaploid wheat in a combined analysis of 15 tissues from Chinese Spring. It was further noted that 70% of the triads possessing A, B and D homoeologs showed balanced expression among homoeologs and were ubiquitously expressed, while 30% exhibited non-balanced expression and were more tissue-specific (González et al., 2018).