Gene annotation
The contigs from the SAGE libraries were subjected to KAAS analysis that assigned the sequences with K numbers after the initial screening using GHOSTX (http://www.bi.cs.titech.ac.jp/ghostx/). The results displayed 200 sequences from SAGE 1, 124 sequences from SAGE 2, 105 sequences from SAGE 3 and 34 sequences from SAGE 4 (Supplementary Table S2). These filtered contigs, assigned with K numbers, were subjected to gene ontological annotation. Further, WEGO annotation was carried out with these sequences to depict their molecular functions, cellular component, and biological processes along with the annotation scores (Supplementary Fig. S1 a-d). Under biological process, the annotated contigs divulged to be involved in immune system process signaling, biological adhesion, carbon utilization, cell proliferation and different organ developmental processes. Under the molecular function, contigs were involved in antioxidant activity, transcription regulator activity, catalytic activity and regulation of biological process. Under cellular component, most contigs were cell part, cell, organelle, nucleoid, supramolecular complex, membrane part, membrane, protein-containing complex, organelle parts.
The K numbers assigned to the contigs of SAGE libraries were further subjected to KEGG database and the secondary metabolite pathway was recorded from the list of KEGG pathways, particularly, flavonoid biosynthetic pathway (Fig. 4). Among those, 25 sequences of SAGE 1, 14 of SAGE 2, 13 of SAGE 3 and five of SAGE 4 respectively were involved in the biosynthesis of secondary metabolites. Flavonoid biosynthetic pathway revealed involvement of several enzymes out of which three putative enzymes with EC numbers were selected for further studies. These EC numbers from the flavonoid biosynthesis pathway were assigned in the ExPASy ENZYME server to derive the names of the genes, reaction catalyzed and their descriptions (Supplementary Table S3). A total of 18 sequences corresponding to the flavonoid biosynthesis genes were identified that belonged to Chalcone synthase (2.3.1.74), Flavanone 3-dioxygenase (1.14.11.9) and Anthocyanidin synthase (1.14.20.4). Chalcone synthase (CHS) is the entry point of the flavonoid pathway, which catalyze 4-Coumaroy-CoA and Malonyl-CoA to chalcone, leading the phenylpropanoid pathway towards flavonoids biosynthesis (Ma et al., 2014). Besides CHS, enzymes such as Chalcone isomerase (CHI), Flavanone 3- hydroxylase (F3H), Anthocyanidin synthase (ANS), Dihydroflavonol 4-reductase (DFR), Flavonol synthase (FLS) and Flavone synthase (FNS) were also determined that are involved in the pathway leading to biosynthesis of major flavonoid classes which aided for the selection of enzymes (Lu et al., 2017) (Supplementary Fig. S2). Therefore, these three genes CHS (initial gene of the pathway), F3H (gene in the middle of the pathway) and ANS (at late stage of the pathway) were selected to characterize and discern their roles during leaf rust pathogenesis.
In silico characterization of the identified flavonoid biosynthesis genes
In silico characterization studies of the 18 identified sequences were carried-out for unraveling their functional attributes. The physico-chemical parameters of the identified proteins demonstrated considerable variations in length, molecular weight, pI and grand average of hydropathicity (GRAVY) values (Supplementary Table S4). All 18 identified amino acid sequences were predicted to have nuclear localization signal (NLS) sequence (Supplementary Table S5). The prediction of NLS depicts stretches of amino acid tags which helps in trafficking of the proteins from cytoplasm into nucleus.
The psRNA target tool predicted 26 flavonoid biosynthetic gene sequences that were targeted by various wheat (tae)-miRNAs (Supplementary Table S6). TaCHS1 and TaCHS2 was targeted by four different miRNAs: tae-miR1121, tae-miR1135, tae-miR9659-3p and tae-miR9776. TaCHS5 was targeted by two miRNAs: tae-miR1130a and tae-miR1130b-3p. TaF3H1 was targeted by tae-miR9781, TaF3H6 was targeted by tae-miR5086 whereas TaF3H3 was targeted by four different miRNAs: tae-miR9670-3p, tae-miR9677a, tae-miR9775 and tae-miR9779. TaANS3 was targeted by tae-miR9670-3p. TaANS1 and TaANS5 was targeted by two different miRNAs i.e., tae-miR531, tae-miR9773 and tae-miR9773, tae-miR1128 respectively whereas TaANS2 was targeted by four different miRNAs: tae-miR1121, tae-miR1847-5p, tae-miR397-5p and tae-miR9781. Though multiple miRNAs targeted the same gene, the target sites on the ORF were different.
Analysis of the regulatory sequences present in the 2000 bp upstream of each gene illustrated various cis-regulatory elements that included light responsive, developmental related, promoter related, flavonoid biosynthesis related, hormone responsive, defense related and environmental stress related elements (Supplementary Table S7). Besides, elements for basal transcription like TATA-box and CAAT-box were detected. The key physiological processes related to the regulatory elements were photoperiod, hormonal response, developmental regulation, and environmental response. Phytohormone responsive abiotic stress elements included P-box, ABRE, TGA, TCA, CGTCA-motif, TGA-element, TGACG-motif and GARE-motif. Among these, CGTCA, P-box and TCA motifs were abundant with average copy numbers of 2.66, 0.22 and 0.27 respectively. These elements are mainly related to methyl jasmonic acid (MEJA), gibberellin and salicylic acid stress. The cis-regulatory elements responsive to other environmental stresses like anaerobic induction, drought inducibility and low-temperature responsiveness were ARE, MBS and LTR with average copy numbers of 1.00, 0.77 and 0.5, respectively. The cis-regulatory element responsive to flavonoid biosynthesis detected was MBSI with an average copy number of 0.77. Defense related elements included TC-rich repeats and WUN-motif with an average copy number of 1.22 and 0.55 respectively. The developmental related elements included CAT-box, O2-site, AT-rich element, GC-motif, GCN4_motif, RY-element and circadian element that are involved in zein metabolism regulation, meristem expression, endosperm expression, and cell cycle regulation respectively. The elements such as CAT-box and GCNA motif was copious. The highest number of cis-acting elements observed were under the developmental related elements suggesting that these identified wheat genes might play critical roles in morphogenesis, differentiation and development.
In order to understand the evolutionary constraints of the selected flavonoid biosynthesis genes of wheat, Ka (non-synonymous substitutions), Ks (synonymous substitutions) and Ka/Ks ratio was determined. Ka/Ks values = 1 denote the neutral drifting of the genes, Ka/Ks > 1 denote the accelerated evolution with positive selection, and Ka/Ks < 1 denote the functional constraint with a negative or purifying selection of genes. The Ka/Ks ratio of the 18 orthologous genes was less than 1, suggesting these genes are undergoing strong purifying selection and they are slowly evolving at protein level in wheat (Fig. 5 and Supplementary Table S8).
The deduced protein sequences comprised of brief patterns of motifs that were originally restored by nature. These motifs refer to an enzyme's active site and structural units that are required for proper protein folding. Hence, the motif sequences encompass basic functional components of molecular evolution (Bailey et. al, 2009). Motifs of CHS, F3H and ANS proteins were identified using MEME which revealed three conserved motifs (Fig. 6). Motif − 1: EGEEPILEEPITFAEMYRRKMERDLDLAKRKKQAKDQLMQQQLQLQQQQQ denotes Leucine Rich Repeats (LRR); Motif-2: ATVLAIGTATPANCVYQADYPDYYFK ITKSDHMADLKEKFKRMCKFNEFH denotes polyketide synthase (type III PKS17) and Motif-3: NYCALCMQFMSNGRFKNADHQAVVNGESSRLSIATFQNPAPDARVWPLAV denotes Naringin (NRG) motifs. The Non-Expresser of Pathogenesis Related (NPR) motifs, found in all sequences, bind with bZIP transcription factors, forming a complex. This complex-formation activates some genes that include Acyl-CoA oxidase (ACO) and Acyl-CoA dehydrogenase that contributes to the synthesis of flavonoids (Garcia-Ponce and Rocha-Sosa, 2000).
Based on the query sequences of Triticum aestivum the evolutionary relationship was analyzed. Neighbor-Joining trees were constructed with the query protein sequences of CHS, F3H and ANS, that were aligned along with Oryza sativa, Zea mays and Hordeum vulgare. The bootstrap values on the branch points of the phylogenetic trees provide the assessments of “confidence” for each clade in the tree and based on homology, the sequences were thereby clustered into three main classes namely Class I, Class II and Class III represented in three different colors (Fig. 7A-C). Figure 7(a) depicts CHS proteins that are clustered in following groups: Class I included eight proteins: TaCHS1, TaCHS2, TaCHS3, TaCHS4, TaCHS5, TaCHS6, HvCHS1 and HvCHS2; Class II included three proteins: ZmCHS3, ZmCHS4 and ZmCHS5 while Class III included eight proteins: ZmCHS1, ZmCHS2, OsCHS1, OsCHS2, OsCHS3, OsCHS4, OsCHS5 and OsCHS6. Based on the phylogenetic tree, five sister gene pairs were formed which included TaCHS3-TaCHS2 showing paralogous relationship and TaCHS6-HvCHS2 showing orthologous sister pair within Class I; ZmCHS3-ZmCHS4 of Class II showed paralogous relationship; OsCHS4-OsCHS5 and ZmCHS1-ZmCHS2 of Class III showed paralogous relationship. Figure 7(b) depicts F3H proteins that are clustered in following groups: Class I included six proteins: TaF3H3, TaF3H4, TaF3H5, TaF3H6, TaF3H7 and HvF3H1; Class II included three proteins: TaF3H1, TaF3H2 and OzF3H2 while Class III included six proteins: OzF3H1, OzF3H3, OzF3H4, OzF3H5, OzF3H6 and ZmF3H1. Based on this phylogenetic tree, four sister gene pairs were formed which included TaF3H3-TaF3H4 and TaF3H5-TaF3H6 of Class I showing paralogous relationships: OzF3H2-TaF3H2 of Class II showing orthologous relationship and OzF3H5-OzF3H6 of Class III showing paralogous relationship. Figure 7(c) depicts ANS proteins that are clustered in following groups: Class I included one sister gene pair: OzANS3-OzANS5 which also showed paralogous relationship; Class II included four proteins: TaANS1, TaANS3, TaANS4, and TaANS5, where TaANS1-TaANS5 showed paralogous sister pairs while Class III included four proteins: OzANS1, OzANS2, OzANS4 and TaANS2.
Further, to better divulge the functional role of flavonoid biosynthesis genes, protein-protein interaction was performed using STRING v. 11 database using plant specific parameters. The search tool predicted other Triticum aestivum proteins which exhibited direct and indirect interaction with the query proteins in the STRING functional association databases (Fig. 8). The interaction revealed 16 nodes with 60 edges. A clustering co-efficient of 0.783 indicated significant interactions among the query proteins. The network analysis predicted five functional partners such as Chi-B1 and Traes_5BL_4F4E0B862.1 which are chalcone-flavonone isomerase that belongs to the chalcone isomerase family whereas Traes_2AS_2E21BFFFB.1 is a putative uncharacterized protein that belongs to chalcone/stilbene synthases family. The other two functional partners i.e., Traes_3AL_197871859.2 and Traes_3B_96D744B6B.1 are putative uncharacterized proteins. It also revealed the pragmatic co-expression protein interactions between Traes_2DS_1BA7B8A2C.1, CHS, Traes_2BL_A3CF87FBF.1, Traes_5BL_4F4E0B862.1, Chi-B1, Traes_2AS_2E21BFFFB.1, Traes_2DL_BC7F606B9.2 and Traes_2DS_B9D014DE1.1 with C- and N-terminal domains that belonged to chalcone synthase, stilbene synthases as well as Chalcone-flavanone isomerase containing 3-Oxoacyl-[acyl-carrier-protein (ACP)] synthase III and FAE1/Type III polyketide synthase-like protein domains. The functional partners identified in the interaction network revealed the CHS proteins to have their functional role in flavonoid biosynthesis mediated signaling pathway.
The in-silico expression study of these genes exhibited high expression in leaves, followed by roots, spikes and grains under normal conditions of growth and development (Fig. 9). In leaves, the expressions of these genes were low during abiotic stress condition whereas high during diseased condition The TaCHS2, TaCHS3, TaF3H2, TaF3H3, TaF3H4 and TaF3H5 genes highly expressed in leaves under high levels of stress-disease while they moderately expressed in roots and spikes. Whereas other genes exhibited diverse expression patterns in tissues and organs during different developmental stages and under biotic stresses. Thus, the uniform and comparatively high expression pattern of TaCHS2 and TaCHS3 genes in leaves demonstrated preclusion under diseased or other biotic stress condition.
Synteny analysis for the comparative genomics study of the genetic loci of query sequences of the three flavonoid biosynthesis genes of wheat on chromosomes of other monocot plants such as Oryza sativa and Hordeum vulgare was studied (Fig. 10). The synteny analysis reveals several clues for the presence of the functional genes. The mapping of identified genes shared more than 80% homology with Hordeum vulgare and Oryza sativa indicating presence of similar NRG and LRR domains inferring the high syntenic relation among these plants. It can be presumed that these genes might have similar functions among all three selected monocot plants.