Classification and characterization of identified CYP450s in wheat and maize
The wheat and maize genomes respectively contained 1285, 263 full-length CYP450 genes and 2, 7 designated pseudogenes (Additional files 1 and 2: Table S1 and S2). According to standard nomenclature, there are 45 families in wheat, and due to lack of CYP723 and CYP729, only 43 families existed in maize (Additional file 3: Figure S1). CYP71 was the largest A-type family in wheat (404 genes) and maize (56 genes), while wheat CYP96 and maize CYP72 emerged as the largest non-A type family with 82 and 16 members respectively. Families CYP703, CYP715, CYP85, CYP722, CYP724 and CYP733 consisted of a single gene in maize implying a unique highly conserved function for each, but they were found with gene duplication in wheat. The TaCYP450s ranged in molecular weights from 10.92-102.06 kDa with coding sequences of 100-897 amino acids. In maize, the proteins were 135 to 1122 amino acids long with molecular weights of 14.92-126.44 kDa. And most proteins fell in the range of 400 to 600 amino acids in both species. In isoelectric point (pI) values, CYP450s had large variations ranging from 4.55 to 11.52 for wheat and from 4.99 to 10.59 for maize, implying the diversity of the biochemical properties of CYP450s. A total of 465 (36.0%), 709 (54.9%), 110 (8.5%), 8 (0.6%) TaCYP450s; and 94 (34.8%), 150 (55.6%), 25 (9.3%) and 1 (0.3%) ZmCYP450s possessed zero, one, two and three transmembrane domains, respectively. About subcellular localization, the vast majority of CYP450s were likely positioned on the endoplasmic reticulum. Of particular interest is that all members of CYP74 and CYP701 families were targeted to chloroplasts. Also, CYP727A2_2B, ZmCYP75B87 and ZmCYP88A56P were detected as chloroplast-localized proteins. Only three (ZmCYP81A17, ZmCYP81A2 and ZmCYP81A36) were located in mitochondria, two (ZmCYP71T29-fusion and ZmCYP72A351P) in cytoplasm and one (TaCYP710A8_3A) in nucleus. TaCYP71BU11_3D was only considered secreted, whose unusual localization might reflect a unique biological function.
Further, we searched for the conserved motifs of A-type and non-A-type CYP450s in wheat and maize. As for A-type, the relatively conserved C-terminal region usually consisted of motifs 1, 14, 23, 3, 30, 12, 5, 4, 2, 7, 13 and 27. The motifs 1, 3, 4, 2 were associated with the functionally defined domains, which corresponded to characteristic p450 consensus sequence Ala/Gly-Gly-X-Asp/Glu-Thr/Ser (AGxD/ET), Glu-X-X-Arg (ExxR), Pro-X-Arg-X (PxRx) and Phe-X-X-Gly-X-Arg-X-Cys-X-Gly (CxG) respectively (Additional file 4 and 5: Figure S2 and S3). As shown in Additional files 6 and 7: Figure S4 and S5, each clan in non-A-type had the specific arrangement of motifs, and that may be the main cause of the protein functional diversification among different clans. The specific motif 16-27-15-4-5-22-8-24-10-17-2-9-3-12-20-6-1-19-13 layout was universally conserved in CYP72 clan, motif 21-15-8-17-9-3-12-6-1-13 layout in CYP711 clan, and motif 21-15-18-14-10-17-2-3-12-6-1-13 layout in CYP51 clan. One thing to be noted is that motif 14 was a characteristic feature of CYP51 clan in both species. In CYP85 clan, the motif 21-15-4-5 arrangement within N-terminal region appeared to be specific to CYP707 family. CYP86 clan generated a unique motif 3-7-6-23-1-13 layout in its C-terminal region. Motif 26 occurred only in subfamilies CYP94E and CYP94C. Obviously, the members of clans CYP727, CYP710 and CYP74 seemed to have lost many motifs. The most highly conserved motifs 2, 3, 6 and 1 respectively represented AGxD/ET, ExxR, PxRx and CxG in non-A-type.
Phylogenetic and gene structure analysis
To dissect the evolutionary relationships of plant CYP450s, an unrooted maximum likelihood (ML) tree including seven species (C.reinhardtii, P.patens, Arabidopsis, maize, wheat, rice and poplar) was reconstructed. In the tree, there exist 91 families as defined in the standardized nomenclature, of which 19 are green alga-specific, 15 are moss-specific and the remaining 57 encompass the CYP450 diversity existing in the five angiosperms. As illustrated in Additional file 8: Figure S6, families within the same clan are usually closely clustered together. With only a few exceptions, green algae-specific families CYP737, CYP738, CYP739 and CYP740 were clearly separated from the others in CYP85 clan; CYP743 and CYP744 families specific to green algae in CYP711 clan did not cluster with CYP711 family. It seemed that some ancient families or clans exhibited a closer evolutionary relationship, for instance, CYP710 clan clustered with CYP51 clan; CYP97 family was adjacent to some green algae-specific families in clans CYP711 and CYP85 (i.e. CYP743, -744, -737, -738, -739 and -740). Regarding the clustering of families within the same clan, some interesting phenomena were also observed. For example, CYP93, CYP705 and CYP712 were closely clustered together just as CYP735 was phylogenetically closer to CYP714 and CYP715. Similarly, CYP77, CYP89 and CYP723 showed a close evolutionary relationship in the global tree. In addition, the phylogenetic tree placed CYP83 and CYP99 inside the CYP71 family, and CYP723 inside the CYP89 family. Notably, monocot specific CYP99 family was obviously closest to poplar CYP71D subfamily.
We further analyzed exon-intron structure and splicing site mapping for discovering wheat and maize CYP450 gene structure diversity and evolutionary divergence. As displayed in Additional files 9-10: Figure S7-8, all members of CYP74 and CYP710 clans were intronless with the exception of TaCYP74A97_4D. In CYP86 clan, families CYP86, CYP94 and CYP96 were generally characterized by being intronless, whereas CYP704 family contained 4-6 exons. Within CYP71 clan (A-type), most genes had only a single phase 0 intron, while CYP701 family had more sophisticated structure due to harboring a greater number of exons, and CYP89 family was generally intronless. In contrast, families in CYP97, CYP727, CYP72 and CYP85 clans had very complex gene structures, particularly CYP85, CYP87, CYP88, CYP90, CYP707 and CYP724. Wheat CYP97A59_6A, CYP97A59_6B, CYP97A60_6A and TaCYP97A60_6B contained the maximum number of exons (16) and the most structurally complex maize CYP450s were ZmCYP97A16 and ZmCYP71T29-fusion with 15 exons.
Chromosomal distribution and gene duplication
All ZmCYP450s were unevenly scattered over the ten chromosomes, and chromosome 1 contained the most genes with the number of 47 whereas the chromosome 7 had the least with the number of 13 (Additional file 11: Table S3). The relatively high densities of ZmCYP450s were observed in chromosomes 1, 2 and 4 with approximately 0.15 gene per Mb, while chromosome 7 had the lowest density (0.07 gene per Mb). As depicted in Additional file 12: Figure S9, we defined 36 gene clusters (designated as cluster a to aj), of which 31 included tandemly repeated genes (Additional file 13: Table S4). ZmCYP72As formed two of the largest clusters (cluster q and ad) on chromosomes 3 and 8, and CYP72A also showed large clusters in Arabidopsis and rice. Among 79 pairs of genes that underwent tandem duplication, 76 were within the same subfamily except for ZmCYP709E8/ZmCYP709H1, ZmCYP709E9/ZmCYP709H1 and ZmCYP71K29/ZmCYP71Y10. A total of 29 segmental duplication events were detected, and only three pairs (ZmCYP71Y10/ZmCYP71AF7, ZmCYP709E9/ZmCYP709C14, and ZmCYP709E4/ZmCYP709C23) were from different subfamily. Taken together, seven ZmCYP450s (ZmCYP96B23, -709E9, -74A38, -92A1, -72A5, -71Y10 and -709E4) that underwent both tandem duplication and segmental duplication were identified. Duplication events in families CYP71, CYP72 and CYP709 outnumbered the others, implying their major role in expansion of maize CYP450s. The Ka/Ks of 108 duplicated gene pairs is 0.07 to 1.08, and their corresponding duplication events might occur in 0.38 to 255.05 Mya (millions of years ago). ZmCYP74A18/ZmCYP74A38 not only suffered positive selection with Ka/Ks ratio greater than 1, but also was the youngest pair that might undergo a more recent duplication estimated at 0.38 Mya. In wheat, genome wide distribution indicated the sharing of 387, 431 and 423 TaCYP450s from A, B and D sub-genomes respectively, and the remaining 46 were located on scaffolds; they were distributed on each chromosome with different frequencies (Additional file 11: Table S3). A maximum of 128 genes were positioned on chromosome 2B, while a minimum 136 on chromosome 4B. On account of the incomplete genome sequence of wheat, only 172 adjacent gene pairs were detected, including 140 pairs of tandemly duplicated TaCYP450s. The duplicated pairs having a Ka/Ks ratio between 0.08 and 0.89 could occur within last 180.05 to 3.45 Mya.
GoGe SynMap program was used to detect synteny regions among Arabidopsis, maize, rice and wheat (Additional file 13: Table S4). None of the collinear CYP450 gene pairs were detect between Arabidopsis and maize. And the absence of synteny between maize and wheat genomes involving the CYP450 gene regions could be due to unavailability of wheat complete chromosome sequences. Using the rice genome as a reference to investigate the syntenic regions, 134 ZmCYP450s (49.4%) had their syntenic counterparts in rice genome. And these pairs were mainly distributed in families CYP71 and CYP78. The Ka/Ks ratio for all the pairs varies from 0.08 to 5.53, and the estimated divergence time was approximately between 23.38 and 240.40 Mya. ZmCYP86B22/OsCYP86B3 and ZmCYP94C20/OsCYP71Z5 were under positive selection with Ka/Ks ratio greater than 1.
Functional divergence analyses of CYP51, CYP74 and CYP97 clans among wheat, maize and rice
Functionally divergent sites may contribute to explaining specific functional classes of protein family and the substrate specificity of each protein. Here, we focused on calculated Type-I and Type-II functional divergence for clans CYP51, CYP74 and CYP97, because these clans are conserved across evolution and classified in distinct subfamilies acting on different substrates. As illustrated in Fig. 1 and Additional files 14-16: Figure S10-12, each family was divided into three gene clusters named as CYP51H/CYP51G-1/CYP51G-3, CYP74A/CYP74E/CYP74F and CYP97A/CYP97B/CYP97C. In CYP51 clans, we found evidences for Type-I functional divergence between pairs CYP51H/51G-3 (θI = 0.329 ± 0.083), CYP51H/51G-1 (θI = 0.475 ± 0.052) and CYP51G-1/51G-3 (θI = 0.354 ± 0.120), indicative of site-specific altered selective constraints; a total of 6, 48 and 1 CAASs (critical amino acid sites) were detected between the three group pairs, respectively. And the coefficients of Type-II functional divergence (θII) between them were less than 0 or insignificant. The similar cases were found between CYP74 clusters as well, that is, the Type-I functional divergence tests of CYP74A/E (θI = 0.625 ± 0.113), CYP74A/F (θI = 0.220 ± 0.093) and CYP74E/F (θI = 0.391 ± 0.138) were statistically significant and identified 66, 2 and 4 CAASs respectively, while all the pairs showed no Type-II functional divergence. These findings support the hypothesis that shifts of evolutionary rate and changes of amino acid property should not uniformly act on CYP51 and CYP74 clans during long-term evolution. The degree of both types of functional divergence between CYP97A/C (θI = 0.407 ± 0.094; θII = 0.224 ± 0.042), CYP97A/B (θI = 0.503 ± 0.093; θII = 0.252 ± 0.044) and CYP97C/B (θI = 0.460 ± 0.098; θII = 0.326 ± 0.043) were remarkably significant. There are 13, 40 and 21 Type-I functional divergence-related residues for CYP97A/C, CYP97A/B and CYP97C/B, respectively, and the CAASs that may have radically changed their amino acid properties were only found in CYP97A/C. In comparison with the number of CAASs for Type-I, 54 Type-II CAASs were identified for CYP97A/C, indicating that functional divergence between CYP97A and CYP97C was mainly attributed to the changes of amino acid property.
Assigning protein secondary structure elements and homology modeling
With a better understanding of the structure/function relationship of CYP450s from wheat and maize, we deciphered the 3D structure of CYP51, CYP74 and CYP97 clans of interest by homology modeling using Phyre2 server. The results revealed that 490, 479, 462, 465, 437 and 435 residues of TaCYP51G3_2D, ZmCYP51G35, TaCYP74A98_4A, ZmCYP74A39, TaCYP97A59_6B and ZmCYP97A16 (99%, 98%, 96%, 91%, 79% and 68% of amino acid sequence) have been modelled with 100.0% confidence by the single highest scoring template. As shown in Additional file 17: Figure S13, the Ramachandran plot analysis for nine predicted models (TaCYP51G3_2D, ZmCYP51G35, TaCYP74A99_5B, ZmCYP74A39, TaCYP97A59_6B and ZmCYP97A16) showed that the vast majority of residues fell in favoured regions. And the qualities of the models were further supported by high ERRAT scores, which signify acceptable protein environment. In spite of relatively low amino acid sequence identity among different organisms, they appear to take on a similar secondary and tertiary structural fold throughout evolution (Additional files 17-20: Figure S13C, S14-16 and Fig. 2). Sixteen α-helices (A-L, B’, J’, K’, K’’) and 3 β-sheets including a five-stranded sheet (β-sheet 1), a three-stranded sheet (β-sheet 3) and a two-stranded sheet (β-sheet 2) are in common, while β-sheet 4 also containing two strands is not present in CYP74. The four-helix bundle (helices D, E, I, and L) and helices J and K form CYP450 conserved structural core. With exception of CYP74 family, the central part of the I-helix contains consensus sequence (A/GGxD/ETT/S) involved in oxygen binding. At the beginning of L helix occurs a heme binding region FxxGxRxCxG with the absolutely conserved cysteine that serves as fifth ligand to the heme iron. Notably, the nine-residue insertion in heme-binding loop is specific to CYP74 family. Helix K contains the conserved ExxR motif that is on the proximal side of the heme. Furthermore, Glu and Arg of ExxR motif and the Arg in the “PxRx” consensus sequence comprise E-R-R triade that is probably involved in locking the heme pockets into position and stabilizing the core structure. We also identified six putative SRS (substrate recognition sites) regions which are involved in recognition and binding of substrates according to Gotoh’s predicted models. The highly variable loop region between helices B and C lines the SRS1; helices F and G and their loop house SRS2 and SRS3; SRS4 lies in I helix and SRS5 in β1-4; and the β-turn between β4-1 and β4-2 or between β3-1 and β3-2 protrudes into the SRS6.
Expression profiles of CYP450 genes in various organs and developmental stages
To gain insights into the spatial and temporal expression patterns of CYP450s, we investigated the expression profiles of TaCYP450s in root, stem, leaf, inflorescence and grain at three developmental stages; ZmCYP450s in root, ear, tassel, pollen, embryo, and endosperm. A total of 403 TaCYP450s and 101 ZmCYP450s were chosen for expression analysis, because their expression values were ≥10 TPM (Transcripts Per Kilobase Million) in at least one tissue (Additional files 21-23: Table S5, Table S6 and Figure S17). A comparison of gene expression profiles among different organs revealed that wheat CYP450 transcripts were detected in all tissues, but the highest in root, which was followed by stem, leaf, inflorescence and grain (Fig. 3a). Thereinto, TaCYP73A17_3A, -73A17_3B, -73A17_3D, -74E7_6A and -76H2_7D displayed peak transcript levels in root. TaCYP89J13_2A, -73A205_4B and -71F39_2D were specifically and abundantly expressed throughout root development. Also, in maize, more CYP450s were expressed in high quantity in root as compared to other organs such as ZmCYP73A7 (318 TPM), ZmCYP707A5 (275 TPM), -71C2v2 (173 TPM), -98A29 (162 TPM), -71C5 (139 TPM) (Fig. 3b). While in endosperm and pollen, a large majority of ZmCYP450s are transcribed at low levels. Only 4 genes (ZmCYP724B3, -72A353, -707A116 and -71C5) in endosperm and 3 (ZmCYP84A34, -71T29-fusion and -94C69) in pollen displayed medium level expression (10 to 20 TPM). Additionally, in both species, a considerable proportion of CYP450s (356 TaCYP450s, 76 ZmCYP450s) showed high alterations in expression levels among different tissues (CV (coefficient of variation) >100%) suggesting that they may have more specific functions. For example, TaCYP84A97_1B, -75B125_7D, -93G18_2B and -93G18_2D expressed predominantly in stem, particularly in SFL.02 (Stem at 1/2 of flowers open stage). Expression of TaCYP71C162_5A, -71Y18_1B, -71Y18_1D and -71Y18_1A had a high degree of specificity to FR (Fruit at whole plant fruit ripening stage) with TPM value >300. ZmCYP89M2, -81N5 and -78A131 showed organ-specific expression in embryo; ZmCYP96B18, -86A35 and -92A47 in tassel; and ZmCYP707A5, -72A5, -71C62, -92A1, -72A123, -75B89, -94B41, -728C14, -89B19 in root. The transcripts of ZmCYP94C69 and -84A34 were specifically detected in pollen and root. Only a tiny fraction of genes had a relatively stable expression, of which TaCYP727A2_2A and TaCYP96B64/72_3B exhibited the least variation with a CV value of 42%. The average expression level of TaCYP96B64/72_3B is approximately three times higher than that of TaCYP727A2_2A.
And then we compared the expression profiles of the homologous genes in wheat and maize. The results showed that members of CYP73A in both species showed preferential expression patterns in the root. Transcripts of TaCYP51Hs (8) were mainly detected in root, especially TaCYP51H39_5B, -51H43_5B, -51H49_4B and -51H49_4D, while ZmCYP51H12 showed significant expression divergence, which were predominantly expressed in embryo. TaCYP99As without TaCYP99A42_5D were highly expressed in the specific tissues, root, whereas ZmCYP99As had no expression in any tissues but slight expression (1<TPM<10) in root. Also, the expressional diversity of members from CYP78A in different organs and species was found. Specifically, TaCYP78A263_4B, TaCYP78A263_4D, TaCYP78A263_5A showed preferential transcript accumulation in root; TaCYP78A265_2D, TaCYP78A265_7B in stem; TaCYP78A264_5A, TaCYP78A264_5B, TaCYP78A264_5D in inflorescence. In contrast, ZmCYP78A130 expression was confined to embryo and root with much higher expression level in embryo (195 TPM). ZmCYP78A1 not only displayed high expression level in embryo with 99 TPM but also exhibited extremely high transcript abundance in ear with 880 TPM. We also detected the expression of ZmCYP78A53 mostly in ear and tassel, with the strongest expression in ear (131 TPM).
Expression profiles of CYP450 genes under drought stress
Drought as one of the prevailing abiotic stresses affects various physiological and biochemical processes of plants. To analyzed expression of CYP450s in response to drought treatment, 119 TaCYP450 and 86 ZmCYP450 genes whose expression values were ≥10 TPM in one or more conditions were selected (Additional files 24-25: Table S7 and S8). For wheat, twenty-six TaCYP450s (17 up-regulated and 9 down-regulated) and 77 (27 up-regulated and 50 down-regulated) showed significant changes in their expression levels after drought treatment at 1 h (DS1h) and 6 h (DS6h), respectively (Additional file 26: Figure S18A). Five genes with significant up-regulation at DS1h, i.e. TaCYP71T45_3B, -73A17_3B, -73A201_3A, -74A1_4A and -89E25_7B were considered as drought sensitive genes. While TaCYP701A66_U, -704A152_3A, -704A160_3A, -704A164_3D, -706C1_7B, -71AK14_5A, -71C10_3A, -71C10_3D, -71Y16_1D, -72A578_1B, -72A598_7A, -89E20_1A, -89E24_7A, -90B46_4B, -96B38_2A, -96B38_2D, -96B58_5A and -96B64/72_3B were significantly induced at DS6h. The transcript levels of TaCYP71E12_4A, -74A1_4D, -78A264_5B and -84A103_2A were initially elevated markedly at DS1h and then reduced at DS6h. As the drought continues, the expression of TaCYP707A5_6A, -707A5_6B, -707A5_6D, -71R11_1A, -71T45_U, -71Y18_1D, -73A17_3A, -73A201_3B and -97B4_6D were elevated continuously, whereas TaCYP714C17_5D, -71C142_2D, -71C166_6D, -71C170_6B, -71W30_5B, -76M28_2D, -88A94_7D, -90A28_5A and -90A28_5B exhibited continuously decreased expression. Of note, TaCYP71Y18_1D was the most significantly up-regulated gene in both DS1h and DS6h (9.32- and 118.60-fold), and TaCYP76M28_2D showed the largest fold change (4.06- and 39.12-fold) among all down-regulated genes. For maize, 39 ZmCYP450s were DEGs (19 up-regulated and 20 down-regulated) under dehydration stress (Additional file 26: Figure S18B). Some genes including ZmCYP81N4 (12 fold up), -81A1 (10 fold up), -71C62 (9 fold up) and -78A53 (4 fold up) were expressed only under drought treatment. And the transcripts of ZmCYP707A65 (8 fold up, 452.52 TPM) and ZmCYP71F13 (5 fold up, 208.58 TPM) massively accumulated under water deficiency treatment (Fig. 4b). In contrast, ZmCYP86A35, -93G5, -78A55, -86B21, -71C114, -86A97 and -77B2 expression dropped to undetectable level after exposure to drought.
In a comparison with CYP450s expression patterns between the two species, we found wheat differentially expressed genes (DEGs) were only detected in clans CYP51, 74, 97, 71, 72, 85 and 86, while maize DEGs in clans CYP51,727, 97, 71, 72, 85 and 86. There are more than 40 families in wheat and maize, but the DEGs only distributed in less than 26 families, and the overlapping families with DEGs between both species were CYP51, CYP71, CYP76, CYP78, CYP81, CYP89, CYP93, CYP72, CYP707, CYP96 and CYP97. Some TaCYP450s had similar expression patterns with their homologs in maize, which were principally occurred in subfamilies CYP71C, CYP93G, CYP72A, CYP704A. For instance, TaCYP71C142_2D, TaCYP71C162_5B, TaCYP71C166_6D, TaCYP71C170_6B, ZmCYP71C114, ZmCYP71C1v1, ZmCYP71C2v2, ZmCYP71C3v2 and ZmCYP71C4; TaCYP93G20_7A, TaCYP93G20_7B, TaCYP93G20_7D, ZmCYP93G5 and ZmCYP93G7; TaCYP72A578_1B, TaCYP72A598_7A, ZmCYP72A353, ZmCYP72A354, ZmCYP72A5 and ZmCYP72A16 were expressed in the same trend that the expression level was significantly decreased under drought stress. TaCYP704A152_3A, TaCYP704A160_3A, TaCYP704A164_3D and ZmCYP704A105 transcripts significantly increased in response to dehydration. And TaCYP72A598_7D showed a high level of transcript accumulation across all three conditions (Control, DS1h and DS6h) with no significant expression changes, whose average expression values were up to 111.24 TPM (Fig. 4a). And ZmCYP72A124, the homologous gene most closely related to TaCYP72A598_7D, had similar expression trend, although the expression abundance was differentiated. Meanwhile, many genes presented quite different expression profiles to their homologs in maize. For example, TaCYP96B38_2A, TaCYP96B38_2D, TaCYP96B58_5A, TaCYP96B64/72_3B and ZmCYP96B23 showed a totally contrary expression trend. TaCYP707A5_6A, TaCYP707A5_6B, TaCYP707A5_6D exhibited similar expression patterns to ZmCYP707A5 but opposite to ZmCYP707A114 and ZmCYP707A65. Finally, 14 genes were selected for qRT-PCR analysis to confirm the expression patterns (Fig. 4c). Primers are listed in Additional file 27: Table S9.
Co-expression modules of TaCYP450s
Co-expression analysis can further provide valuable clues for functional annotation of genes that have significant and as yet unappreciated roles. In total, 22 co-expression modules were identified and covered 477 TaCYP450s belonging to 39 families (Additional file 28: Table S10). These genes showed distinct expression patterns mirroring each of the modules to which they belong. To identify the major molecular and biochemical pathways and functional categories for each module, the gene ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed (Additional file 29: Table S11). Here, the discussion focused on 10 modules because they showed the most clear and distinct characteristics in terms of gene expression profiles (Fig. 5a). Purple module contained 62 TaCYP450s with the strongest expression in root. KEGG enrichment analysis revealed significant overrepresentations of phenylpropanoid biosynthesis (ko00940), phenylalanine metabolism (ko00360), metabolism of xenobiotics by cytochrome P450 metabolic pathways (ko00980) and so on (Fig. 5b and 5c). TaCYP73A17_3B, -73A205_4B and -73A205_4D were annotated into degradation of aromatic compounds pathway (ko01220) with Q-values < 0.007. Lightgreen module where 39 TaCYP450s were predominantly expressed in root was highly positively correlated with purple module. Many enriched GO terms and pathways were common such as lignin metabolic process (GO:0009808), phenylpropanoid metabolic process (GO:0009698) and metabolism of xenobiotics by cytochrome P450 metabolic pathway (ko00980). Besides, diterpenoid biosynthesis (ko00904) was unique metabolic pathway in the module (Fig. 5d and 5e), and more TaCYP450s (TaCYP73A17_3A, -73A17_3D, -73A202_7A, -73A202_7D, -73A203_2B, -73A205_5A, -73A206_7B, -75A82_6B, -75A84_4A, -701A63_2B) were annotated into the significantly enriched pathways, i.e. phenylalanine metabolism, ubiquinone and other terpenoid-quinone biosynthesis, diarylheptanoid and gingerol biosynthesis, flavonoid biosynthesis, diterpenoid biosynthesis, degradation of aromatic compounds and alpha-Linolenic acid metabolism. In orangered4 module, eight TaCYP450s had their peak expression in SSE.00 (stem at stem elongation begins stage) and SSE.02 (stem at two nodes or internodes visible stage).
Fifty-nine TaCYP450s with a maximal transcript abundance in leaf were grouped in black module, and carbon fixation in photosynthetic organisms (ko00710), photosynthesis (ko00195), carotenoid biosynthesis (ko00906) and porphyrin and chlorophyll metabolism (ko00860) were the most significantly enriched in the module. Orange module had 20 TaCYP450s exhibiting the highest transcript abundance in LCE (Leaf at cotyledon emergence stage), and aminoacyl-tRNA biosynthesis (ko00970), porphyrin and chlorophyll metabolism (ko00860), ribosome (ko03010), protein export (ko03060), RNA degradation (ko03018), ubiquinone and other terpenoid-quinone biosynthesis (ko00130), monobactam biosynthesis (ko00261), selenocompound metabolism (ko00450) and biotin metabolism (ko00780) were significantly overrepresented pathways. Four highly enriched metabolic pathways monoterpenoid biosynthesis (ko00902), glucosinolate biosynthesis (ko00966), diterpenoid biosynthesis (ko00904) and 2-Oxocarboxylic acid metabolism (ko01210) were identified in floralwhite module that harbored three L3N (Leaf at main shoot and axillary shoots visible at three nodes stage)-specific expressed TaCYP450s, namely TaCYP71V29_3B, -79A141_2A and -79A141_2B, of which the latter two were respectively annotated into glucosinolate biosynthesis and 2-Oxocarboxylic acid metabolism pathways. Terpenoid backbone biosynthesis (ko00900), plant-pathogen interaction (ko04626) and regulation of autophagy (ko04140) showed significant enrichment in tan module including 20 TaCYP450s that displayed peak transcript levels in LF1 (leaf at whole plant fruit formation stage 30 to 50%).
Cyan module contained 27 TaCYP450s whose expression peak appeared at maximum stem length reached stage of inflorescence (ISE.99) when the anthers release their pollen, and Go terms associated with pollen development (GO:0009555, GO:0010208, GO:0010584, GO:0080110) were significantly enriched, indicating the biological roles of the tissue. Fatty acid metabolism-related pathways (ko01212, ko01040, ko00062, ko00061, ko00071, ko00073) were significantly over-represented in the module. Thereinto, TaCYP86A154_2A was annotated into cutin, suberin and wax biosynthesis metabolite pathways (ko00073). Another sexual reproduction related module, magenta, possessed 16 TaCYP450s with a high degree of specificity to IFL.02. Forty-six TaCYP450s in turquoise module had been described as FR-specific in their expression. The pathways of folate and arginine biosynthesis (ko00790, ko00220), alanine, aspartate, glutamate, thiamine, ether lipid and inositol phosphate metabolism (ko00250, ko00730, ko00565, ko00562) were specifically enriched in the module.
Prediction of miRNA targets and cis-regulatory elements
To obtain in-depth understanding of the regulation mechanisms of gene expression, miRNA target and cis-regulatory elements predictions were performed (Additional file 30: Table S12). MiRNAs control gene expression through affecting both translation and stability of mRNAs. A total of 46 TaCYP450s were predicted to be regulated by 28 miRNAs in 55 miRNA-target interactions in wheat. Seven genes seemed to be targeted by multiple miRNAs, of which TaCYP51H39_5D, -71C9_4A, -71S9_2D and -75A90_6B had two regulatory miRNAs, TaCYP96E3_3D had three, and TaCYP97A59_6A had four. According to the gene expression profiles, a total of 15 predicted target genes seemed to be involved in diverse developmental processes, such as TaCYP76H2_7D, -73A201_3A, -74E7_6D, -96B53_7A, -93G18_2D, -92A150_2B and -72A597_6B. And drought-responsive TaCYP73A201_3A was targeted by tae-miR5049-3p. In maize, 11 miRNA-target interactions representing nine miRNAs and eight CYP450s were found. The transcriptional regulation of ZmCYP72A28v2 was mediated by four miRNAs. ZmCYP72A353 possibly as a target of zma-miR399e-5p was significantly up-regulated under drought stress; Root-specific ZmCYP72A28v2 was targeted by zma-miR172a, zma-miR172b-3p, zma-miR172c-3p and zma-miR172d-3p; zma-miR156j-5p acted as potential regulator of ZmCYP78A53 that had its peak expression in ear.
Cis-regulatory elements serve as binding sites for transcription factor to regulate gene expression. A promoter scan revealed 75 and 97 kinds of trans-acting binding sites in drought-responsive CYP450s in wheat (82) and maize (39), respectively. Among them, 15 elements have high frequency of occurrence (> 50) such as the drought-related ABRE (ABA responsive element), TGACG- and CGTCA-motif (MeJA-responsiveness), and MBS (MYB binding site involved in drought-inducibility). ABA has a central role in the complex cascade of drought response, and there is a hormone cross-talk in plant adaptation to drought stress. Of hormone-responsive elements, ABRE was the most abundant element of the drought-responsive genes, e.g. TaCYP707A5_6A, TaCYP707A5_6B, TaCYP707A5_6D, ZmCYP707A65, TaCYP97B4_6D, TaCYP97C2_1A, TaCYP97C2_1D and ZmCYP81N4. The second most abundant hormone-responsive element is MeJA-responsiveness element possessed by 66 TaCYP450s and 26 ZmCYP450s, among which the drought-inducible gene TaCYP74A1_4D, a homolog of JA biosynthesis-related AOS, had two MeJA response elements CGTCA- and TGACG-motif. Others hormone response elements like GA-responsive P-box, TATC-box, GARE-motif and SA-responsive TCA-element were also found to be apparently abundant. P-box resided in TaCYP88A94_7D that represents an orthologue of GA biosynthetic gene KAO1 but had significantly decreased expression. MBS was distributed in 38 TaCYP450s and 26 ZmCYP450s, such as TaCYP71R11_1A, TaCYP71C10_3D, TaCYP71Y18_1D, ZmCYP81A1, ZmCYP71C62 and ZmCYP71F13, and TaCYP71Y18_1D was the most highly induced gene under dehydration stress. Only ZmCYP51H12, TaCYP97C2_1A and TaCYP714C17_5D harbored MBSI involved in flavonoid biosynthetic gene regulation, of which ZmCYP51H12 was upregulated after exposure to drought.