Gene duplication is one of the most important evolutionary processes for generating genetic diversity and new functions, and it plays a vital role in adaptation and speciation. The duplication events of CsCYP450 genes were classified as segmental, tandem, proximal, singleton, or dispersed. Tandem duplication accounted for 38.6% (76), segmental duplication for 19.3% (38), proximal duplication for 9.1% (18), and dispersed duplication for 33% (65); no singletons were detected (Figure 4). These results indicate that tandem duplication was the main driving force behind the expansion of the CsCYP450 gene family. Tandem duplication is required to maintain large gene families, which can expand and contract rapidly in response to environmental changes, resulting in increased genetic complexity under favorable conditions [30]. Additionally, the chromosomal distribution pattern of CsCYP450 strongly suggested that tandem duplication promoted the expansion of CsCYP450 in cucumber. To further explore the possible evolutionary mechanism of the CsCYP450 gene family, Circos software was used to map collinear gene pairs in the cucumber genome (Figure 6).
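The percentages reported for each duplication type can be checked with a few lines of Python. This is a minimal sketch that uses only the counts given in the text; the names `counts` and `duplication_shares` are illustrative and are not part of the original analysis pipeline.

```python
# Counts of CsCYP450 duplication events by type, as reported in the text.
counts = {
    "tandem": 76,
    "segmental": 38,
    "proximal": 18,
    "dispersed": 65,
    "singleton": 0,
}


def duplication_shares(counts):
    """Return the percentage share of each duplication type, rounded to 0.1."""
    total = sum(counts.values())
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}


shares = duplication_shares(counts)
# The duplication type with the largest count.
dominant = max(counts, key=counts.get)

for kind, pct in shares.items():
    print(f"{kind:10s} {counts[kind]:3d}  {pct:5.1f}%")
print("dominant type:", dominant)
```

The shares are computed over the sum of all classified events, which is the denominator implied by the reported percentages.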
Evolutionary analysis of related species
A phylogenetic tree was constructed according to the multiple sequence alignment results to investigate the genetic relationships among the members of the CYP450 gene family in cucumber. Based on phylogenetic analysis, 165 P450 genes were classified into A-type and non-A-type P450 gene family groups (Figure 7). The A-type P450 gene family was clustered into the CYP71 clan. The non-A-type P450 gene family contained seven clans, designated as CYP85, CYP86, CYP72, CYP97, CYP711, CYP51, and CYP710.
To investigate the evolutionary relationships among cucumber CYP450 genes, we constructed a phylogenetic tree using CYP450 proteins from cucumber and Arabidopsis (Figure 7). The CYP450 proteins were classified into A-type (52.7%, 87/165) and non-A-type (47.3%, 78/165). Based on phylogeny and homology, the CYP450 gene family was further divided into 40 families and eight clans. Among these, the largest group in cucumber is the A-type CYP71 clan, with 87 members. Previous studies have shown that CYP71 is the largest CYP450 group in most plants; it contains more than half of the CYP450 genes and has diverse functions closely related to the metabolism of aromatic and aliphatic amino acid derivatives, some triterpenoid derivatives, fatty acids, alkaloids, and hormone precursors. The largest non-A-type clan in cucumber is CYP85, which comprises nine gene families: CYP86, CYP707, CYP722, CYP716, CYP718, CYP87, CYP724, CYP85, and CYP90. Notably, the CYP93, CYP701, CYP703, CYP51, CYP724, CYP718, and CYP710 families are each represented by a single cucumber gene, suggesting that each of these genes has a unique, highly conserved function. In particular, CYP51 is an ancient and conserved family for which only one member has been reported in every species studied to date. CYP51G and CYP710A encode 14α-demethylase and sterol 22-desaturase, respectively, both of which participate in sterol biosynthesis. Within the CYP703 family, the Arabidopsis mutant cyp703a2 shows impaired pollen development and partial male sterility [31]. In addition, compared with the A-type P450s, the non-A-type P450s comprise a wider range of families and show more complex variation. The non-A-type P450s may be evolutionarily older than the A-type P450s and may therefore have had more time for gene duplication and rearrangement, which would explain their more diverse composition.
We compared the P450 genes of cucumber with the P450 gene families of Arabidopsis, tomato, soybean, maize, rice, poplar, grape, and moss (Tables S3 and S4). Compared with Arabidopsis, the CYP736 family has emerged in the cucumber genome. Some studies have shown that CYP736, CYP83, and CYP81 are very similar and are strongly induced when Phytophthora sojae infects cucumber hypocotyls [14]. The CYP82 family has more members in the cucumber, soybean, and grape genomes, whereas it is absent from the rice and moss genomes. Studies have shown that the CYP82 family confers strong disease and stress resistance in cotton, and it may play a similar role in cucumber [11]. The CYP93 family is involved in the biosynthesis of soybean flavonoids, with 13 members in soybean and seven members in maize. Like Arabidopsis and rice, cucumber has few members of the CYP93 family, which further supports the species specificity of the P450 gene family.
In addition, the non-A-type CYP709, CYP702, CYP708, and CYP705 families are present in the Arabidopsis genome but not in cucumber, soybean, tomato, or the other species examined. P450 genes from these families are specific to Arabidopsis and its closest relatives, such as Brassica napus, making them the only known crucifer-specific CYP proteins. It has been suggested that CYP705 and other P450 genes form clusters in different chromosomal regions. The genes of the CYP702A, CYP705A, and CYP708A families may be involved in the modification of triterpenes. CYP705 is derived from the CYP712 family. Arabidopsis has as many as 26 CYP705 family members, whereas none occur in other species [31]. The absence of these P450 families in cucumber may reflect lineage-specific evolution.
Analysis of Ka/Ks
In genetics, Ka/Ks (or dN/dS) is the ratio of the non-synonymous substitution rate (Ka) to the synonymous substitution rate (Ks). This ratio can be used to determine whether selective pressure acts on protein-coding genes. It is generally assumed that non-synonymous substitutions are subject to natural selection, whereas synonymous substitutions are not, so comparing the two rates is informative in evolutionary analysis. The values of Ka, Ks, and Ka/Ks were calculated from coding sequence alignments using the Nei and Gojobori model implemented in the KaKs_Calculator software [32]. Ka/Ks > 1 indicates positive selection, meaning that natural selection favors changes in the protein, resulting in rapid fixation of mutant sites in the population and accelerated gene evolution; Ka/Ks = 1 indicates neutral evolution, meaning that natural selection has no effect on the mutations; Ka/Ks < 1 indicates purifying selection, meaning that natural selection effectively eliminates harmful mutations and maintains protein characteristics. The results showed that some P450 gene members in cucumber experienced significant positive selection pressure.
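The interpretation rule for Ka/Ks can be written as a small helper. The following is a minimal sketch, assuming Ka and Ks have already been estimated (for example, by KaKs_Calculator); the function name `classify_selection`, the tolerance parameter `tol`, and the example values are hypothetical and do not come from the study.

```python
def classify_selection(ka, ks, tol=0.0):
    """Interpret a Ka/Ks ratio.

    ka, ks : non-synonymous and synonymous substitution rates.
    tol    : hypothetical tolerance around 1.0 that is still treated as neutral.
    Returns "positive", "neutral", "purifying", or "undefined" (when Ks <= 0).
    """
    if ks <= 0:
        # The ratio cannot be computed without synonymous substitutions.
        return "undefined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Illustrative values only; these are not results from the paper.
examples = [(0.30, 0.10), (0.10, 0.10), (0.05, 0.50), (0.10, 0.0)]
for ka, ks in examples:
    print(ka, ks, classify_selection(ka, ks))
```

In practice, a ratio that differs from 1 only slightly is usually assessed with a statistical test rather than a fixed threshold; the `tol` argument is only a placeholder for that decision.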
Collinearity analysis
Collinearity was originally used to describe the positions of genes on the same chromosome. It now refers to the conservation of gene content and relative gene order among species derived from a common ancestor. Collinearity analysis can identify collinear homologous genes among species, help annotate protein-coding genes, and reveal evolutionary events. Collinearity between species was analyzed using MCScanX and plotted using the Circos software (Figure 9). The CYP450 genes of the cucumber, melon, and Arabidopsis genomes were jointly analyzed to study the collinear relationships among them.
Collinearity analysis identified 71 collinear gene pairs between the Arabidopsis and cucumber genomes, indicating that the gene family had expanded substantially before the divergence of the two species. There were 138 collinear gene pairs between cucumber and melon, showing that the CYP450 genes of cucumber and melon are highly conserved evolutionarily.
Among the collinear pairs between cucumber and Arabidopsis, there were one-to-one matches (e.g., Csa3G119520/AT1G01280 and Csa1G004040/AT3G48290) as well as many-to-one matches (e.g., Csa5G153010, Csa6G004550, and Csa6G514850/AT1G01600). These results show that CsCYP450 genes are relatively conserved and suggest that they originated from a common ancestor and diverged in function during evolution. Many-to-one matches were also found between cucumber and melon, such as Csa3G127130, Csa3G903510, and Csa6G079220/MELO3C006237. These results suggest that functional differentiation of these genes may have occurred in cucumber and melon during evolution.
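The distinction between one-to-one and many-to-one matches can be derived directly from a list of collinear gene pairs. The sketch below uses only the cucumber/Arabidopsis pairs named in the text; the function `match_types` is illustrative, and a real analysis would read the full pair list from the MCScanX collinearity output instead.

```python
from collections import defaultdict

# Cucumber/Arabidopsis collinear pairs named in the text.
pairs = [
    ("Csa3G119520", "AT1G01280"),
    ("Csa1G004040", "AT3G48290"),
    ("Csa5G153010", "AT1G01600"),
    ("Csa6G004550", "AT1G01600"),
    ("Csa6G514850", "AT1G01600"),
]


def match_types(pairs):
    """Label each target gene by how many query genes are collinear with it."""
    by_target = defaultdict(set)
    by_query = defaultdict(set)
    for query, target in pairs:
        by_target[target].add(query)
        by_query[query].add(target)

    labels = {}
    for target, queries in by_target.items():
        if len(queries) > 1:
            labels[target] = "many-to-one"
        elif len(by_query[next(iter(queries))]) > 1:
            # The single query gene also matches other targets.
            labels[target] = "one-to-many"
        else:
            labels[target] = "one-to-one"
    return labels


for target, label in sorted(match_types(pairs).items()):
    print(target, label)
```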
Analysis of cis-acting elements of CsCYP450 promoter
To better understand the potential regulatory mechanisms of CsCYP450 during cucumber growth and development, the cis-regulatory elements in the promoter regions of the CsCYP450 genes were identified in this study. The 2.0 kb sequence upstream of the translation initiation site of each CsCYP450 gene was scanned, and the potential functions of the cis-elements were predicted using the PlantCARE tool (Figure S1).
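Extracting the 2.0 kb region upstream of a translation initiation site depends on the strand of the gene. The following is a minimal sketch under stated assumptions: the chromosome is available as a plain string, positions are 0-based, and `start_pos` is the plus-strand index of the base corresponding to the A of the start codon. The function names and the toy sequences are hypothetical; the paper does not describe how the upstream sequences were extracted.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, start_pos, strand, length=2000):
    """Return up to `length` bases upstream of the translation start.

    chrom_seq : plus-strand chromosome sequence.
    start_pos : 0-based plus-strand index of the base matching the A of ATG.
    strand    : "+" or "-".
    The result is written 5'->3' on the coding strand of the gene and is
    shorter than `length` if the chromosome end is reached.
    """
    if strand == "+":
        begin = max(0, start_pos - length)
        return chrom_seq[begin:start_pos]
    if strand == "-":
        end = min(len(chrom_seq), start_pos + 1 + length)
        return reverse_complement(chrom_seq[start_pos + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy examples with a 3-base window instead of 2000.
plus_example = upstream_region("AAACCCATGGGG", 6, "+", length=3)
minus_example = upstream_region("TTTTCATGGCC", 6, "-", length=3)
print(plus_example, minus_example)
```

For a minus-strand gene the upstream region lies at higher plus-strand coordinates, so the slice is taken to the right of the start codon and then reverse-complemented.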
In addition to the TATA-box, CAAT-box, and other core elements, the CsCYP450 promoters contain various cis-regulatory elements related to light signal response, tissue and organ development, hormone response, defense, and stress. Many light-responsive regulatory elements were detected in the CsCYP450 promoters, such as ACE, ATCT-motif, and CAG-motif. Most genes contain at least two Box 4 elements, and the promoter region of Csa2G108690 contains 11 Box 4 elements, suggesting that these elements may be involved in the light response.
Among the elements related to plant growth and development, nine specific regulatory elements were identified, including those related to cell differentiation (CAT-box), endosperm expression (GCN4-motif), and zein metabolism regulation (O2-site). Csa5G157290, Csa5G157320, Csa3G852560, and other genes were found to contain cis-acting regulatory elements involved in circadian control, which means that their function may be affected by day length.
Among the elements related to plant hormones, 11 hormone-responsive regulatory elements were identified, which were related to abscisic acid (ABA) [33, 34], GA (GARE-motif, P-box, TATC-box), IAA (AuxRR-core, TGA-box, TGA-element), SA (TCA-element, SARE), and MeJA (CGTCA-motif, TGACG-motif). Among these, the ABA-responsive elements are the most widely distributed. The promoter regions of genes such as Csa6G088160 and Csa2G432220 were predicted to contain nine ABRE elements, which might be involved in the ABA signaling pathway. SA- and GA-responsive elements were enriched in most CsCYP450 promoters, indicating that these genes may be induced by SA and GA.
Among the biotic and abiotic stress-related elements, ARE elements were found in most of the gene promoter regions. Seven ARE elements were found in Csa3G818260, and six were found in Csa1G595860. ARE is a cis-acting element essential for anaerobic induction and may directly affect the antioxidant capacity associated with the gene. ERE elements, which could be related to the ethylene response, were found in the promoter regions of some genes, such as Csa5G615280 and Csa4G641760. MBS elements, which are MYB binding sites involved in drought induction, were identified in Csa6G366560, Csa1G039830, and other genes. In addition, a small number of genes were rich in WUN-motif, LTR, and DRE elements, suggesting that they may be involved in wound, low-temperature, and osmotic stress responses.
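Statements such as "most genes contain at least two Box 4 elements" come from counting element hits per promoter. The sketch below shows one way to tabulate such counts, assuming the PlantCARE output has already been reduced to (gene, element) pairs. The record layout, the function names, and the gene `hypothetical_gene_A` are assumptions made for illustration; only the count of 11 Box 4 elements in Csa2G108690 is taken from the text.

```python
from collections import Counter, defaultdict


def count_elements(hits):
    """Count how many times each cis-element occurs in each promoter.

    hits : iterable of (gene_id, element_name) pairs, one per predicted site.
    """
    counts = defaultdict(Counter)
    for gene, element in hits:
        counts[gene][element] += 1
    return counts


def genes_with_at_least(counts, element, minimum):
    """Return sorted gene IDs whose promoter has >= `minimum` copies of `element`."""
    return sorted(g for g, c in counts.items() if c[element] >= minimum)


# Toy input: 11 Box 4 hits for Csa2G108690 (as reported in the text) plus
# made-up hits for a hypothetical gene.
hits = [("Csa2G108690", "Box 4")] * 11
hits += [("hypothetical_gene_A", "Box 4"), ("hypothetical_gene_A", "ABRE")]

element_counts = count_elements(hits)
print(element_counts["Csa2G108690"]["Box 4"])
print(genes_with_at_least(element_counts, "Box 4", 2))
```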
These results show that CsCYP450 may play a significant role in the response to stress, hormones, and light exposure.
Expression analysis of CsCYP450 gene
Salicylic acid (SA) is an important signaling molecule and plant hormone that helps coordinate growth and development under changing environmental conditions. SA has been shown to play a key role in activating plant defense responses, such as inducing the expression of defense-related genes and the production of specific secondary metabolites [35]. In Arabidopsis, it has been confirmed that biotic stress first induces the accumulation of SA, which then binds to SA-binding protein (SABP) to form the SA-SABP complex and initiate intracellular signal transduction. After NPR1 is activated, the activation and interaction of transcription factors, such as TGA and WRKY, induce the expression of pathogenesis-related (PR) protein genes and generate stress resistance [36].
The CYP450 family plays an important role in defense reactions. Zhang et al. showed that CYP450 genes, antioxidant enzyme genes, ATP-binding cassette transporter genes, and other defense signaling genes in Salvia miltiorrhiza are significantly overexpressed under SA induction and can be used as genetic tools to study disease resistance [37]. In cucumber, the relationship between CYP450 and SA signaling is unclear. Therefore, we used transcriptome data to analyze the expression patterns of CYP450 family genes induced by SA and to explore the relationship between CYP450 and the SA signaling pathway. Among the genes annotated as P450 in the RNA-seq data, 35 were significantly upregulated or downregulated by SA (Table S5). According to the GO functional annotation, the DEGs in the cellular component category were mainly enriched in integral component of membrane (GO:0016021) and ribosome (GO:0005840). The main enriched terms in molecular function were oxidoreductase activity (GO:0016705), monooxygenase activity (GO:0004497), iron ion binding (GO:0005506), and heme binding (GO:0020037). In biological process, the most enriched terms were brassinosteroid biosynthetic process (GO:0016132), sterol metabolic process (GO:0016125), transcription initiation from RNA polymerase II promoter (GO:0006367), defense response to other organism (GO:0098542), terpenoid biosynthetic process (GO:0016114), lignin biosynthetic process (GO:0009809), and fatty acid oxidation (GO:0019395). According to the KEGG database comparison, the pathways significantly enriched in DEGs included phenylpropanoid biosynthesis; phenylalanine metabolism; flavonoid biosynthesis; biosynthesis of secondary metabolites; O-antigen nucleotide sugar biosynthesis; and cutin, suberine, and wax biosynthesis. Several studies have demonstrated the involvement of these pathways in plant disease tolerance.
Quantitative reverse transcription polymerase chain reaction (qRT-PCR) was used to verify the differentially expressed genes in different tissues. Some genes, such as Csa1G002090 and Csa1G004040, were upregulated after 12 h of SA treatment, and the expression trends in the qRT-PCR results were consistent with those from transcriptome sequencing, which supports the reliability of the transcriptome sequencing results. The slight differences between the two experimental methods may be related to differences in their procedures and underlying principles.
The expression analysis showed that most CsCYP450 genes had distinct expression patterns, although a few showed similar patterns. CsCYP450 genes were generally highly expressed in roots. Several genes (e.g., Csa3G852560 and Csa3G852630) were highly expressed in all three tissues, and their expression levels increased with SA treatment time. However, other genes, such as Csa6G514850 and Csa6G088710, were not expressed in any tissue. Some CsCYP450 genes showed extremely high expression in specific tissues, such as Csa6G366560 in leaves, Csa3G651820 in stems, and Csa7G198310 in roots. In this study, CsCYP450 genes belonging to the CYP82 family were found to be highly expressed in root, stem, and leaf tissues. In addition, most genes had distinct time- and tissue-specific expression patterns, which are presumably related to differences in the biological functions of CsCYP450 genes (Figure 10).
When the predicted functions of the promoter cis-elements were compared with the expression data, no significant correlation was found. For example, Csa7G198310 contained two TCA-elements and was significantly upregulated under SA treatment; however, some genes with TCA-element or SARE elements, such as Csa3G852610 and Csa2G435520, showed a decreasing trend or remained unchanged under SA treatment.