Identification of genes encoding trihelix transcription factors in the domestic apple genome
Genes encoding transcription factors of the trihelix family were searched for using HMMER3 with the HMM profile downloaded from the Pfam database. Candidate genes were confirmed as trihelix family members using SMART. In total, 37 genes (designated MdTH) carrying the conserved domains characteristic of this family were found in the genome of the apple cultivar Golden Delicious.
The encoded amino acid sequences range from 278 to 917 residues in length (514.5 on average). The molecular weight of MdTH proteins ranges from 31.9 to 101.3 kDa (57.7 kDa on average). Estimated isoelectric points range from 5.23 to 9.78 (7.4 on average). In silico prediction of subcellular localization indicates that 28 of the hypothetical MdTH transcription factors are located in the nucleus and 2 in chloroplasts (Table 1).
Evaluation of the chromosomal distribution of MdTH genes showed that 13 of the 17 chromosomes (all except chromosomes 1, 8, 9, and 15) carry at least one such gene. The largest number of genes (six each) is located on chromosomes 5 and 6, while chromosomes 3, 7, 13, 16, and 17 carry only one gene each. There is no clear relationship between chromosome length and the number of genes located on it (Fig. 1). Genes located on different chromosomes may be more similar to each other than genes on the same chromosome, which indicates that polyploidization played a role in the formation of the apple genome. The domestic apple genome was formed by duplication and subsequent reorganization of the nine chromosomes of a precursor species (Velasco et al. 2010). A model for the origin of the M. × domestica genome based on sequencing results suggests that chromosomes 5 and 10 originated from chromosome I of an ancestral species. This explains the high degree of similarity in the structural organization of the gene pairs MdTH8 and MdTH21, MdTH9 and MdTH22, and MdTH10 and MdTH23 (Fig. 2). Chromosomes 2 and 7, 4 and 12, 6 and 14, and 12 and 14, or individual regions of them, also share a common origin (Velasco et al. 2010). Accordingly, MdTH genes located on chromosomes 2 and 7, 4 and 12, 6 and 14, and 12 and 14, as well as on chromosomes 5 and 10, show a higher degree of homology to each other than to other genes of this family (Fig. 2).
Table 1 Trihelix family members identified in the apple tree
Gene name | Gene ID | Protein length (a.a.) | MW (Da) | pI | Subcellular localization | Chromosome
MdTH1 | MD02G1219900 | 366 | 40481.75 | 9.16 | Nucleus | 2
MdTH2 | MD02G1244500 | 491 | 53176.73 | 8.57 | | 2
MdTH3 | MD02G1247700 | 569 | 65668.97 | 6.06 | Nucleus | 2
MdTH4 | MD02G1318400 | 742 | 81180.12 | 9.75 | Nucleus | 2
MdTH5 | MD03G1089900 | 674 | 74397.21 | 9.06 | Nucleus | 3
MdTH6 | MD04G1154600 | 287 | 34785.25 | 7.82 | | 4
MdTH7 | MD04G1221900 | 658 | 74968.15 | 5.84 | Nucleus | 4
MdTH8 | MD05G1024300 | 448 | 51480.28 | 6.12 | Nucleus | 5
MdTH9 | MD05G1024400 | 665 | 74379.05 | 5.66 | Nucleus | 5
MdTH10 | MD05G1024500 | 572 | 64620.71 | 8.07 | Nucleus | 5
MdTH11 | MD05G1174500 | 291 | 32988.47 | 9.33 | Nucleus | 5
MdTH12 | MD05G1322900 | 474 | 53022.84 | 9.78 | Nucleus | 5
MdTH13 | MD05G1361500 | 412 | 46153.60 | 5.61 | Nucleus | 5
MdTH14 | MD06G1021000 | 288 | 33181.41 | 8.88 | | 6
MdTH15 | MD06G1046900 | 353 | 39091.11 | 9.32 | Nucleus | 6
MdTH16 | MD06G1127800 | 764 | 83305.38 | 5.75 | Nucleus | 6
MdTH17 | MD06G1143600 | 917 | 100966.22 | 7.62 | Chloroplast | 6
MdTH18 | MD06G1172900 | 485 | 55103.10 | 6.50 | | 6
MdTH19 | MD06G1196500 | 338 | 37845.55 | 7.74 | Nucleus | 6
MdTH20 | MD07G1068800 | 566 | 64781.83 | 5.69 | Nucleus | 7
MdTH21 | MD10G1024800 | 447 | 51549.42 | 6.11 | Nucleus | 10
MdTH22 | MD10G1025100 | 664 | 74452.43 | 6.47 | Nucleus | 10
MdTH23 | MD10G1025200 | 563 | 63544.23 | 8.56 | Nucleus | 10
MdTH24 | MD10G1163600 | 278 | 31889.18 | 7.72 | Nucleus | 10
MdTH25 | MD10G1338800 | 130 | 15328.55 | 9.79 | Nucleus | 10
MdTH26 | MD11G1017600 | 593 | 65658.60 | 9.21 | Nucleus | 11
MdTH27 | MD11G1079000 | 497 | 54971.24 | 6.35 | Nucleus | 11
MdTH28 | MD12G1018900 | 538 | 60775.83 | 6.36 | Nucleus | 12
MdTH29 | MD12G1168300 | 296 | 35422.79 | 8.35 | | 12
MdTH30 | MD12G1238000 | 656 | 73982.06 | 5.70 | Nucleus | 12
MdTH31 | MD13G1109800 | 365 | 42097.23 | 6.21 | Nucleus | 13
MdTH32 | MD14G1016900 | 532 | 60326.45 | 6.48 | Nucleus | 14
MdTH33 | MD14G1058900 | 674 | 74986.94 | 9.10 | | 14
MdTH34 | MD14G1143600 | 727 | 79127.51 | 5.23 | Nucleus | 14
MdTH35 | MD14G1158900 | 917 | 101294.49 | 8.36 | Chloroplast | 14
MdTH36 | MD16G1109700 | 382 | 43571.81 | 6.05 | | 16
MdTH37 | MD17G1017400 | 417 | 46616.93 | 5.47 | Nucleus | 17
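The chromosomal distribution described above can be re-derived from the gene IDs listed in Table 1. The following is a minimal Python sketch, not the procedure used in the study. It assumes that the two digits after the "MD" prefix of each gene ID give the chromosome number, which agrees with the chromosome column of Table 1; the helper name `chromosome_of` is hypothetical.

```python
import re
from collections import Counter

# Gene IDs of MdTH1..MdTH37, transcribed from Table 1.
GENE_IDS = [
    "MD02G1219900", "MD02G1244500", "MD02G1247700", "MD02G1318400",
    "MD03G1089900", "MD04G1154600", "MD04G1221900", "MD05G1024300",
    "MD05G1024400", "MD05G1024500", "MD05G1174500", "MD05G1322900",
    "MD05G1361500", "MD06G1021000", "MD06G1046900", "MD06G1127800",
    "MD06G1143600", "MD06G1172900", "MD06G1196500", "MD07G1068800",
    "MD10G1024800", "MD10G1025100", "MD10G1025200", "MD10G1163600",
    "MD10G1338800", "MD11G1017600", "MD11G1079000", "MD12G1018900",
    "MD12G1168300", "MD12G1238000", "MD13G1109800", "MD14G1016900",
    "MD14G1058900", "MD14G1143600", "MD14G1158900", "MD16G1109700",
    "MD17G1017400",
]

N_CHROMOSOMES = 17  # number of apple chromosomes, as stated in the text


def chromosome_of(gene_id: str) -> int:
    """Read the chromosome number from an ID of the form MD<cc>G<number>.

    Assumption: the two digits after 'MD' encode the chromosome.
    """
    match = re.match(r"MD(\d{2})G", gene_id)
    if match is None:
        raise ValueError(f"unexpected gene ID format: {gene_id}")
    return int(match.group(1))


# Number of MdTH genes on each chromosome that carries at least one.
genes_per_chromosome = Counter(chromosome_of(g) for g in GENE_IDS)

# Chromosomes with no MdTH gene, and chromosomes with exactly one.
chromosomes_without_genes = sorted(
    set(range(1, N_CHROMOSOMES + 1)) - set(genes_per_chromosome)
)
single_gene_chromosomes = sorted(
    c for c, n in genes_per_chromosome.items() if n == 1
)

for chrom in sorted(genes_per_chromosome):
    print(f"chromosome {chrom:2d}: {genes_per_chromosome[chrom]} gene(s)")
print("no MdTH genes on chromosomes:", chromosomes_without_genes)
print("single-gene chromosomes:", single_gene_chromosomes)
```

Counting from the IDs rather than from a hand-typed chromosome column keeps the tally tied to a single source of truth in the table.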
Phylogenetic analysis of trihelix proteins
To assess the phylogenetic relationships between the trihelix transcription factors of apple and those of the model plant Arabidopsis thaliana, an unrooted phylogenetic tree containing 33 hypothetical trihelix proteins of Arabidopsis thaliana and 37 of apple was constructed (Fig. 3). Each apple trihelix transcription factor belongs to one of six subfamilies (GT-1, GT-2, SH4, GTγ, GTδ, and SIP1), assigned on the basis of the conserved amino acids of the GT domain, the number of DNA-binding motifs, and the classification of homologues.
The largest cluster combines the GT-2 subfamily genes; it includes 14 apple genes and 6 Arabidopsis genes. The SH4 cluster is represented by 13 genes, eight of which are apple genes. Fifteen genes are included in the SIP1 cluster, five of which are apple genes. The GT-1 cluster combines 10 genes, six of which are apple genes. The GTγ cluster is represented by five genes, including two apple genes. The GTδ cluster is represented by two apple genes. Genes of this subfamily were not originally isolated from Arabidopsis (Gao et al. 2009); they have been described in tomato, sunflower, and rice (Song et al. 2021; Yu et al. 2015; Li et al. 2019).
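As a quick consistency check, the apple gene counts reported for the six subfamilies can be tallied and compared with the 37 MdTH genes identified. The sketch below is purely illustrative; the counts are copied from the text, and the variable names are hypothetical.

```python
# Apple (MdTH) genes per subfamily, as reported in the text.
APPLE_GENES_PER_SUBFAMILY = {
    "GT-2": 14,
    "SH4": 8,
    "GT-1": 6,
    "SIP1": 5,
    "GTγ": 2,
    "GTδ": 2,
}

# Total across subfamilies; should match the number of identified genes.
total_apple_genes = sum(APPLE_GENES_PER_SUBFAMILY.values())

# Share of each subfamily among all apple trihelix genes, in percent.
subfamily_share = {
    name: round(100 * count / total_apple_genes, 1)
    for name, count in APPLE_GENES_PER_SUBFAMILY.items()
}

largest_subfamily = max(
    APPLE_GENES_PER_SUBFAMILY, key=APPLE_GENES_PER_SUBFAMILY.get
)

print("total apple genes:", total_apple_genes)
for name, share in subfamily_share.items():
    print(f"{name}: {APPLE_GENES_PER_SUBFAMILY[name]} genes ({share}%)")
print("largest subfamily:", largest_subfamily)
```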
Analysis of MdTH gene structure and conserved motifs
MdTH proteins contain consensus sequences of conserved amino acid motifs (Fig. 4A). Sequences containing similar motifs group into clusters whose arrangement agrees with the results of the phylogenetic analysis (Fig. 3, Fig. 4A). Analysis of conserved motifs within the subfamilies showed that motifs 2, 3, 6, and 8 are found only in GT-2 subfamily members. Representatives of the GT-2 subfamily contain the largest number of motifs: four of them have all ten motifs, two have nine, and another two have eight. Motifs 5 and 10 are found in sequences of the GT-2 and GT-1 subfamilies.
Motif 1 is the most widespread and is present in all sequences. Motif 7 was found in 36 of the 37 sequences, no more than once in each. Motif 4 is the next most frequent and was found in 32 proteins. Motif 6 was found in only 6 sequences, all belonging to the GT-2 subfamily. Motif 10 is present in only six sequences belonging to two subfamilies, GT-2 and GT-1. The SH4 subfamily has the smallest number of conserved motifs, from 1 to 3.
The coding sequences of most MdTH genes are interrupted by introns, whose number varies from 1 to 16 (Fig. 4C). The number of introns in GT-1, GT-2, and SH4 subfamily members ranges from 1 to 8. Intron length and position differ both between subfamilies and within them. The exon-intron organization that differs most from the other subfamilies was found in the MdTH17 and MdTH35 genes of the GTδ subfamily, both of which carry 16 introns.
Seven genes have no introns. These include MdTH8 and MdTH21, which belong to the GTγ subfamily, and four of the five representatives of the SIP1 subfamily; the remaining SIP1 representative, MdTH2, carries six introns. The coding sequence of MdTH25, a member of the GT-2 subfamily, is also not interrupted by introns.
Identification of hypothetical cis-elements in the promoter regions of MdTH
Analysis of the nucleotide sequences upstream of the MdTH genes suggests that these genes are involved in a complex network of interactions between regulatory proteins controlling the vital activity of plant cells.
One of the key factors influencing the expression of MdTH genes is light. Elements involved in light-dependent signaling pathways were found in the promoter regions of all family members without exception (Figure 5). The most frequent among them are G-box (detected in the promoter regions of 33 of the 37 genes, or 89.2%), Box 4 (78.4%), GT-1 (64.9%), TCT-motif (56.8%), and AE-box (51.3%). The 3-AF1 binding site, AAAC-motif, A-box, ACA-motif, ACE, AT1-motif, ATC-motif, ATCT-motif, Box II, CAAT-box, CAG-motif, chs-CMA1a/2a, chs-Unit 1 m1, GA-motif, Gap-box, GATA-motif, G-Box, GT1-motif, GTGGC-motif, I-box, LAMP-element, L-box, MRE, Sp1, TCC-motif, and TCCC-motif elements were also found.
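Because every frequency in this section is a fraction of the 37 MdTH promoter regions, each reported percentage corresponds to a whole number of genes. The following minimal Python sketch converts the percentages quoted above back to gene counts; the function name `genes_from_percentage` is hypothetical, and rounding to the nearest integer is an assumption about how the percentages were derived.

```python
N_MDTH_GENES = 37  # number of MdTH genes (promoter regions) analysed

# Frequencies of the most common light-responsive elements, from the text.
LIGHT_ELEMENT_PERCENT = {
    "G-box": 89.2,
    "Box 4": 78.4,
    "GT-1": 64.9,
    "TCT-motif": 56.8,
    "AE-box": 51.3,
}


def genes_from_percentage(percent: float, n_genes: int = N_MDTH_GENES) -> int:
    """Convert a reported percentage of promoters into a gene count.

    Assumption: percentages were computed as count / n_genes and then
    rounded, so the nearest integer recovers the count.
    """
    return round(percent * n_genes / 100)


light_element_counts = {
    name: genes_from_percentage(pct)
    for name, pct in LIGHT_ELEMENT_PERCENT.items()
}

for name, count in light_element_counts.items():
    print(f"{name}: {count} of {N_MDTH_GENES} promoter regions")
```

The G-box value recovers the 33 of 37 genes stated explicitly in the text, which supports the rounding assumption.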
Expression of MdTH genes under stress conditions is suggested by the detected STRE elements (present in 91.9% of the promoter regions) and TC-rich repeats (29.8%). ARE elements involved in the response to anaerobic stress were found in the regulatory regions of 81.1% of genes; CG, involved in the response to anoxia, in 21.6%; MBS, DRE1, and DRE, which regulate gene expression under drought, in 51.4%, 5.4%, and 13.5% respectively; and LTR, involved in the response to low temperatures, in 48.6%. Possible expression of MdTH genes in response to wounding and pathogen attack is indicated by the WUN, Box S, WRE3, and WRE motifs found in the regulatory regions of 35.1%, 18.9%, 43.2%, and 5.4% of genes respectively. The regulatory regions of 67.6% of MdTH genes contain at least one of these elements (Figure 6).
Binding sites for regulatory proteins with the MYB domain were found in the promoter region of every MdTH gene. These proteins are involved in regulating the response to biotic and abiotic stress, including epigenetic control, hormonal signaling pathways, regulation of cell differentiation and shape, and phenylpropanoid biosynthesis. In addition, 97.3% of the MdTH genes contain characteristic DNA sequences for binding transcription factors of the MYC family, which are part of the jasmonic and salicylic acid hormonal regulation system and participate in coordinating plant growth and development in response to various types of stress and in other biological processes. Of the analyzed regulatory regions, 54.1% contain the W-box, the DNA sequence to which WRKY transcription factors bind. These factors perform many functions, including roles in disease and stress resistance, ontogeny, and hormonal regulation (Figure 6).
Expression of trihelix transcription factors may be induced indirectly by plant hormones: abscisic acid, as evidenced by the presence of at least one of the ABRE, ABRE2, ABRE3a, ABRE4, and AT-ABRE elements in the regulatory regions of 91.9% of genes; gibberellin (at least one of the p-box, TACT-box, and GARE elements was found in 75.7%); salicylic acid (TCA and/or as-1 elements are present in the regulatory regions of 91.9% of genes); and ethylene (ERE elements were found in 48.6% of cases). The CGTCA and TGACG sequences involved in the methyl jasmonate-induced signaling pathway were found in the promoter regions of 75.7% and 83.8% of genes respectively. At least one element associated with hormone-regulated signaling cascades was found in the promoter region of every gene (Figure 6).
The expression of MdTH genes may be tissue-specific and depend on the stage of plant development. This is evidenced by the AC1, AC2, telo-box, and F-box elements (the last of which is also involved in the response to biotic and abiotic stress); dOCT, which is involved in the regulation of growth and development; and motif I (expression in the root). Tissue- and stage-specific expression of MdTH genes may also be indicated by the following elements: AP-1, which regulates flowering; O2, associated with prolamin metabolism; MBS-I, with flavonoid biosynthesis; GCN-4, with expression in the endosperm; CAT, with expression in meristems; and HD-ZIP1, responsible for palisade mesophyll differentiation. The functions of trihelix transcription factors apparently also include the regulation of proliferation and the cell cycle, as evidenced by the detected regulatory elements re2f-1, NON, MSA-like, and e2fb.
Thus, analysis of the nucleotide sequences directly upstream of the genes encoding transcription factors of the trihelix family suggests that these genes are involved in a complex system of interacting regulatory processes essential for plant life. The detected regulatory elements suggest that MdTH genes are expressed continuously, but that the conditions in which the plant grows may change the level of expression. This level may be influenced by the stage of ontogeny, the specifics of tissues and organs, and various environmental factors, including changes in light level and various types of biotic and abiotic stress. These results are consistent with expression studies of trihelix proteins in other species, which show that most of them are expressed under normal conditions but that their expression level may change significantly under stress.
Searching for phosphorylation sites specific for mitogen-activated protein kinase (MAPK) and predicting miRNA targets
Li et al. (2015) demonstrated that in Arabidopsis, MAP KINASE4 triggered rapid phosphorylation of ASR3, a member of the trihelix family, after treatment with MAMPs (microbe-associated molecular patterns). This suggests that members of this family may be post-translationally regulated by phosphorylation. Analysis of the hypothetical MdTH proteins showed that each of them has at least one putative phosphorylation site, with serine residues predominating. Among GT-2 subfamily members, the number of putative phosphorylated serine residues ranges from 1 to 19 (more than 10 in most representatives) and that of threonine residues from 0 to 5; in SH4 subfamily members, from 2 to 49 serine residues and from 0 to 7 threonine residues; in SIP1 representatives, from 7 to 22 serine residues and from 0 to 4 threonine residues; and in GT-1 subfamily members, from 6 to 11 serine residues and from 1 to 3 threonine residues. No threonine phosphorylation sites were found in GTγ subfamily members, which have 13-15 putative serine phosphorylation sites. Representatives of GTδ each have three putative phosphorylated threonine residues and 31-40 serine residues.
miRNAs are a class of non-coding regulatory RNAs that regulate gene expression by inhibiting translation or by cleaving the target mRNA (Unver et al. 2009; Eldem et al. 2013; Zhang 2015). Analysis of the mRNAs encoding MdTH proteins showed that 21 of them contain at least one specific binding site for one or more of 75 apple miRNAs able to cleave mRNA or inhibit its translation. The largest number of binding sites for various miRNAs (32) was found in the MdTH34 mRNA sequence. MdTH12 and MdTH17 contain 19 binding sites each; MdTH36 and MdTH8, six each; MdTH5 and MdTH15, five each; MdTH35 has four; MdTH7, MdTH29, and MdTH30 have three each. MdTH4, MdTH6, and MdTH31 contain two binding sites each. MdTH1, MdTH9, MdTH14, MdTH20, MdTH28, MdTH32, and MdTH37 have one each. MdTH34 contains two copies of the binding sites for mdm-miR535b and mdm-miR535c, while all other genes contain only one copy of each binding site.
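The per-transcript counts listed above can be collected into a small lookup table to confirm the number of targeted transcripts. This is an illustrative Python sketch only; the site counts are copied from the text, and the variable names are hypothetical.

```python
TOTAL_MDTH_GENES = 37

# Number of predicted miRNA binding sites per MdTH transcript, from the text.
MIRNA_SITES_PER_TRANSCRIPT = {
    "MdTH34": 32,
    "MdTH12": 19, "MdTH17": 19,
    "MdTH36": 6, "MdTH8": 6,
    "MdTH5": 5, "MdTH15": 5,
    "MdTH35": 4,
    "MdTH7": 3, "MdTH29": 3, "MdTH30": 3,
    "MdTH4": 2, "MdTH6": 2, "MdTH31": 2,
    "MdTH1": 1, "MdTH9": 1, "MdTH14": 1, "MdTH20": 1,
    "MdTH28": 1, "MdTH32": 1, "MdTH37": 1,
}

# Transcripts with at least one predicted site, and those with none.
n_targeted = len(MIRNA_SITES_PER_TRANSCRIPT)
n_untargeted = TOTAL_MDTH_GENES - n_targeted

# Transcript carrying the most predicted binding sites.
most_targeted = max(
    MIRNA_SITES_PER_TRANSCRIPT, key=MIRNA_SITES_PER_TRANSCRIPT.get
)

# Sum of per-transcript site counts across all targeted transcripts.
total_sites = sum(MIRNA_SITES_PER_TRANSCRIPT.values())

print(f"targeted transcripts: {n_targeted} of {TOTAL_MDTH_GENES}")
print(f"transcripts without predicted sites: {n_untargeted}")
print(f"most targeted transcript: {most_targeted}")
print(f"sum of per-transcript site counts: {total_sites}")
```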
MdTH gene expression patterns in MM-106 apple rootstocks exposed to stress
Various adverse environmental factors, such as drought, high and low temperatures, and soil salinity, may affect the growth, development, and productivity of plants. To adapt to stress, plants change the expression of stress-associated genes. In the present work, we tested the response of 14 MdTH genes to certain unfavorable abiotic factors, since members of this family are known to be involved in adaptation processes in other plants (Fang et al. 2010; Li et al. 2019; Murata et al. 2002; Xie et al. 2009).
Evaluation of the expression of the 14 MdTH genes in response to drought (Fig. 7) showed that the relative expression level increased in 10 of them, most strongly in MdTH4 at the 24th hour of exposure. The expression level of three genes (MdTH21, MdTH23, and MdTH30), in contrast, tended to decrease under drought.
Lowering the temperature to 4°C caused a significant decrease in MdTH4 expression at the 2nd hour of exposure and later (Figure 7). The greatest increase in expression level was observed for the MdTH30 gene at the 4th hour. Most genes (MdTH8, MdTH9, MdTH20, MdTH21, MdTH31, MdTH35, and MdTH36) showed an increase in expression level at the 2nd and/or 4th hour of exposure. A day later, their relative expression level was, as a rule, lower than at the starting point of sampling, although not always (for the MdTH35 gene, for example, the relative expression level at the 24th hour of exposure was higher than at the zero sampling point).
To study the effect of elevated temperatures, the plants were placed in a climate chamber at 40°C. Evaluation of the expression levels of MdTH genes in MM-106 rootstocks (Figure 7) showed that the expression of most of the genes studied changed only slightly. In some genes (e.g. MdTH4, MdTH11, and MdTH22), it decreased. In three genes, it increased significantly: MdTH8 at the 24th hour of exposure, and MdTH20 and MdTH36 at the 4th hour, with MdTH36 showing the strongest change in response to elevated temperature.
Of all the stress factors investigated, exposure to a saline solution caused the most homogeneous response of the genes studied: the expression level of most of them (MdTH4, MdTH8, MdTH9, MdTH11, MdTH20, MdTH21, MdTH22, MdTH24, MdTH30, MdTH31, and MdTH36) increased at the 2nd hour of exposure. The magnitude of the response differed between genes; the greatest increase in expression level was observed for the MdTH4 and MdTH24 genes. At later sampling points, MdTH gene expression levels decreased (Figure 7).