2.1 Identification and analysis of FAD genes in the walnut genome
To identify the FAD family, we built a hidden Markov model (HMM) from Arabidopsis FAD family protein sequences and used it to search the walnut protein data, which yielded 33 candidate FAD family genes. After Pfam domain analysis, 25 FAD family genes encoding 30 protein sequences were retained and named according to their annotation information in walnut (Table 1).
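The screening step above can be sketched as a filter over HMM search hits. The example below is a minimal sketch, assuming the search was run with HMMER's hmmsearch and its tabular (--tblout) output, which the text does not state; the protein names, the model name FAD_model, the scores and the E-value cut-off of 1e-5 are all hypothetical.

```python
# Hypothetical hmmsearch --tblout excerpt; names, scores and E-values are illustrative only.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  E-value  score  bias  exp reg clu  ov env dom rep inc description
walnut_prot_A  -  FAD_model  -  1.2e-45  150.3  0.1  2.0e-45  149.6  0.1  1.0  1  0  0  1  1  1  1  hypothetical protein A
walnut_prot_B  -  FAD_model  -  0.37  12.4  0.0  0.52  11.9  0.0  1.1  1  0  0  1  1  0  0  hypothetical protein B
walnut_prot_C  -  FAD_model  -  3.0e-12  45.8  0.2  4.1e-12  45.3  0.2  1.0  1  0  0  1  1  1  1  hypothetical protein C
"""


def parse_tblout(text, max_evalue=1e-5):
    """Return (target, E-value, score) for hits at or below max_evalue.

    Column 1 is the target name, column 5 the full-sequence E-value and
    column 6 the full-sequence bit score. The cut-off is an assumption,
    not a value taken from the paper.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed rows
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    # Most significant hits first
    return sorted(hits, key=lambda hit: hit[1])


candidates = parse_tblout(SAMPLE_TBLOUT)
for target, evalue, score in candidates:
    print(f"{target}\t{evalue:.1e}\t{score}")
```

Candidates that pass such a filter would still need the Pfam domain check described above before being counted as FAD family members.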
Because the walnut genome data have been analysed only to a limited depth, the distribution of these genes across the 16 pairs of walnut chromosomes is not known. The FAD genes range in length from 1262 bp to 14280 bp. Their CDS lengths range from 453 bp to 1368 bp, the encoded proteins range from 150 to 455 amino acids, and their molecular weights (MW) range from 32.48 kDa (JrSLD-2) to 52.06 kDa (JrFAD7).
The JrFAD3-1 gene is 2402 bp long and contains 8 exons and 7 introns; its coding region is 1143 bp and encodes 380 amino acids.
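The CDS and protein lengths reported here follow a simple relation, which can be sketched as below. The sketch assumes that each reported CDS length includes the stop codon, which is consistent with the reported values but is not stated in the text; the function name is illustrative.

```python
def protein_length_from_cds(cds_bp):
    """Protein length in amino acids for a CDS of cds_bp nucleotides.

    Assumes the CDS length includes one stop codon, so the number of
    amino acids is the number of codons minus one.
    """
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    return cds_bp // 3 - 1


# (CDS length in bp, reported protein length in aa)
reported = {
    "JrFAD3-1": (1143, 380),
    "JrFAD7": (1368, 455),
    "JrFAD4": (924, 307),
}

for gene, (cds_bp, protein_aa) in reported.items():
    computed = protein_length_from_cds(cds_bp)
    print(gene, cds_bp, computed, computed == protein_aa)
```

For JrFAD3-1, 1143 / 3 = 381 codons, and removing the stop codon gives the 380 amino acids reported.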
Table 1 Basic information of the FAD family in walnut
| Gene | Gene ID | Gene length (bp) | Exon number | Intron number | CDS length (bp) | Protein length (AA) | pI | MW (kDa) |
|---|---|---|---|---|---|---|---|---|
| JrADS3 | 108998330 | 2497 | 5 | 4 | 1164 | 387 | 9.95 | 44.56 |
| JrALD | 109021249 | 2055 | 1 | 0 | 1344 | 447 | 8.56 | 51.34 |
| JrDAL | 109001694 | 4240 | 2 | 1 | 1149 | 382 | 8.78 | 44.11 |
| JrDES1-1 | 109011517 | 2103 | 2 | 1 | 981 | 326 | 7.27 | 37.91 |
| JrDES1-2 | 108990634 | 2623 | 2 | 1 | 978 | 325 | 7.83 | 37.74 |
| JrFAD2-1 | 109021930 | 3411 | 3 | 2 | 1167 | 388 | 8.72 | 44.38 |
| JrFAD2-2 | 109021930 | 3411 | 3 | 2 | 1167 | 388 | 8.72 | 44.38 |
| JrFAD2-3 | 109011954 | 6737 | 3 | 2 | 1152 | 383 | 8.83 | 43.90 |
| JrFAD2-4 | 109011954 | 6737 | 3 | 2 | 1152 | 383 | 8.83 | 43.90 |
| JrFAD3-1 | 109002248 | 2402 | 8 | 7 | 1143 | 380 | 8.26 | 43.88 |
| JrFAD3-2 | 108989905 | 2166 | 8 | 7 | 1053 | 350 | 7.79 | 40.51 |
| JrFAD3-3 | 108989905 | 2166 | 8 | 7 | 1143 | 380 | 7.37 | 43.61 |
| JrFAD3-4 | 108989903 | 2129 | 8 | 7 | 1077 | 358 | 7.8 | 41.43 |
| JrFAD3-5 | 108989903 | 2129 | 8 | 7 | 1167 | 388 | 7.39 | 44.5 |
| JrFAD4 | 108983937 | 1262 | 1 | 0 | 924 | 307 | 9.11 | 34.68 |
| JrFAD6 | 108993197 | 4917 | 10 | 9 | 1329 | 442 | 9.18 | 51.38 |
| JrFAD7 | 109007160 | 3038 | 8 | 7 | 1368 | 455 | 8.94 | 52.06 |
| JrFAD8-1 | 108994930 | 2963 | 8 | 7 | 1368 | 455 | 8.98 | 51.77 |
| JrFAD8-2 | 109015144 | 2965 | 8 | 7 | 1368 | 455 | 8.98 | 51.77 |
| JrSAD-1 | 109005061 | 5401 | 3 | 2 | 1191 | 396 | 6.44 | 45.26 |
| JrSAD-2 | 108984606 | 5286 | 4 | 3 | 1191 | 396 | 6.39 | 5.16 |
| JrSAD-3 | 108984606 | 5286 | 4 | 3 | 1074 | 357 | 5.72 | 40.99 |
| JrSAD-4 | 108984606 | 5286 | 4 | 3 | 999 | 332 | 5.4 | 38.05 |
| JrSAD-5 | 108988959 | 2413 | 4 | 3 | 1176 | 391 | 7.72 | 44.43 |
| JrSAD6 | 109012153 | 1549 | 2 | 1 | 1185 | 394 | 5.98 | 44.63 |
| JrSAD-6 | 108989429 | 2075 | 3 | 2 | 1206 | 401 | 8.71 | 45.49 |
| JrSAD-7 | 108988967 | 1981 | 3 | 2 | 1170 | 389 | 8.26 | 44.32 |
| JrSAD-8 | 109018931 | 14280 | 4 | 3 | 1002 | 333 | 6.16 | 38.08 |
| JrSLD-1 | 109021451 | 1905 | 1 | 0 | 1344 | 447 | 6.64 | 32.64 |
| JrSLD-2 | 108988023 | 1846 | 1 | 0 | 1344 | 447 | 6.42 | 32.48 |
2.2 Analyses of the evolution, exon-intron structure and motif distribution of JrFAD family members
A phylogenetic tree of the walnut and Arabidopsis FAD protein sequences was constructed in MEGA 5.0 using the maximum likelihood method (Fig. 1). The tree indicates that the FAD gene families of Juglans regia and Arabidopsis thaliana are similar and comprise four main subfamilies: the SAD desaturase subfamily, the Δ7/Δ9 desaturase subfamily, the Δ12/ω3 desaturase subfamily and the "front-end" desaturase subfamily.
So far, SAD is the only subfamily of soluble enzymes found in the FAD family; the remaining types of fatty acid desaturases are integral membrane proteins [25-26]. There are 7 copies of the SAD gene in Arabidopsis and 9 in walnut, and these genes cluster well in the combined phylogenetic tree. Within the Δ12/ω3 desaturase subfamily, six walnut ω-6 desaturases and two from Arabidopsis grouped together as Δ12 fatty acid desaturases, and three Arabidopsis FAD3 genes clustered with the eight annotated walnut ω-3 desaturase genes. Among the five walnut FAD3 genes, FAD3-1 was evolutionarily distant from the other four. A comparison of gene expression levels during walnut kernel development showed that only FAD3-1 was detectably expressed in developing kernels, whereas the other four genes were not expressed or were expressed at very low levels, indicating that FAD3-1 is a key gene encoding the enzyme that catalyses the synthesis of linolenic acid in walnut kernels.
To further investigate the structural evolution of walnut FADs, we first analysed the exon-intron structure (Fig. 2C). The analysis showed that the FAD genes contained 1 to 9 introns, except for JrALD, JrSLD-1, JrSLD-2 and JrFAD4, which contained no introns. Genes in the same subfamily have similar exon-intron structures. FAD6 differs from the other Δ12 desaturase genes in that it has the largest number of introns and exons. Both DES1 genes had two exons, while the ALD and SLD genes in the same group had only one exon. The number of exons in the SAD subfamily ranged between 2 and 4: JrSAD-8 contained 4 exons, the highest number, while SAD6, SAD-3 and SAD-4 contained two exons, and the remaining genes contained three exons.
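The exon and intron counts listed in Table 1 can be checked with a short script. The sketch below copies the counts from Table 1 and verifies that each gene has one fewer intron than exons, then lists the intron-free genes and the gene with the most introns; the function names are illustrative.

```python
# (exon number, intron number) per gene, copied from Table 1
GENE_STRUCTURE = {
    "JrADS3": (5, 4), "JrALD": (1, 0), "JrDAL": (2, 1),
    "JrDES1-1": (2, 1), "JrDES1-2": (2, 1),
    "JrFAD2-1": (3, 2), "JrFAD2-2": (3, 2), "JrFAD2-3": (3, 2), "JrFAD2-4": (3, 2),
    "JrFAD3-1": (8, 7), "JrFAD3-2": (8, 7), "JrFAD3-3": (8, 7),
    "JrFAD3-4": (8, 7), "JrFAD3-5": (8, 7),
    "JrFAD4": (1, 0), "JrFAD6": (10, 9), "JrFAD7": (8, 7),
    "JrFAD8-1": (8, 7), "JrFAD8-2": (8, 7),
    "JrSAD-1": (3, 2), "JrSAD-2": (4, 3), "JrSAD-3": (4, 3), "JrSAD-4": (4, 3),
    "JrSAD-5": (4, 3), "JrSAD6": (2, 1), "JrSAD-6": (3, 2), "JrSAD-7": (3, 2),
    "JrSAD-8": (4, 3), "JrSLD-1": (1, 0), "JrSLD-2": (1, 0),
}


def counts_are_consistent(structure):
    """True if every gene has exactly one fewer intron than exons."""
    return all(exons - introns == 1 for exons, introns in structure.values())


def intronless_genes(structure):
    """Genes whose listed intron number is zero, in alphabetical order."""
    return sorted(gene for gene, (_, introns) in structure.items() if introns == 0)


def gene_with_most_introns(structure):
    """Gene with the largest listed intron number."""
    return max(structure, key=lambda gene: structure[gene][1])


print(counts_are_consistent(GENE_STRUCTURE))
print(intronless_genes(GENE_STRUCTURE))
print(gene_with_most_introns(GENE_STRUCTURE))
```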
Subsequently, we used MEME software to analyse the conservation of these protein sequences and structures (Fig. 2B). Most of the 20 conserved motifs found (Fig. 2D) belonged to typical transmembrane helix regions and regions of unknown function in fatty acid desaturases. No conserved motif was common to all 30 members of the walnut FAD family, but the distribution of conserved motifs was very similar within each subfamily. In the SAD subfamily, all genes except JrSAD-8 contain conserved motifs 1, 3, 4, 7 and 10; in the Δ12/ω3 subfamily, all genes except JrFAD6 contain conserved motifs 2, 4, 5, 6, 8 and 9.
2.3 Expression of genes related to unsaturated fatty acid synthesis
Transcriptome sequencing showed that the expression of genes related to unsaturated fatty acid biosynthesis and arachidonic acid metabolism peaked at 110 d after anthesis. The expression of genes related to alpha-linolenic acid metabolism first decreased and then increased, reaching its maximum at 110 d after anthesis (Supplementary Fig. 1).
Seventeen genes enriched in the metabolic pathway of unsaturated fatty acid biosynthesis were selected (Fig. 3). KCS2 (encoding 3-ketoacyl-CoA thiolase) and fabG (encoding 3-oxoacyl-ACP reductase) are involved in carbon chain elongation; fadE (encoding acyl-ACP desaturase), SSI2 (encoding acyl-ACP desaturase), FAD2 (encoding fatty acid desaturase 2), FAD3-1 (encoding fatty acid desaturase 3), Acox1 (encoding peroxisomal acyl-CoA oxidase) and SAD (encoding stearoyl-ACP desaturase) are involved in the desaturation process. The 17 transcripts of these 8 genes fell roughly into two categories. In the first, expression was lower at 70 d and higher at 110 d after flowering; this category comprised 7 transcripts of fabG, KCS2, fadE, FAD2 and FAD3-1, which were mainly involved in the biosynthesis of linoleic acid, linolenic acid and isounsaturated fatty acids. In the second, expression was relatively high at 70 d after flowering and then gradually decreased; this category comprised 10 transcripts of fabG, KCS2, SAD, SSI2 and FAD2, which were mainly involved in the biosynthesis of oleic acid and linoleic acid. The expression of FAD3-1 increased rapidly from 70 d to 110 d after flowering. At 70 d after flowering, when the dehydrogenation of linoleic acid to α-linolenic acid began, the α-linolenic acid content was almost zero, although the genes encoding enzymes that catalyse this dehydrogenation were already highly expressed. With the rapid increase in the expression of FAD3-1, the content of α-linolenic acid in the kernel began to increase gradually.
2.4 Tissue-specific expression of FAD family genes
Semi-quantitative detection (Fig. 4) showed that SLD-1 was expressed in all 8 tissues, with higher expression levels in catkins, old branches, mature leaves and kernels; DAL was most highly expressed in young and mature leaves. JrFAD3-1 was expressed in catkins, young leaves and kernels, with the highest expression level in mature embryos; ADS3, SAD-2 and SAD6 were expressed only in mature embryos. It can be preliminarily concluded that, within the FAD family, the SAD desaturase subfamily and the Δ7/Δ9 desaturase subfamily are specifically expressed in the embryo, whereas the "front-end" desaturase subfamily is expressed in all tissues and the Δ12/ω3 desaturase subfamily is highly expressed in mature embryos. However, more specific expression patterns still require further research and verification.
2.5 Expression of JrFAD3-1 and accumulation of α-linolenic acid in kernels at different developmental stages
The relative expression of JrFAD3-1 in the ‘QingXiang’ kernel increased slowly from 70 days after flowering, increased rapidly after 90 days, and peaked at 100 days after flowering (Fig. 5). The expression then decreased quickly and gradually stabilized at a lower level after 120 days. We also measured the α-linolenic acid content in the same samples and found that it remained low from 70 to 95 days after flowering and accumulated gradually from 95 to 120 days after flowering. It then increased rapidly from 120 to 130 days after flowering, peaking at 40.92 mg/g (Fig. 5), after which it decreased slightly. Taken together, the two results show a 30-day lag between the peak of JrFAD3-1 expression and the peak of α-linolenic acid content. The expression of JrFAD3-1 at 70-100 days and at 100-130 days was therefore analysed. There was a significant positive correlation between the expression of JrFAD3-1 at 5 time points (90-100 days after anthesis) and the content of α-linolenic acid at 5 time points (100-130 days after anthesis), with a correlation coefficient of 0.991. It is therefore possible that the decrease in JrFAD3-1 expression and the increase in water content contribute to the decline in α-linolenic acid content in the later stage.
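A lagged correlation of this kind can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation (the text does not name the coefficient used) and pairing each expression value with the content measured a fixed number of days later. The sampling days and all values below are hypothetical and are not the measurements behind Fig. 5.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_pearson(days, expression, content, lag_days):
    """Correlate expression on day d with content on day d + lag_days.

    Returns (r, number of paired time points). Days without a matching
    lagged measurement are dropped.
    """
    content_by_day = dict(zip(days, content))
    pairs = [
        (expr, content_by_day[day + lag_days])
        for day, expr in zip(days, expression)
        if day + lag_days in content_by_day
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


# Hypothetical values for illustration only
days = [70, 80, 90, 100, 110, 120, 130]
expression = [1, 2, 5, 10, 6, 3, 2]    # relative expression, arbitrary units
content = [1, 2, 3, 5, 11, 21, 40]     # content, arbitrary units

r, n_pairs = lagged_pearson(days, expression, content, lag_days=30)
print(f"lag 30 d: r = {r:.3f} from {n_pairs} paired time points")
```

With a 30-day lag, the expression series up to its peak is compared with the content series up to its own peak, which is the comparison the lag between the two peaks suggests.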
2.6 Multiple sequence alignment of JrFAD3-1 proteins
Multiple sequence alignment of the FAD3 proteins from 32 different species was performed using MEGA 5.0. The results show that the FAD3 protein sequence is conserved among monocotyledonous plants, dicotyledonous plants and yeast (Fig. 7). Conserved domains were predicted by searching the Pfam protein domain family database, and the FAD3 proteins of all 32 species contained the fatty acid desaturase motif (Fig. 6). Comparative analysis identified 62 completely conserved regions in the 32 protein sequences. These results further support a close relationship between the primary structure of the JrFAD3-1 protein and the content of α-linolenic acid.
2.7 Phylogenetic evolution of FAD3 proteins
The retrieved homologous proteins were aligned with the JrFAD3-1 protein in MEGA 5.0 (Fig. 8). The reported sequences from Glycine max (GenBank: ABP01320.1; ABV00679.1; ACD56667.1), Phaseolus lunatus (GenBank: ADP07952.1), Vigna unguiculata (GenBank: ABY60736.1), Betula pendula (GenBank: AAN17504.1), Corylus heterophylla (GenBank: AEF80000.1), Crepis alpina (GenBank: ABA55806.1), Solanum lycopersicum (GenBank: ABX24525.1), Eucommia ulmoides (GenBank: ARQ20744.1), Linum usitatissimum (GenBank: AQT18955.1), Jatropha curcas (GenBank: ABX82798.1), Vernicia fordii (GenBank: AAC98967.2), Picea abies (GenBank: CAC18722.1), Paeonia suffruticosa (GenBank: AVZ47050.1), Salvia hispanica (GenBank: ARE67897.1), Perilla citriodora (GenBank: AQZ42316.1), Perilla frutescens (GenBank: AAD15744.1), Physaria fendleri (GenBank: AWX65625.1; AWX65624.1), Orychophragmus violaceus (GenBank: AWY93817.1), Chorispora bungeana (GenBank: AKN35208.1), Descurainia sophia (GenBank: 104815269), Brassica juncea (GenBank: ADJ58020.1), Brassica oleracea (GenBank: AGH20189.1), Brassica napus (GenBank: AAT09135.1), Sinapis alba (GenBank: AGH20189.1), Oryza sativa indica group (GenBank: BAA28358.1), Triticum aestivum (GenBank: BAA28358.1), Candida tropicalis (GenBank: ADN42964.1) and Arabidopsis thaliana (GenBank: NP_180559.1) are all homologous FAD proteins, and their domains all contain conserved amino acid residue sequences. The thirty-one plant-derived FAD3 proteins cluster into one group, and the outgroup Candida tropicalis forms a separate group; JrFAD3-1 is closely related to the proteins of Betula pendula and Corylus heterophylla, with 100% confidence. This further indicates that the conserved structure of FADs is consistent with classical plant classification.