Phenotypic identification of oleic acid content in soybean seeds
From 2018 to 2019, the oleic acid content of each soybean lines was analyzed by SPSS 22.0 software. The Oleic acid content of seeds approached the normal distribution (Fig. 1a, b). The SD of oleic acid content of soybean seeds was 6.7 in 2018 and was 7.5 in 2019 (Fig. 1c). The normal distribution is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. The results also indicated that the oleic acid content in different soybean lines is significantly different, the distribution of oleic acid content in soybean seed is continuous, which accorded with the genetic law of quantitative traits. The oleic acid content in soybean line q070 is 35.32% in 2019 and 34.84% in 2018, which was the largest one of all soybean lines. The oleic acid content in soybean line q024 is 5.49% in 2018 and 6.33% in 2019, which was the lowest content of all soybean lines.
Soybean fatty acid correlation analysis and Heritability Calculation
The multiple linear regression model was be used to analysis the relationship between different fatty acids. According to the regression coefficients of standardized multiple linear regression model, the content of soybean oleic acid is significantly negatively correlated with the content of linoleic acid and linolenic acid (Fig. 2). The correlation coefficient between oleic acid and linoleic acid is -0.660 and -0.532 (Table 1), there was a significant positive correlation between soybean oleic acid content and palmitic acid content, with a correlation coefficient 0.421. The heritability of the five fatty acids in 260 soybean lines is different, the heritability of oleic acid was 0.652 (Additional file 1: Table S1).
Table 1. Correlation between relative expression of oleic acid content in soybean
|
Oleic acid
|
Linoleic acid
|
Linolenic acid
|
Palmitic acid
|
Stearic acid
|
Oleic acid
|
Pearson Correlation
|
1
|
-0.660**
|
-0.532**
|
0.421**
|
0.454**
|
Linoleic acid
|
Pearson Correlation
|
-0.660**
|
1
|
0.571**
|
0.051**
|
-0.893**
|
Linolenic acid
|
Pearson Correlation
|
-0.532**
|
0.571**
|
1
|
0.447**
|
-0.473**
|
Palmitic acid
|
Pearson Correlation
|
0.421**
|
0.051**
|
0.447**
|
1
|
-0.043**
|
Stearic acid
|
Pearson Correlation
|
0.454**
|
-0.893**
|
-0.473**
|
-0.043**
|
1
|
Note: * indicates significant at the 0.05, ** indicates significant at the 0.01.
SNP genotyping and SNP Annotation
In our experiment, SLAF-seq technology was used to sequence soybean genomic DNA. A total of 1,102,987 SLAF tags and 2,311,337 SNP markers were obtained (Fig. 3a). According to the number of SNP markers on different chromosomes. The results of SNP distribution on chromosomes are shown in Fig. 3b. According to the location information of SNP loci in reference genome (CDS regions, gene regions or intergenic regions), the mutate loci (non-synonymous mutations) were predicted. More than 50% of the SNP were located in the intergenic region, intergenic region is a stretch of DNA sequences located between genes, Intergenic regions are different from intragenic regions (or introns). Intragenic regions are a subset of noncoding DNA. 10% of the SNP markers were located in the upstream region of genes, and 10% of the SNPs were located in the downstream region of genes. 4.99% of the SNP loci were located in the protein coding regions. The 9.18% SNP markers were located in introns (Fig. 3c).
Phylogenetic analysis, Genetic structure analysis, Principal components analysis (PCA)
From the phylogenetic tree, it can be concluded that there are two large branches, this result suggested that 260 soybean lines are from the same ancestor. However, in the process of evolution, they evolved in two directions (Fig. 4a).
The group structure, refers to the sub-populations with different gene frequencies in the studied populations. The population structure analysis can quantify the number of ancestors of the studied population, and infer the source of each sample. It is a cluster analysis method that is currently more applied and is helpful to understand the evolution process of materials. Based on the SNPs. This experiment used to analyze the soybean population structure. For the study population, the number of subpopulations pre-set in this trial was 15 (Fig. 4b). We analyzed it with EIGENSTRAT in their study of 260 soybean lines. It concluded that sample we collected can be represented as an admixture of two ancestral populations (Fig. 4c).
Based on the difference in SNPs, this experiment performed principal component analysis by EIGENSOFT software, and carried out principal component analysis on 260 soybean materials to obtain clustering of 260 soybean materials. Based on the difference in SNP markers of 260 soybean lines, the PCA result showed that the 260 soybean lines can be divided into two subgroups (Fig. 4d). PC1, PC2 and PC3 accounted for 36.43%, 33.82% and 33.12%, respectively.
Genome-wide association analysis (GWAS) for seed oleic acid
Based on the oleic acid content of 260 soybean lines, the TASSEL software (Glm model, mlm model, cmlm model), fastlmmc software, and Emmax software were used for GWAS. The SNP markers significantly correlated with the oleic acid content of soybean seeds were detected, and the LD distance was set to 8.9 kb. The Manhattan and QQ (Quantile-Quantile) diagrams for oleic acid content in the 2018 and 2019 were showed in Fig. 5 and Fig. 6. The SNP markers which are significantly correlated with the oleic acid content of soybean seeds were searched (Additional file 2: Table S2). Using 9.7 kb as the linkage disequilibrium attenuation distance, candidate genes related to soybean oleic acid traits were screened in the LD distance. In 2018, 21 candidate genes related to soybean oleic acid content were screened using genome-wide association analysis (Table. 2). In 2019, 8 candidate genes were screened by genome-wide association analysis (Table. 2). Based on GO terms, the function of genes is: 1. Major CHO metabolism, 2. cell wall, 3. lipid metabolism, 4. metal handing, 5. N-metabolism, 6. amino acid metabolism, 7. secondary metabolism, 8. hormone metabolism, 9.tetrapyrrole synthesis, 10. Stress, 11. Redox, 12.misc, 13. Protein, 14. Cell, 15. Signaling, 16. Development, 17.transport. The functional distribution of candidate genes is shown in Fig. 7. The candidate gene Glyma.11G229600.1 located on chromosome 11 was detected by GWAS during 2018 and 2019. The function of Glyma.11G229600.1 was not reported in the soybean database, according to Swissport annotation, Glyma.11G229600.1 belongs to the plant BAG protein family. The candidate gene Glyma.04G102900.1 located on chromosome 4 was also detected by GWAS in two years. There is also no functional report about the candidate gene in soybean base, according to Swissport annotation. A similar gene in Arabidopsis belongs to the plant GRAS protein family.
Table. 2.Correlation between relative expression of oleic acid content in soybean
Year
|
chromosome
|
Gene
|
Predicted Function
|
|
Length
|
Contribution rate
|
2018
|
Chr03
|
Glyma.03G054100.1
|
3PREDICTED: Glycine max TMV resistance protein N-like (LOC100805036), transcript variant X3, mRNA
|
|
687
|
0.10
|
|
|
Glyma.03G168200.3
|
3PREDICTED: Glycine max pleiotropic drug resistance protein 1-like (LOC100791601), mRNA
|
|
4662
|
0.07
|
|
Chr04
|
Glyma.04G191100.1
|
3PREDICTED: Glycine max probable pectate lyase 18-like (LOC100814679), mRNA
|
|
1657
|
0.32
|
|
|
Glyma.04G102900.1
|
|
|
2522
|
0.43
|
|
|
Glyma.04G203200.1
|
3PREDICTED: Glycine max respiratory burst oxidase homolog protein C-like (LOC100800248), mRNA
|
|
2440
|
0.08
|
|
Chr05
|
Glyma.05G155300.1
|
3PREDICTED: Glycine max ATP carrier protein 2, chloroplastic-like (LOC100797684), mRNA
|
|
1655
|
0.11
|
|
Chr07
|
Glyma.07G033100.1
|
3PREDICTED: Glycine max ADP,ATP carrier protein 1, chloroplastic-like (LOC100793284), mRNA
|
|
2317
|
0.12
|
|
|
Glyma.07G089000.1
|
3PREDICTED: Glycine max VIN3-like protein 1-like (LOC100780157), transcript variant X2, mRNA
|
|
2756
|
0.10
|
|
Chr08
|
Glyma.08G019700.1
|
3PREDICTED: Glycine max calcium-dependent protein kinase 3-like (LOC100777096), transcript variant 1, mRNA
|
|
1877
|
0.16
|
|
|
Glyma.08G185000.2
|
3PREDICTED: Glycine max probable plastid-lipid-associated protein 4, chloroplastic-like (LOC100803367), transcript variant 1, mRNA
|
|
979
|
0.15
|
|
Chr11
|
Glyma.11G229600.1
|
3PREDICTED: Glycine max DNA replication complex BAG protein,transcript variant 2, mRNA,
|
|
1257
|
0.47
|
|
Chr13
|
Glyma.13G163400.1
|
3PREDICTED: Glycine max protein S-acyltransferase 24-like (LOC100777470), misc_RNA
|
|
2490
|
0.32
|
|
Chr14
|
Glyma.14G045100.1
|
3PREDICTED: Glycine max abscisic-aldehyde oxidase-like (LOC100812604), mRNA
|
|
4517
|
0.22
|
|
Chr15
|
Glyma.15G117700.1
|
3PREDICTED: Glycine max uncharacterized LOC102666654 (LOC102666654), mRNA
|
|
693
|
0.17
|
|
|
Glyma.15G120100.1
|
3PREDICTED: Glycine max tRNA methyltransferase 10 homolog A-like (LOC100779099), mRNA
|
|
1337
|
0.10
|
|
|
Glyma.15G120200.2
|
3PREDICTED: Glycine max uncharacterized LOC102665381 (LOC102665381), mRNA
|
|
1227
|
0.08
|
|
|
Glyma.15G127500.1
|
3PREDICTED: Glycine max polygalacturonase-like (LOC100785701), mRNA
|
|
1551
|
0.10
|
|
|
Glyma.15G201700.1
|
3PREDICTED: Glycine max uncharacterized LOC100814752 (LOC100814752), mRNA
|
|
1945
|
0.11
|
|
|
Glyma.15G210100.3
|
3PREDICTED: Glycine max alpha,alpha-trehalose-phosphate synthase [UDP-forming] 1-like (LOC100797320), transcript variant X6, mRNA
|
|
3542
|
0.12
|
|
|
Glyma.15G244000.1
|
3PREDICTED: Glycine max uncharacterized LOC100814749 (LOC100814749), mRNA
|
|
1213
|
0.10
|
|
|
Glyma.15G261100.1
|
3PREDICTED: Glycine max uncharacterized LOC100801946 (LOC100801946), transcript variant X1, mRNA
|
|
3888
|
0.09
|
2019
|
Chr19
|
Glyma.19G110600.1
|
3PREDICTED: Glycine max uncharacterized LOC102659858 (LOC102659858), mRNA
|
|
1709
|
0.10
|
|
Chr02
|
Glyma.02G220300.1
|
2PREDICTED: Glycine max ataxin-2-like (LOC100788042), mRNA
|
|
1135
|
0.18
|
|
Chr04
|
Glyma.04G102900.1
|
|
|
2522
|
0.10
|
|
Chr08
|
Glyma.08G071600.1
|
2PREDICTED: Glycine max metacaspase-3-like (LOC100796113), transcript variant X2, mRNA
|
|
1839
|
0.12
|
|
Chr11
|
Glyma.11G229600.1
|
3PREDICTED: Glycine max DNA replication complex BAG protein,transcript variant 2, mRNA,
|
|
1257
|
0.47
|
|
Chr12
|
Glyma.12G224000.1
|
2PREDICTED: Glycine max uncharacterized LOC102660202 (LOC102660202), mRNA
|
|
2799
|
0.17
|
|
|
Glyma.12G227300.1
|
2PREDICTED: Glycine max DNA ligase 1-like (LOC100818049), mRNA
|
|
2728
|
0.17
|
|
Chr20
|
Glyma.20G026100.1
|
2PREDICTED: Glycine max 26S proteasome non-ATPase regulatory subunit 7 homolog A-like (LOC100816479), mRNA
|
|
896
|
0.08
|
The expression of two candidate genes in diverse rapeseed accessions and tissues
In order to validate the candidate genes significantly associated oleic acids content, we
selected two genes involved and measured their gene expression in different tissues (root, stem, leaf and seed) by using qRT-PCR. The lectin gene (GenBank: A5547-127) was used as the reference gene. The results showed that the candidate gene Glyma.11G229600.1 in soybean seedlings was expressed in different tissues, but the relative expression of genes was significantly different .The Glyma.11G229600.1 was expressed in soybean leaves, and the relative expression level ranged from 1.23-4.31, the relative expression in stems ranged from 10.21-39.56, and the relative expression in soybean roots ranged from 16.21-43.14 (Table.3). The candidate gene Glyma.11G229600.1 had the lowest relative expression (1.23) in the leaves of the soybean line q001, that has the lowest content of oleic acid. The relative expression of the Glyma.11G229600.1 in the soybean line q001 seeds was also the lowest (25.26). The candidate gene Glyma.11G229600.1 relative expression in leaves of the soybean line q353 was 4.3 times higher than in leaves of the soybean line q001 and in seeds was 10 times higher than in leaves (Additional file 3: Fig. S1). As general conclusion we can say that the relative expression of the candidate gene Glyma.11G229600.1 in different tissues of soybean seedling is significantly different. The correlation coefficient between Glyma.11G229600.1 and oleic acid content is 0.980-0.994 (P<0.01) (Additional file 4: Table. S3). The correlation coefficient strongly indicate that the candidate gene Glyma.11G229600.1 plays a positive role in regulating the oleic acid content in the seeds.
The relative expression of Glyma.04G102900.1 was also analyzed in soybean stems, roots and seeds. The Glyma.04G102900.1 was expressed in soybean leaves, and the relative expression level ranged from 9.62-44.41, the relative expression in stems ranged from 3.18-28.11 (Table.4). Specifically, in the soybean line q001, that has the lowest oleic acid content in seeds, the candidate gene Glyma.04G102900.1 showed the highest expression, which was 49.01 ((Additional file 5: Fig. S2)). In general, the relative expression of Glyma.04G102900.1 in different tissues is significantly different (P<0.05). The correlation coefficient between Glyma.04G102900.1 and oleic acid content is -0.964~-0.998(Additional file 6: Table. S4). The result suggested that Glyma.04G102900.1 is closely related to the oleic acid content, specifically showing a negative effect on the oleic acid content in seeds.
Table 3. Variance analysis of Glyma.11G229600.1 expression in different tissues of soybean
Group
|
Oleic acid content
|
Name
|
relative expression in leaves
|
relative expression in stem
|
|
relative expression in roots
|
relative expression in seed
|
|
mean
|
Sig.
|
Mean
|
Sig,
|
mean
|
Sig
|
mean
|
Sig.
|
mean
|
Sig.
|
Soybean lines with high oleic acid
|
30.01
|
A
|
q318
|
3.41
|
c
|
35.11
|
b
|
32.24
|
b
|
56.27
|
b
|
q20
|
4.08
|
c
|
36.13
|
b
|
35.79
|
b
|
56.80
|
b
|
q073
|
3.67
|
c
|
35.43
|
b
|
35.98
|
b
|
56.69
|
b
|
q176
|
4.13
|
c
|
37.12
|
a
|
40.64
|
a
|
62.06
|
a
|
q070
|
3.85
|
c
|
35.71
|
b
|
36.42
|
b
|
57.89
|
b
|
|
|
|
q68
|
4.32
|
b
|
38.65
|
a
|
40.23
|
a
|
62.43
|
a
|
|
|
|
q353
|
4.61
|
a
|
38.19
|
a
|
41.01
|
a
|
62.73
|
a
|
Soybean lines with low oleic acid
|
14.14
|
B
|
q001
|
1.23
|
d
|
10.11
|
c
|
16.60
|
c
|
25.26
|
c
|
q024
|
1.86
|
d
|
11.11
|
c
|
17.45
|
c
|
25.75
|
c
|
q020
|
1.88
|
d
|
10.43
|
c
|
18.32
|
c
|
26.65
|
c
|
q008
|
1.79
|
d
|
10.15
|
c
|
18.72
|
c
|
26.71
|
c
|
q035
|
1.85
|
d
|
11.01
|
c
|
17.99
|
c
|
26.68
|
c
|
|
|
|
T7287
|
1.34
|
d
|
10.16
|
c
|
18.11
|
c
|
26.36
|
c
|
|
|
|
q033
|
1.90
|
d
|
10.54
|
c
|
18.23
|
c
|
26.81
|
c
|
Note: The different uppercase letters indicate significant differences at P < 0.01, the different lower letters indicate significant differences at P < 0.05, as determined by Duncan’s multiple-range test
Table 4 Variance analysis of Glyma.04G102900.1 expression in different tissues of soybean
Group
|
Oleic acid content
|
Name
|
relative expression in leaves
|
relative expression in stems
|
|
relative expression in roots
|
|
relative expression in seeds
|
mean
|
Sig.
|
mean
|
Sig.
|
mean
|
Sig.
|
mean
|
Sig.
|
mean
|
Sig.
|
|
Soybean lines with low
|
10.31
|
B
|
q318
|
17.32
|
b
|
5.13
|
c
|
22.23
|
c
|
23.02
|
c
|
|
q20
|
18.21
|
b
|
6.12
|
c
|
24.15
|
c
|
24.01
|
c
|
|
q073
|
9.62
|
b
|
5.12
|
c
|
16.12
|
c
|
22.07
|
c
|
|
q176
|
10.29
|
b
|
5.61
|
c
|
18.75
|
c
|
24.08
|
c
|
|
q070
|
18.85
|
b
|
6.16
|
c
|
17.22
|
c
|
25.12
|
c
|
|
|
|
|
q68
|
17.32
|
b
|
3.18
|
c
|
17.15
|
c
|
23.25
|
c
|
|
|
|
|
q353
|
18.11
|
b
|
4.52
|
c
|
18.75
|
c
|
24.28
|
c
|
|
Soybean lines with low oleic acid
|
30.23
|
A
|
q001
|
44.41
|
a
|
28.11
|
b
|
38.34
|
a
|
49.01
|
a
|
|
q024
|
43.21
|
a
|
23.13
|
b
|
38.46
|
b
|
46.52
|
b
|
|
q020
|
41.38
|
a
|
22.41
|
b
|
38.43
|
a
|
48.32
|
a
|
|
q008
|
42.15
|
a
|
23.62
|
b
|
38.89
|
b
|
45.22
|
b
|
|
q035
|
39.25
|
a
|
24.14
|
b
|
44.56
|
b
|
45.01
|
b
|
|
|
|
|
T7287
|
41.31
|
a
|
25.21
|
b
|
40.19
|
b
|
45.05
|
b
|
|
|
|
|
q033
|
39.25
|
a
|
21.02
|
a
|
45.21
|
b
|
44.79
|
b
|
|
Note: The different uppercase letters indicate significant differences at P < 0.01, the different lower letters indicate significant differences at P < 0.05, as determined by Duncan’s multiple-range test