DOI: https://doi.org/10.21203/rs.2.13122/v1
The oil palm is an important crop with a higher quality oil and yield potential compared to other vegetable oil crops [1]. Colombia is the fourth-largest producer worldwide with 3.8 tons of palm oil per hectare, which positions Colombia above the world’s average yield [2]. Within the Arecaceae family, the oil palm species (Elaeis guineensis), native to West Africa, is the main source of most vegetable oil [3]. However, another oil palm species native to the tropical Central and South America, known as Elaeis oleifera, is recognized for its high yield production [3]. The palm is a perennial monocot with a lifespan of approximately 25 years [4], which results in a slow breeding progress. The Colombian Corporation of Agricultural Research (Agrosavia) breeding program has focused on developing OxG interspecific crosses (E. oleifera x E. guineensis). OxG hybrids are characterized by a slow trunk growth [5], as well as tolerance to bud rot [6–9] and red ring diseases [10] in comparison to their parents. Additionally, OxG hybrids inherit the parthenocarpic fruit development of E. oleifera species, which allows the production of seedless fruits [11].
Over the last 20 years, a genetic map of the oil palm have been constructed [12]. Saturated genetic linkage maps are important for the identification of genomic regions associated with major genes and quantitative trait loci (QTLs) controlling agronomic traits. The first marker-based genetic maps for oil palm were generated using restriction fragment length polymorphisms (RFLPs) and amplified fragment length polymorphisms (AFLPs) [13, 14]. Dense genetic maps were subsequently constructed using simple sequence repeats (SSRs) and single nucleotide polymorphism (SNP) markers, which have also been used for QTL identification. Thus, Jeennor and Volkaert [15] identified a QTL associated with bunch weight using a mapping population of 69 samples and a genetic map constructed with 89 SSRs and 101 SNPs. Billotte et al. [16], used a multi-parent linkage map generated with 251 SSRs and reported QTLs associated to bunch traits. Similar approaches have enabled the identification of 164 QTLs associated with 21 oil yield component using SSR, AFLP, and RFLP markers [17].
In recent years, advances in next-generation sequencing technology have lowered the cost of DNA sequencing to the point that we can obtain millions of SNPs compared with other technologies [18, 19]. The technique genotyping by sequencing (GBS) is a rapid, low-cost, and robust approach to screen breeding populations using hundreds or thousands of SNPs [20]. Pootakham et al. [21] constructed an oil palm map using an F2 population and 1,085 SNPs derived from GBS and were able to identify QTLs for height and fruit bunch weight. Similarly, a genome-wide association analysis (GWAS), using a larger number of SNPs (4,031) derived from GBS across a diverse panel of E. guineensis, allowed to identify novel QTLs associated with the increasing in the trunk height [22].
GWAS have been proposed as a robust approach over QTL linkage mapping [23]. The use of a wide range of genetic background in GWAS analyses increases the possibility to detect QTL regions associated with traits of interest, compared to the limited genetic variation of a bi-parental mapping population [24]. However, GWAS limitations such as the effect of population structure can lead to spurious associations between a candidate marker and a phenotype [25]. To solve this, the mixed linear model incorporates the structure data (Q) and the relative kinship effects (K), resulting in the reduction of false positive associations [26].
Given the food, industrial, and medical purposes, palm oil has experienced rapid growth in economic importance and nowadays is considered the second most traded vegetable oil crop in the world after soybean [27, 28]. The increasing demand of this crop is caused by a shift away from trans-fats to healthier alternatives [29], and because its residues can be processed to produce biofuel [28]. For these reasons, the identification of specific genes involved in morphological traits such as height, foliar area, and its relationship with productivity, is becoming more critical for this crop.
Although previous studies have identified QTLs controlling morphological and yield-related traits in oil palm, these QTLs were detected using F1 intraspecific populations. Our study is the first report in which molecular markers were mapped through association analysis in an interspecific OxG hybrid populations. Our study aims to: (i) genotype an OxG oil palm mapping population; (ii) analyze genetic diversity, population structure and linkage disequilibrium; (iii) perform GWAS to identify loci or candidate genes involved in yield and other morphological traits for future use in breeding programs.
Plant material
The plant materials used for current GWAS study came from the National Germplasm Collection of Colombia, maintained at the Colombian Corporation of Agricultural Research (Agrosavia). All accessions were collected following national regulations according to the genetic resources agreement for scientific research without commercial interest No. 74, signed between Agrosavia and the Ministry of Environment and Sustainable Development from Colombia.
A population of 471 oil palm samples consisted of (62) E. oleifera, (31) E. guineensis and (378) OxG F1 interspecific hybrid was used in this study. The OxG samples were generated through eight different crossings, however the parents of these crossing are already deceased. For the purpose of this study, other E. oleifera genotypes native to the northern of Colombia and E. guineensis genotypes that have a Yamgambi and Deli origin were used to estimate the genetic relationships. The OxG and E. oleifera samples were collected from the Research Center El Mira and the E. guineensis samples were collected from the Research Center La Libertad of Agrosavia [54]. Details of the plant materials can be found in Table S1.
Genotyping and SNP calling
Genomic DNA was extracted from young leaf tissue using the DNeasy Plant Mini Kit (QIAGEN, Germany). All 471 samples were genotyped using the GBS protocol following the procedure cited by Elshire et al. [20]. GBS library preparation and sequencing were performed at the Institute of Genomic Diversity (Cornell University, Ithaca, NY, United States). Briefly, samples were digested with the methylation-sensitive restriction enzyme PstI, which has a six base pair recognition site (CTGCAG). Sequencing was performed with 100-bp single-end reads using the Illumina HiSeq 2000 platform (Illumina Inc., United States).
The raw data was demultiplexed using the standard pipeline from the Tassel v4.5.9 software [55]. Then, reads were mapped to the oil palm reference genome of E. guineensis [56] using Bowtie2 [57] with the very-sensitive option. SNP calling was performed using the following parameters: minor allele frequency (MAF) < 5%, minimum locus coverage (mnLCov) of 0.9, minimum site coverage (mnScov) of 0.7 and minimum taxon coverage (mnTCov) of 0.5. Finally, SNPs were filtered to retain 5% of missing data and biallelic SNPs with VCFtools v0.1.13 software [58]. SNP data was integrated with phenotypic data to perform the GWAS analysis. The physical positions of SNP markers used in the association analysis were obtained from the Genomsawit Website of the International Malaysian Oil Palm Genome Programme (http://gbrowse.mpob.gov.my/fgb2/gbrowse/Eg5_1/).
Genetic diversity and population structure
High quality SNP markers were used to assess oil palm genetic diversity in 471 samples with breeding coefficient (F) values and germplasm relatedness through a Neighbor-Joining tree according to the Nei’s genetic distance matrix. Genetic diversity (π), Tajima’s D, and population pairwise F-statistics (FST) were calculated using VCFtools v0.1.13 [58]. Similarly, the population structure of the 471 samples was estimated using the Admixture v1.3.0 software [59] in both unsupervised and supervised modes.
Phenotypic analysis
The OxG hybrid population (378 samples) has been planted in field and randomly distributed using a randomized complete block design with four blocks. The experimental field is located in the Research Center El Mira. The field was planted in a quincunx or triangular system with 10 meters between the plants; this planting system allowed a density of 115 oil palm trees per hectare. All plants were grown under the same standard agronomic practices.
Phenotypic data of seven morphological measurements and three yield-related traits were used in this study (Table 1). Each trait was measured according to the methodology proposed by Corley et al. [60] and Breure [61]. To assess the relationship among the studied traits, a principal component analysis (PCA) and a Pearson’s correlation were calculated. A hierarchical cluster analysis using Ward’s method was carried out to analyze the genetic relationship among samples by the use of all phenotypic variables. All analyses were performed using the R statistical package [62].
Marker-trait association analysis
Marker-trait association analysis was performed on morphological and yield-related traits using the OxG hybrid population. A unified mixed linear model with a kinship matrix and PCA results were used in the R package GAPIT (Genome Association and Prediction Integrated Tool) [63]. To remove any possible bias caused by population structure, we included the first five principal components calculated using the R package SNPrelate [64], and a relatedness (kinship) matrix in the mixed linear model. Association mapping model evaluations were based on visual observations of the Manhattan plots. Q-Q plots were plotted of the observed −log10 P-values and the expected −log10 P-values to study the appropriateness of the GWAS model. A false discovery rate (FDR) [65] was used to correct for spurious associations.
the quantile-quantile
(QQ) plot supported the appropriateness of the GWAS
model (Fig. 3a).
the quantile-quantile
(QQ) plot supported the appropriateness of the GWAS
model (Fig. 3a).
The heatmap of LD was investigated with a custom script by plotting pairwise R2 values against the physical distance (base pairs) between markers on the same chromosome.
Candidate genes identification
Genes annotations under the candidate gene regions were determined using published genome information of E. guineensis [56]. To assign putative biological functions of significant SNP markers associated with the traits, the flanking sequences of SNPs were queried against databases, such as: HMMER (https://www.ebi.ac.uk/Tools/hmmer/), NCBI (http://www.ncbi.nlm.nih.gov/), European Molecular Biology Laboratory (http://www.ebi.ac.uk/) and European Nucleotide Archive (http://www.ebi.ac.uk/ena).
Improving oil quality and increasing yield per hectare in oil palm is a major concern in the oil processing industry. Agrosavia’s palm breeding program has focused on interspecific crosses of OxG. According to Bastidas [31] the OxG hybrids of Agrosavia present heterosis in traits such as resistance to diseases, fruit number and weight, leaf length and trunk diameter. To our knowledge, this study represents the first geno-phenotypic and GWAS analysis of an OxG hybrid population. Our 378 OxG hybrid population was screened for a set of morphological and yield-related traits and genotyped with 3,776 high-quality SNPs uncovered by a GBS approach. Correlation analysis results among yield-related traits indicated that BN could be a potential and better selection criterion for production than BW in the hybrid population. It is known that the leaf emission rate determines the bunch number in a palm, being BN negative correlated with number of leaves [32]. In our study, no significant correlations were found between yield and leaf-related traits (FA, LA, LDW, LXL, RL), however, a previous study in E. oleifera and OxG found that BN can be greater that number of leaves just when oil palms carry in their genotype a prolific trait [33]. Increases in BN and BW are also expected to correlate with increased mesocarp and kernel oil yields as shown in other oil palm germplasm studies [34]. Future studies specifically related to oil yields for the OxG hybrid population should be conducted considering their importance in oil palm breeding.
GBS and its combination with GWAS has allowed the genetic dissection of variation in complex traits in many plant species [35–37]. Specifically, in oil palm the GBS technique has been used for identifying candidate genes in intraspecific populations related to oil bunch [38], average bunch weight [21, 38] and stem height [22] with the enzyme ApeKI, meanwhile the enzymes PstI-MspI have been used for oil quality traits studies [39]. We used GBS with the enzyme PstI in morphological and yield-related traits in an interspecific OxG hybrid population. The use of this enzyme allowed the discovery of new genetic variants, which according to Chung et al. [40] is one of the most important advantages of GBS.
In the present study, association mapping resulted in the identification of 12 SNPs related to 10 morphological and yield-related traits (Table 2). For morphological traits, a significant association was found for LDW on chromosome 3 explaining 10% of the phenotypic variation. This SNP was located in a mechanosensitive (MS) ion channel protein 10-like (MSL10) gene. In plants, MS ion channels have been proposed to play a wide array of roles, from the perception of touch and gravity to the osmotic homeostasis of intracellular organelles [41]. Besides, mechanoperception genes are essential for normal cell and tissue growth and development as well as for the proper response to an array of biotic and abiotic stresses [42]. A second gene associated with TD was identified on chromosome 15. This gene is involved in nucleic acid binding and has a C2H2-type Zinc finger domain. The C2H2-ZF gene family has been proposed to be involved in the formation of wood and shoot and cambium development in species such as Poplar, as well as playing a role in stress and phytohormone response [43].
For HT trait, different studies have reported associated QTLs in chromosomes 2, 6, 7 and 9 [22, 34]. In our study, we reported three candidate genes on chromosome 15, which is similar to the results reported by Pootakham et al. [21]. However, our candidate genes were positioned in the vicinity of the ones reported by Pootakham et al. [21], which highlights the importance of this region (from 19.3 to 23.6 Mbp) in the phenotypic variance of HT (Table 2). The closest gene to the SNPs S15_22553489 and S15_22553493 SNPs, corresponds to a STYK gene, which is involved in the control of stomatal movement in response to CO2 [44]. Recent studies also showed the role of STYK gene in stem diameter by increasing the number of xylary fibers in species such as Bambusa balcooa [45].
For RL and LXL traits QTLs have been reported on chromosomes 4, 2, 10 and 16 [34]. In our study, four SNPs were associated with four different candidate genes for RL on chromosome 13. The SNP S13_20856724 is the closest to the AGC3 gene and encodes different G proteins. G proteins had been reported to be involved in a wide range of developmental and physiological processes, having a high potential for yield improvement in crops such as rice [46]. The other significant association was found with the SNP S13_23674227, which is located in an extracellular ribonuclease gene (RNase gene). RNase genes have been studied for years in plants, playing an important role in plant defense mechanism [47] or plant development due to their ability to modify RNA levels, and thereby influence protein synthesis [48]. Other candidate genes were also found for RL and LXL, but further studies are necessary to determine their role in regulating these traits.
For yield-related traits, previously studies reported associated SNPs in chromosomes 1, 3, 4 and 6 [21, 38, 49]. In our study, other major associations peaks were observed on chromosomes 5 and 10, explaining 11% of the phenotypic variance. A significant SNP related to Yield and BN was located in the gene p5.00_sc00003_p0367, coding for a cation/H(+) antiporter gene. Antiporter proteins function as regulators of monovalent ions, pH homeostasis, and developmental processes in plants [50]. On chromosome 10, the gene p5.00_sc00004_p0097, associated with BW, encodes the zinc finger protein 8. The zinc finger proteins (ZFP) are a large protein family, involved in plant development, regulation of plant height, root development, flower development, seed germination, secondary wall thickening, anther development, and fruit ripening [51]. Studies conducted by Wu et al. [52] demonstrated that silencing a gene related to ZFP hampered fruit development in Nicotiana benthamiana [52]. This ZFP gene might play an important role on yield-related traits in oil palm, as shown in other plants, where overexpression of zinc finger proteins are related with higher yields in crops [53] although, further analysis are needed to determine its role in bunch weight and yield in oil palm.
In oil palm, harvesting of fruit bunches after certain age is a very difficult labor due to their tallness. For this reason, genotypes with less HT and TD are preferred among oil palm farmers. Likewise, larger foliar (RL and LDW) is related to a greater photosynthetic production which could be involve in higher productivity. But most importantly, increasing the number and weight of fruits means higher productivity per palm and therefore higher incomes for farmers. For this reason, leveraging QTLs or genes related to these traits could contribute to develop plant breeding strategies such as marker-assisted selection (MAS), that helps to select promising accessions in in earlier stages (greenhouse conditions), and therefore reduce the breeding cycle. Further work needs to be focus on the biological functions of the set of candidate genes found in our research, though the correlations identified in association studies cannot be dubbed as causations.
The present study is the first one to report significant SNPs associated with morphological and yield-related traits based on GWAS on an interspecific OxG population. Nine of these SNPs are located within chromosomes reported in previous mapping studies, however, the set of genes presented in our analysis could be of value to locate with more precision the intervals of the reported QTLs on chromosomes 13 and 15. Also, candidate genes discovered on chromosomes 3, 5 and 10 have not been reported for the studied traits. The findings from the present study will provide the groundwork for the development of marker assisted breeding in oil palm and will serve as a strong base for future functional studies to dissect high yield production in oil palm.
FDR, False-Discovery-Rate; GBS, Genotyping By Sequencing; GWAS, Genome-Wide Association Studies; LD, Linkage disequilibrium; MAS Marker-Assisted Selection; PCA, Principal Component Analysis; QTL, Quantitative Trait Loci; SNP, Single Nucleotide Polymorphism; RFLPs, Restriction Fragment Length Polymorphisms; AFLPs, Amplified Fragment Length Polymorphisms; SSRs, Simple Sequence Repeats; TD, Trunk Diameter; HT, Trunk Height; RL, Rachis Length; LDW, Leaf Dry Weight; FA, Foliar Area; LA, Leaf Area; LXL, Leaflet Per Leaf; BW, Bunch Weight; BN, bunch number.
Ethics approval and consent to participate
Not applicable.
Consent to publish
Not applicable.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Competing interests
The authors declare they have no competing interests.
Funding
Publication of this article has been funded by TV-17 supported by the Ministry of Agriculture and Rural Development of the Republic of Colombia. The funding entities had no role in study design, data collection and analysis, interpretation, decision to publish, or preparation of the manuscript.
Authors’ contributions
LSB, FEER and SBP conceptualized and conceived the project and its components. SBP, LPD, GAGM and LPM collected the samples and the trait data. JAOG and GAGM carried out the genotypic and phenotypic analysis with the supervision of OEC. JAOG and GAGM wrote the manuscript and LSB, OEC and FEER corrected and edited it. All authors reviewed and contributed to draft the manuscript as well as read and approved the final manuscript.
Acknowledgements
The authors would like to acknowledge to William Tolosa for his support on sample collection, to Jhon Berdugo for his support in the data analysis and Marco Antonio Lopez to provide the script for the LD heatmap analysis. The authors thank Joanna Kelley for assistance in revising the final version of the manuscript.
Table 1. Mean values, standard deviation (SD) and minimum and maximum values from the phenotypic traits used in this study
Category |
Trait |
Abbreviation |
Unit |
Mean |
SD |
Minimum value |
Maximum value |
|
Trunk Diameter |
TD |
cm |
88.5 |
6.0 |
62.4 |
102 |
|
Trunk Height |
HT |
cm |
250.3 |
29.5 |
133.3 |
327 |
|
Rachis length |
RL |
cm |
421.5 |
35.3 |
275.5 |
530 |
Morphological |
Leaf dry weight |
LDW |
kg |
2.2 |
0.3 |
1.3 |
3.7 |
|
Foliar area |
FA |
m2 |
385 |
78.2 |
141.3 |
617.1 |
|
Leaf area |
LA |
m2 |
8.6 |
1.3 |
4.7 |
12.7 |
|
Leaflet per leaf |
LXL |
unit |
234.8 |
14.8 |
184 |
294 |
|
Bunch weight |
BW |
kg |
6.1 |
1.8 |
1 |
19.5 |
Yield |
Bunch number |
BN |
unit |
8.8 |
5.0 |
1.0 |
27 |
|
Yield per palm |
Yield |
kg |
56.5 |
39.1 |
1.8 |
233 |
Table 2. Significant marker–trait associations for morphological and yield related traits using a mixed linear model approach.
Category |
Trait |
SNP |
Chromosome |
Position (kb) |
P-value |
MAF |
R2 |
FDR Adjusted P-values |
Candidate Gene |
SNP position relative to the candidate gene |
Candidate gene annotation |
Morphological |
LDW |
S3_30467222 |
3 |
30467222 |
1.E-04 |
0.106 |
0.107 |
0.050 |
p5.00_sc00100_p0017 |
0 |
Mechanosensitive ion channel protein 10-like (MSL10) |
TD |
S15_21239833 |
15 |
21239833 |
1.98E-05 |
0.493 |
0.097 |
0.010 |
p5.00_sc00036_p0097 |
+21.8 kb |
Nucleic acid binding |
|
HT |
S15_22347191 |
15 |
22347191 |
7.94E-05 |
0.496 |
0.098 |
0.084 |
p5.00_sc00036_p0145 |
+0.3 kb |
Paired amphipathic helix protein (PAH) |
|
S15_22553489 |
15 |
22553489 |
7.94E-05 |
0.496 |
0.098 |
0.084 |
p5.00_sc00036_p0152 |
-0.6 kb |
Serine threonine-protein kinase (STYK) |
||
S15_22553493 |
15 |
22553493 |
8.90E-05 |
0.495 |
0.098 |
0.084 |
|||||
S15_23645020 |
15 |
23645020 |
7.94E-05 |
0.496 |
0.098 |
0.084 |
p5.00_sc00036_p0217 |
0 |
Class E vacuolar protein-sorting machinery protein (VPS) |
||
RL |
S13_20856724 |
13 |
20856724 |
3.22E-05 |
0.499 |
0.074 |
0.041 |
p5.00_sc00035_p0180 |
-12.0 kb |
Guanine nucleotide-binding protein subunit gamma (AGC3) |
|
S13_23674227 |
13 |
23674227 |
3.22E-05 |
0.499 |
0.074 |
0.041 |
p5.00_sc00035_p0078 |
0 |
Extracellular ribonuclease (RNAse) |
||
S13_25522088 |
13 |
25522088 |
3.22E-05 |
0.499 |
0.074 |
0.041 |
p5.00_sc00128_p0001 |
0 |
lmbr1 domain-containing protein (IMBR1) |
||
S13_24474516 |
13 |
24474516 |
7.14E-05 |
0.497 |
0.070 |
0.067 |
p5.00_sc00035_p0043 |
0 |
Probable ran guanine nucleotide release factor-like (RANGRF) |
||
Production |
Yield |
S5_41396842 |
5 |
41396842 |
1.E-04 |
0.132 |
0.059 |
0.410 |
p5.00_sc00003_p0367 |
+29 kb |
Cation h(+) antiporter |
BW |
S10_21597426 |
10 |
21597426 |
3.E-04 |
0.499 |
0.054 |
0.316 |
p5.00_sc00036_p0097 |
-8.0 kb |
Zinc finger protein 8-like (ZFP) |
* SNP position relative to the candidate gene: SNPs upstream and downstream of candidate genes are specified with “–” and “+”, respectively. 0 indicates that SNPs are located within candidate gene.