Improving oil quality and increasing yield per hectare in oil palm are major concerns in the oil processing industry. The Corporación Colombiana de Investigación Agropecuaria (Agrosavia), a non-profit government research institution, is committed to delivering solutions to farmers, including cultivars developed through its breeding programs, one of which targets the oil palm. Its strategy has focused on developing interspecific OxG hybrids that exhibit heterosis in traits such as disease resistance, fruit number, fruit weight, leaf length, and trunk diameter [30]. To our knowledge, this study is the first GWAS of an OxG population.
Phenotypic data
Correlation analysis of yield-related traits indicated that BN may be a better selection criterion for production than BW in the OxG population. In our study, no significant correlations between yield and leaf-related traits (FA, LA, LDW, LXL, RL) were found; however, a previous study of E. oleifera and OxG hybrids found that BN can be higher than the number of leaves, but only when oil palms are producing multiple inflorescences [31]. Increases in BN and BW are also expected to correlate with increased mesocarp and kernel oil yields, as shown in other oil palm germplasm studies [32]. Future studies aimed at improving oil yields should take this aspect of oil palm breeding into account.
Association analysis
In the current study, we generated sequencing data using GBS, a technology developed for crop plants [19]. GBS relies on restriction enzymes to generate a reduced representation of locations spread throughout the genome, which decreases genome complexity and allows samples to be genotyped rapidly using interspaced SNP markers [33] that could be linked to candidate genes responsible for important traits. GBS has gained popularity in crop research and plant breeding because of its high-throughput, low-cost genotyping, which makes it suitable for population studies, germplasm characterization, genetic improvement, and trait mapping in diverse organisms [34].
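The reduced-representation idea described above can be illustrated with a minimal in silico digest. This is only a sketch: the enzyme used in this study is not stated in this section, so ApeKI (recognition site GCWGC, cutting after the first G) is assumed purely for illustration, and the function name `digest` and the toy sequence are hypothetical.

```python
import re

# Assumption: ApeKI is used only as an illustrative enzyme.
# Its recognition site is GCWGC (W = A or T), cut after the first G.
# The lookahead makes overlapping sites detectable.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def digest(sequence, site=APEKI_SITE, cut_offset=1):
    """Split a DNA sequence at every recognition site.

    cut_offset is the number of bases between the start of the site
    and the cut position (1 for a cut after the first G).
    """
    seq = sequence.upper()
    cuts = [m.start() + cut_offset for m in site.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments that would arise from a cut at a sequence end.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Toy sequence, not real oil palm data.
    for fragment in digest("AAGCAGCTTTTGCTGCAA"):
        print(len(fragment), fragment)
```

Only reads adjacent to cut sites are sequenced, so the number and spacing of sites determine how densely the genome is sampled for SNPs.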
Association mapping identified 12 genomic regions (SNPs) related to 10 morphological and yield-related traits (Table 2). However, only five regions, associated with LDW, TD, RL, and LXL, remained significant (p ≤ 0.05) after FDR correction. Importantly, a SNP found to have a statistically significant association with a trait is not necessarily the causal DNA variant, that is, a variant that has a direct effect on the trait. The association signifies only that the SNP locus harbors a causal variant in LD with the SNP identified by the GWAS.
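To make the effect of the FDR correction concrete, the following is a minimal sketch of the Benjamini-Hochberg procedure. It is an assumption that this particular FDR method matches the one applied in the study, and the p-values in the usage example are invented for illustration; they are not the values in Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for a handful of SNP-trait tests.
    raw = [0.001, 0.008, 0.020, 0.041, 0.300]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        status = "significant" if q <= 0.05 else "not significant"
        print(f"p={p:.3f}  adjusted={q:.3f}  {status}")
```

A SNP can therefore pass a nominal threshold on its raw p-value and still drop out once the number of tests is taken into account.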
The small LD blocks in the heat map analysis suggest that the causal regions could be located near the most significant SNPs. Thus, each SNP identified in this study serves as a signpost defining an interval in the genome in which follow-up studies are needed to determine the causal variant(s).
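Pairwise LD of the kind summarized in an LD heat map is commonly expressed as r². As a rough sketch, r² between two biallelic SNPs can be estimated as the squared Pearson correlation of their genotype dosages (0, 1, or 2 copies of the alternative allele). The function name `ld_r2` and the genotype vectors below are hypothetical, and this genotype-based estimate is an assumption rather than the method used to build the heat map in this study.

```python
def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNP dosage vectors."""
    if len(dosage_a) != len(dosage_b):
        raise ValueError("both SNPs must be scored on the same samples")
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no LD information.
        return float("nan")
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical dosages for three SNPs scored in eight palms.
    snp1 = [0, 1, 2, 0, 1, 2, 1, 0]
    snp2 = [0, 1, 2, 0, 1, 2, 0, 0]
    snp3 = [2, 0, 1, 1, 2, 0, 1, 1]
    print("snp1 vs snp2:", round(ld_r2(snp1, snp2), 3))
    print("snp1 vs snp3:", round(ld_r2(snp1, snp3), 3))
```

When r² decays quickly with distance, as small LD blocks imply, a tag SNP is informative only about variants in its immediate neighborhood.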
We therefore describe the five most significant regions and the genes located within them that might be candidate genes involved in the expression of the phenotypic traits evaluated in this study. For morphological traits, a significant association was found for LDW on chromosome 3, explaining 10% of the phenotypic variation. The most significant SNP in this region was located in a mechanosensitive (MS) ion channel protein 10-like (MSL10) gene. It has been proposed that MS ion channels in plants play a wide array of roles, from facilitating the perception of touch and gravity to regulating the osmotic homeostasis of intracellular organelles [35]. In addition, mechanoperception genes are essential for the normal growth and development of cells and tissues as well as for proper responses to an array of biotic and abiotic stresses [36]. A second significant region, associated with TD, was identified on chromosome 15; it contains a gene involved in nucleic acid binding that has a C2H2-type zinc finger domain. It has been proposed that the C2H2-ZF gene family is involved in wood formation and in shoot and cambium development in species such as poplar, and that it also plays a role in stress and phytohormone responses [37].
For the RL and LXL traits, QTLs have been reported on chromosomes 2, 4, 10, and 16 [32]. In our study, three SNPs were associated with three different candidate genes for RL on chromosome 13. The SNP S13_20856724 is closest to the AGC3 gene, which encodes different G proteins. G proteins have been reported to be involved in a wide range of developmental and physiological processes and therefore have potential for facilitating yield improvement in crops such as rice [38]. The second significant association was found with the SNP S13_23674227, which is located in an extracellular ribonuclease gene (RNase gene). RNase genes in plants have been studied for years and play an essential role in plant defense [39] and development because of their ability to modify RNA levels and thereby influence protein synthesis [40]. Finally, the SNP S13_25522088 was also significantly associated with RL and LXL, but further studies are necessary to determine its role, if any, in regulating these traits.
Seven SNPs were no longer significant after the FDR correction, possibly because of the small sample size. QTL and association studies are limited by relatively small mapping populations, which result in low statistical power and thus render small- or even medium-effect QTLs statistically non-significant and difficult to detect. Such statistically underpowered populations may also suffer from severe inflation of effect size estimates (the so-called Beavis effect) [41]. Hence, increasing the population size and marker density is required to obtain estimates that are not biased by the Beavis effect and to achieve higher statistical power [41–43]; nonetheless, for perennial populations (long generation time) with limited offspring numbers, such an increase would require a considerable investment.
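The link between small populations, low power, and inflated effect sizes can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the study data: a single SNP at allele frequency 0.5 truly explains 2% of the phenotypic variance, each simulated population has 50 individuals, and significance is judged at roughly the 5% level using the Fisher z approximation for a correlation. All names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def simulate_beavis(true_r2=0.02, n=50, replicates=2000, seed=1):
    """Return (power, mean estimated r2 among significant replicates)."""
    rng = random.Random(seed)
    var_g = 0.5  # variance of a 0/1/2 dosage at allele frequency 0.5
    # Effect size chosen so the SNP explains true_r2 of the variance
    # when the residual variance is 1.
    beta = math.sqrt(true_r2 / ((1 - true_r2) * var_g))
    z_crit = 1.96  # two-sided 5% threshold under the normal approximation
    significant = []
    for _ in range(replicates):
        g = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
        y = [beta * gi + rng.gauss(0, 1) for gi in g]
        r = pearson_r(g, y)
        # Fisher z test of the null hypothesis of zero correlation.
        if abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit:
            significant.append(r * r)
    power = len(significant) / replicates
    mean_sig = sum(significant) / len(significant) if significant else float("nan")
    return power, mean_sig


if __name__ == "__main__":
    power, mean_sig = simulate_beavis()
    print(f"power: {power:.2f}")
    print(f"true r2: 0.02, mean r2 among significant hits: {mean_sig:.3f}")
```

Because a replicate is only declared significant when its estimated correlation exceeds the threshold, the variance explained among the significant hits is necessarily larger than the true value when power is low; raising `n` in the sketch narrows that gap.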
For the oil palm, harvesting fruit bunches after the palm has reached a certain age is an arduous task because of the height of the trunk. For this reason, genotypes with reduced HT and TD are preferred among oil palm farmers. Likewise, a larger foliar area (dependent on RL and LDW) is related to greater photosynthetic production, which could contribute to higher productivity. Most importantly, increasing the number and weight of fruits means higher productivity per palm and therefore a higher income for farmers. Leveraging QTLs or genes related to these traits (such as the ones we identify in this study) could therefore contribute to the development of plant breeding strategies, such as marker-assisted selection, that help with the selection of promising accessions at earlier stages (i.e., greenhouse conditions) and thereby shorten the breeding cycle. There is a need for further work focused on the biological functions of the potential candidate genes found in our research, since the associations identified in our study cannot, as yet, be regarded as causal.