Phenotypic data in the breeding program
Large phenotypic diversity was observed for many of the traits in this study. Average phenotypic values observed here for NW, KW and KR (7.09 g, 2.73 g, and 38.7%, respectively) were all slightly higher than the values for the same traits when the trees were young (6.21 g, 2.28 g, and 36.9%) (32). The moderate heritabilities suggest that selection for a number of traits will result in good genetic progress. For example, the high narrow-sense heritability observed for KR (h² = 0.74) means that the aim to select for higher KR is achievable with truncation selection. In this form of selection, trees with phenotypes or estimated breeding values below a certain threshold are excluded from parent populations, and the mean values of progeny should increase for this trait over generations (34). Results of this study differed from those of O'Connor, Hayes (32), which analysed the same population when the trees were younger (around 8 years of age). Heritability for KR was higher in mature trees than in young trees (0.62), whereas heritability for KW was lower in mature trees (0.37) than in young trees (0.53). In comparison, the difference in heritability for NW between the two studies was low (0.03), but the correlation between these phenotypes was only moderate (0.56).
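The link between heritability and progress under truncation selection can be illustrated with the breeder's equation, R = i × h² × σP. The sketch below is not part of the analysis in this study. It assumes a normally distributed trait and selection on phenotype. Only h² = 0.74 comes from the results above; the phenotypic standard deviation and the selected proportion are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Standardised selection differential i for truncation selection.

    Assumes a normally distributed trait where the top fraction
    `proportion_selected` of candidates is retained as parents.
    """
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must be between 0 and 1")
    standard_normal = NormalDist()
    # Truncation point in phenotypic standard deviation units.
    threshold = standard_normal.inv_cdf(1.0 - proportion_selected)
    # Mean of the selected tail in SD units: density at threshold / fraction kept.
    return standard_normal.pdf(threshold) / proportion_selected


def expected_response(h2: float, phenotypic_sd: float, proportion_selected: float) -> float:
    """Breeder's equation R = i * h2 * sigma_P (expected gain per generation)."""
    return selection_intensity(proportion_selected) * h2 * phenotypic_sd


# h2 = 0.74 is the KR estimate reported above. The SD of 3.0 percentage
# points and the 20% selected fraction are hypothetical.
response = expected_response(h2=0.74, phenotypic_sd=3.0, proportion_selected=0.20)
print(f"Expected gain in KR per generation: {response:.2f} percentage points")
```

Under these assumptions, a stricter threshold (a smaller selected proportion) raises the selection intensity and therefore the expected gain, at the cost of a smaller parent population.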
This study demonstrates that linear mixed models are useful for analysing phenotypic and genetic data in macadamia to identify QTLs for target traits, which is beneficial, as developing new macadamia varieties is time-consuming, laborious and expensive. Additionally, the large tree size and numbers involved in macadamia breeding mean that multiple environments are typically needed during evaluation trials. The mixed models employed in this study account for the average effect of the environment, as well as G x E interactions for some traits. Thus, the best model was fitted to the data on a trait-by-trait basis.
The current study used 4,113 SNP markers imputed with high accuracy, though analysis of LD found that LD declined rapidly over short distances (35). The number of markers in the current study is comparable with other studies in fruit trees (13, 15-17); however, the fragmented nature of the macadamia genome scaffolds means the distribution of markers across the whole genome is still unknown. Genetic linkage maps have been used to anchor scaffolds to chromosomes (Langdon et al. in preparation), and the location of scaffolds in the genome will be informative for determining locations of genes detected by SNPs in this study.
Population structure affects LD, and this needs to be accounted for in GWAS to avoid spurious associations and overestimation of allelic effects. For most traits investigated here, the QQ plots showed that only the highly significant markers deviated from the null expectation (y = x line), and did not show inflation of the observed versus expected p-values at lower significance levels. QQ plots showing this pattern indicate that population structure has been effectively accounted for by the GRM (36). One explanation for divergence from the null hypothesis (more associations detected than expected) at high p-values is polygenicity: many loci of small effect contributing to variation in the trait (37). This genetic model may explain the pattern observed for TC, where a large number of associated markers was detected even at high p-values (low significance levels). The previous study (32) did not use imputed markers, and deviations from the null hypothesis line were observed. Imputation of missing data with high accuracy can, therefore, more accurately capture the realised kinship between individuals, and, as such, produce more accurate association results.
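The QQ diagnostic described above can be sketched in a few lines. This is an illustrative outline rather than the procedure used in this study. It assumes each marker test yields a p-value from a 1-df test, and it summarises inflation with the genomic inflation factor (lambda), a common companion statistic that is not reported here. Function names are hypothetical.

```python
import math
from statistics import NormalDist, median


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    n = len(p_values)
    pairs = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected_p = rank / (n + 1)  # expected quantile under a uniform null
        pairs.append((-math.log10(expected_p), -math.log10(p)))
    return pairs


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic divided by its null median."""
    standard_normal = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square via the squared normal quantile.
    chi_square = [standard_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    null_median = standard_normal.inv_cdf(0.75) ** 2  # approximately 0.455
    return median(chi_square) / null_median


# Hypothetical p-values spread evenly over (0, 1), as expected under the null.
null_like = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(f"lambda = {genomic_inflation(null_like):.3f}")
```

A lambda close to 1, with points following the y = x line except in the extreme tail, matches the pattern interpreted above as adequate control of population structure. Polygenicity tends to lift the observed line across a broader range of p-values.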
MAS, using the findings of GWAS, is effective for traits controlled by few genes, and, as such, has little value for complex traits (38-40). However, Kelner, Costes (41) performed QTL mapping and found two clusters of QTLs related to fruit yield and cumulative yield in apple on two different linkage groups, as well as QTLs for precocity and biennial bearing. Genomic selection may be a more appropriate and accurate method to predict yield in macadamia (19).
This study identified SNP markers significantly associated with NW, WK and TC. Although no significantly associated markers were detected for KW or KR, the marker with the lowest p-value in each case should be investigated in further studies. Neither NPR nor RSN had any significant associations, which may be partly due to the very low heritability of both traits. Additionally, while there was no G x E detected in RSN, there may be a large environmental influence on the capacity of a tree to retain racemes from flowering through to nut set (27, 28).
For TC, 16 of the 44 significant markers were non-redundant, suggesting that there may be 16 QTLs controlling this trait. Multiple regression suggested that all of the markers significantly associated with NW may have detected the same or linked QTLs, with the most significant SNP (s2204) being the only non-redundant marker. The location of scaffolds in linkage groups (Nock et al. in preparation) may further aid the understanding of whether markers are in linkage disequilibrium or are separate QTLs.
A direct comparison cannot be made between SNPs found to be significantly associated with nut traits in O'Connor, Hayes (32) and the current study, as two different SNP panels were used in the analyses. However, some of the significant markers could be mapped to genome assembly scaffolds. A comparison of the locations of mapped SNPs between the two studies showed that there were no markers occupying the same scaffold (data not shown). Results from GWAS are not always consistent, with variation between populations and environments altering allelic frequencies and phenotypes. For example, differences were also found across years in apple (18), and between QTL mapping and GWAS studies in chestnut (11, 42), and this may be a consequence of limited power in these studies.
Researchers use different thresholds for determining which markers to include in their genomics studies, such as 5% MAF (11, 17), 1% MAF within-populations (43), and ten copies of the minor allele across samples (18). In the present study, markers were initially excluded with MAF <2.5%, though these statistics were calculated for each marker before imputation, and, as such, the study included markers with MAF below this threshold (MAF altered after imputation of missing calls). It was interesting, then, that all of the markers associated with NW had very low MAF. If these markers had been removed by filtering, they would not have been detected through GWAS. Associations with rare alleles should be treated with caution due to low power of detection (33), and this is the case here. Therefore, the significant markers with low MAF in the current study should be validated in independent studies, preferably with more individuals to observe whether the MAF is similar across populations of different sizes (44), as this will support the findings of this study.
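The way a marker can pass a pre-imputation MAF filter yet fall below the threshold afterwards can be shown with a small worked example. The genotype data below are hypothetical and the function names are illustrative. Only the 2.5% threshold is taken from this study. Genotypes are coded as counts of the alternate allele (0, 1 or 2), with None marking a missing call.

```python
def minor_allele_frequency(genotypes):
    """MAF from allele-count genotypes (0, 1, 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this marker")
    alt_frequency = sum(called) / (2 * len(called))
    return min(alt_frequency, 1.0 - alt_frequency)


def passes_maf_filter(genotypes, threshold=0.025):
    """Keep markers whose MAF is at or above the threshold (2.5% here)."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical marker scored on 100 trees, 30 of them with missing calls.
before_imputation = [0] * 66 + [1] * 4 + [None] * 30
# Suppose imputation assigns every missing call to the common homozygote.
after_imputation = [0] * 96 + [1] * 4

print(f"MAF before imputation: {minor_allele_frequency(before_imputation):.4f}")
print(f"MAF after imputation:  {minor_allele_frequency(after_imputation):.4f}")
print("Passes filter before:", passes_maf_filter(before_imputation))
print("Passes filter after: ", passes_maf_filter(after_imputation))
```

In this example the marker is retained because filtering happens before imputation, which is how markers with MAF below the nominal threshold can enter the association analysis.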
Demonstration of marker-assisted selection
The results of this GWAS can be used to demonstrate the implementation of MAS in the macadamia breeding program. SNPs significantly associated with commercially important traits would be ideal candidates for use in MAS. The estimates of BLUEs in the multiple regression analysis indicate the additive effect of the SNP allele at that marker on that trait. For example, the estimated effect for the SNP allele at s2204 was 0.084, meaning that genotypes carrying one copy of the SNP allele are expected to have an additional 0.084 g of nut weight attributable to additive genetic effects compared with those carrying none. The influence of additive genetic variance of these alleles was quite different from that which was observed in the raw phenotype, as the phenotype will have been influenced by non-additive genetic effects and environment. The three genotypic states for NW at SNP s2204 and for WK at marker s0201 showed clear association with phenotypic averages, though the difference in genotypic states was much lower than the additive allele effect calculated from BLUEs. The sample sizes among the three different genotypic states varied greatly in these examples, and so it is important to recognise that these findings may be biased and are only for demonstrative purposes for how MAS could be used. Simply, breeders could genotype seedling progeny from their first leaf at these key markers. Determining the allelic states at these markers would allow selection of AG heterozygotes at SNP s2204 for seedlings with predicted intermediate nuts, and AA genotype at SNP s0201 for a high percentage of whole kernels. However, with such low MAF and number of individuals in these genotypic states, these results should be interpreted with caution. Again, the SNP should be validated in an independent population, and the effect of the SNP alleles should be estimated in that population.
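As a rough illustration of how an additive allele effect and genotype-class averages relate, the sketch below uses the reported per-allele estimate for s2204 (0.084 g) with hypothetical phenotype records. Genotype classes are labelled by the number of copies of the SNP allele rather than by base, and the function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def additive_contribution(allele_copies: int, effect_per_allele: float) -> float:
    """Expected additive shift in the trait for 0, 1 or 2 copies of the allele."""
    if allele_copies not in (0, 1, 2):
        raise ValueError("allele_copies must be 0, 1 or 2 for a diploid genotype")
    return allele_copies * effect_per_allele


def summarise_by_genotype(records):
    """Map each genotype class to (sample size, mean phenotype)."""
    groups = defaultdict(list)
    for allele_copies, phenotype in records:
        groups[allele_copies].append(phenotype)
    return {copies: (len(values), mean(values)) for copies, values in groups.items()}


S2204_EFFECT_G = 0.084  # per-allele estimate for NW reported above

# Hypothetical (allele copies, nut weight in g) records with unequal class sizes.
records = [(0, 7.0), (0, 7.2), (0, 6.8), (0, 7.1), (0, 6.9), (1, 7.9), (1, 8.1)]

for copies, (n, average) in sorted(summarise_by_genotype(records).items()):
    shift = additive_contribution(copies, S2204_EFFECT_G)
    print(f"{copies} copies: n = {n}, mean NW = {average:.2f} g, additive shift = {shift:.3f} g")
```

Reporting the sample size beside each class mean makes the imbalance between genotypic states visible, which is the main reason for caution noted above.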
This study and our previous work (32) provide a foundation for how the use of genomics can improve breeding in macadamia, and are among the first to analyse the potential for genomics-assisted breeding in nut crops. However, the results presented require validation before being employed in breeding programs. Multi-trait analyses could be performed to increase the power of detection of QTLs, and also detect pleiotropy (45). A separate population should be studied to determine if QTLs detected are the same as those detected here, or are new associations. Further studies should incorporate larger population sizes, to ensure that significant associations are accurate and applicable to a wider breeding population. Additionally, the low MAF observed for some markers in this study may change with sample size, which will influence the proportion of variance explained by those markers.
When a more complete reference genome is assembled, the location of these markers can be determined, and LD between markers more accurately estimated with population structure and cryptic relatedness taken into account. Due to the rapid decay of LD over short distances in macadamia (35), using a larger number of markers may increase the likelihood of SNPs being in LD with causal polymorphisms. Furthermore, the potential issues posed by allelic dropouts, such as lower than expected levels of heterozygosity observed by O'Connor, Kilian (35), could be alleviated with the use of a complete reference genome in sequencing of SNPs in the future. Without genome scaffold annotation, the significant SNPs cannot be linked to known genes or proteins, which has been achieved in other GWAS in fruit trees (e.g. 13, 15, 16, 18). The v2 scaffolds and chromosomes are being annotated (Nock et al. in preparation), and so candidate genes could be identified in future studies.
Although there was a lack of significant associations in some traits in the current study, these should still be investigated in future work. The polygenic nature of TC, as well as the complexity of yield, means that these traits may be more suited to genomic selection, where many markers may each have a small effect on the trait and all markers are modelled simultaneously (46), rather than one by one as in GWAS. Other traits that could be analysed include self-fertility, and resistance to diseases that affect nut yield, including husk spot and phytophthora.