Distribution of phenotypic variations
Phenotyping was conducted on 301 apple genotypes over four years (2015-2018) to characterize the distributions of, and correlations among, the traits (Table S1, Fig. 1). A normal distribution was observed for TA, SSC, skin chroma, skin hue, flesh firmness, fruit length, fruit diameter, and fruit weight (Fig. 1). TA varied continuously from 0.1 to 3.0 % over the four years, with most cultivars falling between 0.2 and 1.0 % (Fig. 1). SSC ranged from 10.1 to 23.8 °Brix, and more than 80 % of the genotypes fell within 11.0-16.0 °Brix. Skin chroma ranged from 250 to 350 and skin hue fell between -1 and 2 for about 80 % of the genotypes. Flesh firmness fell mainly in the 50-80 N range, and most fruits were 60-70 mm long and 70-80 mm in diameter. Most genotypes had a fruit weight of 150-250 g, except for the crab apple group, which primarily produces small fruits of under 50 g; overall, the distribution of fruit weight resembled a normal distribution.
To test for correlations among the traits, Pearson’s correlation coefficients were calculated (Fig. 2). According to the correlation analysis, none of the phenotypic traits differed greatly across the four years. Fruit length, diameter, and weight were positively correlated with each other (r > 0.90**), reflecting their mutual interdependence, whereas flesh firmness showed a highly significant negative correlation with fruit length, diameter, and weight (r = -0.72**, -0.70**, and -0.61**, respectively). In addition, TA exhibited a highly significant negative correlation with fruit length, diameter, and weight, but was significantly positively correlated with flesh firmness (r = 0.48**).
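As an illustration of the statistic used here, the following minimal sketch computes Pearson's correlation coefficient from two lists of trait values. The function name and the trait values are hypothetical and are not taken from the study data; they only mimic the direction of the reported relationships.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products and sums of squares of the centered values
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical trait values for five genotypes (not from the study)
length_mm = [55.0, 62.0, 68.0, 71.0, 75.0]
weight_g = [90.0, 150.0, 190.0, 220.0, 260.0]
firmness_n = [82.0, 74.0, 66.0, 60.0, 52.0]

r_length_weight = pearson_r(length_mm, weight_g)      # strongly positive
r_length_firmness = pearson_r(length_mm, firmness_n)  # negative
```

The significance markers (**) reported in the text would additionally require a test on r, which this sketch does not cover.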
Genotyping by sequencing
The GBS data set for the 301 apple genotypes was generated from an average of 3,840,814 raw Illumina sequencing reads from the GBS library. The mapped reads covered an average of 63,058 regions at a depth of 15.57, corresponding to approximately 2 % coverage of the reference genome (Table 1). A total of 13,146,988 SNP loci were called from the GBS sequencing data, including homozygous (8,260,098), heterozygous (2,385,549), and other SNPs (2,501,483) (Table 2). The SNPs were filtered according to specified criteria (Supplementary Table 2). The SNP data were first filtered for a minimum MAF of 5 %, which retained 261,969 SNPs. The data set was then further filtered to less than 20 % and less than 10 % missing data per locus, yielding 51,434 and 12,608 SNPs, respectively. The SNPs were distributed evenly across chromosomes 1 to 17.
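The two filtering criteria described above (minimum MAF and maximum missing data per locus) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes biallelic SNPs coded as 0, 1, or 2 copies of the alternate allele, with None for a missing call, and the function names and example genotypes are hypothetical.

```python
def missing_rate(calls):
    """Fraction of samples with no genotype call at this locus."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from 0/1/2 alternate-allele counts, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def filter_snps(snps, min_maf=0.05, max_missing=0.10):
    """Keep loci with MAF >= min_maf and missing rate < max_missing."""
    kept = {}
    for snp_id, calls in snps.items():
        if (minor_allele_frequency(calls) >= min_maf
                and missing_rate(calls) < max_missing):
            kept[snp_id] = calls
    return kept


# Hypothetical genotype calls for ten samples at three loci
snps = {
    "snp_a": [0, 1, 2, 1, 0, 0, 1, 2, 0, 1],     # polymorphic, complete
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],     # monomorphic, MAF = 0
    "snp_c": [0, 1, None, 1, 2, 1, 0, 1, 2, 0],  # 10 % missing
}

kept_20 = filter_snps(snps, max_missing=0.20)  # looser missing-data cutoff
kept_10 = filter_snps(snps, max_missing=0.10)  # stricter missing-data cutoff
```

In this toy example the stricter cutoff drops the locus with exactly 10 % missing calls, mirroring how the SNP count in the text falls when the missing-data threshold is tightened from 20 % to 10 %.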
Population genetic analysis
To examine the genetic distances among the 301 genotypes, a phylogenetic tree based on the 12,608 SNPs was constructed using the neighbor-joining (NJ) method (Fig. 3d). According to the phylogenetic analysis, the genotypes clustered into four major groups. In addition, to better understand the genetic structure of the genotypes, the SNP data were analyzed with STRUCTURE software using the MCMC model. Population structure was assessed using genome-wide SNPs, with K values ranging from 3 to 10. Delta K changed markedly for K between 4 and 6, indicating a change in the number of clusters (Fig. 3a). This classification of the population was consistent with the delta K values measured in four runs under different conditions. This observation also agreed with the results of the phylogenetic analysis (Fig. 3d); thus, the optimal number of populations was taken to be 4. The K values also indicated the occurrence of different genetic variances (Fig. 3b). PCA was performed to further evaluate patterns of genetic diversity among the apple germplasm (Fig. 3c). The 301 apple accessions were separated into approximately four groups along the PC1 axis, or into at most six groups by the PC1 and PC2 axes together. PC1 separated Malus domestica from Malus sp., Malus hybrids, and crabapples (PCA-Ⅰ, crabapples such as Malus floribunda, Malus sikkimensis, and Malus arnoldiana; PCA-Ⅱ, crabapple, Malus hybrid, and Malus sp.; PCA-Ⅲ and PCA-Ⅳ, Malus domestica).
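The text does not state how delta K was calculated. One common formulation, assumed here, is the Evanno statistic computed from replicate STRUCTURE runs: the absolute second-order difference of the mean log-likelihood L(K), divided by the standard deviation of L(K) across runs. The sketch below follows that assumption; the log-likelihood values are invented for illustration and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Delta K per K, given a dict mapping K to a list of L(K) per run.

    Assumed formulation: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)|
    divided by the standard deviation of L(K) across replicate runs.
    K values without both neighbours are skipped.
    """
    means = {k: mean(runs) for k, runs in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            continue  # statistic is undefined without run-to-run variation
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Invented log-likelihoods for four replicate runs per K (illustration only)
runs = {
    3: [-52000.0, -52010.0, -51990.0, -52005.0],
    4: [-50000.0, -50004.0, -49998.0, -50002.0],
    5: [-49800.0, -49850.0, -49700.0, -49900.0],
    6: [-49700.0, -49500.0, -49900.0, -49650.0],
    7: [-49600.0, -49300.0, -49900.0, -49500.0],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K with the largest delta K
```

Because the statistic needs L(K-1) and L(K+1), it cannot be evaluated at the smallest and largest K tested.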
Genome-wide association analysis and LD estimation
Genome-wide association studies were conducted for 10 fruit quality traits among the genotypes, and the results are presented in Figs. 4 and 5 and Table 3. Four noteworthy associations observed in the association analysis are highlighted in Fig. 5. According to the analysis, TA, SSC, and skin chroma were significantly associated with specific SNPs, whereas SNPs with lower significance were identified for the other fruit quality traits (Table 3, Fig. 4). Before inspection of the Manhattan plots, the associations of SNPs with all traits were checked on quantile-quantile (Q-Q) plots; the observed values deviated from the diagonal expected under a uniform distribution, suggesting that the associations with the significant SNPs were probably valid (Fig. 4 (k-t)). In the Manhattan plots, the red horizontal line marks the significance threshold in -log10(P), and the solid blue line marks the suggestive threshold (-log10(P) = 5).
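The two horizontal lines in the Manhattan plots amount to a simple classification of each SNP by its -log10(P) value, sketched below. The suggestive line of 5 comes from the text. The value of the significance threshold is not given, so the Bonferroni correction used here (alpha = 0.05 over the 12,608 filtered SNPs) is only an assumption for illustration, and the function names are hypothetical.

```python
import math

SUGGESTIVE_LINE = 5.0  # from the text: -log10(P) = 5


def bonferroni_line(n_tests, alpha=0.05):
    """-log10 of a Bonferroni-corrected threshold (assumed, not from the text)."""
    return -math.log10(alpha / n_tests)


def classify(p_value, significance_line, suggestive_line=SUGGESTIVE_LINE):
    """Label a SNP by where its -log10(P) falls relative to the two lines."""
    score = -math.log10(p_value)
    if score >= significance_line:
        return "significant"
    if score >= suggestive_line:
        return "suggestive"
    return "not significant"


# Assumed threshold for 12,608 tests at alpha = 0.05
line = bonferroni_line(12608)

labels = [classify(p, line) for p in (1e-7, 5e-6, 1e-3)]
```

Under this assumed correction the significance line sits slightly above the suggestive line, so a SNP can exceed the blue line without reaching the red one, as reported for several traits.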
Over the four years investigated, we identified SNPs significantly associated with titratable acidity (TA) on chromosome 16 (position: 1466019) (Table 3, Fig. 5a), whereas only one SNP significantly associated with SSC was found, on chromosome 15 (Fig. 5b). In addition, significant SNPs associated with SSC/TA were identified near the start of chromosome 16 (Fig. 5c). Moreover, SNPs significantly associated with skin chroma appeared on chromosome 9 (position: 29585162), and those related to skin hue were identified on chromosomes 12, 2, and 3 (Table 3, Fig. 4e, 5d). Although they did not surpass the significance threshold, peaks above the suggestive line were observed on chromosomes 17, 8, 3, 9, and 12 for flesh firmness, fruit length, fruit diameter, fruit length/diameter ratio, and fruit weight, respectively (Fig. 4(f-j)).
To identify candidate genes closely related to the fruit traits, a 300-kilobase (kb) window centered on each significant locus was analyzed. Candidate genes and their predicted functions were also identified by examining the LD blocks linked to the significant SNP loci. LD blocks adjacent to the physical positions of the SNP loci associated with each trait were confirmed (Fig. 6). From this analysis, candidate genes associated with TA, SSC, SSC/TA, and skin chroma were identified. The Ma1 (MDP0000252114) and Ma2 (MDP0000244249) genes, which code for malic acid transporters and are associated with TA and SSC/TA, were located within 4-20 kb of the SNP (Table 3, Fig. 5 a, c). Most of the SNPs were in complete LD with Ma1 and Ma2, as indicated by the D’ values of the colored regions (Fig. 6a). In addition, SPS3 (MDP0000414968), SPS4 (MDP0000783676), HK1 (MDP0000309677), SUSY1 (MDP0000250070), and SUT1 (MDP0000275743) were identified as candidate genes for SSC in the regions surrounding the SNPs (chromosome 15: 11673596, 42605470) (Table 3, Fig. 5b). Moreover, a MYB transcription factor (MDP0000259614) associated with fruit skin chroma was located approximately 200 kb from the three SNPs on chromosome 9 (chromosome 9: 29585162, 29298755, 29692945) (Table 3, Fig. 5d). These SNPs and the MYB transcription factor genes (MYB1, MYB10) were linked by LD blocks (Fig. 6c).
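The window-based search described above can be sketched as an interval-overlap check. In the sketch below only the SNP position (chromosome 16: 1466019) is taken from the text; the gene names and coordinates are hypothetical placeholders, and the function names are illustrative rather than part of any annotation tool.

```python
def window_around(position, size=300_000):
    """Start and end of a window of `size` bp centered on `position`."""
    half = size // 2
    return max(1, position - half), position + half


def genes_in_window(position, genes, size=300_000):
    """Genes overlapping the window, with their distance to the SNP in bp.

    `genes` maps a gene name to its (start, end) coordinates on the same
    chromosome as the SNP. Results are sorted from nearest to farthest.
    """
    start, end = window_around(position, size)
    hits = []
    for name, (gene_start, gene_end) in genes.items():
        if gene_start <= end and gene_end >= start:
            if gene_start <= position <= gene_end:
                distance = 0  # the SNP lies inside the gene
            else:
                distance = min(abs(gene_start - position),
                               abs(gene_end - position))
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# SNP position from the text; gene coordinates are hypothetical
snp_position = 1466019
genes = {
    "hypothetical_gene_A": (1_470_000, 1_474_000),
    "hypothetical_gene_B": (1_900_000, 1_905_000),
}

window = window_around(snp_position)
hits = genes_in_window(snp_position, genes)
```

A fixed window of this kind only covers 150 kb on each side of the SNP, which is one reason to also follow the LD blocks, as done in the text, when a candidate lies farther away.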
Development of SNP markers and HRM validation
Based on the GWAS analysis and gene annotation, SNP alleles favorable for fruit quality traits were selected (Table 3), and several SNP markers related to TA and SSC (listed in Table 3) were converted to HRM markers. After an initial HRM analysis using all primer pairs, one primer pair was chosen for each of the three SNPs with the most distinct melt curve patterns for further analysis. These markers showed noticeable differences in their HRM curves and were correlated, to a certain extent, with acidity and sugar content in the apple germplasm (Fig. 7).
For TA, two primer sets were tested in the core collection and the reference set, including 'Maypole', 'Purple wave', 'Ginger gold', 'Crimson king', 'Virginia gold' and 'Hongro'. The C16S39 (chromosome 16: 1545939) and C16S63 (chromosome 16: 1332163) primers distinguished cultivars at acidity levels of 1.3 % and 1.0 %, respectively (Fig. 7 a, b). In the melting curve of C16S39, cultivars with an acidity level above 1.4 % tended to carry the GG genotype, whereas cultivars with an acidity level below 1.3 % were AA or AG (Fig. 7a). Similarly, for C16S63, cultivars with an acidity level above 1.1 % tended to be AA or AG, and cultivars with an acidity level below 1.0 % were GG (Fig. 7b). For SSC, the HRM marker C15S96 (chromosome 15: 11673596) was tested in the core collection and reference set, including 'Golden delicious', 'Granny smith', 'Gala', 'Crimson king', 'Spur earliblaze', and 'Hongro'. The C15S96 primer distinguished cultivars at an SSC level of 14 °Brix (Fig. 7c). In Fig. 7c, blue represents apple cultivars with 14 °Brix or less and red indicates cultivars with more than 14 °Brix.
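The genotype-to-acidity tendencies reported for the two TA markers can be written as a small lookup table, sketched below. The marker names and genotype classes are taken from the text; the "high" and "low" labels, the function name, and the treatment of the tendencies as fixed rules are simplifying assumptions, since the text describes them only as tendencies.

```python
# Acidity class that each genotype tended to show, per the text:
# C16S39: GG tended to be above 1.4 %, AA or AG below 1.3 %
# C16S63: AA or AG tended to be above 1.1 %, GG below 1.0 %
MARKER_RULES = {
    "C16S39": {"GG": "high", "AA": "low", "AG": "low"},
    "C16S63": {"AA": "high", "AG": "high", "GG": "low"},
}


def predicted_acidity_class(marker, genotype):
    """Return the acidity class a genotype tended to show for a marker."""
    if marker not in MARKER_RULES:
        raise ValueError(f"unknown marker: {marker}")
    # Normalize case and allele order so that 'GA' and 'ag' both map to 'AG'
    normalized = "".join(sorted(genotype.upper()))
    rules = MARKER_RULES[marker]
    if normalized not in rules:
        raise ValueError(f"unexpected genotype for {marker}: {genotype}")
    return rules[normalized]


example = predicted_acidity_class("C16S39", "GA")  # heterozygote at C16S39
```

Note that the two markers point in opposite directions: GG is the high-acidity genotype at C16S39 but the low-acidity genotype at C16S63.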