Plant Material
The material used in this study was comprised of pedigree connected germplasm with cultivars, advanced selections, and seedlings from Clemson University peach breeding program (CUPBP), which were previously assembled under the RosBREED project18,22. A breeding population containing a total of 288 seedlings, which included multiple F1 and F2 families obtained from 19 parents (Supplementary Table S1), was chosen for mapping analyses. The parents and seedlings were maintained at the Clemson University Musser Fruit Research Center, in Seneca, South Carolina (Latitude: 34.639038, Longitude: -82.935244, Altitude 210 msl), under warm, humid, temperate climate and standard commercial practices for irrigation, fertilization, and pest and disease control. The trees were grafted on ‘Guardian®’ rootstock, and either planted at 4 × 6 m and trained to open center or at 1.5 × 4 m and trained to perpendicular V.
Phenotypic data
Phenotypic data for QTL mapping and DNA test development were recorded over two seasons (2011–2012). Ripening time (RPT), in Julian days (JD), was determined when 20% of fruits were at commercial harvest by visually inspecting the presence of a few soft fruits in the field for maturity twice per week. A composite sample of one approximately 2 cm wide longitudinal slice from each of five fruits was used to extract juice with a juicer to measure SSC (°Brix) using a digital refractometer. Statistical analysis (descriptive, normality test, Spearman’s rank correlation, and Wilcoxon signed rank test) was performed using R Statistical Software (version 4.1.2).
The narrow-sense heritability (h2) were estimated considering the two seasons (2011–2012) using the R package Sommer33 and the vpredict function:
$$vpredict\left(object, transform\right)$$
where: object represents a model fitted with the mmer function; transform is the formula to calculate the function.
$$mix<-mmer(Trait\sim Year,$$
$$random=\sim vsr(Selection,Gu=K),$$
$$rcov=\sim vsr\left(dsr\left(Year\right),units\right),$$
Where: “K” refers to the additive relationship matrix. The formula included in the function was:
where: “VA” is the additive genetic variance; “VP” is the phenotypic variance.
Genotyping and linkage map
Samples were genotyped with the IPSC peach 9K SNP array v134. The SNP data curation was performed using the workflow for high-resolution genetic marker data described in Vanderzande et al.21. In order to reduce the time needed for analysis, a total of 1487 informative SNPs were retained for pedigree-based QTL mapping (Supplementary Table S2). The genetic positions of the retained SNPs were calculated using the physical position of the peach reference genome v2.035 and a conversion factor where every 1 Mb corresponded to 4 cM, as described by Vanderzande et al.21.
QTL mapping
The QTL mapping was performed using FlexQTL™ software (version 0.1.0.42)36 which implements pedigree-based QTL analysis via Markov Chain Monte Carlo (MCMC) simulation. The analysis was run at least twice for the two seasons (2011 and 2012), as well as for the datasets RPT_Ave and SSC_Ave (average of the two seasons), until reaching the effective chain size (ECS) criterion for convergence (≥ 100). The MCMC length for RPT ranged from 700,000 to 900,000 iterations to store one thousand samples with a thinning between 700 and 900 under mixed genetic model. However, 200,000 iterations with a thinning of 200 under additive genetic model were adequate to achieve ECS ≥ 100 for SSC. Inference on the number of QTLs was based on a pairwise comparison of models (1/0, 2/1, 3/2, and so on) using twice the natural log of the Bayes factor (2lnBF) statistic. The Bayes factor (BF) parameter was interpreted as: non-significant (0–2), positive (2–5), strong (5–10), or decisive (> 10) evidence for the presence of QTLs37.
In this study, we reported only stable QTLs. A QTL was considered “stable” when: the BF was decisive (> 10) and the significant effect was located in overlapping positions between the two seasons. The stable QTLs were named as “q” + trait name abbreviation + data set + scaffold + number of the chronological QTL for this trait reported on this chromosome (e.g. qRPT_SC_4.1).
To narrow down and re-define the QTL intervals, further analysis was carried out in FlexQTL™ using the ‘MQTRegions.new’ file considering the supporting data files ‘Post_genome.csv’ and ‘marker map’. The new generated output files were used to recalculate the phenotypic variance explained (PVE) for the stable QTLs and to update information for QTL intensity, interval, and mode positions.
FlexQTL™ obtained the additive variance (σ2A(trt)) for each trait by subtracting the residual variance (σ2e) from the phenotypic variance (σ2P). The phenotypic variance explained (PVE) for a particular QTL considering an additive model was calculated using the following equation:
\(PVE additive model= \frac{{\sigma }_{A\left(qtl\right)}^{2}}{{\sigma }_{P}^{2}}\times 100\) where:
\({\sigma }_{A\left(qtl\right)}^{2}\) is the additive variance of QTL.
Regarding the mixed model, genetic variance (\({\sigma }_{G}^{2}\)), was calculated by subtracting the residual variance (\({\sigma }_{e}^{2}\)), from the phenotypic variance (\({\sigma }_{P}^{2}\)), and the PVE was calculated as follows:
\(PVE mixed model= \frac{{\sigma }_{A\left(qtl\right)}^{2}+ {\sigma }_{D\left(qtl\right)}^{2}}{{\sigma }_{P}^{2}}\times 100\) where:
\({\sigma }_{A\left(qtl\right)}^{2} \text{i}\text{s} \text{t}\text{h}\text{e} \text{a}\text{d}\text{d}\text{i}\text{t}\text{i}\text{v}\text{e} \text{v}\text{a}\text{r}\text{i}\text{a}\text{n}\text{c}\text{e} \text{o}\text{f} \text{Q}\text{T}\text{L}\) and \({\sigma }_{D\left(qtl\right)}^{2} \text{i}\text{s} \text{t}\text{h}\text{e} \text{d}\text{o}\text{m}\text{i}\text{n}\text{a}\text{n}\text{t} \text{v}\text{a}\text{r}\text{i}\text{a}\text{n}\text{c}\text{e} \text{o}\text{f} \text{Q}\text{T}\text{L}.\)
A pleiotropic effect of RPT and other quality traits, including SSC, has been previously reported11. Therefore, QTL analysis for SSC_Ave (average of the two seasons) was also performed using the RPT information as a covariate, which was integrated as a nuisance variable in the data file used in FlexQTL™. Separate runs using a mixed model, with the same number of seeds and MCMC length (1,200,000), were carried out in FlexQTL™ using the RPT information in four different datasets: 1. Dataset where no covariate was included; 2. Dataset where the corresponding phenotypic average of RPT (in Julian days) was included as a cofactor; 3. Dataset where the haplotype information of the RPT QTL interval was included as a cofactor; 4. Dataset where both the phenotypic average of RPT (in Julian days) and haplotype information of the RPT QTL interval were included as cofactors.
Haplotype analysis
As described by Rawandoozi et al.16, the haplotype analysis was carried out by selecting the SNPs within the significant QTL interval considering the RPT and SSC average datasets. We used the output files ‘MQTRegionsGTP.csv’ and ‘mhaplotypes.csv’ generated by FlexQTL™ to perform the haplotype analysis. Haplotypes were constructed in the dataset using PediHaplotyper R package38. The effects were determined from combinations of diplotypes. The nonparametric multiple comparison Steele–Dwass test (p < 0.05) was applied to assess significant differences between diplotype effects. QTL allele genotypes (Q or q) were assigned to haplotypes based on the direction of their effects (increasing or decreasing RPT and SSC, respectively). The statistical analysis was performed using the JMP Pro Version 13.2 (SAS Institute Inc., Cary, NC, 2016). Illumina codes for nucleotides, where A = A or T, and B = C or G, were used as marker designation in haplotypes34
Phenotypic variation explained (PVE) by each informative SNP within the significant QTL interval was estimated using single linear regression (SLR) analysis in R Statistical Software (version 4.1.2). The phenotypic performance of different genotypes of the predictive marker was further validated in peach breeding populations comprising 128, 290 and 139 individuals from Arkansas (AR), South Carolina (SC) and Texas (TX), respectively.
KASP marker development and validation
Two informative SNPs capable of distinguishing haplotypes for the main RPT and SSC QTL on LG4 (10,981,971 to 11,298,736 bp) were used for developing the Ppe.RPT/SSC-1 KASP assay. The SNP_IGA_412662 was unsuitable for conversion into a KASP assay and was subsequently discarded. Primers were designed for each of the two SNPs (Table 1) and reaction mixtures and PCR conditions for the KASP assays were determined following Fleming et al.22. Eighty-four peach cultivars representing the diversity of fresh-market US germplasm, 51 of which had known genotypes from the 9K SNP array34, were chosen for the development and validation of the assay (Supplementary Table S3). Phenotypic data for RPT were obtained from literature39, while the SSC data were obtained from two databases: Clemson University Variety evaluations database (https://www.clemsonpeach.org/), and the Genome Database for Rosaceae40. The variety evaluation data from Musser fruit research farm, were collected within 2015–2021, and the publicly available GDR peach data (GRIN_PEACH and Peach RosBREED Public), were collected during three and seven seasons, within RosBREED22 project and GRIN, respectively. The average SSC for each cultivar was calculated and used in the validation of the assay. DNA for these cultivars was extracted using the protocol described in Edge-Garza et al.41.
Three replicates of non-template controls and positive controls for homozygous (AA and BB) and heterozygous (AB) genotypes were tested in a 96 well-plate for each assay. Positive controls were selected from the DNA samples with known genotypes from the 9K SNP array (Supplementary Table S4). Amplifications were conducted in a Bio-Rad CFX Connect Real-Time PCR thermocycler under a standard protocol (15 min at 94°C; 10 cycles of 20 s at 94°C and 60 s at 61°C with a 0.6°C decrease in temperature per cycle; 40 cycles at 94°C for 20 s; 60 s at 55°C, and 30 s at 23°C), and cycle 25 of real-time PCR was selected for each KASP assay to maximize separation between genotypes. A template spreadsheet designed by Fleming et al.23 was used to assign the genotypes to each sample based on the relative fluorescence unit values from this cycle. For routine use, end-point PCR was tested on Bio-Rad T100 thermocycler using the same cultivars and positive controls for all assays, with slightly modified protocol. The number of cycles at the SNP specific temperature was decreased to 25 and the last cycle of the standard protocol (30 s at 23°C) and plate reading were omitted. End-point reactions were read on the Bio-Rad CFX Connect Real-Time PCR thermocycler using Bio-Rad CFX Maestro™ software.
The newly developed Ppe.RPT/SSC-1 KASP assays were validated by screening 163 seedlings from the CUPBP and 26 commercial cultivars (Supplementary Table S5) with end-point PCR. The validation set included 15 seedlings from QTL mapping germplasm. Phenotypic data was collected in the CUPB program. The best linear unbiased predictions (BLUPs) for each individual were obtained for the RPT and SSC values recorded across five seasons (2017 to 2021) using the R package 'lme4'42 with year selected as a random effect:
$${Y}_{ij}= \mu +{g}_{i}+ {y}_{j}+ {gy}_{ij} + \epsilon$$
Where: Yij is the trait of interest, µ is the overall mean, gi is the genetic effect of ith genotype, yj is the effect of the jth year, and gyij as the interaction effect of ith genotype with jth year, ɛ is the residual of the model.
DNA samples for these individuals were obtained using a rapid crude DNA extraction protocol described by Noh et al.43. The template spreadsheet designed by Fleming et al.23 was employed to automatically assign genotypes. Statistical differences between classes were evaluated using one-way ANOVA, followed by the Tukey test for multiple comparisons in SPSS.