Genetic analysis and cross associations of strawberry (Fragaria × ananassa) genotypes and morpho-biochemical features: a practical breeding program

The success of plant breeding depends on diversity in plant genetic resources and on their responses to changing environments. Twenty strawberry genotypes with different genetic backgrounds were evaluated for their performance under different environments based on estimates of their genetic variability and heritability. Genetic distances and associations were also assessed using data mining techniques. The combined ANOVA showed that environmental factors strongly influenced the values of the measured traits. High heritability and genetic variation were detected for yield and its components, indicating a high potential for improvement among the strawberry genotypes and their suitability as raw material for breeding programs. The biplot and heatmap, together with clustering, proved to be key methods for uncovering the structural associations among genotypes and measured traits, as well as their cross relationships and simultaneous influences on one another. Cluster analysis showed that genotypes of the same geographic origin did not necessarily group in the same cluster, which could be due to environmental variability and the different responses of genotypes derived from different genetic materials. Overall, the multivariate analyses showed that the genotypes Kurdistan, Queen Elisa, and No.14 could be considered a cluster of high similarity to be crossed with genotypes from other clusters, such as the group of Pajaro and Chandler or the group of Silva and Gaviota.


Introduction
Strawberry (Fragaria × ananassa Duch.) is the most economically important berry crop with great nutritional and medicinal potential (Mozafari et al., 2019). The variability observed in collections for these traits reflects the combined effects of the heredity of the genes concerned and environmental influence.
Genetic background can play a significant role in the yield and phytochemical composition of strawberries (Vallarino et al., 2018; Mozafari et al., 2019). Genetic variability is vital to effectively screen and improve production in any given crop (Moucheshi et al., 2011; Aliakbari et al., 2013; Weber et al., 2020).
The environment and the annual climatic diversity during fruit ripening are also major factors in explaining the variation in strawberry characteristics (Krüger et al., 2021). Surveying variability and heritability in screening and breeding programs for strawberry helps a breeder to know to what extent a trait is affected by genetics and environment (Saed-Moucheshi et al., 2021). The success of breeding programs depends on deep knowledge of the breeding value of the parental genotypes used in genetic and genomic studies and in cross-fertilization programs. In addition, the availability of genetic diversity within compatible genetic resources, including wild species and domestic cultivars, will enhance the extent of any improvement (Lotfi et al., 2014; Saed-Moucheshi et al., 2021). Breeding value, the heritable portion of a feature, can be deduced from the phenotypic and genotypic values for key desirable traits, as well as from the heritability and genetic advance values of the parental lines (Moucheshi et al., 2011; Vosough et al., 2015; Cervantes et al., 2020).
Multivariate statistical analyses can provide insight into the structure and relationships of data collected from different features. Zhang et al. (2020) used multivariate statistics to characterize diploid Fragaria species with divergent morphologies. These data were then used to select morphologically divergent interfertile parents for the production of mapping populations for QTL analysis of the genetic basis of the heritable phenotypic differences. Ukalska et al. (2006) quantified patterns of phenotypic diversity in a strawberry germplasm collection consisting of 117 accessions (clones), using multivariate statistics to assess variability and the phenotypic, genotypic, and environmental correlations among twenty-eight important traits. Mozafari et al. (2019) used multivariate statistics such as hierarchical clustering to investigate the interrelationships of strawberry traits under in vitro conditions.
Numerous studies are available on the effect of different treatments on berry plants; however, comprehensive studies that consider the effects of environmental and genetic factors on strawberry simultaneously are still lacking. There are also few studies that test different lines and genotypes with different genetic backgrounds in order to find the best parental lines that might improve both the production and the quality of their progenies. Given the lack of breeding programs on strawberries, the current study was conducted to evaluate the field performance of 20 strawberry genotypes with different genetic backgrounds in Kurdistan province, the main genetic resource and highest producer of strawberry in Iran. In this study, different genotypes were evaluated to find the best parental lines to be used as the source of genetic material in upcoming breeding programs. Different features of the strawberry genotypes were also examined for their genetic parameters, for the influence of environment and genetic background, and for their usefulness in screening. We also provide a program written in SAS to estimate genetic parameters in any given plant without the need for it to be managed by an administrator. Additionally, advanced multivariate statistical methods were used to estimate the relationships among phenological traits and to provide a theoretical foundation for systematic strawberry breeding programs.

Materials and Methods
The silt-loamy soil of the trial location had a pH of 6.99 and an electrical conductivity of 0.459 dS m⁻¹. The plants were first grown in the greenhouse and, after rooting, were transplanted to the field and cultivated on 20 cm raised beds in a double-row hill system with inter- and intra-row spacing of 90 and 40 cm, respectively, in open field conditions to provide a natural environment. The experiment was carried out in a randomized complete block design (RCBD) with three replications. As is usual in the region, plants were irrigated twice a week, and exactly the same amount of water was provided to each plot. Weed control, where required, was done by hand, and no fertilizer, herbicide, or other chemical materials were used in this study.
Flowering and fruiting date: The date of appearance of the first flower and the first fruit, counted from a reference date (days from 20th of March), was used to determine the flowering and fruiting dates.
Flowering and fruiting period: The average number of days between the appearance of the first and last flower, and between the first and last fruit, on three plants of each experimental unit was calculated to give the flowering and fruiting periods, respectively.
Stolon and crowns per plant: The numbers of stolons and crowns of each plant were summed over three harvest times (second, eighth, and fourteenth harvests).
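The date-based traits above can be sketched in a few lines. This is only an illustration: the calendar year, the observation dates, and the function names `days_from_reference` and `mean_period` are hypothetical and are not taken from the study.

```python
from datetime import date

def days_from_reference(event: date, ref_month: int = 3, ref_day: int = 20) -> int:
    """Days elapsed between 20 March of the same year and the event date."""
    reference = date(event.year, ref_month, ref_day)
    return (event - reference).days

def mean_period(observations):
    """Average length in days of a flowering or fruiting period.

    `observations` is a list of (first_event, last_event) date pairs,
    one pair per sampled plant of an experimental unit.
    """
    lengths = [(last - first).days for first, last in observations]
    return sum(lengths) / len(lengths)

# Hypothetical records for three plants of one plot (illustrative dates only).
plants = [
    (date(2021, 4, 6), date(2021, 5, 6)),
    (date(2021, 4, 8), date(2021, 5, 10)),
    (date(2021, 4, 7), date(2021, 5, 11)),
]
flowering_date = days_from_reference(plants[0][0])
flowering_period = mean_period(plants)
```

Working with `date` objects rather than hand-counted day numbers avoids off-by-one errors around month boundaries.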

Yield parameters
Inflorescence per plant: The average number of inflorescences in each experimental unit at the end of the harvest period was calculated to determine inflorescences per plant.
Flower per inflorescence: The average number of flowers per inflorescence in each experimental unit was calculated.
Flower per plant: The sum of flowers on each plant was calculated during the flowering period.
Fruit per plant: The number of fruits harvested from each plant was counted during the fruiting period.
Fruit weight: Fruit weight was calculated as the average weight of twenty fruits from each plant at three harvest times (second, eighth, and fourteenth harvests).
Yield (g/plant): The yield was calculated based on the total weight of fruits harvested during the fruiting period.

Biochemical parameters
Total soluble solids (TSS): The level of TSS was measured using a refractometer (ATAGO, Japan) at 20 °C.
Titratable acidity (TA): The value of TA was determined by diluting a 5 ml aliquot of the juice in 50 ml of distilled water and titrating it to pH 8.2 using 0.1 N NaOH. TA was determined as citric acid equivalent and expressed in g/100 g FW (Zheng et al., 2007).
Anthocyanin content: Anthocyanin content was measured according to Woodward (1972). The absorbance of the supernatants was recorded at 517 nm using a spectrophotometer. Anthocyanin content was calculated using the following formula and expressed as mg of cyanidin 3-glucoside per 100 g FW, using a molar absorptivity (ε) of 26900 and a molecular weight (MW) of 449.2:
Anthocyanin content (mg/l) = (A × MW × DF × 1000) / (ε × 1)
where A is the absorbance, DF is the dilution factor, and 1 is the optical path length in cm.
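As a worked illustration of the anthocyanin formula, the following sketch evaluates it for one reading. The absorbance (0.5) and dilution factor (10) are made-up example inputs, the function name is hypothetical, and a 1 cm path length is assumed; only MW = 449.2 and the molar absorptivity of 26900 come from the text.

```python
def anthocyanin_mg_per_l(absorbance, dilution_factor,
                         molecular_weight=449.2,
                         molar_absorptivity=26900.0,
                         path_length_cm=1.0):
    """Anthocyanin concentration as cyanidin 3-glucoside equivalents (mg/l).

    Implements (A x MW x DF x 1000) / (epsilon x path length).
    """
    return (absorbance * molecular_weight * dilution_factor * 1000.0) / (
        molar_absorptivity * path_length_cm
    )

# Hypothetical reading: A = 0.5 with a tenfold dilution.
example = anthocyanin_mg_per_l(absorbance=0.5, dilution_factor=10)
print(round(example, 2))  # about 83.49 mg/l
```

Converting this concentration to mg per 100 g FW additionally requires the extraction volume and the fresh weight of the sample, which are not given in this section.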

Statistical models and methods
Data obtained for all measured traits were subjected to a combined analysis of variance (ANOVA). Bartlett's test for homogeneity of variances was carried out prior to the combined ANOVA. SAS 9.4 was used to perform the combined ANOVA and to estimate the genetic parameters using the source code provided as supplementary material to this study. The code has not been used in other studies before and was written by the first author. It is applicable to similar studies on any other plant. In addition, multivariate statistical techniques including correlation plots, heatmaps, clustering, and principal component analysis (Saed-Moucheshi et al., 2013) were carried out in R (version 4.0.2).
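To make the homogeneity check concrete, the sketch below computes Bartlett's statistic from scratch. It is not the authors' SAS code: the function name and the two small samples standing in for the two years are hypothetical. The statistic is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's test statistic for equality of variances across groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    # Pooled variance weighted by each group's degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (
        3 * (k - 1)
    )
    return numerator / correction

# Hypothetical trait values for two years (illustrative only).
year_1 = [1.0, 2.0, 3.0, 4.0]
year_2 = [10.0, 20.0, 30.0, 40.0]
statistic = bartlett_statistic([year_1, year_2])
# With two groups (1 degree of freedom) the 5% critical value is 3.841,
# so a larger statistic would indicate heterogeneous variances.
```

When the variances are judged homogeneous, pooling the error terms of the individual years in a combined ANOVA is the usual next step.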
Heritability in a broad sense ($h^2$) was estimated according to the following formula:

$$h^2 = \frac{\sigma^2_g}{\sigma^2_{ph}}$$

where $\sigma^2_g$ and $\sigma^2_{ph}$ are the genotypic and phenotypic variances. Genotypic and phenotypic variances were estimated using the expected mean squares of genotype (MSg) and error (MSe) according to Saed-Moucheshi et al. (2021). The phenotypic coefficient of variation (PCV) and genotypic coefficient of variation (GCV) (Saed-Moucheshi et al., 2021) were determined for comparing data with different units using the following formulas:

$$PCV = \frac{\sqrt{\sigma^2_{ph}}}{\bar{x}} \times 100, \qquad GCV = \frac{\sqrt{\sigma^2_g}}{\bar{x}} \times 100$$

where $\bar{x}$ is the mean of the trait. Genetic advance (GA) was calculated as follows (Moucheshi et al., 2011):

$$GA = K \times \sigma_{ph} \times h^2$$

where $K = 2.06$ is the selection intensity at 5% and $\sigma_{ph}$ is the square root of the phenotypic variance.
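The genetic parameters described above can be sketched numerically as follows. This is a minimal illustration, not the authors' SAS program: it assumes the common single-environment RCBD estimators, genotypic variance = (MSg - MSe) / r and phenotypic variance = genotypic variance + MSe, which may differ from the expected mean squares used in the supplementary code. The function name and the input values are hypothetical.

```python
import math

def genetic_parameters(ms_genotype, ms_error, replications, grand_mean, k=2.06):
    """Broad-sense heritability, GCV, PCV and genetic advance for one trait.

    Assumed estimators (single-environment RCBD):
      var_g  = (MSg - MSe) / r
      var_ph = var_g + MSe
    """
    var_g = max((ms_genotype - ms_error) / replications, 0.0)  # no negative variance
    var_ph = var_g + ms_error
    h2 = var_g / var_ph
    gcv = math.sqrt(var_g) / grand_mean * 100.0
    pcv = math.sqrt(var_ph) / grand_mean * 100.0
    ga = k * math.sqrt(var_ph) * h2          # K = 2.06 at 5% selection intensity
    return {
        "var_g": var_g,
        "var_ph": var_ph,
        "h2": h2,
        "GCV": gcv,
        "PCV": pcv,
        "GA": ga,
        "GA_percent_of_mean": ga / grand_mean * 100.0,
    }

# Hypothetical mean squares for one trait with three replications.
params = genetic_parameters(ms_genotype=100.0, ms_error=10.0,
                            replications=3, grand_mean=50.0)
```

Because GCV can never exceed PCV under these estimators, a large gap between the two is a quick sign that a trait is strongly affected by the environment.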

Mean performance of genotypes
According to the combined ANOVA (Supplementary Table S1), the effect of environment (year) was significant (p<0.01) for all traits except stolon number and TSS. All measured features in Supplementary Table S1 were significantly affected by genotype as a source of variation in the ANOVA table (p<0.01). The interaction between year and genotype was also significant for fruit number, flower number, stolon number, stolon extension date, leaf number, fruiting period, fruiting date, and yield (p<0.05).
Supplementary Table S2 and Supplementary Table S3 show the mean comparison of genotypes for all measured features in the first and second years, respectively. A wide range was detected for the number of leaves (22.83 to 64.00 in Pajaro and Kurdistan, respectively), with a grand mean of 43.62. The minimum and maximum values of leaf area were 47.69 cm² in Kurdistan and 92.28 cm² in Tennessee Beauty, with a grand mean of 70.89 cm². The mean number of stolons per plant was 6.59, ranging from 4 in Mrak to 9.17 in genotype No.14. The mean stolon extension date was 60.73 days, ranging from 54.50 to 70.33 in Yalova and Mrak, respectively. The mean flowering and fruiting dates were 17.53 and 52.69 days, with maxima of 20.66 and 56.83 days in Aliso and Tioga, respectively, and minima of 14.66 and 50.33 days in Blakemore and No.14, respectively. A wide range was observed for the flowering and fruiting periods between Gaviota and Paros (30 to 40.84 and 24.00 to 34.5 days, respectively). The number of crowns per plant ranged from 3.83 in Gaviota to 11.83 in Kurdistan, with a general mean of 19.36. For the yield-related traits, the average number of inflorescences per plant was 9.25, ranging from 3.33 in Gaviota to 13.17 in Kurdistan. The mean number of flowers per inflorescence was 6.72, ranging from 3.10 to 10.06 in Pajaro and Kurdistan, respectively. The numbers of flowers and fruits per plant ranged from 17.00 to 102.50 and from 11.94 to 94.50 in Gaviota and Tioga, respectively, with overall mean values of 6.72 and 62.91. Average fruit weight was 8.91 g, with minimum and maximum values of 5.39 g in Tioga and 13.53 g in Camarosa, respectively. The minimum and maximum yields were 77.70 and 852.95 g/plant in Gaviota and Queen Elisa, respectively, with an overall mean of 482.55 g/plant. An extensive range was also found for the biochemical traits studied. The mean value for TSS was 7.59 °Brix, ranging from 5.73 in Chandler to 9.64 in Gaviota. The minimum and maximum values of TA were 359.50 and 754.67 mg/100 g FW in Catskill and Blakemore, respectively, with a general mean of 522.30 mg/100 g FW. The mean anthocyanin content was 25.97 mg/100 g FW, ranging from 12.06 in Paros to 35.80 in No.14.

Genetic parameters and genetic associations
Using the SAS program written by the authors, genetic parameters were estimated separately for the two years (Table 3 and Table 4). Although most of the measured traits, e.g., anthocyanin, titratable acidity, and total yield, had relatively high heritability (h² > 0.8), the traits related to the time of fruit and flower appearance showed higher heritability than the other traits in both years. Leaf area showed the lowest heritability in both years. Fruits and flowers per plant presented the highest genotypic (over 36%) and phenotypic (over 42%) coefficients of variation, while TA and fruiting date exhibited the lowest GCV and PCV (lower than 5%). Phenotypic correlation coefficients (Supplementary Table S4 and Supplementary Table S5) as well as genotypic correlation coefficients (Table 5 and Table 6), indicating the direction and magnitude of the associations among the measured traits, were calculated for both years. Genotypic and phenotypic correlations were both based on Pearson's method. The directions of almost all genotypic and phenotypic correlations were the same, e.g., yield and anthocyanin were negatively correlated in both estimations; however, their magnitudes differed. The associations among the traits are therefore presented in two correlation plots, for the first (Figure 1A) and second (Figure 1B) years. Yield and its components showed high and positive correlations with one another, while they showed lower, or in some cases negative, associations with the biochemical features.
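For readers who want to reproduce the correlation step, Pearson's coefficient can be sketched as below. The function name and the two short vectors are hypothetical; the section does not specify which values (plot values, genotype means, or variance and covariance components) were passed to Pearson's method for the genotypic and phenotypic estimates, so the choice of inputs is left open here.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical genotype means for two traits (illustrative only).
fruits_per_plant = [12.0, 35.0, 60.0, 94.0]
yield_per_plant = [80.0, 300.0, 520.0, 850.0]
r = pearson_r(fruits_per_plant, yield_per_plant)
```

The sign of `r` gives the direction of the association and its absolute value the strength, which is the information encoded in a correlation plot.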

Classification and cross association of genotypes and features
A heatmap showing the cross relationships between the measured traits and the genotypes, together with the cluster analysis of genotypes and features based on genetic distances and similarities, was drawn for the data of both years (Figure 2A and Figure 2B).
Hierarchical cluster analysis placed the genotypes into four major groups in both years. Blakemore together with Queen Elisa and Kurdistan on the one hand, and Catskill together with No.14 on the other, showed the lowest genetic distances relative to the distances among the other genotypes. However, these genotypes showed the highest distance from Fresco and Missionary in the first year (Figure 2A) and from Missionary and Tennessee Beauty in the second year (Figure 2B).
Although the pattern of the clusters among genotypes changed somewhat from the first year to the second, most pairs of genotypes stayed together, e.g., Pajaro and Chandler.
As with the clustering pattern of the genotypes, there were some overall changes in the clustering of the measured traits between years; e.g., anthocyanin was placed in the same cluster as flowering date and stolon extension date in the first year, whereas in the second year it was placed in a cluster with leaf number and crown number. Meanwhile, anthocyanin stayed close to fruiting date in both years. Similarly, total yield remained a close neighbor of fruit and flower numbers.
The clustering of the genotypes and features was checked with a biplot of the first two principal components (Figure 3). Most of the pairwise associations between genotypes or features were verified by the biplot. Kurdistan and Queen Elisa stayed together in both the first (Figure 3A) and second (Figure 3B) years. A similar pattern was observed for Pajaro and Chandler. To clarify the scattering of genotypes across the first two components, Supplementary Figure S1A for the first year and Supplementary Figure S1B for the second year are presented as supplementary material.
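The notion of genetic distance behind such a clustering can be illustrated with a small sketch. The section does not state the distance measure or linkage method used, so this example assumes Euclidean distance on standardized (z-scored) trait values; the genotype labels G1 to G4, the trait values, and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def zscore_columns(table):
    """Standardize each trait (column) to mean 0 and unit standard deviation."""
    names = list(table)
    n_traits = len(table[names[0]])
    columns = [[table[g][j] for g in names] for j in range(n_traits)]
    stats = [(mean(c), stdev(c)) for c in columns]
    return {g: [(table[g][j] - m) / s for j, (m, s) in enumerate(stats)]
            for g in names}

def pairwise_distances(table):
    """Euclidean distance between every pair of genotypes on z-scored traits."""
    z = zscore_columns(table)
    return {(a, b): math.dist(z[a], z[b]) for a, b in combinations(z, 2)}

# Hypothetical genotype means: [inflorescences per plant, yield in g/plant].
traits = {
    "G1": [10.0, 500.0],
    "G2": [11.0, 520.0],
    "G3": [3.0, 80.0],
    "G4": [4.0, 120.0],
}
distances = pairwise_distances(traits)
closest = min(distances, key=distances.get)    # most similar pair
farthest = max(distances, key=distances.get)   # candidate contrasting parents
```

Standardizing first keeps a trait measured in large units, such as yield in g/plant, from dominating the distance; an agglomerative clustering would then repeatedly merge the closest pairs.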
Accordingly, Giova was significantly the farthest from the other genotypes. To clarify the contribution of each feature to the first two components and to show the angles between the features, Supplementary Figure S2A and Supplementary Figure S2B are provided for the first and second years, respectively. In both years, the highest contribution (dark blue color in Supplementary Figure S2) belonged to flower and fruit number per plant, and these two features formed an acute angle with each other. After these two features, yield and the other yield components contributed most to the variability among the genotypes. Anthocyanin and TSS showed a lower than average contribution in the loading plot (Supplementary Figure S2) and the biplot (Figure 3).
The cross association of genotypes with the measured traits can be examined with both the heatmap and the biplot. Pajaro and Chandler showed negative values for yield and some other yield components, and they also formed an obtuse angle with them, indeed very close to 180°, verifying the results of the heatmap. Genotype No.14 in the first year and cultivar Kurdistan in the second year were closest to strawberry yield (Figure 3), and they showed high and positive values in the heatmap as well (Figure 2). Sequia, according to both the heatmap and the biplot, was the genotype with negative associations with flowering date.
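The acute and obtuse angles read from a biplot can be quantified with a short sketch. In a biplot, an angle below 90° between two vectors generally suggests a positive association and an angle near 180° a negative one; how closely the cosine tracks the actual correlation depends on the biplot scaling and on how much variance the first two components capture, neither of which is stated here. The coordinates and the function name are hypothetical.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 2-D vectors drawn from the biplot origin."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm_u = math.hypot(u[0], u[1])
    norm_v = math.hypot(v[0], v[1])
    cosine = dot / (norm_u * norm_v)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))

# Hypothetical coordinates on the first two components (illustrative only).
flowers_per_plant = (0.80, 0.30)
fruits_per_plant = (0.70, 0.40)
low_yield_genotype = (-0.75, -0.35)

acute = angle_between(flowers_per_plant, fruits_per_plant)     # small angle
obtuse = angle_between(fruits_per_plant, low_yield_genotype)   # close to 180
```

Reporting the angle alongside the heatmap value gives two independent views of the same genotype by trait association.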

Discussion
The ANOVA results and mean comparisons indicated a wide range of variability among the genotypes as a result of both their different genetic backgrounds and the strong influence of environmental factors. The significant interaction between year (the environmental effect) and genotype for most of the measured traits showed that the genotypes most likely respond in different ways to environmental changes. Therefore, screening among strawberry genotypes for breeding programs should be done with care and with environmental factors taken into account. The weather conditions of the two growing seasons differed widely with regard to precipitation, temperature, and air humidity. In addition, the significant differences between the genotypic and phenotypic correlation coefficients for the measured traits indicated that environmental factors can change not only the values of the traits but also the relationship between any pair of them. However, strawberry yield and the yield components remained positively correlated under the different environmental conditions. It has been shown that under normal conditions the variability among traits is higher than under stress conditions, which supports our results on the changes in genotypic and phenotypic correlation coefficients.
The genetic parameters indicated a high potential of the genotypes used for improving the traits measured in this study, because most of the traits showed high heritability on the one hand and high genetic variability on the other. According to the genetic gain and the estimated next-generation means, screening among the genotypes and advancing them to the following generation would increase these features significantly. The presented outcomes also demonstrate that although most of the cultivated strawberry genotypes originated from the same species (Fragaria × ananassa Duch.), mainly through breeding programs, there is still a high amount of variation among them. In any study, the final yield of the plant and economic production are the first aim of breeding programs. The high heritability of yield in these genotypes indicates a great possibility of screening among the genotypes and developing new populations for higher yield. Also, since berry yield is a polygenic trait on which environmental conditions have a great effect, breeders might find it efficient to use some of the yield components as selection criteria to boost the efficiency of the program (Tabarzad et al., 2017; Singh et al., 2018). The results of this study showed different trends from those of Singh et al. (2011), who used 21 strawberry genotypes in India and reported a lower mean (111.9 compared to 482.55 g/plant) and a smaller range (26.6-248.2 compared to 77.70-852.95 g/plant) but a higher h² (98.31 compared to 84.18%) for yield; these differences may be due to a shorter harvest period, different genotypes (only 5 genotypes in common), and different climatic conditions. The genetic parameters related to anthocyanin in the current study were in close concordance with those reported by Bacchella and Testoni (2009).
Mean comparisons and genetic parameters can only consider the variability and differences between genotypes for one feature at a time and cannot handle multiple features simultaneously. Multivariate techniques, on the other hand, make it possible to consider all the available features and data samples at once. Thus, some advanced multivariate and data mining methods, i.e., cluster analysis, heatmap, and biplot, were used in this study. The results showed that there are close similarities as well as dissimilarities (or genetic distances) among the genotypes used. These genetic similarities and dissimilarities offer a great opportunity to screen among these genotypes for use as a source of genetic material for subsequent generations and for providing a population with high variation. The majority of these cultivars came from the University of California, USA, breeding program, and the rest came from Europe, except for Kurdistan, whose origin is most likely Russia (Ghaderi et al., 2018). Cluster analysis distinguished four clusters of the genotypes overall, and most of the genotype pairs matched in both years under different environmental conditions. Genotypes of the same geographic origin were not necessarily placed in the same group, probably because of the worldwide exchange of plants between strawberry breeding programs and the presence of varied parental lines within the same breeding program (Vosough et al., 2015). There is some overlap between the genotypes used in this study and those of Gil-Ariza et al. (2009). Overall, the multivariate analyses showed that the genotypes Kurdistan, Queen Elisa, and No.14 could be considered together as a cluster of high similarity to be crossed with other clusters such as Pajaro and Chandler, or Silva and Gaviota.
In addition to the clustering and biplot results, the heatmap also confirmed the efficiency of screening these genotypes for practical breeding and crossing studies, because the genotypes showed great differences in yield and in important yield components such as fruit number. Finding proper parental lines for a breeding program is vital not only for increasing yield but also for molecular and population genetics and for mapping purposes. Different researchers have used parental lines with a high genetic distance to produce highly variable populations for mapping different quantitative trait loci (Vallarino et al., 2019; Labadie et al., 2020). Because of the high genetic variation detected, our results could also be used in association mapping studies in which individuals from different parental lines are used. In this regard, Pincot et al. (2018) used a population of different strawberry cultivars for mapping Fw1 genes. Furthermore, local farmers in the Kurdistan district, the main source and secondary origin of strawberry in Iran, are narrowing the range of genotypes they grow down to a few cultivars, e.g., Kurdistan, Camarosa, and Queen Elisa; however, these results indicate that genotypes other than those mentioned above are also efficient and economically justifiable for use in their fields.

Conclusions
The combined ANOVA showed that environmental factors strongly influenced the values of the measured traits. High heritability and genetic variation were detected for yield and its components, indicating a high potential for improvement among the strawberry genotypes considered for breeding programs. The biplot and heatmap, together with clustering, proved to be key methods for uncovering the structural associations among genotypes and measured traits, as well as their cross relationships and simultaneous influences on one another. Cluster analysis showed that genotypes of the same geographic origin did not necessarily group in the same cluster, which could be due to environmental variability and the different responses of genotypes derived from different genetic materials. Overall, the multivariate analyses showed that the genotypes Kurdistan, Queen Elisa, and No.14 could be considered together as a cluster of high similarity to be crossed with other clusters such as Pajaro and Chandler, or Silva and Gaviota.

References
Weber, D., Egan, P.A., Muola, A., Stenberg, J.A., 2020. Genetic variation in herbivore resistance within a strawberry crop wild relative (Fragaria vesca L.). Arthropod-Plant Interactions 14, 31-40.