Molecular mapping of QTLs for fiber quality traits in Gossypium hirsutum multi-parent recombinant inbred lines

Cotton is a valuable fiber crop which supplies raw material to more than 50 industries and is produced in more than 70 countries worldwide. The superiority of cotton fiber over other crops is primarily dependent on its quality. However, further improvements in fiber length and strength are required for modern processing technology and for cotton to maintain its position in the global market. Association mapping enables identification of QTLs controlling fiber quality-related traits which can be useful in cotton breeding. In the present study, we performed genetic diversity, linkage disequilibrium and association mapping analyses in 157 G. hirsutum multi-parent recombinant inbred lines using a total of 102 SSR markers. The population had depressed genetic variability (14%), a result of inbreeding of modern cotton genotypes. Despite this, we identified 11 significant and stable marker-trait associations for seed cotton yield, lint percentage, fiber length and fiber strength (p < 0.005). We also detected QTL co-localizations with positive and negative marker additive effects. Our results indicate that selection against negative alleles may be as important as selection for positive alleles. Analysis of the effects of allelic combinations at different QTLs revealed significant and stable marker clusters that can be selected for or against to provide maximum quality gains in cotton fiber quality.


Introduction
Gossypium is a complex and highly variable genus consisting of species with different ploidies. According to one estimate, the genus includes 50 species of which 45 are diploids (n = x = 13) and five are wellknown allotetraploids (n = 2x = 26). Gossypium species have been used as a source of desired traits and cotton fiber for thousands of years (since 800-700 BC); however, only four species are domesticated and cultivated: the New World cottons, G. hirsutum L. and G. barbadense L. (n = 2x = 26), and the Old World cottons, G. arboreum L. and G. herbaceum L. (n = x = 13). G. barbadense has good fiber features such as long, stable, fine fibers while G. hirsutum does not. On the other hand, G. barbadense has narrow adaptability due to its requirement for a long growing season while G. hirsutum is adapted to a wider range of environments and produces higher yields (May and Lege 1999). Therefore, G. hirsutum is the main supplier for the textile industry accounting for 95% of global cotton production (Jenkins 2003;Lee 1984). G. barbadense has a small market share along with the other domesticated cottons in the USA, China, India and Egypt (PROTA 2020; Dube 2020).
Upland cotton (G. hirsutum) is a valuable fiber crop which supplies raw material to more than 50 industries (textile, oil, paper, etc.) and is produced in more than 70 countries worldwide (USDA 2020). The market value and superiority of cotton is primarily dependent on fiber quality (Gordon and Hsieh 2007). The most important forces driving the improvement of cotton quality are technological developments in processing, increased quality of life, preference for environmentally friendly and organic products, population growth and increased competition with synthetic fibers. The natural structure of cotton fiber provides many advantages over synthetic fibers including air permeability, comfort and high absorbency (Baytar 2014; Kalabek and Babaarslan 2016); however, increased fiber length and strength are needed for cotton fiber to adequately meet the requirements of modern processing technology. Therefore, significant improvements in fiber quality are essential for cotton to maintain its position in the global market.
Upland cotton has been subjected to selection and cultivation for several thousand years. As a result, domesticated cotton's genome and morphology have undergone dramatic changes compared to wild types (Seyoum et al. 2018;Grover et al. 2020). The use of closely related genotypes and inaccurate selection criteria have reduced the genetic diversity of modern cotton cultivars. This narrow genetic basis is the main obstacle to improvement of quality and productivity in cotton. Characterization of the genetic diversity of cotton germplasm is essential for its use in breeding because it allows selection of the most appropriate combinations of parental genotypes and effective use of genetic materials. Molecular analysis is an effective and efficient way to determine the genetic background of and relationships between individuals. The information gained from such analyses can also be used in combination with phenotypic characterization to map yield and quality parameters such as fiber traits.
Many association mapping studies have targeted fiber quality in Upland cotton using simple sequence repeat (SSR) markers. These include work that examined fiber traits (Shermotov et al. 2010;Baytar et al. 2018b;Zeng et al. 2009), yield traits Baytar et al. 2018a;Jia et al. 2014a, b;Mei et al. 2013) yield and fiber traits ) and fiber quality traits (Abdurakhmonov et al. 2008(Abdurakhmonov et al. , 2009Cai et al. 2014;Nie et al. 2016;Ademe et al. 2017;Iqbal and Rahman 2017;Dong et al. 2019). For example, Cai et al. (2014) identified 107 significant fiber trait-associations in 99 G. hirsutum accessions using 97 polymorphic SSR markers. They reported that 70 SSR loci of 107 associations were significant in more than one environment and, that 36 SSR loci had been identified in previous studies which indicated the stability of the loci. Huang et al. (2018) screened Upland cotton cross populations using 284 SSR markers. They revealed that 54 SSR loci were significantly associated with fiber quality and, that 14 of them matched previously reported quantitative trait loci (QTL). In another association study, 57 significant associations were determined for fiber quality in 305 Upland cotton accessions using 198 SSR markers. Of the significant loci, 34 were identified for more than one fiber trait (Ademe et al. 2017). These and similar association analysis studies will be helpful for fiber quality trait breeding of modern cotton cultivars.
In the present study, we aimed to (I) examine the genetic diversity and ancestral background of 157 G. hirsutum multi-parent recombinant inbred lines (RILs) using a total of 102 SSR markers to detect promising genotypes for breeding, (II) identify marker-trait associations for fiber quality traits in the RIL population, and (III) reveal significant and stable molecular marker combinations to select alleles that provide maximum quality gains in cotton breeding.

Plant material
A total of 157 advanced cotton lines (G. hirsutum) were provided by Ö zaltın Agricultural Enterprises Industry and Commerce Inc. (OAE), Aydın Turkey (Supplementary Information Table S1). The advanced cotton lines were developed by crossing 100 inbred lines with three donor parents, Candia, Carmen and N84-S which were selected for their fiber quality characteristics. The total of300 F1 individuals from three parent were self-pollinated to produce F2 populations of approximately 100 plants of each cross which were grown in the field and observed for their agronomic traits. Selected lines were self-pollinated with observation, selection and self-pollination of progeny continuing an additional six generations to produce the 157 F8 multi-parent recombinant inbred lines that were used in this study (Supplementary Information. Fig. S1).

Field experiments and phenotypic data collection
Field experiments were performed over two growing seasons at (OAE), Aydın, Turkey. Plants were manually harvested in September. The experimental region had sandy, loam soil with 13-14% water capacity and 4-6% wilting point. For field experiments, 10 replicates for each of the 157 cotton lines were planted. Each genotype was planted in a 12 m row with 0.7 m between rows and 0.2 m between individual plants. The population was characterized morphologically and the results were arithmetically averaged for each season. Seed cotton yield was calculated as total weight of seed cotton (kg ha -1 ) before ginning. A roller gin was used to remove seeds from fibers. Lint percentage was calculated based on the formula: [lint(g)/(lint (g) ? seed (g)) 9 100%]. The fibers were incubated at 21°C and 65% relative humidity for 48 h until they reached 7-8% moisture content before fiber traits were evaluated. Fiber length and fiber strength were measured with a USTER-HVI machine based on HVI cotton standards.
Best linear unbiased predictions (BLUPs) were calculated to model the morphological data over two years based on the method described by Wen et al. (2014) using JMP software (JMPÒ, Version 1.4. SAS Institute Inc., Cary, NC, 1989. BLUPs were used in heritability, bivariate pairwise correlation and association analyses. Broad sense heritability was estimated with the formula: H 2 = Vg/Vp * 100, Vg (r 2 g) = genotypic variance = MS(G)-MS (E) /r, Vp (r 2 p) = phenotypic variance = r 2 g ? r 2 e where, Ve (r 2 e) = error variance, i.e. MS(E)/r; and (r = number of replicates) (Bhagasara et al. 2017;Kruijer et al. 2014). MS(G) and MS(E) are the mean squares for genotype and residual error which were calculated with ANOVA analysis. Bivariate correlation coefficients were estimated with Pearson Correlation, twotailed method. All statistical analyses were done with PAWS statistics software (SPSS Inc. Released 2009, PASW Statistics for Windows, Version 18.0, Chicago). SSR genotyping DNA was isolated from young, fresh leaves with the manual DNA isolation protocol introduced by Doyle and Doyle (1987). DNA was quantified using a Nanodrop ND-1000 spectrophotometer and adjusted to 50 ng/ll for further analysis. A total of 102 SSR primer pairs (Supplementary Information Table S2) which were determined to be polymorphic in our previous study (Baytar et al. 2018b), were used to identify loci within the population. Primer information is available at Cotton Database Resources (www. cottongen.org).
Polymerase chain reactions (PCRs) were performed as described by Baytar et al. (2017). PCR amplicons were separated with a capillary electrophoresis Fragment Analyzer TM Automated CE System based on DNF-900-55-DNA-35-500 bp method. PROSize 2.0 analytical software was used to analyze the raw data. Allele sizes were determined by binning fragments into ± 2 base pair bins.
Genetic diversity and population structure SSR data were scored dominantly: ''1'' for presence, ''0'' for absence and ''9'' for missing data. DARwin6 (Dissimilarity Analysis and Representation for Windows) (Perrier and Jacquemoud-Collet 2006) was used to calculate pairwise distances between cultivars with the Dice coefficient and the unweighted neighborjoining algorithm. Pairwise PhiPT values, analogous to Fst, were calculated among clusters in the population by molecular variance analysis with GenAlEx 6.503 Smouse 2006, 2012). STRUC-TURE software (Pritchard et al. 2000) was used to evaluate genetic structure in the population with a Bayesian iterative algorithm. Sub-sets of the entire population were estimated based on different allele frequencies in the data. Sub-set numbers (K) from 2 to 10 were tested with 20 iterations each. Admixture model was used with a burn-in of 50,000 iterations and 300,000 MCMC (Markov Chain Monte Carlo) replications for correct estimation. The best representative K was decided based on the highest DK acquired from the Structure Harvester web-based program (Earl and von Holdt 2012) by implementing the Evanno method (Evanno et al. 2005). Individuals were assigned to the sub-sets based on the ancestral similarity pattern with a cut-off value of 60%.
Linkage disequilibrium and association mapping analysis QTL analysis was performed with TASSEL software v2.1. SSR alleles with frequencies below 0.05 were removed before analysis because minor alleles can bias LD. General linear (GLM) and mixed linear (MLM) models were tested to determine the best fit for association analysis. At first, MLM was implemented with the kinship (K) matrix. Then, MLM (with K) and GLM were each corrected with Q matrix and principal components (PC), separately. They were also corrected with both Q and PC. Q matrix was acquired from population structure analysis. PCs and K were calculated by TASSEL software. The seven methods were evaluated based on the proportions of null pvalues (p0) estimated with QVALUE software (Storey 2002) (False discovery rate, FDR = 0.05). The one with the highest proportion of significant p values (p1 = 1-p0) was the best model for QTL detection. Significance level was set at p B 0.005 for QTL detection. FDR was calculated for p values with QVALUE (p B 0.005 and q-value \ 0.1) (Weller et al. 1998;Benjamini and Yekutieli 2005).
Pairwise linkage disequilibrium (LD) was estimated based on correlation coefficient (r 2 ) (Kruglyak 1999;Ardlie et al. 2002) between all pairs of SSRs with a rapid permutation test with 10,000 shuffles (p B 0.01) using TASSEL 2.1 (Bradbury et al. 2007). Significant LD (p B 0.01 and r 2 C 0.03) was used to generate the LD decay pattern. The chromosomal positions of the molecular markers were based on Blenda et al. (2012) and Fang and Yu (2012) (Supplementary Information Table S2).

Result and discussion
Field experiments and phenotypic data collection We used intraspecific multi-parent RILs in our study because the results of interspecific population QTL analysis are often not effective for breeding of Upland cotton . For example, favorable alleles often come from a related species' genome and must be introgressed into G. hirsutum. This adds several generations to the breeding process. Moreover, markers that are polymorphic in interspecific populations may not be polymorphic in G. hirsutum resulting in markers which are unusable in selection for Upland cotton ). On the other hand, intraspecific populations often have low variation within the population. To overcome these problems, advanced cotton lines were generated from F1 hybrids of 100 different inbred lines and three donor parents: Candia, Carmen and N84-S (Supplementary Information Table S1). Candia and Carmen were released by CSIRO and marketed by CSD in Australia and by Bayer CropScience in the United States. Candia was marketed as Scot71 in Australia and Carmen was marketed as FiberMax989 in the United States. The donors were selected for their good performance when grown in the Aegean region of Turkey. Candia shows good fiber quality with an average of 290.5 kN m kg -1 strength, 27.9 mm length, 4.5 micronaire fineness and 43.9% lint (Harem, 2014). Carmen was reported to have better quality than Candia with 301.27-356.8 kN m kg -1 strength, 30.3-32.0 mm length, 4.4-5.1 micronaire and 41.8% lint. N84-S (Nazilli 84-S) bred in Turkey shows good fiber quality with 78-84 (1000 Lb/inch 2 ) strength, 28.5-29.5 mm length, 4.3-4.8 micronaire and a high ginning efficiency of 44-45% lint (Harem, 2014).

Seed cotton yield
Seed cotton yield of the 157 RILs ranged from 59 to 922 kg ha -1 with an average of 482 kg ha -1 for all genotypes (Table 1). Yield was more variable in the RILs than fiber characteristics (CV 29%). The majority of the RILs (59%) showed more seed cotton yield from 1 to 92% than the population average. The four highest yielding genotypes (42-92% more than the average yield) were 13 OZ 034, 13 OZ 077, 13 OZ 104 and 13 OZ 090 whose donor parents were Carmen or Candia (Supplementary Information Table S1). These RILs may be used as high-yielding material in breeding programs. Seed cotton yield was significantly correlated only with fiber strength (r = 0.23, p \ 0.01) indicating that strong-fiber varieties yielded more than weaker ones (Table 2). Thus, fiber strength is an important trait affecting yield and can be used as a selection criterion along with other parameters. Significant positive correlations were also reported between seed cotton yield and fiber strength in other work (r = 0.63; Yaqoob et al. 2016), (r = 0.28; Azhar et al. 2004). However, this is not consistent across the literature as negative correlations were detected for these traits in other work (Karademir et al. 2010;Mendez-Natera et al. 2012;Tang et al. 1996). Seed cotton yield is greatly affected by genotype, temperature and growing conditions. For example, Baytar et al. (2018a) detected a significant decrease in seed cotton yield (27%, p \ 0.001) with insufficient soil moisture. These factors may account for the contradictions among fiber quality studies in previous reports.

Lint percentage
Lint percentage is the percent of lint obtained from seed cotton. Lint percentage significantly varied between genotypes (p \ 0.05) (Supplementary Information Table S3) ranging from 37 to 47% with an average of 41% over two years (Table 1). In our study, almost half of the genotypes (45%) had more lint yield with 1 to 10% more lint compared to the mean of population. There were statistically significant differences for lint percentage between the cross combinations using Carmen versus Candia (p \ 0.0001) as well as Carmen versus N84-S (p = 0.03) (Supplementary Information Table S4) such that the Candia and N84-S combinations had slightly but significantly higher lint yield (Supplementary Information Table S5 and Table S6). Some advanced lines stood out with the highest lint percentage: 13 OZ 111, 13 OZ 118, 13 OZ 110 and 13 OZ 109. Of these lines, three were from same cross (Candia x 06OZ70-4) which indicates the promise of this combination for improvement of lint yield. However, the fiber characteristics of these lines were poorer than average. In accordance with this result, we detected negative correlations between lint percentage and fiber length (r = -0.33, p \ 0.01) and fiber strength (r = -0.21, p \ 0.01) indicating that long fibers were associated with reduced lint percentage. Research has consistently indicated negative correlations between lint percentage and fiber parameters (fiber length and strength) (Karademir et al. 2010;Ulloa and Meredith, 2000;Rakshit et al. 2010;Stroman 1949). One exception is the work of Khalid et al. (2018) who saw a significant positive correlation between lint percentage and fiber length. Given the preponderance of evidence, it seems that selection for higher lint percentage is likely to be accompanied by negative linkage drag of undesired fiber quality.  Table S3). Low heritability was also detected in other studies (30%, Jarwar et al. 2018;32%, Dhivya et al. 2014). Taking all of these results into consideration, phenotypic selection based on lint percentage alone may not be effective for improvement of productivity. This is consistent with the statement by Kearney (1912) that lint percentage should not be used directly to evaluate productivity as it can often lead to unintended consequences.

Fiber length
All genotypes had long or medium-long fibers ranging from 26.1 to 33.8 mm with an average of approximately 30.0 mm over both years. The majority of the RILs (87%) were classified as long-fiber types (C 29 mm) (Supplementary Information Table S1). There was a statistically significant difference between the averages for the three cross combinations (p = 0.001) such that the progeny of Carmen tended to have longer fibers than those of Candia. This was expected given that Carmen has significantly longer fibers than Candia. The genotypes with the longest fibers were 13 OZ 043, 13 OZ 132, 13 OZ 145 with average fiber length of 32 mm, similar to Carmen. Indeed, two of these RILs were from crosses with Carmen. In addition to its long fibers, RIL 13 OZ 043 also had strong fibers. Consistent with these findings, fiber length correlated positively with fiber strength (r = 0.28, p \ 0.01) such that long fibers tended to be stronger than shorter ones. Similarly, other studies found significant correlations such that fiber length correlated positively with fiber strength (r = 0.22-0.77, p \ 0.01) (Baytar et al. 2018b;Khalid et al. 2018;Karademir et al. 2010;Wan et al. 2007;Zhang et al. 2005;Asif et al. 2008)). Simultaneous improvement of fiber features is a highly desirable goal for cotton breeders and the correlations between these features show that they can be improved together without unwanted linkage drag. Stronger, longer fibers positively affect spinning consistency as they can be twisted around each other with higher efficiency in processing (Chee et al. 2005;Moore 1996). The early stages of fiber growth are highly sensitive to differences in temperature, growth area and especially watering regime (Reynolds and Killough 1933;Gipson and Joham 1969;Hanson et al. 1956). For example, we previously reported that reducing irrigation by 50% caused shorter fiber (5%, p \ 0001) development (Baytar et al. 2018b). This sensitivity was reflected in our current study by a significant year effect (p = 0.02) (Table S3) and also a moderate heritability (43%) (   stronger than the population mean. Two of these lines were Carmen crosses; however, as already indicated, there were no statistically significant differences between the means for the three cross combinations. Fiber strength is a complex feature and is reported to be influenced by environmental factors more than fiber length (Pope and Ware 1945;Hanson et al. 1956) in agreement with our finding that this trait had the moderate heritability estimate (42%) and the highest year effect (Supplementary Information Table S3). Temperature differences, soil moisture availability and especially sunlight intensity have great effects on fiber strength (Hanson et al. 1956). Accordingly, we previously found that insufficient soil moisture significantly reduced fiber strength (Baytar et al. 2018b). In this study, there was a 13.5% reduction in average precipitation during the second season's growing period (April-September) for Aydin (Supplementary  Information Table S7) (WWO 2013). Thus, one of the factors accounting for the improved fiber strength in the second year may be related to unexpected rainfall during the previous growing season.
It is important to highlight individuals with multiple favorable traits. Such individuals are good candidates for simultaneous improvement of fiber quality parameters. The genotypes 13 OZ 120 (Carmen X 06OZ569) and 13 OZ 104 (Candia X 06OZ653) had good seed cotton yield, fiber length and fiber strength; while 13 OZ 146 (Carmen X 06OZ653) and 13 OZ 087 (N84-S X 06OZ720) had longer and stronger fibers than the population averages.
Genetic diversity and population structure A total of 102 SSR primer pairs produced 379 loci with an average of 3.7 alleles per marker across 157 multiparent Upland cotton RILs. Eight individuals were discarded from genetic analysis due to a high proportion of missing data in pairwise comparisons ([ 50%): 13 OZ 035, 13 OZ 042, 13 OZ 044, 13 OZ 076, 13 OZ 101, 13 OZ 123, 13 OZ 130 and 13 OZ 139. Unweighted neighbor-joining (NJ) dendrogram analysis revealed three main sub-clusters (Supplementary Information Fig. S2): Cluster1 with 78 (52.3%), Cluster2 with 62 (41.6%) and Cluster3 with 9 (6%) individuals. There was a strong correlation between the Dice coefficient dissimilarity matrix and the relationships demonstrated by the dendrogram (r = 0.89) based on a Mantel test. The progeny of the cross combinations were distributed over the three main clusters without any tendency to group based on donor parent (Candia, Carmen or N84-S) (Supplementary Information Table S8) indicating that recipient parents (inbred line parents) contributed greatly to the genetic differentiation of the RILs.
In this study, we used a three parent-based RIL population in an attempt to obtain higher genetic variability in the population, however, genetic analysis revealed low diversity with a mean dissimilarity of only 14%. This indicates a genetic bottleneck within current cotton genotypes due to thousands of years of selection and over-cultivation of limited genetic material (Noumkina et al. 2019;Chaundry et al. 2010;May Ol 1999). In contrast, when germplasm instead of a bi-parental population was used, higher level of genetic diversity (38%) was detected (Baytar et al. 2017). Low and moderate levels of genetic diversity were also reported for Upland cotton in many other studies regardless of population model (Tyagi et al. 2014;Abdurahkmanaov et al. 2008;Lacepe et al. 2007;Fang et al. 2013;Fang et al. 2014;Hinze et al. 2012).
Molecular genetic analysis reveals existing diversity among individuals as well as the ancestral background of the population. Such knowledge is particularly useful in guiding selection of desirable genotypes especially when pedigree information is lacking. We identified the highest dissimilarities between lines from Candia 9 06 OZ 770-4 and Candia 9 06 OZ 602 (27-29%) (Supplementary Information Table S9). Progeny from Candia 9 06 OZ 770-4 were also found to have the highest ginning efficiency which indicates that these lines have breeding potential. The least diversity was between 13 OZ 013 and 13 OZ 010 with 95% similarity. Within the cross combination groups, genetic difference was low with a mean of 14% dissimilarity for Candia combinations and 13% for both Carmen and N84-S combinations. AMOVA analysis revealed that nearly all of the variation in the RILs (98%) resided within the cross groups rather than among groups (2%) (PhiPT = 0.019, p = 0.001) (Supplementary Information Table S10).
Based on the evaluation of population structure, the best representative number of sub-groups (K) was inferred to be three for the population because the highest DK peaked at K = 3 (Figs. 2 and S3a-b). There was also a meaningful signal at K = 6 which was also distinguishable in the NJ tree supporting that three main clusters of the population could be further separated into six smaller groups. Similar subtle substructures were also observed in other clustering analyses (Zhao et al. 2014;Ersoz et al. 2007;Kuzay et al. 2020). However, the Q matrix for K3 was used for association analysis as it is recommended that the minimum K is used in order to include as many loci as possible. The individuals were assigned to the three sub-groups based on a membership probability cutoff value of 60% (Supplementary Information Table S11). Sub-groups 1, 2 and 3 contained 23 (14.6%), 31 (19.8%), and 39 (24.8%) individuals, respectively. Sixty-four (40.8%) individuals remained admixed with probability values less than 60%. Admixed individuals were not assigned clearly to a group as they possess several ancestral sources indicating that they are more genetically variable than the other lines. The best example of this was 13 OZ 103 (Candia 9 06 OZ 602) which was admixed and the most divergent line based on diversity analysis (Supplementary  Information Table S9). The results of both structure and diversity analysis were compared to judge their consistency (Fig. 1). This comparison showed that 31.8% of the population was similarly grouped in both clustering methods (Supplementary Information  Table S11). While this value is low, it is not unexpected given the fact that the two methods use different approaches to determine genetic relationships: the dendrogram was drawn using a distancebased method while population structure was determined with a Bayesian approach.

Linkage disequilibrium
Linkage disequilibrium (LD) refers to statistically non-random associations between pairs of loci in the genome. Genome-wide LD must be calculated because it determines the quality of association mapping and reveals the effects of genetic components (recombination, genetic migration, natural selection, etc.) on the population through evolutionary history (Shen et al. 2019;Silva-Junior and Grattapaglia 2015). In this study, 8.4% and 3.1% of SSR loci were in significant LD at p B 0.05 and p B 0.01 with average r 2 of 0.13 and 0.18, respectively. Approximately 6.4% of SSR loci remained in significant linkage at r 2 C 0.05 with an average r 2 of 0.16. In addition, 2.5% of loci were in LD with r 2 C 0.1 (average r 2 = 0.30). The extent of LD was low and similar to results of previous studies that used different panels of Upland cotton germplasm: 1.83% (p \ 0.001) of loci in 158 elite germplasm lines (Zhao et al. 2014), 2.93% (p \ 0.01) of loci in 81 accessions (Zhang et al. 2013), 9.4% (p B 0.05) of loci in 241 cotton lines (Qin et al. 2015) and 4% (r 2 C 0.05) of loci in 335 varieties (Abdurakmanov et al. 2009). Although LD values vary considerably with the different thresholds and germplasm used, an overall low level of LD was observed and is hypothesized to result from the high recombination rate in allopolyploid cotton (Brubaker et al. 1999). Because of this, it is important to include population structure (Q matrix) and kinship (K) in the association models for this crop (Qin et al. 2015). As a result, we corrected GLM with both Q and PC to remove spurious linkages in association mapping.
A reduced rate of recombination is expected between loci in significant LD due to their physical adjacency on the same chromosome. However, 6.6% of unlinked loci were in significant LD (p B 0.05) with an average r 2 of 0.35 which was higher than the average LD of linked loci (0.30). Similar findings were previously reported in cotton (Qin et al. 2015;Mei et al. 2013;Abdurakhamanov et al. 2008;Fang et al. 2013) indicating that not only physical linkage but also other factors produce significant LD in this species (Abdurrakmanv et al. 2009;Nordborg et al. 2002). These factors include selection for certain traits and the concomitant indirect selection of other loci through generations of breeding, use of common parents in the population and genetic bottlenecks (Abdurakmanov et al. 2009;Strich et al. 2005;Huttley et al. 1999). Molecular explanations for this phenomenon include the co-segregation of protein complex subunit genes which must be inherited together through generations in order to be functional in the cell (Miglani 2002). In addition, multi-stage pathways in which different genomic loci contribute to the same mechanism generate LD between unlinked loci in the genome.
LD decay is another useful parameter for association analysis. The distance for which the LD clearly decayed gives information about how the linkage between SSR loci was conserved over generations. LD decay is also used to deduce theoretically marker density and the fitness of the QTL detection method. In this study, LD decay was plotted based on genetic distance at a significance level of p B 0.05 and r 2 [ 0.03. It decayed around 60 cM at r 2 = 0.1 ( Supplementary Information Fig. S4). The tetraploid cotton genome is 5200 cM (Paterson and Smith, 1999) and, based on the extent of LD decay, 90-100 markers are sufficient for association mapping in this population. Therefore, the 102 markers used in this study to detect QTLs should be sufficient. The LD decay observed in our population was similar to that seen by Jia et al. (2014a, b) with 55-60 cM at r 2 = 0.1 (Fig. 2).

Association mapping
Both Q and PC-corrected GLM produced the highest proportion of significant p values (p1) among the seven models tested (Supplementary Information  Table S12). Therefore, the GLM (Q ? PC) method was applied to detect QTLs at a significance level of p B 0.005 and q \ 0.1. As a result, association analysis produced 11 SSR loci significantly associated with four traits: seed cotton yield, lint percentage, fiber length and fiber strength ( Table 3). PVE of individual SSR loci was relatively low ranging from 7 to 12%. This was expected due to the complex nature of quantitative traits which are under the control of multiple genes that have small effects on the trait phenotype and are highly sensitive to environmental factors (Mackay, 2009).
BNL2495 215 was the only SSR locus which was significantly associated with seed cotton yield (p = 1.8 9 10 -4 ). It had a positive effect, increasing seed cotton yield approximately 51 kg ha -1 and accounting for 9% PVE in this trait.
Five marker alleles were associated with lint percentage with PVE values ranging from 8 to 11%. BNL1495 236 was the most statistically significant locus (p = 6.3 9 10 -5 ) and had a positive allelic effect with the highest phenotypic variance. Two alleles of DPL322 marker, DPL322 227 and DPL322 218 , had positive and negative effects on lint percentage, respectively. This indicates that if this marker is used for selection, the allele with negative effects should also be considered (i.e. selected against) for improvement of lint percentage.
Three marker loci were linked to fiber length. BNL3545 181 was associated with this trait and had higher significance (p = 3.3 9 10 -5 ) and PVE (12%) than the other loci. The allelic loci of BNL3545 had opposite additive effects on fiber length: BNL3545 181 was associated with increased length while BNL3545 170 was associated with decreased length.
Two SSR loci were significantly associated with fiber strength with positive effects on the trait. BNL1495 242 and BNL3545 120 had 8% and 7% PVE.

Marker co-localization
Several QTLs were significantly associated with multiple traits in this study. This may indicate that a single gene/locus has a role in controlling more than one trait (pleiotropy) or that linked genes/loci are responsible for biological pathways inherited together in cotton. Such QTL hotspots harbor potential for deeper investigation to reveal complicated QTL networks and the mechanisms underlying traits .
Different alleles of three markers were associated with more than one trait: BNL1495, BNL3545 and JESPR274, each with three alleles (Table 3). BNL1495 was linked to fiber length with a negative effect and, to both lint percentage and fiber strength with positive effects. This is compatible with the finding of a negative correlation between lint percentage and fiber length in our study. This may indicate that there is unwanted linkage drag between lint percentage and fiber length which makes it challenging to increase fiber quality without sacrificing ginning yield.
Two alleles of BNL3545 were associated with fiber length (BNL3545 181 ) and strength (BNL3545 120 ) with positive marker effects indicating a pleiotropic interaction that was consistent with the positive correlation identified between them in this study. On the other hand, a third allele of BNL3545 had a negative allelic We compared our results with a reference genetic map (Fang and Yu 2012) to find significant QTLs in close proximity with our markers (less than approximately 10 cM) for the same trait or other fiber features. This is useful as it compiles the QTL results for multiple cotton traits which must be simultaneously improved by breeders (Supplementary Information Table S13).
BNL2495 marker loci (73.26 cM on chromosome 26) linked to seed cotton yield in our study were also detected for seed cotton yield by Yu et al. (2013) indicating this is a reliable QTL for fiber yield. Moreover, this region was also associated with fiber strength (Wang et al. 2015) and micronaire (Zhang et al. 2012). Liang et al. (2013) also identified a QTL for fiber uniformity at both BNL2495 and the nearby marker DPL070 (0.57 cM away).
BNL1495 which was linked to lint percentage, fiber length and strength in this study was similarly reported as linked to lint yield and percentage (Wu et al. 2009;Ma et al. 2019), fiber length and strength (Frelichowski et al. 2006;Liang et al. 2013;Abdullaev et al. 2017), as well as, fiber elongation (Park et al. 2005;Frelichowski et al. 2006). Thus, this genomic region houses a common, stable QTL cluster for fiber quality. BNL1495 was mapped on chromosome 13 (53.7 cM) and within 5.8 cM there is another QTL cluster for fiber strength and length harboring DPL0687 (55.6 cM) (Liang et al. 2013) and GH678 (59.2 cM, fiber strength only) (Yu et al. 2013).
JESPR274 on chromosome 9 was associated with lint percentage in this study. We previously identified JESPR274 as associated with plant structural features (Baytar et al. 2018b) as well as lint and seed cotton yield (Baytar et al. 2018a). This marker was also linked to lint index ) and fiber maturity (Qin et al. 2015). Together with two adjacent markers, DPL783 ) and MUSB958 (Yu et al. 2014), the region of JESPR274 is an informative QTL for lint percentage within an interval of 5.7 cM.
DPL322, associated with lint percentage in this study, was linked to lint percentage and lint yield in previous work (Baytar et al. 2018a) and also proposed as a useful marker for improvement of fiber traits (Saeed and Elçi 2017). DPL322 is on chromosome 15 (51.4 cM) where marker-fiber trait associations clustered within 3.1 cM: DPL003 was linked to seed cotton weight , MUSB1267 and TMB2931 linked to fiber fineness and short fiber content (Yu et al. 2014) and BNL2496 associated with seed cotton weight, seed index  and fiber uniformity .
BNL3545 (119 cM on chromosome 2), identified for fiber length and strength in our study, was also linked to seed cotton weight (Liu et al. 2018). Moreover, it mapped in relatively close proximity with BNL3972 (at 106.5 cM) and BNL1434 (at 96.4 cM) which were associated with fiber elongation ) indicating a QTL for fiber quality in this region. The intervals between BNL3545 and its neighboring markers related to fiber traits are more than 10 cM due to the insufficient saturation of this region of chromosome 2. However, this distance is within the range of significant LD level determined by this study. BNL3545 was also mapped on chromosome 14 (7.6 cM) which is homeologous to chromosome 02. This marker mapped close to C2-00,118 (16.4 cM) which was linked to fiber length and micronaire (Huang et al. 2018) and TMB1931 (11.49 cM) linked to 2.5 span length and micronaire (Shang et al. 2016). BNL3545 is also close to TMB0071 (6.3 cM) which is linked to micronaire and lint index (Shang et al. 2016). All of these loci contribute to a 9 cM interval for fiber quality traits.
The identification of common and stable QTLs is important for marker-mediated selection in breeding.
In this respect, our results showed that the study was successful in validating common markers.

QTL confirmation
The population was divided into subsets containing different combinations of favorable SSR marker alleles for each trait. The trait means of these subsets were compared to gain insight into how to use the alleles effectively in both positive and negative molecular selection for breeding of improved fiber quality.
For seed cotton yield, the individuals were divided into two groups based on presence or absence of SSR locus BNL2495 215 . ANOVA analysis indicated significantly higher yield in the group which had the allele linked to improved seed cotton yield (p = 0.021) (Fig. 3) (Supplementary Information Table S14A).
For lint percentage, the individuals were first divided into three groups based on the presence of positive alleles (BNL1495 236 and DPL322 227 ): groups 1 and 2 with one and two positive alleles, respectively, and the third group (0) with no positive alleles. The statistical analysis showed that there was a significant increase in lint percentage as the number of positive alleles increased (p = 0.001) (Fig. 4) (Supplementary  Information Table S14B). The individuals were then divided into four groups based on the absence of negative alleles (DOW050 205 , JESPR274 161 and DPL322 218 ): groups 1 and 2 lacked one and two adverse alleles (from any of the linked loci), respectively; while group 3 lacked all three adverse alleles. There was a statistically significant decrease in lint percentage as the number of adverse alleles increased (p = 0.0001) (Fig. 4) (Supplementary Information Table S14C) with no significant differences between alleles. In other words, each allele was equally effective at reducing lint yield. These results indicate that negative selection over each adverse allele had a positive impact on lint percentage, and that, eliminating all adverse alleles resulted in higher yields.
For fiber length, the population was separated into two groups. Individuals having the favorable allele (BNL3545 181 ) and lacking two adverse alleles (BNL3545 170 and BNL1495 231 ) were clustered in one group while all other allelic combinations remained in the second group. As a result, the first group had significantly longer fibers than the other group (p = 0.029) (Fig. 5) (Supplementary Information Table S14D). We also tested for significant differences when individuals with either presence of the positive allele or absence of any negative allele were clustered in the desired group with the remaining individuals making up the second group. In this case, the first group had significantly longer fiber than the other group (p \ 0.0001) indicating that positive and negative selection were equally effective (Fig. 5) (Supplementary Information Table S14E). In addition, negative selection of BNL3545 170 and BNL1495 231 gave similar results (p [ 0.05) indicating that selection over either of the two adverse allele resulted in significantly longer fiber length.
There were two favorable alleles for fiber strength (BNL1495 242 and BNL3545 120 ). Therefore, individuals that had both alleles were combined and the remaining individuals formed the second group. Expectedly, the first group had statistically significant stronger fibers than the second one (p = 0.004) (Fig. 6) (Supplementary Information Table S14F). In addition, we grouped the individuals that had any two favorable alleles (BNL1495 242 or BNL3545 120 ) while the rest of the individuals were assigned to the opposing group. Similarly, there was significantly greater fiber strength in the first group (p = 0.003) (Fig. 6)

Conclusion
Conventional breeding has encountered many challenges in the development of elite varieties such as inbreeding depression, unpredictable interaction of genotype versus environment and the complex nature of quantitative traits that do not follow the laws of Mendelian genetics. Breeding is also time-consuming and laborious work. Molecular genetic advances promise an additional tool that can be leverage along with traditional breeding methods. Molecular markerbased strategies save time and workload and help achieve success in breeding.
In this study, we clearly detected the depression of genetic variability that has resulted from inbreeding of modern cotton genotypes indicating that genetic diversity must be urgently broadened. Our threeparent RIL population contained individuals with promise for breeding of fiber quality traits which are highly complex features that are affected by indirect interactions between genetic components, pleiotropy and the environment. It is not feasible, nor necessarily desirable, to reveal all minor effect-genes for fiber traits. Thus, we identified a manageable number of reliable and stable QTL for fiber quality traits and confirmed the significance of positive and negative selection based on these QTLs. We also revealed important QTL hotspots for fiber related traits which can be useful in breeding. Based on this work, we recommend that: I. fiber quality traits are improved simultaneously rather than individually due to the effects of negative linkage drag; II. molecular markers are integrated into classical breeding programs to direct improvement; and III. negative selection can be as important as positive selection in marker-assisted breeding.
Authors' contribution AAB: molecular characterization, data analysis, interpretation of data, manuscript drafting and revision; CP, VS: field experiments; AF: conception and design, interpretation of data, manuscript revision; SD: conception and design, manuscript revision; All: final approval of the version to be published.
Funding This research was funded by grants from The Scientific and Technological Research Council of Turkey to Sami Doganlar (TÜ BİTAK 119O677) and Ceng Peynircioglu (TÜ BİTAK TEYDEP 306O450).
Data availability Data are available upon request.

Declarations
Conflict of interest The authors declare that they have no conflict of interest or competing interests.
Ethical approval The authors declare that they have followed all necessary ethical guidelines. Human and animal subjects were not used in this work. Fiber strength (kN m kg -1 ) b a * * Fig. 6 a Significantly stronger fibers in Group 1 which contained individuals with two favorable alleles for fiber strength (BNL1495 242 and BNL3545 120 ) compared to the rest of the population (Group 0). b Significantly stronger fibers in Group 1 which contained individuals that had any two favorable alleles (BNL1495 242 or BNL3545 120 ) as compared to the rest of the individuals (Group 0