Phenotypic diversity analysis
In order to evaluate the phenotypic variations of salt tolerance in the GWAS population with 217 upland cotton cultivars (Additional file 1: Table S1), three traits related to salt tolerance including relative plant height (RPH), relative shoot fresh matter weight (RSFW) and relative shoot dry matter weight (RSDW) were determined. The mean values, ranges, standard deviations (SD), coefficients of variation (CV) and broad-sense heritability (H2) for these salt tolerance related traits are shown in Table 1. Great differences of the CV in both years were found for the three traits. Overall, the upland cotton cultivars in this GWAS panel clearly exhibited considerable natural variations in the three traits to salt tolerance and displayed very high genetic diversity.
Genetic diversity analysis
GBS produced 53,321 high-quality polymorphic single nucleotide polymorphisms (SNPs) among the 217 upland cotton cultivars, with 47,133 SNPs located in the intergenic intervals and 6,188 SNPs in the coding regions. Of these, 95.8% of the loci (51,060 SNPs) were mapped onto 26 chromosomes of the cotton genome, and were selected for the GWAS analysis (Additional file 2: Table S2). These SNPs were not evenly distributed (Additional file 3: Fig.S1) with an average of 1,964 SNPs per chromosome ranging from 863 to 5,209. Chromosome A08 had the most SNPs (5,209) with the highest SNP density (199 kb/SNP). D04 had the least SNPs (863), but A02 had the lowest SNP density with 680 kb/SNP. The genetic diversity of this population varied from 0.224 (A08) to 0.389 (A13). The polymorphism information content (PIC) varied from 0.192 (A08) to 0.305 (A13) (Table 2). The above results indicated a relatively large span in genetic diversity index and PIC in the cotton genome.
Population structure and LD analysis
It is important for GWAS analysis to control the effect of population structure, because population stratification could eliminate spurious associations between genotypes and phenotypes [26, 27]. STRUCTURE software was used to calculate the Bayesian clustering from K=1 to 10 for five repetitions. LnP (D) value continued to increase from K=1 to K=10 without a significant inflection point (Fig. 1a). However, there was an obvious spike at the value of ΔK= 3 (Fig. 1b), suggesting that the population could be divided into 3 subgroups (Fig. 1c). Taking the corresponding Q matrix at k = 3 as the covariate could reasonably eliminate spurious association effects and improve the GWAS accuracy.
The LD distribution among chromosomes of the 217 upland cotton cultivars was shown in Table 2. The LD decay distance was 869 kb and 1,120 kb when the r2 dropped to 0.2 and 0.1, respectively. The LD decay distance was not evenly distributed among chromosomes, ranged from 435 kb (D01) to 1,157 kb (A06) when r2 was set at 0.2 and ranged from 712 kb (D02) to 1,377 kb (A06) when r2 was set at 0.1, the overall LD decay in the At subgenome was significantly higher than that in the Dt subgenome.
Association analysis
In order to explore the genetic factors underlying salt tolerance, the mixed linear models (MLMs) were performed by simultaneously accounting for population structure and relative kinship matrix to conduct a GWAS. A total of 25 significant associations (-log10p > 4) with 27 significant SNPs located on chromosomes A05, A07, A08, A09, A10, A11, A12, A13, D02, D03, D06, and D09 were detected for the three salt tolerance related traits in the 2016 and 2017 dataset. Eleven associations with 12 SNPs were detected in 2016 (Fig. 2) and nine associations with 9 SNPs were detected in 2017 (Fig. 3). In addition, five associations with 6 SNPs were detected in both 2016 and 2017 dataset. The phenotypic variance explained (PVE) by individual QTL ranged from 1.29% to 7.00% (Table 3).
For RPH, nine significant associations with 10 SNPs were identified on chromosomes A07, A09, A10, A12, A13, D02, D03, and D08. The PVE ranged from 1.50% to 7.00%. Moreover, the association with one SNP on chromosome A13 and the association with two SNPs on chromosome D08 were detected in both years.
For RSDW, eight significant associations with 9 SNPs on A05, A08, A13, and D08 were detected. The PVE ranged from 1.29% to 4.85%. The association with one SNP on chromosome A08 and the association with one SNP on chromosome A13 were detected in both years.
For RSFW, eight associations with 8 significant SNPs on A07, A08, A10, A11 and D06 were detected in 2016 and 2017. The PVE ranged from 2.35% to 5.25%. The association with three SNPs on chromosome A07 was detected in both years.
Pleiotropy was also found in our GWAS results. For example, the significant SNP A07_90682411 was simultaneously detected to be associated with RPH in 2017 and RSFW in both environments. The significant SNP A10_84786908 was simultaneously detected to be associated with RPH in 2016 and RSFW in 2016. The significant SNP D08_49014753 and D08_49080865 simultaneously detected to be associated with RPH in 2016, 2017, and RSDW in 2016. In addition, the associations on A08 for RSFW and RSDW in 2016 were only 70 kb apart. Their confidence interval overlapped with each other.
Identification and preliminary function verification of candidate genes
In general, the LD decay could be used as confidence interval to identify candidate genes. Because the cotton genome has a large LD decay [28, 29], we extracted potential candidate genes within 100 kb of flanking significant markers on the basis of the published upland cotton genome sequencing database [30]. A total of 156 genes were identified in these intervals (Additional file 4: Table S3). Gene ontology (GO) enrichment analysis indicated that “dioxygenase activity”, “oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen” and “oxidoreductase activity, acting on single donors with incorporation of molecular oxygen” were significantly enriched using an false discovery rates (FDR) adjusted P-value of ≤0.05 as the cutoff. Of the 156 genes, 12 were differentially expressed between salt tolerant variety Miscott7913-83 and salt sensitive variety Su 12 according to the previous transcriptome sequencing results [31] (Additional file 4: Table S3). Some of these genes may be associated with salt stress, such as GH_A08G0488 and GH_A10G1620 encoding protein kinase. Protein kinase has been proved to play an important role in salt tolerance in cotton [32]. Another gene GH_A13G0171, which encodes aquaporins (AQPs), was also likely to regulate the salt stress response [33]. The confidence interval contains GH_A13G0171 was simultaneously detected in both years. We found that the salt tolerance of upland cotton cultivars with the G haplotype was significantly higher than that of cultivars with the C haplotype in both years upon a t test (Fig. 4).
The three promising genes (GH_A08G0488, GH_A10G1620 and GH_A13G0171) were selected for preliminary function verification of salt tolerance in cotton. Analysis of gene expression patterns could provide important clues for gene function determination. A quantitative real-time PCR (qRT-PCR) was performed to analyze the expression levels of GH_A08G0488, GH_A10G1620 and GH_A13G0171 in roots and leaves under salt stress treatment in salt tolerant variety Miscott7913-83 and salt sensitive variety Su 12. As shown in Fig. 5, the three genes were induced by salt stress and displayed distinct expression patterns in response to salt stress in salt tolerant variety Miscott7913-83. The three genes had a much higher expression level in roots than in leaves. The gene GH_A13G0171 exhibited a significantly down-regulated expression in both root and leaf tissues after salt stress. The gene GH_A08G0488 exhibited a significantly up-regulated expression in both root and leaf tissues. The expression level of GH_A10G1620 showed an increase in leaf and no significant changes in the root tissues. As shown in Fig. 6, the three genes were also displayed distinct expression patterns in response to salt stress in salt sensitive variety Su 12. The gene GH_A13G0171 exhibited an identical expression pattern in Miscott7913-83 and Su 12. The expression levels of GH_A10G1620 and GH_A08G0488 were not significant different.
To confirm the functional roles of GH_A08G0488, GH_A10G1620 and GH_A13G0171 genes under salt stress, virus-induced gene silencing (VIGS) assay was used to repress expression of these genes in salt tolerant variety Miscott7913-83 plants. The inoculated seedlings were grown in three light incubators at 23°C under a 16-h light and 8-h dark cycle as three biological replicates. At the developmental period when two leaves had formed, the pTRV2 :: GH_A08G0488, pTRV2 :: GH_A10G1620, pTRV2 :: GH_A13G0171 and pTRV2 :: 00 inoculated plants were treated with 350 mM NaCl. After 15 days, the plant height, fresh and dry shoot matter weight were determined and the corresponding relative values were calculated. The transcripts of the three genes in the VIGS leaves were significantly reduced compared to the negative control pTRV2 :: 00 inoculated plants, indicating that they were effectively silenced (Additional file 5: Fig.S2, Fig. 7a). Compared with the control pTRV2 :: 00, no significance effect on their phenotypes was observed in pTRV2 :: GH_A08G0488 and pTRV2 :: GH_A10G1620 inoculated plants, and the plant height, fresh and dry shoot matter weigh of GH_A13G0171-silenced plants were significantly higher than that of pTRV2 :: 00 inoculated plants (Table 4, Fig. 7b), indicating that GH_A13G0171 could reduce seedling tolerance to salt stress. The corresponding markers SNP_A13_1869056 could be applied for improvement of cotton salt tolerance.