1. Phenotypic variation in the natural population
Two traits related to plant architecture and early maturity, the FFBN and HFFBN, were investigated in the natural population. We observed that the FFBN varied from 4.57 to 8.90 cm in E1, 3.56 to 9.89 cm in E2, and 3.49 to 6.33 cm in E3, and the HFFBN ranged from 11.20 to 29.30 cm in E1, 11.11 to 26.00 cm in E2, and 10.22 to 26.00 cm in E3, respectively. The data showed that the two target traits exhibited high phenotypic variation among the 315 lines with a normal distribution (Fig. 1). Correlation analysis revealed significant positive correlations between FFBN and HFFBN in all three environments (Fig. 1). The h2 values of FFBN and HFFBN were 78.73% and 81.64%, respectively, suggesting high heritability of the target traits and precision of the experimental results. The ANOVA results showed that FFBN and HFFBN were significantly influenced by the genotype, environment, and interaction between them (Table S4). These results indicate that FFBN and HFFBN are controlled by heredity, the environment, and the interaction of the two, and that the first plays a dominant role.
2. Marker‑trait associations using the RTM-GWAS
In our previous research, a total of 13,391 high-quality SNP loci were detected in the association panel, and 9,244 SNPLDB loci with multiple haplotypes/alleles were organized as molecular markers (Su et al., 2020a). In the present study, to identify the significant SNPLDB loci related to FFBN and HFFBN, RTM-GWAS analyses were performed using SNPLDB markers constructed previously (Su et al., 2020a). According to the single-environment phenotypic values of FFBN and HFFBN, 28, 37 and 23 significant associations with FFBN were identified in E1, E2 and E3, respectively (Fig. S1), and 51, 37 and 21 significant associations with HFFBN were identified in E1, E2 and E3, respectively (Fig. S2). Moreover, 52 and 54 significant associations with FFBN and HFFBN, respectively, were detected using the total phenotypic values of the two target traits (Fig. 2, Table S5). Among these associations with the total phenotypic values of FFBN and HFFBN, eight associations involving seven SNPLDB loci with high -log10(P) values were found in one or two single-environment associations (Fig. 2, Table 1). Among the seven SNPLDB loci, one locus (LDB_16_37952328) was concurrently associated with FFBN and HFFBN in two different single environments (E1 and E2) and was distinctly labeled with the numbers 1 and 7, respectively (Fig. 2, Table 1). Three SNPLDB loci (LDB_13_75038211_75038228, LDB_12_103278546 and LDB_5_15144433) associated with FFBN were detected in one environment (E1) and were marked with numbers 2, 3 and 4, respectively (Fig. 2a, Table S2). In addition, three SNPLDB loci (LDB_11_112414092_112414298, LDB_18_37434283 and LDB_26_48647858) for the HFFBN were also identified in one environment (E1) and were labeled with numbers 5, 6 and 8, respectively (Fig. 2b, Table 1). These results imply that the seven SNPLDBs involved in the eight associations labeled 1–8 should have stable major associations with FFBN and HFFBN.
3. Effects and matrices of the SNPLDB alleles with significant associations with FFBN and HFFBN
To investigate the allele effects of the significant associations with FFBN and HFFBN, the effect values of each SNPLDB allele for all the significant associations were determined though the RTM-GWAS procedure. Of the 52 significant associations with the FFBN, 38 and 14 single- and multi-locus associations were detected, respectively. Furthermore, 171 SNPLDB alleles, including 87 negative and 84 positive alleles, were identified for the 52 significant associations. The negative allele effect values varied from -0.84 to -0.0045, and the positive allele effect values ranged from 0.0054 to 1.37 (Fig. 3a). Among the 54 significant associations with HFFBN, 40 and 14 contained, respectively, one and multiple loci. Moreover, 170 SNPLDB alleles, including 84 negative and 86 positive alleles, were detected for the 54 significant associations with HFFBN. The allelic effect values of the 84 negative alleles ranged from −1.39 to −0.031 cm, and the effect values of the 86 positive alleles varied from 0.0094 to 4.30 cm (Fig. 3b). Additionally, we found that not all eight labeled associations with larger –log10(P) values were loci with larger heredity effects.
Furthermore, to clearly display the allele effects of all the significant associations, we constructed diminutive allele effect matrices in the natural population. The allele effects of the 52 significant SNPLDB loci related to the FFBN were subsequently built into a 52 × 315 (locus × line) matrix for the 315 upland cotton lines (Fig. 3c). Similarly, the allele effects of the 54 significant SNPLDB loci associated with the HFFBN were organized into a 54 × 315 (locus × line) matrix in the natural population (Fig. 3d). The two SNPLDB allele matrices for FFBN and HFFBN from RTM-GWAS, which are abbreviated forms of hereditary constitution, provide comprehensive information, including positive and negative allele effects of the significant SNPLDB loci for each germplasm.
4. Mining of promising haplotypes/alleles with stable associations with FFBN and HFFBN
To identify promising haplotypes/alleles of FFBN and HFFBN, we selected seven SNPLDB loci involved in eight stable associations with the two target traits (Table 1) for further study. Among the eight stable associations with FFBN and HFFBN, one SNPLDB locus, LDB_16_37952328, which was concurrently related to FFBN and HFFBN, had three allelic variations (CC, CT and TT), and the FFBN and HFFBN of the lines with a positive allele (CC) were significantly greater than those of the lines with a negative allele (TT). In the natural population, for the SNPLDB locus LDB_5_15144433 associated with FFBN, the AA allele accessions had significantly greater FFBN than did the CC allele accessions (Fig. 4). LDB_11_112414092_112414298 associated with HFFBN produced five haplotypes (AACC, AACT, AATT, ACCC and CCCC), and the AACC haplotype lines had significantly larger HFFBN than did the AATT haplotype accessions. There were three alleles (CC, CT and TT) related to HFFBN in LDB_26_48647858, and the CC allele accessions had significantly higher HFFBN than did the AA allele accessions. For the other three associated loci (LDB_12_103278546, LDB_13_75038211_75038228, and LDB_18_37434283), however, there were no significant differences in the FFBN or HFFBN among the lines with different alleles (Fig. 4), implying that the three SNPLDBs might not be truly associated with the FFBN or HFFBN. Keeping mechanized harvesting of cotton cultivars with relatively larger FFBN and HFFBN in mind, the alleles with larger HFFBN were identified as promising haplotypes/alleles. In summary, we identified four promising haplotypes/alleles, namely, LDB_5_15144433 (AA), LDB_11_112414092_112414298 (AACC), LDB_16_37952328 (CC) and LDB_26_48647858 (CC), for five stable associations with FFBN and HFFBN. The results suggest that the genomic regions containing the four promising haplotypes/alleles of the five stable and true associations might be major QTLs regulating the FFBN or HFFBN in cotton.
5. Distribution frequency of promising haplotypes/alleles for the five stable and true major associations with FFBN and HFFBN in cotton accessions
To check the promising haplotype/allele effects of the five stable and true major associations with FFBN and HFFBN, we selected the 50 lines with the maximum and minimum phenotypic values, respectively, to determine the cumulative effect of promising alleles on the target traits. The promising haplotype/allele frequencies in the 50 lines with maximum phenotypic values were markedly greater than those in the 50 lines with minimum phenotypic values for the five stable and true major associations with FFBN and HFFBN (Fig. 5a, b). Moreover, we performed ANOVA for the FFBN and HFFBN values among lines from five diverse cultivation regions (YRR, YZRR, NIR, NSER and FR), and there was a striking difference in the FFBN and HFFBN among these regions (Fig. 5c, d). The mean FFBN values from highest to lowest were YZRR, FR, YRR, NIR and NSER. The ANOVA results showed that the FFBN values of the NSER lines were significantly lower than those of the FR, YZRR and YRR lines, and there was no significant difference in the FFBN between the NIR and NSER lines. We also found that the HFFBN significantly differed among the five diverse cultivated areas (Fig. 5d). The HFFBN of the NIR lines was significantly higher than that of the lines from the YZRR and YRR, and there were no significant differences in the HFFBN among the NIR, YZRR and FR lines. The results demonstrated that the NSER lines had relatively lower FFBN and HFFBN than did the YZRR, YRR and FR lines. Interestingly, we observed that the NIR lines had the highest HFFBN values, while they had lower FFBN values, indicating that the NIR lines had long internodes on the cotton plants. To investigate the regional distribution of promising haplotypes/alleles, we compared the frequency distributions of the four stable and true major SNPLDB loci among cotton lines from diverse cultivated areas. For the three major SNPLDB loci, except for LDB_5_15144433, the YZRR and NIR accessions had higher frequencies of promising haplotypes/alleles, and the YRR and NSER accessions had lower frequencies of promising haplotypes/alleles (Fig. 5e). These results imply that promising haplotypes/alleles might experience artificial selection during cotton breeding procedures.
6. Integration of MQTLs and the SNPLDB loci associated with FFBN and HFFBN
The 27 previous studies, published mainly from 2010 to 2020, that included QTL mapping of FFBN and HFFBN in cotton were systematically analyzed, and 167 and 107 original FFBN and HFFBN QTLs, accounting for 61% and 39%, respectively, were identified among the original QTLs (Table S6). The original QTL distribution was uneven across the26 chromosomes pairs of upland cotton, with approximately 39.05% (107/274) and 60.95% (167/274) on the At and Dt subgenomes, respectively. On chromosomes A11, D01, D03 and D08, 17, 19, 55 and 19 original QTLs were distributed, respectively, while on chromosomes D04 and D05, only two and three original QTLs were found, respectively. To merge the original QTLs from the 27 previous studies, an upgraded consensus map was established via 16 individual genetic maps from the 27 previous papers and the reference genetic map, which contained 5,044 SSR markers with an overall length of 3,830.90 cM and a mean length of 147.35 cM for each chromosome. Then, of the 274 original QTLs, 203 were projected onto the consensus map. The meta-analysis results showed that the 203 mapped QTLs constituted 40 MQTL regions (19.70%), and each MQTL contained at least two mapped QTLs. The MQTLs were unevenly dispersed across different upland cotton chromosomes. The four most common MQTLs were identified on chromosome D01, while no MQTLs were identified on chromosomes A06, A12, D04, D05, D06 or D13 (Fig. 6). Of the 40 MQTLs, 22 were composed of three or more mapped QTLs, and three MQTLs, namely, MQTL-D03.1, MQTL-D03.2 and MQTL-D08.2, contained seven, 17 and six mapped QTLs, respectively.
To integrate the MQTL information with the significant SNPLDBs related to FFBN and HFFBN, we compared the 40 MQTLs with the 52 and 54 SNPLDB loci for the two target traits. The results showed that 11 and 15 SNPLDB loci associated with FFBN and HFFBN, respectively, were distributed among the MQTLs (Fig. 6). Among the five stable and true major SNPLDB associations, two loci (LDB_16_37952328 and LDB_5_15144433), which were involved in three (FFBN-1, FFBN-4 and HFFBN-3), were distributed among the two MQTLs.
7. Identification of potential candidate genes forFFBN and HFFBN
To identify candidate genes related to FFBN and HFFBN, we focused on two stable and true major loci (LDB_5_15144433 and LDB_16_37952328) that were distributed among the MQTLs. Referring to the linkage disequilibrium decay distance (500 kb) of the SNPs (Su et al., 2020a, b), we selected two genome segments, D03:37.45–38.45 Mb and A05:14.64–15.64 Mb, to mine potential candidate genes for FFBN and HFFBN. Based on the reference genome v2.1 (Hu et al., 2019), 142 genes (Table S7) were annotated in the two genomic regions. Then, we analyzed the expression levels of the 142 genes using previously published transcriptome data (Cheng et al., 2021). The data were obtained from two upland cotton cultivars, including one lower-FFBN (5.31) and -HFFBN (13.20 cm) cultivar, ZMS50, and one higher-FFBN (6.69) and -HFFBN (19.69 cm) cultivar, GXM11 (Fig. 7a, b). Using standardized gene FPKM values from the RNA-Seq data, we detected 89 of the 142 annotated genes that were highly expressed in the shoot apical meristems and the youngest leaves during five growth periods. The heatmaps showed that four genes, namely, GH_A05G1602 (GhCIT), GH_A05G1606 (GhE6), GH_A05G1609 (GhDRM1) and GH_A05G1635 (GhGES), were highly differentially expressed in the cotyledon and 1–4-true-leaf stages between ZMS50 and GXM11 (Fig. 7c, d). To validate the four differentially expressed genes (DEGs), we conducted qRT‐PCR experiments on shoot apical meristems and the youngest leaves of the third-true-leaf stage between two low-FFBN and -HFFBN cultivars (ZMS50 and ZMS74) and two high-FFBN and HFFBN accessions (GXM11 and STS458). We found that four genes, GhCIT, GhE6, GhDRM1 and GhGES, were significantly differentially expressed between the shoot apical meristems and the youngest leaves of the third-true-leaf stage between the small and large FFBN and HFFBN accessions (Fig. 7e). The results indicated four candidate genes, GhCIT, GhE6, GhDRM1 and GhGES, for cotton FFBN and HFFBN.
8. Functional analysis of the candidate genes for FFBN and HFFBN in upland cotton
To confirm the biological function of the candidate genes for FFBN and HFFBN, we selected three highly significant DEGs (GhE6, GhDRM1 and GhGES) for the VIGS experiments. Eight days after virus injection, the TRV:GhCLA1 plants became albino (Fig. S3a), confirming that the VIGS assay was successful. Moreover, the expression levels of the three candidate genes were effectively lower in most of the gene-silenced plants than in the TRV:00 plants (Fig. S3b–d). To investigate the relationship between the gene expression levels and the target characteristics, we selected the gene-silenced plants in which the expression levels of the three candidate genes were half or less than those in the TRV:00 plants. Compared with the TRV:00 plants, the TRV:GhDRM1 and TRV:GhGES plants exhibited late-flowering phenotypes (Figure 8a), and the TRV:GhE6 plants exhibited a high HFFBN (Figure 8b, c). The BT and FT of the TRV:GhDRM1 and TRV:GhGES plants were significantly earlier than those of the TRV:00 plants, and the HFFBN and PH of the TRV:GhE6 plants were significantly higher than those of the TRV:00 plants. (Figure 8d). The results demonstrated that GhE6 is a key gene related to HFFBN and PH and that GhDRM1 and GhGES are important genes associated with early flowering in cotton.