Phenotype analysis of cottonseed oil content traits
The descriptive statistics of phenotypic data for cottonseed oil content from the parents and the RIL populations are summarized in Table 1. There is a significant difference between the average cottonseed oil content of parents 0-153 and sGK9708 (P < 0.01), which ranged from 28.09% to 31.64% (mean ± SD, 30.57% ± 1.27), and 31.67% to 35.74% (mean ± SD, 32.86% ± 1.71), respectively. While the average cottonseed oil content of the RIL population increased from 25.09% to 28.49%. Through comparison of the mean values of cottonseed oil content, we observed a relatively great variation between the parents and the RIL population, and the mean values of oil content for RIL population existed transgressive segregations lower than the low parent. Variability in the RIL population according to the standard deviation values was slight. The phenotypic data of seed oil content trait in the RIL population showed continuous distribution, and the absolute values of skewness were all less than one, which demonstrates that the segregations of cottonseed oil content trait fit a normal distribution. Additionally, the ANOVA indicated that the genotype, environment, repeat, and genotype × environment / environment × repeat interaction had significant effects on seed oil content (P < 0.01), and the estimated h2 was 91.61% (Additional file 1: Table S1). These results suggested that the seed oil content is highly inherited in upland cotton, although environmental factor should be considered in oil accumulation.
Mapping QTLs for cottonseed oil content traits across multiple environments
Detection of main- and small-effect quantitative trait loci (QTLs) for cottonseed oil content Based on the consensus genetic map of G. hirsutum L. in our previous studies (8295 markers spanning a total distance of 5197.17 cM) and the phenotype data of cottonseed oil content across eight environments, a total of 39 QTLs located on eighteen different chromosomes were identified in the upland cotton 0-153 × sGK9708 RIL population by CIM method in WinQTLCart2.5 and GCIM method in QTL.gCIMapping. The LOD values of these QTLs ranged from 3.02 to 5.96, and the individual phenotypic variance (r2) ranged from 4.43 to 14.15 (%) (Additional file 2: Table S2). Of these, five QTLs were stably detected in at least three environments (Table 2). And these stable QTLs were mainly distributed on upland cotton chromosomes At11, At12 and Dt4, Dt9, and Dt11. Intriguingly, we found that qOC-chr17-3 is stable in four environments, while other four stable QTLs are in three environments. Moreover, the additive effects of QTLs qOC-chr12-1 and qOC-chr22-1 detected by GCIM method were relatively smaller than that of other QTLs identified by CIM method.
Detection of QTL-by-environment interactions for cottonseed oil content The above datasets in QTL mapping were also used to detect QTN-by-environment interactions (QEI) effects using mix model-based composite interval mapping (MCIM) module in QTLNetwork 2.1 software. Among the stable QTLs, a total of 3 QTLs showed significant Q × E interactions (P < 0.05), of which qOC-chr11-1 at 15ale (QEI = -0.002, P < 0.05), 14ay (QEI = -0.026, P < 0.05), and 15shz (QEI = -0.197, P < 0.05), qOC-chr12-1 at 14ale (QEI = 0.0346, P < 0.05), 14kel (QEI = -0.0361, P < 0.05), and 15kel (QEI = 0.044, P < 0.05), and qOC-chr17-3 at 15kel (QEI = -0.080, P < 0.05), 14shz (QEI = 0.074, P < 0.05), 15ay (QEI = -0.119, P < 0.05) and 14ale (QEI = -0.092, P < 0.05). qOC-chr12-1 and qOC-chr17-3 had antagonistic pleiotropic effects in different environments (Additional file 2: Table S2). Moreover, the locus qOC-chr17-3 was found to be significantly associated with cottonseed oil content in four different environments (P < 0.05).
Identification of candidate genes related to cottonseed lipid synthesis
To further mine candidate genes associated with cottonseed oil accumulation, we adopted an integrated method that combines the orthologs sequence alignment with the Arabidopsis oil-related genes and KEGG pathway enrichment analysis for those candidate genes within stable QTL regions. A total of 832 genes were acquired within the confidence interval of these stable QTLs (Additional file 3: Table S3). A KEGG pathway enrichment analysis was subsequently carried out to obtain the top 20 KEGG pathways of those genes (Fig. 1a). Of these, six enriched pathways with 43 candidate genes were found to be associated with oil synthesis. These enriched pathways included “biosynthesis of amino acids”, “valine, leucine and isoleucine biosynthesis”, “glycine, serine and threonine metabolism”, “biosynthesis of unsaturated fatty acids”, “fatty acid metabolism”, and “fatty acid biosynthesis” (Fig. 1a; Additional file 4: Table S4). The network relationships analysis of enriched pathway terms completed with metascape (http://metascape.org/gp/index.html) also showed that above-mentioned six pathways have interconnectedness, where terms with a similarity > 0.3 (Fig. 1b).
To date more than 700 Arabidopsis acyl-lipid metabolism genes have been detected, of which 135 are directly involved in the processes of de novo FA synthesis, TAG synthesis and lipid droplet formation. We further compared the 43 potential oil candidate genes of cottonseed with the 135 Arabidopsis oil-related genes using BLAST tool to validate these common significant potential candidate genes associated with oil content in upland cotton from a different perspective. The results showed that there are 24 potential candidate genes that are involved in cottonseed oil accumulation based on Arabidopsis orthologous oil-related genes identified previously (Additional file 5: Table S5).
Dynamic expression patterns of candidate genes in developing cottonseed ovules
To investigate the dynamic expression patterns of 24 candidate lipid genes in developing cottonseed ovules, we analyzed the genes expression level at ten different ovules developmental stages (–3, –1, 0, 1, 3, 5, 10, 20, 30 and 35 DPA) of TM-1 using transcriptome sequencing data (Additional file 6: Table S6; Fig. 2). Among them, 5 genes were not expressed during cottonseed ovules developmental stages. The gene expression levels heatmap of other 19 potential candidate genes showed that they were clustered into three branches based on gene expression profiles after Z-score standard normalization (Fig. 2). It was noted that the gene expression levels in branches with red dot were relatively lower during early stage (–3 DPA to 1 DPA), while higher at middle stage (3 DPA to 10 DPA). Surprisingly, most genes showed a sharp decline in expression at mature stage (30DPA to 35 DPA) (Fig. 2).
Transcription factors and microRNAs regulatory network construction of oil candidate genes
Determining gene partners by gene network is an essential step toward understanding protein-coding genes function and identifying relevant biological pathways. To clarify the regulatory networks of potential candidate genes in upland cotton, regulatory associations between microRNAs (miRNAs) and their target genes, transcription factors (TFs) and their host genes, as well as TFs and miRNAs were macroscopically investigated. We directly acquired TFs and miRNAs of G. hirsutum L. from PlantTFDB v5.0 (http://plantregmap.cbi.pku.edu.cn/) and miRBase v22.0 (http://www.mirbase.org/), respectively. Together with the results of candidate oil genes mentioned above, total four cotton TFs are obtained, namely GhLEC1 (Gh_A11G1560), GhHSL1 (Gh_A12G0518 and Gh_D04G0979), GhPKL (Gh_D04G1092), and GhABI4 (Gh_D04G1143) (Additional file 7: Table S7). In addition to cleaving mRNA, plant miRNA reportedly inhibits the translation of target genes. The miRNAs target genes among 19 cotton lipid candidate genes were predicted using psRNATarget, a plant small RNA target analysis server (http://plantgrn.noble.org/psRNATarget/) (Additional file 8: Table S8). A regulatory network of miRNAs and its target genes that defined the regulation in the complex process of lipid synthesis was constructed based on those results, which showed the interactions between seventeen miRNAs and seven target genes, which including one TFs (GhHSL1, Gh_A12G0518) (Fig. 3, Additional file 8: Table S8). There are 6 hub nodes with connectivity no less than 3 in the regulatory network, of which 2 miRNAs (ghr-miR2949b and ghr-miR2949c) and 4 candidate target genes (GhACP2, Gh_D11G2438; GhLPEAT2, Gh_D04G1019; GhCICDH, Gh_A11G1562; and GhHSL1, Gh_A12G0518) (Fig. 3). Of those, GhHSL1 was targeted by nine microRNAs with translation or cleavage mRNA. As a putative transcription factor, it is also involved in protein phosphorylation and ultimately having high expression level during early-middle stages of developing cotton ovules. Gh_D11G2438 encodes ACP2, a member of acyl carrier protein family, which is involved in fatty acid synthetic process. Gene Gh_D04G1019 encodes acyl-CoA:lysophosphatidylethanolamine acyltransferase 2 (LPEAT2), a member of the LPLAT family implicated in very long-chain fatty acid metabolic process. Gene Gh_A11G1562 encodes cytosolic NADP+-dependent isocitrate dehydrogenase (CICDH), which involves in carbon metabolism. The genes GhACP2, GhLPEAT2 and GhHSL1 were directly target of ghr-miR2949b and ghr-miR2949c. Meanwhile, GhHSL1 was targeted by additional seven miRNAs at the same time (Fig. 3). The expression level of GhHSL1 was high (~ 6.99) during the early-middle stages in developing cottonseed ovules, while during the late stage come down (~ 1.69) (Fig. 2). The results indicated that transcription factor GhHSL1 is specific to lipid synthesis in developing cottonseed ovules.