Identication and analysis of oil candidate genes revealed the molecular basis of oil accumulation in Gossypium hirsutum L. for cottonseed biofuels development

In this study, a recombinant inbred line (RIL) population, developed from a cross between upland cotton cultivars 0-153 and sGK9708, was used to detect quantitative trait loci (QTLs) associated with cottonseed oil content and further identify candidate factors regulating cottonseed lipid synthesis. A total of 39 QTLs located on eighteen different chromosomes were identied across eight different environments. Of these, ve QTLs were stable in at least three environments. By integrating candidate gene approach and physical mapping data, we preliminary obtained 43 candidate genes potentially involved in carbon metabolism, fatty acid (FA) synthesis and transcription, and triacylglycerol (TAG) synthesis around the stable QTLs. KEGG pathway enrichment analysis and local BLAST with Arabidopsis oil-related genes, as well as transcriptome analysis further showed that 19 candidate genes, expressed during developing cottonseed ovules, could inuence cottonseed oil accumulation. Transcription factors (TFs) and microRNAs (miRNAs) regulatory network analyses suggested six genes, two core miRNAs (ghr-miR2949b and ghr-miR2949c), and one TFs GhHSL1 were considered to be closely associated with cottonseed lipid content.


Conclusions
The study provides an unprecedented level of insight from QTL mapping and regulatory network analysis to reveal the oil accumulation mechanism in developing cottonseed ovules through the construction of a detailed oil accumulation model. Moreover, the present study of cottonseed oil also lays a foundation for further oil production improvement in oil crops, and contributes to renewable energy production for solving the energy shortage and stabilizing greenhouse gases at a certain extent.

Background
The Gossypium genus, as one of the most important oilseed crops worldwide, not only provides numbers of natural ber for global textile industries, but also is a vital source of edible oil and biodiesel for its high content of unsaturated fatty acids [1,2]. Cottonseed, the second major product from the cotton plant, generally has an oil content ranging from 16-25% [1] accounting for 10-15% of the total value of cotton crops. According to the United States National Cottonseed Products Association (http://www.cottonseed.com/), cotton provides the sixth largest source of vegetable oil in the world, as well as renewable raw materials for various industrial products such as biofuels and lubricants oils [3][4][5]. Moreover, cottonseed oil has become increasingly acceptable in a wide range of processed foods due to its signi cant cost advantage and avor stability compared to canola or olive oil [6]. The history of cottonseed oil-related researches could be traced back to ancient times, and the modern cottonseed crushing and re nery technology have obtained a rapid development with the development of epoch [7]. Given the importance of cottonseed as a good source of edible oil and alternative energy, the relative dearth of cottonseed oil research is extremely surprising, especially when compared with the extensive research devoted to cotton bers.
Up to now many QTLs have been detected to be associated with cottonseed oil content by linkage mapping. For example, Shang et al. [8] detected 40 QTLs associated with cottonseed oil using two backcross inbred lines (BILs) and two recombinant inbred lines (RILs) populations; Yu et al. [9] obtained 17 cottonseed oil QTLs using the BIL population of 146 lines. Similarly, Alfred et al. [10] found 4 QTLs using an immortalized F 2 population (IF 2 ); Liu et al. [11] acquired 15 QTLs using an F 2:7 cotton RIL population.
To overcome the low genotypic variation and limited recombination events in speci c genetic populations, genome-wide association studies (GWAS) were further adopted. Such as Du et al. [12], Yuan et al. [13], and Badigannavar et al. [14] detected 16, 28 and 6 signi cant SNPs related to cottonseed oil, respectively. Moreover, plant seed oil accumulation, as a complex biological process, includes at least conversion of sucrose to pyruvate, de novo FA synthesis in plastids, endoplasmic TAG synthesis, and oil-body assembly [15,16]. In the past several decades, some genes and TFs in oil crops also have been veri ed to be associated with lipid synthesis based on the Arabidopsis thaliana oil-related genes through multi-omics analyses. For instance, phosphoenolpyruvate carboxylase (PEPC) in upland cotton for the formation of pyruvate [17]; acetyl-CoA carboxylase (ACCase) in rapeseed [18], cotton [19], and potato [20] participate in de novo FA synthesis; fatty acylthioesterase B (FATB) in soybean [21] and cotton [22], and β-ketoacyl-ACP synthase I (KASI) and long chain acyl-CoA synthetase 9 (LACS9) in soybean [23], and KASII [24], Stearyl-ACPdesaturase (SAD) [25], and GhWRI1 [26] in cotton are involved in FA synthesis; glycerol-3-phosphate dehydrogenase (GPDH) in rapeseed [27], glycerol-3-phosphate acyltransferase (GPAT) [28] and GhDof1 [29] in cotton, 2-lysophosphatidic acid acyltransferase (LPAAT) in cotton [30], and OLEs in soybean [31] are reported to be related to TAG synthesis. Recently, Liu et al. [32] identi ed GmPDAT, GmAGT, GmACP4, GmZF351 and GmPgs1 ve soybean seed oil related genes using an innovative three-dimension network construction approach, as well as genome-wide association studies and multi-omics analyses. However, few gene regulatory network studies about cottonseed oil synthetic have been reported, and the genetic basis under cottonseed lipid storage is far from being understood.
The present work developed a 196 lines intra-speci c RILs population from the cross between 0-153 and sGK9708. The cottonseed oil content trait was evaluated across eight environments, and a whole-genomebased high-density genetic linkage map constructed previously by our lab was used for QTL mapping. A total 39 QTLs for cottonseed oil content trait were identi ed in these environments, of which 5 stable QTLs were identi ed in more than three environments. There were 832 genes in the con dence intervals of all the stable QTLs, of which 19 genes expressed during developing cottonseed ovules were ultimately considered as cottonseed oil-related potential candidate genes. Moreover, several important miRNAs and TFs participated in cottonseed lipid synthesis were also obtained through gene regulation network analysis. To clarify the molecular regulatory mechanisms of cottonseed oil accumulation, the carbon metabolism, along with FA and TAG biosynthesis pathways in developing cottonseed ovules were constructed to form a lipid accumulation model that includes detailed information on where and how important genes are expressed and affected seed oil accumulation. The results will provide a useful resource for cottonseed oil research and lay a foundation to convert cottonseed oil into biofuels in the future.

Results
Phenotype analysis of cottonseed oil content traits The descriptive statistics of phenotypic data for cottonseed oil content from the parents and the RIL populations are summarized in Table 1. There is a signi cant difference between the average cottonseed oil content of parents 0-153 and sGK9708 (P < 0.01), which ranged from 28.09% to 31.64% (mean ± SD, 30.57% ± 1.27), and 31.67% to 35.74% (mean ± SD, 32.86% ± 1.71), respectively. While the average cottonseed oil content of the RIL population increased from 25.09% to 28.49%. Through comparison of the mean values of cottonseed oil content, we observed a relatively great variation between the parents and the RIL population, and the mean values of oil content for RIL population existed transgressive segregations lower than the low parent. Variability in the RIL population according to the standard deviation values was slight. The phenotypic data of seed oil content trait in the RIL population showed continuous distribution, and the absolute values of skewness were all less than one, which demonstrates that the segregations of cottonseed oil content trait t a normal distribution. Additionally, the ANOVA indicated that the genotype, environment, repeat, and genotype × environment / environment × repeat interaction had signi cant effects on seed oil content (P < 0.01), and the estimated h 2 was 91.61% (Additional le 1: Table S1). These results suggested that the seed oil content is highly inherited in upland cotton, although environmental factor should be considered in oil accumulation.
Identi cation of candidate genes related to cottonseed lipid synthesis To further mine candidate genes associated with cottonseed oil accumulation, we adopted an integrated method that combines the orthologs sequence alignment with the Arabidopsis oil-related genes and KEGG pathway enrichment analysis for those candidate genes within stable QTL regions. A total of 832 genes were acquired within the con dence interval of these stable QTLs (Additional le 3: Table S3). A KEGG pathway enrichment analysis was subsequently carried out to obtain the top 20 KEGG pathways of those genes (Fig. 1a). Of these, six enriched pathways with 43 candidate genes were found to be associated with oil synthesis. These enriched pathways included "biosynthesis of amino acids", "valine, leucine and isoleucine biosynthesis", "glycine, serine and threonine metabolism", "biosynthesis of unsaturated fatty acids", "fatty acid metabolism", and "fatty acid biosynthesis" (Fig. 1a; Additional le 4: Table S4). The network relationships analysis of enriched pathway terms completed with metascape (http://metascape.org/gp/index.html) also showed that above-mentioned six pathways have interconnectedness, where terms with a similarity > 0.3 (Fig. 1b).
To date more than 700 Arabidopsis acyl-lipid metabolism genes have been detected, of which 135 are directly involved in the processes of de novo FA synthesis, TAG synthesis and lipid droplet formation. We further compared the 43 potential oil candidate genes of cottonseed with the 135 Arabidopsis oil-related genes using BLAST tool to validate these common signi cant potential candidate genes associated with oil content in upland cotton from a different perspective. The results showed that there are 24 potential candidate genes that are involved in cottonseed oil accumulation based on Arabidopsis orthologous oilrelated genes identi ed previously (Additional le 5: Table S5).

Dynamic expression patterns of candidate genes in developing cottonseed ovules
To investigate the dynamic expression patterns of 24 candidate lipid genes in developing cottonseed ovules, we analyzed the genes expression level at ten different ovules developmental stages (-3, -1, 0, 1, 3, 5, 10, 20, 30 and 35 DPA) of TM-1 using transcriptome sequencing data (Additional le 6: Table S6; Fig.   2). Among them, 5 genes were not expressed during cottonseed ovules developmental stages. The gene expression levels heatmap of other 19 potential candidate genes showed that they were clustered into three branches based on gene expression pro les after Z-score standard normalization (Fig. 2). It was noted that the gene expression levels in branches with red dot were relatively lower during early stage (-3 DPA to 1 DPA), while higher at middle stage (3 DPA to 10 DPA). Surprisingly, most genes showed a sharp decline in expression at mature stage (30DPA to 35 DPA) (Fig. 2).

Transcription factors and microRNAs regulatory network construction of oil candidate genes
Determining gene partners by gene network is an essential step toward understanding protein-coding genes function and identifying relevant biological pathways. To clarify the regulatory networks of potential candidate genes in upland cotton, regulatory associations between microRNAs (miRNAs) and their target genes, transcription factors (TFs) and their host genes, as well as TFs and miRNAs were macroscopically investigated. We directly acquired TFs and miRNAs of G. hirsutum L. from PlantTFDB v5.0 (http://plantregmap.cbi.pku.edu.cn/) and miRBase v22.0 (http://www.mirbase.org/), respectively. Together with the results of candidate oil genes mentioned above, total four cotton TFs are obtained, namely GhLEC1 (Gh_A11G1560), GhHSL1 (Gh_A12G0518 and Gh_D04G0979), GhPKL (Gh_D04G1092), and GhABI4 (Gh_D04G1143) (Additional le 7: Table S7). In addition to cleaving mRNA, plant miRNA reportedly inhibits the translation of target genes. The miRNAs target genes among 19 cotton lipid candidate genes were predicted using psRNATarget, a plant small RNA target analysis server (http://plantgrn.noble.org/psRNATarget/) (Additional le 8: Table S8). A regulatory network of miRNAs and its target genes that de ned the regulation in the complex process of lipid synthesis was constructed based on those results, which showed the interactions between seventeen miRNAs and seven target genes, which including one TFs (GhHSL1, Gh_A12G0518) (Fig. 3, Additional le 8: Table S8). There are 6 hub nodes with connectivity no less than 3 in the regulatory network, of which 2 miRNAs (ghr-miR2949b and ghr-miR2949c) and 4 candidate target genes (GhACP2, Gh_D11G2438; GhLPEAT2, Gh_D04G1019; GhCICDH, Gh_A11G1562; and GhHSL1, Gh_A12G0518) (Fig. 3). Of those, GhHSL1 was targeted by nine microRNAs with translation or cleavage mRNA. As a putative transcription factor, it is also involved in protein phosphorylation and ultimately having high expression level during early-middle stages of developing cotton ovules. Gh_D11G2438 encodes ACP2, a member of acyl carrier protein family, which is involved in fatty acid synthetic process. Gene Gh_D04G1019 encodes acyl-CoA:lysophosphatidylethanolamine acyltransferase 2 (LPEAT2), a member of the LPLAT family implicated in very long-chain fatty acid metabolic process. Gene Gh_A11G1562 encodes cytosolic NADP+-dependent isocitrate dehydrogenase (CICDH), which involves in carbon metabolism. The genes GhACP2, GhLPEAT2 and GhHSL1 were directly target of ghr-miR2949b and ghr-miR2949c. Meanwhile, GhHSL1 was targeted by additional seven miRNAs at the same time (Fig. 3). The expression level of GhHSL1 was high (~ 6.99) during the early-middle stages in developing cottonseed ovules, while during the late stage come down (1 .69) (Fig. 2). The results indicated that transcription factor GhHSL1 is speci c to lipid synthesis in developing cottonseed ovules.

Discussion
Cottonseed oil has a great potential as an alternative commercial source of biofuels production [33,34] and edible oil [35] rich in polyunsaturated fatty acids (PUFAs). The investigation of molecular regulatory mechanism of fatty acid synthesis in developing cotton ovules is pivotal for cottonseed biodiesel development. Although some studies have been carried out to identify QTLs/genes for cottonseed oil synthesis so far, the molecular genetic mechanisms are still unclear. In this study, we used an integrated approach, including QTL mapping and gene identi cation, RNA-seq, and regulatory network, to identify and analyze candidate genes involved in cottonseed oil accumulation.
With the development of SNP arrays, sequencing and genotyping technologies, an increasing number of QTL mapping about cottonseed oil studies recently have been carried out [36][37][38][39][40]. This study developed an intra-speci c RIL population in upland cotton for QTL mapping, which consisting of 196 lines with the parents 0-153 and sGK9708. The consensus genetic map was constructed by Zhang et al. [41] in previous study, which covered the whole genome of upland cotton with a high saturation and was a valuable tool for QTL mapping across the whole genome. 39 QTLs for cottonseed oil content were identi ed in the study. As the quantitative trait in uenced by environment, some QTLs identi ed in multiple speci c environments or generations (≥ 3) were named as stable QTLs [42]. Based on this de nition, this study therefore identi ed ve stable QTLs. To further determine whether the stable QTLs obtained in our study were new or had been identi ed previously, we compared our results with those QTLs from cotton QTL database based on their physical con dence intervals and previous QTL mapping reports. It was found that stable QTL qOC-chr24-2 shared the overlapping con dence intervals with QTLs identi ed in previous GWAS studies [8,43], while other four were newly identi ed which could provide more information about the mechanism of cottonseed oil accumulation and accelerate biofuels development of cottonseed oil.
Stable QTL qOC-chr17-3 was also identi ed to be connected with seed oil content in other upland cotton RIL population (Zhong 70) which was unpublished nowadays. Moreover, phenotypic plasticity refers to the ability of a single genotype to exhibit variable phenotypes in different environments. When phenotypic plasticity differs among genotypes, it can be classi ed as a GEI. A mixed model methodology with terms for QEI can be used to reveal the genetic basis of complex traits showing GEI. In this study, 3 stable QTLs with cottonseed oil traits were identi ed with signi cant QEI (Additional le 2: Table S2), which demonstrated that average temperature and the coordinate of latitude and longitude could in uence cottonseed oil content across years and locations.
To further explore the mechanism of cottonseed oil accumulation from the standpoint of lipid candidate genes, total 832 genes located in stable QTL (qOC-chr11-1, qOC-chr12-1, qOC-chr17-3, qOC-chr22-1, and qOC-chr24-2) were extracted based on the physical location. However, the analysis and interpretation of these candidate genes is often a major challenge for many researchers which requires an impractically large amount of manual literature searching to interpret. A standard approach to addressing this problem is pathway enrichment analysis, which summarizes the large gene list as a smaller list of more easily interpretable pathways [44]. In this study, we obtained the top 20 KEGG enrichment pathway of 832 genes using pathway enrichment analysis database KOBAS (http://kobas.cbi.pku.edu.cn/index.php) (Fig. 1a).
Acyl-lipid synthesis, as a complex biological process, includes at least the conversion of sucrose to pyruvate, plastid de novo FA synthesis, endoplasmic TAG synthesis, and oil-body assembly [45]. Moreover, an integrated omics analysis of rapeseed and soybean previously showed that some genes involved in photosynthesis and plant hormone signal transduction were related to lipid metabolism [46], as well as substrate competition between seed oil and protein synthesis exists in oil crops [47]. Therefore, the six pathways, "biosynthesis of amino acids", "valine, leucine and isoleucine biosynthesis", "glycine, serine and threonine metabolism", "biosynthesis of unsaturated fatty acids", "fatty acid metabolism", and "fatty acid biosynthesis", were ultimately selected for subsequent identi cation of candidate oil genes based on the results of KEGG enrichment analysis and network analyses of enriched terms (Fig. 1a,b). Among them, 43 genes were found and further analyzed by BLAST with Arabidopsis oil-related genes. As a result, 24 Arabidopsis-ortholog oil genes in upland cotton were obtained, ve of which were not expressed during developing cotton ovules (Additional le 5: Table S5). Integration of KEGG, BLAST and RNA-seq analysis results, 19 credible candidate genes related to cottonseed oil accumulation were ultimately obtained (Additional le 5: Table S5). Of these, transcription factor HSL1 were found to regulate a series of reactions in oil synthesis in previous researches for oilseed crops [29,48]. This indicated that GhHSL1 may play a signi cant role for oil synthesis in developing cotton ovules.
The expression pro les of these 19 candidate genes were analyzed to investigate the cottonseed oil accumulation mechanism in developing cotton ovules. Ma et al. [39] reported that cottonseed oil accumulates rapidly from approximately 20 DPA in cotton ovules. Therefore, we focused on genes with signi cantly higher expression levels at the early-middle stages (-3-20 DPA) of ovule development than those in the late stages (30)(31)(32)(33)(34)(35). In the end, total 19 candidate genes expressed in developing cotton ovules were considered to be related to cottonseed oil accumulation (Additional le 6: Table S6). The heatmap clusters of gene expression levels showed that branches with red hot in Fig. 2 basically matched the condition of cottonseed oil rapid accumulation nearly 20 DPA.
Gene regulatory network analysis is a powerful approach to identify the key regulators underlying important biological processes. In this study, we identi ed one TFs regulated by nine miRNAs, and three core genes (GhCICDH, GhACP2 and GhLPEAT2) that all have sustained up-regulation expression in earlymiddle stage of developing cottonseed ovules (Figs. 2,3,4; Additional le 6: Table S6), and two hub miRNAs (ghr-miR2949b and ghr-miR2949c) that both have regulated three genes in the regulatory network (Figs. 3). Previous researches have showed that these miRNAs appeared to play a critical role in regulating cottonseed oil accumulation by regulating the accumulation of auxin [49]. Moreover, it was formerly noted that ghr-miR2949b was predicted to act as a regulator of sugar transporter protein (STP) and the downregulation of ghr-miR2949b could increase STP expression levels [50]. Meanwhile, there is a lack of soluble sugar in the plastids, and most of soluble sugar is mainly transported from cytoplasm via STPs [51].
Additionally, Arabidopsis transcription factors AtHSL1 could repress the sugar-inducible expression of seed maturation program in seedlings and play an essential role in regulating the transition from seed maturation to seedling growth, which contributes to the seed oil accumulation [52]. Moreover, HSL1dependent histone methylation plays critical roles in regulation of seed dormancy during seed germination and early seedling growth, which will affect the oil content accumulated in seed [53]. Similarly, AtLPEAT2 [54], AtLACS2 [55] and ChACP4 [56] were found to be associated with PI (34:3), linolenic acid (P = 2.63e-07), and pyruvate (LOD = 14.68) in oil synthesis process, respectively. Ma et al. [39] also found that the cotton genes GhHSL1, GhACP2 and GhLPEAT2 were differentially expressed between the upland cotton (G. hirsutum L.) high oil content genotype (3008) and low oil content genotype (3012) in 10, 20 and 30 DPA ovules. These results indicated that GhHSL1, GhACP2 and GhLPEAT2 may play vital roles in cottonseed oil accumulation. Of course, further experiments are needed for the other lipid candidate genes to con rm that they can indeed affect the oil content of cotton seeds.
A detailed cottonseed oil accumulation model was constructed based on our results of the entire study ( Fig. 4). It starts with the transportation of sucrose, followed by glycolysis and fatty acid synthesis in the cytosol, and results in the formation of TAG in the endoplasmic reticulum. Pyruvate and acetyl-CoA are the substrate for FA synthesis. The putative lipid candidate genes, participating in the pathway of pyruvate and acetyl-CoA formation, showed similar expression patterns during developing cottonseed ovules (Fig. 2), which indicated that these genes may play an essential role in cottonseed oil accumulation.
Additionally, it is noteworthy that these genes (such as GhCICDH, GhACP2, GhFAD2, GhACBP, GhLPEAT2, GhLACS4 and GhPAH1) were up-regulated expressed at early-middle development stages of cotton ovules (Fig. 4) and primarily involved in carbohydrate metabolism and lipid synthesis (Additional le 5: Table S5), indicated that they play a vital role in carbon source supply, FA synthesis, and TAG synthesis. As acyl-CoAbinding protein (ACBP) plays an important housekeeping role in lipid metabolism by maintaining the intracellular acyl-CoA pool, and involved in lipid synthesis and transport, gene expression, and membrane biogenesis [57]. Genes ACBP4, ACBP6, FAD2 and PAH1 were also found to have catalyzed substrate Gly-3-P to generate TAG. Furthermore, the deletion of 3-phosphate glycerol dehydrogenase isomer (GPDH) in Yarrowia lipolytica can redirect carbon ux to lipid synthesis and signi cantly increase lipid content [58]. LACS4 was found to be involved in the synthesis of very long-chain fatty acid [55]. This investigation will help to understand the molecular regulatory basis of cottonseed oil accumulation, and provide a foundation to increase cottonseed oil content for biofuels development.

Conclusion
In the present study, a total of ve stable QTLs related to cottonseed oil content were identi ed, of which four QTLs were identi ed for the rst time and one QTL was veri ed in previous studies. Upon further analysis of these genes around the ve stable QTLs by means of KEGG enrichment, sequence alignment and TFs/microRNAs regulatory network, we obtained 19 oil candidate genes in cottonseed (including four TFs) and two core microRNAs related to cottonseed oil accumulation during the developing ovules. Of these, genes GhACP2, GhLPEAT2 and transcription factor GhHSL1, along with ghr-miR2949, as the hub nodes of gene regulation network, may be key candidate factors on increasing seed oil content in G. hirsutum L. These results provide a fundamental resource for cottonseed genetic research and breeding, and encourage the prospects of the cottonseed oil used for biofuels in the future.

Methods
Plant materials and seed oil trait A RIL population (n = 196) developed between two upland cotton parental lines, 0-153 and sGK9708, in our lab was used for QTL mapping in this study. Brie y, the cross was made in 2001, and F 1 plants were self-pollinated in Anyang, Henan Province in 2002 (02AY), and 250 F 2 seeds harvested were grown in 10 rows (each 8 m long and 0.8 m apart in 03AY), then 196 F 2:3 families were randomly selected to be planted in single-row plots, with the parents and F 1 plants in two-row plots (each 8 m long and 0.8 m apart in 04AY).
Finally, a segregation population consisting of 196 F 6:8 RILs were developed in 2007 [42]. In year 2014 and 2015, the RIL population and its parents were planted in 4 various ecological locations (Anyang, Alaer, Shihezi, and Kuerle) with each row 5 m long and 0.8 m apart, and 20 plants in each row. Field management was performed according to the local farming practices.
In 2014 and 2015 years, the mature open boll samples of RIL population and its parents were harvested by hand and ginned by machines based on per plant in the plot, the seed oil contents were subsequently investigated. In this study, we randomly chose 100 healthy seeds (with hull) of each sample in Anyang, Alaer, Shihezi and Kuerle respectively and shelled seeds by hand to measure the oil content with 3 repeats per genotype (2014 and 2015). The Niumag Imaging and Analyzing System (NMI20-Analyst, Niumag Electric Corporation, Shanghai, China) was used to detect the total oil content of mature cottonseeds.

Data statistical analysis
Student's t test was used to detect the statistical signi cance of the phenotypes. A variance (ANOVA) analysis of the oil content for the 196 RIL population across eight environments were performed using SPSS. The software QTL IciMapping was used to estimate the combined broad-sense heritability (h 2 ) of oil content among different environments [59].

QTL mapping and QTL-by-environment interaction
The consensus high-density genetic map used in this study was constructed with three types of markers (8295 markers, 5197.17 cM) by Zhang et al. [41]. We could determine the physical positions of linkage groups using the map, which could supply powerful information for subsequently selecting candidate genes. In this study, composite interval mapping (CIM) method in WinQTLCart2.5 [60], and genome-wide composite interval mapping (GCIM) method in QTL.gCIMapping [61][62][63] were applied to detect main-and small-effect QTLs, and estimate the effects which were included as co-factors in the stepwise selection model used for background marker selection in the CIM. QTLs that are consistently identi ed in at least three environments are considered stable [42]. Positive additive effects indicated that 0-153 alleles increased the phenotypic trait values, and negative scores indicated that 0-153 alleles decreased the values, while the sGK9708 alleles had the opposite effects. QTL were named as follows: (q + trait abbreviation) + chromosome/linkage groups + QTL number. QTL for the same trait across different genetic backgrounds were considered common QTL when their con dence intervals overlapped. The variance ratio, additive (a) effects of each particular QTL interpretation, and QTL-by-environment interactions were obtained with the MCIM method in QTLNetwork-2.1 [64]. To produce an accurate LOD pro le and adjust the background mark effect, a 10 cM window size, a 0.5 cM walking speed and the critical LOD score of 3 for signi cant QTL were used to declare the linkage [65,66]. Additionally, a critical Pvalue of 0.05 was used to avoid over-tting of the model [67].

Identi cation and annotation of genes around QTL regions
To identify all genes in the con dence intervals of QTLs regions for cottonseed oil content, we used Cotton Functional Genomics Database (CottonFGD, https://cottonfgd.org/) [68] to search for G. hirsutum L. genes based on genomic coordinate or genetic marker. We annotated all the genes around QTL regions through Gene Ontology (GO) annotation and Kyoto Encyclopedia of Gene and Genome (KEGG) pathway enrichment analysis with agriGO 2.0 (http://bioinfo.cau.edu.cn/agriGO/) [69] and KOBAS 3.0 (http://kobas.cbi.pku.edu.cn/index.php) [70], respectively. The P-values for each KEGG biological process was calculated by Fisher's exact test. To control the false discovery rate (FDR ≤ 0.05), the Benjamin-Hochberg method was used to conduct multiple testing correction. In addition, the small term cut-off value was set at 5.

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional les.
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that there are no competing interests.
Authors' contributions ZZ, SH and YY conceived the study; ZZ contributed to the data processing, analysis and wrote the manuscript; GJ collected and analyzed the phenotype data; ZZ, GW, LJ and SY managed the RIL population; LA, GQ and PJ collected the phenotype data in Anyang and Shihezi; FS and DX managed and collected the phenotype data in Kuerle and Alaer; LS and CQ helped with manuscript reviewing; SH and YY provided the resources, designed the experiment and revised the manuscript. All authors read and approved the nal manuscript.   Figure 1 KEGG pathway enrichment analysis of 832 genes around the ve stable QTLs. a top 20 enrichment pathways of all genes. Terms with a P-value < 0.05 were considered signi cant enrichment and selected for further analysis. b the relationships between the top 20 terms. A subset of enriched terms was selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges, and where each node represents an enriched term.

Figure 2
Temporal expression pro le of putative genes that are involved in cottonseed oil accumulation in developing ovules by Z-score normalized FPKM.  Characterization of the cottonseed oil accumulation model in developing cotton ovules for the regulation of enzymes, TFs, and miRNAs in oil biosynthesis.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.