QTL mapping for pumpkin fruit traits using a GBS-based high-density genetic map

Pumpkin is a popular vegetable crop and exhibits a broad diversity in fruit shape and size. Fruit-related traits are the decisive factors determining consumer acceptance and market value of pumpkin cultivar. As a result, deciphering the genetic basis of fruit-related traits is of great importance for pumpkin breeding. To address this problem, a F2 population was generated by two Cucurbita moschata inbred lines with contrasting fruit shapes, and genotyping by sequencing approach was used to construct a genetic map and localize the QTLs underlying the fruit-related traits in this study. The results showed that a high-quality genetic map was constructed for pumpkin, which comprised of 2413 bins and spanned a total length of 2252.10 cM with an average genetic distance of 0.94 cM. Thirty significant QTLs with moderate or small effects were identified for seven fruit-related traits, including fruit length, fruit diameter, fruit shape index, fruit weight, fruit flesh thickness, seed cavity size, and total soluble solids content. Co-locations were observed between the QTLs underlying different traits, demonstrating that pleiotropic effect plays an important role in genetic control of fruit-related traits. The identified QTLs provide valuable information for further fine mapping of the related genes and pumpkin breeding programs with the aim of improving fruit quality.


Pumpkin
(Cucurbita moschata Duchesne, 2n = 2x = 40) is an annual dicotyledonous vegetable belonging to the Cucurbita genus of Cucurbitaceae family and widely cultivated for food, medicine, and ornamental purposes. Its edible parts include fruits, seeds, flowers, leaves, and young stems. The advantages of high yield, good storage life, long period of consumption, high nutritive value, and fitness in transport make pumpkin a very popular vegetable crop (Marie-Magdeleine et al. 2011). Pumpkin fruits are rich in carotenoids, phenolics, flavonoids, vitamins, carbohydrates, non-cellulosic polysaccharides, and amino acids, which have various effects beneficial to human health, such as anti-diabetic, anti-carcinogenic, antioxidant and anti-inflammatory potential 1 3 106 Page 2 of 12 Vol:. (1234567890) (Yadav et al. 2010;Rolnik and Olas 2020). As a result, pumpkin is recognized as a functional food. Recently, pumpkin was recommended as a complementary food to combat vitamin A deficiency in low and middle income countries (Buzigi et al. 2021). Easy cultivation and high nutritional value make pumpkin constitute an important role in the basic diet of several countries and attractive for production around the world.
Fruit traits, including fruit size, fruit shape, fruit weight (FW), flesh thickness (FTH), seed cavity size (SCS) and sweetness, are the decisive factors for market value of cucurbit crops. Elucidating the genetic architectures of fruit morphology is a popular topic in cucurbit crops, and recently several fruit-related QTLs have been determined in cucumber , melon (Pereira et al. 2021), bottle gourd , and bitter gourd (Rao et al. 2021). Fruit size and shape are the important external quality not only affecting the yield, but also determining the market classes and consumer acceptance (Weng et al. 2015). Consequently, they are the important targets of selection in cucurbit breeding programs. So far, phytohormone-related genes, microtubule-related genes and cyclin-related genes have been reported to play important roles in controlling fruit size in cucurbit crops (Chomicki et al. 2020). Meanwhile, a total of 150 consensus QTLs underlying fruit size and shape variations were comprehensively summarized in cucurbit crops, especially in cucumber . Besides the external quality, several QTLs related to fruit internal quality of cucurbit crops were also identified. In cucumber, six and five QTLs were identified for hollow size and FTH, respectively . One QTL was identified for FTH in bitter gourd (Rao et al. 2021). One QTL determining sugar content was identified in melon (Pereira et al. 2021). These progresses will facilitate the mapbased cloning of related genes in cucurbit crops.
Pumpkins show a broad diversity in fruit size and shape. Its major fruit shapes include round, oblate, oval, oblong, and pear-shaped. Deciphering the genetic basis of fruit traits is of great importance for pumpkin breeding. In the previous study, 12 pumpkin fruit morphology-related QTLs were identified in a F2 population derived from the oblate-fruited and round-fruited parents (Zhong et al. 2017). Using a pyriform-shaped maternal parent and a dumbbellshaped paternal parent, a moderate-effect QTL was detected for pumpkin fruit length (FL) and FW in the F2 population, respectively (Hashimoto et al. 2020). Using the pumpkin parents with flattened shape and spheroid shape, respectively, a significant QTL underlying SCS was detected in the BC1 population (Echevarria et al. 2020). These works provided necessary information to understand the genetic control of fruit traits in pumpkin. However, compared with the other cucurbit crops, such as cucumber, melon and watermelon, knowledge on genetic architecture of fruit traits in pumpkin is very limited ). Long and oblate shapes are the two dominant fruit shapes in commercial pumpkin cultivars consuming for cuisine purpose. However, genetic controls on the long and oblate fruits of pumpkins are still unknown.
High-density genetic map is an efficient tool to dissect the genetic architecture of interested traits. In the past, traditional molecular makers, such as SSR, were used to construct genetic map (Kong et al. 2020). Due to the limited number of available makers, resolution of the genetic map constructed using the traditional markers is usually very low, which hampers the subsequent cloning of the targeted genes. With the advent of next generation sequencing, the genomewide makers including SNP and InDel can be called and genotyped simultaneously, which greatly facilitate the construction of high-density genetic map. Genotyping by sequencing (GBS) is a robust, simple, high-throughput, and cost-effective approach for SNP marker discovery and has been widely used for studying genomic diversity (Elshire et al. 2011;Glaubitz et al. 2014). The genome-wide marker discovery capability makes GBS a powerful tool to construct high-density genetic map. A saturated genetic map with an average spacing of 0.4 cM was established for QTL analysis of fruit-related traits in Zucchini (C. pepo) using GBS approach. In pumpkin, a GBSbased genetic map with an average of 3.7 cM was constructed to detect the QTLs underlying fruit and agronomic traits (Echevarria et al. 2020). The comparatively low density of genetic map decreased the resolution of QTL mapping in pumpkin.
Pumpkin breeding is a time-consuming and challenging task. The large sizes of the plants also make pumpkin breeding a resource-intensive job, especially in terms of space (Hernandez et al. 2020). Understanding the genetic basis and cloning of fruit-quality related genes are valuable to improve breeding efficiency. To address this problem, two C. moschata Page 3 of 12 106 Vol.: (0123456789) inbred lines with the contrasting fruit shapes were selected in this study, to construct a F2 segregating population with the aims of mapping the fruit-related QTLs using a GBS-based high-density genetic map. The results will provide substantial information for understanding the genetic basis of pumpkin fruit traits.

Materials and methods
Construction and phenotypic evaluation of the F2 population The C. moschata inbred lines N137 and N29, which differ significantly in fruit shape, were used as parents to develop the F2 mapping population. N137 has long fruit shape, while N29 has oblate fruit shape. The F1 hybrids were generated using N317 as the female parent and N29 as the male parent. Self-pollinations were performed on the F1 plants to produce an F2 population consisting of 200 individuals. The parental lines, F1 hybrids and F2 individuals were planted in an open field in the spring of 2020 in Hunan Academy of Agricultural Sciences, China. The fruits were harvested at the stage of commercial mature. The fruit-related traits, including fruit length (FL), fruit diameter (FD), flesh thickness (FTH), seed cavity size (SCS), and fruit weight (FW) were measured. The fruit shape index (FSI) was calculated as the ratio of FL and FD. The total soluble solids (TSS) were measured using a pocket refractometer (ATAGO Co. Ltd). The names and abbreviations of the investigated traits followed the rules proposed in cucumber . Histogram and density plots were used to display distributions of the traits. Pearson correlation coefficients between the traits were calculated using R (https:// www.r-proje ct. org/).

DNA isolation and sequencing
Healthy young leaves from the F2 population and the parental lines were individually collected and stored at -80 °C. Genomic DNA was isolated from each sample using a Plant DNA Extraction Kit (Tiangen, China) following the manufacturer's protocol. Integrity and purity of each DNA sample were measured using the 1% agarose gel electrophoresis and a Nan-oDrop 1000 spectrophotometer. The high-quality DNA samples were used for sequencing library construction. Sequencings of the parents and F2 individuals were performed according to the GBS protocol (Elshire et al. 2011). The enzyme and sizes of the digested fragments were predicted and evaluated according to the reference genome of C. moschata (Sun et al. 2017). Based on the evaluation results, HaeII was selected as the restrict enzyme. Library preparation and sequencing were carried out with the help of Novogene Co. Ltd on an Illumina HiSeq 4000 platform to produce the paired-end reads in length of 125 bp.

SNP discovery and genotyping
Quality controls on the generated reads, including adapter trimming and low-quality reads removing, were conducted using fastp (Chen et al. 2018). Genome sequence of C. moschata was download from the Cucurbit Genomics Database and used as the reference genome (Sun et al. 2017). The bwa-mem2 aligner was used to map the reads to the reference genome (Vasimuddin et al. 2019). Variants were called using GATK4 according to the GATK Best Practices. The mpileup and call modules of BCFtools were also used to identify the variant loci (Danecek et al. 2021). The SNPs identified both by GATK and BCFtools were selected for further filter. Using the SelectVariants module in GATK, the missing and heterozygous loci in each parental line, as well as the monomorphic loci between the parent lines were removed.

Genetic map construction
The high-quality SNPs were used to construct the genetic linkage map using lep-MAP3 (Rastas 2017). Briefly, the parental genotypes were called using the ParentCall2 module. Then the loci with segregation distortion (p < 0.001) or missing genotype ratio larger than 0.2 were removed using the Filtering2 module. LOD scores ranging from 20 to 40 were used to separate the SNPs into different linkage groups (LGs). At last, the markers were ordered in each LG and Kosambi genetic distances between the adjacent markers were calculated using the OrderMarkers2 module. The SNPs locating at the interval without recombination were grouped into a bin. The ID of the first SNP in each bin was used to represent the bin. QTL mapping QTL analysis was performed using the Composite Interval Mapping (CIM) method provided in R/qtl package (Arends et al. 2010). One thousand permutations were performed to determine the genomewide significance threshold for QTL detection. QTLs exceeding the threshold value (p < 0.05) were considered significant. The QTL effect and the proportion of phenotypic variance explained (PVE) by the QTL were estimated at the highest QTL peaks using fitqtl function in R/qtl package. For each detected QTL, the confidence interval was defined using a 1.5-LOD support interval. Naming of QTL for the traits followed the nomenclature recommendations in cucumber .

Phenotypic variation in the F2 population
The pumpkin inbred lines of N137 with long fruit and N29 with oblate fruit were selected as the parents to generate an F2 segregating population. The fruits of two parental lines, F1 hybrid, and several representative F2 individuals are shown in Fig. 1. Phenotypic variations of FL, FD, FSI, FTH, SCS, FW, and TSS were recorded at the commercially mature fruit stage. Distributions of these traits are shown in Fig. 2. The FL of N137 was five folds higher than that of N29. While, the FD of N29 was two folds higher than that of N137. The FSI for N137 and N29 were 3.65 and 0.34, respectively. The FTH and SCS in N29 were higher than that of N137. The FW and TSS were higher in N137 compared with N29. Remarkable differences of these fruit-related traits between N137 and N29 demonstrated that different alleles harbored by the parental lines. Based on the performance of F1 hybrid, FL, FD, FSI, and SCS had high degree of dominance, while FTH, FW, and TSS exhibited transgressive segregation. Both the histograms and density plots displayed the continuous distributions of these fruit-related traits in the F2 population, suggesting their quantitative inheritances.
Person correlation coefficients were calculated between the traits, which are presented in Fig. 2. Highly positive correlations were observed between FSI and FL, FD and SCS, as well as FD and FTH, respectively, suggesting their potentially common genetic basis. While, highly negative correlations was observed between FSI and FD, FSI and SCS, FD and FL, respectively, implying different genetic basis between these traits. TSS exhibited very low or no significant correlation with the other traits, demonstrating its distinct genetic basis. Moderate correlations were observed between the remaining pairs of traits, suggesting different genetic factors may regulate them.

GBS sequencing and SNP calling
To determine the genotype of each plant, a total of 202 GBS libraries, including the 200 F2 individuals and the two parent lines, were sequenced and yielded 158.0 Gb paired-end reads. The clean reads for the parental lines of N29 and N317 were 3.4 Gb and 3.7 Gb, representing 12.5 × and 13.7 × sequencing depth, respectively. The clean reads generated by the F2 individuals ranged from 0.45 to 0.90 Gb, with the mean of 0.66 Gb. The corresponding sequencing depths of the F2 individuals varied from 1.7 × to 3.4 × with the mean of 2.5 × .
The clean reads were aligned to the C. moschata reference genome using bwa-mem2. The successful mapping rates varied from 98.1 to 99.7% with the mean of 99.3%. Using GATK HaplotypeCaller, a total of 2,073,737 variants, including 1,649,104 SNPs and 431,304 InDels, were identified. BCFtools identified 2,355,207 variants, including 2,117,313 SNPs and 237,894 InDels. Intersection analysis was conducted between the variants identified by GATK and BCFtools. The results showed that a total of 1,423,580 variants, including 1,391,130 SNPs and 32,450 InDels were collectively identified by both GATK and BCFtools. The shared variants were selected and further filtered. The biallelic SNP loci with homozygous genotypes in each parental line and  exhibiting polymorphism between the parental lines were kept. The loci with larger than 20% genotype missing ratio were filtered out, resulting in a total of 7,082 high-quality SNPs with the Ts/Tv ratio of 2.13.

Genetic map construction
Lep-MAP3 was used to construct the genetic map. The parental genotypes firstly were called using the ParentCall2 module. The loci with significant segregation distortion (p < 0.001) were removed. The remaining SNPs were separated into LGs with the LOD scores varying from 20 to 40. At LOD score of 27, a total of 6,918 SNPs were classified into 20 LGs and the number of single marker was 13. The other LOD scores resulted in number of LGs different from 20 or larger number of single marker. As a result, LOD score of 27 was adopted to separate the SNPs into LGs. The SNPs were then ordered in each LG and the Kosambi function was used to calculate their genetic distances. The SNPs locating in the interval without recombination were collapsed into a bin. The length and genotype for each bin are show in Fig. S1. All the SNPs were grouped into their corresponding chromosome, demonstrating a chromosome-level genetic map was established (Fig. S2). The number of crossovers in the F2 population varied from 33 to 64 with the mean of 46, which was reasonable in a F2 segregating population with 200 individuals (Fig. S3). The recombination fractions and the corresponding LOD scores for tests of linkage between all pairs of markers are shown in Fig. S4. The largest LOD was observed between the adjacent markers on the LGs, indicating the high quality of the genetic map. Good agreements were observed between the genetic marker positions and physical marker positions on the reference genome, adding confidence in the quality of this map (Fig. 3). All these evidences proved that a high-quality and chromosome-level genetic map was constructed for pumpkin. Summary statistics on the genetic map are list in  Identification of QTLs underlying fruit traits QTL analysis for the fruit traits was performed in R/qtl using the CIM method. Genome-wide views of QTLs detected in the F2 segregating population are illustrated in Fig. 4. Detailed information on the QTLs is reported in Table 2. The IDs and annotations of genes locating in each QTL are listed in Table S1. In total, 30 QTLs were detected for the seven traits.
Fruit size was evaluated in terms of FL and FD. Three QTLs, namely fl2.1, fl6.1, and fl14.1, were detected for FL, which could explain 8.02%, 5.67%,    (Table S1). A total of five QTLs were identified for FD, which contained 339 genes and located on the chromosomes of 1, 5, 12, 15, and 19, respectively. The fd5.1 contained a P450 family gene (CmoCh05G004790) like FW3.2 (Table S1). The total PVE of the five QTLs reached up to 25.45%. No co-localized QTLs were found between FL and FD, demonstrating their different genetic basis.
Fruit shape was defined using the ratio of FL to FD, namely FSI. Three QTLs with the total PVE of 20.25% were detected for FSI. The fsi2.1 and fsi6.1 were overlapped with fl2.1 and fl6.1, respectively, suggesting that FL plays a major role in determining fruit shape. Meanwhile, fsi15.1 was closely linked with fd15.1 and contained a YABBY gene (Cmo-Ch15G012090) like CRABS CLAW (Table S1).
Fruit internal quality was evaluated in terms of FTH and SCS. Six QTLs with the total PVE of 24.11% and two QTLs with the total PVE of 8.75% were detected for FTH and SCS, respectively. The fth4.1 contained a CLAVATA3 gene (Cmo-Ch04G010690). The fth1.1 was closely linked with fd1.1. The fth5.1 and fth6.1 were co-localized with fd5.1 and fsi6.1, respectively. While, scs5.1 was overlapped with fd5.1. These results demonstrated that the genetic basis of fruit shape and internal quality were closely related.
Fruit weight is an important trait that determines the yield. A total of four QTLs for FW were identified. The fw12.1 contained a YABBY gene (Cmo-Ch12G012310) like CRABS CLAW. These QTLs showed similar effects on FW with the total PVE of 18.52%. The fw12.1 and fw16.1 were co-localized with scs12.1 and fth16.1, respectively, indicating that FTH and SCS share the same genetic basis with FW to some extent. The scs5.1 and scs12.1 overlapped with 24 and 14 annotated genes, respectively.
The total soluble solids is an important factor determining fruit sensory quality. Seven QTLs were detected for TSS with the total PVE of 34.20%. All the QTLs had small effects on TSS. The tss5.1 was overlapped with fth5.1 and closely linked with fd5.1 and scs5.1. These results suggested that this region possibly had pleiotropic effect on fruit size and quality.

Discussion
Pumpkin is an important vegetable with high nutritional value and wide adaptability. Fruit is its major edible part, and shows a broad diversity in size, shape and weight. As a result, fruit-related traits have been a subject of research for a long time. However, compared with the other cucurbit crops, especially cucumber and melon, knowledge on the genetic basis of fruit-related traits in pumpkins is very limited . To fill this gap, a F2 population and GBS sequencing approach were used to elucidate the genetic architecture of pumpkin fruit-related traits.
Genetic map is a powerful tool to localize the QTLs underlying the targeted traits. Several genetic maps were constructed in pumpkins using different approaches. A genetic map spanning a total genetic distance of 3087.03 cM on 20 LGs with an average genetic distance of 0.89 cM was constructed to map QTLs for fruit-related traits in pumpkin (Zhong et al. 2017). Moreover, using the GBS approach, a genetic map consisting of 24 LGs with an overall genetic length of 2665.0 cM and an average spacing of 3.7 cM was constructed for a pumpkin BC1 population (Echevarria et al. 2020). These genetic maps were constructed without using the information of reference genome, which hampered their comparisons with the other genetic maps. Recently, based on a F2 population, a genetic map with total length of 2026.7 cM and an average marker interval of 4.6 cM was constructed in pumpkin (Hashimoto et al. 2020). The resolution of this map was relatively low, which decreased the accuracy of QTL mapping. In the present study, a genetic map comprising of 2413 bins and spanning 2252.10 cM with an average genetic distance of 0.94 cM was constructed (Table 1). Tests on crossover number, relationships between recombination fraction and LOD scores, as well as synteny between the genetic and physical maps proved that it was a high-quality and chromosome-level genetic map, which greatly facilitated the downstream QTL mapping and comparison with the other genetic maps.
Fruit size is an important composition of fruit external quality, and determines product classes and consumer acceptance (Weng et al. 2015). FL and FD 106 Page 10 of 12 Vol:. (1234567890) are usually used to describe the fruit size and represent fruit elongation and radial growth, respectively . FL was positively correlated with FD in C. maxima (Kazminska et al. 2020). However, negative correlation of phenotypic variations between FL and FD was observed in the present study, probably due to their different genetic backgrounds. Two QTLs for FD were previously detected in pumpkin (Zhong et al. 2017). Due to lack of the physical position information in pumpkin reference genome, it was difficult to compare the above-mentioned QTLs with the other studies. Moreover, three and two QTLs were identified for FL and FD in C. maxima, respectively (Kazminska et al. 2018). In the present study, three QTLs for FL and five QTLs for FD were detected in pumpkin. Of them, a QTL for FL locating on chromosome 14 was overlapped with the one detected in C. maxima, while no consensus QTLs for FD were observed between pumpkin and C. maxima (Kazminska et al. 2018). In addition, no co-location of QTLs for FL and FD were found in the present study, suggesting that elongation of fruit and increase of diameter might be under different genetic mechanisms in pumpkin.
Fruit shape is usually represented by FSI, which is the ratio of FL and FD. Three QTLs with the total PVE of 20.25% were detected for FSI in this study, and two of them overlapped with the FL related QTLs, indicating that fruit elongation played an important role in determination of final fruit shape. Similar results were also observed in cucumber (Weng et al. 2015;Pan et al. 2020). However, in C. maxima, the FSI was reported not to reflect the variation of fruit shape appearing in RILs (Kazminska et al. 2020). Since FSI is a composite trait and calculated from FL and FD, its genetic basis is unknown. As a result, caution should be taken in using the FSI related QTLs in marker-assisted selection, and more sophisticated approach for fruit shape phenotyping is needed to accurately map fruit shape QTLs .
Fruit weight is an important yield-related trait. In the present study, FW exhibited transgressive segregation and was positively correlated with FL, FD, SCS, and FTH, respectively. Similar results were also observed in C. maxima (Kazminska et al. 2020). In the previous reports, one and six QTL related to FW was detected in pumpkin and C. maxima (Kazminska et al. 2020;Hashimoto et al. 2020), respectively. In the present study, four QTLs were detected for FW, and two of them were co-localized with the SCS and FTH related QTLs, respectively. Similar result was also observed in C. maxima (Kazminska et al. 2020). These results demonstrated that FTH and SCS have the same genetic basis with FW to some extent.
Fruit internal quality includes the traits of FTH, SCS, TSS, etc. A fruit with small SCS, thick FTH, and high TSS is preferred for both the producers and consumers. In the present study, the parental line N137 had thicker FTH, smaller SCS, and higher TSS than N25. FTH was positively correlated with SCS in the F2 population. Similar results were also observed in cucumber . Three and four QTLs for FTH were previously detected in pumpkin and C. maxima (Zhong et al. 2017;Kazminska et al. 2020), respectively. In the present study, six and two QTLs were detected for FTH and SCS, respectively. Colocations of QTLs between FTH and TSS, FD, FSI, as well as FW were observed, respectively. Close locations of QTLs between FTH and FD, SCS were also observed. Two QTLs for SCS were overlapped with the QTLs of FW and FD, respectively. These results demonstrated that the genetic basis of fruit external and internal qualities were closely related.
TSS is an important component affecting fruit quality and taste, which has been studied in several crops. No QTLs related to Brix were detected in pumpkin in the previous report (Zhong et al. 2017). However, one QTL related to Brix was detected in pumpkin in a recent report (Hashimoto et al. 2020). Similarly, one QTL for Brix was also identified in melon (Pereira et al. 2021). In the present study, seven QTLs for TSS with the total PVE of 34.20% were identified. The large number of QTLs for TSS identified in the present study is probably due to the significant difference of TSS contents between the parental lines. Moreover, co-location and close location were observed between TSS and FTH and FD on chromosome five, respectively, indicating that pleiotropic effect plays an important role in genetic control of fruit traits.
Homologs of SUN, OVATE, FW2.2, FW3.2, GL7, SlTRM5, FW11.3, CRABS CLAW (YABBY family), WUSCHEL, CLAVATA3, ACS2 were reported to be related to fruit size/shape and weight in cucumber, melon and watermelon ). In the present study, homologs of FW3.2 and CLAVATA3 were also overlapped with fd5.1 and fth4.1, respectively. Moreover, both fsi15.1 and fw12.1 contained the YABBY family genes like CRABS CLAW. These results also demonstrated the effectiveness of the QTLs identified in the present study. Taken together, the QTLs identified in the present study will provide substantial information for understanding the genetic basis of pumpkin fruit traits and further fine mapping of the related genes.

Conclusion
In the present study, a high density linkage map was constructed for pumpkin. Using the genetic map, a total of 30 significant QTLs with moderate or small effects were identified for seven traits related to the external or internal quality of pumpkin fruit. Colocations and close locations were observed between the QTLs underlying different traits. The pleiotropic effect possibly plays an important role in genetic control of fruit-related traits.
Author contributions All authors contributed to the study conception and design. Material preparation and data collection were performed by XH, ZM, YL, DW, ZZ and XH. Data analysis was performed by MW and QK. The first draft of the manuscript was written by QK and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.