Genome-wide Association Analysis Reveals a Novel QTL CsFS1 for Fruit Skin Color in Cucumber

Background: Cucumber is an important cucurbit crop worldwide, and its fruits vary in skin color. However, the candidate genes and the genetic mechanism underlying this important trait in cucumber are unknown. In this study, a locus controlling fruit skin color was found on chromosome 3 of the cucumber genome. Results: The light-green inbred line G35 and the dark-green inbred line Q51 were used as parents to construct an F2 population. On chromosome 3, we identified a major QTL, CsFS1 (Fruit skin 1). We further narrowed down the CsFS1 locus to a 94-kb interval containing 15 candidate genes. Among these genes, Csa3G912920, which encodes a GATA transcription factor, was expressed at a higher level in the pericarp of the NIL-1334 line (with light-green fruit skin) than in that of the NIL-1325 line (with dark-green fruit skin). This study provides a new allele for the improvement of cucumber fruit skin color. Conclusion: A major QTL that controls fruit skin color in cucumber, CsFS1, was identified in a 94-kb region that harbors the strong candidate gene CsGATA1.


Background
Fruit skin color is a valuable trait in the horticulture industry because it strongly influences consumer preference and exhibits extensive phenotypic variation that can be used in breeding. Many quantitative trait loci (QTLs) and genes related to fruit skin color have been detected and/or cloned in crops. In melon, skin color is determined by the composition and content of pigments such as carotenes, flavonoids, and chlorophylls [1]. In yellow casaba muskmelon, CmKFB, which encodes a kelch domain-containing F-box protein, was identified on chromosome 10 and shown to downregulate the accumulation of naringenin chalcone [2]. MELO3C003375 on chromosome 4 and MELO3C003097 on chromosome 8 were also shown to be closely associated with skin color [3]. In watermelon, qrc-c8-1 on chromosome 8 controls the green shade of fruit skin; it was identified by high-density genetic mapping of recombinant inbred lines and explained 49.942% of the phenotypic variation in skin color [4]. Cla002755 and Cla002769 on chromosome 4 are markers for yellow skin and were identified by bulked segregant analysis sequencing (BSA-seq) and genome-wide association studies (GWAS) [5]. In tomato, SlMYB12 was mapped to chromosome 1; it corresponded to the pink gene y and controlled the accumulation of yellow-colored flavonoids in the tomato fruit epidermis [6,7]. In pepper, three independent pairs of genes (y, c1, and c2) and two QTLs (pc8.1 and pc10) were identified as controlling ripe fruit color and chlorophyll content [8].
Cucumber (Cucumis sativus L., 2n=2x=14) is an economically important cucurbitaceous crop worldwide, with a total global production of 75.2 million tons in 2018, of which 56.2 million tons (74.7%) were produced in mainland China (data available at http://www.fao.org/). The skin color of cucumber fruit is an important agronomic character that affects consumer choice. The locus w, which controls the white fruit skin of cucumber, was mapped to an 8.2-kb region on chromosome 3 between the LH39253 and ASPCR39250 markers; this region contains only one gene, Csa3G904140 (APRR2) [9]. APRR2 encodes a nuclear-localized transcription factor and controls fruit skin color by reducing the content of chlorophyll and chloroplasts [10,11]. Cucumber Csa7G051430 was identified by BSA-seq of extreme-phenotype F2 individuals from a cross between the light-green skin mutant lgp and the wild type 406. It is homologous to Arabidopsis ARC5, which plays an important role in chloroplast division [12,13]. Similarly, Csa6G133820, mapped through the light-green leaf and fruit skin mutant M218, encodes a Ycf54-like protein required for chlorophyll synthesis and was named CsYcf54 [14,15]. Csa2G352940 (CsMYB36), encoding the transcription factor MYB36, regulates yellow-green peel color in cucumber [16]. To date, the mechanism that controls green fruit skin color in cucumber remains unclear. Further study of skin color inheritance and identification of candidate genes associated with green skin color will therefore provide valuable information.
BSA-seq and GWAS are simple and effective methods for the identification of molecular markers associated with target genes and QTLs that control traits of interest [17,18]. This study was designed to determine the inheritance pattern of green fruit skin color and to map major skin color QTLs. BSA-seq analysis detected a genomic region harboring a major fruit skin color QTL, CsFS1, on chromosome 3, and this region was further validated by GWAS. This study also provides preliminary evidence that Csa3G912920 is the probable candidate gene in the CsFS1 locus.

Phenotypic analysis of fruit skin color in cucumber
The inbred lines G35 (light-green cucumber) and Q51 (dark-green cucumber) were used as parents for fine mapping of fruit skin color. The fruit skin color of all F1 individuals was darker green than G35 and lighter green than Q51, but it inclined more towards dark green (Fig. 1a). Pigment content analysis showed that chlorophyll a and chlorophyll b contents were significantly lower in G35 than in Q51 (Fig. 1b). These results indicated that fruit skin color was determined by chlorophyll content.
Identification of a major QTL, CsFS1, on chromosome 3 by BSA-seq and GWAS
To rapidly identify loci for skin color in the F2 population, two bulks consisting of 20 dark-green (SL-pool) and 20 light-green (QL-pool) progenies were sequenced on the Illumina platform. A total of 12.9 Gb of raw reads were generated, with an average depth of approximately 20.4×. The short reads were aligned to the cucumber reference genome [19], and 145,804 SNPs were identified between the dark-green and light-green parents. Based on the SNP-indices of the QL- and SL-pools, the ∆(SNP-index) of a genomic region from 36.62 Mb to 39.77 Mb on chromosome 3 was greater than the threshold value and close to 1.00 (Fig. 2a). This region may therefore harbor a major QTL for the fruit skin color trait in cucumber.
To independently confirm that this region was indeed related to fruit skin color, GWAS was performed on 289 cucumber accessions (average depth of 19.73× and 98.27% coverage of the cucumber reference genome) [19]. A total of 2,352,638 SNPs were identified using GATK software with default parameters [20]. To reduce the incidence of false-positive signals, a high-resolution variation map of 399,352 SNPs with minor allele frequency >5% and missing rate <0.2% was generated and used for genome-wide association analysis of fruit skin color with a unified mixed linear model that controlled for population structure and familial relatedness. A Manhattan plot for cucumber fruit skin color showed the strongest association signal (SNP fs) on the distal arm of chromosome 3, overlapping with the genomic region identified by QTL-seq (Fig. 2b). This indicated that a major QTL controlling fruit skin color resided on the distal arm of chromosome 3, and it was named CsFS1 (Fruit skin 1).
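As a rough illustration of the SNP filter described above, the sketch below computes the minor allele frequency and missing rate for a single SNP. The genotype coding (0, 1, or 2 copies of the alternative allele, `None` for a missing call), the function names, and the thresholds used in the example call are assumptions made for illustration; the study does not describe its filtering code.

```python
def maf_and_missing(genotypes):
    """Return (minor allele frequency, missing rate) for one SNP.

    `genotypes` holds one call per accession: 0, 1, or 2 copies of the
    alternative allele, or None when the call is missing (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, 1.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq), missing_rate


def keep_snp(genotypes, min_maf, max_missing):
    """Keep a SNP only if it passes both thresholds."""
    maf, missing_rate = maf_and_missing(genotypes)
    return maf > min_maf and missing_rate < max_missing


if __name__ == "__main__":
    # Hypothetical calls for five accessions; thresholds are illustrative.
    calls = [0, 0, 1, 2, None]
    print(maf_and_missing(calls))
    print(keep_snp(calls, min_maf=0.05, max_missing=0.25))
```

The thresholds are passed in explicitly so that the same check can be run with whichever cut-offs apply to a given dataset.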
Fine mapping narrowed down CsFS1 to a 94-kb interval
A total of 35 KASP SNP markers on chromosome 3 were developed and used for genotypic analysis of the F2 segregating population (Table S3). QTL analysis using the multiple QTL model (MQM) showed that the LOD peak from 64.85 to 69.05 cM was consistent with the physical distance from 39.0 to 39.77 Mb on chromosome 3 (Fig. 3a). In this interval, the highest LOD marker explained 35.6% of the phenotypic variation in the F2 segregating population (Table S1). The genomic interval of CsFS1 was further narrowed down to between two SNP markers (39,531,980 and 39,626,163 bp) using four recombinant individuals from the F2 and BC4F2 populations (Fig. 3b). We therefore confirmed that the CsFS1 locus lay within a 94-kb interval on chromosome 3.

Identi cation of a candidate gene related to fruit skin color
According to the cucumber genome database (http://www.icugi.org/), 12 of the 15 predicted protein-coding genes in the 94-kb interval have functional annotations (Table S2). qPCR experiments were performed to investigate the expression patterns of three possible candidate genes associated with fruit skin traits in NIL-1334 (light-green) and NIL-1325 (dark-green). In the pericarp, only the expression of Csa3G912920 differed significantly between NIL-1334 and NIL-1325 (P < 0.05) (Fig. 3c, Fig. S2). The Csa3G912920 gene encodes a plant GATA transcription factor and has a conserved zinc finger domain. A phylogenetic tree and sequence alignment showed that Csa3G912920 homologs from melon (MELO3C003335), watermelon (Cla97C09G175500), and wax gourd (Bhi05M000420), highlighted in the gray-shadowed box, all encode GATA transcription factors (Fig. 4a and b). Secondary structural element analysis based on the published literature showed that the zinc finger domains include four β-strands and one α-helix (Fig. 4b). Csa3G912920 was designated as a candidate gene for CsFS1.
Previous studies have shown that Arabidopsis GNC (GATA NITRATE-INDUCIBLE CARBON-METABOLISM-INVOLVED) and CGA1 (CYTOKININ-RESPONSIVE GATA1), members of the GATA transcription factor family, play a major role in the regulation of chlorophyll synthesis [21]. Under light, overexpression of GNC promotes chloroplast development and the production of chlorophyll in roots [22]. We therefore inferred that Csa3G912920 is the probable candidate gene for CsFS1 and named it CsGATA1.

Discussion
In this study, we combined QTL-seq [23] of an F2 segregating population with GWAS to identify a major QTL, CsFS1, for fruit skin color in cucumber. The main advantage of QTL-seq is that it requires neither DNA marker development nor marker genotyping: the SNPs that distinguish the parental lines serve directly as markers, which reduces cost and time. In addition, the SNP-index allows accurate assessment of the frequency of parental alleles. These advantages make QTL-seq an attractive method for quickly identifying genomic regions that contain major QTLs.
Fruit skin color is an essential agronomic trait in cucumber that affects exterior quality and consumer preferences. In this study, we detected the major QTL CsFS1 on chromosome 3 between 39,531,980 and 39,626,163 bp. Previously, the w locus controlling the white fruit skin trait was also mapped to chromosome 3 (Liu et al. 2016), residing 281 kb upstream of the CsFS1 locus. In the w locus, Csa3G904140 (APRR2) harbors a single-nucleotide insertion that causes a frameshift mutation and a truncated protein in the white cucumber. Here, we found no sequence differences in APRR2 between the two parents, G35 and Q51. Therefore, CsFS1 is a novel QTL that controls green fruit skin in cucumber.
Through classical genetic mapping, CsFS1 was narrowed to a 94-kb physical interval that contains 15 predicted protein-coding genes. The Csa3G912920 gene encodes a GATA-type transcription factor, and its expression differed significantly between near-isogenic lines with light- and dark-green fruit skins.
Previous studies have shown that the GATA transcription factor families are highly conserved in Arabidopsis, rice, and other plants [24]. The GATA transcription factors are evolutionarily conserved transcriptional regulators that recognize promoter elements with a G-A-T-A core sequence [25]. The paralogous LLM-domain B-GATA transcription factors GNC and GNL contribute to chlorophyll biosynthesis and chloroplast formation in light-grown Arabidopsis seedlings [21,26,27]. Together, GNC and GNL control germination, greening, flowering time, and senescence downstream of auxin, cytokinin, gibberellin, and light signaling [28]. Studies have confirmed that some GATA genes are preferentially expressed in the leaf [29]. Leaves are the main organs for photosynthesis and light stress response in plants. High expression of a GATA transcription factor in leaves is consistent with its influence on chlorophyll synthesis. Therefore, it is reasonable to suggest that Csa3G912920 is the candidate gene for fruit skin color in cucumber. Nonetheless, additional experiments are required to provide evidence for Csa3G912920 gene function and robustly evaluate this hypothesis.
In conclusion, we identified a novel QTL, CsFS1, that controls green fruit skin color in cucumber and proposed a candidate gene, Csa3G912920, that may be responsible for the green color phenotype. Our results provide insight into the biological and molecular mechanisms of fruit skin color formation and can promote the development of attractive cucumber varieties with enhanced nutrients in the future.

Plant materials and phenotype evaluation
Two cucumber inbred lines, G35 (light-green skin color) and Q51 (dark-green skin color), were crossed to create F1 progeny, which were then self-pollinated to generate an F2 population. The F1 progeny was backcrossed four times to the recurrent inbred parent G35 and then self-pollinated to yield the BC4F2 generation.
Chlorophyll a and chlorophyll b were extracted from fruit skins of G35, Q51, and F1 progeny with ethyl alcohol and quantified by a spectrophotometric method. Two parental lines, together with the F1 and F2 generations, were used to describe and validate the inheritance pattern of skin color traits in immature fruit. Twenty

BSA-seq
Two DNA pools, the light-green pool (QL-pool) and dark-green pool (SL-pool), were created by mixing equal amounts of DNA from 20 individuals with light-green fruit skins and 20 individuals with dark-green fruit skins, respectively. Paired-end sequencing libraries (150-bp read length) with insert sizes of approximately 400 bp were prepared for sequencing on the Illumina NovaSeq 6000 platform. The short reads from the two pools were aligned to the reference genome of the 9930 line [19] using BWA software with default parameters [31]. SNP-calling was performed using SAMtools and BCFtools [31]. Low-quality SNPs with base quality value < 30, read depth < 2×, and mapping quality value < 30 were excluded to minimize false positives caused by repetitive genomic sequence or sequencing and alignment errors.
Two parameters, SNP-index and ∆(SNP-index) [23], were calculated to identify candidate regions for fruit skin color QTLs. SNP-index is the proportion of reads covering a given SNP that differ from the reference sequence. Thus, SNP-index = 0 if all short reads covering a given nucleotide position carry the reference allele (9930 line), whereas SNP-index = 1 if all short reads at that position carry the alternative allele. ∆(SNP-index) is obtained by subtracting the SNP-index of the QL-pool from that of the SL-pool. The average SNP-index in a given genomic interval was calculated using a sliding window with a 1-Mb window size and a 10-kb increment. SNP-index graphs for the QL-pool and SL-pool, as well as the corresponding ∆(SNP-index), were plotted. The ∆(SNP-index) should not differ significantly from 0 in a genomic region with no major QTL [23]. We used an R script simulation to generate confidence intervals around the SNP-index under the null hypothesis of no QTL. First, we created two pools of progeny with a given number of individuals by random sampling. From each pool, a given number of alleles were sampled, corresponding to the read depth. Second, the SNP-index for each pool and the ∆(SNP-index) were calculated, and the process was iterated 10,000 times for each read depth to generate confidence intervals. Finally, these intervals were plotted for all genomic regions with variable read depths.
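The simulation steps above can be sketched as follows. This is a minimal Python illustration rather than the R script used in the study: the function names, the fixed random seed, the default pool size and depth, and the assumption that each allele in an F2 bulk is the alternative allele with probability 0.5 under the null hypothesis are all introduced here for illustration.

```python
import random


def snp_index(alt_reads, total_reads):
    """Proportion of reads at a SNP that differ from the reference allele."""
    return alt_reads / total_reads


def simulate_pool_index(n_individuals, depth, rng):
    """SNP-index of one bulk under the null hypothesis of no QTL."""
    # Assumption: in an F2, each of the 2n alleles in the bulk is the
    # alternative allele with probability 0.5 when no QTL is linked.
    n_alleles = 2 * n_individuals
    alt_alleles = sum(rng.random() < 0.5 for _ in range(n_alleles))
    pool_freq = alt_alleles / n_alleles
    # Sample `depth` reads from the bulk at that allele frequency.
    alt_reads = sum(rng.random() < pool_freq for _ in range(depth))
    return snp_index(alt_reads, depth)


def null_delta_interval(n_individuals=20, depth=20, iterations=10_000,
                        alpha=0.01, seed=0):
    """Two-sided (1 - alpha) interval of delta(SNP-index) under the null."""
    rng = random.Random(seed)
    deltas = sorted(
        simulate_pool_index(n_individuals, depth, rng)
        - simulate_pool_index(n_individuals, depth, rng)
        for _ in range(iterations)
    )
    tail = int(iterations * alpha / 2)
    return deltas[tail], deltas[iterations - 1 - tail]


if __name__ == "__main__":
    lower, upper = null_delta_interval()
    print(f"null interval at depth 20: [{lower:.2f}, {upper:.2f}]")
```

Repeating the call for each observed read depth gives depth-specific intervals; a window whose ∆(SNP-index) falls outside the interval is a candidate QTL region.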

GWAS
Re-sequencing data from 289 cucumber accessions were obtained, with an average genome coverage of 98.27% and an average sequencing depth of 19.728×. We obtained 2,352,638 SNPs, of which 399,352 high-quality SNPs with a missing rate of less than 0.2 were retained. The association between fruit skin color and each SNP was tested using a unified mixed model [32,33] that includes principal components [34] as fixed effects to account for population structure and a kinship matrix [35] to account for familial relatedness. Using the Bayesian information criterion, a backward elimination procedure was implemented to determine the optimal number of principal components to include in the mixed model [36]. The false discovery rate was controlled at 5% using the Benjamini and Hochberg procedure [37]. A likelihood ratio-based r² statistic was used to assess the goodness-of-fit of each SNP [38]. All analyses were performed using the Genome Association and Prediction Integrated Tool (GAPIT) package [39].
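For readers unfamiliar with the Benjamini and Hochberg procedure, a minimal sketch is given below. It is a generic illustration and not GAPIT's implementation; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if significant at the given FDR.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= (k / m) * fdr, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


if __name__ == "__main__":
    # Hypothetical p-values for ten SNPs.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals))
```

With these example values only the two smallest p-values pass, because the third (0.039) exceeds its threshold of 3/10 × 0.05 = 0.015 and no later rank satisfies the rule.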

Marker development and QTL analysis
The SNPs were filtered from the re-sequencing data of the two parents, G35 and Q51. The sequence context of the candidate SNPs was examined in the 9930 reference genome using BLAST alignment to obtain longer sequences for marker development. In total, 35 Kompetitive allele-specific PCR (KASP) SNP markers on chromosome 3 were developed using the BSA-seq and GWAS data and designed using Primer 5.0 (PREMIER Biosoft International, USA) (Table S3). The genotypes of the F2 population were analyzed using an Infinite M1000 microplate reader (Tecan, Switzerland) and the online tool "snpdecoder" (http://www.snpway.com/snpdecoder/). Linkage analysis was performed using JoinMap 4.0 [40], and QTL analysis was performed in MapQTL6.0 using the multiple QTL model (MQM mapping) procedure [41].

Figure 2
Identification of overlapping intervals detected by BSA-seq and GWAS for fruit skin color in cucumber.
a ∆(SNP-index) plot with statistical confidence intervals under the null hypothesis of no QTL (red, P < 0.01). The candidate QTL (CsFS1) location was identified between 36.62 and 39.77 Mb on chromosome 3. b GWAS analysis (Manhattan plots) showed a significant peak (SNP fs) above the threshold on chromosome 3 within the region previously identified in the QTL-seq analysis.

Figure 3
Fine mapping of CsFS1 on chromosome 3.
a LOD (log10 of the odds ratio) plots of linkage analysis based on SNP markers indicate the most likely position of CsFS1 between markers SNP39009359 and SNP39775194 on chromosome 3. b Mapping of the CsFS1 region using three recombinants with extremely light-green fruit skin color identified from 278 plants in the F2 and BC4F2 populations. CsFS1 was placed within a 94-kb interval containing 15 candidate genes between the markers SNP39531980 and SNP39626163. c Relative expression of three candidate genes in the fruit pericarp of the light-green near-isogenic line NIL-1334 and the dark-green near-isogenic line NIL-1325 at 0 days post-anthesis (DPA). The relative expression is shown as the mean ± standard deviation, and statistical significance was determined using Student's t-tests (*P < 0.05).

Figure 4
Phylogenetic tree and structure identity of Csa3G912920 and its homologs in different species.