Phenotypic analysis of fruit skin color in cucumber
The inbred lines G35 (light-green cucumber) and Q51 (dark-green cucumber) were used as parents for fine mapping of fruit skin color. The fruit skin color of all F1 individuals was intermediate between the two parents, darker than G35 and lighter than Q51, but closer to dark green (Fig. 1a). Pigment content analysis showed that chlorophyll a and chlorophyll b contents were significantly lower in G35 than in Q51 (Fig. 1b). These results indicated that the difference in fruit skin color between the parents was associated with chlorophyll content.
Identification of a major QTL, CsFS1, on chromosome 3 by BSA-seq and GWAS
To rapidly identify loci for skin color in the F2 population, two bulks consisting of 20 dark-green (SL-pool) and 20 light-green (QL-pool) progenies were sequenced on the Illumina platform. A total of 12.9 Gb of raw reads were generated, with an average depth of approximately 20.4×. The short reads were aligned to the cucumber reference genome (Huang et al. 2009), and 145,804 SNPs were identified between the dark-green and light-green parents. Based on the SNP-indices of the QL- and SL-pools, the ∆(SNP-index) of a genomic region from 36.62 Mb to 39.77 Mb on chromosome 3 was greater than the threshold value and close to 1.00 (Fig. 2a). This region may therefore harbor a major QTL for the fruit skin color trait in cucumber.
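The ∆(SNP-index) calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the function names, the window and step sizes, and the direction of subtraction (SL-pool minus QL-pool) are assumptions made for the example, and the read counts are invented.

```python
from statistics import mean

def snp_index(alt_reads, total_reads):
    """Fraction of reads carrying the non-reference allele at one SNP."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def delta_snp_index(sl_counts, ql_counts):
    """Difference between pool SNP-indices at one SNP.

    Each argument is an (alt_reads, total_reads) tuple. The subtraction
    order (SL-pool minus QL-pool) is an assumption for this sketch.
    """
    return snp_index(*sl_counts) - snp_index(*ql_counts)

def windowed_delta(snps, window_bp, step_bp):
    """Average delta(SNP-index) in sliding windows along one chromosome.

    snps: list of (position_bp, sl_counts, ql_counts).
    Returns a list of (window_start_bp, mean_delta); empty windows are skipped.
    """
    if not snps:
        return []
    snps = sorted(snps, key=lambda s: s[0])
    last_pos = snps[-1][0]
    results = []
    start = 0
    while start <= last_pos:
        end = start + window_bp
        deltas = [delta_snp_index(sl, ql)
                  for pos, sl, ql in snps if start <= pos < end]
        if deltas:
            results.append((start, mean(deltas)))
        start += step_bp
    return results

# Invented example: three SNPs, non-overlapping 1-kb windows.
example = [
    (100, (10, 10), (0, 10)),   # fully differentiated between pools
    (150, (5, 10), (5, 10)),    # no difference between pools
    (1200, (10, 10), (0, 10)),  # fully differentiated between pools
]
windows = windowed_delta(example, window_bp=1000, step_bp=1000)
```

A window whose mean ∆(SNP-index) approaches 1.00, as reported for 36.62 to 39.77 Mb on chromosome 3, indicates that the two pools carry almost exclusively opposite parental alleles across that region.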
To independently confirm the association of this region with fruit skin color, GWAS was performed on 289 cucumber accessions, which had been sequenced to an average depth of 19.73× with 98.27% coverage of the cucumber reference genome (Huang et al. 2009). A total of 2,352,638 SNPs were identified using GATK software with default parameters (McKenna et al. 2010). To reduce the incidence of false-positive signals, a high-resolution variation map of 399,352 SNPs with minor allele frequency > 5% and missing rate < 0.2% was generated and used for genome-wide association analysis of fruit skin color with a unified mixed linear model that controlled for population structure and familial relatedness. A Manhattan plot for cucumber fruit skin color showed the strongest association signal (SNPfs) on the distal arm of chromosome 3, overlapping with the genomic region identified by QTL-seq (Fig. 2b). This indicated that a major QTL controlling fruit skin color resided on the distal arm of chromosome 3, which we named CsFS1 (Fruit skin 1).
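The SNP filtering step can be illustrated with a short sketch. The thresholds (minor allele frequency > 5%, missing rate < 0.2%) are taken from the text; everything else is an assumption for the example, including the function names and the genotype encoding (0, 1, or 2 copies of the alternate allele per diploid accession, with None for a missing call). It is not the filtering code used in this study.

```python
def missing_rate(genotypes):
    """Fraction of accessions with no genotype call at this SNP."""
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    return sum(g is None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called diploid genotypes.

    Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def passes_filter(genotypes, min_maf=0.05, max_missing=0.002):
    """Keep a SNP only if MAF > min_maf and missing rate < max_missing."""
    return (minor_allele_frequency(genotypes) > min_maf
            and missing_rate(genotypes) < max_missing)

# Invented genotype vectors for 100 accessions.
common_snp = [0] * 90 + [2] * 10            # MAF = 0.10, nothing missing
rare_snp = [0] * 99 + [1]                   # MAF = 0.005
gappy_snp = [0] * 50 + [2] * 49 + [None]    # missing rate = 0.01

kept = [passes_filter(s) for s in (common_snp, rare_snp, gappy_snp)]
```

Under this sketch, a single missing call among 289 accessions already gives a missing rate of about 0.35%, so with the threshold as stated only SNPs called in every accession would be retained.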
Fine mapping narrowed down CsFS1 to a 94-kb interval
To identify the candidate gene(s) in the CsFS1 locus, classical QTL analysis was performed using 278 F2 progenies. A total of 35 SNP markers were developed between 15.66 and 39.77 Mb on chromosome 3 and used for genotypic analysis of the F2 segregating population (Table S3). QTL analysis using multiple-QTL mapping (MQM) showed a LOD peak from 64.85 to 69.05 cM, which corresponded to the physical interval from 39.0 to 39.77 Mb on chromosome 3 (Fig. 3a). In this interval, the marker with the highest LOD score explained 35.6% of the phenotypic variation in the F2 segregating population (Table S1). The genomic interval of CsFS1 was further narrowed down to the region between two SNP markers (39,531,980 and 39,626,163 bp) using four recombinant individuals from the F2 and BC4F2 populations (Fig. 3b). The CsFS1 locus was thus delimited to a 94-kb interval on chromosome 3.
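The size of the fine-mapped interval follows directly from the two flanking marker positions given above. The short check below is only an arithmetic illustration; the constant and function names are hypothetical.

```python
# Flanking SNP marker positions on chromosome 3, as reported in the text (bp).
LEFT_MARKER_BP = 39_531_980
RIGHT_MARKER_BP = 39_626_163

def interval_width_kb(left_bp, right_bp):
    """Distance between two marker positions, in kilobases."""
    if right_bp < left_bp:
        raise ValueError("right_bp must not be smaller than left_bp")
    return (right_bp - left_bp) / 1000

width_kb = interval_width_kb(LEFT_MARKER_BP, RIGHT_MARKER_BP)
print(f"CsFS1 interval: {width_kb:.1f} kb")
```

The difference is 94,183 bp, which rounds to the 94-kb interval reported here.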
Identification of a candidate gene related to fruit skin color
According to the cucumber genome database (http://www.icugi.org/), 12 of the 15 predicted protein-coding genes in the 94-kb interval have functional annotations (Table S2). qPCR experiments were performed to investigate the expression patterns of three possible candidate genes associated with fruit skin traits in NIL-1334 (light-green) and NIL-1325 (dark-green). In the pericarp, only the expression of Csa3G912920 differed significantly between NIL-1334 and NIL-1325 (P < 0.05) (Fig. 3c, Fig. S2). The Csa3G912920 gene encodes a plant GATA transcription factor with a conserved zinc finger domain. A phylogenetic tree and sequence alignment showed that the Csa3G912920 homologs from melon (MELO3C003335), watermelon (Cla97C09G175500), and wax gourd (Bhi05M000420), highlighted in the gray-shaded box, all encode GATA transcription factors (Fig. 4a and b). Secondary structural elements assigned on the basis of the literature showed that the zinc finger domain comprises four β-strands and one α-helix (Fig. 4b). Csa3G912920 was therefore designated as a candidate gene for CsFS1.
Previous studies have shown that Arabidopsis GNC (GATA NITRATE-INDUCIBLE CARBON-METABOLISM-INVOLVED) and CGA1 (CYTOKININ-RESPONSIVE GATA1), members of the GATA transcription factor family, play a major role in the regulation of chlorophyll synthesis (Chiang et al. 2012). Under light, overexpression of GNC promotes chloroplast development and the production of chlorophyll in roots (Richter et al. 2013). We therefore inferred that Csa3G912920 is the probable candidate gene for CsFS1 and named it CsGATA1.