Genome-wide Association Study and Possible Candidate Genes for Root Color and Carotenoid Contents in Japanese Orange Carrot F2 Populations

Carrot is a major source of provitamin A in the human diet. Two of the most important traits in carrot breeding are carotenoid contents and root color. To identify genomic regions related to these traits and to develop DNA markers for carrot breeding, we performed a genome-wide association study (GWAS) using genome-wide single-nucleotide polymorphisms (SNPs) in two F2 populations, both derived from crosses between orange-rooted carrots bred by a Japanese seed company. The GWAS revealed 21 significant associations, and the physical positions of some associations suggested two possible candidate genes. An Orange (Or) gene was a possible candidate for the visual color evaluation and the α- and β-carotene contents. Sanger sequencing detected a new allele of Or carrying an SNP that causes a non-synonymous amino acid substitution. Genotypes at this SNP corresponded to the visual evaluation of root color in another breeding line. A chromoplast-specific lycopene β-cyclase (CYC-B) gene was a possible candidate for the β/α-carotene ratio; five amino acid substitutions in CYC-B were detected between the parental plants of F2 population A. The detected associations and the SNPs in the possible candidate genes will contribute to carrot breeding and to the understanding of carotenoid biosynthesis and accumulation in orange carrots.
Figure 7 (caption fragment): Values at the nodes indicate the percentage consensus support calculated using a bootstrap test with 1,000 replications. Five amino acid substitutions (●) were detected between Fs001 and Fs002.


Introduction
Carrot (Daucus carota L.), a major source of provitamin A carotenes in the human diet, is consumed worldwide [1]. Carrots accumulate abundant carotenoids in their taproots; these carotenoids, which are responsible for the orange pigmentation of carrot roots, are thought to provide health benefits [2]. Carrot taproots show a variety of colors, including orange, white, yellow, red, and purple.
Quantitative trait locus (QTL) analyses and association studies of carrot root color and carotenoid contents have been performed in several populations, and important QTLs have been reported [3][4][5][6]. These studies used populations derived from crosses between accessions with clearly different root colors, such as orange and white [3][4][5] or orange and dark orange [4], or inter-crossed populations derived from white, yellow, red, and orange carrots [6].
Carotenoid biosynthesis is well established, and a highly conserved carotenoid biosynthesis pathway has been characterized in many plant species (Fig. 1) [7][8][9]. In carrot, several carotenoid biosynthetic genes have been mapped [3], and the released carrot whole-genome sequences revealed orthologs and homologs of genes in the carotenoid biosynthesis pathway [10,11]. Several genes involved in carotenoid biosynthesis and accumulation in carrot have also been identified. An ortholog of the carotene hydroxylase CYP97A3 has been identified in carrot; it controls the α-carotene and total carotenoid contents and the α/β-carotene ratio [12]. A candidate-gene association study of the carotenoid biosynthesis pathway revealed associations of the total carotenoid and β-carotene contents with zeaxanthin epoxidase (ZEP), phytoene desaturase (PDS), and carotenoid isomerase (CRTISO); of the α-carotene content with CRTISO and plastid terminal oxidase (PTOX); and of color components with ZEP [6].
Genes outside the carotenoid biosynthesis pathway have also been reported to affect carotenoid contents considerably. The Y and Y2 loci account for most of the color differences among orange, yellow, and white carrot roots [13]. The Y gene has been identified and is hypothesized to regulate photosystem development and functional processes, including photomorphogenesis and root de-etiolation [10]. The Y2 locus has been mapped to a genomic region of approx. 650 kb, and no annotated gene of the carotenoid biosynthesis pathway is located within this candidate region [14]. The Orange (Or) gene, first identified in cauliflower, where it accounts for abnormally elevated β-carotene accumulation [15], has also been identified in carrot and is associated with the presence of carotenoids in carrot roots [16]. However, the genes, polymorphisms, and genomic regions involved in carotenoid biosynthesis and accumulation that cause slight differences in root color and carotenoid contents are not fully understood, especially within orange carrots.
In Japan, consumers prefer carrots with a bright orange root color, and cultivars with uniform root color are popular. Accessions with bright orange roots still show slight color differences, and Japanese breeders have selected for the best 'bright orange' and for uniform color among them. DNA markers that can distinguish slight differences within bright orange color have therefore been sought in Japanese carrot breeding. No study has yet used populations derived from crosses between orange-rooted carrots with slight color differences, but the recent release of the carrot whole-genome sequences has made it easier to analyze genome-wide constitutions at high marker density and to conduct association analyses, even in populations derived from genetically close orange carrots [10,11].
In the present study, we developed two F2 populations that share a common parent. Both populations were derived from crosses between orange-rooted parents. We performed a genome-wide association study (GWAS) to investigate the genomic regions that cause slight but important differences in root color and carotenoid contents among carrots with orange roots.

Plant materials
We developed two F2 populations (A and B) using orange-rooted carrot plants bred by a Japanese seed company, Fujii Seed (Osaka, Japan). Population A was derived from a cross between Fs001 and Fs002, and population B from a cross between Fs002 and Fs003 (Fig. 2). Fs002 was the pollen parent of F2 population A and the seed parent of F2 population B. Plants of F2 populations A (n=146) and B (n=136) were grown from mid-February to early June 2018 in a field at Narashino, Chiba, Japan, and used for DNA extraction and visual evaluation of root color. Roots of population A were also used to quantify carotenoid contents by high-performance liquid chromatography (HPLC) and to measure color components.
To evaluate the DNA marker developed for the Or gene, we also used breeding line C, which was bred by Fujii Seed using Fs002 as one of its breeding materials (Fig. 2). Breeding line C was grown from the end of March to early July 2017 in a field at Oirase, Aomori, Japan, and 40 plants were used for DNA extraction and visual evaluation of root color.
Experimental research and field studies on plant materials complied with relevant institutional, national, and international guidelines and legislation.

Visual evaluation of root colors and evaluation of color components
Root color was visually evaluated by two experienced breeders at Fujii Seed, on a scale of ten grades of orange darkness in F2 population A, seven grades in F2 population B, and three grades in breeding line C. In F2 population A, the color components L*, a*, and b* were measured with a spectrocolorimeter (model CM2600d, Minolta, Tokyo) with a 5-mm measuring area. The surface of the middle part of each washed root was measured three times, and the mean values were used as phenotypic data.
Quantification of carotenoid contents (α-carotene, β-carotene, and lutein) by HPLC
The root surface, i.e., approx. 1-2 mm of the outer part of the phloem in the middle of each root, was cut and collected. Samples were immediately frozen in liquid nitrogen and stored at −80°C. The root surface was used for HPLC because the visual and color-component evaluations were performed on the root surface. Extraction for HPLC was performed as described [6], scaled down and with some modifications. Frozen samples were ground to a powder with a tube mill control (S001, IKA, Staufen, Germany). Extraction was performed on approx. 50 mg (50 mg ± 5%) of ground frozen material, to which 50 µL of β-apo-8'-carotenal (5 µg/mL) was first added as an internal standard. Samples were mixed with 600 µL of 0.57% MgCO3 and 0.1% 3,5-di-tert-butyl-4-hydroxytoluene (BHT) in methanol, vortexed, and mixed with 600 µL of chloroform containing 0.1% BHT. After vertical mixing ten times and incubation for 15 min in darkness at 4°C, 600 µL of ultrapure water was added, and the samples were centrifuged at 236 × g for 10 min. Next, 400 µL of the lower layer was concentrated by vacuum evaporation, and the dry extract was dissolved in 50 µL of acetone containing 0.1% BHT. Samples were kept at 4°C and protected from direct light throughout the procedure.
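The internal standard added above is what allows losses during extraction to be corrected: if the analyte and β-apo-8'-carotenal are recovered equally, the analyte mass follows from their peak-area ratio and the known spike (50 µL × 5 µg/mL = 0.25 µg). The Python sketch below illustrates this arithmetic, assuming a relative response factor (`rrf`) of 1 and reporting µg per g fresh weight; the function name and the `rrf` default are illustrative assumptions, not the settings used in Chromeleon.

```python
def carotenoid_content_ug_per_g(area_analyte: float,
                                area_istd: float,
                                sample_mass_mg: float,
                                istd_volume_ul: float = 50.0,
                                istd_conc_ug_per_ml: float = 5.0,
                                rrf: float = 1.0) -> float:
    """Estimate carotenoid content (µg per g fresh weight) by internal-standard calibration.

    Hypothetical helper: assumes that the analyte/internal-standard peak-area
    ratio, divided by a relative response factor (rrf), equals their mass ratio.
    """
    if area_istd <= 0 or sample_mass_mg <= 0 or rrf <= 0:
        raise ValueError("internal-standard area, sample mass and rrf must be positive")
    # Mass of beta-apo-8'-carotenal spiked in: µL x µg/mL / 1000 = µg
    istd_mass_ug = istd_volume_ul * istd_conc_ug_per_ml / 1000.0
    # Analyte mass inferred from the area ratio relative to the spike
    analyte_mass_ug = (area_analyte / area_istd) / rrf * istd_mass_ug
    # Convert sample mass from mg to g before normalizing
    return analyte_mass_ug / (sample_mass_mg / 1000.0)


content = carotenoid_content_ug_per_g(area_analyte=2.0, area_istd=1.0, sample_mass_mg=50.0)
print(f"{content:.2f} µg/g")
```

With these defaults, an analyte peak twice the size of the internal-standard peak in a 50-mg sample corresponds to about 10 µg/g. In practice, a response factor would typically be derived from authentic α-carotene, β-carotene, and lutein standards, which this sketch does not model.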
Carotenoids were quantified on an Ultimate 3000 HPLC system coupled with a diode array detector (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions with slight modifications. Carotenoids were separated on an Acclaim C30 column (150 × 2.1 mm, 3 µm, Thermo Fisher Scientific). The mobile phases were acetonitrile (eluent A), methanol/ethyl acetate (1:1, v/v; eluent B), and 10 mM formic acid (pH 3.0; eluent C). The elution program was as follows: 85% A, 14.5% B, and 0.5% C at 0-2 min; 85%-44.5% A, 14.5%-55% B, and 0.5% C at 2-7 min; 44.5% A, 55% B, and 0.5% C at 7-21 min; and a return to the initial conditions (85% A, 14.5% B, and 0.5% C) at 21.1-28.5 min. The flow rate was 0.4 mL/min. Samples were filtered through a 0.22-µm PTFE membrane filter, and 3.9 µL was injected. Analytes were detected with the photodiode array detector at 450 nm. Data were analyzed with Chromeleon 7 software (Thermo Fisher Scientific) using internal calibration with β-apo-8'-carotenal and the extraction yield.
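The elution program above is a piecewise-linear gradient. As a sketch, it can be encoded as a list of breakpoints and interpolated to obtain the mobile-phase composition at any time. Linear ramps between breakpoints, including the 21-21.1 min return to initial conditions, are an assumption, because the text gives only the endpoints; the names `GRADIENT` and `mobile_phase_at` are hypothetical.

```python
# Breakpoints: (time_min, %A acetonitrile, %B methanol/ethyl acetate 1:1, %C 10 mM formic acid)
GRADIENT = [
    (0.0, 85.0, 14.5, 0.5),
    (2.0, 85.0, 14.5, 0.5),
    (7.0, 44.5, 55.0, 0.5),
    (21.0, 44.5, 55.0, 0.5),
    (21.1, 85.0, 14.5, 0.5),   # assumed linear return to initial conditions
    (28.5, 85.0, 14.5, 0.5),
]


def mobile_phase_at(t: float, program=GRADIENT) -> tuple:
    """Return (%A, %B, %C) at time t (min) by linear interpolation between breakpoints."""
    if not program[0][0] <= t <= program[-1][0]:
        raise ValueError(f"t={t} min is outside the elution program")
    for (t0, *c0), (t1, *c1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return tuple(a + (b - a) * frac for a, b in zip(c0, c1))
    raise AssertionError("unreachable: t lies within the program range")


print(mobile_phase_at(4.5))  # midway through the 2-7 min ramp
```

Encoding the program as data rather than as prose makes it easy to check that every composition sums to 100% and to compare gradients between methods.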
Double-digest restriction site-associated DNA sequencing (ddRAD-seq) and GWAS
Total genomic DNA was extracted from young leaves of carrot plants with the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). ddRAD-seq analysis was performed as described [17] with the restriction enzymes PstI and MspI. The ddRAD-seq libraries were constructed and sequenced on a HiSeq 4000 platform (Illumina, San Diego, CA) in paired-end 101-nucleotide (nt) mode as described [17]. Primary data processing (removal of low-quality bases and adapter trimming), mapping onto the carrot genome Daucus carota v2.0 [10], and filtering of single-nucleotide polymorphisms (SNPs) to obtain high-confidence SNPs were performed as described [18]. Associations between the phenotype and genotype data were analyzed with the generalized linear model (GLM) in TASSEL (trait analysis by association, evolution, and linkage) ver. [19].
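In a GLM association test of the kind run in TASSEL, each SNP is tested separately as a fixed effect on the phenotype. The stdlib-only sketch below shows the core of such a single-marker test as a one-way ANOVA F statistic, with genotype classes as groups. It omits covariates (for example, population-structure terms) and any permutation-based thresholds, so it illustrates the idea rather than reproducing TASSEL's output; the function name and the example data are hypothetical.

```python
from collections import defaultdict


def single_marker_f_test(genotypes, phenotypes):
    """One-way ANOVA F statistic for one SNP, using genotype classes as groups.

    Plants with a missing genotype or phenotype (None) are skipped.
    Returns (F, df_between, df_within).
    """
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        if g is not None and y is not None:
            groups[g].append(y)
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    df_between, df_within = k - 1, n - k
    if df_between < 1 or df_within < 1:
        raise ValueError("need at least two genotype classes and more plants than classes")
    grand_mean = sum(values) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Variation explained by genotype class vs. residual variation within classes
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    if ss_within == 0:
        raise ValueError("no within-genotype variance")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical example: one SNP scored in six F2 plants
F, df_b, df_w = single_marker_f_test(
    ["AA", "AA", "AG", "AG", "GG", "GG"],
    [1.0, 3.0, 4.0, 6.0, 7.0, 9.0],
)
print(F, df_b, df_w)
```

A P value could then be taken from the upper tail of the F distribution, for example `scipy.stats.f.sf(F, df_b, df_w)` if SciPy is available, and plotted as -log10(P) along the genome to give a Manhattan plot.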

Sanger sequencing of candidate genes
To compare the genomic sequences of the possible candidate genes between the parental plants of F2 populations A and B, we Sanger-sequenced each gene from the start codon to the stop codon. The primers used for Sanger sequencing are listed in Supplementary Table S1.
SNP genotyping with KASP marker
A KASP marker that genotypes an SNP in the Or gene was developed in this study and run according to the manufacturer's instructions (Biosearch Technologies, Novato, CA).

Results
GWAS for the visual evaluation of root color and evaluations of color components and carotene contents in roots of F2 populations A and B
In both F2 populations A and B, all root color evaluations showed a normal distribution (Suppl. Fig. S1), suggesting that multiple loci are involved in carrot root color. The ddRAD-seq analysis detected 3,106 and 1,901 high-confidence SNPs in F2 populations A and B, respectively. GWASs were performed using these genotype data together with the values from the visual evaluation and from the evaluations of color components and carotene contents in carrot roots. In F2 population A, significant associations were detected for the visual evaluation of root color (Fig. 3a); color components a* (Fig. 3c) and b* (Fig. 3d); α-carotene (Fig. 3e), β-carotene (Fig. 3f), and lutein contents (Fig. 3g); and the β/α-carotene ratio (Fig. 3h) in roots (Table 1). No significant association was detected for color component L* (Fig. 3b).
On chromosome 1, the associations for visual evaluation, color components a* and b*, and α- and β-carotene contents were detected at nearby physical positions, with the highest associations at around 31 Mb (Fig. 3, Table 1), suggesting that they are caused by the same locus. On chromosome 3, the associations for α- and β-carotene contents were also at nearby positions, with the highest associations at around 6 Mb (Fig. 3, Table 1). In population B, the association for visual color evaluation on chromosome 3 showed its highest association at 5.4 Mb, close to the positions of the population A associations for α- and β-carotene contents (Figs. 3, 4, Table 1), suggesting that these associations are also caused by the same locus. Interestingly, the association on chromosome 5, which was the highest for visual evaluation in F2 population A, was not detected for any other evaluation (Fig. 3, Table 1).
Correlations among visual evaluation, color components, and carotene contents in roots of F2 population A
Pearson correlations between phenotypes showed that the three color components (L*, a*, and b*), the α-carotene content, and the β-carotene content were highly correlated (Table 2). The lutein content was weakly correlated with L*, a*, and b* and highly correlated with the α-carotene content. Because lutein is biosynthesized downstream of α-carotene (Fig. 1), this high correlation between lutein and α-carotene is consistent with the biosynthesis pathway. The visual evaluation was not highly correlated with any other phenotype.
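The phenotype correlations summarized in Table 2 are ordinary Pearson coefficients. A minimal sketch for computing them pairwise from per-plant trait values follows; the trait names and numbers are hypothetical placeholders, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant trait has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(traits):
    """Pairwise Pearson r for a dict mapping trait name -> per-plant values."""
    names = list(traits)
    return {(a, b): pearson_r(traits[a], traits[b]) for a in names for b in names}


# Hypothetical per-plant values, for illustration only
traits = {
    "a_star": [20.1, 22.4, 25.0, 23.3],
    "alpha_carotene": [40.0, 47.5, 55.2, 50.1],
}
matrix = correlation_matrix(traits)
print(round(matrix[("a_star", "alpha_carotene")], 3))
```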
Allelic effects of the associations detected on chromosomes 1 and 3 for α- and β-carotene contents in F2 population A
We examined the allelic effects of the associations detected by the GWAS for α- and β-carotene contents. At the median, carrots with the AA genotype at the SNP showing the highest association for α-carotene on chromosome 1 (DCARV2_CHR1_30704558) had approx. 1.5-fold higher α- and β-carotene contents than those with the GG genotype (Fig. 5a, b). Similarly, at the median, carrots with the GG genotype at the SNP showing the highest association for α-carotene on chromosome 3 (DCARV2_CHR3_5849853) had approx. 1.3-fold higher α-carotene and approx. 1.2-fold higher β-carotene contents than those with the AA genotype (Fig. 5c, d). No clear genetic interaction, such as epistasis, was observed between the associations on chromosomes 1 and 3 (Suppl. Fig. S2). Considering both loci together, carrots carrying the high-carotenoid alleles at both loci had, at the median, approx. 2.6-fold higher α-carotene and approx. 1.8-fold higher β-carotene contents in the root surface than those carrying the low-carotenoid alleles at both loci (Suppl. Fig. S2).
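The fold changes above compare medians between genotype classes. The sketch below shows that calculation; passing tuples of genotypes at two SNPs (for example, one on chromosome 1 and one on chromosome 3) gives the combined two-locus comparison in the same way. The function name and the values are invented for illustration and do not reproduce Fig. 5.

```python
from statistics import median


def median_fold_change(genotypes, values, high, low):
    """Ratio of the median trait value in genotype class `high` to that in class `low`.

    A genotype class can be a single call such as "AA" or a tuple of calls
    at several SNPs, e.g. ("AA", "GG") for a two-locus combination.
    """
    high_vals = [v for g, v in zip(genotypes, values) if g == high]
    low_vals = [v for g, v in zip(genotypes, values) if g == low]
    if not high_vals or not low_vals:
        raise ValueError("both genotype classes must be present")
    return median(high_vals) / median(low_vals)


# Hypothetical single-SNP example; the heterozygote is ignored in this comparison
print(median_fold_change(["AA", "AA", "AA", "GG", "GG", "GG", "AG"],
                         [3, 6, 9, 2, 4, 6, 5], high="AA", low="GG"))

# Hypothetical two-locus example using (chromosome 1, chromosome 3) genotype pairs
pairs = [("AA", "GG"), ("AA", "GG"), ("GG", "AA"), ("GG", "AA")]
print(median_fold_change(pairs, [10, 14, 4, 6], high=("AA", "GG"), low=("GG", "AA")))
```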

Possible candidate gene for the association detected on chromosome 3 by the GWAS and sequence comparison between parents in F2 populations A and B
Significant associations were detected around 5-6 Mb on chromosome 3 for α- and β-carotene contents in F2 population A and for visual evaluation in F2 population B (Figs. 3, 4, Table 1). Within this region, the previously reported Or gene (DCAR_009172), which affects carotenoid contents in carrot [16], is located at 5.2 Mb on chromosome 3. To examine the involvement of Or, we Sanger-sequenced Or in the parents of populations A and B. Sanger sequencing detected only one non-synonymous amino acid substitution, at the fourth amino acid from the C-terminal end, caused by an SNP that differed between the parents of both F2 populations A and B (Fig. 6a). Fs001 and Fs003 carried a thymine, identical to the carrot reference genome [10], whereas Fs002 carried a guanine; this changes Tyr309 in Fs001 and Fs003 to aspartic acid in Fs002.
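The text reports a T-to-G change that turns Tyr309 into aspartic acid but does not give the codon. Under the standard genetic code, Tyr is encoded by TAT and TAC, so the sketch below enumerates every single T→G change in those codons and translates the result. Only a change at the first codon position yields Asp (TAT→GAT, TAC→GAC), whereas a T→G change at the third position of TAT would create a TAG stop codon. This is an inference from the genetic code, not a codon position reported by the authors, and the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def point_mutations(codons, ref="T", alt="G"):
    """List (codon, 1-based position, mutant codon, old aa, new aa) for each ref->alt change."""
    results = []
    for codon in codons:
        for pos, base in enumerate(codon):
            if base == ref:
                mutant = codon[:pos] + alt + codon[pos + 1:]
                results.append((codon, pos + 1, mutant, CODON_TABLE[codon], CODON_TABLE[mutant]))
    return results


tyr_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "Y"]
for row in point_mutations(tyr_codons):
    print(row)
```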
To examine the effect of this SNP, we developed a KASP marker that genotypes it and applied the marker to breeding line C, a progeny of Fs002 that segregates for root color (Fig. 2). The root color of breeding line C was visually evaluated on a three-grade scale (Fig. 6c). The KASP genotype at the Or SNP clearly corresponded to the visual evaluation (Fig. 6b): all carrots with bright or middle orange roots were heterozygous at the SNP, and all carrots with slightly light orange roots were TT homozygotes. We therefore speculate that the associations detected on chromosome 3 for α- and β-carotene contents in F2 population A and for visual evaluation in F2 population B are caused by this SNP, which results in a non-synonymous amino acid substitution in Or.
Possible candidate gene for the β/α-carotene ratio in population A and amino acid comparison between the parents of population A
In the GWAS of F2 population A, the association for the β/α-carotene ratio was detected on chromosome 6, with the highest association at around 4.6 Mb. Iorizzo et al. [10] tabulated the carrot orthologs and homologs of candidate genes involved in the plastid 2-C-methyl-D-erythritol 4-phosphate (MEP) and carotenoid pathways. According to this table, DCAR_022896, which has a lycopene cyclase domain, is located at 4.1 Mb on chromosome 6, between the SNP showing the highest association for the β/α-carotene ratio and the adjacent SNP (Suppl. Table S2). Carotenoid biosynthesis bifurcates after lycopene into ε- and β-carotenoids through the activities of two lycopene cyclases, lycopene ε-cyclase (LCYE) and lycopene β-cyclase (LCYB) [28] (Fig. 1). Moreover, the relative proportions of β-carotene and α-carotene are known to be determined mostly by the relative amounts and/or activities of the LCYB and LCYE enzymes [20][29][30][31][32].
Our phylogenetic analysis of DCAR_022896 and the LCYE, LCYB, NSY, CCS, and CYC-B proteins of carrot, Arabidopsis, Solanum lycopersicum, Carica papaya, Citrus sinensis, Capsicum annuum, and Lilium lancifolium showed that DCAR_022896 belonged to the same clade as CYC-B of C. sinensis and C. papaya (Fig. 7a). At the amino acid level, DCAR_022896 shared 76.9% identity with CYC-B of C. sinensis and 62.1% with CYC-B of C. papaya. CYC-B is an LCYB that converts lycopene to β-carotene specifically in chromoplasts [20], where carotenoids accumulate [35,36] (Fig. 1). Moreover, a BLAST search using primers for the previously reported carrot LCYB2 showed that CYC-B (DCAR_022896) in the present study is identical to LCYB2 [5,6,37,38]. We therefore regarded DCAR_022896 as a possible candidate gene for the β/α-carotene ratio and compared its amino acid sequences between the parents of F2 population A by Sanger sequencing. The comparison revealed five amino acid substitutions between the parents of F2 population A (Fig. 7b). These results suggest that CYC-B may be involved in determining the β/α-carotene ratio in carrot roots.
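Percent identity and the list of substitutions between parental alleles both come from a position-by-position comparison of aligned protein sequences. The sketch below assumes the sequences have already been aligned (for example, with a multiple-alignment tool) and excludes gap columns from the identity denominator; identity definitions vary between tools, so this convention may not reproduce the 76.9% and 62.1% values exactly. The function name and the short sequences in the example are hypothetical.

```python
def compare_aligned(seq_a: str, seq_b: str):
    """Percent identity and substitution list for two pre-aligned protein sequences.

    Positions are 1-based alignment columns, not residue numbers of either protein.
    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    compared = identical = 0
    for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the identity denominator here
        compared += 1
        if a == b:
            identical += 1
        else:
            substitutions.append((pos, a, b))
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared, substitutions


identity, subs = compare_aligned("MKTAYI", "MKSAYV")
print(f"{identity:.1f}% identity; substitutions: {subs}")
```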

Discussion
Our GWAS using the two F2 populations derived from orange-rooted carrots detected 21 associations for visual color evaluation, color components a* and b*, α- and β-carotene contents, lutein content, and the β/α-carotene ratio (Figs. 3, 4, Table 1). Some associations for different root color evaluations were detected at nearby physical positions. Interestingly, however, the associations for visual evaluation in F2 population A on chromosomes 4 and 5 were not detected for any other phenotype, and the Pearson correlations showed no high correlation between visual color evaluation and the other phenotypes (Table 2). These results suggest that spectrocolorimetry and HPLC could not capture carrot root color in the same way that experienced breeders evaluate it. Experienced breeders judge root color comprehensively, including the gloss and texture of the root surface, so the associations detected only for visual evaluation might be related to these traits. In contrast, the associations detected on chromosome 1 were significant for visual evaluation, color components a* and b*, and α- and β-carotene contents (Fig. 3, Table 1). However, no annotated genes of the MEP or carotenoid pathways lie within 5 Mb of the position of the highest association (31 Mb) on chromosome 1 [10]. Similarly, a highly significant association was detected for lutein content on chromosome 5 in F2 population A (Fig. 3, Table 1), yet apart from neoxanthin synthase (NSY), no genes annotated in the MEP or carotenoid pathways [10] are predicted around this locus, nor is the chromatin-modifying histone methyltransferase SDG8 (CCR1), which affects lutein content in leaves [39]. The NSY gene is located approx. 1.1 Mb from the position of the highest association for lutein content. NSY acts downstream in the other branch of carotenoid biosynthesis, which does not include lutein (Fig. 1), and no feedback regulation between NSY and lutein content has been reported. Nevertheless, we cannot exclude the possibility that a mutation in NSY alters the flux through each branch and thereby affects the lutein content. Further analyses, such as a map-based approach, are necessary to narrow down the candidate regions and to identify the genes causing the associations on chromosome 1 for visual evaluation, color components a* and b*, and α- and β-carotene contents, the association on chromosome 5 for lutein content, and the other significant associations revealed in this study.
The GWAS for α- and β-carotene contents in F2 population A and for visual evaluation in F2 population B detected associations at similar physical positions around 5-6 Mb on chromosome 3 (Figs. 3, 4, Table 1), where the previously reported Or gene is located (5.2 Mb). Or is involved in carotenoid accumulation via chromoplast development and in carotenoid biosynthesis via the expression of 15-cis-phytoene synthase (PSY), and it affects total carotenoid content [40,41]. Carrot Or was recently identified and is associated with the presence of carotenoids in carrot roots [16]. The similar physical positions and the function of Or suggest that these associations were caused by Or; moreover, Sanger sequencing in the present study detected an SNP causing a non-synonymous substitution between the parents of F2 populations A and B (Fig. 6a). Iorizzo et al. [10] re-sequenced 35 carrot accessions and released the SNPs on Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html). These released SNPs include many on the Or gene, three of which cause non-synonymous substitutions. However, the non-synonymous SNP detected in Fs002 is not among them, suggesting that it represents a new allele of Or and that its distribution might be limited. Further analyses of the distribution of this SNP are needed. Because breeding line C was derived from Fs002 (Fig. 2), its bright orange allele was presumably inherited from Fs002. Peaks in the Manhattan plots for this region were also observed for visual color evaluation (Fig. 3a), color components L* (Fig. 3b) and b* (Fig. 3d), and lutein content (Fig. 3g). Although these associations were not significant, the peaks might also be caused by Or.
The GWAS in F2 population A revealed an association for the β/α-carotene ratio on chromosome 6, and CYC-B is located in the associated region (Fig. 3h, Suppl. Table S2). In several model plants, such as Arabidopsis thaliana [42], rice (Oryza sativa) [43], and maize (Zea mays) [29,44], LCYB is encoded by a single gene. However, in some plant species that accumulate high levels of carotenoids in non-photosynthetic organs, such as fruits and flowers, LCYB is encoded by two genes [38]. These genes are differentially expressed in photosynthetic and non-photosynthetic organs, and those expressed in non-photosynthetic organs were named CYC-B. As the name implies, CYC-B is a chromoplast-specific lycopene β-cyclase.
Carrot has two LCYBs, LCYB1 and LCYB2 [3]. Our phylogenetic analysis showed that carrot LCYB1 (LCYB) and LCYB2 (CYC-B) belong to the LCYB and CYC-B clades, respectively (Fig. 7a). CYC-B was first reported in tomato (Solanum lycopersicum) as a fruit- and flower-specific lycopene β-cyclase through two mutations, Beta and old-gold. Beta increases the β-carotene content in fruit, whereas old-gold, a null mutation of CYC-B, abolishes β-carotene, increases the lycopene content in fruit, and causes tawny orange flowers [20]. CYC-B has also been reported to be responsible for fruit color in papaya (Carica papaya) [22] and citrus (Citrus sinensis), and its null allele is involved in the high lycopene accumulation of red grapefruit [23]. In carrot, unlike in plants with organ-specific LCYBs, LCYB1 is expressed in both leaves and roots, and its transcript level increases with carotenoid content during root development [38,45]. Because the GWAS detected an association around the CYC-B region in this study, we speculate that in carotenoid-accumulating carrot roots, CYC-B (LCYB2) also plays a role in carotenoid biosynthesis in addition to LCYB (LCYB1). To the best of our knowledge, […]
Visual appearance traits are important targets in carrot breeding in Japan, and the 'best bright orange color' is selected by comparing minute color differences, as shown in Figure 6c. These differences are difficult for non-specialists to detect, but the resulting cultivars attract Japanese consumers with their 'best bright orange color'. The present study provides the first GWAS of carrot root color aimed at selecting bright orange color within orange-rooted populations. The developed KASP marker for Or, as well as the SNPs showing significant associations, will contribute to orange carrot breeding.

Data availability
Nucleotide sequence data from the ddRAD-seq analysis of F2 populations A and B are available in the DDBJ Sequence Read Archive under accession numbers DRA012848 to DRA012853.

Figure 2
Lineage diagram of the plant materials: F2 populations A and B and breeding line C. Fs002 was used as a common breeding material.