Genetic characteristics of 106 bitter gourd accessions
A total of 47 M. charantia accessions were collected from China, with one exception. The resequencing of these accessions produced 222.90 Gb of clean data, corresponding to a genomic coverage ranging from approximately 13.12× (KG19) to 19.07× (KG21) (Table S1). For comparative purposes with individuals from distinct geographical regions, an additional 60 M. charantia accessions were downloaded from the NCBI database. Therefore, this study included a total of 66 accessions from Southeast Asia (SEA), 24 from South Asia (SA), 16 from Taiwan (TAI) and Thailand (THAI), along with one outgroup (Fig. 1). By mapping the clean reads onto the OHB3-1 reference genome, this study identified a final set of 3,335,160 high-quality SNPs based on the resequencing of 106 accessions.
To infer the genetic structure of the 106 bitter gourd accessions, ADMIXTURE analysis, phylogenetic tree construction, and principal component analysis (PCA) were employed. Interestingly, our bitter gourd accessions formed groups distinct from the previously published TAI, THAI, SA, and SEA (Matsumura et al. 2020). The 106 accessions in this study clustered into four distinct groups: wild type from TAI and THAI, SA, SEA1, and SEA2 (Fig. 2A, B, Table S1). The SEA group initially diverged from the others when K = 2, subsequently, the SA group separated from the wild type when K = 3. When K = 4, SEA split into SEA1 and SEA2, reported for the first time here. Accordingly, the ADMIXTURE models with K = 4 yielded the lowest cross-validation errors (Fig. 2A). The neighbor-joining (NJ) tree and PCA were also largely consistent with the ADMIXTURE analysis (Figs. 2C, D). PC1 and PC2 explained 70.46% and 8.40% of the variation, respectively. Eight individuals were found to be admixtures among wild, SA and SEA for max Q < 0.70, while 15 were admixtures between SEA1 and SEA2 (Figs. 2B, C, and D, Table S1).
Genome-wide nucleotide diversity (π) for each group was calculated using 3,335,160 SNPs. The wild group displayed a significantly higher π (2.61×10-3) than the total cultivated bitter gourd groups (0.9×10-3). Among the cultivated groups, the SA group exhibited the highest π (1.28×10-3), while the SEA2 group showed the lowest (0.37×10-3) (Fig. S1A). Additionally, the π was relatively consistent across the 11 chromosomes in the wild group, SEA1, and SEA2, except for Chr4 of the SA group, which exhibited higher diversity (Fig. S1B). Evaluations of the fixation index value (Fst) revealed that SEA1 and SEA2 displayed higher differentiation from the wild group than the SA group (Table S2). SEA1 only weakly differentiated from SEA2, while differentiation between SEA and the SA group was moderate.
Genome-wide association of pericarp color in bitter gourd
Significant variation in pericarp color was observed among the 106 bitter gourd accessions from different genetic groups. The wild group typically showed green or dark green pericarp, whereas the cultivated group exhibited a wide array of green shades (Table S1). Specifically, 83.33% of individuals from the SA group displayed medium to dark green pericarp, while 71.21% of individuals from the SEA group exhibited light green or white pericarp (Fig. 3B).
A genome-wide association study (GWAS) for pericarp color was conducted using the SNP variations derived from the resequencing of 106 accessions. The GWAS analysis revealed a single peak across the 11 chromosomes (Fig. 3A). We identified 41 significantly associated SNPs in Chr6 for pericarp color at the threshold of –log(p) = 10, thereby narrowing down the region to 17.3 kb (Fig. 3AB, Table S3). The quantile-quantile plot demonstrated a deviation of observed values from the expected X=Y line, verifying the reliability of the identified significant association SNPs (Fig. S2). Within the candidate region, only one gene, designated as evm.TU.chr6.3557, was located in the OHB3-1 V2 genome. This gene (McAPRR2) encodes a two-component response regulator-like protein APRR2, which has previously been reported to be involved in regulating pericarp color in Cucurbitaceae (Liu et al. 2016; Ma et al. 2021; Oren et al. 2019; Zhao et al. 2019).
The haplotype of McAPRR2
Seven SNPs on the promoter of McAPRR2, one nonsynonymous mutation on exon9, and one InDel on exon10 were found to be significantly associated with pericarp color (–log(p)>10) (Fig. 3C). Linkage disequilibrium (LD) analysis of McAPRR2 revealed a strong LD block between InDel4422 and other significant variants (Fig. 3C). Using these significant variants, a haplotype analysis was conducted on 106 bitter gourd accessions. After excluding missing data and heterozygous loci, haplotypes were finally assigned for 86 bitter gourd accessions. These 86 accessions were divided into three haplotypes, namely Hap1, Hap2, and Hap3 (Fig. 3D). Hap1, consisting 14 accessions, exclusively exhibited a green pericarp phenotype and originated entirely from the wild group. Hap2, which included 28 accessions, presented a uniformly green pericarp phenotype, with 75% of samples from the SA group and 25% from the SEA group. Hap3 contained 44 accessions, 95.45% of which displayed a light green pericarp phenotype, primarily originating from the SEA group. Interestingly, one wild type in Hap3 may be the original source of Hap3. Taken together, these findings suggest that the McAPRR2 haplotype is strongly correlated with pericarp color and geographical location.
The expression pattern of McAPRR2 and pigment content in pericarp color
The expression pattern of McAPRR2 and pigment content of the three haplotypes were analyzed. McAPRR2 expression of in Hap1 and Hap2 was higher than that in Hap3, with dark green pericarp showing significantly higher McAPRR2 expression compared to green and light green pericarp (Fig. 4A). Additionally, the samples used for analyzing haplotype expression were also used for pigment content assessment. The contents of chlorophyll a, chlorophyll b, and carotenoids in Hap1 were higher than those in Hap2 and Hap3, with Hap2 displaying higher pigment content than Hap3 (Fig. 4B). Dark green pericarp exhibited higher pigment content than the other pericarp types. The relationship between McAPRR2 expression and the levels of chlorophyll a, chlorophyll b, and carotenoids was analyzed, revealing a positive correlation with a coefficient exceeding 0.94 (Fig. 4C).
The 15 bp insertion in McAPRR2 alters its function
Sequencing of the McAPRR2 CDS from the three haplotypes confirmed the existence of SNP3742 and InDel4422 (Fig. 5AB). Notably, InDel4422, a 15 bp insertion (TCCTAACTGATAATC) in exon 10 of McAPRR2, leads to a premature termination in the protein encoded by Hap3, inducing a shift in pericarp color from green to light green.
The conserved domains of McAPRR2 were predicted using InterPro, identifying the Response Regulatory Domain and the Myb Domain. The Response Regulatory Domain is located before the Myb Domain (Fig. S3). InDel4422 is not located within either domain, thus its mutation does not impact the function of the two domains in McAPRR2. APRR2 has been previously reported to significantly influence the pericarp color of watermelon, cucumber, melon, and wax gourd (Oren et al. 2019; Liu et al. 2016; Ma et al. 2021). In this study, we conducted a multiple sequence alignment of the APRR2 protein sequence. The mutation sites of these APRR2 proteins are located after the Myb Domain, suggesting they do not affect the function of the two domains, with the exception of the mutation sites in white wax gourd, which are found before the Myb Domain (Fig. S3).
McAPRR2 was under domestication in bitter gourd
McAPRR2 haplotype is significantly correlated with both pericarp color and ecological type. Hap1 was strictly associated with wild types exhibiting green pericarp, while Hap2 corresponded to SA and SEA types with green pericarp. Hap3 was mainly related to SEA types, distinguished by a light green pericarp. The insertion of a 15 bp sequence in Hap3 altered the function of McAPRR2, leading to a modification in pericarp color. These findings suggest that McAPRR2 was potentially subjected to selection during domestication. To investigate whether McAPRR2 was subject to domestication selection, we analyzed the phylogenetic tree and nucleotide diversity of this gene. Using 74 SNPs located in the McAPRR2 gene region and its promoter, a phylogenetic tree was constructed. We observed that the tree generally clustered according to pericarp color and genetic group. Green pericarp types were clustered together, as were light green pericarp types. Wild types were clustered together, while SA and a subset of green pericarp SEA types formed another group, and SEA types were gathered in yet another cluster. The nucleotide diversity in the McAPRR2 gene showed that the πwild/SA value was the lowest, indicating low nucleotide diversity between these two groups on the McAPRR2 gene. High πwild/SEA and πSA/SEA values indicate high nucleotide diversity between wild and SEA types and between SA and SEA types on the McAPRR2 gene. The loss of genome-wide genetic diversity in modern crops is a typical characteristic of plant domestication. In conjunction with haplotype and pericarp color phenotypes, we concluded that pericarp color underwent domestication in the SEA region, leading to the formation of the light green pericarp phenotype.