Characteristics and inheritance of gynoecy
Gynoecious bitter gourd plants produce only female flowers on both the main stems and the lateral branches, although there is the occasional appearance of one or two male flowers at low nodes along the main stem (Fig. 1). Monoecious bitter gourd plants produce both female and male flowers along the main stems and lateral branches (Fig. 1). We evaluated each plant among the three sets of independent genetic materials. We found that all plants resulting from female S156G, K20-95, and K20-506 lines were gynoecious, and all plants from male K8-201, S060, and K20-507 lines were monoecious (Table 1). Like their male parents, the plants composing these F1 generations were monoecious (Table 1). Among the plants of the three F2 populations, 228 and 63 plants in the S156G×K8-201 F2 population were monoecious and gynoecious, respectively, 187 and 69 plants in the K20-95×S060 F2 population were monoecious and gynoecious, respectively, and 226 and 80 plants in the K20-506×K20-507 F2 population were monoecious and gynoecious, respectively (Table 1). Chi-square tests revealed that the recorded segregation in all three F2 populations fit the expected ratio of 3:1 (Table 1). Our findings suggested that gynoecy is a stable recessive trait controlled by a single gene named Mcgy1.
Table 1
Basic statistics of plant phenotypes based on field investigations
Generation
|
Monoecy
|
Gynoecy
|
Expected ratio
|
χ2
|
P
|
S156G
|
0
|
33
|
-
|
-
|
-
|
K8-201
|
30
|
0
|
-
|
-
|
-
|
S156G×K8-201 F1
|
30
|
0
|
-
|
-
|
-
|
S156G×K8-201 F2
|
228
|
63
|
3︰1
|
1.74
|
0.19
|
K20-95
|
0
|
35
|
|
|
|
S060
|
31
|
0
|
-
|
-
|
-
|
K20-95×S060 F1
|
30
|
0
|
-
|
-
|
-
|
K20-95×S060 F2
|
187
|
69
|
3︰1
|
0.52
|
0.47
|
K20-506
|
0
|
30
|
|
|
|
K20-507
|
33
|
0
|
-
|
-
|
-
|
K20-506×K20-507 F1
|
30
|
0
|
-
|
-
|
-
|
K20-506×K20-507 F2
|
226
|
80
|
3︰1
|
0.21
|
0.64
|
BSA-seq to rapidly identify the Mcgy1 locus
After filtering the raw data generated from the whole-genome sequencing of S156G, K8-201, the monoecy-associated pool, and the gynoecy-related pool data, a total of 52.06 Gb of clean data with a mean sequencing depth of 43.69×, an average Q20 value of 96.44%, and an average coverage of 97.15% (≥1×) were obtained, which indicated that these data were of high enough quality to perform subsequent analyses (Supplementary Table S4). Based on variant calling, a large number of SNPs evenly covering the entire genome were identified (Supplementary Table S5) and then used to execute BSA for identifying the Mcgy1 locus. The results revealed that the Mcgy1 locus was located on the end of pseudochromosome MC01 (hereafter referred to as simply MC01) within a 3.02-Mb region from 18,124,767 to 21,148,382 bp via the ED algorithm (Fig. 2a), and by the use of the SNP index algorithm, the Mcgy1 locus was located to within a 2.55-Mb region from 18,595,926 to 21,148,382 bp on MC01 (Fig. 2b). Overlapping the regions calculated by the ED and SNP index algorithms, we preliminarily delimited the Mcgy1 locus to within a physical distance of 2.55-Mb from 18,595,926 to 21,148,382 bp on MC01 (Fig. 2).
Classic molecular marker linkage analysis validates the Mcgy1 locus
To verify the Mcgy1 locus delimited by BSA-seq, seven polymorphic molecular markers, namely, four InDel and three CAPS markers (Supplementary Table S2), targeting the candidate region were developed and used for genotyping the 291 individuals composing the S156G×K8-201 F2 population. A partial molecular marker linkage map was constructed on the basis of the genotype and phenotype of each plant (Fig. 3a). The results showed that there were different degrees of linkage between the seven markers and Mcgy1, with genetic distances ranging from 0 cM to 23.6 cM, and four markers, namely, GY_4, GY_5, GY_6, and GY_7, cosegregated with Mcgy1 (Fig. 3a). Notably, the CAPS marker GY_7 (21,144,148 bp) can be used as a boundary for the Mcgy1 locus because, on the basis of the gene annotation of the Dali-11 genome, there is no gene in the 4.23-kb region from GY_7 to the right end of MC01 (Fig. 3b) (Cui et al., 2020). Therefore, the Mcgy1 locus identified by BSA-seq was validated by molecular marker linkage analysis and narrowed to a 476.61-kb region from 20,667,540 to 21,144,148 bp between two flanking markers, GY_3 and GY_7 (Fig. 3b).
Fine-mapping of the Mcgy1 locus
A large S156G×K8-201 F2 population comprising 5,656 individuals was genotyped using the GY_3 and GY_7 markers to screen recombinant plants. A total of 45 recombinants, namely, 27 gynoecious plants and 18 monoecious plants, were obtained and divided into six haplotypes (Fig. 3c) by the use of GY_4~GY_6 and four new markers, GY_8~GY_11 (Supplementary Table S2). Six markers, namely, GY_9, GY_10, GY_11, GY_5, GY_6, and GY_7, cosegregated with Mcgy1 in all 45 recombinants (Fig. 3c). Combining the genotypes and phenotypes of the 45 recombinants, we ultimately mapped the Mcgy1 locus to a 292.70-kb interval of 20,851,441~21,144,148 bp between the two markers GY_8 and GY_7 on MC01 (Fig. 3c), which reflected a region with a very low recombination rate. Additionally, the results of collinearity analysis showed that there is high similarity in the 292.70-kb region between the Dali-11 genome and the long-read OHB3-1 genome (Supplementary Fig. S1), implying that the assembly of genomic sequences within this 292.70-kb region from the Dali-11 genome is reliable. In other words, the causal gene of Mcgy1 is most likely in this 292.70-kb region. Based on the gene annotation of the Dali-11 reference genome (Cui et al., 2020), there were 29 genes in the 292.70-kb interval (Supplementary Table S6).
RNA-seq of stem apexes provides a preliminary estimate for the candidate genes of Mcgy1
The stem apexes from S156G and S156 at the two-leaf, six-leaf, 10-leaf, and 14-leaf developmental stages displaying the most active flower bud differentiation were subjected to RNA-seq. In total, 162.53 Gb of clean data with a mean Q30 value of 94.47% was obtained, and the mapping percentage of each sample against the Dali-11 reference genome ranged from 93.59% to 96.96% (Supplementary Table S7). Based on the RNA-seq data of stem apexes, we further improved the gene annotation data of the Dali-11 genome generated via transcriptome data from multiple tissues, including roots, stems, leaves, male flowers, ovaries, and fruits (Cui et al., 2020), that optimized 2,442 gene structures (Supplementary Table S8) and found 1,717 new genes (Supplementary Table S9). Four new genes, namely, MC_newGene0100, MC_newGene0101, MC_newGene0102, and MC_newGene0103, were located in the 292.70-kb candidate region delimited by fine-mapping. Hence, adding the 29 genes originally annotated from the Dali-11 genome, there were actually 33 genes in the fine-mapping region (Supplementary Table S6). Moreover, the results of differential alternative splicing analysis showed that there were no alternative splicing events in this region between S156G and S156 at any of the four developmental stages, excluding the possibility that Mcgy1 regulated gynoecy through alternative splicing (Supplementary Tables S10 and S11). In addition, within the fine-mapping interval, we detected only one SNP (20,943,203 bp) locating at 3’ untranslated region (3’ UTR) of MC01g1681, which encodes a cytidine triphosphate synthase (CTPS), and none of the variations in coding sequence of any of the 33 genes (Supplementary Table S12). In S156G and S156, notably, the transcript levels of MC01g1681 showed more obvious difference at two-leaf developmental stage; whereas of the other 32 genes, at all four developmental stages, part of them showed none or only minor difference and the rest showed awfully low or absent (Supplementary Fig. S2).
Meanwhile, BLAST analysis revealed six WIP genes in bitter gourd, namely, MC00g0064, MC01g0142, MC02g0091, MC03g0414, MC05g1109, and MC09g0449 (Supplementary Fig. S3), whose homologues lead to gynoecy in other cucurbit species, such as cucumber, melon, and watermelon (Martin et al., 2009; Hu et al., 2017; Zhang et al., 2020a). Additionally, there are nine ACS genes in bitter gourd, namely, MC01g0962, MC06g0753, MC06g0754, MC07g0325, MC08g0531, MC06g2011, MC10g0187, MC10g0527, and MC10g0702 (Supplementary Fig. S4), whose homologues result in gynoecy in cucumber (Mibus & Tatlioglu, 2004). Taken together, these results showed that the 292.70-kb candidate region does not include any WIP or ACS homologous, indicating that Mcgy1 might be a novel gene responsible for gynoecy in bitter gourd.
Multiple genomic sequence variation analysis to define the candidate genes of Mcgy1
After aligning the 12 inbred lines, namely, the six parents and six additional monoecious lines, to the Dali-11 genome, we obtained 25, 108, 139, 40, 1,293, 244, 38, 3,027, 10, 9, 120, and 3,044 variations in S156G, K20-95, K20-506, K8-201, S060, K20-507, K20-508, THMC170, FOLI112, K7-359, S121, and TR, respectively, within the 292.70-kb interval delimited by fine-mapping (Supplementary Tables S13~S24). Among these variations, only four single-nucleotide variations (SNVs), namely, SNV-1 (20,860,222 bp), SNV-2 (20,943,203 bp), SNV-3 (20,997,811 bp), and SNV-4 (21,144,642 bp), belonged independently to the three gynoecious lines, S156G, K20-95, and K20-506 (Fig. 4a). Of these SNVs, three of them, SNV-1, SNV-3, and SNV-4, are located within intergenic spacers and 2,963 bp, 22,496 bp, and 21,076 bp away from their upstream genes – MC01g_new0504, MC01g1685, and MC01g_new0512, respectively (Fig. 4b). Only two variations, SNV-1 and SNV-3, harbour downstream genes – MC01g1670 and MC01g1686 – and are 2,432 bp and 10,185 bp away, respectively, as SNV-4 is located at the end of MC01, with no downstream gene (Fig. 4b). The other SNV-2 same as the SNP detected by RNA-seq is located within the 3’ UTR of MC01g1681 (Fig. 4b). Therefore, the above six genes should be considered as the candidate genes of Mcgy1, and of which MC01g1681 should be focused on since the distribution of the four SNVs.
Expression analysis suggests that MC01g1681 is an important candidate for Mcgy1
For further define the candidate genes, we evaluated detailly expression levels of the six candidate genes in stem apexes combined with RNA-seq and qRT-PCR analyses. Among the five candidate genes adjacent to the three SNVs locating at intergenic spacers, three, MC01g1670, MC01g1685, and MC01g_new0512, showed hardly any expression levels at all four developmental stages in both S156G and S156, and two, MC01g1686 and MC01g_new0504, had no obvious expression difference at all stages between S156G and S156, except for one minor multiple (only 1.4 times) of significant difference of MC01g_new0504 at six-leaf developmental stage (Supplementary Fig. S5). Furthermore, the results of qRT-PCR also confirmed that there was no obvious expression difference both of MC01g1686 and MC01g_new0504 at all four stages between S156G and S156, especially the expression level of MC01g_new0504 at six-leaf developmental stage was slightly higher (only 1.2 times) in S156 than that in S156G but no significant difference (Supplementary Fig. S6). For the remaining candidate gene, notably, MC01g1681 had the highest FPKM value in S156G at the two-leaf developmental stage and was significantly higher in S156G than that in S156 (Fig. 4c). The transcript levels of MC01g1681 in S156G at the later three developmental stages (six-leaf, 10-leaf, and 14-leaf) were relatively stable and significantly lower than those at the two-leaf developmental stage, whereas the expression levels of MC01g1681 in S156 remained relatively stable at all four developmental stages (Fig. 4c), the results of which were subsequently confirmed by qRT-PCR (Fig. 4d). Taken together, we believed that MC01g1681 is an important candidate for Mcgy1, which might play an important role in sex differentiation in bitter gourd at the early developmental stage.
MC01g1681 can be verified in different bitter gourd germplasms using molecular marker
SNV-2 is a G-A (monoecy-gynoecy) mutation locating at 3’UTR of MC01g1681 obtained by resequencing data of the 12 inbred lines, which was further confirmed using Sanger sequencing (Fig. 4e). To verify the association of MC01g1681 and gynoecy in different bitter gourd germplasms, we designed a dCPAS marker (GY_10) to target SNV-2 by introducing a mismatched base (A) at end of reverse primer to create a Acc Ⅰ restriction enzyme site (Supplementary Table S2). A total of 79 monoecious inbred lines of bitter gourd germplasm were genotyped with the marker GY_10, the results of which showed a complete correlation with type of sexual reproduction (Fig. 4f), which implied that GY_10 developed by SNV-2 can be used to marker-assisted selection for gynoecy in bitter gourd.
Transcriptome analysis reveals the potential molecular formation mechanisms of gynoecy
Based on transcriptome data, we obtained global gene expression pattern of stem apexes of S156G and S156 at the two-leaf, six-leaf, 10-leaf, and 14-leaf developmental stages (Supplementary Table S25). Comparative transcriptome analyses revealed that there were 461, 128, 866, and 149 DEGs between S156G and S156 at the two-leaf, six-leaf, 10-leaf, and 14-leaf developmental stages, respectively (Supplementary Table S26). Among these DEGs, 10 were upregulated at all four developmental stages, and the transcript levels in S156G were significantly higher than those in S156 (Fig. 5a and Supplementary Table S26); and 11 genes were downregulated at all four developmental stages (Fig. 5b and Supplementary Table S26). These 21 core DEGs may play an important role in the formation and maintain of gynoecy in bitter gourd.
KEGG enrichment analysis was then carried out to further explore the metabolic pathway impacting the formation of gynoecy. The results of which showed that plant hormone signal transduction (ko04075) was the most significantly enriched pathway at the two-leaf and six-leaf developmental stages (Fig. 5c and Supplementary Fig. S7), whereas protein processing in the endoplasmic reticulum (ko04141) was the most significantly enriched pathway at the 10-leaf and 14-leaf developmental stages (Supplementary Figs. S8 and S9). 10 DEGs from the plant hormone signal transduction pathway at two-leaf developmental stage were selected for quantification via qRT-PCR to independently verify the RNA-seq data, and the results showed that all of them displayed significant differences in terms of their expression levels between S156G and S156, which was consistent with the results of RNA-seq analysis (Fig. 5d and Supplementary Tables S25 and S27). These findings provided crucial clues into the molecular formation mechanism of gynoecy in bitter gourd.