Fine-mapping and candidate gene analysis of the Mcgy1 locus responsible for gynoecy in bitter gourd (Momordica spp.)

The Mcgy1 locus responsible for gynoecy was fine-mapped into a 296.94-kb region, in which four single-nucleotide variations and six genes adjacent to them might be associate with sex differentiation in bitter gourd. Gynoecy plays an important role in high-efficiency hybrid seed production, and gynoecious plants are excellent materials for dissecting sex differentiation in Cucurbitaceae crop species, including bitter gourd. However, the gene responsible for gynoecy in bitter gourd is unknown. Here, we first identified a gynoecy locus designated Mcgy1 using the F2 population (n = 291) crossed from the gynoecious line S156G and the monoecious line K8-201 via bulked segregant analysis with whole-genome resequencing (BSA-seq) and molecular marker linkage analysis. Then, a large S156G × K8-201 F2 population (n = 5,656) was used for fine-mapping to delimit the Mcgy1 locus into a 296.94-kb physical region on pseudochromosome MC01, where included 33 annotated genes different from any homologous gynoecy genes previously reported in Cucurbitaceae species. Within this region, four underlying single-nucleotide variations (SNVs) that might cause gynoecy were identified by multiple genomic sequence variation analysis, and their six neighbouring genes were considered as potential candidate genes for Mcgy1. Of these, only MC01g1681 showed a significant differential expression at two-leaf developmental stage between S156G and its monoecious near-isogenic line S156 based on RNA sequencing (RNA-seq) and qRT-PCR analyses. In addition, transcriptome analysis revealed 21 key differentially expressed genes (DEGs) and possible regulatory pathways of the formation of gynoecy in bitter gourd. Our findings provide a new clue for researching on gynoecious plants in Cucurbitaceae species and a theoretical basis for breeding gynoecious bitter gourd lines by the use of molecular markers-assisted selection.


Introduction
In angiosperms, there are three basic types of flower sexes, namely, hermaphroditic, male, and female flowers, as well as combinations of them. Thus, there are a total of seven types of sexual plants, namely, androecious (only male flowers per plant), gynoecious (only female flowers per plant), hermaphroditic (only hermaphroditic flowers per plant), monoecious (male and female flowers per plant), andromonoecious (male and hermaphroditic flowers per plant), gynomonoecious (female and hermaphroditic flowers per plant), and trimonoecious (male, female, and hermaphroditic flowers per plant) plants (Dellaporta and Calderon-Urrea 1993;Lewis, 1942). Generally, hermaphroditic flowers are considered the ancestors of unisexual (female or male) flowers (Barrett 2002), and the latter are capable of cross-pollination and thus more easily promote abundant genetic diversity (Hamrick and Godt 1996). Among these different types of traits, gynoecy has become a popular topic and focus of research because it plays an important role in high-efficiency hybrid seed production by avoiding the possibility of pollen contamination and the need for artificial emasculation, and gynoecious plants serve as excellent materials for dissecting sex differentiation, especially in cucurbit crop species (Martin et al. 2009;Zhang et al. 2020a). Cucurbit crop species include many important vegetable and fruit species, such as cucumber (Cucumis sativus), melon (Cucumis melo), watermelon (Citrullus lanatus), and bitter gourd (Momordica charantia), and are usually considered ideal model plant species for dissecting sex differentiation because all seven types of sexual plants have been found (Schaefer and Renner 2011). Many researchers consider that ethylene is a main factor regulating sex differentiation in cucurbits, and the endogenous biosynthesis of ethylene is mainly determined by two genes, ACS and ACO (Byers et al. 1972;Manzano et al. 2011Manzano et al. , 2014Zhang et al. 2017). ACS encodes a 1-aminocyclopropane-1-carboxylic acid (ACC) synthase catalysing the conversion of S-adenosylmethionine (SAM) to form ACC, and ethylene is then synthesized from ACC, which is catalysed by ACC oxidase encoded by ACO (Boualem et al. 2015). For example, a unique F locus associated with gynoecy is identified in cucumber, and previous works have suggested that monoecious plants carry only one CsACS1 gene, while gynoecious plants have not only the CsACS1 gene but also an additional copy of CsACS1 named CsACS1G (F) (Mibus and Tatlioglu 2004;Li et al. 2020). The loss of function of the androecy (a) gene CsACS11/CmACS11, which is the hypostatic gene of F and controls female flower development, leads to androecious cucumber and melon plants (Boualem et al. 2015). CsACO2/CmACO3, the downstream gene of CsACS11/CmACS11, is identified as the second androecy-related gene, and the loss of function of this gene results in monoecious cucumber and melon mutants becoming androecious (Chen et al. 2016). Unlike in unisexual flowers, a monoecious (m) gene (CmACS7/CsACS2/CitACS4) is believed to arrest the development of stamens in female flowers, and in cucumber, melon, and watermelon mutants harbouring a non-functional m gene, the development of stamens is not blocked, leading to hermaphroditic flowers (Boualem et al. 2008Tan et al. 2015;Ji et al. 2016;Manzano et al. 2016). In addition, the interaction of F and m (FFmm) results in the formation of hermaphroditic cucumber plants ).
Some studies have suggested that, in addition to the genes involved in ethylene biosynthesis, the C 2 H 2 zinc finger transcription factor WIP1 plays a significant role in the formation of gynoecy in cucurbits. The expression of CmWIP1 (g) leads to carpel abortion and suppresses the expression of CmACS7 in male flowers, and epigenetic change of CmWIP1 induced by the insertion of a transposon is shown to induce the expression of CmACS7 in melon, resulting in the conversion of male flowers to female thus leading to gynoecy (ggMM) (Martin et al. 2009). Similarly, chromosome translocation disrupts the expression of ClWIP1, leading to gynoecy in watermelon (Zhang et al. 2020a). The function of CsWIP1 has shown to be the same as that of CmWIP1/ClWIP1 via clustered regularly interspaced short palindromic repeats/CRISPR-associated system 9 (CRISPR/Cas9) technology, as no natural mutants of CsWIP1 are available in cucumber (Hu et al. 2017). These studies indicate that these WIP1 orthologues are highly conserved in three cucurbit crop species: cucumber, melon, and watermelon.
Bitter gourd, an important member of the Cucurbitaceae family, is originated in the tropical regions of Africa (Schaefer and Renner 2010). The fruit of this species is rich in nutrients and has medicinal properties, including antidiabetic, anticancer, hypotensive, anti-obesity, antimicrobial, and antihyperlipidaemic properties (Wang et al. 2017). It is currently widely cultivated in Africa, Asia, South America, the Caribbean, parts of the Amazon Basin, and so forth (Basch et al. 2003). Like other cucurbit species such as cucumber, there are also many types of sexual plants in bitter gourd, but only two types, monoecy and gynoecy, have been reported thus far (Kole 2020). Previous genetic analyses have showed that gynoecy in bitter gourd is controlled by a single recessive gene named Mcgy, gy, or gy-1 (Matsumura et al. 2014;Cui et al. 2018;Gangadhara Rao et al. 2018). Based on the genetic map, Mcgy and gy have been delimited via restriction site-associated DNA sequencing (RAD-seq) to genetic regions of 12.7 cM and 42.9 cM, respectively (Matsumura et al. 2014;Cui et al. 2018), and gy-1 was delimited via genotyping by sequencing (GBS) technology to a 10.10 cM genetic interval (Gangadhara Rao et al. 2018). However, the gene responsible for gynoecy in bitter gourd remains unknown.
In this study, the candidate locus controlling gynoecy was identified via BSA-seq, and further narrowed by molecular marker linkage analysis and fine-mapping. Furthermore, the gynoecious candidate genes were analysed via multiple genomic sequence variation analysis, RNA-seq and qRT-PCR. Our findings provide a basis for breeding gynoecious lines and help elucidate the molecular mechanism of gynoecy formation in bitter gourd.

Plant materials
Three gynoecious lines, S156G, K20-95, and K20-506, whose causal gene of gynoecy was transferred from the spontaneous mutant K44 (Cui et al. 2018), and three monoecious lines, K8-201, S060, and K20-507, were used as female and male parents, respectively, to construct three F 1 populations, namely, S156G × K8-201 F 1 , K20-95 × S060 F 1 , and K20-506 × K20-507 F 1 populations, by artificial hybridization. Three F 2 populations, S156G × K8-201 F 2 , K20-95 × S060 F 2 , and K20-506 × K20-507 F 2 populations, were then produced by artificial self-pollination using the corresponding F 1 materials. These three sets of independent genetic materials, which included three gynoecious lines, three monoecious lines, three F 1 populations, and three F 2 populations, were used to verify the inheritance of gynoecy. Furthermore, a set of S156G, K8-201, and their resulting F 2 plants was used for BSA-seq, molecular marker linkage analysis, and fine-mapping to delimit the candidate region of the gynoecy-related gene. The above six parental lines and six additional monoecious inbred lines, namely, K20-508, THMC170, FOLI112, K7-359, S121, and TR, were subjected to whole-genome resequencing and genomic sequence variation analysis to identify the underlying sequence variations that might cause gynoecy, and 79 monoecious inbred lines of bitter gourd germplasms (Supplementary Table S1) with distinct genetic backgrounds were genotyped by the molecular markers developed by these underlying sequence variations. In addition, S156G and its monoecious nearisogenic line, S156, were used for RNA-seq and qRT-PCR. All the plants were cultivated in the experimental field at the SCAU Teaching and Research Base in Zengcheng District, Guangzhou (23.24N, 113.64E).

Whole-genome resequencing and variant calling
DNA of all the samples was extracted from young leaves by using the modified cetyl-trimethylammonium bromide method (Porebski et al. 1997). Two DNA pools, namely, monoecy-associated pool and gynoecy-related pool, were constructed by mixing equal amounts of DNA from 30 monoecious and 30 gynoecious plants in the S156G × K8-201 F 2 population for use in BSA-seq.
A total of 14 DNA libraries for whole-genome resequencing, namely, those corresponding to two DNA pools, six parental lines, and six additional monoecious inbred lines, were prepared as follows. The qualified DNA samples were randomly fragmented by Covaris, followed by fragment selection to an average size of 200 ~ 400 bp by an Agencourt AMPure XP-Medium Kit (Thermo Fisher Scientific, USA). The selected fragments were end repaired and 3'adenylated, and the adaptors were then ligated to the ends of the 3'adenylated fragments. The products were amplified by PCR and purified by the Agencourt AMPure XP-Medium Kit. The purified double-stranded PCR products were denatured (via heating) to a single strand and then circularised by splint oligo sequences. Single-strand circular DNA was used to constitute the final DNA library, which was then sequenced on an Illumina X-ten system.
The raw data generated from the sequencing pipeline were filtered using SOAPnuke (Chen et al. 2018) by discarding reads with more than 50% adaptor sequences, lowquality reads with more than 50% bases and Phred values less than 20, and reads with 2% or more "N" bases to obtain clean data. The clean reads were aligned to the Dali-11 reference genome using Burrows-Wheeler Aligner (Li and Durbin 2009;Cui et al. 2020), and variant calling of singlenucleotide polymorphisms (SNPs) and insertions-deletions (InDels) was performed via the Genome Analysis Toolkit (Mckenna et al. 2010). All the variants were annotated by ANNOVAR (Wang et al. 2010).

BSA-seq
Based on the SNPs generated from the S156G, K8-201, monoecy-associated pool, and gynoecy-related pool data, the Euclidean distance (ED) and SNP index algorithms were used for BSA. The ED^2, which represented the association value, was calculated by the ED algorithm, and the candidate region controlling gynoecy was delimited where the fitted value was greater than the median + 3SD (Hill et al. 2013). The ΔSNP index, which represented the association value, was the difference value of the SNP index of two pools calculated by the SNP index algorithms by sliding window analysis with a sliding window size of 200 kb and a sliding step of 100 kb, and the fitted value within the 99% confidence interval was taken as the lower limit to delimit the candidate region controlling gynoecy. After overlapping the results from the ED and SNP index algorithms, we preliminarily mapped the candidate region responsible for gynoecy. A collinearity analysis within the candidate region between the Dali-11 genome (Cui et al. 2020) and OHB3-1 genome (Matsumura et al. 2020) was performed using MUMmer and LINKVIEW 2 (Delcher et al. 2003) to ensure the reliability of the genome assembly in this candidate region.

Molecular marker linkage analysis and fine-mapping
To genotype each plant within the S156G × K8-201 F 2 population (n = 291), seven molecular markers, GY_1 ~ GY_7, that were polymorphic between S156G and K8-201 and targeted the candidate region delimited by BSA-seq were developed using Primer3 Plus (https:// www. prime r3plus. com/ index. html) according to the variations obtained from the resequencing data; the primer sequences are listed in Supplementary Table S2. PCR was performed in a 10 µL reaction mixture consisting of 5 µL of Green Taq Mix (Vazyme, Nanjing, China), 0.2 µL of forward and reverse primers (10 µmol L −1 ), 1 µL of template DNA (50-100 ng µL −1 ), and 3.6 µL of nuclease-free water. The PCR procedure was as follows: an initial denaturation cycle of 3 min at 81 Page 4 of 16 94 °C; 34 cycles of denaturation of 15 s at 94 °C, annealing of 15 s at 55 °C, and extension for 30 s at 72 °C; and a final extension cycle of 5 min at 72 °C. The SNP markers were further transformed into cleaved amplified polymorphic sequences (CAPS) markers or derived cleaved amplified polymorphic sequences (dCAPS) markers through a stationary temperature condition of 30 min at 37 °C followed by digestion with the corresponding restriction enzyme (Supplementary Table S2). The PCR products or the digested products were then visualised using 6% polyacrylamide gel electrophoresis (PAGE). A local molecular marker linkage map of candidate region was constructed based on the genotype and phenotype of each plant within the F 2 population using JoinMap 4.0 software; the logarithm of odds (LOD) score threshold was set to 3.0, and the recombination frequency was set to less than 0.4 (Van Ooijen 2006).
Based on the local molecular marker linkage map, two flanking markers, GY_3 and GY_7 (Supplementary Table S2), of the gynoecy locus were used for screening recombinant plants among the 5,656 individuals composing the large S156G × K8-201 F 2 population. Moreover, four new polymorphic molecular markers, GY_8 ~ GY_11 (Supplementary Table S2), between the two flanking markers were developed and used to genotype each recombinant in the same manner as that described above. Then, after combining the genotype and phenotype data of all the recombinants, the candidate region was fine-mapped into an even narrower region.

RNA-seq and comparative transcriptome analysis
The stem apexes of S156G and S156 at the two-leaf, sixleaf, 10-leaf, and 14-leaf developmental stages were collected and frozen in liquid nitrogen for total RNA isolation using an Eastep Super Total RNA Extraction Kit (Promega, China). Three biological replicates were included for each sample. The qualified RNA was prepared for library construction using an NEBNext Ultra™ RNA Library Prep Kit (NEB, USA) following the manufacturer's instructions. The accuracies of the library concentrations were evaluated by the qPCR method, and after which the libraries were then sequenced on an Illumina sequencing platform.
Based on the raw data generated by RNA-seq, clean data were obtained by removing the low-quality reads containing adaptor and poly-N (N ≥ 10%) sequences and then mapped to the Dali-11 reference genome using HISAT2 (Kim et al. 2015). The mapped reads were subsequently assembled by StringTie (Pertea et al. 2015) and then used for comparison with the original gene annotation information from the Dali-11 genome (Cui et al. 2020) to optimise the annotated genes or search for new genes. The Genome Analysis Toolkit was used for calling variants (Mckenna et al. 2010). Fragments per kilobase of transcript per million fragments mapped (FPKM) values were applied to measure the expression level of transcripts by StringTie using the maximum flow algorithm (Florea et al. 2013). Differential expression analysis was performed using DESeq2 (Love et al. 2014), and the genes were considered DEGs only when the resulting P value was < 0.01, which was adjusted based on Benjamini and Hochberg's approach. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs was performed using KOBAS software (Mao et al. 2005). Candidate genes were cloned using Phanta Max Super-Fidelity DNA Polymerase (Vazyme, Nanjing, China) and ligated into pMDTM19-T vector (Takara, Japan) for Sanger sequencing, and the primer sequences are listed in Supplementary Table S3. qRT-PCR was performed on a CFX384 Real-Time System (Bio-Rad, CA, United States) using Eastep qPCR Master Mix (Promega, China) according to the manufacturer's protocols. The GAPDH gene (MC08g1155) was used as the internal control, and the sequences of the primers used are listed in Supplementary  Table S3. The relative expression levels were calculated via the 2 −ΔΔCt method (Livak and Schmittgen 2001).

Characteristics and inheritance of gynoecy
Gynoecious bitter gourd plants produced only female flowers on both the main stems and the lateral branches, although there was the occasional appearance of one or two male flowers at low nodes along the main stem (Fig. 1). Monoecious bitter gourd plants produced both female and male flowers along the main stems and lateral branches (Fig. 1). We evaluated each plant among the three sets of independent genetic materials. We found that all plants resulting from female S156G, K20-95, and K20-506 lines were gynoecious, and all plants from male K8-201, S060, and K20-507 lines were monoecious (Table 1). Like their male parents, the plants composing these F 1 generations were monoecious (Table 1). Among the plants of the three F 2 populations, 228 and 63 plants in the S156G × K8-201 F 2 population were monoecious and gynoecious, respectively, 187 and 69 plants in the K20-95 × S060 F 2 population were monoecious and gynoecious, respectively, and 226 and 80 plants in the K20-506 × K20-507 F 2 population were monoecious and gynoecious, respectively (Table 1). Chi-square tests revealed that the segregation ratio of monoecious and gynoecious plants in all three F 2 populations fitted the expected ratio of 3:1 (Table 1). These results showed that gynoecy was a stable recessive trait controlled by a single gene named Mcgy1.

BSA-seq to rapidly identify the Mcgy1 locus
After filtering the raw data generated from the wholegenome resequencing of S156G, K8-201, the monoecyassociated pool, and the gynoecy-related pool data, a total of 52.06 Gb of clean data with a mean sequencing depth of 43.69 × , an average Q20 value of 96.44%, and an average coverage of 97.15% (≥ 1 ×) were obtained, which indicated that these data were of high enough quality to perform subsequent analyses (Supplementary Table S4). Based on variant calling, a large number of SNPs evenly covering the entire genome were identified (Supplementary Table S5) and then used to execute BSA for identifying the Mcgy1 locus. The results revealed that the Mcgy1 locus was located on the end of pseudochromosome MC01 (hereafter referred to as simply MC01) within a 3.02-Mb region from 18,124,767 to 21,144,642 bp via the ED algorithm (Fig. 2a), and by the use of the SNP index algorithm, the Mcgy1 locus was located to within a 2.55-Mb region from 18,595,926 to 21,144,642 bp on MC01 (Fig. 2b). Overlapping the regions calculated by the ED and SNP index algorithms, we preliminarily delimited the Mcgy1 locus to within a physical distance of 2.55-Mb from 18,595,926 to 21,144,642 bp on MC01 (Fig. 2). Additionally, the results of collinearity analysis showed that the assembly of genomic sequences at the whole 2.55-Mb region, especially the end of the MC01, were highly consistent between the Dali-11 genome (Cui et al. 2020) and the long-read OHB3-1 genome (Matsumura et al. 2020) (Supplementary Fig. S1).

Classic molecular marker linkage analysis validates the Mcgy1 locus
To verify the Mcgy1 locus delimited by BSA-seq, seven polymorphic molecular markers, namely, four InDel and three CAPS markers (Supplementary Table S2), targeting  the candidate region were developed and used for genotyping the 291 individuals composing the S156G × K8-201 F 2 population. A local molecular marker linkage map was constructed on the basis of the genotype and phenotype of each plant (Fig. 3a). The results showed that there were different degrees of linkage between the seven markers and Mcgy1, with genetic distances ranging from 0 cM to 23.6 cM, and four markers, namely, GY_4, GY_5, GY_6, and GY_7, cosegregated with Mcgy1 (Fig. 3a), validating the results of BSA-seq. Because GY_7 was the right-most side marker which was 3.78 kb from the end of MC01, and in order not to miss this 3.78-kb interval, we suggested that the Mcgy1 locus was located in a 480.84-kb region from 20,667,540 (GY_3) to 21,148,382 bp (end of MC01) (Fig. 3b).

Fine-mapping of the Mcgy1 locus
A large S156G × K8-201 F 2 population comprising 5,656 individuals was genotyped using the GY_3 and GY_7 markers to screen recombinant plants. A total of 45 recombinants, namely, 27 gynoecious plants and 18 monoecious plants, were obtained and divided into six haplotypes (Fig. 3c) by the use of GY_4 ~ GY_6 and four new markers, GY_8 ~ GY_11 (Supplementary Table S2). Six markers, namely, GY_9, GY_10, GY_11, GY_5, GY_6, and GY_7, cosegregated with Mcgy1 in all 45 recombinants (Fig. 3c). Combining the genotypes and phenotypes of the 45 recombinants, we ultimately mapped the Mcgy1 locus to a 296.94-kb interval of 20,851,441 ~ 21,148,382 bp from the marker GY_8 to the end of MC01 (Fig. 3c), which reflected a region with an extremely low recombination rate. Based on the gene annotation of the Dali-11 reference genome (Cui et al. 2020), there were 29 genes in the 296.94-kb interval (Supplementary Table S6).
Accordingly, we suggested that the above four SNVs might be associated with gynoecy, and their six neighbouring genes were considered as potential candidate genes for Mcgy1. It has been reported that WIP1 and CsACS1G are responsible for gynoecy in cucurbit species (Mibus and Tatlioglu 2004;Martin et al. 2009;Hu et al. 2017;Zhang et al. 2020a). Thus, to understand whether WIP1 and CsACS1G are associated with gynoecy in bitter gourd, we performed BLAST analysis to search their all homologs in Dali-11 genome (Altschul et al. 1990). The results showed there are six WIP1 homologs, namely, MC00g0064, MC01g0142, MC02g0091, MC03g0414, MC05g1109, andMC09g0449, andnine ACS homologs, namely, MC01g0962, MC06g0753, MC06g0754, MC06g2011, MC07g0325, MC08g0531, MC10g0187, MC10g0527, andMC10g0702. And all of above six WIP1 and nine ACS homologs were not in the fine-mapping interval of 296.94-kb (Supplementary Table S8). Also, we compared them in the above 12 inbred lines, and the results showed that there was no sequence variation belonging exclusively to the three gynoecious lines (Supplementary  Table S9).
In addition, the GY_10 and GY_7 markers developed by SNV-2 and SNV-4, respectively, were used to genotype the 79 monoecious bitter gourd inbred lines with different genetic background (Supplementary table S1). The results of genotypes showed a complete correlation with gynoecy phenotypes (Fig. 4c and d), implying that GY_10 and GY_7 could be applied as reliable markers to marker-assisted selection for gynoecy in bitter gourd.

RNA-seq and qRT-PCR analysis of candidate genes
The stem apexes from S156G and S156 at the two-leaf, sixleaf, 10-leaf, and 14-leaf developmental stages displaying the most active flower bud differentiation were subjected to RNA-seq. In total, 162.53 Gb of clean data with a mean Q30 value of 94.47% was obtained, and the mapping percentage of each sample against the Dali-11 reference genome ranged from 93.59 to 96.96% (Supplementary Table S10). Based on the RNA-seq data, we further improved the gene annotation data of the Dali-11 genome generated via transcriptome data from multiple tissues, including roots, stems, leaves, male flowers, ovaries, and fruits (Cui et al. 2020), that optimised 2,442 gene structures (Supplementary Table S11 were located in the 296.94-kb candidate region delimited by fine-mapping. Hence, adding the 29 genes originally annotated from the Dali-11 genome, there were actually 33 genes in the fine-mapping region (Supplementary Table S6). Notably, the positions of these four new genes were not adjacent to the four SNVs identified by sequence variation Fig. 4 Analyses of candidate genes of Mcgy1. a UpSet plot of all variations within three gynoecious lines (S156G, K20-95 and K20-506) and the nine monoecious lines (K8-201, S060, K20-507, K20-508, THMC170, FOLI112, K7-359, S121, and TR). The numbers above columns represent the number of variations. The red box represents the unique variations belonging to only the three gynoecious lines. b Physical location diagram of the four unique SNVs belonging to the three gynoecious lines. The green, yellow, and orange bars indicate the gene structures of the coding DNA sequence (CDS), 5′ untranslated region (5′ UTR), and 3′ UTR, respectively, and the introns are omitted. c Genotypes of GY_10 in the three gynoecious parent lines and 79 monoecious inbred lines. d Genotypes of GY_7 in the three gynoecious parent lines and 79 monoecious inbred lines. Three gynoecious inbred lines include S156G, K20-95, and K20-506, and 79 monoecious inbred lines are listed in Supplementary Table S1 analysis. Within these 33 genes, we detected only one SNP (20,943,203 bp) at 3' UTR of MC01g1681, which was the same as SNV-2, whereas none of any sequence variations in coding sequences (Supplementary Table S13). Furthermore, the full-length cDNA and partial untranslated region of MC01g1681 were cloned using cDNA templates of S156G and S156 ( Supplementary Fig. S3), verifying the SNP (20,943,203 bp) detected by RNA-seq.
Based on the FPKM values calculated by RNA-seq data, we did not discover any genes of all the 33 genes within the fine-mapping interval which showed significant differential expression in stem apex between S156G and S156 throughout all the four developmental (Fig. 5a). For the six potential candidate genes neighbouring the four SNVs in the fine-mapping region, MC01g1681 had the highest FPKM value in stem apex of S156G at the two-leaf developmental stage, which was also significantly higher than that of S156, MC01g1686 and MC01g_new0504 had not obvious expression difference at all four developmental stages between S156G and S156, and the remaining three genes, MC01g1670, MC01g1685, and MC01g_new0512, showed hardly any transcript levels at all four developmental stages in both S156G and S156 (Fig. 5a).
The results of qRT-PCR also showed that the relative expression level of MC01g1681 was significantly higher in stem apex of S156G than that of S156 at two-leaf developmental stage, whereas had not significant expression difference at the later three developmental stages (six-leaf, 10-leaf, and 14-leaf) between S156G and S156 (Fig. 5b). Furthermore, there were not significant difference of relative within the 296.94-kb candidate region from 24 samples of stem apexes, which spanned four developmental stages, each with three biological replicates, namely, two-leaf of S156G (gy_2_1/2/3) and S156 (mono_2_1/2/3), six-leaf of S156G (gy_6_1/2/3) and S156 (mono_6_1/2/3), 10-leaf of S156G (gy_10_1/2/3) and S156 (mono_10_1/2/3), and 14-leaf of S156G (gy_14_1/2/3) and S156 (mono_14_1/2/3). The blue circles indicate the six potential candidate genes of Mcgy1 defined by sequence variation analysis. b Relative expression levels of MC01g1681, MC01g_ new0504, and MC01g1686 at the four developmental stages. Light green and bottle green bars indicate S156G and S156, respectively. The data are presented as the means ± SDs (n = 3). *represents significance at the 0.05 level (Student's t test) expression levels both of MC01g1686 and MC01g_new0504 at all four developmental stages between S156G and S156 (Fig. 5b).
In addition, to further understand whether the transcript levels of the six WIP1 and nine ACS homologs are associated with gynoecy in bitter gourd, we compared them in stem apex based on the RNA-seq data. The results showed that except for an ACS gene, MC06g0753, which was expressed significantly higher in stem apexes of S156G than that of S156 throughout all the four developmental stages, all the six WIP1 and other eight ACS genes showed no significant differential expression in stem apexes between S156G and S156 throughout all the four developmental stages (Supplementary Figs. S4 and S5). Furthermore, to explain the relationship between the differential expression of MC06g0753 and gynoecy, we performed phylogenetic analysis for the nine ACS homologs of bitter gourd and the reported ACS homologs in other cucurbit crops, namely CsACS1G (F) (Mibus and Tatlioglu 2004), CsACS2/CmACS7/ CitACS4 (m) Manzano et al. 2016), and CsACS11/CmACS11 (a) (Boualem et al. 2008). The results showed that MC01g0962 and MC10g0527 had the closest phylogenetic relationships with F, whereas MC06g0753 was in a different evolutionary branch from F and was closest to m ( Supplementary Fig. S6), which is highly expressed in female flower to prevent stamen development, while lowly expressed in male flower (Martin et al. 2009). Hence, we suggested that MC06g0753 was a tissuespecific expression gene, rather than a gynoecy-associated gene in bitter gourd. Taken together, the six WIP1 and nine ACS homologs were not associated with gynoecy, implying there might be a novel gene responsible for gynoecy in bitter gourd.

Transcriptome analysis reveals the potential molecular formation mechanisms of gynoecy
Based on transcriptome data, we obtained global gene expression pattern of stem apexes of S156G and S156 at the two-leaf, six-leaf, 10-leaf, and 14-leaf developmental stages (Supplementary Table S14). Comparative transcriptome analyses revealed that there were 461, 128, 866, and 149 DEGs between S156G and S156 at the two-leaf, sixleaf, 10-leaf, and 14-leaf developmental stages, respectively (Supplementary Table S15). Among these DEGs, 10 DEGs were upregulated at all four developmental stages, and the transcript levels in S156G were significantly higher than those in S156 ( Fig. 6a and Supplementary Table S15); and 11 DEGs were downregulated at all four developmental stages ( Fig. 6b and Supplementary Table S15). We speculated that these 21 core DEGs might play an important role in the formation of gynoecy in bitter gourd.
To further explore the metabolic pathway impacting the formation of gynoecy, KEGG enrichment analysis was then carried out. The results showed that plant hormone signal transduction (ko04075) was the most significantly enriched pathway at the two-leaf and six-leaf developmental stages ( Fig. 6c and Supplementary Fig. S7), whereas protein processing in the endoplasmic reticulum (ko04141) was the most significantly enriched pathway at the 10-leaf and 14-leaf developmental stages (Supplementary Figs. S8 and S9). 10 DEGs associated with the plant hormone signal transduction pathway at two-leaf developmental stage were selected for quantification via qRT-PCR to independently verify the RNA-seq data, and the results showed that all of them displayed significant differences in terms of their expression levels between S156G and S156, which was consistent with the results of RNA-seq analysis ( Fig. 6d and Supplementary Tables S14 and S16). These findings provided crucial clues into the molecular formation mechanism of gynoecy in bitter gourd.

Discussion
Gynoecious plants serve as excellent materials for increasing yields and producing hybrid seeds of Cucurbitaceae crop species. Previous studies have established that the C 2 H 2 zinc finger transcription factors CsWIP1, CmWIP1, and ClWIP1 are the causal genes of gynoecy in cucumber, melon, and watermelon, respectively (Martin et al. 2009;Hu et al. 2017;Zhang et al. 2020a), and a unique F locus associated with an additional copy of CsACS1, namely, CsACS1G, is responsible for gynoecy in cucumber (Mibus and Tatlioglu 2004). In addition, no other gene involved in gynoecy has been identified in Cucurbitaceae crops thus far.
In the past two decades, consistent with this study, some studies have shown that gynoecy in bitter gourd is controlled by a single recessive gene (Behera et al. 2006(Behera et al. , 2009Ram et al. 2006), and the inheritance pattern is the same as that in other cucurbits -controlled by the WIP1 gene. Matsumura et al. (2014) first map the gynoecy locus, Mcgy, to a 12.73-cM genetic interval, which is anchored to a 1.48-Mb physical interval from 19,622,042 bp to 21,103,779 bp on MC01 based on the Dali-11 genome (Cui et al. 2020). Later, Cui et al. (2018) also map the gynoecy locus, gy, into a 3.5-Mb physical interval between 17,619,724 bp and 21,144,642 bp on MC01. In this study, we identified the Mcgy1 locus responsible for gynoecy via BSA-seq (Fig. 2), and further delimited it into a 296.94-kb physical interval from 20,851,441 to 21,148,382 bp (end of MC01) by the use of classic forward genetic strategies, including molecular marker linkage analysis and fine-mapping (Fig. 3). Obviously, these three loci overlap, and the results of our study have greatly narrowed it. It is worth noting that the border-right of the Mcgy1 locus is the end of MC01, although this end is supported by Dali-11 and long-read OHB3-1 genomes ( Supplementary Fig.  S1), there may still be some unknown sequences behind the end of MC01 that cannot be sequenced and assembled due to some reasons such as special sequence structures, technical difficulties, extremely low recombination rate.
Hence, further studies on gynoecy in bitter gourd need to pay attention to this problem.
In Cucurbitaceae family, it is well known that WIP1 and CsACS1G are the gynoecy-associated genes reported thus far (Mibus and Tatlioglu 2004;Martin et al. 2009;Hu et al. 2017;Zhang et al. 2020a). But in our study, there is neither WIP1 nor CsACS1G homologs in the 296.94-kb Fig. 6 Analyses of DEGs between S156G and S156. a Venn diagram of upregulated DEGs between S156G and S156 at four different developmental stages. b Venn diagram of downregulated DEGs between S156G and S156 at four different developmental stages. c Main pathways identified via KEGG enrichment analysis based on RNA-seq at the two-leaf developmental stage. The number of genes is indicated by the size of the dots. The q value of the metabolic pathway is indicated by the colour of the dots. The circles and triangles represent down-and upregulated DEGs, respectively, and the squares represent both up-and downregulated DEGs in the same pathway. d qRT-PCR results for validating DEGs. The DEGs subjected to qRT-PCR were selected from among the genes involved in the plant hormone signal transduction pathway. The data are presented as the means ± SDs (n = 3). *represents significance at the 0.05 level (Student's t test) fine-mapping region (Supplementary Tables S6 and S8). Under normal condition, WIP1 is highly expressed in male flowers from the monoecious line up to stage 6 of flower bud development, while lowly expressed in female flower (Martin et al. 2009), hence, its transcription level detected in stem apexes from gynoecious line is much lower than that from monoecious line, such as in watermelon (Zhang et al. 2020a). However, none of the six WIP1 homologs in bitter gourd show such expression pattern in stem apexes ( Supplementary Fig. S4). While CsACS1G in cucumber is a dominant gene, which is expressed in stem apexes of gynoecious line much higher than that of monoecious line (Li et al. 2020). But here, gynoecy in bitter gourd is controlled by a recessive gene (Table 1), and MC01g0962 and MC10g0527, which phylogenetic relationships are closest to CsACS1G, are hardly expressed in stem apexes both S156G and S156 (Supplementary Figs. S5 and S6). Notably, within the nine ACS homologs in bitter gourd, MC06g0753, a homolog of CmACS7 (m) ( Supplementary Fig. S6), is the only gene that shows significant differential expression in stem apexes between S156G than S156 throughout all the four developmental stages ( Supplementary Fig. S5). Previous studies have suggested that CmACS7 is highly expressed in female flower to prevent stamen development, while lowly expressed in male flower, and its non-functional mutant leads female flower to hermaphrodite flower in melon (Martin et al. 2009). Therefore, it is reasonable that MC06g0753 high expresses in gynoecious stem apexes of bitter gourd. CmACS7 encodes an ethylene biosynthesis enzyme, which regulates sex phenotype by altering endogenous ethylene (Martin et al. 2009), while treatment with silver nitrate (AgNO 3 ) inhibits ethylene production to response CmACS7 (Beyer 1976). For example, treatment with AgNO 3 in gynoecious cucumber, melon, and watermelon leads to hermaphrodite flowers (Yamasaki and Manabe 2011;Dai et al. 2022;Zhang et al. 2017). In bitter gourd, previous studies have shown that AgNO 3 treatment in gynoecious lines leads to hermaphrodite flowers (Eishin and Toyoaki 2005). Accordingly, we speculate that CmACS7 homolog (MC06g0753) in bitter gourd may be conservative. In addition, sequence variation analysis also supports that the six WIP1 and nine ACS homologs in bitter gourd are not gynoecy-associated genes (Supplementary Table S9). Our findings indicate that Mcgy1 may be a novel gene different from previously reported in other cucurbit species, and may provide a new clue to dissect gynoecy in Cucurbitaceae family.
The causal gene responsible for gynoecy in the three gynoecious lines used in this study, namely, S156G, K20-95 and K20-506, is the same, since each was developed from a spontaneous gynoecious mutant, K44 (Cui et al. 2018). Hence, we suggest that the four SNVs within the 296.94-kb fine-mapping region only belonging to the three gynoecious lines comparing with the nine monoecious inbred lines may be related to gynoecy (Fig. 4a). Among the six potential candidate genes adjacent to these four SNVs, MC01g1685 is a homolog of DYSFUNCTIONAL TAPETUM 1 (DYT1) which plays a critical role in regulating tapetum function and pollen development in Arabidopsis (Feng et al. 2012).
MC01g1686 is an ethylene-responsive transcription factor, and previous studies suggest its homologs, CsERF110 and CmERF110, respond to ethylene signalling, mediating ethylene-regulated transcription of CsACS11 and CmACS11 in cucumber and melon, respectively, to influence sex determination (Tao et al. 2018). For MC01g1670, an MYB transcription factor, which confers a short stamen phenotype leading to male sterility in Arabidopsis (Cheng et al. 2009;Mandaokar and Browse 2009). MC01g_new0504 encodes a FK506-binding protein 13 (FKBP13), an immunophilin in the chloroplast thylakoid lumen, participating in redoxregulatory processes (Gopalan et al. 2006), but no research has shown that FKBP13 is involved in sex regulation so far. MC01g1681 encodes a cytidine triphosphate synthase (CTPS), whose homologs normally play a role in the formation of filamentous structures in various organisms, such as Escherichia coli, Drosophila melanogaster, Saccharomyces cerevisiae, and Homo sapiens (Liu 2010;Barry et al. 2014;Noree et al. 2014;Lynch et al. 2017), and CTPS also plays rate-limiting roles at the final step of de novo synthesis of cytidine triphosphate (CTP), which is essential for DNA, RNA and phospholipid biosynthesis in all organisms (Daumann et al. 2018) and is vital for early endosperm development in Arabidopsis thaliana and Oryza sativa (Hickl et al. 2021;Yoon et al. 2021), but also absent study supports the relationship of CTPS and sex determination. Notably, MC01g1681 occurs a SNP mutation on 3'UTR in gynoecious lines of bitter gourd (Fig. 4b), and some studies on the 3' UTR have shown that this component isoften highly conserved (Siepel et al. 2005;Xie et al. 2005), which regulates mRNA-based processes, such as mRNA localization, mRNA stability, and mRNA translation (Macdonald and Struhl 1988;An et al. 2008;Mayr 2019), and stores genetic information transmitted to proteins through the formation of 3'UTR-mediated protein-protein interactions (Berkovits and Mayr 2015). And the last gene, MC01g_new0512, is an unknown gene. However, unlike WIP1 or CsACS1G, our expression data indicated that none of the six potential candidate genes showed significant differential expression in stem apex throughout all the four developmental stages, only MC01g1681 revealed significant differential expression at two-leaf developmental stage (Fig. 5). Hence, we speculate that up regulation of MC01g1681 caused by a SNP in 3'UTR may be related to the formation of gynoecy in bitter gourd. Similar phenomenon has been reported in the cytoplasmic male sterility (CMS)/fertility restoration (Rf) system of Chinese wild rice (CW)-type, where a fertility restorer gene Rf17 is highly expressed in the CMS line while lowly expressed in the Rf line, and a SNP in the promoter region of Rf17 may lead to down regulation of Rf17 in Rf line, resulting fertility restoration (dominant phenotype) Toriyama 2005, 2009). Whether we capture the right tissue or developmental stage of differential expression of Mcgy1, or whether gynoecious phenotype can be determined when the expression of MC01g1681 reaches a certain high level at two-leaf developmental stage? More efforts and further studies are needed.
Based on RNA-seq analysis, we found that only 21 DEGs showed significant difference between S156G and S156 at all four developmental stages ( Fig. 6a and b and Supplementary Table S15). Among these DEGs, MC02g0607 belongs to the AGL1 MADS-box family, which homologs involve in the regulation of gynoecium and ovule development in Arabidopsis (Flanagan et al. 1996); MC05g0014 is an uclacyanin-3-like gene, which homologs are related to pollen development in rice (Zhang et al. 2020b); and MC04g1310 is annotated as regulating anther development (Supplementary Table S14). The higher expression of MC02g0607 and the lower expression of MC04g1310 and MC05g0014 in S156G compared with S156 (Supplementary Table S15) may be part of an important molecular mechanism for the formation of gynoecy in bitter gourd. Five (MC06g1243, MC06g2033, MC09g1421, MC10g1173, and MC11g0366), one (MC03g0290), two (MC08g1661 and MC10g0031), and two (MC03g0811 and MC03g0812) DEGs, which are annotated as being involved in inorganic ion transport and metabolism, carbohydrate transport and metabolism, the formation of the integral component of membrane, and metal ion binding, respectively (Supplementary Table S14), implying that the molecular regulation for gynoecy may involve transmembrane transport. In addition, the annotations of three DEGs (MC_newGene1691, MC06g1147, and MC10g1236) involve in DNA replication, RNA transport, and posttranslational modification, respectively (Supplementary Table S14). The functions of the remained five genes are unknown. We believe that these 21 DEGs are vital clues to uncover the molecular mechanism of the formation of gynoecy, but more efforts are needed to elucidate their relationships and concrete mechanisms.
Previous studies have shown that the phytohormone ethylene plays a direct or indirect role in the formation of gynoecy in Cucurbitaceae crop species. For instance, by directly increasing the biosynthesis of ethylene, CsACS1G promotes gynoecy in cucumber (Mibus and Tatlioglu 2004); this gene also promotes the femaleness in melon (Rudich et al. 1969). Epigenetic changes in CmWIP1 promote the expression of CmACS7 to indirectly enhance the biosynthesis of ethylene, leading to gynoecy in melon (Martin et al. 2009). In this study, the DEGs related to plant hormone signal transduction were significantly enriched at the two-leaf and six-leaf developmental stages ( Fig. 6c and supplementary Fig. S7). Of these, two DEGs related to ethylene signal transduction, MC11g0603, a homolog of Constitutive Triple Response 1 (CTR1), and MC04g0109, a homolog of Ethylene Insensitive 3 (EIN3), exhibited significantly higher and lower transcript levels in S156G than in S156, respectively (Supplementary Tables S14 and S16). Previous reports have shown that CTR1 plays a positive role in the biosynthesis of ethylene (Leclercq et al. 2002), while EIN3 has the opposite effect . These results suggest that ethylene may play an important role in the formation of gynoecy in bitter gourd. There were also three DEGs related to cytokinin signal transduction (Supplementary Tables S14 and S16), whose homologs play an important role in sex differentiation in watermelon (Zhang et al. 2020a). In addition, there were several other genes involved in plant hormone signal transduction, such as that of gibberellin, brassinosteroid, and auxin (Supplementary Tables S14 and S16). Hence, ethylene may not be the only phytohormone influencing the formation of gynoecy in bitter gourd, and the metabolic mechanism by which Mcgy1 regulates sex differentiation needs to be clarified.
The Cucurbitaceae family comprises approximately 97 genera and 940 ~ 980 species (Schaefer and Renner 2011), while only two pathways regulating gynoecy have been discovered in three species thus far. Here, we suggest that there may be a novel pathway for gynoecy in bitter gourd. Our findings contribute to an improved understanding of the molecular formation mechanism of gynoecy in bitter gourd and enrich the knowledge about of genetic pathway of gynoecy in the Cucurbitaceae family.
Author contribution statement KH and JWC conceived and designed the research. JZ and JJC performed most of the experiments and wrote the manuscript. JL and CFZ performed statistical analysis. FH and JCD provided helpful discussions. All authors read and approved the final manuscript.
Funding This work was supported by the Key Project of Basic and Applied Research for University in Guangdong Province (2018KZDXM016), the Science and Technology Planning Project of Guangdong Province (2018B020202007), the Guangzhou Science and Technology Plan Projects (202002020086, 202102020800, and 202206010170), and the Science and Technology Plan Projects of Guangdong Province (2019A050520002).

Data availability
The data that support the findings of the current study are available from the corresponding author upon reasonable request. Raw data of genome and transcriptome generated in this study have been deposited in National Genomic Data Center (NGDC) of China National Center for Bioinformation (CNCB) under accession number PRJCA011401.