Development of InDel Markers and Fingerprinting of Bitter Gourd (Momordica charantia) Based on Whole Genome Re-sequencing

Bitter gourd (Momordica charantia L.) is one of the most important vegetable crops in many Asian and African countries. Here, InDel markers developed by comparing whole-genome re-sequencing data were used to analyze the genetic diversity of a bitter gourd germplasm sample from various geographical origins. To verify the reliability of the InDel markers identified, 220 pairs of InDel primers were designed. The primers were first screened by 8% polyacrylamide gel electrophoresis, and 25 pairs with better polymorphism were selected. Using these 25 primer pairs, the 53 bitter gourd accessions were effectively distinguished and an InDel-based DNA fingerprint was constructed. In addition, the purity of different crosses was determined based on differences in specific bands among genotypes. Cluster analysis by the unweighted pair-group method with arithmetic means (UPGMA) showed that the 53 bitter gourd materials can be divided into three groups at a similarity coefficient threshold of 0.645. This study therefore provides InDel markers for genotype identification, genetic relationship analysis, and genetic map construction in bitter gourd.


Introduction
Bitter gourd is a common, annual, climbing herbaceous medicinal plant, and a melon vegetable suitable for cultivation in temperate and tropical regions. Bitter gourd originated in tropical Africa [1] and is mainly distributed in tropical and subtropical regions of Asia and Africa, as well as in the Amazon region of Brazil, the Caribbean, and South Africa [2,3].
Several wild bitter gourd genetic resources have been found in the southern foothills of the Himalayas.
In particular, India holds the largest share of wild bitter gourd germplasm, followed by Southeast Asian countries, such as Thailand and Myanmar, and the Yunnan and Hainan areas of China [4]. Wild bitter gourd fruit is small, thin, and prickly, and its strong bitter taste makes it difficult to eat. After years of domestication, modern cultivated bitter gourd varieties have developed thick and long flesh, an increased weight, and a milder bitter taste, and are more suitable for eating. A growing body of research has confirmed the high nutritional value of bitter gourd and its medicinal potential for reducing blood glucose levels in patients with diabetes [5].
Genetic resources provide abundant genetic variation for cultivar improvement as well as genetic and biological research. Germplasm is an important part of genetic diversity, the basic premise of agricultural origin and development, and the raw material for various breeding pathways, which largely determine the breeding effect [6]. As in any other horticultural crop, any major breakthrough in bitter gourd breeding depends on the development and utilization of key germplasm. Breeding high-quality bitter gourd varieties requires access to extensive genetic resources. In addition, genetic diversity and genetic relationship analysis are the main concerns when evaluating germplasm resources. Genetic diversity is a reflection of plant adaptability to environmental changes, which in turn is reflected at the morphological, cytological, physiological, and biochemical levels [7].
Analysis of the genetic diversity and genetic relationships of agronomic traits within a germplasm sample is of great significance for the development and utilization of germplasm resources. For example, Huang et al. [8] used 28 morphological traits to conduct cluster analysis of 33 bitter gourd accessions and divided them into three groups, namely, a wild-type group, a dense-tumor, small-fruit type group, and a long-fruit type group. They found obvious regional differences among groups. Further, the genetic distance between Chinese bitter gourd germplasm and germplasm resources from India and Southeast Asia was relatively large. Some bitter gourd varieties showed morphological differences that departed considerably from the classification expected from morphological experience.
More recently, Kang et al. [9] used ISSR molecular markers to study the genetic diversity among 48 bitter gourd samples. They found that Jiangxi variety 608 and Fujian variety 615 were similar in skin maturity, color, and shape, but the characteristics of the fruit surface differed. Similarly, Dhillon et al. [10] used 50 pairs of SSR primers to conduct cluster analysis on 114 bitter gourd accessions from Asia. The genetic similarity coefficients ranged between 0.61 and 0.88, and the accessions were divided into four groups at a threshold value of 0.66. Further, Wang [11] used morphological markers and SRAP and SSR molecular markers to conduct cluster analysis on 14 accessions of Momordica charantia. He found that the similarity of clustering based on morphological markers and molecular markers reached 78.57%; furthermore, the genetic differences among the 14 accessions were not significant. However, the clustering results obtained from the two molecular marker types were very close, indicating that molecular marker-based clustering might be more reliable. Therefore, it is better to study the diversity among germplasm accessions using molecular markers than using morphological characteristics, as the former can accurately indicate the genetic relationship of bitter gourd at the molecular level and determine the extent of homology.
Some varieties with similar morphological characteristics can be clustered together, thus confirming the correctness of the traditional morphological classification framework, which can be supplemented and refined by clustering agronomic traits based on differences at the DNA level. Therefore, the evaluation, innovation, and utilization of molecular markers within bitter gourd germplasm certainly warrants further support.
Insertion-deletion length polymorphism (InDel) refers to a difference between the genomes of two materials of the same species: when the two are compared, one genome will most likely carry a certain number of nucleotide insertions or deletions that are absent from the other. As a new generation of markers, InDel markers are widely distributed in the genome of a species, numerous, highly polymorphic, stable in variation, inexpensive, and co-dominant. To date, InDel markers have been successfully applied to the study of rice, cucumber, and pepper germplasm, among other crops. For example, Hayashi et al. [12] developed a set of InDel markers for the blast-resistance gene in rice. In turn, Li et al. [13] explored InDel loci based on re-sequencing results and tested 134 pairs of InDel primers on 16 typical cucumber genotypes. Their results showed that 116 pairs of InDel primers fully revealed the diversity and specificity of the 16 genotypes. Similarly, Li et al. [14] developed InDel markers for pepper genetic mapping using two inbred lines, BA3 and B702, with genome-wide re-sequencing; they obtained a genetic linkage map based on the InDel markers developed, which consisted of 12 linkage groups with a genetic distance of 1178.01 cM and an average distance of 5.01 cM between bin markers.
With the release of the bitter gourd genome, SSR markers [15] and InDel markers [16] have been developed for the whole genome of the species. However, InDel markers have not been reported for genetic diversity analysis of bitter gourd germplasm. Using the whole-genome sequence of bitter gourd Dali-11 as reference, two materials, namely, No. 12 from Guangxi and H-13 from Guangdong, were re-sequenced to develop InDel markers. The fruits of these two materials are green and of the straight-tumor type, but their sources and fruit lengths are different. The fruit of No. 12 is longer (35-40 cm) and shows late maturity, while the fruit of H-13 is shorter (approximately 20 cm) and shows early maturity. There is strong heterosis between the two materials, which makes them valuable candidates for re-sequencing and InDel marker development. This set of markers was then screened for core markers, which were used to construct the molecular fingerprints of 53 bitter gourd germplasm accessions from different regions in China, as well as other countries; these fingerprints can be used for variety identification and for determination of hybrid seed purity. Finally, the genetic relationship of the tested materials was analyzed to provide a theoretical reference for the selection of parents in the process of breeding superior varieties.

Plant materials
All 53 bitter gourd accessions (Table S1) and seeds of three hybrids, namely, Zhongyu, Tiantianhao 11, and Tiantianhao 12, were provided by the Liu Zhengguo Research Group of the College of Agriculture at Guangxi University. Two of the materials are from Thailand, one from Malaysia, and the rest from various regions of China.

DNA library construction, sequencing, and re-sequencing data quality assessment
After the genomic DNA of bitter gourd accessions No. 12 and H-13 passed quality control, the DNA was fragmented using ultrasonication; then, the fragmented DNA was purified, end-repaired, A-tailed at the 3′ end, and ligated to sequencing adapters. The fragment size was selected using agarose gel electrophoresis, and the sequencing library was formed by PCR amplification. The constructed library was subjected to library quality inspection, and the qualified library was sequenced on the Illumina platform. Raw reads containing adapter sequences and low-quality reads were filtered out to obtain clean reads and ensure the quality of subsequent information analysis.

Sequencing comparative analysis
The reads obtained by re-sequencing need to be mapped back to the reference genome to analyze the variation present. The BWA [18] software was used to align the short reads obtained by high-throughput sequencing (on the Illumina platform) to the reference genome, and statistics such as the per-sample mapping rate were collected. The mapping rate is the proportion of all clean reads that can be located on the reference genome. If the reference genome is selected properly and there is no contamination in the relevant experimental process, the mapping rate of sequencing reads will be higher than 70%. In addition, the mapping rate is related to the genetic relationship between the sequenced species and the reference genome, the assembly quality of the reference genome, and the sequencing quality of the reads. The closer the species, the more complete the assembly of the reference genome, and the higher the quality of the sequencing reads, the more reads can be located on the reference genome and the higher the mapping rate.
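The mapping rate defined above is a simple ratio, and the 70% rule of thumb can be expressed as a check on it. The following is a minimal sketch; the function names and the example read counts are illustrative assumptions, not values or tools from this study.

```python
def mapping_rate(mapped_clean_reads: int, total_clean_reads: int) -> float:
    """Return the percentage of clean reads located on the reference genome."""
    if total_clean_reads <= 0:
        raise ValueError("total_clean_reads must be positive")
    if not 0 <= mapped_clean_reads <= total_clean_reads:
        raise ValueError("mapped_clean_reads must lie between 0 and the total")
    return 100.0 * mapped_clean_reads / total_clean_reads


def passes_mapping_check(rate_percent: float, threshold_percent: float = 70.0) -> bool:
    """Apply the rule of thumb from the text: the rate should exceed 70%."""
    return rate_percent > threshold_percent


# Hypothetical read counts, chosen only to illustrate the calculation.
rate = mapping_rate(mapped_clean_reads=9_922, total_clean_reads=10_000)
print(f"mapping rate: {rate:.2f}%  passes: {passes_mapping_check(rate)}")
```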
Based on the positions of the small InDel loci detected in each sample on the reference genome, compared with the gene and CDS location information of the reference genome, we can annotate whether an InDel locus occurs in an intergenic region, in a gene region, or in a CDS region, and whether it is a frame-shift mutation. The annotation of small InDels was performed with the SnpEff software. An InDel that causes a frame-shift mutation may lead to changes in gene function.
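Whether a small InDel in a CDS shifts the reading frame depends on whether its net length change is a multiple of three. The sketch below illustrates only that length rule; it is not SnpEff's annotation logic, which also considers effects such as start and stop codon changes, and the function name and example alleles are assumptions made for illustration.

```python
def classify_cds_indel(ref: str, alt: str) -> str:
    """Classify a small InDel inside a CDS by its net length change.

    ref and alt are the reference and alternative alleles, as in a VCF record.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("alleles have equal length; this is not an InDel")
    kind = "insertion" if delta > 0 else "deletion"
    # A length change that is a multiple of 3 adds or removes whole codons
    # and leaves the downstream reading frame intact.
    if abs(delta) % 3 == 0:
        return f"in-frame {kind}"
    return f"frame-shift {kind}"


# Hypothetical alleles for illustration.
for ref, alt in [("A", "AT"), ("ATTTG", "AG"), ("C", "CGGA")]:
    print(ref, alt, classify_cds_indel(ref, alt))
```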

Screening and analysis of InDel primers
InDel primers were designed according to the InDel loci predicted from the re-sequencing data, and 20 pairs of InDel primers, evenly distributed along the chromosome, were randomly selected from each chromosome, for a total of 220 pairs covering the whole genome of bitter gourd. The primers were synthesized by Beijing Qingke Biotechnology Co., Ltd.
The optimized amplification system comprised a total volume of 12 µL containing 5 µL 2 × T5 Super PCR Mix (PAGE), 2 µL genomic DNA, 1 µL upstream and downstream primers, and 4 µL ddH2O. The reaction process comprised pre-denaturation at 94°C for 45 min; 35 cycles of denaturation at 94°C for 30 s, renaturation at 52°C for 30 s, and extension at 72°C for 30 s; then, extension at 72°C for 5 min, and preservation at 16°C. PCR amplification products were detected by 8% polyacrylamide gel electrophoresis (DYY-6D, Beijing Liuyi Biotechnology Co., Ltd, China) performed at 300 V and 150 mA for 150 min. After electrophoresis, silver staining, statistical band analysis, and imaging were performed.

Construction of the molecular fingerprints of 53 bitter gourd genetic accessions
Twenty-five pairs of InDel primers were used to construct the molecular fingerprint of the bitter gourd germplasm sample under study. Based on the separation of the PCR amplification products of the 53 bitter gourd accessions by PAGE, the scoring rules were as follows. At a given migration position, a strong band was scored as "1," while an absent or weak band was scored as "0." The scores of the bands amplified by the 25 marker primers in each variety were then combined into a string of digits, which was used as the molecular identity card of the corresponding variety.
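The scoring rule above amounts to concatenating 0/1 band scores, in a fixed primer order, into one string per accession. The following is a minimal sketch under that reading; the primer labels (P1 to P3), accession labels, and band scores are hypothetical and are not data from this study.

```python
from typing import Dict, List, Sequence


def fingerprint(scores_by_primer: Dict[str, Sequence[int]],
                primer_order: Sequence[str]) -> str:
    """Concatenate 0/1 band scores, primer by primer, into one digit string."""
    digits: List[str] = []
    for primer in primer_order:
        for score in scores_by_primer[primer]:
            if score not in (0, 1):
                raise ValueError(f"band score for {primer} must be 0 or 1")
            digits.append(str(score))
    return "".join(digits)


def find_indistinguishable(fingerprints: Dict[str, str]) -> List[List[str]]:
    """Return groups of accessions that share an identical fingerprint."""
    by_code: Dict[str, List[str]] = {}
    for accession, code in fingerprints.items():
        by_code.setdefault(code, []).append(accession)
    return [sorted(group) for group in by_code.values() if len(group) > 1]


# Hypothetical scores: two band positions per primer, three primers.
primer_order = ["P1", "P2", "P3"]
scores = {
    "acc_A": {"P1": [1, 0], "P2": [0, 1], "P3": [1, 1]},
    "acc_B": {"P1": [0, 1], "P2": [0, 1], "P3": [1, 0]},
}
codes = {acc: fingerprint(s, primer_order) for acc, s in scores.items()}
print(codes)
print("indistinguishable groups:", find_indistinguishable(codes))
```

An empty list from `find_indistinguishable` means the chosen primer set separates every accession, which is the property the 25-primer combination is reported to have for the 53 accessions.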
Following the rule that hybrids carry the complementary bands of both parents, suitable primers were screened from the constructed fingerprint to identify the purity of the tested samples. Based on the PAGE results for the InDel molecular markers, the sample type was determined according to whether the electrophoresis band pattern of the sample was complementary to those of the parents. The seed purity of Zhongyu, Tiantianhao 11, and Tiantianhao 12 was determined by the extent to which the specific bands of each parent were shared.
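The complementary-band rule can be sketched as follows: with a co-dominant marker, a true F1 hybrid is expected to show the union of the two parental band patterns, while a seedling showing only the female pattern is likely the product of self-pollination. This is an illustrative sketch only; the labels, function names, and band patterns are assumptions and do not reproduce the scoring actually used in this study.

```python
from typing import List, Sequence


def classify_seedling(seedling: Sequence[int], female: Sequence[int],
                      male: Sequence[int]) -> str:
    """Label one seedling by comparing its 0/1 band pattern with both parents."""
    f, m, s = tuple(female), tuple(male), tuple(seedling)
    if not (len(f) == len(m) == len(s)):
        raise ValueError("all band patterns must have the same length")
    # Expected F1 pattern for a co-dominant marker: bands of both parents.
    hybrid = tuple(a | b for a, b in zip(f, m))
    if hybrid == f or hybrid == m:
        raise ValueError("marker is not complementary between these parents")
    if s == hybrid:
        return "true hybrid"
    if s == f:
        return "female-type (likely selfed)"
    if s == m:
        return "male-type"
    return "off-type"


def purity_percent(labels: List[str]) -> float:
    """Percentage of seedlings labeled as true hybrids."""
    if not labels:
        raise ValueError("no seedlings scored")
    return 100.0 * sum(1 for label in labels if label == "true hybrid") / len(labels)


# Hypothetical band patterns at two migration positions.
female, male = (1, 0), (0, 1)
seedlings = [(1, 1), (1, 1), (1, 0), (1, 1)]
labels = [classify_seedling(s, female, male) for s in seedlings]
print(labels, purity_percent(labels))
```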

Data analysis
The PopGen32 software [19] was used to calculate Nei's gene diversity index (H) and the Shannon-Wiener diversity index (I). Polymorphism information content (PIC) was calculated using the formula $\mathrm{PIC} = \frac{X}{X-1}\left(1 - \sum_{j} P_{ij}^{2}\right)$, where $X$ represents the number of samples and $P_{ij}$ represents the frequency of the $j$th allele at the $i$th locus. The amplified bands were statistically analyzed, and the 0/1 matrix (0 represents no band and 1 represents a band) was stored in an Excel table according to the band type. The genetic similarity matrix was calculated with the NTSYS-pc 2.10 software, and cluster analysis was conducted according to the unweighted pair-group method with arithmetic average (UPGMA) [20].
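As a worked illustration of these statistics, the sketch below computes PIC for one locus and the Jaccard similarity between two 0/1 band profiles. It assumes the PIC formula reads PIC = X/(X−1) · (1 − Σ P²), with the sum taken over the alleles observed at the locus, and it uses the standard Jaccard definition (shared bands divided by bands present in either profile). It does not reproduce PopGen32 or NTSYS-pc, and the example data are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def pic(allele_calls: Sequence[Hashable]) -> float:
    """PIC for one locus, with the X/(X-1) sample-size correction.

    allele_calls holds one allele (band type) per sample.
    """
    x = len(allele_calls)
    if x < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(allele_calls)
    sum_sq = sum((n / x) ** 2 for n in counts.values())
    return x / (x - 1) * (1.0 - sum_sq)


def jaccard(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Jaccard similarity between two 0/1 band profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    if either == 0:
        raise ValueError("neither profile has any band")
    return shared / either


# Invented example: four samples, two band types at one locus.
print(round(pic(["a", "a", "b", "b"]), 4))
print(round(jaccard([1, 1, 0, 1], [1, 0, 0, 1]), 4))
```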

Results
Quality assessment of re-sequencing data
Table 1 shows the analysis of the sequencing data obtained and summarizes the results of the sequencing quality analysis. The analysis of 20.05 Gbp of clean data revealed high sequencing quality (Q20 ≥ 96.85%, Q30 ≥ 91.49%) and a GC content of approximately 36.80-36.85%. The base quality at the first four and the last 10 positions of the reads was lower than that in the middle of the reads, but the quality values (Q) remained above 30. The A/T and C/G content curves showed essentially no separation and were flat, indicating that the sequencing results are normal (Table 1, Fig. 1). According to these results, the genome sequencing data are sufficient, C and G are distributed in the normal range, and sequencing quality is high, which together support the subsequent analysis. Alignment to the Dali-11 reference genome of bitter gourd showed that the average mapping rate between the samples and the reference genome was 99.22%, indicating that the library was constructed normally and without contamination. The average coverage depth was 26X, and the genome coverage (at least one-base coverage) was 92.71%. The alignment results are normal (Table 2) and can be used for subsequent detection of variations and correlation analysis. The coverage depth at each chromosome site shows that the genome is evenly covered, indicating that sequencing randomness was adequate (Fig. 1).

Detection of small InDel between samples and reference genomes
A total of 59098 and 73683 InDels were detected in the whole genome of bitter gourd H-13 and No. 12, respectively. The number of InDels in the coding region was 540 and 671, respectively. Further comparative analysis revealed that the number of InDels across the genome between H-13 and No. 12 was 96348. The distribution of InDel lengths in the CDS region and in the whole genome of the samples is shown in Fig. 3. The length distribution trend of InDel loci across the whole genome of the two varieties was basically the same (Fig. 2). The number of InDel loci decreased with increasing InDel fragment length, except for the gene region. Mutations of +1, −1, +2, −2, +3, and −3 bp were the more common types in the CDS region, whereas +3 and −3 bp mutations were relatively few across the genome. Using the SnpEff software to annotate the whole-genome small InDels of H-13 and No. 12 (Fig. 2), we found 301 frame-shift mutations among the small InDels in the CDS region of H-13, which accounted for 56.90% of all mutations. These were followed by codon deletions and insertions (152 in total; 28.73%), whereas mutations in start codons and stop codons were fewer (14 in total; 2.65%). Among the small InDels in the CDS region of No. 12, frame-shift mutations were likewise the most common (345; 52.11%), followed by codon deletions and insertions (207 in total; 31.27%) and the less frequent mutations in start codons and stop codons (28; 2.72%). These small InDels caused a total of 1211 gene mutations (Table 3).

Polymorphism analysis of InDel markers
Twenty representative bitter gourd accessions were used as test materials to comprehensively reveal the genetic background of the bitter gourd germplasm collected by our research group; 220 pairs of InDel primers were synthesized for polymorphism detection, and 25 pairs of primers with better polymorphism were screened (Table S2). Following the principle of using the smallest primer combination to identify the largest possible number of differing genotypes, the DNA fingerprints of the 53 bitter gourd accessions were constructed using the combination of these 25 primers (Table S3) for the identification of the accessions in the germplasm sample under study.
Self-pollination of parents, which produces false hybrids, sometimes occurs during hybrid seed production. Based on the amplification of characteristic heterozygous bands by each primer in the fingerprint, our results showed that primers MC02-1, MC05-2, and MC11-4 effectively determined the degree of hybrid purity in different crosses. As shown in Fig. 4, primer MC02-1 helped to identify the F1 hybrid produced with No. 12 as female parent and H-13 as male parent; the results showed that numbers A14 and A29 are actually false hybrids. Primer MC05-2 helped to identify the hybrid produced with 10-2-1 as female parent and 10-2-5-3 as male parent. The commercial name given to this hybrid is Tiantianhao 11. The identification results showed that this is not a false hybrid. Primer MC11-4 was used to identify the hybrid produced with 3-1-6-2-1 as female parent and 10-2-5-3 as male parent. The commercial name given to this hybrid is Tiantianhao 12, and again, the identification results showed that this is not a false hybrid plant.

Genetic relationship analysis of bitter gourd germplasm
The Jaccard genetic similarity coefficient was calculated with the NTSYS-pc 2.10 analysis software, and the UPGMA method was used for cluster analysis to construct the resulting phylogenetic tree (Fig. 5). The results showed that the 53 bitter gourd germplasm accessions analyzed clustered into three groups with a similarity coefficient of 0.645 as threshold. The first group included three accessions, namely, 16-1, MX-1, and LH-4, which accounted for 5.
Our experiments demonstrated the feasibility and convenience of using genome re-sequencing data to verify InDel polymorphism. The identified InDel loci could be validated to produce informative genetic markers with reliable, high polymorphism rates. More importantly, as the coordinates of the InDels were known relative to the reference genome, it was possible to develop genetic markers in specific genome regions to help in effectively constructing genetic maps and using them for fine mapping in bitter gourd accessions No. 12 and H-13. The numbers of InDels detected in the whole genome of bitter gourd genotypes H-13 and No. 12 were 59098 and 73683, respectively, while the number of InDels detected across these two entire genomes together was 96348. In addition, most InDels were located in the intergenic region, followed by the upstream region of the gene (within 5 kb), the downstream region of the gene (within 5 kb), the intron, the gene, and the gene coding sequence. Frame-shift mutations accounted for a large proportion of the small InDels in the CDS region. InDels that cause frame-shift mutations alter the triplet codon reading frame of the protein-coding sequence, which in turn may lead to a change of gene function. Therefore, InDel markers are of great value in genetic diversity analysis [21], high-density genetic map construction [22], and gene mapping [23].
The genetic diversity of the 53 bitter gourd germplasm accessions was analyzed with the 25 pairs of InDel markers selected in the experiments reported herein. The results showed that the average PIC was 0.590, indicating that the selected primers were rich in polymorphism and can provide sufficient genetic information. The specific molecular fingerprints of the 53 bitter gourd accessions were constructed using the bands amplified by the 25 pairs of primers; these fingerprints are the external manifestation of the characteristic molecular structure and are beneficial to the identification of varieties in the process of plant variety protection. At the same time, specific differences in band profiles between varieties can be used to identify the degree of seed purity of the hybrid generation obtained by crossing two parental varieties. In this study, three pairs of InDel marker primers were used to determine the degree of seed purity of the three hybrids. Therefore, the 53 characteristic band fingerprints constructed can help in identifying varieties and in determining the degree of seed purity in large-scale seed production. Seed production units may thus determine seed purity at any time to ensure seed quality and promote the adoption of a variety in the market.

UPGMA-cluster analysis of germplasm is useful in bitter gourd breeding
Based on the correlation between molecular genetic distance and heterosis, the level of genetic diversity between parents is considered a means to predict hybrid performance and heterosis [24,25]. UPGMA cluster analysis showed that, at a genetic similarity coefficient of 0.645, the 53 bitter gourd materials tested here can be divided into three groups. The observed clustering showed a certain correlation with fruit color, fruit nodule characteristics, and geographical origin, consistent with previous results [26,27]. The three materials in Group I had green peel; two of them were from Guangxi, China. Group II contained germplasm from different regions and with different fruit color and fruit tumor traits, and some germplasm with close genetic relationships shared certain traits and regional origins. Group III had green peel except for the white peel of BBL-1-2. It was also found that 3-1-6, 12, and eight bitter gourds from Guangdong were closely related.
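The grouping at a similarity threshold can be illustrated with a small pure-Python sketch of UPGMA (average linkage) on a similarity matrix: the two clusters with the highest average pairwise similarity are merged repeatedly, and merging stops once the best available similarity falls below the threshold. This is not the NTSYS-pc implementation; the function name and the four-accession matrix are invented for illustration, and only the 0.645 threshold comes from the text.

```python
from itertools import combinations
from typing import List, Sequence


def upgma_groups(names: Sequence[str], sim: Sequence[Sequence[float]],
                 threshold: float) -> List[List[str]]:
    """Cut a UPGMA clustering of a similarity matrix at a given threshold."""
    clusters: List[List[int]] = [[i] for i in range(len(names))]

    def avg_sim(c1: List[int], c2: List[int]) -> float:
        # UPGMA: unweighted mean over all pairs of original members.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((p, q), avg_sim(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # remaining clusters are less similar than the cut-off
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(names[i] for i in c) for c in clusters)


# Invented symmetric similarity matrix for four hypothetical accessions.
names = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.5, 0.4],
    [0.9, 1.0, 0.6, 0.5],
    [0.5, 0.6, 1.0, 0.8],
    [0.4, 0.5, 0.8, 1.0],
]
print(upgma_groups(names, sim, threshold=0.645))
```

Because average-linkage similarities never increase as merging proceeds, stopping at the threshold gives the same groups as cutting the full dendrogram at that similarity level.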
The two re-sequenced materials, No. 12 and H-13, were clustered in different groups; the genetic distance was 0.563, the genetic relationship was distant, and the genetic background was complex. The two materials showed strong heterosis for yield. Therefore, once molecular markers for the parent populations have been developed, heterosis prediction can be enhanced. Although this may not identify the best hybrid combinations, it will reveal the practical value of allocating existing and new hybrid bitter gourd germplasm to heterotic populations, thereby increasing the opportunity to develop ideal hybrids from the best heterotic populations. Similar conclusions have been reached for rice and maize [28,29]. Conversely, white bitter gourd did not show a similar aggregation trend, likely because the origin of white bitter gourd may involve diversification and its genetic basis may be relatively rich, consistent with results reported by Zhou et al. [30].
In this study, InDel markers were used to analyze the genetic diversity within a bitter gourd germplasm sample, and the genetic relationship among bitter gourd genetic resources was studied at the molecular level. The evaluation of this genetic diversity showed that the InDel markers developed in our study will be a valuable marker source for future studies of the genetic background and molecular variation of bitter gourd germplasm from other sources.

Conclusions
In this study, a novel set of InDel primers with high polymorphism potential was developed based on re-sequencing. The application of these markers to the evaluation of genetic diversity among 53 bitter gourd genotypes and to the construction of molecular fingerprints indicated that they are of great significance for mapping, variety protection, and hybrid seed purity identification in the genetic analysis of bitter gourd. Therefore, the newly developed InDel markers between the fine germplasm accessions H-13 and No. 12 are of great significance and provide a valuable resource for identifying the genetic characteristics of bitter gourd genotypes.
