Detection of Genomic Variation and Tepal Shape Related Genes Between Broad and Narrow Tepal Lotus (Nelumbo Adans.) by whole Genome Resequencing

Background: The shape, color, and size of petals or tepals are vital attributes of owering plants, as they affect the pollinator attraction ability, environmental stress adaptation, and ornamental value of plants. Compared with rose, chrysanthemum, and water lily, the tepal shapes of lotus (Nelumbo Adans.) exhibit low variation, and the absence of short-broad and long-narrow tepals limits the commercial value of lotus owers. Therefore, this study aimed to uncover the genomic variation underlying the broad and narrow tepal phenotypes of lotus owers, and to screen candidate genes associated with different tepal shapes. Results: Whole genome resequencing of two groups of lotus, NL (comprising three American lotus genotypes: broad tepal mutant, narrow tepal mutant, and wild type) and WSH (comprising two Asian lotus genotypes: narrow tepal mutant and wild type), generated 9.23–12.85 Gb clean data. A total of 716,656 single nucleotide polymorphisms (SNPs) and 221,688 insertion-deletion mutations (Indels) were obtained in the NL group, while 639,953 SNPs and 134,6118 Indels were obtained in the WSH group. Only a small proportion of these SNPs and Indels was mapped to exonic regions of genome: 1.92% and 0.47%, respectively, in the NL group, and 1.66% and 0.48%, respectively, in the WSH group. Gene Ontology (GO) analysis showed that out of 4,890 (NL group) and 1,272 (WSH group) annotated variant genes, 125 and 62 genes were enriched (Q < 0.05), respectively. Additionally, in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, 104 genes (NL group) and 35 genes (WSH group) were selected (P < 0.05). Finally, 41 genes were ltered and predicted to be associated with tepal shape in lotus. Conclusions: This is the rst comprehensive report of genomic variation controlling tepal shape in lotus. Signicant genetic variation was detected between the wild-type and mutant lotus genotypes, and varying levels of differentiation between groups NL and WSH. Candidate genes containing epidermal growth factor (EGF) and wall-associated receptor kinase (WAK) domains are considered important targets for further research on tepal development. Overall, the genomic data and candidate genes obtained in this study are essential references for future identication of tepal-shaped control genes in lotus combined with transcriptome analysis and quantitative trait loci (QTL) mapping. genes in this study related to cell expansion and division must be investigated in future studies for understanding tepal development in lotus. Overall, the results of this provide valuable resource for future identication of petal-shaped control genes combined with transcriptome technology and QTL mapping and for understanding the mechanism of ower morphogenesis in Nelumbo.


Background
Lotus (Nelumbo Adans.) is one of the most economically important plants in the world. The genus Nelumbo has two species: Nelumbo nucifera Gaertn., commonly known as Asian lotus, and Nelumbo lutea Willd., commonly known as American lotus [1,2]. Asian lotus is one of the top 10 traditional owers in China and the national ower of India. Lotus, particularly Asian lotus, is usually divided into three types, depending on its agricultural application: ornamental lotus, rhizome lotus, and seed lotus [3,4]. Approximately 2,000 commercial cultivars of ornamental lotus have been developed to date, accounting for 96% of the total number of the three types [5]. However, as an ornamental character of importance [6], the shape of petals (referred to as tepals in lotus) exhibits little phenotypic variation in these cultivars. Few of lotus germplasms exhibit short-broad tepals (similar to rose) and long-narrow tepals (as in chrysanthemum), which limits the horticultural value of lotus. Therefore, to develop lotus cultivars with unique tepal shapes, understanding the genetic basis of tepal shape determination is critical.
Genetic mapping of quantitative trait loci (QTL), genome-wide association studies (GWAS), and transcriptome analysis using modern high-throughput sequencing methods are powerful approaches that facilitate the identi cation of novel genes and pathways. The completion of de novo sequencing of lotus genome [7,8] and the rapid development of highthroughput sequencing technology have made it possible to analyze nucleotide sequence variation among various lotus cultivars. Whole genome resequencing is used to detect single nucleotide polymorphisms (SNPs) and insertion-deletion mutations (Indels) in the whole genome [9,10] and to mine structural variation and copy number variation by high-depth sequencing [11,12]. Using this technique in lotus, Hu et. al [13] identi ed a large number of nucleotide sequence variation and 103,656 simple sequence repeats (SSRs) between the genomes of 'Chiang Mai Wild' and 'Middle Lake Wild', which could be used for QTL mapping and molecular-assisted breeding. In another study, whole genome resequencing of 19 accessions belonging to three subgroups of cultivated temperate lotus revealed that rhizome lotus showed the lowest genomic diversity, and the candidate genes related to temperate lotus and tropical lotus always exhibited divergent expression patterns [14].
Petal growth depends on the precise orchestration of the frequency of cell division, orientation of cell division planes, and extent of cell elongation [15][16][17]. For example, non-uniform cell expansion within a petal resulted in distorted petals in Eustoma grandi orum [18]. Although the molecular basis of petal shape is largely unclear, it has been shown that several genes, such as LEUNIG [19,20], SEUSS and APETALA2 [21,22], JAGGED [23][24][25], CYCLOIDEA [26], INDOLE-3-BUTYRIC ACID-RESPONSE5 [27], and SPIKE1 [28], regulate petal shape by affecting cell proliferation and or cell expansion. The roles of these genes in petal shape regulation were mainly veri ed in model plants including Arabidopsis and Antirrhinum.
In this study, two groups of lotus, with wide-short and narrow-long tepals, were analyzed by whole genome resequencing. We explored the differences in the number and distribution of SNPs and Indels within each group and between the two lotus groups and screened functional genes and metabolic pathways related to the development of broad and narrow tepals. The results of this study provide important clues for the exploration of genes affecting tepal shape in lotus.

Results
Whole genome resequencing data of the six genotypes A total of six samples belonging to groups NL and WSH were subjected the whole genome resequencing, and 9. 23-12.85 Gb clean data, with 'Q20 rate' > 97% and 'Q30 rate' > 92%, were generated (Additional le 1). Among the six samples, M652 produced the lowest amount of clean data (9.23 Gb), whereas NL-CK produced the highest amount (12.85 Gb). The GC content of all samples varied from 39.28% to 40.14%, and no signi cant difference was detected in GC content between and within the two groups (Additional le 1). The mapping rate of NL group genotypes (93.63%-94.43%) was lower than that of WSH group accessions (98.22%-98.43%). Additionally, the 1× and 4 × coverage of NL group samples was also lower than those of WSH group accessions; the 4 × coverage of NL group ranged from 80.41% to 83.54%, whereas that of the WSH group varied from 94.65% to 98.67% (Additional le 1).

SNPs and Indels in the six genotypes
The sequence reads of all six accessions were mapped onto the reference genome sequence of lotus, and SNPs and Indels were called for each accession. The total number of SNPs identi ed in each sample of NL group was about 7-10fold higher than that of WSH group, and it was about 8-10-fold higher in Indel (Additional le 2). Of all the SNPs identi ed in the NL group, 6.15% (M512), 5.90% (DP23), and 5.73% (NL-CK) were mapped to exonic regions, which were approximately 3.5 times greater than the number of exonic SNPs in the WSH group. Of all the Indels identi ed in the NL group, 0.84% (M512), 0.81% (DP23), and 0.79% (NL-CK) were mapped to exonic regions, which was approximately 1.5 times those of group WSH (Additional le 2).
Within each group, the controls contained more SNPs and Indels than their mutants (Additional le 2), which was probably associated with the increase in variant sites caused by the pooling of DNA extracted from three seedlings per control. Additionally, the ratio of nonsynonymous to synonymous SNPs (Nonsyn/Syn), transitions to transversions (Ts/Tv), and exonic sequence variation to total variation (exonic/total) varied slightly among the three accessions within each group (Additional le 2), to some extent, suggesting that the three genotypes within each group exhibit very low genetic diversity.

Analysis of SNPs and Indels between mutant accessions and their controls
Within each group, the SNPs and Indels were ltered between any two samples (except between WSH-CK1 and WSH-CK2). A total of 716,656 SNPs and 221,688 Indels were obtained from the NL group, while 639,953 SNPs and 134,6118 Indels were obtained from the WSH group (Fig. 1). Compared with the controls, nucleotide polymorphisms in the mutant samples of the two groups occurred mainly in the intergenic regions: intergenic SNPs accounted for 61.3% and 72.6% of all SNPs identi ed in NL and WSH groups, respectively, and intergenic Indels accounted for 58.4% and 67.4% of all Indels identi ed in NL and WSH groups, respectively (Fig. 1). By contrast, the proportion of SNPs and Indels that mapped to exonic regions was very small in both groups (1.9% SNPs and 0.47% Indels in the NL group; 1.7% SNPs and 0.48% Indels in the WSH group) (Fig. 1). This showed that the ratio of exonic SNPs to total SNPs was slightly higher in the NL group than in the WSH group.
Annotation and analysis of genes harboring SNPs and Indels in the exonic regions The predicted amino acid sequences of the variant genes with exonic SNPs and Indels were blasted in the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases to annotate their functions. In the GO database, 4,890 (72.8%) of the 6,715 variant genes in the NL group were selected for annotation and assigned to three functional categories for a total of 19,785 times ( Table 1). The most times of these variant genes were assigned to the biological process category (99,311 times [50.2%]), followed by the cellular component and molecular function categories (Table 1). In the entire genetic background and variant genes, GO terms including 'ADP binding', 'pattern binding', and 'polysaccharide binding' were signi cantly (Q < 0.05) enriched in the molecular function category and contained 88, 37, and 37 candidate genes, respectively ( Fig. 2A). After excluding duplicates, a total of 125 candidate genes were selected from the above three GO items (Additional le 3). In the WSH group, 1,272 genes were annotated in the GO database, accounting for 23.8% of the 5,353 variant genes, which was signi cantly lower than the proportion of annotated genes in the NL group (Table 1). These annotated genes of WSH group received a total of 52,425 assignments in the three functional categories (in the same order as that observed in the NL group): 25,824 times (49.3%) in the biological process category, followed by 20,751 times in the cellular component category, and 5,850 times in the molecular function category (Table 1). Seven GO terms, such as 'ADP binding', 'chitin metabolic process', 'chitin catabolic process', showed signi cant differences (Q < 0.05). Among them, 'ADP binding' contained 55 genes, while each of the remaining six GO terms contained 12 identical genes (Fig. 2B).
Finally, A total of 62 candidate genes were identi ed in the seven GO terms, after excluding gene duplicates (Additional le 3).
In the KEGG database, 2,286 variant genes of the NL group were assigned into 122 pathways, of which the top three pathways were 'metabolic pathways' (430 genes), 'biosynthesis of secondary metabolites' (256 genes), and 'starch and sucrose metabolism' (67 genes) ( Table 1). By contrast, only 645 genes of the WSH group were annotated and distributed to 68 pathways, and the top three pathways with the most annotated genes and their order were the same as those in the NL group, each with 128, 84, and 39 genes, respectively ( Candidate genes related to the development of tepal shape in lotus According to the 227 (NL group) and 89 (WSH group) genes enriched in the GO and KEGG databases, those shared by both groups and with mutation rate > 5‰ in the exonic regions were selected. Thus, a total of 41 crucial candidate genes were obtained that potentially regulate the tepal shape in lotus, including eight genes shared by the two groups and 24 and nine genes exclusive to the NL and WSH groups, respectively (Additional le 4). Among the eight shared genes, three (LOC104590721, LOC104605275, and LOC104607443) were enriched in the GO but not the KEGG database, and the remaining ve were enriched in both GO and KEGG databases (Additional le 4).
Of the 41 candidate genes, 35 were annotated with a clear function, including 17 disease resistance related genes, six (like-) receptor protein kinases, and 12 related to chitinase, transferase, synthase, and other functions (Additional le 4). The lengths of the 41 genes varied from 891 to 17,163 bp, in which two candidate genes (LOC104607832 and LOC104609114) produced four transcript variants, and 10 genes produced two to three transcript variants (Additional le 4).
In terms of 'family and domain', the genes harboring nucleotide-binding adaptor shared by APAF-1, R proteins and CED-4 (NB-ARC) and leucine-rich repeat (LRR) domains were the most (18, 44%) and mainly related to the function of 'ADP binding' (Additional le 4). While the domains of the six (like-) receptor protein kinases were epidermal growth factor (EGF)-like, Pkinase, and wall-associated receptor-kinase galacturonan-binding (GUB_WAK), which are associated with the morphological development of plants due to their functions such as 'ATP binding', 'calcium ion binding', 'polysaccharide binding', and 'protein serine/threonine kinase activity' (Additional le 4). In the KEGG database, only 18 of the 41 candidate genes were assigned to speci c pathways, while the remaining 23 genes were unknown (Additional le 4).

Discussion
Differences in genome mapping between the NL and WSH groups The two groups of samples investigated in this study were representative wild-type germplasms of the two species of Nelumbo, and whole genome resequencing results revealed many differences at the DNA level. For example, the mapping rate, 1× coverage, and 4× coverage of the three lotus accessions in the NL group were lower than those in the WSH group (Additional le 1). These differences occurred probably because 'China Antique' (source of the reference genome) exhibited a closer genetic relationship with each of the three accessions within the WSH group (native to China) than with the geographically isolated American lotus accessions in the NL group. This reason may also explain why more SNPs and Indels were detected in the NL group than in the WSH group (Table 1). Previously, we showed that many SNPs and Indels among M512, DP23, and 'China Antique' could be detected using expressed sequence tags (ESTs) and SSRs [33], which indirectly supported the above inference that the reference genome exhibited lower sequence similarity with the American lotus genome than with 'Weishan Hong' genome.
In addition, the three accessions in the WSH group exhibited higher Nonsyn/Syn ratio and Ts/Tv ratio than those in the NL group (Additional le 2). In other words, less non-synonymous mutations and transitions occurred in the genomes of NL group accessions, probably because of the lack of gene transfer between Asian lotus and American lotus during the long period of geographical isolation [14,34].

Distribution of tepal shape related SNPs and Indels on lotus chromosomes
To visualize the genomic diversity among samples, we drew 12 separate maps of SNP and Indel densities on the chromosomes for the six accessions (Additional le 5-8), with the parameters of "window = 100k" and "step = 100k". On the SNP density map of the WSH group, three color-blocks showed visible differences between M652 and its two controls (WSH-CK1 and WSH-CK2) in the 20 longest scaffolds of the lotus genome (Additional le 5), which represent key regions for QTL mapping and gene scanning for the tepal shapes trait in future studies. For instance, the SNP density of M652 in the 6.0-6.3 M interval of NW010729074.1 (located on lotus chromosome 3) was higher than those of the two controls (Additional le 5). In this region, there were 12 variant genes, among which the LOC104586031 locus in M652 genome contained the highest number of exonic SNPs ( ve) compared with the two controls and was also signi cantly enriched in the GO database (Additional le 5). Besides, LOC104586031 was known to encode a 2-OG-Fe(II) oxygenase superfamily protein (ISP7), but its function has not been con rmed. On the Indel density map, the narrow tepal accession, M652, exhibited four sites with lower Indel density than the WSH-CK1 and WSH-CH2 controls (Additional le 6), and these sites may represent the chromosome regions associated with tepal shape in lotus.
As described above, the mapping rate in the NL group was lower than that in the WSH group, probably because of the distant genetic relationship between NL accessions and 'Chinese Antique' (the reference genome). Therefore, no visual distinction was evident on the maps of SNP or Indel density among the three NL group accessions (Additional le 7-8).
Genes associated with tepal shape in lotus Tepal shape and size are remarkably consistent within a given ecotype [35,36] but are in uenced by the environment [37][38][39][40]. Among the 41 tepal shape related candidate genes identi ed in this study, 17 genes encoded disease resistance proteins belonging to the NB-LRR family, of which 12 were exclusive to the NL group, four were common to both groups, and only one gene was unique to the WSH group (Additional le 4), implying that M512 and or DP23 in the NL group may have undergone natural selection for resistance to disease and survived. This hypothesis was supported, to some extent, by the explicit functions of 21 genes detected by the polymorphic bands between M512 and DP23, which were not only involved in plant organogenesis, plant hormone synthesis, and signal transduction but reportedly are also involved in plant physiology and metabolism under stress [33]. Additionally, the contrasting number of disease resistance genes between the two groups was consistent with, at least to some degree, the fact that M512 and DP23 were derived from natural mutation or selection of lotus seeds, whereas M652 was induced by arti cial radiation.
The development of tepal shape mainly regulated by cellular proliferation and expansion and the direction of cell division [15,17]. In this study, six genes predicted to encode EGF-like or WAK domain-containing proteins (Additional le 4), required for cell expansion [41][42][43], may play an essential role in the morphogenesis of lotus tepals. ERECTA, which encodes a LRR receptor-like serine-threonine kinase (STK), was veri ed to be a major effect locus for shaping petals in Arabidopsis thaliana [44]. A BLAST search of this Arabidopsis gene sequence in the lotus genome identi ed its homolog, LOC104600563, which was also annotated as ERECTA. However, this gene was not one of the 366 genes (NL 277 + WSH 89) screened in this study. Nonetheless, the LOC104604923 gene identi ed in the WSH group as well as the previously reported WAK domain-containing genes LOC104590721, LOC104596880, LOC104608917, and LOC109115528 were all STK-type genes [41] like ERECTA, they will be the focus of subsequent research.

Conclusions
The shape and size of petals are essential for the survival and reproduction of plants, and both these morphological features are two of the most important targets in ornamental crop breeding. Lotus germplasms with broad and narrow tepals generated by natural mutation and arti cial irradiation, respectively, represent ideal materials for exploring the mechanism of tepal shape development. The mutant samples and their respective controls within each group showed a large number of SNPs, Indels, and candidate genes, which also exhibited differences between group NL and WSH.
Candidate EGF and WAK domain-containing genes in this study related to cell expansion and division must be investigated in future studies for understanding tepal development in lotus. Overall, the results of this study provide a valuable resource for future identi cation of petal-shaped control genes combined with transcriptome technology and QTL mapping and for understanding the mechanism of ower morphogenesis in Nelumbo.

Plant material
In 2015, two accessions of American lotus with signi cant differences in tepal shape were obtained from the 'International Nelumbo Collection' (certi ed by International Waterlily & Water Gardening Society in 2016) of Shanghai Chenshan Botanical Garden, Shanghai, China. Both these accessions were germinated from wild lotus seeds collected from Florida, USA. Compared with the wild American lotus, one genotype (M512) showed broad and short tepals, while the other genotype (DP23) exhibited narrow and long tepals (Fig. 3), and the tepal shapes of both accessions were stable after years of vegetative propagation via rhizomes. In 2016, 1,024 seeds of 'Weishan Hong', a wild Asian lotus, were irradiated with cobalt-60, which emits gamma rays, and a mutant (M652) with narrow and long tepals was obtained (Fig.   3), and the tepal shape was stable after three years of clonal propagation.

Grouping of lotus accessions
The lotus genotypes investigated in this study were divided into two groups: NL (short for N. lutea), which comprised American lotus genotypes, M512, DP23, and their wild-type control (NL-CK), and WSH (short for 'Weishan Hong'), which comprised M652 and its two wild-type controls, WSH-CK1 and WSH-CK2. Each control contained three wild seedlings.

DNA extraction
Three young leaves of each mutant (M512, DP23, and M652) were harvested separately from three pots of their clones, and one young leaf of three seedlings of each control (NL-CK, WSH-CK1, and WSH-CK2) were collected. DNA was extracted separately from the three leaves of the same genotype using the modi ed cetyl triethylammonium bromide (CTAB) method [29], in which the sugar removal was performed three times, then they were pooled together at equimolar concentrations.

Date ltering
The original image data generated by the sequencing machine were converted into sequence data via base calling (Illumina pipeline CASAVA v1.8.2). The sequence data were then subjected to the quality control (QC) procedure to remove unusable reads: 1) The reads with a Phred quality (Q) score <20; 2) The reads contain the Illumina library construction adapters; 3) The reads contain more than 10% unknown bases (N bases); 4) One end of the read contains more than 50% of low-quality bases (sequencing quality value ≤5).

Read mapping, variant detection, and annotation
The clean reads were mapped to the reference genomes of 'China Antique' (accession number: GCA_000365185.2) using the Burrows-Wheeler Aligner (BWA) program [30] using the following parameters: 'mem -t 4 -k 32 -M'. Subsequent processing, including duplicate removal, was performed using SAM tools [31] with the parameter 'rmdup' and PICARD (http://picard.sourceforge.net). Raw SNP and Indel sets were called with SAM tools using the following parameters: 'mpileup -m 2 -F 0.002 -d 1000'. Then, these sets were ltered using the following criteria: mapping quality > 20; depth of the variate position > 4. ANNOVAR [32] was used for the functional annotation of variants.

Screening and enrichment analysis of variant genes
Genes showing SNPs and Indels between any two samples (except between WSH-CK1 and WSH-CK2) within each group were selected. Then, among these genes, those showing SNPs and Indels in the exonic regions were manually extracted and searched in the NCBI non-redundant protein (Nr) database for annotation and in the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases for enrichment analyses.

Screening of genes controlling tepal shape
Based on the blasting in the GO and KEGG databases, genes from group NL and WSH showing signi cant enrichment (Q < 0.05) were identi ed. Among these genes, two types of genes, common to both groups and showing a mutation rate (SNP and Indel density per kb in the coding region) greater than 5‰, were extracted and searched in NCBI and UniProtKB databases to clari ed their biological information. Both types of genes were considered as candidates that regulate tepal shape in lotus.   Buds and second-day owers of NL group (top panel) and WSH group (bottom panel). The NL group accessions include M512 (wide-short tepals), DP23 (narrow-long tepals), and NL-CK (wild American lotus; the control). The WSH group accessions include M652 (derived from 'Weishan Hong' by gamma irradiation; narrow-long tepals) and WSH-CK ('Weishan Hong' belonging to wild Asian lotus; the control). Scale bars = 1 cm.