Construction of a High-density Genetic Linkage Map and QTL Analysis of Resistance to Cucumber Mosaic Virus in Luffa cylindrica (L.) Roem. Based on Specific Length Amplified Fragment Sequencing

Background: Luffa cylindrica L. is an economically important vegetable crop that is consumed globally. Cucumber mosaic virus (CMV) is an important virus affecting Luffa spp. No specific high-density maps have been constructed owing to a lack of efficient markers. Furthermore, no genes or quantitative trait loci (QTLs) are reportedly responsible for CMV resistance in Luffa spp. The development of next-generation sequencing has enabled discovery of single nucleotide polymorphisms and high-throughput genotyping of large populations. Results: A total of 271.01 Mb pair-end reads were generated. The average sequencing depth was 86.19× in both maternal and parental lines, and 14.57× in each F 2 individual. When filtering low-depth specific locus amplified fragment (SLAF) tags, 100,077 high-quality SLAFs were detected, and 7,405 of them were polymorphic. Finally, 3,701 of the polymorphic markers were selected for genetic map construction, and 13 linkage groups were generated. The map spanned 1,518.56 cM with an average distance of 0.41 cM between adjacent markers. Our results also revealed that CMV resistance was regulated by QTLs. Based on the newly constructed high-density map, two loci located on chromosome 1 (100.072 ~ 100.457 cM) and 4 (42.475 ~ 44.398 cM) were identified to regulate CMV resistance in L. cylindrica . A gag-polypeptide of LTR copia-type retrotransposon was predicted as the candidate gene responsible for CMV resistance in L. cylindrica . Conclusions: A high-density linkage map of L. cylindrica was constructed using SLAF. QTL mapping based on CMV disease phenotypes of F 2 led to the identification of two QTL on chromosome 1 and 4, respectively. Kompetitive allele-specific PCR analysis of 60 F 2 individuals, which gave rise to F 2:3 individuals, was carried out. We found that the QTL on chromosome 1 was associated with CMV resistance. Mapping of CMV QTL combined with the transcriptomic sequence alignment identified a gag-polypeptide of LTR copia-type

retrotransposon as the most likely causal gene.

Background
Luffa spp. (2n = 26), which belongs to the family Cucurbitaceae, is widely cultivated in tropical and subtropical regions, such as China, Malaysia, India, Korea, Thailand, Central America, and Africa [1,2]. Among the nine Luffa species found worldwide, only Luffa cylindrica (L.) Roem. and L. acutangula (L.) Roxb. are domesticated [2,3]. Furthermore, L. cylindrica has always been a preferred crop for cultivation because of its higher fruit set and bigger fruit size [4]. This species is grown principally for production of sponges, which can be used as bath sponges, or as an alternative biomaterial used for packaging, furniture, and plastic appliances, and also as an adsorbent of heavy metal contaminants from industrial waters, which makes it a cash crop [4][5][6]. The immature fruits are used as a vegetable and are rich in various nutrients [7]. Its medicinal value has also been reported [8,9].
High-density genetic linkage maps play an important role in genome assembly, gene/ quantitative trait locus (QTL), fine mapping, and map-based gene cloning. One major requisite for high-density genetic map construction is the availability of a huge amount number of polymorphic markers. In Luffa spp., research concerning genetic linkage maps has only started recently, using sequence related amplified polymorphism (SRAP) markers. Cui et al. (2015) constructed an interspecific genetic linkage map using an F 2 population derived from a cross between L. acutangula (L.) Roxb. and L. cylindrica (L.) Roem [10].
Although this map was the first genetic linkage map of Luffa spp, 24 linkage groups more than 13 chromosomes were construct in Cui's study. Wu et al. (2016) constructed a more condensed interspecific genetic map, which consists of 177 EST-SSR markers distributed in 14 linkage groups (LG) with an average of 102.58 cM per LG [11]. Due to interspecific hybridization, there was a marked segregation distortion of molecular markers in this map. The markers were not well-distributed over the LGs. To our knowledge, based on L. cylindrica (L.) Roem genetic background, no higher-density genetic maps have been constructed with markers distributed evenly in the 13 chromosomes. The lack of abundant, stable, and polymorphism molecular markers, such as single nucleotide polymorphisms (SNPs), hampers the construction of high-density genetic maps.
With the advent of next-generation sequencing (NGS) technology, such as specific locus amplified fragment sequencing (SLAF-seq), it is now feasible to develop SNP markers on a large scale, which is useful for constructing a genetic map. With various advantages, such as its high-throughput, high accuracy, and low-cost, this technique has been employed for high-density genetic map construction in many species, such as common carp [12], sesame [13], soybean [14,15], Prunus mume Sieb. et Zucc [16], cucumber [17,18], wax gourd [19], and sweet osmanthus [20]. Moreover, SLAF-seq based genotyping does not depend on a reference genome sequence, which makes it particularly important for species without a sequenced genome, such as Luffa spp.
Cucumber mosaic virus (CMV) belongs to the Cucumovirus genus of the family Bromoviridae, which can be transmitted via >1200 plants and 60 aphid species [21].
Although several genotypes of L. acutangula have been found to be resistant to CMV [22], the genetic mechanism of their resistance is still unclear. In our study, SLAF-seq was employed to construct an SNP-based genetic map for L. cylindrica (L.), which spanned 1,518.56 cM, with an average of 0.41 cM among adjacent markers, and consisted 3,701SNP markers distributed on 13 linkage groups. In addition, the QTLs associated with CMV resistance were mapped by using this genetic map, certified the usefulness of it.
Furthermore, a gag-polypeptide of LTR copia-type retrotransposon as a candidate gene for such QTL was also predicted in this study. To our knowledge, this is the most high-density genetic map of Luffa spp. to date. The SNP markers and the genetic map reported in this study could prove to be valuable resources for future research. This is also the first report on detection of CMV resistance-associated QTLs and prediction of related candidate genes in L. cylindrica (L.).

SLAF sequencing and genotyping
SLAF-seq is an efficient approach for large-scale genotyping, which is based on highthroughput sequencing technologies, and which could be applicable to various species and populations. DNA was isolated from parental genotypes 'P1-21' and 'P2-16', and 130 F 2 individuals generated by a cross between these genotypes, and used to create SLAF libraries. SLAF-seq yielded 271.01 M clean, trimmed 100 bp paired-end reads, among which the guanine-cytosine (GC) content was 39.58% and Q30 ratio (the ratio of reads with a quality score of >30) was 89.16%.
All SLAF pair-end reads were clustered based on sequence similarity, and sequences with over 90% identity were grouped as one SLAF locus. Table 1 shows an overview of the average degree of SLAFs and the SLAF sequencing depth in both parental lines and the F 2 population. SLAFs containing more than four tags and sequence depth less than 213 were filtered out as repetitive and low-depth SLAFs, respectively. Only the SLAFs with 2-4 tags were identified as polymorphic SLAFs. Finally, in this study, a total of 100,077 high-quality SLAFs were detected, among which 14,918 were polymorphic (14.91%) (Additional file 1).
After filtering out markers lacking parental locus information among the polymorphic SLAFs, 11,788 markers were screened for genetic construction. Following the genotype encoding rule, the 11,788 SLAFs were classified into eight segregation patterns (ab × cd, ef × eg, hk × hk, lm × ll, nn × np, aa × bb, ab × cc, and cc × ab) (Fig. 1). Due to the homozygous genetic background of the two parental lines (P1-21 and P2-16), aa × bb segregation pattern was needed to follow. There were 7,404 of 11,788 polymorphic SLAF markers followed the aa × bb segregation pattern, and were further selected to constructed the linkage map (Fig. 1). While SLAFs markers with parental sequence depth of less than 10×, completeness less than 70%, and/or significant segregation distortion (p-value <0.001), were considered as low-quality and should be filtered. Then, 3,701 markers were finally selected for genetic map construction.
High-throughput linkage map construction and evaluation After High Map linkage analysis , the 3,701 markers were mapped on 13 linkage groups (LGs) ( Haplotype and heat maps were used to evaluate the quality of the newly constructed genetic map. The double crossovers in the mapping population showing in haplotype maps indicated the errors in genotyping. the relationship of recombination among markers on the same linkage group was reflected in the heat map, and can be used to find potential sorting errors of markers. By using 3,701 SLAF markers, haplotype maps were generated for each of the F 2 individuals and the parental controls, which can intuitively display the recombination events within each individual as previously described [23] (Additional file 5). In this study, most of the recombination blocks were clearly defined and the percentage of double crossovers in all of the linkage groups was less than 0.18% (Table 3), which was lower than the describe of usually less than 3% [19]. Heat maps were also generated by using The disease rating of parental lines and their F 1 progeny suggested that the resistance to CMV in P2-16 might be incompletely dominant. In the segregating F 2 population, plants that were neither completely immune (disease score 0) nor severely infected (disease score 9) were found. The disease grades of F 2 plants were more or less equally distributed over 1, 5, and 7, whereas a slightly lower number of individuals scored 3 (Fig. 3b) 7), were chosen to produce F 2:3 progenies. When we compared the disease index of F 2:3 progenies generated from the resistant and susceptible F 2 groups, we found that they were extremely significant, which indicated that the disease index of F 2:3 progenies corresponded to the disease score of their F 2 plants. The disease rating in the F 2:3 populations from 60 F 2 single plants also showed a lower distribution on disease indices 33.4 ~ 55.6%, suggesting that the resistance to CMV in P2-16 was regulated by multiple quantitative genes (Fig. 3c). To validate the QTL identified by SLAF-seq, a kompetitive allele-specific PCR (KASP) assay was used for genotyping. All the SLAF markers in the two loci intervals were considered to develop KASP marker. However, due to the lack of genomic database, not all the SLAF marker sequences were suitable to design KASP markers. Finally, only six significant markers on LG 1 for the KASP assay were identified successfully (Table 4 and 5, Fig 5). As to the loci on LG 4, sequences were unable to reveal KASP markers. The disease index from the F 2:3 populations was used to represent the phenotypic values of individual F 2 plants in KASP analysis. All the KASP markers at 100.457 cM on LG 1 were significantly associated with CMV resistance in the 60 extremely resistant and susceptible F 2 plants with the phenotypic values from their pedigrees ( Table 5).
Transcript sequencing and candidate gene prediction Some studies have indicated that genetic diversity within single Luffa species is narrow [26]. This situation would cause a low polymorphism rate for molecular markers, which then makes it difficult to construct a high-density genetic map in Luffa cylindrica. With the rapid development of sequencing technology, SLAF-seq is a good strategy for highthroughput SNP discovery and genotyping, especially for species for which there is no reference genome available, such as Luffa spp.
In the present study, we employed SLAF sequencing technology to obtain novel markers in Luffa cylindrica for linkage map construction and QTL mapping. In total, 271.01 Mb pairend reads were generated, which corresponded to 100,077 high-quality SLAFs. Among them, 7,405 of SLAFs were polymorphic (7.4%). Eventually, 3,701 polymorphic markers were used to construct a genetic map, which contained 13 LGs. This map spanned 1,518.56 cM with an average distance of 0.41 cM between adjacent markers. completely symptomless after inoculation [22]. To our knowledge, this is the only study on CMV resistance in Luffa spp.
The genetics of resistance to CMV are varied among species, comprising dominant, recessive, monogenic, and polygenic genes or QTLs. Monogenic dominant resistance genes have been identified, such as RCY in Arabidopsis thaliana C-24 [28], RT4-4 in common bean [29], and Cmr 1 in pepper [30]. In Arabidopsis, two independent recessive genes (cum1 and cum2) were also identified to be involved in translation and movement of the CMV virus [31,32]. CMV resistance in chili pepper (Capsicum annuum L.)was also found to be regulated by multiple genes and/or QTLs in pepper [33][34][35]. In potato, CMV resistance was reported to be regulated by one major QTL and several minor QTLs [36]. Among cucurbits, like cucumber, a study of the genetics of resistance to CMV in Cucumis sativus var. hardwickii revealed that it was regulated by a single recessive gene [37]. In melon, resistance to CMV in the Korean accession PI161375 (SC) was initially described as oligogenic, recessive [38], and quantitative [39], with a major QTL, named cmv 1, which was in fact a single gene responsible for resistance to some CMV strains, providing full resistance to strains P9 and P104.82. Another study [40] found that the resistance to CMV in melon SC was governed by one gene and at least two quantitative trait loci, the single gene, cmvqw12.1, co-located with cmv 1 in linkage group (LG) XII, two QTLs in LGIII (cmvqw3.1) and LGX (cmvqw10.1) and showed interaction between cmvqw12.1 and cmvqw3.1. Due to the lack of sufficient molecular markers and adapted genetic linkage map, no resistance gene or QTL has yet been mapped.
Presently, we employed SLAF-seq to construct the high-density genetic map in sponge gourd (Fig. 3). SLAF-seq is a low-cost, efficient high-throughput sequencing based technique, and does not depend on the availability of a reference genome sequence. SLAFseq is a high-throughput technique that promoted the detection speed of QTLs associated with CMV resistance. L. cylindrica genotype P2-16 was found to exhibit partial CMV resistance. We were able to identify two QTLs underlying the resistance to CMV on linkage groups 1 and 4. These QTLs could explain 8.41% and 9.75% of the phenotypic variance respectively, with the LOD values of 3.56 and 3.28. The QTL on LG 1 was verified in subsequent F 2:3 lines using KASP markers developed based on the SLAF sequences.
The high-density L. cylindrica genetic map constructed in this study could serve as a valuable tool in many ways, such as gene mapping, marker-assisted breeding, map-based gene cloning, comparative mapping, and draft genome assembling. Furthermore, the resistance of CMV were mapped by using the high-genetic map constructed in this study, which directly interacts with the viral factor of potyviral genome-linked protein (VPg) [45]. eIF4E and eIF(iso)4E are recessive genes conferring resistance to viruses through the viral-encoded protein VPg [46][47][48][49][50]. Resistance to RNA virus has been validated by silencing of the eif4E gene in tomato, melon, and cucumber [51][52][53]. QTLs, involved in the restriction of CMV in pepper, were also found to be associated with partial restriction of CMV long-distance movement [34]. A phloem protein in cucumber has also been reported to be involved in long-distance movement of CMV [54].
The plants attacked by the virus activate defenses at different levels. At first, they activate basal defenses involving pathogen-associated molecular patterns (PAMPs), which are recognized by plant transmembrane receptors [55]. In addition, some plants exhibit a second line of defense, known as effector-triggered immunity (ETI), to counter pathogens that have evaded the basal defenses [56]. In this step, genes associated with ETI, known as resistance (R) genes, such as nucleotide-binding site (NBS) and leucine-rich repeats (LRRs), are activated [55,56]. The RCY1 gene in A. thaliana [28,57], and RT4-4 in common bean [29], are NBS-LRR type genes against CMV. In addition to these R genes, RNA In this study, retrovirus-related Pol polyprotein (gag-polypeptide of LTR copia-type), which exhibits high similarity to the retrotransposon Tnt 1-94 of Nicotiana tabacum, was identified in the CMV resistance mapped location (Additional file 8). Retrotransposons are mobile genetic elements closely related to retroviruses [65]. Both types of elements share structural characteristics and are functionally related. The expression of the Tnt 1-94 element has been studied in detail and shown to be induced in association with plant defense mechanisms [66][67][68]. We speculated that the retrovirus-related Pol polyprotein plays an important function in Luffa spp. against CMV virus. Furthermore, a probable disease resistance protein At4g27220 in A. thaliana or At4g27190-like in Cucumis melo, serine hydrolase (FSH1), and serine/arginine-rich splicing factor, RSZ21, in A. thaliana were predicted very near to the mapped location. Conversely, the annotation of gene function of mapping intervals in this study also validated the effectiveness of the mapped QTLs.

Conclusion
In this study, we performed SLAF-seq on an F 2 population of L. cylindrica. A high-density genetic map was constructed, which spanned 1,518.56 cM with an average distance of 0.41 cM between adjacent markers. Two QTLs on LG 1 and 4 of CMV resistance were mapped based on this map, and QTL on LG 1 was verified by KASP markers. Based on the search of the sequences of the linked markers in an RNA-seq database, candidate genes were predicted to be likely a gag-polypeptide protein of LTR copia-type retrotransposon.
Thus, we concluded that the QTLs on LG 1 and 4 most likely represent a retrovirus-related Seedlings of the offspring and parents were planted in a greenhouse of the Vegetable Research Institute, Jiangsu Academy of Agricultural Sciences, Nanjing, China. Young healthy leaves from the two diploid parents as well as F 2 individuals were immediately frozen in liquid nitrogen and stored in a -80 °C freezer. Total genomic DNA was extracted from each leaf sample using the cetyltrimethylammonium ammonium bromide method [69]. DNA quality was evaluated by 1% agarose gel electrophoresis, and the concentration was estimated using a NanoDrop Spectrophotometer (Thermo Fisher Scientific, USA).

CMV disease evaluation
The CMV isolate used in this study was CMV FNY provided by Dr. Guo of the Institute of  SLAF library construction and high-throughput sequencing The SLAF library construction and high-throughput sequencing were conducted as described by Sun et al. with few modifications [12]. The DNA samples of the 130 F 2 progenies and two parents were treated with two restriction enzymes, i.e. HaeIII and Hpy166II (NEB), to digest the genomic DNA. Subsequently, a single nucleotide (A) overhang was added to the digested fragments using Klenow Fragment (3´ → 5´ exo-) (NEB) and dATP at 37°C. Then, duplex tag-labeled sequencing adapters (PAGE-purified, Life Technologies, USA) were ligated to the A-tailed fragments using T4 DNA ligase (NEB).
Polymerase chain reaction (PCR) was performed using diluted restriction-ligation DNA samples, dNTP, Q5® High-Fidelity DNA polymerase, and PCR primers (Forward primer: 5'-AATGATACGGCGACCACCGA-3', reverse primer: 5'-CAAGCAGAAGACGGCATACG-3') (PAGEpurified, Life Technologies). The PCR products were then purified and separated on a 2 % agarose gel. Fragments with 300 to 500 base pairs (with indexes and adaptors) in size were excised and purified using a QIAquick gel extraction kit (Qiagen, Germany). Gelpurified products were diluted for sequencing. Then, pair-end sequencing (each end 125 bp) was performed on an Illumina HiSeq 2500 system (Illumina, Inc., USA) at the Biomarker Technologies Corporation in Beijing.
SLAF-seq data analysis and genotyping SLAF marker identification and genotyping were performed as previously described [12].
Low-quality reads in each cycle (quality score < 30; a quality score of 30 represents a 0.1% chance of error) were discarded by real-time monitoring during sequencing. Then, all SLAF paired-end reads with clear index information were detected by BLAT (-tileSize = 10 -stepSize = 5) [71]. The sequences with >90% similarity were grouped and identified as one SLAF locus.
In mapping populations of diploid species, such as L. cylindrica, one locus can contain at most four genotypes. Thus, the groups containing more than four tags were filtered out as repetitive SLAFs. Only the individuals that contained fewer than four tags, had an average sequence depth of 20-fold or more in the parents and 8-fold or more in the progenies, and exhibited >70% integrity in mapping population individuals were identified as high-quality SLAFs, and SLAFs containing 2 to 4 tags were considered as polymorphic markers.
Polymorphic SLAF markers were classified into eight segregation patterns as follows: ab × cd, ef × eg, hk × hk, lm × ll, nn × np, aa × bb, ab × cc, and cc × ab. The alleles of each SLAF locus were defined according to the parental genotypes, and individuals were genotyped by sequence similarity to their parents [12]. Since the F 2 population was derived from two homozygous parents with a genotype of aa or bb, only the SLAF markers with segregation patterns of aa × bb was selected for subsequent map construction.

Linkage map construction
The modified LOD (MLOD) scores were calculated for the high-quality SLAFs from the F 2 population, which can be used for linkage clustering. Prior to ordering, markers with MLOD score <5 were filtered out. Map construction was achieved according to the method of High Map, as previously explained [72]. First, the markers with high qualities were grouped based on a pair-wise MLOD score for the recombination frequency. Second, those markers were ordered using Gibbs sampling, spatial sampling, and a simulated annealing algorithm. At the same time, the map distance was estimated. Third, the incorrect genotypes were recognized and eliminated through the k-nearest neighbor algorithm.
Marker ordering and error correcting were carried out iteratively so that the markers could be ordered correctly. Several cycles later, accurate linkage maps were achieved. The map function of Kosambi was used in the process. At last, heat maps and haplotype maps were used to evaluate the map quality.

Mapping of CMV resistance genes and KASP assay
The phenotypic data on CMV resistance were collected as described above. Linkage analysis of the CMV resistance gene loci with SLAF markers was performed with the Kosambi mapping function of JoinMap 4.0 (Kyazma, NL). SNP markers associated with CMV resistance identified by SLAF-seq were validated using a KASP assay. For each marker, two allele-specific primers (one for each SNP allele) and one common (reverse) primer were designed for each KASP assay using a tool provided by LGC Genomics (www.lgcgenomics.com) based on the SNP locus sequence. The KASP assays were designed by LGC Genomics and carried out according to the company's protocol (http://lgcgenomics.com).
RNA extraction, transcript sequencing, and candidate gene prediction For RNA extractions, plants of parental lines were inoculated with CMV as described above. At 0, 2, and 6 days' post-inoculation, leaves of three biological replicates per genotype and treatment combination were sampled and stored at 80°C. Total RNA was isolated using the RNeasy Kit (Qiagen, Germany), and RNA samples were shipped on dryice to Biomics (Beijing) Biotech Co. Ltd.
Poly (A) mRNA was isolated using oligo-dT beads (Qiagen, Germany). The mRNA was broken into short fragments of approximately 300 nt. First-strand cDNA was synthesized using random hexamer-primed reverse transcription. Second-strand cDNA was generated using DNA polymerase I. The cDNA fragments were purified and washed for end repair and ligated to sequencing adapters. The cDNA fragments of suitable size were purified and enriched by PCR to obtain the final cDNA library. The cDNA library was sequenced using Illumina HiSeq equipment (Illumina, USA). Clean reads were selected after removing lowquality sequences, adapter sequences, and some ambiguous bases 'N'. The remaining high-quality reads were filtered for short reads below 25 bp, and repeat sequences were masked before assembling. The clean reads were de novo assembled using Trinity with default K-mers = 25 [73]. Contigs were obtained by conjoining the K-mers in an unambiguous path. Then, the reads were mapped back to contigs to construct unigenes with the paired-end information. Finally, the overlapping unigenes from the six libraries were assembled into a continuous sequence, and redundant sequences were removed to acquire non-redundant unigenes that were as long as possible.
The sequences of mapping markers on chromosomes 1 and 4 were aligned to the transcript sequencing database. Candidate gene prediction was performed using GO, Pfam, Swissprot, and nr annotation.

Availability of data and materials
The datasets generated and analyzed during this study are available from the first and corresponding author on reasonable request.