Construction of a high-density genetic map and identification of candidate genes for resistance to Fusarium wilt based on the resequencing of recombinant inbred line population in Gossypium barbadense

Background: Resistance to Fusarium wilt (FW) is of great significance for increasing the yield of Gossypium barbadense . Most published genetic studies on G. barbadense focus on yield and fiber quality traits, while there are few reports on resistance to FW. Results: To understand the genetic basis of cotton resistance to FW, this study used 110 recombinant inbred lines (RILs) of G. barbadense obtained from the parental materials Xinhai 14 and 06-146, and Nannong was used to construct a high-density genetic linkage map. The high-density genetic map was based on the resequencing of 933,845 single-nucleotide polymorphism (SNP) markers, and 3627 bins covering 2483.17 cM were finally obtained. The collinearity matched the physical map. A total of 9 QTLs for FW resistance were identified, each QTL explained 4.27-14.92% of the observed phenotypic variation, and qFW-Dt3-1 was identified in at least two environments. According to gene annotation information from multiple databases, promoter homeopathic elements and transcriptome data, 10 candidate genes were screened in a stable QTL interval. qRT-PCR analysis showed that the GOBAR_DD06292 gene was differentially expressed in the roots of the two parents under FW stress and exhibited the same expression trend in the G. barbadense resource materials. These results indicate of the GOBAR_DD06292 gene in FW resistance in G. barbadense and lay a molecular foundation for the analysis of the molecular mechanism of FW in G. barbadense. the same trend of gene expression and the same relationship with disease resistance in G. barbadense resource materials. It is very likely that when G. barbadense is subjected to FW stress, the significant change in its expression in the roots will enhance resistance to FW in G. barbadense . These results further indicate the importance of the GOBAR_DD06292 gene in resistance against FW in G. barbadense and lay a molecular foundation for the analysis of the molecular mechanism of resistance to FW in G. barbadense .

With the increased yields of various crops, disease resistance is currently one of the most important biological characteristics in the breeding of global crop varieties and has always been an important component of the development of cultivation methods [13][14][15][16]. The demand for Gossypium barbadense has increased dramatically, making it necessary for cotton researchers to cultivate cotton plants that are highly resistant to FW. Therefore, understanding the genetic basis of resistance to FW in G. barbadense will contribute to the efficient breeding of resistant varieties. The control of FW is mainly achieved through the use of chemical fungicides [17]. However, their effects on the soil are often limited by the diversity of pathogens. Therefore, the development and application of disease-resistant varieties is an effective approach for reducing the threat of these pathogens. However, the molecular mechanism of FW infection and the determinants of disease resistance are still largely unclear.
The restriction of genetic recombination events and decreases in genetic diversity increase the difficulty of cross-breeding. With the completion and continuous improvement of cotton genome sequencing, molecular marker-assisted selection (MAS) has been increasingly widely used in cotton hybrid breeding to improve the accuracy and shorten the cycle of breeding [18][19][20][21]. For MAS, a high-density, high-quality genetic map is very important, which lays the foundation for locating QTLs of important agronomic traits and further identifying candidate genes in these QTL regions. With the construction of the first genetic map of allotetraploid cotton, much research has focused on the construction of genetic maps and QTL mapping [22,23]. To date, many genetic maps have been constructed using simple sequence repeat (SSR) markers. The first genetic map of cotton was constructed using restriction fragment length polymorphism (RFLP) markers, and many genetic linkage maps were developed for QTL mapping related to lint yield, the yield composition and fiber quality in cotton [24]. Other traits have also been mapped and identified using F2, BC and recombinant inbred line (RIL) populations of Gossypium hirsutum×Gossypium barbadense [25][26][27][28]. Although the cotton population genetic map has been saturated, many constructed genetic maps are mainly used for QTL mapping within the population of G. hirsutum [29][30][31]. Zhang et al. constructed a multimarker linkage map of G. hirsutum from a three-parent compound population, with a total of 978 SSR markers covering 4184.4 cM and an average marker distance of 4.3 cM [32]. Wang et al. used chromosome fragment introduction lines (CSILs) to transfer fiber quality QTLs to G. hirsutum for the first time without affecting economic traits [33]. However, a low level of SSR marker polymorphism is a bottleneck in constructing a genetic map of the genome. With the rapid development of DNA sequencing technology and the decline in sequencing costs, the development of large numbers of genome-wide single-nucleotide polymorphism (SNP) markers at the cotton genome level has become inexpensive and reliable. Therefore, it has become possible to use SNP markers to construct a high-quality, high-density genetic linkage map (HDGM) in G. barbadense. HDGMs of multiple species have been successfully constructed using resequencing technology [34][35][36][37].
To provide more information for breeding programs, the genetic basis of resistance to FW in cotton needs to be further studied. In this study, we used a recombinant inbred line (RIL) population including 110 individual plants from a cross between G. barbadense Xinhai14 and 06-146. The purpose of this study was to construct a high-density genetic map using SNP markers by resequencing 110 RIL populations. A total of 3627 bin markers were identified in 26 linkage groups, and a genetic map covering a total map distance of 2483.17 cM was finally obtained. A stable QTL for resistance to FW in G. barbadense was identified in various environments, which has also been reported in previous studies. Enrichment analysis combined with RNA-Seq was conducted to analyze the genes in the D03 region, and 10 candidate genes were finally selected. qRT-PCR was then used to evaluate different tissues and the results of different inoculation experiments in the two parents (Xinhai 14 and 06-146). According to a quantitative analysis of expression in different periods, the GOBAR_DD06292 gene showed expression differences in the roots of the two parents under FW stress and showed the same expression trend in the G. barbadense resource materials. These results indicate that the GOBAR_DD06292 gene may play an important role in resistance to FW in G. barbadense, which provides a molecular foundation for the analysis of the molecular mechanism of FW resistance in G. barbadense.

Disease grade in parents and RIL populations
The disease resistance ratings of the two parents and the RIL population were measured in 8 environments. In these 8 environments, the disease grades of the RILs ranged from 0 to 4, with an average of 1.73; the disease grades of the parents Xinhai 14 and 06-146 ranged from 0 to 3 ( Table  1). The disease grade of Xinhai 14 was significantly higher (p <0.01) than that of 06-146. The skewness and kurtosis recorded in each environment showed that the disease grade followed a normal distribution in the RIL population ( Figure 1, Table 1). In addition, compared with the parents Xinhai 14 and 06-146, superparental segregation of the disease grade was observed in the RIL population. An analysis of variance (ANOVA) showed that the generalized heritability estimate (H2) of the disease resistance grade was greater than 90.6%, indicating that the disease grade was highly heritable.

Statistics of DNA-seq data
By resequencing the parents and the 110 RIL population, the sequencing depth for the parent Xinhai 14 was 13×, while that of 06-146 was 16×, and the average sequencing depth for the parents was 14.5×; the coverage of the reference genome was 97%. The progeny sequencing depth was 7×, and the genome coverage was 95% (Table S1). For Xinhai 14, a total of 30.95 Gbp of clean data were obtained, while a total of 38.06 Gbp of clean data were obtained for 06-146. The total data volume for the 110 progeny was 1941.44 Gbp, and the total amount of data was 2010. 45 Gbp. The Q30 value exceeded 90%, and the GC content was greater than 34.72% (Table S2).

Information marked by SNPs
Using the clean reads, a total of 18,446,57 SNPs were obtained in Xinhai 14; the conversion-type category included 1,257,644 SNPs, the number of conversion-type SNPs was 587,013, the number of heterozygous SNPs was 394,393, and the number of pure SNPs was 1,450,264. A total of 1,844,657 SNPs were obtained in 06-146, including 1,257,644 of the conversion type, 587,013 of the transversion type, 483,400 of the heterozygous type, and 1,361,257 of the pure type (Table S3).

Construction of a linkage genetic map
To ensure the quality of the map, the markers that were homozygous and discordant in the parents were filtered out, the depth of the parental markers was less than 10X, and the markers located on a chromosome were removed. The offspring were genotyped in relation to the parents and a high depth of parental sequencing ensured the correctness of the offspring typing results. For the 933,845 filtered SNPs, a window size of 15 SNPs was adopted, with 1 SNP as the step size, for a sliding window scan along the chromosome. When the number of SNP types in the sliding window was greater than or equal to 11, the type was recorded as aa. When the number of SNP types in the sliding window was greater than or equal to 11, the type was recorded as bb. In other cases, ab was used for genotype filling and correction. After completing marker filling and correction, bin division was performed according to the reorganization of the offspring. The samples were arranged neatly according to the physical position of the chromosome. When a typing change was observed in any sample, a recombination breakpoint was considered to have occurred. The SNPs between recombination breakpoints were classified into bins. No recombination event was considered to have occurred in the bin. Finally, the bins were used as the mapped markers for the construction of a genetic map. In total, 3627 bins were finally obtained. The bins were used as a marker for subsequent analysis.
A total of 3627 bins were used to perform genotype analysis in 110 offspring ( Figure 2). Several chromosomes in the genomes of most strains showed no recombination; that is, they were completely derived from a single parental genome. Some strains also exhibited heterozygous chromosomal segments, which may be related to incomplete or incorrect repair after chromosome exchange. The bins were divided into 26 linkage groups based on the available information, the linkage group was used as a unit, the linear arrangement of the markers in the linkage group was obtained via HighMap software analysis, the genetic distance between adjacent markers was estimated, and finally the total genetic map distance was 2483.17 cM ( Table 2). The average number of bins in each linkage group was 138.5, among which D13 contained the fewest bins (53), and A06 contains the most (342 bins). The average distance between 26 linkage groups was 95.51 cM, D13 showed the shortest distance of 41.17 cM, and A01 showed the longest distance of 148.04 cM. The largest gap between A10 and D13 was 19.3 cM. The average proportion of gaps of less than 5 cM in the 26 linkage groups was 96.71%.

Quality assessment of genetic map
The results of collinearity analysis based on the positions of the markers in the genome and the genetic map in the above figure ( Figure S1) showed that the SNP markers in the 26 linkage groups presented higher coverage in the cotton genome. The lowest Spearman correlation coefficient of the 26 linkage groups was 0.80 on chromosome D05, and the highest was 0.99 on chromosome D13, with an average of 0.96. However, with the exception of A12 and D05, showing a coefficient lower than 0.9, the Spearman correlation coefficients of the remaining 24 linkage groups are all higher than 0.9 (Table S4). These results indicated that the order of most of the markers in the linkage group was highly consistent with the sequence of the Nannong reference genome. These findings showed that the quality of the map was reliable.

Mapping of QTLs for resistance to FW in island cotton
According to the results of R/QTL mapping, the composite interval mapping method was employed to locate QTLs for FW resistance in G. barbadense, and such QTLs were detected on 7 chromosomes (A01, A05, A07, A09, D03, D05, D09) in 6 environments ( Table 3). The explained phenotypic variation ranged from 1.037%-19.654%, and the LOD value ranged from 2-5.996. Among the different environments, the LOD values for the 11KC and 12AC environments were not located in the specified interval. qFW-Dt3-1, identified in 11AC, showed the widest range of phenotypic variation.
qFW-Dt3-1 was identified in 11AC and 17AC, and the corresponding chromosomes were the same as those reported by Wang [38]. Therefore, we used qFW-Dt3-1 as a stably inherited QTL for candidate gene mining.

Screening of candidate genes for stable qtl interval
According to the previous mapping results, qFW-Dt3-1 was located on 11AC and 17AC, and the intersecting gene observed in these two environments was selected as the candidate gene for resistance to FW in G. barbadense. In the qFW-Dt3-1 interval, the total number of encoded genes was 166. First, 148 genes were compared and enriched in the NR, SwissProt, GO, COG, and KEGG databases (Table S5). Based on the annotation information, we directly excluded genes that were not related to plant resistance, and we finally selected 37 genes related to protein phosphorylation, carbohydrate metabolism processes, transferase activity, redox processes, phenylpropanoid biosynthesis, plant-pathogen interactions, starch and sucrose metabolism and the MAPK signaling pathway for the next step of the analysis. Soon after examining the transcriptome data of the parents and selecting highly FW-resistant and highly FW-susceptible materials in the 110 RIL population ( Figure S2), it was found that the expression patterns of these 37 genes could be clearly divided into 2 types. There was no obvious difference in the first type of expression. Only the 8 genes of the second category showed significant differences in expression.

qRT-PCR
Adversity-related stress can cause transcriptome reprogramming events. Based on the analysis of the transcriptome expression profile combined with the identified promoter cis-acting elements (Table S6), it was speculated that 10 of the 37 genes (GOBAR_DD02552, GOBAR_DD02562, GOBAR_DD06292, GOBAR_DD07063, GOBAR_DD07508, GOBAR_DD09955, GOBAR_DD23790, GOBAR_DD_34378, GOBAR_DDDD889) may be involved in resistance to FW in G. barbadense. To further study the roles of these candidate genes in FW stress, qRT-PCR was used to detect the transcription levels of these 10 genes in the roots of the two parent seedlings under the conditions of FW stress. Under FW stress, compared with the results obtained at 0 h, genes other than GOBAR_DD07508 exhibited obvious expression differences in the roots (Figure 3). This indicated that the transcription levels of these candidate genes were significantly increased after FW stress. However, only 7 genes (GOBAR_DD02552, GOBAR_DD06292, GOBAR_DD07063, GOBAR_DD09955, GOBAR_DD28836, GOBAR_DD34889, GOBAR_DD37843) showed significant differences in expression during the same period following inoculation in the two parents. In summary, these seven genes may play a role in the response to FW invasion in G. barbadense, rather than the expression of genes related to the growth and development of cotton.  Analyses of the expression of genes in different regions and the different molecular biological functions of the genes were performed. Regardless of whether a plant is under any stress, the roots are the initial sensory plant component. For this reason, we selected the most significant period before and after FW to further verify the tissue-specific expression of these 7 candidate genes in the two parents. Their expression was generally highest in the roots, both before and after stress ( Figure 4). Interestingly, these 7 genes exhibited significant expression changes in different tissues before and after FW stress. Among these genes, three genes GOBAR_DD06292, GOBAR_DD09955 and GOBAR_DD34889, showed significant changes in the roots but showed the opposite expression patterns in the two parents, which was consistent with the previous analysis results. To further verify the functions of these 3 genes after G. barbadense was subjected to FW stress, based on results from our research group, 4 disease-resistant G. barbadense materials and 4 disease-susceptible G. barbadense materials ( Figure 5) were used to study prevalent gene functions. It was found that only GOBAR_DD06292 expression was significantly higher in resistant materials than in susceptible materials. The linear regression equation between the disease resistance grade (X) and expression levels (Y) in 8 materials was as follows: y = 36.9 -15.5 x, R2adj = 0.76. In summary, we speculate that the expression level of GOBAR_DD06292 is positively correlated with resistance to FW of G. barbadense. The results further revealed that the GOBAR_DD06292 gene exhibited a significant difference in expression after the two parents were subjected to FW stress and showed the same trend of gene expression and the same relationship with disease resistance in G. barbadense resource materials. It is very likely that when G. barbadense is subjected to FW stress, the significant change in its expression in the roots will enhance resistance to FW in G. barbadense. These results further indicate the importance of the GOBAR_DD06292 gene in resistance against FW in G. barbadense and lay a molecular foundation for the analysis of the molecular mechanism of resistance to FW in G. barbadense.  [41]. However, the number of SSR markers was not sufficient to construct a genetic map of high-density cotton. Through the use of SNP markers, the marker density in the cotton genetic map has been greatly increased. Jia et al. used 6295 SNP markers and 139 SSR markers to construct a genetic map with a total length of 4,07.98 cM [42]. Zhang et al. used 5521 SNP markers to construct a genetic map covering 3259.37 cM, with an average of 0.78 cM between adjacent markers [43]. Sun et al. used 3978 SNP marker to cover a total genetic distance of 2483.17 cM [44]. Therefore, SNP markers are more effective for constructing genetic maps. Many of the genetic maps constructed for cotton come from G. hirsutum.
In this study, we resequenced 110 G. barbadense RIL populations for the first time and constructed the first high-density genetic map of a G. barbadense RIL population using 933,845 high-quality SNP markers. There were a few chromosome in the genomes of most strains that showed no recombination; that is, they were completely derived from a single parental genome. Some strains also exhibited heterozygous chromosomal segments, which may be related to incomplete or incorrect repair after chromosome exchange. Although the number of RILs in the population was small, the RIL population showed a higher recombination rate and higher position resolution than the F2 population. The population traits were normally distributed, indicating that the population was suitable for constructing a genetic map and identifying QTLS for FW resistance in G. barbadense.
To date, most of the published cotton genetic research has focused on characteristics such as yield and fiber, while few studies have focused on disease. Bolek et al. mapped F2 populations showing high resistance to FW (Pima S-7) and susceptibility to FW (Acala 44) and used SSR to reveal polymorphism the between susceptible and highly resistant parents. A total of 225 SSR marker pairs were screened in a bulk composed of 10 resistant progeny and 10 susceptible progeny. Sixty markers were utilized to break down QTLs, and 11 linkage groups were built using 35 markers covering 531 cM, with a mean distance of 15.17 cM [45]. Wang et al. (2009) used molecular mapping technology to identify a gene for FW resistance, which was closely linked to the SSR marker "JESPR304-", and designated the gene FWR. Simultaneously, 4 FW resistance QTL were analyzed:, A7, D3, D9 (c23) and D1 (c15) [38]. In this study, a total of 8 environmental phenotypic values were used to locate 8 QTLs for resistance to FW in G. barbadense. Combining the functional annotations of genes within stable QTLs and orthologous genes in Arabidopsis with transcriptome data is an effective method for mining candidate genes. The candidate genes were further verified via transcriptome analysis combined with qRT-PCR. Finally, the GOBAR_DD06292 gene was found to be differentially expressed after the two parents were stressed by FW invasion and showed the same expression trend in G. barbadense resource materials. We found that the homologous gene of GOBAR_DD06292 in Arabidopsis thaliana was AT3G10300, and the homology was 80.50. SA, JA and ET treatments applied to Arabidopsis plants induced the expression of this gene. These results further indicated the importance of the GOBAR_DD06292 gene in resistance to FW in G. barbadense and lay a molecular foundation for the analysis of the molecular mechanism of resistance to FW in G. barbadense.

Plant materials
In this study, single-grain F2 plants were obtained by crossing the FW-susceptible G. barbadense material Xinhai 14 with the FW-resistant material 06-146. The obtained plants were then propagated to the 7th generation to form an RIL population containing 110 lines.

Planting method and character inspection
The two parents and 110 RIL lines were grown in 8 environments at 3 locations. The experiments were carried out at the experimental base of the Xinjiang Academy of Agricultural Sciences in the 16th Regiment of the First Agricultural Division of Alar City, Xinjiang, and the experimental site of the Xinjiang Korla Academy of Agricultural Sciences from 2011 to 2012. The experiment aimed at disease resistance identification was conducted in the Sixteenth Regiment of the First Agricultural Division of Alar City over 3 consecutive years from 2016 to 2018. The planting plot length was 3 m, the row spacing was 0.35 m, and the plant distance was 0.10 m. IN each setting, three repeats were conducted, and field management was routine. In 2018, a duplicate control was set up in the cotton cultivation room of Xinjiang Agricultural University to conduct an indoor identification experiment. The tested cotton seeds identified in the laboratory were broadcast live in a nutrient bowl, and the nutrient bowl was filled with vermiculite and sterilized nutrient soil at 1:2, 10 holes per bowl, 2 seeds per hole, and 3 groups of repeated controls were set up. When the seedlings grew to the 2-true-leaf stage, the redundant seedlings were removed. When the cotyledons of the seedlings were fully expanded, the seedlings were inoculated via the root-cutting method. The noninoculated material was used as the CK control, and disease was observed at the 2-leaf and 1-heart stages.
In all environments, the foliar disease area was used as an index to classify the disease. Five normally growing plants were randomly selected in each replication zone for the leaf surface disease area test. SPSS 23.0 was used to perform statistical analysis of the RIL population, including an ANOVA of disease value or the relative disease index in different environments [46]. The R software "qtl" package was used for QTL mapping [47].

DNA library preparation
In July 2017, the young leaves of two pairs of the parents and the 110 RIL populations were collected, and the young leaves were stored in a refrigerator at −70°C. Using the miniprep method, genomic DNA was extracted from the two parents and each individual RIL.

Development of filtering quality control and labeling of sequencing data
The original sequences (raw reads) obtained by sequencing were filtered to obtain clean reads.
The main data filtering quality control steps were as follows: remove adapter sequences; if the proportion of N bases in a read (the specific base type cannot be determined) is greater than 10%, then filter out paired-end reads; remove low-quality reads (in which the proportion of bases with a Q quality value of less than or equal to 10 accounts for more than 50% of the entire read). The positions of the clean reads in the reference genome of G. barbadense Nannong were compared using Burrows-Wheeler Alignment (BWA) software, the sequencing depth and genome coverage of each sample were calculated, and mutation detection was performed [48]. According to the results of clean read localization in the reference genome, Picard's Mark Duplicate tool was used to remove duplications and shield the impact of PCR duplications, and GATK was used for indel realignment (i.e., to locally recompare of the sites located near the indel comparison results) [49].
The error in the comparative results caused by indels was corrected to ensure the accuracy of the detected SNPs. When using GATK for variant calling, strict filtering was performed with the SNP: SNP cluster filter (if there were 2 SNPs within 5 bp, then the sequence was filtered out). Thus, the final SNP site set was produced.

Construction and quality assessment of the genetic map
The linkage map of the RIL population was constructed using the G. barbadense Nannong reference genome database. HighMap software was used to correct genotyping errors and sort sequences along chromosomes [50]. In addition, SMOOTH was used to correct errors based on parental genotype contributions, and the k-nearest neighbor algorithm was used to correct incorrect genotypes [51]. The multipoint maximum likelihood (MML) method was used to add the following distortion markers to the linkage map. Map distance was calculated with the Kosambi mapping function. The chi-square test was used to identify separation distortion marks. As mentioned in previous studies, the separation of distortion markers (SDMs) in the HDGM was acceptable [43]. The SNP markers in the linkage map were aligned with the genome using the local BLAST method.
To ensure the quality of the map, markers that were homozygous and showed inconsistency in relation to the parents were selected. The marker depth for the parents was at least 10X; markers that were not located on chromosomes were removed. The offspring were genotyped in relation to the parents, where the high depth of parental sequencing ensured the correctness of the offspring typing. Each sample was neatly arranged according to the physical location of the chromosome. When there was a typing change in any sample, a recombination breakpoint was considered to exist. A SNP between recombination breakpoints was classified as a bin. No recombination events were considered to occur within a bin. Finally, the bins were used as markers for mapping to construct a genetic map. Bins with a length of less than 10 kb were screened out. The polymorphic markers with severe partial separation (chi-square test P＜0.01) were filtered.

QTL positioning
The phenotypic values recorded in 8 different environments were analyzed with the R-3.62 software package "qtl". A strict LOD threshold was calculated through permutation experiments.
The parameters for QTL positioning were set as follows: calculation time of 1000, I type error p value of 0.05, PIN of 0.001, and mapping step length of 1.0 cm. If the same QTL was identified in two or more environments where the LOD threshold was greater than 2, it was considered a significant QTL. The QTL confidence interval (95%) was set as the location distance interval corresponding to a 1 LOD drop on both sides of the peak. When the confidence intervals overlapped, the QTLs detected in two or more environments were considered stable.

Screening of candidate genes
To obtain potential candidate genes, BLAST software was used to extract gene sequences within stable QTLs, and the genes in the associated regions were compared with the NR, SwissProt, GO, COG, and KEGG databases [52]. To reveal the general expression pattern of the candidate genes, the transcriptome sequencing data from the parents (Xinhai14×06-146) were used as a reference. The roots and young leaves of the two parents were sampled at the three-true leaf-stage, and the expression levels of candidate genes were detected by qRT-PCR.

qRT-PCR
According to the cDNA information for the candidate genes, Primer 5.0 software was used to design primers in a specific regions at the 5' end or 3' end of the gene sequence. Root tissue cDNA was used as a template, and the expression of candidate genes was detected by qRT-PCR. Each sample was set to 3 replicates, and the internal reference gene was GbUBQ7. A total RNA extraction kit (Tiangen, China) was used to extract RNA,. and 5×All-In-One Mastermix (Abm, Canada) was used for reverse transcription. Real-time PCR amplification was performed on an ABI 7500 Fast system. A BrightGreen 2×Qpcr Mastermix (Abm, Canada) kit was used according to the provided method, and the total amplification volume was 20 μL. The reaction program was as follows: predenaturation at 94°C for 30 s, followed by 40 cycles of denaturation at 95°C for 5 s, annealing at 57°C for 5 s, and extension at 72°C for 34 s. The results were used for relative quantitative analysis via the 2-ΔΔCt method [53].