The cytoplasmic male sterility (CMS) system plays an important role in the utilization of crop heterosis. CMS is a maternally inherited trait characterized by degenerate anthers, aborted pollen, and carpelloid or petaloid stamens [1]. Current research has established that the CMS phenotype is caused by mutations in mitochondrial genes and can be reversed by fertility restorer (Rf) genes in the nuclear genome [2–4]. Because the CMS system avoids the removal of anthers (emasculation), it enables hybrid technology to generate markedly superior F1 progenies. These offspring show significant advantages over their parents and over existing popular cultivars in yield, stress tolerance, adaptability, and other traits [5]. The CMS phenomenon has been found in more than 150 plant species and is used for hybrid breeding of crops such as maize [6, 7], rice [8, 9], pepper [10], and sorghum [11].
Cotton (Gossypium hirsutum L.) is a vital source of fiber and oil and the most important economic crop for the textile industry worldwide. The challenge of low yield in cotton can be mitigated with a CMS-based hybrid breeding system. To date, cotton CMS lines with Harknessii (D2-2) cytoplasm (CMS-D2) [12, 13], Trilobum (D8) cytoplasm (CMS-D8) [14], and upland cotton cytoplasm (104-7A, Xiangyuan A, Jin A) [15] have been established and utilized. Because these CMS lines derive from different cytoplasmic sources, their restorer genes, and hence their fertility restoration mechanisms, are expected to differ. The restorer gene Rf1 can restore the fertility of both CMS-D2 and CMS-D8 sterile lines, whereas Rf2 can restore only CMS-D8 sterile lines [16]. Furthermore, Rf1 acts sporophytically, whereas Rf2 restores fertility gametophytically. Previous studies revealed that the Rf1 and Rf2 loci are not allelic but are tightly linked, 0.93 cM apart on chromosome D05. Considerable progress has been made in mapping Rf1 and identifying molecular markers linked to it. For example, Liu et al. [17] identified two RAPD and three SSR markers closely linked to Rf1, and Feng et al. [18] developed four STS markers associated with Rf1. Yin et al. [19] not only identified five new SSR and two new STS markers for Rf1 but also constructed high-resolution genetic and physical maps of 15 markers covering a genetic distance of 0.9 cM. In addition, Wu et al. [20, 21] identified the new SSR marker BNL3535 and developed four InDel markers by whole-genome resequencing. Zhao et al. [22] used super-BSA to map Rf1 to a 1.35 Mb region of chromosome D05.
With advances in crop functional genomics, Rf genes have been cloned in maize (Rf2) [23], petunia (Rf-PPR592) [24], radish (Rfo) [25, 26], rice (Rf1a, Rf1b, Rf2) [27–30], sorghum (Rf1) [31], and sugar beet (Rf1) [32]. Most of these genes encode pentatricopeptide repeat (PPR) proteins, but Rf2 in maize CMS-T, Rf17 in rice CMS-CW, and Rf2 in rice CMS-LD encode an aldehyde dehydrogenase, a 178-amino-acid mitochondrial sorting protein, and a mitochondrial glycine-rich protein, respectively [23, 33, 34]. At present, the major bottlenecks of the cotton CMS breeding system are the narrow source of restorer genes and the lack of elite restorer lines compatible with a given sterile line. No restorer gene has yet been cloned in cotton. Fine mapping and isolation of the restorer gene Rf2 in upland cotton are therefore needed for efficient breeding.
Bulked segregant analysis (BSA) makes it possible to quickly locate molecular markers closely linked to a target gene by comparing single nucleotide polymorphism (SNP) and insertion/deletion (InDel) variation between pools of individuals with contrasting phenotypes drawn from a segregating population [35]. This method has already been used for gene mapping in Arabidopsis thaliana [36], rice [37–39], maize [40], and tomato [41]. SNPs and InDels are the most abundant types of DNA sequence polymorphism in the genome of every species [42, 43] and are used in QTL analysis. These markers have been widely used in cultivar identification, genetic map construction, genetic diversity analysis, map-based cloning, detection of genotype/phenotype associations, and marker-assisted breeding [44–46]. In recent years, the release of upland cotton genome sequences [47–49] and the rapid development of sequencing technology have facilitated the detection and application of SNPs and InDels. Furthermore, high-throughput genotyping methods make SNPs highly attractive genetic markers [50, 51].
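The pool comparison at the heart of BSA can be illustrated with the SNP-index statistic often used in BSA-seq: at each SNP, the fraction of reads carrying the alternative allele is computed in each bulk, and the difference between bulks (the ΔSNP-index) is averaged in genomic windows. Windows where the difference departs strongly from 0 are candidates for linkage with the target gene, while unlinked regions stay near 0. This is a minimal sketch, not the pipeline used in this study; it assumes the alternative allele corresponds to the restorer-parent allele, and the class and function names (`SnpSite`, `delta_snp_index`, `candidate_windows`), chromosome labels, read counts, window size, and threshold are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SnpSite:
    """Read counts at one SNP in the fertile and sterile bulks (hypothetical data)."""
    chrom: str
    pos: int
    fertile_ref: int  # reads with the reference allele in the fertile bulk
    fertile_alt: int  # reads with the alternative (assumed restorer-parent) allele
    sterile_ref: int
    sterile_alt: int


def snp_index(ref: int, alt: int) -> float:
    """Fraction of reads carrying the alternative allele at one site."""
    depth = ref + alt
    if depth == 0:
        raise ValueError("site has zero read depth")
    return alt / depth


def delta_snp_index(site: SnpSite) -> float:
    """SNP-index of the fertile bulk minus SNP-index of the sterile bulk."""
    return (snp_index(site.fertile_ref, site.fertile_alt)
            - snp_index(site.sterile_ref, site.sterile_alt))


def window_means(sites, window_bp: int) -> dict:
    """Mean delta SNP-index in fixed, non-overlapping windows on each chromosome."""
    bins = {}
    for s in sites:
        key = (s.chrom, s.pos // window_bp)  # (chromosome, window number)
        bins.setdefault(key, []).append(delta_snp_index(s))
    return {key: mean(values) for key, values in sorted(bins.items())}


def candidate_windows(sites, window_bp: int, threshold: float) -> list:
    """Windows whose mean |delta SNP-index| reaches the threshold."""
    return [key for key, value in window_means(sites, window_bp).items()
            if abs(value) >= threshold]


# Invented read counts for four SNPs; only the first two differ between bulks.
example_sites = [
    SnpSite("chr1", 1_200_000, 10, 10, 20, 0),
    SnpSite("chr1", 1_450_000, 9, 11, 19, 1),
    SnpSite("chr1", 5_300_000, 10, 10, 11, 9),
    SnpSite("chr2", 800_000, 12, 8, 10, 10),
]

if __name__ == "__main__":
    print(candidate_windows(example_sites, window_bp=1_000_000, threshold=0.3))
```

With these invented counts only the window containing the first two SNPs (`("chr1", 1)`) passes the threshold. Real BSA-seq analyses typically add read-depth filters, sliding rather than fixed windows, and significance thresholds derived from simulation; those steps are omitted here for brevity.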
The objectives of this study were to physically map the restorer gene Rf2 and to develop InDel markers that co-segregate with Rf2. A 1.88 Mb candidate interval was obtained by combining BSA with high-throughput SNP genotyping in a segregating BC1F1 population. InDel markers were then developed from InDel variation within the 1.88 Mb interval and used to narrow the candidate region to 1.48 Mb. PPR family genes in the candidate region, together with genes selected from transcriptome data, were analysed by qRT-PCR. The InDel markers co-segregating with Rf2 will be useful for tracing Rf2 when breeding restorer lines in cotton.
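A marker co-segregates with Rf2 when, in every informative individual, its genotype predicts the fertility phenotype; individuals where the two disagree are recombinants, and markers flanking the gene accumulate recombinants with distance. The Python sketch below counts recombinants per marker and converts the recombination fraction to centimorgans using the approximation that 1% recombination corresponds to about 1 cM, which holds only for short distances. It assumes a backcross-type population in which fertile plants are expected to carry the restorer-parent allele at linked markers; the genotype coding (`"H"`, `"A"`, `None`), the marker names, positions, and individuals are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical genotype coding:
#   "H"  = heterozygous, carries the restorer-parent allele
#   "A"  = homozygous for the sterile-parent allele
#   None = missing call
Call = Optional[str]


def count_recombinants(calls: List[Call], fertile: List[bool]) -> Tuple[int, int]:
    """Return (recombinants, informative individuals) for one marker.

    An individual is recombinant when its marker genotype disagrees with its
    phenotype: fertile but "A", or sterile but "H". Missing calls are skipped.
    """
    if len(calls) != len(fertile):
        raise ValueError("genotype and phenotype lists differ in length")
    recombinants = informative = 0
    for call, is_fertile in zip(calls, fertile):
        if call is None:
            continue
        informative += 1
        if (is_fertile and call == "A") or (not is_fertile and call == "H"):
            recombinants += 1
    return recombinants, informative


def approx_cm(recombinants: int, informative: int) -> float:
    """Map distance in cM, assuming 1% recombination ~ 1 cM (short distances only)."""
    if informative == 0:
        raise ValueError("no informative individuals")
    return 100.0 * recombinants / informative


def cosegregating(markers: Dict[str, Tuple[int, List[Call]]],
                  fertile: List[bool]) -> List[str]:
    """Names of markers with zero recombinants, ordered by physical position."""
    hits = []
    for name, (pos, calls) in markers.items():
        recombinants, _ = count_recombinants(calls, fertile)
        if recombinants == 0:
            hits.append((pos, name))
    return [name for _, name in sorted(hits)]


# Invented phenotypes (True = fertile) and marker calls for eight individuals.
fertile = [True, True, True, True, False, False, False, False]
markers = {
    "M1": (100_000, ["H", "H", "A", "H", "A", "A", "H", "A"]),
    "M2": (900_000, ["H", "H", "H", "H", "A", "A", "A", "A"]),
    "M3": (1_700_000, ["H", "H", "H", "H", "A", None, "A", "A"]),
    "M4": (2_500_000, ["H", "A", "H", "H", "A", "A", "A", "A"]),
}

if __name__ == "__main__":
    for name, (pos, calls) in sorted(markers.items(), key=lambda kv: kv[1][0]):
        rec, n = count_recombinants(calls, fertile)
        print(f"{name} @ {pos:,} bp: {rec}/{n} recombinants, ~{approx_cm(rec, n):.1f} cM")
    print("co-segregating:", cosegregating(markers, fertile))
```

In this toy data M2 and M3 show no recombinants, while M1 and M4 flank them, so the candidate interval would be bounded by M1 and M4. Zero recombinants in a finite population only bounds the distance to the gene; it does not show that the marker lies within it, which is why the candidate genes in the interval still need functional analysis.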