Marker redundancy
A total of 1,083 markers were successfully located, accounting for 32.8% of all markers; the other markers could not be located, possibly because of deficiencies in the reference genomes used to design the markers or because the primer sequences were too short. Among these 1,083 markers, 20 redundant markers were found in this study (Table 2). Previous studies also used physical locations to identify redundancy among tobacco SSR markers. A total of 478 (239 pairs) redundant primers were found (Tong et al. 2016), which is roughly the same as the result of this study. Of these 478 markers, 70 were successfully located in this study, and 4 pairs were also identified as redundant.
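Identifying redundancy from physical locations can be illustrated with a minimal sketch. It assumes that two markers are redundant when they map to exactly the same chromosome and coordinates; the exact criterion used in this study is not stated here, and the marker names, coordinates and the function `find_redundant_markers` are hypothetical.

```python
from collections import defaultdict


def find_redundant_markers(locations):
    """Group markers that share an identical (chromosome, start, end) location.

    `locations` maps a marker name to a (chromosome, start, end) tuple.
    Treating identical coordinates as redundancy is an assumption made
    for illustration only.
    """
    by_position = defaultdict(list)
    for marker, position in locations.items():
        by_position[position].append(marker)
    # Keep only locations hit by more than one marker.
    return [sorted(names) for names in by_position.values() if len(names) > 1]


# Hypothetical marker names and coordinates, not data from this study.
example_locations = {
    "SSR_A": ("Chr1", 1000, 1180),
    "SSR_B": ("Chr1", 1000, 1180),
    "SSR_C": ("Chr2", 5200, 5390),
    "SSR_D": ("Chr1", 9000, 9150),
}

redundant_groups = find_redundant_markers(example_locations)
print(redundant_groups)
```

In practice a tolerance on the coordinates, or a comparison of overlapping amplicons, may be more appropriate than exact equality.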
Map Construction
In this study, an integrated high-density (IHD) genetic linkage map that was significantly better than other SSR maps was constructed. Of the 3,354 markers used to construct the IHD map, 1,083 were judged to have been aligned successfully. Among these 1,083 markers, 824 corresponded to chromosomes and linkage groups (Table S3), while the other 259 did not, possibly because of the high similarity between the S subgenome and the T subgenome.
Table 3
Comparison of published tobacco linkage maps
Marker type | Population type | Cross population | Number (loci) | Length (cM) | References |
AFLP | DH | W6 × Michinoku1 | 106 | 383 | (Nishi et al. 2003) |
RFLP/RAPD | F2 | N. plumbaginifolia × N. longiflora | 171 | 1062 | (Lin et al. 2001) |
SNP | BC4F3 | (Y3 × K326) × Y3 | 4895 | 2885.36 | (Tong et al. 2021) |
SNP | F7:8−9 | Yunyan 85 × Dabaijin599 | 7734 | 2689.06 | (Gong et al. 2020) |
SNP | | Honghuadajinyuan × RBST | 4138 | 1944.7 | (Xiao et al. 2015) |
SNP | | Honghuadajinyuan × RBST | 2162 | 2700.9 | (Xiao et al. 2015) |
SNP | BC1F1 | (TT8 × NC82) × TT8 | 13273 | 3421.8 | (Cheng et al. 2019) |
SNP/SSR | F2 | Yunyan 85 × Dabaijin 599 | 4409 | 2662.4 | (Gong et al. 2016) |
SSR | | Beinhart-1000 × Hicks | 206 | 990.8 | (Vontimitta and Lewis 2012) |
SSR | F2 | Red Russian × Hicks Broad Leaf | 2363 | 3269.89 | (Bindler et al. 2011) |
SSR | DH | Honghua Dajinyuan × Hicks Broad Leaf | 611 | 1882.3 | (Tong et al. 2012) |
SSR | BC1 | (Y3 × K326) × Y3 | 626 | 1120.45 | (Tong et al. 2016) |
SSR | BC1F1 | (Y3 × Beinhart1000-1) × Y3 | 562 | 1341.18 | (Tong et al. 2018) |
SSR | F2 | Hicks Broad Leaf × Red Russian | 282 | 1920 | (Bindler et al. 2007) |
SSR | F2:3 | Yanyan 97 × Honghuadajinyuan | 201 | 2326.7 | (Lan et al. 2014) |
SSR | RIL | Florida301 × Hicks Broad Leaf | 119 | 1176.5 | (Xiao et al. 2013) |
SSR | | | 3377 | 2489.82 | This study |
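The SSR maps in Table 3 can be put on a common scale by dividing map length by the number of loci. The sketch below does this with the values transcribed from the table; dividing total length by total loci is a simplification chosen for illustration (it ignores the number of linkage groups) and is not necessarily the measure used by the authors.

```python
# (reference, number of loci, map length in cM) for the SSR maps in Table 3.
ssr_maps = [
    ("Vontimitta and Lewis 2012", 206, 990.8),
    ("Bindler et al. 2011", 2363, 3269.89),
    ("Tong et al. 2012", 611, 1882.3),
    ("Tong et al. 2016", 626, 1120.45),
    ("Tong et al. 2018", 562, 1341.18),
    ("Bindler et al. 2007", 282, 1920.0),
    ("Lan et al. 2014", 201, 2326.7),
    ("Xiao et al. 2013", 119, 1176.5),
    ("This study", 3377, 2489.82),
]


def average_spacing(loci, length_cm):
    """Crude average spacing in cM per locus (simplifying assumption)."""
    return length_cm / loci


spacing = {ref: average_spacing(loci, length) for ref, loci, length in ssr_maps}

# List the maps from densest to sparsest.
for ref, value in sorted(spacing.items(), key=lambda item: item[1]):
    print(f"{ref}: {value:.2f} cM per locus")
```

By this rough measure the map from this study has about 0.74 cM per locus, compared with about 1.38 cM per locus for the next densest SSR map (Bindler et al. 2011).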
Although the IHD map was significantly better than other SSR-based maps (Table 3), it still has some defects. Only the HR map had a relatively large number of markers, and the bridge markers were unevenly distributed among the maps, for example in LG5, LG15, LG16 and LG20 of the YK map and LG1 and LG5 of the HH map. The bridge markers in these linkage groups were insufficient for map integration, so some markers could not be integrated. Moreover, there were low-density regions at the ends of LG2, LG8, LG9, LG13, LG14, LG15 and LG16. Increasing the number of markers in these regions could further improve the quality of the IHD map (Fig. 3).
In addition, the marker types in tobacco genetic maps are limited, and few maps contain multiple marker types, so tobacco still lags behind other Solanaceae plants in this respect. One of the few examples is the N. plumbaginifolia × N. longiflora map (Lin et al. 2001), which contains both RFLP and RAPD markers. There are also some SNP maps with high density and excellent quality, such as the Yunyan 85 × Dabaijin599 map (Gong et al. 2020) and the (TT8 × NC82) × TT8 map (Cheng et al. 2019), which have 7,734 and 13,273 markers, respectively. It is worth noting that the map developed by Gong et al. (2016) consists of both SNP and SSR markers, including 196 SSR markers. However, these maps share too few SSR bridge markers with the four individual maps used in this study to be included in map integration. Furthermore, because of the narrow genetic diversity of tobacco, marker polymorphism is very low, and fewer than one-fifth of the tobacco SSR markers have been placed on genetic maps. Overall, there is still considerable room for improving tobacco genetic maps. How to increase the number and types of markers to improve map quality is an urgent problem to be solved.
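Low-density regions such as those noted at the ends of several linkage groups can be located by scanning for large intervals between adjacent markers. The following is a minimal sketch under stated assumptions: the function `find_sparse_intervals`, the 10 cM threshold and the marker positions are all hypothetical and are not taken from this study.

```python
def find_sparse_intervals(positions_cm, max_gap_cm=10.0):
    """Return (left, right) position pairs whose spacing exceeds max_gap_cm.

    `positions_cm` holds the marker positions (in cM) of one linkage group.
    The default threshold is an arbitrary illustrative value.
    """
    ordered = sorted(positions_cm)
    return [
        (left, right)
        for left, right in zip(ordered, ordered[1:])
        if right - left > max_gap_cm
    ]


# Hypothetical linkage group with sparse regions at both ends.
example_positions = [0.0, 12.5, 13.1, 14.0, 15.2, 16.0, 30.4]

sparse = find_sparse_intervals(example_positions)
print(sparse)
```

Intervals flagged in this way would be natural targets for adding markers.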
Collinearity Between Individual Maps, Integrated Linkage Map And Physical Map
It is worth noting that the integrated map and the individual maps show a degree of collinearity, especially for the HH, YB and YK maps. The marker order on these maps agreed well with the marker order in most linkage groups of the IHD map (S6 Fig), except for LG1 and LG22 of the HH map and LG6, LG15 and LG23 of the YK map, which differed significantly (Fig. 4b). However, in comparison with the HR map, only 9 linkage groups of the IHD map (LG2, LG6, LG8, LG11, LG12, LG14, LG19, LG20 and LG21) showed good collinearity, while the other linkage groups were quite different, especially LG1, LG10 and LG17 (Fig. 4a). This may be because the parents used to construct the HR map were relatively distantly related to the parents of the other maps. The parents of the HH, YB and YK maps were flue-cured tobacco, whereas Red Russian, used in the HR map, was a special tobacco accession.
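Agreement in marker order between two maps can be summarized numerically, for example with Spearman's rank correlation over the markers they share. This is an illustrative technique, not necessarily how collinearity was assessed in this study; the sketch assumes no tied positions, and the function names and marker positions are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical positions of five shared markers on two maps:
# map A in cM, map B in bp, with two adjacent markers swapped in map B.
map_a_positions = [0.0, 3.2, 7.5, 11.0, 18.4]
map_b_positions = [120000, 900000, 450000, 1500000, 2300000]

rho = spearman(map_a_positions, map_b_positions)
print(rho)
```

A value close to 1 indicates that the shared markers appear in nearly the same order on both maps, while lower or negative values point to rearranged or inverted segments.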
The order of the markers on the physical map and the IHD map was roughly the same (S7 Fig), although there were some misalignments. Only a few chromosomes showed large differences, such as Chr2, Chr9, Chr12, Chr13, Chr17 and Chr21 (Fig. 5a). Interestingly, Chr2, Chr9, Chr12, Chr13 and Chr17 of the YB and YK maps and Chr21 of the HR map also differed markedly from the physical map (Fig. 5b). Collinearity differences between the genetic map and the physical map may reflect errors either in the construction of the consensus map or in the genome assembly. The collinearity between the individual maps and the physical map is shown in S8 Fig. The tobacco reference genome published in 2017 was generated by NGS and covered over 90% of the predicted genome size, but large gaps remained. In contrast, allotetraploid cotton has a better reference genome: the genome published in 2018 was assembled using single-molecule real-time sequencing technology (PacBio RSII) and covered 98.94% of all sequences (Xiao et al. 2013; Wang et al. 2019). In diploid soybean, the reference genome constructed with PacBio RSII covered 99.65% of the whole genome sequence (Yi et al. 2022).