Phenotyping
To explore sources of resistance to CBB present among cotton germplasm, the cotton diversity panel (Tyagi et al. 2014) was challenged with Xcm (race 18). Bacteria were infiltrated into cotyledons and resistance was observed as dead or necrotic tissue surrounding the site of inoculation indicating the hypersensitive response, whereas susceptibility was characterized by water-soaked lesions (Fig. 1). Out of 253 accessions screened, 61 (24.11%) accessions showed a resistant CBB response and 192 (75.89%) accessions showed a susceptible CBB response (Supplementary Table 1, Fig. 2). Among the resistant varieties, Arkot 8102 was selected for in depth characterization. A consistent hypersensitive resistance response was observed in Arkot 8102 infiltrated with various races of Xcm (races 1, 2, 3, 12, and 18), suggesting Arkot 8102 carries broad-spectrum resistance gene(s) to CBB.
Arkot 8102 was crossed with Acala Maxxa (susceptible) and advanced to the F6 generation to form a recombinant inbred line (RIL) population. Screening of 104 RILs at the seedling stage with Xcm yielded 54 resistant (51.92%) and 50 susceptible (48.08%) lines. This 1:1 ratio of phenotypic segregation (Test statistic = 0.15, p = 0.69), suggests that CBB resistance in accession Arkot 8102 is controlled by a single locus with a major phenotypic effect.
Genome-wide Association Studies
Genotyping of 380 G. hirsutum accessions with the Cotton SNP 63K array resulted in a total of 18,124 high-quality polymorphic SNPs. The number of polymorphic SNPs in the sub genomes was uneven; the D-genome had 11,004 polymorphic SNPs, whereas in the current study, only 7,120 polymorphic SNPs were identified for the A-genome. The 18,124 SNPs covered a total length of 2,213.6 Mb (99.42% of G. hirsutum cv. TM1 CRI v1 reference assembly). These polymorphic SNPs were evenly distributed on all 26 chromosomes (Supplementary Fig. 1a), with an average SNP density of one SNP per 163.70 Kb. The number of polymorphic markers in the A-genome was comparatively lower than in the D-genome. The SNP density in the A-genome was lower (one SNP per 245.98 Kb) compared to the D-genome (one SNP per 81.42 Kb) (Supplementary Table 2). The estimation of linkage disequilibrium (LD) decay indicated the existence of large linkage blocks (average size 3.02 Mb) in the genome when compared to self-pollinated crops (Andrade et al. 2019; Panahabadi et al. 2021; Würchum et al. 2021) (Supplementary Fig. 1b). The existence of large LD blocks might be attributed to less reshuffling of genomic segments among the accessions.
Among the 380 accessions in this study, the three CBB response phenotypes (resistant, susceptible and unknown) were distributed across all the diverse groups identified by the NJ phylogeny tree (Fig. 2), indicating that population structure would not pose problems for the downstream analysis. MLM analysis with kinship and population structure as covariates identified fifty-eight markers above the Bonferroni threshold p-value (Table 1; Fig. 3; Fig. 4). All the fifty-eight associations were identified in a 2.01 Mb region of the long arm of chromosome D02 (543,166 to 2,553,694 bp). The SNP markers i25755Gh and i46775Gh had the highest -Log10p-value of 19.29 and explained the highest PVE of 28.48% (Table 1, Supplementary Table 3).
Table 1
Significant marker trait associations for Cotton bacterial blight (CBB) resistance identified through genome wide association studies (GWAS) in the diversity panel consisting of 253 G. hirsutum accessions.
SNP | Chromosome | Position (base pairs) | Reference allele | Alternate allele | Minor allele frequency | -Log10 p-values | PVE % |
i04845Gh | D02 | 543166 | C | A | 0.18 | 10.75 | 16.46 |
i31037Gh | D02 | 545043 | T | G | 0.20 | 8.44 | 12.92 |
i15279Gh | D02 | 559284 | C | T | 0.18 | 10.75 | 16.46 |
i15281Gh | D02 | 562823 | C | A | 0.18 | 10.75 | 16.46 |
i34589Gh | D02 | 624606 | T | C | 0.24 | 5.63 | 8.46 |
i33474Gh | D02 | 755697 | T | C | 0.20 | 13.45 | 20.44 |
i21644Gh | D02 | 796513 | G | A | 0.20 | 13.45 | 20.44 |
i04874Gh | D02 | 827194 | A | G | 0.20 | 14.58 | 22.05 |
i35403Gh | D02 | 870024 | T | C | 0.23 | 11.57 | 17.69 |
i22212Gh | D02 | 896017 | T | C | 0.20 | 14.58 | 22.05 |
i18845Gh | D02 | 949231 | T | C | 0.23 | 18.12 | 26.92 |
i04876Gh | D02 | 951167 | C | A | 0.23 | 18.12 | 26.92 |
i18846Gh | D02 | 951367 | G | A | 0.23 | 18.12 | 26.92 |
i69508Gl | D02 | 957010 | A | G | 0.23 | 14.66 | 22.17 |
i04878Gh | D02 | 970930 | T | C | 0.21 | 17.90 | 26.63 |
i04879Gh | D02 | 971612 | G | A | 0.21 | 17.90 | 26.63 |
i04880Gh | D02 | 973912 | G | A | 0.21 | 17.90 | 26.63 |
i15296Gh | D02 | 981029 | T | C | 0.21 | 17.90 | 26.63 |
i15297Gh | D02 | 981804 | C | T | 0.21 | 17.90 | 26.63 |
i15298Gh | D02 | 982155 | C | T | 0.21 | 17.90 | 26.63 |
i04882Gh | D02 | 983729 | G | T | 0.21 | 17.90 | 26.63 |
i04883Gh | D02 | 989410 | T | C | 0.21 | 17.90 | 26.63 |
i15307Gh | D02 | 992868 | A | G | 0.25 | 7.11 | 10.83 |
i18847Gh | D02 | 993950 | T | C | 0.24 | 7.04 | 10.72 |
i04886Gh | D02 | 1000201 | C | T | 0.21 | 15.67 | 23.59 |
i30611Gh | D02 | 1001262 | A | G | 0.21 | 17.90 | 26.63 |
i35404Gh | D02 | 1011058 | A | C | 0.25 | 6.53 | 9.91 |
i04890Gh | D02 | 1037321 | G | A | 0.21 | 17.90 | 26.63 |
i25755Gh | D02 | 1147586 | T | C | 0.21 | 19.29 | 28.48 |
i46775Gh | D02 | 1151037 | T | C | 0.21 | 19.29 | 28.48 |
i04909Gh | D02 | 1420750 | T | C | 0.21 | 15.62 | 23.52 |
i40135Gh | D02 | 1443517 | A | G | 0.23 | 14.77 | 22.33 |
i04915Gh | D02 | 1470075 | A | G | 0.23 | 5.77 | 8.70 |
i04916Gh | D02 | 1471015 | T | C | 0.23 | 6.53 | 9.91 |
i42410Gh | D02 | 1489885 | A | G | 0.23 | 5.77 | 8.70 |
i04917Gh | D02 | 1533931 | A | C | 0.23 | 16.82 | 25.17 |
i15329Gh | D02 | 1534113 | G | A | 0.29 | 11.29 | 17.28 |
i51275Gb | D02 | 1534368 | A | G | 0.29 | 11.29 | 17.28 |
i15330Gh | D02 | 1538965 | G | A | 0.22 | 16.82 | 25.17 |
i04918Gh | D02 | 1540924 | C | A | 0.22 | 16.82 | 25.17 |
i15331Gh | D02 | 1541147 | A | C | 0.22 | 13.13 | 19.98 |
i14029Gh | D02 | 1548603 | A | G | 0.22 | 16.82 | 25.17 |
i00270Gh | D02 | 1967963 | T | C | 0.21 | 7.49 | 11.44 |
i04929Gh | D02 | 1979246 | T | C | 0.23 | 7.62 | 11.63 |
i30271Gh | D02 | 2103773 | A | G | 0.23 | 8.34 | 12.77 |
i00697Gh | D02 | 2113080 | C | T | 0.23 | 9.87 | 15.12 |
i35889Gh | D02 | 2173210 | T | C | 0.21 | 9.70 | 14.86 |
i22184Gh | D02 | 2177973 | C | A | 0.23 | 8.34 | 12.77 |
i21826Gh | D02 | 2208558 | T | C | 0.23 | 8.34 | 12.77 |
i00293Gh | D02 | 2231591 | G | A | 0.21 | 9.70 | 14.86 |
i04940Gh | D02 | 2289925 | C | T | 0.20 | 5.88 | 8.86 |
i04941Gh | D02 | 2294468 | A | G | 0.21 | 8.21 | 12.56 |
i48135Gh | D02 | 2296995 | A | C | 0.21 | 8.21 | 12.56 |
i47804Gh | D02 | 2333681 | A | G | 0.22 | 6.71 | 10.19 |
i47803Gh | D02 | 2414378 | A | G | 0.22 | 6.40 | 9.71 |
i39157Gh | D02 | 2471169 | A | G | 0.21 | 7.42 | 11.32 |
i04944Gh | D02 | 2507620 | G | A | 0.25 | 7.42 | 11.32 |
i40777Gh | D02 | 2553694 | A | G | 0.26 | 7.62 | 11.65 |
PVE % = Phenotypic variance explained by the SNP markers expressed in percentage. |
Linkage mapping
In addition to the GWAS in the diversity panel, we performed linkage mapping to further resolve the broad-spectrum resistance to bacterial leaf blight in accession Arkot 8102. SNP marker analyses of the genotypic data in the RILs showed that out of 63,058 SNP markers, 5,714 (9.06%) markers were polymorphic between parents Acala Maxxa and Arkot 8102. A total of 2,745 non-redundant SNPs were used for the construction of the linkage map. Mapping of these polymorphic markers at the LOD score 9.0 threshold yielded a total of 105 linkage groups. Ninety-two of these groups, consisting of 2,630 SNP markers, were merged into 26 linkage groups. Thirteen groups consisting of 104 nonredundant SNP markers and eleven problematic SNPs could not be assigned. The 26 linkage groups spanned 4,942.62 cM with a marker density of 1.88 cM (Supplementary Table 4). A-genome spanned 2,508.73 cM with 1,250 markers and D-genome spanned 2,433.90 cM with 1,380 SNP markers. Chromosome D08 had the highest marker density of 1.14 cM per marker while Chromosome D11 had the lowest marker density of 4.18 cM. A total of 289 SNP markers (10.99%) showed segregation distortion in the mapped markers. Higher distorted SNP markers were observed in the D-genome (166 SNPs) compared to the A-genome (123 SNPs). Ten SDRs were identified in the A-genome compared to thirteen SDRs in the D-genome (Supplementary Table 5). The pairwise recombination fractions and LOD score plot of the linkage groups showed the correct grouping of the markers. Further, there were no signals of relatedness of markers between the linkage groups (Supplementary Fig. 2). However, there were few linkage blocks (Yellow squares) detected in the linkage groups (Supplementary Fig. 2).
The comparison of linkage map with sequence-based physical map showed high collinearity between the physical map with a genetic map indicating correct ordering of the markers and there are no major chromosome rearrangements in the mapping populations used in comparison to the sequence-based physical map (Supplementary Fig. 3; Supplementary Table 6). The linkage groups covered a total of 2.02 Gb (91.10%) of the physical map (Supplementary Table 7). In general, the marker coverage of the individual chromosomes was high. All the chromosomes showed a high marker coverage of greater than 90% up to 99.41% except A09, A10, A12, D05, D11, and D13. Chromosomes A09 and D05 were poorly covered with 22.18% and 60.31%, respectively (Supplementary Table 7).
Linkage mapping showed CBB phenotype segregated with the group-19 of the JoinMap grouping tree at LOD 9.0. Group-19 contained 93 markers (inclusive of redundant markers), Current study assigned group-19 as the long arm of chromosome D02 based on physical position and anchor marker (BNL2882) information (Shan et al. 2016). The removal of redundant markers and iteration-based ordering of the markers resulted in the refined ordering of SNPs on the long arm of D02. The resultant genetic map of the D02 long arm contained 31 SNP markers with a total map length of 8.43 cM and a marker density of 0.76 cM (Fig. 5). Six redundant markers were also included in the i04900Gh flanking locus distal to telomere (i15323Gh, i04897Gh, i04898Gh, i14028Gh, i04904Gh, i04905Gh, i04907Gh) of CBB resistance (Fig. 5). The CBB resistance gene (BB-13) was mapped at 0.47 cM to SNP i04890Gh proximal to telomere and 0.46 cM to i04900Gh SNP marker distal to telomere (Fig. 5). All the SNPs genetically mapped on the long arm of chromosome D02 were collinear with their genomic positions in the physical map (Fig. 5). These mapped SNP markers corresponded to a genomic region of 1,442,843 bp between 133,045 bp to 1,575,888 bp on chromosome D02. This genomic region accounted for two percent of the total length of chromosome D02 (71,136,581 bp) and contained thirty putative gene sequences (Supplementary Table 8).
Haplotype analysis and candidate gene identification
The combined mapping through GWAS and linkage analysis was used to improve the resolution of the mapping and genomic targeting of the CBB resistance locus. The fifty-eight MTAs detected through GWAS were spread out in the 2.01 Mb region (543,166 to 2,553,694 bp) (Table 1). The CBB resistance locus (BB-13) in linkage analysis was mapped to a region between 1,037,321 bp to 1,408,478 bp (Fig. 5). From the combined analyses the overlapping target genomic region for the CBB resistance was 371 Kb on D02 (Fig. 4). In this targeted genomic region, there were seven haplotypes in the linkage block (Fig. 4b). The haplotype “ACCGCCTTAAAG'' was found in fifty-three of the resistant lines. The haplotypes “GCCATTCCGGGT” and “GATGTTTTAAAG” were found in four resistant lines each. The fifty-three accessions having the same haplotype suggest a major resistant gene common to these accessions. The accessions with the haplotype “GCCATTCCGGGT” and “GATGTTTTAAAG” likely contain additional or different gene(s) for CBB resistance. Cotton reference genome scan in the orthologous genomic region between SNP markers i04890Gh and i04907Gh (1,037,321 to 1,408,478 bp) closely linked to CBB resistance locus (BB-13) identified a total of thirty putative genes (Supplementary Table 8). Out of these, nine putative genes were reported to be involved in disease resistance mechanisms in plants (Table 2). The enrichment analysis of the identified thirty putative genes indicated that the cysteine-type peptidase activity and proteolysis activity are enriched (Supplementary Table 9). Further, the top two significant MTAs in the study i25755Gh (1,147,586 bp; p = 19.29) and i46775Gh (1,151,037 bp; p = 19.29) are the flanking markers (out of four markers that are proximal towards telomere) of the CBB resistance locus BB-13 in the linkage study. The nearest putative genes to marker i25755Gh & i46775Gh are viz., NAC DOMAIN-CONTAINING PROTEIN 14 (1,004,477 to 1,037,718 bp), DELLA PROTEIN GAI (1,053,601 to 1,054,329 bp), CYSTEINE-TYPE PEPTIDASE genes ULP1B (1,121,200 to 1,123,019 bp) and ESD4 (1,134,403 to 1,135,511 bp).
Table 2
Putative candidate genes for CBB resistance in the targeted genomic region identified using combined genome wide association studies (GWAS) and linkage analyses in the Upland cotton.
Gene ID | Gene Name | Description | Start | End | Domains | Reference |
---|
Gh_D02G012100 | RGA4 | Putative disease resistance protein RGA4 | 980685 | 983807 | NB-ARC & LRR | Song et al. 1995; Zhou et al. 1995; Zhou et al 2001; Feuillet et al. 2003 |
Gh_D02G012200 | At4g27190 | Disease resistance protein At4g27190 | 987105 | 994162 | NB-ARC & LRR | Song et al. 1995; Zhou et al. 1995; Zhou et al 2001; Feuillet et al. 2003 |
Gh_D02G012500 | NAC014 | NAC domain-containing protein 14 | 1004477 | 1037718 | NAM | Kaneda et al. 2009; Park et al. 2017 |
Gh_D02G012800 | GAI | DELLA protein GAI | 1053601 | 1054329 | GRAS domain family | De Vleesschauwer et al. 2016; Hou et al. 2010; Wild et al. 2012 |
Gh_D02G013400 | ULP1B | Putative ubiquitin-like-specific protease 1B | 1121200 | 1123019 | Ulp1 protease family & C-terminal catalytic domain | Üstün and Börnke 2014; Gilroy et al. 2007; Shabab et al. 2008; Song et al. 2009 |
Gh_D02G013500 | ESD4 | Ubiquitin-like-specific protease ESD4 | 1134403 | 1135511 | Ulp1 protease family & C-terminal catalytic domain | Üstün and Börnke 2014; Gilroy et al. 2007; Shabab et al. 2008; Song et al. 2009 |
Gh_D02G013900 | GAI | DELLA protein GAI | 1205161 | 1208450 | GRAS domain family | De Vleesschauwer et al. 2016; Hou et al. 2010; Wild et al. 2012 |
Gh_D02G014100 | RPL38 | 60S ribosomal protein L38 | 1233364 | 1235198 | Ribosomal L38e protein family | Nagaraj et al. 2015 |
Gh_D02G014400 | Ulp-1 | Sentrin-specific protease | 1308195 | 1309121 | Ulp1 protease family & C-terminal catalytic domain | Üstün and Börnke 2014; Gilroy et al. 2007; Shabab et al. 2008; Song et al. 2009 |
Gh_D02G014600 | ESD4 | Ubiquitin-like-specific protease ESD4 | 1322758 | 1324872 | Ulp1 protease family & C-terminal catalytic domain | Üstün and Börnke 2014; Gilroy et al. 2007; Shabab et al. 2008; Song et al. 2009 |
Gh_D02G015100 | CRT3 | Calreticulin-3 | 1367646 | 1373635 | Calreticulin protein family | Li et al. 2009; Matsukawa et al. 2013 |
The BB-13 locus may not be completely syntenic between the resistant varieties and the reference genome variety (TM1 CR1 v1) (Yang et al. 2019). Considering the larger genomic region (2.01 Mb) identified by GWAS only analyses, there were nineteen putative R-genes, of which four were NBS-LRR genes, six LRR repeat proteins and nine were WALL ASSOCIATED KINASES. The closest R-genes identified adjacent to the overlapping region were At4g27190 (987,105 to 994,162 bp) and RGA4 (980,685 to 983,807 bp) located 46 to 55 Kb to flanking SNPs i25755Gh and i46775Gh (Table 2).