Linkage Map Construction
The first map was developed from the F2:3 population between G. klotzschianum and G. davidsonii, a total of 728 polymorphic markers were used. The total map length was 1 480.23 cM, with an average marker interval of 2.182 cM (Kirungu et al. 2018). This map was designated as map A. The second map, designated as map B, was derived by genotyping the F2:3 population developed between G. thurberi and G. trilobum, and 849 polymorphic markers were used in the linkage map construction. The map size was 1 012.46 cM with an average marker distance of 1.193 cM. In both maps, it was observed that chromosome number two also annotated as D502 had the least map size of 82.908 cM and 28.665 cM in map A and map B, respectively. Interestingly in both the maps, chromosome D502 had a smaller map size but with the highest percentage of SD (Table 1). Similar results have been observed in other linkage maps in cotton (Yu et al. 2011; Li et al. 2016).
The consensus map was constructed by merging two data sets from the two genetic maps. A total of 1 492 markers, were mapped onto the 13 linkage groups encompassing the 13 chromosomes, and only 85 markers remained unlinked. The diploid cotton species has 13 chromosomes, while the tetraploid cotton species has 52 chromosomes (Mendoza et al. 2013; Magwanga et al. 2018a). This work was based on the diploid cotton species of the D genome. The consensus map size was 1 467.445 cM with an average marker distance of 1.037cM. Even though the map size was relatively smaller than map A, the marker interval was low, which improved the precision of the consensus map. From the consensus map, we observed that Chromosome D502 had the highest percentage of SD with 58.621%, followed by Chromosome D507 with 47.887%. Chromosome D501 had the highest number of markers with 143 markers, while Chromosome D502 had the least number of markers of 58 (Table 1). Most of the markers mapped on the consensus map were found to be contributed by map B rather than map A. A total of 797 markers from map B were mapped on the consensus map accounting for 53.41% while only 695 markers (46.58%) were from map A. The chromosome with the highest number of markers was Chromosome D501 with 143 markers while the chromosome with the least number of markers was Chromosome D502 with only 58 markers (Fig. 1)
Segregation Distortion (SD) Analysis
In map A, out of the 728 markers mapped, 159 markers were distorted accounting for 22.2 %, and the highest SD was observed in Chromosome D502 with 76.087 % followed by Chromosome D507 with 40.698 %. The SDRs were located on Chromosome D502, D505, D507, and D508. Chromosome D502 had the largest SDR, while Chromosome D507 had the highest number of SDR.
It was observed that the alleles in the SDR region were skewed towards a particular parental line, like in Chromosome D502 towards the female parent (G. klotzschianum ), and Chromosome D507 towards the heterozygosity(Kirungu et al. 2018). In the second genetic map B, there was a slightly lower number of distorted markers, with only 135 accounting for 15.783%, and the highest segregation distortions were observed in Chromosome D502 and Chromosome D507 with 42.857 % and 38.333%, respectively (Table 1). Chromosomes that had the SDRs were D501, D502, D506, D507, D509, D510, and D511. Moreover, the largest SDR was located on Chromosome D502, while Chromosome D507 had the highest number of SDR.
In the consensus map, the highest SDs were located on Chromosome D502 and D507, with distortion percentages of 58.621% and 47.887%, respectively. Similarly, the two chromosomes had the largest SDRs, as shown in Fig. 2. The largest SDR was located on Chromosome D502-2 and was skewed toward the female parents while SDR located on Chromosome D502-1 was skewed towards the heterozygous. Chromosome D507 had the highest number of SDRs with a total of five SDRs, and all the SDR were skewed towards the heterozygotes except for the SDR located on Chromosome D507-1, which was skewed towards the female parents. The majority of the SDRs were skewed towards the heterozygotes; similar results were observed in the analysis of SDRs in tetraploid cotton, more specifically on the chromosome 18 (Dai et al. 2017), rice (Wu et al. 2010), and wheat (Fans et al. 1998). Based on the individual maps, the SDs were skewed towards the female compared to the male parent, the results obtained were in agreement with the study conducted on an interspecific F2 population in which the segregated distorted markers were skewed towards the female parent Li et al. 2007.
Annotation of Genes at SDR
We conducted a blast search, and a total of 6, 038 genes were mined within the SDR region in Chromosome D502 and Chromosome D507 (Supplementary Table S2). The proportions of the genes between the two chromosomes were 2 308 genes in Chromosome D502 and 3 730 genes in Chromosome D507. These genes were further grouped according to their domain, in which a total of 1 117 domains were obtained. There are 622 domains which were shared between Chromosome D502 and Chromosome D507; the largest domain was the PF00069 (Pkinase; Protein kinase domain) with a total of 188 genes, followed by PF13855 (LRR_8; Leucine-rich repeat) with 132 genes and the third was PF07714 (Pkinase_Tyr; Protein tyrosine kinase) with a total of 108 genes. The genes in the three main domains were highly correlated with abiotic stress responsiveness. The genes located within the largest 12 domains were analyzed. Out of the 12 domains 9 domains were found to contain 'members of the resistant genes (R group of genes), these domains include Protein kinase domain; LRR_8; Leucine-rich repeat; Protein tyrosine kinase domain; NB-ARC domain; LRRNT_2; Leucine-rich repeat N-terminal domain; Pentatricopeptide repeat (PPR); Pentatricopeptide repeat (PPR_2) repeat family; Cytochromes P450 (CYPs); Myb-like DNA-binding domain and RNA recognition motif (RRM, RBD, or RNP domain) (Table 2 and Table 3).
Analysis of the Physiochemical Properties and Structures of the Genes Obtained from the Dominant Domain the(Protein kinase domain) Mined within the SDR in Chromosome D502 and Chromosome D507
The dominant domain was the Protein kinase domain (PF00069). It has been widely studied; for instance, it was found to be the dominant domain in the analysis of the genes conserved between the two upland cotton, G. hirsutum and its wild relative G. tomentosum (Magwanga et al. 2018b). We, therefore, explored the genes which belonged to this domain. The physiochemical properties of these genes showed significant variations; the molecular weight ranged between 10.351 kDa and 134.232 kDa, the charge was between -27 and 40; Isoelectric point (pl) values were between 4.375 and 10.382; the GRAVY values ranged between -0.721 and 0 251 while their protein lengths ranged between 611 aa and 12 310 aa (Supplementary Table S3). The GRAVY values were all below zero, indicating that these genes were mainly hydrophilic. the Protein kinase domain contained 28 different subfamilies. The subfamily with the highest number of genes was Probable types with a total of 64 genes, which included members such as the Probable inactive receptor kinase (4 genes); Probable leucine-rich repeat receptor-like serine (21), Probable L-type lectin-domain containing receptor kinase (3 genes); Probable receptor-like protein kinase (25 genes) among others (Supplementary Table S4).
The two most abundant subfamilies, the probable protein types and the Serine/threonine-protein kinase were further analyzed, by looking into their classification based on the phylogenetic tree analysis. The genes were found to be grouped into five clades, with clade 2 being the majority (Fig. 3). The most interesting concept is that the members within clade 3 had a percentage bootstrap similarity value of 100%. The majority of these genes have previously been found to be highly correlated to biotic stress tolerance; for instance, 11 genes such as Gorai.007G335000, Gorai.002G039900, Gorai.002G040100, Gorai.002G041100, Gorai.002G041200, Gorai.002G041800, Gorai.002G042100, Gorai.002G047500, Gorai.002G047900, Gorai.007G182500 and Gorai.007G334900 are homologous to an Arabidopsis gene, At5g39020, which has a functional role in leaf senescence during viral infection in Arabidopsis (Espinoza et al. 2007). Moreover, the remaining genes were homologous to an Arabidopsis gene, At1g67000, which plays a more significant role in salt stress pathways. It was also was found to be highly upregulated in the roots under salt stress conditions (Ma et al. 2006).
Cis-regulatory Elements Analyses of the Major Two Subfamilies: The Probable Protein Types and the Serine/Threonine-Protein Kinase
We examined the two major subfamilies to determine if there could be any of the regulatory elements related to either abiotic or biotic stress factors. Cis-regulatory elements are known to enhance the functions of the genes (Tümpel et al. 2006). In the analysis of the cis-elements, all the genes were found to be associated with either abiotic or biotic stress-responsive cis-regulatory elements; for instance, ARFAT with the sequence “TGTCTC” was found to be associated with 87 genes which function as ABA and auxin responsiveness. ABA is a plant phytohormone that is vital for plants' response towards stress (Trivedi et al. 2016). Other cis-regulatory elements predicted were CBFHV with a role in dehydration-responsive element / cold acclimation, DRECRTCOREAT functioning as activators that function in drought-, high-salt- and cold-responsive gene, lastly ABRELATERD1 with a function in early responsive to dehydration, AGMOTIFNTMYB2 induced by various stress such as wounding or elicitor treatment among others (Fig. 4 and Supplementary Table S5). The cis-regulatory elements detected such as ABRE have previously been found to associate with top-ranked plant stress-responsive transcription factors such as the NAC, MYB (Nakashima et al. 2009).
miRNA Prediction for the Major Two Subfamilies; The probable protein types and the Serine/threonine-protein kinase
In the prediction analysis of the miRNA targeting the various genes obtained for the two major subfamilies, a total of 287 miRNAs were found to target 91 genes (Supplementary Table 5). The high miRNA targets detected for these genes showed that the genes obtained from the SDR on chromosome D502 and chromosome D507 have a significant role in various biological processes within the plant. The highest level of miRNA target was observed for the following genes: Gorai.002G039900 (6 miRNAs), Gorai.002G041100 (9 miRNAs), Gorai.002G114100 (9 miRNAs), Gorai.002G133000 (7 miRNAs), Gorai.002G134400 (8 miRNAs), Gorai.007G244000 (9 miRNAs), Gorai.007G271300 (10 miRNAs) among the rest. The miRNA's targets were observed to be very high, with a single gene being targeted by a minimum of two to a maximum of 10 miRNAs. Some of the miRNAs detected were gra-miR172a and gra-miR172b all found to target Gorai.007G059900 which is a member of the serine/threonine-protein kinase. The SAPK2 mined within the SDR located on chromosome D507 has been found to have a function in fiber development in cotton (Abdurakhmonov et al. 2008). Moreover, miR398 has been extensively studied and found to have a role in enhancing abiotic stress tolerance in plants; for instance, gr-miR398 was found to be upregulated in plants exposed to water deficit conditions, and thus found to be responsible for enhancing tolerance towards oxidative stress, water deficit, salt stress, abscisic acid stress, ultraviolet stress, copper and phosphate deficiency, high sucrose and bacterial infection (Jia et al. 2009; Lu et al. 2010; Pashkovskii et al. 2010). The same miRNA was found to target Gorai.007G335000 a member of the probable receptor-like protein kinase mined within the SDR on chromosome D507.
GO Annotation of the Major Two Subfamilies; The probable protein types and the Serine/threonine-protein kinase of the Dominant Gene Domains
In the analysis of the GO terms, a total of 188 genes were found to have GO terms, in which a high number of genes were found to be involved in biological process (BP), with functions such as regulation of the biological process, response to stimulus, single-organism process, metabolic process, and cellular process, in relation to cellular component (CC), four major functions were detected, namely the cell, cell part, membrane part, and membrane while in molecular functions (MF), and only two functions were observed, binding and catalytic activity (Fig. 5). Some unique observations were made in some of the genes found within the SDR regions; for instance, Gorai.002G14960 (BRASSINOSTEROID INSENSITIVE 1-like) was found to have 20 GO functions, with 3 cellular component functions, namely endosome (C: GO:0005768), plasma membrane (C: GO:0005886) and integral component of membrane (C: GO:0016021). Five molecular functions were; protein serine/threonine kinase activity (F: GO:0004674), steroid binding (F: GO:0005496), ATP binding (F: GO:0005524), protein homodimerization activity (F: GO:0042803), and protein heterodimerization activity (F: GO:0046982). A very high number of biological processes were observed microtubule bundle formation (P: GO:0001578), protein phosphorylation (P: GO:0006468), skotomorphogenesis (P: GO:0009647), detection of brassinosteroid stimulus (P: GO:0009729), brassinosteroid mediated signaling pathway (P: GO:0009742), positive regulation of flower development (P: GO:0009911), response to UV-B (P: GO:0010224), pollen exine formation (P: GO:0010584), leaf development (P: GO:0048366), anthers wall tapetum cell differentiation (P: GO:0048657), negative regulation of cell death (P: GO:0060548) and regulation of seedling development (P: GO:1900140). Other genes harbored a range of GO functions from three to 10 different functions (Fig. 6 and Supplementary Table S6).
RNA Sequence Data Analysis Profiled under Abiotic Stress Conditions and in Different Fiber Developmental Stages
By the fact that the two major subfamilies were found to be targeted by stress-specific miRNAs and even found to be associated with some known cis-regulatory elements, we undertook to investigate if the genes would have any varying expression under drought, salt and even different stages of fiber development. Genes were then obtained from the Denovo sequenced data. The raw data for the RNA sequencing were transformed into log 2 and used in the construction of the heat map. The RNA expression analysis showed that the genes were categorized into three groups, with group 1 members exhibiting higher expression analysis at different fiber development stages (Fig. 7). The majority of the highly upregulated genes were obtained from the SDR regions in chromosome D507, such as Gorai.007G283900 (Serine/threonine-protein kinase Nek2), Gorai.007G186000 (Probable inactive receptor kinase At1g48480), Gorai.007G053000 (Serine/threonine-protein kinase SRK2I), Gorai.007G285300 (Serine/threonine-protein kinase WNK1), Gorai.007G235600 (Genome polyprotein), Gorai.007G247600 (Serine/threonine-protein kinase ppk15) and Gorai.007G308900. It is interesting to note that the gene which was highly upregulated in various stages of fiber development, was also found to be targeted by gr-miR164a, and the same miRNA has been found to target the NAC transcription factor family (Xie et al. 2000). Moreover, mutant Arabidopsis lacking ath-miR164c was found to exhibit a slight defect in carpel fusion (Baker et al. 2005). In addition, miR164a,b,c has been found to have a regulatory role in the expression of CUP-SHAPED COTYLEDON1 (CUC1) and CUC2, which encode key transcriptional regulators involved in organ boundary specification (Huang et al. 2012). These previous findings show that the gene found to be targeted by miR164a/b/c could be playing an essential role in fiber development.
Under abiotic stress conditions, genes exhibited differential expression, with group 3 members exhibiting significantly higher expression under salt, cold and drought stress conditions. Some of the genes which were highly expressed include Gorai.007G167300 (Probable serine/threonine-protein kinase WNK11), Gorai.007G247600 (Serine/threonine-protein kinase ppk15), Gorai.007G186000 (Probable inactive receptor kinase At1g48480), Gorai.002G102000 (Serine/threonine-protein kinase D6PKL2), Gorai.002G115600 (Serine/threonine-protein kinase CDL1), Gorai.007G295100 (Serine/threonine-protein kinase CDL1), Gorai.007G157300 (Serine/threonine-protein kinase MHK), Gorai.007G287200 (Probable serine/threonine-protein kinase At1g54610), Gorai.007G322800 (Probable serine/threonine-protein kinase At1g09600), Gorai.007G078700 (Probable receptor-like protein kinase At5g15080) and Gorai.007G020100 (Serine/threonine-protein kinase fray2). Among the highly expressed genes, Gorai.007G167300 was targeted by gra-miR398. Gorai.007G247600 was found to be targeted by gra-miR5207; miR398 is the first plant miRNA reported miRNA to be down-regulated by oxidative stresses. It has been intensively studied and found to be important in the regulatory process of copper homeostasis, in response to abiotic stresses such as heavy metals toxicity, sucrose, and heat, in addition to having a role in biotic stresses through the down-regulation of the expression of Cu/Zn-superoxide dismutase (CSD) (Sunkar 2006; Lu et al. 2010; Pashkovskii et al. 2010). The result shows that the SDR regions could be vital in the evolution of some of the significant genes required for the survival of the plants.
RT-qPCR Validation of the Selected Genes within the SDR Regions of Chromosome D502 and D507 under Drought and Salt Stress Conditions
Thirty genes were profiled on the leaf tissues of the four parental lines under drought and salt stress conditions. The genes exhibited three types of expressions across the four parental lines; however, more genes were found to be highly upregulated in the leaves of G. klotzschianum and G. thurberi compared with G. davidsonii and G. trilobum (Fig. 8A-D). The results obtained were in agreement to previous findings which have shown that G. thurberi is more tolerant to both biotic stress conditions, more so to Verticillium dahliae which is a fungal pathogen causing Verticillium wilt, a terminal disease to various crops (Dong et al. 2019). Moreover, in the study carried by Cai et al. (Cai et al. 2019) revealed that G. thurberi was highly tolerant to cold stress compared with G.trilobum. Furthermore, Kirungu et al. (Kirungu et al. 2018) found that G. klotzschianum harbored more beneficial traits compared with G. davidsonii.