Genome Wide Identification, Classification and Functional Characterization of IQD Gene Family in Diploid and Tetraploid Cotton (Gossypium)


 AimsCalmodulin (CaM) is considered as the most significant Ca+ signaling messenger that mediate various biochemical and physiological reactions. Whereas the calcium level can by provoked by extracellular or intracellular stimuli. CaM is activated by calcium binding proteins (CaMBPs) to regulate physiology of the cell. IQ domain (IQD) proteins are plant specific CML/CaM calcium binding which are characterized by domains of 67 amino acids. MethodsHerein, we accomplished a genome wide comparison and analysis of IQD gene family with in two diploid and tetraploid cotton species. Conserved motifs, gene structure, chromosomal map, evolutionary tree, Ka/Ks distribution, promotor analysis, venn diagram for orthologues genes, and collinearity analysis within genome and between genomes of IQD gene family in four cotton species were performed first time. Expression analysis was performed in four species of cotton from SRA database of NCBI.Results293 nonredundant IQD genes were identified from G. arboreum (A2), G. raimondii (D5), G. barbadense (AD2) and G. hirsutum (AD1) and clustered in the reference genomes from cotton functional genomics database. 50, 50, 94, and 99 IQD genes were detected from G. arboreum, G. raimondii, G. barbadense and G. hirsutum respectively. IQD gene family in four species of cotton elucidated the role of allopolyploid and hybridization during evolutionary cascade of allotetraploid cotton. Comparatively, existence of more orthologous genes in cotton species than Arabidopsis, advocated that polyploidization produced new cotton specific orthologous gene clusters. Duplication of gene events depicts that IQD gene family of cotton evolution was under strong purifying selection. G. hirsutum exhibited high level synteny. Expression analysis revealed that GarIQD18, GarIQD25 and GarIQD50 exhibited up regulated expression trend in ovule and fiber in G. arboreum. GarIQD25 also exhibited high expression in stem, root and flower. In G. raimondii, GraIQD03 demonstrated upregulation expression across stem, ovule, fiber and seed. GbaIQD11 and GbaIQD62 exhibited upregulation fiber development in G. barbadense. GhiIQD69 recognized as main candidate genes for plant parts, floral tissues, fiber and ovule development. Promotor analysis identified cis-regulatory elements which are involved in plant defense and stress response mechanisms. ConclusionsOverwhelmingly, present study paves the way to better understand the evolution of cotton IQD genes and lays a foundation for future investigation of IQD genes in improving abiotic stress tolerance in cotton.


Introduction
Calcium signaling is considered as a messenger that mediates diverse developmental processes and also involved in response to abiotic and biotic stresses (Reddy et al. 2011). Cellular Ca 2+ signaling utilize their function by spatiotemporal speci city with change in concentration of Ca 2+ , which could be persuaded by intra-cellular stimuli i.e., pathogenic factors, plant hormones or intra-cellular stimuli like alkali stress, salt, drought and light Yuan et al. 2017). Cytoplasmic Ca 2+ level exhibited ephemeral escalation or uctuations in regional distribution or their concentration gradient (Yuan et al. 2017). Ca 2+ receptors (Calcium binding proteins) can decode or detected Ca 2+ ions, then convert the signals into downstream events which also comprising phosphorylation or dephosphorylation of target proteins (Salveson et al. 2019;Sebastià et al. 2004). Calcium receptors proteins in higher plants could be divided into four groups. Helix loop helix fold motif (EF hand motif) was observed in calcium receptors that behave as calcium binding domains (Wei et al. 2019).
Moreover, EF hand motifs were exhibited by calcium dependent protein kinas (CDPK), calcineurin B like protein (CBL), CaM like protein (CML) and calmodulin (CaM) . CaM is extensively found in eukaryotic cells and CaM regulated signal transduction pathways responded the main pathway in plants Zhang et al. 2014). CaM is well stable and conserved which contain N and C terminal with two spherical domains and connected with α helix motif. EF hand motif is found in every spherical motif that is calcium ion binding domain. Four Ca 2+ ions can bind with one CaM molecule. Hence changes the CaM conformation to Ca/CaM complex played a key role in target protein and CaM interaction by revealing negatively charged hydrophobic region in spherical surface (Bähler and Rhoads 2002). Apo-calmodulin (ApoCaM) is a CaM without Ca 2+ ions, transmit signals under low calcium levels by combining calcium independent CaMBP (Feldkamp et al. 2011;Nakanishi et al. 2017). Illustration of analysis and identi cation of CaMBPs is important to elucidate the detailed signal transduction of CaM and other diverse biochemical and physiological reactions (Zhou et al. 2018). For the modulation of cellular physiology, CaM transmit signals by interrelating with CaM binding proteins because no enzymatic activity demonstrated by CaM (DeFalco et al. 2010). CaMBPs played a very essential role in CaM and Ca 2+ as well as target proteins. Hence, CaMBPs could be classi ed into two categories i.e., Ca 2+ independent and Ca 2+ dependent (Reddy et al. 2011).
Initially discovered Ca 2+ independent CaM binding motif was IQ motif. IQ motif containing protein in plants are IQ67 domain containing IQD protein family, IQM protein family, cyclic nucleotide gated channel (CNGC) protein family, calmodulin binding transcription activator (CAMTA) protein family and myosin protein family (Abel et al. 2013). Family members of IQD gene, CML/CaM binding plant speci c proteins (CaMBPs) were rstly identi ed in rice and Arabidopsis . IQ67 domain is characterized by comprising residues of 67 amino acids. Moreover, IQ motif have very unique and speci c repetitive arrangements for example Ca 2+ dependent CaM recruitment motifs showed 1-8-14 and 1-5-10 arrangements . Previously, IQD gene families were analyzed in Solanum lycopersicum   ) and functions of limited IQD genes were reported. SUN gene in IQD family of tomato reported for organ shape by interrelating with microtubules (Abel et al. 2013). AtIQD5 involved in the process of cell morphogenesis through calcium signaling in Arabidopsis thaliana (Liang et al. 2018). SUN24 in IQD gene family in tomato controls germination of seed via abscisic acid signaling pathway (Bi et al. 2018). AtIQD1 gene found in microtubules which interrelate with KLCRQ (kinesin light chain related 1) protein to expedite the cellular transportation . PtIQD gene family exhibited tissue speci c pattern and also found to be involved in methyl jasmonate and drought stresses . Similarly, GmIQD gene family also expressed speci cally and also regulated methyl jasmonate stress (Feng et al. 2014).
Cotton is one of the most important cash crops around the world, because it provides edible oil for food use as well as raw material for textile industry. Hence, production of ber along with its quality is a prerequisite in cotton. Whereas the production process is signi cantly affected by biotic and abiotic stresses. Salt stress has been found to reduce ber production of cotton 20% and aphid infestation also reduce ber production up to 30% in China Shaban et al. 2018). IQD family members are also reported to be involved in plant stress signal transduction. Nevertheless, IQD gene family yet not described in cotton species. In present study, initially we identi ed 293 Gossypium IQD genes and evaluated their location of chromosomes, phylogenetic relationships, gene duplication, prediction of physiochemical properties of proteins and expression patterns of oral parts, plant organs, ber, ovule, and under various stresses. These preliminary ndings of Gossypium IQD genes pave the way for further research on the biochemical and physiological characterization of IQD protein in cotton.

Materials And Methods
Identi cation and characterization of cotton IQD genes. The genome assembly sequences of G. arboreum (CRI), G. raimondii (JGI), G. barbadense (ZJU) and G. hirsutum (ZJU) were downloaded from cotton functional genomics database http://cottonfgd.org/. The targeted candidate genes were explored with known IQD protein sequence of Arabidopsis were taken as reference query by using program BLASTP (Matsuda et al. 2013) with 1e -10 as E value and after that, hidden markov model (HMM) were used to identify all possible IQD sequences with HMMER software (Finn et al. 2011). Moreover, search Pfam tools (http://pfam. xfam.org/search) and NCBI CD search (http://www.ncbi.nlm.nih.gov/cdd/) were used for the con rmation of candidate genes sequences.
Biophysical characteristics of IQD proteins were analyzed with the help of an online tool ExPASy (http://www.expasy.org/). The subcellular locations of proteins were calculated using the prediction tool at Bologna Uni ed Subcellular Component Annotator http://busca.biocomp.unibo.it.
Intron/Exon structure and multiple sequence alignment analysis of IQD genes. The IQD genes sequences of 4 species of cotton were aligned with the help of an online tool of Clustal Omega with default settings (https://www.ebi.ac.uk/Tools/msa/clustalo/) (Sievers et al. 2011). Conserved motifs of IQD proteins were identi ed by MEME version 5.1.1 (Bailey et al. 2015), search was performed with 25 to 200bp for motif width and at least 5 motif sites with 12 motifs count. Identi ed 12 motifs then searched in protein database in MAST program (Bailey and Gribskov 2000). Coding regions and gnomonic sequences of cotton were assessed to construct gene structure. Furthermore, intron distribution pattern and intron splicing phase were determined by aligned cDNA sequences. Intron/Exon structure of IQD genes was elucidated by tbtools.
Phylogenetic analysis of IQD genes. A total of 326 IQD proteins comprised of 50 IQD genes of G. arboreum, 50 of G. raimondii, 94 of G. barbadense, 99 of G. hirsutum and 33 genes of Arabidopsis were used to construct a phylogenetic tree by using MEGA 7.0 by neighbor joining method (Kumar et al. 2016). Bootstrap calculation of tree nodes with 1000 replicates was performed for statistical reliability (Felsenstein 1985). The evolutionary distances were computed using the number of differences method (Nei and Kumar 2000) and are in the units of the number of amino acid differences per sequence.
Chromosomal location, collinearity and multiple synteny analysis of IQD genes. Chromosome locations of IQD genes of four species of cotton were achieved from cottonFGD via genome annotation les. Mapchart was used to visualize and map to each IQD gene on its corresponding chromosome (Caligari and Snape 2004). Gene duplication was assessed through MCScanX (Wang et al. 2012b) and the results were achieved by TBtools (Chen et al. 2018). Gene duplication events were analyzed using the multiple collinearity scan toolkit MCScanX (Wang et al. 2012b). To exhibit segmentally duplicated pairs and orthologous pairs of IQD genes of G. arboreum, G. raimondii, G. barbadense and G. hirsutum, we used multiple synteny plotter software in TBtools (Chen et al. 2018) to draw collinearity maps. The visualize duplicated regions in four species of cotton lines were drawn between matching genes using Circos function built-in TBtools. Substitution rates of synonymous (Ks) and non-synonymous (Ka) values of IQD genes were calculated also by TBtools. The duplication events time (T) was calculated as T = Ks/2λ × 10 − 6 (Mya), supposing clock-like rate (λ) = 1.5 × 10 − 8 in cotton (Wang et al. 2012a).
Sequence based orthologous IQD genes identi cation. The orthologous genes in G. arboreum, G. raimondii, G. barbadense and G. hirsutum were characterized using orthoVenn2 (Xu et al. 2019a) (https://orthovenn2.bioinfotoolkits.net/cluster-venn). Protein sequences of identi ed IQDs genes from all four cotton species were used in analysis. Similar calculation was also performed by adding the protein sequences of A. thaliana. Moreover, for identi cation of orthologous genes cluster, each cotton species was individually assessed with each other as well as with Arabidopsis.
Expression pro les analysis in different tissues of cotton IQD genes. Original expression data of G. arboreum, G. raimondii, G. barbadense and G. hirsutum IQD genes of various tissues and under heat, cold, PEG (Polyethyl glycol), salinity and control conditions for various hours were retrieved from NCBI database PRJNA594268 for G. arboreum, PRJNA490626 G. barbadense, PRJNA594268 for G. hirsutum and PRJNA171262 for G. raimondii from the sequence read archive (SRA) (https://www.ncbi.nlm.nih.gov). Adaptors were removed by Trimomatic software (Dong et al. 2016) to perform quality control. Genome reads was mapped by Hisat2 program , then FPKM values was obtained by using Cu inks . Results were log transformed and Genesis software (Sturn et al. 2002) was used to generate heat map.

Results
Identi cation and sequence analysis of four species of cotton IQD.
Four cotton genomes of two diploid and two tetraploid species were downloaded from cotton functional genomics database and scanned through HAMMER and BLASTP software to identify genes that encode proteins that IQD domain. In present studies, we found 309 IQD proteins initially from four cotton species, after alignment and domain analysis, reduced to 293. 50, 50, 94, and 99 IQD proteins were obtained from G. arboreum, G. raimondii, G. barbadense and G. hirsutum genomes respectively. Candidate IQD genes were named according to their chromosome position (Lohani et al. 2019). Amino acid residue length of these cotton IQD proteins ranged from 138 ~ 900 in G. arboreum, 133 ~ 901 in G. raimondii, 133 ~ 900 in G. barbadense and 120 ~ 900 in G. hirsutum. Isoelectric points were varied from 5.015 ~ 12.235 in G. arboreum, 5.051 ~ 12.167 in G. raimondii, 5.050 ~ 12.168 in G. barbadense and 5.051 ~ 12.307 in G. hirsutum. Correspondingly, molecular weights were distributed from 15.832 ~ 99.234 kDa in G. arboreum, 15.392 ~ 99.364 kDa in G. raimondii, 15.440 ~ 99.353 kDa in G. barbadense and 13.69 ~ 99.361 kDa in G. hirsutum. The BUSCA (Bologna Uni ed Subcellular Component Annotator) website analyzes the IQD protein sequences of cotton species and predicts that most of the genes were reside in the nucleus (Table 1, 2,3,4). Subcellular localization revealed that maximum genes resided in nucleus. In G. arboreum out of fty genes, nine genes namely GarIQD04, GarIQD16, GarIQD25, GarIQD34, GarIQD36, GarIQD38, GarIQD41, GarIQD48 and GarIQD49 found in chloroplast. Similarly, in G. raimondii out of fty genes ten genes were observed in cytoplasm i.e., GraIQD05, GraIQD08, GraIQD12, GraIQD23, GraIQD33, GraIQD36, GraIQD40, GraIQD43, GraIQD46 and GraIQD49. While in G. barbadense from ninety-four genes, twenty GbaIQD genes were found in chloroplast and one gene GbaIQD32 was observed in endomembrane system. Whereas out of ninety-nine genes of G. hirsutum, twenty-one GhiIQD genes were detected in chloroplast and one gene GhiIQD50 in plasma membrane. Complete details of nucleic acid, protein, and genomic sequences of 293 IQDs of candidate genes are listed in Supplementary Table S1.    Note: Ninety-four GbaIQD were identi ed and their principle transcript ID, gene name, class, chromosomes, start positions, end position, strand, length, transcr protein Length, molecular weight, charge, isoelectric point (pI) and Localization were listed. Tree diagram, Gene structure, and conserved motifs of cotton IQD gene family Neighbor joining method was used to construct an evolutionary tree diagram by using MEGA7. The evolutionary history was inferred using the Neighbor-Joining method [1]. All positions with less than 95% site coverage were eliminated. That is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. There was a total of 103 positions in the nal dataset. Evolutionary analyses were conducted in MEGA7. There were 103, 84, 89 and 81 positions in the nal data set of G. arboreum, G. raimondii, G. barbadense and G. hirsutum respectively. All positions gaps and missing data were eliminated with 95% deletion. Results revealed that number of exons varied from 2 ~ 8 in G. arboreum, 2 ~ 8 in G. raimondii, 2 ~ 8 in G. barbadense and 2 6 in G. hirsutum in IQD gene family ( Fig. 1, 2, 3, 4). The longest gene length was observed in GarIQD23 with 12,661bp followed by 10,126 bp with GhiIQD77bp,9,751bp in GbaIQD14,and 7,671bp in GraIQD18 (Fig. 1, 2, 3, 4). The exons and introns arrangement revealed the evolutionary relationships between various gene family members. Interestingly, closely related genes are structurally more similar just vary in length of intron and exons. Positive association was exhibited by phylogeny and exon-intron structure. Moreover, all the cotton IQD protein sequences were loaded into MEME to uncover the coding motifs. Among them, motif 6 almost remain conserved in all the proteins across 3 species of cotton except G. barabdense (Table S2).

Chromosomal location and phylogenetic analysis
Results revealed that 293 cotton IQD genes were unevenly and widely allocated on diploid and allotetraploid chromosomes (Fig. 5a, 5b, 5c). Interestingly, none of the IQD gene was observed on scaffold across four species of cotton on A and D genome (Table 1-4). Number of IQD genes on each chromosome of each species extensively different. The highest number of genes were found on A05 in G. arboreum. While highest number of IQD genes was found in G. raimondii on D09, G. barbadense on A05, A13, D05 and G. hirsutum on A05, A13, D05. The largest chromosome size was 135.70 Mb(A03), 85.32 Mb (D13), 120.78 Mb (A08) and 126.48 (A06) in G. arboreum, G. raimondii, G. barbadense and G. hirsutum respectively (Fig. 5a, 5b, 5c). Phylogenetic tree of full-length protein sequences was constructed from the multiple sequence alignment of 50 GarIQD, 50 GraIQD, 94 GbaIQD, 99 GhiIQD and 33 AthIQD (Fig. 6). Comparatively, group 6 has the most members of IQD genes followed by group 5, group 1, group 4 and group 3 whilst group 2 contain least members of IQD gene family.
Collinearity and gene duplication of GarIQD, GraIQD, GbaIQD and GhiIQD Gene duplication always plays a key role in gene expansion and occurrence of novel function of genes. In this study, homologous blast of amino acids of four cotton species was performed with MCScan toolkit. 15891 collinear genes, 1022 collinear blocks and 2375 tandem repeat genes were identi ed on G. arboreum (Table S3). Similarly, 12949, 707 and 1897 collinear genes, collinear blocks and tandem repeat genes were found in G. raimondii respectively (Table  S3). In G. barbadense, 56139 collinear genes, 2215 collinear blocks and 3304 tandem repeat genes were recognized (Table S3). Whereas, 55750 collinear genes, 2191 collinear blocks and 3291 tandem repeat genes were observed on G. hirsutum (Table S3). Expansion pattern of IQD genes was studied circos analysis was performed in four species of cotton (Fig. 7a-d). Paralogous pairs of IQD gene families were arising from whole segmental duplication or genome duplication. Large number of members might arise from transpositions with minor contributions of tandem repeat genes in four species of cotton (Fig. 7a-d).
According to MCScan analysis, 99 gene pairs duplication were detected between diploid G. arboreum with tetraploid G. barbadense and 101 gene pairs duplication in G. arboreum with G. hirsutum respectively. Correspondingly, 161, 163 duplicated gene pairs were detected between G. raimondii D genome and tetraploid G. barbadense and G. raimondii with G. hirsutum respectively (Table S5). Results also demonstrated that duplication of genes also observed from A genome of G. arboreum to A as well as D chromosomes of tetraploid cotton species. Similar behavior was observed in G. raimondii with G. barbadense and G. hirsutum genomes (Fig. 8). Ka/Ks ratio predicts the evolutionary story of a gene region or gene (Hurst 2002). Neutral selection occurred when Ka/Ks ratio equals to 1, greater than 1 revealed positive selection whilst less than 1 endorsed purifying selection. In present study, Ka/Ks ratio for all the IQDs gens in all the species of cotton was less than 1, illustrating purifying selection, except GbaIQD69 in G. barbadense and GhIQD73 in G. hirsutum. 20, 19, 43 and 49 gene duplication pairs were identi ed in G. arboreum, G. raimondii, G. barbadense and G. hirsutum respectively (Fig. 9). In G. arboreum, segmental duplication of IQD genes has happened from 4.36~32.48 Mya. Similarly, replication of IQD genes took place between 4.14~35.04 Mya in G. raimondii. Moreover, in G. barbadense and G. hirsutum segmental duplication was occurred from 0.36~4.24 (except one gene pair) and 0.57~4.59 Mya (except two gene pairs) respectively (Table S6). Results also highlighted that A and D genome are the progenitors of existing allotetraploid cotton species.

Orthologous gene clusters identi cation
The relative assessment was ascertained to detect orthologous IQD gene clusters across the ve species G. arboreum, G. raimondii, G. barbadense, G. hirsutum and A. thaliana. This will assist to evaluate polyploidization events in the evolutionary process of IQD gene family in cotton A and D genomes. Identi ed orthologous gene clusters and their overlap regions of ve species are elucidated in Fig. 10. Maximum clusters were recorded from G. hirsutum, followed by G. barbadense, G. arboreum and G. raimondii. Results also revealed that presence of 28 orthologous genes clusters in cotton species indicating that polyploidization has resulted in the evolution of new cotton speci c orthologues gene clusters. Orthologues gene clusters were also constructed between each species of cotton as well as with A. thaliana individually (Table S7). Furthermore, results also emphasized that number of identi ed orthologues genes decreases with increase in phylogenetic distances. Comparatively, 220, 218, 175 and 171 number of orthologous genes were identi ed in G. hirsutum, G. barbadense, G. arboreum and G. raimondii respectively, whereas 82 orthologous genes were identi ed in A. thaliana (Table. S8). 6 in-paralogs were detected in G. hirsutum and G. barbadense and 13 in-paralogs were spotted in A. thaliana whilst no in-paralog was detected in G. arboreum and G. raimondii. Interestingly, 4,3, and 1 singleton were noticed G. hirsutum, G. barbadense and G. raimondii respectively.

Expression pro les of IQD genes of Cotton
Transcriptome data of cotton was used to explore the variation of expression in IQD genes across tissues. RNAseq data for leaf, stem, root, ower, ber, ovule, PEG and salinity of G. arboreum, seed, ber and ovule of G. raimondii, root, stem, leaf, torus, epicalyx, sepal, petal, lament, anther, ovule and ber of G. barbadense and G. hirsutum were downloaded and evaluated. Results revealed that expression pattern of many IQD genes differed substantially in various tissues (Fig. 11). GarIQD18, GarIQD25 and GarIQD50 exhibited up regulated expression trend in 10DPA, 15DPA and 20 DPA of ovule and ber in G. arboreum. GarIQD25 also exhibited high expression in stem, root and ower (Fig.11a). Interestingly, GarIQD22 unveiled up regulation in ber whilst down regulation in ovule development. In G. raimondii, GraIQD03 demonstrated upregulation expression across stem, 0DPA, 3DPA of ovule, 10DPA and 20DPA DPA of ber and 10DPA, 20DPA, 30DPA and 20DPA of seed. While GarIQD42 didn't express in any tissue (Fig.11b). GbaIQD11, GbaIQD16 and GbaIQD62 exhibited upregulation in ber development and GbaIQD23, GbaIQD32, GbaIQD82, GbaIQD83 and GbaIQD86 genes were found upregulated in ovule development. GbaIQD51 and GbaIQD79 elected as key gene plant parts development in G. barbadense. GbaIQD44 did not expressed in all observed samples (Fig.11c). GhiIQD69 recognized as main candidate genes for plant parts, oral tissues, ber and ovule development (Fig.10d). Collectively, IQD genes in G. arboreum and G. hirsutum demonstrated various expression pattern under diverse environmental stresses hence, advocated that these genes were responsive to stress treatments.
Analysis of cis acting elements of HSF gene family in the promotor region Cis acting regulatory elements were identi ed in 2000 bp of 5' upstream region by using online tool (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Lescot et al. 2002). Results revealed that cis acting elements are highly consistent the result of evolutionary analysis and gene structure in G. arboreum, G. raimondii, G. barbadense and G. hirsutum (Fig. 12a-d). Interestingly, almost similar trend was observed in four species of cotton in identi ed cis-regulatory elements. CAAT-box, MYB, TATA-box and unnamed-4 were relatively found in higher numbers in all the studied species. Sequences of CAAT-box and TATA-box include functional elements and vital binding sites. CAAT-box and TATA-box long with MYB are also involved in response of plant stresses and defense mechanisms i.e., Drought, Salicylic acid, light and Abscisic acid.