Genome-Wide Identification and Characterization of the MIOX Gene Family in Cotton

Zhaoguo Li Zhengzhou University, Zhengzhou, Henan 450001, China Zhen Liu Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Yangyang Wei Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Yuling Liu Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Pengtao Li Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Linxue Xing Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Mengjie Liu Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Quanwei Lu Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China Renhai Peng (aydxprh@163.com) Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, Henan 455000, China

the phylogenetic analysis. Integrated analysis of collinearity events and chromosomal locations suggested that both whole-genome duplication and segmental duplication contributed to the expansion of MIOX genes during cotton evolution. The ratios of non-synonymous (Ka) to synonymous (Ks) substitution rates revealed that purifying selection was the main force driving the evolution of MIOX genes. Numerous cis-acting elements, including light-responsive, defense-responsive and stress-responsive elements, were identified in the promoters of the MIOX genes. Expression analyses based on RNA-seq data showed that MIOX genes within the same group shared similar expression patterns.
Conclusions: In this work, we systematically analyzed MIOX genes from eight Gossypium genomes and the Gossypioides kirkii genome using a set of bioinformatics approaches. These results provide a foundation for further study of the biological functions of MIOX genes in the environmental adaptability of cotton.

Background
Myo-inositol (MI) is a small molecule that is important in many developmental and physiological processes in eukaryotic cells [1]. MI participates in the production of stress-related molecules, cell wall biosynthesis, phytic acid biosynthesis, and auxin transport and storage [2]. The myo-inositol oxidation pathway (MIOP) effectively controls MI homeostasis. In this pathway, myo-inositol oxygenase (MIOX, EC 1.13.99.1) is a key enzyme that catalyzes the conversion of MI into D-glucuronic acid (D-GlcUA), which is subsequently activated to UDP-D-glucuronic acid (UDP-GlcA), an important precursor of plant cell wall polysaccharides [3,4]. Previous reports have also suggested that MIOX might be involved in ascorbate production and, consequently, in protection from ROS-mediated injury [5]. Studies have shown that MIOX plays a role in responses to environmental stresses. In Arabidopsis, MIOX is encoded by a family of four genes, and overexpression of Arabidopsis MIOX4 has been reported to increase tolerance to cold, heat and salt [3,6-8]. In addition, rice MIOX has been reported to have a specific function in drought stress tolerance by decreasing oxidative damage [9].
Cotton (Gossypium) is one of the most economically important fiber crops in the world. It is a remarkably diverse genus, with over 50 species divided into eight diploid genomic groups, designated A-G and K, and one tetraploid genomic group, AD [10]. Divergence analyses suggest that the major diploid branches of the genus diverged about 7-11 million years ago and that the polyploid clade originated circa 1-2 million years ago [11-14]. With the rapid development of next-generation sequencing, whole genomes have been reported for the diploid D-genome species G. raimondii [15], the A-genome species G. arboreum [16] and G. herbaceum [17], and the allopolyploid AD-genome species G. hirsutum [18], G. barbadense [19], G. darwinii, G. mustelinum and G. tomentosum [20]. These genome sequences provide useful resources for studying gene families, and a number of gene families have already been analyzed genome-wide in cotton [21,22].
Although cotton has better abiotic stress tolerance than other crops, its yield and fiber quality are still seriously threatened by saline-alkali, drought and heat stress. Given the role of MIOX genes in various abiotic stress responses in plants, and because no genome-wide identification and characterization of the MIOX gene family has been reported in cotton, we conducted a systematic genome-wide analysis of the cotton MIOX gene family, examining phylogenetic relationships, group classification, conserved motifs and expression profiles. Our results provide a foundation for understanding the key roles of MIOX genes in cotton stress responses and other biological processes.

Results
Identification of the MIOX Gene Family
A total of 81 MIOX genes were identified from nine species: 75 MIOXs from eight Gossypium genomes and 6 MIOXs from the Gossypioides kirkii genome. Protein lengths were relatively conserved, except that one MIOX from G. herbaceum was markedly longer (896 aa) and two MIOXs from G. barbadense were markedly shorter (101 aa and 113 aa) (Additional file 1: Table S1). Each of the four diploid species contained 6 MIOX genes, and the number of MIOX genes in tetraploid cotton was roughly twice that of diploid cotton, although each of the three wild tetraploid species (G. tomentosum, G. mustelinum and G. darwinii) lacked one MIOX gene.
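The candidate search behind these counts combined BLASTP (E-value cutoff 1.0E-3) with an HMMER search against the MIOX domain model PF05153, followed by CD-Search and Pfam confirmation. As a minimal sketch of the first two steps, the Python below parses BLASTP tabular output (`-outfmt 6`, E-value in column 11) and `hmmsearch --tblout` output (full-sequence E-value in column 5). Two choices here are our assumptions, not the authors' stated method: that a candidate must be supported by both searches, and that the same 1.0E-3 cutoff is applied to HMMER. The function names are illustrative.

```python
def blast_hits(lines, max_evalue=1e-3):
    """Subject IDs from BLASTP tabular output (-outfmt 6) passing the E-value cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= max_evalue:  # column 11 = E-value
            hits.add(cols[1])  # column 2 = subject (cotton protein) ID
    return hits


def hmmer_hits(lines, max_evalue=1e-3):
    """Target IDs from `hmmsearch --tblout` output passing the full-sequence E-value cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()  # --tblout is whitespace-delimited
        if float(cols[4]) <= max_evalue:  # column 5 = full-sequence E-value
            hits.add(cols[0])
    return hits


def miox_candidates(blast_lines, hmmer_lines, max_evalue=1e-3):
    """Assumed merge rule: keep only proteins supported by both searches."""
    return blast_hits(blast_lines, max_evalue) & hmmer_hits(hmmer_lines, max_evalue)
```

Domain confirmation with CD-Search and Pfam is not modeled here.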

Phylogenetic Analysis of the MIOX Gene Family
To reveal the evolutionary relationships of the identified MIOX proteins, the amino acid sequences of 75 proteins from Gossypium species, 6 proteins from Gossypioides kirkii and 4 proteins from Arabidopsis thaliana were used to construct a phylogenetic tree (Fig. 1, Additional file 2: Figure S1). The 81 MIOX proteins from the eight Gossypium species and Gossypioides kirkii were divided into six groups (A-F), and the number of MIOX genes in each evolutionary branch was very stable: each diploid species, with 6 MIOXs, contributed one member to each group, whereas each of the five tetraploid species, with 11 or 12 MIOXs, contributed one or two members. All of these genes showed one-to-one homology relationships among the different genomes or subgenomes. In contrast to groups A-F, groups G-I contained only A. thaliana MIOX proteins.
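As a quick consistency check, the per-genome counts add up to the totals reported. The 12/11 split among tetraploids below is our inference from the statements that tetraploids carry roughly twice the diploid complement and that the three wild tetraploids each lack one gene; Additional file 1: Table S1 remains the authoritative source.

```python
# MIOX gene counts per genome. The 12/11 split among tetraploids is inferred
# from the text, not copied from Table S1.
counts = {
    "G. raimondii": 6, "G. arboreum": 6, "G. herbaceum": 6,       # diploids
    "G. hirsutum": 12, "G. barbadense": 12,                       # cultivated tetraploids
    "G. darwinii": 11, "G. mustelinum": 11, "G. tomentosum": 11,  # wild tetraploids
    "Gossypioides kirkii": 6,                                     # close relative of cotton
}

# Gossypium species are abbreviated "G. ", so the Gossypioides entry is excluded here.
gossypium_total = sum(n for name, n in counts.items() if name.startswith("G. "))
total = sum(counts.values())
print(gossypium_total, total)  # prints: 75 81
```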

Gene Structure and Protein Motifs of the MIOX Gene Family
To explore the structural diversity of MIOX genes, the exon-intron organization of each MIOX gene was analyzed (Fig. 2). The number of introns ranged from 3 to 21, and most MIOX genes (52/81) contained 9 or 10 introns. Genes in the same group generally exhibited comparable exon-intron structures and intron numbers, whereas many Gossypioides kirkii MIOX genes differed from their cotton counterparts; these patterns provide supporting evidence for the phylogenetic relationships among members of the MIOX gene family.
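Intron numbers like those in Fig. 2 are derived from exon coordinates in the genome annotation. The sketch below is an illustration in Python, not TBtools' implementation; it assumes exon records for one representative transcript per gene have already been parsed from the GFF file, and `introns_per_gene` is a hypothetical helper.

```python
from collections import defaultdict


def introns_per_gene(exon_records):
    """Count introns per gene from exon coordinates.

    exon_records: iterable of (gene_id, start, end), 1-based inclusive, for one
    representative transcript per gene. Overlapping or abutting exons are merged.
    """
    exons = defaultdict(list)
    for gene_id, start, end in exon_records:
        exons[gene_id].append((start, end))
    counts = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1] + 1:  # touches or overlaps the previous exon
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        counts[gene_id] = len(merged) - 1  # introns are the gaps between exons
    return counts
```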
We further investigated the conserved motifs in MIOX proteins to understand the diversity of motif composition (Fig. 2). A total of 13 motifs, named Motif 1 to Motif 13, were identified. The number of motifs per MIOX protein varied from 3 to 12, and most MIOX proteins within the same group exhibited similar motif composition and arrangement, suggesting that MIOX family members clustered in the same group may have similar biological functions. Motif 5 was found in 78 MIOX proteins, Motif 8 was completely absent only from group B, and Motif 13 was specific to group C.
The similarity of gene structures and motif compositions among MIOX members within each phylogenetic group supports the reliability of the group classification.

Chromosomal Location and Gene Duplications of the MIOX Gene Family
To visualize the distribution of MIOX genes on the chromosomes, we performed a chromosomal localization analysis. MIOX genes were mapped to 5 different chromosomes in diploid cotton and 10 in tetraploid cotton. Most chromosomes carried only one MIOX gene, while a few carried two (Fig. 3). The chromosomal distribution of Gossypioides kirkii MIOX genes was similar to that in Gossypium.
Whole-genome duplication, segmental duplication and tandem duplication are major forces driving the expansion of gene families. The number of MIOX genes in tetraploid Gossypium species was about twice that in diploid Gossypium species, indicating that the MIOX gene family expanded during polyploidization. We searched for segmental duplications within each genome using MCScanX and identified 133 collinear gene pairs among 72 MIOX genes. Of the 75 MIOX genes in the eight Gossypium genomes, only 3 were located outside duplicated blocks, while 96% (72/75) were located in duplicated regions. In addition, 40 and 38 duplicated gene pairs were found between the diploid G. arboreum A-genome and the A-subgenomes of tetraploid G. hirsutum and G. barbadense, respectively, and 18 and 17 duplicated gene pairs were found between the diploid G. raimondii D-genome and the D-subgenomes of G. hirsutum and G. barbadense, respectively (Fig. 4). Following Holub's definition, a chromosomal region within 200 kb containing two or more genes was defined as a tandem duplication event [23]. No tandem duplication of MIOX genes was found in any of the nine species.
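One way to apply the 200 kb criterion is to flag any two family members on the same chromosome whose facing ends are no more than 200 kb apart. The sketch below uses that reading, which is our assumption (Holub's definition can also be read as a bound on the span of the whole region); `tandem_pairs` and the tuple layout are hypothetical.

```python
from itertools import combinations


def tandem_pairs(genes, window=200_000):
    """Return pairs of family members on the same chromosome within `window` bp.

    genes: list of (gene_id, chromosome, start, end). The gap is measured between
    the nearer ends of the two genes; overlapping genes give a negative gap.
    """
    pairs = []
    for (g1, c1, s1, e1), (g2, c2, s2, e2) in combinations(genes, 2):
        if c1 != c2:
            continue  # tandem duplicates must share a chromosome
        gap = max(s1, s2) - min(e1, e2)
        if gap <= window:
            pairs.append((g1, g2))
    return pairs
```

An empty result for every genome corresponds to the "no tandem duplication" finding above.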

Cis-elements in the Promoter of MIOX Genes
Cis-elements in promoters play vital roles in regulating gene expression. To gain more insight into the functions of MIOX genes, cis-regulatory elements were scanned in the 2000 bp upstream of the transcription start sites of cotton MIOX genes (Fig. 5, Additional file 3: Table S2, Additional file 4: Figure S2). Many kinds of responsive elements were found, including light-responsive elements, defense- and stress-responsive elements involved in drought and low-temperature responses, and hormone-responsive elements associated with salicylic acid, abscisic acid, gibberellin and MeJA. All cotton MIOX genes contained more than one light-responsive element, whereas only 29.33% (22/75) contained auxin-responsive elements.
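Retrieving promoter sequences requires respecting strand: for a minus-strand gene, the upstream region lies at higher coordinates and must be reverse-complemented. A minimal sketch, assuming the chromosome sequence is already loaded as a string and that the caller chooses the anchor coordinates (the transcription start site here, or the initiation codon as stated in the Methods); `upstream_region` is a hypothetical helper, and PlantCARE scanning is not reproduced.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_region(chrom_seq, start, end, strand, length=2000):
    """Return `length` bp immediately upstream of a feature, oriented 5'->3' on its strand.

    start/end are 1-based inclusive coordinates of the anchor feature (e.g. the CDS
    or transcript); the region is truncated at chromosome ends.
    """
    if strand == "+":
        # Upstream lies to the left of `start` (0-based index start - 1).
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    if strand == "-":
        # Upstream lies to the right of `end`; reverse-complement it to read 5'->3'.
        return chrom_seq[end:end + length].translate(COMPLEMENT)[::-1]
    raise ValueError(f"unknown strand: {strand!r}")
```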
The cis-elements of MIOX genes in the same phylogenetic group were similar. Moreover, within each group, half of the MIOX genes in tetraploid Gossypium species had cis-element profiles similar to those of the diploid A-genome species (G. herbaceum and G. arboreum), and the other half had profiles similar to those of the D-genome species (G. raimondii). These results further support the expansion of the MIOX gene family during polyploidization.

Tissue-Specific Expression Profiles of MIOX Genes
To study the tissue-specific expression patterns of MIOX genes, we analyzed their expression profiles in different tissues. As shown in Fig. 6, MIOX genes were expressed at different levels in different tissues. In G. hirsutum, GhMIOX02 and GhMIOX08 were expressed more highly in stem, leaf, torus, pistil, bract and sepal, and less in root. GhMIOX04 and GhMIOX10 were expressed more highly in root, stem, leaf, torus and sepal, and less in pistil and bract. In G. barbadense, GbMIOX02, GbMIOX08, GbMIOX04 and GbMIOX10 had expression profiles similar to those of their G. hirsutum counterparts. In addition, GhMIOX03 of G. hirsutum showed high transcript levels only in root, whereas GbMIOX03 of G. barbadense showed high transcript levels in root, stem and pistil.
The expression levels of MIOX genes at different stages of ovule and fiber development were also investigated (Fig. 6). At most developmental stages, GhMIOX08 from G. hirsutum and GbMIOX08 from G. barbadense were expressed more highly than the other genes. Several genes, for instance GhMIOX03, GhMIOX09 and GhMIOX10 from G. hirsutum and GbMIOX03, GbMIOX09 and GbMIOX10 from G. barbadense, were expressed at high levels during specific developmental stages. In contrast, transcript levels of GhMIOX05 and GhMIOX11 from G. hirsutum and GbMIOX05 and GbMIOX11 from G. barbadense were low at all stages and in all tissues. These findings suggest that MIOX genes play different roles in tissue development.

Stress-Induced Expression Patterns of MIOX Genes
The expression patterns of MIOX genes were further analyzed in G. hirsutum and G. barbadense exposed to cold, heat, salt and drought stresses for different durations, using RNA-seq data downloaded from a public database (Fig. 7). Based on the clustering analysis, MIOX genes in groups C and D generally responded to all of these stresses, whereas genes in the other groups showed no marked response. MIOX genes also varied in their expression responses to one or more stresses. Under heat and salt treatment, GhMIOX02 and GhMIOX08 were up-regulated at early time points (1-3 h), down-regulated from 3 to 6 h, and up-regulated again from 6 to 24 h. GhMIOX03 was up-regulated at early time points (1-6 h) but down-regulated at later time points (12-24 h) under cold stress. Moreover, GbMIOX02 was continuously down-regulated at all time points under salt stress. These results suggest that 3-6 h is an important period for the response of MIOX genes to abiotic stress.
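The text does not state how up- and down-regulation were called. A common approach, assumed here purely for illustration, is a log2 fold change of treated versus control FPKM at each time point, with a pseudocount and a two-fold threshold; the threshold, pseudocount and helper names are our choices, not the authors'.

```python
import math


def log2_fold_changes(control_fpkm, treated_fpkm, pseudocount=1.0):
    """log2((treated + pc) / (control + pc)) for every time point in treated_fpkm.

    The pseudocount avoids division by zero for genes with no detected expression.
    """
    return {
        t: math.log2((treated_fpkm[t] + pseudocount) / (control_fpkm[t] + pseudocount))
        for t in treated_fpkm
    }


def direction(lfc, threshold=1.0):
    """Label a change as up or down only if |log2FC| >= threshold (2-fold by default)."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"
```

Without replicates and a statistical test, these labels describe trends only.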

Discussion
MI is a crucial substance in many aspects of plant physiology. Plants maintain a basal MI pool throughout their life cycle, and MIOX helps control MI levels in plants [1,2,5,8]. MIOX proteins are conserved and present in nearly all eukaryotes. MIOX is important for abiotic stress tolerance, and the MIOX gene family has previously been studied in many plants [2,5], including Arabidopsis thaliana [3,4], Oryza sativa [9] and Solanum lycopersicum [24]. In this study, we performed a comprehensive identification of MIOX genes in 5 tetraploid cotton species, 3 diploid cotton species and a close relative of cotton, Gossypioides kirkii, with the aim of understanding the important and diverse roles of this gene family in plant abiotic stress tolerance.
There are four MIOX genes in Arabidopsis thaliana, and they did not cluster with the groups from Gossypium species and Gossypioides kirkii. In addition, the number of MIOX genes in tetraploid cotton is roughly twice that of diploid cotton. This suggests that MIOX gene duplication events occurred in diploid cotton before the emergence of tetraploid cotton.
Transposable elements are major evolutionary forces that can cause genomic instability, including genome expansion through amplification via an RNA intermediate [25]. Besides expanding genomes, transposable elements facilitate the creation of new candidate genes, called retrogenes, through retroduplication, in which a spliced messenger RNA is captured and subsequently integrated into the genome by a retrotransposon [26]. The role of transposable elements in the expansion of gene families remains unexplored in many plants [27-29]. To investigate whether transposable elements played a role in the expansion of MIOX genes, LTR_retriever [30], LTRharvest [31], LTR_FINDER [32], RepeatModeler and RepeatMasker [33] were used to identify transposable elements. We then compared these results with the genome annotations to predict genes located inside transposable elements in the 10 genomes. However, none of the MIOX genes was found to have been amplified by transposable elements.
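The comparison of transposable-element calls with the gene annotation reduces to an interval test. A minimal Python sketch, assuming TE intervals (for example, merged from LTR_retriever and RepeatMasker output) and gene coordinates have already been parsed into tuples, and treating "inside" as full containment; the helper name is hypothetical.

```python
from collections import defaultdict


def genes_in_tes(genes, tes):
    """Return IDs of genes lying entirely within an annotated transposable element.

    genes: (gene_id, chrom, start, end); tes: (chrom, start, end); 1-based inclusive.
    """
    tes_by_chrom = defaultdict(list)
    for chrom, start, end in tes:
        tes_by_chrom[chrom].append((start, end))
    hits = []
    for gene_id, chrom, start, end in genes:
        # A linear scan is adequate for a gene family of this size.
        if any(ts <= start and end <= te for ts, te in tes_by_chrom.get(chrom, [])):
            hits.append(gene_id)
    return hits
```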
Gene expression patterns can, to some extent, be used to predict the molecular functions of genes involved in different processes. Our heatmap data showed that most MIOX genes within each group shared similar expression patterns. The group D MIOX genes of G. hirsutum (GhMIOX02 and GhMIOX08) and G. barbadense (GbMIOX02 and GbMIOX08) were expressed in stem, leaf, torus, pistil, bract and sepal and responded to cold, heat, salt and drought stresses. In contrast, expression of the group F MIOX genes of G. hirsutum (GhMIOX05 and GhMIOX11) and G. barbadense (GbMIOX05 and GbMIOX11) was barely detectable. To further elucidate the selection pressure underlying the functional diversification of each MIOX group, we evaluated the non-synonymous (Ka) and synonymous (Ks) substitution rates (Fig. 8). The median Ka/Ks ratios for groups D and F were 0.267 and 0.344, respectively; both values are well below 1, consistent with purifying selection, and suggest that these groups of Gossypium MIOXs might play specialized roles in the adaptive evolution of cotton.
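Group-level summaries like the medians above can be computed from pairwise Ka and Ks values. The sketch below, with hypothetical helper names and illustrative inputs, takes (Ka, Ks) pairs for one group, drops pairs with Ks = 0 (where the ratio is undefined), and applies the conventional reading that a ratio below 1 indicates purifying selection.

```python
from statistics import median


def classify_ka_ks(ratio):
    """Conventional reading: <1 purifying, =1 neutral, >1 positive selection."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def summarise_group(pairs):
    """Median Ka/Ks for one group from (Ka, Ks) gene pairs; pairs with Ks == 0 are skipped."""
    ratios = [ka / ks for ka, ks in pairs if ks > 0]
    group_median = median(ratios)  # raises StatisticsError if no usable pairs remain
    return group_median, classify_ka_ks(group_median)
```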

Conclusions
We performed a genome-wide analysis of the MIOX gene family in 8 cotton species. The MIOX family was divided into 6 groups based on the phylogenetic tree. To understand the expansion mechanism of the MIOX family, gene duplication events were investigated; the results indicated that segmental duplication and whole-genome duplication were the major driving forces of MIOX family expansion, and no tandem duplication or transposable-element amplification was found. The expression profiles of G. hirsutum and G. barbadense MIOX genes in different tissues and under abiotic stresses were also analyzed. MIOX genes within the same group shared similar expression patterns, whereas genes in different groups showed different expression levels. Furthermore, the Ka/Ks ratios suggest that all groups experienced purifying selection. Our results provide clues for researchers regarding the evolution and biological functions of MIOXs.

Methods
Identification of MIOX Genes in Cotton
The genome sequences of G. darwinii, G. mustelinum and G. tomentosum [20] were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/genome/), and the genome sequences of G. hirsutum [18], G. barbadense [19], G. arboreum [16], G. herbaceum [17], G. raimondii [15] and the close cotton relative Gossypioides kirkii [34] were downloaded from CottonGen (http://www.cottongen.org). Arabidopsis MIOX proteins were used as queries to search for candidate cotton MIOX sequences by BLASTP with an E-value cutoff of 1.0E-3 [35,36]. HMMER [37] was used to identify MIOX proteins with the Hidden Markov Model of the MIOX domain (PF05153), downloaded from the Pfam database (http://pfam.xfam.org). NCBI CD-Search (https://www.ncbi.nlm.nih.gov/cdd/) and the Pfam sequence search tool (http://pfam.xfam.org/search) were then used to confirm the candidate sequences. The number of amino acids, molecular weight and theoretical isoelectric point of each MIOX protein were calculated using the ExPASy online server (https://www.expasy.org/).

Phylogenetic, Gene Structure, Conserved Motif and Cis-elements Analysis of MIOXs
A neighbor-joining tree of MIOX proteins was constructed using MEGA-X with 1000 bootstrap replicates [38]. The phylogenetic tree was drawn using EvolView [39]. The gene structures of MIOXs were analyzed with TBtools [40]. The conserved motifs of MIOX protein sequences were identified using MEME [41]. To investigate putative cis-acting regulatory elements of MIOX genes, 2000 bp genomic DNA sequences upstream of the initiation codon were retrieved and screened against the PlantCARE database [42].
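MEGA-X was used for the actual tree. To make the neighbor-joining criterion concrete, here is a from-scratch Python sketch that, at each step, joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the row sum of distances. It assumes a precomputed pairwise distance matrix (for example, p-distances from an alignment), performs no bootstrapping, and is not MEGA-X's code.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: taxon labels; dist: symmetric distance matrix (list of lists) in the same order.
    """
    # Dict-of-dicts lets new internal nodes be added under their Newick label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums over active nodes
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to their new parent node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new node to every remaining node.
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    half = d[a][b] / 2  # split the last edge in half purely for display
    return f"({a}:{half:.3f},{b}:{half:.3f});"
```

On the standard five-taxon textbook matrix (taxa a to e), the first join is a and b, with branch lengths 2 and 3.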

Chromosomal Distribution and Gene Duplication of MIOX Genes
The chromosomal locations of MIOX genes were extracted from the GFF files, and MapChart [43] was used to visualize them. MCScanX [44] was used to detect gene duplication events, and tandem duplications were identified as described above. The synonymous (Ks) and non-synonymous (Ka) substitution rates of MIOX genes were calculated with KaKs_Calculator [45].
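KaKs_Calculator offers several estimation methods, and the Methods do not say which one was used. For orientation only, the sketch below shows the simplest Nei-Gojobori-style estimate: given already-counted non-synonymous and synonymous sites and differences, each observed proportion p is corrected with the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3). Counting sites and differences, the hard part, is assumed to be done elsewhere, and the function names are illustrative.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); the correction is undefined at saturation")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ka, Ks and Ka/Ks from pre-counted differences and sites (Nei-Gojobori style)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    return ka, ks, ka / ks
```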

Expression Patterns of MIOX Genes
To study the expression patterns of MIOX genes, high-throughput transcriptome sequencing data of G. hirsutum and G. barbadense, covering various tissues, developmental stages and stress treatments, were downloaded from the NCBI SRA (PRJNA490626). Trimmomatic [46] was used to remove adapters and perform quality control. HISAT2 [47] was used to map the reads to the genomes, and Fragments Per Kilobase of transcript per Million mapped fragments (FPKM) values of the MIOX genes were calculated with Cufflinks [48,49]. Heat maps of gene expression profiles were drawn with TBtools [40].
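For reference, FPKM normalizes a gene's fragment count by its length in kilobases and by the library size in millions of mapped fragments. The sketch below computes that naive formula; Cufflinks itself estimates abundances with a likelihood model and effective transcript lengths, so its values will differ. Function and argument names are hypothetical.

```python
def fpkm(fragment_counts, lengths_bp, total_mapped=None):
    """Naive FPKM: fragments * 1e9 / (gene length in bp * total mapped fragments).

    If total_mapped is omitted, the sum of the supplied counts is used, which is only
    correct when fragment_counts covers every gene in the library.
    """
    if total_mapped is None:
        total_mapped = sum(fragment_counts.values())
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene, count in fragment_counts.items()
    }
```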

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.