Genomewide identification and characterization of OMT genes
A genome wide analysis was conducted to characterize OMT family genes in three Gossypium species. A total of 192 OMT members were identified, including 82 in G. hirsutum, 55 in G. arboreum, and 55 in G. raimondii (Table S3. Sheet A). For phylogenetic analysis [15], 33 OMT members in A. thaliana, and 26 members in T. cacao species were also retrieved (Table S3. Sheet B). Retrieving information of OMT genes in G. hirsutum revealed that GhOMT75_Scaf, which was detected in scaffold, coded the smallest protein of 62 amino acids (aa) with a molecular weight of 6.642 kDa. While GhOMT33_Dt, which was identified on chromosome Dt02, coded the largest protein of 969 aa with a molecular weight of 108.296 kDa among all OMT members in three Gossypium species.
In domain analysis of OMT family genes in Gossypium species, the results revealed that 64, 45 and 47 OMTs in G. hirsutum, G. arboreum and G. raimondii contained Pfam domain Pf00891, and that only 20, 10 and 9 OMTs in G. hirsutum, G. arboreum and G. raimondii contained Pfam domain Pf01596. In A. thaliana and T. cacao, 25, 24 OMTs contained Pf00891 domain, and 8, 2 OMTs contained Pf01596 domain respectively.
Chromosomal distribution, collinearity, duplication, and loss of OMT genes
The analysis of chromosomal positioning was performed by using TBtool software [16]. A total of 161OMT genes were positioned on their respective chromosomes, while seven of G. raimondii, one of G. arboreum, and 23 of G. hirsutum were positioned in scaffolds (Figure S1). In G. raimondii (D genome), chr11 was mapped with 13 genes followed by chr08 with nine genes. The minimum number of genes in a chromosome was one in chr2, chr6, and chr10 respectively. There was no OMT family members identified in chr01 and chr07 (Figure S1.a). In G. arboreum (A-genome) (Figure S1.b), 54 OMT genes were mapped in all chromosomes except chr1. Chr10 harbored 13 OMT genes which were the highest per chromosome, followed by chr12 and chr04 with 10 and 9 genes respectively. The minimum number of genes located in a chromosome was one in chr02 and chr11 respectively. In G. hirsutum (AtDt genome) (Figure S1.c), unexpectedly, there were no OMT genes in At02, At05, At07, Dt03, Dt09, and Dt11 chromosomes. The distribution of genes in Dt sub-genome (33 genes) was higher than in At sub-genome (26 genes). The maximum number of genes in a chromosome was seven in Dt04 and At12, followed by four in Dt10 and At10 chromosomes, respectively. Dt01, Dt05, At01, At06, and At11 only had one OMT gene, and Dt06, Dt07, At03, At08, At09, At13 two OMT genes and Dt02, Dt08, and Dt13 three OMT genes respectively (Figure S4.c). A collinearity analysis of the OMT family genes in Gossypium species chromosomes was shown in Figure 1. The results demonstrated a pair wise collinearity of OMT genes between the chromosomes on which OMT family genes were mapped. Noticeably, a number of available genes in At and Dt scaffolds were collinear with their homologues in A and D genomes suggesting the collinearity of the DNA fragments between the scaffolds and chromosome where these OMT genes locate (Figure 1). Taken the OMT gene numbers identified in each A/D genome or At/Dt sub-genome, collinearity analysis also revealed that there were totally 21 and 19 OMT genes exclusively detected in A and D genomes respectively. Their homologous counterparts in AtDt genomes of G. hirsutum are lost. There are also a few OMT genes that are exclusively detected in AtDt genome of G. hirsutum without homologous counterparts in A and D genomes (Figure S1).
According to previous studies there are five types of duplications including singleton, dispersed, proximal, tandem, and segmental or whole-genome duplication [17]. In the present study, the analysis of gene pairs duplication events predicted a total of 31, 28, and 54 gene pairs of D_Dt, A_At and D_A genomes from their common ancestor, 33 gene pairs of At_Dt subgenomes in segmental duplication, and 5 gene pairs of At_Dt subgenomes in tandem duplication events (Table S4 Sheet A).
Analysis of selection pressure
In genetics, the Ka/Ks ratio used to estimate the balance between neutral mutations, purifying selections, and positive mutations based on a set of homologous genes [18]. The ratio of the number of non-synonymous substitutions per non-synonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) represents selection pressure of the gene [19]. Ka/Ks<1 demonstrates purifying selection pressure, while Ka/Ks=1 and Ka/Ks>1 show neutral and positive selection pressures respectively. Analysis of Ka/Ks ratio of homologous OMTs in three Gossypium species revealed that they are under purifying selection pressure. The Ka/Ks ratio of homologous OMTs in G. raimondii and G. arboreum ranged from 0.09 to 0.8, in G. raimondii and G. hirsutum ranged 0 to 0.7, and in At and Dt of G. hirsutum ranged 0.4 to 0.7 (Table S4 Sheet B).
Sequences alignment, phylogenetic Analyses, conserved motifs and gene structure
The sequence alignment of 251 OMT genes, including 192 genes from three Gossypium species, 33 from A. thaliana, and 26 from T. cacao species was performed to understand the phylogenetic relationship of these genes. The evolutionary relationship of OMT genes in three Gossypium species was monophyletic (Figure 2.a), and the member of A. thaliana and T. cacao were distributed in paraphyletic manner (Figure 2.b). According to the topology of constructed tree, the OMT gene family is divided into five clades (I, II, III, IV, and V) in Gossypium, A. thaliana, and T. cacao species. The results showed that each clade of OMT genes were symmetrically distributed within Gossypium species (Figure 2.a), while in A. thaliana and T. cacao, OMT genes were identified in cluster forms (Figure 2.b). The results demonstrated that these Gossypium OMT members might be evolutionary close within respective species and their identified clades.
To examine the conserved motifs of each clade, the analysis of representative motif logo and conserved motifs prediction were conducted (Figure S2). The results revealed that motif 1, enriched with leucine, valine, and glycine, motif 2, enriched with leucine and valine, and motif3, motif 4, motif 5, and motif 6 were common in clades I, II, III, IV and clade V. While motif 7 was found missing in some members of clade V, which was then replaced with motif 8 at same positions (Figure S2). The enriched amino acid residues of conserved motif1 (L/VDVGGG/TG) was previously identified in S-adenosyl-l-methionine (SAM)-dependant OMTs that shared 95% similarity with G. hirsutum OMT [20].
Investigation of gene structure has uncovered the different number of exons and introns of OMT genes. Exon and intron number of OMT genes varied from the least one exon and no intron to the most 7 to 9 exons and 6 to 8 introns (Table S5). Two members in G. hirsutum including GhOMT82_At and GhOMT82_Dt contain nine exons as the highest (Table S5). Same as, two members including GaOMT82_A in G. arboreum, and GrOMT82_D in G. raimondii contain 9 exons (Table S5). Gene structure analysis revealed that the OMT genes with higher number of exons had a shorter exons and introns, and vise versa. These results demonstrated that OMT members possess different structural patterns in accordance with their features.
Identification of cis-regulatory elements in OMT family
The promoter regions of the OMT family contain precisely a large number of cis-regulatory elements. The analysis of cis-regulatory elements revealed the enrichment of MYB cis-regulatory elements, which was detected more than 350 times in OMT genes (Figure S3). The MYC was another important element that was found 183 times in enlisted OMT genes. Box 4 (part of a conserved DNA module involved in light responsiveness) was found 152 times in 43/82 genes in G. hirsutum. ABRE elements was detected 119 times in 29/82 OMT genes in G. hirsutum. The ERE element was detected 113 times in 37/82 and G-Box 97 times in 37/82 G. hirsutum OMT genes. An auxin RR-core and cis-acting regulatory element involved in the MeJA-responsiveness (TGACG-motif) were also observed in Gossypium OMT genes where this element was identified 48 times in 25/82 genes. Some other important cis-regulatory elements including wun-motif 44 times in 26/82, W-box 39 times in 31/82, GATA-motif 32 times in 27/82, O2-site 30 times in 22/82 OMT genes respectively, in G. hirsutum (Figure S3). These cis-regulatory elements might function collectively in accordance with their specific roles and with specific conditions as well as growth and development stages (Figure S3).
Sub-cellular localization prediction of OMT genes
Understanding and determining the sub-cellular localization of proteins is an important strategy to identify the function of protein at cellular level [21]. This approach includes proteomic-based experiments and microscopic high throughputs [22, 23]. Several sequence-based approaches have been developed to predict the sub-cellular localization by providing amino acid sequences including PSORT [24], Yloc [25], BaCelLO [26], LOCtree [27]. According to CELLO prediction, most of OMT genes were located in the cytoplasm (Table 1), while seven genes were predicted in periplasm, including, GhOMT45_At, GhOMT45_Dt, GhOMT46_Dt, GhOMT48_At, GhOMT48_Dt, GhOMT49_At, and GhOMT49_Dt. Five OMTs were predicted to be localized in both periplasm and cytoplasm, including GhOMT47_At, GhOMT47_Dt, GhOMT53_At, GhOMT54_Dt, and GhOMT68_At. Two genes GhOMT82_At and GhOMT82_Dt were predicted in the outer membrane. Only GhOMT55_At was predicted in inner membrane and cytoplasm (Table. 1). The results of Wolf Psort were highly in agreement with those of CELLO analysis regarding the presence of most of the OMT genes in cytoplasm, however, with exceptions of GhOMT48_At, GhOMT82_At, GhOMT82_Dt, which were predicted in chloroplast and one gene GhOMT76_Dt in mitochondria (Table. 1). The function of the OMT genes might be related to their predicted localizations, though the experimental approach is still needed for further confirmation.
GO enrichment and KEGG pathway analyses
To understand the functional annotations of OMT family genes of G.hirsutum, 82 genes in G. hirsutum were undergone through gene ontology (GO) enrichment, kyoto encyclopedia of genes and genomes (KEGG Pathway), and InterPro analyses. GO term analysis verified their O-methyltransferase activity of all 82 OMT genes, while 62 of the 82 genes were also enriched in methyltransferase activity and 53 of the 82 genes in protein dimerization activity (Figure 3.a). KEGG Pathway analysis revealed that these OMTs were involved in different metabolic pathways. Twenty-nine OMTs were involved in monolignol biosynthesis, phenylpropanoid, secondary metabolism, and metabolic pathways respectively. Eleven genes were involved in phenylalanine and flavonoid biosynthesis pathways respectively (Figure 3.b). InterPro analysis (http://www.ebi.ac.uk/interpro/) categorized these 82 OMT genes as functional genes of S-adenosyl-L-methionine-dependent methyltransferase (Figure 3.c). Sixty-two genes were also predicted in categories of methyltransferase_2 and O-methyltransferase COMT-type respectively (Figure 3.c), while fifty-seven in winged helix-turn-helix DNA-binding domain, fifty-three in plant methyltransferase dimerization (Figure 3.c).
Expression profiling of OMT genes and their homologues in fiber development and salt stress
In order to verify the biological functions of OMT family genes, several transcriptome data sets including TM-1 [28], G. arboreum, G. raimondii, CSSLs [29], and RILs [30], were applied to analyze their expression profiles in different developmental stages, organs, or tissues, and responses to various abiotic stress treatments. The transcriptome clusters showed that the OMT genes can be assorted into three basic groups (Figure 4A): Those that have a broad responses to different developmental stages from germination to fiber maturation, typical examples of which included GhOMT48_At, GhOMT48_Dt, GhOMT49_At and GhOMT49_Dt; those that have specific responses to root development, including GhOMT1_At, GhOMT2-At and GhOMT40_At; and those that have responses to early germination in seed, cotyledon, root and stem, including GhOMT47_At, GhOMT9_Dt, and GhOMT58_Dt. When fiber specific transcriptome data sets of G. arboreum, G. raimondii were applied to observe the expression profiling diploid OMT family genes, the result also supported specific expression profiling of some OMT genes in diploid species of G. arboreum (Figure 4.B) and G. raimondii (Figure 4.C).
The gene expression profiling was further verified with trancriptome datasets of RILs (Figure 5.A) two CSSLs (Figure 5.B and 5.C). The results showed that the genes that had specific expressions during fiber development (Figure 4.A) also had specific expressions in fiber development of RILs and CSSLs materials. These genes had a highly consistent expression profiling among the different cotton cultivars and lines during fiber development. Some selected GhOMT examples genes, GhOMT49_At (Figure 5.D), GhOMT70_At (Figure 5.E), GhOMT48_At (Figure 5.F), GhOMT10_Dt (Figure 5.G), and GhOMT49_Dt (Figure 5.H), were verified through qRT-PCR using sGK9708 and 0-153, the two parental lines of the RIL population with different fiber quality traits. The results showed that GhOMT48_At, GhOMT49_At, and GhOMT49_Dt were significantly up-regulated during fiber development in sGK9708 than in 0-153 Figure 5.D, 5.F and 5.H) and that GhOMT70_At and GhOMT10_Dt did not show differences between the two cultivars (Figure 5.E and 5.G). Noticeably, GhOMT49_At and GhOMT49_Dt reached the highest expression levels at 20 DPA and their high expression lasted in a short time as compared with that of GhOMT48_At. GhOMT48_At had a rapid expression increase from 10 DPA to 15DPA and then its expression steadily increased until 25 DPA when it reached its highest expression level.
Based on the expression profiling of the OMT gene family in responses to cold, hot, osmotic, and salt stress treatments (Figure 6.A), two genes specific in salt stress responses, GhOMT70_At and GhOMT10_Dt, and three genes specific in fiber development, GhOMT48_At, GhOMT49_At, and GhOMT49_Dt, were verified by qRT-PCR with RNA samples extracted from salt treatment. The results indicated that both GhOMT70_At and GhOMT10_Dt showed an elevated expression in salt treatments in salt-tolerant cultivar as compare to the control treatments (Figure 6.B and 6.C). These two genes had different expression profiles from 2h to 6h after salt treatment. GhOMT70_At had the highest expression at 2h and then its expression went down at 6h; whereas GhOMT10_Dt had an increasing expression pattern from 2h to 6h. Both genes had much higher expression in roots than in stem or leaf.