Identification of LACS family members
To identify LACS gene family members in cotton, we used the hidden Markov model of PF05001 as the query and searched the proteomes and genomes of G. hirsutum, G. barbadense, G. arboreum and G. raimondii with local BLAST software. Candidates were then screened for the AMP-binding domain, and incomplete sequences and sequences lacking this domain were removed. In total, 38, 39, 22 and 20 genes were identified in G. hirsutum (GHLACSs), G. barbadense (GBLACSs), G. arboreum (GaLACSs) and G. raimondii (GrLACSs), respectively. These gene numbers indicate that the LACS family is conserved in cotton: each allotetraploid contains roughly as many LACS genes as the two diploids combined (22 + 20 = 42), which is basically consistent with the evolution of allotetraploid cotton [13]. The genes were named GHLACS1-38, GBLACS1-39, GaLACS1-22 and GrLACS1-20 according to their chromosomal positions.
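The screening and naming steps above can be sketched as a small filter-and-sort routine. This is a minimal sketch, assuming each candidate has already been annotated with whether the AMP-binding domain was detected and with its chromosome and start coordinate. The `Candidate` record, the completeness rule (starts with Met, no internal stop codon) and the lexical chromosome ordering are hypothetical choices for illustration, not the authors' exact criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate LACS gene."""
    gene_id: str
    chrom: str             # chromosome or scaffold name, e.g. "A01", "D05", "scaffold12"
    start: int             # 1-based start coordinate taken from the GFF3 file
    protein: str           # translated protein sequence ("*" marks a stop codon)
    has_amp_binding: bool  # True if the AMP-binding domain was detected


def is_complete(protein: str) -> bool:
    """Assumed completeness rule: starts with Met and has no internal stop codon."""
    core = protein.rstrip("*")
    return core.startswith("M") and "*" not in core


def name_family(candidates: list[Candidate], prefix: str) -> dict[str, str]:
    """Keep complete, domain-containing candidates and number them by position.

    Chromosomes are ordered lexically, which works for zero-padded names such as
    A01..A13, D01..D13 or Chr01..Chr13; scaffolds are placed last.
    """
    kept = [c for c in candidates if c.has_amp_binding and is_complete(c.protein)]
    kept.sort(key=lambda c: (c.chrom.lower().startswith("scaffold"), c.chrom, c.start))
    return {c.gene_id: f"{prefix}{i}" for i, c in enumerate(kept, start=1)}
```

For example, `name_family(candidates, "GHLACS")` would assign GHLACS1, GHLACS2, ... in chromosome order, with any scaffold-located gene numbered last.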
We then analyzed the physicochemical properties of the LACS family members, including protein length, molecular weight, isoelectric point and hydropathicity (Table S1). In G. hirsutum, the proteins encoded by LACS family members ranged in length from 535 aa (GHLACS13) to 762 aa (GHLACS36), in molecular weight from 59.29 kDa (GHLACS27) to 85.11 kDa (GHLACS36), and in isoelectric point from 5.3 (GHLACS17) to 8.17 (GHLACS2). Protein length and molecular weight likewise varied within a limited range in the other cotton species. We also predicted the subcellular localization of these family members: of the 38 GHLACS proteins, 25 were predicted to localize to plastids, 6 to the cytoplasm, 4 to the endoplasmic reticulum, 2 to the nucleus and 1 to the plasma membrane.
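The section does not name the tool used for Table S1 (ExPASy ProtParam is a common choice). As a sketch of what such tools calculate, the code below computes average molecular weight, the GRAVY hydropathicity score (Kyte-Doolittle scale) and an isoelectric point found by bisection on the net charge. The pKa table is an assumption (EMBOSS-style values), so pI values will differ slightly from those reported by other tools.

```python
# Average residue masses (Da) of amino acids inside a peptide chain.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini

# Kyte-Doolittle hydropathy values used for the GRAVY score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed pKa values (EMBOSS-style); other tools use slightly different tables.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight in Da (standard residues only)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the protein at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def summarize(protein: str) -> dict:
    """Length, molecular weight (kDa), pI and GRAVY for one protein sequence."""
    seq = protein.upper().rstrip("*")
    return {
        "length_aa": len(seq),
        "mw_kda": molecular_weight(seq) / 1000,
        "pI": isoelectric_point(seq),
        "gravy": gravy(seq),
    }
```

Net charge falls monotonically with pH, which is why simple bisection is enough to locate the pI.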
Analysis of the Evolutionary Relationships of Family Members
To study the evolutionary relationships within the LACS protein family, we compared the full-length sequences of 128 LACS proteins from Arabidopsis thaliana and the four cotton species (Fig. S1). A phylogenetic tree was constructed with MEGA7 software and displayed as an unrooted tree (Fig. 1A). The LACS proteins were divided into six subgroups (A-F); subgroups B and F contained the most genes, with 8 and 9 GHLACS genes, respectively. Arabidopsis LACS genes were present in four of the six subgroups but absent from subgroups D and E. Notably, in each subgroup the number of genes in the tetraploid G. hirsutum was almost twice that in the diploid species, which is consistent with G. hirsutum being the product of hybridization between two diploid cotton species (G. arboreum and G. raimondii). We further analyzed the homologous genes of G. hirsutum and found that they evolved slowly after duplication events and are conserved at the protein level. In the phylogenetic tree, every species has gene pairs arising from the same node, indicating that the LACS genes of all species have undergone gene duplication events that led to the expansion of the family. The extent of duplication differs among species: upland cotton (G. hirsutum) and island cotton (G. barbadense) have markedly more LACS genes than the other species, indicating a substantial expansion of the LACS family in these two cottons. We also constructed a phylogenetic tree of Arabidopsis thaliana and upland cotton alone (Fig. 1B) and found no AtLACS genes in subgroup E, suggesting that subgroup E LACS genes may be unique to G. hirsutum.
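MEGA7 offers several tree-building methods, and this section does not state which method or substitution model was used. As an illustration of the distance step behind distance-based trees (for example neighbor-joining), the sketch below computes pairwise p-distances and Poisson-corrected distances from an existing multiple alignment, skipping any column where either sequence has a gap (pairwise deletion). The choice of distance, the gap handling and the toy input are assumptions; the code does not build the tree itself.

```python
import math
from itertools import combinations

GAP_CHARS = {"-", "?", "X"}  # treated as missing data


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (pairwise deletion)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns with a gap or unknown residue in either sequence
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected protein distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    return math.inf if p >= 1 else -math.log(1 - p)


def distance_matrix(alignment: dict[str, str], dist=poisson_distance):
    """Symmetric matrix {name: {name: distance}} for all aligned sequences."""
    names = list(alignment)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        matrix[n][m] = matrix[m][n] = dist(alignment[n], alignment[m])
    return matrix
```

The resulting matrix is the input a distance-based tree method would cluster; character-based methods such as maximum likelihood work from the alignment directly instead.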
Distribution of LACS gene family members on chromosomes
To further study the genomic distribution of LACS gene family members, we drew a chromosome distribution map (Fig. S2-A) and a statistical map (Fig. S2-B) using the GFF3 annotation files and gene IDs of each genome. The LACS genes of G. hirsutum, G. barbadense, G. arboreum and G. raimondii are distributed across the chromosomes of their respective genomes: 118 of the 119 family members mapped to assembled chromosomes, and only GaLACS22 was located on a scaffold. This suggests that the chromosomal distribution of the LACS genes has remained relatively stable during evolution. By examining gene positions, we found that G. hirsutum carries a LACS gene at a similar position on chromosome 1 of both the A and D subgenomes, and the same pattern was observed in G. barbadense. GHLACS16 and GHLACS34 are homologous genes, and their promoters contain similar cis-acting elements. In G. hirsutum, the A subgenome contains two more LACS genes than the D subgenome, and chromosomes A05 and D05 carry more genes than the other chromosomes.
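Placing family members on chromosomes amounts to looking up each gene ID in the GFF3 annotation. A minimal sketch, assuming the family members are annotated as `gene` features whose `ID` attribute matches the family gene IDs (attribute naming differs between genome releases, so treat this as an assumption):

```python
from collections import Counter


def parse_gff3_features(lines, feature_type="gene"):
    """Yield (seqid, start, end, strand, ID) for each GFF3 row of the given type."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds tag=value pairs separated by ";" (values are not URL-decoded here).
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        yield cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID")


def locate_family(lines, family_ids):
    """Map each family gene ID to (seqid, start, end, strand)."""
    return {
        gid: (seqid, start, end, strand)
        for seqid, start, end, strand, gid in parse_gff3_features(lines)
        if gid in family_ids
    }


def genes_per_chromosome(locations):
    """Count family members per chromosome or scaffold."""
    return Counter(loc[0] for loc in locations.values())
```

In practice the lines would come from an open annotation file, e.g. `locate_family(open(path), family_ids)`; the counts per chromosome are what a statistical map such as Fig. S2-B summarizes.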
In G. barbadense, the number of genes also differs between the two subgenomes, and chromosomes 1 and 5 of both the A and D subgenomes carry more genes than the other chromosomes. Interestingly, in both tetraploid species no LACS genes are found on chromosomes 4, 6, 12 and 13 of the A subgenome or on chromosomes 2, 8, 12 and 13 of the D subgenome, which may reflect the loss of LACS genes from these chromosomes during evolution. In G. arboreum, one gene was mapped to a scaffold, Chr01 and Chr05 each carried four genes, and no gene was found on Chr08. In G. raimondii, Chr01 and Chr05 likewise carried the most genes (3 and 4, respectively), whereas Chr02, Chr08, Chr12 and Chr13 carried no LACS genes. Notably, this pattern of absence in G. raimondii matches that of the D subgenome of the two tetraploid species (chromosomes 2, 8, 12 and 13), which supports to some extent previous reports that G. raimondii is the donor of the D subgenome of the two tetraploid cottons [14]. Overall, LACS genes are unevenly distributed across the 13 chromosomes of each cotton species, and there is no apparent correlation between the number of genes on a chromosome and its length.
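The observation that gene number is not correlated with chromosome length can be checked with a rank correlation. This is a sketch using Spearman's coefficient, computed as the Pearson correlation of average ranks; the section does not say which test was used, and the chromosome lengths and per-chromosome counts would come from the genome files. It returns only the coefficient; a p-value could be obtained with, for example, `scipy.stats.spearmanr`.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    return pearson(average_ranks(x), average_ranks(y))
```

A rank-based coefficient is a reasonable choice here because per-chromosome gene counts are small integers with many ties, which the average-rank handling accommodates.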
Conserved motif and gene structure analysis of GHLACS proteins
The amino acid sequences encoded by the GHLACS genes were analyzed with the MEME online tool, and a total of 10 conserved motifs were identified (Fig. 2); these motifs are distributed among the sequences of the different subgroups and may perform different functions. In subgroup A, gene lengths vary considerably, members contain 7–10 motifs, and their gene structures also differ markedly. In contrast, subgroups B and C have similar motif compositions: members of both subgroups contain all 10 motifs and numerous introns. Most members of subgroup D contain motifs 6, 2, 4, 7, 5, 3, 1 and 8, and their introns are generally long. All members of subgroup E except GHLACS13 contain motifs 6, 2, 4, 9, 5, 1 and 8; interestingly, GHLACS13 contains two copies of motif 9. Except for GHLACS7 and GHLACS27, all members of subgroup F contain all of the motifs except motif 8. Members of the same subgroup share similar motifs, indicating that protein structure is conserved within each subgroup, although the functions of most of the conserved motifs remain to be clarified. Exon lengths in the coding sequences also differ little, further indicating a degree of conservation. The high consistency of gene structure and intron phase distribution within each subgroup further supports the reliability of the phylogenetic grouping.
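MEME reports each motif as a position-specific probability matrix. To illustrate how such a motif can be located in a protein, the sketch below builds a log-odds position-specific scoring matrix (PSSM) from aligned motif sites, using a pseudocount and a uniform amino-acid background (both assumptions), and scans a sequence for the best-scoring window. This is the general motif-scanning technique, not MEME's own implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudocount=1.0):
    """Log-odds scores (bits) per column from equal-length aligned motif sites.

    Assumes a uniform background of 1/20 per amino acid.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all motif sites must have the same length")
    background = 1 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        column = [s[col].upper() for s in sites]
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(seq, pssm):
    """Score every window of the motif's width; non-standard residues score 0."""
    seq = seq.upper()
    width = len(pssm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return [
        sum(col.get(aa, 0.0) for col, aa in zip(pssm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    ]


def best_hit(seq, pssm):
    """Return (0-based start, score) of the highest-scoring window."""
    scores = scan(seq, pssm)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

The pseudocount keeps unseen residues from receiving a score of minus infinity, which matters when a motif is built from only a handful of sites.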
Cis-acting elements and expression analysis of promoters
To investigate the roles of promoter elements in abiotic stress responses, we analyzed the 2-kb sequences upstream of the start codon of each GHLACS gene (Table S3) together with the GHLACS phylogeny. We identified cis-acting elements responsive to plant hormones, light, drought and low temperature (Fig. 3). Light-responsive elements were the most numerous, followed by MYB-related elements. Most members contain MeJA-responsive and salicylic acid-responsive elements, and about half contain gibberellin-responsive elements. Notably, the jasmonic acid pathway has been shown to be involved in stress responses [15, 16], and methyl jasmonate (MeJA), a regulator of plant growth and development, has been shown to participate in regulating plant gene expression [17–19]. Exogenous MeJA can promote the accumulation of osmolytes such as proline and soluble sugars, which aids osmotic adjustment and helps plants adapt to waterlogging stress [20]. MeJA application can also enhance crop salt tolerance under salt stress, which not only mitigates the impact of salt stress on crop growth and development but also indirectly modifies the soil environment [21]. The large number of cis-acting elements related to growth, development and hormone responses suggests that GHLACS genes may be subject to complex transcriptional regulation. Even within the same subgroup, the cis-acting elements in gene promoters differ, including between highly similar homologous genes. Promoter analysis thus provides a deeper understanding of how the gene family responds to diverse plant hormones, which in turn aids in comprehensively understanding the regulatory network of the GHLACS gene family.
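The promoter analysis starts from the 2-kb region upstream of each start codon. A minimal sketch of that extraction, plus a both-strand count of a short consensus sequence, is below. It assumes 1-based inclusive coordinates of the feature that begins with the start codon and a chromosome sequence held in memory. The motif in the example (TGACG, which PlantCARE annotates as a MeJA-responsive element) is used only for illustration; dedicated tools such as PlantCARE rely on curated element definitions rather than a single exact string.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, strand: str, start: int, end: int,
                    length: int = 2000) -> str:
    """`length` bp upstream of the start codon, read 5'->3' on the gene's strand.

    `start`/`end` are 1-based inclusive coordinates of the feature that begins with
    the start codon: on the + strand the codon sits at `start`, on the - strand at `end`.
    """
    if strand == "+":
        return chrom_seq[max(0, start - 1 - length):start - 1]
    if strand == "-":
        return reverse_complement(chrom_seq[end:end + length])
    raise ValueError("strand must be '+' or '-'")


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq[i:i + len(motif)] == motif for i in range(len(seq) - len(motif) + 1))


def count_both_strands(seq: str, motif: str) -> tuple[int, int]:
    """Occurrences on the given strand and on its reverse complement."""
    return count_motif(seq, motif), count_motif(reverse_complement(seq), motif)
```

Counting both strands matters because cis-acting elements can function in either orientation relative to the gene.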
We used quantitative real-time PCR to examine the expression patterns of LACS genes during seed development and germination. Most LACS genes were expressed at relatively low levels, and only a few family members responded significantly to salt stress. The differences in expression were greatest among members of subgroups A and B, suggesting that these two subgroups are important for plant adaptation to adverse environments. In particular, GHLACS20 was significantly up-regulated and GHLACS38 significantly down-regulated under stress. Interestingly, members of subgroup D hardly responded to salt stress. The response differed among family members, just as the expression of a single gene can differ among tissues. We hypothesize that this variation is related to the different functional elements each member possesses, as well as to the distinct adaptability of each tissue to its environment.
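The section does not state how relative expression was calculated. A common choice for qRT-PCR data is the 2^-ΔΔCt method (Livak and Schmittgen), sketched below under its usual assumption that the target and reference genes amplify with near-100% efficiency. The reference gene is not specified here, and the Ct values in the checks are hypothetical.

```python
import math
from statistics import mean


def relative_expression(target_ct, ref_ct, target_ct_ctrl, ref_ct_ctrl):
    """Fold change (treated vs control) by the 2^-ΔΔCt method.

    Each argument is a list of replicate Ct values. Assumes ~100% amplification
    efficiency for both the target and the reference gene.
    """
    delta_ct_treated = mean(target_ct) - mean(ref_ct)        # normalize to reference gene
    delta_ct_control = mean(target_ct_ctrl) - mean(ref_ct_ctrl)
    delta_delta_ct = delta_ct_treated - delta_ct_control     # compare with control sample
    return 2 ** (-delta_delta_ct)


def log2_fold_change(fold: float) -> float:
    """Log2 scale, where up- and down-regulation are symmetric around 0."""
    return math.log2(fold)
```

A fold change above 1 indicates up-regulation under treatment and below 1 down-regulation; on the log2 scale the two directions become symmetric, which makes heat maps easier to read.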
Collinearity analysis
To delineate the positional relationships, homology, expansion history and chromosomal arrangement of LACS genes, we performed a collinearity analysis of four cotton species (G. hirsutum, G. barbadense, G. arboreum and G. raimondii). First, the genome-wide protein sequences of the cotton species were compared with BLAST; homologous gene pairs were then identified with MCScanX [22] and, together with the chromosome length files of each genome, visualized with Circos [23] as circle diagrams (Fig. S3). A total of 489 duplicated gene pairs were identified across the four cotton species (Table S4), comprising 35 segmental duplication pairs and 454 whole-genome duplication (WGD) pairs. The 38 LACS genes of upland cotton formed 40 gene pairs (Fig. S3-c), of which 11 were segmental duplications and 29 WGDs. The 39 LACS genes of island cotton formed 42 gene pairs (Fig. S3-h), of which 11 were segmental duplications and 31 WGDs. Interestingly, no tandem duplications were found among the LACS genes of the tetraploid species. In G. arboreum, the 22 genes formed 8 homologous gene pairs (Fig. S3-d), and in G. raimondii the 20 genes formed 4 pairs (Fig. S3-a); all of these were segmental duplications. These results indicate that segmental duplication and WGD were the main drivers of LACS gene expansion, whereas tandem duplication appears to have played little or no role in the evolution of the LACS gene family. Between the two diploids (Fig. S3-f), G. arboreum and G. raimondii shared 31 gene pairs. Between diploid and tetraploid species, there were 108 gene pairs between G. arboreum and G. barbadense (Fig. S3-e) and 70 gene pairs between G. arboreum and G. hirsutum. Moreover, G. barbadense (Fig. S3-g) and G. hirsutum (Fig. S3-j) shared 68 gene pairs, while G. raimondii shared 58 gene pairs (Fig. S3-b) with G. hirsutum and 60 gene pairs (Fig. S3-i) with G. barbadense.
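MCScanX groups duplicated genes into categories such as whole-genome/segmental, tandem, proximal and dispersed. The sketch below classifies a gene pair with simplified rules loosely modeled on those categories: collinear pairs count as segmental/WGD, adjacent genes on the same chromosome as tandem, and nearby genes within a window as proximal. The `Gene` record, the gene-rank representation and the 20-gene window are hypothetical, and this is not MCScanX's own algorithm. Separating segmental duplication from WGD, as this section does, presumably needs extra information (such as block context or Ks) that the sketch does not model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: chromosome plus its order along that chromosome."""
    gene_id: str
    chrom: str
    rank: int  # 0, 1, 2, ... in chromosomal order


def classify_pair(a, b, collinear_pairs, proximal_window=20):
    """Simplified duplication category for one gene pair.

    collinear_pairs: set of frozensets of gene IDs found in collinear blocks.
    proximal_window: hypothetical maximum gene distance for 'proximal'.
    """
    if frozenset((a.gene_id, b.gene_id)) in collinear_pairs:
        return "segmental/WGD"
    if a.chrom == b.chrom:
        gap = abs(a.rank - b.rank)
        if gap == 1:
            return "tandem"  # adjacent genes on the same chromosome
        if gap <= proximal_window:
            return "proximal"
    return "dispersed"


def tally(pairs, genes, collinear_pairs):
    """Count categories over (gene_id, gene_id) pairs."""
    return Counter(classify_pair(genes[x], genes[y], collinear_pairs) for x, y in pairs)
```

Under this kind of scheme, the absence of tandem pairs reported above simply means that no duplicated LACS pair consisted of neighbouring genes on one chromosome.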
Determination and analysis of selective pressure (Ka/Ks)
During evolution, duplicated genes may undergo three functional fates: nonfunctionalization, in which one copy loses its original function; subfunctionalization, in which each copy retains only part of the original function; and neofunctionalization, in which one copy acquires a new function [24]. We used the ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks) to infer the level of selective constraint and the selection pressure acting on gene pairs during evolution. We calculated Ka, Ks and Ka/Ks for 357 homologous gene pairs (Table S5) from ten combinations of the four cotton species (Ga-Ga, Ga-Gr, GB-Ga, GB-GB, GB-Gr, GH-Ga, GH-GB, GH-GH, GH-Gr, Gr-Gr). Ka/Ks < 1 indicates purifying selection, meaning that natural selection removes deleterious mutations and keeps the protein unchanged. Ka/Ks > 1 indicates positive selection, meaning that natural selection drives protein evolution by rapidly fixing mutations in the population and thereby accelerating gene evolution. Ka/Ks = 1 indicates neutral evolution, meaning that natural selection does not act on the mutations [25]. Using this ratio, we assessed the selection pressure on the duplicated gene pairs. Ka/Ks was greater than 1 for 10 gene pairs and less than 1 for 347 gene pairs, and 303 gene pairs had Ka/Ks values between 0 and 0.49. Thus, about 97% of LACS gene pairs in the four cotton species had Ka/Ks < 1 (Fig. 4), indicating that these genes have been under strong purifying selection and are highly conserved, consistent with essential functions that are difficult to alter.
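Ka and Ks are usually estimated by counting nonsynonymous and synonymous sites and differences (for example with the Nei-Gojobori method) and then correcting for multiple hits; the software and estimation method actually used are not stated in this section. The sketch below shows only the final step, assuming the proportions of nonsynonymous (pN) and synonymous (pS) differences per site are already known: it applies the Jukes-Cantor correction, forms Ka/Ks and labels the mode of selection with the thresholds described above.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site from a proportion of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(p_n: float, p_s: float):
    """Return (Ka, Ks, Ka/Ks) from per-site nonsynonymous and synonymous proportions."""
    ka, ks = jukes_cantor(p_n), jukes_cantor(p_s)
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    return ka, ks, ka / ks


def selection_mode(ratio: float, tol: float = 0.0) -> str:
    """Label selection by Ka/Ks: < 1 purifying, > 1 positive, = 1 neutral."""
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"
```

With the counts reported above, 347/357 ≈ 0.972, which is the "about 97%" of pairs falling in the purifying class.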
Specific expression of LACS genes
To gain a deeper understanding of the role of GHLACS genes in plant growth and stress responses, we selected 12 genes from different subgroups of the phylogenetic tree and analyzed their expression in young roots, hypocotyls and cotyledons by qRT-PCR. As shown in Fig. 5, most GHLACS members responded to salt stress to varying degrees, whereas GHLACS24 and GHLACS38 showed almost no response in any tissue. Specifically, GHLACS23 responded to salt stress only in cotyledons, GHLACS3 in young roots and cotyledons, and GHLACS5, GHLACS16 and GHLACS20 in hypocotyls and cotyledons. GHLACS25, GHLACS31 and GHLACS32 also responded to some degree. In particular, GHLACS25 showed a strong response to salt stress at all time points and in all tissues examined, most clearly in hypocotyls and cotyledons. We therefore focused on this gene.
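The section does not state how a response to salt stress was judged statistically. One common option, assumed here, is to compare replicate expression values (for example ΔCt values) between control and salt-treated samples with Welch's t-test. The sketch computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom in pure Python; a p-value could be obtained with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two replicate groups.

    Does not assume equal variances; each group needs at least 2 values.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two replicates")
    va = variance(a) / len(a)  # squared standard error of group a (sample variance / n)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Welch's version is a cautious default for qRT-PCR data, where treated and control replicates often differ in spread.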
Interaction network of GHLACS proteins
To gain insight into the function of the GHLACS proteins, we used the STRING database to predict protein interaction networks based on the sequences of their Arabidopsis homologs (Fig. 6), so that the potential functions of GHLACS proteins could be inferred from those of the AtLACS proteins. The Arabidopsis homolog of GHLACS25 is AtLACS4. AtLACS4 interacts with a closely related AtLACS family member, suggesting that GHLACS25 may also be involved in cellular lipid synthesis and subsequent degradation through β-oxidation [26]. AT5G60335, a hydroxyacyl-thioester dehydratase protein, interacts with AtLACS4, suggesting that GHLACS25 may be involved in the dehydration of 3-hydroxyacyl-ACP intermediates [27]. In addition, the association between LACS4 and FATB may be important for supplying the saturated fatty acids needed for plant growth and seed development [28]. The interaction between LACS4 and ECH2, which encodes a monofunctional enoyl-CoA hydratase, suggests that GHLACS25 may participate in the degradation of cis-unsaturated fatty acids [29]. In Arabidopsis thaliana, ech2 mutants show seedling growth defects [30], and ECH2 plays a role in ethylene signal transduction [31]. Ethylene is important not only for the regulation of plant growth and development but also for plant responses to various stresses [32, 33]. Interestingly, in cotton, the interactions of GHLACS25 with a GHECH2 protein and a GHECH2-like protein were verified by dual-luciferase and yeast two-hybrid assays (Fig. 7), further supporting an important role for LACS proteins in growth, development and stress resistance.
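STRING assigns each predicted interaction a combined confidence score between 0 and 1, with 0.4, 0.7 and 0.9 commonly used as medium, high and highest confidence cut-offs; the cut-off used for Fig. 6 is not stated. As a sketch, assuming the network has been exported as (protein A, protein B, combined score) rows, the function below lists the partners of a query protein above a chosen threshold, strongest first. The protein names in the checks are placeholders.

```python
from collections import defaultdict


def neighbors(edges, query, min_score=0.7):
    """Partners of `query` with combined score >= min_score, strongest first.

    `edges` is an iterable of (protein_a, protein_b, combined_score) tuples with
    scores in [0, 1]; duplicate edges keep their highest score.
    """
    partners = defaultdict(float)
    for a, b, score in edges:
        if score < min_score:
            continue
        if a == query:
            partners[b] = max(partners[b], score)
        elif b == query:
            partners[a] = max(partners[a], score)
    return sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
```

Raising `min_score` trades coverage for confidence, so reporting the threshold alongside a network figure makes it easier to interpret.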
VIGS silencing of GHLACS25
When the cotyledons were fully expanded (10–15 days), cotton cotyledons were infiltrated with either the empty vector (EM) or the virus-induced gene silencing (VIGS) construct targeting GHLACS25. We then measured GHLACS25 expression in wild-type (WT), EM and VIGS plants by qRT-PCR (Fig. 8A). GHLACS25 expression in the leaves of WT and EM plants was similar and significantly higher than in VIGS plants, which barely expressed GHLACS25, indicating that the gene had been successfully silenced. WT, EM and VIGS plants were then treated with 200 mM NaCl. After 2 days of treatment, all three groups had wilted to varying degrees: the infiltrated plants were clearly more stressed than the uninfiltrated WT plants, with VIGS plants the most severely affected, followed by EM plants and then WT plants (Fig. 8B and C). This result suggests that silencing GHLACS25 reduces the salt tolerance of G. hirsutum plants.
Fig. 8 VIGS assay. A Expression of GHLACS25 in leaves of VIGS-infiltrated plants; B and C Phenotypes of VIGS-infiltrated cotton after 2 days of salt stress
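Two small calculations sit behind this experiment. The sketch below (a) estimates silencing efficiency as the percentage reduction in relative GHLACS25 expression in VIGS plants compared with EM plants, and (b) gives the mass of NaCl needed for the 200 mM treatment solution (molar mass 58.44 g/mol). The efficiency formula and the replicate values in the checks are assumptions for illustration; the section does not report a silencing percentage.

```python
from statistics import mean

NACL_MOLAR_MASS = 58.44  # g/mol


def silencing_efficiency(vigs_rel_expr, em_rel_expr):
    """Percent reduction of target expression in VIGS plants relative to EM plants.

    Inputs are lists of relative expression values (already normalized to a
    reference gene) for individual plants.
    """
    return 100 * (1 - mean(vigs_rel_expr) / mean(em_rel_expr))


def nacl_grams(molarity_mol_per_l, volume_l):
    """Mass of NaCl (g) needed for a solution of the given molarity and volume."""
    return molarity_mol_per_l * volume_l * NACL_MOLAR_MASS
```

For 1 L of 200 mM NaCl this gives 0.2 × 1 × 58.44 ≈ 11.69 g. Comparing VIGS plants with EM rather than WT plants isolates the effect of silencing from any effect of the viral vector itself.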
Induced expression of GHLACS25 in yeast
Because the response to salt stress at the whole-plant level is thought to depend largely on cellular tolerance mechanisms [34], and because basic ion transport mechanisms are conserved, Saccharomyces cerevisiae is a valuable model system for understanding ion homeostasis in plants [35]. In our experiments, yeast strains carrying pYES2-GHLACS25 or the empty pYES2 vector were grown for 5 days on SG-Ura agar plates with a gradient of salt concentrations, and the pYES2-GHLACS25 strain clearly grew more weakly than the pYES2 strain (Fig. 9A). We then cultured the pYES2-GHLACS25 and pYES2 strains in SG-Ura liquid medium with a gradient of salt concentrations for 24 hours, measured the optical density (OD) with a spectrophotometer, and calculated the relative growth rate of the cells from the OD values (Fig. 9B). The two strains grew differently at different salt concentrations: as the salt concentration increased, growth of the pYES2 strain was inhibited more severely than that of the pYES2-GHLACS25 strain, and this difference became increasingly pronounced. Taking the two experiments together, we concluded that induced expression of GHLACS25 increased the salt tolerance of yeast cells.