Using a comprehensive BLASTP and SMART search, we identified a total of seven sHsp isoforms (Hspb 1, Hspb 2, Hspb 3, Hspb 7, Hspb 8, Hspb 9, and Hspb 11) in the genomes of six selected fish species: three warm freshwater species (C. carpio, L. rohita, and D. rerio) and three cold marine water species (S. salar and O. mykiss, both anadromous, and Clupea harengus, which is oceanodromous). We retrieved the protein and CDS sequences of 42 non-redundant proteins encoded by the seven sHsp genes in the representative fish species (Table 1). Sequences were excluded from the analysis if they could not be found in any of the species under consideration. The CDS were assembled into a comprehensive multiple sequence alignment, which was then used as input for generating Bayesian phylogenetic trees and for the genetic characterization of each representative fish species. Because selection pressure is a potential factor in the development of adaptive features, and because the ability of fishes to tolerate stress is linked to the presence of sHsps, we also analyzed the genomic sequences of the sHsp genes to estimate the selection pressure imposed on them. In addition, we investigated the evolutionary processes responsible for the formation of sHsps in fish, as well as their structural and functional features.
Phylogenetic Analysis
The sHsp genes of the six representative fish species were analyzed phylogenetically, and all of the identified genes fell into three major groups. The sHsps-A group comprises Hspb 1/2/3/8, the sHsps-B group comprises Hspb 7, and the sHsps-C group comprises Hspb 9/11. Among the representative fish species, the sHsp gene family showed stronger homology among the warm freshwater species (C. carpio, L. rohita, and D. rerio) than between these and the cold marine water species (S. salar, O. mykiss, and C. harengus). Similarly, the sHsp genes of the cold-water species were more similar to each other than to those of the warm-water species. These findings also indicate that the Hspb 9 gene has been lost in S. salar and O. mykiss, both of which are cold-water, migratory, anadromous species (Fig. 1). Multiple sequence alignment, which is essential for establishing the structure and function of proteins, was used to locate the conserved regions in the functional domain of each small heat shock protein isoform. In the alignment, light red indicates identical amino acids, light green indicates similar amino acids, light yellow indicates amino acids that share neither identity nor similarity with any other residue, and white indicates gaps (Supplementary Fig. 1).
Structural Organizations of sHsps:
To characterize the structure of the sHsp gene family in fish species, we analyzed motif patterns, conserved domains, and gene organization. We began by comparing the exon-intron architecture of the sHsp genes (Fig. 2A). The intron and UTR architectures differed considerably, as did the number of introns and exons present in each species. Among the warm-water species, C. carpio showed strong structural similarity to L. rohita across most of the sHsp variants compared with the other representative fish species. Among the cold-water species, S. salar (Atlantic salmon; anadromous) showed higher gene-structure similarity to O. mykiss (rainbow trout; anadromous). Additionally, several intron loss or gain events were found in Hspb 3, Hspb 9, and Hspb 11. Changes in gene structure may therefore have been a driving force behind the divergence of the sHsp family during fish evolution.
The structural diversity was then analyzed with MEME, and the results showed that the sHsp proteins shared one common motif corresponding to the conserved domain (Fig. 2B). In general, a common motif distribution appeared for the sHsps within each group of fishes (warm freshwater fishes and cold marine water fishes), suggesting that the sHsps of different fishes have diverse roles adapted to different environments. The maximum number of conserved motifs (5) was detected in the Hspb 1/2/8/11 genes, and the minimum (4) in the Hspb 3/7/9 genes. In addition, we discovered a number of distinct motifs in some sHsp groups. For example, Motif-4 and Motif-10 were restricted to Hspb 11 in each fish group (Fig. 2B).
The Conserved Domain Database Basic Local Alignment Search Tool (CDD BLAST) was used to identify the conserved domains, as depicted in Fig. 2C. The most widely diversified sHsps consist of three domains. The main domain, the alpha-crystallin HSP-p23-like superfamily domain (ACD), was detected in all sHsps in both groups of fishes. Two minor domains were also detected: a crystallin superfamily domain in Hspb 2 of S. salar and O. mykiss, and a Herpes-BLF1 domain in Hspb 3 of O. mykiss. The amino acid sequence of the ACD was almost the same length across sHsps, with only small differences within each group of fishes. A better understanding of the differences and similarities among these protein structures may allow the function of sHsps to be investigated more thoroughly in a variety of species (Fig. 2C).
sHsp gene family's physiochemical characteristics:
To determine the physiochemical properties of the sHsps in fishes, we examined the chromosomal distribution of the sHsp genes, the number of exons, the molecular weight (Da), the number of amino acids (A.A) in each peptide, the aliphatic index (AI), the isoelectric point (pI), the instability index (II), and the grand average of hydropathicity (GRAVY). The results are presented in Table 2. In the warm freshwater fishes, each sHsp variant was located on a comparable chromosome across species, whereas in the cold-water fishes the variants were distributed across different chromosomes. The molecular weight of the sHsps ranged from 10 to 25 kDa and the isoelectric point from 4.68 to 6.52. The aliphatic index values were > 29.93, which indicates the thermostable nature of all sHsps in both groups of fishes. With isoelectric points below 7, the proteins are acidic in nature, and they appeared unstable on the basis of instability index values above 40. All sHsps in the representative species were hydrophilic, as indicated by their negative GRAVY values.
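Two of the indices reported above can be computed directly from a protein sequence. The following is a minimal sketch, assuming the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index; the function names and the toy peptide are hypothetical and are not taken from this study.

```python
# Kyte-Doolittle hydropathy values, used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)
    pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


if __name__ == "__main__":
    toy = "MAVLKDESGR"  # hypothetical peptide, not a sequence from this study
    print(f"GRAVY = {gravy(toy):.3f}, AI = {aliphatic_index(toy):.2f}")
```

A negative GRAVY value points to a hydrophilic protein, and a higher aliphatic index is generally read as a sign of greater thermostability.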
3.6. Chromosome Localization and gene Duplication analysis of Hsp in Fishes
Figure 3(A-F) illustrates the chromosomal localization of the sHsp genes in each fish species. To better understand the evolutionary history of the gene family, we analyzed the gene-duplication events in each investigated fish and determined the ratio of nonsynonymous substitutions (Ka) to synonymous substitutions (Ks) for each homologous gene pair. In C. carpio, three homologous gene pairs (Hspb 1-Hspb 2, Hspb 7-Hspb 8, and Hspb 9-Hspb 11) were found to be segmentally duplicated (Table 3). In L. rohita, three homologous gene pairs were detected, of which two (Hspb 1-Hspb 3 and Hspb 2-Hspb 8) exhibited tandem duplication and one (Hspb 9-Hspb 11) showed segmental duplication. In D. rerio, three homologous gene pairs were detected: two (Hspb 9-Hspb 7 and Hspb 11-Hspb 8) showed segmental duplication and one (Hspb 1-Hspb 2) exhibited tandem duplication. S. salar showed two homologous gene pairs, Hspb 8-Hspb 1 and Hspb 11-Hspb 3, that were detected to be segmentally and tandemly duplicated. O. mykiss also had two homologous gene pairs, Hspb 2-Hspb 3 and Hspb 11-Hspb 1, both of which were segmentally duplicated. C. harengus exhibited three gene pairs: Hspb 8-Hspb 2 and Hspb 3-Hspb 1 were tandemly duplicated, and Hspb 9-Hspb 11 was segmentally duplicated. Most sHsp gene pairs in the representative fish had Ka/Ks ratios < 1; the exception was Hspb 11-Hspb 3 in S. salar, with a ratio of 1.13 (Table 3).
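The Ka/Ks ratios can be recomputed from the Ka and Ks columns of Table 3. The sketch below assumes the conventional reading of the ratio (below 1 for purifying selection, about 1 for neutral evolution, above 1 for positive selection); the function name and the tolerance are hypothetical choices made for illustration.

```python
from typing import Tuple


def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> Tuple[float, str]:
    """Return the Ka/Ks ratio and the selection regime it is usually taken to indicate."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        label = "purifying"
    elif ratio > 1 + tol:
        label = "positive"
    else:
        label = "neutral"
    return ratio, label


if __name__ == "__main__":
    # Ka and Ks values as listed in Table 3.
    pairs = {
        "C. carpio Hspb 9-Hspb 11": (0.36225, 0.56935),
        "S. salar Hspb 11-Hspb 3": (0.3245, 0.2876),
    }
    for name, (ka, ks) in pairs.items():
        ratio, label = classify_selection(ka, ks)
        print(f"{name}: Ka/Ks = {ratio:.6f} ({label})")
```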
Table 2
Physiochemical properties of Hsps in fishes
Species | Gene | Chr | MW (Da) | AA | pI | II | AI | GRAVY |
---|---|---|---|---|---|---|---|---|
Cyprinus carpio | Hsp beta 1 | B5 | 23512.59 | 211 | 6.38 | 73.71 | 58.25 | -0.519 |
Danio rerio | Hsp beta 1 | 5 | 22408.57 | 199 | 6.52 | 71.52 | 58.74 | -0.529 |
Labeo rohita | Hsp beta 1 | 5 | 23349.62 | 211 | 6.52 | 69.66 | 57.77 | -0.454 |
Salmo salar | Hsp beta 1 | 13 | 23420.61 | 208 | 5.48 | 47.47 | 83.41 | -0.348 |
Oncorhynchus mykiss | Hsp beta 1 | 12 | 19631.23 | 178 | 5.45 | 52.25 | 88.15 | -0.320 |
Clupea harengus | Hsp beta 1 | 8 | 25355.86 | 224 | 6.08 | 90.93 | 60.85 | -0.593 |
Cyprinus carpio | Hsp beta 2 | B5 | 19439.24 | 168 | 5.73 | 29.94 | 86.96 | -0.477 |
Danio rerio | Hsp beta 2 | 5 | 19628.43 | 169 | 5.54 | 33.41 | 84.73 | -0.482 |
Labeo rohita | Hsp beta 2 | 5 | 19502.27 | 168 | 5.54 | 33.30 | 84.64 | -0.520 |
Salmo salar | Hsp beta 2 | 11 | 18855.21 | 167 | 6.10 | 32.12 | 77.54 | -0.453 |
Oncorhynchus mykiss | Hsp beta 2 | Y | 18883.26 | 167 | 6.10 | 36.36 | 79.28 | -0.437 |
Clupea harengus | Hsp beta 2 | 8 | 19585.24 | 169 | 5.52 | 40.72 | 86.45 | -0.453 |
Cyprinus carpio | Hsp beta 3 | B5 | 17243.57 | 150 | 5.08 | 60.07 | 81.20 | -0.433 |
Danio rerio | Hsp beta 3 | 5 | 17114.37 | 150 | 5.02 | 52.54 | 81.87 | -0.423 |
Labeo rohita | Hsp beta 3 | 5 | 17132.38 | 150 | 4.96 | 64.91 | 78.67 | -0.435 |
Salmo salar | Hsp beta 3 | 24 | 19577.08 | 173 | 4.96 | 45.42 | 81.04 | -0.454 |
Oncorhynchus mykiss | Hsp beta 3 | 6 | 25888.33 | 230 | 4.97 | 39.15 | 78.65 | -0.486 |
Clupea harengus | Hsp beta 3 | 7 | 16709.74 | 149 | 5.05 | 31.47 | 75.97 | -0.493 |
Cyprinus carpio | Hsp beta 7 | B23 | 17514.36 | 160 | 6.07 | 35.89 | 55.50 | -0.608 |
Danio rerio | Hsp beta 7 | 23 | 17532.29 | 161 | 5.77 | 41.37 | 56.96 | -0.586 |
Labeo rohita | Hsp beta 7 | 23 | 17534.36 | 160 | 7.78 | 38.07 | 56.06 | -0.578 |
Salmo salar | Hsp beta 7 | 22 | 10667.85 | 96 | 4.68 | 51.61 | 71.98 | -0.520 |
Oncorhynchus mykiss | Hsp beta 7 | 7 | 10508.71 | 96 | 4.79 | 47.50 | 73.02 | -0.388 |
Clupea harengus | Hsp beta 7 | 4 | 17621.43 | 161 | 5.51 | 50.16 | 58.70 | -0.552 |
Cyprinus carpio | Hsp beta 8 | A5 | 23715.45 | 209 | 5.12 | 59.26 | 49.00 | -0.861 |
Danio rerio | Hsp beta 8 | 5 | 24404.19 | 216 | 4.89 | 58.37 | 54.17 | -0.772 |
Labeo rohita | Hsp beta 8 | 5 | 23630.38 | 209 | 5.02 | 58.19 | 50.86 | -0.839 |
Salmo salar | Hsp beta 8 | 24 | 23316.06 | 208 | 5.19 | 64.81 | 53.73 | -0.677 |
Oncorhynchus mykiss | Hsp beta 8 | 11 | 23925.68 | 215 | 5.02 | 70.62 | 50.79 | -0.722 |
Clupea harengus | Hsp beta 8 | 7 | 23364.10 | 2010 | 4.83 | 55.68 | 54.33 | -0.678 |
Cyprinus carpio | Hsp beta 9 | B3 | 23814.34 | 211 | 5.17 | 67.76 | 68.86 | -0.652 |
Danio rerio | Hsp beta 9 | 3 | 23249.66 | 204 | 4.85 | 71.87 | 63.09 | -0.790 |
Labeo rohita | Hsp beta 9 | 3 | 23725.04 | 211 | 4.65 | 68.85 | 65.69 | -0.655 |
Salmo salar | Hsp beta 9 | - | - | - | - | - | - | - |
Oncorhynchus mykiss | Hsp beta 9 | - | - | - | - | - | - | - |
Clupea harengus | Hsp beta 9 | 1 | 28220.54 | 259 | 5.26 | 65.49 | 65.21 | -0.524 |
Cyprinus carpio | Hsp beta 11 | A21 | 23861.04 | 207 | 5.42 | 56.10 | 58.84 | -0.722 |
Danio rerio | Hsp beta 11 | 21 | 23764.69 | 205 | 5.29 | 51.22 | 58.00 | -0.776 |
Labeo rohita | Hsp beta 11 | 21 | 23858.00 | 207 | 5.67 | 61.40 | 58.41 | -0.716 |
Salmo salar | Hsp beta 11 | 13 | 23838.24 | 209 | 5.23 | 56.03 | 60.14 | -0.619 |
Oncorhynchus mykiss | Hsp beta 11 | 5 | 23736.08 | 209 | 5.33 | 46.17 | 57.85 | -0.612 |
Clupea harengus | Hsp beta 11 | 12 | 22933.18 | 202 | 5.55 | 48.10 | 63.76 | -0.624 |
Chr: chromosome; MW: molecular weight in Dalton; AA: number of amino acids; pI: isoelectric point; II: instability index; AI: aliphatic index; GRAVY: grand average of hydropathicity index.
Table 3
Analysis of Ka/Ks ratio for each gene duplication pair of sHsp in Fishes.
Species | Gene pair | Chr | Duplication | Ka | Ks | Ka/Ks |
---|---|---|---|---|---|---|
C. carpio | Hspb 9-Hspb 11 | A15/A21 | SD | 0.36225 | 0.56935 | 0.636252 |
C. carpio | Hspb 1-Hspb 2 | B5/A5 | SD | 0.2225 | 0.3827 | 0.581395 |
C. carpio | Hspb 7-Hspb 8 | A15/A5 | SD | 0.25845 | 0.3876 | 0.666796 |
D. rerio | Hspb 1-Hspb 3 | 5/5 | TD | 0.32925 | 0.4813 | 0.684085 |
D. rerio | Hspb 2-Hspb 8 | 5/5 | TD | 0.31775 | 0.4994 | 0.636264 |
D. rerio | Hspb 9-Hspb 11 | 3/21 | SD | 0.312 | 0.375 | 0.832000 |
L. rohita | Hspb 9-Hspb 7 | 3/23 | SD | 0.34695 | 0.5725 | 0.606026 |
L. rohita | Hspb 11-Hspb 8 | 21/5 | SD | 0.38015 | 0.58195 | 0.653235 |
L. rohita | Hspb 1-Hspb 2 | 5/5 | TD | 0.2013 | 0.38695 | 0.520222 |
S. salar | Hspb 8-Hspb 1 | 24/13 | SD | 0.3425 | 0.5791 | 0.591435 |
S. salar | Hspb 11-Hspb 3 | 21/24 | SD | 0.3245 | 0.2876 | 1.128303 |
O. mykiss | Hspb 2-Hspb 3 | 13/6 | SD | 0.26965 | 0.5233 | 0.515288 |
O. mykiss | Hspb 11-Hspb 1 | 5/12 | SD | 0.28725 | 0.4073 | 0.705254 |
C. harengus | Hspb 8-Hspb 2 | 7/8 | TD | 0.3249 | 0.5831 | 0.557194 |
C. harengus | Hspb 3-Hspb 1 | 7/8 | TD | 0.32085 | 0.42045 | 0.763112 |
C. harengus | Hspb 9-Hspb 11 | 1/12 | SD | 0.3111 | 0.3904 | 0.796875 |
Chr: chromosome; Ka: non-synonymous substitutions; Ks: synonymous substitutions; SD: segmental duplication; TD: tandem duplication.
Secondary Structure analysis of Heat Shock Proteins
The secondary structure and functional annotations of these proteins were predicted by homologous sequence analysis of the sHsps of common carp. Knowing how proteins bind to other proteins, as well as to DNA and RNA, is essential for understanding their biological activity. The structural annotations (secondary structure, solvent accessibility, and protein disorder) and functional annotations (protein, DNA, and RNA binding sites) of each sHsp are shown in Fig. 4A. The output values give the reliability of each positive prediction on a scale from 0 to 100, where a higher score means a more reliable prediction. The color codes blue, pink, and yellow represent low (RI = 0–33), intermediate (RI = 34–66), and high (RI = 67–100) reliability, respectively, for each structural and functional annotation. The predicted protein binding sites in the corresponding domain region suggest that distinct regions within Hspb 1, Hspb 2, Hspb 3, Hspb 7, and Hspb 8 are involved in forming protein-protein interfaces. In Hspb 9 and Hspb 11, protein binding sites were absent from the domain region. These characteristics are necessary for the interaction of a protein with its targets. A positive DNA binding prediction was observed only in Hspb 9, and its reliability was weak (RI = 0–33). RNA binding sites were present in all sHsps, with the strongest positive predictions in Hspb 3 and Hspb 7.
Hydrophobic cluster analysis (HCA) is a method of protein sequence analysis that gives access to the diverse range of foldable protein structures. At the amino acid sequence level, the difference between order and disorder is visible because most order-promoting residues are strong hydrophobic amino acids, including V, I, L, M, Y, and F. These amino acids primarily belong to regular secondary structures and contribute to the densely packed cores of globular domains. HCA exploits the simple distinction between hydrophobic and non-hydrophobic amino acids by using a two-dimensional representation of the protein sequence. The hydrophobic clusters correspond to regular secondary structures: horizontal shapes indicate the alpha state, vertical shapes indicate the beta state, and hydrophobic residues are outlined. Proline, glycine, serine, and threonine residues (serine and threonine being potential phosphorylation sites) are represented by special symbols. HCA revealed hydrophobic clusters in each sHsp isoform in regions consistent with the conserved domain (Fig. 4B). The ACD of all sHsps showed a higher hydrophobic character, which may favor multimerization and play a role in trapping the exposed hydrophobic areas of unfolding proteins. The varied stress-protective and oligomeric structural features of the different sHsps may be modulated by the combination of a conserved ACD with a variable C-terminal extension and N-terminal domain. In addition, a high HC density indicates the existence of foldable regions. These regions can belong to globular, soluble, or membrane domains, depending on the overall amount of hydrophobic amino acids and the lengths of the HCs they contain.
On the other hand, regions that are devoid of HCs or that only have a limited number of HCs that are either sparsely distributed or only very small in size often correspond to entirely disordered sequences or flexible linkers.
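The idea of delineating hydrophobic clusters and relating their density to foldable regions can be illustrated in one dimension. The sketch below is a simplification and not the two-dimensional HCA plot used for Fig. 4B: it assumes the strong hydrophobic residues named above (V, I, L, M, Y, F), treats proline as a cluster breaker, and closes a cluster after more than three consecutive non-hydrophobic residues. The function names, both thresholds, and the toy sequence are hypothetical.

```python
from typing import List, Tuple

HYDROPHOBIC = set("VILMYF")  # strong hydrophobic residues named in the text


def hydrophobic_clusters(seq: str, max_gap: int = 3,
                         min_hydrophobic: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) spans of hydrophobic clusters.

    A cluster starts and ends on a hydrophobic residue. It is closed when a
    proline is met or when more than `max_gap` non-hydrophobic residues
    separate two successive hydrophobic residues.
    """
    seq = seq.upper()
    clusters: List[Tuple[int, int]] = []
    start = last = None  # 0-based positions of first/last hydrophobic residue
    count = 0            # hydrophobic residues in the open cluster

    def close() -> None:
        nonlocal start, last, count
        if start is not None and count >= min_hydrophobic:
            clusters.append((start + 1, last + 1))
        start = last = None
        count = 0

    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            if start is not None and i - last - 1 > max_gap:
                close()
            if start is None:
                start = i
            last = i
            count += 1
        elif aa == "P":
            close()
    close()
    return clusters


def hc_density(seq: str) -> float:
    """Fraction of the sequence covered by hydrophobic clusters."""
    if not seq:
        return 0.0
    covered = sum(end - start + 1 for start, end in hydrophobic_clusters(seq))
    return covered / len(seq)


if __name__ == "__main__":
    toy = "GSVLIAGSSSSSGFYPMK"  # hypothetical sequence, not from this study
    print(hydrophobic_clusters(toy), round(hc_density(toy), 3))
```

Under these assumptions, a segment with few or no clusters would be the one-dimensional counterpart of the disordered sequences or flexible linkers described above.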
Site-specific Positive selection analysis
We used likelihood-based analysis of multiple ratio-based models to locate codons in the sHsp genes that had been subject to positive selection. Using the CodeML program, we estimated the selection parameters of the sHsp genes across all six fish species. Positive selection was investigated by comparing two pairs of models (M1a vs. M2a and M7 vs. M8). According to the likelihood ratio test (LRT), which had a value of 0 (p > 0.05), none of the heat shock gene tests was statistically significant for M1a vs. M2a. In contrast, the LRT score of 0.25 for Hspb 1, Hspb 2, Hspb 7, Hspb 8, and Hspb 11 indicated that the test was significant (p < 0.05) for M7 vs. M8, whereas the tests for Hspb 3 and Hspb 9 were not significant (p > 0.05). For the Hspb 11 gene, the M1a vs. M2a and M7 vs. M8 comparisons were reported as highly significant, with LRT values of 1.000 and 0.9, respectively (Table 4).
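A likelihood ratio test between two nested site models can be worked through from the log-likelihoods alone. The sketch below assumes the usual convention of two degrees of freedom for both the M1a vs. M2a and the M7 vs. M8 comparisons, for which the chi-square tail probability has the closed form exp(-x/2); the function name is hypothetical, and the lnL values are those listed for Hspb 1 in Table 4.

```python
import math
from typing import Tuple


def lrt_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models, assuming 2 degrees of freedom.

    Returns the test statistic 2 * (lnL_alt - lnL_null), floored at zero,
    and its chi-square p-value.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for df = 2
    return stat, p_value


if __name__ == "__main__":
    # Hspb 1 log-likelihoods from Table 4.
    comparisons = {
        "M1a vs. M2a": (-2095.978830, -2095.978830),
        "M7 vs. M8": (-2091.851997, -2090.497209),
    }
    for name, (null, alt) in comparisons.items():
        stat, p = lrt_df2(null, alt)
        print(f"{name}: 2*dlnL = {stat:.6f}, p = {p:.4f}")
```

A comparison is conventionally called significant at the 5% level when the p-value falls below 0.05.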
In addition to estimating the global values, we performed FEL and MEME analyses to look for further evolutionary signs of positive selection. These analyses gave strong evidence that Hspb 1, Hspb 3, Hspb 8, Hspb 9, and Hspb 11 in fish have been subject to positive selection (Supplementary Table 2, Supplementary Fig. 2). As part of the Bayesian approach used to identify the regions affected by selective pressure, posterior probabilities were computed for each codon. Sites with higher posterior probabilities are more likely to be under positive selection than sites with lower probabilities. Positive selection was further validated with the Selecton server, which detects adaptive selection at individual amino acid positions in a protein. This allowed us to check for false positive results in the analyses performed with CodeML and Datamonkey. The MEC model accounts for differences in amino acid exchange rates. With this approach, we found evidence of adaptive selection at a number of amino acid positions in Hspb 1, Hspb 2, Hspb 8, Hspb 9, and Hspb 11. We also found evidence that the alpha-crystallin domain (ACD) of the sHsp proteins had evolved under selection.
We next sought a better understanding of the possible intermolecular interactions between these positively selected regions of the heat shock proteins and the conserved functional domains. In Hspb 1, the protein-protein interaction residues under positive selection were P106, S107, and S191, while in Hspb 2, L67 and R120 were the primary interacting residues under strong selective pressure. In Hspb 8, S155 was the primary interacting residue under selection. In Hspb 9, T103, R104, D122, S129, H132, P138, and A144 were the primary interacting residues under selection. In Hspb 11, N167 was the primary interacting residue under selection (Fig. 5).
Table 4
Results of positive selection sites in Small Heat Shock Proteins (sHsps)
Gene | lnL (M1a) | lnL (M2a) | LRT (M1a vs. M2a) | lnL (M7) | lnL (M8) | LRT (M7 vs. M8) | PAML | FEL | MEME |
---|---|---|---|---|---|---|---|---|---|
Hspb1 | -2095.978830 | -2095.978830 | 1 | -2091.851997 | -2090.497209 | 0.25 | 66, 70 | 108, 146, 164 | 15, 40, 108, 164, 171 |
Hspb2 | -1793.056124 | -1793.056124 | 1 | -1781.775244 | -1780.622530 | 0.30 | 0 | 0 | 0 |
Hspb3 | -1712.286012 | -1712.286012 | 1 | -1710.850284 | -1707.305342 | 0.02 | 26, 142 | 56, 137 | 50, 56, 71, 137 |
Hspb7 | -986.995064 | -986.995064 | 1 | -988.325121 | -986.800342 | 2 | 11, 79 | 0 | 0 |
Hspb8 | -2164.206906 | -2164.206906 | 1 | -2158.209370 | -2157.238652 | 0.3 | 150, 188 | 104, 130, 183, 186 | 34, 130, 160, 183, 186 |
Hspb9 | -1913.515426 | -1911.848285 | 0.1 | -1915.949899 | -1910.50877 | 0 | 59, 65, 69, 70, 80, 121, 167 | 65, 80, 142 | 65 |
Hspb11 | -2196.783521 | -2196.783521 | 1 | -2185.641534 | -2185.641669 | 0.9 | 97, 167 | 0 | 97 |
Coevolution analysis
The functional and structural characteristics of the positively selected residues were investigated further through co-evolution analysis, which identifies coordinated relationships among residues. To this end, we searched for additional residues that had co-varied with the positively selected residues during evolution. Coevolution between amino acid sites in a protein may reflect structural or functional relationships between them. We therefore carried out a coevolution analysis using homologs of Hspb 1, Hspb 2, Hspb 9, and Hspb 11 as inputs. This allowed us to identify a number of coevolving residue pairs involving residues found to be under positive selection in the analyses above. A schematic of the networks was constructed to show the connections between strongly related residues (Fig. 6). Amino acids with a greater number of co-evolutionary contacts are likely to evolve more slowly than those with fewer contacts. For example, in Hspb 1, P106 and P211 showed no co-evolutionary contacts with other residues and may have evolved very slowly. In Hspb 9, however, all the positively selected residues displayed a greater number of co-evolutionary contacts and likely evolved more steadily than those of the other heat shock proteins exhibiting positive selection. We also identified highly connected residues in the network, such as P12, F8, and M1 in Hspb 9 and P138 in Hspb 11, which exhibited high conservation (Fig. 6). The positively selected residues were contained in nodes of a subnetwork.
Conservation analysis
The degree to which an amino acid in a protein has been conserved throughout evolutionary history reflects the balance between the natural tendency of the amino acid to mutate and the overall requirement to preserve the protein's structural integrity and function. ConSurf is a web-based tool that uses probabilistic evolutionary models to estimate the evolutionary rate of each amino acid site from the phylogenetic relationships among the homologues and the specific dynamics of the analyzed sequences. In the current study, the sHsps showed a lower degree of conservation at the majority of their amino acid sites, which suggests that these proteins are evolving rapidly (Fig. 7).
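Per-site conservation can be illustrated with a much simpler measure than the one ConSurf uses. The sketch below swaps in Shannon entropy per alignment column, which ignores phylogeny and is only a rough stand-in for the evolutionary-rate estimates described above: an entropy of 0 marks a fully conserved column, and larger values mark more variable columns. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def alignment_entropies(aligned: List[str]) -> List[float]:
    """Entropy of every column in a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*aligned)]


if __name__ == "__main__":
    toy_alignment = ["MKVL", "MKIL", "MRVL", "M-AL"]  # hypothetical alignment
    for pos, h in enumerate(alignment_entropies(toy_alignment), start=1):
        print(f"column {pos}: entropy = {h:.3f} bits")
```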