The results presented below are based on the NCBI genomes of the salmonids Atlantic salmon, brown trout, rainbow trout, sockeye salmon, coho salmon, chinook salmon and charr (see Material and Methods for details). All genomes, apart from those of charr and Northern pike, originated from completely homozygous, so-called double haploid, animals, thus eliminating the added confusion of allelic gene variants. To understand the evolution of genes, the salmonid data are compared against results from the Northern pike genome, a species that is basal to salmonids but lacks the 4WGD [20] (Fig.1). Genomes from the three Salmonidae Coregonus, Hucho hucho and Thymallus thymallus were not included in this study since they contained un-annotated or incomplete genomic regions that did not allow informative comparisons.
The NCBI charr genome, now annotated as Salvelinus in NCBI, may originate from Salvelinus malma malma and not from Salvelinus alpinus as presented in the original article [21, 22]. Using standardised nomenclature, exemplified by Sasa for Salmo salar and Eslu for Esox lucius, we use Saal for Salvelinus alpinus although it may be Sama. We also use Oncorhynchus to refer collectively to coho salmon, chinook salmon, sockeye salmon and rainbow trout, and Salmo to refer to Atlantic salmon and brown trout (Fig.1).
Orthology between salmonid regions is summarised from data obtained from Christensen et al. and Sutherland et al. [21, 23] and is presented in Additional file 1. For brown trout, the linkage groups presented by Leitwein and co-workers [24] do not match the chromosome numbers in the NCBI genome, so regional orthology is currently based on BLAST matches with region-specific genes from other salmonids when this was informative.
We chose to define pseudogenes as those genes with internal stop codons and these genes have been given a -ps or ψ extension to the gene name. Partial genes have been given a -pt extension to separate them from remaining full-length bona fide gene sequences. The functional status of MHCI genes must await expression data from multiple tissues, multiple animals and diverse developmental stages.
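The naming rule above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the example gene name `Demo-UXA`, the assumption that the coding sequence is supplied in frame from its first base, the use of the three standard stop codons, and the choice to let -ps take precedence over -pt are all illustrative assumptions.

```python
# Hypothetical sketch of the -ps / -pt naming rule; not the authors' pipeline.
# Assumes the coding sequence is given in frame, starting at its first codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear stop codons


def has_internal_stop(cds: str) -> bool:
    """Return True if an in-frame stop codon occurs before the final codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def annotate_gene_name(name: str, cds: str, is_partial: bool = False) -> str:
    """Append -ps for internal stop codons, otherwise -pt for partial genes.

    `is_partial` must be supplied by the caller; how partial genes are
    recognised is not specified here. Giving -ps precedence is an assumption.
    """
    if has_internal_stop(cds):
        return f"{name}-ps"
    if is_partial:
        return f"{name}-pt"
    return name


if __name__ == "__main__":
    # ATG AAA TGA CCC TAA: the third codon is an internal stop.
    print(annotate_gene_name("Demo-UXA", "ATGAAATGACCCTAA"))
    # ATG AAA CCC TAA: only the terminal stop is present.
    print(annotate_gene_name("Demo-UXA", "ATGAAACCCTAA"))
```

A check of this kind says nothing about expression, which is why functional status is left open until expression data are available.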
Evolution of salmonid MHCIa and MHCIb regions
Based on previous data, we define the genomic region containing the classical UBA locus as the MHCIa region and the duplicate region containing non-classical genes as the MHCIb region [6, 7]. Genes residing within these two regions also have an -a or -b extension. All salmonid genomes analysed in this study contained well-defined and annotated duplicated MHCIa and MHCIb regions (Additional file 2). The Ia region, containing the UBA locus, was overall identical in all salmonid species with few exceptions. Brown trout has a unique CD5-like gene in between the SLC39A7a and RING2a_L genes.
The duplicate MHCIb region was also almost identical in all analysed species. The LHX9_L gene found in Northern pike is present in all salmonid MHCIb regions with the exception of Salvelinus. All but Salvelinus and Northern pike also have a varying number of chitin synthase-like (CHS2) genes in between the RXRB and SLC39A7 genes. Chitin synthase is a well-known molecule in fungi and invertebrates, but its functional role in fish and amphibians needs to be defined [25]. In chinook salmon there is a duplicate of the entire MHCIb region (Genbank NW_020128813), which could be an assembly artefact as the sequenced animal was a double haploid.
Evolution of U lineage genes
Six Northern pike U lineage genes reside on chromosome 10, here defined as Eslu-UAA through Eslu-UFA (Additional files 2-4). Based on phylogeny, the data indicate that there were three original genes, each of which has duplicated, giving the pairs Eslu-UAA and Eslu-UBA, Eslu-UCA and Eslu-UDA, and Eslu-UEA and Eslu-UFA (Fig.2, Additional file 3). Eslu-UCA is only a partial sequence and may be a pseudogene. The polymorphic content of these genes remains undefined, but there is one EST and one TSA matching the Eslu-UAA/UBA genes (Genbank GH268323 and TSA GATF010284) and one EST originating from one of the Eslu-UEA or Eslu-UFA loci (EV373903). A seventh pike U lineage gene is located on an unplaced scaffold (Eslu-UGA, NW_022995044), and is a duplicate of the Eslu-UDA gene.
As previous studies have shown that the three extracellular alpha domains of U lineage sequences display different evolutionary patterns [8, 15-17], we made phylogenetic trees of entire mature extracellular amino acid sequences as well as of individual alpha 1, alpha 2 and alpha 3 domain sequences to identify orthology (Fig.2, Additional file 3). Phylogenies of the alpha 1 domain lineages shared by distantly related teleost species show that non-classical genes also share these lineages (Fig.2) [8, 15-17]. Non-classical UEA gene sequences share the alpha 1 domain lineage Va, UGA gene sequences share the alpha 1 domain lineage II, and most UCA and UDA gene sequences cluster with the alpha 1 domain lineage I. Northern pike U lineage genes also share alpha 1 domain lineages with other teleosts. Eslu-UAA and Eslu-UBA alpha 1 domain sequences cluster with alpha 1 domain lineage Vb, Eslu-UDA clusters with lineage IIIa, and Eslu-UEA and Eslu-UFA cluster with lineage IIIb sequences. In the alpha 2 domain analysis, all Northern pike sequences cluster together, although the bootstrap value is only 31 percent (Additional file 3). A similar clustering is also seen for all Northern pike alpha 3 domain sequences, with a higher bootstrap value.
Only one salmonid U lineage gene, UHA, resides outside of the two duplicated MHCIa and MHCIb regions (Table 1, Additional files 2 & 4). Sequences from this gene display strongly supported clusters in all phylogenies. Northern pike and sockeye salmon did not display any UHA gene sequences, but the remaining salmonids all have UHA lineage genes on one homeolog of Northern pike chr.16 (Additional file 1). Atlantic salmon and charr have regionally duplicated UHA lineage genes where at least the duplicate Sasa-UHA2 gene is a pseudogene (Additional file 4). Although the two charr UHA gene sequences are incomplete, there is an expressed UHA1/2-like sequence in Salvelinus malma (Genbank AYG86905.1), suggesting that at least one of these UHA loci is functional also in charr. Overall, UHA gene sequences are very different from other U lineage sequences (Fig.2, Additional file 3), suggesting an ancient origin. However, we have not been able to find orthologs in any other teleost, so these genes may have evolved fast in salmonids.
Table 1. Number of MHCI lineage genes in salmonids and Northern pike
| Species | U | Z | L | S | H | P |
|---|---|---|---|---|---|---|
| Northern pike (Eslu) | 7 (2) | 5 | 4 | 1 | 1 | - |
| Atlantic salmon (Sasa) | 9 (4) | 7 | 13 (6) | 6 (3) | 2 (1) | (1) |
| Brown trout (Satr) | 8 (1) | 7 (1) | 25 (8) | 3 (2) | 2 (1) | (1) |
| Rainbow trout (Onmy) | 7 (1) | 6 (1) | 14 (3) | 2 (1) | 2 | (1) |
| Chinook salmon (Onts) | 8 (4) | 7* | 16 (8) | 2 (1) | 2 | (1) |
| Coho salmon (Onki) | 6 (1) | 5 | 14 (3) | 2 (1) | 2 | (1) |
| Sockeye salmon (Onne) | 5 (1) | 5 | 14 (8) | 2 (1) | 2 | (1) |
| Charr (Saal) | 11* (1) | 4 | 13 (10) | 2 (1) | 2 | (1) |
Number of MHCI lineage genes in various salmonids. The number of partial genes plus pseudogenes is given in parentheses. An asterisk denotes species with additional genes on unplaced scaffolds that are most likely assembly artefacts (see Additional file 4).
Only Atlantic salmon has a duplicate annotated U lineage gene in the MHCIa region, denoted ULA, a gene that lacks the transmembrane domain (Additional files 2-4) [26]. We know that the UBA loci from Atlantic salmon, rainbow trout, brown trout and sockeye salmon are classical MHCI loci with considerable polymorphism [15, 17, 27-30]. There are currently 48 Atlantic salmon and rainbow trout UBA alleles registered in the IPD-MHC database [31] while 31 and 34 alleles have been defined in brown trout and sockeye salmon, respectively. The polymorphic content of UBA loci from coho, chinook and charr remains undetermined.
MHC class I gene richness is most pronounced in the salmonid MHCIb regions, with brown trout and Salvelinus having four U lineage genes surrounding the TAPBPb and PSMB8b genes (Additional files 2 & 4). Rainbow trout has three annotated U lineage genes in this region with an additional fourth Onmy-UFA pseudogene reported previously [7]. Previous studies have shown that rainbow trout and Atlantic salmon MHCIb regions contain non-classical MHC genes, displaying low polymorphism and more restricted expression patterns than their classical UBA counterparts [6, 7]. Sockeye, chinook and coho salmon all have two annotated U lineage genes in this region. This region then resembles the three original MHCI genes found on Northern pike chr.10.
Salvelinus has two additional unplaced scaffolds containing U lineage genes, all clustering with alpha 1 domain lineage I sequences (UXA, UZA1/2; Fig.2, Additional files 3-4). Their origin and location are unknown, but as the sequenced genome does not originate from a double haploid animal, they could be allelic variants of non-classical U lineage genes or assembly artefacts. Chinook salmon also has two additional U lineage genes residing on unplaced scaffolds, here denoted Onts-U1 and Onts-U2. Onts-U1 is a partial gene sequence with sequence identity to Onts-UCA. Onts-U2 is a duplicate of the Onts-UEA gene sequence, and most likely represents an assembly artefact as the chinook salmon genome originates from a double haploid.
MHCIb regions also contain a unique UGA gene that is present in all analysed salmonids, located in between the SLC39A7b and RING2Ab genes (Additional file 2). Chinook salmon lacks an annotated UGA gene, although there are expressed chinook sequences supporting a functional UGA locus (e.g. GGDU01219126.1). The gene denoted UGA in Northern pike (Additional file 4) is not an ortholog of the salmonid UGA genes, so the salmonid UGA is a gene duplication that translocated to the MHCIb region after salmonids split from pike. UGA lineage sequences show strongly supported clusters in alpha 1 and alpha 2 domain phylogenies, while the alpha 3 domain sequences are more dispersed (Additional file 3).
Based on location and phylogenetic clustering, the UEA gene existed in a primordial salmonid, but was then lost in Atlantic and sockeye salmon (Fig.2, Additional files 2-4). All UEA alpha domain phylogenies show strongly supported clusters. Salmonid UCA and UDA gene sequences also form strongly supported clusters in the alpha 1 and alpha 2 domain sequence phylogenies, suggesting they originate from a salmonid ancestor. Duplications from a single primordial UC/DA gene to multiple UCA and UDA genes seem to have occurred individually in the Oncorhynchus and Salmo lineages based on the alpha 2 domain phylogenies, as well as in each individual species (Fig.2, Additional file 3). The gene sequences defined as UFA in charr and brown trout do not cluster in phylogenies, so they represent within-species gene duplications. However, the UFA pseudogene previously reported in rainbow trout clusters with the UFA sequence from brown trout (data not shown), so this gene originated in a salmonid ancestor.
We have previously shown that the Atlantic salmon MHCIb region contains haplotypes with a varying number of non-classical Sasa-UCA and Sasa-UDA genes [32]. One sequenced BAC had 30 kb separating the UDA and UCA genes while another haplotype only had one UCA pseudogene (Genbank FJ969490). The Atlantic salmon genome contains an additional haplotype with 8 Mb separating the Sasa-UCA pseudogene from two additional UCA and UDA genes. Brown trout, the closest relative to Atlantic salmon, does not show this UCA/UDA gene duplication 10 Mb upstream, suggesting this may be Atlantic salmon specific. With the exception of Salvelinus, salmonids have a U lineage gene located approximately 10 Mb downstream of their UCA genes, a gene we here denote UMA. All UMA genes contain internal stop codons or are partial gene sequences, suggesting they are nonfunctional. These regions do not contain the same genes as those surrounding the Atlantic salmon Sasa-UDA gene 8 Mb upstream of the major MHCIb region (Additional file 2). Nor do these regions resemble the UIA region found in Medaka, where there is approximately 14 Mb between the classical UAA/UBA genes and a UIA gene [16]. Thus, the salmonid UMA gene is a unique gene duplication that occurred early in the salmonid lineage.
Z lineage evolution
In addition to the six U lineage genes, Northern pike also has five Z lineage genes on chr.10 (Table 1, Additional files 2, 4). In comparison, the salmonid MHCIa and Ib regions all have two to four Z lineage genes per region. Due to the unique position of the Salmo ZAA gene in the MHCIa region, we chose to reserve the ZAA name to reflect a location in between the VHSVa induced protein and ATF6a. The remaining sequences are named ZBA through ZDA regardless of phylogenetic clustering. Of the pike and salmonid Z lineage genes, only Onmy-ZDAb and Satr-ZDAb are defined as pseudogenes.
Phylogenetic trees of the entire mature extracellular amino acid Z lineage sequences display two well-supported clades, each with two sub-clades. Surprisingly, all Northern pike Z lineage gene sequences cluster together with strong bootstrap support, suggesting they are within-species gene duplications (Fig.3). Based on the two to four Z lineage gene duplicates identified in salmonid MHCIa and MHCIb regions (Additional file 2), one would have expected some orthology between pike and salmonid gene sequences.
The first clade (Fig.3, clade 1) consists of MHCIa region sequences, while the second clade (Fig.3, clade 2) consists of MHCIb region sequences, suggesting the Z lineage genes evolved independently in the MHCIa and MHCIb regions (Fig.3, Additional file 2). Clade 1 gene sequences are further divided into two subclades, one containing Oncorhynchus gene sequences (subclade 1.1) and the other containing Salmo gene sequences (subclade 1.2). Subclade 1.1 suggests that one original Oncorhynchus gene expanded to the three Onmy-ZBAa, Onmy-ZCAa and Onmy-ZDAa genes present in this region today, where Onmy-ZDAa is a more recent duplicate of Onmy-ZBAa. Although not as strongly supported, Salmo Z lineage Ia genes within subclade 1.2 are also within-region duplicates of one common ancestor. Here, the evolutionary process has repeated itself, with the Sasa-ZBAa and Sasa-ZDAa genes being duplicates that split from Sasa-ZCAa. The unique Salmo ZAAa gene is also a more recent duplication of the Sasa-ZBA or Sasa-ZDA gene. Charr MHCIa Z lineage sequences show a dual clustering, with the Saal-ZBAa sequence clustering with Oncorhynchus while the Saal-ZCAa sequence clusters with Salmo ZCAa sequences.
Sequences originating from the MHCIb region split into two strongly supported subclusters (Fig.3, subclades 2.1 and 2.2), and in this region Oncorhynchus and Salmo Z lineage genes share an evolutionary history. Subclade 2.1 contains ZCAb sequences while subclade 2.2 contains ZBAb sequences. The only exception is the Atlantic salmon sequences, where Sasa-ZBAb and Sasa-ZCAb represent a more recent gene duplication (Fig.3). Sasa-ZBAb is the only soluble Z lineage molecule, lacking the transmembrane region [32].
Evolution of L lineage genes
Northern pike has four L lineage genes dispersed on chr.2, 15 and 20, where salmonids have orthologs to the pike genes on chr.2 and chr.20 based on phylogeny and regional orthology (Table 1, Fig.4, Additional files 1 & 4). Nomenclature is based on phylogenetic clustering with previously identified L lineage gene sequences [8, 11], as exemplified by the LGA gene sequences, which form a strongly supported phylogenetic cluster (Fig.5). L lineage genes have expanded greatly in salmonids, ranging from 13 genes in charr to 25 genes in brown trout. Most charr L lineage genes are defined as pseudogenes or partial genes, but this needs verification by expressed sequences. The remaining species have 6-17 bona fide genes.
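The bona fide L lineage counts can be derived from the L column of Table 1, as the following Python sketch illustrates. It assumes that the first number in each cell is the total gene count and that the number in parentheses (pseudogenes plus partial genes) is included in that total; this reading is an assumption, although it is consistent with the 6-17 range given above. The function and variable names are illustrative only.

```python
import re

# L lineage cells copied from Table 1 for the salmonids other than charr.
L_LINEAGE_CELLS = {
    "Atlantic salmon (Sasa)": "13 (6)",
    "Brown trout (Satr)": "25 (8)",
    "Rainbow trout (Onmy)": "14 (3)",
    "Chinook salmon (Onts)": "16 (8)",
    "Coho salmon (Onki)": "14 (3)",
    "Sockeye salmon (Onne)": "14 (8)",
}

# Matches cells such as "13 (6)", "7*", "11* (1)" or "(1)".
CELL_PATTERN = re.compile(r"^(\d+)?\*?\s*(?:\((\d+)\))?$")


def parse_cell(cell: str) -> tuple[int, int]:
    """Return (total, pseudo_or_partial) for one Table 1 cell.

    Assumption: the parenthesised count is part of the total. A cell that
    holds only a parenthesised count is treated as having that total.
    """
    text = cell.strip()
    if text in {"", "-"}:
        return 0, 0
    match = CELL_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognised table cell: {cell!r}")
    silenced = int(match.group(2) or 0)
    total = int(match.group(1)) if match.group(1) else silenced
    return total, silenced


def bona_fide(cell: str) -> int:
    """Total gene count minus pseudogenes and partial genes."""
    total, silenced = parse_cell(cell)
    return total - silenced


if __name__ == "__main__":
    counts = {species: bona_fide(cell) for species, cell in L_LINEAGE_CELLS.items()}
    for species, count in counts.items():
        print(f"{species}: {count}")
    print("range:", min(counts.values()), "-", max(counts.values()))
```

Under the same assumption, the charr entry of 13 (10) would correspond to three bona fide L lineage genes.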
The previously published rainbow trout Onmy-LAA gene [11] is also found in the other salmonid species, whereas this gene was lost in Northern pike (Additional file 2). Fragments of this gene are found on Atlantic salmon homeolog chromosomes 13 and 15 and ortholog regions in the other salmonids (Additional files 1, 2, 4), flanked by ANKS1A and SARG genes. Only rainbow trout has a bona fide LAA gene, whereas the LAA genes from the other species are partial genes or pseudogenes. The Onmy-LAA sequence is quite distant from the remaining L lineage sequences and forms the base of the phylogenetic tree (Fig.5).
Another older L lineage gene previously described in Atlantic salmon, LIA [8], has orthologs in all species including Northern pike (Fig.5, Additional files 1, 2, 4). LIA gene sequences are also quite old, forming a strongly supported branch quite basal in the phylogenetic tree. Only the charr LIA gene is a pseudogene with an internal stop codon. This LIA gene is flanked by VWA8 and F5 in all species. Although salmonid LIA regions are orthologous to Northern pike chr.16, the pike LIA gene resides on chr.20, suggesting a translocation in a salmonid ancestor. The salmonid homeolog chromosomes also hold L lineage genes in most species, represented by the LLA and LJA genes, many being pseudogenes. Although not strongly supported, the Northern pike L lineage gene on chr.15, here called Eslu-LPA, clusters with the LIA gene sequences and is most likely a gene duplication specific for Northern pike. A similar unique gene duplication is seen for the Atlantic salmon Sasa-LKA gene, with no ortholog region in other salmonids or Northern pike.
Salmonid LDA gene sequences represent another strongly supported clade, but also cluster with the remaining gene sequences from Northern pike and salmonids (Fig.5, Additional files 2, 4). Salmonid LDA genes reside on an ortholog of pike chr.9 flanked by IRAK1BP1 and IL17RD genes (Fig.4, Additional file 1). These genes are found on pike chr.17, without traces of the LDA gene (data not shown). As it is located on only one of the salmonid homeologs, the LDA gene most likely translocated to these salmonid regions after the 4WGD event.
Salmonid orthologs to the Northern pike chr.2 genes, here defined as Eslu-LBA and Eslu-LCA, have expanded considerably, with brown trout being the most extreme with twelve L lineage genes on chr.12 (Figs 4 & 5, Additional file 4). FAH and CTXND1/ARNT2 genes, flanking the two pike L lineage genes on chromosome 2, are also present in ortholog regions represented by Atlantic salmon chr.11 and chr.26 [23]. Most likely due to regional complexity, clustering genes from coho, chinook, sockeye and charr all reside on unplaced scaffolds. Gene expansions have occurred locally after the 4WGD. For instance, Atlantic salmon chr.11 with the two duplicate Sasa-LCA genes is a homeolog of rainbow trout chr.26 containing nine L lineage genes. Brown trout chr.12 and an unplaced chinook scaffold both display a similar L gene expansion with twelve and eight L lineage genes, respectively. Most of the chinook genes on this unplaced scaffold are pseudogenes with internal stop codons while most of those on rainbow trout chr.26 are bona fide genes. Phylogenetically, LEA/LMA genes as well as LFA genes also reside in strongly supported clusters, suggesting a shared evolutionary history for these sequence clades.
Evolution of S, H and P lineage genes
S lineage genes have previously been described in many teleosts [8]. This gene is also present in Northern pike on chr.1 (Table 1, Additional files 2 and 3). Most salmonids have duplicate S lineage genes on both homeologs, where the SBA gene has been silenced in a primordial salmonid (Table 1, Additional files 2 & 4). Atlantic salmon has six S lineage genes residing on unplaced scaffolds, where three of these six genes are partial gene sequences and may be pseudogenes. In a previous study, we sequenced a bacterial artificial chromosome (BAC) clone originating from chr.9, which contained one SAA gene in addition to the flanking VWA5 and AKT2 genes [32]. We did not find other BACs positive for the SAA probe, so potentially there are individual differences in the number of SAA genes in Atlantic salmon.
A fifth MHCI lineage described in teleosts is the P lineage, which has expanded to 24 genes in pufferfish [8]. Remnants of this P lineage are lacking in Northern pike, while all salmonid P lineage genes have been silenced (Table 1, Additional files 2 & 4). Only one homeolog has remnants of this P lineage gene, suggesting it has been deleted in the duplicated region. The PAA gene is surrounded by PPP1R12A_like and immunoglobulin light chain (IgL) genes. We previously found an IgL gene linked to a UIA gene in Medaka and to Z lineage genes in stickleback [8]. IgL genes are also found linked to the shark MHC region, suggesting they were present in the primordial MHC region [33].
We recently found a sixth MHC class I lineage in teleosts, which we denoted the H lineage [10]. One HAA lineage gene is present in Northern pike, and all salmonids studied here have HAA and HBA genes on homeologs to this pike HAA gene on chr.3 (Table 1, Additional files 1, 2, 4). All regions have TOX and PPP1R7 genes flanking the H lineage gene. The HAA genes seem functional in all species, while the HBA gene is a pseudogene at least in Salmo species. In coho and chinook, there are expressed reads matching the HBA gene (GGDU01537164.1, GDQG01022515.1), suggesting the homeolog HBA gene has retained a function in some species. The fact that H lineage sequences lack the alpha 3 domain and have a cytoplasmic domain that is highly conserved also between distant teleost species suggests that teleost MHCI may have a broader functional diversity than previously envisioned. Mammalian equivalents with such a molecular structure are the ULBP/RAET genes, which interact with the NKG2D receptor upon stress or infection [34]. Whether the salmonid H lineage molecules have a similar function remains to be determined.