The AqE gene landscape in genomes of non-tetrapod vertebrates
To study the distribution of the AqE gene among vertebrates, we considered 86 orders of vertebrates outside Tetrapoda. At the time of the study, whole-genome shotgun (WGS) sequences were available in the NCBI database for members of only 58 of these 86 orders. In total, the genomes of 118 species were analyzed by searching the non-redundant protein sequences and WGS databases with BLASTp and tBLASTn, respectively (Additional file 1, Fig. 1).
As the reference protein (query), we used the amino acid sequence encoded by the AqE gene of the channel catfish Ictalurus punctatus. This protein was chosen because the enzyme was first studied in this organism [16]. In addition, the channel catfish AqE gene has a predicted exon-intron structure (GeneID:100528876).
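The search step can be illustrated with a minimal sketch that post-processes BLAST tabular output (`-outfmt 6`) and groups hits by subject sequence. It is not the pipeline used in the study: the e-value cutoff `max_evalue`, the helper names `parse_hits` and `query_coverage`, and the example rows are all hypothetical.

```python
from collections import defaultdict

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines, max_evalue=1e-5):
    """Group hits by subject sequence, keeping those at or below max_evalue.

    max_evalue is a hypothetical cutoff, not a value taken from the study.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue <= max_evalue:
            hits[row["sseqid"]].append({
                "pident": float(row["pident"]),
                "qstart": int(row["qstart"]),
                "qend": int(row["qend"]),
                "evalue": evalue,
            })
    return dict(hits)


def query_coverage(hit_list, query_length):
    """Fraction of query residues covered by at least one hit."""
    covered = set()
    for hit in hit_list:
        covered.update(range(hit["qstart"], hit["qend"] + 1))
    return len(covered) / query_length


# Invented example rows (not real search results).
example_rows = [
    "AqE_query\tscaffold_1\t62.50\t40\t15\t0\t10\t49\t3000\t3119\t1e-12\t70.1",
    "AqE_query\tscaffold_1\t55.00\t40\t18\t0\t50\t89\t5000\t5119\t2e-08\t55.0",
    "AqE_query\tscaffold_9\t30.00\t20\t14\t0\t5\t24\t100\t159\t0.5\t20.0",
]
example_hits = parse_hits(example_rows)
```

Grouping by subject sequence makes it easy to see whether the hits to one scaffold together span most of the query or only a fragment of it.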
Overall, nucleotide sequences homologous to AqE were found in 101 species (56 orders). In 16 species belonging to 2 orders (Rajiformes and Cypriniformes), the gene was not detected. In one further species, Hucho hucho (Salmoniformes), the detected coding sequence is a bacterial gene that apparently contaminated the samples during sequencing. This is supported by the absence of introns in the H. hucho gene and by the high identity of the H. hucho protein to AqE of the bacterium Vogesella perlucida (80.24%), in contrast to its low identity to AqE of I. punctatus (26.97%).
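The two criteria used for H. hucho (no introns, and much higher identity to a bacterial protein than to the reference) can be written as a simple check. This is a sketch only: the function name `looks_like_contaminant` and the margin `min_margin` are hypothetical, and the text does not state a numeric cutoff.

```python
def looks_like_contaminant(identity_to_bacterial, identity_to_reference,
                           has_introns, min_margin=20.0):
    """Flag a hit as a probable bacterial contaminant.

    min_margin (in percentage points) is a hypothetical threshold chosen
    for illustration; it is not a value from the study.
    """
    closer_to_bacteria = (identity_to_bacterial - identity_to_reference) >= min_margin
    return closer_to_bacteria and not has_introns


# Identity values reported in the text for the H. hucho sequence.
h_hucho_flag = looks_like_contaminant(80.24, 26.97, has_introns=False)
```

With the reported values, the check flags the H. hucho sequence; a sequence with introns or with higher identity to the reference would not be flagged.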
In three species, Tachysurus fulvidraco, Amphiprion ocellaris and Acipenser ruthenus (Siluriformes, Pomacentridae and Acipenseriformes, respectively), the AqE gene was found in two copies. Such duplication is the exception rather than the rule, since the vast majority of the studied organisms have only one copy of the gene. In the superclass of ray-finned fishes (Actinopterygii), sequences homologous to AqE were detected in members of 50 orders. For members of 19 orders of Actinopterygii, WGS were absent, so there are no data on the presence of AqE in these taxa. In only one order (Cypriniformes) were no homologs of the AqE gene found. Since WGS of 15 species of Cypriniformes were analyzed, this result can be considered reliable; accordingly, we conclude that the AqE gene was lost in this group (Additional file 1, Fig. 1).
In the class of cartilaginous fishes (Chondrichthyes), WGS were available for representatives of only four groups out of eleven: Chimaeriformes (subclass Holocephali, chimaeras), Orectolobiformes and Carcharhiniformes (infraclass Selachii, sharks), and Rajiformes (superorder Batoidea, rays).
In the studied representatives of chimaeras and sharks, the AqE gene was detected (Additional file 1, Fig. 1). In Raja erinacea (rays), no homologs of the AqE gene were found. Since this is the only species of the superorder Batoidea with WGS, the absence of the gene in R. erinacea can be interpreted either as evidence of AqE loss in all skates (as in tetrapods) or as a species-specific exception. It is also possible that the AqE gene was not detected because of the low assembly level (contig) and low genome coverage (26x). In the unranked group Cyclostomata, the AqE gene was detected in representatives of both orders of jawless vertebrates, Petromyzontiformes (lampreys) and Myxiniformes (hagfishes).
Structural variations of the AqE genes
To better understand the evolution of the AqE gene in vertebrates, the exon-intron structure of the coding DNA sequence (CDS) was analyzed. For 50 species, it was defined for the first time. In most non-tetrapod vertebrates, the AqE CDS was highly conserved and consisted of 11 exons on average. Because intron lengths differ, the length of the CDS including introns varied from 3 kbp to 134 kbp (Additional file 2). In organisms of 26 orders (Fig. 1), an AqE gene encoding a full-sized AqE protein was detected. In representatives of another 21 orders, the gene was characterized as hypothetically whole. These species had no transcriptome assembly (TSA); therefore, their exon-intron structure was determined exclusively by homology with the amino acid sequence of I. punctatus. This approach did not allow the detection of exon 1, since it is extremely short (apparently 3 aa). Similarly, exon 11 (the last) could not be detected in some of these species because of its significant variability. Nevertheless, we suppose that these organisms have a full-sized AqE gene. Thus, representatives of 47 orders have a full-sized AqE gene with some species-specific variations. For example, in Lamprogrammus exutus (Ophidiiformes), a duplication of exon 3 was detected. In Xiphophorus maculatus (Cyprinodontiformes), only part of exon 2 was preserved, and exons 7 and 8 were absent. In Cyprinodon variegatus (Cyprinodontiformes), fusions of exons 3 and 4 and of exons 5 and 6 were found. In Xiphophorus couchianus (Cyprinodontiformes), exons 2 and 6 were only partially present, and exons 8 and 9 were fused. In the other seven representatives of Cyprinodontiformes, by contrast, the gene was similar to the "classical" one.
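The distinction between full-sized and hypothetically whole genes can be expressed as a small classification rule over the set of detected exons. The sketch below is a simplified reading of the text, not the authors' exact criteria; the function `classify_gene` and the label "fragmentary" are hypothetical.

```python
N_EXONS = 11  # typical exon count of the AqE CDS reported in the text


def classify_gene(detected_exons, has_transcriptome):
    """Assign a coarse category to an AqE locus from its detected exons.

    The rules are an illustrative simplification: exons 1 and 11 are
    treated as undetectable by homology alone (very short or highly
    variable), so their absence without transcriptome data does not
    count against the gene.
    """
    expected = set(range(1, N_EXONS + 1))
    missing = expected - set(detected_exons)
    if not missing:
        return "full-sized"
    if missing <= {1, N_EXONS} and not has_transcriptome:
        return "hypothetically whole"
    return "fragmentary"
```

For instance, a locus with exons 2 to 10 found by homology and no transcriptome assembly would be labelled "hypothetically whole", whereas a locus with only exons 5 and 6 would be labelled "fragmentary".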
In representatives of 8 orders (Ateleopodiformes, Aulopiformes, Lophiiformes, Beryciformes, Lampriformes, Polymixiiformes, Stylephoriformes, Stomiiformes), several fragments homologous to the query amino acid sequence were found in the genomic sequences (Additional file 1). They were mainly located on short scaffolds of 2,033 to 18,322 bp. The assemblies of species from these orders had rather low genome coverage (< 25x) and a low assembly level (more than 50,000 scaffolds) (Additional file 3). This could explain why some fragments of the AqE gene were absent from the genomic sequences of representatives of these orders. Therefore, we do not consider these data evidence of deletions or pseudogenization, and we believe that the AqE gene in these orders is full-sized.
Only short fragments of the AqE gene were also found in the order Salmoniformes. However, in contrast to the previous case, almost all assemblies had a genome coverage of more than 100x and an assembly level up to chromosome (Additional file 3). Seven species were examined: in five of them the AqE gene was represented only by exons 5 and 6, one species had only fragments of exons 2 and 3 and exon 6, and one had an AqE sequence that was most likely a bacterial gene that contaminated the samples (Additional file 1).
Accordingly, in this order the AqE gene is pseudogenized (deleted) and, most likely, not functional. The analysis of transcriptome databases (TSA) showed that the AqE gene was active in all organisms that had both this gene and a transcriptome assembly (Fig. 1). The presence of alternative transcripts was shown for a number of organisms (Additional file 1). No relationship between the presence of alternative transcripts and the taxonomic position of the organisms was found.
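The contrast drawn above between the eight low-coverage orders and Salmoniformes amounts to interpreting fragmentary hits in the light of assembly quality. A minimal sketch of that reasoning follows. The default thresholds mirror the figures quoted in the text, but combining them into a single rule, and the function name `interpret_fragments`, are assumptions made for illustration.

```python
def interpret_fragments(coverage_x, n_scaffolds,
                        low_coverage=25, many_scaffolds=50_000,
                        high_coverage=100):
    """Suggest how to read a fragmentary AqE hit given assembly quality.

    low_coverage and many_scaffolds follow the values quoted for the eight
    low-quality orders; high_coverage follows the Salmoniformes assemblies.
    Treating a small scaffold count as a proxy for a high assembly level
    is an assumption of this sketch.
    """
    if coverage_x < low_coverage and n_scaffolds > many_scaffolds:
        return "likely assembly gap"
    if coverage_x > high_coverage and n_scaffolds <= many_scaffolds:
        return "likely gene degradation"
    return "inconclusive"
```

Cases that fall between the two regimes are deliberately left as "inconclusive" rather than forced into one interpretation.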
Phylogenetic relationships among the AqE genes of vertebrates
To better understand the evolutionary relationships between AqE genes in vertebrates, a phylogenetic analysis of 97 identified AqE proteins was performed using the maximum likelihood (ML) method.
Amino acid sequences shorter than 50% of the length of the query sequence (orders Aulopiformes and Salmoniformes) were excluded from the analysis. In general, the distribution among clades corresponded to the taxonomic division. The exceptions were three orders, Beryciformes, Pleuronectiformes and Blenniiformes, whose representatives were distributed across different branches. Nevertheless, taking into account the low bootstrap values within the Euteleosteomorpha clade (Additional file 4) and the associated polytomy, the overall picture of phylogenetic relationships remains classical. The dendrogram also shows that the increase in copy number in individual species is associated exclusively with intraspecific duplications.
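The length filter applied before tree building can be sketched as follows. The 50% threshold comes from the text; the function name `filter_by_length` and the dictionary-of-strings input format are assumptions made for illustration.

```python
def filter_by_length(sequences, query_length, min_fraction=0.5):
    """Keep sequences at least min_fraction of the query length.

    sequences maps a label to an amino acid string; sequences shorter
    than min_fraction * query_length are dropped.
    """
    threshold = min_fraction * query_length
    return {name: seq for name, seq in sequences.items()
            if len(seq) >= threshold}
```

Applied to a hypothetical query of 100 residues, a 49-residue sequence would be excluded and a 50-residue sequence retained.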