Data source
The Canis MitoSNP tool draws on information from five bioinformatic tools. The genomic positions of each gene, the genomic sequences of both strands, gene lengths, amino acid lengths, and amino acid positions in proteins were downloaded from the GenBank database. Based on these data, the codons and the positions of nucleotides within the codons were determined. For tRNA molecules, the secondary structure was predicted using the tRNAscan-SE tool (Lowe & Chan, 2016) and confirmed with the annotation proposed by Pütz et al. (2007). The secondary structure of the 12S and 16S rRNA molecules was predicted with the RNAfold tool (Gruber et al., 2008) by evaluating the minimum free energy prediction (FEP) and the thermodynamic ensemble prediction (TEP), both at 37 °C. The features of proteins were determined using SOPMA for secondary structure prediction (Combet et al., 2000), DeepTMHMM for transmembrane domain structure prediction (Hallgren et al., 2022), and ConSurf for the evaluation of functional and structural regions, buried or exposed residues, and the conservation grade (Ashkenazy et al., 2010, 2016). The reference sequences of human and dog mtDNA genes and proteins were used in each tool. The collected data were organised in an Excel database together with the information obtained from GenBank.
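The step from gene coordinates to codons can be illustrated with a minimal sketch. It assumes a contiguous gene that does not span the origin of the circular genome, and the function name and the coordinates in the usage note are hypothetical rather than taken from the tool.

```python
def codon_coordinates(genome_pos, gene_start, gene_end, strand="+"):
    """Return (codon_number, position_in_codon), both 1-based.

    gene_start and gene_end are 1-based inclusive genomic coordinates
    with gene_start <= gene_end; strand is "+" or "-".
    """
    if not gene_start <= genome_pos <= gene_end:
        raise ValueError("position lies outside the gene")
    # 0-based offset from the first nucleotide of the coding sequence.
    if strand == "+":
        offset = genome_pos - gene_start
    else:
        # On the complementary strand the gene is read from gene_end downwards.
        offset = gene_end - genome_pos
    return offset // 3 + 1, offset % 3 + 1
```

For a hypothetical gene spanning positions 100 to 111 on the forward strand, `codon_coordinates(104, 100, 111)` places position 104 at the second nucleotide of the second codon.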
Alignment and comparison of human and dog mtDNA genes
The reference sequences of the human and dog mitochondrial genomes were obtained from GenBank (NC_012920.1 and NC_002008.4, respectively). Each of the 37 human and dog mtDNA genes was aligned and compared separately in the Unipro UGENE tool (v.37.0) (Okonechnikov et al., 2012) using the CLUSTAL W algorithm (gap opening penalty = 15.00; gap extension penalty = 6.66; weight matrix = IUB; iteration type = NONE; max iterations = 3). For the 12S rRNA and 16S rRNA molecules in particular, the alignments with the highest homology rate were chosen. Protein-coding genes were translated into amino acid sequences and compared with the protein reference sequences. Each pair of human and dog amino acid sequences was also aligned with the CLUSTAL W algorithm (gap opening penalty = 10.00; gap extension penalty = 0.20; weight matrix = BLOSUM; iteration type = NONE; max iterations = 3; gap separation distance = 4).
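The translation step can be sketched as follows. The text does not state which tool or settings were used for translation, so this is only an illustration; it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial). Codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_mito(cds):
    """Translate a coding DNA sequence; '*' marks stop codons, 'X' unknown codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

A trailing incomplete codon is skipped because several mitochondrial genes end in an incomplete stop codon (T or TA) that is completed only by polyadenylation of the transcript.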
Alignment and comparison of the non-coding regions of the human and dog mtDNA genomes
The information on the localisation of human non-coding regions, i.e. MT-7SDNA, MT-HV1, MT-HV2, MT-HV3, MT-OHR, MT-CSB1, MT-CSB2, MT-CSB3, MT-TFX, MT-TFY, MT-4H, MT-3H, MT-LSP, MT-TFL, MT-TFH, MT-TAS1, MT-TAS2, MT-5, and MT-3L, was obtained from the MITOMAP database (https://www.mitomap.org/MITOMAP; Lott et al., 2013). These non-coding sequences were aligned with the canine D-loop sequence or the whole mtDNA genome using the CLUSTAL W algorithm (gap opening penalty = 15.00; gap extension penalty = 6.66; weight matrix = IUB; iteration type = NONE; max iterations = 3). Although no information was available on the location of these non-coding regions in the canine reference mtDNA genome, we compared the sequences of the two organisms and indicated the localisation of homologous positions in the dog genome, taking into account the H and L strands and the placement of other genes and regions in the human mtDNA genome (Supplementary Table). When browsing the human genome, the user can find the exact positions of the non-coding regions; these regions are not indicated in the canine mtDNA genome because they have not been determined experimentally.
Annotation of human and dog positions in mRNA, tRNA, and rRNA genes
Each gene position in the human mtDNA sequence was numbered according to the revised Cambridge Reference Sequence (rCRS, NC_012920.1), and each position in the canine mtDNA genes was numbered according to the canine reference sequence. For tRNA genes, the numbering and the positions of structural domains were determined according to the Mamit-tRNA database (Pütz et al., 2007). Where the alignment contained a gap in the human or dog nucleotide or amino acid sequence, that position was omitted from the numbering of the gapped sequence. Each genomic position that corresponded to two genes or regions, e.g. the ND4/ND4L region, was marked with an asterisk (*) and described separately for each gene. Every position in the database has its own ID number, which carries no information for users and serves only to order the records in the database.
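The gap-aware numbering can be sketched as below. This is a minimal illustration assuming two already aligned sequences of equal length with "-" as the gap character; the function name, the row layout, and the status labels are hypothetical.

```python
def number_alignment(seq_a, seq_b, start_a=1, start_b=1):
    """Number each alignment column in both sequences, skipping gaps.

    Returns a list of (pos_a, base_a, pos_b, base_b, status) tuples.
    A position is None where the corresponding sequence has a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    rows = []
    pos_a, pos_b = start_a, start_b
    for a, b in zip(seq_a, seq_b):
        num_a = num_b = None
        if a != "-":
            num_a = pos_a
            pos_a += 1  # the counter advances only on a real residue
        if b != "-":
            num_b = pos_b
            pos_b += 1
        if num_a is None or num_b is None:
            status = "gap"
        elif a == b:
            status = "identical"
        else:
            status = "different"
        rows.append((num_a, a, num_b, b, status))
    return rows
```

Because each counter advances only on a real residue, the numbering of either sequence stays continuous across gaps in the other.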
Canis SNP finder content
Canis MitoSNP, the canine mitochondrial DNA database, is composed of four separate pages: Canis SNP finder, tRNA properties, mRNA properties, and Protein properties. The main functionality of the website is to facilitate finding the information about the exact position(s) in the whole genome and/or specific genes of human and canine mtDNA. The user can find chosen positions in either the human or canine mitochondrial genome (Supplementary Fig. 1a).
Additionally, the browser shows whether the selected position(s) are identical or different in the other genome. To look up positions in a specific gene, the user selects option 2, “SNP position in the specific gene”, and then chooses the gene of interest (Supplementary Fig. 1a). Depending on the organism selected in point 1, the user is presented with a list of genes and regions in that genome. In point 3, the user may enter either a single position or several positions separated by commas. To select consecutive positions, the user enters the first and the last position of the range separated by a hyphen (Supplementary Fig. 1a). To see all the results for the chosen gene, the user must select the option “Show all”.
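The query format described above can be illustrated with a short parser. This is a sketch rather than the website's implementation: the function name is hypothetical, and it assumes that single positions and hyphenated ranges may be mixed in one comma-separated query and that a reversed range is simply reordered.

```python
def parse_positions(query):
    """Expand a query such as "15, 100-103" into a sorted list of positions."""
    positions = set()
    for token in query.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if "-" in token:
            first, last = (int(part) for part in token.split("-", 1))
            if first > last:
                first, last = last, first  # assumption: accept reversed ranges
            positions.update(range(first, last + 1))
        else:
            positions.add(int(token))
    return sorted(positions)
```

Non-numeric input raises `ValueError` from `int`, which a caller could turn into a message for the user.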
After clicking the “Search” button, the user is presented with the Results table (Supplementary Fig. 1b). The number of columns in the Results table depends on the region or gene in which the position is located. For all positions (tRNA-coding, rRNA-coding, and protein-coding), the following columns are presented: ID (non-informative for the user), genome position, dog mtDNA 5’-3’ strand, dog mtDNA 3’-5’ strand, type/region, gene, gene position, human mtDNA position, human mtDNA ref. seq., identical/different, and human gene/region (Supplementary Fig. 1b). The tRNA-SCAN column is shown for tRNA-coding positions, whereas the “secondary structure of FEP at 37 degrees” and “secondary structure of TEP at 37 degrees” columns are shown for rRNA-coding positions. For protein-coding positions, the following columns are shown: codon, position in codon or region, amino acid (aa) position in protein, amino acid 1-letter, amino acid 3-letter, SOPMA, TMHMM, conservation grade, buried or exposed residue, and functional or structural residue. The user can therefore obtain, in a single view, complete information about the localisation of a position in the genome, in a specific gene, and in the secondary structure of a protein. In addition, the tool presents data for both genes if the position is part of two separate genes. The Results table can be downloaded to the user’s computer as an .xlsx file by clicking on “download xlsx file”.
tRNA properties
The “tRNA properties” webpage is intended for users analysing changes in the secondary structure of human and canine mitochondrial tRNA genes. Mammalian mitochondrial genomes encode 22 tRNA genes (Kim et al., 1998), of which eight are encoded on the complement (heavy) strand. The page lists the positions of the tRNA-coding genes in both genomes as well as the lengths of these genes. The analysis of the homology between the tRNA genes of the two organisms revealed how many transitions, transversions, and gaps differentiate these genes. The user may compare the percentage of homology between human and canine mt-tRNA genes (Supplementary Fig. 2a). The highest homology rate was observed for the MTTM gene (97%), whereas the lowest rate (65%) was shared by the MTTT and MTTQ genes.
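Counting the differences between two aligned genes can be sketched as follows. The text does not define how the homology percentage was calculated, so using the alignment length as the denominator is an assumption here, as are the function name and the result keys. The sketch expects aligned DNA sequences containing only A, C, G, T, and "-".

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Count identical columns, transitions, transversions, and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"identical": 0, "transitions": 0, "transversions": 0, "gaps": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            counts["gaps"] += 1
        elif a == b:
            counts["identical"] += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            counts["transitions"] += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            counts["transversions"] += 1  # purine<->pyrimidine
    # Assumption: identity is expressed relative to the alignment length.
    counts["identity_percent"] = round(100 * counts["identical"] / len(seq_a), 1)
    return counts
```

A different denominator, for example the length of the shorter gene, would give slightly different percentages for alignments that contain gaps.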
Upon clicking on the highlighted name of the gene of interest, the user can see the detailed secondary structure of the canine and human tRNA gene as well as a detailed description of each position in these two tRNA-coding genes (Supplementary Fig. 2b). The data in the table can be downloaded by clicking on the “download xlsx” button.
mRNA properties
As in the case of tRNA properties, the “mRNA properties” webpage allows the user to perform a comparative analysis of protein-coding genes, their positions in the genomes, their lengths, and the number of differences between the human and canine genomes. The highest homology rate observed for protein-coding genes was 75% (MT-CO3), whereas the lowest rate (64%) was shared by MT-ND6 and MT-ND2. For each gene, detailed information on the identical and different positions in the human and canine protein-coding genes is available upon clicking on the highlighted gene name (Supplementary Fig. 3a). The user can compare the genes encoded in the human and canine genomes and verify which amino acid is encoded at each position without having to translate the sequence in another tool (Supplementary Fig. 3b).
Protein properties
The “protein properties” webpage facilitates the comparison of amino acids in the proteins encoded in the mitochondrial genome. The canine and human mtDNA genomes contain 13 essential protein-coding genes of the respiratory chain: seven subunits of complex I (ND1, ND2, ND3, ND4, ND4L, ND5, ND6), one subunit of complex III (CYTB), three subunits of cytochrome c oxidase (COX1, COX2, COX3), and two subunits of ATP synthase (ATP6 and ATP8) (Tkaczyk-Wlizło et al., 2022). The amino acid sequence and composition may vary between the two species; therefore, the user is informed about the identical positions in both proteins and the differences between them. The highest homology rate of amino acid sequences was observed for the MT-CO1 protein (92%), whereas the lowest was noted for the MT-ATP8 protein (58%) (Supplementary Fig. 4a). Upon clicking on “protein gene”, detailed information is shown to the user (Supplementary Fig. 4b). The amino acids were classified by volume and polarity according to Dagan et al. (2002) into six categories: special (C), neutral and small (A, G, P, S, T), polar and relatively small (N, D, Q, E), polar and relatively large (R, H, K), nonpolar and relatively small (I, L, M, V), and nonpolar and relatively large (F, W, Y). Based on these properties, we indicated conservative and non-conservative differences between human and canine proteins. The user can download the whole table by clicking on the “download xlsx” button.
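The six categories listed above can be written as a lookup table, as in the following sketch. The text does not spell out the exact rule used to label a difference, so treating a substitution within one category as conservative and a substitution between categories as non-conservative is an assumption; all names in the code are hypothetical.

```python
# Six volume/polarity categories as listed in the text (Dagan et al., 2002).
DAGAN_CLASSES = {
    "special": "C",
    "neutral and small": "AGPST",
    "polar and relatively small": "NDQE",
    "polar and relatively large": "RHK",
    "nonpolar and relatively small": "ILMV",
    "nonpolar and relatively large": "FWY",
}
AA_TO_CLASS = {
    aa: name for name, members in DAGAN_CLASSES.items() for aa in members
}


def classify_difference(aa_a, aa_b):
    """Label a pair of aligned residues given as 1-letter codes."""
    if aa_a == aa_b:
        return "identical"
    # Assumption: a substitution within one category counts as conservative.
    if AA_TO_CLASS[aa_a] == AA_TO_CLASS[aa_b]:
        return "conservative"
    return "non-conservative"
```

Under this assumption, an isoleucine to valine difference is conservative, whereas an aspartate to lysine difference is non-conservative.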