We found varying levels of conservation among the 14 proteins examined in this study (Fig. 1). Because we used the Deutsch genome as a standardized reference, we were able to make a direct comparison of conservation across all 14 proteins in this dataset. The highest protein similarity within our sample of 167 R. microplus ticks was observed in VDAC, which displayed no aa replacements in any of the sampled populations (Fig. 1). The DNA alignment for VDAC reveals 20 single nucleotide polymorphisms (SNPs) within R. microplus but all are synonymous, equating to a KA/KS ratio of zero (Additional File 5). Four other proteins were highly conserved (RmAQP1, VgR, RmS-1, and Sub), with protein similarity values > 0.98 (Fig. 1). The majority of SNPs in these four genes were synonymous and yielded KA/KS estimates of 0.09, 0.07, 0.09, and 0.16, respectively. Intermediate levels of protein conservation were found in seven proteins (COX3, RmAQP2, Chit, GST, RmS-11, RmS-5, and Vora, Table 1) and ranged from 0.904–0.979; the lowest levels were found in Bm86 (0.889) and MP4 (0.802). Although we did not calculate protein similarity values for other Rhipicephalus species due to low sample size, we note that protein conservation appears to decrease when other Rhipicephalus species are included in the comparison (Additional File 6). As an extreme example, Bm86 exhibited twice the number of aa replacements across seven Rhipicephalus species as compared to within R. microplus alone.
The small number of aa changes in highly conserved proteins were typically spaced far apart. As observed in Fig. 2, more variable proteins exhibited evidence of mutational hotspots, with aa changes clustered in multiple short segments of the protein. Despite the high density of changes in some of these proteins, short stretches of highly conserved peptides can also be found in each protein. One caveat for identifying conserved peptide regions is that our sampling design supports the detection of rare aa replacements in North America, but not in other locations due to smaller sample sizes. Another caveat is that the true amount of protein variation is probably underestimated in our dataset, because: 1) not all aa positions were queried due to the location of priming sites inside exons, 2) not all exons amplified equally well, and 3) tick populations from certain regions (Brazil and Pakistan) tended to fail more often than ticks from North America. However, 60 of the 88 exons had success rates > 90% across our R. microplus samples (Additional File 4) and these provide high confidence for estimating conservation at these exons, especially in North America.
The predicted 3D structural model for R. microplus VDAC (74) is shown in Fig. 3A. In addition to being completely conserved in R. microplus, the VDAC protein was conserved in all 12 R. annulatus from Texas, which were identical to the R. microplus allele (Additional File 6). In the DNA sequences, R. annulatus individuals were variable at the same 20 nucleotide positions as R. microplus, and all SNPs were synonymous (Additional File 5). In contrast, our samples of R. appendiculatus displayed nine aa replacements in VDAC, and two publicly available R. sanguineus sequences (XP_037498097.1 and UFQ89927.1) contained eight replacements, five of which were shared with R. appendiculatus (Additional File 6). Interestingly, only one of these replacements (L136V) was located on an external surface loop (Additional File 7A) predicted in a 3D structural model for VDAC (74). The other aa changes from R. appendiculatus and R. sanguineus were located within the transmembrane barrel and internal loops that extend into the cytoplasm.
Aquaporin-1 was the second most conserved protein across R. microplus from the Americas and Pakistan (Fig. 1, Fig. 3B). It is important to note that ticks carry multiple genes in the aquaporin family (18 reported from Ixodes scapularis) (65) and their annotation has not been standardized across tick species. For instance, RmAQP1 was shown to be a homolog of IsAQP9 in a recent gene tree reconstruction (65). In our dataset, the RmAQP1 protein contained just three changes (I264V, L286I, and T294V) across the full-length protein (316 aa) within R. microplus (Additional File 6). These changes were found in a small number of ticks in lab colonies from Brazil (IPV, POA, and Santa Luiza) and three individual ticks from Texas (Cameron County); no ticks from Mexico, Colombia, or Puerto Rico carried these changes. Ticks from Pakistan all failed to amplify the exon assay (AQP1_E01502) containing these three residues (Additional File 4), as did all 12 R. annulatus samples (data not shown). In the publicly available R. annulatus genome sequence (WOVY00000000.1; Bioproject PRJNA593711), three replacements occur in AQP1 (T223S, M266V, and T294V). The T223S replacement is the most important because it is located in an external loop of the 3D protein model (Additional File 7B). Our R. appendiculatus samples contained 10 other replacements and two GenBank R. sanguineus AQP9 sequences (XP_037510823.1 and KAH7963214.1) contained 17 replacements, but only one was at the same aa position as R. microplus (I264S).
The third most conserved locus was VgR (Fig. 1, Fig. 3C), with the caveat that we assayed only 434 positions of 1799 in the full-length protein, and three of the eight exon assays had lower success rates (80–86%) (Additional File 4). Because VgR is a large protein, we focused on two ligand-binding domains (LBDs) encoded by exons 2–3 (105 aa) and 14–20 (329 aa). We found three aa replacements across R. microplus from the Americas, which represented one inter-class change (N19T) and two intra-class changes (R1193H and T1216S). None of the three replacements were found in the five ticks from Pakistan, which carried changes at three other positions (T56I, P1167Q, and A1184G) for a total of six VgR replacements in our overall R. microplus dataset. R1193H and T1216S are probably linked because they co-occurred in all 60 R. microplus individuals from the Americas that carried them, as well as the 12 R. annulatus samples (Additional File 6). Furthermore, they were found in all North and South American countries that we sampled (Additional File 1). Every R. annulatus tick from Texas carried four replacements observed in R. microplus (N19T, R1193H, A1184V, and T1216S), as well as two others (S63N and A921S). The R. appendiculatus ticks in our study amplified at only two of eight assays (exons 16 and 19), but the data from exon 16 alone identified 11 polymorphic aa residues. A similar level of variation (13 replacements in a 187 aa peptide) is also evident in the partial VgR of R. appendiculatus Muguga strain from Kenya (ATP60167.1) (Additional File 6). Likewise, R. sanguineus (XP_037521270.1) contained 135 replacements across the entire protein, 45 of which overlap with the two LBDs that we screened in R. microplus. The T1216S replacement in R. sanguineus is shared with R. microplus and R. annulatus, and R. sanguineus had an intra-class replacement at position 19 (N19D).
RmS-1 was well conserved across the R. microplus populations we sampled, with a protein similarity of 0.985 across the 336 aa positions that we assayed of 380 in the full-length protein (Fig. 1). Five aa replacements (F101L, E140A, E306K, M337I, and I354V) were observed, all of which occurred in ticks from North America; the E140A change was also found in ticks from South America and Pakistan (Additional File 6). The first four are inter-class changes and I354V is an intra-class replacement. Two replacements (F101L and E306K) were found in surface loops of the protein (Fig. 3D). M337I and I354V appear to be linked, because they both were present in every tick that carried them (seven Texas locations). The R. annulatus ticks sampled in Texas had replacements at four other positions (K52E, E275K, I280M, and L286M), and none carried the E140A replacement that was common in R. microplus. Seventeen aa replacements were present in R. appendiculatus ticks in three of the four assays. Assay A0101 failed in our 10 samples of R. appendiculatus, but a full-length sequence from GenBank (AAK61375.1) shows > 20 replacements in this section of the protein alone (positions 1-100) (Additional File 6). Likewise, R. sanguineus (XP_037521270.1) contains 46 replacements across the entire protein. The other two members of the serpin family that we investigated (RmS-5 and RmS-11) showed much more variation in R. microplus; RmS-5 had 27 replacements in the 360 positions that we assayed (similarity = 0.925) and RmS-11 had 19 replacements in 340 assayed positions (similarity = 0.944).
Subolesin was highly conserved in R. microplus from the Americas (Fig. 1, Fig. 3E) with only a single aa change (I41V) in the 52 positions that we assayed (of 162 in the full-length protein). Our R. microplus samples from Colombia and Pakistan are missing data at the assay that covers position 41, due to failed amplification. The I41V replacement is an intra-class change (isoleucine and valine are both aliphatic acids) and it is possible that the valine replacement would not significantly impact IgG reactivity, but this remains unknown. We found this replacement to be rare but widespread in Texas and Mexico (states of Tamaulipas, Zacatecas, and Campeche) yet absent in our samples from southern Brazil. It was also present in 10 of 12 R. annulatus from Texas. We were unable to obtain data for positions 53–161 in our R. microplus samples because two AmpSeq assays failed to amplify (Fig. 2), despite multiple attempts at designing new forward and reverse primer pairs. Our primer sets for Sub did not amplify any of the R. appendiculatus individuals in our sample set. However, R. appendiculatus GenBank accession QKY58555.1 has one inter-class replacement (N62S), and a second sample (ABA62331.1) has one intra-class (H95R) and two inter-class (A90T and P82A) replacements (Additional File 6). An R. sanguineus sequence from GenBank (XP_037520396.1) carries H95R, plus three different replacements (S84C, A80T and H86P).
It is worth noting that RmAQP2 (homolog of IsAQP1 and RsAQP7) stands out as being well conserved in R. microplus from the Americas but not Pakistan (Fig. 3F). A total of seven aa replacements were found in the full-length protein (294 aa) across all of our R. microplus samples, but three of these (R8H, A136T, G175V) were only found in R. microplus from the Americas (Additional File 6). These aa changes were rare; R8H and G175V were found in just one tick each from Mexico and Colombia. The A136T change was also rare, found only in Brazil and Pakistan. Therefore, AQP2 is more conserved in R. microplus from the Americas than the S-1 protein. The other four replacements (V249L, L254I, D275H, and E276G) occurred only in ticks from Pakistan (Additional File 6). This disproportionate number of changes compared to ticks from the Americas is consistent with the long-term spatial and temporal separation of populations from Asia and the Americas. One change, A136T, sits on an extracellular loop in the middle of published vaccine peptide 2 (residues 125–156) (75). Other Rhipicephalus species contained greater variation within RmAQP2, including R. annulatus (eight changes) and R. appendiculatus (12 changes). The RmAQP2 homolog in R. sanguineus is RsAQP7 (XP_037518224.1), which had 21 replacements and one indel.
In all other proteins, we found decreasing levels of conservation within R. microplus, with MP4 being the least conserved (Fig. 1). The Bm86 protein was the second least conserved protein in our samples from the Americas and Pakistan, with 53 replacements (Additional File 7H) in the 476 positions (of 650 total) that we assayed (Fig. 3H). Many segments of the protein show evidence of mutational hotspots with clusters of aa changes (Fig. 2); 70% of the 10-aa sliding windows contain 1–5 replacements. The only highly conserved region occurs at aa positions 400–480. This region is encoded by Bm86 exons 11 and 12, both of which had a high success rate (96%) in R. microplus and yielded data for all ticks from Brazil and Colombia, as well as three of the five ticks from Pakistan. Therefore, this conserved region was assayed with high confidence.
Understanding the specific location of aa replacements is important for evaluating the risk of vaccine escape from short peptide vaccines developed from proteins such as Sub (64), RmAQP2 (75), Chit (76), and Bm86 (56). In our 167 R. microplus samples, four of 12 (33%) published short peptides contained at least one aa replacement (Fig. 3F, 3G, and 3H). When all GenBank entries are included, the number rises to eight of 12 (67%), and each of these four proteins carries at least one aa replacement in at least one short peptide target (Additional File 6). The SBm7462® construct for Bm86 (55) has multiple replacements within each short peptide, although some may be restricted to certain regions of the world.