A 2,197-nucleotide TTEqV1 genome identified in the metagenomic analysis of plasma from a horse was previously the only sequence within the genus Mutorquevirus40. Our analysis suggests that the reported TTEqV1 genome sequence is incomplete and missing a portion of the UTR, including one of the two conserved 15 nt sequences and a GC-rich tract, both highly conserved features in the UTR of TTVs. An ORF with homology to the ORF2 identified in TTEqV2 (including the highly conserved WX7HX3CXCX5H motif) was also observed in our analysis of TTEqV1, but is not annotated in the NCBI entry.
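As an illustration of how the WX7HX3CXCX5H shorthand can be read, the sketch below expands it into a regular expression and scans a protein sequence for matches. It assumes that X denotes any residue and that a number after X is a repeat count. The function names and the test sequence are hypothetical and synthetic; they are not taken from the TTEqV1 or TTEqV2 sequences, and this is not necessarily how the motif was located in this study.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Expand shorthand such as 'WX7HX3CXCX5H' into a regular expression.

    Assumption: 'X' means any residue and an optional number after it is a
    repeat count; all other letters are literal amino acids.
    """
    parts = []
    for residue, count in re.findall(r"([A-Z])(\d*)", motif):
        if residue == "X":
            repeats = int(count) if count else 1
            parts.append(f".{{{repeats}}}")
        elif count:
            raise ValueError(f"repeat count on literal residue {residue!r}")
        else:
            parts.append(residue)
    return "".join(parts)


def find_motif(protein: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# Synthetic example sequence, built only to contain one motif instance.
synthetic = "MK" + "W" + "A" * 7 + "H" + "A" * 3 + "CAC" + "A" * 5 + "H" + "LL"
hits = find_motif(synthetic, "WX7HX3CXCX5H")
```

A motif instance spans 21 residues under this reading (1 + 7 + 1 + 3 + 1 + 1 + 1 + 5 + 1).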
The novel TTEqV2 genome contains several genomic features with varying levels of similarity to those previously described in other TTV genomes. ORF1 and ORF2 have similar size, position, and amino acid motifs to other publicly available TTV sequences. TTEqV2 and TTEqV1 have similar amino acid motifs within ORF1; however, some differ in position and/or sequence. The ORF1 of each genome contains two RCR motif IIIs, one of which is in a similar position and has an identical amino acid sequence (YGPK), while the other has both a different position and sequence (YMQK in TTEqV1, YMAK in TTEqV2). The Walker-A and B motifs are in a similar position in both genomes but differ in amino acid sequence (KQTNQGKT for Walker-A and VITADE for Walker-B in TTEqV1, GTSQQGKT for Walker-A and LLTTDE for Walker-B in TTEqV2).
Two GC-rich regions, characteristic of TTV genomes, are located within the UTR of TTEqV2. The first is 70 nt with 78.6% GC, while the second is 67 nt with 92.5% GC. These GC-rich regions, which contain long homopolymeric stretches, were likely the reason initial analysis with only metagenomic data failed to generate a complete genome sequence. Assembly of the final genome required a combination of metagenomic sequencing and short- and long-read amplicon sequencing. Similarly, when the first human TTVs were sequenced, the genome was thought to be linear due to difficulty amplifying and sequencing GC-rich regions41.
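The two properties discussed for these regions, GC percentage and homopolymer length, can be computed as in the minimal sketch below. The function names and the short input strings are hypothetical and synthetic; the actual TTEqV2 UTR sequences are not reproduced here.

```python
from itertools import groupby


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


# Synthetic examples only.
example_gc = gc_percent("GCGCGCAT")             # 6 of 8 bases are G or C
example_run = longest_homopolymer("ACGGGGGT")   # longest run is five G's
```

By simple arithmetic, the reported values are consistent with 55 G/C bases out of 70 (78.6%) and 62 out of 67 (92.5%); these counts are inferred from the rounded percentages rather than stated in the text.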
Transcription regulatory sites identified in TTEqV2, including the Sp1 site, cap site, and polyadenylation signal, are similar to those characterized in other TTV genomes. The Sp1 site and polyadenylation signal exactly match those described in previously characterized TTV genomes, while the cap site has a single nucleotide difference, which is also seen in TTEqV1 (GGGGCAA[T > C]T)4,5. The TATA-box, which is well conserved in most TTV genomes, appears to be either heavily modified or missing from the expected region of TTEqV2. Generally, TTV genomes have a TATA-box that is 13 nt upstream of UTR motif 1 and conforms to the canonical consensus sequence (ATATAA) with slight variations in some cases. The putative atypical TATA-box in TTEqV2 (ACTTAT), determined based on location relative to the conserved motif, has three nucleotide differences compared to the canonical sequence. The TTEqV1 genome does not include UTR motif 1 or the upstream region containing the TATA-box, so the sequence of this region in the other available Mutorquevirus genome is unknown. However, an identical atypical putative TATA-box is seen in the representative Tettorquevirus genome (KX262893.1), and one with a single base difference (ACTTAA) is seen in the representative Chitorquevirus genome (MF187212.1). Each of these representative genomes belongs to the only publicly available species within its genus, so whether this atypical putative TATA-box is conserved in other sequences of the genus is unknown. Interestingly, neither of these sequences clusters with TTEqV2 based on the alignment of ORF1, and they come from different host species (Tettorquevirus from feline and Chitorquevirus from lemur).
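The nucleotide differences quoted for the TATA-box hexamers can be checked with a position-by-position (Hamming) comparison, sketched below. The hexamers are those given in the text; the function name is hypothetical, and an ungapped comparison of equal-length sequences is an assumption of this sketch.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


canonical = "ATATAA"        # canonical TATA-box consensus
tteqv2 = "ACTTAT"           # putative atypical TATA-box in TTEqV2
chitorquevirus = "ACTTAA"   # representative Chitorquevirus genome

print(hamming(canonical, tteqv2))        # 3
print(hamming(tteqv2, chitorquevirus))   # 1
```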
Nucleotide composition analysis revealed that anellovirus ORF1 sequences tend to be adenine rich, with A3 codons favoured in the sequences analyzed. Previous studies made similar observations in anelloviruses42, swine TTV43 and equine influenza virus sequences37. Interestingly, the opposite trend was observed in the associated host species (horse, pig, dog, human and chicken) for all anellovirus genera analyzed, where A3 codons were underrepresented. A previous study suggested that if codon usage bias in a virus is too similar to that of the host, host translation may be impeded, leading to a greater chance of the virus generating a symptomatic response in the host44. The significance of the observation that the TTEqV2 genome has dissimilar codon usage compared to its equine host remains to be determined.
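One simple way to quantify a preference for A3 codons (codons ending in adenine) is to tabulate base frequencies at the third codon position of an ORF, as sketched below. This is a minimal illustration and not necessarily the method used in this study. It assumes an in-frame coding sequence, ignores any trailing partial codon, and does not exclude stop codons. The function name and input sequence are hypothetical.

```python
from collections import Counter


def third_position_freqs(orf: str) -> dict[str, float]:
    """Fraction of A, C, G and T at the third position of each codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # drop a trailing partial codon
    thirds = [orf[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    counts = Counter(thirds)
    return {base: counts[base] / len(thirds) for base in "ACGT"}


# Synthetic example: codons GCA, AAA, TTT have third bases A, A, T.
freqs = third_position_freqs("GCAAAATTT")
```

Comparing such fractions between viral ORF1 sequences and host coding sequences would show whether A3 codons are enriched in one and underrepresented in the other.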
Although TTV has been proposed to be related to many diseases, there are only a few reports supporting the disease-inducing potential of TTV1. Human TTVs have been proposed to play a role in the pathogenesis of certain diseases, such as hepatitis45, hematological disorders46, respiratory diseases47, and rheumatic autoimmune disease48. A recent viral metagenomic study identified a novel betatorquevirus species prevalent in pediatric encephalitis/meningoencephalitis cases, but absent in healthy cohorts5.
Torque teno sus viruses (TTSuVs) have been found at a particularly high frequency in healthy swine49,50. While considered non-pathogenic on their own, there is increasing evidence that TTSuVs may influence the development or outcome of some diseases51. For example, co-infection with porcine circovirus type 2 (PCV2) and the associated porcine circovirus diseases deserve special attention52. TTSuVs have also been reported to contribute partially to inducing porcine reproductive and respiratory syndrome, porcine dermatitis and nephropathy syndrome, and hepatitis53,54. TTSuV2 viremia may be associated with the level of immunocompetence of the animals51. A study with pigs infected with hepatitis E virus has shown a correlation between TTSuV and the increased risk of developing severe hepatitis in animals co-infected with PCV255. A high prevalence of TTSuV1, but not TTSuV2, in pigs suffering from porcine respiratory disease complex has been shown56. Such viruses would likely be considered components of the host microbiota and unable to cause disease directly, but instead available to be engaged in physiological processes and modulate the organism's response to other pathogens1. The relationship between TTV, disease and host immune response is not well understood, and therefore the connection between TTEqV2 and the disease observed in the horse, if any exists, remains to be determined.
In conclusion, this study describes the discovery of a novel anellovirus species which represents the first complete genome within the genus Mutorquevirus. Comparative genomic analysis showed that TTEqV2 shares many conserved features with previously reported TTVs, and it has been recognized as a novel species by the ICTV13. This, along with previous studies using similar methods15–18, demonstrates the power of HTS for characterization of unexpected and/or novel viruses in a variety of hosts and sample types.