The exchange of genes between viruses and eukaryotes through horizontal gene transfer (HGT) is a key evolutionary driver capable of facilitating host manipulation and viral resistance 2,3,5. Host-derived genes are known to be employed by viruses for replication and cellular control 5,6. This is observed across a diversity of viral lineages which encode cellular-derived informational genes like tRNA synthetases and polymerases, as well as operational genes, such as immune effectors and metabolic enzymes 6–15. These genes counter host immunity, hijack cellular machinery, and circumvent nutritional bottlenecks, making them key resources for adaptation 5,16.
Conversely, viral-derived genes in eukaryotic genomes are frequently perceived as inconsequential remnants of viral interactions, or even discarded as contamination in genomic analyses. However, these genes can be co-opted to supplement or supplant existing cellular components and functions. For example, core proteins such as histones and E2F transcription factors have been replaced by viral proteins in dinoflagellates and fungi, respectively 17,18, while viral structural proteins, fusogens, and proviruses are utilized for communication, cellular fusion, and antiviral defense in mammals and other eukaryotes 2,3,19–22. The co-option of such viral proteins has been found to coincide with cellular innovations and the radiation of major eukaryotic lineages in which these genes serve key functions 23,24.
Accordingly, these transfers have important evolutionary, ecological, and health implications, yet we lack a general understanding of the mode, tempo, and functional patterns of viral-eukaryotic gene exchange because systematic analyses across diverse taxa are lacking. To address this, we comprehensively characterized viral-eukaryotic gene transfer in 201 eukaryotic and 108,842 viral taxa by developing a phylogenetic pipeline capable of screening thousands of evolutionary trees for HGT-indicative topologies while accounting for phylogenetic statistics and contamination (Extended Data Fig. 1, 2). These analyses identified 1,333 candidate virus-to-eukaryote and 4,807 eukaryote-to-virus transfers, along with 600 transfers of unknown directionality, affecting 2,841 distinct protein families (Fig. 1a, Supplementary Table 1). Phylogenetically ambiguous or long-branch HGTs were considered weakly supported and were excluded from downstream analyses (Fig. 1a, Supplementary Table 1); this exclusion, together with limitations in taxon sampling, makes these figures a conservative estimate of HGT events.
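As a rough illustration of what an "HGT-indicative topology" can look like, the sketch below flags a clade from one domain that is nested within another: the clade contains only recipient-domain sequences, while its sister clade and the next outgroup contain only donor-domain sequences, and the node uniting the clade and its sister meets a support threshold. The `Node` structure, the domain labels, the 70% support cutoff, and all function names are hypothetical choices for this example and are not the pipeline used in this study; contamination screening and long-branch filtering are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Minimal rooted tree node (hypothetical structure for this sketch)."""
    name: Optional[str] = None      # leaf label; None for internal nodes
    domain: Optional[str] = None    # "virus" or "eukaryote" for leaves
    support: float = 100.0          # branch support (e.g. bootstrap %) of this clade
    children: List["Node"] = field(default_factory=list)


def leaf_domains(node: Node) -> Set[str]:
    """Set of domains represented among the leaves below `node`."""
    if not node.children:
        return {node.domain}
    return set().union(*(leaf_domains(c) for c in node.children))


def leaf_names(node: Node) -> List[str]:
    """Leaf labels below `node`, in left-to-right order."""
    if not node.children:
        return [node.name]
    return [n for c in node.children for n in leaf_names(c)]


def screen_tree(root: Node, min_support: float = 70.0) -> List[Tuple[str, List[str]]]:
    """Return (direction, recipient_leaves) for each nested, well-supported clade."""
    hits: List[Tuple[str, List[str]]] = []

    def visit(node: Node, outgroup: Optional[Node]) -> None:
        if len(node.children) == 2 and outgroup is not None and node.support >= min_support:
            for i in (0, 1):
                clade, sister = node.children[i], node.children[1 - i]
                d_clade, d_sister = leaf_domains(clade), leaf_domains(sister)
                # Pure recipient clade, pure donor sister, donor-only outgroup.
                if (len(d_clade) == 1 and len(d_sister) == 1 and d_clade != d_sister
                        and leaf_domains(outgroup) == d_sister):
                    donor, recipient = next(iter(d_sister)), next(iter(d_clade))
                    hits.append((f"{donor}->{recipient}", leaf_names(clade)))
        for i, child in enumerate(node.children):
            sibling = node.children[1 - i] if len(node.children) == 2 else None
            visit(child, sibling)

    visit(root, None)
    return hits


def leaf(name: str, domain: str) -> Node:
    return Node(name=name, domain=domain)


# Toy tree (((E1,E2),(V1,V2)),V3): a eukaryotic pair nested among viral sequences.
tree = Node(children=[
    Node(support=92.0, children=[
        Node(support=88.0, children=[leaf("E1", "eukaryote"), leaf("E2", "eukaryote")]),
        Node(support=80.0, children=[leaf("V1", "virus"), leaf("V2", "virus")]),
    ]),
    leaf("V3", "virus"),
])

print(screen_tree(tree))  # [('virus->eukaryote', ['E1', 'E2'])]
```

Requiring a donor-only outgroup, rather than just a donor sister, is what distinguishes a nested acquisition from two domains that simply form separate sister clades of homologous genes.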
The resulting HGTs revealed several trends in viral-eukaryotic gene exchange. Transfers from eukaryotes to viruses were observed approximately twice as frequently as transfers in the reverse direction (Fig. 1a, b). This imbalance is explained by the higher number of viral recipients than viral donors per eukaryotic taxon (Fig. 1c) and by the greater number of genes transferred to each viral recipient relative to the number received from each viral donor (Fig. 1d, e). These data also demonstrate a correlation between gene acquisition and donation (Pearson's r = 0.50, p < 1 × 10⁻¹⁸; Fig. 1b), suggesting that viral-eukaryotic gene transfer is reciprocal, is likely instigated through specific host-virus interactions rather than non-specific (e.g., environmental) uptake, and is biased towards viral acquisition. This bias may reflect the larger gene repertoires of eukaryotes compared with viruses, which would provide greater opportunity for viral gene acquisition during host interactions.
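The acquisition-donation correlation is a standard Pearson coefficient computed over per-taxon counts. A dependency-free sketch follows; the per-taxon counts in the example are invented placeholders, not values from Supplementary Table 1.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-taxon counts: genes each eukaryotic taxon acquired from
# viruses, and genes the same taxon donated to viruses.
acquired = [0, 2, 5, 1, 12, 3]
donated = [1, 3, 4, 0, 9, 6]
print(round(pearson_r(acquired, donated), 2))
```

This returns only the coefficient; `scipy.stats.pearsonr` computes the same statistic and also returns a p-value of the kind reported above.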
Identifying the taxonomy of donors and recipients revealed the propensity of certain lineages to participate in HGT. Nucleocytoplasmic large DNA viruses (NCLDV or Nucleocytoviricota, including Phycodnaviridae, Mimiviridae, Iridoviridae, Pithoviridae, Asfarviridae, and Poxviridae) were involved in the majority of genetic exchanges (78%), although lineage-specific associations, such as the acquisition of animal genes by herpes- and poxviruses, were also noted, highlighting the variable host breadth of viral groups (Fig. 1f, g, Extended Data Fig. 1c). Amongst eukaryotes, gene exchange was more prevalent in unicellular than in multicellular organisms, and was particularly abundant in unicellular opisthokonts (the protist relatives of animals and fungi), the diverse protist clade known as SAR (Stramenopila, Alveolata, and Rhizaria), and other ecologically important algal groups such as chlorophytes and haptophytes. These included numerous HGTs coinciding with the diversification of SAR, and the largest influx of viral genes was detected around the origin of the dinoflagellates (Fig. 1f, g). Elevated gene transfer amongst unicellular eukaryotes may result from more frequent encounters with NCLDV, which are hyper-diverse and abundant in aquatic environments 10, as well as from the lack of germline segregation, which likely contributes to the reduced frequency of HGTs observed in animals and plants (Fig. 1g) 25. However, gene exchange was more common amongst invertebrates than amongst vertebrates, and our methodology likely under-represents viral gene transfer in animals because it underestimates retroviral acquisitions, which are commonly observed throughout animal lineages but whose detection is limited by the lack of host-free retroviral genome assemblies 26.
We also noted eukaryotic species harboring particularly large numbers of viral genes (Fig. 1e, g). These included species previously reported to contain substantial viral genomic insertions from phycodnaviruses (Ectocarpus siliculosus and Tetrabaena socialis), from phycodnaviruses and asfarviruses (Hyphochytrium catenoides), or from multiple poorly classified viruses (Acanthamoeba castellanii), indicating one or a few viral sources (Fig. 1g, Supplementary Table 1) 27–30. Other species also exhibited elevated numbers of viral genes derived from multiple NCLDV sources (Fig. 1e, g). Whether these large multigene acquisitions retain functional roles, such as in antiviral virophage production 31, or are remnants of past infections is unclear. However, large insertions were not detected at ancestral nodes (Fig. 1g), suggesting that viral integrations are recurrent, affect diverse eukaryotic lineages, and are generally retained only transiently, but provide an opportunity for the longer-term retention and co-option of individual viral genes that are adaptive and selected for fixation.
To investigate the functional relevance of these HGTs, we examined the transfer direction and functional enrichments of exchanged protein families. Of the 1,859 families exhibiting HGT with known directionality, the majority (93%) underwent unidirectional transfer (Fig. 2a). Dividing this dataset by direction, families involved in viral acquisitions were generally transferred unidirectionally (92%), whereas a larger proportion of families undergoing virus-to-eukaryote transfer participated in bidirectional exchange (29%) (Fig. 2a), suggesting that some of these exchanges may involve transduction (cell-virus-cell HGT). By tracing the phylogenies of all families with eukaryotic acquisitions from the viral donors towards the root, we estimated that 30.5% (n = 259) of viral genes acquired by eukaryotes were originally eukaryotic, whereas fewer (8.2%, n = 70) originated in prokaryotes (Extended Data Fig. 3, Supplementary Table 1). The remainder either had unclear origins (24.2%, n = 205) or could not be attributed to a cellular lineage (37.1%, n = 315), suggesting that the latter are either viral innovations or ancient viral acquisitions whose deep cellular homology is undetectable in our dataset (Extended Data Fig. 3a). These data indicate that, over evolutionary time, viruses can mediate intra-eukaryotic and inter-domain HGT through transduction. This suggests that viruses act as gene conduits between eukaryotic lineages, as they do in prokaryotes, where viral transduction is key to ecological adaptation and genome evolution 1,4,32.
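The two classifications in this paragraph can be expressed as simple rules. Both functions below are illustrative assumptions rather than the exact criteria of our pipeline: a family is called bidirectional if it contains transfers in both known directions, and a viral gene's origin is assigned from the first sister group containing cellular sequences met while walking from the viral donor clade towards the root (the eukaryotic recipient clade itself excluded), with mixed eukaryotic and prokaryotic sister groups treated as unclear.

```python
from typing import Iterable, Set

KNOWN_DIRECTIONS = {"eukaryote->virus", "virus->eukaryote"}
CELLULAR = {"eukaryote", "prokaryote"}


def classify_family(transfer_directions: Iterable[str]) -> str:
    """Label a protein family from the directions of its inferred transfers."""
    known = set(transfer_directions) & KNOWN_DIRECTIONS
    if len(known) == 2:
        return "bidirectional"
    if len(known) == 1:
        return "unidirectional"
    return "unknown"  # only transfers of unknown directionality


def trace_origin(sister_domains_rootward: Iterable[Set[str]]) -> str:
    """Assign an origin to a viral gene acquired by eukaryotes.

    `sister_domains_rootward` lists, nearest first, the set of domains found in
    each sister group met while walking from the viral donor clade to the root,
    excluding the eukaryotic recipient clade.
    """
    for domains in sister_domains_rootward:
        cellular = domains & CELLULAR
        if not cellular:
            continue  # purely viral sister group: keep moving rootward
        if cellular == {"eukaryote"}:
            return "eukaryotic"
        if cellular == {"prokaryote"}:
            return "prokaryotic"
        return "unclear"  # eukaryotic and prokaryotic sequences mixed
    return "not attributable"  # no cellular sequences before the root


print(classify_family(["eukaryote->virus", "virus->eukaryote"]))  # bidirectional
print(trace_origin([{"virus"}, {"virus", "prokaryote"}]))  # prokaryotic
```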
The direction of transfer was also associated with distinct functional biases. Eukaryote-to-virus transfers were enriched in functions associated with cellular activity and housekeeping, such as metabolic proteins, E3 ligases, and tRNA synthetases (Fig. 2b, Supplementary Table 1, Supplementary Table 3). The enrichment of metabolic proteins highlights the role of cellular-derived genes in reprogramming host metabolism during infection, which appears to be achieved both through de novo metabolite synthesis and uptake (e.g., metabolic enzymes and/or nutrient transporters) and through cellular recycling via proteolysis (e.g., proteasomal degradation and autophagy) (Fig. 2a, b, Supplementary Table 1, Supplementary Table 3). Additionally, signaling and stress response proteins are frequently acquired and likely contribute to regulating host physiology, gene expression, immune responses, and viral processing. The functions of viral-derived genes in eukaryotes are less obvious and show fewer functional associations, but these genes are strongly enriched for proteins functioning in glycosylation and, to a lesser extent, for nuclear proteins (Fig. 2a, c, Supplementary Table 1, Supplementary Table 3). Bidirectionally transferred genes are also enriched in metabolic processes, protein modification, and stress response, a subset of the functions most often acquired by viruses (Fig. 2d, Supplementary Table 1, Supplementary Table 3). These data show that eukaryote-to-virus and virus-to-eukaryote HGTs have distinct, non-equivalent functional tendencies that reflect the different adaptive contexts of viruses and eukaryotes.
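Functional enrichment of this kind is commonly assessed with a one-sided hypergeometric (equivalently, one-sided Fisher's exact) test: given N background families of which K carry a function, how surprising is it to see at least k carriers among n transferred families? The sketch below assumes that test; the counts in the example are hypothetical, and correction for testing many functional categories is omitted.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: families in the background set
    K: background families annotated with the function of interest
    n: families in the test set (e.g. eukaryote-to-virus transfers)
    k: test-set families annotated with the function
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probability mass from k up to the largest possible overlap;
    # math.comb returns 0 when more items are drawn than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 30 of 200 transferred families carry an annotation
# found in 500 of 10,000 background families (about 10 expected by chance).
print(enrichment_p(k=30, n=200, K=500, N=10_000))
```

With SciPy available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same tail probability using the same symbols.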
To understand how these genes are used in viral and eukaryotic systems, we first examined the predicted subcellular targets of eukaryote-derived viral proteins to infer where they may operate in host cells. Cellular localizations were predicted using a neural network-based approach (DeepLoc) 33, revealing that most eukaryote-to-virus HGTs likely function in the cytoplasm (n = 909), nucleus (n = 482), mitochondrion (n = 284), and extracellular space (n = 214) (Fig. 3a, Supplementary Table 1). Relative to all eukaryotic protein families, however, viral acquisitions were enriched in cytoplasmic, endoplasmic reticulum (ER), extracellular, and peroxisomal proteins, the last of which suggests functions involving lipid catabolism and oxidation (Fig. 3b). Moreover, predicted localizations were generally consistent between donor and recipient proteins (71% agreement; Fig. 3c), with the remaining variation likely resulting from prediction inconsistencies and viral sequence divergence, indicating that genes acquired by viruses tend to function in their original subcellular contexts.
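The two localization summaries in this paragraph, donor-recipient agreement and enrichment relative to all eukaryotic families, reduce to simple counting once predictions are available. The sketch below assumes predictions have already been produced as DeepLoc-style label strings; it does not run DeepLoc, and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def localization_agreement(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (donor, recipient) pairs with the same predicted localization."""
    if not pairs:
        raise ValueError("no pairs given")
    return sum(donor == recipient for donor, recipient in pairs) / len(pairs)


def fold_enrichment(foreground: List[str], background: List[str]) -> Dict[str, float]:
    """Observed/expected ratio of each localization in `foreground` vs `background`."""
    fg, bg = Counter(foreground), Counter(background)
    n_fg, n_bg = len(foreground), len(background)
    return {loc: (fg[loc] / n_fg) / (bg[loc] / n_bg) for loc in fg if bg[loc] > 0}


# Hypothetical predictions for eukaryotic donors and their viral homologs.
pairs = [("Cytoplasm", "Cytoplasm"), ("Nucleus", "Cytoplasm"),
         ("Peroxisome", "Peroxisome"), ("Nucleus", "Nucleus")]
print(localization_agreement(pairs))  # 0.75

viral_acquired = ["Endoplasmic reticulum", "Endoplasmic reticulum", "Nucleus", "Cytoplasm"]
all_families = ["Endoplasmic reticulum", "Nucleus", "Nucleus", "Cytoplasm"]
print(fold_enrichment(viral_acquired, all_families))
# {'Endoplasmic reticulum': 2.0, 'Nucleus': 0.5, 'Cytoplasm': 1.0}
```

A fold-enrichment ratio alone says nothing about significance; in practice each compartment would also be paired with a statistical test such as the hypergeometric test.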
To corroborate the predicted localizations and better understand the impact of these genes on cellular compartments, we conducted localization-based functional enrichments, which revealed additional cellular processes targeted during infection. Cytoplasmic proteins were largely involved in translation, metabolism, proteolysis, and signaling, whereas nuclear proteins mainly functioned in DNA processing, chromatin organization, cell cycle regulation, and protein modification (Fig. 3d, e, Supplementary Table 1, Supplementary Table 4). Endoplasmic reticulum proteins were predominantly associated with lipid metabolism and membrane remodeling (Fig. 3f, Supplementary Table 4). Sphingolipid synthesis enzymes, for example, contribute to this localization bias: many function in the ER, they were frequently transferred (Supplementary Table 1), and they are known to be used by diverse viruses for cellular regulation 16,34,35. Additionally, ER remodeling is important for generating membrane-enclosed viral factories 36. Extracellular proteins acquired by viruses were enriched for functions including carbohydrate metabolism, protein maturation, and proteolysis, implying a tendency towards cell-surface modulation (Fig. 3g, Supplementary Table 4). These results highlight the key cellular systems targeted by eukaryote-derived genes during infection. However, these processes are also known to be manipulated by viruses that lack eukaryotic genes (e.g., many non-NCLDV viruses), which instead often rely on small, functionally cryptic effectors. This suggests that cellular manipulation strategies are ubiquitous, but that the mode of manipulation may depend on viral coding capacity (e.g., the relaxed coding constraints of the large NCLDV genomes could permit the use of more, and larger, eukaryotic genes).
Lastly, to gain insight into the roles viral genes play in eukaryotic systems, we inspected the distributions and functions of viral-derived glycosyltransferases, which were strongly enriched in virus-to-eukaryote HGTs (Fig. 2c). We identified 63 instances of eukaryotes acquiring viral glycosyltransferases, of which 13 mapped to ancestral nodes, implying functional relevance under long-term selection (Supplementary Table 5). Plotting transfer events and annotations over a eukaryotic phylogeny revealed the functional diversity and recurrent acquisition of these enzymes across eukaryotic lineages (Fig. 4a, Extended Data Fig. 4). These HGTs often coincided with morphological and structural synapomorphies, including algal cell wall elaboration (e.g., lipopolysaccharide (LPS) and cellulose synthesis enzymes) 37, long-chain polyamine-containing scale formation in haptophytes (spermidine synthase) 38, cellular aggregation in the opisthokonts and dictyostelid slime molds (hyaluronan synthase and GlcNAc transferase), and mitochondrial divergence in the kinetoplastids (fucosyltransferase), a group composed primarily of animal parasites such as trypanosomes (Fig. 4a). Experimental data support a number of these correlations, including the unusual identification of LPS in the cell walls of Chlorella 39, the importance of hyaluronan in vertebrate tissues 40, and the role of the dictyostelid N-acetylglucosamine transferase, Gnt2, in calcium-independent cellular aggregation 41,42, indicating that virally sourced genes have been co-opted during the evolution of cellular traits (Fig. 4a). We further examined two glycosyltransferase acquisitions in kinetoplastids, hypothesizing that, given the correlation between these acquisitions and the origin of the highly derived kinetoplastid mitochondria (called kinetoplasts), the encoded proteins should function in that compartment.
Phylogenetic analyses revealed that both genes were derived from NCLDV, highlighted the prokaryotic origin of the fucosyltransferase (COG000231), and confirmed that both genes are conserved throughout kinetoplastids (Fig. 4b, c). Moreover, both proteins localized to the kinetoplast in Trypanosoma brucei (identifiable as a non-nuclear DNA-stained focus), as shown by mNeonGreen tagging (Fig. 4d) and by organellar proteomics (Fig. 4e). A recent report also suggests an essential role for the fucosyltransferase in kinetoplast function in T. brucei 43. Altogether, these results indicate that these viral-derived glycosyltransferases were co-opted for use in the kinetoplast at the same time as it underwent massive evolutionary change. These data, along with the tendency of viruses to modify cell surfaces, suggest that viral-derived genes may have played various roles in the evolution of cellular morphology across the eukaryotic tree of life, possibly affecting the diversification of eukaryotic forms.
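Deciding whether an acquisition maps to an ancestral node, as for the 13 glycosyltransferase HGTs or for genes conserved throughout the kinetoplastids, amounts to finding the most recent common ancestor (MRCA) of the recipient taxa on a species tree. A minimal parent-pointer sketch follows; the simplified tree, the taxon labels, and the rule that any internal node counts as "ancestral" are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical, heavily simplified species tree stored as child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "Trypanosoma_brucei": "Trypanosomatida",
    "Leishmania_major": "Trypanosomatida",
    "Trypanosomatida": "Kinetoplastida",
    "Bodo_saltans": "Kinetoplastida",
    "Kinetoplastida": "Discoba",
    "Naegleria_gruberi": "Discoba",
    "Discoba": None,  # root of this toy tree
}


def path_to_root(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Nodes from `taxon` up to the root, inclusive."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def mrca(taxa: Iterable[str], parent: Dict[str, Optional[str]]) -> str:
    """Most recent common ancestor of the given taxa."""
    paths = [path_to_root(t, parent) for t in taxa]
    if not paths:
        raise ValueError("need at least one taxon")
    shared = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:  # walk rootward; the first shared node is the MRCA
        if node in shared:
            return node
    raise ValueError("taxa do not share a common ancestor")


def is_ancestral(node: str, parent: Dict[str, Optional[str]]) -> bool:
    """True if `node` is an internal node, i.e. the parent of some other node."""
    return node in set(parent.values())


recipients = ["Trypanosoma_brucei", "Leishmania_major", "Bodo_saltans"]
node = mrca(recipients, PARENT)
print(node, is_ancestral(node, PARENT))  # Kinetoplastida True
```

An acquisition found in a single tip taxon maps to that leaf and would not count as ancestral under this rule.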
Horizontal gene transfer between viruses and eukaryotes has been observed and assumed to impact genome evolution in both participants, but until now we lacked the systematic characterization of these gene exchanges needed to generalize their mode and functional significance in both viral and eukaryotic contexts. As with all computational surveys, our dataset is limited in specificity and sensitivity, but it nonetheless provides an extensive resource from which phylogenetic patterns can be observed and their genomic and functional importance predicted. From a viral perspective, the apparent ubiquity of host-manipulation strategies suggests that the cellular processes outlined above may represent targets for the development of broad-spectrum, host-targeting antiviral therapeutics. Indeed, many important emerging human pathogens, such as Ebola virus, Zika virus, and coronaviruses, depend on manipulating these same processes, including autophagy, proteolysis, ER modification, and sphingolipid metabolism 35,44–46. Functional investigations of eukaryote-derived viral genes, particularly using heterologous expression 7, may also provide insights into how viruses manipulate these cellular pathways while circumventing the need for tractable host-virus model systems. From a eukaryotic perspective, our analyses suggest not only that viruses can mediate intra-eukaryotic gene exchange, but also that viral genes, particularly glycosyltransferases, have influenced the evolution of cellular morphology and structure. These genes have recurrently impacted transitions as fundamental as the evolution of tissues or of divergent mitochondria, reminiscent of how retroviral fusogens have repeatedly driven placental evolution in mammals and lizards 22.
Our survey also identifies protein candidates for which experimental characterizations would help reveal the full impact of these genes on cellular systems and their role in driving the evolution of eukaryotic complexity.