Evolutionary history of Ku proteins: evidence of horizontal gene transfer from archaea to eukarya

The DNA end-joining protein Ku is essential for non-homologous end joining (NHEJ) in prokaryotes and eukaryotes. It was first discovered in eukaryotes and was later identified in prokaryotes by PSI-BLAST. While Ku in eukaryotes is often a multidomain protein functioning in the repair of physiological and pathological DNA double-strand breaks, Ku in prokaryotes is a single-domain protein functioning in pathological DNA repair in spores or late stationary phase. In this paper we have attempted to systematically search for Ku protein in different phyla of bacteria and archaea as well as in different kingdoms of eukarya.

[...] two of which Saccharomyces cerevisiae. (c) Oxytricha trifallax, Stylonychia lemnae and Tetrahymena thermophila representing Ciliophora. Three sequences of T. thermophila obtained by BLASTP were found to exhibit three different domain architectures. Paramecium tetraurelia was found to exhibit a domain architecture similar to that of Oxytricha trifallax. (d) Dictyostelium discoideum representing Amoebozoa and (e) Arabidopsis thaliana and Ostreococcus lucimarinus representing Archaeplastida. Volvox carteri shares a similar architecture with Ostreococcus lucimarinus.

structured 5 kDa helix-loop-helix region known as the SAP domain 13. The SAP domain has putative DNA-binding properties and has been shown to increase the overall DNA-binding affinity of the heterodimers 14,15. In addition, the C-terminus is subject to post-translational modifications that regulate Ku interaction with proapoptotic proteins and the recruitment of homeodomain proteins to DNA ends 16,17.

Ku in Prokaryotes
In silico approaches provided the first hint of the existence of a bacterial NHEJ pathway, leading to the identification of Ku70/80 homologs and ATP-dependent DNA ligase homologs 18,7. The first experimental validation of bacterial NHEJ came from the study of Weller et al., which showed that in vitro the Mycobacterium tuberculosis LigD (LigDMtub) protein is indeed an ATP-dependent DNA ligase that is stimulated by its cognate KuMtb partner, possibly via a direct interaction between them 19. The bacterial Ku proteins are small compared with their much larger eukaryotic counterparts (70-80 kDa). These smaller bacterial Ku homologues correspond to the conserved 'Ku domain' at the center of the eukaryotic Ku complexes but lack the other conserved domains present in the eukaryotic proteins. Among the bacterial species predicted to possess an NHEJ repair system, some encode several putative Ku homologs. Examples are actinobacteria of the Streptomyces family and proteobacteria of the alpha subdivision 8. Some of the ku genes are even located on plasmids, suggesting that they were acquired via horizontal gene transfer.
In contrast to the heterodimeric Ku complex of eukaryotes, the bacterial Ku complexes are predominantly homodimeric, binding preferentially to the ends of double-stranded DNA and possibly sequestering other factors, including DNA ligases, to form a primordial DNA repair complex. Evidence for the association of the bacterial Ku proteins with other DNA repair proteins comes from the operon structure of the bacterial ku genes: the Ku homologues are often organized into operons containing ATP-dependent DNA ligases. All of these bacteria also contain essential NAD+-dependent DNA ligases; the presence of the additional ATP-dependent ligases suggests a more specific in vivo role for these enzymes, such as DNA repair or recombination. Ku proteins often form functional DNA repair complexes with these DNA ligases. To discover the role of Ku and ligase in DSB repair in prokaryotes, an experiment was performed on mutant Bacillus subtilis bearing inactivating mutations in Ku (YkoV) and ligase (YkoU). These mutants were found to be more sensitive to ionizing radiation in stationary phase and as spores 19,20.
In this paper we have looked at the origin and evolution of Ku protein in eukaryotes and prokaryotes. We have searched for Ku proteins in selected species from the bacterial, archaeal and eukaryal domains, and we indicate the phyla and kingdoms in which Ku is found and those in which it is missing. We have further used the beta-barrel core domain of Ku from different eukaryotic and prokaryotic species to construct a phylogenetic tree, from which the phylogenetic relationship between Ku70, Ku80 and prokaryotic Ku has been predicted. Finally, using the InterPro website, we have predicted the domains of Ku70 and Ku80 in eukaryotes and looked for a trend in their evolution.

Search of Ku proteins in bacteria
To study the evolution and history of Ku protein in all domains of life, a list of diverse eukaryotes, archaea and bacteria was compiled and Ku protein was searched for in these organisms. The eukaryotic genomes were searched for homologs of the Ku domains of human Ku70 and Ku80, while the archaeal and bacterial genomes were searched for homologs of the Ku protein of Mycobacterium tuberculosis. One hundred and sixteen genomes covering twenty-eight phyla of bacteria were searched for the presence of Ku. Ku was found in only twenty-five genomes. A few of the bacterial genomes contained multiple Ku proteins. In this list, Ku was found in actinobacteria, alphaproteobacteria, bacteroides/chlorobi, fibrobacteres/acidobacteria, firmicutes, thermodesulfobacteria, betaproteobacteria, chlamydiae/verrucomicrobia, and delta/epsilon proteobacteria (Table 2). This list is not exhaustive: bacteria from other phyla may contain Ku but simply be absent from our list.
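The genome-level tally described above can be sketched as follows. This is a minimal illustration rather than the procedure used in this study: the function name `summarize_ku_hits` and the genome and phylum labels in the example are hypothetical placeholders, and the input is assumed to be a list of (genome, phylum, number of Ku hits) records produced by an earlier BLASTP step.

```python
from collections import defaultdict


def summarize_ku_hits(records):
    """Summarize Ku presence from (genome, phylum, n_ku_hits) records.

    Returns the number of genomes searched, the number with at least one
    Ku hit, the genomes carrying multiple Ku proteins, and the per-phylum
    counts of Ku-positive genomes.
    """
    per_phylum = defaultdict(int)
    positive = 0
    multi_copy = []
    for genome, phylum, n_hits in records:
        if n_hits > 0:
            positive += 1
            per_phylum[phylum] += 1
        if n_hits > 1:
            multi_copy.append(genome)
    return {
        "genomes_searched": len(records),
        "genomes_with_ku": positive,
        "multi_copy_genomes": sorted(multi_copy),
        "ku_positive_by_phylum": dict(per_phylum),
    }


# Placeholder records for illustration only (not data from this study).
example = [
    ("genome_A", "phylum_1", 2),
    ("genome_B", "phylum_1", 0),
    ("genome_C", "phylum_2", 1),
    ("genome_D", "phylum_3", 0),
]
summary = summarize_ku_hits(example)
```

For the counts reported here, 25 Ku-positive genomes out of 116 corresponds to roughly 22% of the bacterial genomes searched.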

Search of Ku proteins in archaea
Similarly, Ku from Mycobacterium tuberculosis was used as a BLAST query against archaea (taxid: 2157) in NCBI. Only a small, defined set of archaeal species was found to contain Ku: Archaeoglobus fulgidus, Geoglobus ahangari, Archaeoglobus veneficus, Methanosaeta harundinacea, Methanothrix soehngenii, Methanobacterium formicicum and Methanocella paludicola.
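A search of this kind is usually followed by filtering of the hit table. The sketch below assumes the hits were exported in the standard 12-column tabular BLAST format (`-outfmt 6`). The E-value cutoff of 1e-5 and the function name `filter_hits` are illustrative assumptions, since the thresholds used in this study are not stated here, and the example rows are made up.

```python
def filter_hits(tabular_text, max_evalue=1e-5):
    """Return (subject_id, evalue, bitscore) for hits passing the cutoff.

    Expects the default 12 tab-separated columns of BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. Only the best-scoring hit per subject is kept.
    """
    best = {}
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        subject = fields[1]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue:
            continue  # hit is not significant under the assumed cutoff
        if subject not in best or bitscore > best[subject][1]:
            best[subject] = (evalue, bitscore)
    # Sort by descending bit score so the strongest hits come first.
    return sorted(
        ((s, e, b) for s, (e, b) in best.items()),
        key=lambda hit: hit[2],
        reverse=True,
    )


# Made-up rows for illustration; all identifiers are placeholders.
rows = "\n".join([
    "query_ku\tsubject_1\t35.2\t250\t150\t5\t1\t250\t3\t252\t2e-40\t150.0",
    "query_ku\tsubject_2\t25.0\t80\t55\t3\t10\t90\t12\t91\t0.5\t30.1",
    "query_ku\tsubject_1\t30.0\t100\t65\t2\t1\t100\t3\t102\t1e-10\t60.0",
])
hits = filter_hits(rows)
```

Keeping only the best hit per subject avoids counting one protein twice when BLAST reports several local alignments against it.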

Search of Ku proteins in eukaryotes
To search for Ku in eukaryotes, a candidate approach was employed. To make sure that the search was not restricted to the crown groups, genomes of organisms from six different kingdoms were searched by BLAST. The Ku domains of human Ku70 and Ku80 were used as queries. These domains were used instead of the bacterial Ku domain because a standard homology search with the bacterial Ku domain does not yield any results in eukaryotes (unpublished data). The six kingdoms included in this study are Eozoa, Animalia, Amoebozoa, Archaeplastida, Fungi and Ciliophora (which belongs to the group Chromista). The organisms are divided into three groups: a group with both Ku70 and Ku80, a group without Ku80 and a group without Ku70 [...]

[...] tree also illustrates that Ku70 and Ku80 share a common node that is not shared by the prokaryotic clade. From this observation it can be inferred that Ku70 and Ku80 arose from a gene duplication event.
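The three-way grouping of organisms by Ku70 and Ku80 presence described in this section can be expressed as a small helper. This is a sketch under assumptions: the function names and the organism labels are hypothetical, and a fourth label is added for organisms lacking both proteins, a case the grouping above does not name explicitly.

```python
def classify_ku_content(has_ku70, has_ku80):
    """Assign an organism to a group based on Ku70/Ku80 presence."""
    if has_ku70 and has_ku80:
        return "both Ku70 and Ku80"
    if has_ku70:
        return "without Ku80"
    if has_ku80:
        return "without Ku70"
    return "neither"  # assumed extra category, not one of the three groups


def group_organisms(presence):
    """Group organisms given a mapping organism -> (has_ku70, has_ku80)."""
    groups = {}
    for organism, (ku70, ku80) in presence.items():
        label = classify_ku_content(ku70, ku80)
        groups.setdefault(label, []).append(organism)
    return groups


# Placeholder presence calls for illustration only.
groups = group_organisms({
    "organism_1": (True, True),
    "organism_2": (True, False),
    "organism_3": (False, True),
    "organism_4": (False, False),
})
```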

Evolution and history of Ku proteins in prokaryotes
The bottom third of the tree in Figure 1 covers the single-domain Ku proteins of bacteria and archaea. A striking observation from this part of the tree is that the Ku proteins of a single bacterial phylum tend to cluster together and often form part of the same clade. This can be observed for alphaproteobacteria, betaproteobacteria, chlamydiae/verrucomicrobia, bacteroides, firmicutes, actinobacteria and fibrobacteres. Delta/epsilon proteobacteria do not cluster and are interspersed at different locations in the tree. One betaproteobacterial protein (Cupriavidus necator 2) is separated from the other betaproteobacteria. This might be because this protein arose by horizontal gene transfer from another bacterial species and not by duplication of Cupriavidus necator 1. Other multiple Ku proteins present in the same species seem to have originated from gene duplication events, since they tend to lie in the same clade (Niastella koreensis, Saccharothrix espanaensis and Bradyrhizobium japonicum). The seven archaeal Ku genes are found in three different clades: two of the clades contain two archaeal proteins each, whereas one clade contains three. Archaeoglobus fulgidus, Geoglobus ahangari and Archaeoglobus veneficus are all related archaea and lie in the same clade. Similarly, Methanosaeta harundinacea and Methanothrix soehngenii lie in the same clade. Lastly, Methanobacterium formicicum and Methanocella paludicola lie in the same clade. It can be hypothesized that eukarya arose from archaea and thus that eukaryal Ku70 and Ku80 share similarity with these archaeal Ku proteins.
The first clade described above is deeply buried among the bacterial clades and thus most likely arose by horizontal gene transfer from bacteria. The Methanobacterium and Methanocella Ku proteins lie just below fungal Ku80, and thus it is possible that they gave rise to the eukaryal Ku proteins. Similarly, the Methanosaeta and Methanothrix Ku proteins lie below the Methanobacterium and Methanocella clade. The lack of Ku protein in the Asgard superphylum of archaea, the closest archaea to eukaryotes, and in Eozoa, the earliest-branching eukaryotes, makes vertical inheritance of Ku from archaeal lineages to eukaryotes unlikely (unpublished data).
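The reasoning applied to multi-copy species in this section (copies lying in the same clade suggest gene duplication, separated copies suggest horizontal transfer) can be illustrated with a toy tree. The sketch below is a simplified heuristic, not the analysis performed here: trees are assumed to be nested tuples of leaf labels, "same clade" is narrowed to strict sister leaves, and all labels and function names are hypothetical.

```python
def leaves(node):
    """Return the leaf labels under a node (a label or a tuple of children)."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def smallest_clade(tree, targets):
    """Return the leaves of the smallest clade containing all targets."""
    wanted = set(targets)
    here = leaves(tree)
    if not wanted.issubset(here):
        return None
    if not isinstance(tree, str):
        for child in tree:
            inner = smallest_clade(child, wanted)
            if inner is not None:
                return inner  # a deeper clade still holds every target
    return here


def paralog_origin(tree, copy_a, copy_b):
    """Heuristic: sister copies suggest duplication, otherwise possible HGT."""
    clade = smallest_clade(tree, [copy_a, copy_b])
    if clade is None:
        raise ValueError("both copies must be present in the tree")
    if set(clade) == {copy_a, copy_b}:
        return "consistent with gene duplication"
    return "possible horizontal gene transfer"


# Toy topology with placeholder labels; not the tree of Figure 1.
toy_tree = (
    (("sp1_Ku_1", "sp1_Ku_2"), "sp2_Ku"),
    (("sp3_Ku_1", "sp4_Ku"), ("sp5_Ku", "sp3_Ku_2")),
)
same_clade = paralog_origin(toy_tree, "sp1_Ku_1", "sp1_Ku_2")
separated = paralog_origin(toy_tree, "sp3_Ku_1", "sp3_Ku_2")
```

Separation of two copies can also reflect an ancient duplication or a poorly resolved branch, so a check like this only flags candidates for closer inspection.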

Horizontal inheritance of Ku protein between prokaryotes and eukaryotes
There are two possibilities for the inheritance of Ku protein between prokaryotes and eukaryotes. It is possible that Ku first arose in prokaryotes and was then vertically or horizontally transferred to eukaryotes.
Alternatively, Ku first arose in eukaryotes and was horizontally transferred to some prokaryotes. In eukaryotes, the Ku domain is often flanked by other domains. If it was indeed transferred from eukaryotes to prokaryotes, then only the Ku domain was transferred. Since Ku is often a simple, single-domain protein in prokaryotes, it is more likely that this domain was transferred to the later-arising eukaryotes and then diverged, acquiring multiple flanking domains. Vertical transmission of Ku from prokaryotes to eukaryotes, however, is argued against by the findings of this paper. If the protein had indeed been transmitted vertically, then many more than seven archaeal species would be expected to contain Ku. In particular, the pronounced absence of Ku protein in the Asgard superphylum of archaea (the closest prokaryotes to eukaryotes) and in the Eozoa kingdom of eukarya indicates that vertical inheritance from prokaryotes to eukaryotes is unlikely. It is possible that a number of archaea and Eozoa lost Ku during evolution due to some selective pressure, but this scenario is very unlikely.

Evidence that Ku in bacteria, and Ku70 and Ku80 in eukaryotes arise from a common ancestor
Another interesting question that can be studied computationally or through wet laboratory experiments is when the NHEJ machinery became involved in adaptive immunity. The earliest eukaryotes that show adaptive immunity are jawed fish 30. It has been hypothesized that these organisms developed adaptive immunity when they acquired recombination-activating genes (RAGs) from transposons. Did NHEJ in these organisms immediately become involved in adaptive immunity, or were there changes in NHEJ that led to its role in adaptive immunity in higher organisms?

Conclusion
Contrary to the expectation that prokaryotic and eukaryotic Ku proteins belong to different clades, our analysis shows a common node of origin. This led us to hypothesize that Ku from archaea was transferred by horizontal gene transfer to neozoa and then duplicated to form Ku70 and Ku80. The study of Ku proteins is important in both prokaryotes and eukaryotes. In certain eukaryotes NHEJ is important for survival, and the study of Ku leads to an understanding of how this is possible. Additionally, since many human pathogens possess an NHEJ repair system, understanding bacterial NHEJ might be of therapeutic interest. Developing anti-NHEJ compounds might be a way forward, since these compounds might reduce the chances of pathogens surviving genotoxic conditions in certain non-dividing states in different mammalian hosts. However, many studies are yet to be conducted and many questions are yet to be answered in this field of study.

Phylogenetic tree construction: All the sequences retrieved were named as Genus species | Accession number. The phylogenetic tree was constructed using MEGA-X desktop software downloaded from https://www.megasoftware.net/. All the sequences were aligned using MUSCLE, and the Maximum Likelihood statistical method was employed 23.

The data for this paper have been derived from publicly available databases.
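The naming convention described under phylogenetic tree construction (Genus species | Accession number) can be applied to a FASTA file with a short script. The sketch below assumes headers of the form `>accession description [Organism name]`, which is common for protein records but is an assumption here; the accession `ACC00001.1` and the function names are hypothetical placeholders.

```python
import re

# Accession is the first token; the organism is the last bracketed field.
HEADER = re.compile(r"^>(\S+).*\[([^\[\]]+)\]\s*$")


def rename_header(header):
    """Rewrite one FASTA header as '>Genus species | Accession'."""
    match = HEADER.match(header)
    if match is None:
        return header  # leave unrecognized headers untouched
    accession, organism = match.groups()
    genus_species = " ".join(organism.split()[:2])  # drop strain names
    return f">{genus_species} | {accession}"


def rename_fasta(text):
    """Apply rename_header to every header line of a FASTA string."""
    return "\n".join(
        rename_header(line) if line.startswith(">") else line
        for line in text.splitlines()
    )


# Placeholder record for illustration; the sequence is not real data.
fasta = ">ACC00001.1 Ku protein [Mycobacterium tuberculosis H37Rv]\nMXXXX"
renamed = rename_fasta(fasta)
```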

Author contributions
The project was conceived and designed by HKB. AM and SR searched the databases, retrieved the sequences and drew the phylogenetic tree and domain organization. AM and SR wrote the introduction and methods sections; HKB wrote the results and discussion sections. All authors read the paper and made corrections.

Acknowledgements
We would like to thank Simon K. Shrestha for suggestions on the paper.

Figure 1. Phylogenetic tree inferred by using the Maximum Likelihood method and the Whelan and Goldman model 24 in MEGA-X. The bootstrap consensus tree inferred from 500 replicates 26 is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. A particular color is assigned to each phylum and kingdom, and organism labels are colored accordingly. In addition, the corresponding phyla and kingdoms are labelled at the left of the tree. From this tree it can be inferred that Ku70, Ku80 and prokaryotic Ku largely separate into three clades. Some of the archaeal Ku proteins share a node with Ku70 and Ku80, hinting at a common origin.
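The 50% bootstrap rule mentioned in the Figure 1 legend can be sketched as follows. This is a simplified illustration and not the MEGA-X implementation: each replicate tree is assumed to be reduced to a set of clades (frozensets of taxon labels), and the taxon names, replicate contents and function names are hypothetical.

```python
from collections import Counter


def clade_support(replicate_clades):
    """Return the fraction of replicates in which each clade appears."""
    if not replicate_clades:
        raise ValueError("at least one bootstrap replicate is required")
    counts = Counter()
    for clades in replicate_clades:
        counts.update(set(clades))  # count each clade once per replicate
    total = len(replicate_clades)
    return {clade: count / total for clade, count in counts.items()}


def retained_clades(replicate_clades, min_support=0.5):
    """Keep clades found in at least min_support of replicates.

    Clades below the threshold are the branches that would be collapsed.
    """
    support = clade_support(replicate_clades)
    return {clade: s for clade, s in support.items() if s >= min_support}


# Four placeholder replicates over taxa A-D (illustration only).
replicates = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
]
kept = retained_clades(replicates)
```

In this toy example the clade {A, B} appears in three of four replicates and is kept, while {C, D} and {A, C} each appear once and would be collapsed.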