Search of Ku proteins in bacteria
To study the evolution and history of the Ku protein in all domains of life, a list of diverse eukaryotes, archaea and bacteria was compiled, and these organisms were searched for Ku. The eukaryotic genomes were searched for homologs of the Ku domains of human Ku70 and Ku80, while the archaea and bacteria were searched for homologs of the Ku protein of Mycobacterium tuberculosis. One hundred and sixteen genomes covering twenty-eight bacterial phyla were searched for the presence of Ku. Ku was found in only twenty-five of these genomes, and a few of them contained multiple Ku proteins. In this list, Ku was found in actinobacteria, alphaproteobacteria, bacteroides/chlorobi, fibrobacteres/acidobacteria, firmicutes, thermodesulfobacteria, betaproteobacteria, chlamydiae/verrucomicrobia, and delta/epsilon proteobacteria (Table 2). This list is not exhaustive: bacteria from other phyla may also contain Ku but simply be absent from our list.
Search of Ku proteins in archaea
Similarly, Ku from Mycobacterium tuberculosis was used as a BLAST query against archaea (taxid: 2157) in NCBI. Only a small number of archaeal species were found to contain Ku: Archaeoglobus fulgidus, Geoglobus ahangari, Archaeoglobus veneficus, Methanosaeta harundinacea, Methanothrix soehngenii, Methanobacterium formicicum and Methanocella paludicola.
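The hit-calling step of a search like this can be sketched as follows. This is a minimal sketch, not the procedure used in the study: it assumes the BLAST results were exported in the standard 12-column tabular format (-outfmt 6) and that a subject counts as a candidate Ku homolog when its E-value is at or below a cutoff. The 1e-5 cutoff, the function name significant_hits and the sample rows are illustrative assumptions; the text does not state a threshold.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-5):
    """Return {subject_id: best_evalue} for hits at or below max_evalue.

    max_evalue is an assumed cutoff, not a value taken from the study.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        if evalue <= max_evalue:
            sid = hit["sseqid"]
            # keep the lowest E-value seen for each subject sequence
            best[sid] = min(evalue, best.get(sid, evalue))
    return best


# Made-up rows for illustration only; the identifiers are hypothetical.
EXAMPLE = (
    "Mtb_Ku\tarchaeon_A_protein1\t38.5\t250\t140\t6\t5\t250\t8\t255\t2e-40\t150\n"
    "Mtb_Ku\tarchaeon_B_protein7\t25.0\t60\t40\t2\t10\t69\t3\t60\t0.5\t30.1\n"
)
HITS = significant_hits(EXAMPLE)
```

With the made-up rows above, only the first subject passes the assumed cutoff; a looser cutoff would admit the weak second hit, which is why the threshold should be reported alongside the species list.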
Search of Ku proteins in eukaryotes
To search for Ku in eukaryotes, a candidate approach was employed. To make sure that the search was not limited to the crown groups, genomes of organisms from six different kingdoms were searched by BLAST. The Ku domains of human Ku70 and Ku80 were used as queries. These domains were used instead of the bacterial Ku domain because a standard homology search with the bacterial Ku domain does not yield any hits in eukaryotes (unpublished data). The six kingdoms included in this study are eozoa, animalia, amoebozoa, archaeplastida, fungi and ciliophora (which belongs to the group chromista). The organisms are divided into three groups: those with both Ku70 and Ku80, those without Ku80 and those without Ku70 (Table 1).
Of the twelve Eozoa species examined, not a single one had Ku70 or Ku80. Among the seven amoebozoa species examined, only Dictyostelium discoideum had both Ku70 and Ku80. Acanthamoeba castellanii had Ku80 but lacked Ku70. In the other five species (Entamoeba invadens, Entamoeba histolytica, Physarum polycephalum, Hyperamoeba dachnaya and Mastigamoeba balamuthi), neither Ku70 nor Ku80 could be found.
Similarly, among the ciliophora, only Paramecium tetraurelia had both Ku70 and Ku80. Oxytricha trifallax, Stylonychia lemnae and Tetrahymena thermophila had Ku70 but no Ku80. The other four species (Euplotes octocarinatus, Euplotes crassus, Anophryoides haemophila and Paramecium caudatum) showed neither Ku70 nor Ku80.
Among the archaeplastida, Arabidopsis thaliana and Volvox carteri contain both Ku70 and Ku80. Ostreococcus lucimarinus contains Ku70 but no Ku80. The rest (Griffithsia japonica, Cyanidioschyzon merolae, Chlamydomonas reinhardtii, Scenedesmus obliquus, Acetabularia acetabulum, Micromonas, Glaucocystis nostochinearum and Cyanophora paradoxa) show neither Ku70 nor Ku80.
Among the analyzed fungi, most species contain at least one of Ku70 and Ku80. Saccharomyces cerevisiae, Ustilago maydis, Laccaria bicolor, Puccinia graminis, Schizosaccharomyces pombe, Aspergillus oryzae and Talaromyces marneffei contain both Ku70 and Ku80. Malassezia globosa contains only Ku70. Mortierella alpina and Neurospora crassa contain Ku80 but not Ku70. Moniliophthora perniciosa, Candida albicans and Penicillium marneffei have neither Ku70 nor Ku80.
In animalia, Strongylocentrotus purpuratus, Nematostella vectensis, Hydra vulgaris and Monodelphis domestica contain both Ku70 and Ku80. Branchiostoma floridae has Ku70 only. Finally, Hydra magnipapillata and Buddenbrockia plumatellae have neither Ku70 nor Ku80.
This analysis shows that, of the six kingdoms analyzed, Eozoa is the only one in which no organism contains Ku70 or Ku80. The other five kingdoms all contain organisms with Ku70, Ku80 or both. Organisms that do not currently show a Ku domain may have incompletely sequenced genomes (Supp. Table 3), and a Ku domain might be found once sequencing is complete. Assuming that complete sequencing does not change the results, four of the six kingdoms contain organisms that have Ku70 but no Ku80.
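The kingdom-level tally above can be reproduced from the presence calls given in the text. The following is a minimal sketch: the calls are transcribed from the preceding paragraphs (species reported with neither protein are omitted), while the names CALLS, GROUP_NAMES and kingdoms_with are illustrative and not part of the study.

```python
# Presence calls transcribed from the text: species -> (has_Ku70, has_Ku80).
# Species reported with neither protein are omitted; Eozoa had no positives.
CALLS = {
    "eozoa": {},
    "amoebozoa": {
        "Dictyostelium discoideum": (True, True),
        "Acanthamoeba castellanii": (False, True),
    },
    "ciliophora": {
        "Paramecium tetraurelia": (True, True),
        "Oxytricha trifallax": (True, False),
        "Stylonychia lemnae": (True, False),
        "Tetrahymena thermophila": (True, False),
    },
    "archaeplastida": {
        "Arabidopsis thaliana": (True, True),
        "Volvox carteri": (True, True),
        "Ostreococcus lucimarinus": (True, False),
    },
    "fungi": {
        "Saccharomyces cerevisiae": (True, True),
        "Ustilago maydis": (True, True),
        "Laccaria bicolor": (True, True),
        "Puccinia graminis": (True, True),
        "Schizosaccharomyces pombe": (True, True),
        "Aspergillus oryzae": (True, True),
        "Talaromyces marneffei": (True, True),
        "Malassezia globosa": (True, False),
        "Mortierella alpina": (False, True),
        "Neurospora crassa": (False, True),
    },
    "animalia": {
        "Strongylocentrotus purpuratus": (True, True),
        "Nematostella vectensis": (True, True),
        "Hydra vulgaris": (True, True),
        "Monodelphis domestica": (True, True),
        "Branchiostoma floridae": (True, False),
    },
}

GROUP_NAMES = {
    (True, True): "both",
    (True, False): "Ku70 only",
    (False, True): "Ku80 only",
}


def kingdoms_with(group):
    """Return the sorted kingdoms that have at least one species in `group`."""
    return sorted(
        kingdom
        for kingdom, species in CALLS.items()
        if any(GROUP_NAMES[call] == group for call in species.values())
    )


KU70_ONLY = kingdoms_with("Ku70 only")   # kingdoms with a Ku70-only species
KU80_ONLY = kingdoms_with("Ku80 only")   # kingdoms with a Ku80-only species
NO_KU = sorted(k for k, species in CALLS.items() if not species)
```

Under these calls, KU70_ONLY contains animalia, archaeplastida, ciliophora and fungi, which matches the four kingdoms mentioned above, and NO_KU contains only eozoa.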
Phylogenetic tree to analyze Ku evolution
To analyze the evolutionary relationship between Ku in bacteria and archaea and Ku70 and Ku80 in eukaryotes, a maximum likelihood phylogenetic tree was built from the full-length Ku sequences of bacteria and archaea and the Ku domains of eukaryotic Ku70 and Ku80. The tree is shown in Figure 1. Overall, the prokaryotic Ku proteins, the Ku domains of Ku70 and the Ku domains of Ku80 largely separate into three clades. The tree also shows that Ku70 and Ku80 share a common node that is not shared by the prokaryotic clade. From this observation it can be inferred that Ku70 and Ku80 arose from a gene duplication event.
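The statement that Ku70 and Ku80 share a node not shared by the prokaryotic clade amounts to a monophyly check, which can be sketched on a toy tree. This is a minimal sketch under stated assumptions: the topology and leaf names below are hypothetical and only mimic the three-clade pattern described above, the tree is assumed to be rooted between the prokaryotic and eukaryotic sequences, and the helper names are illustrative. It does not reproduce Figure 1.

```python
# Toy topology as nested tuples; leaves are strings. The leaf names and the
# exact branching are hypothetical and only mimic the three-clade pattern.
TOY_TREE = (
    ("bacterial_Ku_1", ("bacterial_Ku_2", "archaeal_Ku_1")),
    (
        ("Ku70_animal", ("Ku70_fungus", "Ku70_plant")),
        ("Ku80_animal", ("Ku80_fungus", "Ku80_plant")),
    ),
)


def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def smallest_clade(node, targets):
    """Return the leaf set of the smallest clade containing all targets."""
    children = [] if isinstance(node, str) else node
    for child in children:
        if targets <= leaves(child):
            # all targets sit under this child, so descend into it
            return smallest_clade(child, targets)
    return leaves(node)


ALL_LEAVES = leaves(TOY_TREE)
EUKARYOTIC = {name for name in ALL_LEAVES if name.startswith(("Ku70", "Ku80"))}
CLADE = smallest_clade(TOY_TREE, EUKARYOTIC)

# True when the node joining Ku70 and Ku80 has no prokaryotic descendants.
SHARE_EXCLUSIVE_NODE = CLADE == EUKARYOTIC
```

On a real tree the same test would be applied to the inferred topology; a stray leaf, such as the T. thermophila Ku70 that falls among the Ku80s, would not affect this particular check because it is still eukaryotic.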
Evolution and history of Ku proteins in prokaryotes
The bottom third of the tree in Figure 1 covers the single-domain Ku proteins of bacteria and archaea. A striking observation from this part of the tree is that Ku proteins from a single bacterial phylum tend to cluster together and often form part of the same clade. This can be observed for alphaproteobacteria, betaproteobacteria, chlamydiae/verrucomicrobia, bacteroides, firmicutes, actinobacteria and fibrobacteres. Delta/epsilon proteobacteria do not cluster and are interspersed at different locations in the tree. One betaproteobacterial protein (Cupriavidus necator 2) is separated from the other betaproteobacteria. This protein may have arisen by horizontal gene transfer from another bacterial species rather than by duplication of Cupriavidus necator 1. The other cases of multiple Ku proteins in the same species (Niastella koreensis, Saccharothrix espanaensis and Bradyrhizobium japonicum) seem to have originated from gene duplication events, since the copies tend to lie in the same clade.
The seven archaeal Ku genes are found in three different clades: two clades contain two archaeal proteins each and one clade contains three. Archaeoglobus fulgidus, Geoglobus ahangari and Archaeoglobus veneficus are related archaea and lie in the same clade. Similarly, Methanosaeta harundinacea and Methanothrix soehngenii lie in the same clade. Lastly, Methanobacterium formicicum and Methanocella paludicola lie in the same clade. It can be hypothesized that eukarya arose from archaea and that eukaryal Ku70 and Ku80 are therefore similar to these archaeal Ku proteins. The first clade described above (Archaeoglobus and Geoglobus) is deeply buried among the bacterial clades and thus most likely arose by horizontal gene transfer from bacteria. The Methanobacterium and Methanocella Ku proteins lie just below fungal Ku80, so it is possible that they gave rise to eukaryal Ku. Similarly, the Methanosaeta and Methanothrix Ku proteins lie below the Methanobacterium and Methanocella clade.
The lack of Ku protein in the Asgard superphylum of archaea, the closest archaeal relatives of eukaryotes, and in Eozoa, the earliest-branching eukaryotes, makes vertical inheritance of Ku from archaeal lineages to eukaryotes unlikely (unpublished data).
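The rule of thumb applied to the bacterial species with multiple Ku proteins (copies in the same clade suggest duplication, copies in different clades suggest horizontal transfer) can be written down as a small sketch. The clade labels and the copy labels "a" and "b" are placeholders chosen for illustration; only the same-clade or different-clade pattern follows the text, and the number of copies per species is an assumption. The function name is hypothetical.

```python
from collections import defaultdict

# (species, copy label) -> clade label. The clade labels are placeholders;
# only the same/different pattern is taken from the text.
CLADE_OF = {
    ("Cupriavidus necator", "1"): "clade_beta",
    ("Cupriavidus necator", "2"): "clade_elsewhere",
    ("Niastella koreensis", "a"): "clade_bacteroides",
    ("Niastella koreensis", "b"): "clade_bacteroides",
    ("Bradyrhizobium japonicum", "a"): "clade_alpha",
    ("Bradyrhizobium japonicum", "b"): "clade_alpha",
}


def classify_paralogs(clade_of):
    """Label each multi-copy species by the likely origin of its extra copies."""
    clades = defaultdict(list)
    for (species, _copy), clade in clade_of.items():
        clades[species].append(clade)
    result = {}
    for species, found in clades.items():
        if len(found) < 2:
            continue  # single-copy species have no paralogs to compare
        same_clade = len(set(found)) == 1
        result[species] = "duplication" if same_clade else "possible HGT"
    return result


ORIGINS = classify_paralogs(CLADE_OF)
```

This is only a heuristic: clade membership alone cannot exclude, for example, an old duplication followed by rapid divergence of one copy.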
Evolution and history of Ku70 proteins
The Ku70 proteins included in the phylogenetic tree cluster together at the top of the tree. Ku70 proteins from the same kingdom also tend to occur in the same clade, with a few exceptions. One exception is the Ku70 protein from Ostreococcus lucimarinus. While Ku70s from the other archaeplastida, Volvox carteri and Arabidopsis thaliana, cluster together, O. lucimarinus Ku70 occurs separately. This might reflect the distinctness of O. lucimarinus from the other archaeplastida species.
Another exception is a Ku70 protein from Tetrahymena thermophila. While the rest of the ciliophora Ku70s occur in the same clade, hinting at a common origin, one of the three Ku70s of T. thermophila lies within the Ku80 cluster. Of the other two T. thermophila Ku70s, one lies in the same clade as that of P. tetraurelia and the other branches off earlier. It can be hypothesized that this earlier branching occurred in the common ancestor of the two species.
The fungal Ku70s, from S. cerevisiae to T. marneffei, all cluster together, suggesting common ancestry. In the animalia kingdom, the Ku70s of the two mammals, H. sapiens and M. domestica, occur in the same clade, suggesting common ancestry. Ku70 proteins from other animalia species diverged earlier, but all the animalia Ku70s occur in the same clade. Ku70 of Monosiga brevicollis, a choanoflagellate, lies just below the animalia clade, hinting at similarity between them. Ku70 from the only amoebozoan, D. discoideum, is sandwiched between the ciliophora and archaeplastida kingdoms.
Evolution and history of Ku80 proteins
Between the prokaryotic cluster and the Ku70 cluster lie the Ku80 proteins, interspersed with one Ku70 protein. All the animalia and amoebozoa Ku80 proteins cluster by kingdom. The third T. thermophila Ku70 lies next to the P. tetraurelia Ku80 protein. T. thermophila has no Ku80 protein, and this Ku70 protein can be considered its Ku80 protein. Seven of the ten fungal Ku80 proteins cluster together and the other three lie apart. A. castellanii, which is devoid of Ku70, has one Ku80 protein that clusters with D. discoideum Ku80.
In contrast, the two archaeplastida Ku80s, from A. thaliana and V. carteri, lie apart from each other. V. carteri Ku80 lies next to O. lucimarinus Ku70, suggesting that it is similar to archaeplastida Ku70. Strikingly, the Ku80 protein of Arabidopsis thaliana is retrieved when H. sapiens Ku70 is used as a BLAST query against the A. thaliana genome. This is the only case in which a Ku70 query retrieves Ku80. It suggests that Ku70 and Ku80 possibly arose through duplication of a common ancestral gene. Finally, all the animalia Ku80s lie in the same clade, with H. sapiens and M. domestica in one nested clade and N. vectensis and H. vulgaris in the other, hinting at common ancestry.
Neighbor-joining (NJ) and maximum likelihood (ML) trees that include Streptomyces species were also drawn (Supplementary Figure 1 and Figure 2). These trees do not change the conclusions drawn above.
Domain architecture of Ku70 protein
Figure 2 shows that the most common architecture of Ku70 across kingdoms is the core Ku domain flanked by N-terminal and C-terminal domains. The only amoebozoan that has this protein, D. discoideum, has this architecture plus an APLF_PBZ domain at the end of the C-terminus. Without a clear evolutionary trend, a number of Ku70s in all kingdoms except Eozoa and Amoebozoa contain a SAP domain next to the C-terminal domain. Some Ku70 proteins have a VWF domain at the N-terminus.
Domain architecture of Ku80 proteins
Figure 3 displays the Ku80 domain architectures found in different kingdoms. Ku80 of Homo sapiens and M. domestica contains the beta-barrel Ku domain flanked by C-terminal and N-terminal domains. At the far N-terminal end lies the Ku PK binding domain. This architecture is conserved in all animalia Ku80 proteins except that of Nematostella vectensis, which lacks the Ku PK binding domain and the N-terminal domain. Since only this species lacks these domains, they were possibly lost during its speciation. All fungal Ku80s lack the C-terminal domain, which suggests that the C-terminal domain evolved after Ku80 diverged in the common ancestor of fungi and animalia. Almost all fungal Ku80s possess the Ku PK binding domain, suggesting that the common Ku80 ancestor of fungi and animalia had it. As for the N-terminal domain, half of the fungi have it and the other half do not, suggesting that it was either lost in some fungi or acquired by some fungi and all animalia after the two kingdoms diverged. Some fungi also have a VWF domain at the N-terminus of Ku80, although this domain is mostly found in Ku70 proteins. Both amoebozoa Ku80s have an N-terminal domain with a VWF domain attached, and both lack the C-terminal domain and the Ku PK binding domain. The archaeplastida are represented by one A. thaliana Ku80 and one V. carteri Ku80. A. thaliana Ku80 has the same domain architecture as most animalia Ku80s, whereas V. carteri Ku80 has a Ku domain flanked by a Ku PK binding domain. From these architectures it is hard to infer what the ancestral archaeplastida Ku80 looked like. As in fungi, ciliophora Ku80s lack the C-terminal domain. From the domain architecture data it can be hypothesized that the C-terminal domain was missing in the ancestral eukaryotic Ku80 and was later acquired by species in the animalia kingdom and by A. thaliana.
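The comparison of Ku80 architectures can be summarized by treating each architecture as a set of domains. The sketch below encodes only the architectures that the text states explicitly; fungi and ciliophora are left out because no single architecture is given for them, and the presence of the Ku domain in the amoebozoa entry is an assumption (these proteins were identified by their Ku domain). The dictionary keys and variable names are illustrative.

```python
# Ku80 architectures as stated in the text, stored as unordered domain sets.
# Fungi and ciliophora are omitted: the text gives no single architecture.
KU80_DOMAINS = {
    "animalia (typical)": {"N-terminal", "Ku", "C-terminal", "Ku PK binding"},
    "Nematostella vectensis": {"Ku", "C-terminal"},
    "Arabidopsis thaliana": {"N-terminal", "Ku", "C-terminal", "Ku PK binding"},
    "Volvox carteri": {"Ku", "Ku PK binding"},
    # Ku domain assumed present; N-terminal domain with VWF per the text.
    "amoebozoa": {"N-terminal", "VWF", "Ku"},
}

# Domains present in every listed architecture.
CORE = set.intersection(*KU80_DOMAINS.values())

# Entries that carry the C-terminal domain.
WITH_C_TERMINAL = sorted(
    name for name, domains in KU80_DOMAINS.items() if "C-terminal" in domains
)

# Entries that carry the Ku PK binding domain.
WITH_PK_BINDING = sorted(
    name for name, domains in KU80_DOMAINS.items() if "Ku PK binding" in domains
)
```

In this encoding the only universally shared domain is Ku, and the C-terminal domain appears only in the animalia entries and A. thaliana, which is the pattern behind the hypothesis that it was absent from the ancestral eukaryotic Ku80.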