Testing the Hypothesis that the Nylonase NylB Protein Arose de novo via a Frameshift Mutation



Background
In 1984, Susumu Ohno hypothesized that the nylon-degrading enzyme NylB arose de novo via a frameshift mutation within a hypothetical precursor protein (PR.C). However, the frameshift hypothesis was never actually tested, and no biological evidence supported it. For decades, the frameshift hypothesis has been uncritically accepted as the correct explanation for the origin of NylB. In this paper we have surveyed the literature relevant to the frameshift hypothesis as well as the various alternative models that have been published regarding the origin of NylB. We have employed bioinformatic methods and leveraged databases not available when the frameshift hypothesis was first put forward.

Results
We searched multiple protein databases to determine the distribution of NylB and any possible homologs. We then determined the distribution of other known nylonases and their possible homologs. We also determined the distribution of Ohno's hypothetical PR.C protein and any possible homologs. Lastly, we determined what protein families the various nylonases belong to. We found that the NylB protein is widely occurring, has thousands of homologs, and is found in diverse organisms and diverse habitats. It is not a new or unique protein. Likewise, we found that the other known nylonases are also widely occurring, have thousands of homologs, and are found in diverse organisms and diverse habitats. However, the hypothetical PR.C protein does not show up in any of the same databases, and there is no evidence of any homologs. Conserved domain searches showed that NylB is a member of the beta lactamase protein family, a highly conserved family of enzymes. Likewise, the other known nylonases belong to well-characterized enzyme families.

Conclusions
Our results very effectively falsify the NylB frameshift hypothesis, while they strongly support an alternative hypothesis by Yomo. Like NylB, none of the other nylonases that we examined were substantively new or unique. All had thousands of homologs, and they were found in diverse organisms and diverse habitats. Our findings not only falsify the NylB frameshift hypothesis, they also falsify the long-held assumption that all nylonases evolved after the invention of nylon in 1935.

Background
Nylon is a synthetic polymer that was invented in 1935. From the mid-1950's onward, a variety of enzymes have been discovered that can degrade nylon-6 oligomers into their monomer components. [1,2] Such enzymes have been colloquially referred to as "nylonases". It has been widely assumed that all nylonases have evolved since 1935.
However, there are now many reasons to doubt this assumption.
Below we review various hypotheses that have been developed to understand the origin of nylonase enzymes, in particular the origin of NylB. The three primary hypotheses for the origin of NylB are: 1) Okada et al.'s post-1935 gene duplication and mutation hypothesis; [3] 2) Ohno's post-1935 frameshift hypothesis; [4] and 3) Yomo et al.'s pre-1935 NylB homologs hypothesis. [5]
In 1957, Ebata and Morita discovered the first enzyme that could break down nylon. [6] They found that Trypsin, a widely conserved enzyme in mammals, could degrade nylon-6 oligomers. This capability obviously existed in Trypsin prior to the invention of nylon in 1935. Trypsin is a protease, and nylon has some protein-like molecular features (Fig. 1). Therefore, it should not be surprising that Trypsin might degrade nylon. It is important to note that many enzymes that existed long before the invention of nylon might still manifest "nylonase" activity. This does not necessarily imply a newly evolved enzyme function. Hence application of the term "nylonase" can be ambiguous. We will use the term nylonase to refer to all enzymes with measurable or predicted nylonase activity.
In 1966, Fukumura first discovered that a bacterium (Corynebacterium aurantiacum B-2) could metabolize nylon [7], and he isolated two of the enzymes involved. [8] From the mid-1970s to the early 1980s, Kinoshita, Okada and others published a series of papers on the isolation of two nylonase enzymes (eventually named NylA and NylB) from Achromobacter guttatus KI72 (renamed Flavobacterium sp. KI72). [1,3] The corresponding genes were on the plasmid pOAD in KI72. [9] A paralog of NylB named NylB′ was also discovered, which had substantially lower nylonase capability than NylB. [2] In 1993 one more nylonase, called NylC, was discovered on the same plasmid in the same bacterium. [10] The natural ability of the KI72 strain to metabolize nylon is apparently due to the coordinated action of this set of four linked complementary nylonase genes. [2] Yet Kinoshita claimed that all these genes were "newly evolved" since 1935. [11]
Okada et al. were the first to present a specific hypothesis regarding the origin of NylB. In 1983, Okada argued that NylB was a paralog that arose via a gene duplication event from a linked gene coding NylB′. He assumed this must have occurred sometime after 1935. [3] His model required that the duplicate gene, NylB, acquire 47 residue substitutions via point mutations in just a few decades. Although the paralogous nature of NylB and NylB′ suggests a gene duplication event, there was no direct evidence that it happened prior to 1935, and he gave no reason why NylB′ might not have arisen from NylB instead of the reverse.
In his 1984 paper, Susumu Ohno offered a second major hypothesis for the origin of NylB.
Ohno criticized Okada's 1983 hypothesis because it required too many point mutations to effect so many amino acid substitutions in so little time. Ohno said, "so extensive an amino acid divergence is not expected to occur in so short a time span." Ohno took Okada's published sequence known as RS-IIA (which encoded NylB) and constructed a hypothetical sequence he called PR.C by simply deleting a single nucleotide from the RS-IIA sequence. Ohno then claimed PR.C was the ancestral sequence of NylB. He claimed that shortly after 1935, a single nucleotide insertion in the gene encoding his hypothetical PR.C protein yielded the present-day RS-IIA sequence that now encodes NylB. Ohno criticized Okada's hypothesis as being unrealistic because it required so many point mutations, yet the NylB frameshift hypothesis required an essentially random amino acid sequence to arise from a frameshift and to instantly form a stable, functional, and specific enzyme. Ohno had no direct evidence that the hypothetical PR.C protein even existed because his frameshift mutation was purely hypothetical. Yet the NylB frameshift hypothesis was stated so forcefully that readers accepted his model as if it were history, and his paper continues to be cited as if the hypothetical frameshift mutation were an observed fact. [12,13,14,15,16,17]
Just 5 years after the NylB frameshift hypothesis was put forward, Kanagawa et al. discovered another NylB enzyme in another bacterium, Pseudomonas NK87, which also had the ability to degrade nylon-6. [18] This effectively falsified Ohno's claim that NylB was unique. This new NylB gene sequence was highly divergent, having only 53% DNA similarity [19] and only 35% protein sequence similarity compared to Kinoshita's NylB in KI72 (the one Ohno claimed was truly unique). Kanagawa designated this newly discovered NylB as p-NylB and renamed the previously discovered NylB and NylB′ proteins as f-NylB and f-NylB′, respectively.
In 1991, Kato et al. attempted to explore Okada's hypothesis by experimentally mutating the 47 amino acids in NylB′ that were divergent from NylB. [20] They discovered that only two of the 47 amino acids were required to enhance nylonase activity in NylB′ up to the level of NylB. The two linked genes coding for NylB and NylB′ were substantially divergent (making duplication and divergence in just a few decades very unlikely), yet they were still sufficiently homologous to rule out a single frameshift for the origin of two proteins simultaneously. [4]
In 1992, in response to Kanagawa's discovery of p-NylB, Yomo et al. co-authored a paper with Urabe and Okada to put forward a third competing hypothesis regarding the origin of NylB. Yomo et al. argued that Kinoshita's f-NylB and Kanagawa's p-NylB homologs descended from a common ancestor that existed about 140 million years ago. [5] Yomo et al. wrote: "The distance between P-nylB and F-nylB (or F-NylB′) is much larger than between F-nylB and F-NylB′. The time divergence of F-nylB and P-nylB is estimated to be at least 1.4 × 10^8 years… Therefore, most of the amino acid substitutions from the ancestor of the nylB gene family to its descendants of today might have occurred before the beginning of nylon manufacture."
In 1995, experiments by Prijambada, Negoro, Yomo, and Urabe showed that strains of the bacterium Pseudomonas aeruginosa PAO1 which initially lacked activity toward nylon-6 linear and cyclic dimers could be selectively evolved into a strain that could digest these dimers. [21] The evolved descendant of the ancestral PAO1 that had nylon-digesting capability was designated PAO5502. However, Prijambada et al. point out, "a molecular basis for the emergence of nylon oligomer metabolism in PAO5502 is still unknown."
In 2007, Sudhakar demonstrated that strains of Bacillus cereus found in the Indian Ocean could digest nylon-6. [22] To confirm this, we searched protein databases for evidence of NylB in Bacillus cereus.
We will show that multiple lines of evidence falsify Ohno's hypothesis, but are consistent with Yomo's model. Ohno had three primary claims: a) he claimed the NylB protein never existed until sometime after 1935; b) he claimed NylB arose as a de novo protein as the result of a frameshift mutation in a precursor protein; c) he claimed he knew the exact sequence of his hypothetical precursor protein. Since the sequences of the NylB protein and Ohno's hypothetical protein are both known, Ohno's hypothesis is now readily testable using protein databases.
If the NylB frameshift hypothesis were correct, then a protein database search should reveal evidence that Ohno's hypothetical precursor protein actually existed, had a history, and so should have many protein homologs. At the same time there should be clear evidence that the NylB protein really is a unique protein, with no history, and no protein homologs.
Conversely, if the NylB frameshift hypothesis were wrong, then a protein database search should reveal evidence that the hypothetical precursor protein never existed, had no history, and should have few if any homologs. At the same time there should be evidence that the NylB protein is not unique, has a history, and has numerous homologs.

Results
We used a spectrum of search strategies with differing criteria to detect homologs or exact sequences. At one end of the spectrum were strongly constrained searches, which were expected to yield few if any false positives but might produce a number of false negatives (rejected sequences). At the other end were searches with relaxed parameters, which were expected to yield fewer false negatives but admit a number of false positives. The searches are shown in Table 1, where the most constrained search results are toward the left and the most relaxed toward the right. There was almost a complete absence of the predicted PR.C across the entire spectrum of search strategies. Reruns of the BLASTP search often resulted in numbers ranging from 0 to 9 hits when no limit was imposed for e-values, hence the number was reported as "unstable." Since PR.C corresponds to an exact gene sequence with a specific insertion mutation, a highly relaxed BLASTN would flood the results with false positives of unshifted nylonase homologs. Hence, the more relaxed search strategies were conducted using BLASTP on the predicted amino acid sequence of PR.C rather than BLASTN on the nucleotide sequence.
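The trade-off between constrained and relaxed searches can be illustrated with a minimal Python sketch. It assumes BLAST hits exported in the standard 12-column tabular format (where column 3 is percent identity and column 11 is the e-value); the sample rows, subject names, and thresholds are hypothetical and are not the values used for Table 1.

```python
from typing import NamedTuple, Optional

class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    evalue: float

def parse_tabular(text: str) -> list:
    """Parse 12-column BLAST tabular output (subject = col 2, identity = col 3, e-value = col 11)."""
    hits = []
    for line in text.strip().splitlines():
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[2]), float(fields[10])))
    return hits

def count_hits(hits, max_evalue: Optional[float] = None, min_identity: float = 0.0) -> int:
    """Count hits passing the filters; max_evalue=None means no e-value limit."""
    return sum(
        1 for h in hits
        if (max_evalue is None or h.evalue <= max_evalue)
        and h.percent_identity >= min_identity
    )

# Hypothetical rows for illustration only (not real search results).
rows = [
    ["query", "subjA", "98.5", "392", "6", "0", "1", "392", "1", "392", "1e-180", "640"],
    ["query", "subjB", "41.0", "380", "210", "8", "5", "384", "3", "380", "2e-30", "130"],
    ["query", "subjC", "27.3", "66", "44", "2", "100", "165", "20", "83", "4.1", "28"],
]
hits = parse_tabular("\n".join("\t".join(r) for r in rows))

strict = count_hits(hits, max_evalue=1e-50, min_identity=90.0)  # constrained end
moderate = count_hits(hits, max_evalue=1e-5)                    # intermediate
relaxed = count_hits(hits)                                      # no e-value limit
print(strict, moderate, relaxed)
```

Tightening the thresholds can only remove hits, so the counts grow from the constrained end of the spectrum to the relaxed end, mirroring the left-to-right layout described for Table 1.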
However, the left column shows the search result of a constrained BLASTN search where the number of hits for the PR.C DNA sequence was adjusted for hits missing the critical segment (DNA sequence coordinates 99-100) where the supposed insertion mutation that created a new start codon was claimed by Ohno to occur after 1935.
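The adjustment for hits missing the critical segment can be sketched as a simple coverage check. This is an illustration under stated assumptions, not the procedure actually used: it assumes each hit is summarized by 1-based, inclusive start and end coordinates on the PR.C query, and the alignment ranges below are invented.

```python
def covers_segment(qstart: int, qend: int, seg_start: int = 99, seg_end: int = 100) -> bool:
    """Return True if the aligned query range spans the whole critical segment.

    Coordinates are assumed to be 1-based and inclusive; start and end are
    sorted first so reverse-strand hits are handled the same way.
    """
    lo, hi = sorted((qstart, qend))
    return lo <= seg_start and hi >= seg_end

# Hypothetical (query start, query end) pairs, for illustration only.
alignments = [(1, 1200), (150, 1100), (90, 99), (100, 400), (50, 300)]

kept = [a for a in alignments if covers_segment(*a)]
print(kept)  # only ranges that include both position 99 and position 100
```

A hit that aligns well elsewhere but stops short of positions 99-100 says nothing about whether the subject sequence carries the shifted or unshifted reading frame, which is why such hits are excluded from the constrained count.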
The SPARCLE numbers were not adjusted for redundancies and hence yielded many false positives but were provided to give an idea of the degree of representation of the homologs in the databases which SPARCLE surveys. The number "44" for NylC in the rightmost column strongly contrasts with the relatively large numbers for the other nylonases. The figures of 10,250, 46,000, and 68,000 for 6-AH Hydrolase were composite scores produced by adding the BLASTP and SPARCLE hits for NylA, NylB, and NylC, respectively (treating numbers for NylB and NylB′ as redundant).
From the lists of predicted nylonases generated through UNIPROT, data was gathered in the Conserved Domain Database (CDD) and then tallied to see the most common domain family for each nylonase (Table 2).
Table 2 Homology of various nylonases to known enzyme families such as beta lactamases
The data used to construct Tables 1 and 2 can be found in Supplementary Tables S1, S2, S3, S4, and S5.
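The tallying step can be sketched in a few lines of Python. The snippet assumes the CDD results have already been reduced to (nylonase, domain family) pairs; the pairs and counts below are made up for illustration and do not reproduce Table 2.

```python
from collections import Counter, defaultdict

def most_common_family(pairs):
    """For each nylonase, return its most frequent domain family and that family's count."""
    tallies = defaultdict(Counter)
    for enzyme, family in pairs:
        tallies[enzyme][family] += 1
    return {enzyme: counts.most_common(1)[0] for enzyme, counts in tallies.items()}

# Hypothetical annotations, for illustration only.
annotations = [
    ("NylB", "beta-lactamase"),
    ("NylB", "beta-lactamase"),
    ("NylB", "unclassified"),
    ("NylA", "amidase"),
    ("NylA", "amidase"),
    ("NylC", "peptidase"),
]

summary = most_common_family(annotations)
for enzyme, (family, count) in summary.items():
    print(f"{enzyme}: {family} ({count} entries)")
```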
Several proteins labeled as NylB were also labeled as beta lactamases. Several proteins labeled as NylA were also labeled as amidases. The listing of these proteins with their accession numbers can be found in the Supplementary Tables S6 and S7. CDD analysis (as provided by GenBank) scored the similarity of the NylB that Ohno studied and COG1680 beta lactamase at 130 bits, which implies the probability that a random amino acid polymer would achieve that level of similarity to the archetypal COG1680 beta lactamase is one chance in 2^130.
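To give a sense of scale for the 130-bit score, the short Python calculation below evaluates 2^-130 directly. The helper `expected_chance_hits` is an added illustration based on the standard Karlin-Altschul relation E = m × n × 2^(-S), in which the expected number of chance hits also scales with the query length m and database length n; the function name and the example lengths are assumptions, not values from this study.

```python
import math

BIT_SCORE = 130  # bit score reported for NylB versus COG1680

# The 2^-130 factor quoted in the text.
probability = 2.0 ** -BIT_SCORE
log10_probability = -BIT_SCORE * math.log10(2)

def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments with at least this bit score (E = m * n * 2^-S)."""
    return query_len * db_len * 2.0 ** -bit_score

print(f"2^-{BIT_SCORE} = {probability:.2e} (about 10^{log10_probability:.1f})")

# Hypothetical search-space sizes, for illustration only.
print(f"E for m=400, n=1e9: {expected_chance_hits(BIT_SCORE, 400, 10**9):.2e}")
```

Even after multiplying by a large hypothetical search space, the expected number of chance matches remains vanishingly small.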
A representative sample of these proteins (mostly those reported with experimentally verified nylonase activity) were also aligned using the MUSCLE alignment algorithm to show some of the conserved features of the NylB homologs, particularly the Serine-X-X-Lysine motif. This Serine-X-X-Lysine motif has been confirmed by X-ray crystallography of NylB in Arthrobacter KI72. [23] The first 8 of 10 proteins in the alignment (Fig. 2) were proteins from organisms that had experimental evidence of nylonase NylB activity, and the last two were provided for comparison as they are remote homologs with only predicted NylBs (as of this writing).
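A Serine-X-X-Lysine motif scan is easy to sketch with a regular expression. The snippet below is illustrative only: the sequences are short invented strings rather than entries from the alignment in Fig. 2, and it expects ungapped sequences, since an alignment gap character would be matched as "X".

```python
import re

# S, any two residues, then K. The lookahead lets overlapping motifs be reported.
SXXK = re.compile(r"(?=(S..K))")

def find_sxxk(sequence: str):
    """Return (1-based position, matched tetrapeptide) for every S-x-x-K occurrence."""
    return [(m.start() + 1, m.group(1)) for m in SXXK.finditer(sequence.upper())]

# Invented toy sequences, for illustration only.
toy_sequences = {
    "toy_seq_1": "MGASLTKVLA",
    "toy_seq_2": "MSAAKSGGKP",
    "toy_seq_3": "MAAAGGPLLV",
}

for name, seq in toy_sequences.items():
    print(name, find_sxxk(seq))
```

Because a four-residue pattern this loose occurs by chance in many proteins, a match is only meaningful when it falls in the same alignment column across homologs, as checked by inspection of the MUSCLE alignment.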
We searched the databases for a NylB homolog in Pseudomonas aeruginosa PAO1 and found a sequence (accession AAG07735.1) that had 100% identity (96% coverage) to a predicted NylB in another strain of Pseudomonas aeruginosa (accession CKI08964.1).
AAG07735.1 was used as one of the proteins featured in the MUSCLE alignment.
Conversely, if the NylB frameshift hypothesis were wrong, then a protein database search should reveal evidence that the hypothetical precursor protein never existed, has no history, and has few if any homologs. At the same time there should be evidence that the NylB protein is not unique, and so has a history and numerous homologs. Table 2 shows NylB is in the family of beta lactamases, and NylA is in the family of amidases. Therefore, NylB and NylA are both clearly members of very well-known protein families (independent of the BLAST and SPARCLE results).
Although it could be argued that PR.C is absent from the databases simply because it has not yet been found, the most conclusive proof that the NylB frameshift hypothesis is false is that the NylB gene is not at all unique: it is found in many organisms, in many habitats, and has a vast number of homologs (Table S6).
The NylB frameshift hypothesis was premised upon numerous assumptions that we now know are incorrect, and so the hypothesis is falsified on several levels:
1. The widely held assumption that all nylonase enzymes evolved since 1935 was incorrect.
2. Ohno's assumption that the NylB protein was a new and unique protein was incorrect.
3. Ohno assumed a hypothetical but specific precursor protein that now appears to have never existed, and thus the hypothetical frameshift mutation appears to have never happened.
4. Ohno claimed that a random string of amino acids could reasonably be expected to give rise to a specific, functional, beneficial, and stable enzyme. Having all these things happen by chance is so incredibly unlikely that it is hard to imagine. This is especially clear given that the CDD analysis indicates that the probability of NylB being so similar to beta lactamase by chance is negligible (2^-130).
Overlapping reading frames are known to exist in biology, [26,27] and Okamura [28] has speculated that several such human genes may have originated via frameshift mutations.
However, the larger question of de novo origination of genes and proteins in general and the role of frameshifts specifically in creation of de novo proteins is beyond the scope of this paper. Our present focus is specifically on whether a frameshift mutation after 1935 was the mechanism that created NylB.
We extended our search to look for homologs of other nylonases such as NylB′, NylA, and NylC (all of which were assumed to have evolved since 1935). While Kinoshita did not detect physiological amidase activity for NylA, [1,9] our analysis clearly shows that NylA has amidase homology. Similarly, we found that NylC was homologous to a rare peptidase.
We found several proteins had dual classifications such as beta lactamase and 6-aminohexanoate hydrolase (NylB), or amidase and 6-aminohexanoate cyclic hydrolase (NylA). In addition to experiments with proteases like Trypsin, [6] experiments have shown that even triacylglycerol lipases can act as nylonases. [29] Thus it appears that the term "nylonase" could be applied to members of the protease, beta lactamase, amidase, peptidase, and lipase enzyme families. This is in broad agreement with some of Yasuhira et al. and Negoro's findings that NylB and NylB′ are in the beta lactamase family, NylA is in the amidase family, and some nylonases share some passing similarities to lipases. [13,23,30] In every case the proteins were found in various organisms and in various natural habitats, along with a great many homologs. We conclude that all of these nylonases and their close homologs existed prior to 1935, although in some cases there may have been adaptive modifications after 1935. It appears that these various naturally occurring enzymes that happen to be able to degrade nylon have historically acted upon alternative nylon-like substrates.

Conclusions
The focus of this research has been to test Ohno's claim that sometime after 1935 the "nylonase" NylB arose de novo via a frameshift mutation in a precursor gene/protein.
Ohno's hypothesis has been historically impactful, being considered a powerful proof that new genes and enzymes can instantly arise de novo. While the frameshift model was speculative and was never actually tested, it has been uncritically accepted within the scientific community for several decades.
In the last three decades, there have been some authors who have questioned the frameshift hypothesis and have proposed alternative explanations of how NylB might have arisen. While that work did suggest the frameshift hypothesis might be wrong, these authors did not rigorously falsify the frameshift hypothesis, and so the frameshift model has continued to be cited as a fact, even up to the time of this writing.
Thanks to protein databases and bioinformatic tools that were not available until quite recently, we have been able to unambiguously falsify Ohno's hypothesis. We have shown that Ohno's hypothesis can be falsified on multiple levels.
More broadly, we have examined the widely-held assumption that there were no enzymes having nylonase activity prior to the invention of nylon in 1935. Ohno shared this assumption with most of the scientists of his day. However, the primary "nylonases" that have been studied (NylA, NylB, NylB′, and NylC), were all found on the same plasmid, functioning in coordination, suggesting that none of these genes/proteins could have arisen de novo in the very recent past. Our database searches show that all of these enzymes are widely distributed in the biosphere and have thousands of homologs. We also show that these enzymes belong to well characterized enzyme families that are ancient. It is clear that numerous enzymes existed prior to the invention of nylon, which had previously been acting on other substrates, but also happened to have "nylonase-like" activity. In the future the term nylonase might be used with more caution.

Methods
The gene coding for the NylB protein was contained in a segment of DNA that Okada et al. called RS-IIA. [3] It is worth mentioning that Ohno appears to have mislabeled Okada's RS-IIA as R-IIA in his paper. [3,4] Also, it appears that Ohno either made a transcription error or failed to clearly account for the creation of a premature stop codon in constructing his PR.C from the RS-IIA sequence. Okada's paper and GenBank indicate that the end of Ohno's PR.C (derived from RS-IIA) should be "GCGGCGTGA," not "GCGGCTGA" as was the case in Ohno's paper. Given that Okada's paper was the source of the actual sequence data, with Ohno's work deriving from that paper, the error must be Ohno's and not Okada's.
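The effect of removing a single nucleotide, which is how PR.C was constructed from RS-IIA, can be illustrated with a minimal Python sketch. The codon table is the standard genetic code; `toy_gene` is an invented sequence and is not RS-IIA. The last two lines translate the two nine- and eight-nucleotide endings quoted above, assuming purely for illustration that the reading frame begins at the first base of each fragment.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate from the first base; '*' marks a stop codon.

    A trailing partial codon (one or two leftover bases) is dropped.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def delete_base(dna: str, index: int) -> str:
    """Return the sequence with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]

toy_gene = "ATGGCTAAAGGTTCTTGA"       # hypothetical sequence, not RS-IIA
shifted = delete_base(toy_gene, 3)    # single-nucleotide deletion after the start codon

print(translate(toy_gene))  # original reading frame
print(translate(shifted))   # every codon downstream of the deletion changes

# The two endings quoted in the text, read from their first base (an assumption).
print(translate("GCGGCGTGA"))  # ending per Okada's paper and GenBank
print(translate("GCGGCTGA"))   # ending as printed in Ohno's paper
```

In the toy example, a one-base deletion changes every downstream residue and removes the in-frame stop codon, which is the general behavior the frameshift hypothesis relies on. Under the stated frame assumption, the nine-base ending contains a complete TGA stop codon, whereas the eight-base ending does not.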
The UNIPROT searches were easily conducted by simply going to the uniprot.org website and typing search terms such as "NylB", "NylB′", "NylA", "NylC", and "6-aminohexanoate hydrolase." Lists of proteins for each of these nylonases were created by using a simple Java program to filter out duplicate experimental entries. Afterward, manual review of the filtered lists was also conducted to remove spurious search results. The sequences in Table S9 were then put in MEGA 6.0 to generate MUSCLE alignments. We confirmed by inspection that the Serine-X-X-Lysine motif that appears in the MUSCLE alignment ( Fig. 2) agreed with Negoro's X-ray crystallography of NylB. [20,23]
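The duplicate-filtering step is described only briefly, and the Java program itself is not reproduced here. As a rough Python illustration of the idea (not the authors' program), the sketch below assumes each entry is an (accession, sequence) pair and treats entries with identical sequences as duplicates; both the data layout and the duplicate criterion are assumptions, and the accessions are placeholders.

```python
def drop_duplicates(entries):
    """Keep the first entry for each distinct sequence, preserving input order."""
    seen = set()
    unique = []
    for accession, sequence in entries:
        key = sequence.upper()  # compare sequences case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append((accession, sequence))
    return unique

# Placeholder accessions and toy sequences, for illustration only.
entries = [
    ("ACC_0001", "MKTAYIAK"),
    ("ACC_0002", "mktayiak"),   # same sequence as ACC_0001
    ("ACC_0003", "MKTAYLAK"),
]

filtered = drop_duplicates(entries)
print([accession for accession, _ in filtered])
```

Automated filtering of this kind only removes exact repeats, which is consistent with the manual review that followed to remove spurious search results.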