Novel database reveals growing prominence of deep-sea life for marine bioprospecting

doi:10.21203/rs.3.rs-3136354/v1

This work is licensed under a CC BY 4.0 License

Journal publication: published 08 Aug, 2024 in Nature Sustainability.

Version 1 (latest preprint version).

Perceptions that marine bioprospecting will deliver vast commercial benefits have placed ‘marine genetic resources’ at the center of key policy processes, yet our knowledge about their importance remains limited. Here, we introduce a novel global database of marine gene sequences referenced in patent filings, the MArine Bioprospecting PATent (MABPAT) Database. It includes 25,682 sequences from 1,092 marine species associated with 3,258 patent filings, identified by analyzing all relevant sequence records from the International Nucleotide Sequence Database Collaboration (INSDC). Microbial life in the deep sea, a vast and remote biome predominantly beyond national jurisdiction, is already attracting significant commercial interest: all of the top 10 patent holders have filed marine gene patents referencing sequences from deep-sea life, and just three companies, BASF, IFF, and DuPont, included sequences from nearly two-thirds of all species. Providing the most up-to-date picture of the marine bioprospecting landscape, our findings underscore the need for policymakers to ensure stewardship of deep-sea ecosystems.

Biological sciences/Biotechnology

Earth and environmental sciences/Ecology/Conservation biology

Scientific community and society/Business and industry

Marine genetic resources

patent landscape

natural products

bioprospecting

ocean biodiversity

BBNJ

Biodiscovery – the exploration and use of genetic and biochemical properties of biological materials – has a long and rich history. For instance, centuries before penicillin was discovered from mould in a laboratory, skin diseases were already being treated in present-day Jordan with red soils whose potent antibacterial properties have only recently been confirmed (Abuidhail et al. 2014; Falkinham et al. 2009). Other examples include traditional medicines extracted from evergreen shrubs for cancer treatment (Cragg and Pezzuto 2016), derivatives of the foxglove plant used to treat heart problems (Hollman 1996), antimalarial quinine (Achan et al. 2011), and fungi-extracted podophyllotoxin used to treat sexually transmitted diseases (Shah et al. 2021). However, recent advances in genetics and sequencing technologies have spurred unprecedented growth in the scale of discoveries. Today, bioprospecting – the search for potential products with scientific and industrial value derived from biological resources such as animals, plants, and microbes – often involves large-scale screening, analysis, and prediction of prospective biological compounds by exploring databases of sequencing data, including DNA extracted directly from environmental samples (Atanasov et al. 2021; Bauman et al. 2021).

In this context, the ocean is considered a promising but largely untapped frontier for biodiscovery (Sigwart et al. 2021). Marine organisms have evolved over millions of years to adapt to extreme conditions of temperature, salinity, light, pressure, and water flow (Beraldi-Campesi 2013). These conditions, as well as a far longer evolutionary history, have contributed to significantly greater taxonomic and functional diversity in marine habitats than in other biomes (Román-Palacios, Moraga-López, and Wiens 2022). Nearly one million eukaryotic species are believed to inhabit the ocean (Appeltans et al. 2012), and the number of archaea and bacteria may be a thousand times higher (Hernández-Ledesma and Herrero 2013), yet the majority remain undescribed by science.

Despite these knowledge gaps, marine biotechnology – the use of marine organisms and their compounds for a wide range of applications across industrial sectors – has distinguished itself within the broader biotechnology landscape. For instance, while nearly half of approved pharmaceuticals are based on compounds produced by living organisms, success rates are 2–4 times higher for compounds from marine organisms (Sigwart et al. 2021; Gerwick & Moore, 2012). Sales and licensing revenues from marine drugs have exceeded USD 1 billion annually since 2011 (Blasiak et al. 2022), and prospects for further commercial growth are substantial: in 2020 alone, more than 1,400 new compounds were isolated from marine species (Carroll et al. 2023). Biomolecules extracted from marine bacteria and other products developed from sequences of larger marine organisms are widely used in food production, diagnostics, bioremediation, and disease treatment (Blasiak et al. 2023). Notable examples include the discovery in the archaeon Pyrococcus furiosus of a thermostable enzyme used in the production of lactose-free milk (Li et al. 2013), seawater cyanobacterial toxins developed into anticancer treatments (Aesoy and Herfindal 2022), and the extensive use of green fluorescent protein from the jellyfish Aequorea victoria (Chalfie et al. 1994) as a molecular marker in medical and diagnostic contexts as well as in fundamental research.

Establishing a regulatory landscape that keeps pace with rapid advances in biotechnology, while also promoting transparency and equitable access and benefit-sharing mechanisms, has proven challenging (Wynberg and Laird 2018). The entry into force of the Convention on Biological Diversity in 1993 was a crucial milestone, as it defined genetic resources as “any material of marine plant, animal, microbial or other origin containing functional units of heredity of actual or potential value”, and established the fair and equitable sharing of benefits arising from their use as one of the Convention’s three core objectives (https://www.cbd.int/convention/text/). In 2014, the Convention’s Nagoya Protocol entered into force, providing a framework to regulate access and benefit sharing for genetic resources, including marine genetic resources (MGR), sampled within national jurisdictions (https://www.cbd.int/abs/text/). Yet some two-thirds of the ocean lies beyond national jurisdiction, and it was not until 2023, following protracted negotiations, that the “High Seas Treaty” was agreed upon, including provisions to address MGR from areas beyond national jurisdiction (ABNJ) (https://documents-dds-ny.un.org/doc/UNDOC/GEN/N23/177/28/PDF/N2317728.pdf).

Despite these encouraging developments, the actual and potential value of MGR for marine bioprospecting remains poorly understood. Some studies have counted the number of (marine) species referenced in patent documents (Oldham 2014) or included in GenBank (Scholz et al. 2021), while others have developed and analyzed databases of sequences included in patents issued through international patent applications (Arnaud-Haond et al. 2011; Arrieta et al. 2010; Blasiak et al. 2018; Blasiak et al. 2019). Finally, natural product discovery reports (Katz and Baltz 2016; Jaspars et al. 2016) cover the whole range of biological compounds derived from living organisms of various kinds and their potential applications. Common to all these studies, however, is a lack of focus on the specifics of commercial value, along with the limited information in patent and GenBank records about the geographical origin of gene sequences, which in many cases are referenced without naming the source species. The unevenness of these data makes it difficult to interpret the scale, scope, and trajectory of marine bioprospecting.

Here we address this gap by creating a novel database that includes all genetic sequences related to marine bioprospecting and the corresponding patent applications filed from 1989 to 2022. In addition to systematically compiling and presenting key data about the sequences, encoded proteins, dates of deposition, and patent holders, we address significant data gaps by developing and applying a BlastX sequence similarity model to sequences whose source species is not named. We reason that analyzing patents that claim MGR sequences is essential both for market assessment and for the global governance of MGR under the new High Seas Treaty, which aims to ensure the sustainable use of marine biodiversity beyond national jurisdiction. We also assess biodiversity data for species currently considered unique to ABNJ, and highlight the special importance of deep-sea conservation for future biotechnology focused on the innovation and development of naturally derived products.

Our analysis of patent filings revealed 29,065 nucleotide sequences from 1,467 marine species across 3,635 unique patents, representing approximately 1% of all gene patents submitted to the International Nucleotide Sequence Database Collaboration (INSDC). Many patents referenced multiple sequences, and a majority included both marine and non-marine sequences (Fig. 1A). Overall, marine sequences and species represented only 16% and 15%, respectively, of all sequences and species identified within the 3,635 patents (Fig. 1B, 1C). As a point of comparison, approximately 242,000 marine species have been described to date (WoRMS 2022), roughly 11% of the 2.1 million species described by science (https://nc.iucnredlist.org/redlist/content/attachment_files/2022-2_RL_Stats_Table_1a.pdf), suggesting considerable untapped potential for marine bioprospecting (Fig. 1B, Table S2).

Types of sequences in marine gene patents

The gene patent applicants who referenced the highest numbers of unique genetic sequences included both protein-coding and non-coding sequences, which have uneven potential for natural product discovery (Fig. 2; see Methods: MABPAT construction for definitions). Most of the companies with a large number of applications referenced protein-coding genes that originate from multiple species, with an average length between 500 and 2,000 nucleotides. Some applicants focused specifically on MGR from a single species and predominantly referenced non-coding sequences. For instance, the Fisheries Research Agency of the National Research and Development Agency in Japan included 1,179 sequences in its patent applications, most of which originated from the Japanese eel (Anguilla japonica), yet only 127 are protein-coding sequences. Similarly, the Japan Science and Technology Agency, a Japanese government agency, referenced 5,190 sequences from the sea vase tunicate (Ciona intestinalis), only 150 of which are protein-coding genes.
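The coding/non-coding distinction above rests on definitions given in the Methods, which are not reproduced here. As a rough illustration only, the sketch below flags a sequence as potentially protein-coding if it contains an open reading frame (ORF) of at least a chosen number of codons on either strand. The 100-codon threshold and all function names are hypothetical assumptions, not the MABPAT classification rule.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-initiated, stop-terminated ORF over all six reading frames,
    counted in codons (start codon included, stop codon excluded)."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG not yet closed by a stop
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def is_potentially_coding(seq: str, min_codons: int = 100) -> bool:
    """Hypothetical rule: at least one ORF of >= min_codons codons."""
    return longest_orf_codons(seq) >= min_codons


print(is_potentially_coding("ATG" + "GCT" * 120 + "TAA"))  # True
print(is_potentially_coding("ACGT" * 30))  # False
```

An ORF-length rule like this is crude: short probes and primers rarely contain long ORFs, but real annotation pipelines also weigh codon usage, homology, and annotated CDS features.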

Most of the same-length short non-coding sequences claimed to originate from the same species span a very wide range of GC content (the proportion of guanine and cytosine among the four DNA bases), which is typical of artificially modified sequences used as primers for amplification or as probes for detecting specific DNA or RNA sequences (Figure S2). Of all the patents that include at least one sequence from a marine species, 71% contain nucleotide sequences that could potentially encode proteins, which suggests that most MGR referenced in patents are used in bioprospecting (Fig. 3A).
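GC content is the percentage of G and C among the A, C, G, and T bases in a sequence. A minimal sketch of the check described above, assuming records arrive as hypothetical (species, sequence) pairs: it groups same-length sequences attributed to the same species and reports the GC range of each group. The example sequences are made up.

```python
from collections import defaultdict


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_range_by_group(records):
    """Group (species, sequence) pairs by (species, sequence length) and
    return {(species, length): (min_gc, max_gc)} for each group."""
    groups = defaultdict(list)
    for species, seq in records:
        groups[(species, len(seq))].append(gc_content(seq))
    return {key: (min(values), max(values)) for key, values in groups.items()}


# Made-up 10-nt probes attributed to one species: same length, very different GC.
EXAMPLE_RECORDS = [
    ("Anguilla japonica", "ATATATATAT"),
    ("Anguilla japonica", "GCGCGCATGC"),
    ("Anguilla japonica", "ATGCATGCAT"),
]
print(gc_range_by_group(EXAMPLE_RECORDS))
```

A wide (min, max) spread within a single (species, length) group is the pattern the text associates with designed primers or probes rather than native genomic fragments.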

MArine Bioprospecting PATent (MABPAT) Database

While INSDC records provide considerable insight into the genes referenced in patents, only 37.3% of records include the name of the source species (primarily those filed under the Patent Cooperation Treaty (WIPO), the European Patent Office, the Japan Patent Office, and the Korean Intellectual Property Office). Most of the remaining records (62.7%) are from the United States Patent and Trademark Office (USPTO), which does not share species names in its records (Figure S3).

Box 1. What is worth extra protection?

Global actors often seek patent protection for their inventions in multiple countries. Filing fees run into thousands of USD per application, and inventors must pay additional fees for each filing (Mehta, Tidwell, and Liotta 2017). For instance, the average cost of filing in the US, including attorney’s fees, has been estimated at around USD 50,000 (https://blueironip.com/how-much-does-a-patent-cost/). Protection is therefore more likely to be sought for highly promising products, methods, or associated biotechnological processes. We identified thirteen patent filings submitted to all national patent bureaus that include identical nucleotide sequences (Table S3). For each filing, we collected the description of the invention and, where the nucleotide sequence search identified a protein-coding gene with an annotated function, the protein function. The scope of patented commercial biotechnological applications is wide and usually involves transferring specific enzymes into cellular metabolic pathways to maximize the production of a specific compound. Examples include applications in medicine (enzymes used for skin care), the food industry (enzymes used in baking and dairy products), agriculture (production of herbicide-tolerant transgenic crops), industrial production (metal nanomaterials used in products such as creams, shampoos, clothing, footwear, and plastic containers), and biofuel production (isobutanol production and hydrocarbon biosynthesis).

One sequence originating from the methanogenic marine archaeon Methanococcus maripaludis has been the subject of a series of lawsuits between Butamax (now a subsidiary of International Flavors & Fragrances; IFF) and Gevo Inc, a transnational biofuels company. The companies clashed over the use of yeast to produce isobutanol instead of ethanol, a process for which the enzyme found in M. maripaludis was essential. After many years of patent litigation, the dispute was finally settled through licensing agreements between the parties covering all fields of isobutanol production (https://biomassmagazine.com/articles/12339/butamax-gevo-settle-patent-dispute). In 2021, Gevo acquired the entire Butamax patent estate (https://finance.yahoo.com/news/gevo-acquires-butamax-patent-estate-130000249.html).

To address the gap in species attribution, we developed a sequence similarity model (see Methods: BlastX sequence similarity model) and used the BlastX search tool to query all genetic sequences of unknown origin against the UniProtKB/Swiss-Prot protein sequence database. This model identified an additional 5,609 sequences that can be attributed with high confidence to marine organisms. Together with the 20,073 sequences from confirmed marine species, this yielded 25,682 sequences, which form the MArine Bioprospecting PATent (MABPAT) Database and the basis for all subsequent analyses in this paper.
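The Methods define the actual similarity model; the sketch below only illustrates the general idea of assigning marine origin from BlastX results. It assumes BLAST+ tabular output (`-outfmt 6`, default twelve columns), keeps for each query the highest-scoring Swiss-Prot hit that passes hypothetical cut-offs (e-value ≤ 1e-10, identity ≥ 90%), and looks the hit's accession up in a hypothetical accession-to-marine table. All identifiers in the example are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text, max_evalue=1e-10, min_pident=90.0):
    """Per query, keep the highest-bitscore hit that passes the cut-offs.
    Returns {qseqid: (subject_accession, bitscore)}."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue or float(hit["pident"]) < min_pident:
            continue
        bits = float(hit["bitscore"])
        # Swiss-Prot subject IDs typically look like "sp|ACCESSION|ENTRY_NAME".
        parts = hit["sseqid"].split("|")
        accession = parts[1] if len(parts) >= 3 else hit["sseqid"]
        query = hit["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (accession, bits)
    return best


def marine_queries(tsv_text, accession_is_marine, **cutoffs):
    """Sorted query IDs whose best passing hit maps to a marine organism.
    `accession_is_marine` is a hypothetical {accession: bool} lookup."""
    hits = best_hits(tsv_text, **cutoffs)
    return sorted(q for q, (acc, _) in hits.items()
                  if accession_is_marine.get(acc, False))


# Made-up BlastX rows: patseq1 hits a marine entry (and a weaker non-marine one).
EXAMPLE_TSV = (
    "patseq1\tsp|FAKE01|ENZ_PYRFU\t97.5\t300\t7\t0\t1\t900\t1\t300\t1e-150\t560\n"
    "patseq1\tsp|FAKE02|ENZ_ECOLI\t70.0\t300\t90\t2\t1\t900\t1\t300\t1e-60\t250\n"
    "patseq2\tsp|FAKE02|ENZ_ECOLI\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-90\t350\n"
)
EXAMPLE_LOOKUP = {"FAKE01": True, "FAKE02": False}
print(marine_queries(EXAMPLE_TSV, EXAMPLE_LOOKUP))  # ['patseq1']
```

In practice the lookup table would have to be built by cross-referencing the source organism of each Swiss-Prot entry against a marine species list such as WoRMS; that step is not shown, and the cut-offs would need to be calibrated rather than taken from this sketch.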

Key actors in marine biotechnology

We found that one hundred entities accounted for 68% of all patents containing protein-coding sequences (bioprospecting patents) (Fig. 3A; Table S4). The remaining 32% are associated with applicants who filed fewer than two patents on average. Based on patent/sequence ratios (see Methods: Patent share estimation), we estimate that for companies in the top 100 (Table S4), the total number of patent applications would have been underestimated by at least one-third had we not applied the sequence similarity model. Transnational corporations (1,682 applications) dominate, although roughly one-fifth of filings are from research institutes and their commercialization centers (667 applications) (Fig. 3B). A total of 78% of all bioprospecting patents were submitted by actors headquartered in the US, Germany, or Japan (Fig. 3C).
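Concentration figures such as the 68% share held by the top 100 entities reduce to a ranked count of filings per applicant. A minimal sketch, assuming one (hypothetical) applicant name per patent filing and ignoring co-applicants; real filings often list several applicants, which would require a rule for splitting or duplicating credit.

```python
from collections import Counter


def top_k_share(applicant_per_patent, k):
    """Fraction of patents filed by the k most prolific applicants.
    Assumes exactly one applicant per patent (co-applicants ignored)."""
    counts = Counter(applicant_per_patent)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Made-up filings: 10 patents from 4 applicants.
EXAMPLE_FILINGS = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
print(top_k_share(EXAMPLE_FILINGS, 2))
```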

The number of patents registered by each company is strongly correlated with the total count of unique species included in those patents (r = 0.92, p < 0.0001). To illustrate how much biological diversity each applicant draws upon, we connected patent holders with the unique species included in their patent claims and aggregated them at the domain (Fig. 4A) and phylum level (Figure S7). For each flow diagram, we also indicated whether the corresponding marine species had been observed in a deep-sea environment. The most active users of marine genetic resources depend primarily on sequences from bacteria and archaea (Fig. 4A), and deep-sea marine species have attracted interest from all 10 of the largest users.
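The r = 0.92 reported above is a Pearson correlation between per-company patent counts and unique-species counts. The standard-library sketch below computes r and the t statistic used to test it against zero with n - 2 degrees of freedom; the p-value would then come from a t distribution (for example via `scipy.stats.pearsonr`, which returns both r and a two-sided p-value). The counts in the example are invented, not taken from MABPAT.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.
    Undefined for |r| == 1 (division by zero)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented per-company counts: patents filed vs. unique marine species used.
patents = [120, 80, 45, 30, 12]
species = [95, 70, 30, 28, 10]
r = pearson_r(patents, species)
print(round(r, 3), round(t_statistic(r, len(patents)), 2))
```

Because patent and species counts are typically heavy-tailed, a rank-based coefficient (Spearman) is a common robustness check alongside Pearson's r.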

The ten largest actors collectively registered more than two-thirds of all patents in the top 100. They comprise nine multinational corporations, namely Bayer and BASF (both headquartered in Germany), DuPont, IFF, and Chevron (USA), DSM (Netherlands), and Takara Holdings Inc, Kao Corporation, and Ajinomoto (Japan), together with one public research body, the National Institute of Advanced Industrial Science and Technology (AIST) in Japan.

The opacity of marine bioprospecting in ABNJ

Issues of access and benefit sharing related to genetic material from ABNJ are of particular interest, as such material falls outside the scope of the Nagoya Protocol to the Convention on Biological Diversity and was at the core of negotiations for the “High Seas Treaty” adopted in June 2023. It is therefore notable that among the 1,467 species of marine origin referenced in INSDC patent records, only 5 species have been observed in ABNJ, and none of them exclusively. Our analysis of species observation data available in the Ocean Biodiversity Information System (OBIS; https://obis.org), a global open-access database on marine biodiversity, identified 5,889 species found exclusively in ABNJ, predominantly from the phyla Arthropoda, Foraminifera, and Nematoda; the complete taxonomic distribution is given in Figure S8. Based on World Register of Deep-Sea Species (WoRDSS) records, 39% of these BBNJ-specific species have been found in deep-sea environments, compared with only 15% of all species listed in the World Register of Marine Species (WoRMS) (Fig. 4B). BBNJ-specific species are predominantly distributed at sub-Antarctic and Antarctic latitudes (Figure S9). The full list of BBNJ-specific species is available on GitHub: https://github.com/zhivkoplias/mgr.
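Identifying species found exclusively in ABNJ amounts to a set difference over occurrence records. The sketch below assumes each OBIS occurrence has already been labelled as inside or outside national jurisdiction (for instance by a point-in-polygon test against EEZ boundaries, not shown), then computes the exclusive set and the share of it that appears in a deep-sea species list. Function names and the toy records are hypothetical.

```python
def abnj_exclusive_species(occurrences):
    """Species whose occurrence records all fall in ABNJ.

    `occurrences` is an iterable of (species, in_abnj) pairs, where in_abnj is
    assumed to be precomputed (e.g. by testing each record's coordinates
    against EEZ polygons)."""
    in_abnj, within_eez = set(), set()
    for species, is_abnj in occurrences:
        (in_abnj if is_abnj else within_eez).add(species)
    return in_abnj - within_eez


def deep_sea_fraction(species, deep_sea_list):
    """Share of `species` that also appear in a deep-sea species list."""
    species = set(species)
    if not species:
        return 0.0
    return len(species & set(deep_sea_list)) / len(species)


# Toy records: sp_a and sp_d only in ABNJ, sp_b in both, sp_c only within an EEZ.
EXAMPLE_OCCURRENCES = [("sp_a", True), ("sp_a", True), ("sp_b", True),
                       ("sp_b", False), ("sp_c", False), ("sp_d", True)]
exclusive = abnj_exclusive_species(EXAMPLE_OCCURRENCES)
print(sorted(exclusive), deep_sea_fraction(exclusive, ["sp_a", "sp_c"]))
```

"Exclusive" here only means "not yet recorded within any EEZ", so the result is sensitive to sampling effort; a single new coastal record moves a species out of the set.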

Box 2. Microbial diversity in the global ocean: a sea of unknowns. Microbial species from all domains of life account for more than 95% of total marine biomass and play a pivotal role in marine ecosystem functioning, forming the foundation of food webs, contributing to climate regulation, and providing the backbone of immense genetic diversity (Abida et al. 2013). However, as of November 2022, the OBIS database contained only 499 bacterial (https://obis.org/taxon/6) and 7 archaeal species (https://obis.org/taxon/8). The total diversity of marine microbial species, including those uniquely present in habitats located within ABNJ, is therefore highly underrepresented. Similarly, the WoRDSS list contains almost no Bacteria (18 species) or Archaea (3 species; https://www.marinespecies.org/deepsea/aphia.php?p=browser), because both databases use WoRMS taxonomy as a backbone and INSDC/NCBI Taxonomy is not supported as a data source in OBIS.

Thanks to the continuous efforts of Tara Oceans, Malaspina, BioGEOTRACES, and other projects contributing to a deeper understanding of microbial diversity, more data have become available to map microbial distribution across the global ocean (Delmont et al. 2022; Paoli et al. 2022). Some regional diversity studies, including one in the Clarion-Clipperton Fracture Zone, have already demonstrated that the molecular diversity of deep-sea species is comparable to that found in coral reef ecosystems (Rogers et al. 2022). Ocean datasets currently lack a global map of microbial functions (Tara Ocean Foundation et al. 2022). As most marine species listed in patent claims are of microbial origin, our understanding of how many of them are uniquely present in ABNJ is far from complete.

Areas beyond national jurisdiction account for 64% of the ocean's surface area and 95% of its volume (GEF, 2021). Although the deep sea was once thought to be largely devoid of life, species are found throughout deep-sea habitats and the water column. While many of these species are thought to be widely distributed, hotspots of endemism occur throughout the deep sea, perhaps most strikingly around hydrothermal vent systems (Van Dover et al. 2018). Only 23.4% of the ocean floor has been mapped at high resolution (https://www.gebco.net/news_and_media/gebco_2022_grid_release.html), a much smaller percentage has been explored (Amon et al. 2022), and recent years have seen the discovery of multiple previously unknown vent fields (Miyazaki J. et al., 2017; Makabe A. et al., 2017; https://www.nationalgeographic.com/adventure/article/hydrothermal-vents-discovered-azores-science-environment). Of the 721 active hydrothermal vents with known geolocations, more than half (363) are located in ABNJ.

Marine biotechnology is mainly focused on species that serve as model organisms in basic research and as a backbone in genetic engineering, enabling the creation of new drugs, more efficient biotechnological processes for food and energy production, plant agriculture, and the invention of new materials (Khan et al. 2023; Joseph et al., 2020). Marine species currently represent a small but important share of the sources used for natural product discovery (Jaspars et al. 2016; Sigwart et al. 2021). Unravelling the global scope of commercial activity involving MGR is a crucial first step towards understanding the value of the biological functions encoded in genetic sequences and towards pathways for the fair and equitable sharing of benefits from their use.

Here we present the MABPAT database, a novel global catalogue of patent sequences derived from marine species over the last three decades. By bringing together multiple data sources on genetic sequences, marine species, and patent filers, the database presents information on species taxonomy as well as on applicants, including the applicant's name, entity type, country of origin, year of submission, and patent application number. We also specify whether marine species referenced in patents have been observed in deep-sea habitats. To our knowledge, it is the first database combining in-depth information about patent filers and species used in marine bioprospecting. In doing so, the MABPAT database not only fills an important research gap but also contributes to the transparency and interoperability of MGR use. By making it publicly available (link will be provided here upon manuscript acceptance), we hope to enable further research efforts to inform improved policymaking. The analysis that generated this database also yielded three key insights, which are addressed below.

Rapid technological advances and data governance

Scholars have suggested that the earliest form of a patent system can be traced back 2,500 years to ancient Greece, and that the first modern patent law dates to 1474 (Adams J.N. 2019). It is little surprise, then, that the patent system has struggled to keep pace with the rapid advances in genetics and genomics research of recent decades, as seen, for instance, in the considerable variation in ground rules for patenting genetic sequences across jurisdictions (Jefferson et al. 2013; Blasiak et al. 2018). Key developments over the last thirty years have focused on jurisdictional norms and compliance standards. In 1998, a mandatory data element for sequence description (“organism”) was introduced for international applications, aiming to indicate biological origin (Jefferson et al. 2015). Yet current international standards (https://www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf) still allow custom organism names not listed in the Integrated Taxonomic Information System (https://www.itis.gov/), including “Unknown”, “Unidentified”, and “Artificial sequence”. In November 2021, INSDC announced new requirements for all incoming sequences to ensure correct disclosure of origin for new submissions. However, these policies will not affect the 24.5 million sequences already stored in INSDC databases (https://www.insdc.org/news/spatio-temporal-annotation-policy-18-11-2021/).

The analysis of patents therefore often depends on either accepting considerable data gaps or developing methods to reconstruct missing data. In this study, for instance, 17.2 million sequences would have been excluded from the analysis due to the lack of species names (primarily from the USPTO, the largest repository of biological sequences and patents). Instead, our sequence similarity model allowed us to estimate patent shares across countries and actor types more comprehensively. Because this reconstruction identifies marine origin from the molecular similarity of sequences rather than from disclosed species names, we were also able to confirm with higher confidence a trend observed in previous work (Arnaud-Haond et al. 2018; Blasiak et al. 2018): Japan, the United States, and Germany are the headquarters locations of the world’s primary commercial users of MGR. The disproportionate importance of these three states suggests a corresponding responsibility to work towards innovative benefit-sharing and capacity-building mechanisms. These could include, for instance, a multilateral fund for the equitable sharing of benefits between providers and users of digital sequence information (DSI), which parties agreed to finalize at CBD COP16 (https://www.cbd.int/article/cop15-cbd-press-release-final-19dec2022).

Importance of microbial and deep-sea species for marine bioprospecting

Viruses, although known to be 15 times more abundant in the ocean than other microorganisms (Sánchez-Paz et al. 2014), have seen little commercial activity to date beyond a limited focus on those affecting commercial aquaculture production. However, the potential role of viruses in creating proteins of interest for marine bioprospecting could be larger than currently appreciated. Viruses have shaped the majority of archaeal and bacterial genomes via horizontal gene transfer (HGT), the exchange of genetic material between organisms that are not in a parent-offspring relationship (Sobecky and Hazen 2009). Bacterial and archaeal species often live in symbiosis and exchange genes with microbial eukaryotes (protists) (Husnik et al. 2021), and together these groups constitute the vast majority of organisms used in marine bioprospecting. Importantly, many archaeal and bacterial species used in bioprospecting live in deep-sea habitats, the majority of which are located in ABNJ. While none of the species found exclusively in ABNJ has yet become an object of commercial interest, ABNJ-specific species are roughly 2.5 times more likely to live in the deep ocean than marine species in general.

Our analysis of the last three decades of global gene patents indicates that deep-sea species have become an important source for marine bioprospecting: all ten of the largest actors in marine bioprospecting already use deep-sea species. This provides a rationale for channelling benefits from MGR utilization into conservation projects that protect at-risk deep-sea habitats (Cordes and Levin 2018), not least because these habitats are a vital source for future biotechnology focused on the innovation and development of naturally derived products. More advanced biodiversity models that emphasize safeguarding entire communities with unique functional roles, including microbial species, should also be better integrated into conservation plans (Pollock et al. 2020).

With the successful conclusion of the High Seas Treaty and the recognition of DSI in the legally binding BBNJ agreement, MGR used for bioprospecting and product discovery open a new opportunity to protect biodiversity in deep-sea habitats. However, the INSDC, the largest repository of DSI to date, is currently missing from the biodiversity informatics landscape (Bingham et al. 2017; Corrales et al. 2023); as a result, information on genetic diversity and on the spatial origin of genetic sequences is not available at a global level. Adopting the principles of Open and Responsible Data Governance and developing MGR data repositories (Oldham, Chiarolla, and Thambisetty 2023) will be necessary steps to overcome the lack of information on MGR in ABNJ.

Multi-stakeholder collaboration in MGR protection

Our analysis of bioprospecting patents revealed an asymmetrical distribution of patent registrations, consistent with previous findings (Blasiak et al. 2018; Arnaud-Haond et al. 2011). The sector is dominated by transnational corporations, which have a greater capacity to undertake genomic research. One-third of all patents were held by the ten largest actors, nine of which are large multinational corporations and none of which conduct marine research themselves; instead, they rely on public gene databases for sequences with potential commercial applications. While many multinational pharmaceutical companies have marine biology departments (Trevisanut & Bonfanti 2011), their total share of bioprospecting patents is modest (Table S4). Still, a fair estimate of corporate engagement in marine species discovery is hard to obtain. Marine scientists who study microbial diversity often collaborate with the oil and gas industry to collect samples from deep-sea oil wells (Alexander et al., 2022; Franco et al., 2020; Lanzén et al., 2016). With the rising use of remotely operated vehicles (ROVs) to inspect and maintain offshore oil and gas sites, more science-industry partnerships are likely to emerge to support the collection of biological data in the deep sea (McLean et al. 2020).

The disproportionate role of a small number of actors also suggests potential for science-industry collaboration in the spirit of previous efforts with so-called “keystone actors”, which involve engaging the largest companies in a given sector to enable transformative change (Österblom et al. 2020; Österblom et al. 2022; https://www.weforum.org/ocean-100-dialogues). Voluntary non-binding partnerships such as the Deep Seas Project (https://www.deep-seas.eu), the Common Oceans ABNJ Project (https://www.fao.org/3/CA2245EN/ca2245en.pdf), OSPAR, NEAFC, and the Sargasso Sea Commission (https://www.prog-ocean.org/wp-content/uploads/2019/03/STRONG-HS_Lessons-Learnt-Report.pdf) have already had a significant impact on sustainable management in ABNJ by addressing challenges related to illegal, unreported, and unregulated fishing, seabed mining, and pollution through integrated and holistic approaches. Regional Environmental Management Plans (REMPs) established by the International Seabed Authority (ISA) could have great potential for protecting deep-sea biodiversity and preventing pollution and habitat damage at regional and sub-regional levels (Christiansen et al. 2022), if REMPs were developed in a more coordinated way in line with overarching environmental goals (Amon et al. 2022). Such initiatives can foster cross-sectoral dialogue and capacity-building activities that strengthen the ability of national governments and local communities to engage in sustainable resource use in ABNJ. While the conclusion of the High Seas Treaty has laid the foundation for improved management in ABNJ, its full implementation remains a distant prospect; in the meantime, voluntary collaborative efforts based on the best available science can help inform future binding mechanisms to ensure conservation and sustainable use. By filling a crucial knowledge gap in understanding the potential of MGR, the MABPAT database represents a first step in that direction.

Acknowledgments

We thank Daria Khvostovetc for valuable advice on the Patent Cooperation Treaty and the European Patent Convention. EZ, AP, and RB are funded by FORMAS, project number 2020-01048. AP is also funded by FORMAS, project number 2019-01220. JBJ is funded by the Knut and Alice Wallenberg Foundation (2021.0343).

References

Blasiak, R., et al. A forgotten element of the blue economy: marine biomimetics and inspiration from the deep sea. PNAS Nexus, 1(4), pgac196 (2022).
Glover, A. G., et al. An End-to-End DNA Taxonomy Methodology for Benthic Biodiversity Survey in the Clarion-Clipperton Zone, Central Pacific Abyss. J. Mar. Sci. Eng, 4(1), 2 (2015).
Hollman, P. C. H., Hertog, M. G. L., and Katan, M. B. Analysis and health effects of flavonoids. Food Chemistry, 57(1), 43–46 (1996).
Delmont, T. O., et al. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genomics, 2(5), 100123 (2022).
Appeltans, W., et al. The Magnitude of Global Marine Species Diversity. Current Biology, 22(23), 2189–2202 (2012).
Husnik, F. et al. Bacterial and archaeal symbioses with protists. Current Biology, 31, 862–877 (2021).
Hernández-Ledesma, B., Herrero, M. Bioactive compounds from marine foods: plant and animal sources. John Wiley, Chicago. (2013).
Corrales, C., Luciano, S. & Astrin, J. J. Biodiversity biobanks: a landscape analysis. https://preprints.arphahub.com/article/103105/download/pdf/ (2023).
Blasiak, R., et al. Corporate control and global governance of marine genetic resources. Science Advances, 4 (2018).
Mehta, A., et al. Chapter 24 - Cyanobacteria: a potential source of anticancer drugs. Advances in Cyanobacterial Biology, Academic Press (2020).
Beraldi-Campesi, H. Early life on land and the first terrestrial ecosystems. Ecol Process, 2(1) (2013).
McLean, D. L. et al. Enhancing the Scientific Value of Industry Remotely Operated Vehicles (ROVs) in Our Oceans. Frontiers in Marine Science, 7 (2020).
Chalfie, M., et al. Green Fluorescent Protein as a Marker for Gene Expression. Science, 263, 802–805 (1994).
Gogarten, M. B., Gogarten, J. P., and Olendzenski, L. Horizontal Gene Transfer: Genomes in Flux. Humana Press (2009).
Blasiak, R. et al. Making marine biotechnology work for people and nature. Nat Ecol Evol, 7, 482–485 (2023).
Arnaud-Haond, S., Arrieta, J. M., and Duarte, C. M. Marine Biodiversity and Gene Patents. Science, 331, 1521–1522 (2011).
Haward, M. G., and Rogers, A. D. Marine Genetic Resources in Areas Beyond National Jurisdiction: Promoting Marine Scientific Research and Enabling Equitable Benefit Sharing. Front Mar Sci, 8 (2021).
Carroll, A. R., et al. Marine natural products. Nat Prod Rep, 38, 362–413 (2021).
Sánchez-Paz, A., et al. Marine Viruses: the Beneficial Side of a Threat. Appl Biochem Biotechnol, 174(7), 2368–2379 (2014).
Scholz, A. H. et al. Myth-busting the provider-user relationship for digital sequence information. GigaScience, 10(12), 1–8 (2021).
Katz, L. & Baltz, R. H. Natural product discovery: past, present, and future. Journal of Industrial Microbiology & Biotechnology, 43, 155–176 (2016).
Bauman, K. D., et al. Genome mining methods to discover bioactive natural products. Nat Prod Rep, 38(11), 2100–2129 (2021).
Sigwart, J. D., et al. Unlocking the potential of marine biodiscovery. Nat Prod Rep, 38, 1235–1242 (2021).
Cragg, G. M., and Pezzuto, J. M. Natural Products as a Vital Source for the Discovery of Cancer Chemotherapeutic and Chemopreventive Agents. Med Princ Pract, 25, 41–59 (2016).
Atanasov, A. G., et al. Natural products in drug discovery: advances and opportunities. Nat Rev Drug Discov, 20(3), 200–216 (2021).
Jefferson, O. A., et al. Transparency tools in gene patenting for informing policy and practice. Nat Biotechnol, 31(12), 1086–1093 (2013).
Oldham, P., Hall, S., and Barnes, C. Patent Landscape Report on Animal Genetic Resources. Report for WIPO (2014).
Shah, Z., et al. Podophyllotoxin: History, Recent Advances and Future Prospects. Biomolecules, 11(4), 603 (2021).
Li, B. et al. Preparation of lactose-free pasteurized milk with a recombinant thermostable β-glucosidase from Pyrococcus furiosus. BMC Biotechnology, 13, 73 (2013).
Tara Ocean Foundation, Tara Oceans, et al. Priorities for ocean microbiome research. Nature Microbiology, 7, 937–947 (2022).
Falkinham, J. O., III, et al. Proliferation of Antibiotic-Producing Bacteria and Concomitant Antibiotic Production as the Basis for the Antibiotic Activity of Jordan’s Red Soils. Appl Environ Microbiol, 75(9), 2735–2741 (2009).
Pollock, L. J. et al. Protecting Biodiversity (in All Its Complexity): New Models and Methods. Trends in Ecology and Evolution, 35, 1119–1128 (2020).
Jefferson, O. A., et al. Public disclosure of biological sequences in global patent practice. World Patent Information, 43, 12–24 (2015).
Achan, J. et al. Quinine, an old anti-malarial drug in a modern world: role in the treatment of malaria. Malar J, 10, 144 (2011).
Blasiak, R., et al. Scientists Should Disclose Origin in Marine Gene Patents. Trends in Ecology and Evolution, 34, 392–395 (2019).
Oldham, P., and Kindness, J. Sharing Digital Sequence Information. Study for the European Commission. doi:10.5281/zenodo.6557191 (2022).
Bingham, H., et al. The Biodiversity Informatics Landscape: Elements, Connections and Opportunities. Research Ideas and Outcomes, 3, e14059 (2017).
Román-Palacios, C., Moraga-López, D., and Wiens, J. J. The origins of global biodiversity on land, sea and freshwater. Ecology Letters, 25(6), 1376–1386 (2022).
Christiansen, S., et al. Towards an Ecosystem Approach to Management in Areas Beyond National Jurisdiction: REMPs for Deep Seabed Mining and the Proposed BBNJ Instrument. Front Mar Sci, 9, 1–23 (2022).
Paoli, L. et al. Biosynthetic potential of the global ocean microbiome. Nature, 607, 111–118 (2022).
Wynberg, R., and Laird, S. A. Fast Science and Sluggish Policy: The Herculean Task of Regulating Biodiscovery. Trends in Biotechnology, 36, 1–3 (2018).
WoRMS Editorial Board. World Register of Marine Species. Available from https://www.marinespecies.org at VLIZ (2023). Accessed 2022-11-15.
Jaspars, M., et al. The marine biodiscovery pipeline and ocean medicines of tomorrow. J. Mar Biol Ass, 96, 151–158 (2016).
Khan, I., et al. Marine Biotechnology: A Frontier for the Discovery of Nutraceuticals, Energy, and Its Role in Meeting Twenty-First Century Food Demands. Marine Biotechnology: Applications in Food, Drugs and Energy, Springer Nature Singapore, 1–22 (2023).
Joseph, I., and Augustine, A. Marine biotechnology for food. Genomics and Biotechnological Advances in Veterinary, Poultry, and Fisheries. Elsevier, 271–283 (2020).
Trevisanut, S., and Bonfanti, A. Intellectual Property Rights Beyond National Jurisdiction: Outlining a Regime for Patenting Products Based on Marine Genetic Resources of the Deep-Sea Bed and High Seas. SSRN Journal, https://ssrn.com/abstract=1861020 (2011).
Österblom, H., et al. Scientific mobilization of keystone actors for biosphere stewardship. Sci Rep, 12, 3802 (2022).
Amon, D. J. et al. Assessment of scientific gaps related to the effective environmental management of deep-seabed mining. Marine Policy, 138, 105006 (2022).
Abida, H., et al. Bioprospecting marine plankton. Mar Drugs, 11, 4594–4611 (2013).
Adams, J. N. History of the patent system. Law, 2–26 (2019).
Van Dover, C. L., et al. Scientific rationale and international obligations for protection of active hydrothermal vent ecosystems from deep-sea mining. Marine Policy, 90, 20–28 (2018).
Gerwick, W. H., and Moore, B. S. Lessons from the Past and Charting the Future of Marine Natural Products Drug Discovery and Chemical Biology. Chemistry & Biology, 19, 85–98 (2012).
Miyazaki, J., et al. Deepest and hottest hydrothermal activity in the Okinawa Trough: the Yokosuka site at Yaeyama Knoll. R Soc Open Sci, 4, 171570 (2017).
Makabe, A., et al. Geochemical and biological features of hydrothermal vent fields newly discovered in the Okinawa Trough. https://goldschmidtabstracts.info/2017/2540.pdf (2017). Accessed 2023-06-25.
Cordes, E. E., and Levin, L. A. Exploration before exploitation. Science, 359(6377), 719 (2018).
Arrieta, J. M., Arnaud-Haond, S., and Duarte, C. M. What lies underneath: conserving the oceans’ genetic resources. Proceedings of the National Academy of Sciences, 107(43), 18318–18324 (2010).
Alexander, J. B., et al. Complementary molecular and visual sampling of fish on oil and gas platforms provides superior biodiversity characterisation. Marine Environmental Research, 179, 105692 (2022).
Franco, N. R., et al. Bacterial Composition and Diversity in Deep-Sea Sediments from the Southern Colombian Caribbean Sea. Diversity, 13, 10 (2020).
Lanzén, A., et al. High-throughput metabarcoding of eukaryotic diversity for environmental monitoring of offshore oil-drilling activities. Mol Ecol, 25, 4392–4406 (2016).

Methods

Summary statistics of patents that include MGR

GenBank, the European Nucleotide Archive of the European Bioinformatics Institute (EMBL-EBI), and the DNA Data Bank of Japan (DDBJ) exchange their data daily and together form the International Nucleotide Sequence Database Collaboration (INSDC). Genetic sequences associated with patents were retrieved from the patent division of GenBank, maintained by the National Center for Biotechnology Information (NCBI), on 10 November 2022 (ftp.ncbi.nih.gov/genbank/); this division contained 24,600,503 annotated sequences. All files (gbpat1.seq.gz to gbpat254.seq.gz) were downloaded and processed following the methodology of Arnaud-Haond, Arrieta, and Duarte (2011) to create database entries containing the nucleotide sequence, species name, patent number, application year, and the party registering the patent. To do so, each file was split into individual sequence records, and for each record we extracted the ORIGIN field (nucleotide sequence), the ORGANISM field (species name), and the JOURNAL field (patent application number, year of application, patent system, and applicant name). Unlike previous studies (e.g., Arnaud-Haond et al. 2011; Blasiak et al. 2018), which restricted their analysis to sequences submitted in a given patent system, we considered both patents submitted in national jurisdictions and those filed under the World Intellectual Property Organization’s Patent Cooperation Treaty (“international” patents).
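The field extraction described above can be sketched with the Python standard library. This is a minimal parser written under stated assumptions, not the authors' code: it reads GenBank flat-file records terminated by `//`, takes the species from the ORGANISM line, the nucleotides from the ORIGIN block, and the patent office, number, year, and applicant from the JOURNAL block. The regular expression encodes an assumed layout of patent JOURNAL lines, and the sample record is invented for illustration.

```python
import gzip
import io
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional, TextIO

# ASSUMED layout of a patent reference in gbpat*.seq files:
#   JOURNAL   Patent: <office> <number>-<kind> <sequence no.> <DD-MON-YYYY>;
#             <applicant name on continuation line(s)>
PATENT_RE = re.compile(r"Patent:\s+(\w{2})\s+(\S+)\s+(\d+)\s+(\d{2}-[A-Z]{3}-(\d{4}))")


@dataclass
class PatentSequence:
    organism: str   # ORGANISM field
    office: str     # patent system, e.g. "US", "EP", "WO" (PCT)
    number: str     # patent number including kind code
    year: int       # year taken from the JOURNAL date
    applicant: str  # applicant name(s) from JOURNAL continuation lines
    sequence: str   # nucleotides from the ORIGIN block, upper-cased


def _build(organism: str, journal: List[str], seq: List[str]) -> Optional[PatentSequence]:
    match = PATENT_RE.search(journal[0]) if journal else None
    if match is None:
        return None  # skip records without a recognisable patent reference
    office, number, _seq_no, _date, year = match.groups()
    applicant = " ".join(journal[1:]).rstrip(";").strip()
    return PatentSequence(organism, office, number, int(year), applicant, "".join(seq).upper())


def parse_gbpat(handle: TextIO) -> Iterator[PatentSequence]:
    """Yield one PatentSequence per flat-file record (records end with '//')."""
    organism, journal, seq, section = "", [], [], None
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            record = _build(organism, journal, seq)
            if record is not None:
                yield record
            organism, journal, seq, section = "", [], [], None
        elif section == "ORIGIN":
            # sequence lines: a position number followed by blocks of 10 bases
            seq.append("".join(line.split()[1:]))
        else:
            key = line[:12].strip()  # keywords occupy the first 12 columns
            if key == "ORGANISM":
                organism, section = line[12:].strip(), None
            elif key == "JOURNAL":  # records are assumed to carry one patent reference
                journal, section = [line[12:].strip()], "JOURNAL"
            elif key == "ORIGIN":
                section = "ORIGIN"
            elif key:
                section = None  # any other keyword closes the JOURNAL block
            elif section == "JOURNAL":
                journal.append(line.strip())


def parse_gbpat_file(path: str) -> Iterator[PatentSequence]:
    """Stream records from a gzipped file such as gbpat1.seq.gz."""
    with gzip.open(path, "rt") as fh:
        yield from parse_gbpat(fh)


# Invented record for illustration only (not real GenBank data).
SAMPLE = """\
LOCUS       XX000001                  30 bp    DNA     linear   PAT 05-FEB-2004
  ORGANISM  Pyrococcus furiosus
            Archaea; Euryarchaeota; Thermococci; Thermococcales.
REFERENCE   1  (bases 1 to 30)
  JOURNAL   Patent: WO 2004012345-A2 5 05-FEB-2004;
            EXAMPLE CORP (US)
ORIGIN
        1 atgaaagcat gcatgcatgc atgcattaag
//
"""

if __name__ == "__main__":
    for rec in parse_gbpat(io.StringIO(SAMPLE)):
        print(rec.office, rec.number, rec.year, rec.organism, rec.applicant, len(rec.sequence))
```

Streaming with a generator keeps memory flat across the 254 files. Biopython's `SeqIO.parse(handle, "genbank")` is a more complete alternative if an external dependency is acceptable.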

As of November 2022, the GenBank patent division included sequences from a total of 14,708 different species. To determine the subset of “marine species” within the database, we applied the taxon match tool of the World Register of Marine Species (WoRMS) to all database entries, resulting in a filtered list of 4,000 species. Web searches were conducted for each of these species to verify marine origin and to collect further information about the nature of each species. More than half of the matched species were subsequently excluded as not unique to marine environments, resulting in a final list of 1,474 marine species, which was used to select patent records associated with marine species. See Blasiak et al. 2018 for details of how marine origin was determined and the filtering criteria.
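Once the verified list of marine species exists, selecting patent records reduces to a name lookup. The sketch below assumes records are dictionaries with an `organism` key and that names only need whitespace and case normalisation before matching; the WoRMS taxon-match step and manual verification are not reproduced, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so 'Aequorea  Victoria' matches 'aequorea victoria'."""
    return " ".join(name.split()).lower()


def select_marine(records: Iterable[Dict[str, str]],
                  marine_species: Iterable[str]) -> List[Dict[str, str]]:
    """Keep patent records whose organism is on the verified marine species list."""
    marine = {normalise(sp) for sp in marine_species}  # set for O(1) lookups
    return [rec for rec in records if normalise(rec["organism"]) in marine]


if __name__ == "__main__":
    records = [
        {"organism": "Pyrococcus furiosus", "patent": "WO-1"},
        {"organism": "Homo sapiens", "patent": "US-2"},
    ]
    print(select_marine(records, ["Pyrococcus furiosus"]))
```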

The taxonomy (domain and phylum) of 879 marine species was retrieved from the WoRMS database. Where these taxonomic levels were not available, we obtained species taxonomy from the NCBI Taxonomy database (https://www.ncbi.nlm.nih.gov/taxonomy) and Wikipedia (https://en.wikipedia.org/wiki/) (220 and 356 species, respectively). We could not assign 19 of the marine species (predominantly marine bacterial strains) to taxonomic groups because of uncertainty in organism names. The complete list of marine species selected for this study is given in Table S1.
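The fallback described above can be expressed as a priority lookup that also records which source answered. The reported counts account for every species on the marine list (879 + 220 + 356 + 19 = 1,474). In this sketch the sources are in-memory dictionaries standing in for the real databases, the NCBI-before-Wikipedia order is our assumption, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Taxonomy = Tuple[str, str]                # (domain, phylum)
Source = Tuple[str, Dict[str, Taxonomy]]  # (source name, species -> taxonomy)


def resolve_taxonomy(species: List[str], sources: List[Source]
                     ) -> Tuple[Dict[str, Optional[Taxonomy]], Counter]:
    """Assign each species the taxonomy from the first source that knows it."""
    resolved: Dict[str, Optional[Taxonomy]] = {}
    used: Counter = Counter()
    for sp in species:
        for source_name, table in sources:
            if sp in table:
                resolved[sp] = table[sp]
                used[source_name] += 1
                break
        else:  # no source could place this species
            resolved[sp] = None
            used["unmatched"] += 1
    return resolved, used
```

The per-source counter is what lets the totals be checked against the size of the species list.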

MABPAT construction

Marine biotechnology pipelines usually focus on the search for biological compounds that provide new functionality (Rotter et al., 2021). DNA contains two types of nucleotide sequences: protein-coding sequences, which code for proteins involved in all cell functions, and non-coding sequences, which may or may not have a functional role, for example in genome regulation. Except for short peptides such as cone snail peptide toxins (Terlau and Olivera, 2004), most natural products are derived from proteins, which are polypeptide chains of a certain length. The minimum polypeptide length needed to form a protein is still debated; current estimates range from 50 (Woolfson, Baker, and Bartlett, 2017) to 100 (Brunet, Leblanc, and Roucou, 2020) amino acids, corresponding to 150 to 300 DNA base pairs.
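Because each amino acid is encoded by one three-base codon, the protein-length estimates convert to coding-sequence lengths as follows (ignoring the stop codon):

```latex
L_{\mathrm{bp}} = 3\,L_{\mathrm{aa}}, \qquad
3 \times 50 = 150~\mathrm{bp}, \qquad
3 \times 100 = 300~\mathrm{bp}
```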

Another important metric widely used in molecular biology and genomics to analyse variation in genome composition is nucleotide usage, normally calculated as GC content: the percentage of bases in a sequence that are guanine (G) or cytosine (C). G–C pairs are joined by three hydrogen bonds rather than the two of A–T pairs and therefore bind more strongly in DNA. Modern genetic engineering techniques such as CRISPR (Zhang et al., 2014) have proven very useful for enhancing important functions of proteins by altering DNA makeup. This could involve changing individual nucleotides or introducing short sequences that control gene regulation and protein synthesis. Because such edits alter only a small part of the sequence, the GC content of modified sequences encoding proteins with similar functionality remains largely the same. Short DNA sequences, below the minimum length required to encode a protein, have various functions, including amplifying a specific gene sequence (as PCR primers), and usually span a wide range of GC content.
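A short sketch of GC content as defined above, together with a bound on how much a small edit can move it. Excluding ambiguous bases (such as N) from the denominator is our assumption. A single substitution in a sequence of length n changes GC content by at most 100/n percentage points, which is why small engineered edits leave it largely unchanged.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A, C, G or T")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def max_gc_shift(length: int, n_substitutions: int = 1) -> float:
    """Largest possible change in GC content (percentage points) from n substitutions."""
    return 100.0 * n_substitutions / length


if __name__ == "__main__":
    gene = "ATGC" * 75          # hypothetical 300-bp sequence, 50% GC
    edited = "G" + gene[1:]     # one A -> G substitution
    print(gc_content(gene), gc_content(edited), max_gc_shift(len(gene)))
```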

To predict whether genetic sequences are protein-coding, we applied two filtering criteria: a sequence length threshold and the presence of an open reading frame (ORF), a sequence region that can potentially be transcribed into RNA and subsequently translated into protein. Sequences with an ORF longer than 150 base pairs were considered protein-coding. As most natural products are derived from proteins, we reasoned that a patent application must include at least one protein-coding sequence to be related to marine bioprospecting. Accordingly, we selected 12,716 protein-coding sequences, together with 7,357 non-coding sequences associated with marine species that had been submitted as part of the same applications.
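The two criteria can be sketched as a six-frame ORF scan followed by an application-level filter. Assumptions not stated in the text: an ORF runs from an ATG start codon to the first in-frame stop codon (TAA, TAG, TGA), its length includes the stop codon, and fragments without a stop codon are ignored. Other ORF definitions (for example stop-to-stop) would give different counts.

```python
from typing import Iterable, List, Set, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_BP = 150  # threshold from the text: ORF longer than 150 bp


def revcomp(seq: str) -> str:
    """Reverse complement; non-ACGT characters are left as they are."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq: str) -> int:
    """Length (bp, stop codon included) of the longest ATG...stop ORF in six frames."""
    best = 0
    for strand in (seq.upper(), revcomp(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def is_protein_coding(seq: str) -> bool:
    return longest_orf(seq) > MIN_ORF_BP


def select_applications(seqs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, bool]]:
    """Keep all sequences from applications that contain >= 1 protein-coding sequence.

    seqs: (application_id, sequence) pairs for marine species.
    """
    flagged = [(app, s, is_protein_coding(s)) for app, s in seqs]
    coding_apps: Set[str] = {app for app, _, coding in flagged if coding}
    return [row for row in flagged if row[0] in coding_apps]
```

Non-coding sequences are kept only when they travel with a coding sequence in the same application, mirroring the selection rule in the text.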

For all companies that registered patents associated with MGR, we counted the total number of nucleotide sequences and calculated the average sequence length (Figure 2). Using the minimum protein length estimate, we then determined the number of protein-coding and non-coding sequences for each company. In each category, for the 10 companies with the highest counts of genetic sequences attached to patent claims, we calculated the length and DNA composition (GC content) of each sequence and plotted them colored by species of origin (Figure S2).
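A minimal sketch of the per-company tallies, assuming each row is a (company, sequence, is_coding) tuple produced by the earlier steps; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, bool]  # (company, nucleotide sequence, is_protein_coding)


def company_summary(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Sequence counts, mean length, and coding/non-coding split per company."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    coding: Dict[str, int] = defaultdict(int)
    for company, seq, is_coding in rows:
        lengths[company].append(len(seq))
        coding[company] += int(is_coding)
    return {
        company: {
            "n_sequences": len(lens),
            "mean_length": mean(lens),
            "n_coding": coding[company],
            "n_noncoding": len(lens) - coding[company],
        }
        for company, lens in lengths.items()
    }


def top_companies(summary: Dict[str, Dict[str, float]], n: int = 10) -> List[str]:
    """Companies with the most sequences, largest first."""
    return sorted(summary, key=lambda c: summary[c]["n_sequences"], reverse=True)[:n]
```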

For each sequence included both in patent applications submitted in national jurisdictions and in “international” patent applications (hereafter, sequences of special commercial interest), we collected the description of the invention and, where a nucleotide sequence search (BlastX) returned a significant match to a protein with annotated function, the protein function. Web searches were then conducted for each of these proteins to collect further information about protein function and potential applications. The resulting information about the sequences of special commercial interest is available in Table S3.

For patents owned by subsidiaries, the applicant name was replaced with that of the ultimate owner, as stated in the Orbis company database, which contains information on around 400 million companies worldwide (Orbis; https://orbis.bvdinfo.com/). For jointly owned patents, ownership was assigned to the first company listed. After filtering, removing duplicate entity names, and aggregating subsidiaries, we identified a total of 588 applicants and used web searches to collect information about each, including the country where it is headquartered and the type of entity it represents. Our classification resulted in five major entity types: multinational companies (present in more than two countries), national companies, universities and their commercialization centers, governmental agencies, and “other” (predominantly applications submitted by private individuals).
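The ownership and classification rules can be sketched as two small functions. The subsidiary-to-parent mapping stands in for an Orbis export (its format here is our assumption, not Orbis's own), and the entity descriptions passed to `entity_type` are hypothetical labels gathered from web searches.

```python
from typing import Dict, List


def normalise(name: str) -> str:
    """Upper-case and collapse whitespace so name variants compare equal."""
    return " ".join(name.split()).upper()


def ultimate_owner(applicants: List[str], parent_of: Dict[str, str]) -> str:
    """Owner of a patent: first listed applicant, followed up its ownership chain.

    parent_of is a hypothetical {subsidiary: parent} mapping with normalised keys,
    standing in for an Orbis export.
    """
    name = normalise(applicants[0])  # joint patents: first company on the list
    seen = set()
    while name in parent_of and name not in seen:  # guard against cycles
        seen.add(name)
        name = parent_of[name]
    return name


def entity_type(kind: str, n_countries: int = 1) -> str:
    """Map a looked-up description to the five classes used in the text."""
    if kind == "company":
        return "multinational company" if n_countries > 2 else "national company"
    if kind in {"university", "commercialization center"}:
        return "university"
    if kind == "government":
        return "governmental agency"
    return "other"  # e.g. private individuals
```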

Each record in the resulting MABPAT database includes: (1) the genetic sequence data, (2) whether the sequence contains protein-coding information, (3) the marine species name, (4) the date of deposition in the INSDC, (5) whether the species can be classified as a “deep-sea” species, (6) the patent application record number, (7) the patent bureau, (8) the applicant name, (9) the type of entity, and (10) the country where the applicant is headquartered.
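One way to represent a MABPAT record in code, with one field per item in the list above. Field names and types are our assumptions for illustration, not the published database's column names.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class MabpatRecord:
    sequence: str            # (1) nucleotide sequence
    protein_coding: bool     # (2) contains an ORF longer than 150 bp
    species: str             # (3) marine species name
    insdc_date: str          # (4) date of deposition in the INSDC (ISO 8601 assumed)
    deep_sea: bool           # (5) species classified as "deep-sea"
    application_number: str  # (6) patent application record number
    patent_office: str       # (7) patent bureau, e.g. "EP" or "WO"
    applicant: str           # (8) applicant (ultimate owner) name
    entity_type: str         # (9) one of the five entity classes
    hq_country: str          # (10) country where the applicant is headquartered


if __name__ == "__main__":
    example = MabpatRecord("ATGAAATAA", False, "Example species", "2004-02-05", True,
                           "WO2004012345", "WO", "EXAMPLE CORP", "multinational company", "US")
    print([f.name for f in fields(MabpatRecord)])
    print(asdict(example))
```

A frozen dataclass keeps records immutable once assembled, and `asdict` gives a plain mapping that can be written to CSV or JSON.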

Deep-sea presence of marine species

Species presence in deep-sea habitats was determined from multiple sources. For species in the domain Eukarya, we used the World Register of Deep-Sea species (WoRDSS; Glover et al. 2022), a taxonomic database of deep-sea species. As Bacteria and Archaea are not included in WoRDSS, we searched the PubMed (https://pubmed.ncbi.nlm.nih.gov/) and Integrated Microbial Genomes and Microbiomes (https://img.jgi.doe.gov/) databases to establish their potential presence in deep-sea habitats. Species whose samples were collected from deep-sea environments and had already been associated with international patent applications (Blasiak et al. 2018) were also marked as “deep-sea” species.
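The source rules for the deep-sea flag, written as one function. The three sets are assumed to have been compiled beforehand (WoRDSS names for eukaryotes, literature evidence from PubMed and IMG/M for Bacteria and Archaea, and species already flagged by Blasiak et al. 2018); species names in the example are placeholders.

```python
from typing import Set


def is_deep_sea(species: str, domain: str, wordss: Set[str],
                microbial_evidence: Set[str], previously_flagged: Set[str]) -> bool:
    """Apply the source rules from the text to one species."""
    if species in previously_flagged:  # deep-sea samples already linked to patents
        return True
    if domain == "Eukarya":
        return species in wordss  # WoRDSS covers eukaryotes only
    if domain in {"Bacteria", "Archaea"}:
        return species in microbial_evidence  # PubMed / IMG/M evidence compiled by hand
    return False  # unknown domain: no source to consult
```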

BlastX sequence similarity model

Sequence similarity searches are widely used to identify newly sequenced data or unknown species (Pearson, 2013). We queried 7,467,396 sequences of unknown taxonomic origin (tagged ‘unknown’, ‘unidentified’, or ‘synthetic construct’), 62.7% of all GenBank records, using BlastX searches (translated nucleotide query versus protein database) against the database of annotated protein sequences UniProtKB/Swiss-Prot (UniProt Consortium 2023). With an E-value of at most 10^-5, query coverage of at least 80%, and hit identity of at least 99%, BlastX searches showed that 24% of these sequences could be identified to genus level with at least 95% confidence (correct hits) (Figure S4A). We also tested whether correct hits and searches with confidence below 95% tended to be concentrated in particular patent applications, applicants, or patent systems, and found no major preference (Figure S4B–D). Finally, we compared summary statistics (number of sequences, number of patents, and median year of application) for the 10 largest patent applicants referencing sequences with disclosed marine origin and for the 10 largest applicants referencing sequences with predicted marine origin (Figure S1 and Figure S5, respectively), and found them to be similar.
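The thresholds above can be applied to BLAST+ tabular output in a few lines. The sketch assumes blastx was run with `-outfmt "6 qseqid sseqid pident qcovs evalue"`, so each tab-separated row has those five columns in that order, and that rows for a query are listed best hit first, as BLAST+ normally writes them. Mapping the accepted Swiss-Prot subject to a genus is not shown.

```python
import csv
import io
from typing import Dict, Iterable

# ASSUMED column order, matching -outfmt "6 qseqid sseqid pident qcovs evalue"
COLUMNS = ("qseqid", "sseqid", "pident", "qcovs", "evalue")


def passing_hits(lines: Iterable[str], max_evalue: float = 1e-5,
                 min_qcov: float = 80.0, min_pident: float = 99.0) -> Dict[str, str]:
    """Map each query to its first hit meeting all three thresholds."""
    best: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        if (float(hit["evalue"]) <= max_evalue
                and float(hit["qcovs"]) >= min_qcov
                and float(hit["pident"]) >= min_pident):
            best.setdefault(hit["qseqid"], hit["sseqid"])  # keep the first (best) passing hit
    return best


def recovered_fraction(hits: Dict[str, str], n_queries: int) -> float:
    """Share of queried sequences with an accepted hit."""
    return len(hits) / n_queries


if __name__ == "__main__":
    demo = io.StringIO(
        "q1\tsp|P00001|X\t99.5\t90\t1e-30\n"
        "q2\tsp|P00002|Y\t98.0\t100\t1e-50\n"
        "q3\tsp|P00003|Z\t100\t50\t1e-40\n"
    )
    hits = passing_hits(demo)
    print(hits, recovered_fraction(hits, 3))
```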

Patent share estimation

The total number of sequences associated with marine species was estimated for each company from the number of records that included a species name with disclosed marine origin, together with the newly recovered records. Predictions were made based on a linear regression model:

$$X_{\mathrm{predicted}} = X_{\mathrm{observed}} + \frac{X_{\mathrm{recovered}}}{\mathrm{FDL}_{\mathrm{TP}>95\%}}$$

where X_observed is the number of unique genetic sequences referenced in patent applications with the species name specified, X_recovered is the number of unique genetic sequences referenced in patent applications for recovered data, and FDL_TP>95% is the fraction of data loss given the chosen set of BlastX search parameters that yielded a 95% positive recovery rate (Figure S4). Following parameterization of our model, we found that FDL_TP>95% was 0.24.

To predict the total number of patent applications and species referenced, for each company we randomly selected X_recovered/FDL_TP>95% sequences to account for searches that did not result in correct hits. We then estimated how many unique patent application numbers and unique species were referenced in the total number of predicted sequences (X_predicted). To obtain the mean value for each company, the selection was repeated 100 times. The resulting average estimates are shown in Figure S6.
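The estimator and the resampling step can be sketched as follows. The formula X_predicted = X_observed + X_recovered / FDL_TP>95% follows the definitions above; the sampling pool is not specified in the text, so this sketch assumes (hypothetically) that each company has a list of (patent, species) pairs to draw from and that draws are made with replacement. A fixed seed keeps the 100 repetitions reproducible.

```python
import random
from statistics import mean
from typing import Sequence, Tuple

FDL_TP95 = 0.24  # value reported in the text


def predicted_total(x_observed: int, x_recovered: int, fdl: float = FDL_TP95) -> float:
    """X_predicted = X_observed + X_recovered / FDL_TP>95%."""
    return x_observed + x_recovered / fdl


def resample_counts(pool: Sequence[Tuple[str, str]], x_recovered: int,
                    fdl: float = FDL_TP95, reps: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Mean numbers of unique patents and species over `reps` random draws.

    ASSUMPTION: `pool` is a hypothetical list of (patent_id, species) pairs for one
    company, and round(x_recovered / fdl) items are drawn with replacement.
    """
    rng = random.Random(seed)
    n_draw = round(x_recovered / fdl)
    patents, species = [], []
    for _ in range(reps):
        draw = rng.choices(pool, k=n_draw)
        patents.append(len({patent for patent, _ in draw}))
        species.append(len({sp for _, sp in draw}))
    return mean(patents), mean(species)


if __name__ == "__main__":
    print(predicted_total(100, 24))  # 100 + 24 / 0.24, i.e. about 200
    pool = [("WO-1", "species A"), ("WO-1", "species B"), ("EP-2", "species A")]
    print(resample_counts(pool, x_recovered=3))
```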

Hydrothermal vents presence and ABNJ-unique species counts

The locations of hydrothermal vents were obtained from the InterRidge Vents Database (Beaulieu and Szafranski, 2022). The World High Seas maritime boundaries map (World_High_Seas_v1_20200826/High_Seas_v1.shp) was downloaded from Marine Regions (https://marineregions.org/). Each set of hydrothermal vent coordinates was checked for presence within any of the High Seas polygons. Spatial vector data were analysed with the R package sf version 1.0-9 (Pebesma, 2018).
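The point-in-polygon test itself was done with the R package sf. As an illustration only, the sketch below swaps in a plain even-odd ray-casting test in Python on longitude/latitude pairs; it treats coordinates as planar, ignores polygon holes and the antimeridian, and does not read the shapefile, all of which sf handles (for example via `st_intersects`).

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting against one polygon ring (no holes)."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


def vents_in_high_seas(vents: Iterable[Tuple[str, float, float]],
                       rings: List[Sequence[Point]]) -> List[str]:
    """Names of vents (name, lon, lat) falling inside any High Seas ring."""
    return [name for name, lon, lat in vents
            if any(point_in_polygon((lon, lat), ring) for ring in rings)]
```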

To establish the list of species present only in ABNJ, we used species geographical occurrence data from the Ocean Biodiversity Information System (OBIS). We first retrieved all 28,375 species with at least one occurrence record in ABNJ (https://obis.org/area/1). For each of these species, we checked whether it had also been observed in the territorial waters of any country; species with at least one such occurrence record were excluded. Data were obtained from the OBIS database (2022) using the R packages robis version 2.11.0 (https://zenodo.org/record/6969395) and parallel version 3.6.2 (https://rdocumentation.org/packages/parallel/versions/3.6.2).
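The ABNJ-only filter is a set difference. The sketch assumes occurrence records have already been downloaded (the authors used robis) and grouped into a set of species per national area; area names in the example are placeholders.

```python
from typing import Dict, Iterable, Set


def abnj_unique(abnj_species: Iterable[str], national_records: Dict[str, Set[str]]) -> Set[str]:
    """Species recorded in ABNJ that have no occurrence record in any national area.

    national_records is a hypothetical {area_name: set_of_species} mapping.
    """
    seen_nationally: Set[str] = set().union(*national_records.values())
    return set(abnj_species) - seen_nationally
```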

References for Methods

1. Arnaud-Haond, S., Arrieta, J. M., and Duarte, C. M. Marine Biodiversity and Gene Patents. Science, 331, 1521–1522 (2011).

2. Blasiak, R., et al. Corporate control and global governance of marine genetic resources. Science Advances, 4 (2018).

3. Rotter, A., et al. The Essentials of Marine Biotechnology. Front Mar Sci, 8 (2021).

4. Terlau, H., and Olivera, B. M. Conus Venoms: A Rich Source of Novel Ion Channel-Targeted Peptides. Physiological Reviews, 84, 41–68 (2004).

5. Woolfson, D. N., et al. How do miniproteins fold? Science, 357, 133–134 (2017).

6. Brunet, M. A., Leblanc, S., and Roucou, X. Reconsidering proteomic diversity with functional investigation of small ORFs and alternative ORFs. Experimental Cell Research, 393, 112057 (2020).

7. Zhang, F., Wen, Y., and Guo, X. CRISPR/Cas9 for genome editing: progress, implications and challenges. Human Molecular Genetics, 23, 40–46 (2014).

8. The UniProt Consortium et al. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, 51, 523–531 (2023).

9. Glover, A. G., Higgs, N., and Horton, T. World Register of Deep-Sea species (WoRDSS) (2023). Accessed at https://www.marinespecies.org/deepsea on 2022-11-15. doi:10.14284/352

10. Jefferson, O. A., et al. Public disclosure of biological sequences in global patent practice. World Patent Information, 43, 12–24 (2015).

11. Beaulieu, S. E., and Szafranski, K. InterRidge Global Database of Active Submarine Hydrothermal Vent Fields, Version 3.4. Available from http://vents-data.interridge.org (2020). Accessed 2022-11-15.

12. Pebesma, E. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10, 439 (2018).

13. Pearson, W. R. An Introduction to Sequence Similarity (“Homology”) Searching. Curr Protoc Bioinformatics, Chapter 3: 3.1.1–3.1.8 (2013).

The authors declare no competing interests.
