Global live vertebrate trade database (GLVTD)
Extracting data on live wildlife trade from databases. We extracted data on the trade in live wildlife from the CITES Trade Database, the International Species Information System (ISIS) and the LEMIS metadata. The CITES Trade Database (https://trade.cites.org/, last visited on 1 August 2022) is developed and maintained by the UNEP World Conservation Monitoring Centre (UNEP-WCMC) on behalf of the CITES Secretariat. This database includes more than 25 million records of international trade in CITES-listed species reported by CITES parties (1975–2021), covering both legal trade and illegal trade (records of seized specimens). ISIS is a network of 837 zoos and aquaria that shares information on 2.5 million animals of more than 10,000 species among member institutions 28. The ISIS database compiled by Conde et al. 28 holds the most comprehensive information on animals kept in zoos across the world in 2011. The LEMIS metadata are based on the United States Fish and Wildlife Service’s (USFWS) Law Enforcement Management Information System (LEMIS) data (2000–2014), derived from legally mandated reports submitted to the USFWS and containing 5,207,420 entries on US imports of both live organisms and wildlife (animal and plant) products. The LEMIS data were curated, cleaned, and compiled into the LEMIS metadata by the EcoHealth Alliance to improve data usability 29. We obtained the scientific name, class, family, import country, export country, and year of each transaction from the CITES Trade Database (version 2022.1) for 1975–2021 (term “live”) and from LEMIS for 2000–2014 (term “LIV”). Not all animals in zoos are sourced from trade, and threatened species (categorized as vulnerable, endangered, or critically endangered) in zoos are bred for ex-situ conservation or conservation campaigns 28. We therefore collected the scientific name, class, and family of animals kept in zoos, and the countries where they were kept, from ISIS, excluding threatened species. We performed quality control by excluding duplicated records, records with the same importer and exporter country, and records with no scientific name or referring to unidentified or hybrid species.
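For illustration, a minimal sketch of this quality-control step is given below, assuming the CITES and LEMIS extracts have been loaded into a pandas DataFrame; the column and field names (scientific_name, importer, exporter, term, description) are our assumptions for readability, not the databases’ exact field names.

```python
import pandas as pd

def quality_control(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the quality-control filters described above to a trade table.

    Assumed (illustrative) columns: 'scientific_name', 'importer', 'exporter'.
    Adjust the names to the actual fields of the CITES/LEMIS extracts.
    """
    out = df.drop_duplicates()                        # remove duplicated lines
    out = out[out["importer"] != out["exporter"]]     # same importer and exporter country
    out = out[out["scientific_name"].notna()]         # no scientific name
    # unidentified records (e.g. a genus followed by 'sp.') and hybrids
    mask = out["scientific_name"].str.contains(r"\bsp\.|hybrid", case=False, regex=True)
    return out[~mask]

# Example usage with the live-trade terms used for each database (paths are placeholders):
# cites = pd.read_csv("cites_trade.csv")
# cites_live = quality_control(cites[cites["term"] == "live"])
# lemis = pd.read_csv("lemis_metadata.csv")
# lemis_live = quality_control(lemis[lemis["description"] == "LIV"])
```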
Data on contemporary online trade of wildlife. We searched for websites trading live wildlife as pets and for other uses, and crawled data on listings (advertisements and posts) on these websites. We then built keys for species names and used them to extract species names and countries from the crawled data.
Searching for the websites of live wildlife trade. We searched Google for websites trading live wildlife using the search phrases “taxon (each group of mammals, birds, reptiles and amphibians) + for sale + country name” for each of 193 countries from March to May 2022 (Tables S8–S11). We performed this search consistently for all countries using the phrases in English. For each country, we additionally searched in up to three other widely spoken languages (official or national languages; quickgs.com), translating the phrases with Google Translate. In total, we used 1,414 phrases in 69 languages (Tables S8–S11). To choose a cutoff point that balances the quality of search results against search effort 30, we browsed each website in the URLs returned by a search phrase (in English) in 10 randomly selected countries in Europe and Asia. This browsing revealed that when 20 consecutive websites in a list of returned URLs showed no listings (advertisements or posts) of exotic pets or live wildlife, additional browsing was unlikely to find further relevant websites in the rest of the list. We therefore used this cutoff point in all searches; the stopping rule is sketched below. In total, we browsed 95,965 websites across 193 countries and identified 1,463 websites of live wildlife trade in 177 countries. These websites used 47 languages: approximately 55% (799) were in English (Table S12), 44% used other languages, and 1% used a mix of two languages.
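Purely to make the stopping rule explicit (browsing itself was a manual step), the cutoff can be expressed as follows; is_relevant stands in for the manual inspection of each website and is a placeholder, not an actual function we ran.

```python
def browse_with_cutoff(urls, is_relevant, cutoff=20):
    """Browse search results in order and stop after `cutoff` consecutive
    URLs without listings of exotic pets or live wildlife."""
    relevant, consecutive_misses = [], 0
    for url in urls:
        if is_relevant(url):          # manual judgement in practice
            relevant.append(url)
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses >= cutoff:
                break
    return relevant
```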
Scraping online data. Using Web Scraper on the Chrome browser (https://www.webscraper.io/), we scraped data on the title, contents, scientific name and price of pets, locality (city in a country), posting date of listings, and URLs for all pages stored on each website in June–August 2022. Web Scraper is a web scraping tool with advanced features for extracting targeted information from websites: it can scrape multiple pages, extract multiple data types (text, images, URLs, and more), handle dynamic pages (JavaScript + AJAX, infinite scroll), and browse scraped data, among other functions. We created a sitemap for each website to be crawled and pasted the URL root (webpage 1) of the website into the Start URL field of this sitemap. We then created a loop through the web pages, repeatedly moving to the next page, by setting up a new column for this function: we clicked on ‘Add new selector’; under the root window, we entered a name for the column in the ID box and selected ‘Pagination (Beta)’ in the Type box; we clicked on ‘Select’ in the Selectors box and then on the paging button (Next or 2) in the webpage; we selected both root and the name of this column in the Parent selectors box and saved these settings to finish the pagination setup. We then gave a name to the column of listings, selected ‘Element’ in the Type box (or ‘Element scroll down’ for websites with scrolling listings), clicked on ‘Select’ in the Selectors box, and clicked on two listings in the webpage (the scraper then automatically selected the others with the same structure). We checked that all listings were selected (highlighted in red) by clicking on the Element preview button; we additionally clicked on any listings not selected because of a different structure. We then saved the settings to finish the selection of listings. We performed data scraping as follows:
Cycle. For websites whose listing pages contained all the data to be crawled, we simply entered a name for the item to be crawled (e.g., title or price) in the ID box, selected ‘Text’ in the Type box, clicked on ‘Select’ in the Selectors box, selected the corresponding text in a listing on the webpage, and saved the settings.
Crawls. For websites whose pages (paginated or not) showed only part of the information, with the rest contained at different levels of subordinate linked pages, we entered a name for the item linking to that information in the ID box, selected ‘Link’ in the Type box, and selected the item in the webpage. The name of this item then appears in the Parent selectors box. In the root window, we clicked on the name, which opened the linked page, then set a new name in the ID box and selected the item to be crawled. For deeper links within a website, we repeated the same procedure.
After all settings were finished, we clicked on the sitemap file in the toolbar, then on ‘Scrape’ to open a configuration table, and then on ‘Start scraping’ with the default settings (request interval and page load delay of 2,000 ms) to run the program. Once the run was complete, we downloaded the scraped data as an XLSX file. We crawled all websites of wildlife trade except those displaying listings as PDF files; in this case, we downloaded the PDF files directly and converted them to text (see the post-processing sketch below).
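The sketch below illustrates how the exported XLSX files and the PDF-only listings could be turned into tables and plain text for the keyword matching described in the next subsection. It assumes pandas (with openpyxl for XLSX) and pdfplumber; these libraries and the file paths are our choices for illustration, not necessarily the tools used for the conversion.

```python
import pandas as pd
import pdfplumber  # one possible PDF-to-text library, assumed here for illustration

def load_scraped_xlsx(path: str) -> pd.DataFrame:
    """Load one exported Web Scraper result (title, contents, scientific name,
    price, locality, posting date, URL) into a DataFrame."""
    return pd.read_excel(path)

def pdf_listings_to_text(path: str) -> str:
    """Extract raw text from a website whose listings are published as PDF."""
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

# Example (placeholder paths):
# listings = load_scraped_xlsx("site_0001.xlsx")
# pdf_text = pdf_listings_to_text("site_0002_listings.pdf")
```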
Data on keys. We gathered keys from different databases. We downloaded scientific names, synonyms and common names in different languages for mammals, birds, reptiles and amphibians from the IUCN Red List and from taxonomic websites (mammaldiversity.org; avibase.bsc-eoc.org; reptile-database.reptarium.cz; amphibiaweb.org, last visited on 17 September 2022). We also obtained trade names of species in English, French and Spanish from the CITES Trade Database (version 2022.1), and specific names of species in English from the LEMIS metadata. In total, we obtained 484,470 species names, including 47,041 names for mammals, 304,246 for birds, 93,401 for reptiles and 39,782 for amphibians.
Extracting species names from crawled data. We extracted string keys for species names from the titles, contents or scientific names in the crawled data using the LOOKUP function combined with the FIND function in Excel 2016, as follows 31:
=LOOKUP(1, 0/FIND($X$i:$X$j, Yi), $X$i:$X$j)    (1)
where X is the column of keys to be looked up, with i and j indicating the rows of the range in which the keys are located, and Yi is the cell (title, contents or scientific name) in which we searched for keys. The column X was sorted in ascending order by the number of characters in each string using the LEN function, so that LOOKUP, which returns the last match in the range, returns the longest matching key. Because the FIND function is case-sensitive, we converted both the keys and the crawled data (titles, contents or scientific names) to lowercase with the LOWER function before extraction. We then matched the extracted keys to scientific names in the key database using the VLOOKUP function:
=VLOOKUP(Xi, Y:Z, 2, 0)    (2)
where Xi is the cell containing the key returned by formula (1), Y is the column containing synonyms, common names, trade names or specific names, and Z is the adjacent column containing the corresponding scientific names.
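To make the logic of formulas (1) and (2) easier to follow, a minimal script-language equivalent is sketched below; it reproduces the case-insensitive longest-match behaviour of formula (1) and the key-to-scientific-name mapping of formula (2). The function and variable names are illustrative, not the spreadsheet implementation we used.

```python
def extract_key(text: str, keys: list[str]) -> str | None:
    """Return the longest key (species-name variant) found in `text`,
    mirroring the case-insensitive LOOKUP/FIND combination in formula (1)."""
    text = text.lower()
    matches = [k for k in keys if k.lower() in text]
    return max(matches, key=len) if matches else None

def to_scientific_name(key: str, key_table: dict[str, str]) -> str | None:
    """Map an extracted key to its scientific name, as VLOOKUP does in formula (2).
    `key_table` maps synonyms/common/trade names to scientific names."""
    return key_table.get(key)

# Example:
# key_table = {"ball python": "Python regius", "python regius": "Python regius"}
# key = extract_key("Ball python for sale, captive bred 2022", list(key_table))
# species = to_scientific_name(key, key_table)   # -> "Python regius"
```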
Publications on historical online trade and physical markets. We searched Google Scholar and Baidu Scholar intensively for publications using the search phrases (taxon name + for sale + country), following the website-search method described above. We reviewed the title and abstract of each publication retrieved and excluded studies based solely on data from the CITES Trade Database, LEMIS or ISIS. In total, we downloaded 110 publications (Table S13), including studies on online trade, physical stores or markets, and zoos, studies covering both online trade and physical markets, and studies of wildlife trade databases 14. These studies included surveys of legal trade, illegal trade, or both. We extracted records of the species and countries involved in live wildlife trade from these publications.