Samples. Fecal samples from 11 tortoises belonging to 8 genera and 9 species were obtained between November 2020 and March 2022 (Table S1). All samples originated from animals kept at the Oklahoma City Zoo (Oklahoma City, Oklahoma, USA), except one Sulcate (African Spurred) tortoise sample (Centrochelys sulcate), which was obtained from a local farm near Walters, OK, USA (34°28'43.3"N 98°13'33.0"W). Specimen collection from wild tortoise populations is exceedingly difficult, since many of the sampled tortoises spp. are critically endangered, e.g., ploughshare tortoise (36), and/or have a very limited geographic range (Table S1). Freshly deposited samples were placed in 15- or 50-mL conical centrifuge tubes and transferred on ice to the laboratory, where they were stored at -20ºC.
Amplicon-based diversity surveys. DNA extraction from fecal samples was conducted using DNeasy Plant Pro Kit (Qiagen Corp., Germantown, MD, USA) according to the manufacturer’s instructions and as previously described (27). PCR amplification targeting the D2 region of the LSU rRNA utilized the DreamTaq Green PCR Master Mix (ThermoFisher, Waltham, Massachusetts), and AGF-specific primers AGF-LSU-EnvS For: 5’-GCGTTTRRCACCASTGTTGTT-3’, and AGF-LSU-EnvS Rev: 5’-GTCAACATCCTAAGYGTAGGTA-3’ (37). The primers target a ~ 370 bp region of the LSU rRNA gene (corresponding to the D2 domain), hence allowing for high throughput sequencing using the Illumina MiSeq platform. Primers were modified to include the Illumina overhang adaptors. PCR reactions contained 2 µl of DNA, 25 µl of the DreamTaq 2X Master Mix (Life Technologies, Carlsbad, California, USA), 2 µl of each primer (10 µM) in a 50 µl reaction mix. The PCR protocol consisted of an initial denaturation for 5 min at 95°C followed by 40 cycles of denaturation at 95°C for 1 min, annealing at 55°C for 1 min and elongation at 72°C for 1 min, and a final extension of 72°C for 10 min. PCR products were individually cleaned to remove unannealed primers using PureLink® gel extraction kit (Life Technologies), and the clean product was used in a second PCR reaction to attach the dual indices and Illumina sequencing adapters using Nexterra XT index kit v2 (Illumina Inc., San Diego, California). These second PCR products were then cleaned using PureLink® gel extraction kit (Life Technologies, Carlsbad, California), individually quantified using Qubit® (Life Technologies, Carlsbad, California), and pooled using the Illumina library pooling calculator (https://support.illumina.com/help/pooling-calculator/pooling-calculator.htm) to prepare 4–5 nM libraries. Pooled libraries were sequenced at the University of Oklahoma Clinical Genomics Facility using the MiSeq platform.
Sequence data analysis. Forward and reverse Illumina reads were assembled using make.contigs command in mothur (38), followed by screening to remove sequences with ambiguous bases, sequences with homopolymer stretches longer than 8 bases, and sequences that were shorter than 200 or longer than 380 bp. Sequence assignment to AGF genera was conducted using a two-tier approach as recently described (27). Briefly, sequences were first compared by Blastn to the curated D1/D2 LSU rRNA AGF database (www.anaerobicfungi.org), and were classified as their first hit taxonomy if the percentage similarity to the first hit was > 96% and the two sequences were aligned over > 70% of the query sequence length. For all sequences that could not be confidently assigned to an AGF genus by Blastn, insertion into a reference LSU tree (with representatives from all cultured and uncultured AGF genera and candidate genera) was used to assess novelty. These genus-level assignments were then used to build a taxonomy file in mothur, which was subsequently used to build a shared file using the mothur commands phylotype and make.shared. The genus-level shared file was used for all downstream analyses as detailed below.
Alpha diversity estimates (observed number of genera, Shannon, Simpson, and Inverse Simpson diversity indices) were calculated using the command estimate_richness in the Phyloseq R package (39). To compare beta diversity and community structure between AGF communities in tortoises and AGF communities in canonical mammalian hosts, the AGF genus-level shared file from the 11 tortoise samples studied here was combined with the AGF genus-level shared file from a subset of mammalian hosts previously studied (27) (Dataset 1). The subset of mammalian AGF hosts included a comparable size from each of the five most-commonly sampled and numerous mammalian hosts: cattle (n = 25) (Bos taurus), sheep (n = 25) (Ovis aries), goat (n = 25) (Capra hircus), white-tail deer (n = 24) (Odocoileus virginianus), and horse (n = 25) (Equus caballus) generated in a prior study (27) that utilized the same DNA extraction, D2 LSU region amplification, and Illumina sequencing chemistry used in this study (Dataset 1). With this combined AGF genus-level shared file, we used the ordinate command in the Phyloseq R package to calculate weighted Unifrac beta diversity indices, and used the obtained pairwise values to construct ordination plots (both PCoA and NMDS) using the function plot_ordination in the Phyloseq R package.
Quantitative PCR. We quantified total AGF load in the 11 tortoise samples and compared it to a subset of the samples from ten cattle, ten goats, ten sheep, and ten horses (sample names in red text in Dataset 1) using quantitative PCR. The same primer pair (AGF-LSU-EnvS and AGF-LSU-EnvS Rev) used in the amplicon-based diversity survey described above was also used for qPCR quantification. The 25-µl PCR reaction volume contained 1 µl of extracted DNA, 0.3 µM of primers AGF-LSU-EnvS primer pair, and SYBR GreenER™ qPCR SuperMix for iCycler™ (ThermoFisher, Waltham, MA, USA). Reactions were run on a MyiQ thermocycler (Bio-Rad Laboratories, Hercules, CA). The reactions were heated at 95ºC for 8.5 min, followed by 40 cycles, with one cycle consisting of 15 sec at 95ºC and 1 min at 55ºC. A pCR 4-TOPO or pCR-XL-2-TOPO plasmid (ThermoFisher, Waltham, Massachusetts) containing an insert spanning ITS1-5.8S rRNA-ITS2-D1/D2 region of 28S rRNA from a pure culture strain was used as a positive control, as well as to generate a standard curve. The efficiency of the amplification of standards (E) was calculated from the slope of the standard curve and was found to be 0.89.
Isolation of AGF from Tortoises. Isolation of AGF from fecal samples of tortoises was conducted using established enrichment and isolation procedures in our laboratory (25, 40). A sequence-guided strategy, where samples with the highest proportion of novel, yet-uncultured AGF taxa were prioritized, was employed. To account for the poikilothermic (ectothermic) nature of the host, and the fact that the tortoise gut community is often exposed to lower and variable temperatures, we enriched for tortoise-associated AGF at a range of temperatures (30ºC, 39ºC, and 42ºC). Finally, the rumen fluid medium used for enrichment and isolation was amended with cellobiose (RFC medium) in addition to an insoluble substrate (switchgrass) and antibiotics (50 µg ml− 1 chloramphenicol, 20 µg ml− 1 streptomycin, 50 µg ml− 1 penicillin, 50 µg ml− 1 kanamycin, and 50 µg ml− 1 norfloxacin).
Transcriptomic sequencing. Transcriptomic sequencing of 7 representative tortoise-associated AGF isolates (Testudinimyces strains BO1, EO1, GO1, NOS1, T130A.3, and Astrotestudinimyces strains B1.1, and B1.2) was conducted as described previously(27, 35). Briefly, biomass from cultures grown in RFC medium was vacuum filtered and used for total RNA extraction using an Epicentre MasterPure Teast RNA purification kit (Epicentre, Madison, WI) according to manufacturer’s instructions. RNA-seq was conducted on an Illumina HiSeq2500 platform using 2 × 150 bp paired-end library at the Oklahoma State University Genomics and Proteomics Core Facility. RNA-seq reads were quality trimmed and de novo assembled using Trinity (v2.14.0) and default parameters. Assembled transcripts were clustered using CD-HIT (41) (identity parameter of 95% (–c 0.95)) to identify unigenes. Following, peptide and coding sequence prediction was conducted on the unigenes using TransDecoder (v5.0.2) with a minimum peptide length of 100 amino acids (https://github.com/TransDecoder/TransDecoder). BUSCO (42) was used to assess transcriptome completeness using the fungi_odb10 dataset modified to remove 155 mitochondrial protein families as previously suggested (22).
Phylogenomic analysis and molecular dating. Phylogenomic analysis was conducted as previously described (27, 43) using the 7 transcriptomic datasets generated in this study, as well as 52 transcriptomic datasets from 14 AGF genera previously generated by our group (27, 35, 44), and others (22, 45–47), in addition to 5 outgroup Chytridiomycota genomes (Chytriomyces sp. strain MP 71, Entophlyctis helioformis JEL805, Gaertneriomyces semiglobifer Barr 43, Gonapodya prolifera JEL478, and Rhizoclosmatium globosum JEL800) to provide calibration points. The final alignment file included 88 genes that were gap free and comprising more than 150 nucleotide sites. This refined alignment was further grouped into 20 partitions, each assigned with an independent substitution model, suggested by a greedy search using PartitionFinder v2.1.1. All partition files, along with their corresponding models, were imported into BEAUti v1.10.4 for conducting Bayesian and molecular dating analyses. Calibration priors were set as previously described (35) including a direct fossil record of Chytridiomycota from the Rhynie Chert (407 Mya) and the emergence time of Chytridiomycota (573 to 770 Mya as 95% HPD)). The Birth-Death incomplete sampling tree model was employed for interspecies relationship analyses. Unlinked strict clock models were used for each partition independently. Three independent runs were performed for 30 million generations each. Tracer v1.7.1 (48) was used to confirm that sufficient effective sample size (ESS > 200) was obtained after the default burn-in (10%). The maximum clade credibility (MCC) tree was compiled using TreeAnnotator v1.10.4 (49).
Transcriptomic gene content analysis and comparative transcriptomics. Transcriptomic datasets obtained from tortoise AGF isolates (n = 7) were compared to the 52 previously generated transcriptomic datasets from mammalian AGF isolates (22, 27, 35, 44–47). Gene content comparison was conducted via classification of the predicted peptides against COG, KOG, GO, and KEGG classification schemes, as well as prediction of the overall CAZyme content. COG and KOG classifications were carried out via Blastp comparisons of the predicted peptides against the most updated databases downloaded from NCBI ftp server (https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/ for COG 2020 database update, and https://ftp.ncbi.nih.gov/pub/COG/KOG/ for KOG database). GO annotations were obtained by first running Blastp comparisons of the predicted peptides against the SwissProt database. The first SwissProt hit of each peptide was then linked to a GO number by awk searching the file idmapping_selected.tab available from the Uniprot ftp server (https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz). GO numbers corresponding to the first hits were then linked to their GO aspect (one of: Molecular function, Cellular component, or Biological process) by awk searching the file “goa_uniprot_all.gaf” available from GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT). KEGG classification was conducted by running GhostKOALA (50) search on the predicted peptides. The overall CAZyme content was predicted using run_dbcan4 (https://github.com/linnabrown/run_dbcan), the standalone tool of the dbCAN3 server (http://bcb.unl.edu/dbCAN2/) to identify GHs, PLs, CEs, AAs, and CBMs in the 7 transcriptomic datasets.
To identify predicted functions that are unique to tortoise-associated or mammalian-associated AGF, predicted peptides from all 59 transcriptomes were compared in an all versus all Blastp followed by MCL clustering. Clusters obtained were then examined to identify these clusters that are unique to both tortoise isolates, unique to one of them, or present in mammalian associated genera but absent from both tortoise genera (thereafter “GroupD” clusters). KEGG classifications of predicted peptides belonging to each of these groups were then compared.
Quantifying horizontal gene transfer (HGT). We implemented an HGT detection pipeline that was previously developed and extensively validated (44) to identify patterns of HGT in the 7 tortoise-associated AGF transcriptomic datasets. The pipeline involved a combination of BLAST similarity searches against UniProt databases (downloaded January 2023), comparative similarity index (HGT index, hU), and phylogenetic analyses to identify potential HGT candidates. The downloaded Uniprot databases encompassed Bacteria, Archaea, Viruses, Viridiplantae, Opisthokonta-Chaonoflagellida, Opisthokonta-Metazoa, the Opisthokonta-Nucleariidae and Fonticula group, all other Opisthokonta, and all other non-Opisthokonta, non-Viridiplantae Eukaryota. Each predicted peptide from the 7 tortoise isolates transcriptomic datasets was searched against each of these databases, as well as against the Opisthokonta-Fungi (without Neocallimastigomycota representatives). Candidates with a Blastp bit-score against a nonfungal database that was > 100 and an HGT index hU that was ≥ 30 were further evaluated via phylogenetic analysis to confirm HGT occurrence, and to determine the potential donor. All potential candidates were first clustered using CD-HIT and 95% similarity cutoff. Representatives of each cluster were then queried against the nr database using web Blastp once against the full nr database and once against the Fungi (taxonomy ID 4751) excluding the Neocallimastigomycetes (taxonomy ID 451455) with an E value below e− 10. The first 100 hits obtained using these two Blastp searches were downloaded and combined in one FASTA file that was then combined with the AGF representative sequences and aligned using MAFFT multiple sequence aligner, and the alignment was subsequently used to construct maximum likelihood phylogenetic trees using FastTree. At this level, candidates that showed a nested phylogenetic affiliation that was incongruent to organismal phylogeny with strong bootstrap supports were deemed horizontally transferred.
Predicted secretome in transcriptomic datasets. To identify the predicted secretome, DeepLoc 2.0 (51) was used to predict the subcellular location of all predicted peptides from the transcriptomes of a representative of two different tortoise-associated AGF genera (strain T130A.3 and B1.1), as well as one representative of mammalian-associated AGF genera (Orpinomyces joyonii strain AB3). All transcripts encoding peptides predicted to be extracellular (henceforth predicted secretome) were then subjected to run_dbcan4 (https://github.com/linnabrown/run_dbcan) to identify GHs, PLs, and CEs in the predicted secretome. In addition, the predicted secretome was searched for the presence of scaffoldin homologues via Blastp comparison against a scaffoldin database (319 proteins downloaded June 2023 from Uniprot and created by searching the UniprotKB for Scaffoldin and filtering the output by taxonomy using taxid Neocallimastigomycetes [451455]). Finally, the predicted secretome was also searched for the presence of non-catalytic dockerin domains (NCDD) via the NCBI Batch CD-search online tool and identifying the predicted peptides with hits to the CBM_10 pfam02013. All predicted extracellular peptides with NCDD were further subjected to run_dbcan4 to identify co-existing GH, PL, and CE domains.
Proteomics sequencing and analysis. In addition to secretome prediction from transcriptomic datasets, we conducted proteomic analysis on the same tortoise-associated AGF described above (Testudinimyces gracilis strains T130A.3 and Astrotestudinimyces divisus strains B1.1). Proteomic analysis was conducted on two fractions: biomass, and cellulose-bound. Briefly, cultures were grown in RFC media until mid-exponential phase (typically 3 days for Astrotestudinimyces strain B1.1, and 1 week for Testudinimyces strain T130A). Biomass fraction was first collected by centrifugation (3220 xg for 10 minutes at 4ºC). The cellulosomal fraction (in the supernatant) was separated using cellulose precipitation as previously described (52). Briefly, the supernatant pH was adjusted to 7.5, followed by adding cellulose (Sigmacell type 50) (0.4% w/v) and gently stirring at 4°C for 2 hours. Low-speed centrifugation (3220 xg for 10 minutes at 4ºC) was then used to separate the cellulosomal (pellet) fraction. Proteins bound to cellulose in the cellulosomal fraction (pellet) were then eluted in water by agitation at room temperature for 1 hour, followed by removal of cellulose by centrifugation. The soluble eluates were collected by centrifugation, and frozen at -80ºC. For both fraction (biomass, and cellulose-bound), the frozen samples were dried by vacuum centrifugation. Dried samples were redissolved for 30 min at RT in reducing buffered guanidine (6M guanidine HCl, 0.1M Tris HCl, tris(2-carboxyethyl)phosphine, pH 8.5). Debris were removed by centrifugation, and the solutes were alkylated by adding iodoacetamide to 10 mM and incubation for 30 min in the dark at RT. The alkylation reactions were then digested with trypsin using a filter aided sample preparation (FASP) approach (53). For FASP, the samples were loaded into 30-kDa spin filter devices (Sigma®), and subjected to three buffer exchanges into 8 M urea, 0.1M TrisHCl, pH 8.5, followed by three additional buffer exchanges into digestion buffer (100 mM TrisHCl, pH 8.5). For the final buffer exchange, samples were concentrated to ~ 10 µl in digestion buffer, followed by dilution with 75 µl of digestion buffer containing 0.75 µg of trypsin/LysC mix (Promega). Reactions were digested overnight at 37ºC, and the trypsinolysis products were recovered by centrifugation of the FASP device. Recovered peptides were desalted using centrifugal devices loaded with C18 resin following the manufacturer’s recommendations (HMMS18R, The Nest Group). The desalted peptides were frozen and dried by vacuum centrifugation, and redissolved in 0.1% aqueous formic acid immediately prior to analysis by LC-MS/MS.
For LC-MS/MS, peptides were injected onto a 75 µm x 50 cm nano-HPLC column packed with 1.9-micron C18 beads (Thermo PN 164942) connected to an Easy-nLC 120 nano-HPLC system configured for two-column vented trap operation. Peptides were separated by gradient chromatography using 0.1% aqueous formic acid as mobile phase A and 80:20:0.1 acetonitrile/water/formic acid as mobile phase B. Peptides separations used a gradient of 4–32% mobile phase B delivered over a period of 120 minutes. Eluted peptides were ionized in a Nanospray Flex Ion source using stainless steel emitters (Thermo). Peptide ions were analyzed in a quadrupole-Orbitrap mass spectrometer (Fusion model, Thermo) using a “high/low” “top-speed” data-dependent MS/MS scan cycle that consisted of an MS1 scan in the Orbitrap sector, ion selection in the quadrupole sector, high energy collision in the ion routing multipole, and fragment ion analyses in the ion trap sector. The details regarding the MS programming are provided (Table S3).
RAW files from the mass spectrometer were searched against the corresponding transcriptome predicted peptides database using the MaxQuant application (v2.0.2.0, (54)). Searches utilized MaxQuant defaults, supplemented with two additional peptide modifications: deamidation of N/Q residues, and Q cyclization to pyro-glutamate. The MaxQuant “match between runs” algorithm was not used. Sequences for reversed-sequence decoy proteins and common contaminants proteins were utilized for the database searches, but were removed from the final MaxQuant protein results.
Data availability. Illumina amplicon reads generated in this study have been deposited in GenBank under BioProject accession number PRJNA997953, and BioSample accession numbers SAMN36694530- SAMN36694536. RNA-seq reads from tortoise isolates have been deposited in GenBank under BioProject accession number PRJNA997953, and BioSample accession numbers SAMN36694608- SAMN36694614.