Field sampling
During a research cruise on the MRV Scotia in the spring of April 24 to May 9 of 2014, water samples were collected from location FIM6a (60° 38’N, 4° 54’W) at depths 5 m and 700 m. This sampling site lies on the Fair Isle-Munken line [21] near the Foinaven oil field development area, approximately 3 and 9.3 nautical miles from the Petrojarl Foinaven and Glen Lion production facilities, respectively. Collection of seawater samples (3 L volumes) was performed using 10 L Niskin water bottles mounted on a CTD (conductivity, temperature, depth) carousel, based on the sampling procedures of MRV Scotia [21]. CTD casts confirmed these samples at these water depths were from two specific water masses respectively, the Modified North Atlantic Water (MNAW) and the Norwegian Sea Arctic Intermediate Water (NSAIW) masses [22]. Immediately after recovery, some of the collected seawater was used to rinse, at least three times, two Nalgene carboys (10L each; acid-washed, acetone-rinsed, and dried) prior to filling, and immediately stored at 10°C onboard the vessel until return to the laboratory at Heriot-Watt University for immediate use within SIP experiments.
SIP incubations
Prior to the preparation of the SIP incubations, each of the two seawater samples (collected at 5 m and 700 m depths) were processed to remove, as far as possible, dissolved organic carbon/matter that could potentially act as an alternative carbon source and potentially redirect microbial activity away from the labeled substrates during SIP. For this, 800 ml of each sample was filtered through 0.22 μm MCE filters (47 mm diameter; Millipore Sigma). The bacterial biomass collected on the filters was washed with a few milliliters of sterile synthetic seawater medium ONR7a [23], and then the biomass was re-suspended in sterile 40 ml of the ONR7a to act as the inoculum for SIP.
SIP incubations were performed using 125-ml sterilized glass screw-top Erlenmeyer flasks with caps that were lined with aluminum foil to prevent sorption of hydrocarbons. ONR7a medium was used in these incubations because, as explained above, we wanted to prevent the introduction of exogenous and potentially bioavailable sources of carbon. For incubations utilizing either the surface 5m depth water inoculum or the deep water (700 m depth) inoculum, each flask contained 16 ml of ONR7a medium, 1 mg of labeled (13C) and/or unlabeled n-hexadecane and 4 ml of inoculum. Uniformly [U-13C] and unlabeled (12C) n-hexadecane of >99% purity was obtained from Sigma-Aldrich. Each of the two inocula, from the 5 m and 700 m depth, were used to set up SIP experiments with both hydrocarbon substrates. For SIP, duplicate flasks were prepared with 1 mg of U-13C-labeled n-hexadecane, and a second set of duplicates was prepared with 1 mg of the respective unlabeled hydrocarbon. An additional set of triplicate flasks was prepared containing unlabeled n-hexadecane to monitor its disappearance by gas chromatography–mass spectrometry (GC–MS). Samples were periodically taken from these flasks for DNA extraction and subsequent measurement by quantitative PCR (described below) to determine the abundance of target organisms identified through SIP. For each SIP experiment, an additional set of triplicate flasks was prepared to act as acid-killed controls (pH<1) containing unlabeled hydrocarbons and amended with 750 μl of 85% phosphoric acid. All flasks were incubated on an orbital shaker (150 rpm) in the dark at 21°C. The endpoint when each SIP experiment was terminated was determined by tracking the disappearance of the hydrocarbon in the triplicate unlabeled flasks by GC-MS. At this point, whole DNA from the total volume in the paired flasks amended with the [U-13C] hydrocarbon and the corresponding paired set with unlabeled hydrocarbons was extracted using a standard protocol [24].
DNA gradient ultracentrifugation and identification of labeled 16S rRNA genes
Isopycnic ultracentrifugation of DNA from each of the SIP experiments utilizing the 5m or 700m seawater samples amended with 13C-labeled n-hexadecane resulted in the visual separation of two bands (~1 cm apart from each other) that were in the lower half of the polyallomer tubes and which was consistent with the expected location of the ‘heavy’ and ‘light’ DNA bands. With the respective acid-killed controls, it later became apparent that instead of concentrated acid, a very dilute concentration had been added to these control incubations and which was apparent from the degradation data (Fig. 1). This was insufficient to completely suppress microbial activity which explains why some degradation of the hydrocarbon was observed in these controls. Nonetheless, this did not affect our assessment in monitoring degradation of the n-hexadecane in the “live” incubations and did not preclude our selection of the endpoint for when to terminate these experiments for extraction of DNA from the 13C incubations. We selected an endpoint of 5 days, at which point DNA extractions were performed on each of the duplicate 13C incubations for subsequent isopycnic ultracentrifugation to isolate the 13C-enriched ‘heavy’ DNA for analysis. Subsequent denaturing gradient gel electrophoresis (DGGE) analysis of the fractions derived from the labeled incubations performed for each of the two water samples showed clear evidence of isotopic enrichment of DNA. This is evident from the observed clear separation of, and different banding patterns between, the 13C-enriched and unenriched DNA fractions by DGGE (Suppl. Fig. S1, Additional File 1) and by the distribution of qPCR-quantified 16S rRNA gene sequences (Suppl. Fig. S2, Additional File 2). Combined fractions containing 13C-enriched DNA from each of these 13C incubations were used to construct 16S rRNA gene clone libraries. Fractions from the duplicate ultracentrifuge tube from each of the 13C incubations were similarly analyzed to confirm our results (data not shown).
Caesium chloride (CsCl) gradient ultracentrifugation and identification of 13C-enriched DNA
Total extracted DNA from each of the duplicate unlabeled and 13C-labeled incubations was added to caesium chloride (CsCl) solutions (1.68 g/ml) and the 13C-enriched and unenriched DNA separated by isopycnic ultracentrifugation and gradient fractionation, following the method of [25] with the following modifications. About 3 ml of mineral oil was added to the top of each CsCl solution until the polyallomer tubes were almost full prior to then heat-sealing them. The tubes were then ultra-centrifuged for 40 hours using a fixed-angle rotor 70.1Ti (Beckman Coulter) at 187,000 ×g at 20°C with a BC Optima L-100 XP ultracentrifuge (Beckman Coulter).
Following isopycnic ultracentrifugation, denaturing gradient gel electrophoresis (DGGE) was performed on each fraction from the SIP tubes to visualize and confirm the separation of DNA. For this, amplification of each fraction was carried out using PCR as described by [26] with bacterial primers 341f (5’-CCTACGGGAGGCAGCAG-3’) and 534r (5’-ATTACCGCGGCTGCTGG-3’), the forward of which contained a 40 nucleotide GC clamp (5’-CGCCCGCCGCGCGCGGCGGGCGGGGCGGGGGCACGGGGGG-3’) [27]. PCR products were confirmed on a 1.5% (w/v) agarose gel alongside a HindIII DNA ladder (Invitrogen, Carlsbad, CA, USA). DGGE was performed using 6.5% acrylamide gels containing a denaturant range of 30–70% (100% denaturant contains 7.0 M urea and 40% molecular-grade formamide). After electrophoresis for 16 h at 60ºC and 60 V, gels were stained with ethidium bromide (1:25,000 dilution) for 15 min, and then imaged with a InGenius3 gel imaging system (Syngene) and accompanying software to crop the gel images to only the regions displaying bands. The 13C-enriched heavy DNA fractions were selected based on the DGGE evidence, which is discussed below.
16S rRNA gene libraries of 13C-enriched DNA
16S rRNA clone libraries, each comprising 48 clones, were prepared from combined fractions containing the 13C-enriched DNA from each of the two SIP experiments using general bacterial primers 27f and 1492r [27]. Cloning of PCR products was performed using the TOPO-TA cloning kit for sequencing (ThermoFisher Scientific). Clones were partially sequenced by GeneWiz (UK) using primer 27f. After excluding vector sequences, poor-quality reads and chimeras, the clone sequences were grouped into operational taxonomic units (OTUs) based on applying a 97% sequence identity cut-off. A representative clone sequence was selected from each dominant OTU identified in each of the libraries and used to obtain a near-complete 16S rRNA gene sequence. Sequences were edited and assembled using Consed/Phred/Phrap [28]. BLASTn searches and RDP-II were used to check for close relatives and phylogenetic affiliation. The near-complete 16S rRNA gene sequence (>1400 bp) representing each of the major SIP-identified OTUs were compared with related GenBank sequences (Suppl. Fig. S3, Additional File 3).
Primers for qPCR targeting the 16S rRNA genes of these major SIP-identified OTUs were developed (Table 1) and used to determine the abundance of each group over time during incubation with the n-hexadecane. During incubations of the 5m water sample with unlabeled n-hexadecane, the 16S rRNA gene copy number for the Thalassolitus OTU-2.6, Lentibacter OTU-1.1, Oleibacter OTU-2.4 and Alcanivorax OTU-2.14 increased, respectively, by approximately 9, 16, 17 and 18 orders of magnitude after 5 days (Fig. 2A). Similarly, during incubations of the 700m water sample with unlabeled n-hexadecane, the 16S rRNA gene copy number for the Dokdonia OTU-4.12 (6 orders), Glaciecola OTU-3.32 (10 orders), Phaeobacter OTU-4.3 (12 orders), Marinobacter OTU-3.15 (13 orders), Alcanivorax OTU-4.22 (18 orders) and Oleibacter OTU-3.27 (20 orders) increased, respectively, by approximately 6, 10, 12, 13, 18 and 20 orders of magnitude after 5 days (Fig. 2B). An increase in the 16S rRNA gene copy numbers of these OTUs in these two SIP experiments provides further confirmation of their enrichment on the n-hexadecane as growth substrate. These increases also coincided with an increase in the total concentration of DNA (Fig. 2) as a proxy for cell growth. This coupled with the disappearance (biodegradation) of the hydrocarbon and the appearance of the 16S rRNA genes of these organisms in the most heavily 13C-enriched DNA fractions suggests that these organisms performed a primary role in the degradation of the n-hexadecane.
Real-time quantitative PCR
To quantify sequences in the dominant OTUs, primers for real-time quantitative PCR (qPCR) were developed using AliView [29] and the NCBI Primer Blast online tool [30]. Primer specificity was confirmed with the NCBI Primer BLAST tool. The optimal annealing temperature of each primer pair was determined using an Applied Biosystems (Foster City, CA. USA) Mastercycler gradient thermal cycler. The template for these reactions, and for the construction of respective standard curves for quantitative PCR, was a plasmid containing a representative sequence that had been linearized using an appropriate restriction endonuclease and purified using the QIAquick nucleotide removal kit (Qiagen, Valencia, CA, USA).
Purified DNA from time-series incubations with unlabeled hydrocarbon was quantified using a NanoDrop ND-3300 fluorospectrometer (Thermo Scientific) and the Quant-iT Picogreen double-stranded DNA (dsDNA) kit (Invitrogen). From each of the two SIP experiments, just one replicate of the duplicate 12C- and 13C-labeled incubations was selected for further analysis based on fractions from it containing the highest total amount of DNA. SIP-identified sequences were quantified in each separated SIP fraction by qPCR. Single reactions were performed on each triplicate DNA extraction (from triplicate samples) from the time series containing unlabelled hydrocarbons.
Phylogenetic tree of the 13C-enriched community
The 16S rRNA genes of the SIP-identified sequences were assembled by using the program Sequencher 5.3 (GeneCodes Corp., Ann Arbor, MI). The consensus sequences were submitted to GenBank and checked for close relatives and phylogenetic affiliation using BLASTn. Search results were used as a guide for tree construction, and additional related 16S rRNA sequences identified from the BLASTn search were retrieved from GenBank. The software package MEGAX (version 10.2.4) was used to align the sequences using MUSCLE and to construct a neighbor-joining tree with Jukes–Cantor correction. The tree was bootstrapped 1000 times and gaps in the alignment were ignored. Roseibacillus ishigakijimensis strain MN1-741 (NR041621), Verrucomicrobium spinosum strain DSM 4136 (NR026266) and Haloferula chungangensis strain CAU 1074 (NR109435) were used as an outgroup.
Metagenomic sequencing and assembly of 13C-enriched DNA from SIP
Illumina library preparation, sequencing and assembly of four samples were completed by the Joint Genome Institute (JGI [31, 32]; data are available under project IDs 3300039448, 3300039456, 3300039449, and 3300040958). The four samples represent the heavy (13C-enriched) DNA from each of the duplicate SIP incubations using the seawater inoculum collected at 5 m and 700 m depths. Paired-end sequencing was performed on an Illumina NovaSeq 6000 platform with an average insert size of 241 and fragments of 300 bps. Raw reads were quality filtered following BBtools pipeline [33] and assembled with metaSPAdes version 3.14.1 [34]; Suppl. Table 1,2, Additional File 9,10). Coverage information was obtained by mapping all high-quality reads of each sample against the assembly using the BWA-MEM algorithm in paired-end mode (bwa-0.7.12-r1034[35]).
Genome binning
Assembled metagenomic data (contigs >2000 bp) was binned using MetaBAT v2.12.1 [36] and CONCOCT v1.1.0 [37], and resulting MAGs were combined using DAS Tool v1.1.2 [38]. First, each of the mapping files were summarized using jgi_summarize_bam_contig_depths and then MetaBAT was run using the following settings: --minCVSum 0 --saveCls -d -v --minCV 0.1 -m 2000 and CONCOCT as follows: --clusters 400 --kmer_length 4 --length_threshold 3000 --seed 4 --iterations 500. For each of the two binning tools, a scaffold-to-bin list was prepared, and the DAS Tool run on each of the eight scaffold files as follows: DAS_Tool -i Concoct.scaffolds.tsv, Metabat.scaffolds.tsv -l concoct,metabat -c assembly.contigs.fasta –debug -t –write_bins 1 –search_engine blast. The accuracy of all the MAGs was evaluated by calculating the percentage of completeness and gene duplication using CheckM lineage_wf v1.0.5[39] (Suppl. Table 3, Additional File 11). All MAGs were greater than or equal to 50% of completeness and <10% gene duplications (according to checkM). MAGs can be found at NCBI within Bioproject PRJNA816150. MAG relative abundance was calculated as previously described in De Anda et al., 2021 using the bin_abundance.py script from MetaGaia (https://github.com/valdeanda/MetaGaia) (Suppl. Table 4, Additional File 12).
Phylogenetic reconstruction and taxonomy
GTDB-Tk classify_wf [40] was used for preliminary taxonomic identification of the 42 individual genomes (Suppl. Table 5, Additional File 13), and then 37 conserved marker proteins were extracted using phylosift [41]. We used the 30S ribosomal protein S2 protein of the 37 marker proteins identified to perform a BLASTp search against the RefSeq database, to obtain the closest publicly available genomes. Based on GTDB-tk v1.5.0 results of predicted taxonomy we also downloaded from RefSeq 30 genomes from Actinobacteria used as an outgroup. For these reference genomes, we also extracted the 37 marker genes using phylosift and added them for analyses. An alignment of the extracted assembled MAGs and reference genomes were generated using MAFFT [42] as follows: –globalpair –maxiterate 16 –reorder. The alignment was trimmed using trimAL [43] -automated1. The phylogeny was constructed with RAxML[44] as follows: raxmlHPC-PTHREADS-AVX -f a -m PROTGAMMAAUTO -N autoMRE -p 12345 -x 12345.
Metabolism reconstruction
Gene prediction for individual MAGs was performed using Prodigal v2.6.3 [45]. Predicted genes of individual MAGs were further characterized using several databases: KofamScan [46], and InterProScan v5.31-70.0 [47]. Searches over these databases were performed using default parameters. For KofamKOALA, only hits above the predefined threshold for individual KOs were selected. We then implemented, rbims https://github.com/mirnavazquez/RbiMs, an open software that reads, evaluates and visualizes the annotation profile output derived from KofamScan. We used the function “read_ko” to calculate the abundance of each KO within each MAG and the function “mapping_ko” to link each KO to other KEGG database features, and we also linked each KO to the rbims database, which includes a manually curated definition of the aerobic hexadecane degradation pathway [19](Suppl. Table 6, Additional File 14).
Phylogenetic reconstruction of Alk proteins
The metagenome entropy-based score MEBS [48] was used to search the Pfams associated to each of the alk genes in the assembled MAGs; alkB (PF00487), alkT (PF07992, PF18113), alkG (PF00301), alkJ (PF00732, PF05199), and alkH (PF00171). We performed a BLASTp search against the non-redundant database from NCBI and queried all the proteins previously identified for each gene and used the first hits as references for phylogenetic reconstructions. We also downloaded from the UniProt database (The UniProt Consortium) the sequences from well-characterized AlkB proteins: Q0VTH3, Q0VKZ3, O31250 and P12691; AlkG: Q9HTK7, Q9HTK8, P00272, Q9WWW4 and Q0VKZ2; AlkT: P17052, P42454, Q0VTB0, Q9HTK9, and Q9L4M8; AlkJ: Q00593, and Q9WWW2; AlkH: P12693. The coding sequences that contained the Pfam and the references were concatenated and aligned using MAFFT [42] as follows: –globalpair –maxiterate 16 –reorder, and phylogeny was generated using iqtree [49] with the following parameters: iqtree -alrt 1000 -bb 1000 -bnni (Suppl. Fig. 4-7, Additional File 4-7)).
The genomic neighborhood rubredoxin and alkane monooxygenase system and alkG-like alignment
We used the operon mapper web server [50] to identify the operons in the assembled MAGs. We focused on the operons where AlkB was present. We extracted the sequences that belonged to the COG1194 found next to the AlkB based on the operon mapper analysis. We also downloaded from UniProt (The UniProt Consortium) and NCBI well-characterized AlkG proteins: P00271, Q9WWW4, Q0VKZ2, WP_138436252.1, WP_161463810.1, WP_089423380.1, WP_084394766.1, WP_015486580.1, Q9HTK8, and Q9HTK7. The reference sequences and the COG1194 sequences were aligned using MAFFT [42] (–globalpair –maxiterate 16 –reorder).