5.1 Site location and sampling. Cabin Branch is a site in the Daniel Boone National Forest in Kentucky near the border with Tennessee. Limestone was added to the channel as a passive remediation strategy and groundwater flows out from an emergence and across a limestone-lined channel before entering a pond, the Rose Pool (Figure 1). The microbial communities within Cabin Branch are predominantly the Fe(II)-oxidizing taxon Ferrovum myxofaciens  Samples were collected on March 16, 2017 between 10 am and 2 pm. Sample collection was approved by and performed in collaboration with the staff at Daniel Boone National Forest.
5.2 Geochemical analyses
5.2.1 Aqueous geochemistry. Temperature and pH were measured onsite using a WTW 330i meter and probe (Xylem Analytics, Weilheim, Germany) and conductivity was measured with a YSI 30 conductivity meter and probe (YSI Inc., Yellow Springs, OH, USA). Total ammonia (NH4(T) = NH3 + NH4+) and ferrous iron (Fe(II)) concentration were determined using Hach DR1900 Portable Spectrometers (Hach Company, Loveland, CO) using the salicylate method (Hach Method 8155).
Water samples were collected with a 140 mL syringe directly after rinsing with sample water three times before filling. Water was filtered through 0.8/0.2 μm Supor® syringe filters and dispensed into the appropriate sample bottles following a 10 mL flush to minimize contamination from the filter. Samples for ion chromatography (IC) analysis of major anions (Thermo Scientific Dionex ICS 5000+ ion chromatography system) were stored in 15 mL polypropylene centrifuge tubes. Samples for inductively coupled plasma optical emission spectroscopy (ICP-OES) analysis (Thermo Scientific iCAP 6000 series ICP-OES) of major cations and for inductively coupled plasma mass spectrometry (ICP-MS) analysis (Thermo Scientific X Series 2 ICP-MS) for trace elements were stored in acid-washed (3 day soak in 10 % TraceMetal Grade HNO3 (Fisher Scientific, Hampton, NH, USA) followed by triple rinsing with 18.2 MΩ/cm deionized water) 15 mL polypropylene centrifuge tubes and acidified with 400 μL of concentrated, OmniTrace Ultra™ concentrated HNO3 (EMD Millipore, Billerica, MA, USA). Field blanks were taken using 18.2 MΩ/cm deionized water transported to the field in 1-liter acid washed Nalgene bottles. IC, ICP-OES, and ICP-MS analyses were carried out by the Analytical Geochemistry Laboratory in the Department of Earth and Environmental Sciences at the University of Minnesota.
Samples for dissolved inorganic carbon (DIC) analysis were filtered into a gas-tight syringe and then injected into Labco Exetainers® (Labco Limited, Lampeter, UK) pre-flushed with He, with excess He removed following introduction of 4 mL of filtered sample with minimal agitation. Samples were inverted and stored on ice until returned to the lab, where 1 mL of concentrated H3PO4 was added and the samples shipped to the Stable Isotope Facility (SIF) at the University of California, Davis for analysis. DIC analysis for concentration and 13C isotopic signal using a GasBench II system interfaced to a Delta V Plus isotope ratio mass spectrometer (IR-MS) (Thermo Scientific, Bremen, Germany) with raw delta values converted to final using laboratory standards (lithium carbonate, δ13C = -46.6 ‰ and a deep seawater, δ13C = +0.8 ‰) calibrated against standards NBS-19 and L-SVEC. Samples for dissolved organic carbon (DOC) analysis were filtered through a 0.2 µm polyethersulfone syringe filter that had been flushed with ~ 30 mL of sample and then ~ 40 mL put into a 50 mL centrifuge tube and then immediately flash-frozen on dry ice and kept frozen in the dark until analysis at the SIF. DOC analysis for concentration and 13C isotopic signal were carried out using O.I. Analytical Model 1030 TOC Analyzer (O.I. Analytical, College Station, TX, USA). The standard deviation for DIC is determined from variation in check standards run by SIF. The standard deviation for DOC is determined from replicate analyses of each sample.
5.2.2 CO2 assimilation. A microcosm-based approach was employed to assess the potential for inorganic carbon uptake in situ through the addition of NaH13CO3. In situ microcosms were performed between noon and 4 PM under full or partial sun. Samples were collected using pre-sterilized spatulas or forceps and ~ 500-mg was placed into pre-combusted (12 h, 450°C) serum vials. Samples were overlaid with spring water from the collected site and serum vials were capped with gas-tight black butyl rubber septa. Assays were initiated by addition of NaH13CO3 (100 µM final concentration) (Cambridge Isotope Laboratories, Inc., Andover, MA, USA). All assays were performed in triplicate. At the emergence, both green and white filaments were present. We performed assays on each by sampling predominantly green filaments and predominantly white filaments. We assumed the majority of uptake in the green biomass would be due to photoautotrophy and thus performed one set of microcosms on the bulk filaments/biofilm (green + white) in the light and then separated the biofilm types (green or white) for microcosms performed in the dark.
We assessed the potential for photoautotrophic (light) and chemoautotrophic (dark) NaH13CO3 uptake. Assays were stopped by flash freezing vials on dry ice. All vials were stored at -80 °C until processed (described below). Reported values of 13C-labelled DIC uptake (carbon fixation rates) reflect the difference in uptake between the biomass in the assays that received NaH13CO3 and the natural abundance biomass samples described below. For comparisons between mean 13C uptake rates, a one-way ANOVA followed by post hoc pairwise comparisons between treatments was conducted using a Turkey honest significant difference (HSD) within the R software package (R version 3.3.2). Mean rates with p-values < 0.05 were considered significantly different.
5.2.3 C and N concentration and stable isotope signals. Natural abundance samples were collected at the time of sampling, flash frozen on dry ice, stored in liquid N2 for transport and stored at -80 °C until processed. Assays were thawed and biomass was washed with HCl (1 M) to remove any extra 13C-labelled DIC, triple washed with 18.2 MΩ/cm deionized water, then dried (60°C for three days). Dried biomass was ground/homogenized with a cleaned mortar and pestle (ground with ethanol silica slurry, triple rinsed with 18.2 MΩ/cm deionized water, dried). Dried and ground samples for determination of C and N concentration and stable isotope signal were weighed and placed into tin boats, sealed, and analyzed via an Elementar pyrocube elemental analyzer (EA) periphery connected to an Isoprime 100 continuous flow IRMS (IR-MS) at the University of Minnesota. Linearity corrections were made using NIST Standard 2710, and δ13C values were calibrated using reference standards USGS-40 and USGS-41 and checked with a laboratory working standard (glycine). Total uptake of DIC was calculated using DIC numbers determined for the source water (described above). All stable isotope results are given in delta formation expressed as per mil (‰). Carbon stable isotopes are calculated as:
δ13C= [((Ra)sample/(Ra)standard)-1] × 103, (1)
where Ra is the 13C/12C ratio of the sample or standard and are reported versus the Vienna Pee Dee Belemnite (VPDB) standard.
Precautions were taken to minimize cross contamination of natural abundance samples with incubation assays. Natural abundance samples were processed and analyzed in turn, prior to incubation assays. All laboratory processing and weighing equipment was cleaned with 80% ethanol between each sample. Standard checks and blanks were run within each analysis batch (49 samples and standards per batch) to check for memory effects or cross contamination of samples, with none detected.
Assimilation rates were calculated from the following parameters: Absolute isotopic stable carbon ratios were used to determine the difference between the total amount of 13C in natural abundance samples and incubation assay replicates. The difference represents total mass of 13C-labelled DIC taken up during incubations. Incubation duration times were recorded in the field. Using the organic carbon content, the uptake rate is then calculated from the total µg C taken up divided by the grams of organic C per gram of sediment, and that was divided by the number of hours the incubation was carried out over (typically ~ 2 hours).
5.3 Molecular analyses
5.3.1 DNA extraction. Samples for DNA extraction were collected at the time of sampling using sterile spatulas. Samples were flash frozen on dry ice, stored in liquid N2 for transport and stored at -80 °C until processed. Triplicate samples were collected from each sampling site. DNA was extracted from ~250 mg of biomass using a DNeasy PowerSoil Kit (Qiagen, Carlsbad, CA, USA) according to the manufacturer’s instructions. DNA was extracted from each replicate sample (n=3) and the concentration of DNA was determined using a Qubit dsDNA HS Assay kit and a Qubit 3.0 Fluorometer (Invitrogen, Burlington, ON, Canada). The concentration of DNA in the replicates was within 5% for each sample. Equal volumes of each extraction were pooled, and the concentration of the pooled extract was determined as described above. As a negative DNA extraction control, DNA was extracted from the filter used for the field blank water sample (described above). No DNA was detected in the control and sequencing failed to generate amplicons (see below for amplicon sequencing details).
5.3.2 DNA sequencing. Amplicons were sequenced at the University of Minnesota Genomics Center (UMGC) with MiSeq Illumina 2 × 300 bp chemistry using the primers 515F (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA-3’) and 806r (5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT-3’) targeting V4 hypervariable region of bacterial and archaeal 16S SSU rRNA gene sequences . Amplicon libraries were created by UMGC following their improved protocol for library preparation which enables detection of taxonomic groups that often go undetected using existing methods .Each sample was sequenced once. For metagenomic sequencing, total DNA was submitted to the UMGC and sequenced using HiSeq2500 High-Output 2 x 125 bp chemistry. The three samples were samples on a single lane.
5.3.3 16S rRNA analysis. 16S rRNA sequences from Cabin Branch were used to examine community composition. Primers and unpaired sequences were removed using trimmomatic ) The surviving reads were processed in dada2 (v.1.4) following the pipeline tutorial . Briefly, forward reads were trimmed to 210 bp and reverse reads were trimmed to 120 bp based on their quality profiles. Sequences with ambiguous bases and those with more than 2 expected errors were removed. Error rates were estimated using the learnErrors command. Sequences were dereplicated using the derepFastq command and the unique sequence variates were inferred using the dada command. Forward and reverse reads were merged using mergePairs. Contigs shorter than 250 or longer than 256 bp and chimeric sequences were removed. The surviving unique, denoised sequences are referred to as denoised sequence variants (DSVs). Taxonomy was assigned using the Silva training set  Eukaryotic sequences and those unclassified at the domain level were removed. The closest cultured and environmental relatives were identified using BLASTN .
We retrieved 1,589 aligned, nearly-full length Ferrovum sequences from the Silva non-redundant database to serve as a reference alignment for the analyses described below. A 16S rRNA phylogeny was constructed by retrieving the sequences used by Ullrich et al.  with Ferritrophicum radicicola, Nitrospovibrio tenuis, and Nitrospira lenta used as an outgroup.. These sequences were aligned using MAFFT on the Cipres Science Gateway  using the aligned Ferrovum sequences from the Silva database as a reference alignment. A tree of the smaller subset of sequences was constructed in RAXML-HPC2 on XSEDE  also on the Cipres Science Gateway. Sequences of DSVs classified as Ferrovum were added to the alignment using the –addfrags option in MAFFT . Non-full-length sequences were added to the tree using the evolutionary placement algorithm. Trees were rooted and visualized in the interactive tree of life . Raw sequences were uploaded to the NCBI Sequence Read Archive under BioProject PRJNA554371.
5.3.4. Metagenomic analysis. Individual metagenomes were assembled and a co-assembly of all metagenomes was constructed following the “tutorial on assembly-based metagenomics” .Trimmed, quality-controlled sequences were assembled using MegaHit  using default parameters except minimum contig length, which was set at 1000 base pairs. Reads were mapped to the assembly using bowtie2  and depth was calculated using the jgi_summarize_bam_contig_depths command in Anvi’o . Contigs were binned using default parameters in metabat using . Bin completeness was determined with CheckM . Protein coding regions were identified with prodigal (within CheckM) and GhostKoala was used to annotate protein coding sequences [16, 56]. Pathways were considered to be absent from the taxon when none of the key genes associated with that pathway were present in the MAG.
Each bin was uploaded to KBASE  and annotated with Prokka using the “Annotate Assembly and Re-annotate Genomes with Prokka (v1.12)” app. A concatenated gene tree containing each bin and four published Ferrovum spp. genomes was constructed using the “Insert Set of Genomes Into Species Tree 2.1.10” app. This app uses up to 49 ribosomal and single-copy genes to construct a phylogenetic tree. MAGs that were more closely related to Ferrovum spp. than other taxa were selected for further analysis. The pairwise ANI between each bin and published Ferrovum genomes was calculated using anvi-compute-ani in Anvi’o. If MAGs shared >98% pairwise ANI and were phylogenetically cohesive, we considered them to be representing the same populations and the bin with the highest completeness was chosen for further analysis. Raw reads, assemblies, and MAGs were uploaded to the NCBI Sequence Read Archive under BioProject PRJNA554371. The abundance of Ferrovum in each metagenomes using One Codex .
16S rRNA. A custom blast database containing the 16S rRNA gene from the Ferrovum myxofaciens type strain (NR_117782.1) using the makeblastdb command in BLAST+. Each of the Ferrovum bins were searched for 16S rRNA genes using BLAST and a cutoff value of E-30.
Cyc-2. A custom blast database of the high molecular weight Cyc-2 like proteins was constructed using the methods described above. Each of the Ferrovum bins were searched for the Cyc-2-like protein identified by Ullrich et al., by blasting translated nucleotide sequences against the blast database with an E-value of E-120. The retrieved sequences, database sequences, and a sequence for Cyc-2 from an Acidithiobacillus sp. (AVK42811.1) were aligned using MAFFT and a maximum likelihood tree was constructed as described above. The tree was rooted on Acidithiobacillus and visualized in iTOL.
5.3.5. Comparison with other Ferrovum genomes. Existing Ferrovum genomes are located within either the Group I or Group IV Ferrovum clades . The genome for Ferrovum sp. JA12 (Group I) and Ferrovum myxofaciens (Group IV) were downloaded from NCBI. CheckM was used to determine genome completeness, and within CheckM, prodigal was used to identify protein coding sequences. Ghost KOALA was used to annotate protein coding sequences using the genus_prokaryotes database. Protein coding sequences were also annotated with prokka. We then manually compared the gene content of MAGs to that of existing Ferrovum genomes.