A. Microbial composition of exocellular free DNA and other D-DNA pools
Summed across six F-DNA samples collected in the North Pacific Subtropical Gyre, the majority of annotated genes in DNA-based metagenomic libraries were derived from bacteria and viruses (Figure 1), with minimal contributions from Eukarya and Archaea (<1.0 and 1.1% of all annotated sequences, respectively). At or above the deep chlorophyll maximum (75-125 m), the proportion of F-DNA virus sequences ranged from 10-35%, about the same as the proportion of bacterial sequences (16-34%) over the same depth range. However, in the mesopelagic zone (250-1000 m), the proportion of F-DNA sequences from viruses was lower (2-16%), whereas the proportion of bacterial-derived sequences was higher (38-45%). Across all F-DNA metagenomic libraries, the microbial assemblage was dominated (69-85% of family-level annotated sequences) by three main taxonomic groups, Pelagibacteraceae, Myoviridae, and Prochlorococcus (Figure 1), all of which have been documented as abundant in the North Pacific Subtropical Gyre (6, 30, 31). Ubiquitous heterotrophs of the family Pelagibacteraceae contributed an average of 23% of F-DNA sequences over all depths sampled, with a range of 2-46%. Prochlorococcus represented 43, 42, and 53% of F-DNA sequences collected from 125, 500, and 1000 m, respectively, and an average of 23% of family-annotated sequences. Of the Prochlorococcus sequences at these depths (125, 500, and 1000 m), more than 85% were from high-light (HL) ecotypes (Table S2). The highest proportions of Myoviridae F-DNA sequences were observed in the upper euphotic zone: 45% (75 m) and 69% (100 m). Of the viral-derived sequences in the F-DNA samples, the majority (>90% of annotated viral sequences with known hosts) were from viruses known to infect Synechococcus (38% ±4.6), Prochlorococcus (31% ±4.8), Pelagibacter (16% ±7.2), and other Cyanobacteria (7.9% ±1.3) (Figure S1). Synechococcus viruses peaked at the deep chlorophyll maximum on two occasions (43%).
Viruses known to infect SAR116 (5.4% ±2.2) and Vibrio (1.1% ±0.6) were present at much lower proportions in the F-DNA. F-DNA-derived Vibrio sequences were highest in mesopelagic samples (250 m, 2.3%; 500 m, 1.0%; 1000 m, 1.4%) but were <1% in euphotic samples.
Among other taxonomic groups contributing to metagenomic F-DNA libraries, the archaeal contribution was minimal (0.18-2.03% across all samples). Of the archaeal sequences, most were most similar to those of ammonia-oxidizing Thaumarchaeota (0.02-1.6%), which were most prevalent at depths greater than 100 m. Of the three domains contributing to F-DNA metagenomes, Eukarya contributed the least (0.28-1.45%). The most abundant eukaryotic taxa included the heterokont Aureococcus, the coccolithophore family Noelaerhabdaceae, and Bathycoccaceae.
DNA from the vesicle D-DNA samples collected throughout the euphotic and mesopelagic zones (75-500 m; Figure 1) was also sequenced and compared. Summed across all samples, Bacteria contributed overwhelmingly to these metagenomic libraries (77% ±22 of all annotated sequences). Viruses, Archaea, and Eukaryota contributed an average of 18.4, 3.7, and 0.6%, respectively, of all annotated sequences (Table S1). At the family level, bacterial sequences were heavily dominated by Pelagibacter ubique (81% ±8 of family-level annotated sequences), with only 34% ±12 of sequences left unannotated, the lowest of all three D-DNA fractions. Other bacterial sequences that contributed to the vesicle samples included those from Rhodospirillaceae (2.9%), Rhodobacteraceae (2.7%), Flavobacteriaceae (1.0%), and Prochlorococcus (0.5%). Viral sequences in the vesicle metagenomic DNA libraries were highest in the upper euphotic zone and were dominated by both Podoviruses (7%) and Myoviruses (6%).
Virus fraction metagenomic libraries were, on average, dominated by annotated sequences derived from viruses (Figure 1), consistent with previous reports of this D-DNA fraction based on transmission electron micrographs and epifluorescence analyses (29). The viral metagenomic libraries had the lowest number of recovered sequences, many of which were novel and unannotated (61-73% unannotated across all samples). The number of recovered genes ranged from 13,080,771 to 15,551,348 and averaged 13,822,361 (±1,071,408) (Table S1). For all samples, viral family-level annotated sequences were nearly evenly split between Myoviruses (23-38%) and Podoviruses (19-32%), with minimal contributions from Siphoviruses (4-7%). Within the viral libraries, Prochlorococcus phages, other cyanophages, and Pelagibacter phages were the most abundant (Figure S1). Synechococcus and Prochlorococcus phages dominated the euphotic samples (75-125 m). At the deep chlorophyll maximum, Synechococcus phage and cyanophage sequences in the virus fraction metagenomes peaked. In mesopelagic samples, Pelagibacter phages and Vibrio phages increased in proportion in the virus fraction. These depth distributions are consistent with both cellular host and virus abundances previously reported at Station ALOHA (6, 31, 32).
Overall, the DNA-based metagenomic libraries developed from the three dissolved DNA pools were distinct with respect to their microbial DNA compositions. The vesicle fraction was dominated by a single taxonomic family (Pelagibacteraceae) across all depths, the viral fraction was dominated by bacteriophages, and the exocellular F-DNA pool contained both bacterial- and viral-derived DNA. While DNA from the former two pools has been previously described by metagenomic analyses, the composition of DNA in the F-DNA fraction has not been previously reported.
B. Depth of origin of D-DNA throughout the water column
To infer the depths of origin of different D-DNA fractions, we mapped DNA sequences against a depth-resolved microbial gene catalogue from Station ALOHA (Figure 2; 6, 25, 31). The objective was to determine whether genes from the D-DNA fractions matched Station ALOHA genes recovered from the same sampling depths as the D-DNA, or whether the D-DNA was potentially transported from other depths.
The viral samples were most similar to Station ALOHA annotated genes from the depths at which the samples were collected. This was particularly evident in the mesopelagic zone samples collected from 250-500 m (Figure 2). In these samples, 25% of the genes were derived from their respective collection depths, with only 13% from the euphotic zone (5-200 m). Viral samples collected from the DCM had high contributions from typical DCM depths (100-175 m; 29-36%), as well as from the neighboring upper euphotic (5-75 m; 15%) and upper mesopelagic zones (200-250 m; 13%), with only 1% from the lower mesopelagic zone (500-1000 m). Of the three D-DNA fractions, the viral samples had the highest average proportion of genes with unknown depths (43% ±4.5%).
In contrast to the viral samples, the vesicle and F-DNA samples appeared to contain both autochthonous and allochthonous DNA (Figure 2). In the euphotic zone samples (75-125 m), as expected, sequences were dominated (>50%) by surface-derived DNA (5-200 m), with minimal mesopelagic zone contributions (<10%). However, in mesopelagic zone vesicle and F-DNA samples (250-1000 m), genes originated primarily from the upper euphotic zone (5-75 m; >30%) and, to a lesser extent (<20%), from the depth at which they were collected. Of these mesopelagic zone samples, the shallowest F-DNA sample (250 m) had the most depth-diverse genes, originating from depths throughout the euphotic and mesopelagic zones (5-500 m).
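The depth-of-origin percentages reported above can be illustrated with a minimal sketch. It assumes that each D-DNA gene has already been assigned the depth of its best-matching catalogue gene (or None when no depth could be assigned); that assignment rule, the function names, and the example values are hypothetical and are not taken from the study. The zone boundaries are those named in the text, treated here as non-overlapping bins.

```python
from collections import Counter

# Depth zones (m) as named in the text; treating them as non-overlapping
# bins is an assumption of this sketch.
DEPTH_ZONES = [
    ("upper euphotic", 5, 75),
    ("DCM", 100, 175),
    ("upper mesopelagic", 200, 250),
    ("lower mesopelagic", 500, 1000),
]


def zone_for_depth(depth_m):
    """Return the zone name for a catalogue depth, or 'unknown' if unassigned."""
    if depth_m is None:
        return "unknown"
    for name, shallow, deep in DEPTH_ZONES:
        if shallow <= depth_m <= deep:
            return name
    return "unknown"


def depth_of_origin_profile(best_hit_depths):
    """Percentage of genes per zone, given one best-hit depth (or None) per gene."""
    counts = Counter(zone_for_depth(d) for d in best_hit_depths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {zone: 100.0 * n / total for zone, n in counts.items()}


# Hypothetical best-hit depths (m) for eight genes from a single sample.
example_hits = [25, 45, 75, 125, 250, 500, None, None]
profile = depth_of_origin_profile(example_hits)
```

Comparing such a profile with the sample's own collection depth is one way to separate locally derived (autochthonous) from transported (allochthonous) DNA.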
C. Size distributions of environmental F-DNA through the water column
The size spectra of recovered F-DNA were measured by capillary electrophoresis, following density gradient separation and buffer exchange. Seven F-DNA samples collected throughout the euphotic and mesopelagic zones (5-1000 m) were measured to assess sample degradation and molecular weight distributions prior to sequencing (Figure 3). Samples collected in the upper euphotic zone (5-100 m) had a distinct peak (<5,000 bp peak width) of high molecular weight F-DNA (HMW; defined here as >1,000 bp) and a lower proportion of low molecular weight (LMW; <1,000 bp) F-DNA (24-38%) than mesopelagic zone (250-1000 m) samples (33-65%). In these upper euphotic zone samples, a high proportion of F-DNA was 1,000-40,000 bp (62-73%), whereas in mesopelagic zone samples this HMW F-DNA tended to be lower (48% average; range 35-66%). Lower euphotic (125 m) and mesopelagic zone samples (250-1000 m) tended to have a broader range of F-DNA sizes, suggesting that this DNA may have been degraded. In mesopelagic zone samples (250-1000 m), the HMW F-DNA decreased, the maximum HMW peaks were less distinct (>10,000 bp peak width), and there was more F-DNA between peaks. In samples from 5-100 m, <25% of the DNA was <350 bp. At 1000 m the peaks were not pronounced, suggesting a notable level of degradation in this deep sample.
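The LMW and HMW proportions above are shares of total signal on either side of the 1,000 bp cutoff. The sketch below shows that calculation under stated assumptions: the electropherogram is represented as paired lists of fragment sizes and signal values, fragments of exactly 1,000 bp are counted as HMW, and the function name and example values are hypothetical rather than instrument output from the study.

```python
def size_class_fractions(sizes_bp, signal, cutoff_bp=1000):
    """Return (LMW %, HMW %) of total signal, splitting at cutoff_bp.

    Fragments shorter than cutoff_bp count as LMW; the rest count as HMW
    (placing exactly cutoff_bp in HMW is an assumption of this sketch).
    """
    if len(sizes_bp) != len(signal):
        raise ValueError("sizes_bp and signal must have the same length")
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")
    lmw = sum(s for size, s in zip(sizes_bp, signal) if size < cutoff_bp)
    lmw_pct = 100.0 * lmw / total
    return lmw_pct, 100.0 - lmw_pct


# Hypothetical size bins (bp) and their signal, in arbitrary units.
sizes = [100, 350, 800, 2000, 10000, 40000]
signal = [5, 10, 15, 20, 40, 10]
lmw_pct, hmw_pct = size_class_fractions(sizes, signal)
```

The same function with cutoff_bp=350 would give the share of very short fragments.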
D. Comparing vesicle, viral, and F-DNA fractions by non-metric multidimensional scaling
To compare all dissolved DNA fractions (vesicles, viruses, and F-DNA) with each other and with previously reported Station ALOHA viral and P-DNA metagenomic sequences, two-dimensional ordination methods were employed. Dissolved DNA samples and Station ALOHA gene catalogue communities (0.02 µm filtered “viral” and 0.2 µm filtered “cellular” communities; Figure S2a and b, respectively) were compared by Bray-Curtis dissimilarity-based non-metric multidimensional scaling (NMDS) of family-level annotated metagenomes. This comparison confirmed that the viral D-DNA samples collected in this study were similar in composition to previously characterized Station ALOHA virioplankton communities recovered from the same respective depths (Figure S2a, stress = 0.11). As for F-DNA sequences, upper euphotic zone samples (75-100 m) clustered with their respective Station ALOHA viral samples, whereas lower euphotic (125 m) and mesopelagic zone (250-1000 m) samples clustered together and not with their respective sample depths. Similarly, vesicle D-DNA did not cluster with any Station ALOHA viral samples.
The same D-DNA sequences were compared to the cellular microbial community genes using NMDS (stress = 0.07; Figure S2b). Two distinct cellular communities emerged from this analysis: euphotic (75-125 m) and mesopelagic (250-1000 m) Station ALOHA samples each clustered together, generally consistent with previous reports (6). Viral D-DNA did not cluster with any Station ALOHA cellular communities. Similarly, F-DNA fractions did not cluster with their respective Station ALOHA depths. Deep F-DNA samples (500 and 1000 m) clustered with cellular communities filtered from 75 m, suggesting a potential cellular origin of this F-DNA. Upper mesopelagic (250 m) F-DNA clustered nearest to cellular metagenomes collected from 125 m. Surface (75 m) vesicle D-DNA clustered with euphotic zone cellular communities, whereas lower euphotic (125 m) and mesopelagic (500 m) vesicle samples clustered with mesopelagic Station ALOHA cellular samples. D-DNA samples that had high proportions of viral sequences (Figure 1) clustered together (Figure S2b; viral D-DNA, 100 m F-DNA, and total dissolved DNA from 100 and 250 m).
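The Bray-Curtis dissimilarity underlying these ordinations can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function names, sample labels, and family-level counts are hypothetical. For two samples with family abundances a_i and b_i, the dissimilarity is the sum of |a_i - b_i| divided by the sum of (a_i + b_i), so 0 means identical composition and 1 means no shared families.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {family: abundance} profiles."""
    families = set(a) | set(b)
    numerator = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in families)
    denominator = sum(a.get(f, 0) + b.get(f, 0) for f in families)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis matrix for a {sample name: profile} mapping."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[r], samples[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical family-level counts for two samples (illustrative values only).
samples = {
    "sample_A": {"Pelagibacteraceae": 60, "Myoviridae": 30, "Prochlorococcus": 10},
    "sample_B": {"Pelagibacteraceae": 20, "Myoviridae": 30, "Podoviridae": 50},
}
names, matrix = dissimilarity_matrix(samples)
```

A matrix of this form is the input to an NMDS routine, which places samples in two dimensions so that the rank order of distances matches the rank order of dissimilarities as closely as possible; the reported stress values measure the remaining mismatch.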