Sample collection
Sample collection occurred at the MiCRO time series at Newport Pier in Newport Beach, California, USA (33.608°N, 117.928°W) between September 2009 and 2021 at daily to monthly (weekly, on average) temporal resolution. Sampling protocols have been previously described21,23,27. Four autoclaved bottles were rinsed with nearshore surface water prior to collection, and samples were immediately transported to the lab. A summary of all metagenomic samples is listed in Supplementary Table 1.
A total of 267 DNA samples (with replicates) were collected by filtering 1 L of seawater through a 2.7 μm GF/D prefilter and a 0.22 μm polyethersulfone Sterivex filter (Millipore, Darmstadt, Germany) using sterilized tubing and a Masterflex peristaltic pump (Cole-Parmer, Vernon Hills, IL). DNA was preserved with 1,620 μl of lysis buffer (23.4 mg/ml NaCl, 257 mg/ml sucrose, 50 mM Tris-HCl, 20 mM EDTA) and stored at −20 °C before extraction.
Particulate organic matter (POM) was collected using two autoclaved bottles from Newport Pier. Particulate organic carbon/particulate organic nitrogen (POC/PON) and particulate organic phosphorus (POP) were each collected by filtering 300 ml of seawater through pre-combusted (500 °C, 5 hours) 25 mm GF/F filters (Whatman, MA). POC and PON were collected on the same filter. Six replicates (triplicates from each bottle) were collected for both POC/PON and POP. After filtration, samples were placed in petri dishes and stored in a −20 °C freezer.
Nutrient samples were collected from the filtrate of the particulate organic matter (POM) sampling. Sample water initially passed through a 25 mm GF/F filter with a nominal pore size of 0.7 µm (used for POM) and was re-filtered through a 0.2 μm syringe filter into a prewashed 50 ml tube. Filtrate was then stored in a −20 °C freezer prior to determination of nitrate and phosphate (as soluble reactive phosphorus, SRP) concentrations.
Shore-station measurements
Temperature and chlorophyll were recorded by the Southern California Coastal Ocean Observing System (SCCOOS) automated shore station (Newport Pier) located at the sampling site. Sensors include a Seabird SBE 16plus SeaCAT conductivity, temperature, and pressure recorder and a WetLabs WetSTAR chlorophyll sensor. Measurements were taken at 4 min resolution, but we transformed the data into daily averages to align with other environmental measurements. The El Niño-Southern Oscillation (ENSO) state was tracked using the Oceanic Niño Index v5 (ONI) from the National Weather Service Climate Prediction Center.
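The reduction of 4-min sensor records to daily averages is a simple aggregation; a minimal sketch (pure Python; the `daily_means` helper, timestamps, and values are illustrative, not the SCCOOS processing code):

```python
from collections import defaultdict
from statistics import mean

def daily_means(records):
    """Collapse high-frequency (timestamp, value) records into daily means.

    `records` holds (ISO-8601 timestamp, value) pairs; missing readings
    (None) are skipped. Days are keyed by their "YYYY-MM-DD" prefix.
    """
    by_day = defaultdict(list)
    for stamp, value in records:
        if value is None:
            continue
        by_day[stamp[:10]].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}

# Illustrative 4-min readings spanning two days
records = [
    ("2015-06-01T00:00:00", 18.0),
    ("2015-06-01T00:04:00", 20.0),
    ("2015-06-02T00:00:00", None),   # flagged/missing reading
    ("2015-06-02T00:04:00", 17.5),
]
```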
Nitrate measurements
From 2011 to 2018, nitrate samples were thawed in a refrigerator overnight. 50 ml standards were created from an artificial seawater and potassium nitrate solution (10 µM). Standards ranged from 0 µM to 10 µM potassium nitrate across 10 vials. 500 µl of ammonium chloride solution (4.7 M) was added to each 50 ml sample and standard and mixed. Each sample was poured into a column of copperized cadmium filings27,55, and 15 ml was used to flush the system. 25 ml was then collected from the column, and 500 µl of sulfanilamide solution was added, mixed, and allowed to react for 6 minutes. 500 µl of N-(1-naphthyl)-ethylenediamine dihydrochloride solution was then added to the tube, mixed, and allowed to react for 20 minutes. The sample was read on a spectrophotometer set to a wavelength of 543 nm55. From 2018 to 2019, samples were processed using a QuickChem FIA 8500 autoanalyzer (Lachat Instruments, Loveland, Colorado, USA), with a detection limit of 0.014 µmol.
Beginning in 2019, samples were processed with spongy cadmium. Spongy cadmium was created with a cadmium sulfate solution (10 g CdSO4 in 50 ml DI water). Zinc sticks were added to the solution and allowed to sit for 8 hours. After sitting, the zinc sticks were washed with 6 N HCl, several drops of 6 N HCl were added to the CdSO4 solution, and the solution was drained. Precipitated cadmium was covered with 6 N HCl and stirred to break up the cadmium. The cadmium was then drained, rinsed 10 times with DI water, and used in this state for the protocol. 5 ml of sample was placed in acid-washed borosilicate glass tubes, and 1 ml of NH4Cl solution (0.7 M) was added to each. 0.2 g of spongy cadmium was added to each tube, which was then capped and laid horizontally on a mechanical shaker table (100 excursions/minute) for 90 minutes. 5 ml of sample was then pipetted out of the tube and placed in a disposable culture tube.
250 µl of color reagent (a mixture of 5 g sulfanilamide and 0.5 g N-(1-naphthyl)-ethylenediamine dihydrochloride in 50 ml phosphoric acid (8.51 M), diluted in 500 ml of nanopure water) was added, vortexed, and allowed to react for 10 minutes. The sample was then read using a spectrophotometer set to a wavelength of 540 nm. Samples were compared to a set of six standard solutions containing artificial seawater and diluted 100 µM KNO3 (standard KNO3 concentrations: 0 µM to 60 µM). Standards were treated the same as samples56.
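Both nitrate protocols quantify samples against a standard curve; the absorbance-to-concentration step amounts to an ordinary least-squares calibration, sketched below in pure Python (the helper names and example values are illustrative, not taken from the protocol):

```python
def fit_standard_curve(concentrations, absorbances):
    """Ordinary least-squares fit of absorbance against standard
    concentration; returns (slope, intercept)."""
    n = len(concentrations)
    mx = sum(concentrations) / n
    my = sum(absorbances) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    return slope, my - slope * mx

def absorbance_to_conc(absorbance, slope, intercept):
    """Invert the calibration line to estimate a sample concentration."""
    return (absorbance - intercept) / slope
```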
Phosphate measurements
SRP concentrations were determined using the magnesium-induced co-precipitation (MAGIC) protocol57,58. SRP samples were moved to the refrigerator overnight to thaw. Once thawed, 0.4 ml of 3 M NaOH was added to each tube and shaken vigorously. After 5 minutes, the samples were centrifuged at 1500 g for 20 minutes. Supernatant (P-free seawater) was poured from the sample tubes into a separate glass container (retained for standards), and the tubes were allowed to dry for an hour. 6 ml of 0.25 M HCl was added to all tubes and shaken to dissolve each pellet. 0.66 ml of arsenate reagent (2:2:1 parts sodium metabisulfite (0.74 M), sodium thiosulfate (88.5 mM), and sulfuric acid (3.5 N)) was added to each tube and allowed to react for 15 minutes in the dark. 0.7 ml of mixed reagent (2:5:1:2 parts ammonium molybdate tetrahydrate (24.3 mM), sulfuric acid (5 N), potassium antimonyl tartrate (4.1 mM), and ascorbic acid (0.3 M)) was added to the tubes and allowed to react for 30 minutes in the dark. Samples were then read on a spectrophotometer set to a wavelength of 885 nm, using a 0.125 M HCl solution as blank and rinsing agent. Ten standards were made using the P-free seawater collected from the samples, mixed with 1 mM KH2PO4 at different dilutions (2 nM to 1 µM KH2PO4). Standards were treated the same as the samples once made. From 2018 to 2019, samples were sent out to be processed on an auto-analyzer. These samples were also run in parallel with samples using the MAGIC protocol for comparison.
Particulate organic matter
POC/N samples were processed according to Sharp (1974)59. Filters for POC/N were dried at 55 °C for 24 hr and then wrapped with tin discs (CE Elantech). The wrapped filters were combusted in a FlashEA 1112 (Thermo-Scientific) using the NC Soils setup. The oxidation reactor was set to 900 °C, the reduction reactor was set to 680 °C, and the oven was set to 50 °C. Oxygen gas (UN 1072, Airgas) was injected at a flow rate of 250 ml/min, allowing sample combustion to occur at 1,800 °C. Helium gas (UN 1046, Airgas) was used as the carrier gas with the carrier flow rate set to 130 ml/min and the reference flow rate set to 100 ml/min. The compounds serving as standards were acetanilide (71.1% C, 10.4% N) and atropine (70.6% C, 4.8% N). The minimum detection limits for this analysis were 2.4 μg C and 3.0 μg N.
POP was analyzed using a modified ash-hydrolysis protocol58. Samples were removed from the freezer and placed in combusted scintillation vials, along with a set of K2HPO4 standards (1.0 mM P). 2 ml of MgSO4 (0.017 M) was added to each vial, which was covered in tin foil and placed in an 80 °C oven overnight. The vials were then moved to a 500 °C combustion oven for 2 hours. Once cooled, 5 ml of HCl (0.2 M) was added to each vial, which was returned to the 80 °C oven for 30 minutes. The solution in each vial was transferred to a 15 ml centrifuge tube. Each vial was rinsed with 5 ml of DI water, which was transferred to the respective centrifuge tube. 1 ml of mixed reagent [2:5:1:2 parts ammonium molybdate tetrahydrate (24.3 mM), sulfuric acid (5 N), potassium antimonyl tartrate (4.1 mM), and ascorbic acid (0.3 M)] was added to each centrifuge tube at 30 second intervals, and the tubes were moved to the dark. Tubes were centrifuged for 2 minutes to concentrate any glass fibers or debris at the bottom of the tube. After 30 minutes, samples were analyzed using a spectrophotometer set to a wavelength of 885 nm, with an HCl (0.1 M) solution as blank and rinsing agent between samples.
Environmental data analysis
Environmental data were collected at a higher sampling frequency than DNA. We included all data points to reduce uncertainty in monthly and annual estimates of variation. We fitted a linear model to all data points using 12 monthly and 12 annual (2011 - 2021) categorical factors. Analyses were performed in Matlab.
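For balanced sampling, fitting categorical month and year factors reduces to group-mean deviations from the grand mean. A minimal sketch of that decomposition follows (in Python rather than the Matlab used in the study; `seasonal_interannual_effects` and the toy observations are illustrative):

```python
from collections import defaultdict
from statistics import mean

def seasonal_interannual_effects(obs):
    """Decompose (month, year, value) observations into a grand mean plus
    monthly (seasonal) and yearly (interannual) effects.

    For balanced data, group-mean deviations from the grand mean coincide
    with the least-squares estimates of the categorical main effects.
    """
    grand = mean(v for _, _, v in obs)
    by_month, by_year = defaultdict(list), defaultdict(list)
    for m, y, v in obs:
        by_month[m].append(v)
        by_year[y].append(v)
    month_eff = {m: mean(vs) - grand for m, vs in by_month.items()}
    year_eff = {y: mean(vs) - grand for y, vs in by_year.items()}
    return grand, month_eff, year_eff
```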
DNA extraction
To extract DNA, we used an adapted protocol60. Sterivex filters were first incubated at 37 °C for 30 minutes with lysozyme (50 mg/ml final concentration). Next, Proteinase K (1 mg/ml) and 10% SDS buffer were added and incubated at 55 °C overnight. A solution of ice-cold isopropanol (100%) and sodium acetate (245 mg/ml, pH 5.2) was used to precipitate the released DNA, which was then pelleted via centrifugation and resuspended in TE buffer (10 mM Tris-HCl, 1 mM EDTA) in a 37 °C water bath for 30 min. DNA was purified using a genomic DNA Clean and Concentrator kit (Zymo Research Corp., Irvine, CA). Finally, extracted DNA was stored at −80 °C.
DNA sequencing
DNA concentration was assessed using a Qubit dsDNA HS assay kit and a Qubit fluorometer (ThermoFisher, Waltham, MA). Next, 30-60 ng of genomic DNA per sample was visually examined via agarose gel electrophoresis to check for degraded DNA. Finally, a Nanodrop ND-1000 (ThermoFisher, Waltham, MA) was used to assess sample purity and verify that A260/A280 values were between 1.6 and 2.0 and A260/A230 values were between 2.0 and 2.2. Frozen DNA was transported in 25-500 μl of 1xTE DNA suspension buffer per sample to the DOE Joint Genome Institute (JGI).
A total of 236 samples collected between 2011 and 2020 underwent successful short-read metagenomic sequencing at JGI. After QC, a total of 120 samples required additional size selection; thus, an input of 10.0 ng of genomic DNA was sheared to approximately 300 bp using the LE220-Plus Focused-ultrasonicator (Covaris) and size-selected with a double SPRI method using Mag-Bind Total Pure NGS beads (Omega Bio-tek). Using the KAPA HyperPrep kit's (Roche) one-tube chemistry of end-repair, A-tailing, and ligation with NEXTFLEX UDI barcodes (PerkinElmer), samples were enriched using 7 cycles of PCR. For all samples, the prepared libraries were quantified using KAPA Biosystems' next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. Sequencing was performed on an Illumina NovaSeq sequencer using NovaSeq XP V1.5 reagent kits and an S4 flow cell, following a 2x151 indexed run recipe. A total of 3.21 Tbp of short-read metagenomic data was produced at an average of 13.6 Gbp/sample, exceeding the target sequencing depth.
An additional 31 metagenomic libraries were sequenced from samples collected in 2021-2022 (total 0.259 Tbp, average 8.36 Gbp/sample); these data are available through the National Center for Biotechnology Information Sequence Read Archive (BioProject ID PRJNA624320). For sequence library preparation, please see the Supplementary Information.
Assembly and annotation
Short-read metagenomes were processed following the Joint Genome Institute’s (JGI) metagenomics workflows61. Raw paired-end sequences were quality controlled using rqcfilter2 from BBTools (v38.94)62. Specifically, ‘bbduk’ was used to remove adapters, trim reads where quality dropped to zero, and remove reads that contained 4 or more “N” bases, had an average quality score < 3, or were shorter than 51 bp. ‘bbduk’ was also used to remove artifactual homopolymer stretches of 5 or more G’s at the ends of reads. ‘bbmap’ was used to remove reads that matched host and common microbial contaminant sequences at 93% identity. Read error correction was performed using ‘bbcms’ from BBTools (v38.94) with a minimum count of 2 and a high-count fraction of 0.6. Corrected reads were then assembled using metaSPAdes (v3.15.0)63 with the ‘metagenome’ flag and kmer sizes of 33, 55, 77, 99, and 127. Contigs smaller than 200 bp were discarded. Reads were mapped back to the contigs to obtain coverage information using ‘bbmap’ from BBTools (v38.94) with ‘interleaved’ set to true, ‘ambiguous’ set to random, and the ‘covstats’ option specifying a contig coverage file. The assembled contigs were structurally annotated using tRNAscan-SE (v2.0)64, RFAM65, CRT-CLI (v1.8), Prodigal (v2.6.3)66, and GeneMarkS-2 (v1.07)67 as previously described61. Structural annotation results were merged to create a consensus structural annotation, which was then used for functional annotation. On average, 71.99% of quality-filtered reads (minimum: 22.67%, maximum: 100%) were assembled.
Functional annotations were predicted using Last (v983)68 and custom hidden Markov models implemented through HMMER (v3.1b2)69 and TMHMM (v2.0)70. Functional annotations were assigned using multiple protein family databases: KO71, EC72, COG73, TIGRFAM74, and Pfam75. Phylogenetic assignments were made based on the best Last hits of the protein-coding genes (CDSs). A consensus phylogenetic assignment for each contig was generated using a majority rule, whereby the contig was assigned the lineage at the lowest taxonomic rank to which at least 50% of its CDSs were assigned. JGI’s pipeline can be implemented using the National Microbiome Data Collaborative’s (NMDC) open-source online platform Empowering the Development of Genomics Expertise (EDGE)76.
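The 50% majority rule for contig taxonomy can be sketched as follows (a minimal illustration assuming lineages are represented as rank tuples; not the actual JGI implementation):

```python
from collections import Counter

def consensus_lineage(cds_lineages, threshold=0.5):
    """Assign a contig the deepest lineage prefix shared by at least
    `threshold` (here 50%) of its CDS-level lineage assignments.

    Each lineage is a tuple of ranks ordered from domain downward.
    """
    n = len(cds_lineages)
    consensus, depth = (), 1
    while True:
        # Count lineage prefixes of the current depth
        prefixes = Counter(lin[:depth] for lin in cds_lineages
                           if len(lin) >= depth)
        if not prefixes:
            break
        best, count = prefixes.most_common(1)[0]
        if count / n < threshold:
            break
        consensus, depth = best, depth + 1
    return consensus
```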
Community analysis
The microbiome time-series analysis was based on many of the recommendations from Coenen et al. (2020)77. The unit of biodiversity (akin to an OTU) is either taxonomic (at the family level) or unique protein families, using primarily KEGG Orthologs (KO) but also COG, Pfam, or TIGRFAM classifications. Occurrence was normalized to the summed total coverage for each sample. We excluded biodiversity units with a frequency below 5E-5, resulting in a low occurrence of absent units (i.e., zeros). Having only a few instances of zeros in the relative abundances allowed us to use linear models (e.g., PCA) rather than non-linear tools (like Bray-Curtis similarity and PCoA) when comparing samples or units of biodiversity.
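The normalization and rare-unit filter can be sketched as below (pure Python; filtering here on the mean relative frequency across samples, which is one reading of the 5E-5 cutoff, and the helper name and data are illustrative):

```python
def normalize_and_filter(coverage, min_freq=5e-5):
    """Convert per-sample coverages to relative frequencies, then drop
    biodiversity units whose mean frequency across samples falls below
    `min_freq`. `coverage` maps sample -> {unit: summed coverage}.
    """
    freqs = {}
    for sample, units in coverage.items():
        total = sum(units.values())
        freqs[sample] = {u: c / total for u, c in units.items()}
    all_units = {u for f in freqs.values() for u in f}
    keep = {u for u in all_units
            if sum(f.get(u, 0.0) for f in freqs.values()) / len(freqs)
            >= min_freq}
    return {s: {u: f.get(u, 0.0) for u in keep} for s, f in freqs.items()}
```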
We normalized each biodiversity unit to a mean of zero and a standard deviation of one (i.e., a z-score) when analyzing the temporal variation of individual taxa or genes. From this, we fitted models with monthly (12 levels representing seasonal variation) and yearly (12 levels representing interannual variation) categorical factors. A low-pass filter using ‘lowess’ (Matlab ‘smoothdata’ function) was applied when assessing temporal changes. There was strong autocorrelation, with temporally adjacent samples sharing 92.5% taxonomic and 98.7% functional similarity.
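The z-score normalization applied to each biodiversity unit is simply (sketched in Python; the study used Matlab, and population standard deviation is assumed here):

```python
from statistics import mean, pstdev

def zscore(series):
    """Rescale a time series to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(series), pstdev(series)
    return [(v - mu) / sigma for v in series]
```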
Average genome size
The average genome size was calculated according to the standard JGI IMG pipeline. It was estimated as NBases_assembly / NGenomes, where NGenomes is the median coverage of conserved single-copy core genes (139 marker genes with associated Pfam IDs), as described previously78.
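The estimate reduces to a one-line calculation (a sketch with made-up numbers; `average_genome_size` is a hypothetical helper, not the IMG pipeline code):

```python
from statistics import median

def average_genome_size(assembly_bp, marker_coverages):
    """NBases_assembly / NGenomes, with NGenomes taken as the median
    coverage of the conserved single-copy marker genes."""
    return assembly_bp / median(marker_coverages)
```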
Compositional variance analysis
We used PERMANOVA79,80 to quantify whole-community compositional variance associated with seasonal and interannual changes, using sampling month, year, and their interaction as factors. Community composition was estimated using taxonomic (family level) or functional (COG, KO, Pfam, and TIGRFAM) changes. We computed the Euclidean distance of the normalized frequency of each unit of biodiversity for the compositional matrix. We used ‘adonis’ from the ‘vegan’ package (v2.6-4) in R81,82 for this analysis.
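The Euclidean compositional distance matrix supplied to PERMANOVA can be sketched as follows (in Python rather than the R/‘vegan’ workflow used here; each profile is a per-sample vector of normalized unit frequencies):

```python
from math import dist

def distance_matrix(profiles):
    """Pairwise Euclidean distances between per-sample frequency vectors."""
    return [[dist(a, b) for b in profiles] for a in profiles]
```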
Multi-table co-inertia analysis
A joint principal component analysis of both taxonomic (family level) and functional (COG, KO, Pfam, and TIGRFAM) changes was performed as previously described83. In brief, we used multi-table co-inertia analysis from the ‘ade4’ package (v1.7-22) in R84,85. The first two principal components are presented from this analysis.
Annotation keyword model
A natural language processing model was trained on all gene annotations using the ‘spaCy’ Python package86. Subsequently, we calculated the time series of each keyword, weighted by the relative frequency of the associated genes. Next, we determined the seasonal oscillation and annual change of each keyword.
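The keyword weighting step can be sketched as below (illustrative annotations and frequencies; the spaCy tokenization is omitted, with a simple substring match standing in for keyword extraction):

```python
def keyword_timeseries(keyword, annotations, samples):
    """Weight a keyword in each sample by the summed relative frequency of
    the genes whose annotation text mentions it.

    `annotations` maps gene -> annotation text; `samples` is a
    time-ordered list of {gene: relative frequency} dicts.
    """
    genes = {g for g, text in annotations.items() if keyword in text.lower()}
    return [sum(freqs.get(g, 0.0) for g in genes) for freqs in samples]

# Illustrative data
annotations = {"g1": "Nitrate transporter", "g2": "ATP synthase subunit"}
samples = [{"g1": 0.2, "g2": 0.8}, {"g1": 0.5, "g2": 0.5}]
```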