Sporophyll sample collection and In-Situ experimental setup
Sporophylls (i.e., reproductive tissue) of M. pyrifera sporophytes containing zoospores, the progenitors of gametophytes, were collected from four natural spanning regions in southern California (Arroyo Quemada 34. 468783º N -120.121417 º W, Leo Carrillo 34.042933º N -118.934500º W, Catalina Island 33.446882º N, -118.485067 º W, and Camp Pendleton 33.29091º N -117.499969º W).
Genetic groups were first distinguished in these regions in 2015 and are characterized by their rates of genetic divergence (Johannson et al. 2015; Osborne et al. 2023). The northernmost population, Arroyo Quemada, experiences moderate ocean temperatures, with a winter average of 14°C, and a summer average sea surface temperature of 17°C. At Leo Carrillo, summer temperatures range from 16–20°C, while winter temperatures average between 12–16°C. The central population at the Channel Islands is shifting to higher temperatures, with summer averages around 21°C and winter averages approximately 13°C. Lastly, the temperature range is the most extensive at Camp Pendleton, the southernmost population. Here, summer temperatures peak at about 24–28°C, and in winter they drop to around 8–14°C (JPL MUR MEaSUREs Project, 2015). These patterns present a notable, albeit non-linear, sea surface temperature gradient from north to south across these marine environments.
While detailed descriptions of the 2019 farm design can be found in Osborne et al. (2023), the main features will be briefly summarized here. Gametophyte cell cultures were fragmented and
seeded on polyvinyl strings exposed to increasing white light levels over four days, with the resulting crosses growing for a month until sporophytes developed and were shipped to a marine laboratory at the University of California, Santa Barbara. Prior to out-planting, the sporophyte strings were attached to a seeding line, and ten seedling lines with 250 genotypes each were fastened to longlines by divers, with one set of 500 genotypes out-planted to two adjacent longlines on the farm in May 2019. Ocean temperatures (Fig. 2.) at the experimental farm increased throughout the summer following a brief period of upwelling in May with temperature peaking in August, which is typically the warmest month of the year. Temperature data was collected via 5 HOBO Pendant Temperature and Light Loggers (model # UA-002-64), arrayed across alternating lines and recording data at 10-minute intervals. Daily averaged temperatures were recorded as 15.5 ± 1.6 ºC (mean ± SD, range = 9.9–21.8 ºC). Approximately 125 days after out-planting, the 1,686 surviving sporophytes were harvested and returned to the laboratory, where they were weighed wet. The wet mass of the 491 (out of 500) surviving genotypes (averaged across replicates) ranged from 5 g to 467.8 g.
Gametophyte isolation and DNA Extraction
Fertile sporophylls were shipped overnight to the University of Wisconsin-Milwaukee where we induced spore release via the Oppliger method approximately 24 hrs. post-collection (Osborne et al. 2023; Oppliger et al. 2011). Sporophylls were first cleaned by being dipped into solutions for no more than 30 seconds to remove excess epiphytes and bacteria. Zoospore was induced by submerging whole sporophylls into sterile Provasali-Enriched Seawater (PES) (Provasali 1968) made with Instant Ocean Sea Salt and ultrapure water (Symplicity water system) at a salinity of 34 parts per thousand (PPT) (Osborne et al. 2023). Zoospores were inoculated in 60 x 15 mm Petri dishes with 10 mL of PES. For each sporophyte donor, two final densities of 10 and 100 zoospores. mm− 2 that allows for single gametophyte isolation (Osborne et al. 2023). were incubated in growth chambers at 12º C with a 12:12 hour light:dark photoperiod at 6 ± 3 photons m− 2 s-1 red light. PES media was replaced every five weeks. Gametophytes were allowed to form from Zoospores into approximately 100 µm sizes before being sexed and isolated following standardized protocols published in Osborne et al. (2023). Before the extraction of DNA, individual gametophyte samples were mildly centrifuged, and the supernatant discarded, yielding 50–100 mg of gametophyte biomass. Gametophyte tissue was ground using liquid nitrogen. The NucleoSpin 96 Plant Kit (Macherey-Nagel) obtained high-quality genomic DNA from 500 female and 100 male gametophyte lines. There was no administration of antibiotic treatment before the extraction of DNA (Osborne et al. 2023).
Gametophyte Sequencing, SNP Calling and Preprocessing
Genomic DNA was extracted from gametophytes for whole-genome re-sequencing, using an Illumina S4 Novaseq platform to yield 11.2 GB of 150 base pair reads per sample, with a total of 559 samples sequenced. The raw reads were processed to remove adapters and low-quality sequences, then aligned to the M. pyrifera nuclear genome. Duplicate reads were marked, and genetic variants were called with a ploidy of 1. The resultant raw VCF file was filtered to extract biallelic SNPs, with further filtering based on scaffold presence, mean depth, and missing data percentage. The final dataset comprised 504 gametophytes and 1,515,399 biallelic SNPs. Missing data were imputed using LinkImpute, and the final imputed dataset, MAF0, contained no allele filtration. A second dataset, MAF5, was created by excluding SNPs with a minor allele frequency less than 0.05, resulting in 504 gametophytes and 602,493 SNPs. The full explanation of gametophyte sequencing, SNP calling and reprocessing methods can be found in the supplementary methods.
FI Normalization and HST Estimation
The extracted FI data was preprocessed and used to estimate HST for the screened 204 gametophytes. Given the fact that FI values reflect the size of gametophytes and that the tested gametophytes did not have the same initial size (at week 0), a normalization step was required. Each replicate had three FI values, FI.0, FI.2, and FI.4, for weeks 0, 2, and 4, respectively, for 5 different treatment temperatures (9,180 recorded FI values in total). These values were transformed for each replicate under a particular treatment as follows:
$$Norm.FI.w=\text{ln}\left(\frac{FI.w}{FI.0}\right), w\in \left\{\text{0,2},4\right\}.$$
Next, for each replicate, we used a linear regression with a zero intercept to model HST based on the normalized FI values:
where \(Norm.FI={\left(Norm.FI.0,Norm.FI.2,Norm.FI.4\right)}^{T}\) and \(Week={\left(\text{0,2},4\right)}^{T}\), respectively. To estimate HST, we used the R function lm() with the formula Norm.FI ~ Week + 0. The HST for a treatment temperature for a genotype was calculated as an average of the replicates’ HST values (1,020 values as a result; 5 HST values for a genotype).
Correlation Analysis
Upon merging the heat stress experiment data (165 female gametophytes) and the farm experiment data (491 parental female gametophytes), we obtained an intersection in 164 female gametophytes, with 145 having genotype information. These 145 genotyped gametophytes are therefore present in the MAF0 and MAF5 datasets. We graphed these 164 gametophytes in Fig. 3, placing the HST data on the x-axis (5 facets for the 5 treatment temperatures) and the sporophytes' biomass on the y-axis. Depicted regression lines and correlations with p-values were estimated and drawn using the geom_smooth() and stat_cor() functions in R (R Core Team 2023). The former belongs to the ggplot2 package (Wickham 2023) and was used with parameters method = "lm" and formula = "y ~ x", while the latter is from the ggpubr package (Kassambara 2023). To define the top 20 performing gametophytes, we scaled both the HST at 25° C and the biomass data (BM) to the range \([0, 1]\) using the following formulas:
$${HST}_{scaled}=\frac{HST-min\left(HST\right)}{max\left(HST\right)-min\left(HST\right)}, {BM}_{scaled}=\frac{BM-min\left(BM\right)}{max\left(BM\right)-min\left(BM\right)},$$
A Euclidian distance was calculated between \((1, 1)\) and \((HS{T}_{scaled}, {BM}_{scaled})\) for each gametophyte. The 20 gametophytes closest to the point \((1, 1)\) were then selected based on these distances. Among these 20 gametophytes, 18 had genotypes, which we emphasized in Fig. 3 by different colors. The population distribution was the following: 4 from Arroyo Quemado, 6 from Catalina Island, and 8 from Camp Pendleton.
Genetic Diversity Analysis
To assess how genetically distant two gametophytes are, we applied the following formula:
$$GenDist\left({g}_{1},{g}_{2}\right)=1-IBS\left({g}_{1},{g}_{2}\right),$$
Where 𝑔1 and 𝑔2 are the genotypes of two gametophytes based on SNPs, and 𝐼𝐵𝑆(𝑔1, 𝑔2) is Identity-by-State between two genotypes showing the fraction of shared alleles. The proposed metric ranges between 0 and 1, with 0 representing identical genotypes and 1 being the opposite case. To estimate the genetic diversity of a set of gametophytes, we averaged the genetic distances of all unique gametophyte pairs in the set, i.e.:
$$GenDiv\left(S\right)=\frac{1}{\left|P\right|}\sum _{\left({g}_{1},{g}_{2}\right)\in P}GenDist\left({g}_{1},{g}_{2}\right),$$
where 𝑆 is the set of gametophytes, 𝑃 is the set of the unique pairs, and \(\left|P\right|\) is the number of the unique pairs.
We aimed to investigate whether the genetic diversity of our 18 top-performing gametophytes is abnormal. For that, we modeled a genetic diversity distribution based on 50,000 random sets of gametophytes. Each set had the same population distribution (4 from Arroyo Quemado, 6 from Catalina Island, and 8 from Camp Pendleton), and all sets were generated in a way that all population-specific subsets were not duplicated, meaning if we leave gametophytes of one population in the sets, all sets will keep being unique. Additionally, we estimated what the percentile of the modeled distribution is the genetic diversity of the top 20 performing gametophytes. This analysis was performed for both SNP datasets, MAF0 and MAF5.
Table 1
Top 20 performing female gametophytes based on HST and progeny’s biomass; AQ – Arroyo Quemado, CI – Catalina Island, CP – Camp Pendleton.
Female Gametophyte | Population | HST | Sporophyte's Biomass | Genotyped | Rank |
CP.46.F.A1 | CP | -0.087 | 467.80 | No | 1 |
CP.04.F.A3 | CP | -0.054 | 403.75 | Yes | 2 |
CP.70.F.D3 | CP | -0.047 | 378.75 | Yes | 3 |
AQ.59A.F.C1 | AQ | -0.069 | 352.67 | Yes | 4 |
AQ.62.F.A5 | AQ | -0.029 | 328.75 | Yes | 5 |
CI.17.F.C3 | CI | -0.104 | 344.80 | Yes | 6 |
AQ.28.F.A5 | AQ | -0.106 | 310.67 | Yes | 7 |
CP.45.F.D4 | CP | -0.080 | 300.25 | No | 8 |
CI.43.F.D3 | CI | -0.065 | 282.00 | Yes | 9 |
CP.73.F.B2 | CP | -0.172 | 300.00 | Yes | 10 |
CP.01.F.D3 | CP | -0.057 | 270.00 | Yes | 11 |
CP.67.F.C4 | CP | -0.062 | 266.25 | Yes | 12 |
CI.36.F.B4 | CI | 0.031 | 245.00 | Yes | 13 |
CP.05.F.D2 | CP | -0.116 | 259.00 | Yes | 14 |
CI.22.F.D2 | CI | -0.130 | 262.00 | Yes | 15 |
CI.01.F.A4 | CI | -0.032 | 241.67 | Yes | 16 |
CP.41.F.D1 | CP | -0.015 | 234.40 | Yes | 17 |
AQ.21.F.C1 | AQ | -0.315 | 328.40 | Yes | 18 |
CP.66.F.B1 | CP | -0.074 | 230.40 | Yes | 19 |
CI.58.F.C2 | CI | -0.394 | 386.75 | Yes | 20 |
Supplementary Methods
Extracted gametophyte genomic DNA was prepared for DNA library preparation (BGI North America NGS labs) and whole-genome re-sequencing. Post-library preparation, an Illumina S4 Novaseq platform was used to sequence the samples, yielding 11.2 GB of 150 base pair reads for each sample. A total of 559 samples underwent sequencing. Raw 150 base pair Illumina reads were trimmed of adapters, low-quality reads, and tails using a fast set with standard parameters (Chen et al. 2018).
Trimmed reads were aligned to the nuclear genome of M. pyrifera (Diesel et al. 2023) using hisat2 v2.1 with standard parameters (Kim et al. 2019). Bam files had their duplicates marked using the GATK4 v4.1.2 command “MarkDuplicates”, and then multiple bam files for a single individual genotype were collapsed into a single bam file using samtools (Van der Auwera et al. 2013; Li et al. 2009). Genetic variants were called using the GATK4 v4.1.2 with ploidy set to 1 (Van der Auwera et al. 2013). Individual GVCF files were then merged and converted into a raw VCF file containing variant information and used for downstream applications using GATK v4.1.2 (Van der Auwera et al. 2013).
The obtained raw VCF file was utilized for the extraction of biallelic SNPs. Then, we performed hard filtering with the thresholds suggested by GATK for filtering germline short variants. Next, we kept SNPs present in scaffolds and had a mean depth between 2 and 10 and a percentage of missing data of less than 7.5%. Finally, gametophytes with a mean depth of less than 2 were excluded from the final dataset. Additionally, we removed all monomorphic SNPs generated after removing the gametophytes. The filtered VCF file was comprised of 504 gametophytes and 1,515,399 biallelic SNPs. All extractions were performed using vcftools v0.1.14 (Danecek et al. 2011) and R (R Core Team 2023).
To impute missing data, we developed the following pipeline. We recorded the filtered VCF file into the PLINK binary format (*.bed, *.bim, and *.fam) and divided data based on scaffolds. Each scaffold was imputed 100 times using LinkImpute v1.1.5 (Money et al. 2015) with the parameter –nummask = 500000 if the number of non-missing entries was more than 500000; otherwise, we used the number of non-missing entries itself as the parameter value. Between two alleles, the final imputed one is that have a bigger score (the sum of two scores equals 100). Then, the imputed scaffolds were merged back. All manipulations with SNP data at this stage, except the imputation, were performed using PLINK v1.9 (Chang et al. 2015)d
The resulting dataset was MAF0 (imputed; 504 gametophytes and 1,515,399 SNPs), indicating no allele filtration. Additionally, we prepared a dataset with filtrated alleles based on minor allele frequency. SNPs with minor allele frequency less than 0.05 were excluded resulting in 504 gametophytes and 602,493 SNPs. We marked this dataset as MAF5.