Soil and plant sampling
Accessions of old wheat lines were provided by NordGen, Alnarp, Sweden (https://www.nordgen.org/en/). The seeds of 5 accessions of wheat ancestors (T. turgidum ssp. dicoccum (Tt), T. monococcum ssp. monococcum (Tm) and T. monococcum ssp. aegilopoides (Tm)), 5 accessions of landraces (T. aestivum ssp aestivum), one accession of T. aestivum ssp. spelta (Tas) and 4 modern wheat cultivars (T. aestivum ssp aestivum) were sown on a sandy clay loam soil in the autumn 2017 in 50 cm rows at Aarhus University, Flakkebjerg, Denmark (55.322631°N, 11.394336°E) (Table 1). For data analysis, all accessions were categorized as modern cultivars, landraces, Tas or wheat ancestors, Tt and Tm (Table 1).
For each cultivar, root samples including tightly attached soil, were collected destructively in quadruplicates at BBCH10 (emergence of first leaf), BBCH12 (first leaf fully unfolded) and BBCH21 (first side shoot visible) [42] during the autumn 2017 starting from October 31 (Table S1). Each root sample was a composite of four plants, which were randomly uprooted and carefully freed from the soil loosely attached soil.
Bulk soil samples were collected from the plough layer (0-20 cm) in between crop rows at each sampling point. In each sampling line (space between two crop rows) 10 soil sub-samples were collected, thoroughly mixed and from those two replicate composite bulk soil samples were collected (n=10). All samples were frozen at -80oC until further processing for DNA extraction.
Sample homogenization and DNA extraction
Soil and root samples were freeze-dried for 48h at -111oC and 0.0026 Pa using a CoolSafe CS110-4 Pro freeze dryer (LaboGene, Denmark) and finely crushed using zirconium oxide grinding balls (10mm) and a Fast & Fluid Management shaker-SK350 (Fast & Fluid, Sassenheim Netherlands). DNA was extracted from homogenized soil and root samples using the NucleoSpin® DNA 96 Soil Kit (Macherey-Nagel GmbH & Co KG, Düren, Germany) adapted to a Biomek® FCP Laboratory Automation Workstation (Beckman Coulter™, CA, USA). Four negative controls (500µL of PCR graded water) were included per extraction plate and used for amplicon library preparation together with the samples. Concentration of extracted DNA was measured using a Qubit 3.0 fluorometer (Thermo Fisher Scientific, MA, USA) with a Qubit dsDNA HS Assay Kit (range 0.2-100ng; Invitrogen, Life Technologies). DNA was then stored at -20oC.
Amplicon library preparation and sequencing
Extracted DNA was used to produce bacterial and fungal amplicon libraries (Table S2). The fungal ITS2 region was amplified using fITS7 and ITS4 primers (Ihrmark et al., 2012) and the bacterial 16S rRNA V3-V4 region was targeted using primers S-DBact-0341-b-S-17/S-D-Bact-0785-a-A-21 [43]. PCR was performed in a reaction mixture of 25 μl consisting of 1 × PCR reaction buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 1 μM of each primer, 1 U of GoTaq Flexi polymerase (Promega Corporation, Madison, USA) and 1µl of DNA template. PCR was conducted in a GeneAmp PCR System 9700 thermal cycler (Thermo Fisher Scientific) using 94 °C for 5 min, followed by 25 cycles at 94 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min, and a final elongation step at 72 ◦C for 10 min. For fungal amplicon library preparation, an annealing temperature of 57 °C was used as recommended [44].For dual indexing, primers including indexing tags were used in a PCR for 10 cycles, with the thermal cycler program as described above. In addition to dual indexing, internal barcodes of varying length were added to the forward primer for combining samples within each index combination as described earlier [45, 46]. After PCR, amplicon size was confirmed by visualization in a 1.5% agarose gel using SYBR staining, PCR products were pooled, precipitated and re-eluted as described earlier [47]. In order to remove primers and shorter reads, the pooled DNA was separated on a 1.5 % agarose gel and amplicons of the expected size (300–450 bp) were extracted using a QIAquick Gel Extraction Kit (Qiagen, Copenhagen, Denmark). The DNA concentration of the amplicon library was evaluated using a Qubit® Flurometer (Thermo Fisher Scientific). Amplicon libraries were shipped to Eurofins MWG (Ebersberg, Germany) for sequencing on an Illumina MiSeq platform using a dual indexing strategy.
Amplicon sequencing data analysis
Quality processing and bioinformatics of the reads obtained from Illumina MiSeq were analysed as described earlier [48]. Briefly, paired end reads were filtered with Phred quality scores > 30 and merged with an overlapping minimum read length of 30 base pairs using VSEARCH v. 2.6 [49]. Forward and reverse primers, as well as internal barcodes were trimmed and reads with less than 200 base pairs were excluded. Before clustering at 97% similarity, all reads were dereplicated and screened using VSEARCH. Taxonomy assignments for the clustered operational taxonomic units (OTUs) was done using SILVA v. 132 for bacteria and UNITE v. 8.0 for fungi in QIIME v 1.9, using assign_taxonomy.py [50, 51]. OTUs unassigned at kingdom level or assigned as chloroplast or mitochondrial sequences were removed from the datasets. Furthermore, samples with less than 2000 reads were excluded from downstream analysis.
Bacterial and fungal annotation tables were analyzed in R v.3.5.2 (R Core Team, 2017), using the RStudio development environment (RStudio Team) and making use of the ‘phyloseq’ package [52]. The OTU tables were transformed, by either rarefaction or relative abundance, before executing diversity-based calculations. Alpha diversity was estimated using observed OTU richness and Shannon diversity measures, using the mean value from rarified OTU tables generated 100 times at a sampling depth of 2000 reads per sample. Significant differences between diversity indices were determined using the Kruskal-Wallis rank sum test. Bray-Curtis distance matrices were used for beta diversity analysis on OTU level representing > 2000 reads per sample and transformed to relative abundance. Variance partitioning and significance for experimental factors were detected by PERMANOVA (R package: ‘vegan’). Dissimilarity between samples (based on Bray-Curtis distances) was visualized using NMDS plots.
Community assembly model
To study the importance of selection versus neutral processes in community assembly, a modified version [53] of the model described by Sloanet al. [25] was applied. This neutral community assembly model (NCM) predicts the frequencies with which taxa should occur in target communities based on their abundance in the source community (also referred to as metacommunity). In short, the model predicts that abundant taxa in the source community will be observed more frequently in the target communities due to increased immigration opportunities, while rare taxa will more likely be lost in the target community due to ecological drift. The model fit is determined by the coupled parameter Nt*m; where Nt is the size of the community (the average number of reads was considered as an estimation of the community size); m is the migration rate. The migration rate is the estimated probability that the random loss of an individual in the local community will be replaced by immigration from the source community. The fitting of the model parameters was performed in R where binomial proportion 95% confidence intervals (based on the Wilson method) around the model predictions were calculated using the ‘HMisc’ package [54]. In this study, root samples were considered as local communities, while the composition of the source community was inferred by averaging the composition of the bulk soil samples. Among the local communities, two separate groups were identified: roots communities associated with the modern opposed to old cultivars; these two analyses were performed for the fungal and bacterial communities separately. OTUs falling between the 95% confidence interval were considered to be present as a result of neutral dynamics of birth, death and immigration from the source (bulk soil); OTUs falling outside the confidence interval (and with the ratio between observed frequency and predicted frequency greater than 1.5 or lower than 0.5) were found with frequencies disproportionally higher or lower than predicted by the model based on their abundances in the soil and therefore considered to be present in the communities as a result of deterministic factors. Pearson Correlation between the frequency of the sequence variants estimated by the model and that of observed was used to determine the statistical significance of model fit. The analysis was performed considering only OTUs shared by the bulk soil community (source) and the root community (target). In this case, all sequence variants that were detected in the root, but not in the bulk soil, and vice versa were excluded, as it is routinely done [53, 55, 56].
Net relatedness index (NRI), an index of community phylogenetic structure, was used to study the phylogenetic community composition within each sample [57, 58]. NRI quantifies the grade of phylogenetic clustering of a given community compared to a stochastically assembled community composed of taxa from the regional pool. The NRI was calculated by comparing the observed mean pairwise phylogenetic distance (MPD) of OTUs in each sample with the MPD of a null distribution, obtained by randomising the OTUs of the regional pool and their relative abundances across the tips of phylogeny. Values of NRI greater than +2 are typical of communities where taxa appear more closely related than expected by chance (phylogenetic clustering); while values of NRI less than -2 indicate the presence of taxa more distantly related (phylogenetic over-dispersion) [59]. The function ‘ses.mpd’ in the package ‘picante’ was used to calculate the index, employing the first 6000 most abundant OTUs and using ‘taxa.labels’ as the reference null model [60].
Average NRI values were calculated for bacterial communities from bulk soil (SOURCE); from roots of old accessions (OLD) and modern cultivars (MODERN). This analysis was employed to study the relative contribution of stochastic and deterministic factors on the phylogenetic assembly in the three different environments. Parametric and non-parametric analyses were performed to ask whether the average NRI values were statistically different from zero and between the three environments.