Two resistant and two susceptible cotton cultivars were chosen as parents for crossing: Gossypium hirsutum cultivars, Zhongzhimain2 (Z2, resistant against V. dahliae) crossed with Jimian11 (J11, susceptible), and LocalVIR875-1 (L1, susceptible) with Zhongmiansuo49 (Z49, resistant) (Fig. 1A). F2 segregating populations were obtained from the two crosses and used to study heritability of rhizosphere microbiome.
Site description and field experiment design
In 30 April 2019, a field experiment was set up at the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (Anyang, China) (Fig. 1C). The field, was used for evaluating cotton cultivar resistance against V. dahliae (disease nursery), were artificially inoculated with the pathogen 20 years ago. The soil at the experimental site (36˚03′44″N, 114˚28′52″E) is classified as cambisol type soil . A completely randomized block design with four blocks was used. Twenty-five F2 plants from each cross were randomly allocated to block. Within each block, there were six plots, each was 5 m long with two rows (0.8 m between two rows); neighbouring plots were separated by 0.8 m; each of the six plots was randomly assigned to one of the six groups of plants: four parents and two F2 populations. There were 100 F2 plants for each cross. On 22 August 2019, wilt severity of all individual plants was recorded on a scale of 0 to 4 as described previously . For cv. Z2, there were 68, 19, 11, 2 and 0 plants with severity score of 0, 1, 2, 3 and 4, respectively; the corresponding values for cv. J11 were 8, 8, 22, 51 and 11 plants, for cv. L1 were 15, 13, 25, 39 and 8 plants, and for cv. Z49 were 52, 29, 15, 4 and 0 plants (Table S1). For Z2×J11 cross, there were 23, 27, 28, 17 and 5 plants with severity score of 0, 1, 2, 3 and 4, respectively; the corresponding values for L1×Z49 cross were 31, 34, 19, 14 and 2 plants (Fig. 1B; Table S1).
Rhizosphere sample collection
During late August (at the boll-forming stage) (Fig. 1C), approximately 16 weeks after sowing, rhizosphere soil samples were collected and stored as described previously . For F2, all plants were sampled; for each of the four parents, only four plants were sampled from each block, giving 16 plants per parental genotype. Each plant was carefully removed from the soil with a spade. Root system of the plant were first vigorously shaken to remove loosely adhering soil particles. Plant fine root were cut into pieces of approximately 2 cm length with a pair of sterile scissors. Rhizosphere soil samples were harvested in 500-ml screw-cap bottles with ca. 20 g roots. Each bottle was filled up to 300 ml with 1:50 TE buffer (1 M Tris, 500 mM EDTA, and 1.2% Triton diluted in sterile distilled water) and shaken at 270 rpm for 1 h. The root-washing suspension was filtered with sterile cheesecloth and centrifuged at 4,000 ×g for 20 min . The supernatant was discarded by pipetting. This procedure was repeated three times before the pellets were re-suspended in the remaining solution, transferred to a 2-ml Eppendorf tube and centrifuged at 14,000 ×g for 20 min. The pellets were immediately frozen and stored at -80˚C before DNA extraction.
DNA extraction and sequencing
Soil pellets were re-suspended in 500 μl MoBio PowerSoil bead solution, and DNA was extracted from the resulting pellets (250 mg) using the MoBio PowerSoil DNA Isolation Kit (MoBio Laboratories, Carlsbad, CA, USA) according to the manufacturer’s protocol. The extracts were checked on a 1% agarose gel and DNA concentration was estimated by NanoDrop ND-2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). DNA was stored at -80°C until further analysis.
For bacteria, the V3-V4 hypervariable region of the 16S rRNA gene was amplified in triplicates for each sample using the 341F/805R primers . For fungi, primers ITS5/ITS2  were used to amplify the ITS1 region in triplicates for each sample. PCR reactions and the extraction and purification of amplicons followed previously established methods . Sequencing libraries were generated with the Ion Plus Fragment Library Kit 48 rxns (Thermo Scientific, USA) following the manufacturer's recommendations. The quality of each library was assessed on a Qubit 2.0 Fluorometer (Life Technologies, USA). Finally, total DNAs were submitted to Novogene Co. Ltd. (Beijing, China) for Illumina sequencing, the libraries were sequenced on an Ion S5™ XL platform (Thermo Fisher Scientific, Waltham, MA) to generate single-end reads. In total 528 libraries were sequenced: 264 rhizosphere samples (2 crosses × 100 F2 plants + 4 parents × 16 replicates) each for 16S rRNA gene and ITS rRNA gene.
Sequencing processing and taxonomy assignment
The clear reads from the service provider were used to generate OTUs and assign taxonomy, following an established pipeline . Briefly, high-quality sequences were obtained by quality control and filtering of sequence quality with very stringent criteria followed our previous publication  and was carried out separately for the two type of data sets (16S and ITS). High quality sequences were first dereplicated and unique sequences with only 1 read were discarded. Then all unique sequence reads were sorted by their respective frequencies and then grouped into operational taxonomic units (OTUs) based on 97% or greater identity. All OTU were processed using the UPARSE pipeline (Version 10.0)  unless specified otherwise. The clustering algorithm also removed chimeras. The SINTAX algorithm (https://www.drive5.com/usearch/manual/sintax_algo.html) then assigned each OTU representative sequence to taxonomic ranks by alignment with the gene sequences against Unite V7 fungal database  and RDP training set (v16) bacterial database . Then an OTU table (a sample-by-observation contingency table) was generated by aligning all sequences filtered with far less stringent criteria with the OTU representative sequences as described by Deakin et al. .
Statistical data analysis
The median-of-ratios method implemented in DESeq2 [31,32] was used to normalise the OTU counts before any statistical analysis. All statistical analyses were carried out in R 4.0.3. There were six groups of samples, classified into two hierarchical levels: two crosses (Z2×J11 and L1×Z49), and three groups within each cross (two parents, each with 16 replicates, and the 100 F2 progeny plants).
Alpha (a) diversity (Shannon and Simpson) indices were calculated using the R vegan 2.3-1 package . The rank of a diversity indices were subjected to ANOVA to assess the differences between two crosses and between three groups within each cross via a permutation test for significance. Beta (b) diversity indices were not calculated since the main objective of this study was to compare the variance among the 100 progeny samples with the estimated residual variance for estimation of heritability.
To determine whether there is significant genetic variability among the 100 F2 progeny from each cross, a F-test was carried out to compare two variance estimates: variance [VT] among the 100 progeny samples, and the random variance [VE] measuring random environmental errors experienced in the experiment. This random variance was estimated as the residual variance of the ANOVA of the four parents, each with 16 replicates. If the F test showed that VT is significantly greater than VE, then the genetic variance [VG] component was estimated as (VT – VE). Consequently, broad sense heritability was estimated as VG/VT.
Two datasets were used for analysis of genetic components: principal component (PC) scores and the normalised data of those OTUs with highest counts. To generate PC scores, only those OTUs with highest counts accounting for 99.99% of the total normalised counts were retained for PC analysis (PCA). Before PCA, the normalised counts data were first logarithm transformed on the natural base and then standardised. The 100 OTUs with the highest counts data were selected for genetic component analysis. For each PC or OTU, VE was estimated from the parent samples and VT from the 100 progeny samples. Similarly, two single degree contrasts were used to test whether the two parents of each cross differ significantly. Within the analysis of each data type (PC or OTU), the Benjamini-Hochberg (BH) adjustment was used to correct for the false discovery rate associated with the multiple testing. Statistical significance was determined at the 5% level (BH adjusted). BLASTn searches against the GenBank non-redundant database was then used to further characterise those OTUs with significant genetic components.