A unique and reliable fecal DNA extraction method for 16S rRNA gene and shotgun metagenomic sequencing in the analysis of the human gut microbiome

Background: The gut microbiome is widely analyzed using high-throughput sequencing, such as 16S rRNA gene amplicon sequencing and shotgun metagenomic sequencing (SMS). DNA extraction is known to have a large impact on metagenomic analyses. The aim of this study was to select a single, best-performing DNA extraction protocol for both metagenomic sequencing methods. In that context, four commonly used DNA extraction methods were compared for the analysis of the gut microbiota. Commercial versions were evaluated against modified protocols using a stool preprocessing device (SPD, bioMérieux) in order to facilitate DNA extraction. Stool samples from nine healthy volunteers and nine patients with a Clostridium difficile infection were extracted with all protocols and sequenced with both metagenomic methods. Protocols were ranked using wet- and dry-lab criteria, including quality controls of the extracted genomic DNA, alpha-diversity, accuracy using a mock community of known composition and repeatability across technical replicates. Results: Independently of the sequencing method used, SPD significantly improved the efficiency of the four tested protocols compared with their commercial versions, in terms of extracted DNA quality, accuracy of the predicted composition of the microbiota (notably for Gram-positive bacteria), sample alpha-diversity, and experimental repeatability. The best overall performance was obtained for the S-DQ protocol, which combines SPD with the DNeasy PowerLyzer PowerSoil protocol from QIAGEN. Conclusion: Based on this evaluation, we recommend the S-DQ protocol to obtain standardized, high-quality extracted DNA in human gut microbiome studies.

The choice of the DNA extraction method has been demonstrated to strongly affect the detection of bacterial communities [35,36][46][47][48]. DNA extraction is a multi-step process, including sample weighing, sample homogenization, bacterial cell lysis, and DNA purification, and each step still requires improvements and guidelines. For instance, collecting the same amount of fecal material for all samples with the standard weighing procedure can be tedious and time-consuming. Sample homogenization could also have an impact on the bacteria that can be detected [49,50]. Surprisingly, few studies have reported the use of commercial devices to standardize the handling of fecal samples prior to DNA extraction [31,51,52]. Also, as the cell wall of Gram-positive bacteria is composed of a thick layer of peptidoglycan, bead-beating is now recommended to improve the lysis [53,54]. Nevertheless, beads can vary in size and material (e.g., ceramic, glass, zirconia or silica), which may also play a role in the lysis efficiency. Even if commercial solutions provide standardized methods for bacterial lysis and DNA purification, some laboratories still use in-house protocols, making it difficult to select one gold-standard protocol.
Recently, twenty-one DNA extraction protocols were compared in a multicentric study across three continents, using shotgun metagenomic sequencing (SMS) [35]. From the analysis of two healthy individuals, the authors proposed a QIAGEN protocol, named Q, as the standardized reference protocol for DNA extraction in human gut microbiome studies. However, it has been shown that different fecal samples may vary in terms of bacterial composition (Gram-positive vs Gram-negative cells) [35,55], microbial load (high vs low bacterial cells per fecal material) [56,57], disease-related clinical status (healthy vs sick individuals) [58][59][60] and stool consistency (separate hard lumps vs watery) [61][62][63]. A comparison study with a higher number of individuals, including both healthy and sick donors, would be of both clinical and technological interest to address the variability and heterogeneity of fecal samples.
Although SMS has the potential to deeply investigate microbial communities [64,65], amplicon sequencing targeting the 16S rRNA gene is often the preferred and the most cost-effective metagenomic method in the analysis of clinical cohorts [66,67]. Obviously, these sequencing methods have their own limitations and biases, which are important to consider for the selection of one DNA extraction protocol in human gut microbiome studies.
To address these considerations, our study evaluated four commercially available DNA extraction methods, using both SMS and 16S rRNA amplicon sequencing. These protocols were tested as recommended by the manufacturers, but also with an upstream stool preprocessing device (SPD), designed to facilitate DNA extraction [51]. The protocols were evaluated according to wet-lab as well as dry-lab criteria, using nine healthy individuals and nine Clostridium difficile-infected patients.

Stool samples.
Fecal samples from nine healthy volunteers and nine patients with Clostridium difficile infection (CDI) were provided by a certified testing laboratory in France and tested for Clostridium difficile toxins. Upon reception, each fecal sample was freshly aliquoted into 24 tubes (8 protocols × 3 replicates) and frozen at −80 °C until extraction, as storage at −80 °C is known to maintain a stable microbial community over long periods [68].
Microbial mock community.
The microbial mock community was prepared by mixing nine bacteria (  [36]) and the ZymoBIOMICS DNA Mini kit (#D4300, protocol 1.1.0, ZymoResearch). These protocols were also tested in combination with a stool preprocessing device (SPD, #421061, bioMérieux, [51]). This device was designed to facilitate and standardize fecal sample preparation before nucleic acid extraction. It includes a spoon for a 200 mg calibrated sample and a vial containing a buffer for sample resuspension, glass beads for homogenization and two filters for retaining fecal debris. After 5 minutes of hands-on time, the filtrate is ready to use for downstream DNA extraction. Protocols of extraction methods as well as SPD are detailed in Supplementary Methods. DNA was extracted in triplicate from fecal samples and from the microbial community. A260/A280 ratio was assessed using the DropSense 96 system (Trinean). Genomic DNA size was assessed using the Genomic DNA ScreenTape (#5067-5364, Agilent) on the 2200 TapeStation system (Agilent). DNA concentrations were estimated using the QuantiFluor One dsDNA kit (#E4870, Promega) with the GloMax system (Promega).
16S rRNA gene library preparation and sequencing.
16S rRNA gene libraries were prepared according to Illumina's protocol (# 15044223 RevB, [69]). In order to minimize the risk of cross-contamination and pipetting errors, the workflow was automated using a high-throughput liquid handler, the Freedom EVO NGS workstation (TECAN) [70]. Briefly, V3-V4 hypervariable regions were first amplified from 12.5 ng of genomic DNA with 2x KAPA HiFi HotStart ReadyMix (Kapa Biosystems), using the following primers: (i) Forward Primer: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGAGGCAGCAG and (ii) Reverse Primer: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTWTCTAAT. PCR cycle conditions were 95 °C for 3 minutes, 25 cycles of (95 °C for 30 seconds, 55 °C for 30 seconds, 72 °C for 30 seconds), then a final extension at 72 °C for 5 minutes. The libraries were purified using AMPure XP beads (Beckman Coulter). Dual indexes and sequencing adapters from the Illumina Nextera XT index kits (Illumina) were added in a second PCR using 2x KAPA HiFi HotStart ReadyMix (Kapa Biosystems). Cycle conditions were 95 °C for 3 minutes, 8 cycles of (95 °C for 30 seconds, 55 °C for 30 seconds, 72 °C for 30 seconds), then a final extension at 72 °C for 5 minutes. Ready-to-sequence libraries were purified using AMPure XP beads (Beckman Coulter) and quantified by fluorescence using the QuantiFluor One dsDNA kit (# E4870, Promega) with the GloMax system (Promega). Quality control was performed using a 2200 TapeStation system with the DNA 1000 ScreenTape (# 5067-5582, Agilent). The library pool was quantified by qPCR with the KAPA Library Quantification Kit for Illumina platforms (Kapa Biosystems). Sequencing was performed on a MiSeq system (Illumina) with the MiSeq Reagent v3 kit (600 cycles) in a 2 × 300 bp mode.
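As an illustration of the primer design described above, the sketch below splits each primer into its Illumina overhang adapter and its locus-specific part, and expands the degenerate positions of the reverse primer. The function names are hypothetical, and the primer sequences are written without the line-break hyphens, which is an assumption about the original text.

```python
from itertools import product

# Overhang adapters: the 5' part shared with the Illumina 16S protocol
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Full primers as listed in the text (hyphens removed, an assumption)
FWD_PRIMER = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGAGGCAGCAG"
REV_PRIMER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTWTCTAAT"

# IUPAC nucleotide codes needed for these two primers
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "H": "ACT", "V": "ACG", "W": "AT"}

def locus_specific(primer, overhang):
    """Return the 16S-specific 3' part of a primer (hypothetical helper)."""
    if not primer.startswith(overhang):
        raise ValueError("primer does not start with the expected overhang")
    return primer[len(overhang):]

def expand_degenerate(seq):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]

fwd_16s = locus_specific(FWD_PRIMER, FWD_OVERHANG)
rev_16s = locus_specific(REV_PRIMER, REV_OVERHANG)
rev_variants = expand_degenerate(rev_16s)
print(fwd_16s, rev_16s, len(rev_variants))
```

The reverse primer contains three degenerate positions (H, V and W), so it stands for 3 × 3 × 2 = 18 concrete oligonucleotides.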
Shotgun metagenomic library preparation and sequencing.
16S rRNA gene profiling.
After quality control with FastQC (v0.11.3), overlapping paired-end reads were merged with PEAR (v0.9.10). Quality trimming and filtering of amplicons were performed with Trimmomatic (v0.36) and SGA (v0.9.9). The following parameters were used: maximum of 20 low-quality base calls in the whole sequence, no ambiguous bases (N), a minimum Phred quality score of 15 over a 4 bp sliding window, a minimum average quality score of 25, and a minimum length of 100 bp after trimming. During the PCR amplification process, artefactual sequences, called chimeric sequences, can be generated from multiple parent sequences. These sequences were removed using the UCHIME de novo algorithm, which is integrated in the usearch6.1 QIIME pipeline. Sequences were grouped into operational taxonomic units (OTUs) based on a 97% identity threshold using usearch6.1 through the "pick_open_reference_otus.py" QIIME script. OTUs recruiting fewer than 0.005% of the total number of sequences were filtered out, as recommended by [71]. Taxonomic annotation was performed in QIIME using the RDP classifier trained on the SILVA rRNA reference database (v123).
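The filtering criteria listed above can be sketched in plain Python as follows. This is not the Trimmomatic or SGA implementation, only an illustration of the stated thresholds. The Phred+33 encoding and the cutoff defining a "low-quality" base call (here Phred < 20) are assumptions, as the text does not specify them, and all function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(char) - offset for char in quality_string]

def sliding_window_trim(seq, quals, window=4, min_mean=15):
    """Cut the read at the first 4 bp window whose mean quality is below 15."""
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals

def passes_filters(seq, quals, low_q=20, max_low=20, min_mean=25, min_len=100):
    """Apply the length, ambiguity, low-quality count and mean quality filters.

    low_q (the cutoff defining a low-quality base) is an assumption.
    """
    if len(seq) < min_len or "N" in seq:
        return False
    if sum(q < low_q for q in quals) > max_low:
        return False
    return sum(quals) / len(quals) >= min_mean

def filter_rare_otus(otu_counts, min_fraction=0.00005):
    """Drop OTUs recruiting fewer than 0.005% of all sequences."""
    total = sum(otu_counts.values())
    return {otu: n for otu, n in otu_counts.items() if n / total >= min_fraction}

# Hypothetical read: 110 good bases followed by 10 poor ones
read = "A" * 120
quals = [30] * 110 + [2] * 10
trimmed_seq, trimmed_quals = sliding_window_trim(read, quals)
print(len(trimmed_seq), passes_filters(trimmed_seq, trimmed_quals))
```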
Reads were trimmed and filtered based on the sequence quality and length, exactly as for the 16S rRNA gene data analysis. Reads were annotated using the Centrifuge software [72] on the NCBI RefSeq genome database (v.2018, complete genomes and scaffolds [73]).

Statistical analysis.
Annotated tables were normalized by a "total count" method (at the OTU level for 16S rRNA gene sequencing, at the species level for SMS). All subsequent analyses were performed in R (version 3.3.1). The repeatability was assessed by calculating a coefficient of variation for each bacterium present in all replicates of a condition for every patient. Alpha-diversity (Shannon indices) was calculated for each sample using the vegan package. The accuracy of the protocols was evaluated on the mock community sample by calculating the Euclidean distance between expected and predicted abundances (log2), using the "philentropy" R package. Differentially abundant bacteria between protocols with or without the SPD were identified using the DESeq2 package. For each criterion (except for alpha-diversity), the statistical significance of the differences between protocols was computed with a pairwise Wilcoxon rank test. For multiple comparisons, P-values were corrected with the Benjamini-Yekutieli correction and adjusted P-values below 0.05 were considered statistically significant. The alpha-diversity values varied greatly from one patient to another, so the patient effect was controlled in a linear model using the "limma" package, and statistics were computed with the empirical Bayes method.
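Two of the steps above lend themselves to a short illustration: the "total count" normalization and the Benjamini-Yekutieli adjustment. The analysis itself was run in R, so the Python sketch below only mirrors the arithmetic. The function names and the example values are hypothetical.

```python
def total_count_normalize(counts):
    """Convert raw counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def benjamini_yekutieli(p_values):
    """Adjust p-values with the Benjamini-Yekutieli step-up procedure."""
    m = len(p_values)
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down to enforce monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m * c_m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical p-values from three pairwise comparisons
raw = [0.01, 0.04, 0.03]
adjusted = benjamini_yekutieli(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

With these example values none of the three comparisons stays below 0.05 after adjustment, which shows how conservative this correction is compared with the raw p-values.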

Study design
In our study, four commercial DNA extraction protocols were evaluated based on the supplier's recommendations: the NucleoSpin Soil kit (Macherey-Nagel, named MN), the DNeasy PowerLyzer PowerSoil kit (Qiagen, named DQ), the QIAamp Fast DNA Stool kit (Qiagen, named QQ), and the ZymoBIOMICS DNA Mini kit (ZymoResearch, named Z). In order to facilitate the first steps of DNA extraction, they were also tested with an upstream stool preprocessing device, named SPD (see Supplementary Methods for detailed protocols). The resulting protocols were named as follows: S-MN stands for SPD + MN, S-DQ for SPD + DQ, S-QQ for SPD + QQ and S-Z for SPD + Z.
We analyzed fecal samples from nine healthy volunteers and nine patients suffering from CDI. A defined mixture of bacterial species (mock community) was also prepared and sequenced to assess the efficiency and accuracy of DNA extraction, by comparing the observed bacterial abundances to the theoretical ones. DNA extraction protocols were first compared using 16S rRNA gene amplicon sequencing for a total of 456 samples (18 fecal samples and 1 mock community, each extracted with the eight protocols in triplicate). DNA extraction protocols were also evaluated using SMS as a read-out for a reduced number of samples (n = 56), including fecal samples from six individuals (three healthy individuals and three CDI+ patients) and the mock community (Fig. 1).
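The sample counts above follow from the design and can be checked with a short enumeration. The sample labels are hypothetical, and the use of a single extraction per protocol in the SMS subset is an assumption inferred from n = 56, not something stated in the text.

```python
from itertools import product

# Protocol names as defined in the study design
PROTOCOLS = ["MN", "DQ", "QQ", "Z", "S-MN", "S-DQ", "S-QQ", "S-Z"]

def extraction_runs(samples, protocols=PROTOCOLS, replicates=3):
    """List every (sample, protocol, replicate) combination."""
    return list(product(samples, protocols, range(1, replicates + 1)))

# Hypothetical labels: 9 healthy (H), 9 CDI (C) and the mock community
samples_16s = ([f"H{i}" for i in range(1, 10)]
               + [f"C{i}" for i in range(1, 10)]
               + ["mock"])
runs_16s = extraction_runs(samples_16s)

# SMS subset: 3 healthy, 3 CDI and the mock community.
# One replicate per protocol is an assumption inferred from n = 56.
samples_sms = ["H1", "H2", "H3", "C1", "C2", "C3", "mock"]
runs_sms = extraction_runs(samples_sms, replicates=1)

print(len(runs_16s), len(runs_sms))
```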
Quality and quantity of extracted DNA
When selecting a DNA extraction protocol, sufficient genomic DNA of high quality is desirable for preparing metagenomics libraries. In the present study, we evaluated the DNA yield, DNA fragment size and DNA quality. A protocol that performs poorly on these criteria would likely skew measured bacterial compositions, as only a small portion of bacterial communities present in the original sample would be analyzed.
Considerable variability was found in the extraction yield for the tested protocols (Fig. 2.a), which is in line with previous studies [36]. Except for MN, DNA extraction protocols in combination with SPD seemed to recover as much DNA as their commercial versions, or more. Notably, increases were observed for S-QQ (p-value < 0.1) and S-Z (p-value < 0.05), compared to QQ and Z, respectively. The same DNA yield was recovered for the protocol DQ with and without the use of SPD (p-value > 0.1). SPD seemed to negatively affect the extraction yield when coupled with the protocol MN (p-value < 0.01). Out of the eight extraction protocols tested, protocols S-MN and Z recovered significantly lower DNA concentrations than the others.
In practice, the best-performing protocol would be the one for which the highest number of samples could be prepared for sequencing. Here, for a given protocol, we measured the percentage of samples whose DNA concentration was above 5 ng/µl, the threshold corresponding to the minimal DNA concentration recommended to prepare 16S rRNA gene sequencing libraries (Table 1). In our hands, none of the tested protocols was able to retrieve DNA at a concentration above this threshold for all the samples. Except for S-MN, the best performances were observed when the protocols were combined with SPD. S-Z recovered enough DNA material for 88% of samples, followed by MN (86%), S-QQ (82%) and S-DQ (81%).
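The percentage reported for each protocol is a simple pass rate against the 5 ng/µl threshold. A minimal sketch is given below, with a hypothetical function name and made-up concentrations used only to show the calculation.

```python
def percent_above_threshold(concentrations_ng_per_ul, threshold=5.0):
    """Percentage of samples whose DNA concentration exceeds the threshold."""
    if not concentrations_ng_per_ul:
        raise ValueError("at least one concentration is required")
    passing = sum(c > threshold for c in concentrations_ng_per_ul)
    return 100.0 * passing / len(concentrations_ng_per_ul)

# Hypothetical concentrations (ng/µl) for one protocol, not study data
example = [2.0, 5.0, 7.5, 12.0]
print(percent_above_threshold(example))
```

A sample measured at exactly 5 ng/µl is counted as failing here because the comparison is strict. Whether the authors used a strict comparison is not stated.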
Regarding the fragment size of DNA, variations were also observed between the extraction protocols. QQ and MN protocols yielded the shortest DNA fragments with a median size around 12,000 bp, which was shorter than for S-QQ, although not significantly (p-value > 0.1), and significantly shorter than for the other protocols (p-value < 0.01, Fig. 2.b). The longest DNA fragment sizes were observed for S-MN with an average size of 21,000 bp, followed by DQ, S-DQ and Z with DNA fragments around 18,000 bp (p-value > 0.1).
We also assessed DNA purity using the A260/280 ratio. A ratio of 1.8, which is generally accepted as "pure" for DNA, was observed for S-DQ (Fig. 2.c). A ratio below 1.8 was observed for the protocols MN, S-MN, Z, S-Z and DQ, which may indicate the presence of protein, phenol or other contaminants. A ratio close to 2 was observed for QQ and S-QQ, suggesting the possible presence of RNA in samples (p-value < 0.01 in comparison with the other protocols). Except for MN, the protocols combined with SPD generated DNA of purity equal or superior to that of their standard versions.
Observed microbial diversity and performance in extracting Gram-positive bacteria
In addition to the wet-lab criteria, the extraction quality was also evaluated using 16S rRNA gene amplicon and SMS data, by investigating the observed microbial diversity of samples (Fig. 3). This alpha-diversity has been recently described as a good indicator of DNA extraction performance, being positively correlated with the Gram-positive bacteria extraction [35].
We observed a lower microbial diversity for CDI+ patients compared to healthy volunteers (data not shown), confirming the results of previous studies [74]. As a considerable variability was found within each group of individuals, we corrected for the individual effect in the statistical model to emphasize differences between extraction protocols. Interestingly, the alpha-diversity was equal or higher when the samples were extracted with an SPD-associated protocol, independently of the sequencing method used. Regarding the 16S rRNA gene data, the median alpha-diversity values were above 4.5 for S-QQ, S-MN, S-Z and S-DQ, and below 4.4 for QQ, MN, Z and DQ (at least p-value < 0.05, Fig. 3.a). For SMS data, SPD seemed to improve the alpha-diversity values for MN, Z and DQ but had a limited effect on the QQ protocol (Fig. 3.b).
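For readers less familiar with the metric, the Shannon index reported here can be sketched as follows. The natural logarithm is used, which is the default of the vegan package. That the authors kept this default is an assumption, and the function names are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def effective_taxa(shannon):
    """Number of equally abundant taxa giving the same Shannon index."""
    return math.exp(shannon)

# Four equally abundant taxa give H = ln(4)
even = shannon_index([25, 25, 25, 25])
# One dominant taxon lowers the index
skewed = shannon_index([97, 1, 1, 1])
print(round(even, 3), round(skewed, 3), round(effective_taxa(4.5)))
```

Under the natural-log assumption, an index of 4.5 corresponds to roughly 90 equally abundant OTUs, which gives a sense of scale for the medians quoted above.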
We then evaluated whether the observed diversity was associated with an effective recovery of Gram-positive bacteria. For this purpose, we assessed the Firmicutes/Bacteroidetes ratio. These two main phyla, commonly found in the gut microbiota, are for the most part Gram-positive and Gram-negative, respectively. In theory, the Firmicutes/Bacteroidetes ratio should be higher with a protocol that performs well for the extraction of Gram-positive bacteria [75]. Remarkably, this ratio was increased for the four protocols combined with SPD in comparison to their standard versions, in both 16S and SMS data (Fig. 4). To quantify more precisely the SPD effect on microbial community composition, DESeq2 was used to test the differential abundance of taxa between standard and SPD-combined protocols. For each patient, the relative abundance of the Firmicutes phylum increased significantly, whereas the Bacteroidetes phylum decreased significantly with the use of SPD. This analysis was also performed at the family level, where SPD led to a significant decrease of Gram-negative families and a significant increase of Gram-positive families (Supplementary Table 1). Altogether, our results were consistent with a positive effect of SPD on the observed alpha-diversity, by improving the recovery of Gram-positive bacteria.
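The Firmicutes/Bacteroidetes ratio used above is computed from phylum-level relative abundances. The sketch below uses a hypothetical function name and invented abundances for a single sample, chosen only to show the direction of the effect described in the text.

```python
def firmicutes_bacteroidetes_ratio(phylum_abundance):
    """Ratio of Firmicutes to Bacteroidetes relative abundances."""
    firmicutes = phylum_abundance.get("Firmicutes", 0.0)
    bacteroidetes = phylum_abundance.get("Bacteroidetes", 0.0)
    if bacteroidetes == 0:
        raise ValueError("Bacteroidetes abundance is zero, ratio undefined")
    return firmicutes / bacteroidetes

# Hypothetical phylum profiles of one sample (not study data)
standard = {"Firmicutes": 0.40, "Bacteroidetes": 0.50, "Other": 0.10}
with_spd = {"Firmicutes": 0.55, "Bacteroidetes": 0.35, "Other": 0.10}

ratio_standard = firmicutes_bacteroidetes_ratio(standard)
ratio_spd = firmicutes_bacteroidetes_ratio(with_spd)
print(round(ratio_standard, 2), round(ratio_spd, 2))
```

Because both abundances are relative, a better lysis of Gram-positive cells raises the numerator and lowers the denominator at the same time, so the ratio reacts strongly to extraction bias.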

Extraction protocol accuracy
In order to estimate the accuracy of the extraction protocols, a mock community consisting of nine bacterial species of known respective abundances was prepared and sequenced. The protocol accuracy was estimated by calculating the Euclidean distance (the lower the distance, the better the prediction) between observed and expected abundances at the genus level (Fig. 5). Interestingly, the bacterial abundances were better predicted using SMS than using 16S rRNA gene sequencing. Independently of the metagenomics method, these predictions were even better when SPD was used upstream for the protocols QQ, MN and Z. The same effect was observed with DQ combined with SPD, but only in SMS data. Based on 16S rRNA gene data, DQ was the most accurate protocol, followed by S-MN, S-QQ and S-DQ. Regarding SMS data, S-DQ was the best-performing protocol, followed by S-QQ, S-MN and S-Z. In our hands, QQ was the least accurate protocol in all the conditions tested. Detailed bacterial abundances at the genus level are plotted in Supplementary Fig. 1.
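The accuracy criterion can be sketched as a Euclidean distance between log2-transformed abundance profiles. The study used the philentropy R package; the Python version below only mirrors the arithmetic. The pseudocount that protects against log2(0) is an assumption, as are the function name and the example profiles.

```python
import math

def log2_euclidean_distance(expected, observed, pseudocount=1e-6):
    """Euclidean distance between log2 abundances of two genus profiles.

    The pseudocount avoids log2(0) for undetected genera (an assumption).
    """
    genera = sorted(set(expected) | set(observed))
    squared = 0.0
    for genus in genera:
        e = math.log2(expected.get(genus, 0.0) + pseudocount)
        o = math.log2(observed.get(genus, 0.0) + pseudocount)
        squared += (e - o) ** 2
    return math.sqrt(squared)

# Hypothetical two-genus mock profile (not the study's mock community)
expected = {"GenusA": 0.5, "GenusB": 0.5}
close = {"GenusA": 0.45, "GenusB": 0.55}
far = {"GenusA": 0.25, "GenusB": 0.75}

d_close = log2_euclidean_distance(expected, close, pseudocount=0.0)
d_far = log2_euclidean_distance(expected, far, pseudocount=0.0)
print(round(d_close, 3), round(d_far, 3))
```

Working on a log2 scale means that halving the abundance of a rare genus counts as much as halving that of a dominant one, so poorly lysed minor members still weigh on the score.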

Protocol repeatability
Using 16S rRNA gene sequencing data, the eight protocols were next evaluated for repeatability, based on the variation in bacterial abundances between triplicates of the same stool sample (Fig. 6). The calculated coefficient of variation was the highest for the four standard protocols (QQ, MN, Z and DQ).
We observed a significant increase (p-value < 0.01) in repeatability when the protocols were coupled with SPD compared to their standard versions. The median of the coefficient of variation was divided by 1.57 between QQ (13.2) and S-QQ (8.4), 1.35 between Z (11.7) and S-Z (8.7), 1.21 between MN (10.9) and S-MN (9) and 1.24 between DQ (12.6) and S-DQ (10.2). S-QQ was the most repeatable protocol, followed by S-Z, S-MN and S-DQ.
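The repeatability metric and the fold reductions quoted above can be recomputed as follows. The coefficient of variation is taken as SD/mean, expressed here as a percentage with the sample standard deviation, both of which are assumptions. The medians are the rounded values from the text, so recomputed ratios may differ in the last digit from the published ones (for example 1.34 rather than 1.35 for Z).

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero, CV undefined")
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical relative abundances of one genus across three replicates
cv_example = coefficient_of_variation([10.0, 12.0, 14.0])

# Median CVs reported in the text: (standard protocol, with SPD)
median_cv = {"QQ": (13.2, 8.4), "Z": (11.7, 8.7),
             "MN": (10.9, 9.0), "DQ": (12.6, 10.2)}
fold_reduction = {name: round(std / spd, 2)
                  for name, (std, spd) in median_cv.items()}
print(round(cv_example, 2), fold_reduction)
```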

Protocols overall performance
In our study, eight DNA extraction protocols were evaluated using both wet- and dry-lab criteria, with two sequencing read-outs. Taken together, no single protocol performed the best for all tested criteria. To help in data interpretation, we ranked the protocols according to a score which was assigned to each criterion based on the observed results (Fig. 7). For a given criterion, the scores ranged from 0 (the worst result obtained in our dataset) to 10 (the best result obtained in our dataset). These scores were then plotted using a spider chart: a score of 0 represents the center, whereas a score of 10 is the vertex. The generated areas were then used to help in selecting the best overall performing DNA extraction protocol.
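One way to obtain such scores and areas is sketched below. The text only states that the worst result maps to 0 and the best to 10, so the linear rescaling between these two points is an assumption, as are the function names and example values. The area formula is that of a polygon drawn on equally spaced radar axes.

```python
import math

def scale_scores(values, higher_is_better=True):
    """Rescale raw results linearly to 0 (worst) .. 10 (best)."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {name: 10.0 for name in values}
    scores = {name: 10.0 * (v - low) / (high - low)
              for name, v in values.items()}
    if not higher_is_better:
        # For criteria such as CV or distance, smaller raw values are better
        scores = {name: 10.0 - s for name, s in scores.items()}
    return scores

def radar_area(scores):
    """Area of the polygon formed by scores on equally spaced axes."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    angle = 2.0 * math.pi / n
    pairs = sum(scores[i] * scores[(i + 1) % n] for i in range(n))
    return 0.5 * math.sin(angle) * pairs

# Hypothetical raw results for one criterion (not study data)
raw_cv = {"A": 8.0, "B": 13.0, "C": 10.5}
cv_scores = scale_scores(raw_cv, higher_is_better=False)
print(cv_scores, radar_area([10.0, 10.0, 10.0, 10.0]))
```

Note that the polygon area depends on the order in which criteria are placed around the chart, so areas are best read as a visual aid rather than as an exact ranking statistic.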
Overall, the protocols combined with SPD performed better compared to their standard version (Fig. 7). In our hands, S-DQ showed the best overall performance (Fig. 7.a). This modified protocol performed well for the quality of the extracted DNA, the observed diversity and the accuracy. However, the S-DQ performance was slightly inferior to other protocols regarding DNA yield, but this difference was only significant compared to S-Z (p-value < 0.1, Fig. 2.a), which performed poorly on the other criteria. Even if S-DQ was not the best protocol for this criterion, enough DNA material was produced to prepare and sequence the metagenomics libraries. S-DQ was also found to be less repeatable than S-Z, S-MN and S-QQ, but the slight difference was only significant compared to S-QQ (Fig. 6), which performed poorly on other important criteria, including accuracy and diversity.
Considering the standard versions of the protocols, DQ had the best overall performance (Fig. 7.b). This protocol performed well in terms of accuracy with both metagenomics sequencing methods. QQ generated the highest diversity in SMS results, but the difference with DQ was not significant (p-value > 0.1, Fig. 3.b). DQ generated lower diversity than MN in 16S rRNA gene sequencing results, but this difference was also not significant (p-value > 0.1, Fig. 3.a). Finally, MN and Z were slightly more repeatable than DQ, but not significantly (p-value > 0.1, Fig. 6).

Discussion
DNA extraction is a crucial step of the metagenomics workflow, known to be influenced by many parameters, which are difficult to evaluate exhaustively. In addition to in-house protocols, new commercial solutions are now emerging, making it difficult to choose a good protocol for the gut microbiota. Benchmarking protocols is thus crucial to understand the potential biases and to avoid errors during data interpretation. Recent gut microbiome studies compared various DNA extraction protocols but were limited to a low number of fecal samples, mainly from healthy individuals [27][28][29][30]. As a consequence, the performance of such protocols may not be guaranteed for a clinical cohort.
Our study is the first, to our knowledge, to compare four commercial DNA extraction protocols using both metagenomics sequencing methods on an adequate number of stool samples for statistical analysis and biological conclusion (n = 18). In an effort to streamline fecal preparation prior to DNA extraction, the commercial protocols were also tested in combination with a stool preprocessing device. As recommended by recent studies, we also included a positive control, the mock community, so that we could reliably assess the accuracy of extraction protocols. The mock was made up of nine bacterial species and processed alongside fecal specimens. The eight protocols tested were ranked based on wet- and dry-lab criteria. The global aim was to identify one method that performs well and generates the most accurate and reproducible data.
In addition to healthy donors, patients suffering from a Clostridium difficile infection (CDI) were also recruited, allowing us to test the protocols on samples with various microbial composition, consistency and biomass. CDI is a burning issue, as Clostridium difficile, a Gram-positive bacterium, is the leading cause of diseases ranging from mild diarrhea to pseudomembranous colitis in hospitalized patients [76]. Fecal microbiota transplant (FMT) is emerging as a new option for recurrent CDI [77]. Identifying which bacteria are already present (recipient) and which have been transferred (donor) is essential and requires the use of highly sensitive, robust and fast metagenomics techniques [78,79].
In our study, a total of 456 and 56 samples were analyzed using 16S rRNA gene sequencing and SMS, respectively, which allowed an unprecedented comparison. Even if, as expected, SMS is more sensitive and accurate in bacterial detection, our present findings indicate good agreement between the two sequencing methods but also between the samples from the two groups of individuals. Interestingly, our results show that no single DNA extraction protocol performs well on all the criteria tested, which complicates the identification of the best-performing protocol. Considering the selection strategy described above, we consider S-DQ to be the best-performing protocol overall for extracting human fecal samples. This protocol generated an amount of good quality DNA that was compatible with subsequent library preparations for all samples. Regarding the dry-lab criteria, S-DQ combined the best results in terms of alpha-diversity, extraction of Gram-positive bacteria, repeatability and accuracy in bacterial detection.
Remarkably, the bioinformatics analysis also shed light on the added value of the stool preprocessing device for all the extraction protocols. In our study, the protocols in combination with SPD have in common the first steps of the procedure. This includes the shaking and the mechanical lysis with 0.1 mm zirconia and silica beads. In such combination, we observe an increase of the observed alpha-diversity. Our results are in good agreement with Costea et al., who showed that these parameters of the protocol were positively associated with the observed diversity, which is a good indicator of an efficient lysis [35]. Biased protocols are also known to cause overrepresentation of Gram-negative bacteria due to the inefficient lysis of Gram-positive bacteria. For the SPD-combined protocols, we observed an increase of the relative abundance of Gram-positive bacteria and a corresponding decrease in the relative abundance of Gram-negative bacteria, which led to an increase of the Firmicutes/Bacteroidetes ratio. The SPD can therefore provide more accurate characterization of the microbiota by reducing the ratio bias. In terms of repeatability, SPD also showed promising results. This device and similar approaches would be of particular interest to limit variations when several experimenters, and even different labs in the case of multicentric studies, perform DNA extraction. Lastly, the use of our in-house mock community, composed of both Gram-positive and Gram-negative bacterial cells, made it possible to benchmark the protocols in terms of bacterial abundance predictions. Our results demonstrate that SPD in combination with any of the four protocols is more accurate in assessing the bacterial abundances than the protocols in their standard version.
We are also aware that not all the protocols may have been tested under optimal conditions. The commercial protocols were tested using the beads provided in the kit on a Retsch system for 5 minutes. In our hands, protocol Z was the worst performer according to wet-lab criteria. Today, Zymo Research recommends other bead-beating protocols than the one tested, which could have improved its performance [80].
Using such a device prior to DNA extraction may add costs to the DNA extraction reactions but, from our perspective, getting unbiased microbiome data is priceless. In our data set, we have also shown that the DQ protocol is the best protocol among the commercial solutions.

Conclusion
We have shown that both metagenomics methods are in good agreement when comparing DNA extraction protocols.
We recommend the S-DQ protocol to extract microbial DNA from human stool samples. While we have only tested S-DQ on fecal samples, we suppose that it might also work well with other types of microbiota samples, although some modifications may be necessary. SPD appears to be a new way to improve the overall performance of any DNA extraction protocol. We propose to now include stool preprocessing devices in new microbiome studies to streamline and standardize DNA extraction.

Protocol repeatability. A coefficient of variation (SD/mean) was calculated between replicates for each genus within each sample. Standard protocols are colored in sky blue, whereas protocols with SPD are colored in purple. The boxplot is topped by a heatmap showing the pairwise Wilcoxon test p-values. Significant differences between protocols are highlighted in green.

Abbreviations