Testing Mediation Effects in High-Dimensional Microbiome Data with False Discovery Rate Control

Background: Understanding whether and which microbes played a mediating role between an exposure and a disease outcome are essential for researchers to develop clinical interventions to treat the disease by modulating the microbes. Existing methods for mediation analysis of the microbiome are often limited to a global test of community-level mediation or selection of mediating microbes without control of the false discovery rate (FDR). Further, while the null hypothesis of no mediation at each microbe is a composite null that consists of three types of null (no exposure-microbe association, no microbe-outcome association given the exposure, or neither), most existing methods for the global test such as MedTest and MODIMA treat the microbes as if they are all under the same type of null. Results: We propose a new approach based on inverse regression that regresses the (possibly transformed) relative abundance of each taxon on the exposure and the exposure-adjusted outcome to assess the exposure-taxon and taxon-outcome associations simultaneously. Then the association p-values are used to test mediation at both the community and individual taxon levels. This approach fits nicely into our Linear Decomposition Model (LDM) framework, so our new method is implemented in the LDM and enjoys all the features of the LDM, i.e., allowing an arbitrary number of taxa to be tested, supporting continuous, discrete, or multivariate exposures and outcomes as well as adjustment of confounding covariates, accommodating clustered data, and offering analysis at the relative abundance or presence-absence scale. We refer to this new method as LDM-med. Using extensive simulations, we showed that LDM-med always controlled the type I error of the global test and had compelling power over existing methods; LDM-med always preserved the FDR of testing individual taxa and had much better sensitivity than alternative approaches. In contrast, MedTest and MODIMA had severely inflated type I error when different taxa were under different types of null. The flexibility of LDM-med for a variety of mediation analyses is illustrated by the application to a murine microbiome dataset, which identified a plausible mediator.

multivariate exposures and outcomes as well as adjustment of confounding covariates, accommodating clustered data, and offering analysis at the relative abundance or presence-absence scale. We refer to this new method as LDM-med. Using extensive simulations, we showed that LDM-med always controlled the type I error of the global test and had compelling power over existing methods; LDM-med always preserved the FDR of testing individual taxa and had much better sensitivity than alternative approaches. In contrast, MedTest and MODIMA had severely inflated type I error when different taxa were under different types of null. The flexibility of LDM-med for a variety of mediation analyses is illustrated by the application to a murine microbiome dataset, which identified a plausible mediator.

Background
While most microbiome studies conducted so far have focused on bivariate associations between the microbiome and their covariates of interest [1,2], increasing studies have emerged recently to elucidate the biological mechanisms underlying the complex interplay between environmental exposures, the microbiome, and clinical outcomes. In many cases, it is of interest to understand whether the microbiome play a mediating role between an exposure and an outcome [3][4][5], as depicted in Figure 1(a). For example, does diet have some effect on inflammatory bowel diseases that is mediated through the perturbation of the gut microbiome [4]? How does the change in the gut microbiome due to antibiotic exposure cause the change in mouse body weight [5]? It is of particular importance to single out the specific microbes that are responsible for the overall mediation effect, which is essential for researchers to develop clinical interventions to modify the outcome by modulating the mediating microbes [6].
Let T denote the exposure (treatment), M = (M 1 , . . . , M J ) the J mediators, O the outcome, and Z the confounding covariates; using this notation, the mediation relationships are shown in Figure 1(b). To claim a mediation effect of a microbe, both the exposure-microbe and microbe-outcome associations (given the exposure) are required to be significant. Thus, the null hypothesis of no mediation at microbe j is a composite null that consists of no microbe-outcome association, no exposure-microbe association, or neither: which are referred to as the type-I, type-II, and type-III null hypotheses, respectively. It is highly likely that different microbes are under different types of null. For example, antibiotic use may perturb a large number of microbes but most of them do no modify mouse body weight, whereas some microbes remain intact from antibiotic use but do interact with the body weight; of course, there are microbes that are not associated with either factor. In this example, we have all three types of null.
Read count data from 16S amplicon or metagenomic sequencing are typically summarized in a taxa count table; here we use "taxon" generically to refer to any feature such as operational taxonomic unites, amplicon sequence variants, or any other taxonomic or functional grouping of bacterial sequences. The taxa count data have unique and complex features.
They are high-dimensional with typically many more taxa than samples, and it is desirable to test all taxa simultaneously for mediation effects with multiple testing correction that controls the false discovery rate (FDR). The data are also sparse (having 50-90% zero counts), compositional (measuring relative abundances that sum to one), and highly overdispersed. In addition, microbiome studies may have complex exposures or outcomes that can not only be continuous but also discrete, as well as multivariate (comprising multiple components such as categorical variables with more than two levels). These studies also often have small sample sizes (e.g., 50-100) and complex designs (e.g., clustered data [7], matched sets [8], longitudinal sampling). The capability to handle all these features is essential for any statistical method to be practically useful.
Because it is challenging to conduct mediation analysis in the high-dimensional setting, some existing methods are limited to testing the overall mediation effect at the community level, while others attemp to identify mediating taxa but have no control of the FDR. Specifically, MedTest [9] and MODIMA [10] base their tests on distance matrices that summarize the high-dimensional data into between-sample dissimilarity measurements, and thus produce a global p-value only. Also, they treat the taxa as if they are all under the same type of null in their permutation procedures that provide the null distributions of the test statistics.
Although they can accommodate binary outcomes without any modification, MedTest does not allow multivariate exposures or outcomes, whereas MODIMA does not allow adjustment of confounding covariates. Finally, MedTest, by using the top ten principal components of the distance matrix as ten mediators, may not capture mediation effects in rare taxa (even when the Jaccard distance is used in the omnibus test); MODIMA highly depends on the choice of the distance metric and currently does not provide an omnibus test. CMM [11] (and CMMB, the extension for binary outcomes [12]) and SparseMCMM [5] consider the high-dimensional mediators individually and conduct estimation and testing of mediation effects at both the community level and the taxon level. However, SparseMCMM selects mediating taxa using regularization techniques and CMM using confidence interval estimates, so neither has formal control of the FDR. Also, they are limited to analyzing simple continuous outcomes. Because they add pseudocounts to the (massive) zero counts when performing some log-ratio transformation of the read count data as a way to handle compositionality, they are prone to false positive findings as demonstrated in the context of testing taxon differential abundance [13].
Zhang's method [14] shares all the features in CMM and SparseMCMM, except that Zheng's method tests only one taxon in one run, which is the first isometric log-ratio (ilr) transformed variable (because other ilr variables are not interpretable), and is impractical for use with hundreds and thousands of taxa.
In this article, we focus on testing, rather than estimation, of marginal mediation effects at individual taxa with a goal of controlling the FDR. This strategy is very common in the initial scan of high-dimensional features in omic studies [15][16][17]; "fine mapping" of mechanistic mediators and formal estimation of their mediation effects can be performed more easily after the dimension is greatly reduced. We find that, the testing objective can be achieved by utilizing inverse regression and the Linear Decomposition Model (LDM) framework [7,8,18,19] we developed originally for testing microbiome associations [7], which together enable us to handle all the aforementioned data complexities in a natural way. In the methods section, we first describe our method for testing individual taxa for mediation and then a method that aggregates the taxon-level information to test the overall mediation in a community. In the results section, we present extensive simulation results in which we numerically compared our method, which we call LDM-med, to alternative methods, and we apply the new method to data from a murine study. We conclude with a discussion section.

Methods Motivation
Our starting point is the following classical model for multiple mediators [20]. For a continuous outcome and J continuous (potential) mediators with no exposure-mediator and mediator-mediator interactions, the model specifies a linear model for each mediator and a linear model for the outcome that includes all mediators: where the notation was introduced in Figure 1(b). It can be derived that the overall (total) mediation effect through (M 1 , . . . , M J ) takes the form J j=1 α 1,j θ 2,j [20]; note that α 1,j characterizes the effect of T on M j given Z and θ 2,j the effect of M j on O given Z and T and all other M j s. When the mediators are independent of one another conditional on Z and T , each product α 1,j θ 2,j can be interpreted as the mediation effect through a single mediator M j .
Even if the mediators are not conditionally independent, a non-zero value of α 1,j θ 2,j indicates a non-zero contribution of M j to the overall mediation effect. Thus, our objective can be achieved by testing whether α 1,j θ 2,j = 0 at each potential mediator. However, the forward outcome model (2), although describing the mediation process in a natural order and enabling intuitive forms for the mediation effects, are not easily generalizable to an outcome that is discrete or multivariate. In addition, model (2) does not permit a large number of mediators, e.g., more mediators than samples, unless some regularization is imposed.

Inverse regression model
The limitations of the forward outcome model motivated us to adopt the inverse regression model that exchanges the positions of the outcome and mediators. Inverse regression is a commonly used approach, which, for example, has been widely used in genetics studies of multiple phenotypes [21][22][23]. It has a key advantage to allow different types of outcomes including multivariate outcomes. Also, it tends to offer other flexibilities, for example, allowing cell-type-specific mediation effects in testing the mediation of DNA methylation CpG sites [24].
Here we assume that a mediating taxon acts through its relative abundance, so M j denotes the relative abundance of taxon j, although our methodology can easily accommodate presence-absence data (M j takes 1 or 0 value indicating the existence of taxon j in a sample) without modification. We find that, by properly orthogonalizing the covariates against each other, we can obtain an inverse regression model that has the mediator model (1) "embedded" in. To this end, we define T r to be the residual of T after orthogonalizing against Z and O r the residual of O after orthogonalizing against (Z, T ). We consider this inverse regression model: This model implies E(M j |Z, T ) = β 0,j + β T Z,j Z + β 1,j T r , which is the mediator model (1) after orthogonalizing T against Z. Thus, β 1,j = α 1,j . Although β 2,j = θ 2,j due to inverse regression and marginal modeling of M j in (3), we have that β 2,j = 0 and θ 2,j = 0 coincide. As a result, testing H 0j : β 1,j β 2,j = 0 (4) is equivalent to testing α 1,j θ 2,j = 0. We can test (4) by obtaining the least-squares estimates, β 1,j and β 2,j , from (3), forming the test statistic | β 1,j β 2,j |, and using permutation to provide the null distribution of the test statistic. All of these can be achieved using the LDM with a slight modification on the test statistic.
Here we give a brief overview of the LDM. It was originally developed for testing bivariate associations between the microbiome and the covariates of interest. It is based on a linear model that regresses the microbiome data on the sequentially orthogonalized covariates that include first the confounding covariates that we wish to adjust for and then the covariates that we wish to test. The inference is based on permutation (i.e., permuting the orthogonalized covariates) to circumvent making parametric assumptions about the distribution of the micro-biome data. The analyses of the raw relative abundance data and the arcsine-root transformed relative abundance data are performed, separately, and their results are combined to provide an omnibus test. The LDM allows an arbitrary number of taxa (including arbitrarily rare taxa) to be tested simultaneously with FDR control, supports continuous, discrete, or multivariate covariates, accommodates clustered data [8], and offers analysis at the (transformed) relative abundance scale or the presence-absence scale [18,19].

Testing mediation effects at individual taxa
As mentioned after equation (4), it is most natural to consider the test statistic: To provide a reference distribution for this statistic under the composite null of no mediation, we calculate the following statistic under the bth (b = 1, . . . , B) permutation: 1,j and β j is inherently conservative in the sense that its distribution is more spread out than the true distribution of U j under a specific type of null (unknown). Finally, the permutation p-value for taxon j is calculated to be j ≥ U j }, which is corrected for multiple testing by Sandve's sequential stoping rule [25] as implemented in the LDM. We refer to this approach to testing individual taxa as LDM-med-product. However, it is unclear how to handle multivariate exposures or outcomes, in which case there are more than one element in β 1,j or β 2,j .
A second way is to based the test statistic on the p-values p 1,j and p 2,j for testing β 1,j = 0 and β 2,j = 0, respectively, which naturally handle multivariate exposures or outcomes and are directly available from the LDM. We consider the test statistic: and assess the significance of Z j by using the same permutation procedure as above and calculating the statistic: 2,j , respectively, among all permutation replicates [26]. Note that max(p 1,j , p 2,j ) can also be directly used as an analytical p-value for testing a single mediator [27], but here we choose permutation for inference because permutation is more robust and the permutation replicates are readily available from running the LDM. Similarly to U j ≤ Z j } and corrected for multiple testing as in the LDM. We refer to this approach as LDM-med-maxP. In fact, this approach was found to be equivalent to LDM-med-product in simple settings, for example, when all variables are normally distributed [27]. However, both LDM-med-product and LDM-med-maxP tend to be inefficient for detecting significant mediators, which is a consequence of the conservative U j and the stringent correction over all J tests. A third approach is to directly apply the MultiMed procedure [28] to p 1,j and p 2,j , which was developed to improve efficiency of testing multiple mediators by restricting the testing to a subset of taxa that have relatively small p 1,j and p 2,j . Here, we briefly describe this procedure; the theoretical properties that guarantee the FDR control can be found in the original papers [28,29]. First, for a given significance level α, find the subset of taxa with relatively small p 1,j to be ω S1 = {j : p 1,j < α/2}, and denote the cardinality of the subset by S 1 = C(ω S1 ). Similarly, find the subset with relatively small p 2,j to be ω S2 = {j : p 2,j < α/2} and denote S 2 = C(ω S2 ). Then, the downstream testing of mediation is restricted to taxa at the intersection of the two subsets, which can greatly alleviate multiple testing correction. For taxon j ∈ ω S1 ∩ ω S2 , define the subset-adjusted p-value: Taxon j is declared to be a mediator if the FDR-adjusted p-value We call this approach LDM-med-subset. Although the subset-based approach has shown to be more efficient than the approach based on the product of coefficients (similar to our first approach) in the context of controlling the family-wise error rate [28], it is of interest to re-evaluate these approaches in the context of controlling the FDR.

Testing the overall mediation effect in a community
If all taxa are under some types of null (not necessarily the same type), we declare a null community with no mediation effect. Given the p-values at individual taxa, it is straightforward to construct a global test statistic by combining the individual p-values. Here we chose the p-value to be Z j = max(p 1,j , p 2,j ), and we adopt the Harmonic mean method [30] to aggregate Z j s, which is more robust to the dependence structure among taxa than Fisher's method. The harmonic mean of Z j s is J/ J j=1 Z −1 j , where a smaller value corresponds to a stronger evidence against the null hypothesis. To have a usual test statistic with a reverse directionality, we choose the statistic for the global test: We assess the significance of Z global via permutation, since permutation is more robust and the permutation replicates are readily available. The statistic based on bth permutation replicate is Z Finally, the permutation p-value for the global test is given by We call this test LDM-med-global, which is a natural extension of LDM-med-maxP but is also compatible with LDM-med-subset, which is also based on the association p-values p 1,j and p 2,j .

Simulation studies
Our simulations were based on data on 856 taxa of the upper-respiratory-tract (URT) microbiome [31] and the mediation model (1) and the forward outcome model (2) as generative models. Suppose that the exposure variable T i is binary and that 50 samples were exposed (T i = 1) and 50 unexposed (T i = 0). We considered continuous outcomes as well as binary outcomes. We considered three mediation mechanisms, in which we assume the mediating taxa are the top five (1-5th) most abundant taxa, five (51-55th) less abundant taxa, and a selected mixture (4-5th and 51-52nd) of the two sets, and we refer to them as M-common, M-rare, and M-mixed. In all scenarios, we selected a set of taxa (6-10th) to be associated with the exposure but not the outcome, and another set of taxa (11-15th) to be associated with the outcome but not the exposure, corresponding to the type-I and type-II null taxa, respectively.
To generate the read count data, we first set the baseline (when T i = 0) relative abundances of all taxa for all samples, denoted by π i = (π i1 , π i2 , . . . , π iJ ), to the population means that were estimated from the real data. To induce the effects of the exposure T i on a set of taxa (e.g., the mediating taxa or type-I null taxa), for those unexposed we kept π i unchanged; for those exposed we decreased π ij for some of the associated taxa by a percentage, which equals β TM for the mediating taxa and α TM (0 or 0.6) for the type-I null taxa, and we redistributed the decreased amount evenly over the remaining of the associated taxa; this way ensures that the relative abundances of non-associated taxa remain intact and the modified π i satisfies the compositional constraint (unit sum). Specifically, in M-common, the increasing set of the mediating taxa was taxa 1-2 and the decreasing set was taxa 3-5; in M-rare, the two sets were taxa 51-52 and 53-55; in M-mixed, the two sets were taxa 4 and 52 and taxa 5 and 51. Among the type-I null taxa, the two sets were taxa 6-7 and 8-10. The modified π i represents the mean relative abundances conditional on the exposure. Then, we introduced sample heterogeneity by drawing the sample-specific composition π i = (π i1 , π i2 , . . . , π iJ ) from the Dirichlet distribution Dir(π i , θ) with mean π i and overdispersion θ that was set to 0.02 (as estimated from the real data). Finally, we generated the read count data M i = (M i1 , M i2 , . . . , M iJ ) using the Multinomial distribution with mean π i and the library sizes (sequencing depth) sampled from N (10000, (10000/3) 2 ) and left truncated at 500.
To generate the outcome that is influenced by the mediating taxa, denoted by M, and the type-II null taxa, denoted by N , we partitioned each set of taxa into two subsets (M 1 and M 2 , N 1 and N 2 ) with approximately equal total relative abundance. In particular, we set M 1 and M 2 to be the increasing and decreasing sets, respectively, that were determined earlier relative to the exposure and have similar total relative abundance; we set N 1 and N 2 to be taxa 11-12 and 13-15, respectively. To simulate a continuous outcome, we used the model where scale(.) is a scaling function that standardizes a variable to have mean 0 and standard deviance 1, β TO was fixed at 0.2, α MO was fixed at 0 or 0.4, and the error term ǫ i was drawn from N (0, 0.5 2 ). It can be verified that the taxa that are neither mediators nor type-II null taxa were uncorrelated with the outcome after controlling for T i , owing to the counterbalancing effects of taxa in M 1 and M 2 (N 1 and N 2 ) on the outcome. To simulate a binary outcome, we calculated the probability Pr(O i = 1|T i , π i ) = exp(µ i )/{1 + exp(µ i )} with µ i being the same linear predictor as in (5), without the error term ǫ i .
To simulate a confounder, we note that a confounder has effects on the exposure, the microbiome, and the outcome (Figure 1(b)). Thus, we first simulated the binary confounder Z i with 70% "success" rate among the exposed and 30% among the unexposed. Then, we used the same decreasing and increasing sets of the mediating taxa as determined earlier, now with the deduction percentage γ ZM = 0.3, and the same operation as for the exposure to further modify π i for those with Z i = 1. Finally, we modified the linear predictor for the outcome to include the term γ ZO Z i with γ ZO = 0.7.
Prior to analysis, we filtered out taxa with fewer than 5 presence in the sample, which resulted in ∼460 taxa remaining in analysis. For testing the overall mediation effect, we applied LDM-med-global and compared it to MedTest and MODIMA whenever the latter are applicable and valid. For MedTest, we adopted the omnibus test based on the Bray-Curtis and Jaccard distances, which focus on abundant and less abundant taxa, respectively, and thus form a complementary pair. For MODIMA, we chose Bray-Curtis, as it is the most commonly used distance in the literature and was also frequently used in the MODIMA paper. The type I error and power were assessed at the nominal level 0.05 based on 10000 and 1000 replicates of data, respectively. For testing mediation effects at individual taxa, we compared the three approaches: LDM-med-subset, LDM-med-maxP, and LDM-med-product.
The sensitivity (proportion of the truly mediating taxa that were detected) and empirical FDR were assessed at the nominal FDR 10% based on 1000 replicates of data. Note that none of CMM, SparseMCMM, and Zhang's method worked for our simulated data, as they either gave errors (due to the large number of taxa or extensive zero counts) or ran forever (> 10 hours).

Simulation results
The type I error results of the global tests in M-common, M-rare, and M-mixed are summarized in Tables 1 and S1. We considered 12 scenarios under the global null hypothesis, each corresponding to a specific combination of the three types of null taxa. Clearly, MedTest and MODIMA easily lost control of the type I error in the presence of a mixture of the type-I and type-II null taxa. Note that the type-III null can be viewed as a special case of either the type-I or type-II null, so, for example, the first row in Table 1 corresponds to a global null where all taxa are under the type-II null. In all cases, LDM-med-global controlled the type I error; in fact, it was conservative as expected.
In the presence of a confounder (Table S2) For testing mediation at individual taxa, all three LDM-based approaches controlled the FDR in all scenarios (Figures 2-5, S1-S2). As expected, the subset approach had substantially improved sensitivity over the product and maxP approaches, while the latter two had similar performance. For this reason, we chose LDM-med-subset as the method for testing individual taxa. We refer to the whole methodology as LDM-med, which consists of LDM-med-global and LDM-med-subset.

Murine microbiome study
We analyzed the data generated from a murine microbiome study [32], which was conducted to explore whether the sub-therapeutic antibiotic treatment (STAT) would alter the gut microbiome composition and whether the shift of the gut microbiome would affect the body weight gain later in life. We focused on male mice for this analysis. The male mice were first randomized into the STAT and control groups, which was treated as a binary exposure variable. Then, their fecal samples were collected longitudinally at days 21 and 28. Bacterial DNAs were extracted from the fecal samples, sequenced for the 16S rRNA gene, and summarized into a taxa count table that initially contained 149 genera. Samples with less than 1800 reads, and genera with less than 10% presence or 0.01% mean relative abundance were filtered out, so the final taxa count table in our analysis included 41 genera and 36 mice (23 exposed to STAT and 13 unexposed) each having microbiome measurements at both time points. The mice body weight (in grams) prior to sacrifice was measured and used as a continuous outcome variable. There were no additional covariates to be adjusted for, as all potential confounders had been well-controlled in the randomized experiment.
It can be seen from Figure S3 that mice exposed to STAT were heavier than the control mice, with a small Wilcoxon p-value 0.011. We wish to know how much was this effect of STAT on body weight mediated through the gut microbiome. We tested the mediation effects at both the community level and the individual taxon level (at nominal FDR 10%) using LDMmed, MedTest, MODIMA, and SparseMCMM whenever they were applicable. Note that, although the outcome variable somewhat deviated from the normal distribution ( Figure S3), all methods were robust because LDM-med treated the outcome as a covariate, and MedTest, MODIMA, and SparseMCMM based their inference on permutation.
We first restricted our mediation analysis to the cross-sectional microbiome data at day 28 only and summarized the main results in Table 2 We also performed mediation analysis of the longitudinal microbiome data at both days 21 and 28 and again summarized the main results in Table 2. Note that the outcome was observed only once per subject. While no other methods exist to analyze mediation of the microbiome data with correlations, LDM-med can easily handle such data because the LDM can handle clustered data (by setting perm.within.type="none" and perm.between.type="free").
Here, a time variable (1/0) indicating day 28 was included as a covariate Z, as the microbiome composition was found to be significantly different between the two times (p-value 0.040 by the LDM for testing matched pairs). The results of mediation analysis by LDM-med were largely consistent with the previous results based on the data at day 28 only. The detected mediators based on the two datasets seemed non-overlapping at FDR 10% but substantially overlapped if the FDR was raised to 20%. We again performed analysis of bivariate associations between the exposure and each taxon using the LDM (adjusted for the time effect), and the taxon and the outcome conditional on the exposure using the standard linear regression (treating the outcome as the dependent variable, letting the covariates be the exposure variable, the relative abundances of the taxon at days 21 and 28, and testing the joint effect of the two relative abundance variables using the F test). The results were again largely consistent with the previous results on bivariate associations using the data at day 28 only (Table S3).
Finally, to illustrate the capability of LDM-med to handle categorical outcome variables, we converted the continuous outcome variable into three categories by the 33rd and 66th percentiles. For this type of outcome variables, only LDM-med and MODIMA were applicable, neither of which, however, identified any significant mediation effect.

Discussion
We proposed a new approach to mediation analysis of the microbiome that is based on inverse regression and the LDM. Thus, our method LDM-med offers maximum robustness to the complex features in the taxa count data (i.e., high-dimensionality, sparsity, compositionality, and overdispersion), and provides extensive flexibility to accommodate various exposures and outcomes and study designs. Specifically, using the simulated and real data, we have demonstrated the capabilities of LDM-med to deal with null taxa under different types of null hypothesis of no mediation, binary and multivariate outcomes, clustered data with the exposure and outcome variables varying between the clusters, and confounding covariates. In addition, LDM-med can also handle clustered data with the exposure and/or outcome vari-ables varying within the clusters [8], and perform analysis at the presence-absence scale using a rarefaction-without-resampling approach [18]. In summary, LDM-med can be highly useful in practice.
We have added LDM-med to our existing LDM R package. The computation of LDM-med is as efficient as the LDM. For example, using a single-thread MacBook Pro laptop (2.9 GHz Quad-Core Intel Core i7, 16GB memory), it took 46s to analyze one of our simulated datasets having 100 samples and ∼460 taxa (after filtering). The murine dataset was at a smaller scale, consisting of 36 mice and 41 genera, so it took only 5s and 12s to analyze the data at day 28 only and the data at both day 21 and day 28, respectively.
LDM-med is based on marginal models for each mediator, and thus the identified mediators may not all be true biological mediators, which are called "probable mediators" but not "true mediators" [28]. This compromise was made in order to obtained controlled FDR for the selected mediators, which we deem as critical in the initial "scan" of high-dimensional features to generate "targets" to follow up in the downstream mechanistic study. This strategy has been very common in the analysis of high-dimensional omic data [15,17,28].
Our treatment of compositionality is different from that of CMM, SparseMCMM, and Zhang's method, in that our method is based on the relative abundance data directly whereas the other methods are based on some type of log-ratio transformation of the relative abundance data. The two treatments correspond to two biological models for how microbial communities may change across conditions. The hypothesis in the other methods corresponds to the scenario that a small number of microbes have "bloomed" while the absolute counts of the others have not changed, which is commonly referred to as the "compositional" hypothesis. However, microbes interact with each other: not only do they compete for resources, but they also change their environment in ways that favor some microbes and suppress others. Because the microbiota are a community, it is not unreasonable to expect that many taxa changes when the condition changes. This hypothesis is referred to as the "community change" null hypothesis; the concepts "community state types" and "alpha diversity" exemplify this approach. When the "community change" hypothesis seems more reasonable, a method that applies directly to the relative abundance data such as the LDM and LDM-med is more appropriate.

Conclusions
We present a new approach to testing high-dimensional mediators with FDR control. Our approach is very versatile, allowing an arbitrarily large number of taxa to be tested simultaneously, supporting continuous, discrete, or multivariate exposures and outcomes as well as adjustment of confounding covariates, accommodating clustered data, and offering analysis at the relative abundance or presence-absence scale. Using extensive simulations, we showed that our method always preserved the FDR of testing high-dimensional mediators and had much better sensitivity than alternative approaches. The flexibility of our method for a variety of mediation analyses was illustrated by the application to a murine microbiome dataset, which identified a plausible mediator. Thus, our method can facilitate the search for microbial mediators in a wide variety of microbiome studies. The R package is computationally efficient and freely available.

Funding
This research was supported by the National Institutes of Health awards R01GM116065 (Hu) and R01GM141074 (Hu).

Availability of data and materials
The new method has been implemented in the R package LDM v3.0, which is available on       lists the type(s) of null among all taxa, where the type-III null is "merged" into the type-I or type-II null whenever possible. Note: [Ruminococcus] is a species that is misclassified to the genus Ruminococcus and is now awaiting to be formally renamed through the appropriate Code of Nomenclature. * Taxa that were not detected at FDR 10% but detected at 20%. † The weight gain measurements were categorized into three categories by the 33rd and 66th percentiles. The detected taxa are organized such that the common taxa appear in the same lines.    The type I error rates 0.073 and 0.069 after adjusting for the confounder were slightly inflated, due to the small sample size 100, and was reduced to 0.067 and 0.055 when the sample size was increased to 200.