Detecting Sparse Microbial Association Signals Adaptively From Longitudinal Microbiome Data Based on Generalized Estimating Equations

Background: The relationship between the compositions of microbial communities and various host phenotypes is an important research topic. Microbiome association research addresses multiple domains, such as human disease, diet and medicine. Statistical methods for testing microbiome-phenotype associations have been studied recently to determine their ability to assess longitudinal microbiome data. However, existing methods fail to detect sparse association signals in longitudinal microbiome data. Methods: In this paper, we developed a novel method, namely, aGEEMiHC, which is a data-driven adaptive microbiome higher criticism analysis based on generalized estimating equations, to detect sparse microbial association signals from longitudinal microbiome data. aGEEMiHC adopts a generalized estimating equations framework that fully considers the correlation among diﬀerent observations from the same cluster (individuals) in longitudinal data, and it integrates multiple microbiome higher criticism analyses based on generalized estimating equations by setting diﬀerent working correlation structures. Thus, the proposed method is robust to diverse correlation structures for longitudinal data. Results: The proposed method shows a stable performance for diverse association patterns in both sparsity levels and phylogenetic relevance. Extensive simulation experiments demonstrate that it can control the type I error correctly and achieve superior performance according to a statistical power comparison. In our simulation, we applied aGEEMiHC to longitudinal microbiome data with various types of host phenotypes to demonstrate the stability of our method. aGEEMiHC is also utilized for real longitudinal microbiome data, and we found a signiﬁcant association between the gut microbiome and Crohn’s disease. Conclusions: aGEEMiHC is a statistical method that facilitates association testing for sparse microbial association signals from longitudinal microbiome data, and it can be applied to situations in which the true underlying correlations among diﬀerent observations from the same cluster in longitudinal data are unknown. It is worth noting that our method also ranks the signiﬁcant factors associated with the host phenotype to provide potential biomarkers. The R package GEEMiHC is available at https://github.com/xpjiang-ccnu/GEEMiHC. between the expected and observed quantiles, in the case of association and nonassociation. Test statistics for GEEMiHC can be obtained combining the uGEEHC test, wGEEHC test and their modulated forms for the association signals with low sparsity levels. Finally, the test statistic for aGEEMiHC, as an optimal test, is made up of test statistics for GEEMiHCs with diﬀerent working correlation structures. The p value is calculated by permutation approach.

Longitudinal data can also reduce potential unmeasured confounding effects by repeating measurements for each subject [25,26]. As an extension of OMiRKAT [15] from crosssectional analysis to longitudinal studies, the correlated sequence kernel association test (CSKAT) is proposed to address correlated data, including paired data and longitudinal data [27]. The disadvantage of CSKAT is that it can only handle the outcomes with a Gaussian distribution (e.g., body mass index). Subsequently, researchers further proposed a distance-based kernel association test based on the generalized linear mixed model (aGLM-MMiRKAT) [28], which is adapted for data with outcomes that include Binomial or Poisson distributions.
Both CSKAT and aGLMMMiRKAT have less power in detecting sparse association signals from longitudinal microbiome data. To overcome this difficulty, we propose the aGEEMiHC method by integrating multiple generalized estimating equations (GEEs) with different working correlation structures [29] to conduct a higher criticism analysis [30]. Compared with MiHC [23], aGEEMiHC adopts multiple GEEs instead of a GLM and thus can explore different working correlation structures to accommodate various correlations among different observations for the same cluster. The proposed method regards each subject from longitudinal data as a cluster and fully considers the information for each cluster, and it can be applied to data with outcomes that include an exponential distribution, such as binary data and count data.
Through numerous simulation experiments, we find that the type I error of aGEEMiHC is always controlled close to the significance level of 5%. Compared with previous methods, aGEEMiHC has almost the best performance for different association patterns (i.e., sparsity levels and phylogenetic relevance) and longitudinal data with unknown correlations among different observations for the same cluster. Finally, we apply aGEEMiHC to diverse types of microbiome datasets to explain the stability of our methods.

Methods and materials
In this section, we introduce GEEMiHC and aGEEMiHC in detail, including the models for our method and the calculation of the p value. GEEMiHC adopts the GEE-based [29] higher criticism (HC) test [30] to detect sparse association signals from longitudinal microbiome data. aGEEMiHC, an optimal test, integrates three GEEMiHCs with different working correlation structures (Fig. 1).

Generalized estimating equations and marginal score statistics
We assume that the dataset has collected n subjects (i.e., clusters), and that each cluster contains response variables (i.e., outcomes), m OTUs and l covariates. In the experiment, the observations are repeated k i times for each subject at different points in time. We assume that when the ith cluster is observed for the tth time, we obtain the response variable y it , which represents the host phenotype of interest (e.g., health/disease-related factor), the abundance data of m OTUs o it = (o it1 , o it2 , . . . , o itm ) T and (l + 1)-dimension vectors of covariates X it = (1, x it1 , . . . , x itl ) T , where i = 1, 2, . . . , n; t = 1, 2, . . . , k i (please refer to the Input Data in Fig. 1). It is common to assume that the observations from different clusters are independent. However, different observations for the same clusters are considered to have certain correlations [31]. The true intracluster correlation is usually unknown [14]. Let E(y it |X it , o it ) = µ it ; we establish the connection between the marginal mean and other factors (i.e., covariates and OTUs) by specifying that where α = (α 0 , α 1 , . . . , α l ) T denotes the intercept and regression coefficient of X it , and β = (β 1 , . . . , β m ) T denotes the regression coefficient of o it . g(·) is a link function, and the variance of response variable y it is Var( Then,ġ −1 (γ it ) is the first partial derivative of g −1 (γ it ) with respect to γ it . For the binary response variable, the variance Var( When θ = (α T , β T ) T , let Y i = (y i1 , . . . , y iki ) T and µ i (θ) = (µ i1 (θ), . . . , µ iki (θ)) T represent the vector of responses and expectation for all observations of the ith cluster, respectively. Let X i = (X i1 , . . . , X iki ) T denote the k i × (l + 1) matrix of covariates and represent an k i × k i diagonal matrix, with its tth diagonal element representing the variance for response variable y it . Then, the estimator θ is calculated by solving the following: where is the working covariance matrix and R(τ ) is the working correlation matrix, which can be understood as a weight matrix [32]. τ is a disturbance parameter. The working covariance matrix V i is proposed to solve the main problem of variance bias due to the correlation of longitudinal data in repeated measurements. We consider three frequently used working correlation structures: autoregressive (AR), exchange (EX) and independence (IN) [33], which are formulated as follows: where ρ is the correlation coefficient [34]. As a symmetric matrix, k ′ th main diagonal elements of M k ′ are 1 while the others are 0, where k ′ = 1, . . . , k. GEE fully considers correlations among different observations for the same cluster by flexibly using the working correlation structure. The choice of these structures usually depends on different scenarios. When the IN structure is adopted, the GEE estimator is reduced to GLMs. In addition, it is worth noting that the estimators for GEE with different working correlation structures are consistent for cross-sectional data but different for longitudinal data.
The GEE-based marginal score statistic is as follows: where The vector of the expected valuesμ 0 consists of the estimator of µ it based on Eq. 1 under the null hypothesis, and P = W − WX(X T WX) −1 X T W, where X = (X T 1 , . . . , X T n ) T and W is the diagonal matrix with the jth diagonal element Var(y it |X it , o it ). Then, Z GEE j is the direction and magnitude of the jth OTU, which is the standard normal distribution [35]. Because P(|Z GEE j | > N (0, 1)), the marginal p value for the jth OTU p j can be calculated.

Higher criticism analyses and rank order for significant factors
Donoho and Jin [30] proposed the higher criticism test, which was sensitive to sparse association signals, and this test was subsequently widely researched [36,37] and applied [35]. In microbiome association studies, Koh and Zhao [23] introduced the weighted HC (wHC) test, whose weighted factor is based on the cophenetic distance of the phylogenetic tree: where D j,j ′ denotes the cophenetic distance and ζ(j) is a cluster with jth OTU by using the partitioning-around-medoids algorithm (PAM) [38]. Here, we consider association signals with close phylogenetic relevance to be amplified by adding the weighted factor w j to the GEE-based HC test (i.e., GEEHC test). Following the HC test and MiHC, the test statistics for the original and weighted GEE-based HC test (uGEEHC, wGEEHC) are as follows: where r j is the descending rank of p j and w j is the weight of the jth OTU; The higher criticism test regards the largest deviation between the expected and observed quantiles as test statistics and is sensitive to association signals with high sparsity levels. Thus, we consider the factor represented by the largest deviation to be the most significant factor (OTU) associated with the host phenotype at a high sparsity level. Let uGEEHC j and wGEEHC j represent the jth largest uGEEHC and wGEEHC (Eq. 5, 6). We make lists r u and r w according to the corresponding OTU names of uGEEHC j and wGEEHC j . Moreover, we calculate the corresponding test statistics for uGEEHC and wGEEHC based on GEE with different structures and establish the rank order for significant factors.

Adjustments for low sparsity levels
In fact, the associations between microbes and the host phenotype are not unique in microbiome studies. As previously discussed, there are cases where a small number of OTUs are associated with a host phenotype and where multiple OTUs act together. With the decrease in sparsity level, a higher criticism test gradually loses power [35]. Some adjustments can be made, and the modulated test statistics are integrated to adapt to different scenarios [23]. We consider the single largest deviation between expected and observed quantiles of significance to be adjusted to the weighted average of the first λ maximum deviations (i.e., uGEEHC 1 , . . ., uGEEHC λ ), where λ ∈ Γ and Γ is a subset of {1, . . . , m}. The models for tests after adjustments for low sparsity levels are as follows: where λ j ′ is the j ′ th weight and ∑ λ j ′ =1 λ j ′ = 1. For extremely high sparsity levels, we take λ 1 = 1, that is, the original higher criticism test. For low sparsity levels, we can make appropriate adjustments, for example, an increase of λ and reasonable assignment of λ j ′ (i.e., the increase of λ j ′ with a larger j ′ ). To facilitate comparisons, we take λ j ′ = 1 λ and Γ = {1, 3, 5, 7, 9}.

Microbiome higher criticism analysis for longitudinal data
It is worth exploring whether the microbiome associated with the host phenotype has phylogenetic relevance. To apply this unknown situation, the GEEMiHC a high-efficiency method is proposed, which is in the following form: where P uGEEHC (λ) and P wGEEHC (λ) are the p values for the test statistics uGEEHC (λ) and wGEEHC (λ) , respectively; and P Simes is the p value for the Simes test [39], whose model is as follow: For the multiple hypothesis test, the Simes test is much better than the classic Bonferroni in finding several highly correlated signals. Because the Simes test is sensitive to sparse association signals, we obtain the test statistic for GEEMiHC by synthesizing the above tests. It is worth noting that the minimum p value in the above tests is regarded as the test statistic for GEEMiHC and not the final p value for GEEMiHC. It is common to minimize the p value as the test statistic [15,16,17]. Let GEEMiHC(AR), GEEMiHC(EX) and GEEMiHC(IN) denote GEEMiHC with corresponding working correlation structures. Of course, MiHC based on the GLM framework is only a special case of GEEMiHC (i.e., GEEMiHC(IN)).
In addition, GEE with a predetermined structure may have a good fit for longitudinal data with similar structures, thus leading to good performance of GEEMiHC for detecting microbial association signals. However, GEEMiHC may not be stable for longitudinal data with diverse structures. In microbiome studies, the correlation structure among different observations for the same cluster is uncertain. Despite GEEMiHCs strong power for detecting sparse association signals from longitudinal microbiome data, the GEE-based approach suffers from an important challenge in terms of selecting the best working correlation structure. The setting for the structure has a certain influence on the GEE estimator [40], and the underlying correlation structure among different observations from the same cluster varies and is hard to establish. When the cluster size is large enough (i.e., "large n"), the estimators for the GEE approach with different structures may be consistent [31]. However, it might be a little different for a small sample. Actually, the subject size of most microbiome studies is insufficient. The GEE approach does not provide an effective way to determine the setting of the working correlation structure [33]. To accommodate these uncertain correlation structures, it is valuable to analyze various related structures to obtain more accurate conclusions. We further propose the adaptive GEEMiHC, i.e., aGEEMiHC, which integrates multiple GEEMiHCs with different working correlation structures: where T AR GEEMiHC , T EX GEEMiHC and T IN GEEMiHC represent test statistics for GEEMiHC(AR), GEEMiHC(EX) and GEEMiHC(IN), respectively (Eq. 9). For longitudinal data with an unknown underlying correlation within clusters, GEEMiHCs cannot always have robust stability, but aGEEMiHC can perform well. It is common to perform an optimal test based on multiple tests [18,41].

P values calculation
The asymptotic distribution was utilized to accurately calculate the p value for uGEEHC [30]. However, it requires a large test and is independent of each other. The scale of OTUs in the microbiome makes it difficult to achieve convergence conditions. Therefore, we adopt the permutations approach to calculate the p value, which has been widely used [15,18,23,42]. The details of the calculation processes of the p value can be found in Additional file 1.

Previous methods
To our knowledge, CSKAT [27] and aGLMMMiRKAT [28] are the methods of performing association analyses for longitudinal microbiome data. Since the latter is an extension of the former and can be applied to data with response variables that shown an exponential distribution, only the latter is selected here for comparison. In consideration of the finite of statistical methods for longitudinal microbiome data, we also compared the methods for testing microbiome-phenotype associations from cross-sectional microbiome data, including MiHC [23], aMiAD [16], aMiSPU [17], OMiAT [18] and OMiRKAT [15]. Unfortunately, these methods present a disadvantage in analyzing cross-sectional data because they do not consider the differences among individuals. Although the fitting effect is great, the method may lead to conclusions that run counter to reality.

Results
To show the power of our methods on different association patterns (i.e., sparsity levels and phylogenetic relevance), we considered diverse association patterns and designed corresponding experiments. Through several comparative methods (please see the "Previous methods" section), we demonstrate the potent power of our methods. To illustrate the stability of GEEMiHC, experiments were performed using datasets of diverse types.

Simulation study
The simulation is based on previous research [17,23,28]. We selected the 100 most abundant OTUs from real throat microbiome data [43] and generated an OTU count table based on the Dirichlet-multinomial model. Then, we utilized the R package ape [44] to generate a phylogenetic tree. Considering the influence of sample size, we selected two sample sizes with cluster sizes of n = 20 and 40, respectively, and they are divided into two equal parts. Without a loss of generality, we assume that the number of repeated observations k i is 2 for the former part and k i is 3 for the latter part. The OTUs for the tth observation in the ith cluster are denoted as O it = (O it1 , . . . , O itm ) T . Considering that there may be certain correlations among microbial communities observed at different times, we update the value for subsequent observations by using the random disturbance function [28], that is, We generate continuous response variable by specifying that where β is the effect size; Λ is a subset of {1, 2, . . . , m}, which consists of OTUs related to a host phenotype; x it1 and x it2 are cluster-specific (e.g., gender) and non-cluster-specific covariate (e.g., age), respectively, which are generated from the Bernoulli distribution B(1, 0.5) and the standard normal distribution N (0, 1); and ε it is an error term from the standard normal distribution. Here, we modulate covariate x it2 to perform different observations correlated with the R package mvtnorm [45]. To apply to the correlation that generally occurs within clusters, we consider an unstructured correlation structure with the fewest restrictions, namely for the number of repeated observations k i = 3. ρ ∈ [−1, 1] is the correlation coefficient. The "scale" function is the standardization function.
To estimate type I error rates, we assume β = 0 under the condition that hypothesis H 0 is true. The significance level α = 0.05 is controlled, and the simulation is repeated 5000 times. Considering the impact of effect size β on power (Fig. 2), we assign β = 1, 0.5 for n = 20, 40. Based on Eq. 12, the corresponding response variable y it is generated and the simulation is repeated 2000 times to estimate its power. To show the performance of our methods with different association patterns (i.e., different sparsity levels and phylogenetic relevance), we adopted two benchmarks to select the microbial association signals (i.e., OTUs associated with the host phenotype). (1) Random microbial association signals or signals with phylogenetic relevance are selected. (2) Based on different sparsity levels, we selected 2%, 4%, 6%, 8%, 10% and 12% of OTUs associated with the response variable. To illustrate that the rank order for significant factors has a certain reference value for the host phenotype, we also performed many simulation experiments. We considered a constant OTU ID and selected the fixed part associated with the host phenotype. In addition, the simulation is repeated 1000 times. To ensure a fair and effective comparison, all the simulated data in this section are used uniformly and available at https://github.com/SunHan5/GEEMiHC-SupplementaryMaterial.

Type I error
We find that the type I error rates of our methods and previous methods can be satisfactorily controlled at the significance level of 5% (Table 1). For individual GEEHC tests (i.e., uGEEHC (λ) and wGEEHC (λ) ) and omnibus GEEMiHC tests (i.e., GEEMiHC and aGEEMiHC), type I error rates are mostly well controlled (Additional file 1: Table S1).

Power
Here, power comparisons are presented: (1) power comparisons for individual GEEHC tests (i.e., uGEEHC (λ) and wGEEHC (λ) ) and the Simes test (Additional file 1: Figure S1-S6 (AR, EX, IN, n = 20, 40)); (2) power comparisons between aGEEMiHC and GEEMiHCs ( Fig. 3 (n = 20) and Additional file 1: Figure S7 (n = 40)) and (3) power comparisons for aGEEMiHC with previous methods (Fig. 4 and Additional file 1: Figure S8 (n = 40)). For the individual GEEHC tests, uGEEHC (λ) and wGEEHC (λ) with the EX or IN structure possess strong power for low λ at the high sparsity levels, although the power for the Simes tests declines rapidly with the decrease of sparsity levels. The results of individual tests with AR structures are disappointing; however, the power for the Simes test is relatively stable (Additional file 1: Figure S1-S6). Compared with GEEMiHC, the power of the aGEEMiHC always shows specific improvements at any association pattern ( Fig. 3 and Additional file 1: Figure S7). Compared with other methods, the power of aGEEMiHC is nearly the highest (Fig. 4 (n = 20) and Additional file 1: Figure S8 (n = 40)). The differences in the association patterns for selecting microbial association signals randomly is quite significant. In addition, the gap between aGEEMiHC and MiHC becomes more evident as the sparsity level decreases (Fig. 4).

Accuracy
We selected OTUs so that they were associated with the host phenotype based on different association patterns. To illustrate that the selected OTUs have large deviations between expected and observed quantiles (Eq. 5, 6), we selected the corresponding number of OTUs from the rank order for significant factors. There was a frequency of OTU occurrences when 2% ("711", "277") and 6% of OTUs were selected randomly ("165", "223", "67", "847", "598", "480") (Additional file 1: Figure S9A, S10A). For the former case (i.e., 2%), these associated OTUs appeared at the top of most lists for each method. Compared with the former case, the results of the latter case (i.e., 6%) were not as obvious, although the corresponding OTUs can still be found. Combining different sparsity levels and phylogenetic relevance, we noticed that the preselected OTUs associated with the host phenotype could be accurately found in the case of association signals with high sparsity levels (Additional file 1: Figures  S9, S12, S15, S18). As the sparsity level decreased, preselected OTUs could also be found through GEEHC. Unfortunately, the effect was not as significant as before (Additional file 1: Figure S9-S20). Compared with the other structures, the result of GEEHC with the AR structure is not satisfactory. The selection pattern (i.e., random or phylogenetic) had little effect on the results. In contrast, a larger sample size corresponded to more obvious results.

Simulation experiments for datasets of diverse types
We utilize three sets of simulation data based on a prior study [28] to indicate that it can be used for different types of longitudinal microbiome data, which can be found in the R package GLMMMiRKAT. In this study, 20 clusters are included, and the measurement times of each cluster are not all the same. A total of 59 samples are included in each dataset, and they differ in that their response variables are from Gaussian, Binomial and Poisson distributions. We observe that there is a slight disparity among GEEMiHCs, although aGEEMiHC invariably discovers significant association signals for different datasets. In contrast, except for aGLMMMiRKAT, the other comparison methods cannot always test association signals, e.g., MiHC for dataset C (Table 2). Moreover, other methods do not work for all datasets, such as OMiAT and OMiRKAT. Compared with MiHC, GEEMiHC(AR) and GEEMiHC(IN), GEEMiHC(EX) has more significant results (i.e., lower p value) on binary data (Table 2 dataset B). Thus, we further designed a comparative study to explore the difference. The dynamic change for the 20 most influential OTUs selected from the rank order is shown in Fig. 5 and Additional file 1: Figure S21-S27. An OTU named "2700" is considered the most important OTU for the host phenotype by GEEHC with an EX structure. It has a larger value for observations confirmed as patient relative to nonpatient within each cluster, e.g., clusters of green, gray, light green, among others. Unfortunately, it does not appear in the list of GEEHC with other structures and MiHC.

Real data applications
Microbial group association study for cross-sectional microbiome data To illustrate that aGEEMiHC still maintains efficiency for cross-sectional data, we conducted real data experiments to study the association between smoking status and nasopharyngeal and oropharyngeal microbial communities. Charlson et al. gathered the nasopharyngeal and oropharyngeal microbiome data of the subjects [43]. A total of 856 OTUs were collected from 60 samples, including 32 nonsmokers and 28 smokers, and they were already available in the R package MiSPU [17]. A total of 273 OTUs with a mean relative abundance over 3D histogram for the abundance of the 20 most influential OTUs selected from the rank order r EX u for significant factors. We transformed the abundance value (i.e., the min-max normalization). The left side of the dotted red line is full of nonpatients, and the right side represents that at least one observation of all clusters is a patient. The color differences in the blocks at the bottom of the image denote different clusters, and the block with a red dot denotes a sample of the patient.  Table S2). A significant relation is also detected by previous methods except aMiSPU (Table S3). In addition, the 10 most influential OTUs detected by MiHC can be discovered in the top 10 list of GEEMiHC (Additional file 1: Figure S28).

Microbial group association study for longitudinal microbiome data
To explore the volatile microbial signatures of patients with Crohn's disease (CD) from longitudinal data, Vázquez-Baeza et al. collected stool samples from 31 subjects, which included 19 CD subjects and 12 control subjects [46]. The raw data can be found at the European Bioinformatics Institute (EBI: ERP104742), and data processed by using the QIIME pipeline [47] can be found in Qiita (https://qiita.ucsd.edu/, ID:2538). Since many repeated values are produced by multiple measurements, only a portion of the samples is included in our experiment. According to the time span of data acquisition, we select different observations (at least once and at most 5 times) for each cluster. A total of 81 samples (Scenario 1) were eventually included in our experiment. We further preprocessed the data and selected 71 samples (Scenario 2) from 81 samples to prove the robust stability. Then, we singled out 372 and 376 OTUs under the condition that the relative abundance was not less than 10 −4 for Scenarios 1 and 2, respectively. For covariates, we selected age and smoking status, which may have a stronger effect than other factors, including gender [48]. Finally, we used MEGA7 [49] to construct a corresponding phylogenetic tree according to the dataset selected. In practice, we discovered a significant association between CD and the gut microbiome. For scenario 1, all individual and omnibus GEEMiHC tests showed a significant association, while half of the comparison methods (e.g., aGLMMMiRKAT) could not achieve this association (Table 3). Moreover, over half of the individual unweighted tests (i.e., uGEEHC) and MiHC did not show a significant association in scenario 2 (Additional file 1: Table S4). This finding implies that these tests either establish an inconsistent association mode or ignore the differences among individuals, resulting in biased conclusions and insufficient robustness in processing longitudinal data. In contrast, aGEEMiHC maintains stable p value. According to the p values for individual tests, we can find a high probability that a variety of OTUs are associated with CD (i.e., low sparsity level). We also generated Q-Q plots between the expected and observed quantiles and compiled lists of the 10 most influential OTUs (Additional file 1: Figure S29-S31). In addition, we depict the differences in the 50 most important OTUs between CD patients and nonpatients in scenario 1 (Additional file 1: Figure S32-S37). We can conclude that the gut microbiome is more abundant in nonpatients than in patients. Among the top 30 OTUs, some can be detected for most of the nonpatients, including Lachnospiraceae (OTU ID 3586, 7929, 9498, et al.), Ruminococcaceae (1784, 4760, 594, et al.) and others, while an empty space is observed for all patients. Previous research has ever reported differences in these bacteria between CD patients and healthy people [50]. The processed data and mapping file are available at R package GEEMiHC and https://github.com/SunHan5/GEEMiHC-SupplementaryMaterial, respectively.
To further illustrate that our ranking of OTUs had a certain significance, we divided 372 OTUs in scenario 1 into 6 groups on average based on uGEEHC with AR structure and utilized support vector machines (SVMs) to classify diseases. Due to the limited sample size, we adopted the 10-fold cross-validation method. Default values are selected for all parameters in SVM. We concluded that the accuracy of the group formed by the top OTUs reached a maximum value, indicating that they are most likely to be potential biomarkers of CD compared with other groups (Additional file 1: Table S5).

Discussion
In this paper, we adopt the GEE framework to analyze longitudinal data, rather than generalized linear mixed model. Because we consider that GEE can adapt to longitudinal data with different correlations between samples by setting corresponded working correlation structure, and we can further integrate these GEEs with diverse structures to detect sparse microbial association signals adaptively from longitudinal microbiome data. It should be noted that GEE is recently utilized in a different scenario to estimate both predictor effects and OTU correlations [51] in the analysis of microbiome data. In addition, aGEEMiHC achieves a superior and stable performance according to a statistical power. Considering the use of permutation-based method used to calculate the p-value, the computing performance of aGEEMiHC may encounter challenges when processing a big data. aGEEMiHC as well as other microbiome-based association tests focuses on the analysis of the association between the microbiome and a host phenotype, but they cannot evaluate the causal relationship.
Although we only design 16S rRNA gene sequencing data as real data experiments, aGEEMiHC can also be utilized to analyze metagenomic sequencing data. In the analysis of metagenomics, we consider that average nucleotide identity (ANI) [52,53] can also be adopted to describe interspecies relationships. Furthermore, there is still much room for the development of methods to detect sparse microbial association signals from longitudinal microbiome data, for instance, the association between the microbiome and multiple host phenotypes [54] or a host phenotype with multiple categories, such as disease severity [55].

Conclusions
In this paper, we propose a novel method, aGEEMiHC, to test sparse microbial association signals from longitudinal microbiome data. We fully consider the information from correlations among different observations from the same cluster by utilizing a generalized estimating equations framework. Despite the good performance of GEEMiHC for different association patterns (i.e., sparsity levels and phylogenetic relevance), it is an important challenge to select the best working correlation structure. To adapt to diverse correlation structures, we further develop aGEEMiHC, which is a data-driven comprehensive test that integrates GEEMiHC with different structures. In contrast to GEEMiHC, aGEEMiHC is more powerful and stable for detecting association signals from datasets with diverse types. We utilized it on both real cross-sectional data and longitudinal data to detect the associations between smoking status and the nasopharyngeal and oropharyngeal microbial communities as well as the associations between Crohn's disease and the gut microbiome. It retained excellent performance over different types of longitudinal microbiome data. In addition, we also ranked the significant factors associated with the host phenotype. It can identify the influential OTUs on the host phenotype to some extent, and they can be considered potential disease biomarkers.