Numerous pieces of clinical evidence have shown that many phenotypic traits of human disease are related to the gut microbiome. Through supervised classification, it is feasible to determine human disease states from the compositional information of the intestinal microbiota. However, because the abundance matrix of microbiome data is highly sparse, an interpretable deep model, such as the deep forest, is needed to represent and mine the data further. Moreover, the original deep forest model can still overfit when dealing with such “large p, small n” biological data. Feature reduction is therefore considered as a way to improve the ensemble forest model, especially for disease identification from the human microbiota. In this work, we propose a kernel principal components based cascade forest method, called KPCCF, to classify the disease states of patients using taxonomic profiles of the microbiome at the family level. Specifically, kernel principal component analysis is first used to reduce the dimensionality of the human microbiota datasets. The processed data are then fed into the cascade forest to give a preliminary discrimination of the disease state of the samples. In this way, the proposed KPCCF algorithm can represent small-scale, high-dimensional human microbiota datasets that have sparse feature matrices. Systematic comparison experiments on 4 datasets demonstrate that our method consistently outperforms state-of-the-art methods. Additionally, compared with other dimensionality reduction methods, kernel principal component analysis is more suitable for microbiota datasets.
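
The dimensionality reduction step described above can be illustrated with a minimal sketch of kernel principal component analysis. This is not the authors' implementation: the RBF kernel, the value of `gamma`, the toy abundance matrix, and all function names are assumptions made for illustration, and the eigenvectors are found by plain power iteration only to keep the example dependency-free. The cascade forest stage is not shown; under these assumptions it would take the reduced matrix `Z` as its input.

```python
import math
import random


def make_toy_abundance():
    """Hypothetical sparse family-level profiles: 6 healthy, 6 diseased samples."""
    X, y = [], []
    for i in range(6):
        eps = 0.02 * i
        X.append([0.5 - eps, 0.3 + eps, 0.2, 0, 0, 0, 0, 0, 0, 0])
        y.append(0)
    for i in range(6):
        eps = 0.02 * i
        X.append([0, 0, 0, 0, 0, 0.4 + eps, 0.4 - eps, 0.2, 0, 0])
        y.append(1)
    return X, y


def rbf_kernel_matrix(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2) (kernel choice is an assumption)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * d2)
    return K


def center_kernel(K):
    """Double-center K so the implicit feature vectors have zero mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_eigenpairs(A, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration with deflation."""
    rng = random.Random(seed)
    n = len(A)
    A = [row[:] for row in A]  # work on a copy, since deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Remove the component just found before searching for the next one.
        for i in range(n):
            for j in range(n):
                A[i][j] -= lam * v[i] * v[j]
    return pairs


def kernel_pca(X, n_components, gamma):
    """Project the training samples onto the leading kernel principal components."""
    Kc = center_kernel(rbf_kernel_matrix(X, gamma))
    pairs = top_eigenpairs(Kc, n_components)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(X))]


X, y = make_toy_abundance()
Z = kernel_pca(X, n_components=2, gamma=5.0)
for label, row in zip(y, Z):
    print(label, [round(value, 3) for value in row])
```

On this toy matrix the two groups occupy disjoint sets of families, so the first component is expected to carry most of the between-group difference. Real microbiome profiles would need a tuned kernel and number of components, and a library eigensolver in place of power iteration.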

Figure 1

Figure 2

Figure 3
Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11
Posted 18 Sep, 2020
On 27 Oct, 2020
Received 27 Oct, 2020
Received 17 Oct, 2020
Invitations sent on 22 Sep, 2020
On 22 Sep, 2020
On 17 Sep, 2020
On 16 Sep, 2020
On 16 Sep, 2020
On 18 Aug, 2020
Received 15 Jul, 2020
On 23 Jun, 2020
Invitations sent on 04 Jun, 2020
On 04 Jun, 2020
Received 04 Jun, 2020
On 12 May, 2020
On 11 May, 2020
On 11 May, 2020
On 08 May, 2020