Background: Virtually every biological system is governed by complex relations among its components. Identifying such relations requires a rigorous or heuristic search for patterns among the variables (features) of a system. A number of algorithms have been developed to identify two-dimensional patterns (involving two variables) using correlation, covariation, mutual information, and similar measures. It is reasonable to expect, however, that a comprehensive description of a complex biological system also involves more complicated multidimensional relations, which can only be described by patterns that simultaneously embrace three, four, or more variables. The main challenges in searching for such multidimensional patterns include: (a) the computational complexity of the search; (b) distinguishing statistically significant patterns from spurious ones that can appear in large datasets purely by chance; and (c) integrating heterogeneous data types (numerical, Boolean, categorical, etc.) within a single pattern.
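To make challenges (a) and (b) concrete, the sketch below counts candidate patterns under one simple, assumed definition: a k-dimensional pattern picks k distinct Boolean features and a required True/False value for each. This definition, and the helper names `count_patterns` and `expected_chance_hits`, are illustrative assumptions, not the manuscript's formal definition. The point is only that the number of candidates grows combinatorially with k, and so does the number of patterns expected to pass a fixed per-pattern significance level by chance alone.

```python
from math import comb


def count_patterns(n_features: int, k: int) -> int:
    """Number of candidate k-dimensional Boolean patterns.

    ASSUMPTION (illustration only): a pattern picks k distinct Boolean
    features and a required value (True/False) for each, giving
    C(n_features, k) feature subsets times 2**k value assignments.
    """
    if not 0 <= k <= n_features:
        return 0
    return comb(n_features, k) * 2 ** k


def expected_chance_hits(n_candidates: int, alpha: float) -> float:
    """Expected number of candidates passing a per-pattern test whose
    false-positive rate is alpha, when no pattern is real. This follows from
    linearity of expectation, so the tests need not be independent."""
    return n_candidates * alpha


if __name__ == "__main__":
    # Hypothetical dataset size: 1,000 Boolean features.
    for k in (2, 3, 4):
        m = count_patterns(1000, k)
        print(f"k={k}: {m:,} candidate patterns, "
              f"~{expected_chance_hits(m, 0.001):,.0f} chance hits at alpha=0.001")
```

Under this assumed definition, 1,000 Boolean features already yield 662,673,996,000 candidate 4-dimensional patterns, which is why both an efficient search and a multiple-testing-aware significance estimate are needed.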
Results: This manuscript presents an attempt to address some of these challenges by defining multidimensional Boolean patterns in a way that makes it possible to: (a) accommodate heterogeneous multi-omics data, (b) formulate criteria for separating trivial from non-trivial patterns, and (c) identify the conditions required for a given pattern to predict the values of selected feature(s). Additionally, the proposed definitions of a pattern's strength (the pattern's score) and of a minimal population threshold permit estimation of the statistical significance of detected patterns, using the score distributions of artificial datasets created by randomizing the original data.
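The randomization-based significance estimate can be sketched as follows. The abstract does not give the score formula, so the `lift_score` used here is a HYPOTHETICAL stand-in (observed pattern population divided by the population expected if the features were independent, set to 0 below a minimal population threshold). Shuffling each feature column independently is likewise an assumed randomization scheme: it preserves every feature's frequency while destroying joint structure. Comparing a pattern's score with the distribution of the *best* score over all k-dimensional patterns in each shuffled dataset is one common way to account for searching many patterns at once. All function names are illustrative.

```python
import itertools
import random
from typing import Iterator, List, Sequence, Tuple

Row = List[bool]
# A pattern fixes a required Boolean value for each of k distinct features.
Pattern = Tuple[Tuple[int, bool], ...]


def population(data: Sequence[Row], pattern: Pattern) -> int:
    """Number of samples (rows) satisfying every condition of the pattern."""
    return sum(all(row[i] == v for i, v in pattern) for row in data)


def lift_score(data: Sequence[Row], pattern: Pattern, min_population: int) -> float:
    """HYPOTHETICAL score: observed population / population expected if the
    pattern's features were independent. Patterns below the minimal
    population threshold score 0."""
    n = len(data)
    observed = population(data, pattern)
    if observed < min_population:
        return 0.0
    expected = float(n)
    for i, v in pattern:
        expected *= sum(row[i] == v for row in data) / n
    return observed / expected if expected > 0 else 0.0


def all_patterns(n_features: int, k: int) -> Iterator[Pattern]:
    """Every choice of k features combined with every True/False assignment."""
    for cols in itertools.combinations(range(n_features), k):
        for vals in itertools.product((False, True), repeat=k):
            yield tuple(zip(cols, vals))


def best_score(data: Sequence[Row], k: int, min_population: int) -> float:
    """Highest score over all k-dimensional patterns in a dataset."""
    return max(lift_score(data, p, min_population)
               for p in all_patterns(len(data[0]), k))


def shuffle_columns(data: Sequence[Row], rng: random.Random) -> List[Row]:
    """Permute each feature column independently (ASSUMED randomization):
    feature frequencies are kept, joint structure is destroyed."""
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def max_score_pvalue(data: Sequence[Row], pattern: Pattern, min_population: int = 5,
                     n_perm: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Empirical p-value of a pattern's score against the best scores found
    in n_perm column-shuffled copies of the data."""
    rng = random.Random(seed)
    observed = lift_score(data, pattern, min_population)
    null = [best_score(shuffle_columns(data, rng), len(pattern), min_population)
            for _ in range(n_perm)]
    exceed = sum(s >= observed for s in null)
    return observed, (1 + exceed) / (1 + n_perm)


def make_toy_data(n_rows: int = 120, seed: int = 42) -> List[Row]:
    """Synthetic toy data: f2 = f0 AND f1 (a planted 3-D pattern), f3 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        f0, f1, f3 = (rng.random() < 0.5 for _ in range(3))
        rows.append([f0, f1, f0 and f1, f3])
    return rows


if __name__ == "__main__":
    data = make_toy_data()
    planted: Pattern = ((0, True), (1, True), (2, True))
    print(max_score_pvalue(data, planted))
```

On the synthetic data the planted pattern should score well above anything found in the shuffled copies. With real multi-omics data, numerical and categorical features would first have to be converted to Boolean features, and the score and randomization scheme would follow the manuscript's own definitions.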
Conclusion: To test the proposed approach, we searched for all possible 2-, 3-, and 4-dimensional patterns in historical data from the Human Microbiome Project (15 body sites) and in a collection of H. pylori genomes associated with gastric ulcers, gastritis, and duodenal ulcers. In every dataset under consideration, we identified hundreds of statistically significant multidimensional patterns. These results suggest that such patterns may dominate the landscape of microbial genomics and microbiomics systems.