Background Maximizing the insights gained from large epidemiological and biomedical datasets remains a challenge for life science researchers. For example, while standard analysis protocols may find numerous correlations between the studied factors, they do not attempt to explain why many individuals deviate significantly from the identified general trends. Furthermore, while common practices can identify relationships between pairs or, at most, small groups of factors, they cannot determine the global network of relationships among all variables.
Methods To address these challenges, we have developed a new analysis workflow, available in the MUVIS package for the R statistical software. Here we apply these methods to a dataset from the Yazd Health Study consisting of 300 variables measured for 4010 individuals.
Results Although the Yazd Health Study is a general health survey of intensively studied medical variables, our methods were still able to uncover new insights. For example, in analyzing the correlation between blood high-density lipoprotein and total cholesterol, we found diet to be a more important factor than exercise or age in explaining why individuals may have anomalous blood high-density lipoprotein levels given their total cholesterol levels. We also identified gender-based differences in how diet may exert these effects. Furthermore, analysis of global interaction networks revealed both interesting clusters of variables, such as factors denoting health-conscious and health-indifferent people, and individual connections, such as one between hypertension and skin cancer.
Conclusions We conclude that MUVIS provides a robust statistical framework for the high-throughput discovery of novel insights from large biomedical datasets.