Denoising large-scale biological data using network filters

doi:10.21203/rs.3.rs-66071/v2

Research article

This work is licensed under a CC BY 4.0 License

Journal publication: published 25 Mar, 2021 in BMC Bioinformatics

You are reading this latest preprint version

Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.

Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data.

Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology.
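
To make the idea of combining neighboring measurements concrete, the following is a minimal sketch of one possible network filter, not the authors' exact definition. It assumes that each node's noisy value is replaced by the average of itself and its network neighbors, and that anti-correlation is handled by reflecting neighbor values about the global mean before averaging. The function name `mean_network_filter`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_network_filter(values, neighbors, anticorrelated=False):
    """Illustrative filter: average each node's value with its neighbors' values.

    values: dict mapping node -> noisy measurement
    neighbors: dict mapping node -> list of adjacent nodes
    anticorrelated: if True, reflect each neighbor value about the global
        mean before averaging (an assumed way to treat anti-correlation).
    """
    mu = mean(values.values())
    filtered = {}
    for node, x in values.items():
        nbr_vals = [values[n] for n in neighbors.get(node, [])]
        if anticorrelated:
            # Assumption: an anti-correlated neighbor at (mu + d) is
            # evidence that this node sits near (mu - d).
            nbr_vals = [2 * mu - v for v in nbr_vals]
        # Average over the node itself plus its neighbors.
        filtered[node] = (x + sum(nbr_vals)) / (1 + len(nbr_vals))
    return filtered


# Toy example (made-up numbers): a triangle of correlated measurements.
values = {"a": 1.0, "b": 2.0, "c": 3.0}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
smoothed = mean_network_filter(values, neighbors)
```

In this sketch, applying a different filter to each module would amount to calling the function separately on each module's nodes with the setting suited to that module; how modules are found and how a filter is chosen for each are not specified in the abstract.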

Bioinformatics

Networks

Denoising

Machine Learning

kavranclausetadditionalfile1.pdf
Supplemental Figures and Tables
Figure S1 - Filter performance on rewired synthetic networks.
Figure S2 - Filter performance on modular synthetic networks, including the sharp filter.
Figure S3 - Distribution of assortativity coefficients of network modules with Human Protein Atlas Data.
Figure S4 - KNN regression of Human Protein Atlas data with all network filters.
Table S1 - Cell types from the Human Protein Atlas dataset averaged together to form a single healthy tissue vector.

Editorial decision: Minor revision
05 Feb, 2021
Review #2 received at journal
01 Feb, 2021
Review #1 received at journal
25 Jan, 2021
Reviewer #2 agreed at journal
17 Jan, 2021
Reviewers invited by journal
05 Jan, 2021
Reviewer #1 agreed at journal
05 Jan, 2021
Editor assigned by journal
04 Jan, 2021
Submission checks completed at journal
04 Jan, 2021
Editor invited by journal
04 Jan, 2021
