Kernel Weighted Least Square Approach for Imputing Missing Values of Metabolomics Data

doi:10.21203/rs.3.rs-140282/v1

Research Article

This work is licensed under a CC BY 4.0 License

Mass spectrometry is a high-throughput analytical technique that enables large-scale metabolomics analyses. It yields a large, high-dimensional matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, both of which arise from several causes, including technical and biological sources. Several missing data imputation techniques can be found in the literature, but the conventional techniques address only the missing values and do not mitigate the effect of outliers. Outliers in the dataset therefore degrade the accuracy of imputation. To address both problems, we developed a new kernel weight function based missing data imputation technique that handles both missing values and outliers. We evaluated the proposed method and nine conventional missing value imputation techniques on both artificially generated data and experimentally measured data, in the absence of outliers and in the presence of different rates of outliers. Results on both the artificial data and the real metabolomics data indicate that the proposed kernel weight based imputation technique outperforms some existing alternatives. For user convenience, an R package implementing the proposed technique is available at https://github.com/NishithPaul/tWLSA .
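The abstract does not spell out the estimator, so the following is only a minimal Python sketch of the general idea of combining kernel weights with weighted least squares for imputation. It is not the tWLSA implementation and does not use the R package's API. All function names, the Gaussian kernel, the MAD-based bandwidth, and the single-predictor regression are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_weights(residuals):
    """Hypothetical helper: down-weight observations with large residuals."""
    r = np.asarray(residuals, dtype=float)
    centered = r - np.median(r)
    # Robust scale (assumption): median absolute deviation, rescaled for normal data.
    mad = np.median(np.abs(centered))
    scale = 1.4826 * mad if mad > 0 else 1.0
    return np.exp(-0.5 * (centered / scale) ** 2)

def weighted_least_squares(x, y, w):
    """Fit y ~ a + b*x by minimising sum(w * (y - a - b*x)**2)."""
    design = np.column_stack([np.ones_like(x), x])
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef  # (intercept, slope)

def impute_column(x, y, n_iter=10):
    """Fill NaNs in y from a fully observed predictor x (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    w = np.ones(observed.sum())  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_least_squares(x[observed], y[observed], w)
        # Recompute kernel weights from the current residuals.
        w = gaussian_kernel_weights(y[observed] - (a + b * x[observed]))
    filled = y.copy()
    filled[~observed] = a + b * x[~observed]  # observed cells are left untouched
    return filled

if __name__ == "__main__":
    x = np.arange(10, dtype=float)
    y = 2.0 * x + 1.0 + 0.05 * np.sin(x)
    y[3] = 100.0   # one artificial outlier
    y[7] = np.nan  # one missing cell
    print(impute_column(x, y)[7])
```

In this toy setup the outlier receives a kernel weight close to zero after the first pass, so the imputed value is driven by the remaining observations rather than by the contaminated one.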

Bioinformatics

Computational Biology

Computational Mathematics

Metabolomics

Missing data imputation

Weighted least square

Receiver operating characteristic (ROC) curve

Area under the ROC curve (AUC)

Support vector machine (SVM)

Due to technical limitations, full-text HTML conversion of this manuscript could not be completed. However, the latest manuscript can be downloaded and accessed as a PDF.

Due to technical limitations, tables are only available as a download in the Supplemental Files section.

Editorial decision: Major revision (05 Feb, 2021)
Reviews received at journal (22 Jan, 2021)
Reviewers agreed at journal (20 Jan, 2021)
Reviewers invited by journal (17 Jan, 2021)
Editor assigned by journal (17 Jan, 2021)
Editor invited by journal (13 Jan, 2021)
Submission checks completed at journal (11 Jan, 2021)
First submitted to journal (04 Jan, 2021)

Status: Version 1
