Mass spectrometry is a high-throughput analytical technique that enables large-scale metabolomics analyses. It yields a large, high-dimensional data matrix (samples × metabolites) of quantified values that often contains missing cells as well as outliers, which arise from several sources, including technical and biological ones. Although several missing data imputation techniques are available in the literature, the conventional techniques address only the missing value problem and do not handle outliers. Consequently, outliers in the dataset degrade the accuracy of imputation. To address missing values and outliers together, we developed a new kernel weight function based missing data imputation technique (the proposed method). We evaluated the proposed method against nine conventional missing data imputation techniques using both artificially generated data and experimentally measured data, in the absence and in the presence of outliers at different rates. Results on both artificial data and real metabolomics data indicate that the proposed kernel weight based imputation technique performs better than some existing alternatives. For user convenience, an R package implementing the proposed technique is available at https://github.com/NishithPaul/tWLSA .
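To illustrate the general idea of letting a kernel weight reduce the influence of outliers during imputation, a minimal sketch follows. It is not the tWLSA algorithm from the R package: the Gaussian kernel, the median/MAD scaling, the `bandwidth` value, and the function names `kernel_weights` and `impute_weighted_mean` are all assumptions chosen for illustration, and the sketch imputes each metabolite column independently with a weighted mean.

```python
import numpy as np


def kernel_weights(x, bandwidth=3.0):
    """Gaussian kernel weights centred on the median and scaled by the MAD.

    Assumption for illustration only: values far from the median (in robust
    z-score units) receive weights close to zero.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # robust scale estimate
    if mad == 0:
        return np.ones_like(x)
    z = (x - med) / mad
    return np.exp(-0.5 * (z / bandwidth) ** 2)


def impute_weighted_mean(matrix):
    """Fill NaNs in each column with a kernel-weighted mean of observed values.

    Rows are samples and columns are metabolites (hypothetical layout).
    Columns with no missing values, or with no observed values, are left as-is.
    """
    filled = np.array(matrix, dtype=float)  # copy, so the input is untouched
    for j in range(filled.shape[1]):
        col = filled[:, j]  # view into `filled`
        missing = np.isnan(col)
        if not missing.any() or missing.all():
            continue
        observed = col[~missing]
        w = kernel_weights(observed)
        col[missing] = np.sum(w * observed) / np.sum(w)
    return filled


if __name__ == "__main__":
    # Toy data: the first column holds one gross outlier (1000) and one gap.
    data = np.array([
        [10.0, 5.0],
        [11.0, 6.0],
        [9.0, 5.5],
        [10.0, 6.5],
        [1000.0, 5.0],
        [np.nan, 6.0],
    ])
    print(impute_weighted_mean(data))
```

In this toy example the outlier receives a weight of essentially zero, so the imputed value stays close to 10, whereas an unweighted column mean would be 208.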

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11

Figure 12
Posted 13 Jan, 2021
On 05 Feb, 2021
Received 22 Jan, 2021
On 20 Jan, 2021
Invitations sent on 17 Jan, 2021
On 17 Jan, 2021
On 13 Jan, 2021
On 11 Jan, 2021
On 04 Jan, 2021