Comparison of polynomial and machine learning regression models to predict LDPE, PET, and ABS concentrations in beach sediment based on spectral reflectance

doi:10.21203/rs.3.rs-1633429/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Microplastic (MP) contamination on land has been estimated to be 32 times higher than in the oceans. However, although most MPs are potentially found in soils, there is a distinct lack of research on soil MPs compared to marine MPs. Beaches are bridges between land and ocean and present equally understudied sites of microplastic pollution. Visible-near-infrared (vis-NIR) spectroscopy has been applied successfully for the measurement of reflectance and prediction of low-density polyethylene (LDPE), polyethylene terephthalate (PET), and polyvinyl chloride (PVC) concentrations in soil. The rapidity and precision associated with this method make vis-NIR particularly promising. The present study explores two novel data processing approaches in this field. First, using a spectroradiometer, spectral reflectance was measured from treated beach sediment spiked with virgin microplastic pellets (LDPE, PET, and acrylonitrile butadiene styrene (ABS)). The concentration of spiked microplastics in the treated sediment was increased sequentially. Using the recorded spectral data, polynomial regression and machine learning-based regression methods were applied and predictive models were developed for each microplastic. Both methods generated models of good accuracy with R² values greater than 0.7, root mean squared error (RMSE) values less than 3 and standard deviation (SD) < 0.06. Additionally, the optimum wavelengths for detecting each microplastic in the beach sediment were found to be similar for both methods within the vis-NIR range used (325 nm – 1075 nm), indicating that a wider vis-NIR range is not required for detection. This study is the first to assess the predictive abilities of models created by polynomial regression and machine learning algorithms for soil microplastic contamination, and is another step towards standardizing the quantification of microplastics in soil samples using vis-NIR spectroscopy.

Microplastics

Soil

VNIR

Machine Learning

Polynomial Regression Model

Plastics are popular due to their durability, malleable properties, and low-cost manufacturing (Thompson et al. 2009). However, their overuse and improper disposal have led to severe plastic pollution in the environment (An et al. 2020; Horton et al. 2017; Ter Halle et al. 2017). Plastics that end up in the environment can then break down through chemical, physical or biological processes into smaller fragments known as microplastics (MPs). Several studies have reported large numbers of MPs in the marine environment (Anbumani & Kakkar 2018; de Souza Machado et al. 2018; Chae & An 2018). MPs can serve as transport vectors for toxic chemicals, as well as a habitat for harmful microorganisms (Campanale et al. 2020). They impact and threaten microbial composition, ecosystem health and food chains (Hamid et al. 2018; Rahman et al. 2021).

However, most of the plastic waste in the marine environment originates from the use of plastics inland (He et al. 2018). Consequently, microplastic contamination on land is estimated to be 32 times higher than in the oceans (Gionfra 2018). Sources of plastic contamination in the soil environment include sewage sludge which contains primary microplastic (microbeads), fertilizers and personal care products (Gionfra 2018, Bouwmeester et al. 2015). Other sources include landfills and wastewater irrigation (Nizzetto et al. 2016; Bläsing & Amelung 2018). Additionally, a vast amount of low-density polyethylene (LDPE) is used for agriculture and for the mulching application (Duis & Coors 2016). Importantly, these MPs come in contact with soil surfaces from these sources and then seep into subsoils, thus entering the soil environment (Chae & An 2018). They degrade over time into smaller pieces and leak into the groundwater which is used for drinking (Rillig et al. 2017). Additives in the plastics can leach out which can be harmful to the soil biota (de Souza Machado et al. 2018). Furthermore, due to plastics’ hydrophobic surface, they absorb other toxicants such as organochlorine pesticides, metals and polychlorinated biphenyls (PCBs) (Horton et al. 2017). Lastly, other than absorbing toxicants, the surfaces of soil MPs can harbour microbial pathogens containing antibiotic resistance genes, which can increase the spread of antibiotic resistant microbial diseases (Huang et al. 2021).

Although most MPs are potentially found in soils, there is still a distinct lack of research on soil MPs compared to marine MPs (Huang et al. 2021). There is even less research on monitoring soil MPs (Yang et al. 2021). The need to develop standardized methods of quantifying MPs in soil is well recognized (Yang et al. 2021, Ng et al. 2020, Wang et al. 2020). The vast majority of studies have used Raman spectroscopy, Fourier-transform infrared (FTIR) spectroscopy and pyrolysis–gas chromatography–mass spectrometry (Pyr-GC–MS) to quantify MPs (Huang et al. 2021, Löder & Gerdts 2015). All these methods are time-consuming, as the samples must first go through density separation to separate out the MPs (Corradini et al. 2019).

The use of visible-near-infrared (vis-NIR) spectroscopy to identify and quantify MPs has been less explored, but it has been used successfully to measure reflectance and predict the concentration of MPs in soil (Ng, Minasny & McBratney 2020; Corradini et al. 2019). Manley (2014) showed that molecules containing X-H chemical bonds, e.g. O-H and C-H, give a measurable spectral profile in the vis-NIR spectrum. Thus, by combining vis-NIR spectroscopy, spectral visualization, relationships between absorption values at specific wavelengths, and an appropriate regression model, one can predict and measure the amount of MPs. Through vis-NIR spectroscopy it is also possible to undertake qualitative analyses (classification of plastics), as differences in physical properties are reflected in the spectra.

The potential of machine learning-based microplastic detection and quantification via computer vision and FTIR-spectroscopy has been explored in aquatic ecosystems (Chaczko et al. 2019; Chen et al. 2020; Harshitha et al. 2020; Massarelli et al. 2021), but there are only a few selected studies on the combination of vis-NIR spectroscopy data and machine learning techniques for microplastic detection in soil (Corradini et al. 2019; Ng, Minasny & McBratney 2020). Polynomial regression has been used to predict COVID-19 transmissions, execution time of computer programs, residual stresses of transverse beams and stress determination of hole-drilling (Matthew and Adeyinka 2020; Huang et al. 2010; Trebuňa et al. 2016; Ostertagová, 2012); however, no studies have yet used it for the quantification of MPs in beach sediment.

In this study, beach sediment was collected and treated to obtain a treated sediment sample. The sediment particles were standardized to a uniform size by sieving through a metal sieve, followed by repeated density separation to remove any MPs and impurities from the sediment. The sediment was then spiked with increasing concentrations of virgin low-density polyethylene (LDPE), polyethylene terephthalate (PET) and acrylonitrile butadiene styrene (ABS) micro pellets. The reflectance of the spiked sediment was recorded through vis-NIR spectroscopy (325 nm – 1075 nm), and predictive polynomial regression and machine learning models were developed.

2.1. Overview of methodological approach

The experiment consists of 4 steps. Figure 1 shows the overview of the methodological approach in this study. Sandy beach sediment was treated and spiked with varying concentrations of LDPE, PET and ABS MPs. The reflectance of the spiked sediment was recorded through vis-NIR spectroscopy, and predictive polynomial regression and machine learning models developed and validated.

2.2. Sample preparation

Collection and pre-treatment of beach sediment

Sandy beach sediment was collected from Damai Beach, Sarawak (1°45’05.5”N 110°18’50.0”E). A sterile metal spoon was used to collect the top 5 cm layer of beach sediment, which was transferred into a sterile 1 L glass beaker. The mouth of the glass beaker was securely covered with aluminum foil to prevent contamination from the environment during transport to the laboratory. The removal of MPs and preparation of the beach sediment sample was adapted from He et al. (2018). The beach sediment was sieved using a metal sieve with a mesh size of 1 mm to remove shells, leaves and other large organic substances. 400 g of sieved beach sediment was transferred into a new 1 L glass beaker and density separation was carried out (400 mL of saturated 8.56 molar NaCl solution (HiMedia, Germany) was added into the glass beaker containing the sieved beach sediment). The mixture was stirred for 10 minutes using a large metal spoon and left overnight, after which the suspension was decanted carefully. Density separation and decanting were repeated twice to ensure all impurities were removed from the beach sediment. To remove excess NaCl after the density separation, the sediment was poured into a 63 µm metal sieve and 1 L of Milli-Q water was allowed to run through the sediment in the sieve. The sediment was then transferred into a glass beaker and oven-dried at 40 °C for 6 hours to obtain a treated beach sediment sample.

Reflectance measurements of artificially contaminated beach sediment samples

20 g of the purified beach sediment was transferred onto a watch glass and spiked with virgin LDPE, ABS or PET micro pellets at sequential increments of 0.1% w/w. The microplastic pellets were obtained from Fraunhofer-Institute Karlsruhe, Germany and were less than 5 mm in size (Jang 2020). An ASD HandHeld 2 VNIR Spectroradiometer (Malvern Panalytical, Worcestershire, United Kingdom) was used to record the reflectances in the vis-NIR wavelength range of 325 nm to 1075 nm. For each concentration, the reflectance was recorded using the contact probe at five different locations, working clockwise from the outer edge of the sample to the center of the sample. A total of 46 different concentrations were prepared and recorded per plastic type (ranging from 0.1% to 15% w/w) with five (5) replicates recorded per concentration, resulting in a total of 241 readings (see Fig. 1) per plastic.

2.3. Overview of reflectance processing approaches

The resulting reflectance measurements were analysed using both OriginPro 2021 (OriginLab, Northampton, MA, USA) and Scikit-Learn to build predictive models for the three microplastic types in beach sediment samples.

In OriginPro 2021, pre-processing approaches were applied to the reflectance data of each microplastic, after which principal component analysis (PCA) was applied to identify the most significant wavelengths. Polynomial regression models were constructed for LDPE, PET and ABS respectively using the wavelength of the highest significance. The accuracy was assessed via R-squared value (R²), root mean squared error (RMSE) and standard deviation (SD).

For machine learning, the Scikit-Learn software library (Pedregosa et al. 2011) was used to identify and select the most significant features (i.e. wavelengths) for each microplastic using the feature importance algorithm and the Random Forest (RF) Regressor algorithm available in the library. The feature importance acts as an indicator of the individual contribution of each feature in a particular classifier (Saarela & Jauhiainen 2021). From the regression algorithm selection pipeline, the RF Regressor was selected for LDPE, whereas the K-nearest neighbor (KNN) Regressor was selected for PET and ABS in developing the regression models.
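The feature ranking described above can be illustrated with a minimal sketch. It assumes the spectra are held as a samples-by-wavelengths array; the synthetic data, the variable names and the choice of 50 trees are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the measured spectra (assumption): 325-1075 nm in 1 nm steps.
wavelengths = np.arange(325, 1076)
concentration = rng.uniform(0.1, 15.0, size=120)  # hypothetical % w/w values
reflectance = rng.normal(0.30, 0.01, size=(120, wavelengths.size))

# Make a single band (400 nm) respond to concentration so the ranking has a signal to find.
informative = int(np.where(wavelengths == 400)[0][0])
reflectance[:, informative] += 0.02 * concentration

# Fit a random forest and read the impurity-based importance of every wavelength.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(reflectance, concentration)

# Keep the fifteen highest-ranked wavelengths, most important first.
top15_idx = np.argsort(forest.feature_importances_)[::-1][:15]
top15_wavelengths = wavelengths[top15_idx]
print(top15_wavelengths)
```

Impurity-based importances sum to one across all features, so the scores can be read as relative contributions within a single fitted model rather than as absolute effect sizes.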

The regression models, R², RMSE and SD obtained via spectral processing in OriginPro 2021 and via machine learning were then compared.

2.4. Development of predictive models

Polynomial Regression Models

Spectral files were imported into ViewSpec Pro and an average spectrum was created from the five replicate readings. The summary was exported as an Excel file, imported into OriginPro 2021 (OriginLab, Northampton, MA, USA), and visually inspected using a scatter plot before further processing. A 2nd-order polynomial Savitzky-Golay filter was applied to the spectral data of LDPE, ABS and PET to remove background noise without changing the overall spectrum shape (Dai et al. 2015).
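As a hedged illustration of this pre-processing step, the sketch below averages five replicate spectra and applies a 2nd-order Savitzky-Golay filter with SciPy. The study performed these steps in ViewSpec Pro and OriginPro; the synthetic spectrum and the window length of 11 points are assumptions, since the window size is not stated in the text.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)

# Synthetic smooth spectrum plus noise, standing in for five replicate readings (assumption).
wavelengths = np.arange(325, 1076)
true_curve = 0.30 + 0.10 * np.sin((wavelengths - 325) / 750 * np.pi)
replicates = true_curve + rng.normal(0, 0.01, size=(5, wavelengths.size))

# Average the five replicates into one spectrum per concentration.
mean_spectrum = replicates.mean(axis=0)

# 2nd-order polynomial Savitzky-Golay smoothing; window_length=11 is an assumed value.
smoothed = savgol_filter(mean_spectrum, window_length=11, polyorder=2)

noise_before = np.std(mean_spectrum - true_curve)
noise_after = np.std(smoothed - true_curve)
print(f"residual noise before: {noise_before:.5f}, after: {noise_after:.5f}")
```

A longer window removes more noise but starts to flatten narrow spectral features, so the window length is worth checking against the raw scatter plot.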

Principal Component Analysis (PCA) models were created for all three plastics. The score and loading plots were used to find the most significant wavelengths, i.e. those with the greatest statistical difference among the samples, and to classify the plastics. The selected wavelengths for PET, LDPE, and ABS respectively are listed in Table 2. Using the significant wavelengths as predictors, three (3) polynomial regression models were constructed, one for each of the plastics.
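The sketch below shows one way to read significant wavelengths from PCA loadings, using Scikit-Learn in place of OriginPro. The synthetic spectra, the Gaussian response centred at 435 nm and the use of the covariance matrix (no standardization) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Synthetic spectra (assumption): a concentration-driven bump at 435 nm plus noise.
wavelengths = np.arange(325, 1076)
concentration = rng.uniform(0.1, 15.0, size=230)
response = 0.01 * np.exp(-((wavelengths - 435) ** 2) / (2 * 10.0**2))
noise = rng.normal(0, 0.002, size=(230, wavelengths.size))
spectra = 0.30 + concentration[:, None] * response[None, :] + noise

# Fit three principal components; PCA centres the data internally.
pca = PCA(n_components=3)
scores = pca.fit_transform(spectra)

# Cumulative share of variance explained by the first three components.
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# Wavelengths with the largest absolute loading on the first component.
pc1_loadings = pca.components_[0]
top5_idx = np.argsort(np.abs(pc1_loadings))[::-1][:5]
top5_wavelengths = np.sort(wavelengths[top5_idx])
print(cumulative_variance, top5_wavelengths)
```

The `scores` array holds the sample coordinates that a scores plot would display, while the rows of `pca.components_` correspond to the loadings inspected to pick wavelengths.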

The significant wavelengths of the plastics were analyzed using the Rank model plugin (OriginLab, Northampton, MA, USA) and regression plots developed. The regression model was plotted using the most significant wavelength’s reflectance values against MP concentration. R², SD and RMSE were generated in the software (Table 2). Additionally, the residual plots of each of the polynomial regression models were observed to check whether the errors were normally distributed.

Machine learning model

First, the feature importance function and Random Forest regressor algorithm from the Scikit-Learn library were used to select fifteen features (wavelengths) from the vis-NIR readings of the LDPE, PET and ABS data. The selected features and their importance scores are provided in Fig. 4a – 4c. The reflectance data from the highest-scored wavelength were split into 70% for training and 30% for testing. Next, a pipeline of regression algorithms with default hyperparameter settings from the Scikit-Learn library was created. The regression algorithms included in the pipeline are listed in Table S2. Training data from the microplastic samples were passed through the pipeline and the regression model with the lowest mean squared error (MSE), computed using cross-validation, was returned. From the regression model selection pipeline, the RF Regressor was selected for the LDPE data and KNN was selected for the PET and ABS data. Then, the training data for each MP sample were used to train the baseline model of the selected algorithm using default hyperparameter settings. Next, the n_estimators, max_depth and min_samples_split hyperparameters of the RF regressor for the LDPE samples were chosen for tuning. The leaf_size, n_neighbors and p settings of the KNN regressor were selected for tuning for the PET and ABS samples. The best hyperparameter combinations were determined using the GridSearchCV function in Scikit-Learn and the hyperparameter-tuned models were trained using the training dataset. The models developed were tested using the testing data, and the regression graphs of predicted vs actual values from the models were plotted (Fig. 3a, 3c, 3e). The performance of baseline vs tuned models was compared using the computed mean absolute error (MAE), MSE, RMSE and R² values. Learning curves were plotted to check that the models were not overfitted (Fig. S3a – S3c).
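The workflow in this paragraph can be sketched roughly as follows. The candidate list, the grid values and the synthetic single-wavelength data are assumptions; the algorithms actually compared are those in Table S2, and only the tuned hyperparameter names are taken from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)

# Synthetic data (assumption): reflectance at one wavelength as the only predictor.
y = rng.uniform(0.1, 15.0, size=230)  # concentration, % w/w
X = (0.30 + 0.004 * y + 0.0002 * y**2 + rng.normal(0, 0.002, size=y.size)).reshape(-1, 1)

# 70% training / 30% testing split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate regressors with default settings (illustrative subset).
candidates = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Pick the candidate with the lowest cross-validated MSE on the training data.
cv_mse = {
    name: -cross_val_score(model, X_train, y_train, cv=5,
                           scoring="neg_mean_squared_error").mean()
    for name, model in candidates.items()
}
best_name = min(cv_mse, key=cv_mse.get)

# Hyperparameter grids; the grid values are assumed, the parameter names follow the text.
param_grids = {
    "linear": {"fit_intercept": [True, False]},
    "knn": {"n_neighbors": [3, 5, 9], "p": [1, 2], "leaf_size": [15, 30]},
    "rf": {"n_estimators": [50, 100], "max_depth": [None, 5], "min_samples_split": [2, 5]},
}
search = GridSearchCV(candidates[best_name], param_grids[best_name], cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test data.
pred = search.predict(X_test)
mse = mean_squared_error(y_test, pred)
metrics = {
    "MAE": mean_absolute_error(y_test, pred),
    "MSE": mse,
    "RMSE": float(np.sqrt(mse)),
    "R2": r2_score(y_test, pred),
}
print(best_name, search.best_params_, metrics)
```

Fitting the same estimator once with default settings and once through the grid search gives the baseline-versus-tuned comparison described above.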

3.1. Microplastic reflectances and regression models

Averaged reflectances recorded using the ASD HandHeld 2 VNIR Spectroradiometer across all concentrations and all replicates of each microplastic-sediment sample are shown in Fig. 2 (including an average reflectance of treated beach sediment without any plastic; diamond). The reflectances of PET (triangle), ABS (square) and LDPE (circle) were similar in shape but separated by reflectance intensity, with PET recording the highest value and LDPE the lowest. Only the treated beach sediment overlapped with another curve, crossing LDPE around 570 nm and at 720-800 nm.

Applying the Akaike information criterion (AIC) and Bayesian information criterion (BIC) to the available data (using the Rank model plugin, OriginLab, Northampton, MA, USA), a 9th-order polynomial regression model was selected as the best curvilinear statistical model. The following equation describes the polynomial regression model:

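$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \varepsilon$$

where y is the response, x is the predictor, β_0 to β_n are the fitted coefficients, ε is the error term, and n is the degree of the polynomial.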
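A minimal sketch of order selection by AIC and BIC is given below. It uses the common least-squares forms AIC = N ln(RSS/N) + 2k and BIC = N ln(RSS/N) + k ln(N), which may differ by constants from the values reported by OriginPro; the synthetic cubic data and the helper name `information_criteria` are hypothetical.

```python
import numpy as np


def information_criteria(x, y, degree):
    """Return (AIC, BIC) for a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n_obs = y.size
    k = degree + 1  # number of fitted coefficients
    rss = float(np.sum(residuals**2))
    aic = n_obs * np.log(rss / n_obs) + 2 * k
    bic = n_obs * np.log(rss / n_obs) + k * np.log(n_obs)
    return aic, bic


rng = np.random.default_rng(4)

# Synthetic curvilinear data (assumption): 46 concentration levels, cubic trend plus noise.
x = np.linspace(0.1, 15.0, 46)
x_scaled = (x - x.mean()) / x.std()  # scaling keeps high-order fits well conditioned
y = 0.30 + 0.02 * x_scaled - 0.015 * x_scaled**3 + rng.normal(0, 0.002, x.size)

# Score polynomial degrees 1 to 9 and keep the one with the lowest BIC.
scores = {d: information_criteria(x_scaled, y, d) for d in range(1, 10)}
best_by_aic = min(scores, key=lambda d: scores[d][0])
best_by_bic = min(scores, key=lambda d: scores[d][1])
print(best_by_aic, best_by_bic)
```

Both criteria reward a lower residual sum of squares and penalize extra coefficients, with BIC applying the heavier penalty once the sample has more than about eight points.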

The polynomial regression plots in Fig. 3b, 3d, 3f for all the MPs followed a curvilinear trend. The highest reflectance for PET was at a concentration of 13%, for LDPE at 3% and for ABS at 15%. Table 2 summarises the polynomial regression models developed for the PET, LDPE and ABS samples. The residual plots of each polynomial regression model (Fig. S6a – S6c) were inspected to verify that the regression models were not overfitting the data and were appropriate for the dataset used. No patterns were observed in the residual plots for the polynomial regression models, indicating that the errors were normally distributed and independent (Trebuňa et al. 2016).

One-way ANOVA tests, conducted to assess the mean reflectance values at each concentration level, confirmed that the MPs were statistically different from each other (p < 0.05; Table S7a – S7c). RMSE is widely used to quantify differences between observed and predicted values of regression models, and the RMSE values for PET, LDPE and ABS (in the polynomial regression plots) were 0.014, 0.0093 and 0.0015, respectively (see Table 2).
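For readers who want to reproduce these two statistics, the sketch below runs a one-way ANOVA with SciPy and computes an RMSE by hand. The three groups of five readings and their means are invented for illustration; how the groups were formed in the study is not specified beyond the sentence above.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(5)

# Hypothetical reflectance readings: five replicates in each of three groups.
group_means = (0.30, 0.33, 0.36)
groups = [rng.normal(mean, 0.003, size=5) for mean in group_means]

# One-way ANOVA: tests whether the group means are all equal.
f_stat, p_value = f_oneway(*groups)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(f"F = {f_stat:.1f}, p = {p_value:.2e}, example RMSE = {example_rmse:.4f}")
```

RMSE carries the units of the predicted quantity, so values are only comparable between models that predict the same variable on the same scale.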

For the machine learning models, the tuned RF model for LDPE outperformed the baseline model. Meanwhile, there was no improvement in the error metrics of the tuned KNN model for PET. Lastly, the baseline KNN model for ABS outperformed its tuned model. Table 2 summarises the best regression models for LDPE (tuned RF model), PET (baseline KNN model) and ABS (baseline KNN model). Refer to Table S4 for the comparison between baseline vs. tuned models for all MP samples.

3.2. Selected features

Feature selection using RF Regressor and feature importance algorithms was used to rank important features. Please refer to Table S1 for the feature importance values of each wavelength.

3.3. Optimal wavelengths selection using PCA Loading

Table 1 summarises the PCA loadings results of the three MPs. The loadings plots (Fig. S4a – S4c) were used to find the optimum wavelengths. The PCA results showed that the first three PCs explained more than 70% of the total variance for all three MPs (Table S5). The first three principal components explained 82% of the variance in the dataset for LDPE, while for ABS and PET they explained 79% and 73% respectively. This indicates that the PCA model is a reasonable fit and that the large, complex multivariate data can be reduced to three principal components without losing much information (Jolliffe & Cadima, 2016). The PCA loadings plots were inspected to narrow down the most significant variables. Sagar et al. (2021) state that a large multivariate dataset contains many insignificant variables that are not needed for creating the forecasting model.

Table 1: Optimal wavelengths selected using PCA loadings.

Sample	Method	Cumulative variance explained by first 3 PCs (%)	Wavelengths (nm)
LDPE	PCA Loading	82	336, 398, 437, 684, 941
ABS	PCA Loading	79	340, 369, 435, 543, 686
PET	PCA Loading	73	358, 435, 543, 658, 683

In this study, PCA was also used for qualitative analysis. In previous studies, machine learning was used to quantify MPs but not to classify them through vis-NIR spectroscopy (Huang et al. 2021). PCA classified LDPE and PET successfully (see Fig. S3) in this study. On the scores plot, LDPE and PET fell in different quadrants, which was further supported by Fig. 2, where the LDPE and PET lines were clearly separated. PET and ABS, on the other hand, were not so clearly separated.

3.4. Predictive accuracy and optimum wavelengths of both methods

Table 2: RMSE, R², SD and selected optimum wavelengths generated from polynomial regression models and machine learning linear regression models.

	Polynomial regression models				Machine learning linear regression models
Sample	RMSE	R²	SD	Optimum wavelengths (nm)	RMSE	R²	SD	Optimum wavelengths (nm)
LDPE	0.0093	0.98	0.063	336, 398, 437, 684, 941	2.0	0.83	0.033	1072, 347, 776, 329, 769
PET	0.014	0.98	0.014	358, 435, 543, 658, 683	2.7	0.66	0.040	333, 373, 372, 332, 327
ABS	0.0015	0.98	0.0078	340, 369, 435, 543, 686	1.7	0.86	0.056	367, 373, 338, 374, 346

Table 2 summarizes the regression plots developed by the two methods. The RMSE and SD values for the polynomial regression plots were smaller compared to the machine learning models. Additionally, the R² values were higher for the polynomial regression plots. The commonly selected optimum wavelengths were around 300, 500 and 600 nm for both polynomial and machine learning regression models.

The optimum wavelengths highlighted using the ML feature importance technique typically fell within the range of 327 nm to 374 nm. This indicates that the important wavelengths for regression model prediction are mostly within the noisy range seen in the scatter plots (Fig. S1a – S1c). Hyperparameter tuning did not result in any significant increase in the performance metrics of the regression models, except for the RF regression model. This observation is contrary to other reports, which have shown that RF is an excellent ML algorithm even without hyperparameter tuning (Probst et al. 2019). In this study, the KNN baseline models for the PET and ABS samples gave good model performance, while the tuned RF model had slightly better performance metrics than its baseline model for the LDPE sample. Overall, hyperparameter tuning did not contribute significant improvements to any of the regression models. Progressive improvement in the learning curves for all models also indicates that increasing the amount of training data improves model performance (Raschka & Mirjalili 2017). The KNN model for the ABS sample had the smallest gap between the training and validation error curves, indicating low model variance and a low tendency to overfit. Furthermore, despite the relatively small training data set, the performance metrics indicate that the models were well trained, especially the RF and KNN models trained for LDPE and ABS predictions respectively.
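The gap between training and validation error discussed here can be computed with Scikit-Learn's `learning_curve` function, as in the following sketch. The synthetic data, the default KNN regressor and the five training-set fractions are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)

# Synthetic single-wavelength data (assumption).
y = rng.uniform(0.1, 15.0, size=230)  # concentration, % w/w
X = (0.30 + 0.005 * y + rng.normal(0, 0.003, size=y.size)).reshape(-1, 1)

# Cross-validated scores at five increasing training-set sizes.
sizes, train_scores, val_scores = learning_curve(
    KNeighborsRegressor(),
    X,
    y,
    cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="neg_root_mean_squared_error",
    shuffle=True,
    random_state=0,
)

# Scores are negated RMSE values, so flip the sign and average over the folds.
train_rmse = -train_scores.mean(axis=1)
val_rmse = -val_scores.mean(axis=1)
gap = val_rmse - train_rmse

for n, tr, va in zip(sizes, train_rmse, val_rmse):
    print(f"n_train={n:3d}  train RMSE={tr:.3f}  validation RMSE={va:.3f}")
```

A validation curve that keeps falling as the training set grows suggests more data would help, while a persistently large `gap` points to overfitting.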

4.1. Polynomial regression models

The significant wavelength vs. concentration scatterplots of all the MPs showed a curvilinear trend (Fig. S5a – S5c). Sagar et al. (2021) suggest that if a curvilinear trend is observed among the variables, a polynomial regression is advantageous for prediction. The authors also discuss a study in which a 6th-degree polynomial regression was used for predicting COVID-19 in India. The regression model in that study was chosen based on the highest R² and lowest RMSE values (Yadav 2020). In this study, based on Akaike and Bayesian Information Criterion statistics, a 9th-degree polynomial regression was used for predicting microplastic concentrations. AIC and BIC select the best model among candidate models by using probabilistic statistical measures to characterize a model’s predictive performance and its complexity (Fabozzi et al. 2014); these were considered along with the highest R² and lowest RMSE values. Table S6 summarizes the AIC and BIC scores for the three polynomial regression models. According to the data in this study, each plastic in the same beach sediment had its own curvilinear trend. Such unique spectral readings for each plastic have been observed in other studies as well (Shan et al. 2018, Corradini et al. 2019, Li et al. 2020). These unique spectral trends created by each plastic within the vis-NIR spectrum range (325 nm – 1075 nm) of the ASD HandHeld 2 VNIR Spectroradiometer helped in detecting the three MPs and characterizing two of the MPs (see Fig. 2 and Fig. S3).

RMSE, R² and SD values (Table 2) were more favorable for the polynomial regression plots compared to the machine learning regression plots. This could be because of the curvilinear trend of the plots which is a better fit for a polynomial graph.

Interestingly, the optimum wavelengths selected by both models were similar. For the LDPE-spiked beach sediment sample, both models had optimum wavelengths in the 325 nm – 400 nm and 900 nm – 1075 nm ranges. This suggests that the full vis-NIR spectrum is not required. Furthermore, within this vis-NIR range, PCA was able to differentiate LDPE and PET. Therefore, time-consuming FTIR procedures and advanced spectroradiometers are not required for the detection of MPs.

However, it was also observed that some of the significant wavelengths selected by both models fell in the noisy area of the spectrum, between 325 nm and 350 nm (Fig. 2). It is possible that the algorithms and PCA mistook the disturbance caused by the noisy data in the spectrum for the most significant variable (Renner et al. 2019).

To the best of our knowledge, no previous studies used polynomial regression to predict MPs in soil. This study generated regression models with good RMSE and R² values as seen in Table 2. Hence it is possible to use the developed polynomial regression plots for predicting MPs in beach sediment within the spectrum range of 325 nm to 1075 nm.

4.2. Machine learning models

Moroni et al. (2015) highlighted that LDPE and PET samples peak at wavelengths greater than 1100 nm, whereas the feature importance algorithm applied in this study highlighted different wavelength ranges as important for the machine learning algorithms. Generally, for PET and ABS, the important features are around the 300 nm range, while for LDPE they are around the 300 nm and 700 nm ranges. This indicates that although LDPE and PET samples peak and are better recognized at wavelengths greater than 1100 nm, these wavelengths are not necessarily important for ML model development.

To the best of our knowledge, there are no known studies using ML-based techniques for MP detection and quantification in soil using vis-NIR data. The most closely related study, by Corradini et al. (2019), reported the application of multilinear regression by regressing the known MP concentrations against absorbance readings at 350 nm – 2500 nm for LDPE and PET samples. In Corradini et al. (2019), a Bayesian approach was applied to determine the most probable linear regression model. The R² values reported in that study were 0.95 and 0.87, in comparison to 0.83 and 0.66 obtained in this study for LDPE and PET respectively. Although the R² values from Corradini et al. (2019) indicate better regression models, the detection limit was only 10 g kg⁻¹ (1% w/w). Meanwhile, our study covers higher MP concentrations of up to 15% w/w, particularly for the LDPE and ABS samples, where the R² values are the highest (R² > 0.80). Considering that MP contamination in soil samples is typically beyond the 1% w/w detection limit (Ng, Minasny & McBratney 2020), there is potential for using vis-NIR and the ML linear regression technique for the detection of higher concentrations of MPs in soil sediment.

For further study, varying concentrations can be used to measure the predictability of the model. Our results suggest that in the future a more accurate polynomial regression model can be built to predict the concentrations of MPs in the beach sediments. Furthermore, differently colored MPs and polymer types can be used to create the regression model as plastics products in our environment have a wide range of color and material.

Our study explores two novel statistical methods in vis-NIR spectroscopy of soil MPs. First, the reflectances of three different virgin microplastics were measured in treated beach sediment, thus standardizing the soil sample. Second, polynomial regression models were developed and compared with machine learning algorithms for predicting the MPs in the soil sample.

The results show that the best linear regression models developed for LDPE, PET and ABS using machine learning algorithms resulted in R² values of 0.83, 0.66 and 0.86 with RMSE values of 2.0, 2.7 and 1.7 respectively. The best models were the baseline models, except for LDPE, where hyperparameter tuning resulted in slightly higher accuracy metrics in comparison to the baseline model. The learning curves also indicated that the models' accuracy increased with the amount of training data, suggesting that the ML models can be further improved with the addition of more training data. Previous studies on MP detection have shown low detection limits. While the detection limit was not quantified in this study, the relatively high accuracy metrics obtained for samples up to 15% w/w concentration of MPs indicate the potential of using this technique to detect MPs with a higher detection limit. On the other hand, the polynomial regression technique also displayed several advantages. The R² values for LDPE, PET and ABS were 0.98 for all three MPs, with RMSE values of 0.0093, 0.014 and 0.0015 respectively. The RMSE values were significantly lower compared to the ML regression models. Additionally, PCA in regression modelling was able to distinguish between LDPE and PET successfully in the scores plot, a form of qualitative analysis that has not yet been done via ML. With both methods producing similar optimum wavelengths for each MP in the applied vis-NIR spectrum range, advanced spectroradiometers and FTIR are not required for the detection of MPs in soil.

Acknowledgement

The authors would like to thank Dr. Xavier Chee of Swinburne University of Technology Sarawak Campus, for his help with collecting the soil samples.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Author Contributions

Faisal Raiyan Huda: Conceptualization, Investigation, Formal analysis, Writing – Original Draft. Florina Stephanie Richard: Software, Formal analysis, Writing – Review & Editing. Ishraq Rahman: Formal analysis, Writing – Review & Editing. Clarence Tay Yuen Huaa: Writing – Review & Editing. Christabel Anfield Sim Wanwena: Writing – Review & Editing. Moritz Müller: Supervision, Methodology, Writing – Review & Editing. All authors read and approved the final manuscript.

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

An L, Liu Q, Deng Y, Wu W, Gao Y, Ling W (2020) Sources of microplastic in the environment. In: He D, Luo Y (eds) Microplastics in Terrestrial Environments. The Handbook of Environmental Chemistry, vol 95. Springer, Cham pp 143–159
Anbumani S, Kakkar P (2018) Ecotoxicological effects of microplastics on biota: a review. Environ Sci Pollut Res Int. 25(15):14373–14396. https://doi.org/10.1007/s11356-018-1999-x
Andrade C (2019) The P Value and Statistical Significance: Misunderstandings, Explanations, Challenges, and Alternatives. Indian J Psychol Med 41(3):210-215. https://doi.org/10.4103/IJPSYM.IJPSYM_193_19
Bläsing M, Amelung W (2018) Plastics in soil: Analytical methods and possible sources. Sci Total Environ 612:422–435. https://doi.org/10.1016/j.scitotenv.2017.08.086
Bouwmeester H, Hollman PC, Peters RJ (2015) Potential Health Impact of Environmentally Released Micro- and Nanoplastics in the Human Food Production Chain: Experiences from Nanotoxicology. Environ. Sci. Technol 49:8932–8947. https://doi.org/10.1021/acs.est.5b01090
Campanale C, Massarelli C, Savino I, Locaputo V, Uricchio VF (2020) A detailed review study on potential effects of microplastics and additives of concern on human health. Int J Environ Res Public Health 17(4):1212. https://doi.org/10.3390/ijerph17041212
Chaczko Z, Wajs-Chaczko P, Tien D, Haidar Y (2019) Detection of Microplastics Using Machine Learning. Proc - Int Conf Mach Learn Cybern 2019-July: https://doi.org/10.1109/ICMLC48188.2019.8949221
Chae Y, An YJ (2018) Current research trends on plastic pollution and ecological impacts on the soil ecosystem: A review. Environ Pollut. 240:387–395. https://doi.org/10.1016/j.envpol.2018.05.008
Chen Y, Wen D, Pei J, Fei Y, Ouyang D, Zhang H, Lou Y (2020) Identification and quantification of microplastics using Fourier-transform infrared spectroscopy: Current status and future prospects. Curr Opin Environ Sci Heal 18:14–19. https://doi.org/10.1016/J.COESH.2020.05.004
Corradini F, Bartholomeus H, Huerta Lwanga E, Gertsen H, Geissen V (2019) Predicting soil microplastic concentration using vis-NIR spectroscopy. Sci Total Environ 650:922–932. https://doi.org/10.1016/J.SCITOTENV.2018.09.101
Dai Q, Cheng JH, Sun DW, Zeng XA (2015) Advances in feature selection methods for hyperspectral image processing in food industry applications: a review. Crit Rev Food Sci Nutr. 55(10):1368–1382. https://doi.org/10.1080/10408398.2013.871692
de Souza Machado AA, Kloas W, Zarfl C, Hempel S, Rillig MC (2018) Microplastics as an emerging threat to terrestrial ecosystems. Glob Change Biol 24: 1405-1416. https://doi.org/10.1111/gcb.14020
Duis K, Coors A (2016) Microplastics in the aquatic and terrestrial environment: sources (with a specific focus on personal care products), fate and effects. Environ Sci Eur. https://doi.org/10.1186/s12302-015-0069-y
Fabozzi FJ, Focardi SM, Rachev ST, Arshanapalli BG (2014) Appendix E: Model Selection Criterion: AIC and BIC. In The Basics of Financial Econometrics. https://doi.org/10.1002/9781118856406.app5
Gionfra S (2018) Plastic pollution in soil. Isqaper-is.eu. https://www.isqaper-is.eu/key-messages/briefing-papers/125-plastic-pollution-in-soil.
Harshitha NK, Varghese LS, Harshitha MR, Jinsha VK 2020, ‘Microplastic detection in water using image processing’, International Journal of Applied Engineering Research 15:85-88. https://www.ripublication.com/ijaerspl20/ijaerv15n1spl_12.pdf
He D, Luo Y, Lu S, Liu M, Song Y, Lei L (2018) Microplastics in soils: Analytical methods, pollution characteristics and ecological risks. TrAC Trends in Analytical Chemistry. 109: 163-172. https://doi.org/10.1016/j.trac.2018.10.006
Horton AA, Walton A, Spurgeon DJ, Lahive E, Svendsen C (2017) Microplastics in freshwater and terrestrial environments: Evaluating the current understanding to identify the knowledge gaps and future research priorities. Sci Total Environ. 586:127–141. https://doi.org/10.1016/j.scitotenv.2017.01.190
Huang J, Chen H, Zheng Y, Yang Y, Zhang, Y, Gao B (2021) Microplastic pollution in soils and groundwater: Characteristics, analytical methods and impacts. Chemical Engineering Journal 425: 131870. https://doi.org/10.1016/j.cej.2021.131870
Huang, L, Jia, L, Yu, B, Chun, BD, Maniatis, P & Naik, M (2010) Predicting execution time of computer programs using sparse polynomial regression. 23rd International Conference on Neural Information Processing Systems 1:883–891. https://dl.acm.org/doi/10.5555/2997189.2997288
Jang FHA (2020) Interaction of microplastics with trace metals and bacteria; and potential impacts on fish. PhD thesis, Swinburne University of Technology Sarawak, Malaysia.
Jolliffe IT, Cadima J (2016) Principal component analysis: a review and recent developments. Philos Trans A Math Phys Eng Sci. 374(2065):20150202. https://doi.org/10.1098/rsta.2015.0202
Li Y, Yao J, Nie P, Feng X, Liu J (2020) An effective method for the rapid detection of microplastics in soil. Chemosphere. 276: 128696. https://doi.org/10.1016/j.chemosphere.2020.128696
Löder MGJ, Gerdts G (2015) Methodology used for the detection and identification of microplastics—A critical appraisal. In: Bergmann M, Gutow L, Klages M, editors. Marine Anthropogenic Litter. Cham: Springer International Publishing pp. 201–227 (https://link.springer.com/chapter/10.1007/978-3-319-16510-3_8)
Manley M (2014) Near-infrared spectroscopy and hyperspectral imaging: non-destructive analysis of biological materials. Chem Soc Rev 43:8200–8214. https://doi.org/10.1039/C4CS00062E
Massarelli C, Campanale C, Uricchio VF (2021) A Handy Open-Source Application Based on Computer Vision and Machine Learning Algorithms to Count and Classify Microplastics. Water 2021, Vol 13, Page 2104 13:2104. https://doi.org/10.3390/W13152104
Matthew E, & Adeyinka O (2020) Application of Hierarchical Polynomial Regression Models to Predict Transmission of COVID-19 at Global Level. International Journal of Clinical Biostatistics and Biometrics, 6(1). https://doi.org/10.23937/2469-5831/1510027
Moroni M, Mei A, Leonardi A, Lupo E, Marca FL (2015) PET and PVC Separation with Hyperspectral Imagery. Sensors. 2015; 15(1):2205-2227. https://doi.org/10.3390/s150102205
Ng W, Minasny B, McBratney A (2020) Convolutional neural network for soil microplastic contamination screening using infrared spectroscopy. Sci Total Environ 702:134723. https://doi.org/10.1016/J.SCITOTENV.2019.134723
Nizzetto L, Langaas S, Futter M (2016) Pollution: Do microplastics spill on to farm soils?. Nature 537(7621):488. https://doi.org/10.1038/537488b
OriginLab Corporation, 2021, Origin(Pro), software, Version 2021b, Northampton, MA, USA.
Ostertagová E (2012) Modelling using polynomial regression. Procedia Eng. 48:500–506. https://doi.org/10.1016/j.proeng.2012.09.545
Pedregosa F, Varoquaux G, Gramfort A et al. (2011) Scikit-learn: Machine Learning in Python. J Mach Learn Res. 12(null):2825–2830. https://dl.acm.org/doi/10.5555/1953048.2078195
Probst P, Boulesteix AL, Bischl B (2019) Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research. http://jmlr.org/papers/v18/18-444.html.
Rahman I, Mujahid A, Palombo E, Müller M (2021) A functional gene-array analysis of microbial communities settling on microplastics in a peat-draining environment. Mar Pollut Bull 166: 112226. https://doi.org/10.1016/J.MARPOLBUL.2021.112226
Raschka S, Mirjalili (2017), Python Machine Learning - Second Edition: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow, Packt Publishing, Limited, Second Edi.
Renner G, Nellessen A, Schwiers A, Wenzel M, Schmidt TC, Schram J (2019) Data preprocessing & evaluation used in the microplastics identification process: A critical review & practical guide. TrAC Trends in Analytical Chemistry, 111: 229-238. https://doi.org/10.1016/j.trac.2018.12.004
Rillig MC, Ziersch L, Hempel S (2017) Microplastic transport in soil by earthworms. Sci Rep 7:1362. https://doi.org/10.1038/s41598-017-01594-7
Saarela M, Jauhiainen S (2021) Comparison of feature importance measures as explanations for classification models. SN Appl Sci 2021 32 3:1–12. https://doi.org/10.1007/S42452-021-04148-9
Sagar P, Gupta P, Kashyap I (2021) A forecasting method with efficient selection of variables in multivariate data sets. Int J Inf Tecnol 13:1039–1046. https://doi.org/10.1007/s41870-021-00619-9
Shahul Hamid F, Bhatti MS, Anuar Norkhairiyah, Anuar Norkhairah, Mohan P, Periathamby A (2018) Worldwide distribution and abundance of microplastic: How dire is the situation? Waste Manag Res. 36(10):873–897. https://doi.org/10.1177%2F0734242X18785730
Shan J, Zhao J, Liu L, Zhang Y, Wang X, Wu F (2018) A novel way to rapidly monitor microplastics in soil by hyperspectral imaging technology and chemometrics. Environmental Pollution 238: 121 – 129. https://doi.org/10.1016/j.envpol.2018.03.026
Ter Halle A, Jeanneau L, Martignac M, Jardé E, Pedrono B, Brach L, Gigault J (2017) Nanoplastic in the North Atlantic Subtropical Gyre. Environmental Science & Technology 51(23): 13689-13697. https://doi.org/10.1021/acs.est.7b03667
Thompson RC, Swan SH, Moore CJ, Saal FS vom (2009) Our plastic age. Philos Trans R Soc B Biol Sci 364:1973–1976. https://doi.org/10.1098/RSTB.2009.0054
Trebuňa F, Ostertagová E, Frankovský P, Ostertag O (2016) Application of Polynomial Regression Models in Prediction of Residual Stresses of a Transversal Beam. American Journal of Mechanical Engineering 4:247-251. http://pubs.sciepub.com/ajme/4/7/3
Wang W, Ge J, Yu, X, Li H (2020) Environmental fate and impacts of microplastics in soil ecosystems: Progress and perspective. Science of The Total Environment 708: 134841. https://doi.org/10.1016/j.scitotenv.2019.134841
Yadav RS (2020) Data analysis of COVID-2019 epidemic using machine learning methods: a case study of India. Int. j. inf. tecnol 12:1321–1330. https://doi.org/10.1007/s41870-020-00484-y
Yang L, Zhang Y, Kang S, Wang, Z, Wu C (2021) Microplastics in soil: A review on methods, occurrence, sources, and potential risk. Science of The Total Environment 780: 146546. https://doi.org/10.1016/j.scitotenv.2021.146546

Appendix.docx
