The root concentration factor (RCF) is an important substance-specific characterization parameter for plant uptake of organic contaminants from soils in life cycle impact assessment (LCIA). However, assembling a reliable dataset and building robust predictive models remain challenging because of the complexity of chemical-soil-plant root interactions. Here we developed end-to-end machine learning models to disentangle this complexity by training on a unified dataset of 341 data points covering 72 chemicals. The gradient boosting regression tree (GBRT) model based on extended connectivity fingerprints (ECFP) gave the best prediction performance, with an R-squared of 0.77 and a mean absolute error (MAE) of 0.22. In addition, partial dependence analysis was used to characterize the nonlinear relationships in the chemical-soil-plant root system, and feature importance analysis revealed the relationship between the RCF and chemical topological structures. Owing to its simplicity and generality, the GBRT-ECFP model is a promising tool for LCIA to better characterize the human and ecological impacts of chemicals in the environment.
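The workflow summarized in the abstract (fit a GBRT on fingerprint bits plus soil properties, score it with R-squared and MAE, then inspect feature importances and partial dependence) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic random bits standing in for ECFP vectors, the single soil descriptor `soil_organic_matter`, the 64-bit fingerprint length, the target formula, and all hyperparameters are hypothetical, and the resulting scores have no relation to the reported 0.77 and 0.22.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset). Random bits play the
# role of ECFP fingerprints; one made-up soil property is appended.
n_samples, n_bits = 341, 64
fingerprints = rng.integers(0, 2, size=(n_samples, n_bits))
soil_organic_matter = rng.uniform(0.5, 10.0, size=(n_samples, 1))  # hypothetical
X = np.hstack([fingerprints, soil_organic_matter]).astype(float)

# Hypothetical target: depends on two bits and, nonlinearly, on the soil term.
y = (
    0.8 * fingerprints[:, 0]
    - 0.5 * fingerprints[:, 1]
    - 0.3 * np.log(soil_organic_matter[:, 0])
    + rng.normal(0.0, 0.1, size=n_samples)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Gradient boosting regression trees; hyperparameters are illustrative only.
model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
)
model.fit(X_train, y_train)

# Held-out performance, using the same two metrics the abstract reports.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

# Impurity-based feature importance: indices of the five most influential columns.
top_features = np.argsort(model.feature_importances_)[::-1][:5]

# Partial dependence of the prediction on the soil property (last column).
pd_result = partial_dependence(
    model, X_train, features=[n_bits], kind="average", grid_resolution=20
)
pd_curve = pd_result["average"][0]

print(f"R2 = {r2:.2f}, MAE = {mae:.2f}")
print("Top feature indices:", top_features)
```

With real data, the random bits would be replaced by fingerprints computed from each chemical's structure, and the soil and plant descriptors by measured values; the important bits could then be mapped back to substructures to interpret the model.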

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5
The authors declare no competing interests.
Supplementary files associated with this preprint:
SI Table S2 heatmap spreadsheets
Supplementary Information
Supplementary Information - updated 05/03
Posted 08 Mar, 2021