The objective of this paper is to explain how we used soft sensing based on machine learning models to estimate selected water quality parameters in lined ponds at an indoor commercial shrimp (Litopenaeus vannamei) farm in Vietnam. Specific water quality parameters provide valuable insight into shrimp pond conditions, which is critical for management decision making. Some parameters are easy to measure with relatively inexpensive hand-held sensors submerged in the water and require minimal experience. Others are far more expensive to measure because they require experienced labour, time-consuming processes such as laboratory analysis of pond water samples, and ongoing materials costs.
Soft sensing refers to estimating a variable from other, directly measured variables. Here, it means estimating variables that are difficult or time-consuming to measure (ammonia, settling solids and total suspended solids) from variables that are easy and quick to measure, together with pond input data. The aim is to reduce the time, cost, and experienced labour needed to monitor key pond water quality parameters. This study summarises the machine learning models we adopted and the accuracies we achieved when using soft sensing to estimate key water quality parameters in commercial, super-intensive indoor shrimp farming.
We investigated several machine learning models for estimating these target variables, including neural networks, long short-term memory (LSTM) networks, recurrent neural networks and convolutional neural networks. These deep learning models did not produce good estimates, most likely because such algorithms need large volumes of data for effective training and the current data set is very small. Support Vector Regression (SVR) is well suited to modelling small data sets; however, SVR models sometimes generated negative values, which makes them unsuitable for estimating water quality parameters, whose concentrations cannot be negative. We therefore adopted an ensemble tree-based approach (Random Forest), which produced predictions that were both accurate and non-negative, making it suitable for these data sets.
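Why a tree ensemble cannot return a negative concentration can be illustrated with a small sketch. A regression tree stores the mean of its training targets in each leaf, and a forest averages those leaf values, so every prediction lies within the range of the observed targets: if the measured concentrations are all non-negative, so is every prediction, even for inputs far outside the training data. An SVR prediction, by contrast, is a weighted sum of kernel terms plus a bias, and nothing constrains its sign. The code below is a simplified, from-scratch random-forest-style ensemble (bootstrap samples, a random feature subset at each split, and deliberately shallow trees to keep it short), written only to demonstrate this bound. It is not the study's implementation; the two input features and the synthetic target are hypothetical stand-ins for the easy-to-measure variables and a hard-to-measure parameter such as ammonia.

```python
import random
from statistics import mean


def fit_tree(X, y, depth, n_features, rng):
    """Grow a shallow regression tree; each leaf stores the mean of its training targets."""
    if depth == 0 or len(y) < 2 or len(set(y)) == 1:
        return mean(y)
    best = None  # (sum of squared errors, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_features):  # random feature subset per split
        for t in sorted({row[f] for row in X})[:-1]:  # skip the max so both sides are non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:  # every sampled feature is constant in this node
        return mean(y)
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    left = fit_tree([r for r, g in zip(X, go_left) if g],
                    [v for v, g in zip(y, go_left) if g], depth - 1, n_features, rng)
    right = fit_tree([r for r, g in zip(X, go_left) if not g],
                     [v for v, g in zip(y, go_left) if not g], depth - 1, n_features, rng)
    return (f, t, left, right)


def fit_forest(X, y, n_trees=50, depth=4, seed=0):
    rng = random.Random(seed)
    n_features = max(1, len(X[0]) // 3)  # size of the random feature subset tried at each split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample with replacement
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_features, rng))
    return forest


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node  # a leaf: mean of training targets


def predict_forest(forest, x):
    return mean(predict_tree(tree, x) for tree in forest)  # average of leaf means


# Hypothetical synthetic data: two easy-to-measure inputs and a non-negative target.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(40)]
y = [max(0.0, 0.3 * a - 0.2 * b + data_rng.gauss(0, 0.2)) for a, b in X]

forest = fit_forest(X, y)
for query in ([5.0, 5.0], [-50.0, 100.0]):  # the second query is far outside the training range
    print(query, round(predict_forest(forest, query), 3))
```

In practice a library implementation would normally be used instead; for squared-error regression trees the leaf values are means of training targets, so the same bound applies. The flip side of this design choice is that a tree ensemble cannot extrapolate beyond the range of concentrations seen during training.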
We used multiple validation processes to assess the effectiveness of the machine learning models. In a leave-one-pond-out cross-validation approach, we held out one pond for testing and trained the model on the remaining ponds in the same trial. Because these validations were performed within a single trial, we call them ‘within trial’. In a second approach, we trained models on ponds from one or more trials and tested them on ponds from a separate trial (called ‘cross trial’).
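The two validation schemes differ only in how the samples are partitioned. The sketch below assumes each sample is a record tagged with hypothetical `trial` and `pond` fields; the function names and the toy records are illustrative and are not taken from the study. Holding out every record of a pond, rather than random rows, keeps samples from the held-out pond out of training, which is what makes the score an estimate of performance on an unseen pond.

```python
def leave_one_pond_out(records, trial):
    """'Within trial': yield (held_out_pond, train, test) for each pond in one trial."""
    in_trial = [r for r in records if r["trial"] == trial]
    for pond in sorted({r["pond"] for r in in_trial}):
        train = [r for r in in_trial if r["pond"] != pond]  # remaining ponds of the same trial
        test = [r for r in in_trial if r["pond"] == pond]   # every sample from the held-out pond
        yield pond, train, test


def cross_trial_split(records, train_trials, test_trial):
    """'Cross trial': train on ponds from one or more trials, test on a separate trial."""
    if test_trial in train_trials:
        raise ValueError("the test trial must not be among the training trials")
    train = [r for r in records if r["trial"] in train_trials]
    test = [r for r in records if r["trial"] == test_trial]
    return train, test


# Hypothetical toy records: three trials, three ponds each, three sampling days per pond.
records = [
    {"trial": t, "pond": p, "day": d}
    for t in ("T1", "T2", "T3")
    for p in ("A", "B", "C")
    for d in range(3)
]

for pond, train, test in leave_one_pond_out(records, "T1"):
    print("within trial T1, held-out pond", pond, "train", len(train), "test", len(test))

train, test = cross_trial_split(records, {"T1", "T2"}, "T3")
print("cross trial: train", len(train), "test", len(test))
```

The model fitting step is deliberately left out; each `train`/`test` pair would be passed to whichever regressor is being evaluated.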
Ammonia estimates from the machine learning models were more accurate under ‘within trial’ validation than under ‘cross trial’ validation. The variability of ammonia among ponds in the initial trials led to relatively poor ‘cross trial’ performance. However, ‘cross trial’ validations at later stages of the study provided the highest accuracy. This indicates that, as pond management protocols are applied more consistently, highly accurate ammonia estimation becomes increasingly achievable.
For total suspended solids, the estimates were reliable enough for pond managers to make informed decisions about total suspended solids concentrations in the pond. On some occasions total suspended solids was underestimated, but in those cases the estimates realigned with the measured values within the next few samples. Because turbidity is measured more frequently than total suspended solids, estimating total suspended solids from turbidity might provide a more realistic indicator of day-to-day changes in pond conditions than the less frequent direct measurements. Estimates of settling solids were far less accurate than those of total suspended solids, and further investigation is needed on this front.