According to the final map, "poor," "very poor," and "good" are the three major GWQI classes in the study area. This ranking designates a different groundwater usage for each class. The dominant groundwater quality class is "poor," which corresponds to a GWQI of 51 to 75 (Table 2). Poor-quality groundwater is concentrated mainly in the eastern parts of the Mazandaran plain, southeast of the Caspian Sea (Fig. 8). The poor GWQI class is associated with areas of dense industrial and residential development, high evaporation, low precipitation, and varying transmissivity values (Fig. 2). "Good" is the second most dominant class of groundwater quality in the study area. This class is mostly concentrated in the southwest of the study area and is associated with high aquifer transmissivity, high precipitation, low evaporation, and fewer industrial sectors. Finally, the northwest part of the study area has the poorest groundwater quality and is classified as "very poor" (Fig. 8). This area is heavily occupied by the industrial sector and is associated with high population, high evaporation, and high aquifer transmissivity (Fig. 2). The "excellent" GWQI class is absent from the study area except in very small localities.
According to WHO water quality standards (WHO 2004), each class of GWQI is suitable for a particular usage. Accordingly, excellent GWQI (0–25) is ideal for drinking without further treatment, and good GWQI (26–50) is permissible for drinking water as well as industrial purposes. Poor GWQI (51–75) is permissible for irrigation, while the very poor class (> 75) is not suitable for human consumption or irrigation.
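The class boundaries and usages listed above can be expressed as a small lookup. The following is a minimal sketch and not part of the original study: the name `classify_gwqi` is hypothetical, and treating each upper bound as inclusive (so that a fractional value such as 25.4 falls into the "good" class) is an assumption, since the text gives only integer ranges.

```python
# Hypothetical helper illustrating the GWQI classes described in the text.
# Assumption: the upper bounds (25, 50, 75) are inclusive for fractional scores.

GWQI_CLASSES = [
    (25, "excellent", "drinking without further treatment"),
    (50, "good", "drinking and industrial use"),
    (75, "poor", "irrigation"),
]


def classify_gwqi(score):
    """Return (class_name, permissible_use) for a GWQI score."""
    if score < 0:
        raise ValueError("GWQI must be non-negative")
    # Walk the classes from best to worst and stop at the first matching bound.
    for upper, name, use in GWQI_CLASSES:
        if score <= upper:
            return name, use
    # Anything above 75 falls into the last class.
    return "very poor", "not suitable for human consumption or irrigation"
```

For instance, `classify_gwqi(62)` falls in the "poor" class, which the text describes as the dominant class in the study area.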
One of the main factors determining groundwater quality is the geological units containing the groundwater (Umar et al. 2013). As reported in previous studies, geochemical interactions between water and rocks can affect groundwater quality by introducing heavy metals (Khatri and Tyagi 2015; Kubier et al. 2019). In our study area, however, the geological units consist of alluvial sediments of the Quaternary period with minimal variation. Therefore, geological units were removed from the set of predictors because of their invariability.
The aquifer characteristics affect groundwater quality in various ways. Shallow unconfined aquifers are more exposed to pollutants than deep aquifers. The samples in this study were taken from the unconfined aquifer, and the depth of the monitoring wells varies between 20 and 40 m. In the study area, deep wells (> 100 m depth), which are less affected by pollutants, are used for residential drinking purposes.
The six ML algorithms used in this study showed strong performance in GWQI modeling. However, significant differences were observed during the accuracy assessment (Table 4; Fig. 5). The three models that showed very high performance in GWQI modeling are RF, SVM, and XGB. These three models have proven to be robust algorithms in environmental modeling (Sahour et al. 2021a, b, c). RF slightly outperformed the other models, a result also reported in previous studies. For example, RF and SVM were used for water quality index modeling in Algeria, and the results revealed an overall outperformance of the RF model (Sakaa et al. 2022). Another study compared SVM, RF, and ANN models for developing a chemical sensor and concluded that the RF algorithm performed better. However, this is not a general rule: the performance of ML algorithms varies with the type, distribution, and size of the datasets (Gayathri et al. 2022; Najwa Mohd Rizal et al. 2022; Waziry et al. 2022; Wang and Zeng 2022).
The models' performance was highest in classifying the excellent and very poor classes according to the ROC curves (Fig. 4). This shows that the predictors can better explain extreme (very high and very low quality) classes compared to the other two classes.
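A per-class ROC comparison of this kind treats each GWQI class in turn as the positive class and all other classes as negative (one-vs-rest). As a minimal sketch of how the area under such a curve can be obtained, the function below uses the fact that AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name and the toy labels and scores are hypothetical; they are not the study's data, and the study's own computation may differ.

```python
def auc_one_vs_rest(y_true, scores, positive_class):
    """AUC for one class against all others via pairwise comparison.

    y_true: list of class labels, one per sample.
    scores: predicted probability (or score) of `positive_class` per sample.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive_class]
    neg = [s for y, s in zip(y_true, scores) if y != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairs where the positive sample outranks the negative one;
    # ties contribute half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): scores are the predicted probability
# of the "very poor" class for five samples.
labels = ["very poor", "poor", "good", "very poor", "excellent"]
p_very_poor = [0.90, 0.40, 0.10, 0.35, 0.05]
auc_very_poor = auc_one_vs_rest(labels, p_very_poor, "very poor")
```

A class whose samples are consistently ranked above the rest yields an AUC near 1, which is the pattern the text reports for the two extreme classes.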
Data structure and attributes are important factors in choosing a specific ML algorithm. For example, ANN provided satisfactory results in this study and has proven to be a strong tool for environmental modeling (Alshehri et al. 2020; Gholami et al. 2021a, b; Gholami and Sahour 2022). However, it performed poorly compared with the other models during the accuracy assessment. ANN models are designed to work better on large datasets and may not perform well on small datasets (Mao et al. 2006).
The adopted methodology uses already available datasets to predict groundwater quality. The method is cost-effective and can be applied to other regions with similar geological and climatic settings. However, the generalization of trained algorithms is one of the major limitations of ML-based methods for water quality modeling. For example, a model trained in one location may not be suitable for prediction in other regions. Model performance depends on the area's characteristics and, in most cases, cannot be generalized to other locations, especially those with different geological and climatic settings (Garza-Pérez et al. 2004).
The source of pollution can be site-specific and varies from region to region. Therefore, it is imperative to further analyze the controlling factors of groundwater quality in order to prioritize treatment practices. In this study, we used two variable importance analyses that consider the interactions among the variables and rank them by importance. The most important factor for groundwater quality in the study area was found to be the distance from manufacturers and industrial sectors. This is especially true for local groundwater quality, as very poor quality was found around manufacturers and industrial towns. As further analysis using the partial dependence plot showed (Fig. 6), groundwater quality improves with increasing distance from industrial locations. This is a crucial finding, since groundwater quality treatment efforts can focus on the industrial sector and adopt strategies to reduce pollutants and industrial discharges to the aquifer system. In agricultural lands, the consumption of nitrogen fertilizers is an influential factor affecting groundwater quality. The predominant type of agriculture in the area is rice farming, which relies on fertilizers that introduce nitrates to unconfined aquifer systems and degrade groundwater quality. Population density was identified as the second most important factor affecting groundwater quality. This parameter affects groundwater quality in different ways. Population density is directly related to the introduction of pollutants to the aquifer systems and to groundwater extraction. Additionally, various small businesses within the local communities, including small farms, small manufacturers, and local livestock and poultry farms, introduce pollutants to the groundwater system.
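The partial dependence analysis mentioned above can be illustrated with a short sketch. For each candidate value of one predictor, that predictor is overwritten in every sample, the model is evaluated, and the predictions are averaged. Everything named here is hypothetical: `partial_dependence`, the linear `toy_model`, and the three sample rows are stand-ins for illustration only and do not reproduce the study's trained models or data.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over `rows` with one feature forced to each grid value."""
    if not rows:
        raise ValueError("rows must not be empty")
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # copy so the input is not mutated
            modified[feature_index] = value  # force the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Toy stand-in model (NOT the study's model): predicted GWQI drops, i.e.
# quality improves, as distance from industry grows.
def toy_model(row):
    distance_km, population_density = row
    return 90.0 - 5.0 * distance_km + 0.01 * population_density


# Made-up samples: [distance from industry (km), population density]
rows = [[1.0, 500.0], [3.0, 1500.0], [6.0, 1000.0]]
pd_curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 2.0, 4.0])
```

In this toy setting the curve decreases along the grid, meaning a lower predicted GWQI (better quality) farther from industrial locations, which mirrors the direction of the relationship described for Fig. 6.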
In this study, GWQI was used as an indicator to evaluate the quality of the groundwater samples. As explained earlier, this indicator assigns weights to the various chemical and physical parameters of water to measure its quality. One limitation of this method is that it ignores other harmful substances that may exist in groundwater samples. For example, radioactive elements and bacterial pollution are not included in GWQI and, therefore, were not considered in this study.