Evaluation and prediction of irrigation water quality of an agricultural district, SE Nigeria: an integrated heuristic GIS-based and machine learning approach

Poor irrigation water quality can mar agricultural productivity. Appraising irrigation water quality requires the computation of various conventional quality parameters, which is often time-consuming and associated with errors during sub-index computation. It therefore becomes critical to have a visual assessment of the irrigation water quality and to identify the most influential water quality parameters for accurate prediction, management, and sustainability of irrigation water quality. The overlay weighted sum technique was used to generate the irrigation water quality (IWQ) map of the area. The map revealed that 72.5% of the area (within the southeastern parts) was suitable for irrigation while 28.4% (found in isolated traces) was unsuitable. Multilayer perceptron artificial neural networks (MLP-ANNs) and multiple linear regression (MLR) models were integrated and validated to predict the IWQ parameters using Cl⁻, HCO₃⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, pH, EC, TH and TDS as input variables, and PI, MAR, SAR, KR, SSP, and PS as output variables. The two models showed high performance accuracy based on the results of the coefficient of determination (R² = 0.513-0.983). Low modeling errors were observed from the results of the sum of square errors (SOSE), relative errors (RE), adjusted R-square (R²adj), and residual plots, further confirming the efficacy of the two models, although the MLP-ANNs showed higher prediction accuracy with respect to R². Based on the sensitivity of the MLP-ANN model, HCO₃⁻, pH, SO₄²⁻, EC, and Cl⁻ were identified to have the greatest influence on the irrigation water quality of the area. This study has shown that the integration of GIS and machine learning can serve as a rapid decision tool for proper planning and enhanced agricultural productivity.


Introduction
Water is an important resource for drinking, agriculture, and industrial purposes. However, the global upsurge in industrialization, population growth, mining, over-exploitation of available water sources, and poor waste management strategies has led to the deterioration and further decline of its quality (Kouadri and Samir 2021; Unigwe et al. 2022; Okamkpa et al. 2022; Omeka et al. 2022). Hence, improving water quality through adequate quality assessment and improved resource management strategies would significantly reduce the economic burden of water treatment and, as a consequence, serve as a buffer for economic sustainability and improved crop yield. According to climate change projections, the global decline in water quality will only worsen if adequate remedial measures are not put in place (Pleguezuelo et al. 2018; Omeka and Egbueri 2022). This problem may be compounded especially in areas plagued by anthropogenic activities such as mining, agriculture, and poor management of their effluents. In the distant past, water quality, especially for agriculture, was often disregarded because good-quality water was readily available (Ayers and Westcot 1985). In recent times, however, the increased anthropogenic stresses on water sources have led to global concern among researchers and water resource managers alike. The use of poor-quality water for irrigation will remain a lingering challenge to economic sustainability, through problems relating to soil and crop yield (e.g. soil salinity, decline in infiltration rate and plant growth, plant toxicity, etc.), unless adequate remedial measures are adopted (Ayers and Westcot 1985; Mokhtar et al. 2022).
This study focused on the Okurumutet-Iyamitet agriculture and mine province in southeastern Nigeria, where intensive mining (for barite) and agriculture have been ongoing. The mine wastes (tailings), mine water and its effluents are being disposed of into nearby lands and surface water bodies. The excavated overburden is known to contain potentially toxic elements (El-Amari 2014). Agriculture is the major occupation in the area, where intensive use of agrochemicals (e.g. inorganic fertilizers and pesticides) is a common practice. During rainfall, the chemical elements (from agrochemicals) may infiltrate and percolate into the groundwater aquifer system or be transported as run-off from agricultural lands into nearby surface water bodies. In this region, both surface water and groundwater are used for irrigation; however, the increased agricultural activities in the area have put more pressure on the available water resources. As a result, most of the irrigation water is sourced from effluent-derived surface water bodies such as ponds and streams, and this can result in negative impacts on the irrigation water quality, the soil quality and crop yield (Aravinthasamy et al. 2020; Mokhtar et al. 2022). Following the need for regular monitoring of irrigation water quality for sustainable agricultural development, several irrigational indices have been developed by researchers. These include residual sodium carbonate (RSC), sodium adsorption ratio (SAR), soluble sodium percentage (SSP), potential salinity (PS), Kelley's ratio (KR), and permeability index (PI) (Kelley 1963; Richards 1954a; Doneen 1964; Ayers and Westcot 1985; Todd and Mays 2005; Gholami and Srikantaswamy 2009). These models have proven useful in irrigation water quality assessment, as they have the capacity to mathematically transform the irrigational water quality parameter scores into a numerical value.
Accordingly, the value of each score depicts the overall state of the water for irrigation purposes, and interpretations can then be drawn from these results using their classification criteria (Setshedi et al. 2021). The major disadvantage of these indices is the discrepancy usually observed between them, as one index may indicate that a particular water is suitable while another may indicate otherwise. Moreover, several computational errors may arise during sub-index calculation (Kouadri et al. 2021). It is therefore important to predict and comparatively analyze the performance of the various irrigation indices, to know which one would be more accurate in irrigation water quality assessment. This can be achieved by integrating several machine learning models (Gaya et al. 2020). The advantage of machine learning models is that predictions can be made from nonlinear data with very high accuracy (Gaya et al. 2020; Egbueri et al. 2022). Also, functions from a complex dataset can be learned and predicted more accurately (Ahmed et al. 2019; Mokhtar et al. 2022). Aside from its efficiency and reliability in water quality prediction, machine learning has been shown to be less expensive, less time-consuming, and less tedious than conventional water quality assessment techniques.
Machine learning models have been widely applied in the assessment of drinking and irrigation water quality in various regions of the world. For instance, artificial neural networks (ANN) have been employed in the prediction of TDS and other physicochemical parameters of a groundwater aquifer in Canada (Pan et al. 2019). Machine learning algorithms such as support vector machine (SVM), radial basis function neural network (RB-NN) and backpropagation neural network (BP-NN) have been integrated and compared to predict the water quality in constructed wetlands (Mohammadpour et al. 2015). Panneerselvam et al. (2021) assessed the drinking water quality of Maharashtra, India using the RB-NN. A model for irrigation water quality estimation has been developed by integrating ANN and multiple linear regression (MLR) models (Yildiz and Karakuş 2019). This approach proved to be efficient in irrigation water quality assessment through the use of several water quality parameters. Similarly, the water quality assessment of Brahmani River was carried out using a comparative regression and correlation analysis (Nayak 2020). That study concluded that regression analysis can serve as a more valuable technique for the monitoring and prediction of water quality trends.
From the review of previous literature, the various water quality indices have been simultaneously applied to classify water samples into different categories. This process is often associated with bias in water quality appraisal and drawbacks in decision-making. Hence, this study proposes a composite irrigational water quality (IWQ) zonation map that integrates the conventional irrigational water quality indices, together with prediction of the most influential irrigational water quality index, for rapid decision-making in an agricultural province in southeastern Nigeria. The prediction of the irrigation water quality indices was done using the multilayer perceptron ANN (MLP-ANN) and multiple linear regression (MLR) machine learning models. The GIS-based spatiotemporal modeling approach (using kriging interpolation and the overlay weighted sum technique) was applied to delineate the suitable water zones for irrigation purposes by combining all the traditional irrigation water quality indices in the GIS environment.
Generally, ANN and MLR models have been found very useful in forecasting environmental conditions compared to other advanced machine learning algorithms due to their efficiency, flexibility and ease of implementation (even on systems of low computational strength). ANNs, for one, have found use in most environmental studies because they can efficiently predict linear and non-linear quantitative variables (Egbueri et al. 2022). Due to their ability to carry out quantitative analysis, ANNs can identify and make up for limitations caused by data input and analysis (Corominas et al. 2014). MLR, on the other hand, has low computational demands and is simple to implement compared to other machine learning techniques, making data manipulation and computation easy even on less sophisticated systems. The MLR modeling technique was integrated in the present study to complement the ANN modeling in predicting the irrigational water quality (output) using the linear relationships existing between the input and output variables.
Some studies have been carried out in the area regarding hydrogeochemical assessment of water resources for drinking and domestic purposes using conventional methods (Adamu et al. 2014, 2015). These studies concluded that the water sources in the area have been polluted due to barite mining activities. However, the literature review showed that no study has attempted the irrigation suitability assessment of water resources in the area using an integrated machine learning and GIS modeling approach. Hence, the present study aims to develop and validate two supervised machine learning models (ANNs and multiple linear regression) to predict the irrigation water quality. A composite irrigation water quality zonation map of the agricultural district has also been developed using GIS-based spatiotemporal models. It is worth mentioning that this study is the first to predict irrigational water quality using a joint supervised machine learning and GIS-based modeling approach in southeastern Nigeria. It is expected that the findings documented in this study will aid policymakers in adequate water resource management and decision-making. They would also serve as a boost to farmers for enhanced irrigational water management and agricultural productivity.

Study area description, geology and hydrogeology
The study area occurs within the southeastern part of Obubra, in the central part of Cross River State, Nigeria. The area is dominated by mining (for barite) and intensive agricultural activities, where intensive use of agrochemicals (e.g. inorganic fertilizers and pesticides) is a common practice. The area lies between latitudes 5° 51'00" and 5° 55'00" North and longitudes 8° 19'00" and 8° 24'00" East of the Greenwich meridian, with a low to gently undulating topography ranging in elevation from 50 to 180 m above sea level, with no notable ridges or hills (Fig. 1). The area is found within the tropical rainforest. The vegetation and climate of the tropical rainforest are characteristic of the equatorial region (Edet and Okereke 2014). Two forest reserves occupy the northern and southern parts of the area, with the Cross River National Park occupying the northern parts while the Cross River National Forest covers the southern parts (Fig. 1). Areas that are not conserved are mostly dominated by extensive anthropogenic activities such as farming, mining, lumbering and settlement, giving rise to secondary forest and derived savannah vegetation. Two major seasons (wet and dry) modify the climate of the area; the wet season spans from March to September while the dry season spans from October to April. As reported by Edet and Okereke (2014), the average yearly rainfall ranges from 2018 mm to 3370 mm.
The streams in the area are structurally controlled, with dendritic drainage patterns, and are drained by the perennial Cross River (CRBDA 1992; Adamu 2014, 2015). The groundwater aquifer system in the area is recharged by the perennial Cross River and its tributaries (CRBDA 1992). According to a report by the Cross River Basin Development Authority (CRBDA), groundwater occurrence in the area is controlled structurally by a conjugate joint pattern, with the depth to water table varying between 20 m and 25 m. Secondary structures such as fractures, joints, fissures and weathered lithologic units determine the condition of groundwater in basement complex rocks and shales, and they are also significant for aquifer transmissivity (Kudamnya et al. 2019).
An observation of Fig. 1 shows that the geology of the area is underlain by both basement and sedimentary rock units.
The basement is represented by the Precambrian Oban massif, while the sedimentary rock units are represented by the Southern Benue Trough and Mamfe Embayment. Reports by some authors (Rahman et al. 1988; Ekwueme 1987) revealed that the major lithostratigraphic units of the basement include schists, granite and gneisses, amphibolites, pegmatite, metaperidotites, dolerites and charnockites. On the other hand, the sedimentary area is represented by the Cretaceous Mamfe Formation, with sandstone, shale, and siltstone units making up the major lithostratigraphic framework (Akpeke 2008; Adamu et al. 2015).

Field sampling and laboratory analysis
For the present study, 21 water samples (n = 21) representing surface water and groundwater were randomly collected from streams, ponds and boreholes used for drinking and irrigational purposes. Random sampling was the preferred method due to the low number of boreholes and the disjointed stream channel patterns found within the study area. A hand-held global positioning system (GPS) device was used to identify the locations of the sampling sites. Samples from the streams were collected from the downstream to the upstream channel to prevent changes in water composition, hence ensuring quality control and assurance in the field. Sterilized 1-litre polythene bottles were used for sample collection and, as a standard procedure, the containers were well rinsed with the source water before sample collection to avoid any variations in the composition of the water samples. Before sample collection, 5-10% nitric acid (HNO₃) was used to wash the polythene bottles, which were thereafter carefully rinsed with de-ionized water.
Fast-changing parameters (electrical conductivity, total dissolved solids, pH, and temperature) were measured in situ using a hand-held multi-parameter meter (HANNA model HI 8314).
In the laboratory, all analytical methods were performed following standard procedures as per the American Public Health Association (APHA 2012). A summary of all analytical procedures is presented in Table 1. To ensure accuracy, each sample was analyzed in triplicate, with the mean taken as the value for the given sample.
Additionally, the ionic balance error (IBE) was used to check the accuracy of the hydrochemical data, following the electro-neutrality principle as per Freeze and Cherry (1979). This principle states that the sum of all cations must equal the sum of all anions in a particular sample, with both expressed in meq/L (Eq. 1). The IBE results from this study were within the acceptable limit of ±5%.
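The electro-neutrality check can be illustrated with a short sketch. The code below assumes the commonly used form IBE = 100 × (Σcations − Σanions) / (Σcations + Σanions), with all ions in meq/L; the equivalent weights are standard chemistry values (molar mass divided by charge), and the function names and the example sample are hypothetical, not taken from this study.

```python
# Equivalent weights in g/eq (molar mass / ionic charge); standard values.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "Cl": 35.45, "HCO3": 61.02, "SO4": 48.03, "NO3": 62.00,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "HCO3", "SO4", "NO3")


def to_meq(sample_mg_l):
    """Convert a {ion: mg/L} mapping to {ion: meq/L}."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def ionic_balance_error(sample_mg_l):
    """Return the ionic balance error in percent (assumed common form)."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample (mg/L), for illustration only.
sample = {"Ca": 40.08, "Mg": 12.15, "Na": 22.99, "K": 3.91,
          "Cl": 35.45, "HCO3": 183.06, "SO4": 4.80, "NO3": 6.20}
ibe = ionic_balance_error(sample)
acceptable = abs(ibe) <= 5.0  # the ±5% criterion
```

A sample passes the check when the absolute value of the returned percentage does not exceed 5.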

GIS-based irrigational water quality zonation
The aim of the geospatial modeling was to obtain a visual representation of areas of poor and suitable irrigation water quality. This was achieved using the ArcGIS 10.4.1 software (ESRI, Redlands, CA), while Microsoft Excel (ver. 2016) was used for computation and analysis of the irrigational water quality parameters. The calculated irrigational quality parameters were imported into the GIS environment (ArcMap) using the ArcGIS raster calculator. The obtained data were first normalized using the logarithmic method; the root mean square error (RMSE) was used to check the accuracy of the data distribution and modeled output. The CARTO-digital elevation model (CARTO-DEM) data of the area were obtained from the National Aeronautics and Space Administration (NASA) (https://search.earthdata.nasa.gov/search) to demarcate the boundary of the study area.
Selected water quality parameters, namely magnesium adsorption ratio (MAR), electrical conductivity (EC), potential salinity (PS), total hardness (TH) and total dissolved solids (TDS), were considered for the irrigation quality zonation map. These parameters were selected because of their significance for irrigation water quality. Accordingly, different weight rankings (e.g. suitable, moderately suitable, and unsuitable) were assigned to the parameters (Doneen 1964; Todd 1980; Wilcox 1955; Singh et al. 2019). Spatial distribution maps were then generated for each parameter by integrating the inverse distance weighting (IDW) technique and simple kriging (SK). The data were statistically transformed and interpolated using the normal score transformation. The irrigation water quality zonation map was then generated using the weighted overlay analysis technique (Fig. 2). To achieve this, the following steps were followed: (1) all the input parameters were reclassified in the ArcGIS environment, in rasterized form, into three distinct classes (suitable, moderately suitable and unsuitable) according to their suitability for irrigation purposes; (2) a distinct rank was assigned to each class: the unsuitable class was assigned rank 1, the moderately suitable class rank 2, and the suitable class rank 3.
All the input parameters were assigned an equal weight of 1 (Table 2). As proposed by Singh et al. (2019), all the irrigation water quality parameters are of equal importance in irrigation water quality assessment. However, the results of the irrigation water quality (Table 2) revealed that some variables (e.g. MAR, SSP, PI, SAR, and KR) occurred entirely under the suitable class and hence were not used in the weighted overlay analysis (Singh et al. 2019). Therefore, to produce the irrigation zonation map, the rank and weight of each parameter were multiplied.
Thereafter, the products for all the input rasters were summed following the relation expressed in Eq. (2), where w represents the weight of the factor class (parameter) and r refers to the rank of the factor class.
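The reclassify, rank, and sum steps described above can be sketched outside a GIS as a cell-by-cell sum of w × r over the input rasters. The snippet below is only an illustration on tiny grids stored as nested lists: the function names, grid values, and class thresholds are hypothetical placeholders, and it assumes that lower parameter values are more suitable.

```python
def reclassify(raster, suitable_max, moderate_max):
    """Map each cell to rank 3 (suitable), 2 (moderately suitable) or 1 (unsuitable).

    Assumes lower values are better; thresholds are supplied by the caller.
    """
    return [
        [3 if v <= suitable_max else 2 if v <= moderate_max else 1 for v in row]
        for row in raster
    ]


def weighted_sum(rank_rasters, weights):
    """Cell-by-cell sum of weight * rank over all input rasters."""
    rows, cols = len(rank_rasters[0]), len(rank_rasters[0][0])
    return [
        [sum(w * r[i][j] for r, w in zip(rank_rasters, weights)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2 x 2 grids and illustrative thresholds.
ec = [[200.0, 800.0], [2500.0, 600.0]]    # µS/cm
tds = [[300.0, 1500.0], [2500.0, 400.0]]  # mg/L

ranks = [reclassify(ec, 750.0, 2250.0), reclassify(tds, 1000.0, 2000.0)]
iwq = weighted_sum(ranks, weights=[1, 1])  # equal weight of 1 for every parameter
```

With equal weights, each output cell is simply the sum of the ranks, so higher totals mark zones that are suitable under more of the input parameters.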

Evaluation of irrigational water quality parameters
Plant yield for agricultural production, soil quality, and human health have been adversely affected by poor irrigational water quality. This necessitates the appraisal of water quality for irrigation purposes. According to Subba Rao (2017), the determination of the usefulness of water for irrigation purposes depends on the evaluation of the alkali and salinity conditions of the water. Considering the facts established above, irrigation suitability parameters such as sodium adsorption ratio (SAR), permeability index (PI), soluble sodium percentage (SSP), potential salinity (PS), Kelley's ratio (KR), and total hardness (TH) were calculated using the equations in Table 3.
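As an illustration of how such indices are computed, the sketch below uses the forms most commonly found in the irrigation literature. These are assumptions rather than a transcription of Table 3, and some indices (SSP and PI in particular) have variants. All ion concentrations are in meq/L, except for total hardness, which takes Ca and Mg in mg/L; the function names are hypothetical.

```python
import math


def irrigation_indices(ca, mg, na, k, hco3, cl, so4):
    """Return common irrigation indices from ion concentrations in meq/L."""
    return {
        # Sodium adsorption ratio
        "SAR": na / math.sqrt((ca + mg) / 2.0),
        # Soluble sodium percentage (variant including K)
        "SSP": 100.0 * (na + k) / (ca + mg + na + k),
        # Kelley's ratio
        "KR": na / (ca + mg),
        # Magnesium adsorption ratio
        "MAR": 100.0 * mg / (ca + mg),
        # Permeability index (Doneen form)
        "PI": 100.0 * (na + math.sqrt(hco3)) / (ca + mg + na),
        # Potential salinity
        "PS": cl + 0.5 * so4,
    }


def total_hardness(ca_mg_l, mg_mg_l):
    """Total hardness as mg/L CaCO3 from Ca and Mg in mg/L."""
    return 2.497 * ca_mg_l + 4.115 * mg_mg_l


# Hypothetical sample (meq/L), for illustration only.
indices = irrigation_indices(ca=2.0, mg=2.0, na=2.0, k=0.0,
                             hco3=4.0, cl=1.0, so4=2.0)
```

Keeping the indices in one function makes it easy to apply the same definitions to every sample before classification or modeling.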

Prediction of irrigation water quality using machine learning models
In the present study, two supervised machine learning models (multiple linear regression and artificial neural networks) were used to predict the irrigational water quality of the area. For both models, the irrigational water quality indices (SAR, PI, KR, MAR, SSP, PS) were used as dependent or outcome variables, while Cl⁻, HCO₃⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, pH, EC, TDS and TH were considered independent or input variables. IBM SPSS (v.22) was used for both analyses.

Multiple linear regression modeling
Multiple linear regression (MLR) is a data-driven approach for predicting linear correlations between input (independent or predictor) variables and output (target or dependent) variables. It extends simple linear regression to more than one predictor. The MLR is a decision-making algorithm that uses the least-squares rule. MLR has been used to predict water quality indicators and indices in various locations of the world (Chen and Liu 2015; Gaya et al. 2020). MLR can be stated mathematically as shown in Eq. 3, where y is the predicted target, b0 is the regression constant, xi is the value of the ith predictor, bi represents the regression coefficient of the ith predictor, and ε represents the residual or error.
For this study, the MLR was used to predict SAR, PI, KR, MAR, SSP, and PS, with all measured physicochemical parameters (Cl⁻, HCO₃⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, pH, EC, TDS and TH) considered as input variables. The effectiveness of the MLR models was evaluated using statistical metrics including standard error of estimate (SEES), multiple correlation coefficient (R), coefficient of determination (R²), and adjusted R². Because a single measure of validity could lead to bias, several statistical metrics were incorporated to verify model accuracy and support model reliability.
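The linear form described above (a constant b0 plus a weighted sum of the predictors, fitted by least squares) can be sketched in a few lines. This is a minimal illustration that solves the normal equations directly with the standard library; it is not the SPSS procedure used in the study, and the function names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares fit of y = b0 + sum(bi * xi); returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(row) for row in X]  # leading 1 carries the constant b0
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical, noise-free).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coeffs = fit_mlr(X, y)
```

With twelve predictors and a small sample, the normal equations can become ill-conditioned, so a dedicated statistics package is preferable for real data.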

Artificial neural network modeling
Compared to the MLR model, an ANN is a more advanced machine learning model for forecasting linear and nonlinear associations between given parameters. The architectural framework of an ANN consists of interconnected networks that mimic the neural system of the human brain. This attribute gives the ANN the ability to learn, train, process and present meaningful results from a given dataset. Three major layers (the input, the hidden, and the output layers) make up the architectural framework of an ANN model. Through complex mathematical functions and an understanding of the hidden patterns in a dataset, the various layers combine to process and give meaningful results as a function of the input data (Wagh et al. 2018; Kouadri et al. 2021; Egbueri et al. 2022). Errors could arise during routine data input; owing to their high precision in quantitative analysis, ANNs can make up for this limitation. Given the complexity of the water system and the intrinsic vulnerability of aquifers to pollution, ANNs can provide more precise results in water quality assessment, monitoring, and management (Kouadri et al. 2021).
In the present study, three ANN models were developed for predicting seven irrigation water quality parameters (SAR, PI, KR, MAR, SSP, PS). As with the MLR, twelve analyzed physicochemical parameters (Cl⁻, HCO₃⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, pH, EC, TDS and TH) were considered as input variables (Table 4). The multilayer perceptron artificial neural network (MLP-ANN), using the hyperbolic tangent as the input layer activation function, was chosen for the present study. The normalized technique was used to rescale the covariates; the number of hidden layers was set to one (1), and the hyperbolic tangent was used as the activation function of the hidden layer, with the number of units determined automatically. The dependent variables were rescaled using the adjusted normalized technique at a correction of 0.02. Batch training was used to train the ANNs, while the scaled conjugate gradient (SCG) algorithm was used for ANN optimization. The training dataset partition was set to ≥ 70%, while the testing dataset partition was ≤ 30%, to produce an optimal model (Kadam et al. 2019). The best models were then chosen for each prediction scenario by considering those with high R² ratings and very low modeling errors.
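The preprocessing and network structure described above can be sketched as follows. The snippet shows only the rescaling, a 70/30 partition, and a forward pass through one tanh hidden layer; it does not implement SCG training. The way the 0.02 correction widens the range is one common reading of the adjusted normalized method, the identity output activation is an assumption, and all function names are hypothetical.

```python
import math
import random


def normalized(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def adjusted_normalized(values, correction=0.02):
    """Rescale to within (-1, 1); the correction widens the range (assumed form)."""
    lo, hi = min(values) - correction, max(values) + correction
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def split_indices(n, train_fraction=0.7, seed=0):
    """Random partition with at least train_fraction of the cases in training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = math.ceil(train_fraction * n)
    return idx[:cut], idx[cut:]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with tanh units and an identity output (assumption)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


train_idx, test_idx = split_indices(21)  # 21 samples, as in this study
```

For 21 samples, this partition yields 15 training and 6 testing cases, which satisfies the ≥ 70% and ≤ 30% constraints.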

Validation of the ANN and MLR models using statistical metrics
Due to the discrepancy that usually occurs between the predicted variable and the original dataset (raw scores), testing the performance and reliability of the ANN and MLR models becomes critical (Kalantar et al. 2018; Ray et al. 2020).
Testing the performance and reliability of the two models is integral to validating and selecting the most accurate and suitable model; such testing also aids in identifying the most efficient activation function and optimization algorithm (Ray et al. 2020). To this end, statistical metrics including adjusted R², relative error (RE), multiple correlation coefficient (R), coefficient of determination (R²), sum of square error (SOSE), and residual error were used, where n is the number of observations.
For the MLR validation, multiple statistical metrics such as the multiple coefficient of determination or R-squared (R²), correlation coefficient (R), standard error of estimate or root mean square error (SEE) and adjusted R-square (R²adj) were used. Given the variables x, y, z, R can be defined mathematically as per Eq. 8.
R²adj = 1 − [(1 − R²)(n − 1)] / (n − k − 1), where n = number of items and k = number of input variables. R, R², and/or R²adj statistically represent the correlation and variance between the original variables (scores) and the predicted variables. It is generally agreed that the closer the R², R, and R²adj values are to unity, the more efficient or reliable the model is (Menard 2000). For the ANN model, the differences between the original or observed variables and the predicted variables are usually represented by the residual errors (presented graphically as residual plots). Accordingly, the model prediction and accuracy are represented by the x and y axes of the plot, respectively, with residual values nearer to zero indicative of higher model accuracy. In contrast, the larger the positive or negative values on the y-axis, the poorer the performance or accuracy of the model.
The sum of squared errors (SOSE) represents the extent of the errors associated with the model prediction, and is the sum of the squared differences between the original scores and the predicted variables (always a non-negative value). As a rule, lower SOSE values represent higher model performance and accuracy (Kalantar et al. 2018; Egbueri et al. 2022). The RE, on the other hand, shows the nearness of the predicted scores to the original scores or variables. The RE is similar to the SOSE in that lower values represent higher model precision and reliability. For the MLR models, the SEES represents how well a model fits a particular dataset. Generally, a higher SEES is representative of a poor model output and vice versa (Egbueri et al. 2022).
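The validation metrics discussed above can be computed from paired observed and predicted values, as in the sketch below. It uses textbook definitions, which are assumptions here because the original equations are not reproduced in this section: RE is taken as the ratio of SOSE to the total sum of squares about the observed mean, and the standard error of estimate uses n − k − 1 degrees of freedom. The function name and the toy values are hypothetical.

```python
import math


def regression_metrics(observed, predicted, k):
    """Return common fit metrics; k is the number of input variables."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    sose = sum(e * e for e in residuals)               # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    r2 = 1.0 - sose / sst
    return {
        "SOSE": sose,
        "RE": sose / sst,                              # relative error (assumed form)
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1),
        "SEE": math.sqrt(sose / (n - k - 1)),          # standard error of estimate
        "residuals": residuals,                        # values for a residual plot
    }


# Toy values for illustration only.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.5, 2, 3, 4, 4.5], k=1)
```

Because the adjusted R² penalizes the number of inputs, it falls further below R² as k approaches n, which matters when twelve inputs are fitted to a small number of samples.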

Irrigational water quality evaluation
The combined impact of irrigation water quality, soil types and land use plays a significant role in agricultural productivity and sustainability. A detailed assessment of irrigation water quality in an agrarian area is therefore critical, given the ubiquitous dissolved pollutants and effluents from the intensive use of inorganic fertilizers and agrochemicals. To this end, various irrigation water quality indices (SAR, TH, PI, KR, MAR, SSP, PS) have been computed, and their summarized results are presented in Table 5. Additionally, the results of the various water quality indices were integrated to generate graphical models such as Doneen's chart, the Wilcox diagram, and the USSL diagram, based on the combined influence of various chemical parameters on crop production (Fig. 3a-c). Information from these graphical models would aid decision makers and farmers in making adequate decisions towards improved crop yield and agricultural sustainability (Subba Rao 2017).

General description of the irrigation water quality parameters
Summary statistics of the water quality parameters for both surface water and groundwater are presented in Table 5. The surface water pH ranged from 4.9 to 6.9 with a mean value of 6.12, while the groundwater recorded lower pH for individual water samples, with values ranging from 4.8 to 6.51 and a mean of 5.80. Water pH greater than 8.5 is known to increase the concentration of carbonate in soil, resulting in soil sodicity (Bouaroudj et al. 2019). The implication of this is that the use of the surface water samples for irrigation will expose the soil to increased sodicity.
High variations of total dissolved solids (TDS) were observed for the surface water samples, with values ranging from 27 mg/L to 1257 mg/L and an average value of 452.55 mg/L. Lower TDS values were recorded for the groundwater samples compared to the surface water samples, with values ranging from 25.3 mg/L to 41.2 mg/L and a mean of 30.86 mg/L.
The low pH of the groundwater samples appears to be commensurate with the TDS. Low water pH tends to increase the dissolution of minerals during rock-water interaction, hence increasing the TDS (Rose and Cravotta 1998). Conversely, the higher surface water pH can be attributed to high buffering inputs from rainfall and vegetative cover within the study area.
The water hardness (expressed as TH) varied between 70.34 mg/L and 198.4 mg/L (mean = 149.37 mg/L) for the surface water samples, while that of the groundwater varied between 103.36 mg/L and 156.85 mg/L (mean = 131.67 mg/L).
All the groundwater samples tend to fall under the suitable category for irrigation use. Sawyer and McCarthy (1967) classified water hardness based on total hardness (TH, in mg/L) as TH < 75 (soft); TH 75-150 (moderately hard); TH 150-300 (hard); and TH > 300 (very hard). All the surface water samples, except for PND3, fall within the moderately hard to hard categories and are therefore unsuitable for irrigation purposes. On the other hand, all the groundwater samples are moderately hard (Table 5).
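The hardness classes quoted above translate directly into a lookup, sketched below. The function name is hypothetical, and the treatment of values that fall exactly on a class boundary (75, 150 or 300 mg/L) is an assumption, since the published ranges overlap at their end points.

```python
def hardness_class(th_mg_l):
    """Classify total hardness (mg/L) using the class limits quoted in the text."""
    if th_mg_l < 75:
        return "soft"
    if th_mg_l <= 150:   # boundary handling is an assumption
        return "moderately hard"
    if th_mg_l <= 300:
        return "hard"
    return "very hard"


# Range limits reported for the surface water samples in this study.
surface_min_class = hardness_class(70.34)
surface_max_class = hardness_class(198.4)
```

Applied to the reported surface water range, the minimum value falls in the soft class and the maximum in the hard class.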
Electrical conductivity (EC) for surface water ranged from 55.9 µS/cm to 1448 µS/cm (mean = 483.93µS/cm).The EC for groundwater varied between 78.9µS/cm and 674.1 µS/cm (mean 424.58 µS/cm).High electrical conductivity can affect the root of plants and affect soil fertility (Singh et al. 2019).EC value greater than 700μS/cm of irrigational water can impede the movement of water into the root zone (Bouaroudj et al. 2019).As per the Richards classification criterion, water EC < 250 depicts excellent water quality for irrigation use; EC 250-750 depicts good water (medium salinity); EC 750-2250 depicts doubtful water for irrigation; and EC > 2250 depicts unsuitable water for irrigation.
The major cations and anions for surface water occurred in their decreasing order as Ca 2+ > Mg 2+ > K + > Na + ; HCO3 - > Cl -> NO3 -> SO4 2-> NO2 -.While for the groundwater, the major ion chemistry occurred in the order of Ca 2+ > Mg 2+ > K + > Na + ; HCO3 -> Cl -> SO4 2-> NO2 -> NO3 -The relatively higher concentration of nitrates (NO3 -) observed in the surface water compared to the groundwater can be as results of high direct run-off of agrochemicals into streams and ponds within the area.The presence of Calcium and Magnesium in water is essential for plant growth and development.However, they become toxic to plants when they occur in irrigation water at concentration of 40-100 mg/L and 30-50 mg/L, respectively (Adimala et al. 2018).All the analyzed water samples appear to be within required limits for Ca and Mg.
The HCO₃⁻ concentration ranged from 89.65 mg/L to 3333 mg/L, with a mean value of 587.08 mg/L, for surface water.
SO₄²⁻ concentration ranged between 1.08 mg/L and 12.24 mg/L (mean = 3.31 mg/L) for surface water. Lower mean values were recorded for the groundwater, with values ranging between 1.08 mg/L and 12.24 mg/L (mean = 3.31 mg/L).
The required value of SO₄²⁻ in irrigation water has been set at < 400 mg/L (Vasanthavigar et al. 2010). At concentrations above 400 mg/L, it could result in soil acidity (Vasanthavigar et al. 2010). High SO₄²⁻ content in irrigation water can also reduce phosphorus supply to plants (Zouahri et al. 2014). Both groundwater and surface water samples recorded SO₄²⁻ values within the required limits.

Combined effect of chemical parameters on crop production
Different irrigation parameters (SSP, SAR, PI, KR, EC, MAR) were combined for a comprehensive assessment of the suitability of the water for irrigation. Integrating the various parameters was intended to reveal how the combined influence of the chemical parameters can affect crop yield. It is expected that proper management measures that could improve crop yield and food production can be derived from this assessment (Subba Rao 2018).

Sodium adsorption ratio (SAR) and salinity
The SAR is determined from the concentrations of Mg²⁺, Ca²⁺, and Na⁺. Following the Richards (1954) classification scheme, both the surface water and groundwater have SAR values less than 10 and hence are of excellent quality for irrigation (Table 6). Soil structure is known to be affected by increased ionic strength (electrical conductivity) (Aravinthasamy 2020). The combined effect of SAR (S) and EC (C) on crop yield is represented in Fig. 3a, using the United States Salinity Laboratory diagram (USSL 1954). The diagram divides both the salinity hazard (C) and the sodium hazard (S) into four sub-classes (zones), depicted as good, moderate, poor, and very poor water quality types, respectively. The sodium hazard sub-classes are low (S1: < 10), medium (S2: 10-18), high (S3: 18-26), and very high (S4: > 26). From Fig. 3a, 64.28% of the overall water samples (BH1, BH3, BH4, STRM4, STRM5, PND1, PND2, PND3, PND4) fall within the C1-S1 zone (low salinity and low sodium hazard). This water may be suitable for irrigation on all soil types, although a moderate amount of leaching may be required.
Water samples within this class will be unsuitable for use on soils with limited drainage (Subba Rao 2017, 2018). Even on soils with good drainage capacity, special management measures will still be required to control salinity (Aravinthasamy 2020).
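As an illustration of how a sample is placed on the USSL diagram, the sketch below computes SAR from ion concentrations and returns the combined zone label. It assumes the conventional definition SAR = Na⁺ / sqrt((Ca²⁺ + Mg²⁺) / 2) with concentrations in meq/L, approximate equivalent weights for the mg/L conversion, and the fixed zone limits quoted in this paper (250, 750, and 2250 µS/cm for C; 10, 18, and 26 for S). The function names and the sample values are hypothetical and are not taken from the study data.

```python
import math

# Approximate equivalent weights (g/eq) for converting mg/L to meq/L.
EQ_WEIGHT = {"Na": 22.99, "Ca": 20.04, "Mg": 12.15}


def to_meq(ion: str, mg_per_l: float) -> float:
    """Convert a concentration from mg/L to meq/L."""
    return mg_per_l / EQ_WEIGHT[ion]


def sar(na_mg: float, ca_mg: float, mg_mg: float) -> float:
    """Sodium adsorption ratio from concentrations given in mg/L."""
    na = to_meq("Na", na_mg)
    ca = to_meq("Ca", ca_mg)
    mg = to_meq("Mg", mg_mg)
    return na / math.sqrt((ca + mg) / 2.0)


def ussl_zone(ec_us_cm: float, sar_value: float) -> str:
    """Combine the salinity (C) and sodium (S) classes into a zone label.

    Uses fixed limits for both axes, as quoted in the text; values on a
    limit are assigned to the higher class (an assumption).
    """
    c = 1 + sum(ec_us_cm >= limit for limit in (250, 750, 2250))
    s = 1 + sum(sar_value >= limit for limit in (10, 18, 26))
    return f"C{c}-S{s}"


# Hypothetical sample, for illustration only.
sample_sar = sar(na_mg=15.0, ca_mg=30.0, mg_mg=10.0)
print(f"SAR = {sample_sar:.2f}, zone = {ussl_zone(200.0, sample_sar)}")
```

Fixed sodium limits are a simplification adopted here to mirror the classes given in the text.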

Percent sodium and salinity
The exchange of Mg²⁺ and Ca²⁺ ions in soil is said to be induced by elevated concentrations of sodium ions in irrigation water (Mohammed et al. 2017). A relative increase in Na⁺ in irrigation water can lead to increased levels of HCO₃⁻, resulting in the formation of precipitates (e.g. CaCO₃ and MgCO₃). Sodium may combine with the precipitates and occur as Na₂CO₃ in soil (Kumar et al. 2009). This can reduce soil permeability, resulting in stunted plant growth (Ayers and Westcot 1985; Todd and Mays 2005). In the present study, 71.5% (BH1, BH3, BH4, BH6, BH7, STRM1, STRM2, STRM3, STRM4, STRM5, STRM7, PND1, PND2, PND4, PND5) and 28.5% (BH2, BH5, STRM6, PND3, PND6, PND7) of the overall water samples were classified as excellent and good quality for irrigation, respectively (Table 6; Fig. 3b). The implication of the Wilcox plot (Fig. 3b) is that plant yield may be impeded by increased sodium hazards if the water resources continue to be exposed to contamination from anthropogenic influxes (Subba Rao et al. 2018).

Kelly's ratio
The KR is computed from the ratio of Na⁺ to the sum of Ca²⁺ and Mg²⁺. Based on the Kelly (1940) classification criteria (Table 6), both surface water and groundwater samples in the area showed KR values less than 1 and are thus considered suitable for irrigation use.
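A minimal sketch of the Kelly's ratio calculation is given below for illustration. It assumes the commonly used form KR = Na⁺ / (Ca²⁺ + Mg²⁺) with all concentrations in meq/L, and the usual reading of Kelly's criterion in which KR below 1 marks water as suitable for irrigation. The function names, equivalent weights, and sample values are assumptions and are not taken from the study data.

```python
# Approximate equivalent weights (g/eq) for converting mg/L to meq/L.
EQ_WEIGHT = {"Na": 22.99, "Ca": 20.04, "Mg": 12.15}


def kelly_ratio(na_mg: float, ca_mg: float, mg_mg: float) -> float:
    """Kelly's ratio from concentrations given in mg/L."""
    na = na_mg / EQ_WEIGHT["Na"]
    ca = ca_mg / EQ_WEIGHT["Ca"]
    mg = mg_mg / EQ_WEIGHT["Mg"]
    return na / (ca + mg)


def kelly_class(kr: float) -> str:
    """Classify a KR value; KR < 1 is taken as suitable (assumed criterion)."""
    return "suitable" if kr < 1.0 else "unsuitable"


# Hypothetical sample, for illustration only.
kr = kelly_ratio(na_mg=15.0, ca_mg=30.0, mg_mg=10.0)
print(f"KR = {kr:.2f} -> {kelly_class(kr)}")
```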

Irrigation water quality zonation (IWQZ)
To examine the combined effect of all irrigation water quality factors on water and agricultural output, an irrigation water quality zonation map of the area was created. Weighted overlay analysis was used to create the IWQZ map of the research area, which was then incorporated into the ArcGIS 10.1.2 environment with the help of the raster calculator. Spatial distribution maps were generated for six selected irrigation water quality indices (EC, TDS, TH, PS, MAR, and PI) (Fig. S1) by integrating the inverse distance weighting (IDW) technique and simple kriging (SK). The data were statistically transformed and interpolated using the normal score transformation. The irrigation water quality zonation map was then generated from these layers using the weighted overlay analysis technique.
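For readers unfamiliar with IDW, the following sketch shows the basic idea of estimating a value at an unsampled location from nearby samples. It is a plain-Python illustration, not the ArcGIS implementation used in the study; the power of 2, the function name `idw`, and the sample coordinates and values are assumptions. Simple kriging and the normal score transformation are not covered here.

```python
import math


def idw(samples, query, power=2.0):
    """Inverse distance weighted estimate at a query point.

    samples : list of (x, y, value) tuples for the sampled locations
    query   : (x, y) tuple for the location to estimate
    power   : distance exponent (2 is a common default, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        dist = math.hypot(query[0] - x, query[1] - y)
        if dist == 0.0:
            return value  # query coincides with a sampled location
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical EC samples (x, y, EC in µS/cm), for illustration only.
ec_samples = [(0.0, 0.0, 120.0), (4.0, 0.0, 480.0), (0.0, 3.0, 300.0)]
print(round(idw(ec_samples, (1.0, 1.0)), 1))
```

Nearer samples receive larger weights, so the estimate at a location is pulled towards the values measured closest to it.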
The irrigation map was separated into zones of irrigation water quality that were suitable (low restriction), moderately suitable (moderate restriction), and unsuitable (severe restriction). This map shows where low water quality zones occur so that treatment procedures can be targeted to improve water quality and, as a result, crop output.
According to Fig. 4, the majority of the studied area (72.5%) had suitable irrigation water quality (low restriction), while 28.4% had unsuitable irrigation water quality (severe restriction). The waters deemed inappropriate for irrigation are found in scattered areas throughout the south-central region, but are largely confined to the southeastern sector of the research area. However, appropriate irrigation water quality zones can be discovered in isolated pockets across the research region (Fig. 4).
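The weighted overlay step can be sketched as a cell-by-cell weighted sum of reclassified layers. The example below is a plain-Python illustration rather than the ArcGIS raster calculator workflow used in the study; the layer scores (1 = low, 2 = moderate, 3 = severe restriction), the weights, and the cut-offs in `zone` are hypothetical, since the actual values are not given in this section.

```python
def weighted_overlay(layers, weights):
    """Cell-by-cell weighted sum of equally sized score grids.

    layers  : dict mapping a layer name to a 2D list of reclassified scores
    weights : dict mapping the same names to relative weights
    """
    total = sum(weights.values())  # normalise so the weights sum to 1
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [sum(weights[n] / total * layers[n][r][c] for n in names)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


def zone(score):
    """Map a weighted score back to a restriction class (cut-offs assumed)."""
    if score < 1.5:
        return "suitable (low restriction)"
    if score < 2.5:
        return "moderately suitable (moderate restriction)"
    return "unsuitable (severe restriction)"


# Hypothetical 2 x 2 grids scored 1 (low), 2 (moderate) or 3 (severe).
layers = {
    "EC": [[1, 1], [2, 3]],
    "TDS": [[1, 2], [2, 3]],
    "MAR": [[1, 1], [3, 3]],
}
weights = {"EC": 0.5, "TDS": 0.3, "MAR": 0.2}  # illustrative weights only

for row in weighted_overlay(layers, weights):
    print([zone(score) for score in row])
```

The share of cells falling in each class would then give area percentages of the kind reported for Fig. 4.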

Artificial neural network modeling
The multilayer perceptron artificial neural network (MLP-ANN) model was used in this study to predict the irrigation water quality indices. The water quality parameters with the greatest influence on irrigation water quality were also determined from the sensitivity analysis results.
Furthermore, five statistical metrics, namely adjusted R², relative error (RE), multiple correlation coefficient (R), coefficient of determination (R²), and sum of square errors (SOSE), together with residual error plots, were used to test and compare the performance of the ANN model in both the training and testing stages (Table 7). The parity plots and regression models of the MLP-ANN model are presented in Figs. 5 and 6.
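The goodness-of-fit metrics named above can be computed as in the following sketch. It assumes the standard definitions of R², adjusted R², and the sum of square errors; the function names and the observed and predicted values are hypothetical and serve only to illustrate the arithmetic, not to reproduce Table 7.

```python
def sum_of_square_errors(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sum_of_square_errors(observed, predicted) / ss_total


def adjusted_r_squared(r2, n_samples, n_predictors):
    """R² penalised for the number of predictor variables."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical observed and predicted SAR values, for illustration only.
observed = [0.42, 0.55, 0.61, 0.38, 0.70, 0.49]
predicted = [0.45, 0.52, 0.63, 0.40, 0.66, 0.50]

r2 = r_squared(observed, predicted)
print(f"SOSE = {sum_of_square_errors(observed, predicted):.4f}")
print(f"R2 = {r2:.3f}, adjusted R2 = {adjusted_r_squared(r2, 6, 2):.3f}")
```

Adjusted R² falls below R² whenever predictors are added without a matching gain in fit, which is why both are usually reported together.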
The modeling error of the ANN model was determined by SOSE, R², and residual error plots (Table 7, Fig. 5). A sensitivity analysis of all the input (independent or predictor) variables on the architectural structure and performance of the model was also carried out. This was performed to estimate the percentage contribution of each predictor variable to the prediction of the irrigation quality output variables (PS, PI, KR, SSP, SAR, and MAR).
The sensitivity of the input variables is represented in the bar charts in Fig. 7.

Multiple linear regression modelling
A summary of the performance and statistical metrics used for the validation of the MLR model is presented in Table 6a, while the parity plots for the model are shown in Fig. 8. Results from the MLR model showed that the model was efficient in the prediction of all irrigation water quality variables, although some variations in performance were noticed in some of the metrics. High values of the multiple correlation coefficient (R), coefficient of determination (R²), and adjusted R² (R²adj) were observed for all the predicted parameters. A high variation was, however, observed for the standard error of estimate (SEES), with very low SEES observed for KR, PS, and SAR. Based on R², the performance of the model occurred in the order KR > SAR > PS > PI > MAR > SSP (KR = 0.858, SAR = 0.854, PS = 0.847, PI = 0.740, MAR = 0.636, SSP = 0.513). Overall, results from the MLR showed that the model is suitable for prediction.
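To make the MLR step concrete, the sketch below fits a multiple linear regression by ordinary least squares using the normal equations. It is a dependency-free illustration under stated assumptions and not the software procedure used in the study; the function names and the small data set (two predictors and one response) are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X : list of rows, each a list of predictor values
    y : list of responses
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for idx in range(k):
            if idx != col:
                factor = a[idx][col]
                a[idx] = [v - factor * p for v, p in zip(a[idx], a[col])]
    return [a[i][k] for i in range(k)]


def predict(coefs, row):
    """Predicted response for one row of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2, for illustration only.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_mlr(X, y)
print([round(c, 6) for c in coefs])
```

In practice one such model would be fitted per output index (for example KR or SAR), with the hydrochemical parameters as predictors.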

Comparing the performance of MLP-ANN and MLR in irrigation water quality prediction
The present study has demonstrated that both MLP-ANN and MLR are efficient and reliable tools for irrigation water quality prediction, as shown by the high R² values observed for both models. R² for both models ranged between 0.513 and 0.983. However, the MLP-ANN outperformed the MLR in the prediction of all irrigation water quality parameters (Table 6). The results of the present study agree with those of Juahir et al. (2004) in Malaysia, where ANN outperformed MLR in water quality index (WQI) prediction. These results also agree with those of Kadam et al. (2019) on water quality prediction in India.

Conclusion
In the present study, multilayer perceptron artificial neural network (MLP-ANN) and multiple linear regression (MLR) machine learning models were developed and validated to predict the irrigation water quality of the Okurumutet-Iyamitet agricultural-mine province in southeastern Nigeria. Using the GIS-based spatiotemporal modeling approach, a composite irrigation water quality (IWQ) zonation map of the area was also developed. The following conclusions can be drawn from the findings of this research:
• The surface water samples recorded higher pH than the groundwater samples, implying that the use of the surface water for irrigation will expose the soil to increased sodicity compared with groundwater. The high surface water pH is attributed to high buffering inputs from rainfall and vegetative cover within the study area.
• Based on findings from the combined effect of chemical parameters on crop production, both water sources may be suitable for irrigation on all soil types, although a moderate amount of leaching may be required; as such, only moderately salt-tolerant crops should be irrigated with this water. Additionally, both water sources will be unsuitable for use on soils with limited drainage.
• The IWQ map showed that the majority of the studied area (72.5%) had suitable irrigation water quality (low restriction), while 28.4% had unsuitable irrigation water quality (severe restriction). The waters deemed inappropriate for irrigation are found in scattered areas throughout the south-central region, but are largely confined to the southeastern sector of the research area. However, appropriate irrigation water quality zones can be discovered in isolated pockets across the research region.
• The MLP-ANN and MLR showed high performance accuracy based on the coefficient of determination (R² = 0.513-0.983). Low modeling errors were observed from the sum of square errors (SOSE), relative errors (RE), adjusted R² (R²adj), and residual plots, further confirming the efficacy of the two models.
• Comparatively, the MLP-ANNs showed higher prediction accuracy than the MLR with respect to R².
• Based on the sensitivity of the MLP-ANN model, HCO₃, pH, SO₄, EC, and Cl were identified as having the greatest influence on the irrigation water quality of the area.
• The integrated GIS-based and machine learning approach in this study has demonstrated that a more robust assessment of irrigation water quality can be conducted, especially in areas with very limited water samples. The use of these models can be very efficient and cost-effective, especially in developing countries like Nigeria, where the costs of lab-based water quality analysis are usually high and restrictive.
• Findings from this study will be integral to rapid decision making, proper planning, and enhanced agricultural productivity in the area. Moreover, future research on water quality prediction could benefit immensely from the findings of the present study.

Limitation and recommendations
Although the present study has successfully provided a composite assessment of the irrigation water quality using high-precision models, there are still some limitations that could be addressed in future research:
• The present study was unable to carry out extensive data collection owing to the absence of funding.

(RE) have been applied for the ANN model, as presented in Eqs. 4-7.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (4)$$

$$\mathrm{SOSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (5)$$

where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of samples.

... moderate (100-180), and (3) severe (180-600 mg/L) (Adimalla et al. 2018). Following this classification criterion, 100% of the pond water samples fall under the severe category. However, a high variation was observed among the surface water samples compared with the groundwater. Three samples (STRM1, STRM2, PND7) occurred within the low class, while four samples (STRM4, STRM5, STRM6, STRM7) occurred within the severe category. Only one sample (STRM3) was within the moderate category. Increased HCO₃⁻ concentration is known to decrease the concentration of Ca²⁺ on soil exchange sites (Adimalla et al. 2018; Yıldız and Karakas 2019).

If the concentration of chloride ion in irrigation water exceeds 100 mg/L, it may reduce soil permeability and increase toxicity in plants, thereby reducing plant function (Yıldız and Karakas 2019). Based on this classification, 71.42% of the surface water samples (PND2, PND3, PND4, PND6, STRM1, STRM3, STRM4, STRM5, STRM6, STRM) recorded Cl⁻ concentrations above the required values. The groundwater samples also showed a similar variation in Cl⁻ concentration, with all the borehole samples (except for BH1 and BH5) recording values (Table ...

According to the United States Salinity Laboratory diagram (USSL 1954), the salinity hazard (C) is classified into four sub-classes (zones): low salinity hazard (C1: < 250 μS/cm), medium salinity hazard (C2: 250-750 μS/cm), high salinity hazard (C3: 750-2250 μS/cm), and very high salinity hazard (C4: > 2250 μS/cm), representing good, moderate, poor, and very poor water quality types, respectively.

Based on R², the performance of the ANN model in irrigation water prediction occurred in the order PS > PI > KR > SSP > SAR > MAR. On a general note, low relative errors were observed for the model, indicating the high precision of the model in predicting the irrigation water quality output parameters.

In the sensitivity charts (Fig. 7), only input variables with a sensitivity score (normalized importance) above 50% were considered very important in the MLP-ANN modeling of the various irrigation water quality parameters. Based on the sensitivity model generated for PI, seven parameters (EC, Cl, HCO₃, Mg, Ca, SO₄, and pH) contributed most to the MLP-ANN model performance. For PS, four parameters (TDS, SO₄, EC, and HCO₃) were the major contributors to the model performance. For KR, the major contributors to the model performance were NO₃, HCO₃, Na, and Cl. For SSP, the major contributors to the model sensitivity were HCO₃, pH, Cl, SO₄, salinity, and TDS. Salinity, SO₄, pH, and Ca contributed most to the model performance for SAR. Only three parameters (NO₃, Mg, and EC) were major contributors to the model performance for MAR.
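The 50% screening rule described above can be sketched as follows. The example assumes that normalized importance is each variable's raw importance divided by the largest raw importance and expressed as a percentage; the function names and the raw importance values are hypothetical and do not reproduce the study's results.

```python
def normalized_importance(importance):
    """Scale raw importances so that the largest equals 100%."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


def very_important(importance, threshold=50.0):
    """Names of variables whose normalized importance exceeds the threshold,
    ordered from most to least important."""
    scores = normalized_importance(importance)
    selected = [name for name, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda name: -scores[name])


# Hypothetical raw importances for one output variable, for illustration only.
raw = {"HCO3": 0.30, "pH": 0.20, "SO4": 0.18, "K": 0.05, "Na": 0.10}
print(very_important(raw))
```

Running the same screening once per output variable yields lists of dominant inputs of the kind reported for PI, PS, KR, SSP, SAR, and MAR.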

• Other hybrid optimization algorithms, such as deep learning, are recommended for future studies in the area to validate the findings of the present study.
• Regular water quality monitoring and proper regulation of agricultural practices (such as the use of inorganic fertilizers) are highly recommended.
• In the present study, sampling was conducted in the rainy season. A seasonal assessment of the water quality in the area may therefore be necessary for proper comparative quality assessment.

Michael E. Omeka contributed to manuscript design, conceptualization, manuscript writing, map digitization, data analysis, computation of numerical indices, and machine learning modelling. Manuscript review and editing were also carried out by the author.

Declarations
Funding: No funding was received for this research.
Data availability: Not applicable.
Animal research: Not applicable.
Consent to publish: Not applicable.
Consent to participate: Not applicable.
Competing interests: The author declares that there are no competing interests regarding this research.

References
Adamu, C. I., Nganje, T. N., & Edet, A. (2015). Heavy metal contamination and health risk assessment associated with abandoned barite mines in Cross River State, southeastern Nigeria. Environmental Nanotechnology, Monitoring and Management, 3, 10-21.
Adimalla, N., Li, P., & Venkatayogi, S. (2018). Hydrogeochemical evaluation of groundwater quality for drinking and irrigation purposes and integrated interpretation with water quality index studies. Environmental Processes, 5,