An artificial neural network (ANN) model is developed to identify pollution sources in the challenging situation where the aquifer parameters are unknown. Figure 2 shows the hypothetical study area, which covers part of an aquifer. Training data are generated with a groundwater flow and transport simulation model (Konikow and Bredehoeft, 1978): for a large number of randomly selected sets of source fluxes and groundwater parameter values, the model produces the corresponding concentration measurement data. Time-varying pollution sources are considered. O1, O2, O3, and O4 are the concentration observation locations, and S1 and S2 are the potential source locations (Figure 2). The aquifer is assumed to be initially free of pollution. Tables 1 and 2 list the aquifer parameter values, the finite difference grid dimensions, and the true source fluxes. The aquifer thickness (b) is taken as 30.5 m (Table 1). Source fluxes are expressed in grams per second (g/s).
Table 1. The values of the flow and transport parameters used to simulate the observed data
Parameter | Value
Kxx (m/s) | 0.0001
Kyy (m/s) | 0.0001
 | 0.20
aL (m) | 30.5
aT (m) | 12.2
b (m) | 30.5
Δx (m) | 91.5
Δy (m) | 91.5
Δt (month) | 3
Table 2. Source flux values used to simulate the observed data
Year | Source flux at S1 (g/s) | Source flux at S2 (g/s)
Year 1 | 48.8 | 0.0
Year 2 | 0.0 | 0.0
Year 3 | 10.0 | 0.0
Year 4 | 42.0 | 0.0
Year 5 | 36.0 | 0.0
To simulate the concentration measurements, a 10-year time domain divided into 40 equal time steps of three months each is considered. The pollutant sources are assumed to be active during the first five years of this period, and the source fluxes are assumed constant within each one-year activity period. The training patterns are produced with randomly generated source fluxes for each potential source, and the values of the parameters to be estimated are likewise generated at random within specified limits. Following our previous research (Srivastava and Singh, 2013, 2014, 2015), 260 patterns were generated in this work; a larger or smaller number of patterns could also be used. Characterizing the breakthrough curve with a statistical approach was found to perform better, so the whole breakthrough curve at each observation site is characterized in terms of statistical measures. These statistical parameters at each observation site are the inputs of the ANN model. The outputs used to train the ANN are the source fluxes that cause these concentrations, together with the unknown parameter to be estimated. The ANN model is thus built to identify the pollution sources and estimate the flow parameter, hydraulic conductivity, simultaneously. For the performance evaluation, prior information in the form of a range of variation of the source fluxes is incorporated. The upper and lower limits of the source flux values are obtained by assuming 50% uncertainty in the parameter estimation process. For example, for the first year of solute transport, the lower and upper limits of the source flux are 24.4 and 73.2 g/s, respectively (50% below and above the true value of 48.8 g/s at S1). It is assumed that these limits can be established from past data at nearby sites together with information such as soil properties (Mahar, 1996).
Uniformly distributed random values of these parameters are generated between the lower and upper limits. The simulation model then uses these values to simulate the resulting concentration measurements.
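The sampling and time discretization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`flux_bounds`, `sample_patterns`, `yearly_to_steps`) are hypothetical, the random seed is arbitrary, and the sketch does not show how limits are chosen for fluxes whose true value is zero, which the text does not specify. The flow and transport simulation itself is not reproduced here.

```python
import random

def flux_bounds(nominal, uncertainty=0.5):
    """Lower and upper limits for a nominal value with a fractional uncertainty."""
    return nominal * (1.0 - uncertainty), nominal * (1.0 + uncertainty)

def sample_patterns(bounds, n_patterns, seed=0):
    """Draw n_patterns vectors, each component uniform within its (low, high) limits."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_patterns)]

def yearly_to_steps(yearly_fluxes, steps_per_year=4, total_steps=40):
    """Expand yearly-constant fluxes to one value per time step; zero after the source stops."""
    series = [q for q in yearly_fluxes for _ in range(steps_per_year)]
    return series + [0.0] * (total_steps - len(series))

# Year 1 flux at S1 from Table 2, with the 50% uncertainty stated in the text.
low, high = flux_bounds(48.8)
# 260 random values for that single flux (illustration only, one unknown instead of eleven).
patterns = sample_patterns([(low, high)], n_patterns=260)
# True S1 fluxes from Table 2 expanded onto the 40 three-month steps.
schedule = yearly_to_steps([48.8, 0.0, 10.0, 42.0, 36.0])
```

Here `schedule` holds 20 non-zero-period values (five years of four steps) followed by 20 zeros, matching a source that is active only in the first half of the 10-year horizon.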
5.1 Performance Evaluation Criteria
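The normalized error (NE) (Mahar 1996; Singh and Srivastava 2014) is used to quantify the prediction errors and assess the prediction accuracy. As a measure of the methodology's performance, NE is defined as follows:
$$\mathrm{NE} = \frac{\sum_{i=1}^{N} \left| q_i^{\mathrm{est}} - q_i^{\mathrm{act}} \right|}{\sum_{i=1}^{N} q_i^{\mathrm{act}}} \times 100$$
where $q_i^{\mathrm{est}}$ and $q_i^{\mathrm{act}}$ are the estimated and actual values of the $i$-th unknown and $N$ is the number of unknowns.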
5.2 Training and Testing Results
With the proposed approach, observations from several observation wells are used to identify the possible source locations. Of the total data sets, 70% are used for training and 30% for testing. All data were normalized to the range 0 to 1 before the model was trained, and these normalized data were used in the computations.
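A minimal sketch of this preprocessing step is given below. It assumes column-wise min-max scaling and a random shuffle before the 70/30 split; the text states only the target range and the split proportions, so both choices, the function names, and the placeholder data are assumptions made for illustration.

```python
import random

def min_max_normalize(rows):
    """Scale each column of a list of equal-length rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column would give a zero span; use 1.0 there to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the patterns and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder data: 260 patterns with 16 inputs and 11 targets each (not values from the study).
rng = random.Random(1)
data = [[rng.uniform(0.0, 73.2) for _ in range(27)] for _ in range(260)]
scaled = min_max_normalize(data)
train, test = train_test_split(scaled)
```

With 260 patterns, this split gives 182 training patterns and 78 testing patterns.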
Each breakthrough curve recorded at the four observation wells is characterized by four statistical measures, giving 16 inputs in total. The outputs of the ANN model are the source fluxes at the two potential source locations (10 source fluxes in total) and one flow parameter, hydraulic conductivity, giving 11 outputs in total.
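The construction of the 16-element input vector can be illustrated as follows. The four statistical measures are not named in this section, so the sketch assumes, purely for illustration, the mean, standard deviation, skewness, and kurtosis of each curve; the function names and the synthetic curves are likewise hypothetical.

```python
import math
from statistics import fmean

def curve_statistics(curve):
    """Return (mean, standard deviation, skewness, kurtosis) of one breakthrough curve."""
    n = len(curve)
    mean = fmean(curve)
    std = math.sqrt(sum((c - mean) ** 2 for c in curve) / n)
    if std == 0.0:
        # Flat curve: higher moments are undefined, so they are reported as zero here.
        return mean, 0.0, 0.0, 0.0
    skew = sum((c - mean) ** 3 for c in curve) / (n * std ** 3)
    kurt = sum((c - mean) ** 4 for c in curve) / (n * std ** 4)
    return mean, std, skew, kurt

def ann_inputs(curves_by_well):
    """Concatenate the four statistics of each well's curve into one input vector."""
    features = []
    for curve in curves_by_well:
        features.extend(curve_statistics(curve))
    return features

# Synthetic concentrations at O1 to O4 over 40 time steps (not data from the study).
curves = [[k * math.exp(-((t - 20) / 6.0) ** 2) for t in range(40)] for k in (1.0, 2.0, 3.0, 4.0)]
inputs = ann_inputs(curves)
```

Whatever four measures are used, the point of the step is the same: each 40-step curve is compressed to four numbers, so four wells yield 16 inputs regardless of the number of time steps.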
The network with sixteen inputs, four neurons in each of the first and second hidden layers, and eleven target values per pattern, denoted 16-4-4-11, performed best for the ANN model (Table 3). This choice is supported by the evaluation results for various ANN architectures. Table 3 lists a selection of the results obtained with 1000 iterations. In training, the 16-4-4-11 architecture matches the lowest NE (28%), and in testing it gives the lowest NE (30%). Figure 3 shows the error versus number of iterations for the 16-4-4-11 architecture, produced using MATLAB (version 7.0). The error is expressed as the mean square error (MSE), which serves as the performance measure of the ANN architecture.
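To make the 16-4-4-11 notation concrete, the following sketch builds a fully connected network of that shape and evaluates one forward pass and the MSE. It is not the MATLAB implementation used in the study: the sigmoid activation, the weight initialization range, and all function names are assumptions, and no training algorithm is shown.

```python
import math
import random

LAYERS = [16, 4, 4, 11]  # inputs, two hidden layers, outputs

def init_network(layers, seed=0):
    """Create small random weights and zero biases for each pair of consecutive layers."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(layers[:-1], layers[1:])
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(network, inputs):
    """Propagate one input pattern through every layer with a sigmoid activation (assumed)."""
    activations = inputs
    for weights, biases in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def mean_square_error(predicted, target):
    """Average squared difference between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def parameter_count(layers):
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers[:-1], layers[1:]))

network = init_network(LAYERS)
outputs = forward(network, [0.5] * 16)  # one normalized input pattern (placeholder values)
```

Because the sigmoid output lies between 0 and 1, this layout pairs naturally with targets normalized to the range 0 to 1. A fully connected 16-4-4-11 network has 143 adjustable weights and biases.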
Table 3. Performance evaluation of the developed ANN model during training and testing
ANN model architecture | Training NE (%) | Testing NE (%)
16-9-11 | 28 | 33
16-3-3-11 | 35 | 40
16-4-4-11 | 28 | 30
16-7-7-11 | 28 | 34
5.3 Identification Results
The trained ANN model is then evaluated on the illustrative problem, in which the observed concentrations are given and the parameter and source flux magnitudes are treated as unknown. The normalized error in Table 4 shows that the model's performance is satisfactory (NE = 18.57%). Problem complexity increases with the number of unknown parameters, which directly affects model performance. Figure 4 shows that the estimated values of the parameter and source fluxes closely match the actual values.
Table 4. Comparison of actual and estimated source fluxes and flow parameter for the developed ANN model
Parameter or source duration and location | Actual value | Estimated value
Year 1, S1 | 48.8 | 49.05
Year 1, S2 | 0.0 | 4.22
Year 2, S1 | 0.0 | 1.5
Year 2, S2 | 0.0 | 2.12
Year 3, S1 | 10.0 | 8.3
Year 3, S2 | 0.0 | 0.09
Year 4, S1 | 42.0 | 34.71
Year 4, S2 | 0.0 | 3.96
Year 5, S1 | 36.0 | 34.13
Year 5, S2 | 0.0 | 2.41
Kxx (= Kyy) (m/s) | 0.00010 | 0.0000841
Overall NE (%): 18.57
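The overall NE in Table 4 can be checked with a short calculation. The sketch assumes that NE is the sum of absolute deviations between estimated and actual values divided by the sum of the actual values, expressed in percent, and that it is evaluated on the raw (unnormalized) values; under these assumptions the tabulated values reproduce the reported figure. The function name is hypothetical.

```python
def normalized_error(actual, estimated):
    """Sum of absolute deviations divided by the sum of actual values, in percent (assumed form)."""
    total_actual = sum(abs(a) for a in actual)
    total_deviation = sum(abs(e - a) for a, e in zip(actual, estimated))
    return 100.0 * total_deviation / total_actual

# Values from Table 4: ten source fluxes (g/s) followed by the hydraulic conductivity (m/s).
actual = [48.8, 0.0, 0.0, 0.0, 10.0, 0.0, 42.0, 0.0, 36.0, 0.0, 0.00010]
estimated = [49.05, 4.22, 1.5, 2.12, 8.3, 0.09, 34.71, 3.96, 34.13, 2.41, 0.0000841]

ne = normalized_error(actual, estimated)
print(round(ne, 2))  # 18.57
```

In raw units the hydraulic conductivity is several orders of magnitude smaller than the fluxes, so it contributes almost nothing to this sum; the result is the same to two decimals whether or not it is included.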
The NE value obtained for the identification results is lower than the values obtained in training and testing. This may be because fewer patterns are involved in the identification process. Comparison of the results indicates that the quality of the results decreases as problem complexity increases. When only the source identification problem was solved, the reported NE value was 10.35% (Srivastava and Singh, 2014). Thus, when both sources and parameters are identified simultaneously, the results deteriorate.