Simultaneous Identication of Number, Location, and Release History of Groundwater Contamination Sources

: In previous studies, a 0-1 mixed integer nonlinear programming optimization model 17 (0-1MINLPOM) could only identify the location and release intensity for groundwater 18 contamination sources (GCSs), and the location of each GCS was regarded as a 0-1 integer 19 variable, selected from several locations determined in advance. However, in actual situations, 20 the locations usually cannot be accurately isolated to a few GCSs and the number of GCSs is 21 often unknown, so 0-1MINLPOM was improved in this study. Based on the estimation that 22 there is a maximum of three GCSs in the study area , an improved 0-1 MINLPOM was 23 established to simultaneously identify the number of GCSs (treated as 0-1 integer variable), the 24 location (treated as integer variable) and release history of GCS (treated as continuous variables). The simulation model was constructed as an equality constraint embedded improved 0-1 MINLPOM. In the improved 0-1 MINLPOM solution process, repeatedly calling the 27 simulation model would have incurred a massive computational load and taken a long time. 28 Thus, a surrogate model based on kriging and extreme learning machine (ELM) was 29 established respectively for the simulation model to avoid this shortcoming. The results show 30 that the accuracy of the kriging surrogate model (Krig-SM) was higher compared with the 31 ELM surrogate model (ELM-SM). The improved 0-1 MINLPOM could identify the number, 32 location, and release history of GCSs simultaneously. The accuracy of identifying the number 33 of GCSs was 100%, and the accuracies of identifying the locations and release history were 34 above 91.67% and 90.14%, respectively. 35

The simulation model was constructed as an equality constraint embedded improved 0-1 26 MINLPOM. In the improved 0-1 MINLPOM solution process, repeatedly calling the 27 simulation model would have incurred a massive computational load and taken a long time. contamination remediation schemes is difficult, and the responsibility for contamination liabilities 44 and the assessment of contamination risks are also problematic (Lapworth et al., 2012; Om and problems, and have important significance (Bagtzoglou and Atmadja, 2005). 47

Methodology 91
Kriging 92 The kriging method is also known as the spatial local interpolation method and it is based on 93 variation function theory and structural analysis for obtaining unbiased optimal estimates of 94 regionalized variables in a finite region (Bargaoui and Chebbi., 2009;Kleijnen., 2009). In recent 95 years, the kriging method has been extended as a surrogate modeling method with applications in 96 many fields of engineering (Ryu et where: x is the input value of the training samples, (2) 105 where R(xi,xj) is the spatial correlation function between any two sampling points xi and xj, 2 σ 106 is variance of () Zx: where k  is the undetermined coefficient and i k x is the k-dimensional coordinate of the ith 109 sample. 110 Using the input and output data for n known sample points, the output value corresponding to any point * x in the predicted feasible domain is: where () rx is the correlation vector of point * x and n sampling points  12 ,, n x x x L , y is the 114 matrix nm  , n is the number of sampling points, m is the dimension of the output value, and *  is 115 the undetermined coefficient of the regression part, which can be obtained by the optimal linear 116 unbiased estimation: 117 where R is the correlation matrix comprising the correlation coefficients of n sampling points: 120 (7) 121 The variance estimate value 2  is determined by: 122 The surrogate model can be established by solving the nonlinear unconstrained optimization 125 problem defined above. 126 Extreme learning machine 127 Huang et al. (2004) proposed the ELM method to improve backward propagation neural 128 networks in order to improve the low efficiency of learning, while also simplifying the settings for 129 the learning parameters. Compared with the backward propagation neural network algorithm, the 130 ELM method has advantages in terms of its rapid learning speed and good generalization feedforward neural network (Huang et al., 2015), as illustrated in Fig. 1. where ij  is the connection weight between the i -th neuron in the input layer and the j -th 149 neuron in the hidden layer, and jl  is the connection weight between the j -th neuron in the hidden layer and the l -th neuron in the input layer. 151 The input matrix X and output matrix Y comprise training data sets containing M 152 samples, as follows.
The objective when training a single hidden layer neural network is to minimize the error 160 between the output data and training data, which can be expressed as follows.
Thus, we calculate  ,  , and b while ensuring that Eq. (19) holds: The input weight and the hidden layer threshold are determined randomly before training and 166 they remain unchanged during the training process, so the hidden layer's output matrix H is 167 uniquely determined. Training a single hidden layer neural network can be transformed into connection weight matrix can be obtained by solving the least squares solution of Eq. (20): 170 where H + is the Moore-Penrose generalized inverse matrix of the hidden layer output matrix H. 175 The output corresponding to any input can be predicted after calculating the output layer 176 connection weight matrix. 177

Numerical applications
178

Site overview 179
A hypothetical contaminated site was considered as a case study in order to analyze the 180 application of the research methods for the identification of groundwater contamination sources. 181 The hypothetical contaminated site was designed to represent the conditions that may occur in 182 most actual problems, including the aquifer geometry, boundary conditions, initial conditions, In order to test the effectiveness of the identification method developed in this study, two 196 hypothetical cases are designed in the paper. The aquifer parameters of hypothetical contaminated 197 site were the same for the two cases, as shown in Table 1 The study area was discretely divided into 5262 grids. Since the divided grid is treated as an 200 integer variable, the location information of the potential contamination source area in the study 201 area is represented by grid numbers. The horizontal and longitudinal locations of the potential 202 contamination source areas and their corresponding grid numbers are shown in Table 2. 203 The real number, location, and release history of GCSs for the two cases are designed as 207 follows: 208 Case one: the number of GCSs was one, the location of the GCSs is as shown in Fig. 2a and  209 release intensities of the GCSs during two release periods are as shown in Table 3a. 210 Case two: the number of GCSs was two, the locations of the two GCSs are shown in Fig. 2b  211 and release intensities of the two GCSs during two release periods are as shown in Table 3b. 212  During the inverse identification of GCSs process, the number, location, and release history 220 of the GCSs for the two cases were regarded as unknown, what known is only the potential 221 contamination source area (shown in Fig.2a and Fig.2b). For hypothetical case one and case two it 222 is estimated that the maximum number of GCSs is 3. The number, location, and release intensity 223 of the GCSs were then identified based on optimization model.  The governing partial differential equation for the groundwater flow is as follows: 241 where  is the specific yield, H is the hydraulic head, ij K is the hydraulic conductivity,  243 is the simulated area range, and W is the volumetric flux per unit volume. 244 The governing partial differential equations for the contaminant transport are as follows: 245 where  is the simulated area range,  is the effective porosity of the aquifer medium, c is 248 the contamination concentration, ij D is the dispersion coefficient, i u is the average linear 249 velocity of the groundwater flow determined by Darcy's law, and R is the source or sink term. 250 After establishing the groundwater flow and contaminant transport numerical simulation 251 model, the GMS software was used to solve the simulation model. 252 In contrast to the actual problem, the hypothetical example had no actual data measurements. 253 Therefore, it was necessary to forward run the contaminant transport simulation model and obtain 254 the contaminant concentration data for the observation wells during each simulation period for use 255 as the data measurements during the identification process. Fig. 3 shows the contaminant plume observation well in each simulation period for the two contaminated cases.  70 ,18 55 1, 2,3 , 0 300 1, 2, , 6 ( ) ( ) where i  is the 0-1 integer variable representing whether the current location is GCS, 1 323 indicates that a real GCS is present in the current location, 0 indicates that no GCS is present in Root mean square error (RMSR);(c) Certainty coefficient (R 2 ). 353 The minimum certainty coefficients with Krig-SM of case one and case two were all above 0.99. The maximum and minimum certainty coefficients with ELM-SM were 0.9934 and 0.9255, 355 respectively (show in Fig. 5),which demonstrate that Krig-SM obtained a higher R 2 than ELM-SM. 356 Fig. 5 show that the RMSR and MRE values were smaller with Krig-SM than ELM-SM. Thus, 357 Krig-SM obtained a better approximation of the simulation model and it had higher accuracy. 358 Therefore, Krig-SM was embedded in the 0-1 MINLPOM for simultaneously identifying the 359 number, location, and release history of GCSs. 360

Analysis of GCSs identification results 361
Krig-SM is embedded in the 0-1 MINLPOM and the GA was used to solve the 0-1 362 MINLPOM to identify the number, location, and release history of GCSs for two cases. A detailed 363 introduction to GAs was provided by Guo et al. (2018) and Zwickl (2006). The parameter settings 364 for the GA are shown in Table 5. The identification results of GCSs for the two cases are shown in Table 6 and Fig.6. 370 Case one: The identification results for GCSs are shown in Fig.6a and Table 6a. A value "1" 371 (shown in Fig. 6a) in the identification results indicates that the number of GCSs in the study area 372 was one. The location and the release intensities of GCSs are shown in Table 6a. The 373 identification accuracy of the number and location was 100%, and the identification accuracy of 374 release history was above 93.59%.
Case two: The identification results for GCSs are shown in Fig.6b and Table 6b. Those two 376 values of "1" (shown in Fig. 6b) in the identification results indicate that the number of GCSs was 377 two. The location and the release intensities of the two GCSs are shown in Table 6b. The 378 identification accuracy of the number was 100%, and the identification accuracy of location and 379 release history was above 91.67% and 90.14%, respectively. 380   The identification results of GCSs for the two cases showed that the proposed method 390 performed well at simultaneously identifying the number, location, and release history of the

Discussion 393
Comparing with the 0-1 MINLPOM proposed by previous researchers, the improved 0-1 394 MINLPOM proposed in this study is more applicable to the identification of GCSs, and can 395 simultaneously identify the number, location, and release history of the GCSs. 396 In this study, the actual number of GCSs was less than the estimated maximum number of