Identification of contaminant source and hydraulic conductivity field based on an ILUES-SOM surrogate model

Contaminant source identification and hydraulic conductivity estimation are of great significance for contaminant transport modeling in subsurface media, but the actual values are difficult to obtain and usually have to be inversely estimated from sparse observations. Surrogate models are often used to reduce the computational cost of estimating groundwater model parameters. This study proposes a modified self-organizing map (SOM) based surrogate model, named ILUES-SOM. The proposed model combines a modified iterative ensemble smoother method (SGSIM-ILUES) and the SOM algorithm to simultaneously identify contaminant source parameters and the hydraulic conductivity field. The parameter estimation accuracy and computational efficiency of ILUES-SOM are compared with those of the original SOM and the SGSIM-ILUES inversion model. Moreover, the robustness of the ILUES-SOM model for inversion is illustrated by introducing varying degrees of observation error and by removing early observation data. The results indicate that the ILUES-SOM model can successfully retrieve the unknown contaminant source simultaneously with the heterogeneous hydraulic conductivity field in the groundwater system.


Introduction
In groundwater pollution identification, researchers analyze the characteristic pollutants of groundwater to judge and trace the source of the pollution. Source tracing requires an accurate and effective description of the site pollution source information and the hydrogeological parameters, which have a significant impact on the transport of pollutants (He et al. 2021). However, groundwater pollution is concealed and is typically discovered with a lag, and the number of groundwater monitoring wells is often inadequate, so it is often difficult to obtain pollution source information and hydrogeological parameters directly (Prakash and Datta 2013). At present, there are many theories and methods for groundwater pollution source identification. Field test methods, such as chemical technologies, biological/biochemical/biosorptive technologies and physico-chemical technologies (Mehrotra et al. 2021), are based on actual experimental data at contaminated sites and detect the current pollution situation. However, these methods generally take a lot of time and money, and it is difficult to trace the past emission history of the pollution source with them, which limits their applicability. Inversion based on the equations of mathematical physics can quantitatively identify pollution source information and recover the release history of pollution sources from a small amount of observation data and auxiliary information, which gives it great advantages and potential (Datta et al. 2009). This approach uses sparse observations and site prior information to identify the pollution source by solving the groundwater inverse problem, and thus to restore the migration and transformation process of pollutants in groundwater (Ayaz et al. 2021). Data assimilation is a popular method for solving the groundwater inverse problem.
Data assimilation methods combine dynamic data such as hydraulic head and site pollutant concentration in the groundwater flow and pollutant transport model to reduce the uncertainty of aquifer parameters. The updated parameters can improve prediction accuracy of groundwater numerical model (Bao et al. 2020).
Among available data assimilation algorithms, the ensemble Kalman filter (EnKF) (Evensen 1994) has been widely employed in hydrogeological research because of its outstanding performance (Kang et al. 2021; Li et al. 2012; Schöniger et al. 2012). van Leeuwen and Evensen (1996) proposed a variant of EnKF, the ensemble smoother (ES), and it was shown that ES can obtain results similar to those of EnKF at lower computational cost. The ES has been widely used in hydrogeology and reservoir studies (Bailey and Baù 2010; Bailey et al. 2012; Lima et al. 2020). However, when the system is highly nonlinear, an iterative ES (IES) is required, which is computationally expensive (Chen and Oliver 2012). Ju et al. (2018) improved the standard IES by proposing an iterative ensemble smoother algorithm based on Gaussian process (GPIES) for complex subsurface flow problems. Cao et al. (2018) combined IES with a multipoint geostatistical method (Direct Sampling, DS) to assimilate indirect dynamic data (e.g., hydraulic head) into non-Gaussian aquifers. To improve the applicability and efficiency of IES for strongly nonlinear problems, Zhang et al. proposed the iterative local updating ensemble smoother (ILUES), which extended IES to nonlinear complex hydrologic models. Several studies have shown that the ILUES algorithm greatly reduces the uncertainty of model parameters (Yang et al. 2020; Liu et al. 2021; Zhang et al. 2020).
For high-dimensional problems, a large ensemble size and iteration number are required to guarantee reliable estimation of unknown parameters in ILUES, leading to a huge computational burden. An effective way to improve computational efficiency is to use surrogate models (Asher et al. 2015).
In recent years, with the enhancement of computer performance, constructing surrogate models using machine learning (ML) has become increasingly popular (Chan and Elsheikh 2020; Tang et al. 2020, 2021; Zhong et al. 2019). Among them, Hazrati and Datta (2017a, b) adopted the self-organizing map (SOM) to construct a surrogate model. The model was used to identify the strength of a contaminant source whose location was known, in an aquifer with mild heterogeneity. Building on the research of Hazrati and Datta, Xia et al. (2019) further explored the effects and robustness of SOM-based surrogate models and identified pollution source parameters (location and release history) in a more realistic case, where the pollution source parameters were unknown and the heterogeneity was much stronger. Jiang et al. combined the dimensionality reduction idea of the pilot points method with the SOM algorithm to construct surrogate models for simultaneous identification of pollution sources and the hydraulic conductivity field.
As a data mining technology, the SOM algorithm highlights the nonlinear relationships in data by transforming the original high-dimensional data into lower dimensions (Penn 2005). This conversion is completed by calculating the main features of, and the correlations between, the input data, which effectively improves the data processing ability and the computational efficiency (Simula et al. 1998). The contaminant transport surrogate model constructed by the SOM algorithm not only replaces the complex original numerical model (groundwater flow and solute transport simulation model), but also has the ability to identify unknown model parameters. This means that the aforementioned inverse solution methods, such as data assimilation methods, are no longer needed, and a large computational cost is saved.
Among the studies that have used SOM algorithms for groundwater model parameter inversion, few have addressed the simultaneous inversion of pollution source parameters and the hydraulic conductivity field. Furthermore, for a machine learning method, the quality of the training data is one of the important factors determining the goodness of the SOM-based surrogate model. Data assimilation can effectively integrate the physical model and the observation data, and thus generates samples that both take the observed data into account and comply with the pollutant transport model. Does a SOM model trained on these posterior samples perform better? This study tackles the above challenges via a modified SOM, constructed using posterior samples from the ILUES algorithm.
The remainder of this paper is organized as follows. Section 2 presents the detailed description of groundwater flow and the contaminant transport model. Section 3 outlines the framework of the proposed methodology. In Sect. 4, results obtained from numerical experiments are discussed. Section 5 discusses different scenarios and analyzes the results. Some conclusions are given in Sect. 6.

Problem formulation
In this study, contaminant transport is described physically by the groundwater flow and solute transport equations. The governing equation of one-dimensional, steady-state groundwater flow can be expressed as (Harbaugh et al. 2000):
$$\nabla \cdot \left( K \nabla h \right) + W = 0 \tag{1}$$
where h [L] is the hydraulic head; K [L T$^{-1}$] denotes the hydraulic conductivity; W [T$^{-1}$] represents the volumetric flux per unit volume.
Equation (1) is solved using MODFLOW. The governing equation of solute transport model can be expressed as (Zheng and Wang 1999):
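For reference, the advection-dispersion equation solved by MT3DMS for a nonreactive solute is commonly written in the form below. The equation and its definitions did not survive in this text, so the symbols used here follow the standard MT3DMS notation and are an assumption rather than the authors' exact formulation.

```latex
\frac{\partial (\theta C)}{\partial t}
  = \frac{\partial}{\partial x_i}\left( \theta D_{ij} \frac{\partial C}{\partial x_j} \right)
  - \frac{\partial}{\partial x_i}\left( \theta v_i C \right)
  + q_s C_s
```

Here $\theta$ is the porosity, $C$ the dissolved concentration, $D_{ij}$ the hydrodynamic dispersion coefficient tensor, $v_i$ the seepage velocity, $q_s$ the volumetric flow rate per unit volume of aquifer representing sources and sinks, and $C_s$ the concentration of the source or sink flux.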

Methodology
The self-organizing map (SOM) is a clustering algorithm proposed by Kohonen (Kohonen 1982), which consists of an input layer and an output layer representing the grid topology. Each neuron in the input layer is connected to the neurons in the output layer by a weight vector. The principles of the SOM algorithm are as follows (Chaudhary et al. 2015). Firstly, the weight vector of each neuron in the output layer is initialized in a random or linear manner, and the learning rate and the type of the neighborhood function are set. The neighborhood function quantitatively describes the relationship between any neuron and its surrounding neurons. Secondly, the Euclidean distance [Eq. (3)] between the input training data and each weight vector is calculated. The neuron with the smallest Euclidean distance (the winner neuron) is activated together with the neurons in its topological neighborhood [Eq. (4)]. The topological neighborhood refers to the structure composed of the winner neuron and the activated neurons around it, as shown in Fig. 1. By repeating the above steps, the weight vectors of the neurons within the topological structure gradually approach the training data. During training, the adjustment amplitude $\Delta\omega_{ji}$ of the topological neighborhood decreases with time to avoid missing the optimal value. The iterative update stops when the learning rate decays to zero. Finally, the network (the output layer with the modified weight vectors) is obtained to represent the topological relationship between the training data, in which each neuron represents a cluster. The Euclidean distance (d), the topological neighborhood (T) and the increment of weight ($\Delta\omega_{ji}$) are mathematically described in Eqs. (3)-(5):
$$d_j(x) = \sqrt{\sum_{i=1}^{D} \left( x_i - \omega_{ji} \right)^2} \tag{3}$$
where D is the dimension of the input; $x_i$ is the input data; $\omega_{ji}$ is the weight vector between neuron j and the input data.
$$T_{j,I(x)} = \exp\left( -\frac{S_{j,I(x)}^2}{2\sigma^2} \right) \tag{4}$$
where T is the topological neighborhood; I(x) is the winner neuron; $S_{j,I(x)}$ is the distance between the winner neuron and neuron j; $\sigma$ is a coefficient that decays with time.
$$\Delta\omega_{ji} = \eta(t)\, T_{j,I(x)}(t)\, \left( x_i - \omega_{ji} \right) \tag{5}$$
where $\eta(t)$ is the learning rate, which decreases with time; $\Delta\omega_{ji}$ represents the update amount of the neuron weight during the training process, and the update amplitude of the neuron parameters in the topological neighborhood decreases with the training time.
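The training loop described by Eqs. (3)-(5) can be sketched in a few lines of NumPy. This is a minimal illustration of sequential SOM training, not the Imp SOM or batch SOM implementations used in the study; the function name `train_som`, the linear decay schedules and the default hyperparameters are assumptions chosen only for the example.

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), n_epochs=20, eta0=0.5, sigma0=3.0, seed=0):
    """Sequential SOM training on `data` (n_samples, dim); returns the codebook."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = grid_shape
    n_units = n_rows * n_cols
    # Grid coordinates of the output neurons, used for neighborhood distances.
    coords = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)], dtype=float)
    # Random initialization: each weight vector starts at a random training sample.
    weights = data[rng.integers(0, len(data), n_units)].astype(float)
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            eta = eta0 * (1.0 - frac)                  # learning rate decays toward zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighborhood width shrinks (assumed floor)
            d = np.linalg.norm(weights - x, axis=1)    # Eq. (3): distance to every neuron
            winner = np.argmin(d)                      # winner neuron I(x)
            s2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            T = np.exp(-s2 / (2.0 * sigma ** 2))       # Eq. (4): topological neighborhood
            weights += eta * T[:, None] * (x - weights)  # Eq. (5): weight increment
            step += 1
    return weights
```

Each row of the returned array is one neuron's weight vector, i.e. one entry of the map codebook.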
For the trained surrogate model, the information contained in the neurons is called the map codebook. When it is used for forward prediction or inverse source identification, the surrogate model determines the winner neuron by calculating the distance between each neuron and the input vector, and the data vector in the map codebook corresponding to the winner neuron is output. The flow chart of this study using the SOM surrogate model for prediction is shown in Fig. 2. A detailed explanation of the original SOM surrogate model can be found in previous research work (Xia et al. 2019).
Previous research (Xia et al. 2019) has indicated that the Imp SOM (imputation SOM) algorithm performs better than batch SOM in the construction of surrogate models for groundwater pollutant transport. Besides the training algorithm, the number of map units was the most important hyperparameter and was determined by trial and error. Furthermore, the quantity and quality of the training data were also important factors in determining the quality of the SOM-based surrogate model. Almost all previous research had taken the quantity of training data into account, but as far as we know, there was almost no research on the quality of training data in SOM-based surrogate models. In this paper, the uncertainty of the training data is regarded as the quality of the training data, and the impact of ILUES-updated training data on the SOM-based surrogate model is discussed.
Considering that ensemble-based data assimilation methods were the most widely used methods for groundwater inverse problems, and because of the similarity between the prior/posterior ensemble and the training data, an ensemble-based data assimilation method was adopted to improve the training data, and the surrogate model was then constructed on the basis of the posterior samples.
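To illustrate how an ensemble-based update turns a prior ensemble into posterior samples, the sketch below implements the plain ensemble smoother (ES) update with perturbed observations. It is only a simplified stand-in: it does not include the local updating scheme of ILUES or the SGSIM coupling of SGSIM-ILUES, and the function name `es_update` and its arguments are hypothetical. In practice the simulated observations would come from the groundwater flow and transport model.

```python
import numpy as np

def es_update(params, sims, obs, obs_std, seed=0):
    """One ensemble smoother update.

    params : (n_ens, n_par) prior parameter ensemble
    sims   : (n_ens, n_obs) model outputs simulated from each ensemble member
    obs    : (n_obs,) observed values
    obs_std: standard deviation of the observation error (assumed uncorrelated)
    """
    rng = np.random.default_rng(seed)
    n_ens = params.shape[0]
    dm = params - params.mean(axis=0)          # parameter anomalies
    dd = sims - sims.mean(axis=0)              # simulated-observation anomalies
    c_md = dm.T @ dd / (n_ens - 1)             # cross-covariance (n_par, n_obs)
    c_dd = dd.T @ dd / (n_ens - 1)             # output covariance (n_obs, n_obs)
    r = np.eye(obs.shape[0]) * obs_std ** 2    # observation error covariance
    # Each member assimilates its own noisy copy of the observations.
    obs_pert = obs + rng.normal(0.0, obs_std, size=sims.shape)
    innovation = obs_pert - sims               # (n_ens, n_obs)
    # params + innovation @ (C_dd + R)^-1 @ C_md^T, without forming the inverse.
    return params + innovation @ np.linalg.solve(c_dd + r, c_md.T)
```

The rows of the returned array play the role of the posterior samples that would then be paired with their simulated outputs to form training data.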
The main procedures for constructing the modified SOM based surrogate model (ILUES-SOM) for the solute transport model are as follows.
Step 1 Generation of training and validation data. A large amount of training data is needed to obtain an accurate surrogate model. In this study, the training data for the SOM-based surrogate model consists of the inputs and outputs of the original groundwater numerical model (i.e., the MODFLOW and the MT3DMS models). The input is the pollution source parameter and the hydraulic conductivities at pilot points, where the pollution source parameter includes the location of the pollution source and the release concentration of pollutants in each stress period. The output is the pollutant concentration at the observation points.
A modified iterative ensemble smoother (SGSIM-ILUES) proposed by Jiang et al. (2022) was adopted as the inversion framework to improve the training data. The SGSIM-ILUES method is based on the coupling of ILUES and sequential Gaussian simulation (SGSIM) from geostatistics. Specifically, the inversion of the hydraulic conductivity field was converted to the estimation of hydraulic conductivity at pilot points. The posterior samples from the ILUES algorithm (1 iteration) were used as the training data. In order to evaluate the accuracy of the surrogate model obtained, the same steps were used to generate the validation data, and the validation data was fixed at 500 groups in this study. A detailed explanation of the SGSIM-ILUES method can be found in our previous study (Jiang et al. 2022).
Step 2 Training of the SOM based surrogate model. As mentioned, the codebook size (units) and the training data size (TDS) have a significant impact on the performance of the SOM-based surrogate model. Therefore, the number of units is set to 100, 500, 1000, 1500, 2000, 2500 and 3000, and the training data size is set to 500, 1000 and 2000. After training multiple surrogate models, the validation data is used to evaluate the accuracy of each candidate surrogate model, and the optimal surrogate model is selected.
Step 3 Using the surrogate model to identify unknown values. Finally, the constructed ILUES-SOM based surrogate model can be used to estimate the missing components. Using the known observed values, the best-matching unit (BMU) in the optimal ILUES-SOM based surrogate model is found, from which the estimated values of the groundwater model parameters can be retrieved. Then, the geostatistical method (i.e., SGSIM) is used to obtain the estimated hydraulic conductivity field, so as to complete the inversion of the pollution source parameters and the hydraulic conductivity field.
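The BMU lookup in Step 3 can be sketched as follows: the distance to each codebook vector is computed over the known components only, and the missing components are read from the winning neuron. The function name `impute_with_bmu` and the convention of marking missing entries with NaN are assumptions made for this illustration.

```python
import numpy as np

def impute_with_bmu(codebook, vector):
    """Fill the NaN entries of `vector` from the best-matching unit.

    codebook: (n_units, dim) trained SOM weight vectors
    vector  : (dim,) input with np.nan marking the unknown components
    """
    known = ~np.isnan(vector)
    # Distance is measured on the known components only.
    d = np.linalg.norm(codebook[:, known] - vector[known], axis=1)
    bmu = int(np.argmin(d))
    filled = vector.copy()
    filled[~known] = codebook[bmu, ~known]   # read unknowns from the BMU
    return filled, bmu
```

For inverse identification, the observation entries would be the known part and the source parameters and pilot-point conductivities the NaN part; for forward prediction the roles are swapped.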

A hypothetical aquifer site
A hypothetical case was designed in this study to evaluate the proposed methodology. The computational domain consists of a 2D confined aquifer that is 40 m in the x-direction and 20 m in the y-direction. The flow and transport follow Eqs. (1) and (2); the flow is at steady state and the transport is dominated by advection and dispersion. The top and bottom boundaries were no-flux boundaries, whereas the left and right boundaries were specified-head boundaries with constant hydraulic heads of 12 and 10 m, respectively. The known model parameters are listed in Table 1. For the MODFLOW and MT3DMS calculations, the domain was discretized into 0.5 × 0.5 m grid cells. The hydraulic conductivity was heterogeneous and isotropic, and was lognormally distributed with a mean of 4.0 and a variation of 0.5. The correlation lengths in the x-direction and y-direction were 8.0 and 4.0 m, respectively. The reference log-conductivity field is depicted in Fig. 3a.
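A correlated log-conductivity field with these statistics can be generated, for illustration, by Cholesky factorization of a covariance matrix. The sketch below is not SGSIM and is not the authors' generator: it assumes an exponential covariance model, interprets the stated mean (4.0) and variation (0.5) as the mean and variance of the log-conductivity, and uses a coarser 2 m grid by default to keep the covariance matrix small. The function name `log_k_field` is hypothetical.

```python
import numpy as np

def log_k_field(nx=20, ny=10, dx=2.0, mean=4.0, var=0.5, lx=8.0, ly=4.0, seed=0):
    """Unconditional Gaussian log-conductivity realization on an (ny, nx) grid."""
    rng = np.random.default_rng(seed)
    xs = (np.arange(nx) + 0.5) * dx            # cell-center x coordinates
    ys = (np.arange(ny) + 0.5) * dx            # cell-center y coordinates
    gx, gy = np.meshgrid(xs, ys)               # both have shape (ny, nx)
    px, py = gx.ravel(), gy.ravel()
    # Separation scaled by the correlation length in each direction.
    h = np.sqrt(((px[:, None] - px[None, :]) / lx) ** 2
                + ((py[:, None] - py[None, :]) / ly) ** 2)
    cov = var * np.exp(-h)                     # assumed exponential covariance
    # Small jitter on the diagonal for numerical stability of the factorization.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(cov.shape[0]))
    field = mean + chol @ rng.standard_normal(cov.shape[0])
    return field.reshape(ny, nx)
```

Calling `log_k_field(nx=80, ny=40, dx=0.5)` would match the 0.5 m discretization of the 40 m by 20 m domain, at the cost of factorizing a 3200 by 3200 matrix.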
In this hypothetical aquifer, there is a point source releasing a nonreactive contaminant (asterisk in Fig. 3a). The contaminant source was located in the red dotted area (Fig. 3a), and the release strength varied with time. The source was characterized by 10 parameters, i.e., $S_x$, $S_y$ and $SP_i$ [M T$^{-1}$], as listed in Table 2. Based on the investigation of the contaminated groundwater site, the prior distributions of the 10 source parameters were assumed to be uniform and bounded by the ranges listed in Table 2.
To infer the unknown contaminant source parameters and hydraulic conductivity field, hydraulic heads and transient concentrations at t = 4, 8, 12, ..., 40 days were collected at 20 observation wells, whose locations are shown in Fig. 3a. These observation data were generated by running the contaminant transport model with the reference K-field (Fig. 3a) and the actual contaminant source parameters (the last column in Table 2). Random errors that followed a normal distribution were added to the hydraulic heads and concentrations, respectively. In this study, the hydraulic conductivity field was interpolated from the pilot points, and the number of pilot points was fixed at 80 (Fig. 3b). Therefore, 10 unknown contaminant source parameters and the hydraulic conductivities at 80 pilot points need to be estimated.

Assessment criteria
In this study, the pollution source parameters (SS), hydraulic conductivities at pilot points (KPP) and observation values (OBS) in the validation data were regarded as missing parts, and these missing values were estimated by the SOM-based surrogate model. The normalized absolute error of estimation (NAEE) and the root mean square error (RMSE) were used as assessment criteria. The NAEE quantitatively characterizes the degree of deviation between the estimated value and the actual value, and is defined as follows:
$$\mathrm{NAEE} = \frac{\sum_{i=1}^{N} \left| d_{est}^{i} - d_{act}^{i} \right|}{\sum_{i=1}^{N} d_{act}^{i}} \times 100\%$$
The RMSE measures the degree of match between the estimated value and the actual value, and is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_{est}^{i} - d_{act}^{i} \right)^2}$$
where $d_{est}$ is the estimated value, $d_{act}$ is the actual value, and N is the number of values.
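Both criteria are straightforward to compute. The sketch below assumes the usual form of the NAEE, namely the sum of absolute errors normalized by the sum of actual values and expressed in percent; the function names `naee` and `rmse` are illustrative.

```python
import numpy as np

def naee(d_est, d_act):
    """Normalized absolute error of estimation, in percent (assumed form)."""
    d_est = np.asarray(d_est, dtype=float)
    d_act = np.asarray(d_act, dtype=float)
    return 100.0 * np.sum(np.abs(d_est - d_act)) / np.sum(d_act)

def rmse(d_est, d_act):
    """Root mean square error between estimated and actual values."""
    d_est = np.asarray(d_est, dtype=float)
    d_act = np.asarray(d_act, dtype=float)
    return float(np.sqrt(np.mean((d_est - d_act) ** 2)))
```

For estimates (1.1, 1.9) against actual values (1, 2), the absolute errors sum to 0.2 and the actual values to 3, so the NAEE is about 6.67% and the RMSE is 0.1.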
In order to better measure the overall performance of the model, a comprehensive skill score index was adopted, following Ma et al. (2013, 2021). The index takes into account both model accuracy and model training efficiency. A small value of the index indicates good performance. This indicator is defined as follows: where M (= 1, 2, 3) represents the three models discussed; TC represents the computation time of the model. In this study, NAEE_SS, NAEE_KPP and NAEE_OBS represent the NAEE of the pollution source parameters, the hydraulic conductivities at pilot points and the observation values, respectively. The surrogate model with the highest estimation accuracy (lowest NAEE) for both the pollution source parameters and the hydraulic conductivities at pilot points was selected as the optimal surrogate model.

Results and discussion
Based on the proposed methodology (in Sect. 3), the simulations and the results can be grouped into three subsections.
(1) Firstly, the original SOM based surrogate model (S1) and the ILUES-SOM based surrogate model (S2) were constructed using randomly generated training samples and posterior samples from SGSIM-ILUES algorithm (1 iteration), respectively, and the optimal sizes of the codebook and training sample for SOM surrogate models were obtained.
(2) Then, the unknown contaminant source and hydraulic conductivities at pilot points were identified based on surrogate models S1 and S2. The better surrogate model was determined by evaluating S1 and S2.
(3) Finally, the performance of the selected surrogate model was further evaluated in the presence of observation error and in the absence of observational data, respectively.

SOM based surrogate model
The first stage was to obtain the optimal codebook size and training sample size for the two SOM based surrogate models (original SOM and ILUES-SOM), because these sizes greatly affect the accuracy and efficiency of the SOM based surrogate models. Specifically, the training data size was set to 500, 1000 and 2000 groups, and the codebook size of the SOM was set to 100, 500, 1000, 1500, 2000, 2500 and 3000. The corresponding original SOM and ILUES-SOM based surrogate models were constructed.
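The search over the 21 combinations of codebook size and training data size amounts to a small grid search. The sketch below assumes that each candidate can be summarized by a single validation error returned by a user-supplied `evaluate(units, tds)` callable; that callable, and the name `select_best`, are hypothetical, and the study itself weighs the NAEE of SS and KPP together with efficiency rather than a single scalar.

```python
from itertools import product

UNITS = [100, 500, 1000, 1500, 2000, 2500, 3000]   # codebook sizes tested
TDS = [500, 1000, 2000]                            # training data sizes tested

def select_best(units_grid, tds_grid, evaluate):
    """Score every (units, tds) pair and return the pair with the lowest error.

    evaluate: callable (units, tds) -> validation error, lower is better.
    """
    scores = {(u, t): evaluate(u, t) for u, t in product(units_grid, tds_grid)}
    best = min(scores, key=scores.get)
    return best, scores
```

In use, `evaluate` would train a surrogate with the given settings and return its error on the 500 validation groups.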

Original SOM based surrogate model
The verification results (NAEE_OBS, NAEE_SS, NAEE_KPP) of the SOM surrogate models with different parameter combinations (training data size, TDS; codebook size, Units) are compared in Fig. 4. The NAEE of the estimated OBS, SS and KPP clearly decreased with Units, i.e., precision increased with increasing Units (Fig. 4a, b and c). The computational time of the SOM based surrogate model (Fig. 4d) was much smaller than the computational time of the physically based model (i.e., training data generation). Considering the above-mentioned accuracy and efficiency, the optimal surrogate model is the codebook trained with the combination of TDS = 500 and Units = 3000.

ILUES-SOM based surrogate model
Because improving the training data can help to improve the quality of the surrogate model, posterior samples from the data assimilation algorithm were adopted as training data for the SOM based surrogate model. Because ensemble-based data assimilation methods such as the ILUES algorithm usually carry a higher computational burden, which increases nearly linearly with the iteration number, only one SGSIM-ILUES iteration was performed in the proposed ILUES-SOM surrogate model.
The verification results (NAEE_OBS, NAEE_SS, NAEE_KPP) of the trained ILUES-SOM based surrogate models are shown in Fig. 5. In comparison with Fig. 4, NAEE_OBS, NAEE_SS and NAEE_KPP were significantly improved, which indicated that the ILUES-SOM surrogate model was superior to the original SOM model. There were no significant variations of NAEE_OBS, NAEE_SS and NAEE_KPP with Units. As the validation results in Fig. 5 show, when the training data size was 500 groups and the codebook size was 3000, the surrogate model had the highest inversion accuracy for the hydraulic conductivities at pilot points, and the inversion accuracy for the pollution source parameters was also high. Following the selection principles (in Sect. 4.2), the optimal surrogate model is the codebook trained with the combination of TDS = 500 and Units = 3000.

Application of the constructed SOM based surrogate model
The unknown contaminant source and hydraulic conductivities at pilot points were identified based on surrogate models S1 and S2. The inversion results for the hydraulic conductivity field from the three models (O, S1, S2) are shown in Fig. 6. Compared with the reference log-transformed conductivity field, the SGSIM-ILUES inversion model (model O) gave the best inversion result, and the two SOM based surrogate models had relatively large errors in characterizing low conductivity areas. On closer comparison, the morphology of the K-field depicted by model S2 was slightly better than that depicted by model S1.
To test the accuracy of the SOM based surrogate models, the contaminant source information identified by the different inversion models was compared with the actual values. It can be seen in Fig. 7 that the estimation precision of ILUES-SOM (S2) was fairly high for both the contaminant source location and the source fluxes, and was close to that of SGSIM-ILUES (O). A further comparison of models S1 and S2 reveals that model S2 was more accurate except for the source flux at SP1. Figure 8 shows the estimated values and estimation deviations of models S1 and S2. The deviation bars of model S1 were clearly much larger than those of model S2, indicating that ILUES-SOM (S2) was the better model for estimating the unknown contaminant source.
After the comparative analysis of the inversion performance of the two SOM surrogate models in terms of the K-field and the unknown contaminant source (SS), the performance of the two SOM models (S1, S2) and the ILUES inversion model (O) was summarized using the RMSE and skill score criteria (Table 3). Specifically, for estimating SS and KPP, the inversion accuracy of model S2 was closer to that of the ILUES inversion model, and clearly superior to that of model S1. As can be seen from Table 3, model S1, constructed from the original data, had the largest improvement in computational efficiency compared with model O, at 97% (i.e., the computational time was reduced from 43.85 to 0.90 h), but there were significant deviations in the inversion accuracy of the parameters (SS and KPP). Meanwhile, the computational efficiency improvement of model S2 was somewhat lower than that of model S1, reaching 86% (from 43.85 to 6.09 h), and its inversion accuracy for the unknown model parameters (SS and KPP) was closer to that of the ILUES inversion model.
Considering the skill scores of the two SOM-based surrogate models, model S2 (the ILUES-SOM model) was the better choice (its skill score was the smallest, only about half of that of model S1), as it not only ensured the parameter inversion accuracy but also significantly improved the computational efficiency.

Further discussion
As can be seen from the results of 5.1 and 5.2, model S2 not only showed better accuracy in the validation stage of the surrogate model, but also performed better in the parameter inversion stage. Practical groundwater systems are affected by many factors, and the uncertainty of site information is mainly caused by incomplete observation data. In order to account for practical situations, two scenarios were designed to further analyze the performance of the optimal ILUES-SOM surrogate model. In scenario 1, varying degrees of observation error were introduced to test the robustness of the proposed ILUES-SOM surrogate model. In scenario 2, early observation data were missing, which complicates the identification process.

Scenario 1
In this scenario, varying degrees of observation error were introduced to test the robustness of the proposed ILUES-SOM model. The erroneous observations were generated by adding different degrees of random error to the numerically simulated concentrations (C) at the observation wells.
It is assumed that the random errors at the observation wells follow a normal distribution with zero mean and a standard deviation of 1:
$$C' = C + \alpha\, \varepsilon\, C$$
where $C'$ is the perturbed value; $\varepsilon$ is a normally distributed random value; $\alpha$ is the error level of the observation. Three error levels, 5, 10 and 15%, were chosen to estimate the effects of errors on parameter inversion. The inversion results for the hydraulic conductivity field with varying degrees of noise in the contaminant concentrations are shown in Fig. 9. It can be seen that the estimated K-fields under the different error levels (error-free, 5, 10, 15%) were stable and were only slightly affected by the error level (up to 15%).
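The perturbation of the simulated concentrations can be sketched as below. The multiplicative form, in which the noise is proportional to the simulated concentration (C' = C + αεC with ε drawn from a standard normal distribution), is an assumption consistent with the symbol definitions above; the function name `perturb` is illustrative.

```python
import numpy as np

def perturb(conc, alpha, seed=0):
    """Add relative Gaussian noise of level `alpha` to simulated concentrations."""
    rng = np.random.default_rng(seed)
    conc = np.asarray(conc, dtype=float)
    eps = rng.standard_normal(conc.shape)   # zero mean, unit standard deviation
    return conc * (1.0 + alpha * eps)       # C' = C + alpha * eps * C
```

The three error levels of this scenario correspond to `alpha` values of 0.05, 0.10 and 0.15.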
The identification results of unknown contaminant source by the optimal ILUES-SOM model were shown in Table 4. The inversion results were not significantly affected when the error level was below 15%. Specifically, for estimating SS and KPP, the RMSE(SS) and RMSE(KPP) for the cases with different observation noise were almost stable and only slightly larger than that of the error-free case (Table 4). Therefore, the proposed ILUES-SOM based surrogate model was able to handle varying degrees of observation errors for identifying unknown contaminant source and estimating K-field. The identification error was stable when the observation errors range from 5 to 15%.

Scenario 2
In this section, the observation data for the first three observation times (t = 4, 8, 12 days) were treated as missing, and the observation error level was set to 5%. The inversion result for the hydraulic conductivity field under incomplete observation data is shown in Fig. 10. In comparison with the reference K-field (Fig. 10a), the major low and high conductivity zones were effectively captured by the ILUES-SOM model (Fig. 10b).
The identification results for the unknown contaminant source under incomplete observation data are shown in the last column of Table 4 and in Fig. 11. It can be seen that the estimated values of the contaminant source did not change significantly except for the source flux in the 5th stress period. For estimating SS and KPP, the RMSE(SS) and
Fig. 8 Error comparison of estimated source fluxes by optimal surrogate models of S1 and S2

Conclusions
(1) The proposed ILUES-SOM surrogate model was constructed for simultaneous identification of the hydraulic conductivity field and contaminant source parameters by combining SGSIM-ILUES and SOM. Note that in the inverse model, the estimation of the hydraulic conductivity field was converted to an estimation of hydraulic conductivity at pilot points.
(2) Considering the estimation accuracy and computational efficiency of the two SOM-based surrogate models, the ILUES-SOM model was the better choice, as it ensures the parameter inversion accuracy and significantly improves the computational efficiency. This indicated that improving the quality of the training data can efficiently improve the performance of the SOM-based surrogate model.
(3) In terms of parameter inversion accuracy, the ILUES-SOM model (1 iteration) was close to the ILUES inverse model, but with significantly lower computational cost. In other words, the ILUES-SOM model has the advantage of fast inversion with the SOM based surrogate model, without sacrificing inversion accuracy compared with the multi-iteration ILUES inversion.
(4) The proposed ILUES-SOM surrogate model for contaminant transport showed remarkable robustness. Varying degrees of observation error were added to the limited observation data. When the error level was under 15%, the estimation performance remained satisfactory and stable. Even though early observation data were missing, the estimation precision for the contaminant source was almost the same as that with full observation data, and the estimated K-field could still depict the morphological characteristics of the reference conductivity field.
Fig. 9 a The reference K-field; b-d The corresponding interpolated K-field based on estimated KPP of data with 5, 10 and 15% error
Fig. 10 a The reference K-field; b The corresponding interpolated K-field based on estimated KPP of missing data
Fig. 11 Comparison of estimated SS by optimal surrogate models of S2
writing-review and editing, supervision, fund acquisition. All authors reviewed the manuscript.
Funding The authors have not disclosed any funding

Declarations
Conflict of interest The authors have not disclosed any conflict of interest.