Time-Series Uncertainty Quantification of Foundation Settlement with Kernel Based Extreme Learning Machine

Abstract: Dynamic building foundation settlement threatens urban business and residential communities. In the temporal domain, building foundation settlement is often highly dynamic and requires real-time monitoring. Accurately quantifying the uncertainty of foundation settlement in the near future is essential for proactive risk management of buildings. Traditional models for predicting foundation settlement mostly use point estimation, which provides a single value that may be close to or far from the actual one. Such estimates do not quantify the uncertainty of the prediction. Interval prediction, as an alternative, provides a prediction interval for the ground settlement at a high confidence level. In this paper, a lower upper bound estimation (LUBE) approach integrated with a kernel-based extreme learning machine (KELM) is proposed to predict ground settlement levels with prediction intervals in the temporal domain. Comparisons with the artificial neural network (ANN) and the classical extreme learning machine (ELM) are conducted in this study. Building settlement data collected from Fuxing City, Liaoning Province, China are used to validate the proposed approach. Comparative results show that the proposed approach constructs higher-quality prediction intervals for future foundation settlement.


Ground settlement is a common geological phenomenon that threatens local communities. The major factor that induces ground settlement is soil liquefaction, which softens the ground and causes buildings to settle more than the soil in the free field. As a consequence, the shear stresses and contact pressure imposed by buildings change due to the soil softening and affect building settlement levels (Karimi et al.

Predicting future settlement matters to the risk management of potential structural damage to buildings. Hence, it is necessary to conduct research on predicting foundation settlement in the temporal domain.

crossing the interface of water-bearing mixed ground. In general, such approaches can be successfully applied to case-specific geological conditions. However, ground subsidence is a complex system with heterogeneous geological and geo-mechanical characteristics. Hence, a more comprehensive approach is needed to model and predict ground settlement, one that can be transferred to a variety of cases with heterogeneous conditions.

In contrast, machine learning algorithms have demonstrated their effectiveness and accuracy in modeling

Based on the discussion above, a data-driven approach using a kernel-based extreme learning machine integrated with lower upper bound estimation is proposed in this study. First, an interval prediction framework is applied to the building foundation settlement study, and a lower upper bound estimation (LUBE) method is adopted.

Second, the kernel-based extreme learning machine (KELM) is introduced in this study, and the selection of kernels is optimized with a cross-validation experiment. Comparative analysis is also performed against state-of-the-art approaches such as artificial neural networks and the classical extreme learning machine. The computational results show that the proposed approach is feasible and outperforms the compared approaches in building foundation settlement prediction.

The main contributions of this paper are as follows:

Second, it utilizes a kernel-based extreme learning machine (KELM) to enhance the predictive performance for future foundation settlement. Comparative analysis across multiple kernels is conducted to select the optimal kernel for the case studies.

The remainder of this paper is organized as follows. Section 2 introduces the data collection process and defines the underlying problem mathematically. Section 3 provides a detailed description of the methodologies used in this study. Section 4 studies and compares the models' performance on geological data collected from the monitoring sites. Finally, Section 5 concludes this study.

In the engineering geology community, engineers construct physics-based models to compute and forecast the upcoming foundation settlement based on the gravitational load of the building above. However, in practice, there is always a difference between the theoretical settlement curve and the actual settlement curve. As illustrated in Figure 2, the actual settlement monitoring and forecasting is a post-hoc analysis that can benefit the risk management process of the building structure.

Data collection
The dataset was collected and provided through collaboration with engineering geology experts from the School of Geomatics, Liaoning Technical University, which is located in the case study area. They have spent years monitoring multiple buildings in the downtown and suburbs of the local area.

The diagram that illustrates the collection of the time-series settlement has been presented in

The dataset contains daily monitored foundation settlement from January 2013 to April 2013. We select the point with the largest cumulative settlement for each case study building. In total, 120 time-series observations are obtained for each building. The basic information of the dataset is provided in Table 1.

Problem formulation
The main objective of this research is to develop a data-driven framework to predict the interval of possible values of foundation settlement in the near future. For each case study, the foundation settlement is monitored on a daily basis, and the target is to predict the upcoming daily settlement value. Hence, the underlying problem

Auto-correlation Analysis
Daily foundation settlement is time-series data. In many cases, the daily settlement reflects strong statistical patterns, including seasonality and auto-correlation (Zhou et al. 2018).

Identifying such patterns is essential to the construction of time-series prediction models because it determines the optimal input size. Here, two fundamental statistical indexes are adopted to discover the auto-correlation patterns: the auto-correlation function (ACF) and the partial auto-correlation function (PACF).
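As an illustration of the two indexes, the following minimal sketch computes the sample ACF and derives the PACF from the Yule-Walker equations using NumPy only. The function names and the synthetic AR(1) series are assumptions made for demonstration; they are not the settlement data or the implementation used in this study.

```python
import numpy as np

def acf(series, max_lag):
    """Sample auto-correlation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def pacf(series, max_lag):
    """Sample partial auto-correlation via the Yule-Walker equations."""
    r = acf(series, max_lag)
    out = np.ones(max_lag + 1)
    for k in range(1, max_lag + 1):
        # Toeplitz matrix of auto-correlations r[|i - j|] for i, j < k
        idx = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
        phi = np.linalg.solve(r[idx], r[1 : k + 1])
        out[k] = phi[-1]  # last AR(k) coefficient is the lag-k PACF
    return out

# Synthetic AR(1) series (hypothetical data, coefficient 0.7)
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
demo = np.zeros(5000)
for t in range(1, 5000):
    demo[t] = 0.7 * demo[t - 1] + noise[t]

demo_acf = acf(demo, 5)
demo_pacf = pacf(demo, 5)
```

For an AR(1) process the ACF decays gradually while the PACF is close to zero beyond lag 1, which is the kind of pattern used to decide how many lagged values to feed into a prediction model.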

The ACF measures the Pearson's correlation coefficient between the current settlement and its k-lagged

To obtain the optimal solution for the ELM, the least-squares solution can be computed by (8) as follows:
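The least-squares step can be sketched in code under the standard ELM formulation: the hidden-layer weights are drawn at random and kept fixed, and the output weights are obtained from the Moore-Penrose pseudo-inverse of the hidden-layer output matrix. The names `elm_fit`, `elm_predict`, `W`, `b`, `H`, and `beta`, the sigmoid activation, and the synthetic data are assumptions for illustration and may differ from the notation of equation (8).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden, rng):
    """Random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = sigmoid(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Hypothetical demonstration data
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(60, 3))
T_demo = np.sin(X_demo.sum(axis=1))
W, b, beta = elm_fit(X_demo, T_demo, n_hidden=15, rng=rng)
H_demo = sigmoid(X_demo @ W + b)
fitted = elm_predict(X_demo, W, b, beta)
```

Because only `beta` is solved for, training reduces to a single linear-algebra step, which is why the ELM is typically much faster to train than a back-propagated network.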

In addition to the classical ELM, due to the unknown/unspecified feature mapping, we can hardly calculate the

In this study, we study the effectiveness of two popular kernels, i.e., the Gaussian kernel (12) and polynomial
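A minimal sketch of a kernel ELM with the two kernels is given below, assuming the widely used closed form in which the coefficients solve (I/C + Omega) alpha = T, with Omega the training kernel matrix and C a regularization constant. The kernel parameterizations (`gamma`, `degree`, `coef0`), the function names, and the toy data are assumptions; the exact forms of the paper's kernel equations are not reproduced here.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def polynomial_kernel(A, B, degree=2, coef0=1.0):
    """K[i, j] = (A[i] . B[j] + coef0) ** degree."""
    return (A @ B.T + coef0) ** degree

def kelm_fit(X, T, kernel, C=1.0):
    """Solve (I / C + Omega) alpha = T, where Omega = kernel(X, X)."""
    omega = kernel(X, X)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kelm_predict(X_new, X_train, alpha, kernel):
    return kernel(X_new, X_train) @ alpha

def demo_kernel(A, B):
    return gaussian_kernel(A, B, gamma=50.0)

# Hypothetical toy data: eight points on one period of a sine curve
X_toy = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
T_toy = np.sin(2.0 * np.pi * X_toy[:, 0])
alpha = kelm_fit(X_toy, T_toy, demo_kernel, C=1e6)
fitted_toy = kelm_predict(X_toy, X_toy, alpha, demo_kernel)
poly_value = polynomial_kernel(np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]]))[0, 0]
```

With a large `C` the model nearly interpolates the training points; smaller values trade training fit for smoothness, which is why `C` and the kernel parameters are natural candidates for tuning.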

Prediction Interval Formulation with LUBE Method
Prediction intervals (PIs) are widely used for the quantification of uncertainty in the prediction models. Give

In this study, the lower upper bound estimation (LUBE) method is adopted to customize the KELM model presented in Section 3.2. The PIs are constructed as outputs of the KELM algorithm. As indicated in Figure
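In the LUBE literature, models that output interval bounds are commonly trained by minimizing a coverage width-based criterion (CWC), which combines the PI coverage probability (PICP) and the PI normalized average width (PINAW). The sketch below shows that commonly cited form; whether this study uses exactly this objective is an assumption, and the nominal coverage `mu` and penalty strength `eta` are illustrative values.

```python
import math

def coverage_width_criterion(picp, pinaw, mu=0.95, eta=50.0):
    """Interval width, penalized exponentially when coverage falls below mu."""
    gamma = 0.0 if picp >= mu else 1.0  # penalty is active only for under-coverage
    return pinaw * (1.0 + gamma * math.exp(-eta * (picp - mu)))

cwc_ok = coverage_width_criterion(0.96, 0.20)   # coverage met: reduces to PINAW
cwc_low = coverage_width_criterion(0.90, 0.20)  # coverage missed: heavily penalized
```

The criterion rewards narrow intervals only once the nominal coverage is reached, so an optimizer cannot shrink the intervals at the expense of coverage.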

To predict the settlement, a sequential prediction strategy is adopted in this research to predict the periodic foundation settlement, as described in Figure 6.
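One common reading of a sequential strategy is walk-forward, one-step-ahead prediction: at each test day the model sees only earlier observations, predicts the next value, and then moves forward by one day. The sketch below illustrates that reading. Whether it matches Figure 6 in every detail (for example, refitting at every step) is an assumption, and the linear stand-in model, function names, and synthetic series are hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Rows of n_lags consecutive values as inputs, the following value as target."""
    s = np.asarray(series, dtype=float)
    X = np.array([s[i : i + n_lags] for i in range(len(s) - n_lags)])
    y = s[n_lags:]
    return X, y

def walk_forward(series, n_train, n_lags, fit, predict):
    """One-step-ahead forecasts over the test span, refitting as new data arrive."""
    s = np.asarray(series, dtype=float)
    preds = []
    for t in range(n_train, len(s)):
        X, y = make_lagged(s[:t], n_lags)  # only data observed before day t
        model = fit(X, y)
        preds.append(predict(model, s[t - n_lags : t].reshape(1, -1))[0])
    return np.array(preds)

def linear_fit(X, y):
    """Stand-in model: ordinary least squares with an intercept."""
    return np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)[0]

def linear_predict(coef, X):
    return np.c_[X, np.ones(len(X))] @ coef

demo_series = 0.5 * np.arange(20.0)  # hypothetical settlement-like linear trend
demo_preds = walk_forward(demo_series, n_train=15, n_lags=2,
                          fit=linear_fit, predict=linear_predict)
```

Any model exposing the same `fit` and `predict` interface, including an interval model that returns lower and upper bounds, could be substituted for the linear stand-in.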

The most essential element of an ANN is the neuron. With multiple neurons stacked in the hidden layers, a non-linear mapping between the input features and the output can be expressed as (21):

where xj represents the j-th input feature; wj is the weight associated with the j-th input; b is the bias and is the activation function. In this paper, the ANN is also customized with the LUBE method, and the sigmoid activation function is used.
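The neuron described above can be illustrated with a short sketch: a weighted sum of the inputs plus a bias, passed through a sigmoid activation. The two-unit linear output layer, intended to be read as lower and upper bounds after LUBE training, is an assumption about the architecture, and all names and random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    return sigmoid(np.dot(w, x) + b)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer of sigmoid neurons followed by a linear output layer."""
    hidden = sigmoid(W1 @ x + b1)
    return W2 @ hidden + b2

rng = np.random.default_rng(0)
x_demo = rng.normal(size=3)                        # three lagged inputs (hypothetical)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
out = mlp_forward(x_demo, W1, b1, W2, b2)          # two outputs; untrained weights
half = neuron(np.zeros(3), np.ones(3), 0.0)        # sigmoid(0) = 0.5
```

With untrained weights the two outputs are not ordered; it is the training objective that pushes them toward valid lower and upper bounds.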

Auto-correlation Analysis
Selecting the optimal input feature set matters to the performance of data-driven models. Inspired by the ARIMA model, the ACF and PACF are computed between the current settlement and its k-lagged historic settlement to investigate the auto-correlation and seasonality within the dataset. The combination of the ACF and PACF results determines the final input feature sets.

As illustrated in Figure 7, the ACFs and PACFs are computed for the four time-series datasets of the four case study buildings listed in Table 1

Hyper-parameter Optimization
After the selection of the optimal input series, three algorithms, namely the artificial neural network (ANN), the classical extreme learning machine (ELM), and the kernel extreme learning machine (KELM), are selected for training and validating the time-series prediction model. To ensure that the models achieve their best prediction performance, tuning the hyper-parameters is an essential step in the process.
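A generic way to tune hyper-parameters is an exhaustive grid search scored by cross-validation. The sketch below uses contiguous (blocked) folds so that neighboring time steps stay together, and mean squared error as the score. Both choices, the ridge-regression stand-in model, and all names are assumptions for illustration; the study's own search settings are those listed in its Table 2, and an interval-quality score could replace the error measure.

```python
import itertools
import numpy as np

def blocked_kfold_score(X, y, fit, predict, params, k=5):
    """Mean validation MSE over k contiguous folds for one hyper-parameter setting."""
    folds = np.array_split(np.arange(len(X)), k)
    errors = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), val_idx)
        model = fit(X[train_idx], y[train_idx], **params)
        errors.append(np.mean((predict(model, X[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(errors))

def grid_search(X, y, fit, predict, grid, k=5):
    """Evaluate every combination in `grid`; return the best setting and its score."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = blocked_kfold_score(X, y, fit, predict, params, k)
        if best is None or score < best[1]:
            best = (params, score)
    return best

def ridge_fit(X, y, lam):
    """Stand-in model: ridge regression with penalty lam."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def ridge_predict(w, X):
    return X @ w

rng = np.random.default_rng(0)
X_cv = rng.normal(size=(100, 3))
y_cv = X_cv @ np.array([1.0, -2.0, 0.5])  # noise-free hypothetical targets
best_params, best_score = grid_search(
    X_cv, y_cv, ridge_fit, ridge_predict, {"lam": [1e-6, 1.0, 100.0]})
```

The same loop applies to kernel choice and kernel parameters by adding them as entries of `grid`.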

In Table 2, the number of hyper-parameters for the three algorithms as well as the various initial settings for the

Foundation Settlement Prediction
time-series settlements in the month of April 2013 are performed. In this research, the testing dataset is held out from the models, and we adopt the sequential prediction strategy illustrated in Figure 6 to predict the daily incremental settlement for the four buildings. Then, the prediction intervals are overlaid with the actual measured incremental settlement, as shown in Figure 9.

Figure 9. Prediction intervals constructed for the testing dataset using KELM.

As illustrated in Figure 9, there are prediction errors between the actual measured settlement and the predicted settlement. If we consider the systematic uncertainty in the prediction process, prediction intervals (PIs) can be constructed, and the majority of the actual settlements fall within the 95%-confidence-level PIs according to the prediction outcome. However, a few outliers still fall outside the PIs; hence, we use overall measurement metrics (i.e., PICP and PINAW) to compute the overall prediction performance, as presented in Table 3.
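The two metrics can be computed as sketched below, using their usual definitions: PICP is the fraction of observations that fall inside their interval, and PINAW is the mean interval width divided by the range of the observed targets. The function names and the five-point example are hypothetical and are not results from this study.

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of targets that fall inside their prediction interval."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Average interval width, normalized by the range of the targets."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))

# Hypothetical example: the third observation lies below its interval
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lo_demo = np.array([0.5, 1.5, 3.2, 3.5, 4.5])
hi_demo = np.array([1.5, 2.5, 4.2, 4.5, 5.5])
demo_picp = picp(y_demo, lo_demo, hi_demo)
demo_pinaw = pinaw(y_demo, lo_demo, hi_demo)
```

A good interval model has a PICP at or above the nominal level together with a small PINAW; either metric alone is easy to optimize at the expense of the other.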