Adaptive hybrid machine learning model for forecasting the step-like displacement of reservoir colluvial landslides: a case study in the Three Gorges Reservoir area, China

Establishing an accurate and dependable displacement prediction model is essential to building an early warning system for landslide hazards. This study proposes an adaptive hybrid machine learning model for forecasting the step-like displacements of reservoir colluvial landslides. First, candidate factors are determined based on the landslide deformation response, and the cumulative displacement and candidate factors are decomposed using the optimized variational mode decomposition algorithm. Second, the sensitivity of the gray wolf optimizer-based kernel extreme learning machine (GWO-KELM) models to each factor component is analyzed using the PAWN method, and the factors are optimized based on the analysis results. Third, based on the optimized factors, GWO-KELM models of the different displacement components are established and integrated to predict the cumulative displacement. The Baishuihe landslide was taken as an example, and the raw data of its three monitoring sites were employed to verify the performance of the proposed model. The results indicate that the model can decompose the cumulative displacement and factors with adaptively determined parameters. In addition, the model performed well over a three-year prediction of the landslide displacement.


Introduction
As the most widely distributed geological hazard worldwide, landslides endanger human safety and property (Petley 2012; Haque et al. 2016; Moayedi et al. 2019). Since the Three Gorges Reservoir area (TGRA) began to impound water in 2003, many ancient colluvial landslides have been reactivated (Zhang et al. 2015; Iqbal et al. 2018). It is imperative to develop accurate early warning systems (EWSs) for landslide disasters in this district. Thus, the displacement prediction models of landslides, as a critical element of EWSs (Casagli et al. 2010; Intrieri et al. 2013), are receiving increasing attention with the growing demand for disaster prevention and mitigation (Kirschbaum et al. 2010).
Landslide displacement decomposition is a crucial step affecting the performance of prediction models. Decomposition methods can be divided into partial decomposition (PD) and complete decomposition (CD) methods. The PD method combines the time series analysis (TSA) model and the moving average technique to separate the trend and periodic displacements from the cumulative displacement (Du et al. 2013; Zhou et al. 2016). This method makes the physical meanings of the obtained displacement components apparent. However, it cannot obtain the random displacement components in the TSA model, leading to an incomplete decomposition problem. In the CD method, the TSA and a complex signal decomposition technique, such as the wavelet transform (WT), empirical mode decomposition (EMD), or variational mode decomposition (VMD), are integrated to decompose the cumulative displacement (Lian et al. 2013; Shihabudheen and Peethambaran 2017; Li et al. 2018). This method offers a solution to the above incomplete decomposition problem. However, due to the complexity of these signal decomposition techniques, decomposition parameter determination has become the main obstacle restricting the application of this method. For example, Li et al. (2018) and Guo et al. (2020) proposed a TSA-VMD approach with excellent performance to realize cumulative displacement decomposition. However, the settings of the VMD parameters in their studies still depend on trial-and-error calculation, which is not conducive to the promotion of their method. Thus, considering the differences in the entropy characteristics, central frequency characteristics, and physical meanings between the displacement components after decomposition, we propose an optimized VMD (OVMD) method to solve the above problem when applying the VMD algorithm.
The development of prediction models has experienced a leap from empirical models to neural network models (Saito 1965; Fukuzono 1985; Li et al. 2009; Intrieri et al. 2019; Criss et al. 2020; Zeng et al. 2021). As a neural network prediction model with strong generalization ability, the kernel extreme learning machine (KELM) has been widely utilized (Zhou et al. 2018). As with other nonlinear models, such as the backpropagation neural network (BPNN), support vector regression (SVR), and the extreme learning machine (ELM), the selection of input factors is a crucial part affecting the prediction performance of the model. In most cases (Zou et al. 2020; Li et al. 2021; Zhang et al. 2021; Ma et al. 2022), a univariate correlation analysis, such as the Pearson correlation analysis or the gray correlation analysis (GRA), is applied to analyze the correlation between factors and displacements before building the prediction model. However, no factors are removed in the model construction process after the analysis. By comparison, a more reasonable quantitative method for factor selection is to first propose candidate factors based on the landslide deformation response and then optimize the factors based on the sensitivity analysis of the initially trained model to the candidate factors. Thus, a global sensitivity analysis (GSA) method called the PAWN method (Pianosi and Wagener 2015) is introduced in this study. The PAWN method can be easily used to characterize the significance of the input factors (Wang et al. 2020).
In summary, in this study, we propose an adaptive hybrid model to solve the parameter optimization problem of the VMD algorithm and the factor selection problem of the prediction model described above. In the model, the OVMD method, based on two criteria and one comprehensive index, was first proposed and combined with the TSA model to decompose the cumulative displacement with adaptively determined decomposition parameters. Then, based on the landslide deformation response analysis, candidate factors were proposed for each decomposed displacement component. Next, an integrated approach based on the Latin hypercube sampling (LHS) and PAWN algorithms was proposed to optimize these factors according to the sensitivity analysis of the KELM models to the candidate factors. Finally, according to the optimized factors, the Grey Wolf Optimizer-based (Mirjalili et al. 2014) KELM models (namely, the GWO-KELM models) of each displacement component were built to conduct multistep-ahead prediction. Their prediction results are superposed to realize the prediction of the cumulative displacement of landslides.
To verify the performance of the proposed model, we selected the Baishuihe landslide as an example to predict the displacement of its three most representative monitoring sites, ZG93, ZG118, and XD01. Fifty predictions for each monitoring site were implemented using this model. The root mean square error (RMSE), mean absolute percentage error (MAPE), R-squared (R²), and Akaike information criterion (AIC) were used to compare the average result of these fifty operations with the results of four different GWO-KELM models.

TSA model of cumulative displacement
The cumulative displacement of landslides is nonstationary and increases obviously with time. The factors that cause the variation in the cumulative displacement can be divided into trend, periodic, and random factors (Yang 1992). Correspondingly, the cumulative displacement can be characterized as follows:

X(t) = a(t) + b(t) + c(t)

where X(t) represents the observed value of the landslide displacement, a(t) represents the trend displacement, b(t) represents the periodic displacement, and c(t) represents the random displacement. Due to the limitations of the monitoring means, these displacement components cannot be directly obtained by current monitoring instruments. Therefore, obtaining the displacement components from the original cumulative displacement has become a necessary foundation for accurately establishing the nonlinear mapping between factors and displacements. The various displacement components were obtained in this study using the OVMD technique.

OVMD
VMD is a complicated signal-decomposition method with excellent performance. The basic definitions and computational process of the VMD method can be found in the literature published by Dragomiretskiy and Zosso (2013). Before using the method, three parameters (namely, the number of recovered modes K, the balance parameter of the data-fidelity constraint α, and the timestep of the dual ascent τ) need to be specified appropriately. However, how to set the parameters K and α is still extremely complicated, and this is the critical link to effectively decomposing landslide displacement time series. As shown in Fig. 1, the OVMD method is proposed, which can adaptively determine K and α to achieve automatic decomposition of displacement and factor time series. The specific calculation procedure of the method is as follows.
1. Optimization of K. The interval of K (i.e., K_interval = [1, 6], where K = 1 means no decomposition of the input signal) is determined, and α is temporarily set to 1. Then, the input signal is decomposed according to the initial parameter setting (i.e., K = 2 and α = 1). The optimal value K_optimal is found according to the following two criteria; if the two criteria are not met, iterative calculations are performed until both are reached:
   1. Criterion 1: whether there is a medium or high correlation between different decomposed components (i.e., whether the distance correlation coefficient (DCC) between different components is greater than 0.4). This ensures that the decomposed components obtained using the OVMD method have different physical meanings.
   2. Criterion 2: whether there is mode aliasing between the decomposed components (i.e., whether the difference in the center frequencies (DCF) between the decomposed components is higher than half of the sum of their 3 dB bandwidths (HSB), as shown in Fig. 2). Criterion 2 supplements Criterion 1 to prevent excessive decomposition of the signal.
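Criterion 1 relies on the distance correlation coefficient between decomposed components. A minimal sketch of this statistic is shown below (a standard biased distance-correlation estimator in NumPy, not the authors' code):

```python
import numpy as np

def distance_correlation(x, y):
    """Distance correlation coefficient (DCC) between two 1-D series."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    # Pairwise distance matrices
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double centering: subtract row/column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()          # squared distance covariance (V-statistic)
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    if dvar_x * dvar_y == 0:
        return 0.0
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))
```

A DCC near 1 between two modes suggests they share the same physical meaning (over-decomposition), which is what the 0.4 threshold in Criterion 1 screens for.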
2. Optimization of α. After K_optimal is found, the optimization interval of α is set with equal spacing between logarithms (i.e., α_interval = [10^0, 10^4]). Then, the input signal is temporarily decomposed according to the initial parameter setting (i.e., K = K_optimal and α = 1). The comprehensive determination indexes (CDIs) corresponding to different α are calculated according to the signal-to-noise ratio (SNR), approximate entropy, center frequency, and 3 dB bandwidth of the decomposed components. The CDI is defined as follows:

CDI = MAPEN if t = 1; CDI = NaN if t = 0

where CDI is the comprehensive determination index; MAPEN is the mean value of the approximate entropies of the decomposed components; NaN is a null value; and t is the logical variable of the determination matrix of modal aliasing (DMMA), which is calculated based on the DCF and HSB between the decomposed components. If there is mode aliasing between the decomposed components, t = 0; otherwise, t = 1.
The smaller the CDI is, the more regular the variation in each component, the smaller the uncertainty, and the more effective the information it carries. Therefore, α_optimal is found when the CDI of the decomposed components reaches its minimum value. When there are multiple identical minimum values of the CDI, α_optimal should be determined according to the minimum value of the SNR of the decomposed components, which minimizes the information loss of the reconstructed signals. Finally, the input time series is decomposed according to the optimized parameters K_optimal and α_optimal. In this study, we used the OVMD to decompose the cumulative displacements and their candidate factors.
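The MAPEN term averages the approximate entropy of each decomposed mode. A self-contained sketch of approximate entropy follows (a standard ApEn implementation, not the authors' code; the defaults m = 2 and r = 0.2 × std are common choices, not values taken from the paper):

```python
import numpy as np

def approximate_entropy(signal, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D signal.

    m: embedding dimension; r: tolerance (defaults to 0.2 * std, a common choice).
    Lower ApEn indicates a more regular signal.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(mm):
        # Embedding vectors of length mm
        emb = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        # For each template, the fraction of vectors within Chebyshev distance r
        counts = [np.mean(np.max(np.abs(emb - row), axis=1) <= r) for row in emb]
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)
```

A periodic mode yields a much lower ApEn than a noisy one, which is what drives the CDI toward smoother decompositions until mode aliasing appears.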

GWO-KELM
The KELM model is an improved neural network model proposed by Huang et al. (2006) based on the ELM model and the kernel function. The regression function of the KELM model can be expressed as follows:

f(x) = [K(x, x_1), K(x, x_2), ..., K(x, x_N)] (I/C + Ω_ELM)^(-1) T

where I is the identity matrix, C is the penalty parameter, I/C is the bias constant added to the diagonal elements of the symmetric matrix Ω_ELM, Ω_ELM is the kernel function matrix of the ELM, and T is the output target vector (T = [t_1, t_2, ..., t_N]^T). The kernel matrix Ω_ELM is defined as follows:

Ω_ELM = HH^T, with Ω_ELM(i, j) = h(x_i) · h(x_j) = K(x_i, x_j)

where H is the output matrix of the hidden layer (also known as the randomization matrix, H = [h(x_1), h(x_2), ..., h(x_N)]^T), h(x_i) is the hidden-layer output vector corresponding to the input vector x_i, and K(x_i, x_j) is the kernel function.
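The equations above can be sketched in a few lines of NumPy (an illustrative implementation, not the authors' code; the values of C and the RBF kernel width k below are arbitrary examples):

```python
import numpy as np

def rbf_kernel(A, B, k=1.0):
    """RBF kernel matrix: K(a_i, b_j) = exp(-||a_i - b_j||^2 / k)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / k)

class KELM:
    def __init__(self, C=1e6, k=1.0):
        self.C, self.k = C, k

    def fit(self, X, T):
        self.X = np.asarray(X, dtype=float)
        omega = rbf_kernel(self.X, self.X, self.k)            # Omega_ELM
        n = omega.shape[0]
        # Solve (I/C + Omega_ELM) beta = T
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega,
                                    np.asarray(T, dtype=float))
        return self

    def predict(self, Xnew):
        # f(x) = [K(x, x_1), ..., K(x, x_N)] beta
        return rbf_kernel(np.asarray(Xnew, dtype=float), self.X, self.k) @ self.beta
```

Note that this is mathematically equivalent to kernel ridge regression with ridge term 1/C, which is why the penalty parameter C controls the smoothness of the fit.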
For the parameter optimization of the KELM model, the penalty parameter and the kernel function parameter have significant effects on its performance. To obtain more accurate and stable prediction results, we used the grey wolf optimizer (GWO) algorithm proposed by Mirjalili et al. (2014) to optimize the penalty parameter and the kernel function parameter in this study.
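The GWO search itself can be sketched as follows (a generic grey wolf optimizer minimizing an objective over a box, following Mirjalili et al. (2014); the sphere-function demo is illustrative and is not the paper's C/k search):

```python
import numpy as np

def gwo_minimize(f, lb, ub, n_agents=20, n_iter=200, seed=0):
    """Grey wolf optimizer: minimize f over the box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    pos = rng.uniform(lb, ub, size=(n_agents, dim))
    best, fbest = None, np.inf
    for it in range(n_iter):
        fit = np.array([f(x) for x in pos])
        order = np.argsort(fit)
        # Alpha, beta, delta: the three best wolves of the current pack
        alpha, beta, delta = pos[order[0]], pos[order[1]], pos[order[2]]
        if fit[order[0]] < fbest:
            best, fbest = alpha.copy(), float(fit[order[0]])
        a = 2.0 - 2.0 * it / n_iter          # control parameter decreases 2 -> 0
        new = np.empty_like(pos)
        for i in range(n_agents):
            steps = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2.0 * a * r1 - a, 2.0 * r2
                steps.append(leader - A * np.abs(C * leader - pos[i]))
            # Move toward the mean of the three leader-guided positions
            new[i] = np.clip(np.mean(steps, axis=0), lb, ub)
        pos = new
    return best, fbest
```

In the GWO-KELM setting, `f` would wrap a cross-validated KELM training run, with the two box dimensions corresponding to C and the kernel parameter.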

PAWN
The PAWN, as a distribution-based GSA methodology, was proposed by Pianosi and Wagener (2015) to calculate the sensitivity S_i of the input factor x_i. More concretely, the sensitivity of the model output y to x_i can be calculated according to the maximum distance between the unconditional output distribution F_y(y) and the conditional output distribution F_y|xi(y|x_i). F_y(y) can be obtained by varying all the input factors x simultaneously. F_y|xi(y|x_i) is generated by changing all the input factors x except x_i. The calculation of the maximum distance between F_y(y) and F_y|xi(y|x_i) is implemented by the Kolmogorov-Smirnov (KS) test. Thus, for x_i, S_i is defined as follows:

S_i = stat over x_i of [ max over y of | F_y(y) - F_y|xi(y|x_i) | ]

where stat denotes a statistic determined by the user, such as the maximum, median, or mean. In this study, the mean was adopted. The KS statistic Ŝ_dummy of the dummy factor is introduced into the PAWN method to identify the inputs whose measured sensitivities are too low to be distinguishable from approximation errors (Zadeh et al. 2017; Pianosi and Wagener 2018). The dummy factor is an input factor that is artificially generated in the calculation process of the PAWN and does not affect the final analysis results. If the S_i of an input factor x_i is greater than Ŝ_dummy, this input factor x_i is indeed influential.
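A compact sketch of the mean-KS PAWN index using `scipy.stats.ks_2samp` (illustrative; the toy model, sample sizes, and number of conditioning points are arbitrary choices, not values from the study):

```python
import numpy as np
from scipy.stats import ks_2samp

def pawn_sensitivity(model, n_inputs, n_uncond=2000, n_cond=500,
                     n_pts=10, seed=0):
    """Mean-KS PAWN index per input for a model with inputs uniform on [0, 1]."""
    rng = np.random.default_rng(seed)
    # Unconditional output sample: all inputs vary simultaneously
    y_u = model(rng.random((n_uncond, n_inputs)))
    S = np.zeros(n_inputs)
    for i in range(n_inputs):
        ks = []
        for xi in np.linspace(0.05, 0.95, n_pts):     # conditioning values
            X = rng.random((n_cond, n_inputs))
            X[:, i] = xi                               # fix x_i, vary the rest
            ks.append(ks_2samp(y_u, model(X)).statistic)
        S[i] = np.mean(ks)                             # stat = mean, as in the text
    return S

# Toy model: y depends strongly on x1 and only weakly on x2
toy = lambda X: 5.0 * X[:, 0] + 0.1 * X[:, 1]
S = pawn_sensitivity(toy, 2)
```

Running this on the toy model gives a much larger index for the first input than for the second, mirroring how the method separates influential factors from near-dummy ones.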

Data processing and adaptive decomposition
1. Data cleaning. This involves filling in missing values, reducing noise, identifying outliers, and correcting inconsistencies in the raw data.
2. Isochronous processing. The raw data of the landslide displacements, reservoir water levels, precipitation, and so on must be preprocessed to unify the data sampling frequency at a fixed value (i.e., monthly).
3. Normalization processing and sensitivity analysis. After the normalized datasets were built, the sensitivity of the initially trained models to the candidate factors was analyzed. Factors whose mean KS statistics were less than the mean KS statistics of their corresponding dummy factors were removed, and the initial optimized factors were obtained from the remaining factors.
4. Reduction of redundant factors.
   1. The sorted mean KS statistics of the initial optimized factors were fitted with an appropriate exponential function of the form

      y_ks = A exp(B x_order) + C exp(D x_order)

      where y_ks is the fitted value of all factors' mean KS statistics, x_order is the order number of the factors sorted in descending order according to the mean KS statistics, and A, B, C, and D are the function parameters to be fitted.
   2. The second-order differences of y_ks (i.e., Δ(Δy_ks)) are calculated.
   3. All the factors whose Δ(Δy_ks) values are less than 0.001 are eliminated.
   4. The retained factors are used as the final optimized factors to reconstruct the datasets of the different displacement components.
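The reduction step can be sketched as follows (illustrative; the synthetic KS values and the way the second-order difference is aligned to factor indices are assumptions, while the 0.001 cutoff is the one stated in the text):

```python
import numpy as np

def reduce_factors(mean_ks, threshold=1e-3):
    """Keep factors up to the point where the sorted mean-KS curve flattens.

    mean_ks: mean KS statistic per factor. Factors are sorted in descending
    order; factors are dropped once the second-order difference of the curve
    falls below `threshold` (i.e., the curve has flattened out).
    """
    order = np.argsort(mean_ks)[::-1]            # indices, descending by mean KS
    y = np.asarray(mean_ks, dtype=float)[order]
    d2 = np.diff(y, n=2)                          # second-order differences
    flat = np.where(np.abs(d2) < threshold)[0]
    # Keep everything before the curve first flattens (d2[j] involves
    # y[j], y[j+1], y[j+2]); keep all factors if it never flattens.
    n_keep = (flat[0] + 1) if flat.size else len(y)
    return order[:n_keep]

# Synthetic example: a few influential factors followed by a flat tail
ks = np.array([0.05, 0.42, 0.30, 0.051, 0.22, 0.052, 0.15, 0.053])
kept = reduce_factors(ks)
```

In practice the curve would first be smoothed by the exponential fit described above before differencing; the sketch applies the differences directly to keep the example short.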

Model training, prediction, and validation
1. Model training and prediction. GWO-KELM prediction models of the different displacement components were rebuilt based on the reconstructed datasets. Then, the optimal input factors corresponding to the period to be predicted were fed into the well-trained models to predict the displacement components. The predicted cumulative displacement was obtained by summing all the predicted displacement components.
2. Performance validation of the model. In this study, four frequently used performance indexes (namely, R², MAPE, RMSE, and AIC) are used to validate the performance of the models from the perspectives of average prediction performance, extreme value prediction performance, and model complexity.
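The four indexes can be computed as below (standard definitions; the AIC variant shown, n·ln(SSE/n) + 2k with k the number of model parameters, is one common regression form and is an assumption, since the paper does not give its formula):

```python
import numpy as np

def performance_indexes(y_true, y_pred, n_params):
    """RMSE, MAPE (%), R-squared, and a common regression AIC."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    err = y_true - y_pred
    sse = np.sum(err ** 2)
    rmse = np.sqrt(sse / n)
    mape = np.mean(np.abs(err / y_true)) * 100.0
    r2 = 1.0 - sse / np.sum((y_true - y_true.mean()) ** 2)
    aic = n * np.log(sse / n) + 2 * n_params
    return {"RMSE": rmse, "MAPE": mape, "R2": r2, "AIC": aic}
```

Because AIC penalizes the parameter count, it rewards models that keep accuracy with fewer input factors, which is the comparison made in Table 3.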
K_interval = [1, 6]: the optimization interval for the number of recovered modes K used in the OVMD method to decompose displacement and factor time series
α_interval = [10^0, 10^4]: the optimization interval for the balance parameter of the data-fidelity constraint α used in the OVMD method
τ = 0: the timestep of the dual ascent used in the OVMD method
method: the sampling method used in the LHS-PAWN model to optimize input factors
N_agent = 40: the number of search agents used in the GWO algorithm to optimize the penalty factor C and the kernel function parameter k of the KELM models
N_iteration = 200: the maximum number of iterations used in the GWO algorithm
N_convergence = 10^-7: the convergence error used in the GWO algorithm
F_kernel = 'RBF kernel': the type of kernel function used in the KELM models
I_C = [1 × 10^-13, 1 × 10^13]: the optimization interval for the penalty factor C of the KELM models used in the GWO algorithm
I_k = [1 × 10^-13, 1 × 10^13]: the optimization interval for the kernel function parameter k of the KELM models used in the GWO algorithm
v = 'tenfold cross-validation': the cross-validation method used in the GWO algorithm to optimize C and k

Hyperparameter setting
As with other machine learning models, the prediction performance of the proposed OVMD-GWO-KELM model in forecasting unknown data is controlled by hyperparameter settings. Table 1 shows the hyperparameter settings used in the proposed model.

Baishuihe landslide
The Baishuihe landslide is located on the south bank of the Yangtze River (Fig. 4a). Figure 4b shows that the toe of the landslide is below the reservoir water level of 135 m, and its rear part is located at the geotechnical boundary with an elevation of 410 m. The main sliding direction, north-south length, and east-west width are 20°NE, 600 m, and 700 m, respectively. The front and back parts of the landslide are steeper than its middle part, with an overall slope of approximately 30°. The average thickness is 30 m, and its estimated volume is 1.26 × 10^7 m³. The monitoring data of inclinometer QZK1 show two sliding zones. The shallow sliding zone is the contact zone between the Quaternary loose deposits and cataclastic rocks, with a depth of 12-25 m. The deep sliding zone is the contact zone between the bottom of the cataclastic rock and the underlying bedrock, with a depth of 18.9-34.1 m. From the monitoring report provided by the China National Cryosphere Desert Data Center (http://www.ncdc.ac.cn/), the monthly cumulative displacements of the landslide were obtained using GPS monitoring technology from 2003 to 2007. After the strong deformation occurred in 2007, the monthly displacements were monitored by an electronic total station. A strict manual audit controls the quality of the monitoring data in the report, so no additional preprocessing method is required before using it. According to both the location (Fig. 4b) and monitored displacement change (Fig. 5) of each site, the early warning area of the landslide was determined (Fig. 4c).

Selection of the candidate factors
Precipitation, fluctuations in the reservoir water level, and earthquakes are the main factors affecting slope deformation (Crosta 2004; Lee et al. 2008; Tang et al. 2019). No massive seismic disaster has been reported in the TGR area recently, and abundant precipitation occurs year-round in this area. Additionally, the Baishuihe landslide is a typical riverside landslide in this region. Thus, precipitation and the fluctuations in the reservoir water level should be the significant factors of the landslide deformation. Figure 5 shows that the sharp annual increases in displacements and the heavy precipitation processes are concentrated in June, July, and August. There is a significant correlation between the occurrence of heavy precipitation processes in the antecedent 1 or 2 months and the sharp increase in the monthly landslide displacement. Thus, it is appropriate to consider the cumulative effect of precipitation infiltration within two months on the landslide deformation. Some earlier research (Du et al. 2013; Krkač et al. 2017; Bogaard and Greco 2018) is consistent with this analysis. Therefore, the monthly cumulative antecedent precipitation, the bimonthly cumulative antecedent precipitation, and the monthly maximum continuous effective precipitation with four different reduction factors (i.e., r = 1, 0.8, 0.6, and 0.4) were selected as the candidate factors. These six precipitation factors were defined as CF1, CF2, ..., CF6, respectively. Figure 5 also shows that the fluctuations in the reservoir water level have both positive and negative effects on the increase in landslide displacement because the pore water pressure is difficult to dissipate (Tang et al. 2019). Hence, considering the hysteresis effect of the groundwater level caused by the fluctuation of the reservoir water level (Xue et al. 2020), the influence of both the current and past water levels on landslide displacements should be considered. Therefore, we selected the monthly and bimonthly average elevation of reservoir levels, the monthly and bimonthly increase in reservoir levels, the monthly and bimonthly decrease in reservoir levels, the monthly and bimonthly relative variation in reservoir levels, and the monthly and bimonthly absolute variation in reservoir levels as the candidate factors. These ten water-level factors were defined as CF7, CF8, ..., CF16.
There is a remarkable difference in the deformation responses of landslides between their different evolution stages (Glade and Crozier 2005). For landslides with progressive failure characteristics, even extremely heavy precipitation may result in only minor deformation when they are in the primary or secondary creeping stage. In contrast, slight precipitation may easily cause extremely intense deformation in the tertiary creeping stage (Yang et al. 2019). Thus, it is necessary to consider the influence of historical deformation on the current evolution of landslides. Therefore, the cumulative displacements and their increments over the preceding 1-3 months were adopted as the candidate factors. These six historical displacement factors were defined as CF17, CF18, ..., CF22.
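Constructing such candidate factors from monthly series can be sketched as follows (illustrative; the column names and the rolling-sum/lag formulations are assumptions about how the factors might be computed, not the authors' code):

```python
import numpy as np
import pandas as pd

def build_candidate_factors(df):
    """Build a few example candidate factors from monthly monitoring series.

    df: DataFrame with monthly columns 'rain' (precipitation, mm),
    'level' (reservoir water level, m), and 'disp' (cumulative displacement, mm).
    """
    f = pd.DataFrame(index=df.index)
    # Precipitation factors (cf. CF1, CF2)
    f["rain_1m"] = df["rain"]                              # monthly cumulative
    f["rain_2m"] = df["rain"].rolling(2).sum()             # bimonthly cumulative
    # Reservoir-level factors (cf. CF7-CF16)
    f["level_mean_1m"] = df["level"]
    f["level_change_1m"] = df["level"].diff()              # relative variation
    f["level_abs_change_1m"] = df["level"].diff().abs()    # absolute variation
    # Historical displacement factors (cf. CF17-CF22)
    for lag in (1, 2, 3):
        f[f"disp_lag{lag}"] = df["disp"].shift(lag)
        f[f"disp_inc_lag{lag}"] = df["disp"].diff().shift(lag - 1)
    return f
```

The bimonthly and lagged columns carry NaN values in their first rows, which is why the training window in such studies typically starts a few months after monitoring begins.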

Results
Compared with the other eight monitoring sites, monitoring sites ZG93, ZG118, and XD01 have two advantages: (1) their monitoring period is the longest, and their monitoring data are the most complete; and (2) they are all located in the middle of the landslide and can better reflect the movement of the whole landslide. Therefore, we selected the data of these three monitoring sites as the prediction targets. Specifically, for each monitoring site, the data from June 2006 to December 2013 (i.e., 100 months of data) were used as the training set, and the data from January 2014 to December 2016 (i.e., 36 months of data) were used as the testing set. The ratio of testing data to training data is 36/100 = 0.36. The proposed model uses multistep-ahead prediction. In the following text, we take the data of monitoring site ZG93 as an example to introduce the details of the prediction process and its results.

Decomposition parameters and decomposition results of landslide displacements
1. The optimal number of modes. Figure 6 shows the variations in the DCC between the decomposed components with increasing K. When K = 4, two groups of medium- or high-correlation decomposed components appeared; as K increased from 4 to 6, the number of medium- and high-correlation component groups increased from 2 to 3. Clearly, K = 3 is the largest K at which no medium- or high-correlation components appeared in the displacement decomposition result of monitoring site ZG93. Figure 7 shows the variations in the DCF and HSB between different components with increasing K. When K = 4, the HSB between different components was larger than the DCF between different components for the first time; as K increased from 4 to 6, the number of points with HSBs higher than the DCFs also increased from 3 to 6. It is clear that K = 3 is also the largest K at which no aliasing components appeared in the decomposition result of monitoring site ZG93. According to the definition of the convergence conditions for the optimization of K described in Sect. 2.5.1, K = 3 is the underdecomposition and overdecomposition threshold of the cumulative displacement for monitoring site ZG93, which means that K_optimal = 3. Monitoring sites ZG118 and XD01 have the same K_optimal as monitoring site ZG93.

The optimal balancing parameter
After determining K_optimal, α_optimal was searched. Figure 8 shows that the SNR and MAPEN gradually decrease as α increases. This finding means that the larger α is, the better the regularity of the displacement components, and the more accurate the reconstructed displacement. However, when α is larger than a specific value, mode aliasing appears in the decomposed components. Therefore, according to the definition of the CDI and its corresponding optimization principle, the α_optimal values for monitoring sites ZG93, ZG118, and XD01 were determined, as shown in Fig. 8d.

Displacement decomposition results
The decomposition of the cumulative displacements for monitoring sites ZG93, ZG118, and XD01 was conducted based on the K_optimal and α_optimal values described above. The cumulative displacements, decomposed displacement components, and residuals are shown in Fig. 9.

Decomposition parameters and the decomposition results of factors
OVMD was also used to decompose the candidate factors. The decomposition parameters of these factors are shown in Table 2. Not all the candidate factors need to be decomposed, and the decomposition parameters are almost identical for factors of the same type.
The representative decomposition results of the different types of factors for monitoring site ZG93, including CF 3 , CF 9 , CF 11 , CF 14 and CF 17 , are shown in Fig. 10. For the candidate factors that need to be decomposed, their components obtained from the decomposition have different variation characteristics.

GSA of candidate factors for prediction models
After establishing the prediction models for the different displacement components, the validity of each model's candidate factors was analyzed based on the PAWN method. Figure 11 shows the mean KS statistics of the candidate factors. The red dotted line in Fig. 11 represents the mean KS statistics of the dummy factors. Most of the mean KS statistics of the candidate factors are less than or remarkably close to those of the dummy factors. This finding means that, except for a few valid candidate factors, most of the input factors have little or no influence on the results of the prediction models. Additionally, there is a certain similarity between the valid candidate factors of the prediction models of the same type of displacement component. Figure 12 shows the optimized candidate factors obtained using the reduction method described in Sect. 2.5.2. The red dotted line represents the initial dividing line of the optimized factors, and the green dotted line represents the final dividing line. Using this approach, the number of candidate factors is greatly reduced; for instance, it drops from 52 to 7 in the trend displacement prediction process of monitoring site ZG93. Moreover, the candidate factors after reduction are quite different for the prediction models of the different displacement components. Figure 13 shows the DCC matrices between the factors before and after factor selection, in which the diagonal elements of the matrices of the different displacement components represent the DCC between the factors and the predicted targets. The correlations among the factors, and between the factors and the predicted targets, are quite different for the prediction models of the different displacement components. The trend displacement prediction model has the highest average correlations among the factors and between the factors and the predicted targets, followed by the random displacement prediction model. The periodic displacement prediction model has the lowest average correlations among the factors and between the factors and the predicted targets.

Computational stability of factor selection
The candidate factors of the different displacement component prediction models were reduced 50 times to evaluate the stability of the PAWN method. Figure 14 shows the statistical results of the factor selections. The factor reduction results are very stable, and the remaining factors are concentrated on a few factors. In addition, for the prediction model of the same displacement component at different monitoring sites, the 50 factor selection results share some similarities in the selection of several core factors.

Cumulative displacement prediction based on the proposed model
To verify the accuracy of the proposed model, we established fifty integrated models to predict the cumulative displacements for each monitoring site. The final prediction results were the mean values of the prediction results provided by these fifty models. Four different VMD-GWO-KELM models were taken as comparison models. These comparison models were separately established based on the initial candidate factors, the candidate factors optimized by the GRA method, the adaptively decomposed factors, and the adaptively decomposed factors optimized by the GRA method. For optimizing the input factors using the GRA method, please refer to the research published by Miao et al. (2017) and Yang et al. (2019). Figure 15 and Table 3 show that the R² and MAPE of the proposed model are slightly higher than or remarkably close to those of the comparison models. For the RMSE, the proposed models have relatively low RMSEs in the training process and the lowest RMSEs in the prediction process. For the AIC, the proposed models have the minimum AIC in both training and prediction. In addition, the standard deviations of the prediction results of the proposed model changed only slightly as the prediction period increased. Table 3 also shows the variations in each performance index caused by the change in the number of factors. In most cases, optimizing the input factors improves each model's performance in both training and prediction. Only in very few cases does the optimization of factors lead to a decline in the R² and MAPE of the proposed model in prediction. However, compared with the number of factors that would need to be added to improve these two indexes in the comparison models, this slight decrease in the R² and MAPE of the proposed model is entirely acceptable to users.

Component extraction of the landslide displacement and its factors
As shown in Fig. 9, the trend displacements retain the changing trend of the original cumulative displacements. For the periodic displacements, there is a good correspondence between their periodic variations and the step-like changes in the original cumulative displacements. For the random displacements, their variations exhibit a certain uncertainty and randomness. These results are consistent with the definition of the landslide displacement components in the TSA, supporting that the proposed OVMD can effectively decompose the landslide displacements. Moreover, the residuals between the reconstructed and original cumulative displacements account for only approximately 1/1000 of the original cumulative displacements. Using the reconstructed cumulative displacements obtained by the OVMD for prediction would therefore not yield a misjudgment of the original cumulative displacements of landslides. The decomposition of factors has been proposed only in recent years. Its purpose is to extract the components in each factor that are more related to their corresponding displacement components, so as to build prediction models with better performance. In this line of research (Deng et al. 2017; Li et al. 2018; Guo et al. 2020), it was usually assumed that each factor contained two components with different frequencies. The correlations between the two components and the prediction objects were then analyzed, and prediction models based on the high-correlation components were built to achieve more accurate predictions. However, as shown in Table 2, not all factors have these two components. In addition, a few factors have more than two components with different physical meanings. This fact reminds us to pay more attention to avoiding performance improvements based on spurious factor components, which would increase the risk of incorrect predictions.

Fig. 12 Division of the valid and invalid factors for the a trend displacement, b periodic displacement, and c random displacement prediction models
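The reconstruction check discussed above, comparing the sum of the decomposed components against the original series, can be expressed generically. The three components below are synthetic stand-ins for the OVMD outputs (a trend, a periodic term, and a high-frequency random term), not the Baishuihe monitoring data:

```python
import numpy as np

t = np.linspace(0.0, 6.0, 600)

# Synthetic stand-ins for the three decomposed displacement components:
trend = 50.0 * t                          # monotonically growing trend term
periodic = 8.0 * np.sin(2.0 * np.pi * t)  # annual-like periodic term
random_part = 0.5 * np.sin(40.0 * t)      # high-frequency random term

# "Observed" cumulative displacement: the components plus a small
# unmodeled remainder that the decomposition does not capture.
cumulative = trend + periodic + random_part + 0.01 * np.cos(90.0 * t)

reconstructed = trend + periodic + random_part

# Relative residual of the reconstruction; the paper reports a level of
# roughly 1/1000 of the original cumulative displacement.
rel_residual = np.abs(reconstructed - cumulative).max() / np.abs(cumulative).max()
print(f"relative residual: {rel_residual:.2e}")
```

A residual at or below this level means the reconstructed series can stand in for the original one without misjudging the displacement evolution.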

Influence of factor coupling on the prediction results
Extracting valid factors has become the key to improving the overall performance of prediction models. In past research, GRA has been a widely used method for determining model input factors owing to its strong generality (Yang et al. 2019). It is well suited for quickly removing irrelevant and low-correlation input factors without training any models. Figure 13 shows that only a portion of the factors selected by the proposed PAWN-based method have high correlations with the prediction outputs. This result differs significantly from the factors selected by the GRA method, because the PAWN-based method accounts for the coupled influence of low-correlation factors on the prediction outputs. Table 3 shows that, compared with no factor optimization, using the GRA method improves the prediction performance of the models. However, the performance improvements obtained using the GRA method are still lower than those obtained using the proposed PAWN-based method. This indicates that considering the impact of factor coupling on the prediction results is beneficial for improving the prediction performance. Thus, in constructing prediction models, we should choose high-sensitivity factors, rather than high-correlation factors, as the input factors. The proposed PAWN-based method has clear advantages over the GRA method in finding high-sensitivity factors. Table 3 shows that the average prediction performance of the proposed OVMD-GWO-KELM model is slightly higher than that of the comparison models in most cases. For predicting extreme values of displacements (i.e., displacements with significant growth), the proposed model has better generalization ability than the comparison models.
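The sensitivity-based screening discussed above can be illustrated with a minimal PAWN sketch: the index is the median Kolmogorov-Smirnov distance between the unconditional output distribution and the output distributions conditioned on slices of a factor. The toy model and plain random sampling below are illustrative assumptions; the paper pairs PAWN with Latin hypercube sampling:

```python
import numpy as np
from scipy.stats import ks_2samp

def pawn_index(x, y, n_slices=10):
    """Minimal PAWN sensitivity sketch: median Kolmogorov-Smirnov distance
    between the unconditional output sample and the output samples
    conditioned on quantile slices of the factor x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_slices + 1))
    ks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if mask.sum() > 1:
            ks.append(ks_2samp(y, y[mask]).statistic)
    return float(np.median(ks))

rng = np.random.default_rng(42)

# Toy model: the output responds strongly to x1 but not at all to x2.
x1 = rng.uniform(-1.0, 1.0, 5000)
x2 = rng.uniform(-1.0, 1.0, 5000)
y = np.sin(3.0 * x1) + 0.05 * rng.normal(size=5000)

print(pawn_index(x1, y), pawn_index(x2, y))  # first index is much larger
```

A factor like x2 can still show a non-zero correlation by chance, yet its PAWN index stays near the sampling-noise floor, which is why sensitivity screening separates valid from invalid factors more reliably than correlation alone.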
Meanwhile, the optimized input factors make the complexity of the proposed model significantly lower than that of the comparison models. In addition, the computational stability of the proposed model is confirmed by the variation characteristics of the results of the fifty multistep-ahead predictions. The above results show that adopting the proposed model is remarkably effective for improving the performance of the EWS for landslide hazards. However, it should be noted that when using the proposed model to predict the displacements of other landslides, it is vital to propose more targeted candidate factors based on the deformation response to establish a more accurate prediction model. In addition, if the proposed model cannot be dynamically updated as the prediction time grows, its prediction accuracy will inevitably decrease. Therefore, in future research or applications of the proposed model, its adaptive updating function should receive more targeted optimization.

Conclusion
This study proposed an adaptive hybrid machine learning model, namely, the OVMD-GWO-KELM model. Taking the Baishuihe landslide as a case study, the process of applying this model for multistep-ahead prediction of the landslide cumulative displacement was discussed in detail, and the following conclusions were drawn:

1. The adaptive decomposition results obtained using OVMD show three displacement components with clear physical meanings and distinct variation characteristics in the cumulative displacement of the Baishuihe landslide. This finding is consistent with the conclusion of the TSA model of displacement for step-like landslides.

2. Compared with the GRA method, the LHS-PAWN method can analyze the influence of interactions between various low-correlation factors on the results of prediction models. Thus, considering the redundancy and lack of information in the raw data, low-sensitivity factors and combinations of low-sensitivity factors can be correctly reduced by applying the LHS-PAWN method.

3. Compared with four different VMD-GWO-KELM models, the proposed model significantly improves generalization without noticeably reducing the training effect, so the overfitting and underfitting phenomena are effectively suppressed. In addition, the complexity of the model is typically low, and the stability of its calculation results remains high over time.