Data mining technology for the identification and threshold of governing factors of landslide in the Three Gorges Reservoir area

Due to complex geological conditions and external triggering factors, landslide deformation often varies spatially, especially in the Three Gorges Reservoir area. It is therefore important to identify the governing factors, and their thresholds, for different parts of a landslide. In this research, the deformation of the Shuping landslide is analyzed. First, 9 hydrological factors, including 5 reservoir water factors and 4 rainfall factors, are selected for data mining analysis on the basis of a comprehensive investigation. Then, the numerical variables are transformed into discrete variables by the two-step clustering method. The Apriori algorithm is applied to the categorical variables to generate association rules that meet the minimum confidence, which establishes the association rules between the triggering factors and landslide displacement. Finally, the thresholds of the governing factors are mined with decision tree C5.0 models. The results indicate that the governing factors controlling deformation differ among different parts of the landslide. Specifically, the rear of the landslide is jointly controlled by reservoir water and rainfall, whereas the reservoir level controls the deformation of the other parts of the Shuping landslide. Overall, the daily drop of water level is the most important factor causing the deformation of the Shuping landslide. During the low water level period (138.951–147.437 m), once the daily drop of water level exceeds 0.416 m/d, the landslide shows severe deformation. This research shows that the study of association rules and thresholds is of great significance for landslide deformation analysis, and that data mining technology can support the prediction of reservoir landslides.


Introduction
Geological disasters are occurring more and more often as extreme weather becomes increasingly frequent (Gong et al. 2020). Landslides are among the most widely distributed geological disasters. They are highly destructive and are widely developed in mountainous and reservoir areas (Huang et al. 2017; Nicu and Asndulesei et al. 2018). The Three Gorges Reservoir area brings important economic and social benefits (Zhou et al. 2016; Tang et al. 2019). Since the impounded water level was raised to 175 m, the Three Gorges Reservoir area, surrounded by mountains and water, has become highly prone to geological disasters (Miao et al. 2018a; Jiang et al. 2021; Ma et al. 2022). Its unique and complex geological conditions and ecological environment have led to the occurrence and reactivation of a large number of riverside landslides. These extremely devastating landslides can cause huge loss of life and property and even adverse social impacts (Peng et al. 2018; Zhang et al. 2020a; Wang et al. 2022).
The causes of landslides are diverse and vary with the triggering factors (Gariano and Guzzetti 2016; Milevski et al. 2019). Moreover, a combination of internal and external factors, such as tectonics, rainfall, water level fluctuation, and human activities, promotes landslide occurrence to varying degrees (Moreiras 2005; Petley et al. 2007; Sassa et al. 2010). In particular, because of the way the reservoir operates, the frequent rise and fall of the reservoir water inevitably weakens the structure of the rock and soil (Miao et al. 2018b; Zhang et al. 2020b). Coupled with the scouring of increasingly common large-scale rainstorms, rainfall infiltration further reduces the stability of reservoir bank slopes (Gong et al. 2021; Sun et al. 2021). It is therefore reasonable to conclude that rainfall and reservoir water level fluctuation are the key factors affecting the stability of reservoir landslides (Ma et al. 2017a, 2020; Miao et al. 2021, 2022). With the development of monitoring and statistical science, data analysis has been applied to many areas of landslide research. These include recognition (Shi et al. 2020; Gong et al. 2022), prediction (Du et al. 2013; Chen et al. 2017; Zhang et al. 2020a, b, c), modeling (Stumpf and Kerle 2011; Vorpahl et al. 2012), and susceptibility assessment (Hong et al. 2016; Chen et al. 2018; Zhao and Chen 2019). Compared with traditional data analysis, data mining has attracted great attention for its ability to uncover the association rules inherent in data (Ma et al. 2017b, 2018). For example, Tan et al. (2017) used Grey Relational Grade Analysis (GRGA) to determine the main controlling factors of the Zhujiadian landslide at different deformation stages. Yao et al. (2019) used neighborhood rough set theory to identify the triggering factors of the Baijiabao landslide deformation. They showed that the surface deformation and the expansion of ground fissures, which followed a stepped displacement pattern, were mainly controlled by rainfall and reservoir water level changes. Wu et al. (2021) combined analysis of variance (ANOVA) and K-means clustering to jointly analyze the monitoring data of reservoir water fluctuation and rainfall intensity, and revealed the mechanism of rapid deformation of the Gapa landslide. Kusak et al. (2021) compiled an inventory of landslides that occurred in Karahacili at the end of 2019 and used the Apriori algorithm and spatial data mining methods to assess the pre-landslide conditions in the area. Althuwaynee et al. (2021) used the Apriori association rule mining algorithm with spatial clustering patterns to explore the antecedents of landslides. In summary, improved monitoring technology has generated massive amounts of landslide monitoring data, and data mining has been used extensively to analyze landslide triggering factors. Nevertheless, triggering factors vary spatially across different parts of a landslide. There are still few reports on how to accurately identify and distinguish the causes of deformation in each part.
In this study, the deformation of the Shuping landslide is analyzed. Nine hydrological factors were selected for data mining analysis on the basis of a comprehensive investigation (Fig. 1). Data mining was applied to the categorical variables to generate association rules that meet the minimum confidence. For each monitoring point, 10 association rules with 100% confidence were selected for analysis. Decision tree C5.0 models were then established to analyze the thresholds of the governing factors. The results show that this data mining approach achieves high accuracy on the monitoring data. It is therefore expected to be widely applicable to the data analysis and prediction of accumulation landslides in the reservoir area.

Apriori algorithm
The two-step clustering algorithm first converts the landslide monitoring data into categorical variables (Fig. 2). The Apriori algorithm is then applied to these variables to mine the association rules of the monitoring data. The Apriori algorithm, proposed by Agrawal and Srikant (Agrawal et al. 1993), consists of two main steps:

(1) Generate the frequent itemsets whose support is above the minimum support.

(2) From these itemsets, generate the association rules whose confidence is above the minimum confidence (Guo et al. 2019).

A sketch of the Apriori algorithm is displayed in Fig. 3. The algorithm first searches for the frequent itemset $L_1$ of length 1. It uses $L_1$ to produce the frequent itemset $L_2$ of length 2, and iterates until no higher-order frequent itemset can be found.
After the frequent itemsets have been generated, valid rules are kept only if their confidence exceeds the confidence threshold. Specifically, for each frequent itemset $L$ and each nonempty proper subset $L'$ of $L$, the confidence is calculated as

$$C(L' \Rightarrow L - L') = \frac{\mathrm{support}(L)}{\mathrm{support}(L')}.$$

If this confidence is greater than a user-specified confidence threshold, the association rule $L' \Rightarrow (L - L')$ is generated.
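The two Apriori steps can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study. The transactions below are hypothetical categorical monthly records (factor states plus a velocity class), and the function name `apriori` is chosen for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support, min_confidence):
    """Return (frequent itemsets with support, rules) for a list of item sets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # fraction of transactions containing every item of the itemset
        return sum(itemset <= t for t in sets) / n

    # Step 1: frequent itemsets, starting from L1 and growing one item at a time
    items = {i for t in sets for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # join L_{k-1} with itself to form length-k candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}

    # Step 2: rules L' => L - L' whose confidence supp(L) / supp(L') passes the threshold
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return frequent, rules


# Hypothetical monthly records: discretized factor states plus the velocity class
transactions = [
    {"drop=sharp", "level=low", "V2"},
    {"drop=sharp", "level=low", "V2"},
    {"drop=mild", "level=high", "V1"},
    {"drop=sharp", "level=high", "V1"},
    {"drop=mild", "level=low", "V1"},
]
frequent, rules = apriori(transactions, min_support=0.3, min_confidence=0.8)
for lhs, rhs, supp, conf in rules:
    if rhs == {"V2"}:
        # prints: ['drop=sharp', 'level=low'] => ['V2'] support=0.40 confidence=1.00
        print(sorted(lhs), "=>", sorted(rhs), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

The rule {drop=sharp} ⇒ {V2} is frequent but is rejected, because its confidence is only 0.4/0.6 ≈ 0.67. The confidence threshold therefore filters out rules that frequent itemsets alone would admit.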

Decision tree C5.0
The decision tree is a prevalent classification algorithm and decision support method. It classifies samples by following a set of rules organized as a tree diagram. In classification and prediction, the nodes of the tree are the sample attributes chosen for splitting, while the branches are attribute values (Pandya and Pandya 2015). There are many decision tree algorithms, and C5.0 is a commonly used one. C5.0 builds a decision tree model by analyzing and summarizing the properties of a large number of samples, combined with the principles of information theory (Zhang et al. 2017). For a given node n, let N be the entire set of samples, C the set of target classes, and t the number of classes.
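The information entropy of the samples at node n is

$$\mathrm{Info}(N) = -\sum_{i=1}^{t} p(C_i|N)\,\log_2 p(C_i|N),$$

where $p(C_i|N)$ is the relative probability of class $C_i$ ($i = 1, 2, 3, \ldots, t$) in N. If an attribute A partitions N into subsets $N_1, \ldots, N_v$, the expected information after the split is

$$\mathrm{Info}_A(N) = \sum_{j=1}^{v} \frac{|N_j|}{|N|}\,\mathrm{Info}(N_j),$$

and the information gain can be expressed as

$$\mathrm{Gain}(A) = \mathrm{Info}(N) - \mathrm{Info}_A(N).$$

The tree grows by splitting each node on the attribute with the largest information gain ratio:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, \qquad \mathrm{SplitInfo}(A) = -\sum_{j=1}^{v} \frac{|N_j|}{|N|}\log_2\frac{|N_j|}{|N|}.$$

Determining the optimal split point is also essential for growing the tree. The procedure has the following steps:

(1) Sort the values of the input variable in ascending order.

(2) Define initial intervals so that each value of the input variable falls in a separate interval.

(3) Calculate the cross-tabulated frequency table of the input variable and the output variable.

(4) Calculate the chi-square value of each pair of adjacent intervals.

(5) Obtain the chi-square threshold from the significance level and the degrees of freedom. If the pair of adjacent intervals with the smallest chi-square value is below this threshold, merge the pair.

(6) Repeat steps (3) to (5) until no pair of adjacent intervals can be merged.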
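Steps (1) to (6) correspond closely to the ChiMerge discretization procedure. The sketch below is a minimal illustration under several assumptions. It takes a precomputed chi-square critical value instead of deriving one from the significance level; 3.841 is the 5% critical value for one degree of freedom, i.e. two classes. The values and labels are hypothetical, and `chimerge` and `chi_square` are illustrative names.

```python
def chi_square(a, b):
    """Pearson chi-square statistic for the class counts of two adjacent intervals."""
    classes = set(a) | set(b)
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * (a.get(c, 0) + b.get(c, 0)) / total
            stat += (row.get(c, 0) - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Return the lower bounds of the merged intervals (the first is the minimum)."""
    # (1)-(2): sort and start with one interval per distinct value
    intervals = []  # each entry: [lower_bound, {class: count}]
    for v, y in sorted(zip(values, labels)):
        if not intervals or intervals[-1][0] != v:
            intervals.append([v, {}])
        counts = intervals[-1][1]
        counts[y] = counts.get(y, 0) + 1
    # (3)-(6): repeatedly merge the adjacent pair with the smallest chi-square
    while len(intervals) > 1:
        stats = [chi_square(intervals[i][1], intervals[i + 1][1])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # every adjacent pair differs significantly
        merged = dict(intervals[i][1])
        for c, n in intervals[i + 1][1].items():
            merged[c] = merged.get(c, 0) + n
        intervals[i:i + 2] = [[intervals[i][0], merged]]
    return [lower for lower, _ in intervals]


# Hypothetical daily-drop values (m/d) labelled with the velocity class
drops = [0.05, 0.10, 0.15, 0.20, 0.45, 0.50, 0.60, 0.70]
classes = ["V1"] * 4 + ["V2"] * 4
print(chimerge(drops, classes, threshold=3.841))  # prints [0.05, 0.45]
```

Adjacent intervals of the same class have a chi-square value of zero, so they merge first. The final V1 and V2 blocks (chi-square = 8) stay separate, which leaves a single cut point at 0.45.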
Boosting can improve the robustness of the decision tree C5.0 algorithm. It consists of two phases: modelling and model voting. In the modelling phase, boosting creates new training sets by repeatedly resampling the existing weighted samples. Suppose the whole process is iterated k times and the training sample set T has N samples. The modelling process can then be expressed as follows.

(1) Initialize the sample weights:

$$w_j(1) = \frac{1}{N}, \quad j = 1, 2, \ldots, N,$$

where $w_j(i)$ denotes the weight of the jth sample in the ith iteration.
(2) According to $w_j(i)$, form the training sample set $T_i$ by drawing N samples from T with replacement.

(3) Obtain the model $C_i$ from $T_i$ and calculate its error $e(i)$.

(4) End the modelling process when $e(i) > 0.5$ or $e(i) = 0$. Otherwise, update the weight of each sample according to the error. The weight of each correctly classified sample is updated as

$$w_j(i+1) = w_j(i)\,\beta(i), \qquad \beta(i) = \frac{e(i)}{1 - e(i)},$$

while the weight of each misclassified sample remains the same:

$$w_j(i+1) = w_j(i).$$

The weights are then normalized:

$$w_j(i+1) \leftarrow \frac{w_j(i+1)}{\sum_{l=1}^{N} w_l(i+1)}.$$

(5) Repeat steps (2) to (4) to obtain k models and k errors.
In the voting phase, the voting process for a new sample set X can be summarized as follows.

(1) For each model $C_i$ ($i = 1, 2, 3, \ldots, k$), determine a predicted value $C_i(X)$ and a weight

$$W_i(X) = \log\frac{1 - e(i)}{e(i)}.$$

The k models thus give k predictions $C_i(X)$ and k weights $W_i(X)$.
(2) Sum the weights separately for each category. The category with the largest total weight is the final classification of X.

Combining cross-validation with boosting further improves the generalization ability of the model and helps prevent overfitting.
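The modelling and voting phases can be sketched as an AdaBoost.M1-style loop with weighted resampling. This is an illustration with several assumptions, not C5.0's internal code:

- The weak learner `fit_stump`, a hypothetical one-split classifier, stands in for the C5.0 tree.
- The update factor β(i) = e(i)/(1 − e(i)) and the voting weight log(1/β(i)) follow standard AdaBoost.M1.
- When e(i) = 0, the model is kept with a large finite weight instead of an infinite one.

```python
import math
import random


def fit_stump(X, y):
    """Hypothetical weak learner: the best single-feature threshold rule."""
    classes = sorted(set(y))
    if len(classes) == 1:
        return lambda row, c=classes[0]: c
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for lo in classes:
                for hi in classes:
                    if lo == hi:
                        continue
                    err = sum((hi if row[f] > t else lo) != yy for row, yy in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, f, t, lo, hi)
    _, f, t, lo, hi = best
    return lambda row: hi if row[f] > t else lo


def boost(X, y, k=10, seed=0):
    """Modelling phase: up to k models built from weighted resamples of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                                    # (1) w_j(1) = 1/N
    models = []
    for _ in range(k):
        idx = rng.choices(range(n), weights=w, k=n)      # (2) draw N with replacement
        model = fit_stump([X[j] for j in idx], [y[j] for j in idx])   # (3)
        wrong = [model(X[j]) != y[j] for j in range(n)]
        e = sum(wj for wj, bad in zip(w, wrong) if bad)  # weighted error on T
        if e > 0.5:                                      # (4) stop, discard model
            break
        beta = max(e, 1e-10) / (1.0 - e)                 # floor avoids log(1/0)
        models.append((model, math.log(1.0 / beta)))     # voting weight W_i
        if e == 0:                                       # (4) stop after a perfect model
            break
        # correct samples are down-weighted by beta; misclassified ones keep their weight
        w = [wj if bad else wj * beta for wj, bad in zip(w, wrong)]
        total = sum(w)
        w = [wj / total for wj in w]                     # normalize
    return models


def vote(models, x):
    """Voting phase: sum the weights per predicted class and return the largest."""
    totals = {}
    for model, weight in models:
        c = model(x)
        totals[c] = totals.get(c, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical one-feature data: monthly max daily drop (m/d) and velocity class
X = [[i / 40] for i in range(40)]
y = ["V1"] * 20 + ["V2"] * 20
models = boost(X, y, k=10)
print(len(models), vote(models, [0.0]), vote(models, [0.975]))
```

Resampling instead of reweighting inside the learner mirrors step (2) of the text. The stop at e(i) > 0.5 discards a model that is worse than chance on the weighted training set.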

Geological characteristics of Shuping landslide
The Shuping landslide trends generally north–south. It is located on the right bank of the Yangtze River, about 47 km from the Three Gorges Dam, and its toe extends directly into the Yangtze River. The sliding zone is steep in the upper part and gentle in the lower part, and the landslide is high in the south and low in the north. Two gently sloping platforms lie at elevations of 170–200 m and 310–340 m, respectively. The Shuping landslide is divided into eastern and western parts, and the eastern part is called the active area, as shown in Fig. 4. The elevation of the shear outlet is 80 m, the area is about 55 × 10⁴ m², and the total volume is 2.89 × 10⁷ m³. The Shuping landslide mainly consists of the sliding mass, the sliding zone and the sliding bed. The sliding mass is mainly silty clay and crushed-stone soil. The slip zone is buried at a depth of 30–70 m (average 50 m). It is the contact zone between the accumulation layer and the bedrock, and it consists mainly of silty clay. The sliding bed consists of purple-red and grayish-green medium-thick bedded siltstone interbedded with mudstone, together with light gray and grayish-yellow medium-thick bedded limestone and marl, all of the Middle Triassic Badong Formation (T₂b). The rock strata have a dip direction of 170° and a dip angle of 12–15°, as shown in Fig. 5.

Deformation of the landslide
The Shuping landslide is a historical landslide accumulation body located in the folded mountain area of western Hubei. This area is characterized by high rainfall intensity, long rainfall duration and frequent rainstorms, and the terrain is relatively low-lying. Under the influence of rainfall, local cracks therefore expand more rapidly. As rainfall continues, multiple structural planes can link up to form a continuous sliding surface, while rainfall reduces the cohesion and internal friction angle of the rock and soil. Together, these factors promote the continuous deformation of the landslide. Before 1996, many arc-shaped tensile cracks appeared at the toe of the Shuping landslide and forced the relocation of more than 60 residents; this damage was mainly caused by local deformation. The Three Gorges Dam was completed and impoundment began in June 2003. Since then, the front edge of the Shuping landslide has been affected by reservoir water and rainfall, and the deformation has mainly taken the form of local surface collapse. From 2004 to 2007, many shear cracks, tension cracks and intermittent pinnate cracks appeared on the east and west sides of the landslide. They cracked the walls of many buildings and forced 85 residents of 25 households to move. In August 2008, the Shuping landslide was affected by heavy rainfall. The cracks on the eastern boundary of the trailing edge continued to develop, and many settlement cracks and small-scale collapses began to appear. From May to July after 2009, the main sliding area of the Shuping landslide formed, and small-scale collapse and loosening occurred around it. The original cracks at the landslide boundary continued to extend and widen under external forces, and many of them became connected (Fig. 6). At present, the landslide is still in the creep deformation stage, and severe deformation or sliding may occur under extreme conditions.

Analysis of the monitoring data
A series of GPS-based displacement monitoring devices has been installed in various parts of the Shuping landslide since 2003. In particular, 8 monitoring devices were deployed in the active area, which provides a long-term, close record of the displacement state of the Shuping landslide. Correspondingly, the … However, the extremely high-intensity precipitation of July 2005 did not cause a sharp increase in landslide displacement. ZG89 and ZG90 in particular were only slightly affected, and SP-6 did not show any displacement change at all.
(2) Metaphase (until June 2008): During this phase, the landslide deformation began to increase steadily in a "step-like" manner. Specifically, the fluctuation of the reservoir level weakened the rock and soil structure inside the landslide, so the displacement rose rapidly whenever the reservoir level dropped sharply. The displacement of monitoring points such as ZG86, ZG88 and SP-2 increased by more than 1000 mm.

(3) Anaphase (until June 2014): The highest water level of the reservoir rose again during this period, reaching a maximum of 175 m. In addition, the periodic rise and fall of the reservoir water caused the landslide displacement to keep increasing in a "step-like" manner. The data of all monitoring points during this period indicate that the displacements were still increasing at a fairly steady rate. The displacements of ZG85, ZG86, ZG88, SP-2 and SP-6 were clearly greater than those of the other monitoring points.

In this paper, the data of monitoring points ZG85, ZG86, ZG88, SP-2 and SP-6 are therefore selected as the research object. ZG85 and ZG88 are located at the toe of the landslide, SP-2 in the middle, and ZG86 and SP-6 at the rear.

Physical mechanism of Shuping landslide deformation
The Shuping landslide is an old landslide with a long history. Its distinctive landform, lithology and geological structure are the internal causes of the landslide, and they control its formation and development. Fluctuation of the reservoir level reactivates the landslide, and rainfall accelerates its deformation; these external factors act as important triggers. The physical mechanism of the Shuping landslide deformation can therefore be analyzed from two aspects: internal factors and external factors.
(1) Internal factors. The occurrence of landslides mainly depends on slide-prone strata, such as layered rock formations containing weak intercalated surfaces or layers (zones). The Badong Formation (T₂b) is a typical slide-prone stratum that is widespread throughout the Three Gorges Reservoir area. The Shuping landslide develops in these Middle Triassic strata. They consist of purple-red and gray-green medium-thick layered siltstone intercalated with mudstone, and gray to light gray medium-thick layered mudstone mixed with shale. The rock mass of the Shuping landslide is made up of alternating soft and hard layers, with weak, weathered mudstone layers sandwiched between them. These layers have a high content of hydrophilic minerals, so they are easily softened by water and can be classified as slide-prone rocks. In short, the weak strata in the Shuping landslide area lay the material foundation for the deformation and failure of the slope. Specifically, gravity causes the weak base at the lower part of the slope to creep. This pulls on the upper slope and produces tensile cracking, which gradually develops into large-scale deformation. In addition, the regional structure has produced highly developed fractures in the rock mass of the Shuping landslide area. The most extensive fissures dip toward NNE and NNW, almost the same direction as the slope aspect, and they provide cutting planes for the formation of the landslide. Meanwhile, gently dipping fractures inclined out of the slope favor the formation of the basal slip surface. Briefly, these internal geological factors jointly control the formation and development of the Shuping landslide.
(2) External factors. When the reservoir level rises, the water exerts back pressure on the slope, which benefits landslide stability. Conversely, during reservoir drawdown the groundwater drains slowly because of the poor permeability of the slope. The decline of the groundwater level therefore lags behind that of the reservoir water level, creating a positive head difference between the groundwater and the reservoir. The resulting seepage force points out of the slope, which is unfavorable to landslide stability. Meanwhile, because the groundwater level in the slope lags behind, most of the slope remains water-saturated with a relatively large unit weight. The stability of the slope therefore decreases sharply and shows a hysteresis effect. In particular, a rapid fall of the water level induces a high landslide deformation rate, producing the deformation characteristics of a weakly permeable, hysteretic landslide. The decline of the reservoir level can thus be regarded as the main factor leading to the deformation of the Shuping landslide, which is a typical drawdown-type reservoir landslide. Rainfall also strongly influences the deformation of the landslide. Rainfall infiltration increases the self-weight of the slope and generates pore water pressure, which softens the slip zone soil and reduces its mechanical parameters. A comparison shows an obvious correlation between landslide deformation and seasonal rainfall. The landslide deformation curve has a step every year, and each step coincides with the annual period of concentrated rainfall. This indicates that atmospheric rainfall is also one of the main factors in landslide deformation.
Both the drop of the reservoir water level and atmospheric rainfall have thus driven the deformation of the Shuping landslide, and deformation is more likely when the two factors are superimposed. From 2007 to 2013, the Three Gorges Reservoir water level was lowered while the flood season brought rainfall after April each year. This combined influence caused the landslide deformation to increase sharply. The decline of the reservoir water level reduces the stability of the landslide and intensifies the deformation. Moreover, if a marked rainfall event occurs during the decline, the effect on stability is even more adverse and the deformation is further aggravated.
The evidence suggests that reservoir water level fluctuation and rainfall are the main factors affecting the deformation of the landslide. Therefore, a total of 9 hydrological factors from these two categories are selected for data mining analysis, as shown in Table 1.

The factors related to the reservoir water level are:
- monthly average water level ($\bar{h}$, m);
- monthly maximum daily drop of water level ($\Delta h_d^m$, m/day);
- monthly maximum daily rise of water level ($\Delta h_r^m$, m/day);
- monthly variation of water level ($\Delta h_1$, m/month);
- bi-monthly variation of water level ($\Delta h_2$, m/month).

The factors related to rainfall are:
- monthly maximum continuous rainfall ($q_c^e$, mm);
- monthly cumulative rainfall ($q_1$, mm);
- bi-monthly cumulative rainfall ($q_2$, mm);
- monthly maximum daily rainfall ($q_d^m$, mm).

Note that the monthly variation of water level is the difference between the highest and lowest reservoir water levels in each month. The monthly maximum effective continuous rainfall is the cumulative rainfall of a continuous rainfall process.
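The sketch below shows one way the nine factors might be derived from daily reservoir level and rainfall records. Several definitions are assumptions not fixed by the text:

- The daily drop or rise is the difference from the previous day.
- The bi-monthly factors use the current and previous month present in the data.
- $q_c^e$ is taken as the largest sum over a run of consecutive rainy days within the month, with no "effective" weighting.

The dictionary keys are hypothetical names.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_factors(days):
    """days: (date, level_m, rain_mm) tuples for consecutive calendar days, sorted."""
    by_month = defaultdict(list)
    for i, (d, level, rain) in enumerate(days):
        change = level - days[i - 1][1] if i > 0 else None  # daily level change
        by_month[(d.year, d.month)].append((level, rain, change))

    months = sorted(by_month)
    factors = {}
    for k, m in enumerate(months):
        recs = by_month[m]
        prev = by_month[months[k - 1]] if k > 0 else []  # assumed: previous month
        levels = [r[0] for r in recs]
        rains = [r[1] for r in recs]
        changes = [r[2] for r in recs if r[2] is not None]
        levels_2m = levels + [p[0] for p in prev]
        # assumed q_c^e: largest total over a run of consecutive rainy days
        best_run = run = 0.0
        for r in rains:
            run = run + r if r > 0 else 0.0
            best_run = max(best_run, run)
        factors[m] = {
            "h_mean": sum(levels) / len(levels),                # h-bar
            "dh_drop_max": max([-c for c in changes] + [0.0]),  # Δh_d^m
            "dh_rise_max": max(changes + [0.0]),                # Δh_r^m
            "dh_1": max(levels) - min(levels),                  # Δh_1
            "dh_2": max(levels_2m) - min(levels_2m),            # Δh_2
            "q_c": best_run,                                    # q_c^e
            "q_1": sum(rains),                                  # q_1
            "q_2": sum(rains) + sum(p[1] for p in prev),        # q_2
            "q_d_max": max(rains),                              # q_d^m
        }
    return factors


# Hypothetical five days spanning the end of May and the start of June
start = date(2010, 5, 30)
levels = [150.0, 149.6, 149.0, 149.2, 148.5]
rains = [0.0, 12.0, 30.0, 0.0, 5.0]
days = [(start + timedelta(days=i), levels[i], rains[i]) for i in range(5)]
june = monthly_factors(days)[(2010, 6)]
print(round(june["dh_drop_max"], 3), round(june["q_2"], 1))  # prints 0.7 47.0
```

The function assumes gap-free daily records. With missing days, the "previous day" and "previous month" would need to be looked up by date instead of by position.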

Clustering results
The 9 hydrological factors were clustered with the two-step algorithm, as shown in Table 2. The number of clusters was allowed to range from 2 to 10. The Euclidean distance was used as the distance measure, and the Bayesian Information Criterion (BIC) was used as the clustering criterion.
The monthly velocity was clustered into two groups, V1 and V2 (Table 3), and Fig. 8 displays the counts of the clustering results at each monitoring site. This paper focuses on the triggering factors of each part of the landslide. The analysis therefore concentrates on the second cluster, V2, which has the higher monthly velocity. Monitoring points ZG85, ZG86, ZG88, SP-2 and SP-6, at different positions of the landslide, were selected for study.

Association criteria mining
In association rule mining, confidence and support are usually used to control the quantity and accuracy of the rules. For a rule A ⇒ B, support is the proportion of all transactions that contain both A and B (Support = P(A&B)). Confidence is the proportion of transactions containing A that also contain B (Confidence = P(A&B)/P(A)). There are no fixed values for the minimum support and confidence; they should be chosen according to the training data and the application scenario. In this paper, the parameters of the Apriori algorithm were set as follows: the confidence threshold was 80%, and the support threshold was 1.5%. The deformation rate in the V2 stage ranged from 68.87 to 399.45 mm/day, so we focused on the triggering factors of landslide deformation at this stage. We therefore screened and listed several typical deformation association rules at the V2 stage for the five monitoring points ZG85, ZG86, ZG88, SP-2 and SP-6. Their elevations are 190 m, 270 m, 190 m, 220 m and 310 m, respectively.
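The support and confidence definitions above can be checked on a handful of hypothetical monthly records, together with the thresholds used in this paper (support 1.5%, confidence 80%). The state labels and helper names below are illustrative only.

```python
def support(records, items):
    """Support = P(A & B): share of records containing all given items."""
    items = set(items)
    return sum(items <= r for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """Confidence = P(A & B) / P(A)."""
    s_a = support(records, antecedent)
    return support(records, set(antecedent) | set(consequent)) / s_a if s_a else 0.0


MIN_SUPPORT = 0.015      # 1.5 %, the support threshold used in this paper
MIN_CONFIDENCE = 0.80    # 80 %, the confidence threshold used in this paper


def keep_rule(records, antecedent, consequent):
    """True if the rule A => B passes both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(records, both) >= MIN_SUPPORT
            and confidence(records, antecedent, consequent) >= MIN_CONFIDENCE)


# 20 hypothetical monthly records (sets of categorical states)
records = (
    [{"drop=sharply-drop", "h=low", "V2"}] * 2
    + [{"drop=sharply-drop", "h=high", "V1"}]
    + [{"drop=slow-drop", "h=low", "V1"}]
    + [{"drop=slow-drop", "h=high", "V1"}] * 16
)
print(keep_rule(records, {"drop=sharply-drop", "h=low"}, {"V2"}))   # prints True
print(keep_rule(records, {"drop=sharply-drop"}, {"V2"}))            # prints False
```

Both rules easily pass the 1.5% support threshold (support 0.10). The second rule fails on confidence, because only 2 of the 3 sharply-drop months reach V2.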
The association rules of ZG85 are displayed in Table 4. Reservoir level factors dominated the antecedents of these rules. Rule 1, the association rule for the slope monitoring data at V2, involved the monthly maximum daily rise of water level. When this factor was in the sharply-rise state and the monthly variation of water level was in the sharply-variation state, the landslide entered a period of rapid displacement. Similarly, Rules 2–5 indicated that the monthly velocity rose to V2 when the monthly maximum daily drop of water level was in the sharply-drop state. Rules 6–10 showed that the monthly and bi-monthly variations of water level played a crucial role among the reservoir water triggering factors. A sharply-variation state of either variable increased the likelihood of rapid deformation, particularly in combination with factors such as a sharp rise of the water level.
As shown in Table 5, Rules 1–7 of ZG86 indicated that the large deformation stage was more likely to occur under heavy continuous rainfall (heavy-continuous-rainfall), while reservoir water fluctuation mattered less. Rules 8–10 also indicated that V2 tended to appear when the bi-monthly cumulative rainfall was in the heavy state. These association rules therefore indicated that the middle and rear part of the landslide (Fig. 5) was dominated almost entirely by rainfall.

The association rules of ZG88 are shown in Table 6. As with ZG85, Rules 1–5 and Rules 6–9 indicated that the landslide deformation rate tended to rise to V2 in either of two cases: the monthly maximum daily rise of water level was in the sharply-rise state, or the monthly maximum daily drop of water level was in the sharply-drop state. In addition, Rule 10 indicated that when the bi-monthly variation of water level reached the sharply-variation state, V2 could also be reached in combination with other factors related to rising reservoir water. ZG88 and ZG85 are at the same elevation (Fig. 5), yet the monitoring data showed that ZG88, located on the western part of the landslide, deformed more than ZG85 (Fig. 7). This suggests that, among the reservoir water level triggering factors, the maximum daily drop of water level was the most likely to cause large displacements.

Table 7 presents the association rules of SP-2. Rule 1 can be interpreted as follows: when the monthly continuous rainfall and the maximum daily rise of water level both reached their heavy states, the displacement rate of the middle of the landslide reached the V2 stage. Similarly, Rules 2–5 indicated that a heavy monthly cumulative rainfall aggravated the landslide deformation together with other triggering factors, such as the monthly variation of water level.
Rules 6–10 showed that the landslide was likely to deform severely under the combined effect of rainfall and reservoir water level fluctuation. Examples include a heavy bi-monthly cumulative rainfall (Heavy-Cumulative-Rainfall) together with a medium or sharp monthly maximum daily rise of water level (Medium/Sharply-Rise). SP-6, at an elevation of 310 m (higher than ZG86 at 270 m), is the monitoring point closest to the rear of the landslide. Table 8 shows the association rules of SP-6. Rules 1–4 indicated that the landslide tended to deform at a high rate (V2) under two conditions. The first was a monthly maximum continuous rainfall between 1.25 and 2.39 m (heavy-continuous-rainfall). The second was a monthly maximum daily rainfall in the interval (0.52, 0.95), i.e., medium-daily-rainfall.
Overall, comparing the association rules of ZG85, SP-2, ZG86 and SP-6 from the toe to the rear of the landslide shows that the governing factors of the toe, middle and rear trade off against each other. The influence of the reservoir level factors decreased as that of the rainfall-related factors increased. Specifically, the reservoir level factors played the leading role at the toe of the landslide. The rear of the landslide, in contrast, was jointly controlled by the reservoir water level and rainfall, and rainfall was the key factor causing its deformation.

Threshold values analysis
80% of the data were used as training samples to build the decision tree model, and the remaining 20% were used as testing samples to check the accuracy of the model. To improve the generalization ability of the model and prevent overfitting, this paper combines cross-validation with boosting when building the decision tree C5.0 model. The number of boosting trials is set to 10, the number of cross-validation folds to 10, and the expected noise to 10%. The resulting model, shown in Fig. 9, contains 6 hydrological factors ($\bar{h}$, $\Delta h_r^m$, $\Delta h_2$, $q_c^e$, $q_1$, $q_d^m$). In this C5.0 model, the accuracy of total sample identification reached 94.53%.

The V2 mode deserves particular attention, so the nodes in the V2 state were selected for analysis. This model has 5 nodes in the V2 state, as shown in Table 9. Criterion 4 consists only of a reservoir water factor, while the remaining four criteria combine reservoir water and rainfall factors. Criterion 4 means that when the water level is 138.951–147.437 m and the daily drop of water level exceeds 0.416 m/d, the landslide enters the second stage (V2). This criterion is the most important one, accounting for 58.02% of all V2 state points. Therefore, during the low water level period, 0.416 m/d can be identified as the threshold of daily drop of water level for severe landslide deformation. However, this does not mean that the landslide cannot deform strongly when the daily drop of water level is below 0.416 m/d. Criterion 5 indicates that, under the same reservoir water level condition, the landslide may still enter the V2 state as long as the rainfall meets certain conditions ($q_d^m < 95.8$ mm).
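Thresholds such as 0.416 m/d arise when a decision tree splits a continuous attribute. The sketch below shows the basic idea: it chooses the split with the largest information gain ratio over candidate midpoints between sorted values. It is a simplification. C5.0's exact threshold placement, boosting, pruning and noise handling are not reproduced, and the drop values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(N) = -sum p(C_i|N) log2 p(C_i|N)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Binary split x <= t vs x > t maximising the information gain ratio.

    Candidate thresholds are midpoints between consecutive distinct sorted values
    (an assumption; C5.0's exact placement rule may differ).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_ratio = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info
        if ratio > best_ratio:
            best_t, best_ratio = (pairs[i - 1][0] + pairs[i][0]) / 2, ratio
    return best_t, best_ratio


# Hypothetical monthly maximum daily drops (m/d) and velocity classes
drops = [0.05, 0.12, 0.20, 0.28, 0.52, 0.61, 0.75]
velocity = ["V1", "V1", "V1", "V1", "V2", "V2", "V2"]
t, ratio = best_threshold(drops, velocity)
print(round(t, 3), round(ratio, 3))  # prints 0.4 1.0
```

A gain ratio of 1.0 means the split separates V1 from V2 perfectly. In real monitoring data the classes overlap, so the best ratio is lower, and the tree may add further splits (e.g. on water level) above or below the threshold.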

Discussion
Note that the model in Fig. 9 pooled the data of all 5 displacement monitoring points of the Shuping landslide, which lie in different parts of the landslide: ZG85 and ZG88 at the toe, SP-2 in the middle, and ZG86 and SP-6 at the rear.
The accuracy of total sample identification reached 94.53%. Even so, the states at some nodes may be misidentified because the governing factors differ among monitoring points in different parts of the landslide. For instance, Node 18 contains 9 points in the V1 state and 14 points in the V2 state. It was therefore necessary to establish separate decision tree models for the monitoring points at different locations in order to identify the controlling factors of deformation accurately. The decision tree C5.0 model of each monitoring point is shown in Fig. 10. The three monitoring points in the middle and at the toe of the landslide (ZG85, ZG88, SP-2) were controlled only by reservoir water factors. The two monitoring points at the rear (ZG86, SP-6) were jointly controlled by reservoir water and rainfall factors, which made their decision tree models much more complex. Because they include multiple factors related to both reservoir level and rainfall, these two models had the highest accuracies, 97.73% and 94.93%, respectively. The two monitoring points at the toe were controlled only by the reservoir water level, and their model accuracies exceeded 90%. Notably, the decision tree model of SP-2 in the middle of the landslide was the simplest, but it was also the least accurate, at only 88.41%. This is because the middle of the landslide lies in the transition zone from reservoir water control to rainfall control. There, the control effect of reservoir-related factors gradually weakens, while that of rainfall-related factors gradually strengthens. In other words, the governing factors of landslide deformation in this area are not clear-cut.
To quantify the main control factors at different locations on the landslide, three monitoring points (toe: ZG85; middle: SP-2; rear: ZG86) were selected, and the contribution of each factor in their decision tree models was computed, as shown in Fig. 11. The daily drop of water level was clearly the most important factor causing the deformation of the Shuping landslide. ZG85 at the toe was affected only by $\Delta h_{rm}$ and $h$, with contribution degrees of 0.58 and 0.42 respectively. When the reservoir water level was between 138.951 and 147.072 m and the daily drop of water level exceeded 0.315 m/d, the toe would enter the second stage (V2). Similarly, SP-2 in the middle of the landslide was affected by $\Delta h_{rm}$ and $h$, with contribution degrees of 0.77 and 0.23 respectively. When the reservoir water level was below 155.552 m and the daily drop of water level exceeded 0.416 m/d, the middle of the landslide would enter the second stage (V2). The deformation of ZG86 at the rear was controlled by six factors ($\Delta h_{dm}$, $q_{dm}$, $h$, $q_{ec}$, $q_2$, $\Delta h_2$), with contribution degrees of 0.28, 0.23, 0.18, 0.14, 0.09 and 0.09 respectively. The three reservoir-level factors ($\Delta h_{dm}$, $h$, $\Delta h_2$) contributed 0.55 in total, and the three rainfall factors ($q_{dm}$, $q_{ec}$, $q_2$) contributed 0.46 (the two totals sum to 1.01 because of rounding). When the reservoir water level was between 138.951 and 147.437 m and the daily drop of water level exceeded 0.388 m/d, the rear would enter the second stage (V2) once the bimonthly cumulative rainfall exceeded 254.2 mm. Therefore, when the reservoir water level is low and the daily drop of water level reaches 0.315 m/d, the deformation of the toe accelerates.
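C5.0 selects each split with an entropy-based criterion, the information gain ratio, and for a numeric factor it searches over candidate cut points; this is how a tree produces thresholds such as 0.315 m/d. The following is a minimal pure-Python sketch of that search for a single factor, assuming midpoints between consecutive sorted values as candidate cuts and a binary V1/V2 label. The function names and the toy daily-drop data are hypothetical, not Shuping monitoring data, and the sketch omits pruning, boosting and the other refinements of the actual C5.0 implementation.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels such as 'V1'/'V2'."""
    n = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def best_threshold(values, labels):
    """Find the cut t on one numeric factor that maximizes the gain ratio
    of the binary split (value <= t) versus (value > t).

    Returns (t, gain_ratio); t is None if no valid cut exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best_t, best_ratio = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a cut
        t = (lo + hi) / 2  # candidate cut: midpoint (a simplifying assumption)
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        if not left or not right:
            continue  # guard against degenerate splits
        w_l, w_r = len(left) / n, len(right) / n
        # information gain of the split
        gain = base - w_l * entropy(left) - w_r * entropy(right)
        # split information penalizes unbalanced splits
        split_info = -(w_l * math.log2(w_l) + w_r * math.log2(w_r))
        ratio = gain / split_info
        if ratio > best_ratio:
            best_t, best_ratio = t, ratio
    return best_t, best_ratio

# Hypothetical toy data: daily drop of water level (m/d) and deformation state.
drops = [0.05, 0.10, 0.20, 0.25, 0.35, 0.40, 0.50, 0.60]
states = ["V1", "V1", "V1", "V1", "V2", "V2", "V2", "V2"]
cut, ratio = best_threshold(drops, states)
print(f"cut = {cut:.3f} m/d, gain ratio = {ratio:.3f}")
```

With this toy data the best cut falls at 0.3 m/d with a gain ratio of 1.0, because that cut separates V1 from V2 perfectly. A full tree applies the same search recursively within each branch (for example, within a reservoir-level interval), which yields conjunctive rules of the kind reported for ZG85, SP-2 and ZG86.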
When the daily drop of water level exceeds 0.416 m/d, the middle of the landslide undergoes accelerated deformation. Early warning for the rear of the landslide must combine several rainfall and reservoir-level factors. In this paper, nine influencing factors related to reservoir water level and rainfall were selected to build the decision tree models ($h$, $\Delta h_{dm}$, $\Delta h_{rm}$, $\Delta h_1$, $\Delta h_2$, $q_{ec}$, $q_1$, $q_2$, $q_{dm}$). However, Fig. 7 suggests that these nine hydrological factors may be correlated in time, so the two most unfavorable factors may not appear together in the established decision tree models. Based on the data mining and decision tree results, the Shuping landslide was prone to strong deformation from May to June every year. The drop rate of the water level therefore needs to be strictly controlled during this period. In case of heavy rainfall from May to June, the deformation of the landslide should be monitored in real time and early warnings issued to prevent a large-scale landslide disaster.
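The part-specific V2 rules reported for ZG85 (toe), SP-2 (middle) and ZG86 (rear) can be written as simple checks for an early-warning screen. The sketch below is a minimal illustration under stated assumptions: the class, field and function names are hypothetical; a single "daily drop" input stands in for whichever water-level drop factor each tree actually splits on; inclusive interval boundaries are assumed; and because only these V2 paths are reported, a rule that does not fire does not guarantee state V1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HydroState:
    """Hypothetical daily input record (field names are illustrative)."""
    level_m: float       # reservoir water level h (m)
    daily_drop_m: float  # daily drop of water level (m/d); one variable assumed for all parts
    rain_2m_mm: float    # bimonthly cumulative rainfall (mm)

def toe_v2(s: HydroState) -> bool:
    # ZG85 rule: 138.951 <= h <= 147.072 m and daily drop > 0.315 m/d
    return 138.951 <= s.level_m <= 147.072 and s.daily_drop_m > 0.315

def middle_v2(s: HydroState) -> bool:
    # SP-2 rule: h < 155.552 m and daily drop > 0.416 m/d
    return s.level_m < 155.552 and s.daily_drop_m > 0.416

def rear_v2(s: HydroState) -> bool:
    # ZG86 rule: 138.951 <= h <= 147.437 m, daily drop > 0.388 m/d,
    # and bimonthly cumulative rainfall > 254.2 mm
    return (138.951 <= s.level_m <= 147.437
            and s.daily_drop_m > 0.388
            and s.rain_2m_mm > 254.2)

RULES = (("toe", toe_v2), ("middle", middle_v2), ("rear", rear_v2))

def parts_in_v2(s: HydroState) -> list[str]:
    """Parts of the landslide whose reported V2 rule fires for this record."""
    return [part for part, rule in RULES if rule(s)]

print(parts_in_v2(HydroState(level_m=145.0, daily_drop_m=0.45, rain_2m_mm=300.0)))
print(parts_in_v2(HydroState(level_m=145.0, daily_drop_m=0.35, rain_2m_mm=100.0)))
```

The first record triggers all three rules, while the second (a 0.35 m/d drop with little rainfall) triggers only the toe rule, mirroring the finding that the toe responds to smaller drawdowns than the middle and rear.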

Conclusions
In this research, considering the spatial variability of governing factors across different parts of the landslide, data mining algorithms were used to identify the governing factors of the Shuping landslide and their thresholds. The following conclusions can be drawn:
(1) Under the joint influence of seasonal rainfall and periodic fluctuation of the reservoir water level, the displacement of the Shuping landslide showed a "step-like" trend. Taking rainfall and reservoir water level as hydrological factors has clear physical significance.
(2) The data mining results indicated that the governing factors differed among the toe, middle and rear of the landslide. Specifically, reservoir water level factors played the leading role at the toe, whereas the rear was jointly controlled by reservoir water level and rainfall.
(3) The daily drop of water level was the most important factor causing the deformation of the Shuping landslide. During periods of low water level, 0.416 m/d can be identified as the threshold daily drop of water level for severe deformation of the landslide.
(4) Based on the data mining and decision tree results, the Shuping landslide was prone to strong deformation from May to June every year. It is therefore necessary to monitor the deformation of the landslide in real time and issue early warnings to prevent a large-scale landslide disaster.