Temperature trend analysis and extreme high temperature prediction based on a weighted Markov model in Lanzhou

In this study, temporal trend analysis was conducted on the annual and seasonal (quarterly) meteorological variables of Lanzhou from 1951 to 2016, and a weighted Markov model for predicting extremely high temperature was constructed. Several non-parametric methods were used to analyse the trends of the meteorological variables. Serial autocorrelation may affect the accuracy of trend tests. We therefore first tested each series for autocorrelation and, for autocorrelated series, carried out the trend analysis after removing the correlation. The results show that the maximum, minimum and average temperatures in Lanzhou all have a significant upward trend, and that this trend differs from season to season. In detail, the trend of the maximum temperature in summer is not significant, while the upward trend of the minimum temperature in winter is the most significant, which makes the "warm winter" phenomenon increasingly common. Finally, we construct a weighted Markov prediction model for extremely high temperature and find that its predictions are consistent with the actual situation.


Introduction
The trend analysis of temperature change is one of the main aspects of climate change research and an important means of understanding global climate change. The Intergovernmental Panel on Climate Change (IPCC) assessment points out two things. Global temperature has shown a marked trend since the 20th century, and the frequency of extreme weather and climate events has increased. Trend analysis of temperature change therefore plays an important role in detecting climate change.

In recent years, many scholars have studied temperature trends in different regions. The main aim of such research is to use trend analysis to determine the existence, direction and magnitude of changes in hydro-climatic data and to detect abrupt changes. Partal (2017) and Sharma et al. (2016) used trend analysis to explore the annual and multi-annual variation of temperature, rainfall and other data, and summarized the patterns of climate change. China's extreme temperature changes are generally consistent with global ones. He and Zhang (2005) analyzed climate change trends and characteristics along the Lancang River (China) using archival series of monthly air temperature and precipitation. Zhai and Pan (2003) studied extreme temperatures in northern China over the past 50 years. They concluded that both the extreme minimum and extreme maximum temperatures there tend to be warmer. In northwestern China, the extreme minimum and maximum temperatures have increased asymmetrically (Shi and Zhao 2014). Liu et al. (2016) analyzed hot weather and high-temperature days in summer in Gansu Province from 1981 to 2010. They found that the number of summer high-temperature days increased over those 30 years.
In general, trend analysis of time series data addresses two issues. The first is whether there is a significant trend and, if so, its direction and magnitude. The second is whether there is a break in the time series and when it occurs. Accordingly, current methods of trend analysis fall into two categories.

The first category is trend testing with non-parametric methods. Koutsoyiannis (2003) pointed out that past changes in climatic data are reflected in the non-stationarity of the data. The methods commonly used for the first problem are the Mann-Kendall (MK) test, Spearman's rho test and Sen's slope estimation. The MK test is a non-parametric trend test proposed by Mann (1945) and Kendall (1975). However, both the MK test and Spearman's rho test can only detect a trend; they cannot estimate its magnitude. The magnitude could be estimated by least squares, but least squares assumes normality and is sensitive to extreme values. For this reason, the non-parametric Sen's slope (SS) estimator proposed by Sen (1968) is used instead. This estimator is robust to extreme values and makes no normality assumption about the original data. It has therefore been widely used, especially for testing and estimating trends in meteorological data. Gocic and Trajkovic (2013) used the above methods to study annual and seasonal temperature trends in Serbia. Further applications can be found in Chen et al. (2009), Meena (2020), Partal (2017), Sayemuzzaman and Jha (2014), Sonali and Kumar (2013) and Yacoub and Tayfur (2018).

The second category is change-point testing, for which Pettitt's method (Pettitt 1979) is mainly used with meteorological data.
Yacoub and Tayfur (2018) and Partal (2017) used Pettitt's method to detect change points in annual rainfall and temperature series. In this study, both kinds of methods are used to analyse the trends of the meteorological data of Lanzhou.
Most of the above studies used non-parametric methods only to test whether a trend exists, and few paid attention to the effect of serial autocorrelation on the trend. Both the MK test and Sen's slope estimation assume that the time series is serially independent. Any serial correlation can be removed beforehand with the pre-whitening technique (von Storch 1995). The trend tests in this paper take serial autocorrelation into account: series with autocorrelation are tested after the correlation has been removed.
A Markov process describes a random process without aftereffect, that is, its future evolution depends only on its present state. Markov processes are used in many natural sciences, such as queuing theory, biology and geography. The weighted Markov model is an extension of the Markov model and has been applied to the prediction of stochastic phenomena, disaster prediction and other fields. Tian et al. (2010) applied a grey weighted Markov chain model to forecast the utility of water supply. Gong et al. (2013) used a weighted Markov model to predict low-temperature risk in Nanjing. Masseran (2015) analyzed the stochastic behavior of wind-direction data with a Markov chain model. In our study, we follow Gong et al. (2013) to classify the states of the extremely high temperature series in Lanzhou. We then use a weighted Markov model to predict the occurrence of extremely high temperature.
The main purposes of this paper are as follows:
(1) to analyse the trends of the meteorological variables in Lanzhou, with an in-depth comparison of the annual and seasonal trends;
(2) to consider the influence of serial autocorrelation on the trend tests, and to re-examine the trends after removing the correlation from the affected series;
(3) to determine whether the meteorological variables have change points and when they occur;
(4) to predict extremely high temperature weather with a weighted Markov model.
The remainder of this paper is organized as follows. Section 2 describes the data source and the form of the data. Section 3 introduces the common non-parametric trend-analysis methods and the Markov model used for prediction. Section 4 applies the methods and models of Sect. 3 to the data.

Study area and data
Lanzhou is located in northwestern China, near the geometric center of China's continental territory, in the central part of Gansu Province. The city center lies at 36°03′ N and 103°40′ E, and the total area is 13,085.6 square kilometers. Lanzhou has a temperate continental climate with large temperature differences and little precipitation. The climate is nonetheless mild, without severe cold in winter or scorching heat in summer. In recent years, increasing attention has been paid to temperature change, which affects not only agricultural production but also the daily life of residents. It is therefore of great significance to analyse the temperature trend in Lanzhou.
This paper uses the meteorological datasets of Lanzhou from 1951 to 2016 collected by the Lanzhou Central Meteorological Observatory of Gansu Province. These include daily temperature, rainfall, pressure and variables such as whether extreme weather occurred. Basic data cleaning was first carried out on the original data. We focus on the trends of daily temperature and rainfall. The annual average rainfall of Lanzhou is 311.2 mm, and the annual average temperature is 9.87 °C. August is the wettest month, with an average rainfall of 72.2 mm, while December is the driest, with an average rainfall of only 1 mm. The average July temperature is as high as 22.7 °C, while the average January temperature is as low as −5.6 °C. Temperature and rainfall are described in detail in the following sections.

Trend analysis
This section introduces several non-parametric methods of trend analysis. They are mainly used to test whether a meteorological variable (such as temperature) has a trend and, if so, to estimate its magnitude.

Mann-Kendall test
The Mann-Kendall (MK) test is a non-parametric rank-based test (Kendall 1975; Mann 1945) that does not require the data to be normally distributed. The test statistic $S$ is calculated using Eq. (1). It has mean zero and variance given by Eq. (2):
$$S = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\operatorname{sgn}(x_j - x_i) \quad (1)$$
$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{p} t_p(t_p-1)(2t_p+5)}{18} \quad (2)$$
Here $x_i$ and $x_j$ are the data values at time points $i$ and $j$, and $\operatorname{sgn}(\cdot)$ is the sign function. $n$ is the number of data points, the sum runs over the tied groups $p$, and $t_p$ denotes the number of ties in group $p$. A tied group is a set of sample data having the same value. When the sample size $n > 10$, the standard normal test statistic $Z$ is computed by Eq. (3):
$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\operatorname{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S+1}{\sqrt{\operatorname{Var}(S)}}, & S < 0 \end{cases} \quad (3)$$
Positive values of $Z$ indicate increasing trends, while negative values indicate decreasing trends. When testing for a monotonic trend at significance level $\alpha$, the null hypothesis of no trend is rejected if $|Z| > Z_{1-\alpha/2}$.
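To make Eqs. (1)-(3) concrete, here is a minimal pure-Python sketch of the MK test. It uses the tie-corrected variance and the continuity-corrected $Z$. The function name `mann_kendall` and the default $\alpha = 0.05$ are illustrative assumptions, not the authors' code. The critical value $Z_{1-\alpha/2}$ comes from the standard library's `statistics.NormalDist`.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(x, alpha=0.05):
    """Mann-Kendall test following Eqs. (1)-(3); returns (S, Var(S), Z, reject_H0)."""
    n = len(x)
    # Eq. (1): sum of sgn(x_j - x_i) over all pairs i < j
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Eq. (2): variance, corrected for tied groups of size t_p
    tie_sizes = [t for t in Counter(x).values() if t > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    # Eq. (3): continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided test: reject "no trend" if |Z| > Z_{1 - alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return s, var_s, z, abs(z) > z_crit
```

For a strictly increasing series of five values, every pair contributes +1, so $S = 10$. In practice the input would be, for example, the 66 annual values of one variable.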

Sen's slope estimate (SS)
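If a linear trend is present in a time series, the true slope (change per unit time) can be estimated using a simple non-parametric procedure developed by Sen (1968). The slopes of all $N$ pairs of data points are first computed by
$$Q_i = \frac{x_j - x_k}{j - k}, \quad i = 1, 2, \ldots, N,$$
where $x_j$ and $x_k$ are the data values at times $j$ and $k$ ($j > k$), respectively. With one observation per time step, $N = n(n-1)/2$. The median of these $N$ values of $Q_i$, sorted in ascending order, is Sen's slope estimator:
$$Q_{med} = \begin{cases} Q_{[(N+1)/2]}, & N \text{ odd} \\ \frac{1}{2}\left(Q_{[N/2]} + Q_{[(N+2)/2]}\right), & N \text{ even} \end{cases}$$
A positive $Q_{med}$ indicates an increasing trend, and a negative $Q_{med}$ a decreasing trend.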
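Sen's estimator is simply the median of all pairwise slopes. The short sketch below shows this and why the result is robust to a single outlier. The function name `sens_slope` and the optional time vector `t` are illustrative assumptions. `statistics.median` already implements the odd/even rule of the formula.

```python
from statistics import median


def sens_slope(x, t=None):
    """Sen's slope: median of Q = (x_j - x_k) / (t_j - t_k) over all pairs j > k."""
    if t is None:
        t = list(range(1, len(x) + 1))  # one observation per time step (e.g. per year)
    slopes = [(x[j] - x[k]) / (t[j] - t[k])
              for k in range(len(x) - 1)
              for j in range(k + 1, len(x))]
    return median(slopes)  # odd N: middle value; even N: mean of the two middle values
```

For the series `[0, 10, 1, 2, 3]`, the outlier 10 produces the extreme slopes 10 and −9. The median of the ten pairwise slopes is still about 0.71 per step, whereas the least-squares slope would be pulled around by that single value.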

Spearman's rho test
Like the MK test, the rank-based non-parametric Spearman's rho (SR) test is commonly used to assess the significance of monotonic trends in hydro-meteorological time series (Yacoub and Tayfur 2018). It detects whether a trend exists and whether it is increasing or decreasing. The null hypothesis $H_0$ is that the data are independent and identically distributed in time; the alternative hypothesis $H_1$ is that a trend exists. The SR test statistic $D$ and the standardized test statistic $Z_{SR}$ are calculated by Eqs. (4) and (5):
$$D = 1 - \frac{6\sum_{i=1}^{n}\left(R(X_i) - i\right)^2}{n(n^2-1)} \quad (4)$$
$$Z_{SR} = D\sqrt{\frac{n-2}{1-D^2}} \quad (5)$$
Here $R(X_i)$ is the rank of the $i$th observation $X_i$ in a sample of size $n$. Under $H_0$, $Z_{SR}$ follows Student's t distribution with $n-2$ degrees of freedom. In this test, $H_0$ is rejected and $H_1$ accepted at the 5% significance level if $|Z_{SR}| > 2.08$. Positive values of $Z_{SR}$ indicate an increasing trend, while negative values indicate a decreasing trend.
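A minimal sketch of Eqs. (4) and (5) follows, assuming tied values receive average ranks (a common convention that the text does not specify). The function names `average_ranks` and `spearman_rho_trend` are hypothetical. The default critical value 2.08 is the one quoted above.

```python
import math


def average_ranks(x):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman_rho_trend(x, z_crit=2.08):
    """Returns (D, Z_SR, reject_H0) following Eqs. (4) and (5)."""
    n = len(x)
    r = average_ranks(x)
    # Eq. (4): compare the rank of each value with its time index i = 1..n
    d = 1 - 6 * sum((r[i] - (i + 1)) ** 2 for i in range(n)) / (n * (n * n - 1))
    if abs(d) >= 1:  # perfectly monotonic series: Eq. (5) diverges
        z = math.copysign(math.inf, d)
    else:
        z = d * math.sqrt((n - 2) / (1 - d * d))  # Eq. (5)
    return d, z, abs(z) > z_crit
```

For `[1, 3, 2, 4]`, $D = 0.8$ and $Z_{SR} \approx 1.89$. This is below 2.08, so $H_0$ is not rejected for such a short series.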

Serial correlation effect
In trend analysis, serial correlation in a time series affects the result of the trend test. If a series is serially correlated, the trend analysis should therefore be carried out after the correlation has been removed.
The lag-1 serial correlation coefficient $r_1$ of sample data $x_i$ was originally derived by Salas et al. (1985), and several recent studies (e.g., Gocic and Trajkovic 2013) have used the same equation:
$$r_1 = \frac{\frac{1}{n-1}\sum_{i=1}^{n-1}\left(x_i - E(x_i)\right)\left(x_{i+1} - E(x_i)\right)}{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - E(x_i)\right)^2}, \qquad E(x_i) = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
where $E(x_i)$ is the mean of the sample data and $n$ is the number of observations. Following Salas et al. (1985), Gocic and Trajkovic (2013) used the following interval to test series for serial correlation:
$$\frac{-1 - 1.645\sqrt{n-2}}{n-1} \le r_1 \le \frac{-1 + 1.645\sqrt{n-2}}{n-1}.$$
For $n = 66$ this gives $(-0.218, 0.187)$, the interval used in Sect. 4. If $r_1$ falls inside this interval, the observations are considered independent. If $r_1$ lies outside it, the data are serially correlated. For independent series, the MK test and Sen's slope are applied to the original values. For serially correlated series, the "pre-whitened" series $x_2 - r_1x_1, x_3 - r_1x_2, \ldots, x_n - r_1x_{n-1}$ is analysed instead (Gocic and Trajkovic 2013; Sayemuzzaman and Jha 2014).
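The sketch below combines the lag-1 coefficient, the independence interval and the pre-whitening step. It is an illustration under the formulas above, with hypothetical function names. It confirms that $z = 1.645$ with $n = 66$ reproduces the interval $(-0.218, 0.187)$.

```python
import math


def lag1_coefficient(x):
    """Lag-1 serial correlation coefficient r1 in the Salas et al. (1985) form."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    den = sum((v - mean) ** 2 for v in x) / n
    return num / den


def independence_interval(n, z=1.645):
    """Bounds (-1 -/+ z*sqrt(n-2)) / (n-1); z = 1.645 is the two-sided 10% level."""
    half = z * math.sqrt(n - 2)
    return (-1 - half) / (n - 1), (-1 + half) / (n - 1)


def prewhiten_if_needed(x):
    """Return (series to test, r1): the original if independent, else pre-whitened."""
    r1 = lag1_coefficient(x)
    lo, hi = independence_interval(len(x))
    if lo <= r1 <= hi:
        return list(x), r1
    # serially correlated: x2 - r1*x1, x3 - r1*x2, ..., xn - r1*x(n-1)
    return [x[i + 1] - r1 * x[i] for i in range(len(x) - 1)], r1


lo, hi = independence_interval(66)
print(round(lo, 3), round(hi, 3))  # -0.218 0.187
```

The pre-whitened series is one value shorter than the original. The MK test and Sen's slope are then applied to it unchanged.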

Pettitt's change point test
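The Pettitt test (Pettitt 1979) was used to detect the existence of a main change point in a time series, that is, a significant change in its general tendency. The test statistic $K$ is given by
$$K = \max_{1 \le k < n} |U_k|, \qquad U_k = \sum_{i=1}^{k}\sum_{j=k+1}^{n}\operatorname{sgn}(x_i - x_j),$$
where $U_k$ is equivalent to a Mann-Whitney statistic. It tests whether the two samples $x_1, \ldots, x_k$ and $x_{k+1}, \ldots, x_n$ come from the same population. The change point is located at the $k$ where $|U_k|$ attains its maximum. Its significance is approximately $p \approx 2\exp\left(-6K^2/(n^3+n^2)\right)$.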
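A direct, unoptimized sketch of the Pettitt statistic follows. `pettitt` is a hypothetical helper that returns $K$, the split index $k$ (the first segment is `x[:k]`) and the approximate p-value. The $O(n^3)$ double sum is adequate for $n = 66$ annual values.

```python
import math


def pettitt(x):
    """Pettitt change-point test: returns (K, k, approximate p-value)."""
    n = len(x)

    def sgn(v):
        return (v > 0) - (v < 0)

    best_k, best_u = 0, 0
    for k in range(1, n):  # first segment x[:k], second segment x[k:]
        u = sum(sgn(x[i] - x[j]) for i in range(k) for j in range(k, n))
        if abs(u) > abs(best_u):
            best_k, best_u = k, u
    K = abs(best_u)
    p = min(1.0, 2 * math.exp(-6 * K ** 2 / (n ** 3 + n ** 2)))
    return K, best_k, p
```

For a series that jumps from 1 to 5 after ten values, $U_{10} = -100$. This is the largest $|U_k|$, so $k = 10$ marks the shift, with $p \approx 0.0016$.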

Weighted Markov chain modeling
A fluctuation index $v_k$ is introduced into the Markov chain modeling process. Let $\hat{x}^{(0)}(k)$ be the trend value of the $k$th item; then $x^{(0)}(k) - \hat{x}^{(0)}(k)$ is the absolute fluctuation of the $k$th item. The greater $v_k$ is, the farther the actual value deviates from the trend value and the worse the stability. The smaller $v_k$ is, the closer the actual value is to the trend value and the better the stability.
If $v_k \in [a_{1i}, a_{2i}]$, $i = 1, 2, \ldots, s$, the $k$th item is in fluctuation state $E_i$. Here $a_{1i}$ and $a_{2i}$ denote the lower and upper bounds of $E_i$, respectively. The state set $E = \{E_1, E_2, \ldots, E_s\}$ is thus generated from the fluctuation indices.
To make full use of recent data and reduce the effect of random errors, multi-step transitions are used to construct the state transition probability matrices. The probability of moving from state $i$ to state $j$ after $w$ steps is
$$p_{ij}^{(w)} = \frac{m_{ij}^{(w)}}{M_i},$$
where $m_{ij}^{(w)}$ is the number of transitions from state $i$ to state $j$ after $w$ steps and $M_i$ is the total number of occurrences of state $i$. The $w$-step transition probability matrix is $P^{(w)} = \left(p_{ij}^{(w)}\right)_{s \times s}$.

Next, the states of the last $s$ items are taken as the initial states. The transition steps $w$ are the distances between the item to be forecast and each selected item. For the initial states $i_1, i_2, \ldots, i_s$ of these items, the corresponding rows of $w$-step transition probabilities are taken from $P^{(w)}$. These rows are combined into the state transition probability matrix of the $t$th forecast item. The forecast state $i^*$ is the state with the largest combined probability, with lower and upper bounds $a_{1i^*}$ and $a_{2i^*}$.
Finally, the modified forecast value is calculated from the bounds of the predicted state. To forecast the $(t+1)$th item, the state of the $t$th item is added to the state transition probability matrix. The matrix is then rebuilt with equal dimension, i.e., the oldest item is dropped so that the number of items stays constant. The next modified forecast value is then obtained, and the procedure is repeated until all remaining forecast values have been modified.
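The $w$-step transition matrices described above can be estimated from a sequence of state labels by counting. This sketch assumes that $M_i$ counts only those occurrences of state $i$ that have a successor $w$ steps later, so each non-empty row sums to one. The text leaves this detail open. The function name is hypothetical.

```python
def transition_matrix(states, w, n_states):
    """w-step transition matrix P^(w) with p_ij = m_ij / M_i; states are labelled 1..n_states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states[:-w], states[w:]):  # pairs (state at t, state at t + w)
        counts[a - 1][b - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)  # M_i: occurrences of state i with a successor w steps later
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For the alternating sequence `[1, 2, 1, 2, 1]`, the one-step matrix swaps the two states and the two-step matrix is the identity.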

Application
In this section, the meteorological data of Lanzhou are analyzed using the methods introduced in Sect. 3. First, trend analysis is carried out on the maximum temperature, minimum temperature, average temperature and rainfall. Second, a weighted Markov model is established to predict high-temperature disasters.

Trend analysis
Non-parametric trend tests were carried out on the annual series of maximum temperature ($T_{max}$), minimum temperature ($T_{min}$), average temperature ($T_{av}$), temperature difference (diff-T) and average rainfall. The results are shown in Table 1.
For each meteorological variable, Table 1 reports the mean, standard deviation (sd), Mann-Kendall statistic $Z$, Spearman's rho statistic $Z_{SR}$ and Sen's slope. It also reports the trend, the year of the change point and the lag-1 serial coefficient $r_1$. The mean annual maximum, minimum and average temperatures are 35.6 °C, −16 °C and 9.9 °C, respectively. The sd of the average temperature is the smallest, while that of the annual minimum temperature is the largest. The mean annual temperature difference is 12.4 °C with a small sd, i.e., it is not very volatile. The annual average rainfall is 311.3 mm with an sd of 75.84 mm, indicating that it fluctuates over a wide range.
To make the trend tests more reliable, we first examine the lag-1 serial coefficient $r_1$. According to the serial correlation test, a series is considered free of serial correlation if $r_1 \in (-0.218, 0.187)$. In Table 1, four variables have lag-1 coefficients outside this interval, i.e., their data are serially correlated. For these variables we form the pre-whitened series $x_2 - r_1x_1, x_3 - r_1x_2, \ldots, x_n - r_1x_{n-1}$, from which the correlation has been removed, and analyse it again. The results are shown in Table 2.

Two conclusions follow from Table 2. First, the lag-1 coefficients of the temperature series are now all within $(-0.218, 0.187)$, indicating that the series are independent and the trend test results are valid. Second, according to the Mann-Kendall and Spearman's rho tests, the annual maximum temperature and annual average rainfall show no significant trend. The annual minimum and average temperatures show significant increasing trends, and the temperature difference shows a significant decreasing trend.
In summary, the climate warming in Lanzhou in recent years is mainly reflected in the increase of the minimum temperature and the decrease of the temperature difference. The annual minimum temperature increases by 0.104 °C per year and the average temperature by 0.044 °C per year, while the temperature difference decreases by 0.026 °C per year. To examine how the warming trend differs among seasons, the annual meteorological data were divided into seasons, and the non-parametric trend test statistics of each variable were computed for each season. The results are shown in Table 3.
Similarly, we first check the lag-1 serial coefficients $r_1$. Some of the coefficients in Table 3 lie outside the interval $(-0.218, 0.187)$, indicating serial correlation, so these series need to be de-correlated. Table 4 shows the test results for the de-correlated series.
According to the serial correlation test in Table 4, the winter average temperature and the winter temperature difference are still correlated. We therefore applied the de-correlation procedure again before the trend test. The results are shown in Table 5.
In Table 5, the values of $r_1$ show that no series remains serially correlated, so the trend test conclusions are reliable. Combining the results in Tables 3, 4 and 5 gives the following picture.

First, in terms of the average levels of temperature and rainfall in the four seasons, Lanzhou has distinct seasons, with the largest temperature difference in spring and the smallest in autumn.

Second, in terms of trends, rainfall and the summer maximum temperature show no significant trend, while the other seasonal temperature variables do. Among the variables with significant trends, the maximum, minimum and average temperatures all increase, while the temperature difference decreases.

Finally, in terms of magnitude, all four temperature variables have their largest slopes in winter. In detail, the average maximum temperature increases by 0.04 °C per year, the average minimum temperature by 0.104 °C per year and the average temperature by 0.074 °C per year, while the temperature difference decreases by 0.057 °C per year. The magnitudes for the other seasons are given by the Sen's slopes in Table 3.

The results of Pettitt's change-point test show that the change points of the temperature variables differ among seasons. The seasonal trends of the temperature variables are plotted in Fig. 1. Comparing variables, the minimum temperature has the most pronounced trend in every season. Comparing seasons, every variable has a larger trend in winter. Overall, the climate warming in Lanzhou is driven mainly by the rise of the minimum temperature.
Meanwhile, the trends of all temperature variables in winter indicate that the temperature rise in Lanzhou is mainly reflected in the "warm winter" phenomenon.

Extreme high temperature prediction using weighted Markov models
In the previous section, we identified and analyzed a significant warming trend. In the context of climate warming, meteorological disasters are becoming more frequent and causing growing losses. These losses have seriously affected agricultural production, the social economy and regional sustainable development. In this section, we build a weighted Markov model to predict extremely high temperature weather in Lanzhou.
Extreme high temperature events are defined following Bonsal et al. (2001): a day is considered extremely hot when its maximum temperature exceeds the 90th percentile of the daily maximum temperature series. The 90th percentile of the daily maximum temperature was computed for each of the 66 years, and the average of these 66 values was defined as the threshold.
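The threshold rule can be sketched as "per-year 90th percentile, then mean over years". The text does not say which percentile definition was used. This sketch assumes linear interpolation (`method="inclusive"` in `statistics.quantiles`). The function name and the dict-of-lists input format are hypothetical.

```python
from statistics import mean, quantiles


def hot_day_threshold(daily_tmax_by_year):
    """Mean over years of each year's 90th percentile of daily maximum temperature.

    daily_tmax_by_year: dict mapping a year to its list of daily maxima (assumed format).
    """
    p90 = [quantiles(days, n=10, method="inclusive")[-1]  # last of 9 decile cut points
           for days in daily_tmax_by_year.values()]
    return mean(p90)
```

Applied to the 66 years of Lanzhou data, this procedure is what yields a single threshold such as the $T^*$ reported below. A different percentile convention could shift that value slightly.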
The resulting threshold is $T^* = 30.12$ °C. Next, we identify the days whose maximum temperature is greater than or equal to $T^*$. These days are divided into six categories according to the number of consecutive high-temperature days: one, two, three, four, five, and six or more. We then construct a set of annual high temperature indices $h_{1i}, h_{2i}, \ldots, h_{6i}$, where $h_{di}$ is the high temperature index of $d$ consecutive days in the $i$th year ($i = 1, 2, \ldots, 66$; $d = 1, 2, \ldots, 6$).

The indices are calculated as follows. If runs of $d$ consecutive high-temperature days occur $N$ times in the $i$th year, the $d$-day index of that year is $h_{di} = Nd$ for $d = 1, 2, \ldots, 5$. The 6-day index is the total number of days in runs of six or more days. For example, runs of four consecutive high-temperature days occurred 4 times in 1951 (the first year), so the 4-day index for that year is $h_{41} = 4 \times 4 = 16$. In 1951, the frequency of runs of 7 consecutive high-temperature days is 2 and that of runs of 8 consecutive days is 1, so the 6-day index is $h_{61} = 1 \times 7 + 1 \times 8 = 15$. In the same way, we obtain the high temperature indices of all categories for Lanzhou from 1951 to 2016 (Table 6).
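Counting runs of consecutive hot days and accumulating their lengths reproduces the index rule above. The last class collects all runs of six or more days. The sketch below uses `itertools.groupby` to find the runs. The function name `high_temperature_index` and the per-year list of daily maxima are assumptions for illustration.

```python
from itertools import groupby


def high_temperature_index(tmax, threshold, max_d=6):
    """[h_1, ..., h_6] for one year: days in hot runs of length d (last class: >= max_d)."""
    h = [0] * max_d
    for is_hot, run in groupby(t >= threshold for t in tmax):
        if is_hot:
            length = sum(1 for _ in run)  # length of this run of consecutive hot days
            h[min(length, max_d) - 1] += length
    return h
```

A year with four 4-day runs, one 7-day run and one 8-day run gives $h_4 = 16$ and $h_6 = 15$, matching the $h_{41}$ and $h_{61}$ values quoted for 1951.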
Clearly, the longer high temperature lasts, the higher the risk. Following Gong et al. (2013), different weights $\omega_d$ are assigned to the $d$-day high temperature indices $h_{di}$, $d = 1, 2, \ldots, 6$. The weights are $\omega_1 = 1.0$, $\omega_2 = 1.05$, $\omega_3 = 1.1$, $\omega_4 = 1.15$, $\omega_5 = 1.2$ and $\omega_6 = 1.25$, so that $\omega_d h_{di}$ is the high-temperature weighted index of $d$ consecutive days in the $i$th year. The annual high temperature weighted index of the $i$th year is the weighted sum over all categories, $F_i = \sum_{d=1}^{6}\omega_d h_{di}$. For example, the index of the second year (1952) is $F_2 = 7 \times 1.0 + 2 \times 1.05 + 12 \times 1.1 + 0 \times 1.15 + 5 \times 1.2 + 17 \times 1.25 = 49.55$ (see Table 6).
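The weighted sum $F_i$ is a one-line computation. The sketch below reproduces the 1952 value from the text. `WEIGHTS` and `weighted_index` are illustrative names; the weights themselves are the ones listed above.

```python
WEIGHTS = (1.0, 1.05, 1.1, 1.15, 1.2, 1.25)  # omega_1 .. omega_6 as given in the text


def weighted_index(h, weights=WEIGHTS):
    """Annual high temperature weighted index F_i = sum_d omega_d * h_di."""
    return sum(w * x for w, x in zip(weights, h))


# 1952 (second year): h = (7, 2, 12, 0, 5, 17)
print(round(weighted_index((7, 2, 12, 0, 5, 17)), 2))  # 49.55
```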
To establish the classification standard, let $\bar{F}$ and $S$ denote the mean and standard deviation of $F_1, F_2, \ldots, F_{66}$. The annual high-temperature weighted indices are divided into five intervals with boundaries $\bar{F} - \alpha_1 S$, $\bar{F} - \alpha_2 S$, $\bar{F} + \alpha_2 S$ and $\bar{F} + \alpha_1 S$. Here $\bar{F} = 42.376$, $S = 14.8$, $\alpha_1 = 1.0$ and $\alpha_2 = 0.5$. Each interval represents a state, so $F_1, F_2, \ldots, F_{66}$ belong to the state set $E = \{E_1, E_2, \ldots, E_5\}$ (Table 7). The resulting states are shown in the last column of Table 6.
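A sketch of the five-state classification follows, assuming each lower boundary is inclusive (the exact endpoint convention is in Table 7 and is not reproduced here). `classify_state` is a hypothetical helper. It counts how many of the four cut points $\bar{F} \mp \alpha S$ the index reaches.

```python
def classify_state(f, mean_f, s, a1=1.0, a2=0.5):
    """State 1..5 of an annual weighted index F, with cut points mean -/+ a*S (assumed inclusive below)."""
    cuts = (mean_f - a1 * s, mean_f - a2 * s, mean_f + a2 * s, mean_f + a1 * s)
    return 1 + sum(f >= c for c in cuts)
```

With $\bar{F} = 42.376$ and $S = 14.8$, the cut points are 27.576, 34.976, 49.776 and 57.176. Under this assumed convention, $F_2 = 49.55$ falls in $E_3$. Both 67.01 and 75.06, the estimated and observed 2016 values, fall in $E_5$.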
The autocorrelation coefficients of the annual weighted index sequence are calculated by formula (6):
$$r_k = \frac{\sum_{i=1}^{n-k}(F_i - \bar{F})(F_{i+k} - \bar{F})}{\sum_{i=1}^{n}(F_i - \bar{F})^2} \quad (6)$$
Based on their magnitudes, the autocorrelation coefficients up to lag 4 are used, and the weights are computed by formula (7) with $l = 4$:
$$w_k = \frac{|r_k|}{\sum_{k=1}^{l}|r_k|}, \quad k = 1, 2, \ldots, l \quad (7)$$
The correlation coefficients and weight vector are shown in Table 8. The state transition probability matrices are calculated from the extreme high temperature index values and their states. The $k$-step state transition probability matrix is $P^{(k)} = \left(p_{ij}^{(k)}\right)$ with $p_{ij}^{(k)} = m_{ij}^{(k)}/M_i$. Here $m_{ij}^{(k)}$ is the number of transitions from state $i$ to state $j$ after $k$ steps and $M_i$ is the number of occurrences of state $i$.
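Formulas (6) and (7) can be sketched as follows. The sketch assumes the autocorrelations are taken on the annual index series, as in the usual weighted Markov formulation. The function names are hypothetical, and the weights are the normalized absolute autocorrelations for lags 1 to $l$.

```python
def autocorrelation(x, k):
    """Lag-k autocorrelation r_k of a series, formula (6)."""
    n = len(x)
    m = sum(x) / n
    den = sum((v - m) ** 2 for v in x)
    return sum((x[i] - m) * (x[i + k] - m) for i in range(n - k)) / den


def markov_weights(x, max_lag=4):
    """Formula (7): w_k = |r_k| / sum_j |r_j| for k = 1..max_lag."""
    r = [abs(autocorrelation(x, k)) for k in range(1, max_lag + 1)]
    total = sum(r)
    return [v / total for v in r]
```

Passing the 66 values $F_1, \ldots, F_{66}$ with `max_lag=4` would give a four-component weight vector that sums to one. This is the form of the vector reported in Table 8.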
Taking 2012-2015 as the initial years, we used the weighted Markov model to predict the extreme high temperature state of 2016. The state transition probabilities are combined with the weights. The probability that the 2016 index falls in state $j$ is $P_j = \sum_{k=1}^{4} w_k\, p_{i_k j}^{(k)}$, where $i_k$ is the state of the year $k$ years before 2016. The results are shown in Table 9.
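The weighted prediction step takes row $i_k$ of each $k$-step matrix, scales it by $w_k$ and sums the rows. The predicted state is the one with the largest $P_j$. The sketch below shows this combination step on a toy two-state example, not on the Lanzhou data. `predict_state` is a hypothetical name.

```python
def predict_state(initial_states, matrices, weights):
    """P_j = sum_k w_k * p^(k)[i_k][j]; returns (probabilities, most likely state).

    initial_states: states (1-based) of the years t-1, t-2, ..., t-l.
    matrices: matrices[k - 1] is the k-step transition matrix P^(k).
    """
    n_states = len(matrices[0])
    probs = [0.0] * n_states
    for k, (state, w) in enumerate(zip(initial_states, weights), start=1):
        row = matrices[k - 1][state - 1]
        for j in range(n_states):
            probs[j] += w * row[j]
    best = max(range(n_states), key=probs.__getitem__) + 1
    return probs, best
```

For 2016, `initial_states` would hold the states of 2015, 2014, 2013 and 2012, in that order. The four matrices would be estimated from the 1951-2015 state sequence.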
The bold figure in Table 9 is the largest state probability based on the annual high-temperature weighted indices of 2012-2015. The maximum, $\max_i P_i = 0.4570$, is attained at $i = 5$, so the annual high-temperature weighted index of 2016 is predicted to fall in state 5. Finally, the fuzzy mathematical method mentioned in was used to estimate the extreme high temperature index. The estimated index value for 2016 is 67.01, which falls short of the observed value of 75.06, indicating that the method still has room for improvement.

Table 6 The high temperature index and states (columns: Years, 1-day, 2-day, 3-day, 4-day, 5-day, 6-day, Index, States)

Result
The annual and seasonal trends of the meteorological variables are analyzed in detail. The results show that the annual trends of the temperature variables are pronounced. Seasonally, the summer maximum temperature shows no significant trend, while the other temperature variables show significant seasonal trends. Overall, the temperature in Lanzhou has risen significantly over the past 66 years, with a larger rise in winter. Neither the annual nor the seasonal trend of rainfall is significant, although the total amount of rainfall has decreased. Specifically, rainfall increases in winter and spring and decreases in summer and autumn. In addition, we use the weighted Markov model to predict the occurrence of extreme high temperature, which provides a theoretical basis for high temperature warning and prevention.