Entropy-based dynamic graph embedding for climate change detection

Climate change is a severe problem caused by abnormal climate events. The existing methods for detecting climate changes utilize statistical models to analyze the atmospheric temperature, but a climate event commonly comprises multiple meteorological data. To detect climate changes using meteorological data, we propose a novel dynamic graph embedding model based on graph entropy called EDynGE. A climate event is denoted as a graph, in which the nodes indicate meteorological data and edges indicate the correlation between nodes. Graph entropy measures the information of the climate event, and the EDynGE model clusters graphs based on graph entropy. We conducted experiments on real meteorological data. The results showed that the number of days of abnormal climate events has increased by 304.5 days in the past 30 years.


Introduction
Climate change is a severe problem that leads to the redistribution of global precipitation, melting of glaciers, and rise in sea levels 1,2 . Furthermore, it endangers the balance of the natural ecosystem and threatens the survival of humans. The main reason for climate change is that the terrestrial greenhouse gas emissions cause the atmospheric temperature to rise on the mainland 3 . Existing research shows that in the 20th century, the world's average temperature showed an upward trend 4 . Therefore, climate change detection can help people identify the causes of ecosystem damage and suggest corresponding countermeasures 5 .
Existing data analysis methods for detecting climate changes are based on statistical models that identify temporal and spatial information from meteorological data 6 . As a simple example, most statistical analyses measure the average temperature and compare it with past climate conditions to detect whether the climate is changing. These methods focus on detecting climate change using a single meteorological variable. However, climate events comprise multiple meteorological variables, and the correlation between these variables plays an essential role in climate change detection.
To solve this problem, we identified spurious relationships between meteorological data, in which two correlated series are not causally related 7 . A graph was constructed to model the climate event using these spurious relationships. The vertices of the graph indicate meteorological data, whereas the edges indicate the spurious relationship between two vertices. To obtain information from the climate graph, we calculated the graph entropy using the spurious correlation coefficient. We constructed a dynamic graph embedding model based on graph entropy to cluster the climate graphs. Climate change was detected as an abnormal climate event in a time interval where the spurious correlation coefficients differ from those in most other time intervals, which is defined as follows.
Definition 1 (Abnormal climate event) An abnormal climate event is defined as a graph $G_i$ in the $i$-th time interval in which the weights of the edges are significantly different from those in other time intervals. The definition is formulated in terms of $e(G_i)$, the entropy of the climate event $G_i$, and $\theta$, the threshold for detecting climate change.
Figure 1 shows the climate events in three time intervals, where T indicates temperature, P indicates pressure, S indicates wind speed, $G_i$ indicates the climate event in the $i$-th time interval, and the weight of an edge indicates the spurious correlation coefficient. The temperature at $G_2$ increases by 2 °C, which makes the edge weights differ from those of the other two time intervals. This indicates that the climate has changed at $G_2$.
The main contributions of this work are as follows.
• We present the graph entropy to measure the information of climate events, which is calculated using the spurious correlation coefficient.
• We conducted experiments on real meteorological datasets using the EDynGE model. The results showed that the number of days of abnormal climate events exhibits an upward trend from 1990 to 2020.
The remainder of this paper is organized as follows. In the next section, the methodology of entropy-based dynamic graph embedding for detecting climate change is detailed. Then, the experimental results are described. Finally, in the discussion section, the conclusions, limitations, and future works are provided.

Methods
This section describes the dynamic climate graph, graph entropy, and EDynGE model. A dynamic graph is used to model climate events in each time interval to detect climate change. The graph entropy measures information regarding the climate event. The meteorological datasets can be obtained from the China Meteorological Data Service Center (http://data.cma.cn/en) after registering an account.

Graph Construction
The graph is denoted as G(V, E), where V and E denote the vertices and edges, respectively. For a climate event, the vertex indicates the meteorological data, the edge denotes the spurious relationship, and the weight w indicates the spurious correlation coefficient. The coefficient is calculated based on the causality and correlation between two time series, x and y. The time series causality is defined as follows.
Definition 2 (Time series causality) Two series are said to be causally related if one of them improves the prediction of the other. The causality between two time series x and y is denoted as C(x, y) and is determined from p, the probability that the two series are not causally related.
The Granger causality test is utilized to calculate the short-run causality between two meteorological time series, x and y 8 . The test makes the null hypothesis that the two series are not causally related and involves two predictions. First, it uses the past values of the series y as variables to predict the current y. Then, it uses the past values of both series x and y as variables to predict the current y. If the prediction obtained using the temporal information of both series is better than the prediction using only the series y, then x helps predict y. The t-test is utilized to compare the two prediction results 9 . The p value is used to denote the probability of the null hypothesis. If the p value is more than 0.05, the two series x and y are considered not to be causally related 10 .
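The two-prediction procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it uses a single lag, ordinary least squares, and the F statistic commonly used for Granger tests in place of the t-test comparison described above. The helper names (`solve`, `rss`, `granger_f`) and the synthetic series are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def rss(rows, target):
    """Residual sum of squares of a least-squares fit of target on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))

def granger_f(x, y):
    """F statistic for 'past x helps predict y' with one lag (assumed lag order)."""
    target = y[1:]
    only_y = [[1.0, y[t - 1]] for t in range(1, len(y))]               # first prediction
    x_and_y = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]    # second prediction
    rss_restricted, rss_full = rss(only_y, target), rss(x_and_y, target)
    dof = len(target) - 3  # observations minus parameters of the full model
    return (rss_restricted - rss_full) / (rss_full / dof)

# Synthetic example: x drives y by construction, but y does not drive x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]
print(granger_f(x, y), granger_f(y, x))
```

A large statistic means the second prediction is much better than the first. Converting it to a p value requires the F distribution with (1, dof) degrees of freedom, which is omitted here to keep the sketch dependency-free.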
The weight value of an edge in the graph indicates the spurious correlation coefficient that is calculated using the causality and Pearson correlation coefficient (PCC) 11 , which is formulated as follows.
where C(x, y) indicates the causality between the two series x and y. R(x, y) indicates the spurious correlation coefficient between the two series. The spurious correlation coefficient is inversely proportional to the causality between the time series x and y. If the causality C(x, y) between two series x and y is 0, then there is no spurious correlation between the two series. In this case, the corresponding spurious correlation coefficient R(x, y) is 1.
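As a rough illustration of how edge weights for one climate graph could be assembled, the sketch below computes only the Pearson part of the coefficient. It assumes, purely for illustration, that the absolute PCC is used as the weight and it omits the causality term C(x, y) entirely, so it is not the coefficient R(x, y) itself. The function names and the T, P, S values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def weight_matrix(series):
    """Edge weights keyed by vertex pair; |PCC| is a stand-in, not R(x, y)."""
    names = list(series)
    return {(a, b): abs(pearson(series[a], series[b]))
            for a in names for b in names if a != b}

# Hypothetical readings for one time interval: temperature, pressure, wind speed.
interval = {
    "T": [21.0, 22.5, 23.1, 24.0, 22.2],
    "P": [1012.0, 1010.5, 1009.8, 1008.9, 1011.0],
    "S": [3.2, 2.8, 4.1, 3.9, 3.0],
}
weights = weight_matrix(interval)
print(weights[("T", "P")], weights[("T", "S")])
```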

Graph Entropy
The graph entropy is calculated based on information entropy 12 . We assume that there are two independent events, x and y. The information of these events should satisfy $h(x, y) = h(x) + h(y)$, where $h(x)$ indicates the information of the event x, and $h(x, y)$ indicates the information of the two events occurring at the same time. The probability of these events should satisfy $p(x, y) = p(x) \times p(y)$, where p indicates the probability of the event. The information of the event x can therefore be measured as $h(x) = -\log_2 p(x)$. Information entropy can be represented as the information of the event x times the probability of x, which is formulated as $e(x) = -p(x)\log_2 p(x)$. For a set of events X, the information entropy is formulated as $e(X) = -\sum_{i=1}^{N} p(x_i)\log_2 p(x_i)$, where N indicates the number of events in the set and $x_i$ indicates the $i$-th event. To calculate the graph entropy, we calculated the entropy for each vertex in the graph. The vertex entropy is defined as follows.
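The additivity requirement and the resulting entropy of an event set can be checked numerically. The sketch below is a minimal illustration; the function names `information` and `entropy` are chosen here and do not come from the paper.

```python
from math import log2

def information(p):
    """Information h(x) = -log2 p(x) of an event with probability p."""
    return -log2(p)

def entropy(probabilities):
    """Entropy of an event set: sum of -p log2 p over all events with p > 0."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

# Independent events multiply in probability and add in information.
p_x, p_y = 0.5, 0.25
print(information(p_x * p_y), information(p_x) + information(p_y))  # 3.0 3.0

print(entropy([0.5, 0.5]))    # 1.0 (one fair binary choice)
print(entropy([0.25] * 4))    # 2.0 (four equally likely events)
```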

Definition 3 (Vertex entropy) Given a graph $G = (V, E)$, the entropy of the vertex $v_i$ is defined based on the weights between the vertices $v_i$ and $v_j$, which is formulated as $e(v_i) = \sum_{j=0, j \neq i}^{N} -w_{i,j}\log_2 w_{i,j}$, where N indicates the number of vertices. The weight value $w_{i,j}$ equals $R(v_i, v_j)$, which denotes the spurious correlation coefficient between the two vertices $v_i$ and $v_j$.
The graph entropy is calculated by summing the entropy of all vertices, which is formulated as $e(G) = \sum_{i=0}^{N} e(v_i)$. The dynamic graph entropy is composed of the entropies of the graphs at the time intervals $t \in [0, T]$, which is formulated as $E = \{e(G_t) \mid t \in [0, T]\}$. The information of the climate event can be quantified using graph entropy. When one of the meteorological variables changes, the spurious correlation coefficients change, and the graph entropy also changes at the corresponding time interval. Abnormal climate events can therefore be detected from the graph entropy.
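The vertex and graph entropies can be computed directly from a symmetric weight matrix. The following sketch assumes that the graph is stored as a nested list `w` with `w[i][j]` holding the edge weight and that zero weights are skipped; the example weights are hypothetical and chosen so that the result can be checked by hand.

```python
from math import log2

def vertex_entropy(w, i):
    """Sum of -w_ij log2 w_ij over all vertices j other than i."""
    return sum(-w[i][j] * log2(w[i][j])
               for j in range(len(w)) if j != i and w[i][j] > 0)

def graph_entropy(w):
    """Sum of the vertex entropies."""
    return sum(vertex_entropy(w, i) for i in range(len(w)))

# Hypothetical 3-vertex graph (diagonal entries are unused).
w = [
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 1.0],
    [0.25, 1.0, 0.0],
]
print(vertex_entropy(w, 0))  # 1.0 = 0.5*1 + 0.25*2
print(graph_entropy(w))      # 2.0

# A dynamic graph is a list of weight matrices, one per time interval.
dynamic_entropy = [graph_entropy(w_t) for w_t in [w, w]]
```

Note that an edge with weight 1 contributes nothing to the entropy, since $\log_2 1 = 0$.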

Entropy-Based Graph Embedding
The dynamic graph consists of graphs $G_t$ in the time intervals $t \in [0, T]$, which is formulated as $\mathcal{G} = \{G_t \mid t \in [0, T]\}$. Dynamic graph embedding is used to capture the temporal information of the dynamic graph $\mathcal{G}$ by learning a mapping function $f : G_t \to g_t$, where $g_t$ is an embedding vector of the graph $G_t$. The similarity of the entropy between two graphs $G_i$ and $G_j$ is $d(e(G_i), e(G_j)) = |e(G_i) - e(G_j)|$. The objective of entropy-based graph embedding is to reduce the distance between two graphs with similar entropy. To address this problem, we construct a dynamic supervised graph, which is defined as follows.
For the graph $G_t$, the corresponding graph $G_i$ ($i \neq t$) can be found from the dynamic graph $\mathcal{G}$, where the similarity $d(e(G_i), e(G_t))$ of the entropy between the two graphs is the smallest. The dynamic supervised matrix is the set composed of these graphs, which is formulated as $S = \{S_t \mid t \in [0, T]\}$ with $S_t = \arg\min_{G_i \in \mathcal{G},\, i \neq t} d(e(G_i), e(G_t))$. As shown in Figure 2, the dynamic graph is formulated as $\mathcal{G} = \{G_0, G_1, G_2\}$. The entropies of the vertices $v_1$ and $v_2$ in $G_0$ can be calculated as $e(v_1) = 0.241$ and $e(v_2) = 0.267$, respectively. The entropy of graph $G_0$ is $e(G_0) = 0.292 + 0.241 + 0.267 = 0.800$, where 0.292 is the entropy of the remaining vertex. The entropies of graphs $G_1$ and $G_2$ are $e(G_1) = 0.796$ and $e(G_2) = 0.848$, respectively. The similarity of the entropy between the graphs $G_0$ and $G_1$ is $d(e(G_0), e(G_1)) = 0.004$. The similarity between the graphs $G_0$ and $G_2$ is $d(e(G_0), e(G_2)) = 0.048$. Therefore, the nearest graph from $G_0$ is $G_1$, the nearest graph from $G_1$ is $G_0$, and the nearest graph from $G_2$ is $G_0$. The dynamic supervised matrix is thus denoted by $S = \{G_1, G_0, G_0\}$.
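The nearest-entropy selection in this example can be reproduced with a short sketch. It assumes that the similarity is the absolute difference of the two entropies, which matches the values 0.004 and 0.048 quoted above, and that ties are broken by the lowest index; the function name is hypothetical.

```python
def nearest_by_entropy(entropies):
    """For each graph t, return the index of the other graph with the closest entropy."""
    nearest = []
    for t, e_t in enumerate(entropies):
        others = [i for i in range(len(entropies)) if i != t]
        nearest.append(min(others, key=lambda i: abs(entropies[i] - e_t)))
    return nearest

# Entropies of G_0, G_1, and G_2 from the worked example.
print(nearest_by_entropy([0.800, 0.796, 0.848]))  # [1, 0, 0]
```

The result corresponds to the supervised set consisting of $G_1$, $G_0$, and $G_0$.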
We utilize two autoencoders to reconstruct the dynamic graph and the supervised graph. The two autoencoders share parameters with each other. Figure 3 shows the architecture of the EDynGE model, where $G_t$ and $S_t$ indicate the climate graph and the supervised graph, respectively. The embedding vectors of $G_t$ and $S_t$ are denoted as $g_t$ and $s_t$, respectively. Each autoencoder includes an encoder and a decoder. We use $y_i$ to indicate the $i$-th layer of the encoder and $\hat{y}_i$ to denote the $i$-th layer of the decoder. The autoencoder reconstructs the input data using the encoder and decoder to calculate the graph's embedding vector. The encoder uses non-linear functions to extract the features for mapping the graphs into the embedding space, which are formulated as $y_i = \delta(W_i y_{i-1} + b_i)$, where $\delta$ is an activation function, and $W_i$ and $b_i$ indicate the weight and bias in the $i$-th layer, respectively. The ReLU function is utilized as the activation to make the neural network non-linear, which is formulated as $f(y_i) = \max(0, y_i)$ 13 . The decoder reconstructs the graph from the embedding vector by reversing the encoder's computation. The purpose of dynamic embedding is to reduce the distance between two graphs that have a similar entropy in the embedding space. Therefore, we establish a loss function based on the similarity of the graph entropy in the embedding layer, which is formulated as $L_s = \frac{1}{T}\sum_{t=1}^{T} \|g_t - s_t\|_2^2$, where T indicates the number of time intervals. The graph $G_t$ and the supervised graph $S_t$ have the most similar graph entropy. Thus, minimizing $L_s$ reduces the difference between $g_t$ and $s_t$, and hence the distance between the two graphs in the embedding space.
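To make the role of the shared encoder and the loss $L_s$ concrete, the sketch below encodes a graph and its supervised counterpart with the same parameters and evaluates the mean squared distance between the embeddings. It is an illustration only: it assumes each encoder layer applies ReLU to an affine map, represents each graph as a flat vector of edge weights, and uses a single hand-picked layer; all names and numbers are hypothetical.

```python
def relu(vector):
    """Element-wise max(0, y)."""
    return [max(0.0, a) for a in vector]

def dense(weights, bias, vector):
    """Affine map W v + b for one layer."""
    return [sum(w * a for w, a in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def encode(layers, vector):
    """Apply every (weights, bias) layer followed by ReLU."""
    for weights, bias in layers:
        vector = relu(dense(weights, bias, vector))
    return vector

def similarity_loss(g, s):
    """L_s: mean over time intervals of the squared distance between g_t and s_t."""
    return sum(sum((a - b) ** 2 for a, b in zip(g_t, s_t))
               for g_t, s_t in zip(g, s)) / len(g)

# One shared layer mapping 3 edge weights to a 2-dimensional embedding.
layers = [([[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]], [0.0, -1.0])]
graphs = [[0.5, 0.25, 1.0], [1.0, 0.5, 0.0]]       # G_t as flattened edge weights
supervised = [[1.0, 0.5, 0.0], [0.5, 0.25, 1.0]]   # S_t, the nearest-entropy graphs

g = [encode(layers, v) for v in graphs]       # same parameters for both inputs
s = [encode(layers, v) for v in supervised]
print(g[0], s[0])              # [0.25, 0.0] [0.5, 0.0]
print(similarity_loss(g, s))   # 0.0625
```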
An autoencoder is used to reconstruct its input, so we establish a loss function $L_1$ for reducing the loss between the input and the output. To avoid overfitting, we establish a regularization term that is formulated as $L_{reg} = \frac{1}{2}\sum_{i=0}^{I} (\|W_i\|_2^2 + \|\hat{W}_i\|_2^2)$, where $W_i$ and $\hat{W}_i$ indicate the weights of the encoder and decoder in the $i$-th layer, respectively. The joint loss function $L$ is established using the functions $L_s$, $L_1$, and $L_{reg}$. We utilize the gradient descent algorithm and the backward propagation algorithm 14,15 to train the model. Gradient descent is used to update the weight and bias in the output layer, which is formulated as $W_I = W_I - \eta \frac{\partial L}{\partial W_I}$ and $b_I = b_I - \eta \frac{\partial L}{\partial b_I}$, where I indicates the output layer and $\eta$ is the learning rate. Each layer's weight and bias are updated using the backward propagation algorithm, which calculates the partial derivatives of the loss function based on the chain rule.
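The regularization term and the gradient descent update can be illustrated in isolation. The sketch below is not the training procedure of the model: it evaluates $L_{reg}$ for hypothetical weight matrices and applies one update step using only the gradient of $L_{reg}$, which with respect to a weight matrix is the matrix itself. The function names and the learning rate are assumptions.

```python
def squared_norm(matrix):
    """Sum of squared entries of a matrix given as nested lists."""
    return sum(a * a for row in matrix for a in row)

def l2_regularizer(encoder_weights, decoder_weights):
    """L_reg: half the sum of squared norms over encoder and decoder layers."""
    return 0.5 * sum(squared_norm(w) + squared_norm(w_hat)
                     for w, w_hat in zip(encoder_weights, decoder_weights))

def gradient_step(weights, gradients, eta):
    """One gradient descent update: W - eta * dL/dW."""
    return [[w - eta * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, gradients)]

encoder = [[[1.0, 2.0]]]       # one 1x2 encoder layer (hypothetical values)
decoder = [[[2.0], [0.0]]]     # one 2x1 decoder layer (hypothetical values)
print(l2_regularizer(encoder, decoder))  # 4.5

# The gradient of L_reg with respect to W is W, so a step on L_reg alone
# shrinks every weight by the factor (1 - eta).
eta = 0.1  # assumed learning rate
print(gradient_step(encoder[0], encoder[0], eta))
```

In the full model the gradient would also include the contributions of $L_s$ and $L_1$, propagated to earlier layers by the chain rule.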

Results
In this section, we apply the EDynGE model to real meteorological data and use the local outlier factor (LOF) 16 , isolation forest (IF) 17 , and box-plot (BP) 18 methods to detect the abnormal climate events in the embedding space. LOF detects outliers based on the local density of the data points. IF assumes that outliers are sparsely distributed and far away from the dense normal observations, so they can be easily isolated. The BP method detects outliers based on statistical indices and requires the dataset to have a normal distribution.
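Of the three detectors, the box-plot rule is simple enough to sketch with the standard library. The example below assumes the conventional fences at 1.5 times the interquartile range and applies them to a one-dimensional score, here a hypothetical distance of each embedding from the center; how the paper reduces embeddings to such a score is not stated, so this choice is an assumption.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical distances of eight embedding vectors from their center.
distances = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 5.0]
print(boxplot_outliers(distances))  # [7]
```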

Dataset
Daily climate data from the Chinese surface stations of 10 provinces were used to conduct experiments. According to the nationwide surface climate statistical method 19 , these datasets were derived from various provincial meteorological bureaus through statistical compilations. The datasets have been collected from 194 basic and reference surface meteorological observation stations and automatic weather stations in China since 1951. Each dataset comprises 18 elements, such as mean pressure, mean temperature, and precipitation. In this study, we used meteorological data from 1990 to 2020 to evaluate the EDynGE model.

Evaluation Metrics
Because the datasets are unlabeled, we propose two different ways to evaluate the EDynGE model. The first way is to label a certain number of data points as outliers. These data points are the embedding vectors of climate events. The selection rule for these outliers is as follows. We assume that 10% of the data points in each dataset are outliers. The embedding vector of the $t$-th graph is denoted as $g_t$. The center of the embedding vectors is formulated as $c = \frac{1}{T}\sum_{t=1}^{T} g_t$, where T is the number of time intervals. If the entropies of the graphs are similar, the embedding vectors of these graphs are close to each other, and the outliers are far from the normal observations. The 10% of data points farthest from the center are therefore selected as outliers. The EDynGE model can then be evaluated using accuracy and F-score.
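The labeling rule and the two metrics can be sketched as follows. This is a minimal illustration assuming Euclidean distance to the center and rounding of the 10% share to the nearest integer; the embeddings and the detector output are hypothetical.

```python
from math import dist

def label_farthest(embeddings, ratio=0.10):
    """Mark the given share of embeddings farthest from the center as outliers."""
    total, dims = len(embeddings), len(embeddings[0])
    center = [sum(g[d] for g in embeddings) / total for d in range(dims)]
    k = max(1, round(ratio * total))
    ranked = sorted(range(total), key=lambda t: dist(embeddings[t], center),
                    reverse=True)
    flagged = set(ranked[:k])
    return [t in flagged for t in range(total)]

def accuracy_and_f1(truth, predicted):
    """Accuracy and F1-score with outliers as the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1

# Nine hypothetical embeddings near the origin and one far away.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
              (0.1, 0.1), (-0.1, -0.1), (0.1, -0.1), (-0.1, 0.1), (5.0, 5.0)]
truth = label_farthest(embeddings)

# Hypothetical detector output that flags the true outlier and one normal point.
predicted = [True] + [False] * 8 + [True]
print(accuracy_and_f1(truth, predicted))
```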
In the second way, we propose a hypothesis based on global warming: with increasing temperature, the number of days of abnormal climate events also increases. We counted the number of days with abnormal climate events every year, every five years, and every decade. If the number of days of abnormal climate events exhibits an upward trend, then the climate has changed in the past 30 years, and EDynGE can be used to detect this climate change. Figure 4 shows the days of abnormal climate events in four provinces obtained using the IF method. The yearly results show that the frequency of abnormal climate events exhibits an increasing trend. The five-year results show that the four provinces exhibit a non-linear increasing trend. However, a local minimum was observed in Guangzhou and Shanghai from 2005 to 2010, and in Beijing from 2000 to 2005. The decadal results indicate that three provinces (Beijing, Shandong, and Shanghai) show an upward trend, whereas Guangzhou shows a falling trend followed by a rising trend. According to the experimental results, the detected climate change conforms with the hypothesis in most cases.
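Counting abnormal days per year, per five years, and per decade amounts to bucketing dates by period. The sketch below shows one way to do this; the function name, the bucket alignment to 1990, and the example dates are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def count_by_period(abnormal_days, start_year, span):
    """Count abnormal days in consecutive periods of `span` years from start_year."""
    counts = Counter()
    for day in abnormal_days:
        bucket = start_year + ((day.year - start_year) // span) * span
        counts[bucket] += 1
    return dict(sorted(counts.items()))

# Hypothetical days flagged as abnormal climate events.
days = [date(1990, 1, 5), date(1990, 7, 1), date(1994, 3, 3),
        date(1995, 2, 2), date(2003, 8, 8)]

print(count_by_period(days, 1990, 1))   # {1990: 2, 1994: 1, 1995: 1, 2003: 1}
print(count_by_period(days, 1990, 5))   # {1990: 3, 1995: 1, 2000: 1}
print(count_by_period(days, 1990, 10))  # {1990: 4, 2000: 1}
```

Each key is the first year of its period, so the five-year and decadal series can be compared directly for an upward or downward trend.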

Analysis of Results
To conduct a comparison, we utilized a graph convolutional neural network (GCN) and a dynamic-graph-to-vector model (dyngraph2vec) as baselines 20,21 . The GCN uses convolutional kernels to capture the spatial information of vertices in the graph. Because GCN is applied to static graphs, it does not consider the temporal information of the dynamic graph. Dyngraph2vec is an unsupervised learning model for embedding dynamic graphs. It provides three variants: an autoencoder (dyngraph2vecAE), a recurrent neural network (dyngraph2vecRNN), and an autoencoder-based recurrent neural network (dyngraph2vecAERNN). Dyngraph2vecAE cannot extract the temporal information from the dynamic graph because it computes the embedding vectors by reconstructing the graphs. Dyngraph2vecRNN utilizes the idea of the skip-gram to consider the temporal information of the graphs 22 . It computes the embedding vector of the current graph by using the graphs around it.
Table 1 shows the accuracy of the models using the IF method under 10% outliers. According to the experimental results, the EDynGE model exhibits the best performance in all provinces. GCN achieves better accuracy than dyngraph2vecAE in 6 provinces. The dyngraph2vecAE method achieves better accuracy than the other two dyngraph2vec models. GCN, dyngraph2vecAE, and EDynGE do not capture the temporal information of the dynamic graph, yet they perform better than the two models that extract temporal features, which suggests that the effect of temporal information is negligible for this problem.
Table 2 shows the performance of the EDynGE model with 10% outliers. According to the results, the IF method achieved the best accuracy and F1-score for eight provinces, whereas it achieved a lower accuracy and F1-score than the LOF method for Beijing and Shanxi. The BP method achieved a better F1-score than LOF for seven provinces, but its scores were worse than those obtained with IF. Overall, the experimental results show that IF exhibited the best performance for most provinces.
To obtain performances with different embedding sizes, we evaluated the stability of the EDynGE model using mean ± std, where std indicates the standard deviation. Figure 5 shows the stability of the EDynGE model in Beijing. According to the experimental results, the IF method achieved the best performance based on the average F1-score and had the lowest std among the three outlier detection methods. This indicates that IF is the most stable method compared with the other two methods.
We conducted experiments to validate the EDynGE model with different ratios of outliers. Figure 6 shows the performance of the model for Beijing with three different ratios of outliers. According to the experimental results, the IF method achieved the best accuracy for 10% outliers, whereas the LOF and BP methods achieved the best performance for 5% outliers. When the ratio of outliers was low, the imbalance of labels reduced the performance of the EDynGE model. Therefore, the three methods showed the lowest accuracy under 3% outliers.

Discussion
In this study, we proposed the EDynGE model to detect climate changes. The model uses the spurious correlation coefficient to calculate the graph entropy and reduces the distance between two climate events with similar graph entropy. We conducted experiments to validate the performance and stability of the EDynGE model for climate change detection. The results showed that the IF method exhibited better results than the LOF and BP methods by 32.5% and 12.7% in terms of F1-score, respectively. The EDynGE model performed better than the other dynamic graph embedding models by 37.9% in terms of accuracy. Based on global warming, we hypothesized that with an increase in temperature, the number of days of abnormal climate events exhibits an upward trend. The experimental results showed that the number of days of abnormal climate events increased by 304.5 days from 1990 to 2020, which agrees with the hypothesis. This indicates that the EDynGE model can detect climate change.
This study has some limitations. The EDynGE model can cluster graphs based on graph entropy. However, graphs with different neighbor structures have the same graph entropy in some cases. The EDynGE model cannot detect outliers with an abnormal neighbor structure. This indicates that the EDynGE model ignores the spatial information of the dynamic graph. To overcome this issue, we plan to construct a hybrid model that consists of the neighbor structure similarity and graph entropy similarity for detecting outliers in multiple time series. The second limitation is that the EDynGE model is based on an autoencoder that does not capture the temporal information from the dynamic graph. Although the temporal information is negligible in the research problem, it also needs to be considered in dynamic graph embedding. To overcome this problem, we plan to construct an autoencoder model using the long short-term memory architecture to discover temporal features from the dynamic graph.
We propose a novel idea for detecting outliers in multiple time series. It utilizes the correlation of the time series to construct a dynamic graph and detects outliers in the dynamic graph. The outlier detection problem is thereby transformed from the multiple time series domain to the dynamic graph domain. This can help people find the causes of the outliers by examining the evolution of the graphs. For example, in financial time series, abnormal trends in the stock market can be detected and analyzed using the EDynGE model. Furthermore, digital twin technology is developing rapidly. It utilizes sensors to record digital information for simulating the condition of an object in the physical space. The proposed method is able to detect anomalies in the recorded digital signals to diagnose faults in the physical object. For example, transmission failures in machines and structural damage in buildings can be detected using the proposed idea.