Spatial Evaluation of Earthquake Disaster Based On Correlation Characteristics And BP Neural Network

Rapid spatial evaluation of disaster after an earthquake is required in emergency rescue management because of its significant support for reducing casualties and property losses. The earthquake-hit population is taken as an example of earthquake disaster information, and an evaluation model is constructed using data from the 2013 Ms7.0 Lushan earthquake. Ten influencing factors are classified into environmental factors and seismic factors. The correlation analysis reveals a nonlinear relationship between the earthquake-hit population and the various factors, with per capita GDP and PGA showing the stronger correlations with the earthquake-hit population. Moreover, the spatial variability of the influencing factors affects the distribution of the earthquake-hit population. The earthquake-hit population is evaluated using a BP neural network whose training samples are optimized based on the spatial characteristics of the per capita GDP and PGA factors. Instead of distributing sample points randomly, different numbers of sample points are generated in areas with different value intervals of the influencing factors. The minimum RMSE (root mean square error) on the testing set is 18 people/km², showing good accuracy in the spatial evaluation of the earthquake-hit population. Meanwhile, the optimized samples considering spatial characteristics improve the convergence speed and generalization capability compared with random samples. The trained network was applied to the 2017 Ms7.0 Jiuzhaigou earthquake to verify the prediction accuracy. The evaluation results indicate that a BP neural network considering the correlation characteristics of factors is capable of evaluating seismic disaster information in space, providing more detailed information for emergency services and rescue operations.


Introduction
Earthquake disasters have a profound impact on the human living environment due to their suddenness and destructiveness. Intense seismic ground motion causes severe casualties, house collapse and economic loss. Strong earthquakes have continued to occur worldwide in the past two decades (Rossetto et al., 2007; Lara et al., 2016; Shimada, 2016). The number of people injured or killed by an earthquake can range from a few to tens of thousands, distributed across different spatial locations (Zhao et al., 2018). Although concern about seismic problems has deepened and public seismic awareness has grown over the past decades, active geological structures still affect the environment of the anthroposphere (Sun et al., 2016; Wu et al., 2020; Santos-Reyes and Gouzeva, 2020; Luo et al., 2021). Because earthquake occurrence is unpredictable, it is difficult to prepare beforehand, so countries are committed to improving their emergency rescue ability after an earthquake (Huang and Li, 2014). Among the various types of earthquake disaster information, modeling earthquake casualties is particularly important as a reference for emergency rescue and decision making. Casualty evaluation after an earthquake is fast becoming an important issue in the management of significant economic, social and environmental risk (Huang and Huang, 2018).
The disaster caused by intense seismic motion is a complex result of various influencing factors. Seismic intensity, topography, population and economic level are all related to earthquake casualties to some extent. Traditional physical models and statistical regression models have difficulty reflecting the nonlinear relation between the earthquake-hit population and these factors (Erdik et al., 2011). With the continuous improvement of computing speed in recent years, machine learning methods have become more widely used. A growing number of scholars apply them to earthquake disaster mapping, because machine learning methods can learn from historical data to produce insight into extreme events (Yang et al. [...]) [...] (2020) established an Extreme Learning Machine (ELM) network to predict earthquake casualties based on data from 84 groups of earthquake victims in China. It was found that the ELM algorithm had better robustness and generalization capability than the BP neural network and SVM. Existing research has thus focused on predicting the number of people affected by an earthquake from factors of multiple dimensions, and has compared the accuracy and performance of different machine learning methods on the resulting casualty estimates. However, the input layers of these methods used numerical data without spatial information, and the spatial characteristics of the disaster information in the output layer have not been evaluated effectively. For earthquake emergency management, the spatial distribution of disaster information within the earthquake-affected area is of greater significance for the formulation of detailed rescue plans.
The generalization capability of a network refers to its ability to produce accurate output for new input data outside the training samples. Generalization capability is the most important index of network performance. The complexity of the network structure and the training samples are the main factors affecting it. Research by Partridge (1996) on three-layer neural networks found that the influence of the training set on generalization capability is large, even greater than the influence of the number of neurons. Many researchers combine principal component analysis (PCA), clustering analysis and other methods with machine learning to optimize the training set, aiming to improve the generalization capability of the network (Basharat et al., 2016; Li et al., 2020). Lou et al. (2012) used PCA to reduce the dimension of assessment factors, disaster-formative environment and disaster-affected bodies, and established a BP neural network to assess the economic loss under tropical cyclones in Zhejiang Province. Gao et al. (2020) combined PCA and ANN to evaluate the personal exposure level to PM2.5 and found that the combination produced more accurate results than the ANN method alone. These studies show that optimizing the input samples of a network can improve its generalization capability.
Most existing sample optimization methods are based on statistical analysis in the numerical dimension. However, the distribution of influencing factors and the training results are also related in the spatial dimension. Sample optimization based on spatial correlation characteristics might therefore provide a novel way to improve the generalization capability.
The study presented herein aims to evaluate effectively the spatial distribution of earthquake disaster information in each county. The spatial distribution of the earthquake-hit population was selected as the study content and evaluated based on the correlation characteristics of influencing factors and a BP neural network, using data from the 2013 Ms7.0 Lushan earthquake. The selection of samples was optimized based on the spatial characteristics resulting from correlation analysis, to improve the generalization capability of the network and the accuracy of the evaluation results. The total earthquake-hit population in the Lushan earthquake reached 3.7 million. The earthquake-hit population refers to the people who have suffered property or life losses due to the earthquake. It not only reflects the severity of the natural disaster, but also reveals the impact of the earthquake on people's lives. It also offers a reference for formulating emergency rescue plans, and has therefore become an important index for evaluating the damage caused by an earthquake. Figure 1 illustrates the earthquake-hit population density (number of earthquake-hit people per square kilometer) in each county-level administrative region within the earthquake-affected area. The data were collected by the Sichuan provincial government and released on the Internet after the earthquake (Wang and Li, 2014).
In Figure 1, the color represents the earthquake-hit population density. The earthquake-hit population density in Yucheng District, Danling County and Mingshan County was relatively high. The density in Mingshan County was the highest, reaching approximately 432 people/km². The region with the highest earthquake-hit population density was not the region where the epicenter was located. This indicates that the impact of earthquakes on population is spatially complicated: the epicenter is not necessarily the most severely affected area under seismic motion. A similar phenomenon appeared in the 2008 Ms8.0 Wenchuan earthquake (Yang et al., 2014). The casualties caused by earthquakes are related to many categories of influencing factors. Here, the influencing factors of the earthquake-hit population are divided into environmental and seismic factors.

Environmental influencing factors
The environmental influencing factors refer to the environmental conditions in the study area; they have no direct relationship with earthquake occurrence. The environmental influencing factors considered in this research were elevation, slope angle, population density, per capita GDP, distance to fault, distance to river and the Normalized Difference Vegetation Index (NDVI). The data details of the environmental influencing factors are shown in Table 1. Among the environmental factors, elevation, slope angle, distance to fault, distance to river and NDVI are maps whose data vary with spatial location.
However, population density and per capita GDP have a different granularity from the other environmental factors. There is one attribute value per county-level administrative region for these two factors, because counties were used as the basic unit of statistics.

Elevation
There is also a correlation between elevation and the distribution of the earthquake-hit population. On the one hand, the population is concentrated on plains with lower elevations; on the other hand, slopes amplify seismic ground motion, resulting in more severe geological disasters in high-elevation areas (Zhang et al., 2018). The digital elevation model with a resolution of 30×30 m, updated in 2009 (Figure 2(a)), was obtained from the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences.

Slope angle
The slope angle is a geomorphic parameter which has an important impact on seismic geological disasters such as landslides, debris flows and barrier lakes. Field investigations of historical strong earthquakes found that a large number of earthquake casualties were caused by geological disasters triggered by seismic motion (Xu et al., 2015). The slope angle map (Figure 2(b)) was derived from the digital elevation model using ArcMap software.

Population density
The population density is a key factor in the risk assessment of natural disasters. It is calculated as the ratio of population to bare land area here. Some strong earthquakes occurred in mountainous areas with low population density and posed a relatively small threat to people's lives and property (Ara, 2014). Because the population is highly mobile, it is difficult to obtain its spatial distribution at the moment before an earthquake occurs. Therefore, the resident population of each county in the census was used to approximate the population distribution (Figure 2(c)). For each county, the population density is the ratio of population to area. The population data were updated in 2011 and provided by the China Earthquake Network Center.

Per capita GDP
The per capita GDP is another crucial factor in earthquake-hit population evaluation. It reflects one aspect of the economic level of local people, and the economic level affects the seismic resistance of engineering constructions. Generally, the higher the economic level, the stronger the seismic resistance of the constructions. As with the population density map, the per capita GDP of each county was used as the per capita GDP distribution map (Figure 2(d)). The per capita GDP data were updated in 2011 and provided by the China Earthquake Network Center.

Distance to fault
The distance to fault is another significant factor related to seismic geological disasters. Generally, fractured or weak zones are located near fault bedding planes, which are susceptible to weathering and sliding (Conforti et al., 2014). The fault data were provided by the China Earthquake Network Center, and the distance to fault was calculated using buffers in ArcMap software (Figure 2(e)).

Distance to river
The distance to river can also influence the earthquake-hit population, as river erosion and soil saturation decrease the seismic stability of slopes (Yalcin, 2008). The river data were provided by the China Earthquake Network Center, and the distance to river was calculated using buffers in ArcMap software (Figure 2(f)).

NDVI
The NDVI can also play a role in the earthquake-hit population evaluation for the Lushan earthquake. NDVI quantifies vegetation by measuring the difference between near-infrared light (which vegetation strongly reflects) and red light (which vegetation absorbs). The closer the NDVI is to +1, the better the vegetation coverage in the area and the lower the degree of urbanization. The NDVI map was obtained from Landsat 7 ETM+ satellite images acquired in 2012 from the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (Figure 2(g)).
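The band difference described above is conventionally normalized by the band sum, NDVI = (NIR - Red) / (NIR + Red). The short Python sketch below illustrates this for a single pixel. The function name `ndvi`, the zero-sum handling and the reflectance values are illustrative assumptions, not part of the original processing chain.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    nir and red are the near-infrared and red band values
    (for Landsat 7 ETM+, bands 4 and 3). The result lies in [-1, 1].
    """
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat an empty pixel as neutral
    return (nir - red) / total


# Hypothetical reflectance values, for illustration only.
dense_vegetation = ndvi(nir=0.50, red=0.08)  # strong NIR reflection
built_up_area = ndvi(nir=0.25, red=0.20)     # similar NIR and red
```

With these made-up inputs the vegetated pixel scores much closer to +1 than the built-up pixel, which is the behavior the paragraph relies on when linking NDVI to the degree of urbanization.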

Seismic influencing factors
The seismic influencing factors refer to the elements and characteristics that are directly related to earthquake occurrence and can be rapidly captured by seismic motion monitoring instruments after an earthquake. The seismic influencing factors considered in this research were peak ground acceleration (PGA), peak ground velocity (PGV) and distance to epicenter. The data of the seismic influencing factors are shown in Table 2.

PGA
[...] by earthquake, and can effectively evaluate the intensity of seismic ground motion at different positions in space. The PGA data were recorded by the strong-motion seismograph network and can be made available shortly after an earthquake occurs. The data used in this research were provided by the China Earthquake Network Center (Figure 3(a)).

PGV
PGV is also an important index for evaluating the intensity of seismic ground motion. The acceleration time history misses some information, such as low-frequency components, which the velocity time history records better. Therefore, PGV data are included to evaluate the seismic ground motion intensity comprehensively. The PGV data were provided by the China Earthquake Network Center (Figure 3(b)).

Distance to epicenter
Distance to epicenter measures the relative distance between a location in the study region and the epicenter. Previous research has shown that the impact of an earthquake disaster gradually decreases as the distance to the epicenter increases. The distance to epicenter was calculated using buffers in ArcMap software (Figure 3(c)).

Spatial Correlation Characteristics Of Influencing Factors
The earthquake-hit population is related to environmental and seismic influencing factors. The spatial variability of the influencing factors leads to differences in the earthquake-hit population between counties. Spearman rank correlation coefficients were calculated to analyze the relationship between the earthquake-hit population and the influencing factors. Within the study region shown in Figure 1, 1000 sampling points were randomly generated and distributed using ArcMap software. The earthquake-hit population density data and influencing factor data at the sampling points were extracted to construct the database for correlation analysis.
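As a minimal sketch of the statistic used here, the Spearman coefficient can be computed as the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. The function names and the sample values below are hypothetical; the actual analysis used the 1000 ArcMap sampling points.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical factor and density values at six sampling points.
pga_at_points = [120.0, 300.0, 450.0, 200.0, 610.0, 380.0]
density_at_points = [35.0, 150.0, 90.0, 60.0, 310.0, 240.0]
rho = spearman(pga_at_points, density_at_points)
```

Because the coefficient depends only on ranks, it is insensitive to the different units and value ranges of the ten factors, which is convenient when comparing, say, elevation in meters with per capita GDP.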

Correlation analysis between earthquake-hit population and environmental influencing factors
The Spearman correlation coefficients and significance test results between the earthquake-hit population and the environmental influencing factors are listed in Table 3. The significance test results were all less than 0.05, showing that the number of samples was reasonable and the correlation coefficients were valid. The strongest correlation coefficient was -0.322, between earthquake-hit population density and per capita GDP, showing that the earthquake-hit population was generally low in regions with high per capita GDP. The weakest coefficient was -0.073. These results revealed that none of the environmental influencing factors had a direct linear relationship with earthquake-hit population density; the density distribution is the result of multiple environmental factors. The spatial correlation between earthquake-hit population density and the environmental factors is illustrated in Figure 4. The histograms show the average earthquake-hit population density within different intervals of each factor, reflecting how the spatial distribution of earthquake-hit population density relates to the influencing factors. Figure 4 shows that, for every factor, the average earthquake-hit population density varied irregularly between intervals. However, relatively high earthquake-hit population density was concentrated in specific ranges of certain factors. For example, in the 2013 Lushan earthquake, the area between 500 m and 1200 m in elevation had relatively high earthquake-hit population density (Figure 4(a)). As regards distance to river, the maximum earthquake-hit population density was observed in the interval closest to the river (Figure 4(f)).
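The interval statistics behind histograms of this kind can be sketched as follows: each sampling point is assigned to a factor interval, and the densities in each interval are averaged. The function name, the interval edges and the point values are hypothetical and only loosely echo the elevation example.

```python
import bisect


def mean_density_by_interval(factor, density, edges):
    """Average density of the sample points falling in each factor interval.

    edges are ascending boundaries; interval i is [edges[i], edges[i+1]),
    and the last interval also includes its upper edge.
    Returns None for an interval that contains no sample point.
    """
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for f, d in zip(factor, density):
        if f < edges[0] or f > edges[-1]:
            continue  # point lies outside all intervals
        # bisect_right - 1 gives the interval index; clamp the top edge.
        i = min(bisect.bisect_right(edges, f) - 1, len(sums) - 1)
        sums[i] += d
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical elevations (m) and densities (people/km^2) at five points.
elevation = [600, 900, 1500, 2500, 1100]
density = [300, 400, 50, 10, 350]
averages = mean_density_by_interval(elevation, density, [500, 1200, 2000, 3000])
```

In this toy example the lowest elevation interval carries the highest average density, mirroring the kind of concentration in a specific range that the histograms are meant to expose.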

Correlation analysis between earthquake-hit population and seismic influencing factors
The Spearman correlation coefficients and significance test results between the earthquake-hit population and the seismic influencing factors are listed in Table 4. The significance test results were all less than 0.05. The correlation coefficient between earthquake-hit population density and PGA had the highest value, 0.433. This showed that the earthquake-hit population density had a notable positive correlation with PGA: the higher the PGA, the greater the earthquake-hit population. The spatial correlation between earthquake-hit population density and the seismic influencing factors is illustrated in Figure 5. The earthquake-hit population density in each interval generally increased with the value of PGA. However, when the PGA value ranged from 750 m/s² to 850 m/s², the earthquake-hit population density was relatively low (Figure 5(a)). The reason was that the area with the greatest seismic motion intensity had a low population density, so the density of population affected by the earthquake was also relatively low.
The correlation coefficient is a statistical index of how closely two variables are related; the Spearman coefficient measures the strength of a monotonic association, of which a linear relation is a special case. The modest coefficients obtained here indicated that the relationship between the earthquake-hit population and the various influencing factors is nonlinear in its spatial distribution. The coefficients do imply a spatial correlation between factors and disaster results: for example, in areas with greater seismic motion, the earthquake-hit population density was higher. Nevertheless, their numeric values indicate that the earthquake-hit population results from the complex interaction of multiple factors, and it is difficult to evaluate the spatial earthquake-hit population with a linear relation.

Sample optimization selection based on correlation characteristics
Generalization capability is an important indicator of how accurately the neural network predicts data outside the training sample. To evaluate effectively the spatial distribution of the earthquake-hit population in a newly occurred earthquake, the network must have good generalization capability. Training samples have a great impact on generalization capability (Partridge, 1996), so the selection of samples was optimized based on the results of the correlation analysis.
For network training, the vector images were transformed into raster layers to unify the format of the influencing factor data and the earthquake-hit population data. The number of raster-based samples was far greater than the number of influencing indicators, which could result in overfitting of the neural network. In practical applications, a subset of all samples is randomly selected as the training set. However, random selection might lose part of the spatial characteristics of the full sample data during training, thus reducing the generalization capability and evaluation accuracy of the network.
The correlation analysis showed that, compared with other factors, per capita GDP and PGA were more strongly correlated with the earthquake-hit population density. This implied that areas with lower per capita GDP and higher PGA had a larger earthquake-hit population and were the key study areas for earthquake casualty evaluation. The per capita GDP and PGA indicators indirectly reflected the spatial distribution characteristics of the earthquake-hit population to some extent. Therefore, to account for the spatial variability of the earthquake-hit population, more samples were selected in areas with lower per capita GDP and higher PGA, and fewer samples in areas with higher per capita GDP and lower PGA. Figure 7 illustrates the frequency histograms of the per capita GDP and PGA factors. The raster-cell frequency was approximately normally distributed over per capita GDP, and decreased as PGA increased. This indicated that the area of poor economy and intense seismic motion was much smaller than that of strong economy and weak motion.
Although this area was small, it was the predominant area for disaster assessment and emergency rescue after the earthquake. The per capita GDP and PGA data were each classified into five clusters by value using the Natural Breaks Classification method. Natural Breaks Classification is a widely applied clustering method that maximizes the internal similarity of each cluster and the difference between clusters. The proportion of samples in each cluster was determined from the cluster's average value, as listed in Table 5; the sample proportion was set in proportion to the cluster average. A total of 1000 sample points were generated in the study area to extract attribute values from the raster layers. The distributions of random sample points and optimized sample points are compared in Figure 8. In line with the clustering results, the sample points were denser in areas with low per capita GDP and high PGA. The samples were thus optimized based on the numerical and spatial characteristics of the per capita GDP and PGA indicators.
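The allocation step can be sketched for a single factor as follows, assuming the five clusters have already been obtained and that each cluster's share of the 1000 points is proportional to its average value. The cluster averages below are made up and are not the values of Table 5; the largest-remainder rounding is an assumption introduced so that the counts sum exactly to the total.

```python
def allocate_samples(total, weights):
    """Split `total` points across clusters in proportion to `weights`.

    Uses the largest-remainder rule so the counts sum exactly to `total`.
    """
    weight_sum = sum(weights)
    exact = [total * w / weight_sum for w in weights]
    counts = [int(e) for e in exact]
    leftover = total - sum(counts)
    # Hand the remaining points to the clusters with the largest fractions.
    by_fraction = sorted(range(len(weights)),
                         key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical average PGA of five clusters, from weakest to strongest motion.
pga_cluster_means = [100.0, 250.0, 400.0, 600.0, 850.0]
points_per_cluster = allocate_samples(1000, pga_cluster_means)
```

The points assigned to a cluster would then be placed at random inside the area covered by that cluster, so that the small but heavily shaken zones receive a denser sampling than a uniform random draw would give them.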

Earthquake-hit population evaluation results
Two networks were trained, one on the random samples and one on the optimized samples. One hidden layer is sufficient for most applications (Aghamohammadi et al. 2013). Therefore, a three-layer network containing one input layer, one hidden layer and one output layer was adopted. To ensure a sound evaluation, 70% of the sample data were used as the training set and 30% as the testing set. The testing set was used to test the predictive ability of the trained network. The normalized goal error of the training set was 0.0003: training stopped once the training-set error during iteration fell below 0.0003. In networks with different numbers of neurons, the number of iterations based on the optimized samples was smaller than that based on the random samples, indicating that the optimized samples accelerated convergence. Meanwhile, comparing the RMSE of the networks showed that the RMSE of the optimized samples was smaller than that of the random samples, except when the number of neurons in the hidden layer was 13, where the RMSE of the two networks was close, as illustrated in Figure 9. The RMSE measures the difference between the estimated data and the testing set, and reflects the generalization capability and evaluation accuracy of the network for new data. This implied that the earthquake-hit population evaluation based on optimized samples not only converged faster, but also had better generalization capability and prediction accuracy than that based on random samples.
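The training setup described above (one hidden layer, a 70/30 split, a goal error of 0.0003 and RMSE on the testing set) can be illustrated with a small, self-contained sketch. The sigmoid hidden units, linear output, online gradient descent, learning rate, epoch cap and synthetic data are all assumptions made for illustration; the original network configuration is not specified beyond what the text states.

```python
import math
import random


def train_bp(samples, targets, n_hidden, goal=0.0003, lr=0.1,
             max_epochs=500, seed=0):
    """Train a one-hidden-layer network by backpropagation.

    Stops when the mean squared error accumulated over an epoch falls
    below `goal`, or after `max_epochs`.
    Returns (predict_function, epochs_used, last_epoch_mse).
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Each weight row ends with a bias term.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [1 / (1 + math.exp(-(sum(w * v for w, v in zip(row, x)) + row[-1])))
                  for row in w1]
        out = sum(w * h for w, h in zip(w2, hidden)) + w2[-1]
        return hidden, out

    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        sq_err = 0.0
        for x, t in zip(samples, targets):
            hidden, out = forward(x)
            err = out - t
            sq_err += err * err
            # Backpropagate: linear output unit, sigmoid hidden units.
            for j, h in enumerate(hidden):
                delta_h = err * w2[j] * h * (1 - h)
                w2[j] -= lr * err * h
                for i, v in enumerate(x):
                    w1[j][i] -= lr * delta_h * v
                w1[j][-1] -= lr * delta_h
            w2[-1] -= lr * err
        mse = sq_err / len(samples)
        if mse < goal:
            break
    return (lambda x: forward(x)[1]), epoch, mse


# Hypothetical normalized data: two inputs standing in for PGA and per capita GDP.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(100)]
density = [0.2 + 0.6 * pga * (1 - gdp) for pga, gdp in points]

# 70% of the samples for training, 30% for testing.
split = len(points) * 7 // 10
predict, epochs, train_mse = train_bp(points[:split], density[:split], n_hidden=5)

test_rmse = math.sqrt(
    sum((predict(x) - t) ** 2 for x, t in zip(points[split:], density[split:]))
    / (len(points) - split)
)
```

Repeating the call with different `n_hidden` values and with differently selected training sets, then comparing `epochs` and `test_rmse`, corresponds to the comparison of convergence speed and generalization reported for the random and optimized samples.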

Verification of network generalization capability
In 2017, an Ms7.0 earthquake struck Jiuzhaigou County. The Jiuzhaigou earthquake and the Lushan earthquake had the same magnitude, and their affected areas had similar geographical conditions. Therefore, the Jiuzhaigou earthquake was selected to verify the generalization capability and effectiveness of the earthquake-hit population evaluation. The earthquake-hit population density in each county-level administrative region, collected by the Sichuan provincial government, is illustrated in Figure 10.
The networks trained on the optimized samples and the random samples of the Lushan earthquake were each used to evaluate the earthquake-hit population in the Jiuzhaigou earthquake. The output of each network was a raster layer of earthquake-hit population density varying with spatial position. The average of the earthquake-hit population density raster values within a county was taken as the earthquake-hit population density of that county. The earthquake-hit population is the product of the earthquake-hit population density and the county area.
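The aggregation from raster output to a county figure amounts to a mean followed by a multiplication, as in the sketch below. The function name and the cell values are hypothetical and stand in for the raster cells that fall inside one county boundary.

```python
def county_population(cell_densities, county_area_km2):
    """Estimate a county's earthquake-hit population from raster output.

    cell_densities: predicted density (people/km^2) of each raster cell
    inside the county; county_area_km2: area of the county in km^2.
    """
    mean_density = sum(cell_densities) / len(cell_densities)
    return mean_density * county_area_km2


# Hypothetical predicted densities for three cells of a 100 km^2 county.
estimated_population = county_population([10.0, 20.0, 30.0], 100.0)
```

Averaging before multiplying assumes that all cells cover equal areas, which holds for a regular raster grid clipped to the county.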
The actual data and evaluation results of the earthquake-hit population are listed in Table 6. There were deviations between the estimated and actual values in each area. For the total earthquake-hit population, the evaluation result of the optimized samples was much closer to the actual data than that of the random samples. The actual earthquake-hit population in the Jiuzhaigou earthquake was 216597. The evaluated earthquake-hit populations were 198362 and 347207 for the optimized samples and random samples, respectively. The mean absolute error (MAE) was calculated to assess the evaluation results, and is expressed as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - \hat{P}_i\right| \tag{3}$$

where $n$ is the number of counties, $P_i$ is the actual earthquake-hit population of county $i$ and $\hat{P}_i$ is the corresponding evaluation result. The MAE of the evaluated earthquake-hit population was 18357 and 26121 for the optimized samples and random samples, respectively. The actual data and evaluation results in each county are compared in Figure 11. The histogram represents the values of the actual data and evaluation results, and the curve represents the error rate of the evaluation results. The error rate is defined as the ratio of the difference between the actual data and the evaluation result to the actual data:

$$E_i = \frac{\left|P_i - \hat{P}_i\right|}{P_i} \times 100\% \tag{4}$$

For Jiuzhaigou County, Pingwu County and Songpan County, where the earthquake-hit population was relatively large, the error rate of the evaluation results based on the BP neural network was less than 100%. For Hongyuan County and Zoige County, the error rate was relatively large. The maximum error rate reached 1049.84% for the random samples and 490.14% for the optimized samples. In each area, the error rate of the optimized samples was smaller than that of the random samples. The earthquake-hit population evaluation based on the optimized samples therefore predicted new data more accurately, showing that optimized samples can offer a more accurate evaluation of the earthquake-hit population.
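As a worked check of the two error measures, the sketch below applies the error rate to the reported totals and the MAE to a pair of made-up county lists. The function names are illustrative, and the per-county numbers are hypothetical because the values of Table 6 are not reproduced here; only the three totals come from the text.

```python
def mean_absolute_error(actual, estimated):
    """Mean of the absolute differences between paired values."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


def error_rate(actual, estimated):
    """Absolute difference relative to the actual value, in percent."""
    return abs(actual - estimated) / actual * 100


# Totals reported for the Jiuzhaigou earthquake.
ACTUAL_TOTAL = 216597
total_error_optimized = error_rate(ACTUAL_TOTAL, 198362)
total_error_random = error_rate(ACTUAL_TOTAL, 347207)

# Hypothetical per-county values, only to show the MAE calculation.
example_mae = mean_absolute_error([100, 200], [110, 170])
```

On the totals, the optimized samples give an error rate of roughly 8.4%, against roughly 60.3% for the random samples, which quantifies the statement that the optimized result is much closer to the actual data.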

Conclusions
In the present study, the spatial distribution of the earthquake-hit population is evaluated based on the correlation characteristics of factors and a BP neural network. The main conclusions are as follows:
1) The influencing factors of the earthquake-hit population are classified into environmental and seismic factors. Elevation, slope angle, population density, per capita GDP, distance to fault, distance to river and NDVI are considered as environmental factors, and PGA, PGV and distance to epicenter are considered as seismic factors. The correlation analysis between the earthquake-hit population and the influencing factors indicates that per capita GDP and PGA have the stronger correlations with the earthquake-hit population in the Lushan earthquake. The relationship between the earthquake-hit population and the various influencing factors is strongly nonlinear.
2) Samples have a significant impact on the generalization capability and evaluation accuracy of a neural network. The samples are optimized according to the spatial distribution of per capita GDP and PGA based on the correlation characteristics. In areas with lower per capita GDP and higher PGA, more sample points are generated and distributed, according to the correlation between per capita GDP, PGA and the earthquake-hit population. Compared with the random samples, the optimized samples effectively improve the convergence speed and generalization capability of the trained network. In networks with different numbers of neurons, the number of iterations based on the optimized samples is smaller than that based on the random samples. The network trained on the optimized samples considering the spatial characteristics predicts more accurately.
3) A BP neural network is established using the influencing factors as input indicators, based on data from the Lushan earthquake. The trained network is applied to the Jiuzhaigou earthquake to test its generalization capability and prediction accuracy. The results show that the neural network has good prediction accuracy in the spatial evaluation of the study area. The total evaluated earthquake-hit population of the five counties affected by the Jiuzhaigou earthquake, based on the optimized samples, is 198362 people, while the actual figure is 216597 people. The BP neural network is able to construct complex nonlinear relations for evaluating the earthquake-hit population. The trained network can offer a spatial evaluation of the earthquake-hit population, as well as other earthquake disaster information, quickly after an earthquake occurs, providing a significant reference for emergency rescue.