Earthquake-hit Population Spatial Evaluation Based on Correlation Characteristics and BP Neural Network

Rapid spatial evaluation of the earthquake-hit population after an earthquake is required in disaster emergency rescue management, because it significantly supports the reduction of casualties and property losses. The correlation between the earthquake-hit population and its influencing factors is analyzed using data from the 2013 Ms7.0 Lushan earthquake. Ten influencing factors, namely elevation, slope angle, population density, per capita GDP, distance to fault, distance to river, NDVI, PGA, PGV and distance to epicenter, are classified into environmental factors and seismic factors. The correlation analysis reveals a nonlinear relationship between the earthquake-hit population and the various factors, with per capita GDP and PGA showing the strongest correlation with the earthquake-hit population. Moreover, the spatial variability of the influencing factors affects the distribution of the earthquake-hit population. The earthquake-hit population is evaluated using a BP neural network whose training samples are optimized based on the spatial characteristics of the per capita GDP and PGA factors. Instead of distributing sample points randomly, different numbers of sample points are generated in areas with different value intervals of the influencing factors. The minimum RMSE (Root Mean Square Error) on the testing set is 18 people/km², showing good accuracy in the spatial evaluation of the earthquake-hit population. Meanwhile, the optimized samples, which consider spatial characteristics, improve the convergence speed and generalization capability compared to random samples. The trained network was applied to the 2017 Ms7.0 Jiuzhaigou earthquake to verify the prediction accuracy. The mean absolute errors of the earthquake-hit population evaluation results for the different counties affected by the Jiuzhaigou earthquake were 18357 people and 26121 people for the optimized samples and the random samples, respectively. The evaluation results indicate that a BP neural network that considers the correlation characteristics of the factors is able to evaluate the earthquake-hit population in space, providing more detailed information for emergency services and rescue operations.

Reyes and Gouzeva, 2020; Luo et al., 2021). Because earthquakes are unpredictable, it is difficult to prepare before they occur, so countries are committed to improving their emergency rescue capability after an earthquake (Huang and Li, 2014). Modeling earthquake casualties is particularly important as a reference for emergency rescue and decision making. Casualty evaluation after an earthquake is fast becoming an important issue in economic, social, and environmental risk management (Huang and Huang, 2018).

The disaster caused by intense seismic motion is a complex result of various influencing factors. Seismic intensity, topography, population and economic level are all related, to a certain extent, to earthquake casualties. Traditional physical models and statistical regression models have difficulty reflecting the nonlinear relation between the earthquake-hit population and these factors (Erdik et al., 2011). With the continuous improvement of computing speed in recent years, machine learning methods have become more widely used. A growing number of scholars apply them to earthquake disaster mapping, since machine learning methods can learn from historical data to produce insight into extreme events (Yang et

population density, pre-warning level, in-building probability, location of occurrence, supply support and building collapse ratio were considered. It was concluded that RW v-SVM model had higher prediction

Wenchuan earthquake epicenter was about 85 km. A total of 196 people were killed, 21 were missing and 11470 were injured in the Lushan earthquake. The Lushan earthquake affected an area of 12500 km² and caused a direct economic loss of about 185.4 billion yuan. After the earthquake, Sichuan Province immediately started the first-level emergency procedures and sent the army to carry out emergency rescue work.
The total earthquake-hit population of the Lushan earthquake reached 3.7 million. The earthquake-hit population refers to the people who suffered property or life losses due to the earthquake. It not only reflects the severity of the natural disaster but also reveals the impact of the earthquake on people's lives. It also offers a reference for formulating emergency rescue plans, so the size of the earthquake-hit population has become an important index for evaluating earthquake damage. Figure 1 illustrates the earthquake-hit population density (number of earthquake-hit people per square kilometer) in each county-level administrative region within the earthquake-affected area. The data were collected and released on the Internet by the Sichuan provincial government after the earthquake (Wang and Li, 2014). In Figure 1, the color represents the earthquake-hit population density. The density in Yucheng District, Danling County and Mingshan County was relatively high, and it was highest in Mingshan County, reaching approximately 432 people/km². The region with the highest earthquake-hit population density was not the region where the epicenter was located. This indicates that the impact of earthquakes on the population is spatially complicated: the epicenter is not necessarily the most severely affected area. A similar phenomenon appeared in the 2008 Ms8.0 Wenchuan earthquake (Yang et al., 2014). The casualties caused by earthquakes are related to many categories of influencing factors. Here, the influencing factors of the earthquake-hit population are divided into environmental and seismic factors.

Environmental influencing factors
The environmental influencing factors refer to the environmental conditions in the study area; they have no direct relationship with earthquake occurrence. The environmental influencing factors considered in this research are elevation, slope angle, population density, per capita GDP, distance to fault, distance to river and the Normalized Difference Vegetation Index (NDVI). Details of the data are shown in Table 1. Among the environmental factors, elevation, slope angle, distance to fault, distance to river and NDVI are maps whose values vary with spatial location. Population density and per capita GDP, however, have a different spatial granularity from the other environmental factors: each county-level administrative region has a single attribute value for these two factors, because counties were used as the basic statistical unit.

Elevation
Elevation is considered to be the most important factor in the analysis of natural disaster susceptibility

Slope angle
The slope angle is a geomorphic parameter that has an important impact on seismic geological disasters such as landslides, debris flows and barrier lakes. Field investigations of historical strong earthquakes found that a large number of earthquake casualties were caused by geological disasters triggered by seismic motion (Xu et al., 2015). The slope angle map (Figure 2(b)) was derived from the digital elevation model using ArcMap software.

Population density
Population density is a key factor in natural disaster risk assessment. It is calculated here as the ratio of population to bare land area. Some strong earthquakes occurred in mountainous areas with low population density and therefore posed a relatively small threat to people's lives and property (Ara, 2014). Because the population is highly mobile, it is difficult to obtain its spatial distribution at the moment just before an earthquake. Therefore, the resident population of each county in the census was used to approximate the population distribution (Figure 2(c)). The population density is the ratio of population to area in each county. The population data were updated in 2011 and provided by the China Earthquake Network Center.

Per capita GDP
Per capita GDP is another crucial factor in earthquake-hit population evaluation. It reflects one aspect of the local economic level, which in turn affects the seismic resistance of engineering structures. Generally, the higher the economic level, the stronger the seismic resistance of the structures. As with the population density map, the per capita GDP of each county was used as the per capita GDP distribution map (Figure 2(d)). The per capita GDP data were updated in 2011 and provided by the China Earthquake Network Center.

Distance to fault
The distance to fault is another significant factor related to seismic geological disasters. Generally, fractured or weak zones are located near fault bedding planes and are susceptible to weathering and sliding (Conforti et al., 2014). The fault data were provided by the China Earthquake Network Center, and the distance to fault was calculated using buffers in ArcMap software (Figure 2(e)).

Distance to river
The distance to river can also influence the earthquake-hit population, as river erosion and soil saturation decrease the seismic stability of slopes (Yalcin, 2008). The river data were provided by the China Earthquake Network Center, and the distance to river was calculated using buffers in ArcMap software (Figure 2(f)).

NDVI
NDVI can play a useful role in evaluating the earthquake-hit population of the Lushan earthquake. NDVI quantifies vegetation by measuring the difference between near-infrared light (which vegetation strongly reflects) and red light (which vegetation absorbs). The closer the NDVI is to +1, the better the vegetation coverage and the lower the degree of urbanization. The NDVI map was obtained from Landsat 7 ETM+ satellite images acquired in 2012 from the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (Figure 2(g)).
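The band difference described above is conventionally normalized by the band sum, NDVI = (NIR - Red) / (NIR + Red). A minimal sketch of that standard formula follows; the reflectance values are hypothetical and are not taken from the Landsat scenes used in the study.

```python
import numpy as np

def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectances for three pixels: dense vegetation, bare soil, water.
nir_band = np.array([0.50, 0.30, 0.02])
red_band = np.array([0.05, 0.25, 0.04])
print(ndvi(nir_band, red_band))
```

Values near +1 correspond to dense vegetation, while values near zero or below typically indicate bare ground, built-up surfaces or water.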

Seismic influencing factors
The seismic influencing factors refer to the elements and characteristics that are directly related to earthquake occurrence and can be rapidly obtained from seismic motion monitoring instruments after an earthquake. The seismic influencing factors considered in this research are peak ground acceleration (PGA), peak ground velocity (PGV) and distance to epicenter. The data of the seismic influencing factors are shown in Table 2.

PGA
PGA is the most commonly used parameter to describe the seismic ground motion intensity of an earthquake (Boatwright et al., 2003; Yuan et al., 2013). It represents the peak value of the acceleration time-history waveform recorded on the ground surface during an earthquake. It can be regarded as the maximum instantaneous force exerted by the earthquake and can effectively characterize the intensity of seismic ground motion at different positions in space. The PGA data were recorded by the strong-motion seismograph network and can be provided shortly after an earthquake. The data used in this research were provided by the China Earthquake Network Center (Figure 3(a)).

PGV
PGV is also an important index of seismic ground motion intensity. The acceleration time-history waveform can miss some information, such as low-frequency components, which the velocity time-history waveform records better. Therefore, PGV data are included to evaluate the seismic ground motion intensity more comprehensively. The PGV data were provided by the China Earthquake Network Center (Figure 3(b)).
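To illustrate how the two peak values relate to a recorded time history, the sketch below takes PGA as the largest absolute acceleration and obtains PGV by integrating the acceleration with the trapezoidal rule. The function name and the synthetic sine record are assumptions for illustration. Real strong-motion processing also involves baseline correction and filtering, which are omitted here, and the study itself used PGA and PGV maps supplied by the China Earthquake Network Center.

```python
import numpy as np

def peak_ground_motion(acc, dt):
    """Return (PGA, PGV) for an acceleration record sampled every dt seconds."""
    acc = np.asarray(acc, dtype=float)
    pga = np.max(np.abs(acc))
    # Trapezoidal integration of acceleration gives velocity
    # (zero initial velocity is assumed).
    steps = 0.5 * (acc[1:] + acc[:-1]) * dt
    vel = np.concatenate(([0.0], np.cumsum(steps)))
    pgv = np.max(np.abs(vel))
    return pga, pgv

# Synthetic record: a 1 Hz sine wave with 2 m/s^2 amplitude, 100 samples per second.
dt = 0.01
t = np.arange(0.0, 2.0 + dt / 2, dt)
acc = 2.0 * np.sin(2.0 * np.pi * t)
pga, pgv = peak_ground_motion(acc, dt)
print(pga, pgv)
```

For this synthetic record the analytical peak velocity is 2/π m/s, which the numerical integration approximates closely.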

Distance to epicenter
Distance to epicenter measures the relative distance between a location in the study region and the epicenter. Previous research has shown that the impact of an earthquake disaster gradually decreases as the distance to epicenter increases. The distance to epicenter was calculated using buffers in ArcMap software (Figure 3(c)).
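Outside a GIS package, a comparable distance layer can be approximated with the great-circle (haversine) formula. The sketch below is an alternative to the ArcMap buffer workflow, not a reproduction of it. The epicenter coordinates and grid points are illustrative values that are not taken from the source.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Illustrative epicenter and three grid points (assumed values).
epi_lat, epi_lon = 30.3, 103.0
grid_lat = np.array([30.3, 30.5, 31.0])
grid_lon = np.array([103.0, 103.2, 104.0])
print(haversine_km(epi_lat, epi_lon, grid_lat, grid_lon))
```

Because the function accepts arrays, it can be evaluated over every raster cell centre in a single call.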

Spatial correlation characteristics of influencing factors
The earthquake-hit population is related to environmental and seismic influencing factors. The spatial variability of the influencing factors leads to differences in the earthquake-hit population between counties. Spearman rank correlation coefficients were calculated to analyze the relationship between the earthquake-hit population and the influencing factors, which can be expressed as follows in Eq.

closer the absolute value of the coefficient is to 1, the stronger the correlation between the data.

Within the study region shown in Figure 1, 1000 sampling points were randomly generated and distributed using ArcMap software. The earthquake-hit population density data and influencing factor data at the sampling points were extracted to construct the database for correlation analysis.
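The paper's own equation is not reproduced in this text, so the following is only a sketch of one standard formulation: the Spearman coefficient is the Pearson correlation of the ranks of the two variables, which for untied data equals 1 - 6Σd²/(n(n² - 1)), where d is the rank difference of each pair. The helper names and sample values are hypothetical, and the significance test is not included.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks inside each group of equal values.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def spearman_rho(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical values at five sampling points.
pga = [120.0, 300.0, 450.0, 610.0, 800.0]
density = [15.0, 40.0, 90.0, 210.0, 430.0]   # increases monotonically but not linearly
print(spearman_rho(pga, density))
```

A coefficient close to +1 or -1 indicates a strongly monotonic relationship, even when that relationship is not a straight line.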

Correlation analysis between earthquake-hit population and environmental influencing factors
The Spearman correlation coefficients and significance test results between the earthquake-hit population and the environmental influencing factors are listed in Table 3. The significance test results were all less than 0.05, showing that the number of samples was reasonable and the correlation coefficients were valid. The coefficient with the largest magnitude was -0.322, between earthquake-hit population density and per capita GDP, showing that in regions with high per capita GDP the earthquake-hit population was generally low. The coefficient with the smallest magnitude was -0.073. The correlation coefficients revealed that none of the environmental influencing factors had a direct linear relationship with earthquake-hit population density. The earthquake-hit population density distribution is the result of multiple environmental factors.

Moreover, the spatial correlation between earthquake-hit population density and the environmental factors is illustrated in Figure 4. The histograms show the average earthquake-hit population density within different factor intervals, reflecting how the spatial distribution of earthquake-hit population density relates to each influencing factor. It can be seen from Figure 4 that, for the various factors, the average earthquake-hit population density varied irregularly across intervals. However, relatively high earthquake-hit population density was concentrated in specific ranges of certain factors. For example, in the 2013 Lushan earthquake, the area between 500 m and 1200 m in elevation had relatively high earthquake-hit population density (Figure 4(a)). As for distance to river, the maximum earthquake-hit population density was observed in the interval closest to the river (Figure 4(f)).
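The interval statistics behind such histograms can be sketched as a binned mean: each sampling point is assigned to a factor interval, and the earthquake-hit population density is averaged per interval. The function name, interval edges and sample values below are hypothetical and do not reproduce the data in Figure 4.

```python
import numpy as np

def mean_by_interval(factor, density, edges):
    """Average density inside each half-open interval [edges[k], edges[k+1])."""
    factor = np.asarray(factor, dtype=float)
    density = np.asarray(density, dtype=float)
    # np.digitize returns i such that edges[i-1] <= value < edges[i].
    idx = np.digitize(factor, edges) - 1
    means = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        mask = idx == k
        if mask.any():
            means[k] = density[mask].mean()
    return means

# Hypothetical elevations (m) and densities (people/km^2) at four sampling points.
elevation = [600.0, 800.0, 1500.0, 2500.0]
density = [100.0, 200.0, 50.0, 10.0]
edges = [500.0, 1200.0, 2000.0, 3000.0]
print(mean_by_interval(elevation, density, edges))
```

Intervals without any sampling point are returned as NaN, so they can be left blank in a bar chart. Points that fall outside the edges, including a value exactly equal to the last edge, are ignored.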

Correlation analysis between earthquake-hit population and seismic influencing factors
The Spearman correlation coefficients and significance test results between the earthquake-hit population and the seismic influencing factors are listed in Table 4. The significance test results were all less than 0.05. The correlation coefficient between earthquake-hit population density and PGA had the highest value, 0.433. This showed that the earthquake-hit population density had a marked positive correlation with PGA: the higher the PGA, the greater the earthquake-hit population.

The spatial correlation between earthquake-hit population density and the seismic influencing factors is illustrated in Figure 5. PGA had a stronger positive correlation with the earthquake-hit population density, which generally increased from interval to interval with the value of PGA. However, when the PGA value ranged from 750 m/s² to 850 m/s², the earthquake-hit population density was relatively low (Figure 5(a)). The reason was that the area with great seismic motion intensity had a low population density, so the density of the population affected by the earthquake was also relatively low.

The correlation coefficient is a statistical index that reflects the closeness of the linear correlation between variables. The results of the correlation analysis indicated a nonlinear relationship between the earthquake-hit population and the various influencing factors in spatial distribution. The correlation coefficients imply the existence of spatial correlation between the factors and the disaster results. For example, in areas with greater seismic motion, the earthquake-hit population density was higher. Nevertheless, the numeric values of the correlation coefficients indicate that the earthquake-hit population is the result of a complex interaction of multiple factors. It is difficult to evaluate the spatial earthquake-hit population with a linear relation.

Sample optimization selection based on correlation characteristics
Generalization capability is an important indicator of the accuracy of a neural network in predicting data outside the sample. To effectively evaluate the spatial distribution of the earthquake-hit population in a newly occurred earthquake, the network must have good generalization capability. Training samples have a great impact on generalization capability (Partridge, 1996), so the selection of samples is optimized based on the results of the correlation analysis.

In the process of network training, the vector images were transformed into raster layers to unify the format of the influencing factor data and the earthquake-hit population data. The number of raster-based samples was far greater than the number of influencing indicators, which could result in overfitting of the neural network. In practical applications, a subset of all samples would be randomly selected as the training set. However, random selection might lose part of the spatial characteristics of the full sample data in the training process, thus reducing the generalization capability and evaluation accuracy of the network. The correlation analysis showed that per capita GDP and PGA were more strongly correlated with the earthquake-hit population density than the other factors. This implied that areas with lower per capita GDP and higher PGA had a larger earthquake-hit population and were the key study areas for earthquake casualty evaluation. The per capita GDP and PGA indicators indirectly reflected, to some extent, the spatial distribution characteristics of the earthquake-hit population. Therefore, to account for the spatial variability of the earthquake-hit population, more samples were selected in areas with lower per capita GDP and higher PGA, and fewer samples were selected in areas with higher per capita GDP and lower PGA.

Figure 7 illustrates the frequency histograms of the per capita GDP and PGA factors. The raster frequency was approximately normally distributed over per capita GDP, and decreased as PGA increased. This indicated that the area with a poor economy and intense seismic motion was much smaller than the area with a strong economy and weak motion. Although small, it was the predominant area for disaster assessment and emergency rescue after the earthquake. The per capita GDP and PGA data were each classified into five clusters by value using the Natural Breaks Classification method, an extensively applied clustering method that maximizes the internal similarity of each cluster and the difference between clusters. The proportion of samples was determined on the basis of the average value of each cluster, as listed in Table 5: the proportion of samples was proportional to the cluster average. A total of 1000 sample points were generated in the study area to extract attribute values from the raster layers. The distributions of the random sample points and the optimized sample points are compared in Figure 8. According to the clustering results, the sample points were denser in areas with low per capita GDP and high PGA. The samples were thus optimized based on the numerical and spatial characteristics of the per capita GDP and PGA indicators.
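The allocation idea can be sketched for a single factor such as PGA: cells are grouped into classes, each class receives a share of the sample budget proportional to its mean value, and cells are then drawn at random within each class. This is an illustrative sketch, not the authors' implementation. The class boundaries below are hypothetical stand-ins for Natural Breaks results, the PGA values are synthetic, and the function name is an assumption. For per capita GDP the text calls for more samples at low values; how the weights were inverted is not stated, so that case is not shown.

```python
import numpy as np

def stratified_indices(values, breaks, n_total, rng):
    """Pick n_total cell indices; each class's share is proportional to its mean value."""
    values = np.asarray(values, dtype=float)
    classes = np.digitize(values, breaks)        # class ids 0 .. len(breaks)
    n_classes = len(breaks) + 1
    # Assumes every class holds at least one cell and all values are positive.
    means = np.array([values[classes == k].mean() for k in range(n_classes)])
    share = means / means.sum()
    counts = np.floor(share * n_total).astype(int)
    # Give the rounding remainder to the class with the largest share.
    counts[np.argmax(share)] += n_total - counts.sum()
    picked = [rng.choice(np.flatnonzero(classes == k), size=counts[k], replace=False)
              for k in range(n_classes)]
    return np.concatenate(picked), counts

rng = np.random.default_rng(0)
pga = rng.uniform(0.0, 1000.0, 20000)            # synthetic stand-in for PGA raster cells
breaks = [200.0, 400.0, 600.0, 800.0]            # hypothetical boundaries, five classes
idx, counts = stratified_indices(pga, breaks, 1000, rng)
print(counts)
```

With these synthetic values the classes with higher PGA receive more sample points, which mirrors the denser sampling in high-PGA areas described above.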

Earthquake-hit population evaluation results
Two networks were trained, one on the random samples and one on the optimized samples. One hidden layer is sufficient for most applications (Aghamohammadi et al., 2013). Therefore, a three-layer network containing one input layer, one hidden layer and one output layer was adopted. To ensure a sound evaluation, 70% of the sample data were used as the training set and 30% as the testing set. The testing set was used to test the ability of the trained network. The normalized goal error of the training set was 0.0003: the training process stopped when the training-set error during iteration fell below 0.0003.

comparing the RMSE of the networks, it can be noted that the RMSE of the optimized samples was smaller than that of the random samples, except when the number of neurons was 13. When the number of neurons in the hidden layer was 13, the RMSEs of the two networks were close to each other, as illustrated in Figure 9. The RMSE measures the difference between the estimates and the testing set, and reflects the generalization capability and evaluation accuracy of the network for new data. This implied that the earthquake-hit population evaluation based on the optimized samples not only converged faster but also had better generalization capability and prediction accuracy than that based on random samples.

County, Songpan County and Zoige County. The Jiuzhaigou earthquake and the Lushan earthquake had the same magnitude, and their affected areas had similar geographical conditions. Therefore, the Jiuzhaigou earthquake was selected to verify the generalization capability and effectiveness of the earthquake-hit population evaluation. The earthquake-hit population density in each county-level administrative region, as collected by the Sichuan provincial government, is illustrated in Figure 10.
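The training setup described earlier in this section (three layers, a 70%/30% split, and a goal error of 0.0003 as the stopping criterion) can be sketched with a small backpropagation network in NumPy. Everything beyond those three settings is an assumption made for illustration: the tanh activation, the learning rate, the eight hidden neurons, the plain gradient-descent update and the synthetic data are not taken from the paper.

```python
import numpy as np

def train_bp(X, y, n_hidden=8, lr=0.1, goal=3e-4, max_epochs=5000, seed=0):
    """Train a one-hidden-layer network by backpropagation; return weights and loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(max_epochs):
        h = np.tanh(X @ W1 + b1)            # hidden layer
        err = (h @ W2 + b2) - y             # linear output layer minus target
        mse = float(np.mean(err ** 2))
        losses.append(mse)
        if mse < goal:                      # stop once the goal error is reached
            break
        # Backward pass: gradients of the MSE with respect to each parameter.
        g_out = 2.0 * err / len(X)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic stand-in for normalized factor values and target densities.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, (300, 3))
y = (0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 0] * X[:, 2]).reshape(-1, 1)

# 70% of the samples for training, 30% for testing.
perm = rng.permutation(len(X))
n_train = (7 * len(X)) // 10
train, test = perm[:n_train], perm[n_train:]

params, losses = train_bp(X[train], y[train])
rmse_test = float(np.sqrt(np.mean((predict(params, X[test]) - y[test]) ** 2)))
print(len(losses), rmse_test)
```

Comparing `rmse_test` for networks trained on differently selected samples is the kind of comparison the RMSE discussion above refers to.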
The networks trained on the optimized samples and on the random samples of the Lushan earthquake were used separately to evaluate the earthquake-hit population of the Jiuzhaigou earthquake. The output of each network was a raster layer of earthquake-hit population density varying with spatial position. The average of the raster values within a county was taken as the earthquake-hit population density of that county. The earthquake-hit population is the product of earthquake-hit population density and county area.

The actual data and evaluation results of the earthquake-hit population are listed in Table 6. There were deviations between the estimated and actual values in each area. For the total earthquake-hit population, the evaluation result of the optimized samples was much closer to the actual data than that of

The error rate was defined as the ratio of the difference between the actual data and the evaluation result to the actual data. For Jiuzhaigou County, Pingwu County and Songpan County, where the earthquake-hit population was relatively large, the error rate of the BP neural network evaluation results was less than 100%. For Hongyuan County and Zoige County, the error rate was relatively large: the maximum error rate reached 1049.84% for the random samples and 490.14% for the optimized samples. In each area, the error rate of the optimized samples was smaller than that of the random samples. The evaluation based on the optimized samples therefore predicted new data more accurately, showing that optimized samples can effectively offer a more accurate evaluation of the earthquake-hit population.
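The county aggregation and the error measures can be sketched as follows. The raster, mask and area are hypothetical, and taking the absolute value in the error rate is an assumption based on the rates being reported as positive percentages. The two totals in the usage example are the five-county figures reported in the Conclusions.

```python
import numpy as np

def county_population(density_raster, county_mask, county_area_km2):
    """Mean predicted density inside the county multiplied by the county area."""
    return float(np.mean(density_raster[county_mask]) * county_area_km2)

def error_rate(actual, estimate):
    """|actual - estimate| / actual, returned as a fraction."""
    return abs(actual - estimate) / actual

def mean_absolute_error(actual, estimate):
    diff = np.asarray(actual, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.mean(np.abs(diff)))

# Hypothetical 2 x 2 density raster (people/km^2); the top row belongs to the county.
raster = np.array([[10.0, 20.0], [30.0, 40.0]])
mask = np.array([[True, True], [False, False]])
print(county_population(raster, mask, 100.0))   # mean of 10 and 20, times 100 km^2

# Five-county totals: actual 216597 people, optimized-sample estimate 198362 people.
print(error_rate(216597, 198362))
```

On these totals the error rate is roughly 8.4%, far smaller than the county-level rates, because over- and underestimates in individual counties partly cancel in the sum.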

Conclusions
In the present study, a spatial earthquake-hit population distribution is evaluated based on correlation

is less than that of random samples. The network trained by the optimized samples considering the spatial characteristics has more accurate prediction ability.

3) A BP neural network is established using the influencing factors as input indicators, based on the data from the Lushan earthquake. The trained network is applied to the Jiuzhaigou earthquake to test its generalization capability and prediction accuracy. The results show that the neural network has good prediction accuracy in the spatial evaluation of the study area. The total evaluated earthquake-hit population of the five counties affected by the Jiuzhaigou earthquake based on the optimized samples is 198362 people, while the actual figure is 216597 people. The BP neural network is able to construct complex nonlinear relations to evaluate the earthquake-hit population. The trained network can offer a spatial evaluation of the earthquake-hit population quickly after an earthquake, providing a significant reference for emergency rescue.