Analysis of public emotion on flood disasters in southern China in 2020 based on social media data

The exploding popularity of social networks provides a new opportunity to study disasters and public emotion. Weibo is one of the largest microblogging services in China. Taking Guangdong and Guangxi in southern China as a case study, Web Scraper was used to obtain flood-related Weibo texts from 2020. The spatial distribution of floods was analyzed using Kernel Density Estimation, public emotion was analyzed using Natural Language Processing tools, and the association between floods and public emotion was explored through correlation analysis. The results indicated that: (1) Weibo texts could be used as effective data to identify urban waterlogging risk in Guangdong and Guangxi. (2) Waterlogging was mainly distributed in the southern parts of Guangdong and Guangxi, especially in the provincial capitals and coastal cities. (3) Public emotion was predominantly negative, especially during periods of heavy precipitation. (4) Public emotion and floods were strongly correlated in their spatial and temporal variation, and the degree of negative public emotion was significantly influenced by the number of waterlogging points. These results provide preliminary data for future emergency management planning and design.

RS technologies are essential for non-engineering flood management because they enable effective flood monitoring and risk assessment. While GIS and RS technologies offer powerful spatial data management and analysis functions, acquiring and interpreting remote sensing images requires significant human and material resources (Li et al. 2014). Hydrodynamic models are currently the dominant approach for simulating stormwater flooding, because they have a clear physical basis and can accurately represent complex hydrodynamic processes. However, they have strict data requirements and must be calibrated with data such as flow velocity and water level. Their computational efficiency is also limited, and fast, high-precision simulation at the urban scale of hundreds of square kilometers remains a major challenge (Huang et al. 2021). In summary, GIS and RS technologies and hydrodynamic models each have advantages in flood research, but both are constrained by many factors.
The rise of social media has profoundly altered the way researchers acquire information about disaster events (Fang et al. 2019). Compared with traditional information sources, social media offers several advantages for information extraction and dissemination: content can be searched and shared, it is updated in real time, it is widely distributed, and users publish it themselves (Crooks et al. 2013; Fang et al. 2019). Notably, social media also enables public emotion analysis (Anwar et al. 2015; Wang 2020), since the public is likely to express opinions and emotions about disasters on social media platforms. When a natural disaster strikes suddenly, emergency managers can use emotion analysis to understand the status, trends, and abnormal changes of public emotion (Bai and Yu 2016). In flood-related studies, social media is better suited than GIS technology and hydrological models for acquiring data at large scales and in real time (Huang et al. 2021). In recent years, many scholars have used social media as a data source in urban flooding studies and have validated the reliability of the disaster information extracted from it (Fang et al. 2019; Shoyama et al. 2021). Some studies have further explored public emotion during floods and characterized it (Wu et al. 2018). These studies confirm the feasibility of using social media for flood and public emotion research, but few have explored the relationship between urban flooding and public emotion.
With the wide use of Twitter in disaster management abroad, Sina Weibo, often described as the Chinese counterpart of Twitter, has attracted considerable attention in disaster management (Deng et al. 2016). Weibo has a large user base, with 511 million monthly active users in 2020, nearly 80% of whom were born in the 1990s and 2000s (Weibo report publishing platform). Every Weibo user is both a publisher and an audience of Weibo information. Studies have shown that disaster information extracted from Sina Weibo can reflect the public's disaster risk perception well and has the potential to serve as a data source for disaster management in China (Wu et al. 2021; Xiao 2019). Other studies have shown that Weibo data can support public emotion analysis in disaster events (Chen and Song 2020; Han and Wang 2019; Wu et al. 2018). Weibo is therefore a suitable source of basic data for studying urban waterlogging events.
Weibo data have rarely been used in flooding studies of the Guangdong and Guangxi regions, and relevant studies have seldom considered the correlation between disaster situations and public emotion. This study therefore uses Weibo texts to explore the disaster characteristics and public emotion characteristics of flooding in Guangdong and Guangxi from June 1 to September 30, 2020, and the correlation between the two. It is hoped that this paper can provide scientific support for identifying flood risks, improving the emergency management system, and guiding and managing public emotion in Guangdong and Guangxi.
Disasters in Guangdong and Guangxi are characterized by many types, high frequency, suddenness, wide impact, and heavy losses. According to historical disaster statistics, the main disaster types in the two regions are floods, typhoons, geological disasters, and marine disasters, of which floods have the most significant impact on human society (Ministry of Emergency Management of the People's Republic of China). For example, in June 2020, several rounds of heavy rainfall in southern China triggered severe flooding and secondary geological hazards, and some rivers in Guangdong and Guangxi even experienced record flooding. In total, 7.14 million people were affected in Guangdong, Guangxi, and other areas, with direct economic losses of 21.06 billion RMB.

Data collection
The remote sensing images, DEM data, and administrative boundaries of the Guangdong and Guangxi regions in 2020 were obtained through the open-source map software LocaSpace Viewer. Weibo texts were chosen as the data source. Using the open interface of the Weibo platform, search phrases were formed from keywords such as "flood," "waterlogging," and "trapped," and the original Weibo posts about urban waterlogging disasters in Guangdong and Guangxi from June 1, 2020, to September 30, 2020, were obtained. Duplicate data, pictures, videos, and other non-text data were then removed through manual screening, leaving 6125 valid texts. The numbers of Weibo texts by date and by city are shown in Figs. 2 and 3.

Methods
The framework and approach of this study are shown schematically in Fig. 4. It consists of three main parts: Weibo database construction, analysis of the disaster situation and public emotion, and correlation analysis between the disaster situation and public emotion.

Weibo database construction
First, Web Scraper was used to obtain flood-related Weibo texts posted between June 1, 2020, and September 30, 2020. Web Scraper is a free crawler tool built on Google Chrome that requires little programming expertise from the user (Sspai Web 2020a, b). Duplicate data, images, and videos were then removed through manual screening, yielding a database of flood-themed Weibo texts.
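The manual cleaning step above can be partly approximated in code. The sketch below is illustrative only: it assumes the scraped posts are already available as a list of strings (for example, read from a Web Scraper CSV export), uses a hypothetical Chinese keyword list mirroring the search terms, and removes only exact duplicates after whitespace normalization. Filtering of images and videos and any near-duplicate detection are not shown.

```python
import re

# Hypothetical keyword list mirroring the search terms
# "flood", "waterlogging" and "trapped" (洪水, 内涝, 被困).
KEYWORDS = ("洪水", "内涝", "被困")


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip()


def clean_posts(posts: list[str], keywords=KEYWORDS) -> list[str]:
    """Keep posts that mention at least one keyword, dropping exact duplicates.

    The order of first appearance is preserved.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for post in posts:
        text = normalize(post)
        if not text or text in seen:
            continue  # empty post or an exact duplicate of a kept post
        if any(k in text for k in keywords):
            seen.add(text)
            kept.append(text)
    return kept
```

Dropping only exact duplicates is deliberately conservative: reposts with added commentary survive and would still need the manual review described above.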

Analysis of disaster situation
To extract useful information from the Weibo texts, the collected texts were segmented into words using "jieba" (Chen and Song 2020). "jieba" (Chinese for "to stutter") is a Python module for Chinese word segmentation whose functions include text segmentation, keyword extraction, part-of-speech tagging, and querying word positions (GitHub Web 2020a, b). The segmentation results then serve as the basis for analyzing the disaster situation and public emotion.
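Keyword frequencies such as those reported in the results can be obtained by counting segmented tokens. A minimal sketch, assuming the texts are already cleaned: the `tokenize` parameter defaults to whitespace splitting so the code runs without third-party packages, and for Chinese text one would pass `jieba.lcut`, jieba's function that returns the segmentation of a string as a list. The stopword set is a hypothetical addition.

```python
from collections import Counter
from typing import Callable, Iterable


def keyword_frequencies(
    texts: Iterable[str],
    tokenize: Callable[[str], list[str]] = str.split,
    stopwords: frozenset[str] = frozenset(),
    top_n: int = 10,
) -> list[tuple[str, int]]:
    """Count token occurrences across all texts and return the top_n most common.

    `tokenize` defaults to whitespace splitting; pass jieba.lcut for raw Chinese text.
    """
    counts = Counter()
    for text in texts:
        for token in tokenize(text):
            token = token.strip()
            if token and token not in stopwords:  # skip blanks and stopwords
                counts[token] += 1
    return counts.most_common(top_n)
```

For example, `keyword_frequencies(texts, tokenize=jieba.lcut, top_n=9)` would return the nine most frequent terms with their counts, the kind of list used to draw a word cloud.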
In the disaster analysis, Baidu Maps was used to obtain the coordinates of the waterlogging locations mentioned in Weibo texts. These waterlogging points were plotted in ArcGIS, and their spatial distribution was explored using Kernel Density Analysis (Allen et al. 2021). Kernel Density Analysis calculates the density per unit area of the measured values of point and line features within a specified neighborhood, turning discrete measurements into a continuous surface. Around each feature, the result is a smooth surface that peaks at the feature and decreases toward the edge of the neighborhood; the raster value is the unit density. The kernel function is expressed in terms of the lookup radius r and scale, the ratio of the distance from a grid-cell center to the point or line object to the lookup radius. For a point object, the volume enclosed between its kernel density surface and the plane below approximates the measurement at that point. For a line object, this volume approximates the product of the line's measurement and its length (IDesktop Web 2022).
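The text defines the kernel only through r and scale. One common kernel consistent with the properties described (a surface that peaks at each feature, falls to zero at the lookup radius r, and encloses a volume equal to the feature's measurement w) is the quartic kernel; whether the software used here applies exactly this form is an assumption:

$$ f(x) = \sum_{i} \frac{3\,w_i}{\pi r^{2}} \left(1 - \mathrm{scale}_i^{2}\right)^{2}, \qquad \mathrm{scale}_i = \frac{d_i}{r} < 1 $$

where d_i is the distance from grid cell x to point i, and points with scale_i ≥ 1 contribute nothing. The Python sketch below evaluates this for point features on a regular grid in a projected coordinate system. It is a brute-force illustration, not the ArcGIS or SuperMap implementation.

```python
import math


def quartic_kernel_density(points, grid_x, grid_y, radius, weights=None):
    """Quartic kernel density surface for point features on a regular grid.

    points  : list of (x, y) coordinates in projected units (e.g. metres)
    grid_x  : x coordinates of grid-cell centres
    grid_y  : y coordinates of grid-cell centres
    radius  : lookup (search) radius r
    weights : optional measurement per point (assumed 1 per point if omitted)
    Returns density[j][i] for the cell centred at (grid_x[i], grid_y[j]).
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    if weights is None:
        weights = [1.0] * len(points)
    norm = 3.0 / (math.pi * radius ** 2)  # makes each kernel enclose volume w
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for (px, py), w in zip(points, weights):
                scale = math.hypot(gx - px, gy - py) / radius
                if scale < 1.0:  # points beyond the radius contribute nothing
                    total += w * norm * (1.0 - scale ** 2) ** 2
            row.append(total)
        density.append(row)
    return density
```

Summing the returned surface times the cell area over a fine grid that covers the radius recovers approximately the total measurement, which is a quick way to check the normalization.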

Analysis of public emotion
Baidu Natural Language Processing was used to analyze public emotion (Zhang and Gan 2020). Its sentiment analysis tool, built on deep learning technology and Baidu's big data, analyzes the emotional tendency of Chinese text. It automatically assigns each text a sentiment polarity category (positive, negative, or neutral) together with a confidence level. The polarity category of each Weibo text was determined, and positive, negative, and neutral were then coded as 1, −1, and 0, respectively, to quantify the public emotional score. Based on the text segmentation results, the TF-IDF formula was used for the semantic-based classification of Weibo texts (Sarirete 2022). TF-IDF is a statistical method for assessing how important a word is to one document in a document set or corpus. The importance of a word increases with the number of times it appears in a document but decreases with how frequently it appears across the corpus.
The term frequency component is

$$ \mathrm{tf}_{i,j} = \frac{f_{i,j}}{\sum_{k} f_{k,j}} $$

where f_{i,j} is the number of times term t_i appears in Weibo text j: the numerator is the count of that term in the text, and the denominator is the total count of all terms in the text (Natural Language Processing Column 2022).
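The term frequency above is one factor of TF-IDF; the inverse document frequency is not spelled out in the text. The sketch below assumes the common unsmoothed form idf_i = log(N / n_i), where N is the number of Weibo texts and n_i the number of texts containing term t_i, and takes pre-segmented token lists (for example, jieba output) as input.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weights for pre-segmented documents.

    tf  = f_ij / sum_k f_kj  (term count over total term count in document j)
    idf = log(N / n_i)       (assumed unsmoothed form; N documents, n_i contain term i)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this form, a term that occurs in every text receives a weight of zero. Libraries often use smoothed variants instead; scikit-learn's `TfidfVectorizer`, for instance, defaults to idf = ln((1 + N)/(1 + n_i)) + 1 and L2-normalizes each document vector, so the exact weights depend on the variant chosen.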
Drawing on existing studies (Wu et al. 2018) and the contents of the collected microblog texts, the weighted feature terms were divided into three categories: disaster descriptions, pre-disaster warnings and news reports, and emotional expressions and related thoughts.

Correlation analysis between disaster situation and public emotion
Tests showed that the number of waterlogging points and the negative emotion scores of the cities were normally distributed. Pearson correlation analysis and partial correlation analysis were therefore used to analyze the correlation between waterlogging and negative public emotion (Ates and Guran 2021). Pearson correlation analysis, proposed by the British statistician Karl Pearson, is widely used to measure the degree of linear correlation between two variables; its result is expressed as the correlation coefficient r.
The Pearson coefficient is computed as

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} $$

where n is the sample size and X and Y are the observed values of the two variables. If r > 0, the two variables are positively correlated; if r < 0, they are negatively correlated. The larger the absolute value of r, the stronger the correlation.
Partial correlation analysis measures the correlation between two variables while removing the effect of a third variable with which both are correlated. For variables X and Y with Z as the control variable, the partial correlation coefficient r_{xy(z)} is

$$ r_{xy(z)} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} $$

where r_{xy}, r_{xz}, and r_{yz} are the correlation coefficients between x and y, x and z, and y and z, respectively. The larger the absolute value of r_{xy(z)}, the stronger the correlation between x and y after controlling for z (Mengte Web 2022).
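Both coefficients can be computed directly from the Pearson and partial correlation formulas. A minimal pure-Python sketch; the function names are illustrative, and in practice a statistics package (for example `scipy.stats.pearsonr`, which also returns a p-value) would normally be used to obtain the Sig. values as well.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient using the computational (sum-based) formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if den == 0:
        raise ValueError("correlation is undefined for a constant series")
    return num / den


def partial_r(x: list[float], y: list[float], z: list[float]) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    # Undefined (division by zero) if z is perfectly correlated with x or y.
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

`partial_r` needs only the three zero-order coefficients, which is why the partial correlation can be reported alongside the Pearson results without fitting any additional model.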

Natural breaks classification (Jenks)
Any series of values contains natural inflection points and breakpoints that are statistically meaningful. These breakpoints divide the research objects into groups with similar characteristics, so they make good class boundaries (GIS Column 2016). The Jenks method chooses the boundaries so that variation within each class is minimized and differences between classes are maximized. Natural breaks were used in this study to classify the kernel density of waterlogging points and the emotional scores.
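Jenks natural breaks can be computed exactly by dynamic programming: sort the values, then choose the class boundaries that minimize the total within-class sum of squared deviations from the class means. The sketch below is an O(k·n²) illustration suitable for a few dozen cities; GIS packages may use other implementations, so boundaries on large rasters could differ slightly.

```python
def jenks_breaks(values: list[float], n_classes: int) -> list[float]:
    """Jenks natural breaks via dynamic programming.

    Splits the sorted values into n_classes contiguous groups that minimise the
    total within-class sum of squared deviations. Returns each class's upper bound.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")
    # Prefix sums give the squared-deviation cost of any slice in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations of data[i:j] (j exclusive, j > i)."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i
    # Walk back through the stored split points to recover the class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]
```

For example, `jenks_breaks(scores, 4)` would return four upper class bounds for a list of city scores; each bound is the largest member of its class.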

Distribution characteristics of waterlogging
By screening and counting the geographic location descriptions in Weibo texts, a total of 276 waterlogging points in Guangdong and Guangxi were identified: 110 in Guangxi (39.86% of the total) and 166 in Guangdong (60.14%). The kernel density of the waterlogging points extracted from Weibo is shown in Fig. 5. Flooding was more prevalent in the southern coastal cities of Guangdong and Guangxi than in the northern cities, and more prevalent in the southeast than in the central and northwestern regions. This result was highly consistent with media reports. In Guangdong Province, floods were mainly distributed in Canton, Foshan, Huizhou, Dongguan, Zhongshan, and Zhuhai; in the Guangxi Zhuang Autonomous Region, they were mainly distributed in Nanning, Fangchenggang, Beihai, Qinzhou, and Guilin. Waterlogging in Guangdong and Guangxi often occurred simultaneously at multiple locations or repeatedly at the same location. The text data and remote sensing images together show that waterlogging hotspots were mostly located at urban road intersections, in areas with high urbanization rates and commonly aging drainage networks. Stagnant water from heavy rainfall carried branches, leaves, garbage, or hail, which could easily clog drainage facilities and worsen the disaster.

Public response features
According to Fig. 2, the number of Weibo texts related to waterlogging rose and fell during the continuous rainstorm events, with the most pronounced peaks on June 8 and August 12.
Historical weather records (Table 1) show that both peak periods were mostly characterized by heavy rainfall in Guangdong and Guangxi. Judging from the number of microblogs, public discussion of waterlogging on social media took place mainly from early June to the end of August. During this period, frequent extreme precipitation in Guangdong and Guangxi triggered repeated urban waterlogging, and the resulting traffic standstills, school closures, and trapped people provoked a strong public response. After the main flood season, the extent of heavy rainfall in September was significantly reduced, and the number of microblogs about urban waterlogging declined. The number of flood-related Weibo texts therefore appears to have been driven by precipitation.
Text segmentation with "jieba" yielded the keywords, weights, and frequencies of the Weibo texts during the study period, including waterlogging (7544 times), rainstorm (3161 times), road (2009 times), severe (1083 times), vehicles (886 times), trapped (852 times), affected (845 times), flooding (578 times), and heavy rainfall (565 times). The word cloud of these keywords is shown in Fig. 6. Based on the normalized relative and absolute indicators, text clustering was performed with the TF-IDF formula to derive the types of public response and their relative shares. Most texts described disaster locations and conditions (74.59%), followed by pre-disaster warnings and news reports during the disaster (19.73%); the smallest share consisted of thoughts on urban planning and the disaster situation (5.68%).
Relevant departments should focus on identifying and guiding public response while strengthening disaster early warning, to curb the growth and spread of negative emotions.

Spatial and temporal characteristics of public emotion
The polarity and score of each Weibo text were determined with Baidu Natural Language Processing. The emotional scores were summed by date, as shown in Fig. 7, and the numbers of positive and negative tweets in each prefecture-level city are shown in Fig. 8. The public's emotional response to urban flooding was mainly negative. Emotional scores reached their extremes during periods of concentrated rainfall, as did the number of Weibo texts, suggesting a possible link between changes in public emotion and weather conditions. The emotional scores of negative tweets within each prefecture-level city were summed to obtain that city's negative emotion score, as shown in Fig. 9. The spatial distributions of the negative emotion scores and the waterlogging points were mapped in ArcGIS, as shown in Fig. 10.
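The score coding and aggregation described above can be sketched as follows. The record layout, a (date, city, polarity) tuple with polarity given as a string, is a hypothetical simplification; the actual Baidu NLP response format is not reproduced here. Cities with no negative texts keep a negative emotion score of 0.

```python
from collections import defaultdict

# Polarity-to-score coding: positive = 1, neutral = 0, negative = -1.
SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def aggregate_scores(posts):
    """Sum emotion scores per day and negative scores per city.

    posts: iterable of (date, city, polarity) tuples, where polarity is
    "positive", "neutral" or "negative" (hypothetical record layout).
    Returns (daily_total, city_negative) as plain dicts.
    """
    daily_total = defaultdict(int)
    city_negative = {}
    for date, city, polarity in posts:
        score = SCORE[polarity]
        daily_total[date] += score           # all posts count toward the daily sum
        city_negative.setdefault(city, 0)    # cities start at 0
        if score < 0:
            city_negative[city] += score     # only negative posts lower a city's score
    return dict(daily_total), city_negative
```

Keeping every city in the result, even with a score of 0, makes it straightforward to join the scores with per-city waterlogging counts for the correlation analysis.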
According to Figs. 9 and 10, most prefecture-level cities (74.29%) had negative emotion scores between −12 and 0. Chongzuo had a negative emotion score of 0: news reports confirmed that Chongzuo was hardly affected by the floods, and Chongzuo also had the fewest Weibo texts of all prefecture-level cities. Negative emotion scores in Liuzhou, Guilin, Foshan, Zhuhai, Shantou, and Nanning ranged from −37 to −15 (17.14%). The negative emotion scores of Dongguan and Shenzhen were −78 and −76, respectively (5.71%). Canton had the lowest negative emotion score, −248 (2.86%), far below those of the other prefecture-level cities; Canton also had the most waterlogging points and Weibo texts. In general, the negative emotion scores of most cities in Guangdong and Guangxi were moderate. The cities with low negative emotion scores were mainly located in the provincial capital of Guangdong Province and its surrounding city clusters.

The relationship between waterlogging and public emotion
As shown in Fig. 7, public emotional reactions were stronger before the rainstorms began and while flooding was occurring, and negative emotional words such as worry, sadness, fear, alarm, and annoyance appeared frequently in Weibo texts. In the later stages of the flooding, however, the effective implementation of emergency measures by the relevant departments and the publicizing of flood-fighting deeds gradually eased public emotion: positive terms appeared more often on Weibo, and public emotion turned positive for a few days. Changes in the disaster situation thus evidently affected the direction of public emotion. According to Fig. 10, areas with many waterlogging points tended to have lower (more negative) negative emotion scores, suggesting a link between the two. Nanning was an exception: it had many waterlogging points, but its negative emotion score was not especially low. Correlation analysis between the number of waterlogging points and the negative emotion scores yielded a coefficient of −0.759 (Sig. = 0.00 < 0.05), which rose in magnitude to −0.896 when Nanning was excluded. To exclude the effect of population differences, a partial correlation analysis was performed between the number of waterlogging points and the negative emotion scores, with population as the control variable (again excluding Nanning). The partial correlation coefficient was −0.756 (Sig. = 0.00 < 0.05). These results indicate that the more intensive the waterlogging, the lower (more negative) the negative emotion score, that is, the stronger the negative public emotion; the public emotional response was consistent with the intensity of the flooding.
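The reported Sig. values can be checked approximately with the standard t test for a correlation coefficient, t = r√df / √(1 − r²), with df = n − 2 for a Pearson coefficient and df = n − 2 − k for a partial coefficient with k control variables. The sample size is an assumption: the city-group percentages reported above (74.29%, 17.14%, 5.71%, and 2.86%) correspond to 26, 6, 2, and 1 of 35 prefecture-level cities, so n = 35 is assumed, or n = 34 when Nanning is excluded.

```python
import math


def correlation_t(r: float, n: int, n_controls: int = 0) -> tuple[float, int]:
    """t statistic and degrees of freedom for testing H0: rho = 0.

    For a zero-order Pearson r use n_controls=0 (df = n - 2); for a partial
    correlation with k control variables use n_controls=k (df = n - 2 - k).
    """
    df = n - 2 - n_controls
    if df < 1 or not -1 < r < 1:
        raise ValueError("need df >= 1 and |r| < 1")
    return r * math.sqrt(df) / math.sqrt(1 - r * r), df
```

Under these assumptions, r = −0.759 with n = 35 gives t ≈ −6.70 (df = 33), and the partial r = −0.756 with n = 34 and one control gives t ≈ −6.43 (df = 31). Both magnitudes are far above the two-tailed 5% critical value of roughly 2.04 for these degrees of freedom, which is consistent with the reported Sig. values of 0.00.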

Is it feasible to study flooding and public emotion with Weibo data?
Among existing studies of urban flooding in the Guangdong and Guangxi regions, few used social media data; the present research partly fills this gap. The primary data in this study were Weibo texts, and the results confirmed that Weibo texts can be used to study flooding and public emotion, consistent with related studies (Wu et al. 2018; Xiao 2019). Besides texts, user check-in information, pictures and videos, and users' retweets and comments are also useful Weibo data. In the study by Wang et al., the ratio of the number of flood-related microblogs to the number of user check-ins was used to characterize the extent to which residents perceive flooding. This study analyzed only the original texts published by Weibo users, which limited the accuracy and completeness of the information to some extent. Combining multiple sources and communication channels in social media can reflect disaster situations and users' emotions more comprehensively (Meng et al. 2021). Some studies have also gone beyond social media data: Fang et al. used hourly precipitation data alongside social media data, with the precipitation data representing the disaster process and validating the effectiveness of social media messages (Fang et al. 2019). Integrating information from multiple social media sources with other data can therefore provide a valuable database for future studies on flooding and public emotion.

Spatial distribution of waterlogging points
The location attributes unique to Weibo, combined with location descriptions within the texts, provide valuable location information about disaster areas, as demonstrated by the extraction and kernel density analysis of waterlogging points in this paper. The results showed that waterlogging was dense in the provincial capitals and coastal cities of Guangdong and Guangxi, while the remaining regions were relatively less affected. These results were generally consistent with those of previous studies using coupled GIS and RS analysis and scenario simulation methods (Huang and Bai 2019; Li et al. 2019; Tian et al. 2017). The main reason for this pattern is that provincial capitals and coastal cities are urbanizing relatively fast, and their drainage capacity lags significantly behind their level of urbanization (Zhang and Tang 2020; Zhao and Zhang 2017). It is therefore necessary to upgrade the existing drainage network or build new pipes in these areas to improve drainage capacity. At the same time, the relevant departments should carry out targeted disaster prevention, mitigation, and relief work.

Characteristics of public emotion
The Weibo texts during the study period were dominated by descriptions of the disaster, which formed dissemination nodes of varying sizes within users' two-way social circles. This is related to the information dissemination mechanism of social media such as Weibo (Wang 2010). In addition, a few users offered insightful perspectives and recommendations on urban planning and emergency deployment, which help evaluate the extent of disaster damage and the retrofitting needed in affected areas. Public emotion can therefore be considered to stem from the disaster situation, and analyzing public emotion can aid the analysis of the disaster situation. Under the influence of the flooding, public emotion on Weibo was predominantly negative, consistent with similar studies (Han and Wang 2019; Wu et al. 2018; Xiao 2019). Whereas Wu et al. directly used emotion-indicating feature words as the basis for classifying public emotion levels, this paper used each entire Weibo text as the unit of emotion identification, focusing on the semantic relationships in Chinese expressions. And whereas other studies have focused on overall public emotion characteristics (Zhang and Cheng 2021), this paper focused on negative public emotion, because the collected Weibo texts contain many tweets from institutions or organizations that serve to convey disaster information or channel positive emotions.
Most of these institutional tweets were positive or neutral and did not strictly reflect public emotion, whereas almost all negative tweets were published by individual Weibo users.
The lowest negative emotion scores were mainly found in the provincial capital of Guangdong Province and its surrounding urban clusters. On the one hand, the large number of waterlogging points in these places triggered negative public emotion; on the other hand, the highly developed economy of these areas has led to dense social networks, which amplified the negative emotion score (Li et al. 2013). Emergency management departments in these places should therefore emphasize positive guidance of public emotion while fighting floods and providing disaster relief.

The connection between disaster and public emotion
The intertwining of disaster and public emotion reflects both the complex process by which extreme weather develops into flooding and the strong correlation between the two. The correlation analysis showed that negative public emotion was consistent with changes in the disaster situation: the more severe the urban flooding, the more negative the public emotion. This conclusion is similar to those of other disaster studies (Wang and Taylor 2018).
To explain this correlation, it is necessary to understand the factors that generate negative public emotion. Although few studies have explored the factors influencing public emotion during flood events, some similar studies have identified relevant factors (Qing et al. 2021; Xie et al. 2022; Guo et al. 2021; Schipper and Petermann 2013). They concluded that the degree to which citizens' lives are affected, the government's disaster preparedness and response, risk exposure, rumors, empathy, and personal characteristics and perceptions all affect public emotion. Guided by these views, the content of the collected negative Weibo texts was analyzed. Most negative texts described the impact of the floods on the posters themselves, such as "The thunderstorm is troublesome, it's hard to travel." A few expressed negative emotions out of dissatisfaction with the behavior of other people or the relevant departments, for example, "What an unethical driver, driving so fast even though there is water on the road!" and "Can drainage be considered when building roads? Who will take care of the waterlogged traffic?" Overall, the consequences of flooding, the behavior of others, and ineffective government work were the main direct influences on negative emotion, with the disaster itself the most dominant factor. This explains the high correlation between floods and public emotion. However, judging the influencing factors of public emotion from textual content alone is not comprehensive; future studies could examine these factors quantitatively across multiple dimensions.
Understanding the correlation between flooding and public emotion is of great significance for emergency management. For example, relevant departments can identify problems in disaster prevention and response from public emotion, or predict changes in public emotion from the disaster situation and take measures in advance to prevent potential public emotion risks.

Conclusion
This paper analyzed the spatial and temporal distribution of the flooding disaster situation and public emotion in the Guangdong and Guangxi regions based on Weibo text data. The main conclusions are as follows: (1) Weibo texts can be used to study flooding situations and public emotion.
(2) From June 1 to September 30, 2020, the Guangdong and Guangxi regions experienced severe flooding, with waterlogging concentrated in coastal cities and provincial capitals and exhibiting significant spatial heterogeneity. (3) During the flooding, most public responses described the flooding, and public emotion was mainly negative. (4) There was a strong correlation between public emotion and flooding: the more severe the urban flooding, the more negative the public emotion.
Based on these findings, the following recommendations are made to the relevant authorities: (1) Relevant departments should optimize the pipe network and build new flood control projects according to the characteristics of local flooding.
(2) Relevant departments can track public response and public emotion on social media to identify problems in disaster response. (3) Based on the correlation between flooding and public emotion, the direction of public emotion can be predicted and potential emotional risks identified. (4) Sound and detailed flood control emergency plans should be established according to the characteristics of each region. (5) The public should be informed and educated about flood response methods through various channels. (6) Official media should publish timely flood information, dispel rumors, and provide positive guidance to the public.