Public Perception of the COVID-19 Pandemic on Chinese Social Networking Service (Weibo): Sentiment Analysis and Fuzzy-C-Means Model


Background: Social media analysis tools have been used to monitor public sentiment and communication methods during public health emergencies. Such monitoring is needed to better understand the impact of a crisis on the public and to provide reference material for the prevention of future public health emergencies. We concentrate on the sentiments around the public health emergency created by COVID-19.
Objective: This study aims to better understand the impact of public health emergencies on citizens and provide reference material for future public health emergency prevention.
Methods: The fuzzy c-means method was used to divide 850,083 Weibo posts from January 24, 2020, to March 31, 2020, into seven categories of emotions: fear, happiness, disgust, surprise, sadness, anger, and good. The changes in emotion were tracked over time.
Results: The results indicated that people showed "surprise" overall (55.89%); however, "surprise" decreased with time. As knowledge regarding the coronavirus disease 2019 (COVID-19) increased (contents about COVID-19 knowledge: from 21.16% to 4.19%), the "surprise" of citizens decreased (from 59.95% to 46.58%). Citizens' feelings of "fear" and "good" increased as the number of deaths associated with COVID-19 increased ("fear": from 15.42% to 20.95%; "good": from 10.31% to 18.89%). As the infection was suppressed, the feelings of "fear" and "good" diminished ("fear": from 20.95% to 15.79%; "good": from 18.89% to 8.46%).
Conclusions: In this study, the emotions of Weibo users and the changes in those emotions were analyzed in chronological order. The results can help in preparing for future public health emergencies.


Background
The coronavirus disease (COVID-19) pandemic has spread to more than 200 countries and has caused many deaths [1]. As of , the death toll reached 2,758,877 [2]. Following the World Health Organization (WHO) pandemic statement and government action on the disease, various sentiments regarding COVID-19 have spread across the world. Over the last decade, social media analysis tools have been used to monitor public sentiment and communication methods during public health emergencies such as the Ebola and Zika epidemics.
There are two directions in emotion analysis: sentiment strength detection and detection of multiple emotions. Sentiment strength detection tends to classify emotions into three types: positive, neutral, and negative. Conversely, detection of multiple emotions classifies human emotions into various types.
Many studies have applied sentiment strength detection and detection of multiple emotions to social media, such as Twitter [3,4,5,6]. However, most analyses of China's social media (Weibo) perform sentiment strength detection [3,7], and the emotions fall into only three categories: positive, neutral, and negative. In this study, seven emotions were set, and emotion classification was studied according to these seven emotions.
There are two existing approaches to sentiment analysis: supervised learning and unsupervised learning. Supervised learning uses neural networks and other machine learning techniques. First, the model is trained, and then it is used to classify emotions. The disadvantage of this approach is that creating training data is labor intensive. Unsupervised learning primarily creates rules and dictionaries, which are then used to classify emotions. The disadvantage of this approach is that words that are not in the rules or dictionaries cannot be analyzed, and it is impossible to put every word in a dictionary.
The fuzzy c-means (FCM) method used in this study is an unsupervised soft computing technique. It was developed by Dunn [8] in 1973 and improved by Bezdek [9] in 1981. Compared with hard clustering, soft clustering uses fuzzy sets [10], which can better handle the problem of text ambiguity. Membership in a fuzzy set indicates the degree of matching between an element and the set, with membership values ranging from 0 to 1. FCM extends the concept of membership: the membership matrix holds the membership value of each element in each of the clusters. FCM is one of the most commonly used methods for solving fuzzy problems. Compared with other clustering methods, it is flexible and can accurately represent the degree of data affiliation [11].
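For reference, the standard FCM formulation can be written as follows. The notation is introduced here and is not taken from the text: x_i are the N data points, c_j the C cluster centers, u_ij the membership of point i in cluster j, and m > 1 the fuzzifier. The value of m used in this study is not stated.

```latex
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2},
\qquad \sum_{j=1}^{C} u_{ij} = 1 \ \text{for every } i,

u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}.
```

The algorithm alternates the two update rules until the memberships stop changing. Each row of the membership matrix sums to 1, which is what allows one word to belong partly to several emotions.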
The advantages of this method are as follows: 1. the number of words that constitute an emotion dictionary can be reduced; 2. it can analyze words that are not in an emotion dictionary; 3. it is suitable for accurately judging the ambiguity of people's emotions.
This study aims to better understand the impact of public health emergencies on citizens and provide reference material for future public health emergency prevention. FCM was used to analyze seven different emotions related to Weibo's content and track changes in these emotions over time.

Methods
The overall structure of the proposed method is shown in Fig. 1. It contains five parts, including preprocessing, feature extraction, clustering, and emotion classification. All calculations in this experiment were implemented in Python.
First, the raw data are preprocessed. A skip-gram model is then used to extract word features and convert them into computer-processable data. Next, the processed data are clustered using the FCM method. Finally, the data are classified into seven types of emotions using the clustered values of the words in each sentence.

Data set
As shown in Fig. 2, the data source used in this study was Weibo. Data were collected from January 24, 2020, to March 31, 2020. The search keywords were "COVID-19 outbreak status" and "COVID-19 pneumonia," and a total of 1,367,842 user posts were collected.

Preprocessing
The input dataset was preprocessed using normalization and Python code. The preprocessing tasks were as follows:
1. Excluding Weibo content that carries no meaning: URLs, images, etc.
2. Removing special characters: all special characters (punctuation marks, question marks, exclamation marks, etc.) were replaced with spaces.
Finally, 850,083 posts (NAVA words only: nouns, adjectives, verbs, and adverbs) were used as data.
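The two cleaning tasks above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and regular expressions are assumptions, and word segmentation and the filtering to NAVA words are not shown.

```python
import re

# Assumption: URLs consist of printable ASCII characters, so a URL that is
# glued to following Chinese text is still cut at the right place.
URL_PATTERN = re.compile(r"https?://[!-~]+")
# Anything that is neither a word character nor whitespace is treated as
# "special" here (ASCII and full-width punctuation, emoji, symbols).
SPECIAL_PATTERN = re.compile(r"[^\w\s]")


def clean_post(text: str) -> str:
    """Remove URLs, replace special characters with spaces, collapse spaces."""
    text = URL_PATTERN.sub(" ", text)
    text = SPECIAL_PATTERN.sub(" ", text)
    return " ".join(text.split())


def preprocess(posts):
    """Clean every post and drop the ones that end up empty."""
    cleaned = (clean_post(post) for post in posts)
    return [post for post in cleaned if post]
```

Posts that contain only a link or only punctuation become empty after cleaning and are dropped, which corresponds to excluding content that has no meaning.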

Emotional dictionary
In this study, seven emotions were set based on Ekman and Xu. The first six were based on Ekman's basic emotions [12], whereas the last was based on Chinese local emotions [13]. Table 1 lists the seven emotions and some of the corresponding words that represent them.
Feature extraction is needed to convert natural-language words into computer-processable word vectors. In this study, the word2vec skip-gram model [14] was used to extract features from the collected data.
The dimension of the word vector was set to 100 [15]. Each word was represented as a 1 × 100 vector.
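To make the skip-gram idea concrete, the sketch below generates the (center word, context word) training pairs that a skip-gram model learns from. It illustrates only the pairing step; the window size of 2 is an assumption, and the text does not say which implementation was used to train the 100-dimensional vectors.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

A skip-gram model is trained to predict the context word from the center word in each pair. The learned input weights for a word are its word vector, here a 1 × 100 vector.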

Fuzzy-C-Means
The FCM method was used to cluster the representative words of the seven emotion dictionaries. The coordinates of the seven cluster centers (each with 100 dimensions) were obtained.
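The clustering step can be sketched with a small pure-Python version of standard FCM. This is an illustration under assumptions, not the implementation used in the study: the fuzzifier m = 2, the random initialization, and the stopping tolerance are all assumed, and how each resulting center is matched to an emotion label is not described in the text.

```python
import math
import random


def memberships(point, centers, m=2.0):
    """Membership of one point in every cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fcm(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Membership update from the distances to the new centers.
        new_u = [memberships(p, centers, m) for p in points]
        shift = max(
            abs(a - b)
            for new_row, old_row in zip(new_u, u)
            for a, b in zip(new_row, old_row)
        )
        u = new_u
        if shift < tol:
            break
    return centers, u
```

For the setting described here, `points` would hold the 100-dimensional vectors of the dictionary words and `n_clusters` would be 7.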

Emotion classification
The membership value of each word in each of the seven emotions was calculated using the word vector of the word and the center coordinates of the seven emotions. The word membership values were then used to calculate the average membership value for each emotion in the sentence. The emotion with the highest degree of membership was the final emotion.
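The classification rule described above can be sketched as follows. The function names, the fuzzifier m = 2, and the pairing of each center with an emotion label by list position are assumptions made for illustration. The membership formula is the standard FCM one.

```python
import math

# Order taken from the list of emotions in the abstract. Pairing the k-th
# center with the k-th label is an assumption of this sketch.
EMOTIONS = ["fear", "happiness", "disgust", "surprise", "sadness", "anger", "good"]


def word_membership(vector, centers, m=2.0):
    """Standard FCM membership of one word vector in every emotion center."""
    dists = [math.dist(vector, c) for c in centers]
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def classify_sentence(word_vectors, centers, labels=EMOTIONS, m=2.0):
    """Average the word memberships and return (best label, averages)."""
    if not word_vectors:
        raise ValueError("sentence has no word vectors")
    per_word = [word_membership(v, centers, m) for v in word_vectors]
    averages = [sum(column) / len(per_word) for column in zip(*per_word)]
    best = max(range(len(averages)), key=averages.__getitem__)
    return labels[best], averages
```

Because the averages are kept alongside the label, a sentence whose top two emotions are close can still be inspected rather than being reduced to a single hard label.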
As shown in Table 2, "surprise" was high at the start stage (average: 59.95%) and end stage (average: 66.17%), and low at the occurrence stage (average: 46.58%). "Fear" was low at the start stage (average: 15.42%) and end stage (average: 15.79%), and higher at the occurrence stage (average: 20.95%). "Good" was low at the start stage (average: 10.31%) and end stage (average: 8.46%), and high at the occurrence stage (average: 18.89%). "Happiness" was high at the start stage (average: 12.73%) and occurrence stage (average: 11.55%), and decreased at the end stage (average: 7.41%).
[17,19], it was found that the emotions of Weibo users during the outbreak of COVID-19 changed significantly from "anger" to "surprise."

Time series analysis results
Citizens' emotions regarding COVID-19 can be divided into three stages: the start stage, the occurrence stage, and the end stage of COVID-19. The stages and the emotions in each were the same as those found in previous studies on Weibo [18,19,20]. Compared with a previous study on Twitter [21], the three stages and the emotions in each stage were largely the same, but the time span of each stage differed because the user base is different. The details are as follows.
At the start of COVID-19 (January 24-February 6, 2020), people's main feeling was "surprise." The reason is that citizens did not yet understand COVID-19 (contents about COVID-19 knowledge: 21.16% on January 24, 2020) [22,23]. At the same time, "happiness" fluctuated during this period because of the Chinese New Year [23].
"Good" decreased from 18.89% to 8.46%. However, the increase in "surprise" was inconsistent with the survey [17]. This is because the public's focus was on the epidemic situation in foreign countries such as Italy, India, Brazil, and France [25].
Second, only text was used in this study to analyze emotions. Pictograms and symbols contained in sentences were not analyzed. They carry considerable emotional information [27], which is lost if they are not processed.

Conclusion
In this study, the FCM method was used to analyze Weibo content from January 24, 2020, to March 31, 2020. In addition, people's feelings regarding the COVID-19 pandemic were analyzed in three stages over time.
Throughout the period, the public's dominant attitude toward the emergency was "surprise." In the beginning, people's emotions were primarily "surprise"; after the outbreak, however, "surprise" decreased as knowledge increased. In addition, as the number of deaths increased, people felt "fear" and "good." At the end of phase I of the COVID-19 pandemic, people's "fear" and "good" feelings diminished as the epidemic was suppressed. People's interest shifted from China to other countries, and their concern turned to the situation in those countries.
The results make it possible to understand public sentiment and opinion during a public health emergency. Our findings facilitate an understanding of public discussions and emotions about the COVID-19 pandemic among Weibo users between January 24 and March 31, 2020. By analyzing these emotions, we can provide reference materials and enable better preparation for a future public health emergency.