An exploratory analysis of public opinion and sentiments towards COVID-19 pandemic using Twitter data.

Objective: Twitter data have been increasingly used to address health-related issues. However, little is known about their potential for understanding public opinion and sentiment about the current COVID-19 pandemic. The present study explores public opinion and sentiment about the COVID-19 pandemic using Tweets from three popular coronavirus-related hashtags (#COVID19, #Coronavirus, #SARSCoV2). Results: Of the 39,726 Tweets analysed, over 60% of the words used within Tweets in all hashtags (#COVID19: 63.9%; #Coronavirus: 65.6%; #SARSCoV2: 63.5%) conveyed a negative mood towards the pandemic. Our results also showed similar trends in Tweet volume in #COVID19 and #SARSCoV2, with spikes in the number of Tweets on the 3rd and 6th of April 2020. Further exploration of Tweets in both hashtags revealed similar Twitter discussions on “Hydroxychloroquine”, “Hospitalisations of the British Prime Minister” and “the attainment of 1 million cases of coronavirus globally”. The findings of this exploratory study indicate that there is potential for using data generated from Twitter to understand general public opinion and sentiment towards the COVID-19 pandemic. However, caution is needed due to several limitations of this study. It is also important for future studies to explore the context around Tweets.


Introduction
The COVID-19 pandemic, currently affecting millions of people globally, has resulted in a rise in the use of social media platforms (e.g. Twitter, Facebook, Instagram, Weibo, TikTok and WeChat). Due to social distancing and movement restrictions, many individuals and organisations are taking to social media to communicate, share and express their opinions, sentiments and experiences about the pandemic.
Twitter is one of the most popular social media platforms. It is a microblogging service with over 300 million active users [1]. Twitter users share personal opinions and experiences by posting 280-character messages known as Tweets. Twitter has been increasingly used as a data source for health-related research, offering a more efficient means of data collection than traditional survey methods [2]. For instance, it has been used to examine disease stigma [3] and to monitor disease pandemics [4] [5]. More recent studies have used Twitter data to assess public sentiments, attitudes and opinions about health-related issues [6] [7]. These studies suggest that there may be scope for using Twitter-generated data to provide insight into public opinion on the current COVID-19 pandemic.
The present study aims to explore the use of Twitter data to understand public opinion and sentiment about the COVID-19 pandemic using Tweets from three popular coronavirus-related hashtags (#COVID19, #Coronavirus and #SARSCoV2).

Methods
We downloaded publicly available Tweets from three popular coronavirus-related hashtags (#COVID19, #SARSCoV2 and #Coronavirus). Tweets were downloaded on consecutive days over a 7-day period (1st April to 7th April 2020) via the Twitter search Application Programming Interface (API), called from within R using the 'twitteR' package [8]. We downloaded 10,000 Tweets per hashtag per day, giving a total of 210,000 randomly selected Tweets written in English. Each downloaded Tweet included the date the Tweet was created, the status source (i.e. the device used to post the Tweet), the screen name or user handle, the number of re-tweets ('reposted Tweets'), the Tweet text and any weblinks or URLs in the Tweet.
Tweets from all three hashtags were merged and pre-processed by removing duplicates (including retweets), emoticons, emojis, punctuation, special characters and URLs. After pre-processing, a total of 39,726 Tweets were analysed. Trends in the volume of Tweets and word frequencies across the 7-day period were described using line and bar graphs. Tweets were analysed for sentiment using the Bing sentiment lexicon from the R 'tidytext' package [9]. The Bing lexicon detects the sentiment of words through a dictionary lookup and then classifies each word as either "positive" or "negative". All analyses, including graphs, were produced in R version 3.5.2.
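The pre-processing steps described above can be sketched as follows. This is an illustrative Python sketch only, not the R code used in the study; the field names `text` and `is_retweet`, the regular expressions and the choice to drop all non-ASCII characters as a crude emoji filter are assumptions made for the example.

```python
import re
import string

# Assumed patterns for illustration; the study does not specify its exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Dropping every non-ASCII character crudely removes emojis. It would also
# remove accented letters, which is tolerable only for English-language text.
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Strip URLs, non-ASCII symbols and punctuation, then lowercase."""
    text = URL_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.lower().split())


def preprocess(tweets):
    """Drop retweets and duplicate texts; keep the first copy of each Tweet.

    `tweets` is a list of dicts with hypothetical keys `text` and `is_retweet`.
    """
    seen = set()
    kept = []
    for tweet in tweets:
        if tweet.get("is_retweet", False) or tweet["text"].startswith("RT @"):
            continue
        cleaned = clean_tweet(tweet["text"])
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            kept.append({**tweet, "clean_text": cleaned})
    return kept
```

Note that removing punctuation also strips the `#` from hashtags and any character-based emoticons such as `:)`, so a hashtag survives only as a plain word.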

Tweet volume and frequency
After pre-processing, a total of 39,726 unique Tweets were identified across the three COVID-19 related hashtags. The volume of Tweets (Fig. 1) in each hashtag varied considerably across the 7-day period: #COVID19 (1,518 to 2,259), #SARSCoV2 (1,084 to 2,640), #Coronavirus (1,566 to 1,982). Trends in the volume of Tweets for #COVID19 and #SARSCoV2 were similar across the 7-day period, and both hashtags had a peak in the number of Tweets on the 3rd (Friday) and 6th (Monday) of April (Fig. 1). Table 1 provides some examples of Twitter discussions on both days. Results presented in Table 1 show that "Hydroxychloroquine", "Hospitalisations of the British Prime Minister" and "reaching 1 million cases of coronavirus globally" were topics similarly discussed by users of #COVID19 and #SARSCoV2 on both days.
In terms of frequency, coronavirus-related terms (e.g. "Coronavirus", "covid19" and "sarscov2") were the most frequently used terms across all three hashtags (Fig. S1). Each had a frequency count of more than 6,000. The words "Pandemic", "trump", "people", "Covid" and "coronavirus" were similarly tweeted across all hashtags. Results for all three hashtags combined showed that "coronavirus" was the most tweeted word, occurring more than 7,000 times. Results of the sentiment analysis (Fig. 2) showed that the majority of words used in the Tweets conveyed a negative sentiment towards the pandemic (#COVID19: 63.9%; #Coronavirus: 65.6%; #SARSCoV2: 63.5%). #Coronavirus had the largest proportion of words with a negative sentiment (65.6%). The words "trump" and "virus" were the most tweeted words that conveyed a negative and positive sentiment respectively (see Table S1).
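A tabulation of daily Tweet volume by hashtag, of the kind summarised in Fig. 1, might be sketched as below. This is a hypothetical Python illustration rather than the study's R code; the keys `hashtag` and `created`, and the toy records at the end, are invented for the example and are not study data.

```python
from collections import Counter
from datetime import date


def daily_volume(tweets):
    """Count Tweets per (hashtag, day) pair."""
    return Counter((t["hashtag"], t["created"]) for t in tweets)


def peak_days(volume, hashtag, top_n=2):
    """Return the `top_n` days with the most Tweets for one hashtag.

    Ties are broken by taking the earlier date first.
    """
    days = [(day, n) for (tag, day), n in volume.items() if tag == hashtag]
    days.sort(key=lambda item: (-item[1], item[0]))
    return [day for day, _ in days[:top_n]]


# Toy records (made up for illustration only).
toy = (
    [{"hashtag": "#COVID19", "created": date(2020, 4, 1)}] * 1
    + [{"hashtag": "#COVID19", "created": date(2020, 4, 3)}] * 3
    + [{"hashtag": "#COVID19", "created": date(2020, 4, 6)}] * 2
)
print(peak_days(daily_volume(toy), "#COVID19"))
```

Comparing the peak days returned for two hashtags is one simple way to check whether their volume trends coincide.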
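The dictionary lookup behind percentages like these can be sketched in a few lines. The Python below is an illustration under stated assumptions: `TOY_LEXICON` is a tiny made-up stand-in for the Bing lexicon, and the denominator is assumed to be the number of words that matched the lexicon, which the study does not state explicitly.

```python
from collections import Counter

# Made-up miniature lexicon for illustration; NOT the Bing lexicon itself.
TOY_LEXICON = {
    "virus": "negative",
    "death": "negative",
    "crisis": "negative",
    "safe": "positive",
    "support": "positive",
}


def sentiment_shares(cleaned_tweets, lexicon):
    """Percentage of lexicon-matched words carrying each sentiment label.

    Words absent from the lexicon are ignored (an assumption of this sketch).
    """
    counts = Counter()
    for text in cleaned_tweets:
        for word in text.split():
            label = lexicon.get(word)
            if label is not None:
                counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


shares = sentiment_shares(
    ["virus crisis stay safe", "death toll rising", "support our nurses"],
    TOY_LEXICON,
)
print(shares)
```

Because each word is scored in isolation, a lookup of this kind cannot tell whether a word is used literally, ironically or in a clinical sense.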

Discussion
This exploratory study has shown that there is potential for using data generated from Twitter to understand general public opinion and sentiment towards the COVID-19 pandemic. We found that the volume of Tweets varied considerably across hashtags during the study period. The volume of Tweets in two hashtags (i.e. #COVID19 and #SARSCoV2) followed a similar pattern, with both experiencing a peak in Tweet volume on the 3rd and 6th of April 2020. A sharp increase in Tweet volume is an indicator of public interest or attention [5]. Further exploration of Tweets from the two hashtags on both days indicated that these similar spikes in Tweet volume may be linked to discussions of certain key events preceding those days. First, on the 2nd of April 2020, it was announced that confirmed cases of coronavirus had surpassed 1 million globally, based on data from Johns Hopkins University [10]. Second, on the 5th of April, the UK Prime Minister was admitted to hospital with coronavirus symptoms.
Another topic of interest during these dates, revealed by exploring Tweets, was the use of 'Hydroxychloroquine/Azithromycin' to treat COVID-19. These events may have generated greater interest and Twitter discussion, particularly amongst users of the #COVID19 and #SARSCoV2 hashtags. These findings suggest that changes in Tweet volume or frequency could provide greater insight into how the public reacts to changing events during the coronavirus pandemic. Further research should examine how public sentiment varies with Tweet volume.
We found that most words used within Tweets (i.e. over 60%) conveyed a negative sentiment or mood towards the pandemic. Our results also showed that #Coronavirus contained the largest proportion of words with negative sentiment (65.6%). Our study, however, did not consider the context of Tweets or how words were used within Tweets. Therefore, it is difficult to conclude whether the large proportion of negative words identified within Tweets across hashtags actually reflected negative sentiment towards the pandemic during the 7-day period. Future studies should use more advanced text analytic techniques (e.g. topic modelling, n-grams) to confirm these findings. In addition, using frequently identified 'negative' words as a basis for further exploration may provide further insight and inform a greater understanding of public sentiment about the pandemic.

Conclusion
The study has shown that there is scope for using Twitter data to efficiently and rapidly explore public sentiment and opinion about the current COVID-19 pandemic. Our results revealed some interesting insights from the analysis of Tweet trends and sentiments. However, greater consideration of the context of words, or of how words are used within Tweets, is required.

Strengths and limitations
A major strength of this study is the use of publicly accessible Twitter data to rapidly explore general insights about public mood towards the COVID-19 pandemic.
Nevertheless, there are several limitations to this exploratory analysis. First, the Tweets used in the analysis were randomly selected English Tweets derived from three popular hashtags and are limited to a 7-day period. Therefore, we must exercise caution in interpreting and generalising the findings from this study. Future work should consider using data from other coronavirus-related hashtags, including non-English Tweets, and explore Tweets across an extended timeframe or at multiple time points. Second, we used a limited number of Tweets due to daily restrictions imposed by the Twitter search API. Future research should consider using a larger dataset to examine public sentiments and opinions, with a search API that allows more flexibility in the number of Tweets that can be downloaded. Third, we cannot verify the accuracy of the content analysed from Tweets because of the rise of misinformation, or 'infodemic', as highlighted by the WHO [11]. Fourth, we did not have access to geolocation data (i.e. the geographical location of Tweets). Thus, it is difficult to ascertain whether the Tweets are globally representative of public opinion. It would be helpful for future studies to capture the geographical location of Tweets. Finally, we used limited categories for the analysis of sentiment (i.e. 'positive' or 'negative'). It is possible that words which convey a neutral sentiment may have been misclassified. Additionally, and importantly, this analysis did not consider the usage and context of words within Tweets. For example, the word 'positive' may not necessarily indicate something good (e.g. "tested positive" or "tests positive") but was classified as positive sentiment by the lexicon used (Table S1). Further research is needed to better understand the context around Tweets by using a lexicon that captures the entire content of a Tweet. In addition, supporting the results of sentiment analysis with emoticons and emojis would add further context.
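The "tested positive" problem noted above can be illustrated with a simple check on the preceding word. The Python sketch below is a hypothetical illustration, not a method used in the study; the function name and the list of cue words are assumptions chosen for the example.

```python
def flag_clinical_positive(cleaned_text, cue_words=("test", "tests", "tested", "testing")):
    """Return positions of 'positive' tokens preceded by a testing cue word.

    Such tokens probably describe a test result rather than a good mood, so a
    word-level lexicon would likely mislabel them as positive sentiment.
    """
    tokens = cleaned_text.split()
    return [
        i
        for i in range(1, len(tokens))
        if tokens[i] == "positive" and tokens[i - 1] in cue_words
    ]


print(flag_clinical_positive("my neighbour tested positive for covid19"))
print(flag_clinical_positive("feeling positive about recovery"))
```

A check like this looks only one word back, so it is a narrow special case of the bigram and wider context-aware approaches that future work could adopt.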

Figure 1: Trends in the volumes of Tweets by hashtags

Table 1: Examples of Tweets posted on the days with the highest Tweet volumes in #SARSCoV2 and #COVID19
UK PM Boris Johnson in intensive care with #COVID19 This virus can hurt anyone Im terrified for my sick, sufferin…
So Boris has been moved to ICU. It's quite clear that #SARSCoV2 does not discriminate. Whilst I do not agree with a…
I was at a hospital where there were a few #coronavirus patients and I shook hands with everybody, you will be ple…