Russia’s Covid-19 Vaccine: Social discussion and first emotions


Social media conversations about healthcare are a rich basis for analysing public emotion. The Covid-19 vaccine is currently the main hope of nearly all of humankind. Russia’s announcement of the first vaccine sparked a wide range of emotions among social media users, who shared them as tweets. The tweet data is collected and analysed for the emotions and psychology of the users, along with the topics of interest in their discussion. Using computational methods such as machine learning and Latent Dirichlet Allocation (LDA), the social emotions are revealed and presented.


Introduction
The personal opinions of people can be collectively organized and mined to extract entity-based sentiment. Opinion mining drives the subjective analysis of texts that contain expressions of opinion.
Linguistic sentiment analysis can be performed on a microblogging site such as Twitter, where emotions are readily expressed as sentiments [1]. The short tweets shared by users pose several challenges in identifying the exact opinion and sentiment, since the text is highly unstructured and contains emojis and sarcastic comments [2-6]. The outbreak of the Covid-19 pandemic, which spread exponentially from China to many parts of the world, has had a major impact on global health [7-10]. Since there is no prescribed drug to cure the coronavirus infection, the World Health Organization recommends standards for the caution and protection of the public from the disease [11].
The prime hope for people is therefore a SARS-CoV-2 vaccine that gives immunity against the virus. Vaccine development is a very tedious process that includes many stages of trial phases and can run for years. In a pandemic situation, where the world relies on the vaccine, researchers are developing it at pandemic speed [12]. In the global race for vaccine production, some candidates have successfully reached phase 2 and phase 3 of clinical trials [13].
Amid this situation, Russia launched its Covid-19 vaccine, named Sputnik V, on August 11, 2020 and claimed it to be the first of its kind released for public use [14]. People around the world poured out their views as tweets after Russia's official declaration of the vaccine. This large-scale emotional response to a pandemic vaccine is a great source for understanding people's sentiments and uncovering the topics they discuss alongside it.
In this work, we address the sentiment analysis of the first vaccine launched for the Covid-19 pandemic.
The contributions of the paper are as follows: (i) We have manually annotated tweets expressing sentiment on Russia's Covid-19 vaccine. The data set consists of 4375 tweets in English, and every tweet is manually given either a positive or a negative label by two annotators. (ii) We have analysed the sentiment of the tweets using the popular VADER sentiment lexicon and evaluated the manual labelling using machine learning. (iii) We have analysed the topics discussed along with the vaccine using the Latent Dirichlet Allocation algorithm.
The paper is organized as follows: related work is reviewed in section II, data collection and labelling in section III, sentiment analysis in section IV, and topic mining in section V. The results are discussed in section VI, followed by the conclusion in section VII.

Related Work
The social sentiment around the Covid-19 pandemic has been studied using social media data such as tweets. Paper [15] collected tweets over a fourteen-day period and used the NRC sentiment lexicon. The authors reported that the majority of the tweets were of positive sentiment and also scaled Plutchik's emotions. Paper [16] examines topic modelling and the dominant emotions in two weeks of Twitter data on Covid-19. Paper [17] extracts tweets on hashtags such as "#COVID19", "#SARSCoV2" and "#CoronaVirus" over one week using the twitteR API and studies their sentiments, revealing 65.6% negative sentiment. Paper [18] collects nearly 63 million tweets and conducts a machine learning experiment using NLP on two cases: (i) ten topics based on LDA and (ii) sentiment analysis using CrystalFeel. Sentiments on other aspects of Covid-19, such as "lockdown", "stock market and economic impacts", "work from home", "news", "masks" and "Reopening", were discussed in papers [19-25]. Previous studies have thus analysed sentiments on various aspects of Covid-19 and its impacts. To the best of our knowledge, no separate work has been carried out on the global sentiment towards the Covid-19 vaccine release. Moreover, the majority of previous works use automatic text annotation, while our work annotates the tweets manually, which avoids many pitfalls in understanding the emotions of tweets in a particular context.

Data Collection And Labelling
We collected tweets related to Russia's Covid-19 vaccine using the Twitter API with the search terms "#russia", "#russiavaccine" and "vaccine". The tweets were collected on August 11, 2020, the day on which Russia claimed to have a vaccine for the coronavirus.
The initial labelling is done manually for all tweets, with two labels: positive or negative. The data contain a variety of sarcastic tweets. For example, "This tree needs a vaccine, clearly." expresses people's dissatisfaction with the vaccine release. The tweet "Russia has a COVID vaccine. Polonium is so versatile." would generally be labelled positive, since it contains no negative lexicon entries, but it expresses a negative emotion in the context of the vaccine. Another tweet, "This appears to be a 20ml shot of real vodka", is a sarcastic comment on the vaccine.
Labelling such tweets as positive or negative in their particular context is crucial, and getting it wrong often misrepresents the sentiment of the overall text. We have therefore annotated the tweets manually as part of our work.

Sentiment Analysis
Valence Aware Dictionary and sEntiment Reasoner (VADER) is a lexicon specially developed to read the sentiment of social media text, which contains many emojis and short texts [26]. The VADER lexicon is reported to perform strongly in analysing social media text snippets. The polarity scores of the tweets were calculated using the intensity analyzer of the VADER lexicon. The scores cover two major classes, positive and negative. Following that, the compound score of each tweet is calculated and tabulated.
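As a hedged illustration of how polarity scores can be obtained, the sketch below assumes the open-source `vaderSentiment` Python package. The text does not say which VADER implementation was used, so the package choice and the helper name `vader_scores` are assumptions. The import is guarded so that the snippet still runs when the package is not installed.

```python
try:
    # Third-party package (pip install vaderSentiment); an assumption, not named in the text.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
except ImportError:
    SentimentIntensityAnalyzer = None


def vader_scores(text):
    """Return VADER's polarity scores for one tweet, or None if the package is missing."""
    if SentimentIntensityAnalyzer is None:
        return None
    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with the keys 'neg', 'neu', 'pos' and 'compound'.
    return analyzer.polarity_scores(text)


if __name__ == "__main__":
    # Example tweet quoted in the data collection section.
    print(vader_scores("Russia has a COVID vaccine. Polonium is so versatile."))
```

The `compound` entry is the single normalised score per tweet that a threshold rule can then turn into a class label.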
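After computing the compound score $C_s$, the sentiment is assigned as

$$
\text{Sentiment} =
\begin{cases}
\text{Pos} & \text{if } C_s \geq 0 \\
\text{Neg} & \text{if } C_s < 0
\end{cases}
$$

Table 1. Example tweet: Putin added that one of his daughters had already taken it; he said she had a slightly higher temperature after each dose, but that: "Now she feels well."

The sentence-level sentiment analysis of the tweets is carried out as a two-part labelling process. The labels produced by the VADER lexicon are compared with the manual labels by calculating the accuracy. Accuracy, precision and recall [27] are calculated using the following equations, where $TP$, $TN$, $FP$ and $FN$ denote the numbers of true positives, true negatives, false positives and false negatives:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

The F1 score has also been calculated:

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$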
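The labelling rule and the evaluation measures can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names and the five example scores and labels are hypothetical, and the metrics use the standard confusion-matrix definitions with the positive class as the reference.

```python
def label_from_compound(cs):
    """Apply the threshold rule: Pos if Cs >= 0, otherwise Neg."""
    return "pos" if cs >= 0 else "neg"


def evaluate(manual, predicted, positive="pos"):
    """Compare lexicon labels with manual labels for the positive class."""
    pairs = list(zip(manual, predicted))
    tp = sum(m == positive and p == positive for m, p in pairs)
    tn = sum(m != positive and p != positive for m, p in pairs)
    fp = sum(m != positive and p == positive for m, p in pairs)
    fn = sum(m == positive and p != positive for m, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators on tiny or one-sided samples.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical compound scores and manual labels, for illustration only.
compound_scores = [0.6, -0.4, 0.0, 0.3, -0.7]
manual_labels = ["pos", "neg", "neg", "pos", "pos"]
lexicon_labels = [label_from_compound(cs) for cs in compound_scores]
metrics = evaluate(manual_labels, lexicon_labels)
```

Note that a compound score of exactly 0 falls into the positive class under this rule, so neutral-looking tweets are counted as positive.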

Topic Mining
To identify the topics discussed along with the vaccine, we used Latent Dirichlet Allocation (LDA) for topic allocation. The topic model follows a Markov approach, in which the next state of the model depends on the current state [28]. The initial state is set at random, and the best topic for each word is then selected repeatedly.
Consider a document D that contains a number of topics $T_N$. The LDA algorithm has two steps: (i) the generative process and (ii) the topic distribution.
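The iterative procedure described here (a random initial state, followed by repeated reassignment of each word's topic given the current state) can be illustrated with a collapsed Gibbs sampler for LDA. This is a swapped-in teaching sketch and not the library implementation used in the paper: it samples a topic for each word rather than always picking the single best one, and the function names and the hyperparameters `alpha` and `beta` are assumptions.

```python
import random


def lda_gibbs(docs, k=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    # Count tables: document-topic, topic-word and per-topic totals.
    doc_topic = [[0] * k for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(k)]
    topic_total = [0] * k
    # Random initial state: every token receives a random topic.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(k)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight of each topic given all other current assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + v * beta)
                    for t in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]
```

Each sweep depends only on the assignments left by the previous sweep, which is the Markov property mentioned in the text.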
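Dirichlet distribution: A k-dimensional Dirichlet random variable $\theta$ can take values in the (k-1)-simplex. With parameter $\alpha = (\alpha_1, \ldots, \alpha_k)$, $\alpha_i > 0$, the probability density is

$$
p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}
$$

More specifically, we use the topic modelling algorithm to cluster the set of topics based on the word occurrences in the tweets. The algorithm calculates the probability with which each word belongs to a particular topic from the mixture of topics.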
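To make the Dirichlet density concrete, the sketch below evaluates it for a point on the simplex using only the standard library. It assumes the standard form of the density, a normalising ratio of gamma functions times the product of each component raised to its parameter minus one. The function name `dirichlet_pdf` and the example values are illustrative.

```python
import math


def dirichlet_pdf(theta, alpha):
    """Density of a Dirichlet(alpha) distribution at the point theta."""
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have the same length")
    if any(t <= 0 for t in theta) or abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must lie in the interior of the simplex")
    # Work in log space: log Gamma(sum alpha) - sum log Gamma(alpha_i).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Kernel: sum of (alpha_i - 1) * log(theta_i).
    log_kernel = sum((a - 1) * math.log(t) for a, t in zip(alpha, theta))
    return math.exp(log_norm + log_kernel)


# With k = 2 topics and alpha = (1, 1) the density is uniform on the simplex.
uniform_density = dirichlet_pdf([0.5, 0.5], [1.0, 1.0])
# Larger alpha values concentrate mass around balanced topic mixtures.
peaked_density = dirichlet_pdf([0.5, 0.5], [2.0, 2.0])
```

For two topics the simplex is the line segment of mixtures (θ₁, 1 − θ₁), so these values can be checked by hand against the Beta distribution.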

Step 3: Remove stop words in English
Step 4: LDA topic modelling
Step 4.1: Structuring input $G_n$
Step 4.2: Matrix representation
Step 4.3: Extract features F
Step 4.4: Set components and iterations
Step 4.5: Print topics

We conduct the work in two parts: pre-processing the tweets and applying the LDA Gensim algorithm. We consider each tweet as a document D and assign the set of tweets to a corpus C, where $D_1, D_2, D_3, \ldots, D_n \in C$. The corpus is converted to lower case for efficient cleaning and stored in G. The cleaning of tweets follows a tokenization process that includes removing the punctuation and URLs present in each document. Words belonging to the following four categories (verbs: VBP, VB, VBG; nouns: NN; adverbs: RB; adjectives: JJ) are tokenized after stop word removal. The documents, in the form of word sets, have to be converted into a structured input for LDA. We use CountVectorizer for feature extraction, which assigns an integer to each term in the corpus and converts the documents into vectors. Parameter tuning of the LDA algorithm is essential to get accurate results. The dimensionality (number of topics) k is set to 2, since the context of the tweets converges on a single narrow topic, the Covid vaccine.
The number of iterations is set to 200 for evaluation.
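The pre-processing and matrix representation steps can be sketched with the standard library alone. This is an illustrative stand-in, not the authors' pipeline: the tiny stop word list, the function names and the example URL are assumptions, part-of-speech filtering is omitted because it needs a tagger, and `count_matrix` only mimics the term-count output that a vectorizer such as scikit-learn's `CountVectorizer` produces.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop word list; a real run would use a full English list.
STOP_WORDS = {"a", "an", "the", "this", "is", "has", "to", "of", "and", "so"}


def clean(tweet):
    """Lower-case a tweet, strip URLs and punctuation, drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def count_matrix(docs):
    """Map each term to an integer column and count terms per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        rows.append(row)
    return vocab, rows


# Two tweets quoted in the paper; the URL is made up for the example.
tweets = [
    "Russia has a COVID vaccine. https://example.com/x",
    "This tree needs a vaccine, clearly.",
]
corpus = [clean(t) for t in tweets]
vocab, matrix = count_matrix(corpus)
```

The resulting rows are the structured input that a topic model with two components and 200 iterations would consume.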

Discussion And Results
In this section, we discuss the results of the analysis. Firstly, the manually annotated tweets are segregated into two classes, positive and negative, and a graph is plotted to show the percentage of tweets in each class. The graph shows that nearly 90% of the tweets are positive and 10% are negative.
Secondly, the labelled tweet classes are compared with the sentiment lexicon (VADER) and the performance is tabulated. The overall accuracy achieved is 74.2%. The precision, recall and F1-score are tabulated for both the positive and negative classes. Finally, the discussions on the topic of the Covid vaccine are retrieved. The Latent Dirichlet Allocation algorithm extracts the most influential topics from the tweets. The high-probability topics contain words relating to "Safety of the vaccine" and "Trump, Election and Putin".

Conclusion And Future Work
In this work, we collected a sample set of tweets to understand the sentiment towards the Covid-19 vaccine proposed by Russia using machine learning. Topic modelling is carried out with LDA. In future work, this can be extended with advanced sentiment analysis models with specific features, and a comparative analysis can be made with other discussions on vaccines. Topic modelling can also be extended to provide a more detailed analysis.
Abbreviations
LDA: Latent Dirichlet Allocation
VADER: Valence Aware Dictionary and sEntiment Reasoner

Declarations

Availability of data and materials
Data has been collected by the authors themselves using the Twitter API.

Competing interests
The authors declare that they have no competing interests.

Funding
No funding.

Authors' contributions
Author 1 has collected, annotated, analysed and interpreted the data. Author 2 has annotated the data and has checked the accuracy of the analysis.

Topics with high probability words