Does Twitter affect Stock Market Decisions? Financial Sentiment Analysis in Pandemic Seasons: A Comparative Study of H1N1 and COVID-19

Background: Investors are always playing with the fears and desires of buyers and sellers, and stock exchange markets are no exception. Financial sentiment analysis allows us to understand the effect of reactions and emotions on social media in the stock market. In this research, we analyze Twitter data and financial indices to answer the question: How does the polarity generated by posts on Twitter influence the behavior of financial indices in pandemic seasons? Methods: The study is based on the sentiment analysis of influential Twitter accounts in this field and its relationship with the behavior of important financial indices. To achieve this, we tested four lexicons to detect polarity on Twitter. Results: Our findings show that the period in which the markets reacted was 6 to 13 days after the information was shared and disseminated on Twitter in the COVID-19 season, and 1 to 2 days in the H1N1 season. Furthermore, we found that the lexicons that obtained the best results for sentiment analysis on Twitter were S140 and AFINN. Conclusions: Financial sentiment analysis is an important technique for forecasting the stock market, and polarity is the most widely used technique in the financial area. There is a relationship between the polarity on Twitter and the behavior of financial indices. The most influential Twitter accounts during the pandemic season were The New York Times, Bloomberg, CNN News, and Investing, which presented a very high relation between sentiments on Twitter and stock market behavior.


Introduction
Financial markets are based on expectations. As stated earlier, investors are always playing with the fears and desires of buyers and sellers. Kiymaz [17] studied 355 favorable tweets, agreeing that rumors related to earning expectations and purchases by foreign investors generated a more significant impact on stock prices. Cruz and Gómez [9] confirmed Kiymaz's [17] opinion and added that investors make decisions based on rumors, betting on the credibility of the media that disclose them, even though they know that the information is not always reliable. This trend grows with the use of online social media networks such as Twitter, Facebook, Instagram, and YouTube, which increase the exposure of topics and ideas through their platforms, impacting the stock market. Emergencies are special cases, where the amount of available information is reduced and decisions have to be made on that basis. Recently, the COVID-19 pandemic changed our lives, altered the balance among stock markets around the world, and created an unstable environment for several weeks. The same thing happened 11 years ago with the pandemic caused by another virus, H1N1.
Emotions and sentiments are inherent in investors' reactions and decision-making, learning, communication, and awareness [26]. Understanding emotions is one of the most important aspects of personal development and growth, and it is paramount to the emulation of human intelligence. Thus, the processing of emotions is important for the advancement of artificial intelligence and is a task closely related to the detection of polarity and emotions [6]. One of the most widely used techniques for sentiment analysis is machine learning, which has allowed us to analyze large amounts of data. In this regard, data based on video, image, audio, and text are analyzed in big data settings, complementing the analysis with relational databases. The potential of this type of technique has extended to different fields, such as social media, where sentiment analysis or opinion mining is applied to finding patterns of behavior, identifying emotions and trends, and even inducing people's decisions.
Recently, research on sentiment analysis in finance has increased, but very few studies use social media datasets. However, it is important to identify signs or patterns that guide investors when selecting the assets that will make up their portfolios, trying to ensure that the prices of the selected stocks follow a trend similar to or above the prices of the corresponding indices. Therefore, investors must have tools that allow them to consider as much of the available information as possible for a technical and fundamental analysis that leads them to reduce risk and uncertainty as much as possible and to maximize returns.
The information shared on social media is a source that generates sentiments and reactions in investors, which influence their decisions to buy or sell financial assets, especially in the stock market [16].
News shared on social media, whether true or not, causes changes in the trends of international stock market indices. Some studies confirm the relationship between sentiments generated by social media in investors and market trends, and this research continues this path in pandemic seasons.
The purpose of this investigation is to compare these emergency events with similar characteristics: a health situation, epidemic distribution, worldwide attention, and the use of social media platforms that amplify the diffusion of rumors and news that alter investors' perceptions of their stock exchange.
The question guiding the document is: How does the polarity generated by posts on Twitter influence the behavior of financial indices in pandemic seasons? To answer this question, our analysis proposes a methodology based on the sentiment analysis of influential Twitter accounts in the financial field and its relationship with the behavior of important financial indices. We analyzed two crucial moments in global financial crises: the H1N1 pandemic of 2009 and the COVID-19 pandemic of 2020.
The document is divided into five sections. In the first section, we present a literature review based on affective computing and sentiment analysis in finance. In the second section, we present our methodology, based on the analysis of the evolution of the selected financial indices and the sentiment analysis of Twitter. The third section describes the findings. The fourth section is the discussion of results. Finally, we present the conclusions of the study and future work.

Literature Review
The literature review consists of three sections. The first covers the literature on affective computing and sentiment analysis; the second, works related to affective computing and finance; and the final section describes research on sentiment analysis and indices.

Affective Computing and Sentiment Analysis
Emotions are fundamental for successful and effective communication between human beings. In fact, in many everyday situations, emotional intelligence is more important than intelligence quotient (IQ) for a successful interaction [4]. A modern and massive source for finding people's sentiments is social media; for instance, the content published on Facebook and Twitter shows the emotions and sentiments of people in response to any kind of event.
Affective computing and sentiment analysis are multidisciplinary areas that include academics and professionals from different disciplines, such as computer science, cognitive science, political science, economics, and finance [5,26]. In particular, sentiment analysis is an area of artificial intelligence that allows us to extract and analyze data stored in social media, images, sounds, and videos. Sentiment analysis allows finding patterns or characteristics in the information published in large data sets, which can be very useful for decision-making in organizations, political movements, business strategies, marketing campaigns, and product preferences, among others [26,27]. Sentiment analysis reveals personal opinions towards entities such as products, services, or events, which can affect organizations and companies by improving their marketing, communication, production, and acquisition [18].
Cambria et al. [6] argued that sentiment analysis had been confused with the task of polarity detection: from negative to positive sentiments. However, this is only one of many natural language processing problems that must be solved to achieve human-like performance in sentiment analysis. Thus, when performing sentiment analysis, we encounter a complex problem of natural language processing that can be analyzed from different emotional or psychological models [5,23].
The opportunity to automatically capture the sentiments of the public about social events, political movements, marketing campaigns, and product preferences has raised a growing interest both in the scientific community, for the exciting open challenges, and in the business world, for the remarkable repercussions on marketing and financial market prediction [7].
This has led to the emergence of the fields of affective computing and sentiment analysis, which take advantage of human-computer interaction, information retrieval, and multimodal signal processing to distill people's emotions from the growing amount of social data online.

Sentiment Analysis and Finance
As argued in the previous section, sentiment analysis is the computational study of people's opinions, assessments, attitudes, moods, and emotions. It has become one of the most active areas of research in natural language processing, data mining, information retrieval, and Web mining [21]. In recent years, its research and applications have also extended to the managerial and social sciences because of their importance for businesses, governments, and society.
Affective computing and sentiment analysis have great potential to improve the capacity of customer relationship management and referral systems, for example, by revealing which characteristics clients enjoy or by excluding from referrals items that received a negative response. Business and financial intelligence is also a major factor in companies' interest in affective computing and sentiment analysis [4].
The rapid progress of the Internet has had a significant impact on the financial field. How to quickly and accurately extract key information from negative financial texts has become one of the key issues for investors and decision-makers [34]. Investors have always been interested in stock price forecasting; however, little research has been done on social media and its effects on forecast results in pandemic seasons. In this regard, Day and Lee [10] argued that financial sentiment analysis is an important area of financial technology (FinTech) research. Chen and colleagues [8] found that social media sentiment associates strongly with stock returns and identified the importance of social media as an additional channel to reflect stock prices.
Some studies analyze Twitter data for sentiment analysis applied to finance. Souza et al. [31] investigated the relationship between the sentiment and volume of data on Twitter and the profitability and volatility of the stock market, finding that social media are a valuable source for analyzing the financial dynamics of the retail sector, even when compared to major news outlets such as The Wall Street Journal and the Dow Jones Newswires. Ahmed [2] argued that investors' reaction to a financial event, in the form of stock price movement, reflects the severity of the event and that the words used in the published news show a similar degree of emotion.
Gomez, Prado, and Plaza [14] generated an algorithm based on artificial intelligence that uses investors' sentiment to open long or short positions in Ibex 35 futures. To measure investors' sentiment, the authors used semantic analysis algorithms that qualified as good, bad, or neutral any communication related to the Ibex 35 made on Twitter or in the media [28]. The authors demonstrated that by assessing investor sentiment, they could predict the evolution of the Ibex 35 and improve on the Sharpe ratio, success rate, and profit factor of traditional trading systems.
In the financial sector, the application of sentiment analysis has focused on identifying the relationship between upward and downward financial trends and sentiment polarity, which varies from positive to negative. Thus, Devitt and Ahmad [12] analyzed how financial news texts affect investors' judgment, which impacts transaction volumes, stock prices, volatility, and even future company earnings, by classifying the information obtained into positive and negative. De la Orden, Martínez, and Vianez [11] found differences in the sentiments transmitted by specialized and general Spanish digital media. They also observed that the media transmitted little sentiment and that it was mainly negative.
The data generated on social media have become a valuable resource for the analysis of sentiment in the financial field, as they have proved to be extremely important for market research companies and public opinion organizations. In this regard, Atzeni, Dridi, and Recupero [3,13] suggested a continuous polarity scoring approach in the range [-1, +1] to identify bullish and bearish sentiment associated with companies and stocks, finding that the ontological framework-based resource could be successfully applied for sentiment analysis within the financial domain. They also achieved better results than traditional methods of sentiment analysis that do not incorporate semantics.
Smailović et al. [29] used predictive sentiment analysis to forecast closing stock price movements via Twitter feeds, finding that the polarity of sentiment can indicate stock price movements a few days in advance.
According to Zhang et al. [33], the daily happiness generated on Twitter influences the performance and volatility of international indices. In that paper, they examined the relationships between the everyday feeling of happiness on Twitter and stock market performance in 11 international stock markets, dividing this feeling of happiness into quintiles from the least happy to the happiest days. First, they found that the correlation coefficients between the feeling of happiness and the index performance in the happiest subgroups 4 and 3 were higher than in the last 2 and 3 happiness subgroups. Second, the sentiment of happiness could provide additional explanatory power for the index return in the happiest subgroup. Third, daily happiness can increase changes in the index performance for most stock markets. Fourth, they found that the index return and the range-based volatility of the happiest subgroup are higher than those of other subgroups.

Sentiment Analysis and Social Media in the Stock Market
Sentiment analysis and its effect on the stock market have been the subject of many studies by scholars around the world [15]. Kearney and Liu [16] argued that future research in finance depends on the availability of more accurate and efficient sentiment measures and more wide-ranging studies. Chen et al. [8] found that social media sentiment is strongly associated with stock returns and highlighted the importance of social media as an additional channel where stock prices are reflected.
Ruiz-Martinez and colleagues [28], pioneers of studies on sentiment analysis of the financial market, proposed an algorithm for opinion extraction from financial news, developing an ontology to understand this object of study. Another contribution to the field found that the financial community behaves similarly to a small-world network and identified critical nodes to analyze their influence [35]. The authors used a novel sentiment analysis algorithm to measure tweet messages from the nodes and discovered that the resulting sentiment is significantly correlated with the returns of the major financial market indices. Souza and colleagues [31] also supported the idea that social media is valuable for the analysis of financial dynamics, analyzing Twitter sentiment and volume together with stock returns and volatility in the retail industry.
Liew et al. [19] trained several standard machine learning classification algorithms and found that support vector machines (SVM) generate the most useful predictive sentiments. They also found a strong positive relation between tweet sentiments and daily market returns, and a weaker negative correlation with next-day market returns. Another contribution, from Liew and Budavari [20], identified a social media factor showing that tweet sentiments have significant power in explaining the contemporaneous time-series variation in daily stock returns.
The most recent research comes from Sohangir et al. [30], who use deep learning to analyze big data in StockTwits using long short-term memory, doc2vec, and convolutional neural networks [25]. Their findings support the argument that deep learning and convolutional neural networks are the best models to predict the sentiment of authors in StockTwits data sets. The above-mentioned studies frame our object of study and our argument for analyzing financial markets using different tools of artificial intelligence.
Finally, studies similar to ours that use index data such as Standard & Poor's (S&P) started with Sul and colleagues [36]. They collected Twitter posts related to the firms in the Standard & Poor's 500 index and compared them with the average daily stock market returns. They found that the cumulative emotional valence of tweets about a specific firm was significantly related to that firm's stock returns. Another finding concerned the number of followers.
While the emotional level of tweets from users with many followers (more than the median) had a stronger impact on same-day returns, the emotional level of tweets from users with few followers had a stronger impact on future stock returns (10-day returns).
Sun and Fabozzi [32] tested their model using the majority of stocks listed on the S&P 500 index. They did not evaluate sentiment but analyzed textual information from microblogs, correlating movements of stock prices with social media content. All of this is evidence that there are different perspectives in the study of financial sentiment analysis for understanding two main domains: social media and finance.

Methodology
This section describes the methodology of the study, which is based on the comparison of different lexicons to determine the polarity on Twitter and its relation to the behavior of financial indices. It is divided into two stages: first, the description of the analyzed stock market data and the selected Twitter accounts; second, the process of extracting and analyzing data to determine polarity.

Description of Data
The methodology is based on sentiment analysis applied to Twitter accounts that influence stock market behavior. To achieve this, tweets were downloaded at two important moments: the H1N1 pandemic and the COVID-19 pandemic. For the H1N1 season, the period considered was from June to July 2009, and for COVID-19, from January to May 2020. These periods were selected starting from the time when the first stock market index peaked and began to fall, that is, the point at which the pandemics started to influence the indices, causing prices to fall to the lowest level reached as a result of the pandemic, and they cover the period of financial disruption and uncertainty. Because the COVID-19 pandemic has been more damaging to the financial market, we extended the Twitter data analysis period to May 2020.
The financial indices were selected because they were the most representative and influential of each continent and because they include companies from different sectors, which gives a general perception of their performance and behavior.
The financial data consisted of adjusted closing prices, with which we identified where each of the financial indices in the world reached its maximum and minimum, first in the H1N1 influenza period and then in the COVID-19 period. Once the dates of the maximum and minimum prices were identified in the stock market behavior, the prices from January 2009 to May 2020 were downloaded from the Yahoo Finance site and Investing.com, and the percentage of loss of each of the indices was calculated (Table 2). The adjusted closing prices were selected because they represent the closing price and the corporate earnings to which a shareholder is entitled upon acquiring the share. The adjusted closing price is used when analyzing historical returns and is calculated by the stock exchange to which the index belongs.
To select the Twitter accounts, we considered those that included specialized publications in the areas of financial markets, finance, and economy. Additionally, we considered the media where experts in the field commented, as well as Internet sites that published general and international news that affected the world's population and therefore the sentiment of investors. In this manner, the Twitter accounts selected for the sentiment analysis consisted of accounts of individuals, companies or organizations, and news broadcasters that were influential or important for finance. Some of the financial influencers do not have a Twitter account, so we downloaded tweets from the company or organization they run, if there was one; furthermore, some Twitter users did not have an account in 2009 (Table 3).
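As an illustration of the percentage of loss computed from the maximum and minimum prices, the following minimal sketch assumes that the loss is measured from the peak of a price series to the lowest price reached after that peak. The function name `percentage_loss` and the sample prices are hypothetical and are not taken from the study's data.

```python
def percentage_loss(adjusted_closes):
    """Percentage drop from the peak price to the lowest price after the peak.

    Assumption for illustration: the series is ordered by date and the
    trough is searched only from the peak onward.
    """
    if not adjusted_closes:
        raise ValueError("at least one price is required")
    # Position of the maximum price in the series.
    peak_index = max(range(len(adjusted_closes)), key=adjusted_closes.__getitem__)
    peak = adjusted_closes[peak_index]
    # Lowest price reached at or after the peak.
    trough = min(adjusted_closes[peak_index:])
    return (peak - trough) / peak * 100.0


# Hypothetical price series: peak of 120, later minimum of 60.
example_prices = [100.0, 120.0, 90.0, 60.0, 80.0]
example_loss = percentage_loss(example_prices)
```

Restricting the search for the minimum to dates after the peak keeps an earlier low from being mistaken for a pandemic-related drop; other definitions of loss are equally possible.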

Data Extraction and Analysis
Tweets were analyzed to determine their semantic orientation using a lexicon-based approach. The lexicons used were Bing Liu [22], Sentiment140 [24], NRC [24], and AFINN [1]. In some cases, more than one tweet was published by a person or institution on the same date, so we averaged the polarity strengths of these tweets. Before computing these polarities, the texts of the tweets were pre-processed by removing numbers, punctuation symbols, and stop words.
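The pre-processing and averaging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the toy lexicon, the stop-word list, and the function names are hypothetical, and the polarity of a tweet is assumed to be the sum of the lexicon scores of its words.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; a real run would load the scores of an actual
# lexicon such as Bing Liu, Sentiment140, NRC, or AFINN.
TOY_LEXICON = {"gain": 2.0, "rally": 2.0, "fear": -2.0, "crash": -3.0, "loss": -2.0}
# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "as"}


def preprocess(text):
    """Lowercase the text and remove numbers, punctuation, and stop words."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    return [word for word in letters_only.split() if word not in STOP_WORDS]


def tweet_polarity(text, lexicon):
    """Sum the lexicon scores of the words in a tweet (assumed scoring rule)."""
    return sum(lexicon.get(word, 0.0) for word in preprocess(text))


def daily_polarity(tweets, lexicon):
    """Average the polarity of all tweets published on the same date.

    `tweets` is an iterable of (date, text) pairs.
    """
    by_date = defaultdict(list)
    for day, text in tweets:
        by_date[day].append(tweet_polarity(text, lexicon))
    return {day: mean(scores) for day, scores in by_date.items()}


sample_tweets = [
    ("2020-03-09", "Markets crash as fear spreads!!! 2020"),
    ("2020-03-09", "A rally in the afternoon"),
    ("2020-03-10", "Stocks gain 5%"),
]
sample_polarity = daily_polarity(sample_tweets, TOY_LEXICON)
```

In this sketch the two tweets dated 2020-03-09 score -5.0 and 2.0, so the polarity for that date is their average, -1.5.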
We started from the assumption that the latent relationship between the behavior of the financial indices and the polarity of sentiments in tweets does not always occur on the same day; that is, this relationship could be present a few days before a publication on Twitter or a few days later.
To investigate the latent relationship, we computed the correlation between the adjusted closing prices and the components of the polarity vector defined above. The method is as follows: we first computed an offset or date shift for the polarity vector. The date shifts can be forward or backward. In the first case (forward date shift), the offset is a positive integer; in the second case (backward date shift), it is a negative integer. The correlation is then calculated considering the coincidence of the dates between the adjusted closing prices and the components of the polarity vector with shifted dates (see Figure 1).
Finally, in order to identify the Twitter posts that had the greatest influence on the financial indices, we searched for the value of the offset (from shift date = -7 to shift date = +7) that produced the most significant correlation in absolute value. When the dates of the adjusted closing prices and the components of the polarity vector with shifted dates did not coincide, the correlation was defined as not available (NA).
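The date-shift search described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure used in the study: the Pearson coefficient is assumed as the correlation measure, a positive shift is assumed to move polarity dates forward (a tweet on day d is compared with the price on day d + shift), `None` stands in for NA, and the `min_points` threshold and all function names are hypothetical.

```python
from datetime import date, timedelta
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when it cannot be computed."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def shifted_correlation(prices, polarity, shift, min_points=3):
    """Correlate prices with polarity after shifting polarity dates by `shift` days.

    `prices` and `polarity` map datetime.date to float. Only dates present in
    both series after the shift are used; too few coincidences yield None (NA).
    """
    shifted = {day + timedelta(days=shift): value for day, value in polarity.items()}
    common = sorted(set(prices) & set(shifted))
    if len(common) < min_points:
        return None
    return pearson([prices[d] for d in common], [shifted[d] for d in common])


def best_shift(prices, polarity, min_shift=-7, max_shift=7):
    """Return (shift, correlation) with the largest absolute correlation, or None."""
    best = None
    for shift in range(min_shift, max_shift + 1):
        r = shifted_correlation(prices, polarity, shift)
        if r is not None and (best is None or abs(r) > abs(best[1])):
            best = (shift, r)
    return best


# Hypothetical data: polarity anticipates the price by exactly two days.
values = [10.0, 12.0, 9.0, 14.0, 13.0, 8.0, 15.0, 11.0, 16.0, 7.0]
demo_prices = {date(2020, 3, 1) + timedelta(days=i): v for i, v in enumerate(values)}
demo_polarity = {date(2020, 3, 1) + timedelta(days=i): values[i + 2] for i in range(8)}
demo_best = best_shift(demo_prices, demo_polarity)
```

Requiring a minimum number of coinciding dates avoids trivially perfect correlations computed from only two points; the threshold of three is an arbitrary choice made for this sketch.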

Results
The analysis of data was performed on a computer with the following characteristics. Table 5 shows an example of a perfect correlation between polarities and financial indices found in the @business data set. We identified the best correlations to understand financial market behavior and sentiment analysis (Table 6). Because no correlations were obtained for date shifts between -7 and 7 for the @NYtimes data set, we decided to extend this interval.

Discussion
Sentiment analysis has the potential to be an important tool to improve decision-making in finance [5,26], and the analysis of polarity on Twitter has the potential to predict the financial market [29]. The information shared on Twitter affects investors' reactions and therefore affects stock indices, either positively or negatively, in terms of upward or downward trends. Negative sentiments lead to lower risk tolerance [11].
One application of understanding the emotions on social media consists of predicting the behavior of financial indices, diminishing the risk and uncertainty of investors, since investment portfolios can be formed with shares that follow the tendency of the indices, diversifying through the selection of shares from different world markets and relying on the availability of more accurate and efficient sentiment measures and more wide-ranging studies [16].
Some of the analyzed Twitter accounts were unrelated to the financial indices for several reasons: 1) they had few followers; 2) they published very little each day or did not publish at all; 3) the publications were very specific and could affect a single stock or sector, and therefore were not perceived in the index; 4) the Twitter account did not publish relevant information for the financial market. For these reasons, some Twitter accounts were not included in the correlation analysis.
High correlations were found for the Investing, Bloomberg, and CNN Business accounts. One of the reasons is that they had a considerable number of followers (Investing 168,000, CNN Business 1.8 million, and Bloomberg 6.4 million) and an average of 10 publications per day. Another Twitter account with high correlations was The New York Times (46.8 million followers), because it covered financial and economic topics as well as topics such as music, culture, sports, art, and entertainment, which generated different kinds of sentiments in investors [33], and because it is a traditional means of communication among them.
We found that in 2009 the information posted on Twitter about H1N1 was practically non-existent, unlike the information about COVID-19 in 2020. Accordingly, our findings show that some of the Twitter posts had a more significant effect in the COVID-19 era than in the H1N1 era. However, the effect took more days to appear, compared to the moderate effect in 2009; this was because in 2020 the use of Twitter was intensive and more accounts published information about COVID-19 and its effects on finances.
The sentiment of investors could also predict the evolution of indices a few days in advance [33]. For the data of this investigation, the period in which the markets reacted in the COVID-19 season was 4 to 13 days after the information was shared and disseminated on Twitter, unlike the H1N1 season, in which it was 1 to 2 days [17].
Some studies reflect the importance of investors' emotions in financial markets [8]. In this research, we confirmed the argument of Ahmed [2] regarding investors' reaction to a financial event in the form of stock price movement, but applied to Twitter, as the reactions generated by Twitter posts have an effect on the financial indices days later. An important factor influencing the sentiments found on Twitter is the sharp drop in the financial market caused by COVID-19, in addition to the fact that its duration and its impact on the economy and health have been much greater than those of H1N1.
We also confirm the argument of Sul and colleagues [36] that the number of followers on Twitter influences the performance of financial indices. Regardless of whether a publication is accurate, it alters the feelings of a larger number of investors, pushing them to seek the strategy that reduces risk. We verified that The Wall Street Journal [31] is an important source for understanding the behavior of the stock market. Moreover, we found three more sources: CNN News, Bloomberg, and Investing.com. We also found in our analysis that the lexicons that obtained the best results for sentiment analysis on Twitter were S140 and AFINN.

Conclusions
This investigation aims to study the effect of the polarity of Twitter posts on the behavior of world indices in pandemic seasons. Most previous studies have been carried out under normal conditions; that is, they have not analyzed times of crisis, pandemics, or other disruptive economic events.
We proposed a correlation matrix that contains the polarity on Twitter and the financial market behavior, in order to study their relationship through the effect, displaced in days, between sentiment analysis and the adjusted closing prices. Our findings show some important relationships days after the posts were made.
We found that the period in which the financial markets reacted was 6 to 13 days after the information was shared and disseminated on Twitter in the COVID-19 season, and 1 to 2 days in the H1N1 season. We also identified the inverse relation, in which information about the financial market was posted on Twitter four days later.
We also found that in the 2009 pandemic there was practically no spread of information about H1N1 through Twitter and that several of the users considered influential did not have an account in 2009. More data and information related to COVID-19 were spread in 2020. The most influential Twitter accounts in the COVID-19 season were The New York Times, Bloomberg, CNN News, and Investing, which presented a very high relation between polarity on Twitter and stock market behavior.
We are aware that several factors influence market behavior. In this research, we found that the effect of social media publications was more significant in the times of COVID-19 than in those of H1N1. First, there were more active Twitter accounts in 2020. Second, there is virtually no published information about H1N1 in the Twitter accounts that existed in 2009. Third, the drop in the indices in the COVID-19 era was more dramatic because there were more speculation, rumors, and negative news regarding the 2020 pandemic. Fourth, there is an important effect of sentiments on Twitter on financial indices a few days after their publication.