This section presents our findings on how major aviation events affect the performance of US airline stocks. We apply the previously introduced approach to extract topics from aviation news. We hypothesize that some aviation news affects the performance of US airline stocks, and we identify the topics that trigger significant stock price changes. Section 4.1 introduces the aviation news corpus used in the analysis. Section 4.2 presents the topics extracted with Latent Dirichlet Allocation (LDA), and Section 4.3 investigates each topic's impact on the market prices of US airlines.
4.1 Corpus
We construct the corpus of aviation news as follows. First, we downloaded aviation news from Aviation Voice and retained the articles published between the beginning of January 2016 and the end of December 2019; scraping the website yielded 1,716 articles. We chose this time frame so that the corpus is not dominated by news about a single market event and so that it covers several significant events in aviation. We then applied the following filtering steps: articles with fewer than 100 words were removed, only English-language articles were kept, and unnecessary characters were stripped from the texts. After filtering, 1,549 articles remained. Inspecting the data at this stage is important for deciding whether further preprocessing is needed before training the model.
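The filtering steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the paper does not say which language detector was used or which characters counted as "unnecessary", so `looks_english` is a hypothetical, deliberately crude stand-in based on the share of common English function words (a dedicated language-identification library could be passed in through `is_english` instead), and `clean_text` assumes that anything other than letters, digits, basic punctuation, and whitespace is removed.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical heuristic, not the paper's method: a text counts as English if
# enough of its alphabetic tokens are common English function words.
_EN_STOPWORDS = {"the", "and", "of", "to", "in", "a", "is", "for",
                 "on", "that", "with", "was", "as", "by", "at"}


def looks_english(text: str, min_share: float = 0.05) -> bool:
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return False
    share = sum(t in _EN_STOPWORDS for t in tokens) / len(tokens)
    return share >= min_share


def clean_text(text: str) -> str:
    # Assumed definition of "unnecessary characters": replace everything except
    # letters, digits, basic punctuation and whitespace, then collapse spaces.
    text = re.sub(r"[^A-Za-z0-9.,;:!?'()\s-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def filter_corpus(articles: Iterable[str], min_words: int = 100,
                  is_english: Callable[[str], bool] = looks_english) -> List[str]:
    kept = []
    for raw in articles:
        if len(raw.split()) < min_words:   # step 1: drop short articles
            continue
        if not is_english(raw):            # step 2: keep English articles only
            continue
        kept.append(clean_text(raw))       # step 3: strip unwanted characters
    return kept


english = "the aircraft ✈ landed safely at the airport " * 20
short = "the plane"
german = "Flugzeug Landung Flughafen " * 50
corpus = filter_corpus([english, short, german])
```

With these toy inputs only the long English article survives, and the non-ASCII symbol is removed from it.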
We collect financial data from Investing.com. We retrieve the Dow Jones US Airlines Index (DJUSAR) and, as the benchmark, the Standard & Poor's 500 Index (S&P 500) for the same period as the aviation news (2016 to 2019), so that the financial and news data are consistent. We chose DJUSAR because it covers the US airlines on which this research mainly focuses. The sample ends in December 2019 to exclude the COVID-19 pandemic, which affected the entire market. We collect daily stock returns for all trading days during the sample period.
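The abnormal returns used later can be derived from the two price series. The paper does not state its expected-return model, so the sketch below assumes a simple market-adjusted model (abnormal return = DJUSAR daily return minus S&P 500 daily return on the same day); a market model estimated by regression would be an equally plausible choice. The prices shown are made-up toy values, not Investing.com data.

```python
import pandas as pd


def daily_returns(prices: pd.Series) -> pd.Series:
    """Simple daily returns from closing prices indexed by trading date."""
    prices = prices.sort_index()
    return (prices / prices.shift(1) - 1.0).dropna()


def market_adjusted_abnormal_returns(index_prices: pd.Series,
                                     benchmark_prices: pd.Series) -> pd.Series:
    """Abnormal return = index return minus benchmark return (assumed model)."""
    r_index = daily_returns(index_prices)
    r_bench = daily_returns(benchmark_prices)
    # Keep only trading days present in both series so returns are same-day.
    r_index, r_bench = r_index.align(r_bench, join="inner")
    return (r_index - r_bench).rename("abnormal_return")


# Toy example with made-up prices.
dates = pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"])
djusar = pd.Series([100.0, 102.0, 101.0], index=dates)
spx = pd.Series([2000.0, 2010.0, 2010.0], index=dates)
ar = market_adjusted_abnormal_returns(djusar, spx)
```

On the first return day the index gains 2% while the benchmark gains 0.5%, giving an abnormal return of 1.5%.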
4.2 Topic Extraction from Aviation News
Aviation news articles do not carry a topic label or code that identifies the theme of their content. Part of the analysis is therefore to infer the topics by applying Latent Dirichlet Allocation (LDA) to the document-word matrix built from the aviation news corpus. All data processing and modeling were performed in Python in a Jupyter notebook.
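A document-word matrix of the kind described above can be built with scikit-learn's `CountVectorizer`. This is a sketch, not the paper's exact preprocessing: the stop-word list, the three-letter minimum token length, and the `min_df`/`max_df` defaults are assumptions, and the lemmatization the paper appears to apply (its vocabulary contains forms such as "datum") is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer


def build_document_word_matrix(texts, min_df=1, max_df=1.0):
    """Return the fitted vectorizer and the sparse document-word count matrix."""
    vectorizer = CountVectorizer(
        lowercase=True,
        stop_words="english",                   # drop common English function words
        token_pattern=r"(?u)\b[a-zA-Z]{3,}\b",  # alphabetic tokens of 3+ letters
        min_df=min_df,                          # ignore terms in fewer than min_df documents
        max_df=max_df,                          # ignore terms in more than this share of documents
    )
    dtm = vectorizer.fit_transform(texts)
    return vectorizer, dtm


texts = [
    "The pilot training program expanded maintenance courses.",
    "Boeing delivered new aircraft to the airline.",
    "The airport reported a delay after the engine issue.",
]
vectorizer, dtm = build_document_word_matrix(texts)
```

Each row of `dtm` is an article and each column a vocabulary term, which is the input LDA expects.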
The most important tuning parameter for LDA models is n_components (the number of topics). Many methods exist for choosing the optimal number of topics, such as empirical likelihood (Li and McCallum, 2006), marginal likelihood (Newton and Raftery, 1994; Griffiths and Steyvers, 2004; Wallach, 2006), perplexity (Blei et al., 2003), and hierarchical Dirichlet processes (Teh et al., 2006). In this paper, we apply a grid search over the number of topics. The search selects ten topics, and we extract the top ten words for each topic.
Before validating the topics obtained from topic modeling, we must label them. Automatic labeling methods exist (Mei et al., 2007), but they are not suitable for this study, because labeling requires domain knowledge of aviation. To ensure labeling quality, topic model researchers commonly infer and label topics manually (Chang et al., 2009). We likewise label each topic manually and assign it a meaningful name based on its terms and on the content of its documents. For example, topic two is named Aviation Training and Maintenance because its top ten stemmed terms are “pilot,” “training,” “aviation,” “maintenance,” “airline,” “say,” “program,” “need,” “flight,” and “service.” Assigning each article the topic with the highest probability is reasonable because, for most articles, that topic has a probability close to 100%. The inferred topics are stored in the data frame. Table 1 below shows the top ten words for each topic together with its assigned label.
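The manual labels and the claim that most articles have a dominant-topic probability close to 100% can be recorded and checked in code. In this sketch the label dictionary copies Table 1 (keys are 1-based to match the table), while the 0.9 threshold and the toy document-topic matrix are hypothetical values chosen for illustration.

```python
import numpy as np

# Manually assigned labels from Table 1 (key k corresponds to Topic k).
TOPIC_LABELS = {
    1: "Aviation Technology and Fuel",
    2: "Aviation Training and Maintenance",
    3: "Aircraft Accidents and Incidents",
    4: "Aircraft Order and Delivery",
    5: "Aviation Safety",
    6: "Air Force and Defense",
    7: "Air Traffic Controller",
    8: "Air Travel Cost",
    9: "Air Travel Demand",
    10: "Airports",
}


def share_confidently_assigned(doc_topic, threshold=0.9):
    """Fraction of documents whose highest topic probability is >= threshold."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    return float(np.mean(doc_topic.max(axis=1) >= threshold))


# Toy document-topic matrix: three of four documents are dominated by one topic.
toy = np.array([[0.95, 0.03, 0.02],
                [0.60, 0.30, 0.10],
                [0.01, 0.98, 0.01],
                [0.02, 0.02, 0.96]])
share = share_confidently_assigned(toy)
```

A share near 1 on the real document-topic matrix would support using the single most probable topic per article.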
Table 1. Inferred dominant topics by keywords and highest probability.
| | Word 1 | Word 2 | Word 3 | Word 4 | Word 5 | Word 6 | Word 7 | Word 8 | Word 9 | Word 10 | Assigned topic |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Topic 1 | aircraft | flight | engine | test | say | aviation | design | technology | program | fuel | Aviation Technology and Fuel |
| Topic 2 | pilot | training | aviation | maintenance | airline | say | program | need | flight | service | Aviation Training and Maintenance |
| Topic 3 | flight | aircraft | plane | say | crew | passenger | report | crash | airport | engine | Aircraft Accidents and Incidents |
| Topic 4 | aircraft | boeing | max | order | airbus | airline | boee | delivery | say | airplane | Aircraft Order and Delivery |
| Topic 5 | aviation | faa | safety | drone | say | use | issue | pilot | process | datum | Aviation Safety |
| Topic 6 | air | force | fighter | defense | lockheed | jet | state | mission | aircraft | japan | Air Force and Defense |
| Topic 7 | say | engine | whitney | lufthansa | traffic | controller | air | house | fee | pratt | Air Traffic Controller |
| Topic 8 | jet | business | company | charter | cost | travel | ita | price | hour | plane | Air Travel Cost |
| Topic 9 | year | air | market | growth | demand | increase | airline | passenger | business | grow | Air Travel Demand |
| Topic 10 | airline | flight | airport | passenger | carrier | service | fly | route | delta | air | Airports |
Once the LDA model is fitted, each article is passed through the same preprocessing and vectorization steps used for training before its topic is predicted. We then assign a dominant topic to each document in the data frame, defined as the topic with the highest probability. Table 2 below shows the predicted topic for each aviation news article in the original dataset. In Section 4.3 (Stock Market Response across Topics), each news document is matched with the corresponding DJUSAR abnormal return.
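Assigning the dominant topic amounts to taking the arg-max of each row of the document-topic matrix. The sketch below stores the topic probabilities, the 1-based dominant topic, its probability, and optionally its label in a pandas data frame; the column names are hypothetical, and the toy probabilities stand in for the output of `lda.transform(vectorizer.transform(texts))`.

```python
import numpy as np
import pandas as pd


def assign_dominant_topic(doc_topic, labels=None):
    """One row per document: topic probabilities, dominant topic (1-based)
    and the probability of that dominant topic."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    df = pd.DataFrame(doc_topic,
                      columns=[f"Topic{k + 1}" for k in range(doc_topic.shape[1])])
    df["dominant_topic"] = doc_topic.argmax(axis=1) + 1  # 1-based, as in Table 1
    df["topic_prob"] = doc_topic.max(axis=1)
    if labels is not None:
        df["topic_label"] = df["dominant_topic"].map(labels)
    return df


# Toy probabilities; in practice: lda.transform(vectorizer.transform(texts)).
doc_topic = [[0.95, 0.03, 0.02],
             [0.10, 0.20, 0.70]]
labels = {1: "Aviation Technology and Fuel",
          2: "Aviation Training and Maintenance",
          3: "Aircraft Accidents and Incidents"}
result = assign_dominant_topic(doc_topic, labels=labels)
```

Keeping the full probability columns alongside the dominant topic makes it easy to export the table (for example with `DataFrame.to_excel`) and to inspect borderline documents.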
Table 2. Predicted topics for the original dataset
We use a popular interactive visualization package to better understand the relationships between the topics and to interpret individual topics. The visualization lets us select a topic and view its most relevant terms for different values of the λ parameter. Figure 3 below shows the topic visualization with λ = 1 and topic two selected. We explore the Intertopic Distance Plot to learn how the topics relate to each other and whether there is a higher-level structure among groups of topics. The area of each circle represents the prevalence of the topic in the entire news corpus, and the distance between circle centers reflects the similarity of topics: the closer two circles are, the more similar the topics. The bar chart on the right lists the 30 most relevant terms for the selected topic (topic two) together with their estimated frequencies.
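The λ parameter controls how terms are ranked within a topic. Assuming the package follows the LDAvis relevance definition, the relevance of term w to topic t is λ·log φ(t,w) + (1 − λ)·log(φ(t,w) / p(w)), where φ(t,w) is the term's probability within the topic and p(w) its marginal probability in the corpus. With λ = 1, as in Figure 3, terms are ranked purely by within-topic probability; smaller λ favors terms that are distinctive to the topic. The sketch below computes this ranking from made-up inputs.

```python
import numpy as np


def term_relevance(phi, term_freq, lam):
    """Relevance for every (topic, term) pair.

    phi: (n_topics, n_terms) topic-term probabilities, strictly positive.
    term_freq: (n_terms,) corpus-wide term counts.
    lam: weight in [0, 1]; lam = 1 ranks terms by phi alone.
    """
    phi = np.asarray(phi, dtype=float)
    p_w = np.asarray(term_freq, dtype=float)
    p_w = p_w / p_w.sum()                  # marginal term probabilities
    log_phi = np.log(phi)
    log_lift = log_phi - np.log(p_w)       # log(phi / p_w)
    return lam * log_phi + (1.0 - lam) * log_lift


def top_terms(phi, term_freq, vocab, topic, lam=1.0, n=30):
    """The n most relevant terms of one topic (0-based topic index)."""
    rel = term_relevance(phi, term_freq, lam)[topic]
    return [vocab[i] for i in np.argsort(rel)[::-1][:n]]


# Made-up two-topic, three-term example.
phi = [[0.5, 0.3, 0.2],
       [0.1, 0.1, 0.8]]
term_freq = [10, 5, 85]
vocab = ["pilot", "training", "airport"]
by_probability = top_terms(phi, term_freq, vocab, topic=0, lam=1.0, n=3)
by_lift = top_terms(phi, term_freq, vocab, topic=0, lam=0.0, n=3)
```

Here "pilot" ranks first by probability, but "training" ranks first by lift because it is rarer in the corpus overall.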
4.3 Stock Market Response across Topics
We now analyze how DJUSAR’s cumulative abnormal returns respond to the disclosed news across the extracted topics. To investigate the magnitude of this relationship, we compute cumulative abnormal returns (CARs) over 3, 5, 7, 15, and 21 days. Each news article is assigned to a topic through its topic distribution: every document has a probability distribution over topics, and its dominant topic is the topic with the highest contribution. Figure 5 below shows the topic probability distribution per document. For example, topic one contributes 95% to document one while the other topics contribute nearly 0%, so the model assigns topic one as the dominant topic of document one. In this way, the model estimates the probability of each topic for every document in the news corpus and assigns the dominant topic accordingly.
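A CAR over a k-day window is the sum of daily abnormal returns over that window. The paper does not define where the window starts, so this sketch assumes day 0 is the first trading day on or after the news date and that the window covers k consecutive trading days; articles too close to the end of the sample get a missing value. The abnormal-return series is made up.

```python
import pandas as pd


def cumulative_abnormal_return(ar: pd.Series, event_date, window: int) -> float:
    """Sum of abnormal returns over `window` trading days starting at the event.

    Assumption: day 0 is the first trading day on or after the news date.
    """
    ar = ar.sort_index()
    start = ar.index.searchsorted(pd.Timestamp(event_date))
    segment = ar.iloc[start:start + window]
    if len(segment) < window:
        return float("nan")  # not enough trading days left in the sample
    return float(segment.sum())


def car_table(ar: pd.Series, news_dates, windows=(3, 5, 7, 15, 21)) -> pd.DataFrame:
    """One row per news date, one CARxx column per window length."""
    rows = {f"CAR{w:02d}": [cumulative_abnormal_return(ar, d, w) for d in news_dates]
            for w in windows}
    return pd.DataFrame(rows, index=pd.to_datetime(news_dates))


# Made-up abnormal returns for five consecutive business days.
ar = pd.Series([0.01, -0.02, 0.03, 0.0, 0.01],
               index=pd.bdate_range("2016-01-04", periods=5))
cars = car_table(ar, ["2016-01-03", "2016-01-06"], windows=(3, 5, 7))
```

A news item on Sunday 3 January 2016 is mapped to Monday 4 January, so its 3-day CAR is 0.01 − 0.02 + 0.03 = 0.02.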
We export all documents and their topic probability distributions (Figure 5) from the topic model to an Excel file. We then transform the dominant topic variable into indicator variables, one for each topic, and calculate the cumulative abnormal returns over 3, 5, 7, 15, and 21 days. In the multiple regression models, the topic indicators are the explanatory variables and the CAR is the dependent variable. This design shows how the dominant topics are associated with CAR.
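The regression design can be sketched with pandas indicator variables and ordinary least squares. Because Table 4 reports a coefficient for every one of the ten topics, this sketch assumes one indicator per topic and no intercept (including both would make the design matrix singular); under that assumption each coefficient equals the mean dependent variable for that topic's articles. Standard errors are computed with the classical OLS formula; a library such as statsmodels (`sm.OLS(y, X).fit()`) would also report p-values. The data are made up.

```python
import numpy as np
import pandas as pd


def topic_regression(dominant_topic: pd.Series, y: pd.Series) -> pd.DataFrame:
    """OLS of y on one indicator per topic, without an intercept (assumed design)."""
    X = pd.get_dummies(dominant_topic, prefix="Topic", dtype=float)
    Xm = X.to_numpy()
    yv = y.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(Xm, yv, rcond=None)
    resid = yv - Xm @ beta
    n, k = Xm.shape
    sigma2 = resid @ resid / (n - k)                       # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xm.T @ Xm)))
    return pd.DataFrame({"coef": beta, "se": se, "t": beta / se}, index=X.columns)


# Made-up example: six articles, three topics, dependent variable y.
dominant = pd.Series([1, 1, 2, 2, 3, 3])
y = pd.Series([0.01, 0.03, -0.02, 0.0, 0.05, 0.07])
res = topic_regression(dominant, y)
```

In this toy case the coefficients are the topic means (0.02, −0.01, 0.06), each with a standard error of 0.01.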
Table 3. Results of multiple regression models
To reduce the skewness of the dependent variable, we apply a log transformation to CAR in the models reported in Table 3. We estimate five models, using CAR over 3, 5, 7, 15, and 21 days as the dependent variable, to determine the impact and the magnitude of the topic indicators.
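The exact transformation is not stated. Because CARs can be negative and the logarithm of a negative number is undefined, this sketch assumes a shifted transform, log(1 + CAR), and includes a sample skewness function to check that the transform reduces right skew. The CAR values are made up.

```python
import numpy as np


def log_car(car):
    """Shifted log transform log(1 + CAR) (assumed; the paper only says 'log')."""
    car = np.asarray(car, dtype=float)
    if np.any(car <= -1):
        raise ValueError("log(1 + CAR) requires CAR > -1")
    return np.log1p(car)


def skewness(x):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


car = np.array([0.0, 0.01, 0.02, 0.5, 1.5])  # made-up, right-skewed values
raw_skew = skewness(car)
log_skew = skewness(log_car(car))
```

Because the logarithm compresses large values, the skewness of this toy sample falls from about 1.16 to about 0.96.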
As a robustness check, we log-transform both the dependent and independent variables and find that the results remain qualitatively similar. The results of the robustness tests are reported in Table A1.
As shown in Table 3, the Aviation Technology and Fuel topic (Topic1) and the Aviation Training and Maintenance topic (Topic2) are statistically significant at the 10% level in explaining the 3-day CAR. For example, the Aviation Technology and Fuel topic has a coefficient of 0.278, and (exp(0.278) − 1) × 100 ≈ 32.05. Because the dependent variable is log-transformed and the topic variable is an indicator, this suggests that an article belonging to this topic (the indicator switching from 0 to 1) is associated with a CAR about 32% higher. The Aviation Technology and Fuel topic is also positively significant in explaining CAR over 15 and 21 days.
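The percentage interpretation of a coefficient on an indicator variable in a log-linear model is (exp(β) − 1) × 100. The short helper below reproduces this arithmetic for the Topic1 coefficient above and for the Topic8 coefficient (−1.044) discussed later in this section; the function name is illustrative.

```python
import math


def pct_effect(coef: float) -> float:
    """Percentage change in the untransformed dependent variable when an
    indicator switches from 0 to 1 in a model with a log-transformed outcome."""
    return (math.exp(coef) - 1.0) * 100.0


topic1_effect = pct_effect(0.278)   # ≈ 32.05
topic8_effect = pct_effect(-1.044)  # ≈ -64.80
```

Note that a negative coefficient yields a negative percentage, so the sign must be carried through when reporting the effect.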
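Table 4 summarizes the signs of the estimated coefficients from the five models for straightforward interpretation: "+" and "-" mark significantly positive and negative coefficients, and blank cells indicate no significant effect.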
Table 4. Summary of multiple regression models

| | CAR03 | CAR05 | CAR07 | CAR15 | CAR21 |
|---|---|---|---|---|---|
| Aviation Technology and Fuel | + | | | + | + |
| Aviation Training and Maintenance | + | + | | | + |
| Aircraft Accidents and Incidents | | | - | | |
| Aircraft Order and Delivery | | | - | | + |
| Aviation Safety | | | | | + |
| Air Force and Defense | | - | - | | |
| Air Traffic Controller | | | | + | + |
| Air Travel Cost | | | - | | |
| Air Travel Demand | | | | | + |
| Airports | | | | | + |
Table 4 shows that each of the ten topics is a significant determinant of CAR in at least one event window. The 3- and 5-day windows are considered short term, the 7-day window medium term, and the 15- and 21-day windows long term. Overall, we observe a positive link between CAR and the topics related to technology and fuel and to training and maintenance. In the medium term, topics related to accidents and incidents, defense, and costs are negatively associated with cumulative abnormal returns.
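A sign summary like Table 4, and the grouping of windows into horizons described above, can be generated from coefficient and p-value tables. In this sketch the 10% significance cut-off is an assumption (the text reports 10% significance for the 3-day results but does not state the level used in Table 4), and the coefficient and p-value tables are made-up numbers, not the paper's estimates.

```python
import numpy as np
import pandas as pd

# Window-to-horizon mapping as described in the text.
HORIZON = {"CAR03": "short", "CAR05": "short", "CAR07": "medium",
           "CAR15": "long", "CAR21": "long"}


def sign_table(coef: pd.DataFrame, pval: pd.DataFrame, alpha: float = 0.10) -> pd.DataFrame:
    """'+' or '-' where the coefficient is significant at alpha, '' otherwise."""
    signs = pd.DataFrame(np.where(coef.to_numpy() > 0, "+", "-"),
                         index=coef.index, columns=coef.columns)
    return signs.where(pval < alpha, "")


def significant_horizons(signs: pd.DataFrame, topic: str) -> list:
    """Sorted list of horizons in which a topic has any significant coefficient."""
    row = signs.loc[topic]
    return sorted({HORIZON[window] for window, s in row.items() if s})


# Made-up coefficients and p-values for two topics and three windows.
idx = ["Topic1", "Topic8"]
coef = pd.DataFrame({"CAR03": [0.5, 0.2], "CAR07": [0.1, -0.5], "CAR21": [0.4, 0.0]}, index=idx)
pval = pd.DataFrame({"CAR03": [0.05, 0.6], "CAR07": [0.5, 0.02], "CAR21": [0.08, 0.9]}, index=idx)
signs = sign_table(coef, pval)
```

Generating the table programmatically keeps the summary consistent with the underlying regression output when models are re-estimated.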
Topic1 (technology, fuel, aircraft, engine) and Topic2 (pilot, training, maintenance) are positively associated with stock returns in both the short and the long run. Given the log-transformed dependent variable, an article on one of these topics is associated with a CAR higher by (exp(β) − 1) × 100 percent, where β is the estimated coefficient. Notably, these estimated coefficients are slightly larger in the long run, indicating a relatively stronger association with stock returns over longer windows.
Topic4 (delivery, aircraft, Boeing, max), Topic5 (safety, FAA), Topic7 (air, traffic, controller), Topic9 (market, growth, passenger, demand), and Topic10 (flight, airport, service, route) have delayed effects on stock returns, as these indicators are significantly positive only in the long-term windows.
Topic3 (passenger, aircraft, crew, airport, crash), Topic6 (defense, fighter, mission, force), and Topic8 (business, company, travel, cost, price) have significant negative effects on stock returns in the medium term. For example, Topic8 has a coefficient of −1.044, and (exp(−1.044) − 1) × 100 ≈ −64.80: an article belonging to Topic8 (the indicator switching from 0 to 1) is associated with a CAR about 65% lower.