Vaping-related news events and their relationship with sentiment in the online vaping environment: A computational interrupted time series analysis with large-scale public data

Background: Vaping-related news coverage may have furthered misconceptions around the relative harms of vapes. Also, some positive opinions around vaping may be derived from misinformation, perhaps leading to adverse health outcomes. Thus, we need to study how vaping-related news events (e.g. 2019 vaping illness epidemic, COVID-19) are associated with sentiment in the online vaping environment, to better understand how to promote vaping as a potential harm reduction technique for those who smoke and are unable to quit, and to minimize vape-centric misinformation that could lead to worse health outcomes. Methods: We obtained vaping-related online data through web-scraping several online environments from August 1 2019 to April 21 2020. Sentiment analysis was performed to understand changes in sentiment in the online vaping environment in relation to vaping-related events, such as the Trump administration’s planned ban on flavored vaping products, and when COVID-19 was first reported to the WHO. Results: For all online environments, we observed a statistically significant negative association of 15% (Estimate: -0.16; 95% CI: -0.29, -0.03; P: 0.01) between sentiment score and the Trump administration’s move towards a ban on flavored vaping products, and a statistically significant positive association of 7% (Estimate: 0.07; 95% CI: 0.01, 0.14; P: 0.02) between sentiment score and when COVID-19 was first reported to the WHO (December 31 2019). Conclusions: News events may be related to sentiment in the online vaping environment, depending on the event. Depending on the nature of the event, we suggest that public health messaging may improve health outcomes.

Background
E-cigarette use (vaping) is likely less injurious to health than combustible cigarettes, due to reduced production of toxic chemicals and carcinogens [1,2]. Despite this evidence, many people who smoke in the US perceive e-cigarettes (vapes) to be at least as dangerous to health as combustible cigarettes [1,3]. Such misconceptions may dissuade people who smoke and are unable to quit from switching to vaping [4]. If quitting smoking is not a viable option, switching to vaping may improve overall health outcomes [5,6]. While youth vape use has declined since 2019, its prevalence remains high. As of 2020, 4.5% of US adults and 3.6 million middle and high school students used e-cigarettes [7]. Sales from 2010-2016 show strong early growth followed by considerable slowing over time [8]. In the US, the current consensus is that vaping is not a smoking cessation method, as no vape has been approved by the Food and Drug Administration as a safe and effective cessation product. The US scientific consensus is that vape aerosol contains fewer numbers and lower levels of toxicants than smoke from combustible tobacco cigarettes [9]. However, use of vapes results in dependence on the devices, but with apparently less risk and severity than that of combustible tobacco cigarettes [9]. Among youths, vape use is associated with increased risk for cigarette initiation [10]. Among adults, a Cochrane review found that nicotine vapes probably do help people to stop smoking for at least six months, working better than nicotine replacement therapy and nicotine-free e-cigarettes [11].
News coverage around vaping-related events may have furthered misconceptions around the relative harms of vapes [4]. For example, regarding the recent outbreak of vaping-related lung injury (EVALI), most cases were related to consumption of vitamin E acetate, an additive included in some tetrahydrocannabinol (THC) devices [12]. However, news reports did not always differentiate between THC devices and standard nicotine-based vapes [4], perhaps disproportionately characterizing vaping harms. In addition, in response to EVALI, the Trump administration proposed to ban some nicotine-based vaping products [13], despite most EVALI cases being related to vapes that contained THC and vitamin E acetate [14]. This event was heavily featured in the news, with rising news coverage peaking in September 2019 [15].
More recently, there have been similar news articles and research involving COVID-19. Several studies have indicated that those who vape are more vulnerable to COVID-19 infections or more likely to develop serious complications once contracting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) compared to people who do not vape [16,17,18]. Such information may dissuade people who smoke from switching to vaping, potentially increasing the overall tobacco mortality burden [19]. However, improved opinions around vaping may not always be beneficial to tobacco control. For example, during COVID-19, some individuals had positive sentiment about vaping, usually around vaping cannabidiol (CBD) as a possible COVID-19 cure, derived from incorrect beliefs that vaping may be a COVID-19 treatment [17]. Vaping CBD as COVID-19 treatment is still largely unsubstantiated, and likely a form of misinformation [20,21]. Thus, we need to study how such news events are related to sentiment around vaping, to better understand how to promote vaping as a potential harm reduction technique for those who smoke and are unable to quit [22] and also ensure that positive sentiment around vaping is not born of misinformation, possibly affecting health outcomes of those who vape.
A previous study detailed how exposure to vaping-centric news shaped individuals' normative understandings around real-life vape use [23]. This study used an experiment to show that news articles could influence individuals' perceptions of vaping prevalence [23]. Another study used data on consumer risk perceptions from two surveys conducted before and during EVALI to examine differences in risk perceptions between these periods [24]. This study reported that the first EVALI news event around the Centers for Disease Control and Prevention (CDC) warning consumers to avoid all vaping products [25] increased the proportion of individuals who believed vapes to be harmful. As news around EVALI died down, the perceived risk of vaping decreased [24]. Past work has demonstrated that news sources can influence risk perceptions and normative perceptions around vaping, especially around perceived risks of these products. Risk perceptions are associated with health-related behaviors and thus perceptions around vaping may influence use [26].
Previous research has not explored how news events are associated with sentiment in vaping-related online data from a range of sources, nor explored how COVID-19 news related to vaping is associated with sentiment in online vaping environments. Detailing a large scope of sources is necessary to document the broad range of vaping conversations online and how they are related to news events. Exploring how COVID-19 news events are associated with sentiment around vaping is key to ensuring that positive sentiment around vaping is not driven by inaccurate information, inimical to health outcomes.
We mapped temporal trends (August 1 2019 to April 21 2020) in the online vaping arena using sentiment analysis to show how various news events were related to sentiment around vaping in various online environments (e.g. social media, forums, news media). Sentiment analysis can identify whether expressions in text convey positive or negative opinions. We aim to provide insight around improving health outcomes of people who smoke, amid increased risk perception around vaping [5,6], and improved sentiment around vaping drawn from possibly inaccurate information.

Ethics statement
Approval and informed consent were not needed as all data was publicly available, based on practices in similar past research [27,28].

Data acquisition and processing
Data was obtained using a textual query (web-scraping) which scanned a data pool of approximately 200,000 different US-based domains such as public forum posts, blogs, news articles, message boards, healthcare provider forums and social media (see Supplement for full list). The data collection process followed guidelines established by Kim et al. (2016) [29]. We attempted to collect data from all publicly available sources that addressed vaping, both generalist and specialized. We thus obtained data from generalist sources, such as Facebook posts, and from specialized vaping forums.
Textual queries automatically search indicated sources for text fragments related to keywords, in this case, keywords such as vape, vaping, and e-cigarette (see Appendix for full list). The textual query extracted text fragments (e.g. sentences or paragraphs surrounding each keyword) instead of the full articles or posts. Keywords were drawn from vaping-specific keywords used in systematic and scoping reviews around vaping [30,31]. To validate the accuracy of the textual query in retrieving data regarding vaping, we hand-coded a randomly generated sample of 100 text fragments, maintaining a distribution similar to the distribution of sources. Two coders coded whether these text fragments regarded vaping and we achieved >90% retrieval accuracy. We did not code data for misinformation as our goal was to determine the sentiment of text, not whether it was misinformation or otherwise around vaping. The time period August 1 2019 to April 21 2020 was chosen as it included several key vaping events, especially those related to the US outbreak of e-cigarette product use-associated lung injury (EVALI) in late 2019. It also provided sufficient data to detail news events around COVID-19. As we used a broad range of sources, we likely captured both organic and commercial posts around vaping. The content of such posts is significantly different [32]. We did not account for the difference in content between these different sources as our goal was to capture as much of the online vaping arena as possible and see how sentiment was associated across various online environments, not to differentiate the types of online content. We then processed the data for analysis as follows: 1) duplicate entries were removed; 2) informative text was kept by filtering out entries outside the 10-8000 character range, as text shorter than 10 characters did not normally contain useful information and text longer than 8000 characters tended to be short stories that contained information about vaping; 3) using the keywords in the Appendix, we further subset the entries to ensure that retained content was related to vaping; 4) text in non-English languages, emojis, punctuation, and room reviews from Tripadvisor.com were removed.
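The processing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the keyword tuple is a small hypothetical subset of the Appendix list, the function names are invented for the example, and language detection and the removal of Tripadvisor room reviews are left out because the text does not describe how they were done.

```python
import re

# Hypothetical subset of the Appendix keywords, for illustration only.
KEYWORDS = ("e-cigarette", "electronic cigarette", "vape", "vaping", "e-liquid", "e-juice")

MIN_CHARS = 10    # lower bound of the character range given in the text
MAX_CHARS = 8000  # upper bound of the character range given in the text


def mentions_vaping(text):
    """Return True if the text contains at least one keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def clean_text(text):
    """Drop non-ASCII symbols (e.g. emojis) and punctuation, then collapse whitespace."""
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    no_punct = re.sub(r"[^\w\s-]", " ", ascii_only)  # keep hyphens so "e-cigarette" survives
    return re.sub(r"\s+", " ", no_punct).strip()


def preprocess(fragments):
    """Deduplicate, filter by length, filter by keyword, then clean, in that order."""
    seen = set()
    kept = []
    for fragment in fragments:
        if fragment in seen:  # step 1: remove exact duplicates
            continue
        seen.add(fragment)
        if not (MIN_CHARS <= len(fragment) <= MAX_CHARS):  # step 2: length filter
            continue
        if not mentions_vaping(fragment):  # step 3: keep vaping-related text only
            continue
        kept.append(clean_text(fragment))  # step 4 (partial): strip emojis and punctuation
    return kept
```

Exact-match deduplication is the simplest reading of "duplicate entries were removed"; near-duplicate detection would need a different approach.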

Outcomes of interest
We first assembled a preliminary list of key vaping events based on a review of online vaping forums, editorials, and peer-reviewed vaping research, and consulted experts on tobacco cessation, resulting in a final list of seven events as below. Criteria for event selection were as follows: 1) Minimization of redundancy. If there were related events on September 1, 5, and 19, we selected the September 1 event; 2) Relative importance. We dropped events that were discussed minimally compared to other events on vaping forums and other related media. Events were as follows: 1) CDC announcing an investigation into vaping-related illnesses on August 17 2019 following an outbreak in 14 states [33]; 2) Trump administration plan to ban some vaping products on September 11 2019 [13]. As COVID-19 seems to affect perceptions around vaping, both as a risk factor and a form of treatment, it is possible that the announcement of major developments in the pandemic would be related to sentiment around vaping [17].

Sentiment analysis
Sentiment analysis was performed on the dataset. The unit of analysis was text fragments aggregated by day, regardless of source. Sentiment analysis is the computational study of people's opinions towards entities such as products, services, and events [38]. Sentiment analysis can detect polarity (positive or negative sentiment) in text, in this case within the online vaping environment, but not specific emotions (happiness or sadness) [39]. Sentiment analysis can use a lexicon of positive and negative words and phrases to classify expressions within the data automatically through an algorithm [40]. For example, "Beautiful" and "I like vaping" have positive valences and "Ugly" and "Vaping is bad" have negative valences. In our sentiment analysis model output, a sentiment score from -1 to 1 was provided for each segment of text. Note that sentiment analysis does not code text fragments as pro- or anti-vaping, but whether text has a negative or positive valence. A score close to -1 indicates a highly negative sentiment, and a score close to 1 represents a very positive sentiment. A score around +/-0.5 indicates moderately positive or negative sentiment. A neutral text fragment would have a score of 0 [41]. Sentiment analysis determines the valence of a text without regard to the context, i.e. it does not determine whether the sentiment of the text fragment is related to vaping or a news event. As our data concerned only vaping-related posts, sentiment analysis provided an overview of sentiment within vaping, despite the technique's inability to determine whether text was pro- or anti-vaping. Analysis was conducted using R with the sentimentr package [42]. sentimentr incorporates valence shifters (e.g. negators, amplifiers, and de-amplifiers, also called downtoners) [42]. For example, sentimentr can code the following as having a negative valence: I like it but it's not worth it.
To validate the sentiment analysis results, we handcoded a randomly generated sample of 100 text fragments, maintaining a distribution similar to the distribution of sources. Two coders coded the valence of these fragments and we achieved >80% similarity with the sentiment analysis results. Below are examples of text fragments coded as positive and negative by the algorithm.
Positive: THC is safe to vape people been doing it for ages. The advantage of vaping that I can tell is it being odorless, meaning, you can't smell the weed/cigarette smoke.
Negative: This is no surprise. Anyone who has walked through a cloud of vape knows how bad it is.
I think e-cigs are worse than cigarettes and I'm not opposed to a ban.
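To make the idea of lexicon-based scoring with valence shifters concrete, the sketch below scores text fragments and averages them by day. It is a toy illustration in Python and not the sentimentr algorithm: the lexicon, the shifter lists, the 1.5 and 0.5 multipliers, the square-root scaling, the clipping to [-1, 1], and the use of a daily mean are all assumptions made for the example.

```python
import math
import re
from collections import defaultdict

# Toy lexicon and shifter lists; every entry is an illustrative assumption.
LEXICON = {"like": 1.0, "good": 1.0, "safe": 1.0, "bad": -1.0, "worse": -1.0, "stupid": -1.0}
NEGATORS = {"not", "no", "never"}
AMPLIFIERS = {"very", "really"}
DEAMPLIFIERS = {"slightly", "somewhat"}


def score_fragment(text):
    """Return a polarity score in [-1, 1]; 0.0 means no polarized word was found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    total = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token)
        if polarity is None:
            continue
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATORS:        # "not bad" flips the sign
            polarity = -polarity
        elif previous in AMPLIFIERS:    # "very bad" strengthens the word
            polarity *= 1.5
        elif previous in DEAMPLIFIERS:  # "slightly bad" weakens the word
            polarity *= 0.5
        total += polarity
    scaled = total / math.sqrt(len(tokens))  # dampen the effect of fragment length
    return max(-1.0, min(1.0, scaled))


def daily_mean_scores(records):
    """records: iterable of (date_string, text). Returns {date_string: mean score}."""
    by_day = defaultdict(list)
    for day, text in records:
        by_day[day].append(score_fragment(text))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}
```

The shifter check only looks at the immediately preceding word, which is much cruder than a context window; it is enough to show why "Vaping is not bad" and "Vaping is bad" receive opposite signs.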

Statistical analysis
We used an interrupted time series design with segmented regression analysis to determine immediate and longer-term impacts of individual news events on sentiment, controlling for pre-existing trends. The unit of analysis (dependent variable) was sentiment scores per day. Interrupted time series is the strongest quasi-experimental design to assess longitudinal effects of time-delimited treatments or interventions [43]. This design was appropriate as data was collected at multiple time points and we wanted to detect if a treatment (news events) had a significantly greater effect than another underlying trend [44]. We first conducted a visual examination of the pattern of the time series by plotting it and generating autocorrelation and partial autocorrelation plots. No seasonal patterns were identified. Autocorrelation was tested with the Durbin-Watson test. Nonstationarity was identified using the augmented Dickey-Fuller test and corrected through differencing. An autoregressive integrated moving average (ARIMA) model was fit against a white noise series generated from the stationarized data to determine optimal model parameters. The model included binary variables for events (0=dates before the event, 1=dates after the event), time (1 was denoted for the first day and numbered sequentially after), and interaction terms between each event and time, as is standard for interrupted time series analysis using segmented regression [45]. We used Google Trends and US-based hospitalizations from vaping as control variables. These control variables may address underlying factors possibly influencing perceptions around vaping. By considering a broader picture of what may influence sentiment around vaping, we can better test claims relating to the association between specific events and sentiment. We used internet search query trends as these may reveal what people are potentially thinking or doing based on the content and timing of their queries [46].
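The regressors described above (time, a binary event indicator, and their interaction) can be illustrated with a small Python sketch for a single event. It is a simplified stand-in rather than the study's model: it fits plain ordinary least squares through the normal equations, omits the ARIMA error structure and the control variables, and assumes the indicator switches to 1 on the event day itself. The function names are hypothetical.

```python
def design_row(t, event_day):
    """Regressors for day t (1-indexed): intercept, time, event, event x time."""
    event = 1.0 if t >= event_day else 0.0  # assumption: the event day counts as "after"
    return [1.0, float(t), event, event * t]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic 40-day series with a known level and slope change on day 21.
rows = [design_row(t, event_day=21) for t in range(1, 41)]
sentiment = [0.2 + 0.01 * r[1] - 0.16 * r[2] + 0.005 * r[3] for r in rows]
intercept, slope, level_change, slope_change = fit_ols(rows, sentiment)
```

One design point worth noting: when the interaction is built from raw time, the event coefficient is the level shift extrapolated back to time zero; centering time at the event day would make it the shift at the event itself.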
Regarding Google Trends, we tracked Google search interest (trends.google.com) originating from the US that mentioned terms regarding vaping (see Appendix for full list of terms). Searches were obtained from August 1 2019 -April 21 2020 to allow for historical trends to inform our sentiment analysis. Search interest represented search interest relative to the highest point in the given region and time period. A value of 100 represented peak popularity for the term. A value of 50 meant that the term was half as popular. A score of 0 meant there was not enough data for that term. Google search interest was derived using the rates provided by Google. We summed search interest for all search terms to result in a search interest value for each day. We derived hospitalization from vaping data by summing the number of individuals hospitalized with lung injury associated with e-cigarette use or vaping in the US on a particular week, from CDC data dated March 31 2019 -February 15 2020 [47]. We then repeated the above analyses dividing the data by online environment (blogs, comments, Facebook posts, forums, news), to determine if there was a differential effect of news events on sentiment score for each online environment. We calculated 95% confidence intervals for the association of each event with sentiment score. We only reported results where the key independent variable and its corresponding interaction term had P < 0.05. Analysis was conducted using R with the following packages: tseries, forecast and lmtest [48,49,50].
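Two of the bookkeeping steps in this paragraph, summing search interest across terms for each day and applying the reporting rule on P values, can be sketched as follows. This Python snippet is illustrative only: the input layout (one row per date and term) and both function names are assumptions, and the example does not query Google Trends.

```python
from collections import defaultdict


def daily_search_interest(rows):
    """rows: iterable of (date_string, term, interest), interest on the 0-100 scale.

    Returns {date_string: interest summed over all terms}.
    """
    totals = defaultdict(float)
    for day, term, interest in rows:
        if not 0 <= interest <= 100:
            raise ValueError(f"interest for {term!r} on {day} is outside 0-100")
        totals[day] += interest
    return dict(totals)


def is_reportable(p_event, p_interaction, alpha=0.05):
    """Reporting rule from the text: both the event term and its interaction need P < alpha."""
    return p_event < alpha and p_interaction < alpha
```

Because each term's interest is scaled relative to its own peak, the summed value is best read as a composite index rather than a search volume.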
Results
Table 1 reports the estimates of the interrupted time series design with segmented regression analysis across various key vaping events. For all online environments, we observed a statistically significant negative association of 15% (Estimate: -0.16; 95% CI: -0.29, -0.03; P: 0.01) between sentiment score and the Trump administration's move towards a ban on flavored vaping products, possibly representing a deterioration in sentiment around vaping. This meant that the valence of text was negative at the time of the Trump vaping ban, and the text valence was correlated with that news event. Conversely, we observed a statistically significant positive association of 7% (Estimate: 0.07; 95% CI: 0.01, 0.14; P: 0.02) between sentiment score and when COVID-19 was first reported to the WHO, perhaps indicating increased positive sentiment around vaping. The valence of the text was negative when COVID-19 was reported to the WHO, but rose to eventually become positive shortly after. To ensure the statistically significant findings were not chance results, we conducted validity checks. For each key vaping event, we examined a randomly generated sample of 50 associated text fragments, maintaining a distribution similar to the distribution of sources, for the time period around the event. We found that text fragments tended to mirror the results of the sentiment analysis. For example, when examining text samples around the Trump administration's plan to ban flavored vapes, we found several documents, coded negative by sentiment analysis, expressing a negative stance toward the Trump decision:
Instead banning flavored e-cigarettes we need to ban Donald J. Trump as president. Along with Moscow Mitch.
Precisely... freedom to choose. Cigarettes and alcohol are legal. Six people die from THC vapes and Trump wants to ban all vaping. It's stupid.
Yea Trump ban e-cigarettes but keep up approval of the sale of automatic weapons their only causing multiple deaths. Great priorities. People using ecigarettes do it by choice. But those massacred by automatic weapons don't have a choice or say. Get a brain Trump.
Similarly, when detailing a sample of text around Chinese government disclosure of COVID-19 to the WHO, we found multiple documents, positively coded via sentiment analysis, indicating vaping cannabidiol (CBD) as a possible COVID-19 cure or protective agent, possibly indicative of the improved sentiment when COVID-19 was first reported to the WHO. Example text fragments regarding how vaping CBD can prevent/treat COVID-19:
COVID-19 deaths invariably involve a "cytokine storm", an excessive, unchecked immune system response. Cannabinoids from cannabis, CBD in particular, can lower cytokine production naturally. Research needed asap!
Why hemp cbd flowers and a vaporizer are the best covid-19 prepping tools.
REDACTED applauds the use of CBD during the coronavirus outbreak.
No statistically significant associations were found between the other five events and sentiment scores for all online environments taken together. While not statistically significant, there seemed to be a decrease in sentiment scores after Massachusetts banned vaping products (September 24 2019).
Regarding forums, there was a statistically significant 40% negative association between sentiment score and the Trump administration's move towards a ban on flavored vaping products (Estimate: -0.37; 95% CI: -0.52, -0.22; P < 0.001). There was also a 9% positive association between sentiment score and COVID-19 being reported to the WHO (Estimate: 0.09; 95% CI: 0.02, 0.15; P: 0.01). For news media, there was a 95% positive association between sentiment score and the WHO declaring COVID-19 a pandemic (Estimate: 0.67; 95% CI: 0.10, 1.24; P: 0.02). We verified that the statistically significant findings for individual online environments were not chance results using similar techniques as above. News events seemed to have similar associations with sentiment scores of different online environments.

Discussion
We reported two main findings. First, for all online environments taken together, there was a relationship between increased negative sentiment in the online vaping environment and the Trump administration's move to ban flavored vaping products. Second, there was an association between improved sentiment in the online vaping environment and COVID-19 first being reported to the WHO. Recent research indicated that vaping-related news events can increase risk perceptions around vaping [23,24], but has not detailed how events are related to sentiment in online vaping spaces. The possible Trump ban on vaping may have increased risk perceptions around vaping and thus decreased sentiment in the online vaping environment. We believe that misinformation around vaping devices as a possible COVID-19 treatment [17] may have improved sentiment in the online vaping sphere during COVID-19. We propose that only these two events were salient to individuals in the online vaping arena and thus associated with statistically significant shifts in sentiment. The other events, such as the CDC investigation into vaping-related illnesses, may not have produced a similar effect on sentiment as individuals did not see an immediate threat to vaping products, unlike the planned Trump ban on vaping products. Similarly, the Massachusetts ban on vaping products may not have been salient as individuals were possibly already affected by the previous Trump decision on vaping products.
There is limited research on how news events relate to sentiment around vaping, especially in response to COVID-19 news. Previous work suggested that disproportionately negative news around vaping may dissuade people who smoke and are unwilling to quit from moving to vaping, likely impacting health outcomes [4]. We expand on such studies, bolstering the need for balanced views on vaping, demonstrating that news events casting vaping in a negative light may increase negative sentiment around vaping, perhaps discouraging some people who smoke from switching to vapes. A recent review detailed the role of misinformation on public health outcomes [51], and we expand on past work by providing evidence on how large-scale events may create misinformation in the health sphere. We also provide evidence that some news events may be associated with improved sentiment around vaping, particularly on social media, perhaps encouraging some individuals who smoke to make the switch. However, these improvements in sentiment may be related to misinformation around COVID-19. Improvements in sentiment around vaping may come with misinformation around the subject, buttressing the need for evidence-based public health messaging around vaping. While improved sentiment around vaping may be beneficial for those who smoke and are unable to quit, we must ensure that such sentiment is not drawn from inaccurate information, perhaps impacting the effectiveness of vaping-centric tobacco control mechanisms.
The strength of our work is the use of computational methods to explore how news events are associated with sentiment in the online vaping arena, in a range of online environments. Such outcome measurement is central to understanding how news events shift sentiment regarding vaping, allowing for accurate public health messaging around vaping when such events arise. Accurate public health messaging around vaping may improve health outcomes in two ways. Firstly, it can encourage some people who smoke to switch to vaping, improving health outcomes, even when news outlets display inordinately negative coverage regarding vaping. Secondly, such messaging can prevent misinformation that may lead to worse health outcomes, such as in the case of COVID-19. For example, relying on COVID-19 misinformation, vapers who develop the condition may use vape-administered treatments, associated with reduced immune system functioning [21], perhaps heightening COVID-19 disease progression. Without accurate information, those who vape may share devices to administer unsubstantiated COVID-19 treatments, a possible site of transmission [17], perhaps increasing COVID-19 spread. We also note that messaging by public health authorities which supports vaping may inadvertently decrease risk perceptions among young people or non-smokers, who may then be more likely to try vaping, increasing their risk [52,53]. We do not support discouraging news that warns about the risks of e-cigarettes, but rather careful consideration of the balance of risks related to promoting vaping as a harm reduction approach.
Public health authorities can also conduct interventions to balance the rhetoric of news events. Interventions that ask respondents to judge information accuracy around vaping [54,55] may nudge individuals toward accurate information regarding vaping during news events which possibly distort vaping perceptions. Thus, our results may aid those who smoke and cannot quit in switching to vaping, minimize COVID-19-related misinformation around vaping, and mitigate further mischaracterizations of vaping, perhaps improving health outcomes of tobacco users. Future research can detail how some news events have a greater effect on sentiment around vaping compared to others, and address how best to intervene around disproportionate responses to vaping news events.
Our findings relied on the validity of data collected through the textual query. We searched a broad range of online media and our data contained text fragments which represented the news events in our analysis. We are thus confident in the comprehensiveness of our data. A key limitation is that we cannot say with certainty that vaping-related news events caused a shift in sentiment in the online vaping space or whether there were other underlying factors. We provide strong correlational evidence, but cannot make causal claims. We also were not able to adjust for other possible confounders. A large proportion of vaping posts on social media may be generated by bots [56], and we did not account for such posts as our goal was not to identify misinformation but to code for vaping sentiment. We may have missed slang terms for vaping and perhaps underestimated the online vaping arena. Our data was drawn from August 1 2019 to April 21 2020 and we were not able to explore the influence of news events before or after this period. Results suggested a shift in sentiment in the online vaping arena correlated with certain relevant events reported in the news, but the method did not allow a differentiation between the sentiments and perceptions of people who smoke and those of dual users, those who vape, and people who previously smoked.

Conclusions
Overall, we indicated that news events may be associated with either positive or negative sentiment in the online vaping environment. Depending on the nature of the event, we suggest that public health messaging may either ensure that those who smoke and wish to quit are not dissuaded from switching to vaping, or reduce incidences where those who vape injure themselves through unsubstantiated vape-related COVID-19 treatments. Findings have implications for the management of risk perceptions around vaping to improve health outcomes of tobacco users. Information-based policy instruments can be applied to balance the negative effects of news events that may create disproportionately negative vaping perceptions.

Appendix
Full list of search terms for textual query and Google search
e-cigarette, electronic cigarette, electronic cigarettes, electronic nicotine delivery, vape, vaping, electronic nicotine delivery system, personal vaporizer, vape pen, electric cigarette, electric nicotine delivery system, electric nicotine delivery device, e-hookah, e-juice, e-liquid.

Declarations
Ethical Approval and Consent to participate
Approval and informed consent were not needed as we used an anonymized dataset. Yale University IRB committee guidelines waived the need for informed consent and ethical approval. Research was performed in accordance with the Declaration of Helsinki. This study was pre-registered on the Open Science Framework (OSF.IO/HZVJB).

Consent for publication
Not applicable

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
Navin Kumar declares financial support through a grant from the Foundation for a Smoke-Free World, a US nonprofit 501(c)(3) private foundation with a mission to end smoking in this generation. The Foundation accepts charitable gifts from PMI Global Services Inc. (PMI); under the Foundation's Bylaws and Pledge Agreement with PMI, the Foundation is independent from PMI and the tobacco industry. There are no financial relationships with any other organizations that might have an interest in the submitted work in the previous three years; and no other relationships or activities that could appear to have influenced the submitted work. All remaining authors do not declare any conflicts of interest.

Tables
Table 1 Estimates of the interrupted time series design with segmented regression analysis across various key vaping events, for various online environments