3.1 Bayesian Network (BN)
A Bayesian network (BN) is based on Bayesian statistical methods and can fit nonlinear and complex interactions among variables (Sener et al., 2019). BNs have been explored extensively in earlier studies, especially in risk- and uncertainty-related applications, because they can identify factors and assess their impact under different what-if scenarios. On the one hand, a BN uses probabilistic reasoning to capture the characteristics and distributions of the network, which is consistent with the nature of risk analysis (Aven, 2017; Shortridge et al., 2017). On the other hand, a BN provides a graphical representation of complex relationships among factors. This helps decision makers identify critical components, because the model visualizes how the impact of a variable propagates, that is, how a change in a variable of interest affects other variables across the network (Kabir & Papadopoulos, 2019). Owing to these features and advantages, previous studies have used BNs to assess critical components and to forecast risk (Adedipe et al., 2020; Ale et al., 2014; Aven, 2016). For instance, Simsekler and Qazi (2020) identified organizational factors in patient safety errors to help managers prioritize them according to their importance.
A BN is a probabilistic graphical model based on a directed acyclic graph (DAG). The model comprises a set of nodes representing variables and arcs representing either causal relationships or statistical dependencies among connected variables (Hanea et al., 2018). The strength of these dependencies is captured through probability distributions. The directed links (A→B, B→D, C→D) in Fig. 1 represent the dependencies between the indicators and the target variable, and prior beliefs about uncertain variables are updated efficiently as evidence from different sources in the network is observed.
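To make the factorization and evidence updating concrete, the following Python sketch encodes the Fig. 1 structure (A→B, B→D, C→D) with binary nodes and answers queries by exact enumeration over the joint distribution. All probability values are hypothetical placeholders, not estimates from this study, and enumeration is only practical for very small networks; dedicated BN software would normally use more efficient inference algorithms.

```python
from itertools import product

# Hypothetical CPTs for the structure A -> B, B -> D, C -> D (binary nodes).
P_A = 0.3                                                    # P(A = 1)
P_B = {0: 0.2, 1: 0.7}                                       # P(B = 1 | A)
P_C = 0.4                                                    # P(C = 1)
P_D = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}  # P(D = 1 | B, C)


def bern(p1, value):
    """Probability of a binary value given P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def joint(a, b, c, d):
    # DAG factorization: P(A, B, C, D) = P(A) P(B | A) P(C) P(D | B, C)
    return bern(P_A, a) * bern(P_B[a], b) * bern(P_C, c) * bern(P_D[(b, c)], d)


def query(target, evidence):
    """Exact P(target = 1 | evidence) by summing the joint over all worlds."""
    names = ("A", "B", "C", "D")
    num = den = 0.0
    for values in product((0, 1), repeat=4):
        world = dict(zip(names, values))
        # Skip assignments that contradict the observed evidence.
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target] == 1:
            num += p
    return num / den
```

With these placeholder numbers, `query("D", {})` gives a prior P(D = 1) of about 0.374, observing A = 1 raises it to about 0.528, and observing D = 1 updates the belief in A = 1 from 0.3 to about 0.42.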
This method has one major limitation. A BN model can be constructed from empirical data or from expert judgment (Hanea et al., 2018); however, in most scenarios data are scarce or even unavailable, so expert judgment is the more common approach in the existing literature, although it is quite challenging (Werner et al., 2017). To explore the application of BNs to health protective behavior with a unique survey dataset, this study used a data-driven approach to estimate the conditional probability distributions and develop our BN model.
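A data-driven estimate of a conditional probability table can be obtained by counting how often each child state occurs under each configuration of its parents. The sketch below assumes records are dictionaries of already-discretized states; the variable names `pm25` and `sales` are hypothetical, and the optional additive smoothing parameter `alpha` is an assumption rather than a setting reported in the source.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, child, parents, states, alpha=0.0):
    """Estimate P(child | parents) from a list of dict records by counting.

    alpha = 0 gives the maximum-likelihood estimate; alpha > 0 adds
    additive (Laplace) smoothing, which is an assumption here.
    """
    counts = defaultdict(Counter)
    for r in records:
        key = tuple(r[p] for p in parents)  # parent configuration
        counts[key][r[child]] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + alpha * len(states)
        cpt[key] = {s: (c[s] + alpha) / total for s in states}
    return cpt


# Hypothetical discretized records.
records = [
    {"pm25": "high", "sales": "high"},
    {"pm25": "high", "sales": "high"},
    {"pm25": "high", "sales": "low"},
    {"pm25": "low", "sales": "low"},
]
cpt = estimate_cpt(records, "sales", ["pm25"], ["low", "high"])
```

Only parent configurations that appear in the data receive an entry; with sparse data, a positive `alpha` avoids zero probabilities for unseen child states.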
3.2 Dynamic Bayesian Networks (DBN)
Another limitation of BNs is that they are static and therefore unsuitable for representing dynamic relationships among variables. A DBN extends the static BN with a time dimension, which allows it to capture temporal relationships and to predict the future probability distribution of a variable through predictive inference. DBNs are now widely used to model dynamic processes in many fields, such as resilience assessment (Tong et al., 2020) and reliability assessment (Rebello et al., 2018). When consumers face unexpected events, protective consumption behavior is susceptible to a variety of factors: irrational herd buying tends to emerge under collective stress, causing abrupt changes in demand that can lead to substantial stock-outs and price increases. However, interventions aimed at health protective behavior may prove ineffective if all causes are given equal weight, so identifying the most critical variables is necessary. Because these variables are dynamic, protective consumption behavior also evolves over time. We therefore used a DBN to capture this temporal nature and identify the most important causes. The main aims of this study were to visualize the complex relationships among the causes of self-protective behavior and to identify the critical causes in order to prevent and mitigate unreasonable behavior. Although a BN can identify causes, it cannot explain dynamic relationships among variables and ignores time-related factors. Thus, we proposed a DBN model to achieve this goal.
A DBN comprises multiple BNs, referred to as time slices. The process in a DBN is assumed to be stationary: the network structure repeats after the second time slice, so the variables and their dependencies for slices t = 2, 3, ..., T remain unchanged. The dotted lines (temporal arcs) connect successive time slices and point only forward in time; they never return to earlier time slices. Fig. 2 shows a simple DBN with three time slices. The blue nodes (indicators) are the parent nodes of the red nodes (target variable) in the same time slice, and the probability of the target variable also depends on its value in prior time slices.
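The forward propagation a DBN uses for prediction can be sketched with a toy model: an indicator X_t is a parent of the target Y_t within each slice, and Y_{t-1} is a parent of Y_t across slices, with the same transition table reused in every slice after the first (the stationarity assumption above). All states are binary and every probability is hypothetical. Because X_t is a root node, observing it does not change beliefs about earlier slices, so a simple forward recursion suffices here; models with observed descendants of the target would need full filtering or smoothing.

```python
# Toy DBN (hypothetical, not the fitted model of this study):
#   X_t -> Y_t      indicator is a parent of the target in the same slice
#   Y_{t-1} -> Y_t  temporal arc between successive slices
# States: 0 = low, 1 = high.

P_X = {0: 0.6, 1: 0.4}          # prior of the indicator in every slice
P_Y1 = {0: 0.2, 1: 0.7}         # P(Y_1 = 1 | X_1), first slice only
# Stationary transition P(Y_t = 1 | X_t, Y_{t-1}), reused for t >= 2.
P_Y = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}


def step(belief, x=None):
    """Return P(Y_t = 1) from P(Y_{t-1} = 1) = belief.

    If x is None the indicator is unobserved and is marginalized with P_X.
    """
    xs = {x: 1.0} if x is not None else P_X
    p = 0.0
    for xv, px in xs.items():
        for y_prev, py in ((1, belief), (0, 1.0 - belief)):
            p += px * py * P_Y[(xv, y_prev)]
    return p


def forward(evidence, horizon=0):
    """Propagate P(Y_t = 1) through observed slices, then predict ahead."""
    belief = P_Y1[evidence[0]]
    trace = [belief]
    for x in evidence[1:]:
        belief = step(belief, x)
        trace.append(belief)
    for _ in range(horizon):     # future slices: indicator not yet observed
        belief = step(belief)
        trace.append(belief)
    return trace
```

For example, `forward([1, 1, 0], horizon=1)` propagates beliefs through three observed slices and then predicts a fourth slice in which the indicator is unobserved and marginalized with `P_X`.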
3.3 Data Sources
In this study, we used a BN to evaluate the impact of various factors on health protective behavior in the form of buying anti-smog products. As described above, protective consumption behavior is affected by four dimensions comprising 13 variables (including the time dimension), as shown in Fig. 3. Several continuous variables, including the number of reports (report), media information (media_n), product price (price), and air quality (PM2.5), were discretized in the subsequent model settings. For categorical variables, we retained the original number of categories to keep the network complexity low. For example, media_emotion is binary, so we assigned it two states (positive and negative), whereas variables such as CPI and season were assigned four states, coded as 4, 3, 2, and 1.
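The source does not state how the continuous variables were discretized. One common option, shown here purely as an assumption, is equal-frequency (quantile) binning, which places roughly the same number of observations in each state. The function name `quantile_bins` and the sample PM2.5 values are hypothetical.

```python
from bisect import bisect_right


def quantile_bins(values, n_states=3):
    """Equal-frequency discretization of a continuous variable.

    Returns integer states 1..n_states. The binning rule is an assumption;
    the source does not say how cut points were chosen.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Cut points at the empirical k/n_states quantiles, k = 1..n_states-1.
    cuts = [ordered[(k * n) // n_states] for k in range(1, n_states)]
    # bisect_right counts how many cut points are <= v.
    return [1 + bisect_right(cuts, v) for v in values]


pm25_weekly = [35, 80, 150, 60, 220, 110]   # hypothetical weekly means
states = quantile_bins(pm25_weekly)
```

Equal-width bins or domain-defined thresholds are alternatives; the choice changes the conditional probability tables the network learns.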
Given the variation in smog levels across Chinese cities, we targeted five cities (Beijing, Chengdu, Shijiazhuang, Shanghai, and Xi’an) and five product brands (Xiaomi, Midea, Panasonic, Philips, and Blueair) as our sample. The data consist of four parts, as shown in Fig. 4. The variables included in the model were drawn exclusively from well-established publications in the literature. The four dimensions were chosen to ensure good coverage of the factors influencing the decision to purchase anti-smog air purifiers. To reduce overlap among variables, we took considerable care to remove duplicates, which resulted in a condensed list of indicators.
The first part is the media-related data, which we collected with Python scripts. We searched for news reports from authoritative media agencies, such as China Daily, CCTV (China Central Television), and other official media, using keywords such as “smog,” “PM2.5,” and “air pollution.” For social media, we selected SINA microblog (Weibo), the most popular social platform in China, as the data source for media-related indicators. We collected all official microblog posts matching the keywords above and recorded the release time of each post. The variable media_n is measured by the number of comments, and for media_emotion, we applied a word-vector-based text sentiment analysis method to identify the sentiment polarity of online comments. During data preprocessing, we removed only duplicate comments and comments consisting solely of emoticons. Because written Chinese does not mark word boundaries with spaces, word segmentation must be performed on the text data; in this study, we used the jieba[1] package in R for word segmentation.
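The preprocessing rule described above (drop verbatim duplicates and emoticon-only comments) can be sketched in Python as follows. The regular expressions are assumptions: Weibo-style emoticons are taken to appear as short bracketed codes such as `[哈哈]`, and emoji are matched by a rough Unicode range. Word segmentation and sentiment scoring are not shown.

```python
import re

# Assumption: Weibo emoticons appear as short bracketed codes like "[哈哈]";
# emoji are matched by a rough range of Unicode symbol blocks.
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Characters that count as real text: CJK ideographs, Latin letters, digits.
TEXT = re.compile(r"[\u4e00-\u9fffA-Za-z0-9]")


def clean_comments(comments):
    """Remove verbatim duplicates and comments made only of emoticons."""
    seen = set()
    kept = []
    for c in comments:
        c = c.strip()
        if c in seen:
            continue  # verbatim duplicate
        seen.add(c)
        if not TEXT.search(EMOTICON.sub("", c)):
            continue  # nothing but emoticons or punctuation
        kept.append(c)
    return kept


sample = ["雾霾太严重了", "雾霾太严重了", "[哈哈][哈哈]", "😷😷", "PM2.5 爆表 [怒]"]
cleaned = clean_comments(sample)
```

Comments that mix text and emoticons are kept unchanged, matching the stated rule that only emoticon-only comments are removed.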
The second part is the city-related data, including air quality (PM2.5) and inherent city indicators (urbanization rate, population density, and consumer price index (CPI)). We collected the air quality data for the five target cities from the official website of the China National Environmental Monitoring Center and obtained the other three indicators from the Statistical Yearbook and the Government Annual Report.
As for the product-related data, we collected sales and price data for the five target air purifier brands from Taobao and Tianmao, which together accounted for about 70 percent of the online market share. We selected Xiaomi, Midea, Panasonic, Philips, and Blueair because of their popularity among consumers and because their air purifiers span a wide price range (low, medium, and high). Because the original data were recorded weekly, the unit of analysis for dynamic variables in this study is one week. As with media_n and media_emotion, we measured brand popularity and word-of-mouth using online product comments: we used Python to crawl the comment text for each brand from the online shopping platforms, the number of comments reflects brand popularity, and the sentiment polarity of the comment text represents word-of-mouth. Brand value data were taken from the 2017 Brand Finance Global 500; on the basis of the value ranking of the five brands, we divided them into three levels: high, middle, and low. Ultimately, brand value and word-of-mouth are three-level categorical variables, and brand popularity is a continuous variable.
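The source mentions a word-vector-based sentiment method without specifying it. As one illustrative possibility, the sketch below scores a word-segmented comment by comparing its mean word vector with the centroids of a few positive and negative seed words, then derives brand popularity (comment count) and a positive share for word-of-mouth. The three-dimensional vectors, seed words, and English tokens are hypothetical stand-ins for trained embeddings and segmented Chinese text.

```python
import math

# Hypothetical 3-d word vectors; a real study would train or load embeddings.
VEC = {
    "good": (0.9, 0.1, 0.0), "quiet": (0.7, 0.2, 0.1), "clean": (0.8, 0.0, 0.2),
    "noisy": (-0.7, 0.3, 0.1), "broken": (-0.9, 0.1, 0.0), "filter": (0.0, 0.9, 0.1),
}
POS_SEED, NEG_SEED = ("good", "clean"), ("broken", "noisy")  # hypothetical


def mean_vec(tokens):
    """Average vector of the tokens that have embeddings, or None."""
    known = [VEC[t] for t in tokens if t in VEC]
    if not known:
        return None
    return tuple(sum(axis) / len(known) for axis in zip(*known))


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def polarity(tokens):
    """+1 (positive), -1 (negative), or 0 if no token has a vector."""
    v = mean_vec(tokens)
    if v is None:
        return 0
    score = cosine(v, mean_vec(POS_SEED)) - cosine(v, mean_vec(NEG_SEED))
    return 1 if score > 0 else -1


def brand_metrics(comments):
    """comments: list of already word-segmented token lists for one brand."""
    pols = [polarity(c) for c in comments]
    scored = [p for p in pols if p != 0]
    popularity = len(comments)  # brand popularity = number of comments
    positive_share = sum(p > 0 for p in scored) / len(scored) if scored else 0.0
    return popularity, positive_share
```

Mapping the positive share onto the three word-of-mouth levels would require thresholds that the source does not report.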
Finally, we selected three time-related variables to supplement the DBN model. Festival and promotion are binary variables: if a weekly observation includes any festival or promotion days, the value is 1, and otherwise 0. Season has four states: spring, summer, autumn, and winter. All data cover the period from July 2016 to July 2017. The dataset contains 1,400 records, covering both peaks and troughs in air purifier sales.
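Deriving the three time-related variables for one weekly observation can be sketched as follows. The festival and promotion dates listed are a hypothetical subset (National Day, Spring Festival 2017, and the Double 11 and Double 12 shopping events), the month-to-season mapping uses meteorological seasons, and assigning each week the season of its first day is an assumption.

```python
from datetime import date, timedelta

# Hypothetical calendars; the study's actual date lists are not given.
FESTIVALS = {date(2016, 10, 1), date(2017, 1, 28)}     # National Day, Spring Festival
PROMOTIONS = {date(2016, 11, 11), date(2016, 12, 12)}  # Double 11, Double 12

# Assumption: meteorological seasons by month.
SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "autumn", 10: "autumn", 11: "autumn"}


def week_time_variables(week_start):
    """Return (festival, promotion, season) for the 7 days from week_start."""
    days = [week_start + timedelta(days=i) for i in range(7)]
    festival = int(any(d in FESTIVALS for d in days))   # 1 if any festival day
    promotion = int(any(d in PROMOTIONS for d in days))  # 1 if any promotion day
    season = SEASON[week_start.month]  # assumption: season of the first day
    return festival, promotion, season
```

For instance, the week starting 7 November 2016 contains 11 November, so it is flagged as a promotion week in autumn.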