Bayesian Maximal Information Coefficient (BMIC) to reason about novel trends in large datasets

The Bayesian network (BN) is a probabilistic inference model that describes explicit cause-and-effect relationships, and it is well suited to examining a complex system such as rice pricing under data uncertainty. However, discovering the optimal structure among a super-exponential number of graphs in the search space is an NP-hard problem. In this paper, the Bayesian Maximal Information Coefficient (BMIC) is proposed to uncover causal correlations in a large data set from a random system by integrating the probabilistic graphical model (PGM) and the maximal information coefficient (MIC) with Bayesian linear regression (BLR). First, MIC captures strong dependence between predictor variables and a target variable, reducing the number of variables for the BN structural learning of PGM. Second, BLR assigns orientation in the resulting graph based on a posterior probability distribution. This conforms to what a BN requires: a conditional probability distribution for each node given its parents, obtained by Bayes' theorem. Third, the Bayesian information criterion (BIC) serves as an indicator of how well the model explains its data, ensuring correctness. The proposed BMIC obtains the highest score compared to two traditional learning algorithms. Finally, the proposed BMIC is applied to discover causal correlations in a large data set on Thai rice prices by identifying the causal changes in the paddy price of Jasmine rice. The experimental results show that the proposed BMIC returns directional relationships with clues to identify the cause(s) and effect(s) of the paddy price using a better heuristic search.


Introduction
Uncertain data is a significant issue that arises as data grows in volume, variety, and velocity. Although uncertain data introduces noise into the observed data set and causes deviations from correctness, it is necessary for computational and statistical analysis in predictive modeling. Indeed, price data has uncertain values because it is collected in a time-series format that exhibits both data variability and a genuine lack of understanding of the relationships among data attributes within the system. Price analysis, particularly in predictive analytics models, must contend with the difficulties inherent in analyzing non-stationary and noisy data. Nevertheless, price data is highly sought after for economic [1,2] and agricultural [3,4] research that focuses on comprehending the processes, characteristics, and future development trends of domestic and international economies. As a result, employing a causal model can aid the analysis of data with uncertain values. Correlations between data variables can not only be used to represent cause-and-effect relationships, but are also necessary for data analysis in order to achieve the prediction model. This paper focuses on an agricultural price, specifically the price of Thai rice, because the relevant impacts of price variance in global market rivalry are worth examining. Many important variables, such as unpredictability in productivity, climatic conditions, competitors (import and export commerce), and domestic and international events, all contribute to the fluctuation of Thai rice prices [5]. Consequently, using a graphical model helps describe causal relationships more accurately, which can assist in lowering price barriers and avoiding unanticipated future scenarios.
The Probabilistic Graphical Model (PGM) [6,7] is a declarative, probability-based representation of causality in real-world relationships. It is a graph-based representation that uses the joint distribution to assign probabilistic directions to random variables. Two representation patterns can be used to describe graphical representations of distributions: the Bayesian network (BN), a directed graph, and the Markov network (MN), an undirected graph. The BN is a notable representation that yields a directed acyclic graph (DAG) and a set of conditional probability distributions (CPDs) via Bayes' theorem, and it is a machine-learning model capable of dealing with domain uncertainty. The model can then proceed with inference through the particular structure for the purpose of predictive modeling. Numerous researchers have been interested in the probabilistic forecasting of uncertain data using the BN, including medical analysis [8], ecological systems [9,10], agricultural forecasting [11,12], and economic applications [13]. Unfortunately, when all observed variables are required to construct the DAG, PGM's structure learning process encounters an NP-hard problem [14]: the search space contains a super-exponential number of graphical models. The more graphs in the search space, the more time, memory, and computational resources are required to find the optimal network structure.
Consequently, this paper employs a data-dependent approach to scope the observed variables in the sample space. The maximal information coefficient (MIC) is a reasonable choice that has been used to determine the degree of dependence [15]. The MIC detects correlations between pairwise variables in a large data set with high precision because of its heuristic properties of generality and equitability. The data correlation coefficient is expressed as an R-squared (R²) score in relation to the regression. The generality property allows unrestricted access to relationship types, while the equitability property drives the R² score toward 1 for a noiseless functional relationship. Nevertheless, the correlated data are connected using an undirected graph to represent their relationships. Without a direction between data variables, edges cannot accurately represent causality. As a result, this paper emphasizes the synergy of correlated data from MIC and PGM with the goal of diagnosing the direction of relationships by statistical inference.
To motivate this study further, Bayesian linear regression (BLR) is selected for identifying the directional relationship between correlated pairwise variables, x_i and x_j, via their dependence on a probability distribution. BLR formulates linear regression using Bayesian inference, implying that the variables depend on one another through a linear function. It generates a posterior probability for model parameters that can be used to describe the conditional probability of correlated pairwise variables. The linear relationship expresses the correlation between a target variable (y) and a predictor variable (x). Instead of estimating a single value, BLR declares the linear regression model using a probability distribution derived from a normal (Gaussian) distribution. The variables x_i and x_j are each tested as target and predictor: when x_i is the target variable, x_j is the predictor variable, and vice versa. Switching between the target and predictor roles produces different posterior probabilities. The direction of the relationship between correlated pairwise variables is determined by which assignment of x_i and x_j to target and predictor yields the highest posterior probability. As a result, a directional relationship and conditional dependency between x_i and x_j can be inferred.

Related methods
This section provides background knowledge on the two techniques underlying the proposed approach: PGM and MIC.

Probabilistic Graphical Model
Probabilistic Graphical Model (PGM) framework [6,7] establishes a graph-based model of a probability distribution as a data structure on a conditional basis. The graph representation is a collection of independencies found in the distribution that may satisfy the conditional independence in response to a specific situation. Additionally, it serves as a skeleton of factorization for a high dimension of joint distribution. The joint distribution of conditional independence can be used for inference, which computes the posterior probability of a random variable given the evidence according to a particular algorithm. Utilizing the conditional independencies for inference is more efficient than directly manipulating the joint distribution.
Fortunately, the PGM facilitates model construction using a data-driven approach. Rather than being constructed manually by humans, learning from data enables more efficient model construction. Certain domains, in particular, prefer to learn a probabilistic model based on a distribution that incorporates prior experience (prior). The ability to perform with a set of historical data can be aided by data-driven learning. Thus, the PGM enables the manipulation of the uncertainty inherent in real-world situations via a reasonable probability model-based construction. These three benefits of representation, inference, and learning can all be interpreted as indicating the critical nature of PGM utility.
The PGM can be visualized as a Bayesian network (BN) or a Markov network (MN) to illustrate relationships between observable variables using the fundamental concepts of probability and graph theory. Both models can be viewed as a collection of conditional independencies and encode a factorization compactly, but they differ in the independencies they induce. According to graph theory [16], a graph G = (V, E) consists of a collection of nodes and edges, where V denotes vertices or nodes and E denotes edges. A graph is a data structure in which a set of variables, V = {x_1, ..., x_n}, are connected by edges. Each pair of variables in a data set, x_i and x_j, can be connected via either a directed edge x_i → x_j or an undirected edge x_i − x_j. The directed edge indicates the direction of the relationship, whereas the undirected edge implies ignorance of the direction of the relationship, as shown in Fig. 1.
The BN is a type of directed graph that has the property of being acyclic and having no self-relationships; it is known as a directed acyclic graph (DAG). The model structure is intended to emphasize the usability of causal inference. The BN representation makes use of Bayes' theorem [17] to determine the posterior probability associated with a particular distribution, which is advantageous for describing causal relationships. Besides, it is applicable to the creation of predictive models for uncertain data that can account for dependencies via their probability values. The BN structure has been utilized to maximize classification accuracy and information [18] and also to evaluate risk quantification with the DEMATEL model [19]. Initially, the basic graphical model of a BN consists of at least two nodes connected by a single edge. BN's edge representation depicts a direction using an arrow to convey the most accurate understanding of the causal relationship. In Fig. 1(a), an arrow connects node x_i to node x_j, implying that node x_i is the cause of node x_j, or that node x_j is affected by node x_i. By contrast, the MN is an undirected graph that demonstrates only the variable association without regard for direction. Thus, in Fig. 1(b), nodes x_i and x_j cannot be inferred to have any causal relationship, because either could be imagined as the cause node or the affected node, x_i ⇄ x_j.
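The acyclicity requirement described above can be checked programmatically. Below is a minimal sketch (toy code, not from the paper) that tests whether a set of directed edges forms a DAG using Kahn's topological-sort algorithm:

```python
# Toy illustration: a DAG, as required by a BN, admits a topological
# order; a graph with a cycle or self-loop does not.
def is_dag(edges, nodes):
    """Kahn's algorithm: repeatedly remove nodes with no incoming edge.
    If every node can be removed, the graph is acyclic."""
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

print(is_dag([("xi", "xj"), ("xj", "xk")], ["xi", "xj", "xk"]))  # True
print(is_dag([("xi", "xj"), ("xj", "xi")], ["xi", "xj"]))        # False: cycle
```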
In addition, the BN has been constructed naturally by leveraging the distribution's conditional independence property. It expresses the causative model succinctly and graphically in order to infer the causes and effects between variables. It is not only designed for normal cases but also for uncertainty prediction. In terms of the explicit representation of BN, the objective is to use the conditional distribution to represent the joint probability distribution over a set of random variables. Bayes' theorem will be used to support the model construction by estimating probability values. It is recommended when causal reasoning or prediction are involved.
Bayes' theorem is derived from the definition of conditional probability, which is consistent with the joint distribution's symmetry property. The BN is used to perform Bayesian inference in the presence of uncertainty and to handle numerous relationships between random variables (factors). Bayes' theorem is often used in the financial domain to aid risk evaluation. Its key concept is that conditional distributions can be defined simply in terms of joint distributions. Prior probability distributions are required to calculate the posterior probability using Bayes' theorem. The prior probability refers to the probability of an event before new data are collected, while the posterior probability is the probability of the event after the new data are received.
As a result, the Bayes' theorem benefits revising predictions by allowing for the updating of probabilities in the presence of new information. In prediction terminology, information about occurring events is based on hypotheses and evidence, which refer to the probability of a hypothesis, H, based on given evidence, E, as defined in definition 1 [17].

Definition 1 [Bayes' theorem from the prediction perspective]

P(H | E) = P(E | H) · P(H) / P(E)

where P(H | E) is the posterior probability, P(E | H) is the likelihood, P(H) is the prior probability, and P(E) is the marginal probability or model evidence.
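As a quick numerical illustration of Definition 1 (the probabilities below are made up for the example, not taken from the paper), the posterior can be computed directly:

```python
# Updating the probability of hypothesis H after observing evidence E
# via Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
p_H = 0.3             # prior P(H)
p_E_given_H = 0.8     # likelihood P(E | H)
p_E_given_notH = 0.2  # likelihood under the alternative

# Marginal probability of the evidence (law of total probability)
p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)
posterior = p_E_given_H * p_H / p_E
print(round(posterior, 4))  # 0.6316
```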
To consider the probability of constructing a BN, let S denote the BN structure and Θ denote the associated conditional probabilities. Then P⟨S, Θ⟩ is defined as the joint probability distribution of all random variables contained within the constructed network. Accordingly, the set of vertices, V, is composed of a set of random variables x_1, x_2, ..., x_n, while each edge in E connects two random variables, representing conditional independence. Thus, each variable x_i in V has the CPD P(x_i | Pa(x_i)), where Pa(x_i) denotes the set of parent variables of x_i, a term defined in Definition 2. (Fig. 1 shows a simple graph of a directed and an undirected relationship.)

Definition 2 [The probability of a BN construction]

P(x_1, x_2, ..., x_n) = ∏_{i=1}^{n} P(x_i | Pa(x_i))
BN therefore draws edges among random variables by learning from the observed data sample to represent their dependencies. The PGM framework contains two learning approaches, parameter learning and structure learning, which emphasize the attributes of dependence in addition to the conditional probability estimation process. However, the objective of both learning algorithms is to build the DAG over the observable data set using different methods and target data sets. Not only does the BN gain knowledge from a pure data sample, but it can also learn about existing relationships and use that information to build a network model.

Parameter learning
The parameter learning approach requires sample data with an existing DAG in order to capture the conditional probability of individual variables. Maximum likelihood estimation (MLE) and Bayesian estimation are used to calculate the conditional probability. MLE returns the maximum-likelihood conditional probability of the data given the model, P(Data | Model). Bayesian estimation, in contrast, incorporates a prior distribution into the conditional probability estimate.
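A minimal sketch of MLE parameter learning for a single BN node is shown below (the data is a toy example, not from the paper): the CPD P(x | parent) is estimated directly from frequency counts.

```python
# MLE for one node's CPD: count (parent, x) pairs in the data and
# divide by the parent's marginal count.
from collections import Counter

data = [("high", "up"), ("high", "up"), ("high", "down"),
        ("low", "down"), ("low", "down"), ("low", "up")]

joint = Counter(data)                  # counts of (parent, x)
parent = Counter(p for p, _ in data)   # marginal counts of parent

def mle_cpd(x, pa):
    """Maximum-likelihood estimate of P(x | parent = pa)."""
    return joint[(pa, x)] / parent[pa]

print(mle_cpd("up", "high"))  # 2 of 3 "high" rows are "up"
```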

Structure learning
The structure learning approach captures the maximum probability of variable dependencies, although it is constrained by the lack of known prior knowledge. The solution is to use the uniform prior, which has equal probability. In this learning approach, the network model or graph, G, is determined by the posterior probability distribution associated with the data sample, D, as P(G | D) . According to Bayesian statistical decision theory, the graphs will grow into an exponential number of DAGs. Following that, two different learning algorithms, constraint-based and score-based, will consider only the most optimized and best-performing network structure.
(a) Constraint-based learning. A constraint-based learning approach is intuitively structured around conditional independencies, consistent with the Bayesian network concept. Instead of testing probabilities directly, conditional independence is tested using Pearson's χ²-test, Fisher's Z-test, or the t-test. This constraint-based learning approach models the network in three stages: conditional independence identification, undirected relationship skeleton learning, and arc direction learning. In [13], the Markov blanket was first introduced for the purpose of optimizing the number of candidate nodes in a DAG. The Markov blanket's conditioned edges can be used to determine the conditional independence of any set of nodes. It benefits skeleton learning by identifying the undirected edges in the DAG. Directions are then assigned to the undirected edges, resulting in a complete partial DAG. The resulting model thus implies the existence of causal relationships between nodes.

(b) Score-based learning. The score-based learning approach focuses on using a network score to determine the best BN for the data. Given the observed data set, the maximal score returns the graph's highest posterior probability.
Since the number of possible DAGs (BNs) grows super-exponentially in the search space, O(n! · 2^(n(n−1)/2)) [20], where n is the number of nodes in the network, heuristic search algorithms are responsible for reducing the number of DAGs in order to find the optimal BN structure. The heuristic search algorithms include options for specific purposes, such as hill-climbing, Tabu, genetic, and greedy equivalence search (GES), which can be scored using AIC, BIC [21], K2 [22], and BDeu [23] as the scoring function. The emphasis in [24] was on operational efficiency, comparing each scoring method under each heuristic search algorithm. The findings revealed that different data types and domains of interest can affect learning performance. The score-based mechanism can demonstrate the dependencies of the entire structural model and avoid the failure of individual conditional independence tests, although only a few heuristic search algorithms work well with scoring methods. As a result, PGM can be used to describe the dependencies between random variables, as it supports conditional probability. The difficult task of identifying the optimal model encounters an NP-hard problem, which subjects all networks in the search space to heuristic-search algorithms for scoring comparison. The model with the highest score is declared the best model. Even though the NP-hard problem requires a time-consuming search for the best network, the PGM framework remains a robust framework for illustrating causal relationship models. BN, in particular, can improve users' comprehension through the intuitive interpretation of models in a domain [25].
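To see why heuristic search is unavoidable, the cited upper bound O(n! · 2^(n choose 2)) on the number of candidate graphs can be evaluated for small n (a sketch of the bound from [20], not an exact DAG count):

```python
# Upper bound on the BN search space: n! orderings of the nodes times
# 2^C(n,2) choices of which of the C(n,2) possible edges to include.
import math

def dag_upper_bound(n):
    return math.factorial(n) * 2 ** math.comb(n, 2)

for n in (2, 4, 6, 8):
    print(n, dag_upper_bound(n))
# The bound explodes super-exponentially: already ~2.4e7 at n = 6.
```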

Maximal information coefficient
The maximal information coefficient (MIC) [15] is a statistical measurement for determining the relationship between pairwise variables in a large data set. It is referred to as an exploratory data analysis (EDA) tool. The MIC score is used to quantify dependence between pairwise variables and to determine the strength of the dependent relationship. To calculate the MIC score, a grid drawn on the scatter diagram of two variables serves as a data partitioning tool that allows for the exploration of all grids up to the maximum grid resolution.
Given a finite data set D of ordered pairs, the x- and y-values of D are partitioned into an x-by-y grid G. The distribution D|_G then arises from the points of D on the grid G, meaning that the probability of each cell is the fraction of points of D falling in that cell. Thus, the same finite set D with different grids G yields different distributions D|_G. MIC is defined by first making the following definition [26].

Definition 3 [The maximal mutual information I*]

I*(D, x, y) = max I(D|_G)

where the maximum is taken over all x-by-y grids G, and I(D|_G) denotes the mutual information of D|_G. The characteristic matrix and the MIC of D are then defined in terms of I*.

Definition 4 [The characteristic matrix M(D)]

The characteristic matrix M(D) of the two variables, x and y, is an infinite matrix with entries

M(D)_{x,y} = I*(D, x, y) / log min{x, y}

Therefore, the MIC of a set D of two-variable data with sample size n and grid size less than B(n) is denoted in Definition 5.

Definition 5 [The MIC]

MIC(D) = max_{xy < B(n)} M(D)_{x,y}
As a result, B(n) is the maximum size of the grid. A value of B(n) that is set too high may result in non-zero scores even when testing random data, as each data point will have its own cell. A value set too low, by contrast, restricts the search to a single simple pattern. Therefore, B(n) is set to n^0.6 by default, based on a heuristic suggestion by the founding team.
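A simplified, self-contained sketch of the MIC idea is shown below. It is not the reference implementation: true MIC optimizes the grid partition boundaries, whereas this sketch uses equal-width bins via numpy's histogram, keeping only the maximization over grid sizes with xy ≤ B(n) = n^0.6 and the log min{x, y} normalization:

```python
import numpy as np

def grid_mi(x, y, nx, ny):
    """Mutual information (natural log) of the empirical distribution
    D|_G on an nx-by-ny equal-width grid."""
    counts, _, _ = np.histogram2d(x, y, bins=(nx, ny))
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mic_like(x, y):
    """Simplified MIC: max over grid sizes with nx*ny <= B(n) = n**0.6
    of MI normalized by log(min(nx, ny))."""
    n = len(x)
    B = max(int(n ** 0.6), 4)
    best = 0.0
    for nx in range(2, B + 1):
        for ny in range(2, B + 1):
            if nx * ny > B:
                continue
            best = max(best, grid_mi(x, y, nx, ny) / np.log(min(nx, ny)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
print(round(mic_like(x, 2 * x), 2))    # near 1: noiseless linear
print(round(mic_like(x, x ** 2), 2))   # high: noiseless non-linear
print(round(mic_like(x, rng.uniform(-1, 1, 500)), 2))  # near 0: noise
```

Note the generality property in action: the score is high for both the linear and the quadratic relationship, and low for independent noise.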
Three fundamental properties of MIC follow from the properties of mutual information. First, for each pair of integers (x, y), the largest mutual information is found using an x-by-y grid and normalized to allow fair comparisons between grids, generating a score between 0 and 1. Second, mutual information is symmetric, I(x, y) = I(y, x), implying that the x- and y-values of D can be interchanged. Third, the characteristic matrix remains constant under order-preserving transformations of the x- and y-values of D, as the distribution D|_G is determined solely by the rank order of the data.
As mentioned, the MIC score is normalized from mutual information and ranges from 0 to 1. A MIC score of 0 indicates an independent relationship. The increasing magnitude of the MIC score, up to 1, represents the increased strength of dependence between x and y variables, which is expressed in Table 1 as the strength type. They are typically classified into five to six levels [27,28], which include no correlation and perfect correlation.
When compared to other algorithms, the MIC algorithm produces a significant score due to two of its heuristic properties: generality and equitability. The generality property allows unrestricted access to relationship types: MIC is capable of detecting both linear and non-linear relationships, whereas other measures, such as distance correlation [29] and Pearson's R [30], capture a narrower class of relationships. Furthermore, the equitability property assigns similar MIC scores to different relationship types with similar noise levels. MIC is derived from mutual information but is not estimated as mutual information. In almost all noise models, it is a directly implemented equitable dependence measure that produces a more significant result than mutual information and other measures. Fig. 2 compares the correlation coefficients of MIC and other current functions on noiseless models. The MIC algorithm is shown to be capable of detecting an equal score of 1.00 on all the different noiseless model functions except the random function.
MIC is described as an equitable measure of dependence based on two heuristic properties. The MIC equitability property is exceptional when considering the variable dependence of noise models with a variety of function types and sample sizes ranging from n = 250 to n = 1,000 [15,31]. Therefore, [31] examines omitting the maximization and the normalization of the original MIC in Definition 5, which affects equitable measurement over noisy relationships. There are three distinct MIC variants: one omitting the maximization, one omitting the normalization, and one omitting both.
The first variant of MIC omits the maximization.

Definition 6 [MIC_1, omitting maximization]

Let D be a data set of ordered pairs, x and y, and let I_E(D, x, y) denote the mutual information of D|_G on the equipartitioned x-by-y grid. Then MIC_1, the first variant of MIC, is defined by

MIC_1(D) = max_{xy < B(n)} I_E(D, x, y) / log min{x, y}

Secondly, the normalization is omitted in the variant MIC_2.

Definition 7 [MIC_2, omitting normalization]

Let D be a data set of ordered pairs, x and y, and let I* be the maximal mutual information defined earlier. Then MIC_2, the second variant of MIC, which omits the normalization, is defined by

MIC_2(D) = max_{xy < B(n)} I*(D, x, y)

Thirdly, both maximization and normalization are omitted in the variant MIC_3.
Definition 8 [MIC_3, omitting both maximization and normalization]

Let D be a data set of ordered pairs, x and y, and let I_E be as in Definition 6. Then MIC_3, the third variant of MIC, omitting both maximization and normalization, is given by

MIC_3(D) = max_{xy < B(n)} I_E(D, x, y)

In Fig. 3, the three MIC variants omitting maximization and/or normalization were tested over noisy relationships to show the coefficient of the R² score. Compared to the three variants, MIC_1, MIC_2, and MIC_3, the original MIC (first column of plots) clearly exhibits the equitable behavior, showing that both maximization and normalization are required. Equitability requires that coupled points lie within a small range rather than scattering. Eliminating maximization can result in an unnatural grid arrangement, and eliminating normalization lets relationships be captured more strongly on grids with many cells than on simple grids. Consequently, MIC's outstanding heuristic properties are advantageous for estimating the dependency of the Thai rice data.
In terms of MIC utilization, the MIC provides significant benefits for revealing associations between variables: robustness to outliers, the absence of assumptions about variable distributions, and ease of interpretation. It is extensively used in a variety of research domains, including language recognition, image processing, search-based data analysis, spatial data analysis, and medicine. Wang et al. [32] enhanced the MIC methodology by developing iMIC, an intelligent search for relationships between variables in a system. Additionally, work in the agriculture domain [33] chose the MIC method to determine the factors influencing rice price variation.

Methodological approach
A Bayesian Maximal Information Coefficient (BMIC) is proposed to uncover causal correlations from large data sets in a random system by integrating PGM and MIC with Bayesian linear regression (BLR). The proposed BMIC identifies the best causality model while minimizing run-time execution. Fig. 4 depicts the processes of this work's methodological approach.

Search the correlated variables
The purpose of MIC is to reduce the number of variables in the search space through generality and equitability, which enable access to numerous function types and the handling of noisy data. Since the MIC algorithm assigns a score for ranking correlation coefficients, a high MIC score indicates a strong dependent relationship between pairwise variables for the purpose of constructing the structural model. The execution time required to find an optimized model during the structure learning process is thereby reduced. Unfortunately, the relationship x → y has a MIC score equal to that of y → x, so the score alone cannot represent causality. A directional arrow between any two nodes is additionally required when using MIC. Consequently, because MIC yields only an undirected model of the variables' dependency (a symmetrical relationship), studying the orientation of pairwise variables is necessary for identifying cause-and-effect relationships.
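The variable-reduction step described above can be sketched as follows. For brevity, the dependence score here is a stand-in (|Pearson r|, which only captures linear dependence); the actual method ranks pairs by the MIC score instead:

```python
import numpy as np

def filter_predictors(X, y, score_fn, threshold=0.5):
    """Keep only predictor columns whose dependence score with the
    target exceeds the threshold, shrinking the BN search space."""
    return [j for j in range(X.shape[1])
            if score_fn(X[:, j], y) >= threshold]

# Stand-in score for illustration: |Pearson r| (the paper uses MIC).
score = lambda a, b: abs(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(42)
n = 200
x0 = rng.normal(size=n)                 # strongly related to target
x1 = rng.normal(size=n)                 # unrelated noise
y = 3 * x0 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x0, x1])
print(filter_predictors(X, y, score))   # only the related column survives
```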

Identify the causal relationship
PGM is a robust framework that enables the identification of causal relationship models. The BN is a representation of the PGM framework that builds a DAG by encapsulating knowledge about probabilistic numerical information. The conditional probability distribution, which takes the posterior probability value into account, denotes the variables' dependence. Moreover, posterior probability serves as the foundation for any inference that involves the integration of prior knowledge and new information. They are used to represent the edges in a network. By favoring the Bayesian perspective, Bayes' theorem captures the posterior probability. It is a powerful statistical theory that can determine the probability of uncertain data. Thus, Bayesian inference is a well-established technique for establishing the causal relationship between random variables.

Determine the posterior probability
The Bayesian view is used to derive the posterior probability values for the linear regression model. BLR [34] estimates the posterior probability of a linear regression model using Bayes' theorem. Instead of using point estimation, the linear function is expressed as a probability distribution: the target variable, y, is drawn from a normal distribution characterized by its mean and variance,

y ~ N(θᵀx, σ²)

The mean is the product of the transposed parameter vector θ and the input variables x, whereas the variance σ² is the square of the standard deviation. Once the y variable of the linear regression model is formulated, the posterior distribution of the model parameters given the x and y variables can be determined with the support of Bayes' theorem. The model parameters are chosen in accordance with Bayesian inference principles:
P(θ | y, x) = P(y | θ, x) · P(θ | x) / P(y | x)

where P(θ | y, x) is the posterior probability distribution, P(y | θ, x) is the likelihood of the data, P(θ | x) is the prior probability, and P(y | x) is the normalization.
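For intuition, this posterior has a closed form in the conjugate special case of a zero-mean Gaussian prior and known noise variance (a sketch under those assumptions; the paper instead samples the posterior with MCMC):

```python
import numpy as np

def blr_posterior(X, y, sigma2=1.0, tau2=10.0):
    """Closed-form Gaussian posterior N(mu, S) for linear weights theta
    under a N(0, tau2*I) prior and known noise variance sigma2."""
    d = X.shape[1]
    S_inv = X.T @ X / sigma2 + np.eye(d) / tau2   # posterior precision
    S = np.linalg.inv(S_inv)                       # posterior covariance
    mu = S @ X.T @ y / sigma2                      # posterior mean
    return mu, S

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # intercept + slope columns
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
mu, S = blr_posterior(X, y)
print(np.round(mu, 1))  # posterior mean near the true [1.0, 2.0]
```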
Due to the intractable nature of evaluating the posterior distribution over continuous parameter values, the BLR model samples the posterior distribution using the Markov chain Monte Carlo (MCMC) algorithm. The Markov chain is a mathematical representation in which the next sample value is generated from the previous sample value, and Monte Carlo refers to the technique of drawing random samples. The MCMC concept thus emphasizes that the more posterior distribution samples are drawn, the closer the approximated posterior distribution converges to the real one.
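A minimal random-walk Metropolis sampler (an illustrative sketch, not the sampler used in this work) shows the two ingredients named above — each proposal depends only on the previous sample, and acceptance is a random draw:

```python
import numpy as np

def metropolis(log_post, theta0, steps=5000, scale=0.5, seed=0):
    """Random-walk Metropolis: propose theta' = theta + noise (Markov
    chain) and accept with probability min(1, posterior ratio)
    (Monte Carlo)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    samples = []
    for _ in range(steps):
        prop = theta + rng.normal(scale=scale)
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop
        samples.append(theta)
    return np.array(samples)

# Target: standard normal posterior, log-density up to a constant.
draws = metropolis(lambda t: -0.5 * t * t, theta0=0.0)
burned = draws[1000:]  # discard burn-in
print(round(float(burned.mean()), 2), round(float(burned.std()), 2))
# approximately 0.0 and 1.0 as the chain converges
```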
PyMC3 [35] is a Python library used to develop Bayesian models by the MCMC method. Bayesian inference has been used to construct Bayesian linear models from the formulated linear function. It is provided by the generalized linear models (GLM) module, which can generate the formula from the input variable(s), x, and a target output variable, y.
The BLR does not impose any restrictions on the use of non-informative priors when assigning values to model parameters. The non-informative prior is considered by a normal distribution with the mean and variance of observed data. Another advantage is that the posterior probability distribution can be used to quantify the model's uncertainty. The degree of certainty relates to the quantity of data; a model with a low level of uncertainty results from increasing the quantity of data. As a result, the BLR has gained popularity in statistical training and is widely used in a variety of domains. It is most frequently used to investigate the relationship between model variables. As with [36], the BLR outperforms frequentist statistical methods when examining the relationship between an electroencephalogram (EEG) and anxiety using clinical data.
The BLR has been proposed for predictive modeling, where it is used to represent a predicted value and a confidence (probability) interval (CI) [37]. The predicted value denotes the distribution of the target y_i given a set of features x_i.
Additionally, relying solely on the posterior probability distribution for Bayesian decision-making may be insufficient. The confidence interval (CI) assists in validating the model parameters' possible values within a range. The greater the percentage of CI, the stronger the relationship should be between pairwise variables. The geology study in [38] is based on a demand prediction model for seismic fragility that estimates the fragility curve's uncertainty. The BLR yields more precise results than alternative benchmarks.
In frequentist statistics, the CI [39] is a measure used to denote a degree of uncertainty. It consists of the lower and upper bounds for estimating the result, which vary according to the sample size and the standard deviation of the observed data (heterogeneity). The width of an interval determines the precision of the estimated values. A low degree of uncertainty is indicated when the data sample is large and the CI width is narrow. Heterogeneity is directly proportional to the degree of uncertainty: low heterogeneity indicates low uncertainty, resulting in a narrow confidence interval. Thus, the narrower an interval's width, the more precise the effect estimates.
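The relationship between sample size and interval width can be illustrated with the usual normal-approximation formula (the numbers below are made up for illustration):

```python
# 95% CI for a sample mean: mean +/- 1.96 * sd / sqrt(n).
# Quadrupling n halves the interval width (lower uncertainty).
import math

def ci95(mean, sd, n):
    half = 1.96 * sd / math.sqrt(n)
    return mean - half, mean + half

for n in (25, 100, 400):
    lo, hi = ci95(10.0, 2.0, n)
    print(n, round(hi - lo, 3))  # width shrinks as n grows
```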
The CI is preferred over the p-value, particularly in the health sciences, due to the potential for misinterpretation, misuse, and overestimation of hypothesis testing. Typically, the p-value establishes a threshold for accepting or rejecting the defined null hypothesis at a significance level of 0.05 (5 percent). Numerous researchers have interpreted the p-value as the probability of accepting or rejecting the null hypothesis, denoted by P(H0 | y), the probability of H0 given the observed data. In fact, the p-value quantifies the probability of the observed result given the null hypothesis, P(y | H0). Furthermore, researchers frequently oversimplify the interpretation of p-values in practice: they distinguish between statistically significant and non-significant results using a p-value threshold of 0.05, overlooking concerns such as sample size and the variability of the result estimate. As a result, the CI is promoted as an alternative to the p-value, since it can describe the variability of the estimate through the width of the interval. The width of the interval indicates the precision of the result, which is typically reported at a 95 percent confidence level.
Furthermore, the Bayesian credible interval, abbreviated as CrI, is used to estimate the uncertainty of the posterior probability distribution in the Bayesian approach. The CrI operates in the same manner as the CI. Bayesian inference uses two types of CrIs to evaluate the posterior distribution: the equal tail CrI and the highest posterior density (HPD) interval. The equal tail CrI is a direct posterior probability threshold associated with an interval of the posterior probability distribution, which is easily calculable. For instance, an upper bound of 0.975 means the interval extends to the 97.5 percent quantile of the posterior distribution. For an asymmetric posterior, however, values inside an equal tail interval may have lower probability density than some values outside it. The HPD interval instead defines a posterior probability threshold yielding the interval that contains the probability mass surrounding the center (mode) of the distribution. For example, a 95 percent HPD interval of -4.0 to -1.0 implies that, given the observed data, the mean difference lies primarily between -4.0 and -1.0, and the maximum posterior probability value falls within that range. Although the HPD CrI coincides with the equal tail CrI when the posterior is symmetric, the HPD CrI is more complicated to compute than the equal tail CrI.
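Both kinds of CrI can be estimated directly from posterior draws. The sketch below uses a synthetic right-skewed posterior (a gamma distribution) purely for illustration; the HPD interval is found as the narrowest window containing 95 percent of the sorted draws.

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=50_000)   # asymmetric posterior

# Equal tail CrI: cut 2.5% of probability mass from each tail.
eq = np.quantile(draws, [0.025, 0.975])

# HPD CrI: the narrowest window containing 95% of the sorted draws.
s = np.sort(draws)
k = int(np.ceil(0.95 * len(s)))
widths = s[k - 1:] - s[: len(s) - k + 1]
i = int(np.argmin(widths))
hpd = (s[i], s[i + k - 1])

print(eq, hpd)
```

For a skewed posterior the HPD interval is narrower than the equal tail one; for a symmetric posterior the two intervals coincide, as noted above.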

Case study on Thai rice price
As a case study, the proposed BMIC is used to identify the causal correlations from a large data set on Thai rice prices by identifying the change in the paddy price of Jasmine rice.
The causality of changes in the price of Jasmine rice can be accurately determined by its uncertain data and associated impacts.

Thai rice background
Because Thai rice is a well-known global trading commodity, it is an interesting subject for studying the effects of price changes. Despite the high price of Thai rice, many countries are willing to import it. Thailand is the world's largest exporter of rice [40], and Thai rice is a critical agricultural product for the country's revenue expansion. Rice is not a monopoly product, as many countries, including China, India, Thailand, Vietnam, Pakistan, Australia, and the United States, are able to export it [41]. As a result, competitors benefit from the opportunity to adjust their rice prices in accordance with the FOB price of Thai rice. FOB [40,42] stands for Free on Board, which is typically associated with the process of product shipment. According to Incoterms® (International Commercial Terms), a trading contract on an FOB basis specifies that the exporter or shipper is responsible for the costs and risks of transporting the cargo to the export port of origin, which are included in the export price of the product, while the importer or consignee bears the costs and risks of delivery to the destination. Thus, the export price quoted FOB excludes insurance, loading and unloading, freight, customs, value-added tax, import duty, and transportation costs (from the exporter's port to the destination).
Unfortunately, Thai rice encounters difficulties, causing the price of Thai rice to fluctuate due to global and domestic circumstances. Many agriculturists decide to plant alternative crops in order to increase their profit margins. The 2008 food crisis [43] was a serious event that resulted in a 50-100 percent increase in raw material and food costs. The increase in demand, the decline in stocks and farming area, the lack of agricultural infrastructure and investment, the price of oil, and the exchange rate all have an effect on the rising price, affecting worldwide demand and supply [44].
Furthermore, Thailand encountered domestic difficulties in 2011 as a result of the Paddy Price Pledging scheme, a political mechanism intended to sustain farmers' income. The government pledged to accept an unrestricted paddy quota and guaranteed an acquisition price that was 50 percent higher than the market price. This policy resulted in a gradual rise in Thai rice stockpiles while the real market continued to decline. It produced a dramatic increase in the price of Thai rice relative to competitors, which caused the loss of access to trading partners. The study in [45] created a linear model to determine the impact of Thailand's rice-pledging policy. The global trading price of rice acts as an indirect factor in determining the domestic prices of the various varieties of Thai rice. Because rice exporters are the first to learn the price direction from the global market, they attempt to bargain with the millers who set the rice price. Accordingly, domestic rice prices are forced toward levels similar to the export price.
During the same period as the Paddy Price Pledging policy, Thailand faced an emergency disaster, the severe flooding of 2011, and Thailand's position as the world's largest rice exporter shifted to India as a result of the rising Thai rice price. The high price enticed farmers to increase their rice production. Consequently, both the actual market and the stockpiles were brimming with Thai rice. Regrettably, Thai rice revenue declined, as partners benefited from discounts, knowing that Thai rice needed to be sold. Thus, Thailand was under pressure to sell rice in 2014. The Thai rice price then fell to 385.91 US dollars per ton in 2015 [46], and all activities ceased completely in 2016. The Thai rice price subsequently returned to its regular level relative to the Vietnamese rice price [47]. Therefore, the variance in Thai rice prices is determined not only by time but also by impact factors [48]. The purpose of this study is to identify those impact factors through the use of a probabilistic model, thereby ascertaining the causes of variance in the price of Thai rice. In identifying fluctuations in the rice price, significant variables such as the quantity of Thai rice for export, the Thai rice price, the exchange rate, and the rice prices of other countries should be addressed [49]. These findings aid all stakeholders in forecasting the Thai rice price.

Data sample
We implicitly used BN to illustrate the causation of volatility in the price of Thai rice as a result of numerous external influences. In a BN, each directed edge represents the causal relationship between parent and child nodes. Additionally, BN can be utilized as a decision-making tool for foreseeable situations.
BN requires pertinent data for the extraction of Thai rice pricing models. Thai paddy prices, particularly those of the Thai Jasmine (Hom Mali) variety, are studied to determine the cause of price fluctuations. There are two cultivation seasons, each of which produces a different crop. Eighty-five percent of the rice is known as the major rice, which yields Jasmine rice (Hom Mali rice) and parboiled rice; the major rice is usually grown between May and October and harvested from August to April of the following year. The other 15 percent is classified as second rice, which produces glutinous rice, non-glutinous rice, and indigenous varieties and is harvested between January and April [41]. As such, this paper examines the paddy price of Jasmine rice using characteristics associated with major rice.
The sample for this study spans 11 years, from 2008 to 2018, and includes all the important variables collected from the Office of Agricultural Economics [50], the Thai Rice Exporter Association [51], Thai Customs [52], and the Thai Meteorological Department [53]. Due to the increasing interest in the price of Thai rice in the economic and data science domains, the variable selection was based on criteria studied in other publications. Economic studies have concentrated on the links between rice prices and the competitive nature of the rice market: several models [1,2] demonstrated the asymmetric volatility of rice prices using case studies from Thailand and Vietnam, and the study in [44] considered the factors influencing the Thai rice price between 2003 and 2013. In the field of data science, numerous studies have examined various statistical methodologies [3,4,49,54]. The ARIMA model has been a widely used tool for producing predictions focused on the quantity of Thai rice exported, the price of Thai rice, the exchange rate, and the price of rice in other countries [49].
For the reasons stated previously, 14 variables (including the paddy price) were gathered with continuous values in order to avoid missing values for the variables of interest. They are the plant area of major rice, the yield of major rice, the minimum income of Thai citizens, the paddy price of Jasmine rice, the price of Jasmine rice, the export quantity of Jasmine rice, the export price of Jasmine rice, the export price of five percent white rice in Thailand, the export price of five percent white rice in Vietnam, the export price of five percent white rice in Pakistan, the gold price, the domestic oil price, the US exchange rate, and Thailand's agricultural GDP. The variables are abbreviated in Table 2. Having determined that paddy prices may fluctuate due to domestic and global factors, we collected the significant relevant attributes. Domestic factors dictate the relevant characteristics of Thailand's economic system, whilst global factors highlight competitors' rice prices in the same category.

Exploratory Analysis with MIC
In this study, we employ a probability model, the BN, to deduce the cause-and-effect relationships between the factors that affect Thai paddy price variations. The observed data, which include both target and predictor variables, are necessary for the structure learning process to produce the best-fitting BN. Without prior knowledge, the structure model is discovered by calculating the probability of variable dependency; the structure with the highest posterior probability distribution given the data is returned to confirm the relationships. However, developing the model requires learning over a large number of random variables, which results in a super-exponential expansion of DAGs in the search space, referred to as an NP-hard problem.
The structure learning algorithms consist of constraint-based and score-based algorithms that are applied to the observed data in order to detect relationships between variables. The optimized graph is the one with the maximum posterior probability distribution. Due to the NP-hard problem, we found that determining the optimal model required significant run time. We therefore begin by narrowing the variable space using the MIC algorithm to select only the relationships with the highest strength to the paddy price of Jasmine rice. As a result, the top ten pairwise relationships in Table 3 are preferred, according to the perfect and strong relationship categories in Table 1, because of their high impact on the target variable.
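The idea behind MIC can be sketched in a few lines. The snippet below is a deliberately simplified approximation, not the full MINE algorithm of Reshef et al. (which optimizes grid placement; a faithful implementation is available in, e.g., the minepy package): it grids both variables at several resolutions, computes the mutual information of each grid, normalizes by log(min(rows, cols)), and keeps the maximum. The data are synthetic.

```python
import numpy as np

def mic_approx(x, y, max_bins=8):
    """Grid-search approximation of MIC with equal-width bins."""
    best = 0.0
    for bx in range(2, max_bins + 1):
        for by in range(2, max_bins + 1):
            c, _, _ = np.histogram2d(x, y, bins=(bx, by))
            p = c / c.sum()
            px, py = p.sum(axis=1), p.sum(axis=0)
            nz = p > 0
            mi = np.sum(p[nz] * np.log(p[nz] / np.outer(px, py)[nz]))
            best = max(best, mi / np.log(min(bx, by)))
    return best

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 1000)
strong = mic_approx(x, x**2 + rng.normal(scale=0.05, size=1000))  # functional pair
weak = mic_approx(x, rng.uniform(-1, 1, 1000))                    # independent pair
print(strong, weak)   # the functional pair scores much higher
```

Note that the functional pair scores near 1 even though the relationship is nonlinear and non-monotonic, which is exactly why MIC is preferred here over linear correlation for pre-selecting variables.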
Unfortunately, MIC results cannot explicitly infer a cause-and-effect relationship by reason of their symmetric nature.
The MIC results diagram can be used as a preliminary result to connect pertinent variables via undirected edges in Fig. 5.
According to Table 3, the MIC score demonstrates a perfect association between the target variable, the paddy price of Jasmine rice, and both the Jasmine rice price and the Jasmine rice export price. This implies that the paddy price of Jasmine rice is directly related to both variables. The remaining MIC scores can be used to establish indirect relationships with the paddy price of Jasmine rice. However, a perfect MIC score is also found in the relationship between the plant area and the yield of major rice, despite being unrelated to the paddy price of Jasmine rice. Likewise, although the export prices of 5 percent white rice in Thailand, Vietnam, and Pakistan have high MIC scores, they are clearly not directly related to the paddy price of Jasmine rice. These preliminary MIC findings establish the associations that define the scope of the variables and focus on their true relationships to the target domain.
Because the MIC score cannot discern cause-and-effect relationships, all edges in the diagram must be assigned orientations. The BN representation has been chosen to illustrate the volatility of the paddy price of Jasmine rice in a graphical model. Correlated data provides the parameterization of the Bayesian network that is used for parameter and structure learning [55]. Zhang et al. [56] used MIC to improve the heuristic search for structure learning; they also discovered the triangular-loop relationship problem, which requires eliminating the loop using the d-separation rule, after which a conditional independence test confirmed the assigned orientations. As such, our study attempts to develop a new technique for the MIC algorithm that prioritizes assigning directional relationships to support cause-and-effect analysis.

Results and discussion
A causality model for understanding the direct and indirect impacts on a target variable, the paddy price of Jasmine rice, is developed using the results of a proposed BMIC and traditional structure learning of PGM. They are investigated using a learning method based on Bayesian inference in order to discover the least complex causal model. Moreover, the proposed BMIC aims to solve the NP-hard problem which can identify the optimal structure from a super-exponential number of graphs in the search space.

The proposed BMIC (method of PGM and MIC with BLR)
The MIC diagram in Fig. 5 uses the BN construction rule to determine the orientation of the edges. The examination of the causes of fluctuation in Jasmine rice paddy prices uses continuous variables because both the target and the predictor factors contain time series data. We attempt to deduce the cause-and-effect relationship between nodes from the MIC result. By contrast, MIC returns symmetric results, which indicate the relevant association between pairwise variables in both directions. We conduct the analysis using the open-source Python package PyMC3, which includes models and probabilistic machine learning methods based on gradient-based MCMC algorithms. The BLR provides a robust foundation for Bayesian inference and is typically used to summarize the posterior distribution by a central tendency (mean) and an uncertainty estimate (variance). A linear regression model generates the response, y, from the normal distributions of the predictor variables discussed previously. The posterior distribution is used to derive the conditional probability of Bayes' theorem from the linear regression parameters. The linear model formula is generated from the MIC results, considering both x ∼ y and y ∼ x.
Following that, parameter estimation is carried out using MCMC responses from model simulation. The intercept and predictor coefficients of the formula are returned as normal distributions (with mean and variance). In addition, measuring the uncertainty of the effect estimate requires the CI or CrI. The most probable parameter values are determined in this experiment using the HPD interval bounded at the 3 percent and 97 percent quantiles (a 94 percent interval). The width of the interval denotes the degree of uncertainty of the estimated model; the most precise linear model is the one with the smallest HPD interval width.
The posterior probability from the linear model is computed for all ten relationships. The MCMC algorithm is in charge of simulating the Bayesian model, which is parameterized using the identified linear formula. Each parameter's observed data must conform to the normal distribution. Thus, the direction of each pairwise relationship has been determined by selecting the smallest HPD CrI, given the results in Table 4.
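The selection rule can be sketched as follows. This is a simplified stand-in for the paper's PyMC3 pipeline: it uses a normal approximation to the slope posterior instead of full MCMC sampling, the function name slope_hpd_width is illustrative, and the data are synthetic.

```python
import numpy as np

def slope_hpd_width(u, v, rng, mass=0.94, ndraw=20_000):
    """94% HPD width of the slope posterior for the regression v ~ u."""
    U = np.column_stack([np.ones(len(u)), u])
    beta = np.linalg.lstsq(U, v, rcond=None)[0]
    resid = v - U @ beta
    s2 = resid @ resid / (len(u) - 2)          # residual variance
    cov = s2 * np.linalg.inv(U.T @ U)          # approximate posterior cov
    draws = np.sort(rng.normal(beta[1], np.sqrt(cov[1, 1]), ndraw))
    k = int(np.ceil(mass * ndraw))
    return float(np.min(draws[k - 1:] - draws[: ndraw - k + 1]))

rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(scale=1.0, size=300)

w_xy = slope_hpd_width(x, y, rng)   # candidate direction x -> y
w_yx = slope_hpd_width(y, x, rng)   # candidate direction y -> x
direction = "x -> y" if w_xy < w_yx else "y -> x"
print(w_xy, w_yx, direction)
```

Note that HPD widths are scale-dependent, so in the study each variable's observed data must be fitted on its own scale, as described above, before the two candidate directions are compared.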
The BN in Fig. 6 illustrates the direct and indirect effects on the paddy price of Jasmine rice, as determined by the HPD CrIs in Table 4. We discovered four connections, which we classified into two groups of factors irrelevant to the target. The first group comprises the plant area and yield of major rice, which exhibit a perfect relationship. The second group includes the gold price and the export prices of Thailand's, Vietnam's, and Pakistan's 5 percent white rice. On the other hand, we found that the paddy price of Jasmine rice is clearly affected by the Jasmine rice export price. The domestic Jasmine rice price is influenced by the paddy price and export price of Jasmine rice. Meanwhile, the US dollar's exchange rate against the Thai baht and the domestic oil price have an indirect effect on the paddy price of Jasmine rice.

The traditional structure learning experiment
The model is constructed with BN, which can be obtained using traditional or applied approaches. The observed data set is a required component that must be realized throughout the learning phase of the PGM framework, which is responsible for generating the model from the data. The connected nodes represent the conditional probabilities of Bayes' theorem, which determine the dependence values. However, a time complexity issue, the NP-hard problem, arises from the super-exponential number of models in the search space, necessitating a longer search time to find the optimized model. The best model is identified through heuristic algorithms that aim to discover the model with the highest probability value. The Hill Climbing search algorithm is used in this study since it is well-known and uncomplicated for comparing networks to the baseline. The Hill Climbing search algorithm is a fast search algorithm: with n variables, there are n(n − 1)/2 possible edges and 2^(n(n−1)/2) subsets of edges in the sample space.
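These counts are easy to verify directly; the short snippet below is illustrative only (n = 14 corresponds to the number of variables in this study):

```python
# n(n-1)/2 possible undirected edges over n nodes, and 2^(n(n-1)/2)
# subsets of those edges -- the super-exponential search space.
def possible_edges(n):
    return n * (n - 1) // 2

def edge_subsets(n):
    return 2 ** possible_edges(n)

for n in (4, 10, 14):
    print(n, possible_edges(n), edge_subsets(n))
```

Even at n = 14, the 91 possible edges already yield 2^91 edge subsets, which is why a heuristic search such as Hill Climbing, rather than exhaustive enumeration, is required.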
The traditional structure learning experiment is validated using both constraint-based and score-based learning. The structure learning process takes all variables into account.
To begin, the constraint-based method depicts a BN, as illustrated in Fig. 7. Correlation analysis is used to determine conditional independence, using the Chi-square dependency test to estimate the model skeleton. After that, each connection between nodes is assigned an orientation based on the information contained in the separating sets: sets that record the conditional independence of each pair of indirectly connected nodes. The skeleton model and the separating-set information can be used to estimate the partial DAG (PDAG), in which relationships may exist in either direction (x → y or y → x). Because the fundamentals of BN prevent the occurrence of a cyclic graph, the BN is constructed as a DAG from the PDAG. The BN analysis of the selected case demonstrates that the export and domestic prices of Jasmine rice exert control over the rice paddy price. In detail, the domestic price of Jasmine rice is influenced by the export price of Jasmine rice, which in turn influences the paddy price of Jasmine rice. As can be seen, no other effect is associated with the export price of Jasmine rice. Therefore, we can assert that the export price of Jasmine rice is an initiatory actor in the variation of the paddy price of Jasmine rice.
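The pairwise test at the heart of skeleton estimation can be sketched as follows. The contingency tables are invented for illustration, and 3.841 is the 5 percent critical value of the chi-square distribution with one degree of freedom (the relevant value for a 2x2 table):

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-square statistic of an observed contingency table."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / table.sum()          # independence model
    return float(((table - expected) ** 2 / expected).sum())

# 2x2 tables of two discretized variables (high/low counts).
dependent = [[40, 10], [12, 38]]
independent = [[25, 25], [24, 26]]

print(chi_square_stat(dependent) > 3.841)      # edge kept in the skeleton
print(chi_square_stat(independent) > 3.841)    # edge removed
```

In the actual algorithm this test is applied conditionally, i.e. within each stratum of the conditioning set, and an edge is removed as soon as some conditioning set renders the pair independent.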
Second, Fig. 8 illustrates the structure obtained by the score-based algorithm. The Bayesian estimator returns the optimal score by estimating the best model in each learning step. BN takes into account the Bayesian Information Criterion (BIC), a log-likelihood score with Dirichlet priors that expresses how well the observed data describe a model. To prevent overfitting, the method incorporates a penalty for network complexity. It turns out that the paddy price of Jasmine rice can be changed by the export price of 5 percent white rice in Pakistan. The dependency relationship is a chain that begins with the export price of 5 percent white rice in Pakistan and continues through the domestic oil price, the US exchange rate against the Thai currency, the domestic price of Jasmine rice, and finally the export price of Jasmine rice.

The three models, Figs. 6 to 8, show the various impact factors that can cause the paddy price of Jasmine rice to change, based on the results of the proposed BMIC model and the traditional PGM models. It is noticeable that some pairwise nodes in the different models have similar connections but distinct directions. The proposed BMIC can reduce the number of relevant variables by selecting only perfect and high-strength relationships. Choosing only relevant variables for BN structure learning has the advantage of avoiding the NP-hard problem. The result is satisfactory, as it is consistent with the findings of another study. The traditional methods are then compared to the proposed BMIC in terms of performance and model accuracy. Ten connections are contained in the best-selected model in the case of constraint-based learning. The graph quite closely resembles the undirected edges of the MIC results. Only one additional relationship has been added: between the GDP of agricultural products and the minimum income of Thai citizens.
After assigning the orientation, the different directions of the cause-and-effect relationships imply opposite effects on changes in the Jasmine rice paddy price. Furthermore, the constructed BN contains 13 relationships as a result of score-based learning. These dependency relationships are more complicated than those of the proposed BMIC and the constraint-based learning approach presented previously. The findings of score-based learning indicate that changes in the paddy price of Jasmine rice can occur via the forwarding of numerous effects.
As a result, the explicit model of each algorithm reveals a distinct BN pattern. The optimal BN model should be the one that best reflects the cause-and-effect relationships that exist when the paddy price of Jasmine rice changes. The BIC score is used to assess how well the constructed BN is described by its data. The BIC score is a log-likelihood calculation with an added penalty to avoid overfitting, which results in a negative value; the best model returns the highest value. The BIC scores in Table 5 demonstrate that the proposed BMIC algorithm outperforms the two traditional PGM learning algorithms. Therefore, the proposed BMIC obtains the best-fitting BN for learning the structure of Jasmine rice paddy prices.
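A minimal sketch of such a BIC comparison for a linear-Gaussian parent-child relationship follows. This is not the study's implementation; the data are synthetic, and the convention "log-likelihood minus penalty, higher is better" follows the text above.

```python
import math
import numpy as np

def bic_score(u, v):
    """BIC of the linear-Gaussian model v ~ u: log-likelihood - (k/2) ln n."""
    U = np.column_stack([np.ones(len(u)), u])
    beta = np.linalg.lstsq(U, v, rcond=None)[0]
    resid = v - U @ beta
    sigma2 = resid @ resid / len(v)            # MLE of noise variance
    loglik = -0.5 * len(v) * (math.log(2 * math.pi * sigma2) + 1)
    k = U.shape[1] + 1                         # weights plus noise variance
    return loglik - 0.5 * k * math.log(len(v))

rng = np.random.default_rng(4)
x = rng.normal(size=400)
y = 3.0 * x + rng.normal(scale=0.5, size=400)
z = rng.normal(size=400)                       # unrelated variable

print(bic_score(x, y), bic_score(z, y))        # the true parent x scores higher
```

Summing such local scores over all parent-child pairs of a candidate network gives the network-level score used to rank the three learned models.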

Conclusion
This paper exploits the ability of the BN framework to interpret the model. The BN framework is used to describe the causal model underlying the Thai rice price situation, with an emphasis on changes in the paddy price of Jasmine rice. Given the existence of data uncertainty in the price of Thai rice and other relevant attributes, the use of BN probability inference provides a more complete explanation. By structural learning from the observed data set, BN establishes a model that is fundamentally based on Bayes' theorem. The NP-hard problem is frequently encountered during the structure learning process due to the super-exponential number of graphs in the search space. We limited the number of variables in the Thai rice price system by using the MIC algorithm in order to capture only the strongest relationships. However, MIC results cannot be used to explain the causality of any target domain because of their undirected nature. Therefore, the BLR model determines the edge orientation, with an emphasis on the returned posterior probability distribution. The BLR model assumes a normal (Gaussian) distribution of the sample to formulate the linear regression. Under Bayes' theorem, the BLR model parameters are given by the inputs and outputs of the linear model. Bayes' theorem is applicable to models that are probabilistic and non-deterministic. Because BN demonstrates the independence relationships of a given joint probability distribution, each connected node presents a conditional probability that results in the posterior probability. The posterior distributions are computed in both directions (x → y and y → x) from the undirected MIC results, with the highest posterior probability distribution being preferred.
The CrI is used to evaluate the significant posterior probability distribution, with the HPD interval as the criterion. The orientation prefers the direction of dependent variables holding the minimum width of the HPD interval. The proposed BMIC elucidates the causes and effects of the paddy price of Jasmine rice via a BN model based on the estimated edge directions. We discovered that the export price of Jasmine rice can affect its paddy price. This point is confirmed by [45], which discusses exporters' ability to foresee the direction of the global rice market. Moreover, the model indicates that the paddy price of Jasmine rice can affect the domestic price of Jasmine rice. This information is useful for developing policies and coping with future circumstances, as we can track trends in the paddy price of Jasmine rice and its directed effects. The proposed BMIC's reliability is assessed by comparing network scores to those of the traditional PGM algorithms. The network score indicates how well explained and fitted the model is. The score demonstrates that the proposed BMIC achieves the highest score, while the two conventional learning algorithms achieve lower scores. As a result, this study enhances the optimized model selection process by synthesizing a new BN structure from the MIC results with the BLR model. Consequently, the BN model establishes the cause-and-effect relationships for the paddy price of Jasmine rice, allowing us to continue developing a predictive model in our future work.

Acknowledgements
The authors thank the Office of Agricultural Economics for providing the historical data of Thai rice.

Author Contributions
All authors contributed to the study conception and the design of its novel application. Conceptualization: Shuliang Wang; Methodology: Shuliang Wang and Tisinee Surapunt; Material preparation, data collection and analysis were performed by Shuliang Wang and Tisinee Surapunt. Writing - original draft preparation: Tisinee Surapunt; Writing - review and editing: Shuliang Wang and

Declarations
Ethical approval This manuscript does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent Informed consent was obtained from all individual participants included in the study.

Conflicts of interest
The authors declare that they have no conflict of interest.