Using the BTA algorithm to solve the Nash equilibrium problem for rule extraction in rule learning

Discovering the rules that capture a specific pattern in given data, for the extraction of association rules in rule-based learning systems, has been addressed in previous research, and game theory has been applied to rule-discovery processes in numerous studies. In recent years, game-theoretic modeling in the rule learning sphere has attracted growing attention from computer scientists. When two or more players choose their strategies independently, strategic-game modeling can be used; a strategic game is therefore a suitable model for situations with no permanent strategic relationship among interactions. Nash equilibrium is the most widely used solution concept in game theory. It is a steady-state interpretation of a strategic game: each player has an accurate prediction of the other players' behavior and acts according to that rational prediction. In the present study, by extracting rules from frequent patterns, we present a model that extracts learning rules through a game-theoretic abstraction, applicable not only to association rules but to rule-based learning systems in general. The introduced method can also be generalized easily to fuzzy data. To find the Nash equilibrium (FNE) in the proposed method, we use the meta-heuristic bus transportation algorithm (BTA). The results indicate that, provided FNE is solved, the method reduces the computational complexity of the association rule discovery process and of rule learning.


Introduction
Rule learning methods are popular techniques among data mining and machine learning methods, with applications in expert systems, controller systems (especially fuzzy controller systems), describing economic processes and predicting related efficient strategies, distributed systems (especially cloud computing systems), and informatics for the prediction and detection of different biological structures. Generally speaking, rule learning methods are efficient in any system dealing with rule-based strategies (Theocharopoulou et al. 2017; Iancu and Gabroveanu 2010; Millette 2012; Bell 2020). All rule-based systems use the "if ... then" form to express rules; therefore, this form is always used for finding the rules of frequent patterns in a specific set of data. There is, however, a clear difference in the point of view, or purpose, of using rules on a specific set of data. The two general purposes of using learning rules in discovering frequent data patterns are describing and predicting the given data (Fürnkranz and Kliegr 2015; Novak et al. 2010; Carmona et al. 2018). Discovering rules in rule learning to describe data means finding a pattern among frequent data. Data prediction rules in rule learning involve extracting a set of rules that cover the frequent data and the entire sample space, making it possible to predict each sample in the sample space.
Correspondence: F. Mahan (mahan@tabrizu.ac.ir), M. Boudaghi (m.bodaghi@tabrizu.ac.ir), A. Isazadeh (isazadeh@tabrizu.ac.ir), Computer Science Department, University of Tabriz, Tabriz, Iran.
In discovering data description rules, the focus is on the statistical validity of the rules, not on predicting the data. In this type of rule learning, in the supervised mode the goal is to discover subgroups where the desired characteristics exist; in the unsupervised mode, where the desired relationships between items are explored, the goal is to find association rules (Novak et al. 2009; Fürnkranz et al. 2012; Triantaphyllou and Felici 2006). When rule learning aims at predicting the data, in practice the training information is generalized so that predictions for new examples become possible; since the entire sample space must be searched, computational complexity is a secondary concern. In rule learning concerned with describing the data, only the frequent part of the training data is monitored, so computational complexity can be reduced through new methods and algorithms.
There are thus two fundamental challenges in discovering rules in rule learning: introducing an algorithm that significantly reduces the computational complexity, and generalizing the same algorithm to fuzzy data. In the present study, based on the first goal of rule learning, i.e., the discovery of rules to describe data, a method is proposed that uses a game theory abstraction to extract rules with reduced computational complexity. The method can be generalized to fuzzy data in addition to crisp data. The contributions of the present study are the following:
1. Changing the abstraction of the learning rule extraction problem in rule learning systems into game theory.
2. Proposing an efficient Payoff function that reduces the computational complexity of rule extraction.
3. Reducing the Nash equilibrium of the rule extraction game to 0-1 IP (integer programming) and solving the 0-1 IP with the BTA (bus transportation algorithm) (Bodaghi and Samieefar 2019).
4. Extracting the learning system's super rules and then extracting the system's rules from the obtained super rules.
5. Comparing the computational complexity of the proposed method with that of previous abstractions.
6. Presenting empirical results of the proposed method for extracting rules from hypothetical data.
Section 2 elaborates on previous research concerning the improvement of rule learning algorithms. Section 3 explains the proposed method in five subsections: Sect. 3.1 covers data preparation for the game theory abstraction; Sect. 3.2 covers the calculation of player Payoff under the game theory abstraction and introduces a new model for calculating the Payoff of players (data generation resources); Sect. 3.3 covers the solution of the created game by finding the game's equilibrium points; Sect. 3.4 reduces the NEP to the zero-one integer programming (0-1 IP) problem; and Sect. 3.5 describes how to solve the 0-1 IP problem with the BTA. Section 4 discusses implementation details and presents the results on hypothetical data.

Related works
Recently, there have been various approaches to rule learning. Even more recently, parallelization methods have been introduced for optimizing the algorithms of these approaches (Zhu et al. 2021), improving the Apriori algorithm (Huiqi 2020), methods for fuzzy rule learning systems (Cózar et al. 2018), and algorithms for improving rule-based classification performance (Liu and Chen 2019). The origin of all these new methods is the discovery of association rules. The recurrence of association patterns based on frequent patterns is evaluated according to the Support and Confidence criteria. Rule detection algorithms such as Apriori and FP-growth and their improved variants use these two criteria to detect frequent patterns. Algorithms that use these two criteria to evaluate the resulting rules may perform many repetitive and useless calculations, although significant efforts have been made in improved versions of rule discovery algorithms to reduce computational complexity. In the non-improved (classical) mode, the computational complexity of the Apriori algorithm is O(2^n) + O(2^k) (Telikani et al. 2020), and that of the FP-growth algorithm is O(n × n). Recent research has improved the computational complexity of the Apriori algorithm, especially in massive data environments (Yu 2020), and of the FP-growth algorithm (Shabtay et al. 2021).
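The Support and Confidence criteria mentioned above can be stated concretely. The following minimal sketch (the toy transaction database and function names are illustrative, not from the paper) computes both measures for a candidate association rule:

```python
# Sketch: computing Support and Confidence for a candidate association
# rule over a toy transaction database (all data here is illustrative).

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of its antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(support({"a", "b"}, db))        # 2 of 4 transactions -> 0.5
print(confidence({"a"}, {"b"}, db))   # 0.5 / 0.75 -> 0.666...
```

Apriori-style algorithms repeat essentially this computation over exponentially many candidate itemsets, which is the source of the O(2^n) term above.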
Many algorithms are used to discover and extract the rules in rule learning systems. They have progressed over time, but the traditional methods for discovering the rules of frequent patterns can be considered subsets of the two algorithms Apriori and FP-growth. The FP-growth algorithm performs its search without generating candidate frequent itemsets, so it does not share the weakness of the Apriori algorithm; even the optimized versions of Apriori ultimately fail to reduce the computational complexity significantly (Lin et al. 2011; Yin et al. 2018; Zeng et al. 2015). Although scholars have conducted considerable research to generalize the FP-growth algorithm to fuzzy data (Sabita et al. 2010; Hoque et al. 2015; Wang et al. 2017), developing these algorithms for fuzzy data also involves certain complexities. Since the 1990s, much effort has been made to reduce these algorithms' computational complexity; however, despite the available methods, in big data environments we still face major complexities in critical cases (Ai et al. 2018; Vasoya and Koli 2016; Ait-Mlouk et al. 2017). The methods used to improve the Apriori algorithm can be divided into the following categories:
1. Some methods make use of specific data structures (Singh et al. 2015; Soysal et al. 2016; Wang and Zheng 2020).
For example, one of the most popular of these methods uses a hashing function and a direct-access file to store itemsets of size k > 1 in buckets, where support is calculated per bucket rather than per individual item; access to frequent patterns thereby becomes easy (Rathinasabapathy and Bhaskaran 2009).
2. There are methods based on eliminating data generation sources that do not produce frequent patterns. Such sources are, for example, transactions containing items that are not repeated in any other transaction; discovering them can reduce the search space. Some of these methods are described by Yuan (2017), Cheng et al. (2015), Palshikar et al. (2007), and Vijayalakshmi and Pethalakshmi (2015).
3. There are methods that allow parallel processing in big data environments. These methods generally divide the state space into parts and define a minimum support value (minsup) for each part as a condition (this value must satisfy l > 1); values less than l are not considered, and the final rules are extracted from the surviving candidates. These methods usually offer no advantage in computational complexity, because they only guarantee the results, and the entire database still has to be scanned after extracting the patterns (Saabith et al. 2016; Xiangyang and Ling 2016; Gan et al. 2019).
4. The use of sampling techniques in big databases is another method that has increased Apriori's efficiency. Examples of these techniques can be seen in Chakaravarthy et al. (2009) and Thakur and Ninoria (2017).
5. There are also studies that generalize the Apriori algorithm to fuzzy data while preserving the algorithm's traditional abstraction; these methods face increased computational complexity (Sowan et al. 2013; Mangalampalli and Pudi 2009; Pierrard et al. 2018).
6. Recently, the use of machine learning techniques to discover frequent patterns in the Apriori algorithm has become commonplace, given that this algorithm uses the knowledge of the previous stage in the following one (Bhagat et al. 2010). Hybrid techniques such as PSO-Apriori, GA-Apriori, and other combined algorithms are used to discover frequent patterns in the Apriori algorithm (Djenouri and Comuzzi 2017).
7. In recent years, game theory has been used in learning techniques related to the Apriori algorithm's level-to-level strategy. However, the basis of these methods is still the traditional Apriori algorithm; in some cases, game theory has reduced the search space and increased Apriori's efficiency (Wang 2006; Miyaji and Rahman 2011).
8. Recently, a method combining fuzzy logic and deep neural networks has been proposed (Asghar et al. 2020) that appears effective for extracting rules from repetitive patterns in a rule learning system. Although this method is designed to measure customer satisfaction on the web, the idea can contribute to the efficient extraction of rules with high confidence.
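The bucket-hashing idea in category 1 can be sketched as follows. This is a hedged illustration, not the cited method: the bucket count, hash choice, and function names are assumptions, and only the pruning principle (a pair can be frequent only if its whole bucket reaches minsup) is taken from the text.

```python
# Sketch of bucket-based support counting: candidate 2-itemsets are
# hashed into buckets, and a pair survives as a candidate only if its
# bucket's total count reaches minsup. Bucket size is illustrative.

from itertools import combinations
from collections import Counter

def bucket_counts(transactions, n_buckets=7):
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[hash(pair) % n_buckets] += 1
    return counts

def pair_survives(pair, transactions, minsup, n_buckets=7):
    """A pair can be frequent only if its bucket count reaches minsup."""
    return bucket_counts(transactions, n_buckets)[hash(pair) % n_buckets] >= minsup

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
print(pair_survives(("a", "b"), db, minsup=2))   # True: its bucket holds >= 2 pairs
```

Because a bucket count upper-bounds the support of every pair it contains, pruning whole buckets never discards a truly frequent pair.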
The limitations of the previous studies can be summarized as follows:
1. The major drawback of the rule extraction methods introduced so far is their high computational complexity. Although efforts have been made to improve it, the limitations of the abstractions chosen by these methods have prevented significant improvements in recent research in this area.
2. In studies that have made use of game theory, the game theory abstraction has been utilized only as part of the proposed method. For instance, the main abstraction of the method has been graphs or probabilities, and game theory has been used only in part of the method to improve it. The problem with such research is that the capacity of the game theory abstraction is not fully exploited: these studies have no need for an efficient Payoff function, and therefore no significant improvement over the old abstraction is seen.
3. Any reduction of the problem's search space must not lead to ignoring items that should have been used to extract the learning rules. This has often been overlooked in previous research and has therefore reduced the confidence of the obtained rules.
A chronological review of the rule extraction methods in rule learning systems is summarized in Table 1. So far, no method extracts rules based on the discovery of frequent patterns using the game theory abstraction from the beginning to the end of the algorithm. In the present study, we show how utilizing this abstraction can reduce the computational complexity of rule discovery.

Materials and methods
During the past two decades, the game theory abstraction has been applied to data mining activities, including rule learning (Narahari 2010; Stahl 2000). For example, a data mining activity can be modeled as a non-cooperative game performed by multiple players. A game defines the players, the possible actions of each player, and the Payoff of the players' actions; if no player can improve its Payoff by unilaterally changing its action, the game is in equilibrium (Stahl 1997; Wang 2006). Stahl's method was based on logical behavior, which has a probabilistic basis; his study strongly rejected Nash learning because of that probabilistic basis, but it introduced rule extraction based on a game by distributing probabilities over the game's actions in a hypothetical game. Pham et al. proposed a method based on game theory for extracting association rules. In that research, each player's actions in a game are assumed to follow predetermined, defined strategies; the set of actions of each player against another player is quite clear, and each player's action can affect the other players' actions. Finally, any given set of association rules of the studied system is extracted from the individual players' practical actions. Because the method presupposes actual behavior instead of logical behavior, it can consider Nash equilibrium as the game's answer for learning data prediction rules (Wang et al. 2006). As noted before, a definite Nash equilibrium cannot easily be found for rational behavior with a probabilistic basis, especially for modeling data learning rules: because such a space contains a wide range of strategies with different probability distributions, and the probability distribution is updated at each round, finding the Nash equilibrium in these systems falls into the total function non-deterministic polynomial (TFNP) computational complexity class (Papadimitriou 2015). Nash equilibrium can also be reduced to 0-1 IP (Wu et al. 2014). Our main idea is to find rules that cover the most frequent examples. This idea can be modeled as a non-cooperative game: using the Nash equilibrium concept, it is possible to find a method that can be systematically implemented to find rules covering a larger set of instances. The equilibrium point of the designed game is where the search for rules ends. Because finding the Nash equilibrium point falls in the TFNP complexity class, we used the BTA; but before applying the BTA, we reduced the Nash equilibrium to a 0-1 IP problem. Figure 1 shows the phases of rule extraction in a rule learning system according to the proposed method.
Each of these phases will be explained in detail in the following sections.

Preparing data
This section explains how to prepare data for the game abstraction. The non-cooperative game is a kind of strategic game, and a strategic game is a model of interactive decision making: each decision maker chooses its plan of action once and for all, and the choices are made simultaneously. A strategic game includes:
1. A finite set N (the set of players).
2. A nonempty set A_i for each player i ∈ N (the set of available moves for player i).
The players (any entities that produce data) form the matrix of strategies. Given that each game agent (player) can adopt different strategies, we use the extensive form to represent the present study's main game as a tree structure. The game tree is a graphical representation of a stage game that provides information about the players, final results, states, and actions. It comprises vertices indicating the nodes at which players can act; these nodes are connected by edges denoting the actions that may be performed at a specific node. The first node (the root) indicates the first decision made. Each set of edges forms a branch from the root across the tree to a final node representing the game's end, and each final node is tagged with each player's final result if the game ends at that node. Using a tree to represent the problem space of a game is often appropriate: the root node is the game's start point, and at each node containing the current status a decision must be made to select the best next action, with a branch of the tree for each legal action. The status of the game is evaluated by an evaluation function, and leaf nodes represent the game's final states, whose values can be a win, a draw, or a loss. In our proposed method, the game tree structure is constructed by defining n default states (one per player). In each default state, the tree's root starts with one of the players and expands over time until it reaches the leaves. Each node is an agent (player) that selects a behavior (set of strategies) from among the possible behaviors, which leave the node as downward edges; static behavior corresponds to a lack of change in strategies. We consider the path from the root to each node as that node's (player's) strategy. Leaves are nodes that sum up the values of the combination of the operating strategies. The critical point is that each player has a combination of predefined strategies at the beginning, carried from the root node through the entire set of those strategies; this is because datasets may contain very significant connections from the beginning, allowing rules to be discovered among them without further calculation. The data are arranged so that the Payoff of the combination of strategies increases along the path and the final formed rules avoid duplicate calculations. Depending on the choice of roots, different tree structures are created, varying with the number of strategy combinations of the players; however, they are equal in the number of edges and nodes. Let the set of players be P = {P_1, P_2, ..., P_n} with size |P|, let σ_i be the strategy Payoff of each player i, so that σ = {σ_1, σ_2, ..., σ_k} is the set of combined strategy Payoffs of all players, and let T = {T_1, T_2, ..., T_m} be the set of trees in the whole game, where m = |P|! is the number of game trees as a function of the number of players. The time complexity of the game in each tree is O(Pσ), and the complexity of the entire game space is at least O(P!). In contrast, if the default strategy combinations are based on the basic strategies of each player, then, according to the number of combined strategies of each player, the computational complexity of the whole game can be reduced to at most O(P).
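The growth argument above can be checked numerically. This small sketch (the player counts are illustrative only) contrasts the |P|! trees of the full game space with the |P| paths examined under default strategy combinations:

```python
# Contrast between the full game space (|P|! trees, one per root
# ordering) and the default-combination search (|P| paths).
from math import factorial

for n_players in (3, 5, 10):
    full_space = factorial(n_players)   # number of game trees, m = |P|!
    reduced = n_players                 # paths under default combinations
    print(n_players, full_space, reduced)
```

Already at ten players the full space holds 3,628,800 trees, while the default-combination search examines only ten paths.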

Calculating the amount of Payoff
This section explains how to calculate the Payoff by combining the possible strategies of each player. Based on the game tree structure, we define the game ϑ for discovering rules in rule learning as follows: ϑ = ⟨G_P, P, σ, Γ, E⟩.
P: the set of players or agents; in the present study's game there are n players whose behavior, as mentioned in the previous sections, is defined by the set of data they produce. Usually, the data vertices of the graph (V), or a subset of them, are introduced as game agents, i.e., P ⊆ V. If we select a subset of nodes as game agents, the selection of agents from graph nodes can be introduced as a free parameter of the model: if the v_i nodes in the game graph are defined by G_P, then the vector θ_P represents the membership of the nodes in the set of game agents.
σ: At each stage, players choose their behavior from a specific set (the set of data generated by them). If player P_i has a permissible behavioral range σ_i, then the set of these permissible behaviors over all agents is denoted by σ. The decision about which behaviors each player is allowed to adopt can also be considered a free parameter: if all acceptable behaviors in the game are kept in the set σ_Allowed of size k, then the matrix θ_σ of size |P| × k indicates which behaviors are permissible for each player.
Γ: As mentioned before, each game node's strategy equals the sequence of each player's behaviors from the root to that node. If the strategy adopted at node χ is kept in the set Υ_χ, then the Payoff of P_i at node χ is Γ_i(Υ_χ). For each player, we can set coefficients as free parameters independent of the single-variable agents: if the function Γ_i has q free parameters, then the matrix θ_parameter of size |P| × q is defined so that θ_parameter(i, j) is the jth coefficient of the function Γ_i for the ith player; this parameter acts as a weight for the player. Thus, the function Γ_i, in addition to the adopted strategy, also depends on the matrix θ_parameter, and Γ is the vector of all players' Payoffs obtained from their strategies.
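A minimal data-structure sketch of the game tuple described above may help. The field names and toy values are assumptions; only the shapes follow the text (θ_σ is a |P| × k indicator matrix over the permissible behaviors σ_Allowed):

```python
# Minimal sketch of the game components P, sigma, and theta_sigma.
# Player names and behaviours are illustrative stand-ins.

players = ["P1", "P2"]                               # the set P
allowed = {"P1": {"a", "b"}, "P2": {"b", "c"}}       # sigma_i per player
sigma_allowed = sorted(set().union(*allowed.values()))  # all permissible behaviours

# theta_sigma[i][j] = 1 iff behaviour j is permissible for player i
theta_sigma = [[1 if b in allowed[p] else 0 for b in sigma_allowed]
               for p in players]

print(sigma_allowed)   # ['a', 'b', 'c']
print(theta_sigma)     # [[1, 1, 0], [0, 1, 1]]
```

The free-parameter matrix θ_parameter would take the same indicator-matrix shape, |P| × q, with one row of weights per player.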
Like most frequent pattern algorithms, the support value of each data item is extracted using Support, and the average of all supports, denoted Supp_ave, is then calculated. If a player's strategy set is A_i = (a_1, a_2, ..., a_k), and the jth strategy satisfies a_j ≥ Supp_ave, we call it φ_j; otherwise we call it ϕ_j. The value of Γ_i for each player i is then calculated from n, the number of φ strategies in A_i = (a_1, a_2, ..., a_k), and m, the number of ϕ strategies in the same set, normalized by |P| × Σ_{k=1}^{|P|} |A_k|, the total number of data items generated by all players. The special cases are handled separately: if Σ_{i=1}^{n} φ_i = 0, then Γ_i = 0, and the case Σ_{j=1}^{n} φ_j − Σ_{j=1}^{m} ϕ_j = 0 (i.e., Σ_{j=1}^{n} φ_j = Σ_{j=1}^{m} ϕ_j) receives its own treatment. The critical point is that this method, unlike conventional rule extraction methods, has a strong abstraction for fuzzy data and can easily be generalized to that type of data: for fuzzy data, in addition to Γ_i, it suffices to use the membership degree μ_{Ãi}(a_v), which indicates the membership of element a_v in the fuzzy set Ãi = (a_1, a_2, ..., a_k). Finally, defuzzification methods can be applied to the pairs (Γ_i, μ_{Ãi}(a_v)).
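The Payoff computation can be sketched as follows. Note this is a hedged reading, not the paper's formula: the display equation for Γ_i is not reproduced in this copy, so the normalization below (difference of the φ and ϕ counts over the total data volume) is an assumption consistent with the surrounding text.

```python
# Hedged sketch of the Payoff Gamma_i. The exact formula is an
# assumption: strategies at or above the average support count as phi,
# the rest as varphi, and their count difference is normalised by the
# total number of data items generated by all players.

def gamma(strategy_supports, avg_support, total_data):
    n_phi = sum(1 for s in strategy_supports if s >= avg_support)   # phi_j
    n_varphi = len(strategy_supports) - n_phi                        # varphi_j
    if n_phi == 0:          # special case from the text: Gamma_i = 0
        return 0.0
    return (n_phi - n_varphi) / total_data

supports = [0.6, 0.2, 0.5]   # supports of player i's strategies a_1..a_3
avg = 0.4                    # Supp_ave over all items
print(gamma(supports, avg, total_data=12))   # (2 - 1) / 12
```

For fuzzy data, each returned value would simply be paired with the membership degree μ of the corresponding element before defuzzification.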
E: The stopping condition of the game is set by rule E because, in large games, the number of game stages can vary. When equilibrium is achieved, agents are reluctant to change their strategies. Nash equilibrium is the best-known equilibrium concept in game theory: at a Nash equilibrium, any unilateral change in a player's strategy reduces that player's Payoff.

Eliminating the combination of redundant strategies
In this section, we explain how to determine the minsup value and eliminate the strategy combinations that fall below it. The preference relation ≿_i of player i in a strategic game is represented by a Payoff function u_i : A → R (also called a utility function), whose values are called Payoffs (or benefits). When the relation between a player's preferences and the Payoff function representing them is clear, we use ⟨N, (A_i), (≿_i)⟩ instead of ⟨N, (A_i), (u_i)⟩ to define the game components. If we specify an arbitrary minsup value k_{u_i} for the Payoff function, then every strategy combination with u_i < k_{u_i} is eliminated. Therefore, the game components included in the rule extraction process of the present study are defined as ⟨N, (A_i), (u_i ≥ k_{u_i})⟩.
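The pruning step above amounts to a simple threshold filter. In this sketch the combination keys, payoff values, and function name are illustrative stand-ins:

```python
# Sketch of the pruning step: strategy combinations whose utility u_i
# falls below the chosen minsup threshold k_ui are dropped before the
# equilibrium search. All values are illustrative.

def prune(combination_payoffs, k_ui):
    """Keep only strategy combinations with u_i >= k_ui."""
    return {c: u for c, u in combination_payoffs.items() if u >= k_ui}

payoffs = {("a", "b"): 0.30, ("a", "c"): 0.05, ("b", "c"): 0.20}
print(prune(payoffs, k_ui=0.15))   # {('a', 'b'): 0.3, ('b', 'c'): 0.2}
```

Only the surviving combinations enter the 0-1 IP reduction of the next phase, which is what shrinks the equilibrium search space.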

Reducing FNE to 0-1 IP
In this section, we explain how to reduce the problem of FNE, based on the strategy combinations remaining from the previous phase, to the 0-1 IP problem. Nash equilibrium is the most widely used solution concept in game theory.
The concept is a steady-state interpretation of a strategic game: each player has an accurate prediction of the other players' behavior and acts on this rational prediction; in this sense, no attempt is made to examine how this steady state is created. A Nash equilibrium of the strategic game ⟨N, (A_i), (u_i)⟩ is a profile a* ∈ A of moves such that for every player i ∈ N and every a_i ∈ A_i:

u_i(a*_{−i}, a*_i) ≥ u_i(a*_{−i}, a_i).

Therefore, for a* to be a Nash equilibrium, no player i must have a move that yields a better outcome than the selection of a*_i, given that every other player j chooses its equilibrium action a*_j. In short, no player can make a profitable move against the given moves of the others. In some open problems, a reformulation of this definition is also useful: for each a_{−i} ∈ A_{−i}, we define B_i(a_{−i}) as the set of best actions of player i against the actions a_{−i} of the others:

B_i(a_{−i}) = {a_i ∈ A_i : u_i(a_{−i}, a_i) ≥ u_i(a_{−i}, a'_i) for all a'_i ∈ A_i}.

Every finite N-player game has at least one Nash equilibrium point in mixed strategies. Strategies (n*_1, ..., n*_N) with n*_i ∈ M_i for i ∈ N form an equilibrium solution if no player can improve its outcome by deviating unilaterally. Two N-player games with outcome functions a^i_{n_1,n_2,...,n_N} and b^i_{n_1,n_2,...,n_N} are called strategically equivalent if there exist α_i > 0 and numbers β_i for i = 1, 2, ..., N such that b^i = α_i a^i + β_i for all strategy profiles. Nash solution with mixed strategies: the mixed strategies y*_1, y*_2, ..., y*_N with y*_i ∈ Θ_{M_i} for i ∈ N form an equilibrium solution if each y*_i is a best response to the others; such a profile is a Nash equilibrium of the finite N-player game in mixed strategies.
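The definition above can be checked directly for a small two-player bimatrix game: a cell (i, j) is a pure Nash equilibrium iff i is a best response to column j and j is a best response to row i. The payoff matrices below are the classic Prisoner's Dilemma, used purely as an illustration:

```python
# Brute-force pure Nash equilibrium search in a bimatrix game.
A = [[3, 0],    # row player's payoffs
     [5, 1]]
B = [[3, 5],    # column player's payoffs
     [0, 1]]

def pure_nash(A, B):
    eq = []
    for i in range(len(A)):
        for j in range(len(A[0])):
            # i must be a best reply to j, and j a best reply to i
            row_best = all(A[i][j] >= A[k][j] for k in range(len(A)))
            col_best = all(B[i][j] >= B[i][k] for k in range(len(A[0])))
            if row_best and col_best:
                eq.append((i, j))
    return eq

print(pure_nash(A, B))   # [(1, 1)] -- mutual defection is the unique equilibrium
```

This brute force is exponential in the general case, which is exactly why the paper reduces FNE to 0-1 IP and applies a meta-heuristic instead.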
One difficulty with non-cooperative game theory is that there are usually multiple equilibria with different outcome vectors. Another is that, even if there is a single strategic Nash equilibrium, it may not represent a logical response or a predictable Payoff. These problems occur when rule learning is modeled as a non-cooperative game. Therefore, we need an algorithm that detects all Nash equilibria, so that we can evaluate the obtained equilibria logically and conclude that the rules obtained (at the equilibrium points) are valid. Our idea is therefore to change the defaults of the BTA to meet these two goals.

Solve 0-1 IP with BTA
We now solve the 0-1 IP problem from the previous phase with the BTA. The Nash equilibrium problem is NP-hard, but the particular case of finding a Nash equilibrium of a two-player game can be reduced to an NP-complete problem, 0-1 IP, which can be solved with the BTA proposed in the present study. To reduce the Nash equilibrium problem to the 0-1 IP problem, it suffices to adopt the following conditions. A mixed-strategy profile (x*, y*) of the bimatrix game (A, B) is a Nash equilibrium solution if and only if (x*, y*, p*, q*) solves the following problem, which under the integrality conditions becomes a 0-1 IP problem:

max_{x,y,p,q}  x A yᵀ + x B yᵀ − p − q
subject to  A yᵀ ≤ p e_m,
            Bᵀ xᵀ ≤ q e_n,
            x_i = 0 or 1, for all i.

Thus, the Nash equilibrium problem can readily be reduced to a 0-1 IP problem. As already pointed out, the present study uses the BTA for solving this 0-1 IP problem; the BTA's authors note that, with modifications, the algorithm can also be used for continuous functions. The algorithm has a particular abstraction involving buses, stations, and passengers, and to apply it to any search problem, the buses, stations, and passengers must first be identified. The main idea of the method is that the BTA proceeds by loading and unloading passengers until the optimal answer is achieved. Stations are auxiliary memories that simplify and organize the problem at each phase and, as mentioned before, provide an opportunity to evaluate the answers obtained. Passengers (the zero-one variables) move among the stations to reach the desired stations (optimized answers); this movement is based on learning that oversees how individuals and groups function. The variables of the problem are intelligent travelers: as intelligent agents, they have individual intelligence and try to optimize the problem's objective function in interaction with the other agents, creating collective intelligence. Although the number of stations can be unlimited, each station belongs to one of a set of station groups, and each group contains at least two stations:
1. A station where the variable's value becomes zero and it is loaded on the next bus.
2. A station where the variable's value becomes one and it is loaded on the next bus.
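The reduction can be illustrated on a tiny game. For one-hot (pure) strategy vectors, the objective x A yᵀ + x B yᵀ − p − q is zero exactly at an equilibrium, where p and q are the best-response payoffs against y and x. This is a hedged sketch of the idea, reusing the Prisoner's Dilemma matrices as illustration:

```python
# For one-hot x, y picking row i and column j, the 0-1 IP objective
# x A y' + x B y' - p - q equals A[i][j] + B[i][j] - p - q, where p and q
# are the best-reply values. It is <= 0 everywhere and 0 only at equilibria.

A = [[3, 0], [5, 1]]      # row player
B = [[3, 5], [0, 1]]      # column player

def objective(i, j):
    p = max(A[k][j] for k in range(2))   # best reply value against column j
    q = max(B[i][k] for k in range(2))   # best reply value against row i
    return A[i][j] + B[i][j] - p - q

print([(i, j, objective(i, j)) for i in range(2) for j in range(2)])
# objective is 0 only at (1, 1), the game's Nash equilibrium
```

Maximizing this objective over the binary variables therefore drives the search toward the equilibrium cells, which is what the BTA is asked to do in the proposed method.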
Each bus is a processing system that contains several variables and transports them among stations, so increasing the number of buses means running separate bus algorithms, which can be implemented with parallel programming. The number of stations, each agent's (variable's) memory, and the collective memory are fixed. In the BTA, a heuristic function is added to the variables' intelligence so that the variables converge to a local optimum more quickly; a random choice between using the heuristic function and using the best solutions found in the group experience matrix is proposed to avoid local optima. The proposed heuristic function for each variable j uses two coefficients, α and β, intelligent numbers in the interval [0, 1] determined as the search moves toward a local answer; since the function's values lie in the interval [−1, 1], a further transformation is applied to map F to a vector with values in the interval [0, 1].
First, to initialize the variables based on the heuristic function, the more extensive variables than 1 2 must be chosen and set with 1; if 10 percent of the variables do not gain the value 1, then more extensive variables than 1 3 will have a value of 1.This procedure continues to reach a limit of 1 t , where t is a constant number.This process helps us solve the problem and reduce repetition by removing variables, i.e., load/unload actions.The more feasible solution is, the more the constraints are satisfied, and so we can increase the significance of the objective function in the heuristic function by increasing.If we fail to find a feasible solution in our efforts, increasing the importance of the restrictions will increase β value.The value of β must be increased linearly along with a factor of 10% until it reaches the value above 1.If the value of 1 is achieved, we get the last value with a random number weighted average of 1 to 2. To improve the heuristic function, as the increment of one of these variables increases, the second variable decreases correspondingly.For each acceptable answer, a constant neighborhood radius also is considered for changing some other variables' value.If there are a consistent number of acceptable responses, the experience matrix becomes more potent because the possibility of other feasible solutions in the neighborhood of existing feasible solutions is high.The position improvement algorithm is used to find the neighborhood radius.The algorithm can terminate in two ways: 1.If all the variables go to the last stations, the feasible solution is sufficiently investigated, and the maximum recovery has also occurred with a very high probability.2. 
At the algorithm's input, a number ζ specifies the maximum number of load/unload actions. If ζ is exceeded during the execution of the algorithm, the algorithm stops. Exceeding ζ means that the algorithm is stuck in a local optimum and the answer cannot be improved by changing a fixed number of variables. In such cases, the best course is to restart the algorithm. Parallel implementation of the algorithm is another option, because different local optima will be generated and the global optimum can be sought among them.
Flowchart 2 shows how rules are extracted using the game-theoretic abstraction and the Nash equilibrium concept when reduced to a 0-1 IP problem at each phase. The flowchart covers the process of converting items into strategies, calculating the related Payoffs, and then computing the Nash equilibrium after reducing it to the 0-1 IP problem; finally, the reduced problem is solved by the BTA. Algorithms 1 and 2 are pseudocodes for the rule-extraction process by abstracting game theory and solving the FNE with BTA. The output of the algorithm is a 0-1 vector: 1 indicates the presence of a strategy (data) in the extracted rules, and 0 indicates its absence.
Algorithm 1 Extracting rules by abstracting game theory and solving Nash equilibrium with BTA.

Computational complexity of the proposed method
Researchers showed that the complexity of Nash equilibrium falls into the PPAD class (Daskalakis et al. 2006). The PPAD class was first introduced by Papadimitriou (2015). PPAD is a subclass of TFNP, which is itself a subclass of FNP. The TFNP class guarantees that a solution definitely exists for an FNP problem. Chen and Deng showed that, since at least one Nash equilibrium definitely exists in a strategic game, the complexity of finding a Nash equilibrium falls into a class that always has at least one correct answer. The PPAD computational complexity class is expressed as follows: Definition 1 A binary relation P(x, y) is in TFNP if and only if there is a deterministic polynomial-time algorithm that can decide P(x, y) for any given x and y, and for each x there exists at least one y such that P(x, y) holds. In the PPAD subclass there is the additional guarantee that P(x, y) can be defined as a directed graph.
Given that in the proposed method the FNE problem is solved by BTA, a meta-heuristic algorithm whose complexity depends on the sensitivity required for the extracted rules, it is impossible to measure the computational complexity for this part of the method, but it is possible for the other parts. After calculating the Support of all items through a linear search of order O(n), the items whose Support is less than the average value can be omitted. If we assume that, in the worst case, no item is omitted, another search of order O(n + n log n) is needed to calculate the Payoffs, because each item i must be evaluated in the two modes Supp_avr ≤ Γ(Item_i) and Supp_avr > Γ(Item_i). Therefore, the computational complexity of the present method is O(n + n log n), in addition to the computational complexity of finding the Nash equilibrium with BTA.
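The O(n) support pass and below-average pruning described above can be sketched as follows; `prune_by_support` is a hypothetical helper name, and support is computed in the usual way as the fraction of transactions containing the item.

```python
from collections import Counter

def prune_by_support(transactions):
    """Compute each item's support in a single pass over the
    transactions (O(n) in the total item occurrences), then drop
    the items whose support falls below the average support."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    support = {item: c / n for item, c in counts.items()}
    avg = sum(support.values()) / len(support)
    # Keep only items at or above the average support value.
    return {item: s for item, s in support.items() if s >= avg}

db = [{"A", "B"}, {"A"}, {"B", "C"}, {"A", "C"}]
print(prune_by_support(db))  # only items with above-average support survive
```

Sorting the surviving items by support, if needed for the Payoff calculation, adds the O(n log n) term quoted in the text.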

Implementation and results
For the simulation, we considered 30 hypothetical transactions of a hypothetical store. The store stocks ten items, and each transaction randomly contains some of the items purchased by a hypothetical customer. Each customer can buy from 1 to 10 items. Therefore, these 30 transactions vary in the number and variety of items purchased.
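A transaction database of this shape can be generated as follows; the function name, item labels, and seed are illustrative assumptions, not the paper's actual data (which is listed in Table 2).

```python
import random

def make_transactions(n_transactions=30, n_items=10, seed=0):
    """Generate hypothetical store transactions: each customer buys
    a random subset of between 1 and n_items distinct items."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    items = [f"I{j}" for j in range(1, n_items + 1)]
    return [set(rng.sample(items, rng.randint(1, n_items)))
            for _ in range(n_transactions)]

db = make_transactions()
print(len(db), min(len(t) for t in db), max(len(t) for t in db))
```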
The goal is to find rules with 60% Confidence. Thus, a noncooperative game with 30 players is defined; that is, we have P = {P1, P2, . . ., P30}. The extensive form of the game tree is defined based on Eqs. 3, 4, and 5. Due to space constraints, Figure 3 shows only part of this tree.

Implementation
Given that in the proposed example, for the simulation, σ = σ_Allowed, the matrix θ_σ has size |P| × 10. We said that Γ_i(Υ_χ) is the Payoff value of P_i at node χ. The function Γ_i has q free parameters, where q ≤ 10, and thus the parameter matrix θ_parameter of size |P| × q is θ_parameter = [θ_parameter(i, j)]_{|P| × q}.
If, in Eq. 16, N = 30 and M = 10 in the continuous state, the goal is to find the answer to Eq. 17; in the form reduced to the 0-1 IP, the goal is to find the solutions to Eq. 18. Table 2 shows the hypothetical database of the present study.

Fig. 3 Part of the game tree
Each player's strategy is a vector A = (a_1, a_2, . . ., a_k), where position j corresponds to item I_j: if the item is not selected by the player (transaction), then a_j = 0, and otherwise a_j = 1. Therefore, the number of elements in the strategy set is the same for all players; i.e., if |A_i| denotes the number of elements in the set A for player i, then |A_i| = k, where k, as previously described, is the maximum number of data generated in the game, here k = 10. In other words, the data sources, i.e., the game players (in our hypothetical example, the transactions), make up the strategies. Equation 10 is used to calculate the game Payoff for each player i as Γ_i, giving the game Payoff vector Γ = (Γ_1, Γ_2, . . ., Γ_n). Table 3 shows the Support value of each item (data) for the hypothetical example of the present study, and Table 4 shows the strategy set A_i = (a_1, a_2, . . ., a_k) for each player i together with Γ_i based on Eq. 10.
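The fixed-length 0/1 encoding of a player's strategy described above can be sketched directly; `strategy_vector` is a hypothetical helper name.

```python
def strategy_vector(transaction, items):
    """Encode a player's (transaction's) strategy as a fixed-length
    0/1 vector: position j is 1 iff item j was selected."""
    return [1 if item in transaction else 0 for item in items]

items = [f"I{j}" for j in range(1, 11)]
print(strategy_vector({"I1", "I4"}, items))  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Because every vector has the same length k = 10, all players' strategy sets have the same number of elements, exactly as the text requires.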
Eliminating players who do not have useful strategies for extracting rules is a policy based on the Payoff value. The policy varies with the type of problem and has a significant impact on the computational complexity of the method. For example, in Table 4, if the rule-selection policy is 0.1 ≤ Γ_i, then the rules are extracted from the set illustrated in Table 5, so the search space is substantially reduced.
This reduction in complexity is more pronounced when players with the same strategies are eliminated (Table 6).
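The two pruning steps, dropping players below the Payoff threshold (Table 5) and collapsing players with identical strategies (Table 6), can be sketched together; the function name and the default threshold 0.1 follow the example's policy 0.1 ≤ Γ_i.

```python
def candidate_players(strategies, payoffs, gamma_min=0.1):
    """Keep only players whose Payoff reaches the policy threshold,
    then collapse players with identical strategy sets."""
    # Threshold filter: discard players with Payoff below gamma_min.
    kept = [(tuple(sorted(s)), g) for s, g in zip(strategies, payoffs)
            if g >= gamma_min]
    # Deduplicate identical strategy sets, keeping the first occurrence.
    seen, unique = set(), []
    for s, g in kept:
        if s not in seen:
            seen.add(s)
            unique.append((s, g))
    return unique

strategies = [["I1", "I2"], ["I2", "I1"], ["I3"], ["I4"]]
payoffs = [0.3, 0.25, 0.05, 0.2]
print(candidate_players(strategies, payoffs))
```

Both steps shrink the search space the BTA must explore, which is exactly where the complexity reduction claimed above comes from.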

Results
Experimental results showed that the more appropriate the stations' initialization based on the probability distribution is, the faster the BTA can find the equilibrium points. An appropriate probability distribution means selecting the range of x_i based on each strategy's Support value. Also, each extracted major rule can be ranked relative to the other rules' Confidence values, and only a subset of the higher-ranked major rules need be selected as final rules; this, too, reduces the computational complexity of the method. Equilibrium points yield the major rules; each subset of a major rule is a rule with a Confidence value above the specified threshold (60% in our example). Tables 7 and 8 show the extracted major rules and sub-rules, together with the Confidence value of each rule, for the hypothetical example. In Table 7, the rank of each major rule specifies that in Table 8 only the subsets of the first-ranked major rule are selected as final rules, because the second extracted rule has zero Confidence and no validity. Experimental results over 100 epochs showed that BTA, in the worst case, could find the equilibrium points in only seven epochs (Table 7); this corresponds to the extracted major rule, and each subset of this major rule is a final extracted rule. Table 8 shows the extracted sub-rules and their Confidence values. By using the game-theoretic abstraction, we reduced the computational complexity. When facing processing constraints, a wider Payoff range leads to higher computational complexity, so controlling the computational complexity is worthwhile. Figure 4 shows the BTA convergence rate in reaching the optimal solution and finding the equilibrium points. The figure shows the top three executions of the algorithm. In all three executions, BTA needed at most seven epochs to deliver the answer. Each epoch gradually took less time, finally reaching the optimal answer. The best performance (blue curve) converged to the answer after only five epochs.
Table 9 shows the fuzzy generated data. This table is the fuzzy mode of Table 3, in which each datum has a membership degree. With fuzzy data, for each strategy combination and the value of Γ_i, the membership μ_{A_i}(a_v) is also defined, which can determine the value k, where k is the number of strategies used by a player.

Fig. 4 Top 3 BTA performances in terms of convergence speed to the optimal solution (finding Nash equilibrium points)

Figure 5 shows the BTA convergence rate in reaching the optimal solution and finding the Nash equilibrium points when the data are fuzzy. The figure shows the top three executions of the algorithm. In all three executions, BTA needed at most 12 epochs to reach the answer. Each epoch gradually took less time, finally reaching the optimal answer. The best performance (blue curve) converged to the answer after only nine epochs.
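The paper does not spell out its fuzzy support formula, but a common generalization, which this sketch assumes, replaces the crisp 0/1 count with the sum of membership degrees; `fuzzy_support` and the dict-of-memberships representation are illustrative.

```python
def fuzzy_support(transactions, item):
    """Fuzzy support of an item: each transaction contributes its
    membership degree mu in [0, 1] instead of a crisp 0/1 count.

    Each transaction is a dict mapping item -> membership degree;
    absent items contribute 0.
    """
    return sum(t.get(item, 0.0) for t in transactions) / len(transactions)

fuzzy_db = [{"A": 0.8}, {"A": 0.4}, {}]
print(fuzzy_support(fuzzy_db, "A"))
```

With all membership degrees equal to 1, this reduces to the crisp support of Table 3, so the rest of the pipeline (Payoff calculation, pruning, BTA) carries over unchanged.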
Table 10 shows a comparative analysis of the proposed method against the improved Apriori and FP-growth algorithms. Generalization to fuzzy data, reduced computational complexity, intelligent control of the algorithm parameters, and control of the search space for big data are the factors behind the superiority of the proposed game-theory-based abstraction for rule learning.

Comparison with other rule extraction methods
To evaluate the performance of a model, two methods are usually used. The accuracy and precision of the extracted rules are measured with the precision and recall indexes, obtained from the following relations:

precision = tp / (tp + fp), recall = tp / (tp + fn)

in which tp is the number of correctly detected positives (true positives), fp is the number of incorrectly detected positives (false positives), fn is the number of incorrectly missed positives (false negatives), and tn is the number of correctly rejected negatives (true negatives).
The f-measure criterion, which is a balanced combination of the precision and recall measures, can be used in cases where the costs of false positives and false negatives differ. If these costs are approximately equal, the accuracy criterion alone can be used. The f-measure is calculated by the following equation:

f-measure = 2 × precision × recall / (precision + recall)

Table 11 shows the comparison of testing accuracy with fivefold cross-validation on the hypothetical data used in the present study. To compare the accuracy criterion of the present method with other methods, we ran 30, 60, 90, 500, 1200, 1800, 3200, and 5400 hypothetical random transactions through all four methods; the results for the accuracy criterion are shown in Table 11. The results indicate a significant superiority of the present method over the methods selected for comparison. Chart 6 shows the accuracy stability of the decision tree, ERENN-MHL, and LVI-FLCL methods and the present method. As the chart shows, the present method has better stability in terms of the accuracy criterion than the other compared methods.
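The three evaluation measures above are standard and can be computed directly from the confusion-matrix counts; the function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of detected positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# With 8 true positives, 2 false positives, 2 false negatives:
# precision = recall = 0.8, so the f-measure is also 0.8.
print(f_measure(8, 2, 2))
```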
Table 12 shows the average precision for fivefold cross-validation on the hypothetical data used in the present study. To compare the precision criterion of the present method with other methods, we ran 30, 60, 90, 500, 1200, 1800, 3200, and 5400 hypothetical random transactions through all four methods. The results for the precision criterion are shown in Table 12 and indicate a significant superiority of the present method over the compared methods. Chart 7 shows the precision stability of the decision tree, ERENN-MHL, and LVI-FLCL methods and the present method. As the chart shows, the present method has better stability in terms of the precision criterion than the other compared methods (Figs. 6 and 7).
Table 13 shows the average recall for fivefold cross-validation on the hypothetical data used in the present study. To compare the recall criterion of the present method with other methods, we ran 30, 60, 90, 500, 1200, 1800, 3200, and 5400 hypothetical random transactions through all four methods; the results for the recall criterion are shown in Table 13 and indicate a significant superiority of the present method over the compared methods. Chart 8 shows the recall stability of the decision tree, ERENN-MHL, and LVI-FLCL methods and the present method. As the chart shows, the present method has better stability in terms of the recall criterion than the other compared methods (Figs. 8 and 9).
Table 14 shows the average f-measure for fivefold cross-validation on the hypothetical data used in the present study. To compare the f-measure criterion of the present method with other methods, we ran 30, 60, 90, 500, 1200, 1800, 3200, and 5400 hypothetical random transactions through all four methods. The results for the f-measure criterion are shown in Table 14 and indicate a significant superiority of the present method over the compared methods. Chart 9 shows the f-measure stability of the decision tree, ERENN-MHL, and LVI-FLCL methods and the present method. As the chart shows, the present method has better stability in terms of the f-measure criterion than the other compared methods.

Discussion and future work
In the present study, we seek a method with reduced computational cost that yields more optimal and reliable extracted rules than other methods, by recasting the problem of rule extraction in a rule learning system as a problem in game theory. The main idea is to find super rules that cover the most frequent rules and to model them with game theory, so that the super rules can be obtained through the concept of Nash equilibrium. Abstraction based on game theory has been a focus of data mining activities, including rule learning, for the past two decades. For example, a data mining activity can be modeled as a "non-collaborative game" performed by multiple users. The basic elements of a game are the players, the actions, the payoff of each action, and the information. If no player deviates from their strategy, a Nash equilibrium is achieved. Recently, this kind of mathematical modeling based on game theory has become very popular in data mining, with new methods proposed for data clustering, data classification, data pattern extraction, and data prediction. Game theory has also been used to improve the Apriori and FP-growth algorithms, as well as other frequent-pattern algorithms that extract rules relying on previous information. Of course, such use of game theory builds on the initial abstraction of those algorithms; hence, applying game theory to rule extraction there means hybrid algorithms that embed game theory inside algorithms whose basis has nothing to do with the game-theoretic abstraction. Early methods for learning rules to predict data are those that introduce Nash learning. These methods are based on a probabilistic abstraction. In methods with a probabilistic abstraction, deterministic Nash equilibrium is strongly rejected because of the probabilistic basis of rational behavior; instead, they extract the rules of a game by distributing probabilities over the actions within a hypothetical game, and this probability distribution is updated with each round of play. After these studies, methods based on game theory were proposed to extract association rules. In these studies, the action of each agent in a game is assumed to follow predetermined, defined strategies. The set of actions of each agent against another agent is completely specified, and each agent's action can influence the actions of other agents. Finally, the set of association rules of the studied system is extracted from the effective actions of the different agents. Because these methods presuppose deterministic rather than merely rational behavior, they can consider the Nash equilibrium as the answer of a game designed to learn data-prediction rules. But, as mentioned before, a deterministic Nash equilibrium cannot easily be found for rational behavior with a probabilistic basis, especially when modeling data learning rules: the game space has a wide range of strategies with different probability distributions, and this distribution is updated at each round of the game, so the Nash equilibrium in these systems falls into the TFNP computational complexity class. The Nash equilibrium can also be reduced to the 0-1 IP problem, so we use the BTA, which is designed to solve the 0-1 IP problem, to approximately solve the Nash equilibrium problem in the data-learning-rules game and extract the learning rules. In the present method, the goal is to find rules that cover most repetitive instances. This idea can be modeled as a non-collaborative game, and through the concept of Nash equilibrium a way can be found to systematically derive the rules for repetitive instances. In other words, the equilibrium point of the designed game is where the search for the considered rules ends, and because finding the equilibrium point resides in the TFNP complexity class, we use the BTA to find these points, after the Nash equilibrium problem has been reduced to a 0-1 IP.
In the present study, using the abstraction of game theory, we addressed the problem of extracting the association rules used in rule learning systems. We turned the problem into a noncooperative game with N players, each having a limited set of strategies. Each player is, in fact, a data source, and the data generated by that source play the role of the player's strategies. We introduced Eq. 10 to estimate each player's Payoff; the equation's adequacy for calculating each player's Payoff can be proved by induction. We looked for the game's Nash equilibrium points, because the equilibrium points are the extracted major rules, whose related subsets have at least the Confidence of the major rules. We reduced the game's FNE problem to a 0-1 IP problem and solved it with the BTA. After abstracting and simulating the problem, the results can be summarized as follows: 1. The computational complexity of the present method is lower than that of the conventional optimized rule-extraction algorithms, and the amount of computational complexity can be controlled intelligently. 2. Equation 10 is the key equation of the game-theoretic abstraction and determines a player's Payoff based on the strategies used; therefore, the adequacy of this relation is essential and provable. 3. BTA was able to solve the FNE after it was turned into a 0-1 IP. 4. Despite the increased computational complexity, the proposed method remains reliable when the data are fuzzy.
Although we obtained a method that is efficient in terms of computational complexity and rule validity by abstracting the rule-extraction problem into game theory, the following can be considered limitations of the present study and perspectives for future research: 1. The Payoff function (formula 10) introduced in the present study is a fundamental function which, if the value of minsup is not defined correctly, cannot correctly identify the items that should participate in the final rules. Solving this problem of the introduced function can be a topic for future research. 2. Although the BTA performs well in finding the Nash equilibrium after the problem is reduced to a 0-1 IP problem, like any other meta-heuristic algorithm it cannot guarantee the best responses to the FNE problem. As a suggestion for future research, approximate algorithms, especially algorithms based on computational geometry, could provide guaranteed, reliable approximations for FNE.

Fig. 1
Fig. 1 Phases of rule extraction in a rule learning system using game theory and BTA

...we call the game finite. 4. Define the set C of outcomes together with a function g : A → C that relates action profiles to their outcomes, and define a priority relation ≿*_i on C. The priority relation ≿_i for each player i is defined by a ≿_i b if and only if g(a) ≿*_i g(b).

Fig. 2
Fig. 2 Extracting rules with the proposed method

Fig. 5
Fig. 5 Top 3 BTA performances in terms of convergence speed to the optimal solution (finding Nash equilibrium points)

Table 1
Outstanding researches on rule learning from the beginning up to now

1. Stationary Station Group (SSG): the variables in this group possess certain (definite) values.
2. Short-Term Station Group (STSG): the variables of this group do not have specific values. Variables entering this station remain in the problem, while the other variables stay out of the problem.
3. Medium-Term Station Group (MTSG): this group includes variables that reside somewhere between the indefinite and approximately definite states. These variables either return to the short-term stations or go to the long-term stations via buses.
4. Long-Stage Station Group (LSSG): in this group, the variables are in a balanced state to some extent, and the accuracy of the variables is probably high. Even so, they might return to the middle stations based on future experiences.

Table 2
The hypothetical database used in the present study

Table 3
Support percentage of each item (data) and average support

Table 4
Strategy set for each player and Payoff of each strategy set

Table 5
Players with candidate strategies for extracting rules based on Payoff value (0.1 ≤ Γ i )

Table 7
Finding Nash equilibrium points by BTA (extracting major rules, the confidence value, and related ranks)

Table 10
Comparative analysis among the proposed approach, Apriori, and FP-growth

To evaluate the model based on the first procedure, we rely on the data that have been observed and used to build the model, a procedure called cross-validation. We expect the model to have the least sum of squared errors compared to any other given model. The training data are randomly partitioned into k sub-samples (folds) of equal size; in each stage of CV, k − 1 of these folds can be considered as training data while the remaining fold is used for testing.
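The fivefold cross-validation partitioning described above can be sketched as follows; `kfold_indices` is a hypothetical helper, and a round-robin assignment stands in for the random shuffling.

```python
def kfold_indices(n, k=5):
    """Partition n sample indices into k folds of (nearly) equal size;
    at each CV stage, k-1 folds train the model and one fold tests it."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)  # round-robin assignment to folds
    return folds

# With the 30 hypothetical transactions and k = 5, each fold holds 6.
folds = kfold_indices(30, 5)
print([len(f) for f in folds])
```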

Table 11
Comparison of testing accuracies with fivefold cross-validation results

Stability of the f-measure of decision tree, ERENN-MHL, LVIFLCL methods, and the present study's method