Improved Fuzzy Cognitive Maps for Gene Regulatory Networks Inference Based on Time Series Data

With the recent advancement of high-throughput sequencing, gene regulatory network inference has become an important subject in bioinformatics and systems biology. The field faces many challenges, such as noisy data, uncertainty, time-series data with many genes and few samples, and high time complexity. In recent years, many research works have tackled these challenges, resulting in different methods for gene regulatory network inference. A number of models have been used for gene regulatory networks, including Boolean networks, Bayesian networks, Markov models, relational networks, state space models, differential equation models and artificial neural networks. In this paper, fuzzy cognitive maps are used to model gene regulatory networks because of their dynamic nature and their learning capabilities for handling non-linearity and inherent uncertainty. Fuzzy cognitive maps belong to the family of recurrent networks and are well-suited for gene regulatory networks. In this study, Kalman filtered compressed sensing is used to infer the fuzzy cognitive map of the gene regulatory network. This approach combines the advantages of compressed sensing and Kalman filters, which gives robustness to noise and allows sparse gene regulatory networks to be learned from data with many genes and few samples. In the proposed method, stream data and prior knowledge can be used in the inference process. Compressed sensing finds the likely edges and the Kalman filter estimates their weights. The proposed approach uses a novel method to decrease the noise of the data. The proposed method is compared to CSFCM, LASSOFCM, KFRegular, ABC, RCGA, ICLA, and CMI2NI. The results show that the


Introduction
In recent years, the extraction of interactions between genes has become one of the main goals of systems biology. Network modeling is very useful for this purpose. Various networks are used, including metabolic networks, Protein-Protein Interaction (PPI) networks, and Gene Regulatory Networks (GRNs) [1]. Gene regulatory networks represent interactions between genes at the genome level. Information from these networks is beneficial in developing new therapeutic approaches and in understanding how living organisms function at the molecular level. These networks consist of nodes and edges, with each node representing a gene and each edge reflecting a causal link between two genes. In other words, gene regulatory networks are networks of regulatory and causal interactions between genes. These networks are sparsely connected, large, dynamic, recurrent, and nonlinear. The purpose of modeling such a network is to simulate the effects of genes on each other [1,2].
The inference of gene regulatory networks is the modeling process that best mimics the behavior of the gene regulation network based on gene expression data. The data used is a time series gene expression matrix, in which columns and rows represent genes and time points, respectively. In this data, the number of time points is much smaller than the number of genes, which is usually due to the high cost of experiments. Therefore, it is crucial to use a method that can properly estimate the system from such data [3,4]. The inference of gene regulatory networks faces many challenges. Gene expression data includes a large number of genes and a small number of samples, which makes the learning process more difficult and less accurate. Gene expression data is uncertain and generally noisy. The number of parameters to be estimated grows rapidly with the number of genes. Causal relationships between genes are nonlinear, and the learning method must be able to capture such relationships as well [3]. To address these challenges, several computational methods have been proposed to infer the gene regulatory network, which fall into four classes: co-expression-based methods [5], supervised learning-based methods [6], model-based methods [7] and information theory-based methods [8].
Co-expression-based methods have low computational complexity, but cannot extract the direction of interactions or the system dynamics. Supervised learning-based approaches such as SIRENE [6] and GENIES [9] use known regulations in genomic data to infer gene regulation networks. In [10], a supervised learning method based on a support vector machine is proposed to infer gene regulation networks. In this method, various features are extracted based on the distance graph profile of genes. The edge between each gene pair is then checked using a support vector machine. The accuracy and efficiency of these methods exceed those of other methods; however, the number of datasets that provide sufficient information about known regulations is very limited, which restricts their use.
On the other hand, model-based methods cover a wide range of gene regulation network inference methods, such as ordinary differential equations [11], linear regression [12], linear programming [13], Boolean networks [14] and probabilistic graphical models, including Bayesian networks [15][16][17] and graphical Gaussian models [18,19]. Model-based approaches provide an in-depth understanding of the behavior of systems at the network level. Moreover, these methods can infer the direction of interactions in the gene regulation network, although they have high computational complexity and are difficult to use in large-scale networks. Information theory-based methods can identify nonlinear relationships between genes and can be used for large-scale networks. However, they cannot detect the direction of regulatory interactions, which leads to false positive predictions [20]. Two of the best methods in this field are MRNET and ARACNE [21,22]. These methods use mutual information (MI) between gene expression profiles and a feature selection method (Minimum Redundancy Maximum Relevance, MRMR) to infer the interactions between genes. More precisely, these methods consider each gene (j) as the target and the other genes (V) as its regulators. The MI is calculated between the target gene and the regulators, and the MRMR method is then applied to select the best subset of regulators.
In [23], the GRNTE method is proposed. In this method, the interactions between genes are calculated based on partial mutual information. In [24], the maximum information coefficient is used to infer the gene regulatory network. In this method, an undirected network is first built based on this criterion. The directions of the edges between regulators and targets are then identified using the mean conditional entropies of each pair of nodes (genes).
Among model-based methods, regression methods have also been considered. The GENIE3 method is one of the most popular methods based on decision tree models. It is similar to the MRNET method in that one gene plays the target role while the other genes play a regulatory role, and it makes use of a feature selection method. Unlike the MRNET method, random forests and Extra-Trees are used for regression and feature selection [25]. A set of decision trees is created for each target gene. The genes used in the decision trees are ranked, and low-ranked genes are removed based on a threshold. Finally, the remaining genes are considered as regulators. Because of the high efficiency of this method, it has been used as a basis for many other methods [26][27][28][29]. The Jump3 method is a modified version of GENIE3 for time series data [26]. In [28], a rotation forest-based method is proposed in order to accurately infer the interactions between genes. The use of a rotation forest, unlike a random forest, allows the best features to be extracted from gene expression. The GRNBoost2 method is an efficient algorithm that infers the gene regulation network using gradient boosting, based on the GENIE3 architecture. In this method, the Arboreto computational framework is used to make the inference of gene regulation networks scalable [29].
One of the effective model-based methods is the Bayesian network. Bayesian networks offer several advantages for inferring the structure of gene regulation networks. Because of their probabilistic nature, they can still perform well on missing and noisy data, and the probabilistic relations are easy to interpret in graphical form. In addition, Bayesian networks can avoid over-fitting when the amount of learning data is not large. However, there are two problems in using Bayesian methods to infer the gene regulation network: finding the optimal graph, and their inadequacy for large-scale networks. One solution is to decompose the large network into smaller networks. The use of local Bayesian network (LBN) methods has been considered in [30]. In [31], the Bayesian model averaging for linear regression (BMALR) method is used to infer the interactions of biological systems. This method uses a new way to calculate the posterior probability of edges from regulators to the target gene, based on Bayesian model averaging and linear regression. The BMALR method applies linear regression to the target gene and all possible combinations of other genes. The final score of the edge from the parent node to the target node is based on the sum of the posterior probabilities of the linear regression models. In [32], a combined method based on BMALR and clustering is proposed. In this method, a prior knowledge matrix is extracted using clustering. Then, a gene regulation network is created using the Bayesian averaging method and the extracted knowledge matrix. Expectation Propagation (EP) for Bayesian network learning is suitable for large-scale networks. Narimani et al. have used this method to learn the Bayesian network and infer the gene regulation network [33].
A Bayesian method called BGRMI has been proposed to infer the gene regulation network from time series data [34]. In this method, a Bayesian framework is used to calculate the probability of different models of the gene regulation network. It then uses an exploratory search method to navigate the model space and find effective models. In [35], a Bayesian network-based method using a Gaussian process, called BINGO, is proposed to overcome the small number of data samples, the large number of features in gene expression data, and the presence of noise. This method infers the gene regulation network using a non-parametric approach that includes statistical sampling of gene expression profiles. The results show that the method has acceptable efficiency. In [36], the gene regulation network is built from noisy time series data based on a probabilistic Petri net. In order to infer the gene regulation network from noisy time series data and to reduce false positives, a method based on the Kalman filter and Bayesian networks is proposed in [37,38]. In this method, the Bayesian network parameters are estimated using the Kalman filter. The edge space of the resulting graph is also pruned using conditional mutual information. In [39], fuzzy cognitive maps (FCMs) have been used to construct the gene regulation network. In this method, large networks are created using mutual information and FCM. Fuzzy cognitive maps perform well in identifying the relationships and the weights between genes, but they face problems in large networks because learning the map at a large scale requires time-consuming algorithms. To solve this problem, evolutionary optimization methods [40] and a multi-objective genetic algorithm [41] have been proposed. In [42], a hybrid method combining a multi-objective genetic algorithm and random forest based on FCM has been used to infer the gene regulation network.
In this method, FCM is first used to build the network. Then, a multi-objective genetic algorithm is used to identify the interactions. Finally, using random forest, candidate genes are identified for each target gene.
In this paper, Kalman filtering and compressed sensing are used to learn the gene regulation network in the form of a fuzzy cognitive map. Compressed sensing is a new optimization method for the reconstruction of sparse signals, which has become widely used in signal and image processing. By assuming sparsity, this algorithm can effectively reconstruct a network with sparse interactions. Unfortunately, it is sensitive to noise and does not produce good results for time series data that are inherently noisy. The Kalman filter is a method that extracts the system state from a series of samples that may contain noise. Although the Kalman filter is robust to noise, it does not enforce sparsity in the estimated network, which reduces the accuracy of the model. Thus, we expect that a combination of compressed sensing and Kalman filtering provides a good way to learn the gene regulation network using a fuzzy cognitive map.

Fuzzy cognitive map
The fuzzy cognitive map is a robust model for dynamic, non-deterministic and nonlinear systems, which makes it a suitable platform for modeling gene regulation networks. In this network, the neurons are known as concepts, and the weights, usually numbers in the range [-1,1], represent the causal relations between the concepts. The activity value of a concept is a number in the range [0,1] [45]. The fuzzy cognitive map is a weighted directed graph, represented by a tuple (V, D, W, f), where V is a set of n concept nodes ($V = \{v_1, v_2, \ldots, v_n\}$), and the state values of these nodes are denoted as a vector $C = \{c_1, c_2, \ldots, c_n\}$, where $c_i \in [0,1]$, $i = 1, \ldots, n$. The relationships between genes are defined as an n×n weight matrix W, and f is a state transfer function used to estimate the state values of the concepts at the next time point based on the current state of the network [45]. The states are updated based on the following equation, in which $w_{ji}$ is the weight of the edge from concept j to concept i:
$$c_i(t+1) = f\Big(\sum_{j=1}^{n} w_{ji}\, c_j(t)\Big) \qquad (1)$$
There are various transfer functions that can be used. A comparison study suggests that sigmoid transfer functions generally outperform the others [45]. Thus, the following sigmoid transfer function with steepness parameter λ is employed:
$$f(x) = \frac{1}{1 + e^{-\lambda x}} \qquad (2)$$
The purpose of learning an FCM is to find weights such that the learned FCM reproduces the observed data closely. The problem is that the number of weights to be learned is a quadratic function of the number of concepts. For gene regulation networks, where the number of genes is high, the number of weights that must be calculated is extremely high. Therefore, powerful learning algorithms are needed to handle these large-scale problems. The most important algorithms for FCM learning are classified into three types on the basis of the type of knowledge used: adaptive methods, population-based methods and hybrid methods [45]. In adaptive methods, the weight matrix is determined by algorithms that start from expert knowledge, such as differential Hebbian learning, the balanced differential algorithm, active Hebbian learning, nonlinear Hebbian learning, and data-driven nonlinear Hebbian learning. All of these algorithms update the initial model weights in an iterative process, using the existing data and the learning formulas defined in [45]. In population-based methods such as Particle Swarm Optimization, Simulated Annealing, Real-Coded Genetic Algorithm, Tabu Search and Ant Colony Optimization, the weight matrix is calculated from learning data. The main purpose of these algorithms is to remove the need for experts when learning fuzzy cognitive maps. In hybrid methods, expert knowledge and learning data are used together. When both sources are available, hybrid methods can exploit the advantages of both approaches [45].
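As an illustration of how an FCM evolves over time, the following minimal sketch applies a sigmoid transfer function to the weighted sum of incoming concept states. It assumes the common convention that `weights[j][i]` is the influence of concept j on concept i; the three-gene network, its weights and the function names are hypothetical and serve only as an example.

```python
import math


def sigmoid(x, lam=1.0):
    """Sigmoid transfer function with steepness parameter lam."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_step(state, weights, lam=1.0):
    """One FCM update: concept i receives sum_j weights[j][i] * state[j]."""
    n = len(state)
    return [
        sigmoid(sum(weights[j][i] * state[j] for j in range(n)), lam)
        for i in range(n)
    ]


def simulate_fcm(initial_state, weights, steps, lam=1.0):
    """Return the list of states for t = 0 .. steps."""
    trajectory = [list(initial_state)]
    for _ in range(steps):
        trajectory.append(fcm_step(trajectory[-1], weights, lam))
    return trajectory


# Hypothetical network: gene 0 activates gene 1, gene 1 represses gene 2.
W = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, -0.6],
    [0.0, 0.0, 0.0],
]
trajectory = simulate_fcm([0.5, 0.5, 0.5], W, steps=3)
```

Because gene 0 has no incoming edges in this example, its state settles at sigmoid(0) = 0.5 after the first step, while the other genes respond to their regulators.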

Compressed sensing
Compressed sensing is an optimization problem of the following form:
$$\min_{X} \|X\|_1 \quad \text{subject to} \quad Y = QX$$
where $\|X\|_1 = \sum_i |x_i|$ is the L1 norm of vector X. Y and Q are the sensing matrices that are described next in this section. The advantage of compressed sensing is that the number of samples m can be much smaller than the number of elements n in the unknown vector (n >> m). Dividing the fuzzy cognitive map learning problem into several separate sub-problems of learning the local connections of each gene means that each column of W is learned by a separate optimization problem. To turn the fuzzy cognitive map learning problem into a compressed sensing problem, Equation (1) can be rewritten as follows:
$$f^{-1}\big(c_i(t+1)\big) = \sum_{j=1}^{n} w_{ji}\, c_j(t)$$
where $c_j(t)$ is the state of concept j at time t and $f^{-1}$ is the inverse of the function f. The above equation can be written by a change of variable as follows:
$$Y_i = Q W_i$$
where $Y_i$ is the i-th column of the Y matrix, $W_i$ is the i-th column of W, and Y and Q are defined element-wise as follows:
$$Y_{t,i} = f^{-1}\big(c_i(t+1)\big), \qquad Q_{t,j} = c_j(t)$$
Finally, the problem of learning column i of the weight matrix of a fuzzy cognitive map can be expressed as the following compressed sensing optimization problem:
$$\min_{W_i} \|W_i\|_1 \quad \text{subject to} \quad Y_i = Q W_i$$
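The change of variable can be checked numerically. The sketch below is only an illustration under stated assumptions: it generates a short noise-free series from a hypothetical weight matrix, builds Q from the states at time t and Y from the inverse sigmoid of the states at time t+1, and confirms that one column of Y equals Q times the matching column of W. The clipping constant `eps` and all names are assumptions introduced here.

```python
import math


def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))


def inverse_sigmoid(y, lam=1.0, eps=1e-9):
    """Inverse of the sigmoid; y is clipped away from 0 and 1 so the log stays finite."""
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y)) / lam


def linearize(series, lam=1.0):
    """Q holds the states at times 0..m-1, Y the inverse-sigmoid states at times 1..m."""
    Q = [list(row) for row in series[:-1]]
    Y = [[inverse_sigmoid(v, lam) for v in row] for row in series[1:]]
    return Q, Y


# Hypothetical weights; W[j][i] is the influence of gene j on gene i.
W = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, -0.6],
    [0.3, 0.0, 0.0],
]
series = [[0.2, 0.7, 0.4]]
for _ in range(4):
    prev = series[-1]
    series.append(
        [sigmoid(sum(W[j][i] * prev[j] for j in range(3))) for i in range(3)]
    )

Q, Y = linearize(series)

# Largest deviation between Y_i and Q W_i for the target gene i = 1.
target = 1
residual = max(
    abs(sum(Q[t][j] * W[j][target] for j in range(3)) - Y[t][target])
    for t in range(len(Q))
)
```

On noise-free data the residual is at the level of floating-point rounding, which is what makes the sparse linear formulation possible; with measured expression data the equality holds only approximately.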

Proposed method
In this paper, a hybrid method called KFCS-FCM is proposed to infer gene regulation networks in the form of fuzzy cognitive maps from time series data. This method uses a combination of compressed sensing and the Kalman filter to estimate the weight matrix of the fuzzy cognitive map. As mentioned, gene regulation networks are sparse in terms of interactions between genes. This feature is a challenge for most methods of inferring gene regulation networks. Compressed sensing has high accuracy when the number of time points is much smaller than the dimension of the data, but it produces large errors on noisy data. It is also a batch method, which means that all time points must be available when the inference process starts. On the other hand, the Kalman filter alone is not appropriate for inferring the gene regulation network because of the small number of samples compared to the number of genes. Kalman filtered compressed sensing reformulates the least squares error estimation problem and yields an online inference method that is more robust to noise than compressed sensing.
Kalman filtered compressed sensing (KFCS) estimates the initial support set of length S by ordinary compressed sensing (the set of genes, or coefficients, that regulate the target gene is called the support set of the target gene). It then uses the Kalman filter to obtain a more accurate estimate of the elements in the support set. The innovation error may sometimes increase during estimation by the Kalman filter. When this happens, the required changes in the support set can be estimated by performing compressed sensing on the innovation error. The Kalman filter is then run with the new support set. During this process, coefficients that remain close to zero can be removed from the support set. Assuming that compressed sensing always gives a very accurate estimate of the support set, the KFCS method always gives an accurate Minimum Mean Squared Error (MMSE) estimate for the problem. Since compressed sensing is very likely to return an accurate estimate of very sparse vectors, we use it on the filter error to obtain the very sparse vector of changes in the support set (support set changes are always very small and sparse) [46].
In the proposed method, let Ng be the number of genes, and let Nr and Nt be the number of response sequences and the number of time points of the time series data, respectively. In each iteration of the algorithm, one gene is selected as the target gene and the algorithm attempts to obtain an appropriate estimate of the influence of the other genes on the target. In this way, the problem of building a gene regulation network is divided into Ng sub-problems. In each sub-problem, the sparse vector $x_t$ is assumed to be observed through a noisy linear measurement, $y_t = A x_t + w$, where w is the measurement noise and $\sigma^2_{obs}$ is the variance of the Gaussian probability distribution of the measurement noise. In gene regulation network inference, there is a y vector for each gene. Using Kalman filtered compressed sensing, one $x_t$ is inferred for each noisy $y_t$. Ultimately, the $x_t$ with the least data error is selected as the result. The general steps of the proposed method are shown in Figure 1. The pseudocode of the proposed method is shown in Figure 2.

Preprocessing
At this stage, if the gene expression matrix is as follows, a sensing matrix ( where w1 = 0 ,

Problem decomposition:
At this stage, the inference of the gene regulation network is divided into Ng sub-problems. In the i-th sub-problem, the i-th gene is selected as the target and the other genes are considered as regulators. The input to the i-th sub-problem is the matrix A along with the i-th column of the matrix Yk, and the output is the i-th column of the weight matrix of the fuzzy cognitive map of the gene regulation network (Wi). The following steps are repeated separately for each of the sub-problems.
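The decomposition can be sketched as a loop over target genes that assembles the weight matrix column by column. This is only an outline under stated assumptions: `solve_column` is a hypothetical placeholder for whatever per-gene estimator is used (in this paper, the Kalman filtered compressed sensing steps described next), and `A` and `Y` are plain nested lists.

```python
def infer_network(A, Y, solve_column):
    """Solve one sub-problem per target gene and assemble the columns into W."""
    n_genes = len(Y[0])
    W = [[0.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        y_i = [row[i] for row in Y]      # i-th column of Y: data of the target gene
        w_i = solve_column(A, y_i)       # estimated incoming weights of gene i
        for j in range(n_genes):
            W[j][i] = w_i[j]             # stored as the i-th column of W
    return W


# Hypothetical stand-in estimator, used only to show the data flow.
def dummy_solver(A, y_i):
    return [y_i[0], 0.0]


W_example = infer_network(A=[[1.0, 0.0]], Y=[[1.0, 2.0]], solve_column=dummy_solver)
```

Since the sub-problems do not share any state, they could also be solved in parallel.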

Initialize the support set:
The set T is the support set of the vector x, meaning that T contains only the indices of the non-zero elements of x. The support set is initially empty ($T_0 = \emptyset$). In the next step, because the filtering error norm increases for the empty support set, compressed sensing is automatically used to estimate $T_1$.

Kalman filter step:
Assume that the support set corresponding to t = 1 ($T_1$) is available and that the first change in the support set occurs at time $t = t_a$. The system model and the measurement model of the Kalman filter are defined for the gene regulation network in the form of a fuzzy cognitive map. The prediction step of the Kalman filter is then carried out, with initial values $\hat{x}_0 = 0$ and $P_0 = 0$. The next step is the update of the Kalman filter.
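A Kalman filter restricted to a support set can be sketched as follows. This is an illustrative outline and not the exact equations of the paper: it assumes a random-walk system model on the support set with process noise variance `q`, scalar measurements of the form y = a · x + noise with variance `r` processed one at a time, and hypothetical function names and test values.

```python
def kalman_predict(P, support, q):
    """Random-walk prediction: the estimate is unchanged and process noise
    variance q is added to the diagonal entries that belong to the support set."""
    P = [row[:] for row in P]
    for j in support:
        P[j][j] += q
    return P


def kalman_update_scalar(x, P, a, y, support, r):
    """Kalman update for one scalar measurement y = a . x + noise (variance r),
    restricted to the indices in `support`; all other entries stay untouched."""
    idx = sorted(support)
    innovation = y - sum(a[j] * x[j] for j in idx)
    Pa = {j: sum(P[j][k] * a[k] for k in idx) for j in idx}   # P a on the support
    s = sum(a[j] * Pa[j] for j in idx) + r                     # innovation variance
    gain = {j: Pa[j] / s for j in idx}
    x = x[:]
    P = [row[:] for row in P]
    for j in idx:
        x[j] += gain[j] * innovation
    for j in idx:
        for k in idx:
            P[j][k] -= gain[j] * Pa[k]    # P - K a^T P, using the symmetry of P
    return x, P, innovation


# Hypothetical example: 3 candidate regulators, the true support is {0, 2}.
true_x = [0.5, 0.0, -0.3]
support = {0, 2}
rows = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

x = [0.0, 0.0, 0.0]                       # initial estimate is zero
P = [[0.0] * 3 for _ in range(3)]         # initial covariance is zero
P = kalman_predict(P, support, q=1000.0)
for a in rows:
    y = sum(ai * xi for ai, xi in zip(a, true_x))   # noise-free measurement
    x, P, _ = kalman_update_scalar(x, P, a, y, support, r=1e-6)
```

Processing the measurements one scalar at a time avoids a matrix inversion and gives the same result as a batch update when the measurement noise is uncorrelated. Coefficients outside the support set are never updated, which is how the sparsity found by compressed sensing is preserved.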

Recognize the need to add new elements to the support set:
The innovation error in estimating nodes for time series data is the difference between the observed value of a variable at time t and the predicted value of that variable using the existing knowledge up to time t.
In the inference problem up to time $t < t_a$, the innovation error $\tilde{y}_t$ is computed with the current support set. At $t = t_a$ a new set $\Delta$ is added to the support set of $x_t$, and the innovation error changes accordingly, where the set $\Delta$ is unknown. At time $t = t_a$, $\tilde{y}_t$ has a non-zero mean ($A_\Delta (x_t)_\Delta$), where $\Delta \subseteq T^c$ is a non-empty set that must be identified at the present time. Therefore, the question of whether an element should be added to the support set is mapped to the question of whether the innovation error has a non-zero mean.

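Estimation of Δ using compressed sensing:
If the filter error norm exceeds its threshold, compressed sensing is performed on the filter error and the resulting coefficients are compared with a threshold. This means that only the estimated coefficients that are larger than this threshold are added to $\Delta$. Including the new elements, the new support set will be $T_{new} = T_t \cup \Delta$.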
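The addition step can be illustrated with a small sparse solver followed by thresholding. The sketch below substitutes iterative soft thresholding (ISTA) on the LASSO objective for the compressed sensing solver, which is a swapped-in technique and not necessarily the one used in the paper; `alpha`, `add_threshold` and all names are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, alpha, iterations=500):
    """Minimise 0.5 * ||A b - y||^2 + alpha * ||b||_1 by iterative soft thresholding."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm is an upper bound on the gradient's Lipschitz constant.
    lipschitz = sum(v * v for row in A for v in row) or 1.0
    step = 1.0 / lipschitz
    b = [0.0] * n
    for _ in range(iterations):
        residual = [sum(A[t][j] * b[j] for j in range(n)) - y[t] for t in range(m)]
        grad = [sum(A[t][j] * residual[t] for t in range(m)) for j in range(n)]
        b = [soft_threshold(b[j] - step * grad[j], step * alpha) for j in range(n)]
    return b


def additions_to_support(A, filter_error, support, alpha, add_threshold):
    """Indices outside `support` whose estimated coefficient exceeds add_threshold."""
    b = ista(A, filter_error, alpha)
    return {j for j, v in enumerate(b) if j not in support and abs(v) > add_threshold}


# Hypothetical example with an identity sensing matrix.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
filter_error = [1.0, 0.05, -0.5]
support = {2}
coefficients = ista(A, filter_error, alpha=0.1)
delta = additions_to_support(A, filter_error, support, alpha=0.1, add_threshold=0.3)
new_support = support | delta
```

With an identity sensing matrix the LASSO solution is simply the soft-thresholded filter error, so only index 0 passes the addition threshold here, and index 2 is skipped because it already belongs to the support set.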

Execution of Kalman filter update step:
In this step of each iteration, the update phase of the Kalman filter is executed on $T = T_{new}$ based on Equation 17.

Perform compressed sensing and Kalman filter update step repeatedly:
With a single run of compressed sensing (step 6), not all of the correct elements of the set $\Delta$ may be estimated correctly. To solve this problem, step 6 and step 7 can be repeated until the filter error norm falls below a threshold or until the set $\Delta$ is empty. Many wrong coefficients may be added to the support set at this point, which makes the next step necessary.

Remove coefficients close to zero from the support set:
At this point, coefficients that are zero (or close to zero), as well as coefficients that were incorrectly added to the support set because of compressed sensing error, must be removed from the support set ($T_t$). Removing coefficients at the wrong time can increase the deletion error. To avoid this problem, the removal operation is performed only when the Kalman filter is in the removal mode.
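The removal step amounts to thresholding the current estimate. The following minimal sketch assumes a hypothetical magnitude threshold `zero_threshold`; it does not model the condition on the Kalman filter mode described above.

```python
def prune_support(x, support, zero_threshold):
    """Drop support indices whose coefficient magnitude is below zero_threshold
    and set the corresponding coefficients to exactly zero."""
    kept = {j for j in support if abs(x[j]) >= zero_threshold}
    pruned_x = [v if j in kept else 0.0 for j, v in enumerate(x)]
    return pruned_x, kept


# Hypothetical estimate: index 1 was added by mistake and stayed near zero.
pruned_x, kept = prune_support([0.5, 0.001, -0.3, 0.0], {0, 1, 2}, zero_threshold=0.01)
```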

Dataset
The data sets used in this paper are the DREAM3 and DREAM4 data sets. These data sets contain time series data in different sizes. DREAM3 is a set of networks with sizes of 10, 50 and 100 genes, with each size having 5 different types. The various types of networks are Ecoli1, Ecoli2, Yeast1, Yeast2 and Yeast3. There are two network sizes in DREAM4, 10 and 100, with each size including 5 different types.

Figure 2. Pseudocode of the proposed method. Input: D, the time series gene expression matrix. Output: W, the FCM weight matrix.

Evaluation criteria
The SSmean, Data Error, Out of Sample Error and Model Error criteria were used to evaluate the accuracy of the inferred network. To assess the structure, the inferred weight matrix must be converted to a binary matrix: all weights that are less than 0.05 are set to zero and the weights of all other interactions are set to one. The binary matrix determines which genes are related to each other. To obtain the accuracy of the model, the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) edges must be extracted, and the SSmean criterion is calculated from these four values. The Model Error criterion (Equation 27) compares the weights directly, where Ng is the number of genes, $w_{ij}$ is the element in row i and column j of the weight matrix of the target cognitive map (W), and $\hat{w}_{ij}$ is the corresponding element of the weight matrix of the estimated cognitive map ($\hat{W}$). The closer this criterion is to zero, the better the proposed method.
The Data Error criterion (Equation 28) evaluates the difference between the data of the main network and the data generated by the fuzzy cognitive map, where Nr is the number of response sequences (time frames) and Nt is the number of time points in each time frame.
The Out of Sample Error criterion is similar to Data Error, except that the initial state is generated randomly and the values generated by the main network are compared with the values generated by the inferred network.
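The structural criteria can be sketched in a few lines. The definitions below are assumptions based on common usage in the FCM learning literature and are not taken from the equations of this paper: SSmean is taken as the harmonic mean of sensitivity and specificity, Model Error as the mean absolute difference between true and estimated weights, and binarization is applied to the weight magnitude.

```python
def binarize(W, threshold=0.05):
    """1 where the weight magnitude reaches the threshold, 0 otherwise (assumed rule)."""
    return [[1 if abs(w) >= threshold else 0 for w in row] for row in W]


def ss_mean(W_true, W_est, threshold=0.05):
    """Harmonic mean of sensitivity and specificity (assumed definition of SSmean)."""
    tp = tn = fp = fn = 0
    for row_t, row_e in zip(binarize(W_true, threshold), binarize(W_est, threshold)):
        for t, e in zip(row_t, row_e):
            if t and e:
                tp += 1
            elif t:
                fn += 1
            elif e:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def model_error(W_true, W_est):
    """Mean absolute difference between the weight matrices (assumed definition)."""
    n = len(W_true)
    total = sum(abs(W_true[i][j] - W_est[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Hypothetical two-gene example.
W_true = [[0.0, 0.8], [-0.6, 0.0]]
W_est = [[0.02, 0.7], [0.0, 0.1]]
score = ss_mean(W_true, W_est)
error = model_error(W_true, W_est)
```

In this example one true edge is recovered, one is missed and one spurious edge is reported, so sensitivity and specificity are both 0.5. A higher SSmean and a lower Model Error indicate a better reconstruction.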

Evaluate the proposed method on time series data
The majority of the existing methods for inferring gene regulation networks in DREAM3 and DREAM4 use only the last 11 time points of each time series. However, in this paper, experiments are performed on all time points of each data set to maintain generality and to extend the results to data whose structure is unknown to us. In these experiments, the parameters are set to fixed initial values. Given that the proposed method models gene regulation networks in the form of fuzzy cognitive maps, it is essential to select an appropriate parameter λ for the sigmoid function of the fuzzy cognitive map in order to increase accuracy. In this experiment, the effect of changes in the sigmoid function on the performance of the proposed method for the DREAM3 and DREAM4 networks has been studied. The parameter λ increases from 0.1 to 6 in steps of 0.1 (for λ less than 1) and 0.5 (for λ greater than 1). The results are shown in Figure 3. In this experiment, the Model Error criterion is calculated by binarizing the results of the proposed method and comparing them with the standard network in the DREAM networks.
The Model Error results include many errors because of the information lost through binarization. Nevertheless, they can be used to find the right density. The results are shown in Figure 1 as the average of the DREAM3 and DREAM4 data sets with the same size, and in the diagrams Dx-y, which show the average of all networks with y nodes in DREAMx. The results show that selecting λ for the sigmoid function in the range of 0.8 to 1 leads to better results (over the three tested criteria in total) than other values. The parameter λ = 0.8 is used to compare the proposed method with other available ones. The results are shown in Table 1. These results show that as the data length coefficient increases, the SSmean criterion increases while the other criteria decrease. For a fixed data length coefficient, increasing the number of genes generally decreases the SSmean criterion and increases the other criteria. Moreover, when the density increases, the results become poorer because the number of edge weights to be learned increases while the available data remains unchanged. The quality of learning also declines when the density is constant and the number of nodes increases, because the number of edges to be learned goes up in this case as well.
In order to compare the proposed method with the existing methods, it has been compared on the 25 networks in DREAM3 and DREAM4 to CSFCM [43], LASSOFCM [8], KFRegular, Artificial Bee Colony (ABC) [47], RCGA [48], ICLA [49] and CMI2NI [50]. The criteria used in this experiment are SSmean and Data Error, the results of which are given in Tables 2 and 3, respectively. In these tables, the Dx-y-z dataset denotes network number z with y genes from the DREAMx dataset.
The results of Table 2 show that, in most cases, the proposed method achieves a better SSmean than the CSFCM and CMI2NI methods, and it outperforms the remaining methods in all cases. In terms of Data Error, the proposed method performs better than the other methods in most cases. It has the best performance in 22 out of 25 cases; in only one case it is weaker than the KFRegular and LASSOFCM methods, and in two cases it is weaker than the ABC method. This shows that the proposed method performs better than the other ones. In addition, the results show that the proposed method infers well when both the number of genes and the network size are large.

Effects of the threshold parameters
There is a parameter (thr) in the proposed method, which determines whether or not there is an interaction (an edge) between two genes in the reconstructed GRN. In order to evaluate the impact of this parameter on our method, we performed simulations on dataset100 by calculating SSmean for different values of thr while fixing the other parameters. The simulated results are shown in Fig 4. From Fig 4, we found that the SSmean value increases gradually in the range 0 ≤ thr < 0.6 and reaches its highest value (SSmean = 0.961) at thr = 0.58. The experimental results show that suitable parameters should be selected for different datasets to obtain the best GRNs.
In this section, the proposed approach is compared with several well-known approaches, which are briefly described as follows. GENIE3 is an algorithm for inferring regulatory networks from expression data using tree-based methods. The MATLAB implementation by its authors, with default parameters and protocols, is used [25]. BMALR is an algorithm for inferring cellular regulatory networks with Bayesian model averaging for linear regression. The authors' code is used [31]. TIGRESS solves the network inference problem by using a feature selection technique (LARS) combined with stability selection. Its web-based platform is used [51]. G1DBN is a method based on dynamic Bayesian networks [52]. The MRNET algorithm, as implemented in the minet package for R, is used [21]. Jump3 [26] is based on a formal on/off model of gene expression, but it uses a non-parametric procedure based on decision trees (jump trees) to reconstruct the GRN topology. In TIGRNCRN, genes with high correlation are grouped into one cluster using clustering, and edges between the genes within a cluster are prevented. Finally, after the Bayesian network modelling, a refining phase improves the regulatory interactions using biological correlation, based on the knowledge gained from clustering [32].
To allow a better comparison, the ROC and PR curves have been drawn for two networks of size 100 selected from the DREAM4 dataset (see Figures 5 and 6). As is evident from the two figures, the proposed method produces better results than the other methods. The area under the ROC and PR curves of the proposed method is greater than that of the other methods. This indicates that the proposed method achieves a balance between the false negative and false positive rates.

Conclusion
In this paper, the fuzzy cognitive map is used to infer the gene regulatory network. It can be used for complex, uncertain and nonlinear networks. To learn the fuzzy cognitive map, we have used the Kalman filter and compressed sensing. The proposed method has the following advantages: it creates a sparse network, it is robust to noise, it strikes a proper balance between data error and network structure, and it is able to use prior knowledge. The proposed approach uses a novel method to decrease the noise of the data. The proposed method was compared to CSFCM, LASSOFCM, KFRegular, ABC, RCGA, ICLA, and CMI2NI. The results show that the proposed approach is superior to the other approaches in fuzzy cognitive map learning.