Information-theoretic analysis of complex systems modeled by two-dimensional pairwise Ising models

Stochastic complex systems are composed of a large number of seemingly simple variables that exhibit non-linear interactions with each other, causing the emergence of complexity and non-deterministic dynamics at the edge between order and chaos. Hence, the evolution of these systems seems to be completely out of control, with unpredictable behaviors. In this paper, using information geometry as a mathematical approach to chaos and complexity, we investigate how information theory can be used to analyze the dynamics of pairwise Ising random fields along Markov Chain Monte Carlo simulations in which phase transitions are observed. Our experiments indicate that Fisher information regarding the inverse temperature parameter can bring important insights, since it signals changes in the global spatial dependence structure. Information-theoretic curves are built to show that, despite the random nature of the system, it is possible to identify an asymmetric pattern of evolution when the system moves towards different entropic states.


Introduction
In recent decades, there has been great interest in complex systems science, whose goal is to understand how the interaction between different parts of a system gives rise to collective behaviors [7,23]. A feature of complex systems is the propagation of information through local rules, in such a way that the whole is governed by a dynamics that cannot be fully understood by analyzing only its parts [30]. Besides, it has been observed that complexity emerges at the edge between order and chaos [34]. Being able to capture patterns in seemingly random behavior is the key to understanding the difference between chaos and randomness. The relevance of studying and characterizing complex systems dynamics resides in the fact that, if we understand it, we gain powerful insights into how to control several natural phenomena and non-deterministic processes [25].
Several mathematical models for the study and characterization of complex systems can be found in the literature: cellular automata [15], whose deterministic rules cause the emergence of complexity through the definition of a transition function; complex networks [32], in which the connection of nodes causes the emergence of non-trivial topological features; and random fields [10], which are particularly suitable for modeling non-deterministic phenomena in which non-linear interactions between random variables lead to the emergence of long-range correlations and phase transitions [33]. Many examples of the application of random field models arise in physics, especially in statistical mechanics and thermodynamics [19].
In complex systems dynamics, critical phenomena occur in critical states. In these states, the system is extremely susceptible to small perturbations. Phase transitions play an important role in complex systems. Usually, a phase transition is related to a change in the degree of order in the system and to spontaneous symmetry breaking [8]. Being able to predict whether a complex system is approaching a phase transition is relevant to many processes in physics, biology and economics. In this context, a question raised by many researchers is: can we predict if a system is moving towards a phase transition? Or, in other words, can we predict an imminent order/disorder phase transition using information flow dynamics? In this paper, we propose an information-theoretic approach for the analysis of pairwise Ising model dynamics using Fisher information regarding the inverse temperature parameter.
Our motivation for using information-theoretic measures to analyze complex systems dynamics is information geometry, a mathematical tool for the study of the intrinsic geometric properties of parametric spaces associated with random variables [4,6]. The main goal is to build computational tools to predict the emergence of complex behavior in non-deterministic dynamical systems. At the beginning of the dynamics, the inverse temperature parameter is zero and we have a collection of independent Bernoulli random variables. However, as the inverse temperature increases, the underlying parametric space undergoes a series of geometric deformations. Fisher information is capable of bringing insights about this complex process, since it is directly related to the metric tensor of the parametric space, being the most important ingredient in the computation of geodesic distances [1,17]. Statistical geometrodynamics has been proposed as an information-geometric framework suitable for characterizing the chaotic dynamical behavior of arbitrary complex systems on curved statistical manifolds [14,11]. Both information geometry and statistical geometrodynamics have been explored in the derivation of measures of chaos and complexity in physical systems [12,13,17].
In summary, the main contributions of this paper are: 1) to provide a proof of concept that statistical geometrodynamics and information geometry can be employed to perform complex systems analysis in practice, through a feasible computational application using the pairwise Ising model; 2) to propose an original equation to estimate the inverse temperature in the pairwise Ising model through maximum pseudo-likelihood estimation, avoiding the computation of the partition function in the joint Gibbs distribution; 3) to derive expressions for the observed Fisher information regarding the inverse temperature parameter in the pairwise Ising model; 4) to propose an information-theoretic curve to analyze the pairwise Ising model dynamics, characterizing the asymmetric behavior of the system when visiting different entropic states, which induces the emergence of the hysteresis phenomenon [26], often associated with systems that exhibit phase transitions. The existence of attractors in the Ising model has been verified in terms of physical properties (i.e., magnetization) in regular lattices [5] and Cayley trees [20]. In this work, however, we derive a geometry-based framework, built in terms of information-theoretic measures related to statistical divergences. Hysteresis has also been observed in Ising model dynamics in terms of physical properties [2,31]; our results show that information-theoretic analysis is capable of reproducing the same effects, but with a purely geometric interpretation.
The remainder of the paper is organized as follows: in Section 2, we define the pairwise Ising model and derive an original equation for the maximum pseudo-likelihood estimator of the inverse temperature parameter in this model. In Section 3, we discuss Fisher information and some of its properties, and we derive expressions for the information-theoretic measures. Section 4 shows the computational simulations using Markov Chain Monte Carlo (MCMC) algorithms and the obtained information curves for complex systems modeled by pairwise Ising model dynamics. Finally, Section 5 presents the conclusions and final remarks.

The Ising model
Consider a 2D lattice composed of n sites, S = {s_1, s_2, ..., s_n}, and a neighborhood system η_i that defines an adjacency relation between s_i and its ∆ nearest neighbors. In this study, we assume a second-order neighborhood system, that is, ∆ = 8, defining the Moore neighborhood. At each site s_i we observe a random variable x_i that assumes values in {−1, +1}, representing the spin of a particle [28]. A realization X = (x_k)_{k∈S} is an assignment of a spin value to each s_i, for i = 1, 2, ..., n. For any pair of adjacent sites s_i, s_j ∈ S there is an interaction J_ij. Besides, each site s_i has an external magnetic field h_i interacting with it. The energy of the global configuration of spins X (the Hamiltonian) is given by [16]:

H(X) = -\sum_{(i,j)} J_{ij} x_i x_j - \sum_{i} h_i x_i

In the absence of an external field and assuming that all nearest neighbors have the same interaction strength J, we have:

H(X) = -J \sum_{(i,j)} x_i x_j

It is possible to assign a probability to the global configuration pattern X through the Boltzmann distribution:

P(X) = \frac{1}{Z} \exp\left( -\frac{H(X)}{k_B T} \right)

where k_B is the Boltzmann constant, T is the temperature and Z is the partition function, defined by the summation:

Z = \sum_{X} \exp\left( -\frac{H(X)}{k_B T} \right)

It is worth mentioning that the computation of the partition function requires a sum over all possible realizations of X, which is prohibitive for large fields. Defining the inverse temperature parameter β = 1/(k_B T) and considering a ferromagnetic model (J = 1), we can write:

P(X) = \frac{1}{Z} \exp\left( \beta \sum_{(i,j)} x_i x_j \right)

Note that the summation over all bonds (i, j) for a given site s_i is precisely the difference between the number of neighboring sites having x_j = x_i, denoted by U(x_i), and the number of neighboring sites having x_j ≠ x_i, given by ∆ − U(x_i). Hence, denoting U′(x_i) = 2U(x_i) − ∆, the contribution of each site s_i to the exponent can be written in terms of U′(x_i). According to the Hammersley-Clifford theorem [21], which states the equivalence between Gibbs random fields (global models) and Markov random fields (local models), the pairwise Ising model can be defined by the set of local conditional density functions p(x_i | η_i, β), for i = 1, 2, ..., n:

p(x_i \mid \eta_i, \beta) = \frac{\exp\{ \beta U'(x_i) \}}{\exp\{ \beta U'(x_i) \} + \exp\{ -\beta U'(x_i) \}}

The advantage of the local conditional density functions is that we do not have to deal with the partition function, making inference on the pairwise Ising model mathematically tractable and computationally viable. Note that when the inverse temperature parameter is zero, we have a sample of n independent Bernoulli random variables with p = 0.5.
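As a concrete illustration, the local conditional density above can be evaluated directly from a site's neighborhood. The sketch below is our own minimal illustration, not the authors' code (the function name `local_conditional` and the ±1 spin coding are our assumptions):

```python
import math

def local_conditional(x_i, neighbors, beta):
    """P(x_i | neighborhood, beta) for the pairwise Ising model.

    Uses U'(x_i) = 2*U(x_i) - delta, where U(x_i) counts the neighbors
    that agree with x_i and delta is the neighborhood size.
    """
    agree = sum(1 for x_j in neighbors if x_j == x_i)
    u_prime = 2 * agree - len(neighbors)
    # exp(b*U') / (exp(b*U') + exp(-b*U')) = 1 / (1 + exp(-2*b*U'))
    return 1.0 / (1.0 + math.exp(-2.0 * beta * u_prime))
```

At β = 0 the probability is 0.5 regardless of the neighborhood, recovering the independent Bernoulli case; for a balanced [4,4] neighborhood it stays at 0.5 for every β.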

Maximum pseudo-likelihood estimation of the inverse temperature parameter
An important problem in the proposed information-theoretic analysis of the dynamics of the Ising model is the estimation of the inverse temperature parameter of the system, given an outcome of the model. Maximum likelihood estimation is not tractable due to the partition function in the joint Gibbs distribution. An alternative is maximum pseudo-likelihood estimation by means of the local conditional density functions, which are combined to build the pseudo-likelihood function [9]. The pseudo-likelihood function for the Ising model is:

PL(\beta) = \prod_{i=1}^{n} p(x_i \mid \eta_i, \beta)

Usually, it is simpler to maximize the logarithm of the pseudo-likelihood function:

\log PL(\beta) = \sum_{i=1}^{n} \left[ \beta U'(x_i) - \log\left( e^{\beta U'(x_i)} + e^{-\beta U'(x_i)} \right) \right]

Differentiating with respect to the inverse temperature parameter and equating the result to zero, we have:

\frac{\partial}{\partial \beta} \log PL(\beta) = \sum_{i=1}^{n} U'(x_i) - \sum_{i=1}^{n} U'(x_i) \tanh\left( \beta U'(x_i) \right) = 0

Note that, from a statistical mechanics perspective, the solution of this equation provides the value of the inverse temperature for which the free energy (first term) is equal to the expected energy (second term). To solve the pseudo-likelihood equation, we propose an algorithm based on the identification of all distinct local contextual patterns that provide different contributions to the equation. Note that in a 2D Ising model defined on a second-order neighborhood system (∆ = 8), there are only five patterns that offer different contributions, as illustrated by Figure 1 (the integer partitions of 8 into two categories). For the five patterns, [8,0], [7,1], [6,2], [5,3] and [4,4], the absolute values |U′(x_i)| are 8, 6, 4, 2 and 0, respectively. Recalling that the hyperbolic tangent is defined by

\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}

and denoting by N_{[i,j]} the number of [i, j] local configuration patterns along the random field, we obtain the maximum pseudo-likelihood estimator of the inverse temperature parameter by solving the following transcendental equation:

\sum_{i=1}^{n} U'(x_i) = 8 N_{[8,0]} \tanh(8\beta) + 6 N_{[7,1]} \tanh(6\beta) + 4 N_{[6,2]} \tanh(4\beta) + 2 N_{[5,3]} \tanh(2\beta)

In our simulations, we build a histogram of the pattern configurations at each iteration of the dynamics and numerically solve the equation in β to estimate the inverse temperature.
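The transcendental pseudo-likelihood equation has no closed-form solution, but the difference between its two sides is monotonically decreasing in β, so a simple bisection suffices. The sketch below is our own illustration, not the paper's implementation; the function name `estimate_beta` and the bracketing value `beta_max` are assumptions:

```python
import math
from collections import Counter

def estimate_beta(u_primes, beta_max=5.0, tol=1e-8):
    """Maximum pseudo-likelihood estimate of the inverse temperature.

    Solves sum_i U'_i = sum_i U'_i * tanh(beta * U'_i) by bisection,
    where u_primes lists U'(x_i) = 2*U(x_i) - delta for every site.
    """
    counts = Counter(u_primes)          # histogram of local pattern values

    def g(beta):
        # left-hand side minus right-hand side; decreasing in beta
        return sum(n * u * (1.0 - math.tanh(beta * u))
                   for u, n in counts.items())

    lo, hi = 0.0, beta_max
    if g(lo) <= 0.0:                    # no net alignment: beta_hat = 0
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the toy multiset {2, −2, 2} the equation reduces to tanh(2β) = 1/3, so the estimator returns atanh(1/3)/2, which can be checked by hand.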

Information-theoretic measures
Information geometry provides the connection between geometry and statistics in a formal and elegant way. The key ingredient that unifies these two seemingly unrelated areas is the concept of Fisher information. In the following, we discuss what Fisher information is and how we can compute it in the pairwise Ising model.

Fisher information
Let p(X; θ) be the likelihood function (the probability density as a function of θ), where θ is the parameter of the model. Fisher information, which is related to the metric tensor of the parametric space, is defined by:

I(\theta) = E\left[ \left( \frac{\partial}{\partial \theta} \log p(X; \theta) \right)^2 \right]

It is known that Fisher information is the KL-divergence (relative entropy) in the limiting case. It has also been shown that Fisher information plays an important role in defining geodesic distances between different statistical models, since it defines the metric tensor of the parametric space [29].

The information equality
It is known that for several statistical models the information equality holds. For simplicity, we consider the uniparametric case, knowing that the extension to multiparametric models is quite natural. Let X be a random variable with probability density function p(X; θ). Note that:

\frac{\partial}{\partial \theta} \log p(X; \theta) = \frac{1}{p(X; \theta)} \frac{\partial}{\partial \theta} p(X; \theta)

By the product rule, we have:

\frac{\partial^2}{\partial \theta^2} \log p(X; \theta) = \frac{1}{p(X; \theta)} \frac{\partial^2}{\partial \theta^2} p(X; \theta) - \left( \frac{\partial}{\partial \theta} \log p(X; \theta) \right)^2

which, by applying the expectation operator and rearranging the terms, leads to:

I(\theta) = E\left[ \frac{1}{p(X; \theta)} \frac{\partial^2}{\partial \theta^2} p(X; \theta) \right] - E\left[ \frac{\partial^2}{\partial \theta^2} \log p(X; \theta) \right]

By the definition of expected value, the first term of the previous expression can be rewritten as:

E\left[ \frac{1}{p(X; \theta)} \frac{\partial^2}{\partial \theta^2} p(X; \theta) \right] = \int \frac{\partial^2}{\partial \theta^2} p(x; \theta)\, dx

Under certain regularity conditions, it is possible to interchange the differentiation and integration operators, which implies:

\int \frac{\partial^2}{\partial \theta^2} p(x; \theta)\, dx = \frac{\partial^2}{\partial \theta^2} \int p(x; \theta)\, dx = 0

leading to the information equality condition I(\theta) = -E\left[ \partial^2 \log p(X; \theta) / \partial \theta^2 \right].
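The equality can be checked numerically for a simple model. The sketch below is our own verification, assuming a Bernoulli(θ) variable (function names are ours): it approximates both forms of Fisher information by finite differences and confirms they agree with the known value 1/(θ(1−θ)).

```python
import math

def fisher_first_form(theta, eps=1e-4):
    """E[(d/dtheta log p)^2] for Bernoulli(theta), via central differences."""
    def logp(x, t):
        return x * math.log(t) + (1 - x) * math.log(1 - t)
    def score(x):
        return (logp(x, theta + eps) - logp(x, theta - eps)) / (2 * eps)
    # expectation over the two outcomes x = 1 and x = 0
    return theta * score(1) ** 2 + (1 - theta) * score(0) ** 2

def fisher_second_form(theta, eps=1e-4):
    """-E[d^2/dtheta^2 log p] for Bernoulli(theta), via central differences."""
    def logp(x, t):
        return x * math.log(t) + (1 - x) * math.log(1 - t)
    def d2(x):
        return (logp(x, theta + eps) - 2 * logp(x, theta)
                + logp(x, theta - eps)) / eps ** 2
    return -(theta * d2(1) + (1 - theta) * d2(0))
```

For θ = 0.3 both forms return approximately 1/0.21 ≈ 4.762, as the information equality predicts.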

Fisher information and curvature of the relative entropy
Fisher information is also directly related to the relative entropy, or KL-divergence [18]. It has been shown that Fisher information is, in fact, the curvature of the relative entropy. Consider two distributions p(x; θ) and p(x; θ′). The KL-divergence is given by:

D_{KL}(\theta \| \theta') = \int p(x; \theta) \log \frac{p(x; \theta)}{p(x; \theta')}\, dx

The second derivative of the relative entropy at the point θ′ = θ is:

\left. \frac{\partial^2}{\partial \theta'^2} D_{KL}(\theta \| \theta') \right|_{\theta' = \theta} = I(\theta)

indicating that Fisher information encodes the curvature of the relative entropy.

Fisher information as the KL-divergence in the limiting case
Another important connection between these two information-theoretic measures is that Fisher information is the KL-divergence in the limiting case, that is, when we have two nearby densities p(x; θ) and p(x; θ + ∆θ). It has been shown that the KL-divergence between two infinitesimally close densities can be expressed by a quadratic form whose coefficients are given by the elements of the Fisher information matrix. First, recall that the symmetric KL-divergence between p(x) and q(x) is given by:

d_{KL}(p, q) = \int \left[ p(x) - q(x) \right] \log \frac{p(x)}{q(x)}\, dx

Assume we have a family of distributions parametrized by θ = (θ_1, θ_2, ..., θ_k) and let θ′ = (θ_1 + ∆θ_1, θ_2 + ∆θ_2, ..., θ_k + ∆θ_k), where each ∆θ_i is an infinitesimal displacement. By definition, let the variation ∆p(x; θ) be:

\Delta p(x; \theta) = p(x; \theta') - p(x; \theta)

which leads to:

d_{KL}(\theta, \theta') = -\int \Delta p(x; \theta) \log \frac{p(x; \theta)}{p(x; \theta) + \Delta p(x; \theta)}\, dx

Note that the argument of the logarithm can be expressed as:

\frac{p(x; \theta)}{p(x; \theta) + \Delta p(x; \theta)} = \frac{1}{1 + \Delta p(x; \theta) / p(x; \theta)}

Using the Taylor approximation of the logarithm for small z, log(1 + z) ≈ z, we can write the following approximation:

d_{KL}(\theta, \theta') \approx \int \frac{\left[ \Delta p(x; \theta) \right]^2}{p(x; \theta)}\, dx

Since ∆p(x; θ) is an infinitesimal displacement in the parametric space (a manifold), we can express it in the tangent space as the dot product between the gradient (local tangent coordinates) and the displacement vector ∆θ:

\Delta p(x; \theta) = \nabla_{\theta}\, p(x; \theta) \cdot \Delta\theta = \sum_{i=1}^{k} \frac{\partial}{\partial \theta_i} p(x; \theta)\, \Delta \theta_i

Hence, we can write:

d_{KL}(\theta, \theta') \approx \sum_{i=1}^{k} \sum_{j=1}^{k} \left[ \int \frac{1}{p(x; \theta)} \frac{\partial p(x; \theta)}{\partial \theta_i} \frac{\partial p(x; \theta)}{\partial \theta_j}\, dx \right] \Delta \theta_i \Delta \theta_j

In matrix-vector notation, we have:

d_{KL}(\theta, \theta') \approx \Delta\theta^{T} I(\theta)\, \Delta\theta

where I(θ) is the Fisher information matrix.
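For a one-parameter Bernoulli family this quadratic approximation can be verified directly: the symmetric KL-divergence between two close Bernoulli densities should match I(θ)∆θ², with I(θ) = 1/(θ(1−θ)). The code below is our own numerical sketch (function and variable names are ours):

```python
import math

def sym_kl_bernoulli(t1, t2):
    """Symmetric KL-divergence between Bernoulli(t1) and Bernoulli(t2)."""
    total = 0.0
    for p, q in ((t1, t2), (1 - t1, 1 - t2)):
        total += (p - q) * math.log(p / q)
    return total

theta, d = 0.4, 1e-3
fisher = 1.0 / (theta * (1 - theta))   # exact Fisher information
quad = fisher * d * d                  # quadratic-form approximation
```

For this small displacement the two quantities agree to well within one percent, illustrating that the quadratic form dominates for infinitesimally close densities.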

Observed Fisher information
In some cases, the computation of the expected values in Fisher information can be a challenging task, or even unfeasible. To overcome this problem, we can compute the observed Fisher information:

I_o(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\partial}{\partial \theta} \log p(x_i; \theta) \right]^2

evaluated at θ = \hat{θ}, the maximum likelihood estimate of the parameter. Similarly, using the information equality condition, we have:

I_o(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2}{\partial \theta^2} \log p(x_i; \theta)

Furthermore, it has been shown that both estimators are consistent for the expected Fisher information, that is, asymptotically they converge to I(θ).

Fisher information and entropy in the Ising model
In the pairwise Ising model, we cannot derive a closed-form expression for the expected Fisher information, due to the impossibility of analytically computing the expected values involving the derivatives of the logarithm of the pseudo-likelihood function. However, with the local conditional density functions, we can derive a closed-form expression for the observed Fisher information in this model to replace the expected information. The derivative of the logarithm of the local conditional density function with respect to the inverse temperature parameter is:

\frac{\partial}{\partial \beta} \log p(x_i \mid \eta_i, \beta) = U'(x_i) \left[ 1 - \tanh\left( \beta U'(x_i) \right) \right]

Then, we have the following expression for I_o(β), from now on denoted by Φ_β (PHI):

\Phi_\beta = \frac{1}{n} \sum_{i=1}^{n} \left\{ U'(x_i) \left[ 1 - \tanh\left( \beta U'(x_i) \right) \right] \right\}^2

Similarly, we can compute the negative of the second derivative of the logarithm of the local conditional density functions to derive the second form, from now on denoted by Ψ_β (PSI):

\Psi_\beta = \frac{1}{n} \sum_{i=1}^{n} U'(x_i)^2 \left[ 1 - \tanh^2\left( \beta U'(x_i) \right) \right]

It is straightforward to check that a local configuration pattern can provide different contributions to Φ_β and Ψ_β. Moreover, a condition for information equality can be stated by equating the pattern-wise contributions to Φ_β and Ψ_β, for spin values k, l ∈ {−1, +1}. Note that, for the configuration pattern [4,4], we have U′(x_i) = 0, so it does not provide any contribution to the Fisher information regarding the inverse temperature parameter, since there is no prevalence of positive or negative spins in the neighborhood.

Entropy in the pairwise Ising model
To compute the entropy in the Ising model using the local conditional density functions, we recall that entropy can be defined as the expected value of self-information, which leads to:

H(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid \eta_i, \beta) = \frac{1}{n} \sum_{i=1}^{n} \left[ \log\left( e^{\beta U'(x_i)} + e^{-\beta U'(x_i)} \right) - \beta U'(x_i) \right]

Note that for β = 0, we have H(β) = log(2) ≈ 0.69315, which means that the two states are equiprobable. Recall that in this case we have a sample of independent Bernoulli random variables with θ = 0.5. The probability density function of a Bernoulli random variable is:

p(x; \theta) = \theta^{x} (1 - \theta)^{1 - x}, \quad x \in \{0, 1\}

The entropy is given by:

H(\theta) = -\theta \log \theta - (1 - \theta) \log(1 - \theta)

which, for θ = 0.5, becomes H = log(2), perfectly matching the value obtained by the pairwise Ising model.
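In code, the average self-information can be accumulated over sites using the closed form log(2 cosh(βU′)) − βU′. This is our own sketch (the per-site averaging and the function name `entropy` are assumptions):

```python
import math

def entropy(u_primes, beta):
    """Average self-information -log p(x_i | eta_i, beta) over all sites,
    using p(x_i | eta_i, beta) = exp(beta*U') / (2*cosh(beta*U'))."""
    total = 0.0
    for u in u_primes:
        total += math.log(2.0 * math.cosh(beta * u)) - beta * u
    return total / len(u_primes)
```

At β = 0 the function returns log 2 for any configuration, matching the independent-Bernoulli case, and it decays towards zero as β grows on strongly aligned neighborhoods.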

Computational simulations
In this study, we consider two different scenarios for the information-theoretic analysis of the Ising model dynamics: 1) known inverse temperature; and 2) unknown inverse temperature. In the first case, we assume that we have direct access to the real inverse temperature parameter in the computation of entropy and Fisher information. In other words, there is no uncertainty in the definition of this important parameter. In the blind analysis, on the other hand, we observe the system's outcomes and the unknown inverse temperature must be estimated from data using maximum pseudo-likelihood. To simulate the pairwise Ising model dynamics, we employ the Metropolis-Hastings algorithm, a popular Markov Chain Monte Carlo method used to generate outcomes of random fields, given the parameter values [27,22]. In every simulation, the initial condition is an n×n lattice of Bernoulli(1/2) samples, which is the degenerate distribution for the pairwise Ising model when the inverse temperature is zero. During the first half of the simulation, a small positive increment ∆β is added to the inverse temperature after each iteration until we reach a maximum value β_MAX. In the second half of the simulation, a small negative increment −∆β is added to the inverse temperature after each iteration, until we reach zero once again. In all experiments, a full cycle takes 600 iterations, with n = 512 and β_MAX = 1.5. Figure 2 shows some samples of the random field during the evolution of the system.
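A minimal version of such a simulation can be sketched as follows. This is our own illustration, not the paper's code: it uses a small 32 × 32 lattice with periodic boundaries and a fixed β, whereas the paper uses n = 512 and a gradual β schedule.

```python
import math
import random

def metropolis_sweep(field, beta, rng):
    """One Metropolis-Hastings sweep over an n x n lattice with the
    8-neighbor (Moore) system and periodic boundary conditions."""
    n = len(field)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        # sum of the 8 Moore neighbors (toroidal wrap-around)
        s = sum(field[(i + di) % n][(j + dj) % n]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0))
        delta_e = 2.0 * field[i][j] * s        # energy change of a flip
        if delta_e <= 0.0 or rng.random() < math.exp(-beta * delta_e):
            field[i][j] = -field[i][j]         # accept the flip
    return field

rng = random.Random(42)
n = 32                                          # small lattice for illustration
field = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
for _ in range(20):                             # short run at moderate beta
    metropolis_sweep(field, 1.0, rng)
magnetization = abs(sum(sum(row) for row in field)) / (n * n)
```

Starting from the Bernoulli(1/2) configuration and sweeping at a fixed β above zero, the lattice develops the growing clusters that the information curves in the next subsections track.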

The information curves of the Ising model
The proposed information curve for the analysis of the pairwise Ising model dynamics is built as a parametric curve defined by the sequence of vectors p_β = (Φ_β, Ψ_β, H_β) in the positive (first) octant of the three-dimensional space R³, obtained by varying the inverse temperature parameter from an initial value A to a final value B and back. Let X^{(β_1)}, X^{(β_2)}, ..., X^{(β_n)} be a sequence of outcomes from the pairwise Ising model produced by a MCMC simulation; the corresponding sequence of points p_{β_1}, p_{β_2}, ..., p_{β_n} defines the information curve F_A^B(β) of the system from A to B. Therefore, we can define one cycle of the curve W as the concatenation of the information curves F_A^B(β) and F_B^A(β). In the following, we will see, through several computational experiments, that the characteristic shape of this information curve for the Ising model dynamics does not suffer significant changes even for large variations in the initial conditions.
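Putting the three measures together, one leg of the curve can be traced by sweeping β over a grid. The sketch below is our own illustration (the pattern multiset passed to `info_point` is purely illustrative):

```python
import math

def info_point(u_primes, beta):
    """One point p_beta = (Phi, Psi, H) of the information curve,
    computed from the multiset of local pattern values U'(x_i)."""
    n = len(u_primes)
    phi = psi = h = 0.0
    for u in u_primes:
        t = math.tanh(beta * u)
        phi += (u * (1.0 - t)) ** 2                    # score form
        psi += u * u * (1.0 - t * t)                   # curvature form
        h += math.log(2.0 * math.cosh(beta * u)) - beta * u
    return phi / n, psi / n, h / n

# one leg of the curve: beta swept from A = 0 to B = 1.5
curve = [info_point([8, 6, 4, 2, 0, -2], b / 100.0) for b in range(0, 151, 10)]
```

The first point of the leg sits at maximum entropy log 2 with Φ = Ψ (the information equality at β = 0); concatenating the reversed sweep yields one full cycle W.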

Analysis of random field dynamics: known inverse temperature
In the first set of experiments, we consider that we have access to the real inverse temperature parameter, which is not a reasonable hypothesis in the analysis of real-world systems. Figure 3 shows two different plots: 1) the variation of Φ(β) (blue curve) and Ψ(β) (red curve) along the MCMC simulation; and 2) the variation of H(β) (entropy) along the MCMC simulation. Looking at the graph of Φ(β), the first local maximum point seems to be related to a rapid decrease of the system's entropy. In fact, this point signals that a complex phase transition is emerging. Figure 4 shows the 3D information curve. When the inverse temperature is zero, we are at the top of the blue curve and, as we increase its value, we move down to the state with minimum entropy. There is a natural orientation in the information curves along the evolution of the random field. In order to provide a different visualization, Figure 5 illustrates that the use of both forms of Fisher information defines a 2D curve for the Ising model dynamics.

Analysis of random field dynamics: unknown inverse temperature
In our second set of experiments, we assume a more realistic hypothesis: the inverse temperature parameter is not known a priori, that is, we have to estimate β for each one of the iterations of the MCMC simulation.The main Fig. 5 2D information curve built by combining Fisher information forms Φ(β) and Ψ (β).Note that the blue line refers to increasing inverse temperature and the red line refers to decreasing inverse temperature.The curve shows that the dynamics is highly asymmetrical in terms of variations in the spatial dependence structure.
difference with this scenario is that we have some degree of uncertainty regarding the inverse temperature parameter, which controls the spatial dependence structure of the random field.Figure 6 shows four different plots: 1) the variation of Φ(β) (blue curve) and Ψ (β) (red curve) along the MCMC simulation; and 2) the variation of H(β) (entropy) along the MCMC simulation.
Note that the maximum pseudo-likelihood estimator of the inverse temperature parameter, on average, underestimates the real value for large parameter values. However, the Fisher information and entropy curves do not suffer significant changes. It is worth mentioning that the two peaks in the Fisher information curves indicate that the system is undergoing phase transitions. Figure 7 shows the 3D information curve and Figure 8 shows the 2D curve.
The analysis of the curves shows that, from an information-theoretic perspective, the behavior of the system when it moves towards lower entropy states (β increases) is significantly different from its behavior when it moves towards higher entropy states (β decreases). This effect is known as the hysteresis phenomenon [3] and basically suggests that the system has non-local memory [24] and undergoes a series of irreversible changes. In this sense, our model can be considered a mathematical model of hysteresis. In summary, when temperature is infinite (β = 0), entropy reaches its maximum value (log 2) and the information equality prevails. From an information-theoretic perspective, a reduction in temperature (increase in β) causes a series of changes in the parametric space, since its metric tensor (Fisher information) is changed in an apparently asymmetric, non-reversible way, inducing the emergence of a natural orientation of evolution.
Finally, to show how the shape of the proposed information-theoretic curve does not suffer significant changes along different pairwise Ising model dynamics, we performed three complete cycles in which the inverse temperature is gradually increased up to the maximum value β_MAX = 1.5 and then slowly reduced to zero once again. Each cycle is composed of 600 iterations. Figures 9 and 10 show the obtained 3D and 2D information curves, where the first cycle is represented by the blue curve, the second cycle by the red curve and the third cycle by the green curve. Note that, despite small perturbations, the characteristic shape of the curves is preserved in all cycles, indicating that Fisher information regarding the inverse temperature parameter is a robust measure in the analysis of the global system's behavior along the pairwise Ising model dynamics. Besides, the distinction between the two regimes of the system is clear: the point denoted by A in Figure 10 represents a total lack of spatial dependence structure, that is, several independent Bernoulli random variables and a very noisy configuration in which entropy is maximum, while in B we have a high degree of spatial dependence, large clusters and a very smooth, low-entropy configuration.

Conclusions
The Ising model is a mathematical structure that expresses complex behavior. In this work, we addressed the problem of analyzing the dynamics of a non-deterministic complex system modeled by pairwise Ising random fields by means of an information-geometric perspective. Investigations of the relation between entropy and Fisher information led us to the definition of the information curve of the system. In summary, the information curve models the dynamics of random fields from a geometrical perspective, allowing us to predict whether the system is approaching a phase transition: large values of Φ(β) and Ψ(β) indicate that the system's dynamics is abruptly changing. Computational experiments with different initial conditions and parameter settings lead to the same pattern in the information space. We have observed that, when the random field moves along different entropic states, its parametric space is actually being deformed by changes that happen in Fisher information. Our findings show that when temperature deviates from infinity, entropy decreases as a consequence of variations in Fisher information. Moreover, moving towards lower entropy states is different from moving towards higher entropy states, since the information curves are not the same. This asymmetry induces a natural orientation to the information curve, indicating that during the phase transitions our system has memory and experiences irreversible changes. In this scenario, the proposed information-theoretic model can be considered a mathematical model of hysteresis in which the natural orientation is pointed by an arrow of time. One feature of the proposed model is that it does not rely on physical interpretations: we believe the proposed information-theoretic framework can be adapted to study the evolution of complex systems exhibiting local interactions along time, such as deterministic cellular automata models. Besides, with the proposed maximum pseudo-likelihood equation it is possible to estimate the unknown inverse temperature parameter in a computationally efficient way. Future works may include the study of other techniques for the estimation of the inverse temperature parameter, simulations of the Ising model dynamics on general non-regular graphs and the application of this information-theoretic model to other discrete random field models, such as the q-state Potts model, in which each cell assumes one of q different states. Finally, we intend to study Gaussian-Markov random field model dynamics, where we have an infinite number of states for each cell.

Fig. 8 2D information curve built by combining both forms of Fisher information. Note that when the inverse temperature is unknown, the shape of the curve is changed. However, in both cases, the phenomenon of hysteresis is observed.

Fig. 1 Different local configuration patterns for the Ising model defined on a second-order neighborhood system. There are only five patterns that offer different contributions to the pseudo-likelihood equation for the Ising model on a 2D lattice with ∆ = 8.

Fig. 2 Ising model dynamics along a Markov Chain Monte Carlo (MCMC) simulation. Evolution of the random field as the inverse temperature parameter β is first increased from zero to β_MAX and then decreased to zero again.

Fig. 3 Inverse temperature, Fisher information and entropy along the Ising model dynamics. For some values of the inverse temperature parameter, the information equality does not hold.

Fig. 4 Information curve of the random field dynamics when the true inverse temperature parameter is known. The 3D information curve was built by varying the inverse temperature parameter β from β_MIN = 0 (state A) to β_MAX = 1.5 (state B) and back. The results show that moving along different entropic states causes the emergence of a natural orientation in terms of information.

Fig. 6 Estimated inverse temperature, Fisher information and entropy along the random field dynamics. The estimated inverse temperature has an upper bound, making the behavior of Fisher information quite different from the case where the true inverse temperature is known.

Fig. 7 Information curve of the random field dynamics. The 3D curve was built by varying the inverse temperature parameter β from β_MIN = 0 (state A) to β_MAX (state B) and back. The results show that moving along different entropic states causes the emergence of a natural orientation in terms of information.

Fig. 9 3D information curves of the random field dynamics. The three curves (blue, red, green) from three different cycles of the pairwise Ising dynamics show that the characteristic shape is invariant to changes in the initial conditions.

Fig. 10 2D information curves of the random field dynamics. The characteristic shape of the information curves is invariant to changes in the initial conditions. Note also that in the information-theoretic space, the two regimes of the system are clearly identified by two distinct points A and B.
