A note on negation of a probability distribution

Evaluating the negation of an uncertain event is an open issue. Yager (IEEE Trans Fuzzy Syst 23:1899–1902, 2015) suggested a transformation for evaluating the negation of a probability distribution. He used the idea that any event whose outcome is not certain can be negated by supporting the occurrence of the other events, with no bias or prejudice towards any particular outcome. Various authors have tried to generalize the negation transformation proposed by Yager. However, we need to focus on developing the basic structure of negation so that the behaviour of the process modelled by the negation transformation can be understood in detail. Yager's negation is based on the distribution of maximum entropy. If a probability distribution is uncertain (in a state other than maximum entropy), then the more iterations of negation are applied, the more uncertain the probability distribution becomes, eventually converging to a homogeneous state, i.e. maximum entropy. In other words, repeated negation realizes the process of convergence to maximum entropy. It is worth noting that at each step, Yager's method ensures that the negation is intuitive: the next negation weakens the probability of the event occurring in the previous step. Since negation involves a reallocation of probabilities at each step, in such a way that the reallocation at each step can be determined from the reallocation at the previous step, it is clear that Yager's negation has various attributes similar to those of a Markov chain. In the present work, we have shown that Yager's definition of negation can be modelled as a Markov chain which is irreducible and aperiodic, with no absorbing states. Two examples have been discussed to strengthen and support the analytical results. Also, we have defined an information generating function (IGF) whose derivatives evaluated at specific points give the moments of the self-information of the negation of a probability distribution.
The properties of the generating function, along with its relationship with the information generating function proposed by S. Golomb (IEEE Trans Inf Theory 12:75–77, 1966), have been explored. A closer look at the properties of the IGF confirms the existence of the Markovian structure of Yager's negation.


Introduction
Affirmation and negation are two key concepts in various forms of human communication. An affirmative form is generally used to express the validity or truth of an assertion, whereas the negative form asserts its falsehood. In classical logic, if a statement P is TRUE, then its negation ∼P is FALSE, and if a statement P is FALSE, then its negation ∼P is TRUE. Negation gives a different perspective on any happening in either society or nature. Two persons can examine the same situation from distinct (positive and negative) perspectives, and both may have reasonable justification for it. For example, various psychological studies outline four approaches for regulating behaviour based on the repercussions and the desired objective: positive reinforcement, negative reinforcement, positive punishment, and negative punishment (see Table 1).
Rewarding your child so that he/she performs an activity that is in any case expected from him/her is an example of positive reinforcement, whereas the seat belt reminders installed in cars are examples of negative reinforcement (the irritating sound stops when you perform the desired behaviour). Scolding a student when he/she misbehaves in school is an example of positive punishment, whereas parents taking away privileges when children misbehave shows that negative punishment can be an effective discipline strategy. Both reinforcement and punishment have notable drawbacks, and it solely depends on the individual how he/she uses any of the above approaches to his/her benefit. Anything negative does not necessarily mean unpleasant or unacceptable; it represents the opposite side of different aspects of life. This opposite side may be fascinating or annoying, and it has a great deal of uncertainty inherent in it. Also, the whole idea of any assertion being either true or false is very restrictive. In our daily lives, we experience situations which are neither true nor false; in fact, we come across many events which have uncertainty inherent in them. An event or a sequence of events whose occurrence is not guaranteed cannot be expressed via true/false logic. Probability theory has been quite effective in handling such situations. How to express the negation (opposite side) of an uncertain event has been a matter of discussion for many years. If the happening of an event is uncertain, then we can oppose (negate) it by using its probability. Keeping this in mind, Yager (2014) gave the basic framework for the negation of a probability distribution. The negation proposed by Yager is basically an unbiased reallocation of probabilities. Many studies focus on determining how much uncertainty/knowledge is embedded in the negation of a probability distribution.
Various authors (Srivastava and Maheshwari 2018; Srivastava and Kaur 2019; Srivastava and Tanwar 2021) have shown that more uncertainty (information) is embedded in the negation of a probability distribution. However, we need to focus more on the underlying mathematical structure of negation and its properties, which can increase its applicability in various domains. We consider the probability distribution P(3) = (p_1, p_2, p_3) such that p_1 + p_2 + p_3 = 1 and 0 ≤ p_i ≤ 1; i = 1, 2, 3. For simplicity, we consider the degenerate distribution P(3) = (1, 0, 0). The negation of P(3) is given as P̄(3) = (0, 0.5, 0.5) (see Fig. 1). While determining the negation, p_1 = 1 is equally distributed among the second and third components, which signifies that by negating (opposing) the occurrence of the first event (with probability 1), we are supporting the occurrence of the second and third events without any bias. Therefore, the probability p_1 = 1 is equally distributed among the probabilities of the second and third events. The second and third probabilities p_2 = 0 and p_3 = 0 could not contribute anything to the negation. In other words, the support that the occurrence of the first event had in the original distribution has been equally divided among the second and third events. Applying the negation transformation again, we obtain the second negation (0.5, 0.25, 0.25). Here, the first entry is a result of equal contributions of 0.25 and 0.25 from p̄_2 and p̄_3, respectively. Again, the second entry is a result of equal contributions of 0 and 0.25 from p̄_1 and p̄_3, respectively. Similar is the case with the third entry. Here, it is interesting to note that the support for the various events at the second iteration can be easily determined from the support for P̄(3) alone, ignoring the support for the various events in P(3). Therefore, for determining the distribution at the second iteration, one needs knowledge of the distribution at the first iteration only (and not of the original distribution).
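The two iterations above can be reproduced with a short script. Below is a minimal sketch (the function name `negate` is ours, not from the text) applying p̄_i = (1 − p_i)/(n − 1) directly:

```python
def negate(p):
    """Yager's negation: reallocate each p_i as (1 - p_i) / (n - 1)."""
    n = len(p)
    return [(1 - pi) / (n - 1) for pi in p]

p = [1.0, 0.0, 0.0]      # degenerate distribution P(3)
p_bar = negate(p)        # first negation
p_bar2 = negate(p_bar)   # second negation

print(p_bar)             # [0.0, 0.5, 0.5]
print(p_bar2)            # [0.5, 0.25, 0.25]
```

Note that each call uses only the previous vector, mirroring the observation that the second iteration needs no knowledge of the original distribution.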
Similar will be the case for further iterations of the negation transformation. This shows that the negation transformation proposed by Yager has various attributes identical to those of a Markov chain. One more question that immediately arises is whether there exists any generating function associated with the negation transformation. For a discrete finite complete probability distribution P(n) = (p_1, p_2, ..., p_n), S. Golomb (1966) introduced the information generating function (IGF) given as

I_t(P(n)) = Σ_{i=1}^n p_i^t,    (1)

whose first derivative satisfies −∂/∂t I_t(P(n))|_{t=1} = H(P(n)), where H(P(n)) is the well-known Shannon entropy function (Pal and Pal 1991; Shannon 1948). On further differentiating (k − 1) times, we get the k-th moment of the self-information embedded in P(n). Various authors have tried to generalize the negation transformation proposed by Yager (2014). Zhang et al. (2020) proposed a negation of a probability distribution based on Tsallis entropy that degenerates into Yager's negation. The concept of negation is widely applied in various fields. In Liu et al. (2020), the negation of Z-numbers is proposed. Researchers have applied the concept of negation in D–S evidence theory as well, by defining the negation of a BPA based on Pythagorean fuzzy numbers (Mao and Cai 2020), maximum uncertainty allocation (Deng and Jiang 2020), reallocation (Yin et al. 2018), a matrix method (Luo and Deng 2019) and a belief interval approach (Mao and Deng 2021), and have applied these ideas in service supplier selection systems (Mao and Cai 2020), medical pattern recognition (Mao and Deng 2021), decision-making (Xiao 2021) and many more. In the present work, we have investigated the properties of the Markov chain that is embedded in the negation of a probability distribution. Some illustrative examples have been considered which show a clear correspondence between the Markov chain and the negation transformation.
Also, we have proposed an information generating function whose derivatives at specific points give the moments of the self-information (information content) embedded in the negation of a probability distribution. The discussed examples clearly indicate that the proposed generating function has an evident connection with the information generating function proposed by S. Golomb (1966).

Negation of a probability distribution
Let X = {X_1, X_2, ..., X_m} be the frame of discernment (FOD), the set of all possible hypotheses under consideration, and let P(n) = (p_1, p_2, ..., p_n) be a discrete finite complete probability distribution defined on X with p_i ∈ [0, 1] for i = 1, 2, ..., n and Σ_{i=1}^n p_i = 1. The negation of a probability distribution proposed by Yager can be written as the set P̄(n) = (p̄_1, p̄_2, ..., p̄_n), where

p̄_i = (1 − p_i)/(n − 1); i = 1, 2, ..., n,

which can be further written in matrix form as P̄(n) = P(n)A, where A is the n × n matrix with zero diagonal entries and all off-diagonal entries equal to 1/(n − 1). Here, the probabilities in the set P̄(n) satisfy 0 ≤ p̄_i ≤ 1 and Σ_{i=1}^n p̄_i = 1. Yager (2014) specified that there can be many distinct negations embedded in a probability distribution, and the above is the one that has the maximum entropy allocation among all the possible negations. In particular, let Q(n) denote a negation of P(n). A reasonable negation should satisfy the following conditions: (1) if the probabilities in P(n) are all equal, then all entries of Q(n) are also equal; (2) the pooled opinion of P(n) and Q(n) (a convex combination of P(n) and Q(n)) should not reflect any additional knowledge about the occurrence of events; and (3) the negation should preserve the underlying mathematical structure of the probability distribution P.
For a better understanding of the third condition, consider a random variable X taking the values (10, 11, 12, 13, 14) with probabilities P(5) = (p_1, p_2, p_3, p_4, p_5). The expectation of X is given as E(X) = Σ_{i=1}^5 x_i p_i. Yager's negation of P(5) is P̄(5) = ((1 − p_1)/4, (1 − p_2)/4, (1 − p_3)/4, (1 − p_4)/4, (1 − p_5)/4). Here, the negation transformation has preserved the symmetry of P(5), since the probabilities are redistributed among all the alternatives equally (the basic structure of P(5) remains unaltered). If the rearrangement were biased, i.e. if we distributed more to some and less to others, then the symmetry might be disturbed. Repeatedly applying the negation transformation on P(5) will yield the probability distribution (0.2, 0.2, 0.2, 0.2, 0.2).
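The convergence claimed above is easy to observe numerically. In the sketch below, the starting distribution P(5) is an arbitrary symmetric choice of ours (the text leaves it unspecified); repeated negation drives it to the uniform distribution:

```python
def negate(p):
    # Yager's negation: each entry becomes (1 - p_i) / (n - 1)
    n = len(p)
    return [(1 - pi) / (n - 1) for pi in p]

p = [0.1, 0.2, 0.4, 0.2, 0.1]   # illustrative symmetric P(5); an assumption, not from the text
for _ in range(60):
    p = negate(p)               # iterate the negation transformation

print([round(pi, 6) for pi in p])   # [0.2, 0.2, 0.2, 0.2, 0.2]
```

The symmetry of the starting vector (first and last entries equal) is preserved at every iteration, illustrating the third condition.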

Markov chain
A Markov chain gives a mathematical framework which characterizes transitions from one state to another using probabilistic rules. Markov chains are stochastic processes for which the description of the present state fully captures all the information that could influence the subsequent development of the process. Mathematically, a stochastic process X = {X_n : n ≥ 0} on a countable set S is a Markov chain if, for any i, j, k_1, k_2, ..., k_n ∈ S and n ≥ 0,

P(X_{n+1} = j | X_n = i, X_{n−1} = k_1, X_{n−2} = k_2, ..., X_0 = k_n) = P(X_{n+1} = j | X_n = i) = p_ij.

Here, p_ij is the probability representing a transition from state i to state j. These transition probabilities sum up to 1, i.e. Σ_{j∈S} p_ij = 1
for each i ∈ S. In general, we list all the transition probabilities in a matrix, called the stochastic matrix, state transition matrix or transition probability matrix. For n states, the transition probability matrix is the n × n square matrix P = (p_ij); i, j = 1, 2, ..., n. This matrix can be right stochastic (each row summing to one) or doubly stochastic (each row and each column summing to one). Moreover, the above matrix represents one-step transitions only. The probability of a transition from state i to state j in m steps is denoted p_ij^(m) = P(X_{n+m} = j | X_n = i). The m-step transition probabilities can be evaluated by multiplying P with itself m times. Further, a Markov chain is said to be irreducible if transitions are possible between every pair of states (in a finite number of steps) with positive probability. Also, if returns to a particular state can occur only at equally spaced intervals of time, then that state is said to be periodic; otherwise it is aperiodic. A Markov chain is said to be aperiodic if all its states are aperiodic. Here, it is worth mentioning that if one of the states in an irreducible Markov chain is aperiodic, then all the other states are also aperiodic. The irreducibility and aperiodicity properties are important for characterizing the ergodicity of a Markov chain. There is a possibility that once we reach a particular state in a Markov process, it is impossible to leave that state, i.e. p_ii = P(X_{n+1} = i | X_n = i) = 1. Such states are called absorbing states. A Markov chain is an absorbing Markov chain if (a) there is at least one absorbing state among all the states characterizing the Markov chain; and (b) it is possible to go from any state to at least one absorbing state in a finite number of steps. A state which is not absorbing is called a transient state. Further, as the time index approaches infinity, some Markov chains may exhibit steady-state behaviour.
The steady-state distribution of a Markov chain is generally represented as a row vector π whose entries sum to one and which satisfies πP = π, P being the transition probability matrix. The stability of a random process can be determined using the steady-state distribution, and in some cases it describes the limiting behaviour of the Markov chain. An example of a doubly stochastic matrix is a permutation matrix, which is very important from the combinatorial point of view. Given a permutation f of k elements, f : {1, 2, ..., k} → {1, 2, ..., k}, the corresponding permutation matrix is the k × k matrix P_f = (p_ij); i, j = 1, 2, ..., k, obtained by setting p_ij = 1 if j = f(i) and p_ij = 0 otherwise, for all i = 1, 2, ..., k; or alternatively p_ij = 1 if i = f(j) and p_ij = 0 otherwise, for all j = 1, 2, ..., k. It is possible to define the negation of a probability distribution via the entries of a doubly stochastic matrix. Consider an n × n doubly stochastic matrix A = (a_ij), i.e. Σ_{i=1}^n a_ij = Σ_{j=1}^n a_ij = 1.
In particular, if a_ii = 0 for all i = 1, 2, ..., n and a_ij = 1/(n − 1) for all i ≠ j; i, j = 1, 2, ..., n, then given a probability distribution P(n) = (p_1, p_2, ..., p_n), we can define its negation P̄(n) = (p̄_1, p̄_2, ..., p̄_n) as

p̄_i = Σ_{j=1}^n a_ij p_j = Σ_{j≠i} p_j/(n − 1) = (1 − p_i)/(n − 1)

for all i = 1, 2, ..., n. Since the elements in the negation of a probability distribution can be represented in terms of the entries of a doubly stochastic matrix, and there are many Markov chains whose transition probability matrix is doubly stochastic, it is clear that the underlying mathematical structure of negation can be modelled as a Markov chain. In the next section, we will discuss some examples which validate the above discussion.
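The matrix representation can be sanity-checked numerically. A small sketch (helper names and the test distribution are ours) confirming that the row-vector–matrix product reproduces the elementwise formula (1 − p_i)/(n − 1):

```python
def negation_matrix(n):
    # doubly stochastic A: zero diagonal, 1/(n-1) off the diagonal
    return [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)]
            for i in range(n)]

def row_times_matrix(p, A):
    # (pA)_j = sum_i p_i * a_ij
    n = len(p)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]

p = [0.5, 0.3, 0.2]                  # illustrative P(3)
via_matrix = row_times_matrix(p, negation_matrix(3))
direct = [(1 - pi) / 2 for pi in p]  # Yager's formula with n = 3
print(via_matrix, direct)            # the two agree entrywise
```

Because each column of A sums to one, the result is again a probability distribution.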

Negation and Markov chain
Let Y_n represent the sum of n independent rolls of a fair die and O_i be the outcome of the i-th roll, i = 1, ..., n. Further, let X_n denote the remainder when Y_n is divided by 7. Then Y_n = O_1 + O_2 + ... + O_n = Y_{n−1} + O_n, and X_n = (X_{n−1} + O_n) mod 7 represents a Markov chain with states 0, 1, 2, 3, 4, 5, 6. As O_n can take the values 1, 2, ..., 6, the increment O_n mod 7 cannot be zero. So a transition from a state to itself is not possible, and thus the probability of a transition from a state to itself is always zero. The probability of a transition from any state i to a state j, j = 0, 1, ..., 6, j ≠ i, is the same as the probability of getting the outcome O_n as 1, 2, ..., or 6, i.e. 1/6. Thus, the transition probability matrix P is the 7 × 7 doubly stochastic matrix with zero diagonal entries and all off-diagonal entries equal to 1/6. Now suppose we obtain a 3 in the first roll of the die. Then, the initial distribution is given by

P(1) = (P(X_1 = 0), P(X_1 = 1), P(X_1 = 2), P(X_1 = 3), P(X_1 = 4), P(X_1 = 5), P(X_1 = 6)) = (0, 0, 0, 1, 0, 0, 0).

Here, P(1) represents the initial probability vector. The probability distribution after one transition is given as

P(2) = (P(X_2 = 0), P(X_2 = 1), ..., P(X_2 = 6)) = P(1)P = (1/6, 1/6, 1/6, 0, 1/6, 1/6, 1/6).

It is clear that P(2) is the negation of P(1). Similarly, P(3) will be the negation of P(2), and so on. It is interesting to note that the transition probability matrix P in the above example represents exactly the negation transformation proposed by Yager. The matrix P is a doubly stochastic transition probability matrix on the seven states 0, 1, 2, 3, 4, 5, 6, and since it is regular (P^2 has only strictly positive entries), the limiting distribution is (1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7). Moreover, 0 ≤ P(X_2 = j) ≤ 1/6 for j = 0, 1, ..., 6, i.e. all the probabilities after the second roll of the die are bounded in the interval [0, 1/6]. Similarly, we have 5/36 ≤ P(X_3 = j) ≤ 1/6 for j = 0, 1, ..., 6, i.e. all the probabilities after the third roll of the die are bounded in the interval [5/36, 1/6].
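The bounds stated for the second and third rolls can be checked directly. A sketch under the setup above (helper names are ours):

```python
n = 7
# TPM of X_n: zero diagonal, 1/6 everywhere else
P = [[0.0 if i == j else 1.0 / 6 for j in range(n)] for i in range(n)]

def step(v, M):
    # one transition: (vM)_j = sum_i v_i * m_ij, i.e. one application of the negation
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(v))]

p1 = [0, 0, 0, 1, 0, 0, 0]   # first roll showed a 3, so X_1 = 3
p2 = step(p1, P)
p3 = step(p2, P)

print(min(p2), max(p2))      # bounds [0, 1/6] after the second roll
print(min(p3), max(p3))      # bounds [5/36, 1/6] after the third roll
```

The entry left at zero (or at the maximum) flips position between iterations, which is the back-and-forth behaviour noted below.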
Similarly, we can obtain bounds for the further probabilities. Uncertainty increases on repeatedly applying the negation, i.e. the uncertainty embedded in P(n) = (p_1, p_2, ..., p_n) is less than the uncertainty embedded in its negation P̄(n) = (p̄_1, p̄_2, ..., p̄_n), and so on. Therefore, using the well-known Shannon entropy function H(P(n)) = −Σ_{i=1}^n p_i log p_i, we have H(P(n)) ≤ H(P̄(n)), and the entropy keeps increasing at each further negation. As the time index approaches infinity, the uncertainty also approaches its maximum (Srivastava and Maheshwari 2018; Srivastava and Kaur 2019; Srivastava and Tanwar 2021), and the probabilities get constrained in intervals that keep on shrinking. Figure 2 shows that after every transition, the process appears to move to the opposite side of the limiting distribution, i.e. it flips back and forth in orientation. This is natural, because every transition actually represents the negation of the probability vector at the previous step. For a better understanding, we take another example. Consider an ant performing a random walk on the vertices of the complete graph K_4 (the vertices of a tetrahedron). We assume that the ant begins at one of the four vertices taken at random (say A) and at each time step moves to another vertex. Also, we assume that the amount of time the ant takes in turning is negligible compared to the time it takes travelling between the vertices. Considering the graph as undirected and unweighted, the vertex the ant moves to is chosen uniformly at random among the neighbours of the present vertex. This random walk can be modelled as a Markov chain which is irreducible and aperiodic, and whose transition probability matrix (TPM) P is the 4 × 4 doubly stochastic matrix with zero diagonal entries and all off-diagonal entries equal to 1/3. Whatever vertex the ant starts from, it can move to any of the neighbouring vertices with probability 1/3. The matrix P clearly represents the transitions associated with the negation transformation. If the ant starts at vertex A (say), then the initial probability vector is P(1) = (1, 0, 0, 0) and the probability vector at the next step is the negation of P(1), i.e. P(2) = P(1)P = (0, 1/3, 1/3, 1/3).
Similarly, we can obtain P(3), P(4), ... by applying the negation transformation again and again. Here, P(2) can be viewed as a way of opposing the event of the ant being at vertex A. (In the absence of any external information, we can assume the ant to be at vertices B, C and D with equal probabilities.) Also, the above random walk has a stationary distribution (in this case, also the limiting distribution) given as (1/4, 1/4, 1/4, 1/4). Here, the stationary distribution indicates that as the number of transitions approaches infinity, the probabilities of the ant being at vertices A, B, C and D become identical. Clearly, the movements in the random walk are uncorrelated and unbiased. Unbiased means that the ant explores every possible direction with equal probability, i.e. there is no preferred direction. Further, uncorrelated means that the direction of movement at each step is independent of the previous directions moved; the location at each step depends only on the location at the previous step. We now summarize as follows.
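The stationary and limiting behaviour of the ant's walk can be checked in a few lines (again a sketch; helper names are ours):

```python
n = 4  # vertices A, B, C, D of K4
P = [[0.0 if i == j else 1.0 / 3 for j in range(n)] for i in range(n)]

def step(v, M):
    # one transition of the walk: row vector times TPM
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(v))]

pi_vec = [0.25, 0.25, 0.25, 0.25]
print(step(pi_vec, P))       # pi * P recovers pi: stationarity

p = [1, 0, 0, 0]             # ant starts at vertex A
for _ in range(40):
    p = step(p, P)           # long-run behaviour of the walk
print(p)                     # approaches (1/4, 1/4, 1/4, 1/4)
```

The deviation from uniform shrinks by a factor of 1/3 (with alternating sign) at every step, so forty steps are far more than enough for convergence.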
The Markov chain used to characterize negation of a probability distribution typically has the following attributes.
1. The transition probability matrix used to represent the negation of P(n) = (p_1, p_2, ..., p_n) is an n × n doubly stochastic matrix with all diagonal entries 0 and all off-diagonal entries equal to 1/(n − 1).
2. All the states are transient (there is no absorbing state), so that p_ij < 1 for all i and j.
3. There exists a positive integer N such that P^N has no zero entries, which implies that each state may be reached from every other state in N transitions, indicating that the Markov chain is regular.
4. The stationary distribution is uniform (since the transition matrix is doubly stochastic), and because the Markov chain is irreducible and aperiodic, it is also the limiting distribution.
5. If we alter any two (or more) entries of the initial probability vector, then the probability vectors at the subsequent iterations get altered at those positions only; the rest remain unchanged. In the above example, if P(1) = (0 + δ, 0 − δ, 0, 1, 0, 0, 0), then P(2) = (1/6 − δ/6, 1/6 + δ/6, 1/6, 0, 1/6, 1/6, 1/6), and so on.

Mathematically, we can represent negation by the recurrence relation P(k + 1) = P(k)A; k ≥ 1, where A is the doubly stochastic matrix of attribute 1 and P(k) denotes the probability vector after (k − 1) negations.
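Attribute 5 can be illustrated numerically. In the sketch below we perturb a valid initial vector (the choice of which entries to perturb is ours) and observe that only the corresponding positions of the next vector change:

```python
def negate(p):
    # Yager's negation: each entry becomes (1 - p_i) / (n - 1)
    n = len(p)
    return [(1 - pi) / (n - 1) for pi in p]

delta = 0.06
base = [0, 0, 0, 1, 0, 0, 0]
pert = [0 + delta, 0, 0, 1 - delta, 0, 0, 0]   # entries 0 and 3 altered (an illustrative choice)

b, q = negate(base), negate(pert)
changed = [j for j in range(7) if abs(b[j] - q[j]) > 1e-12]
print(changed)   # only the altered positions (0 and 3) propagate to the next iteration
```

Each changed position differs by exactly δ/(n − 1) = δ/6, consistent with the recurrence above.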

IGF without any bias
We again consider a discrete finite complete probability distribution P(n) = (p_1, p_2, ..., p_n) and its negation P̄(n) = (p̄_1, p̄_2, ..., p̄_n). Then, the information generating function (IGF) corresponding to P̄(n) can be defined as

I_t(P̄(n)) = Σ_{i=1}^n p̄_i^t = Σ_{i=1}^n ((1 − p_i)/(n − 1))^t.    (4)

Here, t is a real (or complex) variable. Clearly, I_1(P̄(n)) = 1, and since 0 ≤ p_i ≤ 1 for all i ⇒ 0 ≤ (1 − p_i)/(n − 1) ≤ 1/(n − 1) ⇒ 0 ≤ p̄_i ≤ 1/(n − 1) for all i, (4) is convergent for all t ≥ 1. Further, we can write

−∂/∂t I_t(P̄(n))|_{t=1} = −Σ_{i=1}^n p̄_i log p̄_i = H(P̄(n)),

where H(P̄(n)) is the Shannon entropy of the negation of the probability distribution. On further differentiating (k − 1) times, we obtain

(−1)^k ∂^k/∂t^k I_t(P̄(n))|_{t=1} = (−1)^k Σ_{i=1}^n p̄_i log^k p̄_i = Σ_{i=1}^n p̄_i (−log p̄_i)^k,

which represents the k-th moment of the self-information embedded in P̄(n).
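The defining property of the IGF — that minus its derivative at t = 1 gives the Shannon entropy of the negation — can be checked numerically. A sketch using a central difference (the test distribution is an arbitrary choice of ours):

```python
import math

def negate(p):
    # Yager's negation: each entry becomes (1 - p_i) / (n - 1)
    n = len(p)
    return [(1 - pi) / (n - 1) for pi in p]

def igf(q, t):
    # I_t = sum of q_i ** t; zero-probability terms contribute nothing for t >= 1
    return sum(qi ** t for qi in q if qi > 0)

p_bar = negate([0.5, 0.3, 0.2])   # negation of an illustrative P(3)
print(igf(p_bar, 1.0))            # I_1 = 1, since the negated probabilities sum to one

h = 1e-6                          # central-difference step
deriv = (igf(p_bar, 1 + h) - igf(p_bar, 1 - h)) / (2 * h)
entropy = -sum(qi * math.log(qi) for qi in p_bar if qi > 0)
print(-deriv, entropy)            # -dI/dt at t = 1 matches the Shannon entropy of the negation
```

Higher moments of the self-information could be checked the same way with higher-order finite differences.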
1. If all the entries of P(n) = (p_1, p_2, ..., p_n) are equal, then all the entries of P̄(n) = (p̄_1, p̄_2, ..., p̄_n) are also equal, i.e. if p_i = 1/n ∀ i, then p̄_i = 1/n ∀ i. In this case,

I_t(P(n)) = n^{1−t} = I_t(P̄(n)); t ≥ 1,

which gives

−∂/∂t I_t(P(n))|_{t=1} = log n = −∂/∂t I_t(P̄(n))|_{t=1}.

2. Suppose we add k events with zero probability to the probability distribution P(n) = (p_1, p_2, ..., p_n); then the revised probability distribution is P(n + k) = (p_1, p_2, ..., p_n, 0, 0, ..., 0) (k zeros). In this case, the IGF given by (1) remains unaltered, since events with zero probability do not provide any information regarding the occurrence of events (Pal and Pal 1991; Shannon 1948). However, the revised negation in this case is