Quantum-like Gaussian mixture model

A new concept of a quantum-like mixture model is introduced. It describes the mixture distribution under the assumption that a point is generated by each Gaussian at the same time. The quantum-like Gaussian mixture improves classification accuracy in machine learning by indicating that uncertain points should not be assigned to any class. It increases the accuracy of the Gaussian mixture model on the iris data set from 96.67% to 99.24%.


Introduction
We introduce a new concept of a quantum-like mixture model inspired by quantum physics. This concept describes the mixture distribution under the assumption that a point is generated by each Gaussian at the same time. Interference between the different Gaussians changes the shape of the mixture distribution: the probability of points between the Gaussians is lowered, while the probability at the centers of the Gaussians is increased. The difference between the relative probabilities determined by the Gaussian mixture and those defined by the quantum-like Gaussian mixture indicates the overlapping error that results from the striking distance to a decision boundary. The quantum-like Gaussian mixture improves the classification quality of a Gaussian classifier: points that cannot be classified correctly, because they are too close to a decision boundary, are not considered.

Quantum probabilities
In quantum physics the probability p(x) of a state x is evaluated as the squared magnitude of a probability amplitude A(x), which is represented by a complex number A(x) = α + i·β,

p(x) = A(x)* · A(x) = α² + β² = |A(x)|².    (1)

This works because the product of a complex number with its conjugate is always a real number.
Quantum physics by itself does not offer any justification or explanation for this rule beyond the statement that it works, see Binney and Skinner (2014). These quantum probabilities are also called von Neumann probabilities.
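As a quick numerical check of Eq. 1, a minimal sketch in Python (the function name `probability` is ours, not from the paper):

```python
# Quantum probability p(x) = A(x)* . A(x) = alpha^2 + beta^2 = |A(x)|^2
def probability(amplitude: complex) -> float:
    # The product of a complex number with its conjugate is always real,
    # so the imaginary part below is zero (up to rounding).
    p = amplitude.conjugate() * amplitude
    return p.real

# Amplitude A(x) = 0.6 + 0.8i gives p(x) = 0.36 + 0.64 = 1.0
A = complex(0.6, 0.8)
print(probability(A))  # → 1.0
```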

Two-slit interference
Suppose there are two mutually exclusive events x and y, meaning that x and y do not occur together, see Wichert (2020). The classical probability of event x or event y is just

p(x ∨ y) = p(x) + p(y).

This is the sum rule of probabilities for exclusive events. For probability amplitudes the sum rule also holds,

A(x ∨ y) = A(x) + A(y).

However, converting these amplitudes into probabilities according to Eq. 1 leads to an interference term 2 · Re(A(x) · A*(y)),

|A(x) + A(y)|² = |A(x)|² + |A(y)|² + 2 · Re(A(x) · A*(y)) = p(x) + p(y) + 2 · Re(A(x) · A*(y)),

making both approaches, in general, incompatible. In other words, the summation rule of classical probability theory is violated, resulting in one of the most fundamental laws of quantum mechanics, see Binney and Skinner (2014). An unobserved state evolves smoothly and continuously; during a measurement, however, it collapses into a definite state with probability p(x) = |A(x)|². For instance, let us imagine a gun that fires electrons, a screen with two narrow slits x and y, and a photographic plate. An emitted electron can pass through slit x or slit y and reaches the photographic plate at a position z that is equidistant from both slits. If electron detectors show which slit the electron went through, we find that the probability of the electron hitting the photographic plate is

p(z) = p(x) + p(y).

This probability means that, when measured, the electron behaved as a particle. On the other hand, if we remove the detectors, the electron is unobserved and we do not know through which slit it went. Now the electron is represented as a wave with the amplitudes

a(x, θ₁) = √p(x) · e^{i·θ₁} = A(x),  a(y, θ₂) = √p(y) · e^{i·θ₂} = A(y).
These amplitudes contain a parameter θ, which corresponds to the phase of the wave. The probability at z then becomes

p(z) = |A(x) + A(y)|² = p(x) + p(y) + 2 · √(p(x) · p(y)) · cos(θ₁ − θ₂) ≠ p(x) + p(y)

for most values of θ₁, θ₂. Since a squared norm is non-negative, the resulting values are always greater than or equal to zero. The amplitude is the square root of the belief multiplied by the corresponding phase factor, A(x) = √p(x) · e^{i·θ}, with the phase θ ∈ [0, 2·π). Empirical evidence from the cognitive psychology literature suggests that most human decision-making cannot be adequately modeled using classical probability theory as defined by Kolmogorov's axioms (Tversky and Kahneman 1973, 1974). These empirical findings show that, under uncertainty, humans tend to violate expected utility theory and consequently the laws of classical probability theory, e.g., the law of total probability (Busemeyer et al. 2006, 2009, 2011, 2012; Khrennikov 2009; Yukalov and Sornette 2011; Busemeyer and Wang 2014; Wichert and Moreira 2018; Wichert et al. 2020; Wichert 2020), leading to what is known as the "disjunction effect", which, in turn, leads to a violation of the Sure Thing Principle. The violation results from an additional interference that influences the classical probabilities. We present new findings in which interference is present in the quantum-like mixture of Gaussians. It describes the mixture distribution with the assumption that a point is generated by each Gaussian at the same time. The decision boundary of a quantum-like Gaussian mixture separates probabilities. This leads to a better interpretation of the switching Kalman filter and to an improvement of the classification accuracy in quantum-like machine learning, see Sergioli (2020), Sergioli et al. (2021).
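The violation of the classical sum rule can be made concrete numerically. A minimal sketch under the amplitude representation A = √p · e^{i·θ} above (function names are ours):

```python
import cmath
import math

def amplitude(p: float, theta: float) -> complex:
    # A = sqrt(p) * e^{i*theta}
    return math.sqrt(p) * cmath.exp(1j * theta)

def quantum_prob(px, py, theta1, theta2):
    # |A(x) + A(y)|^2 = p(x) + p(y) + 2*sqrt(p(x)*p(y))*cos(theta1 - theta2)
    return abs(amplitude(px, theta1) + amplitude(py, theta2)) ** 2

px, py = 0.25, 0.25
print(quantum_prob(px, py, 0.0, 0.0))          # constructive interference: 1.0
print(quantum_prob(px, py, 0.0, math.pi))      # destructive interference: 0.0
print(quantum_prob(px, py, 0.0, math.pi / 2))  # no interference: 0.5 = p(x) + p(y)
```

Only the phase difference θ₁ − θ₂ = π/2 recovers the classical sum p(x) + p(y); all other phase differences add or remove probability mass.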

Quantum-like mixture of Gaussians
The Gaussian or normal distribution is defined as a probability density function (PDF)

ρ(x|μ, σ²) = 1/√(2·π·σ²) · e^{−(x−μ)²/(2·σ²)},

where μ is the mean, σ is the standard deviation and σ² is the variance. The PDF reflects the relative probability; it can be bigger than one for small σ, since it is the area under the PDF that represents the probability. In the following we write ρ (rho), as in ρ(x|μ, σ²), to indicate the relative probability. The Gaussian distribution can be generalized to a D-dimensional multivariate distribution

ρ(x|µ, Σ) = 1/((2·π)^{D/2} · |Σ|^{1/2}) · e^{−(1/2)·(x−µ)^T·Σ^{−1}·(x−µ)},

where µ is the D-dimensional mean vector, Σ is a D × D covariance matrix and |Σ| is the determinant of Σ. A Gaussian mixture distribution is a combination of multiple Gaussians where each has a certain weight,

ρ(x) = ∑_{k=1}^{K} π_k · ρ(x|μ_k, σ_k²),  with 0 ≤ π_k ≤ 1, ∑_{k=1}^{K} π_k = 1.
With the Gaussian relative amplitude α with parameter θ,

α(x|μ, σ², θ) = √ρ(x|μ, σ²) · e^{i·θ},

a quantum-like Gaussian mixture model is defined as

ρ_q(x) = |∑_{k=1}^{K} √π_k · α(x|μ_k, σ_k², θ_k)|².
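A minimal numerical sketch of this definition (helper names are ours, not from the paper): the squared magnitude of the summed weighted amplitudes reduces to the classical mixture exactly when the phase difference is π/2.

```python
import cmath
import math

def gaussian(x, mu, sigma2):
    # Relative probability rho(x | mu, sigma^2)
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def quantum_mixture(x, components, weights, thetas):
    # rho_q(x) = | sum_k sqrt(pi_k) * sqrt(rho_k(x)) * e^{i*theta_k} |^2
    amp = sum(math.sqrt(w * gaussian(x, mu, s2)) * cmath.exp(1j * t)
              for (mu, s2), w, t in zip(components, weights, thetas))
    return abs(amp) ** 2

components = [(3.0, 1.0), (-3.0, 1.0)]
weights = [0.5, 0.5]
# A phase difference of pi/2 makes the interference term vanish,
# so rho_q equals the classical mixture.
rq = quantum_mixture(0.0, components, weights, [0.0, math.pi / 2])
classical = sum(w * gaussian(0.0, mu, s2)
                for (mu, s2), w in zip(components, weights))
print(abs(rq - classical) < 1e-12)  # → True
```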

Multiplication of Gaussian amplitudes
For the multiplication of two Gaussians it follows, see Freier (2013),

ρ(x|μ₁, σ₁²) · ρ(x|μ₂, σ₂²) = c · ρ(x|μ, σ²)

with

σ² = (σ₁² · σ₂²)/(σ₁² + σ₂²),  μ = (μ₁·σ₂² + μ₂·σ₁²)/(σ₁² + σ₂²)

and the constant

c = 1/√(2·π·(σ₁² + σ₂²)) · e^{−(μ₁−μ₂)²/(2·(σ₁²+σ₂²))}.

In the next step, we interpret the multiplication of Gaussian amplitudes √(ρ(x|μ₁, σ₁²) · ρ(x|μ₂, σ₂²)). Taking the square root of the product above gives

√(ρ(x|μ₁, σ₁²) · ρ(x|μ₂, σ₂²)) = c₁ · e^{c₂} · (1/(2·π·σ²))^{1/4} · e^{a}

with

c₁ = (2·π·(σ₁² + σ₂²))^{−1/4},  c₂ = −(μ₁−μ₂)²/(4·(σ₁²+σ₂²)),  a = −(x−μ)²/(4·σ²).
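The product formula can be verified numerically. A sketch (helper names are ours):

```python
import math

def gaussian(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def product_parameters(mu1, s1, mu2, s2):
    # rho(x|mu1,s1) * rho(x|mu2,s2) = c * rho(x|mu,s)
    s = s1 * s2 / (s1 + s2)                  # combined variance
    mu = (mu1 * s2 + mu2 * s1) / (s1 + s2)   # combined mean
    c = (math.exp(-(mu1 - mu2) ** 2 / (2 * (s1 + s2)))
         / math.sqrt(2 * math.pi * (s1 + s2)))
    return mu, s, c

mu, s, c = product_parameters(1.0, 2.0, -1.0, 0.5)
x = 0.3
lhs = gaussian(x, 1.0, 2.0) * gaussian(x, -1.0, 0.5)
rhs = c * gaussian(x, mu, s)
print(abs(lhs - rhs) < 1e-12)  # → True
```

The product of two Gaussian PDFs is itself an unnormalized Gaussian; the constant c is exactly the Gaussian of one mean evaluated at the other with the summed variances.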

Interference waves
The quantum-like Gaussian mixture model for two Gaussians simplifies, with θ = θ₁ − θ₂, to

ρ_q(x) = π₁ · ρ(x|μ₁, σ₁²) + π₂ · ρ(x|μ₂, σ₂²) + 2 · √(π₁·π₂) · √(ρ(x|μ₁, σ₁²) · ρ(x|μ₂, σ₂²)) · cos(θ).

The value of the interference part depends on θ: for θ = 0 we have the maximal interference and for θ = π the minimal interference. This means that, in general, ρ_q does not integrate to one. However, it should be

∫ ρ_q(x) dx = 1,    (38)

which is the case for θ = π/2, resulting in no interference, since cos(π/2) = 0. We can also satisfy Eq. 38 for non-fixed θ values: by a linear change of θ, the interference part becomes a wave that cancels itself out, so that Eq. 38 is valid. A wave is determined by the phase and the period (wave length); the frequency f is the inverse of the period p. With θ = f · x, the interference is represented as a wave

2 · √(π₁·π₂) · c₁ · e^{c₂} · (1/(2·π·σ²))^{1/4} · e^{a} · cos(f · x).
For frequency f → ∞, the wave oscillates infinitely fast, its integral vanishes, and Eq. 38 is satisfied. In this case, the relative probability ρ(x, x·f) of x is either ρ(x, 0) or ρ(x, π), and we are uncertain about the value.
With a finite value of f, the value becomes certain. Figure 2 indicates an example with f = 10.
The shape of the Gaussian mixture changes in such a way that the two individual Gaussians become separated, with small probability values between the two Gaussians, with a frequency defined by the two mean values,

f = 2·π / |μ₁ − μ₂|,

and the phase anchored at the smaller mean value μ_min,

θ = (x − μ_min) · 2·π / |μ₁ − μ₂|.

Fig. 1 The two Gaussians with μ₁ = 3, σ₁ = 3 and μ₂ = −3, σ₂ = 2 with equal mixture coefficients π and the corresponding Gaussian mixture with equal mixture coefficients π.
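A sketch of the quantum-like mixture with this linear phase (helper names and the example parameters μ₁ = 3, μ₂ = −3, σ = 1 are ours): at the midpoint between the means the phase is π, so the interference is maximally destructive and the mixture is suppressed there.

```python
import math

def gaussian(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def quantum_mixture_wave(x, mu1, s1, mu2, s2, pi1=0.5, pi2=0.5):
    # Linear phase: one full cosine period between the two means,
    # anchored at the smaller mean value mu_min.
    f = 2 * math.pi / abs(mu1 - mu2)
    theta = (x - min(mu1, mu2)) * f
    cross = math.sqrt(gaussian(x, mu1, s1) * gaussian(x, mu2, s2))
    return (pi1 * gaussian(x, mu1, s1) + pi2 * gaussian(x, mu2, s2)
            + 2 * math.sqrt(pi1 * pi2) * cross * math.cos(theta))

classical_mid = 0.5 * gaussian(0.0, 3.0, 1.0) + 0.5 * gaussian(0.0, -3.0, 1.0)
quantum_mid = quantum_mixture_wave(0.0, 3.0, 1.0, -3.0, 1.0)
print(quantum_mid < classical_mid)  # → True: suppressed between the Gaussians
```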

Generalizations for several Gaussians
A quantum-like Gaussian mixture model for K Gaussians is defined as

ρ_q(x) = ∑_{k=1}^{K} π_k · ρ(x|μ_k, σ_k²) + ∑_{k<l} 2 · √(π_k·π_l) · √(ρ(x|μ_k, σ_k²) · ρ(x|μ_l, σ_l²)) · cos(θ_k − θ_l),

with the phase of each interference wave anchored at the smaller mean value μ_min of the corresponding pair. Figure 5 shows an example of a quantum-like Gaussian mixture of four Gaussians with small probability values between neighboring Gaussians.

Fig. 2 a Quantum-like Gaussian mixture with the interference wave with the frequency f = 10 of the two Gaussians with μ₁ = 3, σ₁ = 3 and μ₂ = −3, σ₂ = 2 with equal mixture coefficients π. b Representation at a higher scale of the two Gaussians with μ₁ = −2, σ₁ = 1 and μ₂ = −1, σ₂ = 1 with equal mixture coefficients π and their Gaussian mixture.
We will follow the approach as described in Freier (2013). For two multivariate Gaussians,

ρ(x|µ₁, Σ₁) · ρ(x|µ₂, Σ₂) = c · ρ(x|µ, Σ)

with (as before)

Σ = (Σ₁^{−1} + Σ₂^{−1})^{−1},  µ = Σ · (Σ₁^{−1}·µ₁ + Σ₂^{−1}·µ₂),  c = ρ(µ₁|µ₂, Σ₁ + Σ₂).

The values of c₂ and c₁ follow from the square root of c, with the normalization factor (2·π)^{D/4}. The shape of the Gaussian mixture changes in such a way that the two individual Gaussians become separated, with small probability values between the two Gaussians, with a frequency defined by the two mean vectors. Figures 6 and 7 show examples of a quantum-like Gaussian mixture with small probability values between the two-dimensional Gaussians with

µ₁ = (0, −2)^T,  Σ₁ = (2, 0.5; 0.5, 1),  µ₂ = (−2, 0)^T,  Σ₂ = (2, 0; 0, 2)

with equal mixture coefficients π.
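The multivariate product identity can be checked numerically with the example parameters above. A sketch (helper names are ours):

```python
import numpy as np

def mvn(x, mu, cov):
    # Relative probability of a D-dimensional Gaussian.
    d = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    expo = -0.5 * diff @ np.linalg.inv(cov) @ diff
    return np.exp(expo) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def product_parameters(mu1, S1, mu2, S2):
    # rho(x|mu1,S1) * rho(x|mu2,S2) = c * rho(x|mu,S)
    S = np.linalg.inv(np.linalg.inv(S1) + np.linalg.inv(S2))
    mu = S @ (np.linalg.inv(S1) @ mu1 + np.linalg.inv(S2) @ mu2)
    c = mvn(mu1, mu2, S1 + S2)  # constant: a Gaussian in the two means
    return mu, S, c

mu1, S1 = np.array([0.0, -2.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
mu2, S2 = np.array([-2.0, 0.0]), np.array([[2.0, 0.0], [0.0, 2.0]])
mu, S, c = product_parameters(mu1, S1, mu2, S2)

x = np.array([0.1, -0.5])
lhs = mvn(x, mu1, S1) * mvn(x, mu2, S2)
rhs = c * mvn(x, mu, S)
print(np.isclose(lhs, rhs))  # → True
```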

Generalizations for several multivariate Gaussians
A quantum-like Gaussian mixture model in D dimensions is defined as before, with the phase taken along the normed direction vector between the two mean vectors,

n = (µ₁ − µ₂)/|µ₁ − µ₂|,  θ = ((x − µ_min)^T · n) · 2·π / |µ₁ − µ₂|.

Switching Kalman filter
The Kalman filter is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables, see Russell and Norvig (2003). It estimates a joint probability distribution over the variables for each time frame. The Kalman filter makes a linear prediction; it assumes that the variables as well as the errors are random and Gaussian. Suppose a bird is heading at high speed for a tree trunk; the mean of the Gaussian predicted by the Kalman filter will be centered on the obstacle, see Fig. 10. A bird would predict evasive action to one side or the other. A switching Kalman filter marginalizes over the action of the bird; in our case, two different movements are described by the Gaussians. The marginalization is performed by the law of total probability, resulting in the Gaussian mixture of the two Gaussians. The Gaussian mixture has high probability values at the center of the obstacle. The quantum-like Gaussian mixture with the interference wave separates the two Gaussians, and the resulting probability values at the center of the obstacle are low or zero, see Figs. 10 and 11.

Gaussian mixture model
A Gaussian mixture model (GMM) is a probabilistic clustering model that assumes that all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters, see Bishop (2006). It incorporates information about the covariance structure of the data as well as the centers of the latent Gaussians. We will use the Gaussian mixture model for classification of the iris data set by comparing the obtained clusters with the actual classes from the data set. The data set contains 3 classes (Iris-Setosa, Iris-Versicolor, and Iris-Virginica) of 50 instances each, where each class refers to a type of iris plant. Each instance is described by a four-dimensional vector: the sepal and petal length and width, see Géron (2017). The use of this data set in cluster analysis is not common, since the data set only contains two clusters with rather obvious separation. One class is linearly separable from the other two; the latter are not linearly separable from each other. As a result, two clusters are present: one of the clusters contains Iris-Setosa, while the other cluster contains both Iris-Versicolor and Iris-Virginica and is not separable. We use the whole data set for learning and testing, since our task is to demonstrate the advantage of the quantum-like mixture model for an unsupervised classifier. Between different Gaussians, interference is present that changes the shape of the quantum-like mixture distribution. The difference between the relative probabilities determined by the Gaussian mixture and those defined by the quantum-like Gaussian mixture indicates the overlapping error that results from the striking distance to a decision boundary. For a Gaussian classifier, this area corresponds to the points that are close to the decision boundaries. Because of the resulting uncertainty, these points should not be assigned to any of the present classes.
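The GMM baseline can be sketched with Scikit-learn; the majority-vote relabelling step below is our assumption about how clusters are compared with classes, and the exact accuracy depends on the initialization (with a suitable seed it lands close to the reported 96.67%):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, y = load_iris(return_X_y=True)

# Fit a 3-component GMM with full covariance matrices on all four features.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)

# Map each cluster to the majority true class among its members
# (a hypothetical relabelling step for evaluating clusters as classes).
mapping = {k: np.bincount(y[labels == k]).argmax() for k in np.unique(labels)}
pred = np.array([mapping[k] for k in labels])

accuracy = (pred == y).mean()
print(f"accuracy: {accuracy:.4f}")
```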

Iris data set
In the first experiment we only take the first two features, sepal length and width, and plot the results in a two-dimensional coordinate system. We use the Scikit-learn implementation, see Géron (2017), to estimate a Gaussian mixture model with full covariance matrices. We use the model for classification of the iris data set by comparing the obtained clusters with the actual classes from the data set, see Fig. 12. In Figs. 13 and 14 we indicate the resulting Gaussian mixture of the three Gaussians. One class is linearly separable from the other two; the latter are not linearly separable from each other. As a result, two clusters are present with rather obvious separation. One of the clusters contains Iris-Setosa, while the other cluster contains both Iris-Versicolor and Iris-Virginica, as indicated by the Gaussian mixture. The quantum-like Gaussian mixture separates the two clusters Iris-Versicolor and Iris-Virginica to some extent.
We use the model for classification of the iris data set by comparing the obtained clusters with the actual classes from the data set, with an accuracy of 52.67%. The difference between the relative probability determined by the Gaussian mixture ρ(x) and the relative probability ρ_q(x) defined by the quantum-like Gaussian mixture indicates the overlapping error that results from the striking distance to a decision boundary, indicating that x is too close to the decision boundary. Because of the resulting uncertainty, such a point is not assigned to any of the present classes represented by the Gaussian classifier. Through this constraint, 42 points were not assigned to any of the present classes and the accuracy increased to 59.26%. With the constraint that the relative probability determined by the Gaussian mixture ρ(x) should be above a certain threshold, an optimum was reached for the threshold ρ(x) > 0.04; 8 points were not assigned to any of the present classes, and the accuracy was only slightly improved to 54.93%.

Fig. 10 a A bird is heading at a high speed for a tree trunk, and the mean of the Gaussian as predicted by the Kalman filter will be centered on the obstacle. b A switching Kalman filter marginalizes the action of the bird; in our case two different movements are described by Gaussians. However, the resulting Gaussian mixture of the two Gaussians has high probability values at the center of the obstacle. c The quantum-like Gaussian mixture with the interference wave separates the two Gaussians; the resulting probability value at the center of the obstacle is very low or zero.
In the following experiment, we use all four features, the sepal and petal length and width. We use the Scikit-learn implementation to estimate a Gaussian mixture model with full covariance matrices.

Fig. 11 a Two Gaussians with μ₁ = −2, σ₁ = 1 and μ₂ = −1, σ₂ = 1 representing the two different movements of the bird. b Gaussian mixture with equal mixture coefficients π representing the switching Kalman filter; the resulting Gaussian mixture has high probability values at the center corresponding to the obstacle. c Quantum-like Gaussian mixture with the interference wave with θ = (x − μ_min) · 2·π/|μ₁ − μ₂| of the two Gaussians with equal mixture coefficients π; the resulting probability value at the center of the obstacle is low.

Fig. 12 Classification of the first two features, sepal length and width, of the iris data set by the Gaussian mixture model.

Fig. 13 a The three Gaussians. b Gaussian mixture of the three Gaussians. c Quantum-like Gaussian mixture with the interference wave; the quantum-like Gaussian mixture separates the two clusters Iris-Versicolor and Iris-Virginica to some extent.

Fig. 14 Contour plot of the quantum-like Gaussian mixture of the three Gaussians; the quantum-like Gaussian mixture separates the two clusters Iris-Versicolor and Iris-Virginica to some extent.

We use the model for classification of the iris data set by comparing the obtained clusters with the actual classes from the data set, with an accuracy of 96.67%. The difference between the relative probability determined by the Gaussian mixture ρ(x) and the relative probability ρ_q(x) defined by the quantum-like Gaussian mixture indicates the overlapping error that results from the striking distance to a decision boundary, with

ρ(x) − ρ_q(x) ≥ 0.002
indicating that x is too close to the decision boundary. Because of the resulting uncertainty, such a point is not assigned to any of the present classes represented by the Gaussian classifier. Through this constraint, 19 points were not assigned to any of the present classes and the accuracy increased to 99.24%. With the constraint that the relative probability determined by the Gaussian mixture ρ(x) should be above a certain threshold, an optimum was reached for the threshold ρ(x) > 0.079; 26 points were not assigned to any of the present classes, and the accuracy was only slightly improved to 96.77%. The quantum-like Gaussian mixture can improve the classification accuracy by indicating that uncertain points should not be assigned to any class.
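The rejection rule can be sketched in one dimension with the wave-based quantum-like mixture (helper names and the example parameters μ₁ = 3, μ₂ = −3, σ = 1 are ours; the threshold 0.002 is the one from the four-feature experiment):

```python
import math

def gaussian(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def rho(x, mu1, s1, mu2, s2):
    # Classical mixture with equal mixture coefficients
    return 0.5 * gaussian(x, mu1, s1) + 0.5 * gaussian(x, mu2, s2)

def rho_q(x, mu1, s1, mu2, s2):
    # Quantum-like mixture with the interference wave
    theta = (x - min(mu1, mu2)) * 2 * math.pi / abs(mu1 - mu2)
    cross = math.sqrt(gaussian(x, mu1, s1) * gaussian(x, mu2, s2))
    return rho(x, mu1, s1, mu2, s2) + cross * math.cos(theta)

def reject(x, mu1, s1, mu2, s2, threshold=0.002):
    # A point stays unassigned when rho(x) - rho_q(x) >= threshold,
    # i.e. when it lies too close to the decision boundary.
    return rho(x, mu1, s1, mu2, s2) - rho_q(x, mu1, s1, mu2, s2) >= threshold

print(reject(0.0, 3.0, 1.0, -3.0, 1.0))  # midpoint, near the boundary → True
print(reject(3.0, 3.0, 1.0, -3.0, 1.0))  # at a mean, far from the boundary → False
```

Only points near the decision boundary, where the destructive interference removes probability mass, exceed the threshold and remain unclassified.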

Conclusion
A new concept of a quantum-like mixture model was introduced. It describes the mixture distribution under the assumption that a point is generated by each Gaussian at the same time. It was shown that the decision boundary of a quantum-like Gaussian mixture corresponds to the separation of probabilities for the switching Kalman filter. The quantum-like Gaussian mixture improves the classification accuracy in machine learning by indicating that uncertain points should not be assigned to any class. The difference between the relative probabilities determined by the Gaussian mixture and those defined by the quantum-like Gaussian mixture indicates the overlapping error that results from the striking distance to a decision boundary. For a Gaussian classifier this area corresponds to the points that are close to the decision boundaries. Because of the resulting uncertainty, these points should not be assigned to any of the present classes. We demonstrated the principle with the quantum-like Gaussian mixture model for classification of the iris data set. The use of this data set in cluster analysis is not common, since the data set only contains two clusters with rather obvious separation. The accuracy of the Gaussian mixture model on the iris data set increased from 96.67% to 99.24%.