Weighted Means of Likelihood Ratio as Indices Measuring Similarity Between Densities


Criteria for similarity between probability density functions are important in statistical problems such as density estimation. In this short paper, a set of indices measuring similarity between probability densities is proposed using the weighted means of the likelihood ratio function. Numerical simulations demonstrate that estimates of these indices are easily obtained from observations and could be useful for both parametric and nonparametric density estimation with numerical optimization.


Introduction
Criteria for similarity between probability density functions are important in statistical problems such as density estimation and goodness-of-fit testing. Today, the supremum norm, the integrated absolute error (IAE, in other words the $L^1$-norm), the integrated squared error (ISE, equivalent to the squared $L^2$-norm), the likelihood ratio function, the Kullback-Leibler distance, the Hellinger distance, and other tools are ready for practical applications (Scott, 2015). Although these tools are useful, there is room for further criteria, since the choice of an optimality criterion is somewhat subjective. In this paper, a set of indices measuring similarity between probability densities is proposed, and numerical examples demonstrate that estimates of these indices are easily obtained from observations and are useful for both parametric and nonparametric density estimation with numerical optimization.

Methods
We can deduce from the high utility of the likelihood ratio that the set of weighted means $I_1(g; f), \dots, I_n(g; f)$ of the form (1) can be used to evaluate the similarity between two density functions $f$ and $g$, where $n$ is a positive integer, $F$ is the cumulative distribution function corresponding to the density $f$, and $b_k$ is the density function of the beta distribution with shape parameters $k$ and $n-k+1$. Indeed, $I_k(g; f) = 0$ for all $k = 1, \dots, n$ when $g = f$ almost everywhere. Moreover, if we assume that the approximation with the small error term $\varepsilon(y)$ is available with real coefficients $\alpha_k$ by virtue of the sharply localized shape of $b_k$ for large $n$, the similarity between $f$ and $g$ can be inferred from the values of $I_k(g; f)$. That is to say, if there exist small positive real numbers $\varepsilon_0$ and $\delta_0$ such that $\sup_y |\varepsilon(y)| \le \varepsilon_0$ and $|I_k(g; f)| \le \delta_0$ for all $k$, then $g$ is similar to $f$, since

A little calculation shows that $I_k(g; f)$ is also given by

where $f_k$ is the density function of the order statistic $X_{(k)}$ of a random sample of size $n$ from the population with the density $f$, and $B$ is the beta function. An important property, $\sum_{k=1}^{n} I_k(g; f) = 0$, is derived from equation (3) and $\sum_{k=1}^{n} f_k(x) = n f(x)$, and the relation

It is convenient to calculate the estimate of $I_k(g; f)$ from $n$ observations $x_1, x_2, \dots, x_n$ by equation (2) in the following two ways:

(a) The empirical density function $\hat{g}(x) = \frac{1}{n} \chi_{\{x_1, x_2, \dots, x_n\}}(x)$ is employed, where $\chi$ is an indicator function, and the estimate $I_k(\hat{g}; f)$ for given $f$ is obtained by

(b) is employed, and the estimate $I_k(g; \hat{f})$ for given $g$ is obtained by

Numerical examples in the next section present applications of these estimates to probability density estimation.
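As a rough illustration of way (a), the following sketch estimates the indices from a sample. It assumes that the indices take the form $I_k(g; f) = \int g(x)\, b_k(F(x))\, dx - 1$, so that the empirical estimate is $\frac{1}{n}\sum_i b_k(F(x_i)) - 1$. This form is an assumption of the sketch, chosen only because it is consistent with the properties stated above ($I_k = 0$ when $g = f$, and $\sum_k I_k = 0$); the helper names `beta_pdf`, `index_estimates`, and `normal_cdf` are likewise hypothetical.

```python
import math
import random

def beta_pdf(y, a, b):
    """Density of the Beta(a, b) distribution at y in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def index_estimates(sample, cdf):
    """ASSUMED form: I_k(g_hat; f) = (1/n) * sum_i b_k(F(x_i)) - 1, k = 1..n."""
    n = len(sample)
    eps = 1e-12  # keep F(x_i) strictly inside (0, 1) so the logs are finite
    ys = [min(max(cdf(x), eps), 1 - eps) for x in sample]
    return [sum(beta_pdf(y, k, n - k + 1) for y in ys) / n - 1.0
            for k in range(1, n + 1)]

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of the normal distribution (candidate F)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(51)]
indices = index_estimates(sample, normal_cdf)
print(len(indices), abs(sum(indices)))  # the sum should vanish up to rounding
```

Under this assumed form the zero-sum property holds exactly for any sample, because $\sum_{k=1}^{n} b_k(y) = n$ for every $y$ in $(0, 1)$; the printed sum therefore only reflects floating-point rounding.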

Numerical Examples
In this section, numerical simulations of parametric and nonparametric density estimation are presented as applications of the proposed indices $I_k(g; f)$. Each density estimation was performed with a random sample of size $n = 51$ and repeated 10000 times in each setting. All numerical experiments were implemented in Mathematica 12.3.1 (Wolfram Research Inc., Champaign, IL). The Mathematica functions FindMinimum, NIntegrate, and SmoothKernelDistribution were used with their default settings for searching for local minima, calculating the $L^1$- and $L^2$-norms, and obtaining estimates by the conventional nonparametric density estimator, respectively.

Parametric Density Estimation
All observations were generated from the Cauchy distribution with location parameter $\mu_0 = 2$ and scale parameter $\sigma_0 = 2$. The values of $I_k(\hat{g}; f)$ were estimated by equation (4), where $F$ was set to be the distribution function of the Cauchy distribution with unknown location parameter $\mu$ and scale parameter $\sigma$. Each density estimation was performed by solving the unconstrained optimization problem $\min_{\mu, \sigma} \sum_{k=1}^{n} I_k(\hat{g}; f)^2$ to determine the unknown parameters. Figure 1 shows the histograms and the density histogram of the estimates $\hat{\mu}$ and $\hat{\sigma}$, and the descriptive statistics of these estimates are given in Table 1. The estimates appear to be approximately unbiased and the distribution of $\hat{\mu}$ may be close to normal, but the distribution of $\hat{\sigma}$ is skewed.
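The optimization above can be sketched in a few lines. This sketch again assumes the form $I_k(\hat{g}; f) = \frac{1}{n}\sum_i b_k(F(x_i)) - 1$ for the estimate, which is an assumption and not necessarily equation (4), and it replaces FindMinimum with a coarse grid search over $(\mu, \sigma)$ purely to stay dependency-free. The grid bounds and step are arbitrary choices, and `objective` and `cauchy_cdf` are hypothetical helper names.

```python
import math
import random

N = 51
# log(1 / B(k, N-k+1)) for k = 1..N, precomputed once
LOG_NORM = [math.lgamma(N + 1) - math.lgamma(k) - math.lgamma(N - k + 1)
            for k in range(1, N + 1)]

def cauchy_cdf(x, mu, sigma):
    """Distribution function of the Cauchy distribution."""
    return 0.5 + math.atan((x - mu) / sigma) / math.pi

def objective(sample, mu, sigma):
    """Sum over k of I_k^2 under the ASSUMED form I_k = mean_i b_k(F(x_i)) - 1."""
    eps = 1e-12  # keep F(x_i) strictly inside (0, 1)
    logs = []
    for x in sample:
        y = min(max(cauchy_cdf(x, mu, sigma), eps), 1.0 - eps)
        logs.append((math.log(y), math.log(1.0 - y)))
    total = 0.0
    for k in range(1, N + 1):
        mean_bk = sum(math.exp(LOG_NORM[k - 1] + (k - 1) * ly + (N - k) * l1y)
                      for ly, l1y in logs) / N
        total += (mean_bk - 1.0) ** 2
    return total

# One simulated sample from Cauchy(mu0 = 2, sigma0 = 2) by inverse transform
rng = random.Random(1)
sample = [2.0 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(N)]

# Coarse grid search, swapped in for a local optimizer such as FindMinimum
mus = [0.25 * i for i in range(17)]           # 0.0, 0.25, ..., 4.0
sigmas = [0.5 + 0.25 * j for j in range(15)]  # 0.5, 0.75, ..., 4.0
best_value, mu_hat, sigma_hat = min(
    (objective(sample, m, s), m, s) for m in mus for s in sigmas)
print(mu_hat, sigma_hat, best_value)
```

A grid search only locates the minimizer to within the grid step; a local optimizer started from the grid minimizer would be the natural refinement.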

Nonparametric Density Estimation
Observations were generated from the mixture distribution whose density was $f(x) = 0.3\,\varphi_{2,1}(x) + 0.4\,\varphi_{6,0.8}(x) + 0.3\,\varphi_{10,2}(x)$, where $\varphi_{\mu,\sigma}(x)$ is the density function of the normal distribution with mean $\mu$ and standard deviation $\sigma$. In this numerical example, the idea of kernel density estimation was employed, and the Gaussian kernels $g_l$, $l = 1, \dots, 2n/3 + 1$, whose location and scale parameters were calculated from the sets of ordered observations $x_{(l)}, \dots, x_{(l+n/3-1)}$, were applied. The value of $I_k(g_l; \hat{f})$ for each density $g_l$ was estimated by equation (5), with Stirling's formula used for calculating the factorials in the beta function. Each density estimation was performed by solving the constrained optimization problem $\min_{\lambda_1, \lambda_2, \dots, \lambda_{2n/3+1} \ge 0}$ to determine the optimal mixture weights $\lambda_l$ for $g_l$, and the estimated density $\hat{g}$ was given by $\hat{g} = \sum_{l=1}^{2n/3+1} \lambda_l g_l$. Figure 2 shows the histograms of $\|\hat{g} - f\|_{L^1}$ and $\|\hat{g} - f\|_{L^2}$, and their descriptive statistics are presented in Table 2; the integrals over $\mathbb{R}$ required to obtain these norms were approximated by numerical integrals over the interval $[-5, 20]$. The $L^1$-norm best-fit and worst-fit estimates are depicted in Figure 3. One should notice that density estimation with the proposed method is more accurate than that with the conventional one but less precise, and that the values of the $L^1$- and $L^2$-norms are not normally distributed.
In addition, it must be mentioned that the proposed estimator would not always be superior to the conventional one in accuracy, since the problem of bandwidth selection heavily affects kernel density estimates.
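The construction of the kernels and the error norms in this example can be sketched as follows. The sketch assumes that the location and scale of each kernel $g_l$ are the sample mean and sample standard deviation of its window of ordered observations, and it uses equal mixture weights as a placeholder instead of the optimized weights $\lambda_l$, so its output does not reproduce the reported results. The trapezoidal rule stands in for NIntegrate, and all function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def true_density(x):
    """Mixture density 0.3*phi_{2,1} + 0.4*phi_{6,0.8} + 0.3*phi_{10,2}."""
    return (0.3 * normal_pdf(x, 2.0, 1.0) + 0.4 * normal_pdf(x, 6.0, 0.8)
            + 0.3 * normal_pdf(x, 10.0, 2.0))

def draw_mixture(rng):
    u = rng.random()
    if u < 0.3:
        return rng.gauss(2.0, 1.0)
    if u < 0.7:
        return rng.gauss(6.0, 0.8)
    return rng.gauss(10.0, 2.0)

def window_kernels(sample):
    """(mean, sd) of each window x_(l), ..., x_(l+n/3-1) of ordered observations.
    ASSUMPTION: location = window mean, scale = window standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    w = n // 3
    kernels = []
    for l in range(2 * n // 3 + 1):
        window = xs[l:l + w]
        m = sum(window) / w
        sd = math.sqrt(sum((x - m) ** 2 for x in window) / (w - 1))
        kernels.append((m, sd))
    return kernels

def norms(g, f, a=-5.0, b=20.0, steps=5000):
    """Trapezoidal approximations of the L1 and L2 norms of g - f on [a, b]."""
    h = (b - a) / steps
    l1 = l2 = 0.0
    for i in range(steps + 1):
        d = g(a + i * h) - f(a + i * h)
        wgt = 0.5 if i in (0, steps) else 1.0
        l1 += wgt * abs(d) * h
        l2 += wgt * d * d * h
    return l1, math.sqrt(l2)

rng = random.Random(2)
sample = [draw_mixture(rng) for _ in range(51)]
kernels = window_kernels(sample)
weights = [1.0 / len(kernels)] * len(kernels)  # placeholder, NOT optimized weights

def g_hat(x):
    return sum(wt * normal_pdf(x, m, sd) for wt, (m, sd) in zip(weights, kernels))

l1, l2 = norms(g_hat, true_density)
print(len(kernels), l1, l2)
```

With $n = 51$ each window holds 17 ordered observations and there are 35 kernels, matching $2n/3 + 1$. Replacing the equal weights with the solution of the constrained problem is the step that the sketch leaves out.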

Conclusion
A set of indices measuring similarity between probability densities has been proposed in this paper, and its applications to parametric and nonparametric density estimation have been presented. Numerical examples have shown that density estimation with the proposed indices could have higher accuracy and lower precision than that with the conventional estimator. Although the proposed indices are simply explained as weighted means of the likelihood ratio function, a more detailed theoretical analysis must be carried out for further applications. In addition, similar indices for multivariate distributions should be explored, since there is increasing interest in multivariate density estimation, and the definition of the proposed set of indices is easily extended to the multivariate case with the density function of the multivariate beta distribution.

Declarations
• Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
• Competing interests: None.

Fig. 2 Histograms of the $L^1$- and $L^2$-norms in nonparametric density estimation by the proposed method for the mixture distribution with density $0.3\,\varphi_{2,1}(x) + 0.4\,\varphi_{6,0.8}(x) + 0.3\,\varphi_{10,2}(x)$.

Fig. 3 The $L^1$-norm best-fit (left panel) and worst-fit (right panel) estimated densities by the proposed method in nonparametric inference for the mixture distribution with density $0.3\,\varphi_{2,1}(x) + 0.4\,\varphi_{6,0.8}(x) + 0.3\,\varphi_{10,2}(x)$. The solid curve is the true density. The dashed and dotted curves represent the densities estimated by the proposed and conventional methods, respectively.