Expectation identity of the hypergeometric distribution and its application in the calculations of high-order origin moments

Abstract We provide a novel method, the expectation identity method, to analytically calculate the high-order origin moments of a hypergeometric distribution. First, the expectation identity of the hypergeometric distribution is discovered and summarized in a theorem. After that, we analytically calculate the first four origin moments of the hypergeometric distribution by using the expectation identity. Furthermore, we analytically calculate the general kth (k = 1, 2, ...) origin moment of the hypergeometric distribution by using the expectation identity, and the result is summarized in a theorem. Moreover, we use the general kth origin moment to validate the first four origin moments. Next, the coefficients of the first ten origin moments of the hypergeometric distribution are summarized in a table and are identified as Stirling numbers of the second kind, and the theorem for the general kth origin moment is restated in terms of Stirling numbers of the second kind. Finally, we provide some numerical and theoretical results.


Introduction
The hypergeometric distribution has attracted continuous interest in the literature since 2000. Childs and Balakrishnan (2000) examined some approximations to the multivariate hypergeometric distribution with applications to hypothesis testing. Kumar (2002) introduced extended generalized hypergeometric probability distributions. Hida and Akahira (2003) considered an improvement on the approximation to the generalized hypergeometric distribution. Dinwoodie, Matusevich, and Mosteig (2004) presented two new transform methods for computing with hypergeometric distributions on lattice points. Hush and Scovel (2005) provided an improved concentration of measure theorem for the hypergeometric distribution. Kumar (2007) studied some properties of bivariate generalized hypergeometric probability distributions. Lahiri, Chatterjee, and Maiti (2007) investigated normal approximation to the hypergeometric distribution in nonstandard cases and established a sub-Gaussian Berry-Esseen theorem. Fog (2008) developed sampling methods for Wallenius' and Fisher's noncentral hypergeometric distributions. Li and Tian (2010) provided a simple algorithm for the high-order origin moments of the hypergeometric distribution. Eisinga and Pelzer (2011) proposed saddlepoint approximations to the mean and variance of the extended hypergeometric distribution. Lebrun (2013) proposed an efficient time/space algorithm to compute rectangular probabilities of multinomial, multivariate hypergeometric and multivariate Pólya distributions. Liu and Wang (2014) obtained the high-order origin moments, the high-order central moments, and the high-order cumulants of the hypergeometric distribution. de Klerk, Laurent, and Sun (2015) considered an error analysis for polynomial optimization over the simplex based on the multivariate hypergeometric distribution. Danielian, Chitchyan, and Farbod (2016) proposed a new regularly varying generalized hypergeometric distribution of the second type. 
Greene and Wellner (2017) established exponential bounds for the hypergeometric distribution. Mano (2017) discussed partition structure and the A-hypergeometric distribution associated with the rational normal curve. Takayama, Kuriki, and Takemura (2018) considered A-hypergeometric distributions and Newton polytopes. Hafid, Hafid, and Samih (2020) proposed a novel methodology-based joint hypergeometric distribution to analyze the security of sharded blockchains. Krishnamoorthy and Lv (2020) constructed prediction intervals for hypergeometric distributions. Themangani, Porwal, and Magesh (2020) introduced a generalized hypergeometric distribution and discussed its applications on univalent functions. For earlier literature on the hypergeometric distribution, we refer readers to Riordan (1937); Kemp and Kemp (1956); Lieberman and Owen (1962); Harkness (1965); Brunk, Holstein, and Williams (1968); Ord (1968); Skibinsky (1970); Molenaar (1973); Tripathi and Gurland (1977); Guenther (1978); Chvatal (1979); Lai (1979); Sibuya and Shimizu (1981); Panaretos (1983); Ling and Pratt (1984); Kachitvichyanukul and Schmeiser (1988); Kochman, Murray, and West (1989); Tohma et al. (1991); Wu (1993); Lin, Hsiung, and Hsiao (1994); Ma (1999).
Exponential families include the continuous families (normal, gamma, and beta) and the discrete families (binomial, Poisson, and negative binomial). For the continuous exponential families, there are some expectation identities that rely on integration by parts. Stein's Lemma (Lemma 3.6.5 in Casella and Berger (2002)) gives an expectation identity for the normal family. Moreover, the expectation identities of the gamma and beta families are given in Exercise 3.49 of Casella and Berger (2002). It is worth noting that these expectation identities are useful in the calculations of high-order origin moments of the corresponding families. The discrete analogs of the expectation identities of the continuous exponential families are given in Theorem 3.6.8 of Casella and Berger (2002), which gives the expectation identities for the Poisson and negative binomial families (see also Hwang (1982)). Note that the Poisson and negative binomial families take countably infinite values. For the binomial family taking finite values, similar to the derivations of the Poisson and negative binomial expectation identities, Zhang, Rong, and Li (2019) discovered an expectation identity for the binomial family and obtained a closed-form formula for the high-order origin moments of the binomial family by using the binomial expectation identity.
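Such discrete expectation identities are easy to check numerically. The sketch below (Python, our illustration rather than the authors' code) verifies one standard form of a binomial expectation identity, $E[X g(X-1)] = np\, E[g(Y)]$ for $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(n-1, p)$, which follows from the combinatorial identity $x\binom{n}{x} = n\binom{n-1}{x-1}$; the exact statement in Zhang, Rong, and Li (2019) may differ in form.

```python
from math import comb

def binom_pmf(n, p, x):
    # pmf of Bin(n, p) at x
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def lhs(n, p, g):
    # E[X g(X-1)] for X ~ Bin(n, p)
    return sum(x * g(x - 1) * binom_pmf(n, p, x) for x in range(n + 1))

def rhs(n, p, g):
    # n p E[g(Y)] for Y ~ Bin(n-1, p)
    return n * p * sum(g(y) * binom_pmf(n - 1, p, y) for y in range(n))

n, p = 12, 0.3
g = lambda x: (1 + x) ** 2   # with this g, the left side equals E[X^3]
print(lhs(n, p, g), rhs(n, p, g))   # the two sides agree to machine precision
```

The same pattern of check applies to the hypergeometric identity developed below.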
The analytical calculation of the high-order origin moments of the hypergeometric distribution is quite challenging. In this paper, we provide a novel method, the expectation identity method, to analytically calculate the high-order origin moments of the hypergeometric distribution. First, the expectation identity of the hypergeometric distribution is discovered and summarized in a theorem. After that, we analytically calculate the first four origin moments and the general kth (k = 1, 2, ...) origin moment of the hypergeometric distribution by the expectation identity method. Moreover, we use the general kth origin moment to validate the first four origin moments of the hypergeometric distribution. Finally, the coefficients of the first 10 origin moments of the hypergeometric distribution are shown in a table. By comparing with tables in the literature, we find that the coefficients are Stirling numbers of the second kind.
The rest of the paper is organized as follows. In Section 2, we discover an expectation identity of the hypergeometric distribution. In Section 3, we use the expectation identity to calculate the high-order origin moments of the distribution: first the first four origin moments, then the general kth (k = 1, 2, ...) origin moment; we then use the general kth origin moment to validate the first four origin moments, and we summarize the coefficients of the first 10 origin moments in a table. In Section 4, we provide some numerical and theoretical results. Some conclusions and discussions are provided in Section 5.

Expectation identity of the hypergeometric distribution
In this section, we discover an expectation identity of the hypergeometric distribution. The expectation identity is useful in analytical calculations of the high-order origin moments of the hypergeometric distribution.
Let $Y \sim HG(N, M, n)$ have a hypergeometric distribution with probability mass function (pmf) given by (see Casella and Berger (2002))

$$P(Y = y) = \binom{M}{y}\binom{N-M}{n-y}\Big/\binom{N}{n}, \qquad y = 0, 1, \ldots, n,$$

where N is the total number of identical balls in a large urn, M is the number of red balls in the urn, N - M is the number of green balls in the urn, n is the total number of selected balls, y is the number of selected balls that are red, and n - y is the number of selected balls that are green. Similarly, let $X \sim HG(N-1, M-1, n-1)$ have a hypergeometric distribution with pmf given by

$$P(X = x) = \binom{M-1}{x}\binom{N-M}{n-1-x}\Big/\binom{N-1}{n-1}, \qquad x = 0, 1, \ldots, n-1.$$

By definition,

$$EY^k = \sum_{y=0}^{n} y^k P(Y = y), \qquad k = 0, 1, 2, \ldots. \qquad (1)$$

The expectation identity of the hypergeometric distribution is discovered and summarized in the following theorem, whose proof is simple, as is the case with many important results.
Theorem 1. Let $X \sim HG(N-1, M-1, n-1)$ and $Y \sim HG(N, M, n)$. Moreover, let $g(x)$ be a function satisfying $-\infty < E[g(X)] < \infty$ and $-\infty < g(-1) < \infty$. Then we have the following expectation identity of the hypergeometric distribution:

$$E[Y g(Y-1)] = \frac{Mn}{N} E[g(X)]. \qquad (2)$$

Proof. By the definition of expectation, we have

$$E[Y g(Y-1)] = \sum_{y=0}^{n} y\, g(y-1) \binom{M}{y}\binom{N-M}{n-y}\Big/\binom{N}{n}.$$

According to the identities of combinatorial numbers

$$y\binom{M}{y} = M\binom{M-1}{y-1} \quad \text{and} \quad \binom{N}{n} = \frac{N}{n}\binom{N-1}{n-1},$$

we obtain, after substituting $x = y - 1$,

$$E[Y g(Y-1)] = \frac{Mn}{N} \sum_{x=0}^{n-1} g(x) \binom{M-1}{x}\binom{N-M}{n-1-x}\Big/\binom{N-1}{n-1} = \frac{Mn}{N} E[g(X)].$$

The proof of the theorem is complete. □

Using the expectation identity of the hypergeometric distribution to calculate its high-order origin moments

In this section, we use the expectation identity of the hypergeometric distribution to calculate its high-order origin moments: first the first four origin moments, then the general kth (k = 1, 2, ...) origin moment. We then use the general kth origin moment to validate the first four origin moments, and we summarize the coefficients of the first 10 origin moments in a table.
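The expectation identity of Theorem 1 can be verified numerically. The sketch below (Python, our illustration; the parameter values are arbitrary) compares both sides of the identity $E[Y g(Y-1)] = (Mn/N) E[g(X)]$ for the choices $g(x) = (1+x)^k$ used later in the paper:

```python
from math import comb

def hg_pmf(N, M, n, y):
    # pmf of HG(N, M, n) at y
    return comb(M, y) * comb(N - M, n - y) / comb(N, n)

def lhs(N, M, n, g):
    # E[Y g(Y-1)] for Y ~ HG(N, M, n)
    return sum(y * g(y - 1) * hg_pmf(N, M, n, y) for y in range(n + 1))

def rhs(N, M, n, g):
    # (Mn/N) E[g(X)] for X ~ HG(N-1, M-1, n-1)
    return M * n / N * sum(g(x) * hg_pmf(N - 1, M - 1, n - 1, x)
                           for x in range(n))

N, M, n = 20, 8, 5
for k in range(5):
    g = lambda x, k=k: (1 + x) ** k   # makes the left side E[Y^{k+1}]
    print(k, lhs(N, M, n, g), rhs(N, M, n, g))
```

For each k the two columns printed agree up to floating-point rounding, which is exactly the mechanism exploited in the moment calculations below.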
The analytical calculations of the first four origin moments of the hypergeometric distribution by using the expectation identity

In this subsection, we analytically calculate the first four origin moments of the hypergeometric distribution $Y \sim HG(N, M, n)$ by the expectation identity (2).

First, let us calculate $EY$. Let $g(x) = (1+x)^0 = 1$. By the expectation identity (2), we have $EY = E[Y g(Y-1)] = (Mn/N) E[1]$. Therefore,

$$EY = \frac{Mn}{N}. \qquad (3)$$

Replacing N, M, and n by N - 1, M - 1, and n - 1 in the formula of $EY$ (3) gives

$$EX = \frac{(M-1)(n-1)}{N-1}. \qquad (4)$$

Second, let us calculate $EY^2$. Let $g(x) = (1+x)^1$. By the expectation identity (2), we have $EY^2 = E[Y(1 + (Y-1))] = (Mn/N) E[1 + X]$. Hence, by (4), we obtain

$$EY^2 = \frac{Mn}{N}\left[1 + \frac{(M-1)(n-1)}{N-1}\right]. \qquad (5)$$

Replacing N, M, and n by N - 1, M - 1, and n - 1 in the formula of $EY^2$ (5) gives

$$EX^2 = \frac{(M-1)(n-1)}{N-1}\left[1 + \frac{(M-2)(n-2)}{N-2}\right]. \qquad (6)$$
Third, let us calculate $EY^3$. Let $g(x) = (1+x)^2$. By the expectation identity (2), we have $EY^3 = (Mn/N) E[(1+X)^2] = (Mn/N)(1 + 2EX + EX^2)$. Hence, by (4) and (6), we obtain

$$EY^3 = \frac{Mn}{N}\left[1 + \frac{3(M-1)(n-1)}{N-1} + \frac{(M-1)(M-2)(n-1)(n-2)}{(N-1)(N-2)}\right]. \qquad (7)$$

Replacing N, M, and n by N - 1, M - 1, and n - 1 in the formula of $EY^3$ (7) gives

$$EX^3 = \frac{(M-1)(n-1)}{N-1}\left[1 + \frac{3(M-2)(n-2)}{N-2} + \frac{(M-2)(M-3)(n-2)(n-3)}{(N-2)(N-3)}\right]. \qquad (8)$$

Fourth, let us calculate $EY^4$. Let $g(x) = (1+x)^3$. By the expectation identity (2), we have $EY^4 = (Mn/N) E[(1+X)^3] = (Mn/N)(1 + 3EX + 3EX^2 + EX^3)$. Hence, by (4), (6), and (8), we obtain

$$EY^4 = \frac{Mn}{N}\left[1 + \frac{7(M-1)(n-1)}{N-1} + \frac{6(M-1)(M-2)(n-1)(n-2)}{(N-1)(N-2)} + \frac{(M-1)(M-2)(M-3)(n-1)(n-2)(n-3)}{(N-1)(N-2)(N-3)}\right]. \qquad (9)$$

It is easy to see that our analytical formulas for the first four origin moments of the hypergeometric distribution obtained by the expectation identity (2) are the same as those obtained in Li and Tian (2010), who used the definition of expectation for the calculations.
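As a sanity check, the closed-form expressions for the first four origin moments can be compared against the defining sum (1). A minimal Python sketch (our illustration; the parameter values are arbitrary), writing $r_j = (M-j)(n-j)/(N-j)$ for the repeated ratios:

```python
from math import comb

def hg_pmf(N, M, n, y):
    return comb(M, y) * comb(N - M, n - y) / comb(N, n)

def moment_def(N, M, n, k):
    # E[Y^k] straight from the definition (the sum over the support)
    return sum(y**k * hg_pmf(N, M, n, y) for y in range(n + 1))

def r(N, M, n, j):
    # shorthand for the ratio (M-j)(n-j)/(N-j)
    return (M - j) * (n - j) / (N - j)

N, M, n = 30, 12, 7
r0, r1, r2, r3 = (r(N, M, n, j) for j in range(4))
closed = [
    r0,                                    # EY
    r0 * (1 + r1),                         # EY^2
    r0 * (1 + 3*r1 + r1*r2),               # EY^3
    r0 * (1 + 7*r1 + 6*r1*r2 + r1*r2*r3),  # EY^4
]
for k, value in enumerate(closed, start=1):
    print(k, value, moment_def(N, M, n, k))
```

Both columns match for k = 1, 2, 3, 4, confirming the coefficient patterns 1; 1, 1; 1, 3, 1; 1, 7, 6, 1.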
The analytical calculations of the kth origin moment of the hypergeometric distribution by using the expectation identity

In this subsection, we analytically calculate the kth origin moment of the hypergeometric distribution $Y \sim HG(N, M, n)$ by the expectation identity (2). Define

$$H_k = \prod_{i=0}^{k} \frac{(M-i)(n-i)}{N-i}, \qquad k = 0, 1, 2, 3, \ldots. \qquad (11)$$

In particular, $H_0 = Mn/N$ and $H_1 = Mn(M-1)(n-1)/[N(N-1)]$. From the calculation results in the previous subsection, we find that the first four origin moments of the hypergeometric distribution $Y \sim HG(N, M, n)$ can each be written as a linear combination of $H_0$, $H_1$, $H_2$, and $H_3$. This fact inspires us to claim that the general kth origin moment of $Y \sim HG(N, M, n)$, $EY^k$ (k = 0, 1, 2, ...), can also be written as a linear combination of $H_0, H_1, \ldots, H_{k-1}$ given in (11).
To prove Theorem 2, we need the following lemma, which gives a relationship between the expressions of $EY^k$ and $(Mn/N) EX^k$.

Lemma 1. Let $X \sim HG(N-1, M-1, n-1)$ and $Y \sim HG(N, M, n)$. If

$$EY^k = \sum_{j=1}^{k} a_j H_{j-1} \qquad (12)$$

is a linear combination of $H_0, H_1, \ldots, H_{k-1}$ given in (11), with coefficients $a_j$ not depending on N, M, and n, then $(Mn/N) EX^k$ is a linear combination of $H_1, H_2, \ldots, H_k$.

Proof. If $EY^k$ is the linear combination (12), then replacing N, M, and n by N - 1, M - 1, and n - 1 in (12) yields

$$EX^k = \sum_{j=1}^{k} a_j \prod_{i=0}^{j-1} \frac{(M-1-i)(n-1-i)}{N-1-i} = \sum_{j=1}^{k} a_j \prod_{i=1}^{j} \frac{(M-i)(n-i)}{N-i} = \frac{N}{Mn} \sum_{j=1}^{k} a_j H_j.$$

Hence, $(Mn/N) EX^k = \sum_{j=1}^{k} a_j H_j$ is a linear combination of $H_1, H_2, \ldots, H_k$. The proof of the lemma is complete. □
By Lemma 1 and mathematical induction, we have the following theorem, in which $EY^k$ is expressed as a linear combination of $H_0, H_1, \ldots, H_{k-1}$ given in (11).

Theorem 2. Let $Y \sim HG(N, M, n)$. Then, for k = 1, 2, ...,

$$EY^k = \sum_{j=1}^{k} A(k, j) H_{j-1}, \qquad (13)$$

where the coefficients $A(k, j)$ are positive integers that do not depend on N, M, and n.

Proof. For k = 1, (13) holds with $A(1, 1) = 1$ because $EY = H_0$ by (3). Suppose (13) holds for some k. By the expectation identity (2) with $g(x) = (1+x)^k$,

$$EY^{k+1} = \frac{Mn}{N} E[(1+X)^k] = \frac{Mn}{N} \sum_{m=0}^{k} \binom{k}{m} EX^m.$$

The term with m = 0 contributes $(Mn/N) = H_0$, and by the induction hypothesis and Lemma 1, each term $(Mn/N)\binom{k}{m} EX^m$ with $1 \le m \le k$ is a linear combination of $H_1, \ldots, H_m$ with positive integer coefficients that do not depend on N, M, and n. Hence $EY^{k+1}$ is such a linear combination of $H_0, H_1, \ldots, H_k$.
In conclusion, by mathematical induction, (13) is established, and the proof of the theorem is complete. □

Verifications of the first four origin moments by the kth origin moment
In this subsection, we verify the first four origin moments of the hypergeometric distribution $Y \sim HG(N, M, n)$ against the general kth origin moment (13). For example, for k = 2, (13) gives $EY^2 = A(2,1) H_0 + A(2,2) H_1 = H_0 + H_1$, which agrees with (5); the cases k = 1, 3, 4 are verified in the same way. The verifications reassure us that the general formula for the kth origin moment (13) is correct.

The coefficients table of the kth origin moment of the hypergeometric distribution
The coefficients of the first 10 origin moments of the hypergeometric distribution Y $ HGðN, M, nÞ are shown in Table 1.
Comparing Table 1 with the table of Stirling numbers of the second kind and exponential numbers (p. 310) in Comtet (1974) and with Table 1 in Li and Tian (2010), we find that the coefficients $A(k, j)$ are Stirling numbers of the second kind. Moreover, the $A(k, j)$ satisfy the "vertical" recurrence relations given in Theorem B [3c] (p. 209) of Comtet (1974), and this fact further confirms that the $A(k, j)$ are Stirling numbers of the second kind. Therefore, Theorem 2 can be restated as the following theorem in terms of $S(k, j)$, the Stirling numbers of the second kind.
Theorem 3. Let $Y \sim HG(N, M, n)$. Then, for k = 1, 2, ...,

$$EY^k = \sum_{j=1}^{k} S(k, j) H_{j-1}, \qquad (16)$$

which is a linear combination of $H_0, H_1, \ldots, H_{k-1}$ given in (11), where the coefficients $S(k, j)$ are Stirling numbers of the second kind.
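Theorem 3 yields a direct algorithm for $EY^k$: precompute the Stirling numbers $S(k, j)$ by the triangular recurrence $S(k, j) = j\,S(k-1, j) + S(k-1, j-1)$ and accumulate the products $H_{j-1}$. Below is a Python sketch of this analytical method, checked against the definition (1); the paper's own implementation is in R, so this is only an illustration:

```python
from math import comb

def stirling2(kmax):
    # lower-triangular table S[k][j] of Stirling numbers of the second kind,
    # built from the recurrence S(k, j) = j*S(k-1, j) + S(k-1, j-1)
    S = [[0] * (kmax + 1) for _ in range(kmax + 1)]
    S[0][0] = 1
    for k in range(1, kmax + 1):
        for j in range(1, k + 1):
            S[k][j] = j * S[k - 1][j] + S[k - 1][j - 1]
    return S

def moment_am(N, M, n, k, S):
    # analytical method: EY^k = sum_{j=1}^{k} S(k, j) * H_{j-1}
    total, H = 0.0, 1.0
    for j in range(1, k + 1):
        H *= (M - j + 1) * (n - j + 1) / (N - j + 1)  # H is now H_{j-1}
        total += S[k][j] * H
    return total

def moment_dm(N, M, n, k):
    # definition method, for comparison
    return sum(y**k * comb(M, y) * comb(N - M, n - y) / comb(N, n)
               for y in range(n + 1))

N, M, n, kmax = 1000, 500, 10, 10
S = stirling2(kmax)
for k in range(1, kmax + 1):
    print(k, moment_am(N, M, n, k, S), moment_dm(N, M, n, k))
```

The recursively accumulated product avoids recomputing each $H_{j-1}$ from scratch, which is what makes the method cheap for repeated use.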

Numerical and theoretical results
In this section, we provide an assessment of the computational complexity (see Chapter 5 of Progri (2011)), measured in the number of multiplications, of the Definition Method (DM) (1) and the Analytical Method (AM) (16). Moreover, we investigate the memory sizes of the DM and the AM. Furthermore, we carry out some numerical experiments to compare the computed values of $EY^k$ and the computing times of the two methods. Finally, we compare the absolute errors and the relative errors between the two methods. See the Numerical, Theoretical Results sections of Progri (2021a, 2021b, 2021c) for details. The numerical experiments are carried out on a Lenovo computer with an 11th Gen Intel(R) Core(TM) i5-11300H @ 3.10 GHz and 16.0 GB RAM, using R version 4.1.1.

Now let us assess the computational complexity, in number of multiplications, of the DM and the AM. For the DM (1), $\binom{M}{y}$ can be computed in $O(2y-1)$ multiplications. Similarly, $\binom{N-M}{n-y}$ can be computed in $O(2(n-y)-1)$, and $\binom{N}{n}$ can be computed in $O(2n-1)$. Moreover, $y^k$ can be computed in $O(k-1)$. Hence, each term $y^k P(Y=y)$ of the sum (1) can be computed in $O(4n+k)$ multiplications. Therefore, $EY^k$ by the DM can be computed in $O(n(k+4n))$.

For the AM (16), the $S(k, j)$ can be precomputed and stored in a lower-triangular matrix S of size k × k, which is independent of N, M, and n, and thus the computational cost of the $S(k, j)$ can be regarded as 0. Note that $H_k$ in (11) can be computed in $O(3k+2)$, and thus $H_{j-1}$ can be computed in $O(3j-1)$. Therefore, $EY^k$ by the AM can be computed in $O\big(\sum_{j=1}^{k}(3j-1) + k\big) = O(3k^2/2)$.

Now let us investigate the memory sizes of the DM and the AM. For the DM, we only need to store a scalar, and thus the memory size of the DM is 1 double floating-point number. For the AM, we need to store two vectors of length k and a precomputed lower-triangular matrix S of size k × k. Therefore, the memory size of the AM is $2k + k(k+1)/2 = k(k+5)/2$ double floating-point numbers. In most cases, k is small, and thus the memory size of the AM is also small.
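The memory count for the AM reduces to a one-line calculation: two length-k working vectors plus the $k(k+1)/2$ entries of the lower triangle of S give $k(k+5)/2$ numbers in total. A trivial Python check of this arithmetic:

```python
def am_memory(k):
    # two working vectors of length k plus the k(k+1)/2 entries of the
    # lower triangle of the k-by-k Stirling matrix S
    return 2 * k + k * (k + 1) // 2

for k in (5, 10, 20):
    print(k, am_memory(k))   # equals k*(k+5)/2, e.g. 75 doubles for k = 10
```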
In summary, compared to the DM, the AM would be computationally more efficient, but suffers from a heavier burden of memory size.
The numerical values of $EY^k$ (k = 1, ..., 10) and the computing times (in seconds, 10,000 replications) of the two methods (DM and AM) for N = 1000, M = 500, and n = 10 are summarized in Table 2. From the table, we see that $EY^k$ increases as k increases with the other parameters fixed. Moreover, the two methods produce the same answers, accurate to two decimal places. To obtain reliable computing times, we performed 10,000 replications, that is, we computed each moment 10,000 times, and the total computing times in seconds are reported in the table. Therefore, for each moment in a single replication, the computing time is 1/10,000 of the computing time reported in the table. For example, the computing time for a single evaluation of $EY^1$ by the DM is 1.71/10,000 seconds = 0.171 milliseconds. Hence, the computation of each moment by either method is very fast. Finally, the computation of $EY^k$ by the AM is consistently faster than that by the DM.

Now let us consider the absolute errors and the relative errors between the DM and the AM. The absolute error between the two methods in computing $EY^k$ is $|EY^k_{DM} - EY^k_{AM}|$, where $EY^k_{DM}$ and $EY^k_{AM}$ are the values of $EY^k$ computed by the DM and the AM, respectively. The relative error between the two methods is $|EY^k_{DM} - EY^k_{AM}| / EY^k_{DM}$. The absolute errors and the relative errors between the two methods in computing $EY^k$ for k = 1, ..., 20 with N = 1000, M = 500, and n = 10 are reported in Table 3.

Now let us carry out a sensitivity analysis of the computation of $EY^k$, allowing one of the parameters (k, N, M, and n) to change while holding the others fixed. The baseline parameter values are k = 10, N = 1000, M = 500, and n = 10. Other parameter values could also be specified.
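The two error measures are straightforward to reproduce. The sketch below (Python, our illustration rather than the authors' R code) uses exact rational arithmetic as the reference value and double-precision summation as the approximate value, then reports the absolute and relative errors between the two:

```python
from fractions import Fraction
from math import comb

def moment_exact(N, M, n, k):
    # E[Y^k] in exact rational arithmetic (no rounding error)
    denom = comb(N, n)
    total = Fraction(0)
    for y in range(n + 1):
        total += Fraction(y**k * comb(M, y) * comb(N - M, n - y), denom)
    return total

def moment_float(N, M, n, k):
    # the same sum evaluated in double-precision floating point
    return sum(y**k * comb(M, y) * comb(N - M, n - y) / comb(N, n)
               for y in range(n + 1))

N, M, n, k = 1000, 500, 10, 10
exact = float(moment_exact(N, M, n, k))
approx = moment_float(N, M, n, k)
abs_err = abs(approx - exact)          # |approx - exact|
rel_err = abs_err / exact              # |approx - exact| / exact
print(abs_err, rel_err)
```

As in Table 3, the absolute error can look sizeable when $EY^k$ itself is large, while the relative error stays near machine precision.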
The sensitivity analyses of $EY^k$ by the DM and the AM against the parameters k, N, M, and n are plotted in Figure 1. From the figure, we see that $EY^k$ is an increasing function of k, M, and n, and a decreasing function of N. The two curves in each plot are indistinguishable, which means that the two methods produce almost the same values of $EY^k$. It is worth noting that the values of $EY^k$ for n ≥ 385 computed by the DM are reported as ∞ (see the lower right plot), while the AM produces reasonable values of $EY^k$ for n ≥ 385. Therefore, the AM is more robust to extreme parameter values than the DM.
The sensitivity analyses of the absolute errors and the relative errors against the parameters k, N, M, and n are plotted in Figure 2. The upper left plot is a graphical version of Table 3. From the upper left plot, we see that the absolute errors of $EY^{18}$ and $EY^{20}$ are large; however, their relative errors are acceptably small. Moreover, the absolute errors against N and M are acceptably small, and the relative errors against N and M are much smaller. Furthermore, the absolute errors against n are very large; however, the relative errors against n are acceptably small. All the plots indicate that the relative errors are acceptably small, and in fact, they are almost 0.

Conclusions and discussions
We have provided a novel method, the expectation identity method, to calculate the high-order origin moments of the hypergeometric distribution $Y \sim HG(N, M, n)$. First, the expectation identity of the hypergeometric distribution is discovered and summarized in Theorem 1. After that, we analytically calculate the first four origin moments of the hypergeometric distribution by using the expectation identity. Furthermore, we analytically calculate the general kth (k = 1, 2, ...) origin moment of the hypergeometric distribution by using the expectation identity, and the results are summarized in Theorem 2. Moreover, we use the general kth origin moment to validate the first four origin moments of the hypergeometric distribution. The coefficients of the first 10 origin moments of the hypergeometric distribution are shown in Table 1. Comparing Table 1 with the tables in the literature, we find that the $A(k, j)$ in Theorem 2 are Stirling numbers of the second kind. Finally, Theorem 2 is restated as Theorem 3 in terms of $S(k, j)$, the Stirling numbers of the second kind.
We have provided an assessment of the computational complexity, in number of multiplications, of the DM and the AM, which is $O(n(k+4n))$ and $O(3k^2/2)$, respectively. Moreover, we have investigated the memory sizes of the DM and the AM, which are 1 and $k(k+5)/2$ double floating-point numbers, respectively. Furthermore, we have carried out some numerical experiments to compare the computed values of $EY^k$ and the computing times of the two methods. Finally, we have compared the absolute errors and the relative errors between the two methods. In summary, compared to the DM, the AM is computationally more efficient but carries a heavier memory burden. Moreover, the AM is more robust to extreme parameter values than the DM, as observed from the lower right plot in Figure 1.

Supporting information
Additional information for this article is available.
The R codes used in the article can be downloaded from the link: https://pan.baidu. com/s/1SzEN1fHwnRvB8HzOdtpoxg, with password: 1234. Alternatively, the readers can send emails to the corresponding author.