A fast and accurate kernel-based independence test with applications to high-dimensional and functional data

Testing the dependence between two random variables is an important inference problem in statistics, since many statistical procedures rely on the assumption that two samples are independent. To test whether two samples are independent, a so-called HSIC (Hilbert--Schmidt Independence Criterion) based test has been proposed, whose null distribution is approximated either by permutation or by a Gamma approximation. In this paper, a new HSIC-based test is proposed, and its asymptotic null and alternative distributions are established. It is shown that the proposed test is root-n consistent. A three-cumulant matched chi-squared approximation is adopted to approximate the null distribution of the test statistic. By choosing a proper reproducing kernel, the proposed test can be applied to many different types of data, including multivariate, high-dimensional, and functional data. Three simulation studies and two real data applications show that, in terms of level accuracy, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data.


Introduction
With the development of data collection techniques, complicated data objects, such as high-dimensional data or functional data in some separable metric spaces, are frequently encountered in various areas. In many big data applications nowadays, we are often interested in measuring the level of association between a pair of potentially high-dimensional random vectors or functional random variables. Testing the independence of random elements is an important inference problem in statistics and has important applications. The work of this study is motivated by the Canadian weather data set, available in the R package fda.usc and discussed in detail by Ramsay and Silverman (2005). This data set has been studied in the literature of multivariate functional data analysis; see Górecki and Smaga (2017) and Zhu et al. (2022). It contains the average daily temperature curves and the average daily precipitation curves at 35 Canadian weather stations over a year, obtained by averaging the daily temperature and precipitation curves over the period 1960 to 1994. Of interest is to check whether the average daily temperature curves and the average daily precipitation curves are statistically independent. This is a two-sample independence testing problem for functional data. If the null hypothesis of independence is rejected, we may take the dependence into account so that subsequent inference procedures are more efficient.
Mathematically, a two-sample independence testing problem can be described as follows. Let x and y be two random elements defined in two separable metric spaces X and Y, respectively. Suppose we have a paired sample (x_1, y_1), ..., (x_n, y_n), with each (x_i, y_i) ∈ X × Y independently and identically following the joint Borel probability measure P_xy. Of interest is to test the following hypotheses: H_0: P_xy = P_x P_y versus H_1: P_xy ≠ P_x P_y, where P_x and P_y denote the marginal probability measures of x and y, respectively.
There exist classical dependence measures, such as Spearman's ρ and Kendall's τ, which have been widely applied. However, they are typically designed to capture only particular forms of dependence (e.g., linear or monotone) and are not able to detect all modes of dependence between random variables.
With the growing availability of complicated data objects, dependence measures are sought that capture more complex dependence patterns, including those occurring between high-dimensional or functional datasets.
The Hilbert--Schmidt Independence Criterion (HSIC), introduced and studied by Gretton et al. (2005a,b), is one of the most successful nonparametric dependence measures. It uses the distance between the kernel embeddings of probability measures in a reproducing kernel Hilbert space (RKHS) (Gretton et al. 2007; Smola et al. 2007; Zhang et al. 2011), and it can be used for measuring the dependence not only between univariate or multivariate random variables, but also between random variables taking values in more complex structures, such as high-dimensional or functional data. By employing HSIC, Gretton et al. (2007) proposed a novel test whose test statistic is an empirical estimate of HSIC based on V-statistics. The authors approximated the null distribution of the test statistic by a two-parameter Gamma distribution.
The resulting Gamma approximation based test costs O(n^2) operations, where n is the sample size. However, the simulation results in Tables 1 and 3 indicate that the Gamma approximation based test works well only when the dimension p is small; it is very conservative or fails completely when p is large. Gretton et al. (2007) also proposed a permutation test which generally works well in terms of size control, but it costs O(mn^2) operations, where m is the number of permutations, making it about m times more time-consuming than the Gamma approximation based test. This is partially confirmed by Table 2 of Section 3.1, which shows that the permutation test is about 100 to 1000 times more time-consuming than the Gamma approximation based test. To overcome this problem, Zhang et al. (2018) introduced three fast estimators of HSIC to speed up the computation in HSIC-based tests.
However, the computational savings of these fast estimators are not a free lunch. According to the simulation results in Zhang et al. (2018), there is some loss of power when the sample size is small.
A much lower computational cost in large-scale examples is thus offset by the requirement for a larger sample size.
In recent years, functional data analysis has emerged as an important area of statistics. Most studies are conducted by assuming that the random curves are independent, without any checking. To address this, a few methods have been developed for detecting the dependence between random curves.
Most of these independence tests are based on measures of correlation, including the classical Pearson correlation (Pearson 1895), the dynamical correlation (Dubin and Müller 2005), and the global temporal correlation (Zhou et al. 2018), among others. However, since zero correlation does not imply independence in general, these functional correlations may be insufficient for independence testing (Miao et al. 2022). Kosorok (2009) applied the distance covariance proposed by Székely et al. (2007) to the top FPC scores which cumulatively account for 95% of the variation of the random functions. Unfortunately, as discussed in Shen et al. (2019) and shown by the simulation results in Table 6, for testing the dependence between two random functions, the correlation and distance covariance based tests are less powerful for non-monotone dependencies, although they are powerful for monotone relationships.
In this paper, we propose a new HSIC-based test which works well for multivariate, high-dimensional, and functional data, and which is very fast to compute. To the best of our knowledge, there are few tests which work well for all three types of data. The main contributions of this work are as follows. First, we propose an unbiased and root-n consistent estimator of the centered reproducing kernel used in the proposed test statistic. This gives the proposed test a good basis for much better size control than the Gamma approximation based test of Gretton et al. (2007). Second, under some regularity conditions, we show that under the null hypothesis the proposed test statistic has a chi-squared-type mixture limit. Third, we derive the first three cumulants (mean, variance, and third central moment) of the proposed test statistic. This allows us to employ the three-cumulant (3-c) matched χ^2-approximation of Zhang (2005) to accurately approximate the distribution of the chi-squared-type mixture, with the approximation parameters consistently estimated from the data. The 3-c matched χ^2-approximation avoids permutation and significantly reduces the computational cost; it guarantees that the proposed test computes very fast and has good size control. Fourth, we derive the asymptotic power of the proposed test under a local alternative and show that it is root-n consistent; to the best of our knowledge, this has not been considered in the literature. Lastly, via three simulation studies and two real data examples, we demonstrate that in terms of size control, power, and computational cost, our new test works well and outperforms several existing tests for independence testing for multivariate, high-dimensional, and functional data.
The rest of this paper is organized as follows. The main results are presented in Section 2. Simulation studies and real data applications are given in Sections 3 and 4, respectively. Some concluding remarks are given in Section 5. Technical proofs of the main results are outlined in the Appendix.
Main results

Let K : X × X → R and L : Y × Y → R be two continuous, positive definite, characteristic reproducing kernels. Let F and G be the two reproducing kernel Hilbert spaces (RKHS), with inner products ⟨·, ·⟩_F and ⟨·, ·⟩_G, generated by K and L, respectively. Let φ(x) = K(x, ·) and ψ(y) = L(y, ·) denote their associated canonical feature maps, so that we have the following kernel tricks: K(x, x') = ⟨φ(x), φ(x')⟩_F and L(y, y') = ⟨ψ(y), ψ(y')⟩_G.

Test statistic
Let (x', y') be an independent copy of (x, y). It follows that φ(X) ⊂ F and ψ(Y) ⊂ G. Following Fukumizu et al. (2004), the cross-covariance operator C_xy : G → F is defined such that for all f ∈ F and g ∈ G, ⟨f, C_xy g⟩_F = E_xy[f(x)g(y)] − E_x[f(x)] E_y[g(y)]. Set µ_x = E_x[φ(x)] and µ_y = E_y[ψ(y)] to be the mean embeddings of the probability measures P_x and P_y, respectively. The cross-covariance operator itself can then be written as C_xy = E_xy[(φ(x) − µ_x) ⊗ (ψ(y) − µ_y)], where ⊗ denotes the tensor product.
According to Gretton et al. (2005b, Theorem 4), x and y are independent if and only if C_xy = 0, which motivates testing the hypotheses (4). Gretton et al. (2007) proposed to measure the dependence between x and y using the following squared Hilbert--Schmidt norm of C_xy:

HSIC(x, y) = ‖C_xy‖²_HS = E[K̃(x, x') L̃(y, y')],    (5)

where K̃(x, x') and L̃(y, y') denote the centered versions of K(x, x') and L(y, y'), respectively, and (x', y') is an independent copy of (x, y). Notice that for the kernel K̃(·, ·), we have

K̃(x, x') = K(x, x') − E_z[K(x, z)] − E_{z'}[K(z', x')] + E_{z,z'}[K(z, z')],

where z and z' are independent copies of x and x', respectively. Notice also that we have the following useful properties: when x' = x, we have K̃(x, x) = ‖φ(x) − µ_x‖²_F ≥ 0, and when x and x' are independent, we have E_{x'}[K̃(x, x')] = 0. The above properties remain valid after replacing K̃ and x with L̃ and y, respectively. For simplicity, let K = (K(x_i, x_j)) : n × n and L = (L(y_i, y_j)) : n × n denote the Gram matrices of the two kernels K(·, ·) and L(·, ·), respectively. Similarly, set K̃ = (K̃(x_i, x_j)) : n × n, L̃ = (L̃(y_i, y_j)) : n × n, K* = (K*(x_i, x_j)) : n × n, and L* = (L*(y_i, y_j)) : n × n, where K*(x_i, x_j) and L*(y_i, y_j) are the unbiased estimators of K̃(x_i, x_j) and L̃(y_i, y_j), respectively, given in (7). Using (5), to test (4), we can construct the following test statistic:

T_n = tr(K* L*)/n,    (8)

where tr(A) denotes the trace of the square matrix A, i.e., the sum of the diagonal entries of A. Note that T_n can be easily computed using O(n^2) operations.
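For illustration, the trace-form statistic can be sketched in Python as follows. This is a minimal sketch, not the paper's exact procedure: it uses the biased double-centering HKH of Gretton et al. (2007) in place of the unbiased estimators K*, L* of (7), which are not reproduced here, and the function names are ours.

```python
import numpy as np

def rbf_gram(X, sigma2):
    """Gram matrix of the Gaussian RBF kernel exp(-||x - x'||^2 / (2 sigma2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def hsic_trace_statistic(X, Y, sigma2_x=1.0, sigma2_y=1.0):
    """Trace-form statistic tr(Kc Lc)/n with Kc = HKH, Lc = HLH (biased centering)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix I - J/n
    Kc = H @ rbf_gram(X, sigma2_x) @ H
    Lc = H @ rbf_gram(Y, sigma2_y) @ H
    return np.trace(Kc @ Lc) / n
```

Since Kc and Lc are positive semi-definite, the statistic is nonnegative, and it is larger for dependent pairs than for independent ones.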
Remark 1. To test (4), Gretton et al. (2007) proposed the following test statistic:

T_{n,G} = tr(K̄_G L̄_G)/n,    (9)

where H = I_n − J_n/n, with I_n and J_n being the n × n identity matrix and the n × n matrix of ones, respectively, and K̄_G = HKH and L̄_G = HLH, whose entries K̄_G(x_i, x_j) and L̄_G(y_i, y_j) are the biased estimators of K̃(x_i, x_j) and L̃(y_i, y_j) given in (10). It is worthwhile to emphasize that the differences between our test statistic T_n (8) and Gretton et al. (2007)'s test statistic T_{n,G} (9) mainly come from the differences between the unbiased estimators (7) and the biased estimators (10) of K̃(x_i, x_j) and L̃(y_i, y_j).
Lemma 1. Under the condition (11), as n → ∞, we have K*(x_i, x_j) = K̃(x_i, x_j) + O(n^{−1/2}) uniformly for all x_i, x_j. Lemma 1 gives the uniform convergence rate of K*(x_i, x_j) to K̃(x_i, x_j). Similarly, we also have L*(y_i, y_j) = L̃(y_i, y_j) + O(n^{−1/2}) uniformly for all y_i, y_j. Therefore, we can write T_n = T̃_n + O(n^{−1/2}), where T̃_n = tr(K̃ L̃)/n. That is, T_n and T̃_n have the same distribution for large values of n. Thus, studying the asymptotic null distribution of T_n is equivalent to studying that of T̃_n.
Remark 2. Theorem 1 is parallel to Theorem 2 of Gretton et al. (2007), where the authors treated T_{n,G}/n (see Remark 1) as a V-statistic with a kernel of order 4, while we can actually show that T_{n,G}/n = T̃_n/n + O(n^{−3/2}), where T̃_n/n is a V-statistic with a kernel of order 2 only [see (13) for details]. Theorem 1 is the same as Theorem 1 of Zhang et al. (2018), but our proof is much simpler than that of the latter.

Null distribution approximation
As mentioned in the introduction, Gretton et al. (2007) approximated the null distribution of T_{n,G} (9) by permutation and by a two-parameter Gamma distribution, resulting in a permutation test and a Gamma approximation based test. In this subsection, since the limit T is a χ^2-type mixture with unknown coefficients, we approximate the null distribution of T_n using the three-cumulant (3-c) matched χ^2-approximation (Zhang 2005, Zhang 2013), resulting in a 3-c matched χ^2-approximation based new test.
Remark 3. In terms of computational cost, the permutation test is very time-consuming, with a cost of O(mn^2), where m is the number of permutations and n is the sample size, while the Gamma approximation based test computes very fast, with a cost of O(n^2). However, in terms of size control, the permutation test generally performs quite well, whereas the Gamma approximation based test performs well only for low-dimensional data and is very conservative or fails completely for high-dimensional data, as demonstrated by the simulation results presented in Tables 1 and 3 of Section 3.1.
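A generic permutation test of the kind discussed above can be sketched as follows; this is a minimal illustration with names of our choosing, into which any O(n^2) statistic (such as an HSIC estimate) can be plugged.

```python
import numpy as np

def permutation_pvalue(X, Y, stat_fn, m=200, seed=0):
    """Monte Carlo permutation p-value: recompute stat_fn with the rows of Y
    permuted to break any dependence with X. Costs O(m n^2) overall when
    stat_fn itself costs O(n^2)."""
    rng = np.random.default_rng(seed)
    t_obs = stat_fn(X, Y)
    exceed = 0
    for _ in range(m):
        perm = rng.permutation(Y.shape[0])
        if stat_fn(X, Y[perm]) >= t_obs:
            exceed += 1
    # add-one correction keeps the p-value strictly positive
    return (exceed + 1) / (m + 1)
```

The factor m in the cost is exactly the overhead quantified in Remark 3 relative to a single evaluation of the statistic.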
The key idea of the proposed new test is to approximate the null distribution of T_n using that of a random variable of the form R = β_0 + β_1 χ²_d, where χ²_d denotes a chi-squared random variable with d degrees of freedom. The parameters β_0, β_1, and d are determined by matching the first three cumulants of T_n and R. For this purpose, we derive the first three cumulants of T_n in the following theorem, whose proof is given in the Appendix.
Theorem 2. Under the condition (11) and the null hypothesis, the first three cumulants of T_n are given by (14), where the quantities involved are defined with x, x', x'' i.i.d. ∼ P_x and y, y', y'' i.i.d. ∼ P_y.
The first three cumulants of R are β_0 + β_1 d, 2β_1² d, and 8β_1³ d, while the first three cumulants of T_n, denoted κ_1, κ_2, and κ_3, are given in (14). Equating the first three cumulants of T_n and R and ignoring the higher-order terms then leads to

β_1 = κ_3/(4κ_2),  d = 8κ_2³/κ_3²,  β_0 = κ_1 − β_1 d.

In addition, the skewness of T_n can also be approximately expressed as (8/d)^{1/2}. Thus the skewness of T_n becomes small as d increases.
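The cumulant-matching step can be sketched numerically as follows; this is a minimal illustration of the matching equations (function names are ours), not the paper's full estimation procedure.

```python
import math

def three_cumulant_chi2_params(k1, k2, k3):
    """Solve beta0 + beta1*d = k1, 2*beta1^2*d = k2, 8*beta1^3*d = k3
    for the parameters of R = beta0 + beta1 * chi2_d (Zhang 2005)."""
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2**3 / k3**2
    beta0 = k1 - beta1 * d
    return beta0, beta1, d

def matched_skewness(d):
    """Approximate skewness sqrt(8/d) of the matched chi-squared law."""
    return math.sqrt(8.0 / d)
```

In practice, the test would then reject when (T_n − β̂_0)/β̂_1 exceeds the upper-α quantile of the χ²_d̂ distribution (e.g., via a chi-squared tail routine such as scipy.stats.chi2).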
Under the condition (11), by Lemma 1, we have K*(x_i, x_j) = K̃(x_i, x_j) + O(n^{−1/2}) uniformly for all x_i, x_j, and L*(y_i, y_j) = L̃(y_i, y_j) + O(n^{−1/2}) uniformly for all y_i, y_j. Then by (15), natural estimators of the approximation parameters are obtained by plugging in K*(x_i, x_j) and L*(y_i, y_j), which are defined in (7). For fast computation, we can write these estimators as in (19), where A∘B = (a_ij b_ij) denotes the elementwise (Hadamard) product of two matrices A = (a_ij) and B = (b_ij), and diag(A) denotes the diagonal matrix formed by the diagonal entries of A. The proof of (19) is given in the Appendix.
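The fast-computation device behind (19) is the replacement of matrix products by Hadamard (elementwise) operations. A generic instance of this device, shown as a sketch rather than the paper's exact formula, is the identity tr(KL) = Σ_{ij} K_ij L_ij for symmetric matrices:

```python
import numpy as np

def trace_product_fast(K, L):
    """For symmetric K and L, tr(K @ L) equals the entrywise sum of the
    Hadamard product K o L: an O(n^2) computation instead of the O(n^3)
    matrix product inside the trace."""
    return float(np.sum(K * L))
```

This is why quantities such as tr(K* L*) can be evaluated with O(n^2) operations.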
Similarly, the corresponding quantities for the kernel L̃ can be estimated and computed.

Theorem 3. Under the condition (11), as n → ∞, we have M̂_ℓ →p M_ℓ and N̂_ℓ →p N_ℓ for ℓ = 1, 2, 3. It follows that, as n → ∞, β̂_0 →p β_0, β̂_1 →p β_1, and d̂ →p d.

Remark 4. Since the Gamma approximation based test matches only the mean and variance of T_{n,G}, while the proposed new test matches the mean, variance, and third central moment of T_n, it is expected that in terms of size control the proposed new test should outperform the Gamma approximation based test substantially. This is partially confirmed by the simulation results presented in Tables 1 and 3 of Section 3.1. Further, in terms of computational cost, the proposed new test, with a cost of O(n^3), is much less time-consuming than the permutation test when the number of permutations is larger than the sample size, and is only slightly more time-consuming than the Gamma approximation based test. This is partially confirmed by Table 2 of Section 3.1.

Asymptotic power
In this subsection, we investigate the asymptotic power of the proposed test under the local alternative hypothesis (20), indexed by 0 < ∆ < 1/2 and a positive constant h. This local alternative tends to the null hypothesis as the sample size n tends to infinity, and it is therefore often challenging to detect. A test is usually said to be root-n consistent if it can detect the local alternative hypothesis (20) with probability tending to 1 as n tends to infinity. A root-n consistent test is often preferred.
Theorem 4. Under the condition (11) and the local alternative (20), as n → ∞, we have (21), where the quantities involved are defined with (x', y') being an independent copy of (x, y). It follows that for any significance level α, the asymptotic power of the proposed test T_n is given by (22), which tends to 1 as n → ∞, where Φ(·) denotes the cumulative distribution function of N(0, 1).
The proof of Theorem 4 is given in the Appendix. Theorem 4 shows that the proposed new test T_n is root-n consistent.
Remark 5. The first result of Theorem 4 is parallel to Theorem 1 of Gretton et al. (2007), where the authors treated T_{n,G}/n as a V-statistic with a kernel of order 4, while we actually have T_{n,G}/n = T̃_n/n + O(n^{−3/2}), with T̃_n/n being a V-statistic with a kernel of order 2 only [see (13) for details]. The result of Theorem 1 of Gretton et al. (2007) may therefore be problematic.

Simulation studies
In this section, we conduct three simulation studies, namely Simulations 1, 2, and 3, to compare the proposed new test, denoted as NEW, against several existing competitors for the two-sample independence testing problem for multivariate, high-dimensional, and functional data, respectively. We compute the empirical size or power of a test as the proportion of rejections out of 10,000 simulation runs. Throughout this section, we set the nominal size α as 5%.
In the three simulation studies described below, for simplicity, we choose the kernel K(·, ·) to be the following Gaussian radial basis function (RBF) kernel:

K(x, x') = exp(−‖x − x'‖²/(2σ²)),

where σ² is the so-called kernel width. For multivariate and high-dimensional data, as in Simulations 1 and 2, ‖x‖ denotes the usual L²-norm of a vector x; for functional data, as in Simulation 3, it denotes the usual L²-norm of a function x(t), t ∈ T, given by ‖x‖ = (∫_T x²(t) dt)^{1/2}, computed by approximating the integral using the trapezoidal rule. It is easy to see that the above Gaussian RBF kernel is bounded above by 1, so that the condition (11) is always satisfied. The kernel width σ² is selected by employing the data-adaptive Gaussian kernel width selection method proposed in Zhang et al. (2022, sec. 2.6). The kernel L(·, ·) is chosen similarly.
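The kernel evaluation and the trapezoidal L²-norm can be sketched as follows. This is a minimal illustration with function names of our choosing; the median heuristic shown for the kernel width is a common stand-in, not the data-adaptive selector of Zhang et al. (2022).

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal-rule integral of y over the grid t (along the last axis)."""
    dt = np.diff(t)
    return np.sum(0.5 * dt * (y[..., 1:] + y[..., :-1]), axis=-1)

def l2_norm_functional(curves, t):
    """L2 norms ||x|| = (int_T x^2(t) dt)^(1/2) of discretized curves (n, k)."""
    return np.sqrt(trapezoid(curves**2, t))

def median_heuristic_width(X):
    """Median of the pairwise squared distances: a common stand-in for the
    kernel width sigma^2."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    iu = np.triu_indices(X.shape[0], k=1)
    return float(np.median(np.maximum(d2[iu], 0.0)))

def rbf_kernel(x, y, sigma2):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma2)), bounded above by 1."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.sum(diff**2) / (2.0 * sigma2)))
```

For functional data, each curve is first reduced to its grid values, so the same RBF kernel applies with the L²-norm replacing the Euclidean norm.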

Simulation 1
In this simulation study, we demonstrate the performance of the NEW test for multivariate data against the Gamma approximation based test and the permutation test proposed and studied in Gretton et al. (2007), denoted as HSICg and HSICp, respectively. The HSICg test is implemented in the R package dHSIC (Pfister and Peters 2017), and the number of permutations used in the HSICp test is set as 200.
We make use of the multivariate benchmark data scheme used in Gretton et al. (2007). We conduct the independence test for p-dimensional random variables with p = 2, 4, 10, and 20. The data are generated as follows. First, using rjordan in the R package ProDenICA (Hastie et al. 2022), we generate n observations of two univariate random variables, each drawn at random and with replacement from the Independent Component Analysis (ICA) benchmark densities in Table 3 of Gretton et al. (2005b), which include super-Gaussian, sub-Gaussian, multimodal, and unimodal distributions. Second, we mix these random variables using a rotation matrix parameterized by an angle θ varying from 0 to π/4 (a zero angle means the data are independent, while dependence becomes easier to detect as the angle increases to π/4); specifically, we set θ = 0, π/8, and π/4. Third, we append (p − 1)-dimensional Gaussian noise with mean 0 and standard deviation 1 to each of the mixtures. Finally, we multiply each resulting vector by an independent random p-dimensional orthogonal matrix, to obtain vectors which are dependent across all observed dimensions. The sample sizes we consider are n = 30, 50, 100, 200, and 500.
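The generation pipeline above can be sketched as follows. This is a sketch under stated assumptions, not the benchmark itself: uniform sources stand in for the ICA benchmark densities drawn by rjordan (which is not reproduced here), and the function name is ours. Non-Gaussian sources matter because a rotation of independent Gaussians would remain independent.

```python
import numpy as np

def simulate_mixed_pair(n, p, theta, rng):
    """Rotation-mixing scheme: rotate two independent non-Gaussian sources by
    theta, pad each coordinate with N(0,1) noise to dimension p, then apply
    independent random orthogonal maps."""
    s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, 2))  # unit-variance uniform sources
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    mixed = s @ R.T                        # theta = 0 keeps the pair independent
    X = np.hstack([mixed[:, [0]], rng.normal(size=(n, p - 1))])
    Y = np.hstack([mixed[:, [1]], rng.normal(size=(n, p - 1))])
    Qx, _ = np.linalg.qr(rng.normal(size=(p, p)))  # random orthogonal matrices
    Qy, _ = np.linalg.qr(rng.normal(size=(p, p)))
    return X @ Qx, Y @ Qy
```

The final orthogonal maps spread the single dependent coordinate across all p observed dimensions, which is what makes the detection problem harder as p grows.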
The empirical sizes and powers of the HSICp, HSICg, and NEW tests in Simulation 1 are displayed in Table 1. We can draw several interesting conclusions. When θ = 0, the null hypothesis holds, so we can compare the performances of the three tests in terms of size control. Both the HSICp and NEW tests have very good level accuracy, with empirical sizes generally around 5%, although the empirical sizes of the NEW test are slightly larger than those of the HSICp test. The HSICg test performs quite well when p = 2, but with increasing dimension p it becomes more and more conservative, with empirical sizes becoming as small as 0.00%, especially when p ≥ 10 and the sample size n is small. This means that the HSICg test does not work for moderate or high dimensional data, while the HSICp and NEW tests still work well. On the other hand, when θ > 0, the alternative hypotheses hold, so we can compare the three tests in terms of power. As expected, the empirical powers of the HSICp and NEW tests are generally comparable, although the empirical powers of the HSICp test are slightly smaller than those of the NEW test; when the dimension p ≥ 10, the empirical powers of the HSICg test are generally smaller than those of the HSICp and NEW tests, showing the impact of the level accuracy of the three tests. Notice that as the value of θ increases, or as the sample size n increases, or both, the empirical powers of the three tests generally get larger. Notice also that as the dimension p increases, the empirical powers of each of the three tests get smaller. This is not surprising, however, because when θ > 0, only the first elements of the two variables are correlated, so that as the dimension p increases, the dependence between the variables becomes harder and harder to detect.
In the above, we compared the HSICp, HSICg, and NEW tests in terms of size control and power. We now compare their computational costs. To this end, the total execution times (in minutes) of the three tests for the 10,000 simulation runs when θ = 0, p = 4, 10, and 20, and n = 30, 50, 100, 200, and 500 are reported in Table 2. It is seen that the HSICp test is 10 ∼ 100 times more time-consuming than the NEW test, even though the number of permutations is only 200, while the NEW test is only about 1 ∼ 10 times more time-consuming than the HSICg test for n = 30, 50, 100, 200, and 500.
Simulation 2

To generate the high-dimensional data under "large p, small n" settings, we choose n = 30, 50, 100 and p = 50, 100, 200, respectively. Here and throughout, to measure the overall performance of a test in maintaining the nominal size α = 5%, we employ the average relative error (ARE) criterion of Zhang (2011). The ARE value of a test is calculated as ARE = 100 M^{−1} Σ_{j=1}^{M} |α̂_j − α|/α, where α̂_j, j = 1, . . ., M, denote the empirical sizes under the M simulation settings. A smaller ARE value indicates a better performance in terms of size control.
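The ARE criterion is a one-line computation; the following sketch (function name ours) makes the definition concrete.

```python
def average_relative_error(empirical_sizes, alpha=0.05):
    """ARE = 100 * M^{-1} * sum_j |alpha_hat_j - alpha| / alpha (Zhang 2011)."""
    m = len(empirical_sizes)
    return 100.0 * sum(abs(a - alpha) for a in empirical_sizes) / (m * alpha)
```

For example, empirical sizes of 6% and 4% at the 5% nominal level give an ARE of 20.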
Table 3 presents the empirical sizes of the HSICp, HSICg, and NEW tests under various settings, with the last row giving their ARE values associated with the three values of ρ. In terms of size control, both the HSICp and NEW tests perform well regardless of whether the data are weakly correlated (ρ = 0.1), moderately correlated (ρ = 0.5), or highly correlated (ρ = 0.9), since their empirical sizes are generally around 5% and their ARE values are generally below 20. They perform much better than the HSICg test, which does not work at all when ρ = 0.1 and 0.5 (its empirical sizes are 0) and is very conservative when ρ = 0.9 (its empirical sizes are much smaller than 5%). These conclusions are consistent with those observed from Simulation 1 when the data dimension is small or moderate.
The empirical powers of the HSICp, HSICg, and NEW tests under various configurations are presented in Table 4. It is seen that, for fixed n and p, the empirical powers of the three tests increase as the value of δ increases; under each setting, the empirical powers of the NEW test are generally larger than those of the HSICp test, while the HSICg test has essentially no power. The power magnitude order of the three tests is obviously affected by that of their empirical sizes, as seen from Table 3.

Simulation 3
In this simulation study, we compare the NEW test against a few representative existing tests for the two-sample independence testing problem for functional data. These existing tests include the Pearson correlation based test, the dynamical correlation based test (Dubin and Müller 2005), the FPCA-based distance covariance test (Kosorok 2009), and the global temporal correlation based test (Zhou et al. 2018), denoted as Pearson, dnm, FPCA, and gtemp, respectively. The dnm, FPCA, and gtemp tests are permutation based and are implemented in Miao et al. (2022).
∼ N (0, 1) for j = m+1, . . ., 50 where m is an integer used to control the dependency level of the functional observations such that when m = 0, the null hypothesis holds and otherwise the dependency level increases with m.
In numerical implementation, these functional observations are evaluated at a grid of equally spaced time points: t_r = (r − 1)/(k − 1), r = 1, . . ., k. For power consideration, we consider four functions: f(u) = u³, f(u) = u², f(u) = u sin(u), and f(u) = u cos(u), with the first one being monotone and the last three not; see Figure 1 for details. It is often more difficult to detect the dependence when f(u) = u sin(u) or f(u) = u cos(u) than when f(u) = u³ or f(u) = u². Thus we set m = 3, 5, 10, and 15 when f(u) = u³ and f(u) = u², and set m = 15, 20, 25, and 30 when f(u) = u sin(u) and f(u) = u cos(u).
Table 5 displays the empirical sizes of the five considered tests, with the last row giving the associated ARE values. It is seen that all five tests have good level accuracy, with empirical sizes generally around 5%. Admittedly, in terms of size control, the NEW test performs slightly worse than the other four tests, but it also computes much faster than them.
To save space, Table 6 only displays the empirical powers (in %) of the five considered tests under Model 1 with k = 201, since the conclusions drawn for other values of k are similar. These empirical powers are quite revealing in several ways. First, for a monotone relationship (i.e., when f(u) = u³), the NEW test is only slightly less powerful than the other four tests. This slight power inferiority is possibly due to the fact that the NEW test is HSIC based since, as pointed out by Shen et al. (2019), for detecting a monotone relationship the HSIC based tests may be slightly inferior to the distance covariance based tests. Second, for non-monotone relationships (i.e., when f(u) = u², u sin(u), and u cos(u)), the NEW test is generally more powerful than the other four tests. The performances of the other four tests differ considerably across the non-monotone relationships. For example, the Pearson test has almost no power when f(u) = u² and u sin(u), the gtemp test has almost no power when f(u) = u sin(u) and u cos(u), while the dnm and FPCA tests have very low power when f(u) = u sin(u).
From the above three simulation studies, in terms of level accuracy, power, and computational cost, the NEW test generally outperforms its competitors and hence is recommended for real data analysis.

Applications to functional and high-dimensional data
In this section, we present applications of the NEW test, together with several existing competitors mentioned in the previous section, to a functional data set and a high-dimensional data set. For the NEW test, we continue to use the Gaussian RBF kernel and choose the kernel width as described in the previous section.

Canadian weather data
In this subsection, we illustrate the applications of the Pearson, dnm, FPCA, gtemp, and NEW tests to functional data using the Canadian weather data set briefly introduced in Section 1. For each of the 35 weather stations, over a one-year period, the variable "Temperature" records the average daily temperature and the variable "Precipitation" records the average daily rainfall, rounded to 0.1 mm. The raw temperature and precipitation curves for the 35 weather stations are presented in Figure 2. It is expected that there is some dependence between the average daily temperature and the average daily precipitation, since they were recorded at the same 35 Canadian weather stations. Of interest is to check how strong this dependence is.
To this end, we apply the Pearson, dnm, FPCA, gtemp, and NEW tests to this Canadian weather data set to check whether the temperature curves and the precipitation curves are independent. The p-values of the five tests are shown in Table 7. All five p-values are much smaller than 1%, giving strong evidence against the null hypothesis; that is, there is strong dependence between the temperature curves and the precipitation curves for the 35 Canadian weather stations, as expected.

Colon data
In this subsection, we illustrate the applications of the HSICp, HSICg, and NEW tests to high-dimensional data using the well-known colon data set, which contains 62 tissues, each having 2000 gene expression levels, and can be downloaded from http://microarray.princeton.edu/oncology/affydata/index.html. In order to construct a two-sample independence test for high-dimensional data, we choose the first 31 tissues to form the first group and the remaining 31 tissues to form the second group. The two groups should then be independent since the tissues are independent. Table 8 displays the p-values of the three considered tests, which are all larger than 50%, showing that there is no evidence to reject the null hypothesis, as expected. To further demonstrate the level accuracy of the NEW test against the HSICp and HSICg tests, a small-scale simulation study based on this colon data set is conducted to estimate the empirical sizes of the three tests from 10,000 simulation runs. In each run, the 62 tissues are randomly split into two groups of equal size. The empirical size of a test is calculated as the proportion of times the p-value of the test is smaller than the nominal size α = 5% or 10% over the 10,000 runs. The empirical sizes of the three tests are displayed in Table 9. It is seen that both the NEW and HSICp tests have good level accuracy, but the HSICg test is rather conservative. This is consistent with the conclusions drawn from the simulation results presented in Tables 1 and 3.
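The random-split size study described above can be sketched as follows; this is a generic illustration (names ours) into which any p-value-returning independence test could be plugged, not the NEW test itself.

```python
import numpy as np

def random_split_sizes(data, test_pvalue, runs=200, alpha=0.05, seed=0):
    """Empirical size: repeatedly split the rows of `data` into two disjoint
    equal-size groups (independent by construction) and record how often
    the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    half = n // 2
    rejections = 0
    for _ in range(runs):
        idx = rng.permutation(n)
        X, Y = data[idx[:half]], data[idx[half:2 * half]]
        if test_pvalue(X, Y) < alpha:
            rejections += 1
    return rejections / runs
```

A well-calibrated test should return a proportion close to alpha under this scheme.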

Concluding remarks
In the literature, several tests have been proposed for two-sample independence testing in separable metric spaces based on the Hilbert--Schmidt Independence Criterion (HSIC). In this paper, we propose and study a new HSIC based independence test in separable metric spaces, with applications to functional and high-dimensional data. Under some regularity conditions and the null hypothesis, it is shown that the proposed test statistic asymptotically has a chi-squared-type mixture limit. To conduct the proposed test, we employ the three-cumulant matched chi-squared approximation of Zhang (2005) to approximate the distribution of the chi-squared-type mixture, with the approximation parameters consistently estimated from the data. Simulation studies and real data applications demonstrate that, in terms of size control, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data. Nevertheless, Tables 3 and 5

where * means "i < j, α < β" and "(i, j) ≠ (α, β)", while ** means "i < j, α < β, u < v" and "(i, j), (α, β), (u, v) are not mutually equal to each other." It follows that E[T̃_n − E(T̃_n)]³ = 8M_3 N_3 + O(n^{−1}). The theorem is then proved.

Σ_{1≤i<j<k≤n} K̃(x_i, x_j) K̃(x_j, x_k) K̃(x_k, x_i)

Proof of (19)


Figure 2: Raw temperature and precipitation curves for the 35 Canadian weather stations.

Table 1: Empirical sizes and powers (in %) of the HSICp, HSICg, and NEW tests in Simulation 1.

Table 2: Total execution times (in minutes) of the HSICp, HSICg, and NEW tests for the 10,000 simulation runs in Simulation 1.

Table 3: Empirical sizes (in %) of the HSICp, HSICg, and NEW tests in Simulation 2.

Table 4: Empirical powers (in %) of the HSICp, HSICg, and NEW tests in Simulation 2.

Table 7: p-values of the Pearson, dnm, FPCA, gtemp, and NEW tests for testing the independence between the underlying temperature curves and the underlying precipitation curves.

Table 8: p-values of the HSICp, HSICg, and NEW tests for testing the independence between the two groups of the colon data.

Table 9: Empirical sizes (in %) of the HSICp, HSICg, and NEW tests obtained from the small-scale simulation study.