Seminar Details


May 13

3:30 pm

Ɛ-Statistics and t-Statistics

Gabor Szekely


Bowling Green State University - Hungarian Academy of Sciences

Define the potential (or relative) energy of the d-dimensional random variables X and Y with finite expected values, where X' and Y' denote independent copies of X and Y respectively, as follows:

Ɛ(X,Y) := 2E || X - Y || - E || X - X' || - E || Y - Y' ||.

We prove that Ɛ(X,Y) >= 0, with equality if and only if X and Y are identically distributed. This result can be applied to testing homogeneity, independence, goodness-of-fit, etc. Empirical versions of Ɛ(X,Y) will be called Ɛ-statistics or energy statistics. Ɛ-tests based on Ɛ-statistics are not only rotation invariant and consistent against general alternatives but also very powerful. The Ɛ-test of homogeneity is a natural rotation-invariant multivariate version of Cramér's univariate distribution-free test, which is not rotation invariant (and not distribution-free) in higher dimensions. The energy perspective of statistics, the principle of least possible effort (minimizing Ɛ), is very appealing and also powerful in terms of the simplicity and effectiveness of statistical decisions. The asymptotic behavior of the Ɛ-statistic for goodness-of-fit depends on a sequence of possible "energy levels" (eigenvalues) λ of the stationary Schrödinger equation. In the univariate case this equation is Ψ" - VΨ + λΨ = 0 with potential energy function V = (log f)"/f - (f^(-¾))" f^¾, where f is the probability density function of the null distribution and Ψ denotes an eigenfunction.
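As an illustration of the empirical version of Ɛ(X,Y), the following minimal sketch replaces each expectation in the definition by an average over all pairs of sample points. The function names (mean_distance, energy_statistic) are hypothetical, and the plain averaging (including the zero distances of a point to itself) is an assumption made for simplicity; it is not necessarily the normalization used in the lecture.

```python
import math
from itertools import product


def mean_distance(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a, q in b."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def energy_statistic(x, y):
    """Sample analogue of 2E||X - Y|| - E||X - X'|| - E||Y - Y'||.

    x and y are non-empty sequences of points, each point a tuple of
    d coordinates. Assumption: expectations are estimated by averages
    over all pairs, self-pairs included.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    return 2 * mean_distance(x, y) - mean_distance(x, x) - mean_distance(y, y)


# Two small univariate samples (d = 1), written as 1-tuples.
sample_x = [(0.0,), (1.0,)]
sample_y = [(2.0,), (3.0,)]
print(energy_statistic(sample_x, sample_y))
print(energy_statistic(sample_x, sample_x))
```

With this averaging, the statistic of a sample against itself is exactly zero, mirroring the property that Ɛ(X,Y) = 0 for identically distributed X and Y. A test of homogeneity would additionally need a reference distribution for the statistic, which this sketch does not provide.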

In the second part of the lecture, generalized t-tests are constructed under conditions weaker than normality. If we assume only symmetry of the errors, an explicit formula gives the level-α critical values of the corresponding t^S_n-test. The tail probabilities are:

t^S_n(a) = sup_{1<=k<=n} Σ_{j=⌈(k + a√k)/2⌉}^{k} C(k,j) / 2^k

for 0 < a <= √n (and t^S_n(a) = 0 for a > √n), where C(k,j) denotes the binomial coefficient. Assuming both symmetry and unimodality of the errors, the critical values of the corresponding t^U-test are even closer to the critical values of Student's classical t-test. For scale mixtures of Gaussian errors, the critical values simply coincide with Student's t-values.
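The tail-probability formula for the symmetric case can be evaluated directly. The sketch below reads the inner sum as the probability that a Binomial(k, 1/2) count is at least (k + a√k)/2 and takes the largest such value over k = 1, ..., n. The function name symmetric_t_tail is hypothetical, and rounding the lower summation limit up to the next integer is an assumption about how the bracket in the formula is meant.

```python
import math


def symmetric_t_tail(a, n):
    """Evaluate sup over 1 <= k <= n of sum_{j >= (k + a*sqrt(k))/2} C(k, j) / 2^k.

    Assumption: the lower summation limit is rounded up (ceiling).
    Requires a > 0 and n >= 1. Returns 0.0 when a > sqrt(n), because
    every inner sum is then empty.
    """
    if a <= 0 or n < 1:
        raise ValueError("need a > 0 and n >= 1")
    best = 0.0
    for k in range(1, n + 1):
        j_min = math.ceil((k + a * math.sqrt(k)) / 2)
        # range(j_min, k + 1) is empty whenever j_min > k.
        tail = sum(math.comb(k, j) for j in range(j_min, k + 1)) / 2 ** k
        best = max(best, tail)
    return best


print(symmetric_t_tail(2.0, 4))
print(symmetric_t_tail(3.0, 4))
```

A level-α critical value could then be approximated by searching for the smallest a whose tail probability does not exceed α, for example by bisection over 0 < a <= √n. That search is left out here to keep the sketch short.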