May 13

3:30 pm

## Æ-Statistics and t-Statistics

### Gabor Szekely

Seminar

Bowling Green State University - Hungarian Academy of Sciences

Let X and Y be d-dimensional random variables with finite expected values, and let X' and Y' denote independent copies of X and Y. Define the potential (or relative) energy of X and Y as follows:

Æ(X,Y) := 2E || X - Y || - E || X - X' || - E || Y - Y' ||.

We prove that Æ(X,Y) ≥ 0, with equality if and only if X and Y are identically distributed. This result can be applied to testing homogeneity, independence, goodness-of-fit, etc. Empirical versions of Æ(X,Y) will be called Æ-statistics or energy statistics. Æ-tests based on Æ-statistics are not only rotation invariant and consistent against general alternatives but also very powerful. The Æ-test of homogeneity is a natural rotation-invariant multivariate version of Cramér's univariate distribution-free test, which in higher dimensions is neither rotation invariant nor distribution-free. The energy perspective of statistics, the principle of least possible effort (minimizing Æ), is very appealing and also powerful in terms of the simplicity and effectiveness of statistical decisions. The asymptotic behavior of the Æ-statistic for goodness-of-fit depends on a sequence of possible "energy levels" (eigenvalues) $\lambda$ of the stationary Schrödinger equation. In the univariate case this equation is $\Psi'' - V\Psi + \lambda\Psi = 0$ with potential energy function $V = (\log f)''/f - (f^{-3/4})'' f^{3/4}$, where f is the probability density function of the null distribution and $\Psi$ denotes an eigenfunction.
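As a rough illustration of an empirical version of Æ(X,Y), the sketch below replaces each expectation in the definition by an average over all pairs of sample points. This plug-in form is an assumption made for illustration; the abstract does not say which empirical version the talk uses, and the function names are hypothetical.

```python
import math


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y) with x in a, y in b."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_statistic(xs, ys):
    """Plug-in estimate of 2E||X - Y|| - E||X - X'|| - E||Y - Y'||.

    xs and ys are sequences of points (tuples of floats) of equal dimension.
    Assumption: expectations are replaced by averages over all sample pairs,
    including each point paired with itself.
    """
    return (2.0 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))


if __name__ == "__main__":
    sample_x = [(0.0, 0.0), (1.0, 0.0)]
    sample_y = [(5.0, 0.0), (6.0, 0.0)]
    print(energy_statistic(sample_x, sample_y))
```

Because the statistic depends on the data only through Euclidean distances between points, rotating both samples by the same rotation leaves its value unchanged.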

In the second part of the lecture, generalized t-tests are constructed under conditions weaker than normality. If we assume only symmetry of errors, then an explicit formula is given for the level $\alpha$ critical values of the corresponding $t_n^S$-test. The tail probabilities are:

$$t_n^S(a) = \sup_{1 \le k \le n} \sum_{j = \left[\frac{k + a\sqrt{k}}{2}\right]}^{k} \frac{\binom{k}{j}}{2^k}$$

for $0 < a \le \sqrt{n}$ (and $t_n^S(a) = 0$ for $a > \sqrt{n}$). Assuming symmetry and unimodality of errors, the critical values of the corresponding $t^U$-test are even closer to the critical values of Student's classical t-test. For scale mixtures of Gaussian errors the critical values simply coincide with Student's t-values.
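The tail-probability formula for the symmetric-error test can be evaluated directly for small n. The sketch below rests on two assumptions that the abstract does not confirm: the lower summation limit is read as (k + a√k)/2, which is the reading consistent with the tail vanishing for a > √n, and the square bracket is taken as rounding up to the next integer, so that each inner sum is P(Bin(k, 1/2) ≥ (k + a√k)/2). The function name is hypothetical.

```python
import math


def symmetric_t_tail(a, n):
    """Evaluate sup over 1 <= k <= n of sum_{j >= (k + a*sqrt(k))/2}^{k} C(k, j) / 2^k.

    Assumptions: the lower limit is (k + a*sqrt(k))/2 rounded up to an integer,
    and a > 0 as in the stated range of the formula.
    """
    if a <= 0:
        raise ValueError("the formula is stated for a > 0")
    best = 0.0
    for k in range(1, n + 1):
        lower = math.ceil((k + a * math.sqrt(k)) / 2)
        # The range is empty when lower > k, so this term contributes 0.
        tail = sum(math.comb(k, j) for j in range(lower, k + 1)) / 2 ** k
        best = max(best, tail)
    return best


if __name__ == "__main__":
    for a in (1.0, 2.0, 3.0):
        print(a, symmetric_t_tail(a, 4))
```

A level α critical value could then be found by searching for the smallest a whose tail probability does not exceed α; the exact definition used in the talk is not given in the abstract.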