Generalized Score Matching for Non-Negative Data

Shiqing Yu

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. In such cases maximum likelihood estimation (MLE) may be implemented using numerical integration, but the approach is computationally intensive. In contrast, the score matching method of Hyvärinen (2005) avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions on the m-dimensional Euclidean space R^m. However, the arguments underlying this score matching estimator may fail when the distributions in question are supported on a proper subset of R^m. Hyvärinen (2007) thus extended the approach to distributions supported on the non-negative orthant R^m_+.
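
As a brief reminder of why the normalizing constant drops out, the score matching objective on R^m can be sketched as follows. The symbols q, Z, p_0 and J are notation assumed here for illustration and do not appear in the abstract; the second form of J follows from integration by parts under suitable regularity and tail conditions.

```latex
% Model density with an intractable normalizing constant Z(\theta):
p(x;\theta) = \frac{q(x;\theta)}{Z(\theta)}, \qquad
\nabla_x \log p(x;\theta) = \nabla_x \log q(x;\theta).

% Score matching objective, with p_0 denoting the true density:
J(\theta) = \frac{1}{2}\,\mathbb{E}_{p_0}
  \bigl\| \nabla_x \log p(X;\theta) - \nabla_x \log p_0(X) \bigr\|_2^2 .

% Equivalent form after integration by parts (free of p_0 inside the bracket):
J(\theta) = \mathbb{E}_{p_0} \sum_{j=1}^{m}
  \Bigl[ \partial_j^2 \log p(X;\theta)
       + \tfrac{1}{2} \bigl( \partial_j \log p(X;\theta) \bigr)^2 \Bigr]
  + \text{const}.
```

The integration-by-parts step relies on boundary terms vanishing. On a proper subset of R^m, such as the non-negative orthant, these terms need not vanish at the boundary, which is why the original argument can fail there.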

In this talk, we give a generalized form of the score matching estimator for non-negative data that improves estimation efficiency. We show that for exponential-family distributions, the generalized loss is still quadratic in the parameter, thus giving a simple closed-form solution in the low-dimensional setting. In high-dimensional settings, we generalize the regularized score matching method of Lin et al. (2016) for non-negative Gaussian graphical models, and fix an issue overlooked in that paper, namely that the score matching loss may be unbounded from below. In both low- and high-dimensional settings, we show that the generalized estimator enjoys improved theoretical guarantees. We also show simulation results for high-dimensional non-negative Gaussian graphical models.
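
To illustrate how a quadratic loss leads to a closed-form estimate, here is a minimal one-dimensional sketch. It assumes a density proportional to exp(-theta * x^2 / 2) on x >= 0 and a weighted loss of the form E[h'(x) s(x) + h(x) s'(x) + h(x) s(x)^2 / 2], where s is the model score and h is a non-negative weight function with h(0) = 0. The weight function h, the function name and the toy model are assumptions made for illustration, not details given in the abstract; choosing h(x) = x^2 corresponds to the non-negative score matching of Hyvärinen (2007).

```python
import numpy as np


def generalized_score_matching_precision(x, h, h_prime):
    """Closed-form minimizer for the toy model p(x; theta) ∝ exp(-theta x^2 / 2), x >= 0.

    With score s(x) = -theta * x and s'(x) = -theta, the empirical loss is
        -theta * mean(x h'(x) + h(x)) + 0.5 * theta^2 * mean(h(x) x^2),
    which is quadratic in theta, so the minimizer is a ratio of sample means.
    """
    x = np.asarray(x, dtype=float)
    linear_term = np.mean(x * h_prime(x) + h(x))
    quadratic_term = np.mean(h(x) * x**2)
    return linear_term / quadratic_term


# Simulated half-normal data; the true parameter is theta = 1 / sigma^2.
rng = np.random.default_rng(0)
sigma = 2.0
x = np.abs(rng.normal(0.0, sigma, size=200_000))

# h(x) = x^2: the weighting used in non-negative score matching.
theta_x2 = generalized_score_matching_precision(
    x, lambda t: t**2, lambda t: 2.0 * t
)

# h(x) = min(x, 1): a hypothetical bounded alternative weight.
theta_min = generalized_score_matching_precision(
    x, lambda t: np.minimum(t, 1.0), lambda t: (t < 1.0).astype(float)
)

print(theta_x2, theta_min, 1.0 / sigma**2)
```

Both estimates target the same parameter; only their sampling variability depends on h, which is the sense in which the choice of weight can affect estimation efficiency.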

At the end of the talk, we discuss a directed graphical model for zero-inflated observations. The model is specified by conditional densities from a multivariate Hurdle model that can be thought of as mixtures of singular Gaussian distributions. The topological ordering can be estimated by a recursive procedure, and edges can be estimated using penalized regressions with group lasso as in McDavid et al. (2016). We present current progress on this project, along with some simulation results.