Suppose we have a graphical model with sample observations of only a subset of the variables. Can we separate the extra correlations induced by marginalization over the unobserved, hidden variables from the structure among the observed variables? In other words, is it still possible to consistently perform model selection despite the unobserved, latent variables? As we shall see, the key problem that arises (in Gaussian models) is one of decomposing the concentration matrix of the observed variables into a sparse matrix (representing graphical model structure among the observed variables) and a low-rank matrix (representing the effects of marginalization over the hidden variables). Such a decomposition can be accomplished by an estimator that is given by a tractable convex program. This estimator performs consistent model selection in the high-dimensional scaling regime in which the number of observed and hidden variables grows with the number of samples of the observed variables. The geometric aspects of our approach are highlighted, with the algebraic varieties of sparse matrices and of low-rank matrices playing an important role. We also demonstrate the empirical utility of our approach in analyzing the effects of recent drought conditions on California reservoir levels.
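To illustrate the sparse-plus-low-rank structure described above, the following is a minimal numpy sketch (with an illustrative chain graph and randomly generated hidden-variable couplings, not any model from the talk) of how marginalizing a Gaussian over hidden variables yields the decomposition: the concentration matrix of the observed variables is the Schur complement of the hidden block, i.e. a sparse term minus a correction whose rank is at most the number of hidden variables.

```python
import numpy as np

# Partition the full concentration (inverse covariance) matrix over
# observed variables O and hidden variables H as
#   K = [[K_OO, K_OH],
#        [K_HO, K_HH]].
# Marginalizing over H gives the concentration matrix of O alone:
#   K_OO - K_OH @ inv(K_HH) @ K_HO,
# a sparse term minus a low-rank correction.

rng = np.random.default_rng(0)

p, h = 8, 2  # numbers of observed and hidden variables (illustrative)

# Sparse concentration matrix among the observed variables:
# a tridiagonal matrix, i.e. a chain graph (purely illustrative).
K_OO = 2.0 * np.eye(p)
for i in range(p - 1):
    K_OO[i, i + 1] = K_OO[i + 1, i] = 0.4

# Hidden-block concentration and observed-hidden couplings (illustrative).
K_HH = 2.0 * np.eye(h)
K_OH = rng.normal(scale=0.3, size=(p, h))

# Schur complement: marginal concentration of the observed variables.
correction = K_OH @ np.linalg.inv(K_HH) @ K_OH.T
K_marg = K_OO - correction

# The correction has rank at most h, while K_marg is generically dense;
# the estimation problem is to recover K_OO and the correction from K_marg.
print(np.linalg.matrix_rank(correction))  # at most h
```

The convex estimator in the talk runs this reasoning in reverse: given samples of the observed variables only, it recovers the sparse and low-rank pieces by trading off an l1 penalty (promoting sparsity) against a nuclear-norm/trace penalty (promoting low rank).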
Joint work with Armeen Taeb (Caltech); John Reager, Michael Turmon (JPL); Pablo Parrilo, Alan Willsky (MIT)