Harvard University - Assistant Professor of Biostatistics
Joint work with Elizaveta Levina, George Michailidis, and Ji Zhu
Gaussian graphical models explore dependence relationships between random variables, through estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method which jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its superior performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is also included. The proposed method is extended to characterize the heterogeneous dependence structures arising from categorical data. We apply the extended method to describe the internal network of the US Senate on several important issues, and show that there is individual structure for each issue as well as the underlying well-known bipartisan structure common to all categories.