Models of network data have attracted a surge of interest in statistics and related areas. Such data arise in the study of insurgent and terrorist networks, contact networks facilitating the spread of infectious diseases, social networks, the World Wide Web, and other areas.
A powerful approach to modeling network data is the class of discrete exponential-family random graph models, which are inspired by, and related to, discrete exponential families in physics (e.g., Ising models) and spatial statistics (e.g., Markov random fields). However, in the past decade, some of the most interesting exponential-family random graph models have turned out to possess undesirable properties, and statistical inference for them is not meaningful.
The main idea of my talk is that exponential-family random graph models lack structure and should be endowed with additional structure to facilitate statistical inference. As an example, I discuss exponential-family random graph models endowed with additional structure in the form of neighborhood structure, which induces local dependence. Exponential-family random graph models with local dependence make sense in applications and have both theoretical and computational advantages.
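To make the idea of local dependence concrete, here is one common way to write it down. The notation is an assumption introduced for illustration and does not appear in the abstract: $x$ is the observed graph, $s(x)$ a vector of sufficient statistics, $\theta$ the natural parameter, $\psi(\theta)$ the log-normalizing constant, and $A_1, \dots, A_K$ a partition of the nodes into neighborhoods, with $X_{kl}$ denoting the edges between $A_k$ and $A_l$.

```latex
% Generic exponential-family random graph model
\mathbb{P}_\theta(X = x) = \exp\bigl(\langle \theta, s(x) \rangle - \psi(\theta)\bigr)

% Local dependence (sketch): dependence is confined to within-neighborhood
% subgraphs X_{kk}; between-neighborhood subgraphs X_{kl}, l < k, are
% independent of them and of each other
\mathbb{P}_\theta(X = x)
  = \prod_{k=1}^{K} \mathbb{P}_\theta(X_{kk} = x_{kk})
    \prod_{l < k} \mathbb{P}_\theta(X_{kl} = x_{kl})
```

Under a factorization of this kind, complex dependence such as transitivity is restricted to the within-neighborhood factors, which is what gives the model the additional structure referred to above.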
On the theoretical side, I show that when the neighborhood structure is known, M-estimators of canonical and curved exponential-family random graph models with local dependence and growing neighborhoods are consistent under weak conditions. These are the first consistency results for a wide range of canonical and curved exponential-family random graph models with complex dependence, such as transitivity. In practice, the neighborhood structure is known in some applications (e.g., in multilevel networks), but is unknown in others. If the neighborhood structure is unknown, the first and foremost question is whether it can be recovered with high probability. I show that it is possible to do so as long as the data-generating exponential-family random graph model satisfies weak dependence and smoothness conditions.
On the computational side, I discuss a two-step likelihood-based approach for estimating the neighborhood structure and the parameters. The first step estimates the neighborhood structure by using approximations of the likelihood function. The second step estimates parameters given the estimated neighborhood structure. Both steps can be implemented in parallel and can hence be applied to large networks. I demonstrate the advantages of the two-step likelihood-based approach by simulations and applications to large social networks.
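The reason the second step parallelizes can be illustrated with a small sketch. It is not the estimator discussed in the talk. It only shows that, once neighborhood labels are available, within-neighborhood statistics can be computed independently for each neighborhood. The function names, the `labels` dictionary (standing in for the output of the first step), and the choice of edge and triangle counts as statistics are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def neighborhood_stats(edges, members):
    """Edge and triangle counts of the subgraph induced by one neighborhood."""
    members = set(members)
    adj = {v: set() for v in members}
    for u, v in edges:
        # Keep only edges with both endpoints inside the neighborhood.
        if u in members and v in members and u != v:
            adj[u].add(v)
            adj[v].add(u)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    n_triangles = sum(
        1
        for a, b, c in combinations(sorted(members), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return n_edges, n_triangles


def stats_by_neighborhood(edges, labels):
    """Compute statistics per neighborhood; each neighborhood is an independent task.

    `labels` maps node -> neighborhood id (hypothetical output of step one).
    """
    groups = {}
    for node, k in labels.items():
        groups.setdefault(k, []).append(node)
    keys = sorted(groups)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda k: neighborhood_stats(edges, groups[k]), keys)
        return dict(zip(keys, results))


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
    labels = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
    print(stats_by_neighborhood(edges, labels))
```

Because each task touches only one neighborhood's induced subgraph, the work per task depends on neighborhood size rather than on the size of the whole network, so the thread pool here could be swapped for processes or separate machines.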