Applications of statistical machine learning increasingly involve datasets with rich hierarchical, temporal, spatial, or relational structure. Bayesian nonparametric models offer the promise of effective learning from big datasets, but standard inference algorithms often fail in subtle and hard-to-diagnose ways. We explore this issue via variants of a popular and general model family, the hierarchical Dirichlet process. We propose a framework for "memoized" online optimization of variational learning objectives, which achieves computational scalability by processing local batches of data, while simultaneously adapting the global model structure in a coherent fashion. Using this approach, we build improved models of text, image, motion capture, and social network data.