University of Washington - Department of Statistics
Advisor: Vladimir Minin
Changes in population size influence the genetic diversity of a population and, as a result, leave imprints in the genomes of its individuals. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as the glue between the population's demographic history and the genomic sequences. It turns out that the times at which genealogical lineages coalesce contain all the information about population size dynamics. Viewing these coalescent times as a point process, estimating a population size trajectory is equivalent to estimating the conditional intensity of the coalescent point process. We build upon recent work by Adams, Murray and MacKay (2009), who devise a clever data augmentation MCMC algorithm for Bayesian nonparametric inference of an inhomogeneous Poisson process intensity function using a sigmoidal Gaussian process prior. We demonstrate how to extend the data augmentation of Adams et al. (2009) to nonparametric estimation of population size dynamics under the coalescent. Like the MCMC algorithm of Adams et al. (2009), our algorithm is exact in the sense that it does not require finite-dimensional approximations. We validate our method using simulated and real data and compare our approach to competing Gaussian Markov random field smoothing and change-point model methods.
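
To make the point-process view concrete, the following is a minimal sketch of how coalescent times can be simulated under a time-varying population size by thinning, the same device that underlies the data augmentation of Adams et al. (2009). It is a forward simulation only, not the inference algorithm described above. The function name, its arguments, and the assumption that the population size is bounded below by a known constant are illustrative choices, not taken from the original work. With k lineages, the sketch takes the coalescent intensity at time t to be C(k,2) / N(t), where N(t) is the population size.

```python
import math
import random


def simulate_coalescent_times(n, pop_size, pop_size_lower_bound, rng):
    """Simulate coalescent times for n lineages sampled at time 0.

    Time runs backward from the present. `pop_size` is a hypothetical
    callable t -> N(t), assumed to satisfy N(t) >= pop_size_lower_bound > 0.
    """
    t = 0.0
    times = []
    k = n  # current number of lineages
    while k > 1:
        pairs = k * (k - 1) / 2.0
        # Constant upper bound on the intensity pairs / N(t).
        max_rate = pairs / pop_size_lower_bound
        while True:
            # Propose the next event from a homogeneous process at max_rate.
            t += rng.expovariate(max_rate)
            # Keep it with probability intensity(t) / max_rate.
            if rng.random() < pop_size_lower_bound / pop_size(t):
                break
        times.append(t)
        k -= 1  # two lineages merge into one
    return times


# Example: a population size oscillating between 0.5 and 1.5.
rng = random.Random(1)
times = simulate_coalescent_times(
    5, lambda t: 1.0 + 0.5 * math.sin(t), 0.5, rng
)
print(times)
```

The rejected proposals in the inner loop play the role of the latent "thinned" points; augmenting the observed coalescent times with such points is what allows inference without discretizing the intensity.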