University of Oxford - Department of Statistics
Modern molecular genetics generates extensive data documenting the genetic variation in natural populations. Such data give rise to challenging statistical inference problems, both for the underlying evolutionary parameters and for the demographic history of the population. These problems are of considerable practical importance and have attracted recent attention, with the development of algorithms based on importance sampling (IS) and Markov chain Monte Carlo (MCMC). We begin our talk by introducing some of the models relevant to the study of molecular population genetic data and then describe perhaps the simplest inference problem, which can be viewed as a "missing data" problem in which the dimension of the missing data is huge. We introduce a novel IS scheme for dealing with this missing data and compare it with existing MCMC and IS schemes on a variety of genetic examples. These comparisons suggest some general insights into the design of efficient methods for inference in problems with high-dimensional missing data.