Seminar Details

Tuesday

Jun 4

11:00 am

Bayesian Nonparametric Inference of Effective Population Size Trajectories from Genomic Data

Julia A. Palacios Roman

Final Exam

Phylodynamics is an area at the intersection of phylogenetics and population genetics that aims to reconstruct population size trajectories from genetic data.
Phylodynamic methods rely on a standard framework based on the coalescent, a stochastic process that generates genealogies connecting individuals sampled at random from the population of interest. The shape of a genealogy is influenced by the effective population size trajectory, and under the coalescent framework the times at which genealogical lineages coalesce contain information about population size dynamics. We show that these coalescent times can be viewed as a realization of a point process, and that estimating population size trajectories is equivalent to estimating the conditional intensity of this coalescent point process. This thesis presents a Gaussian process-based Bayesian nonparametric approach to estimating effective population size trajectories. First, I summarize and discuss current approaches to estimation in phylodynamics. Next, I demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics when the genealogy is assumed fixed. I compare this Gaussian process (GP) approach with a state-of-the-art Gaussian Markov random field (GMRF) method for estimating population size trajectories. I then show that when a representative genealogy is available, perhaps estimated with one of the phylogenetic reconstruction methods, Markov chain Monte Carlo (MCMC) inference can be replaced by integrated nested Laplace approximation (INLA). This approximation, widely used in spatial statistics, recovers population size trajectories much faster than current MCMC-based methods. However, the INLA algorithm cannot be generalized to the more realistic setting in which one starts with molecular data instead of a genealogy.
Therefore, I return to MCMC to extend the GP approach to infer population size trajectories directly from molecular data.
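To make the point-process view concrete, here is a minimal Python sketch, not the thesis's implementation, of the standard coalescent log-likelihood for a fixed genealogy. It assumes isochronous sampling (all n lineages sampled at time 0), coalescence times measured backwards from the present, and an effective population size trajectory Ne(t) supplied as a vectorized function. While k lineages remain, coalescence occurs with intensity C_k / Ne(t), where C_k = k(k-1)/2, so each interval contributes log C_k - log Ne(t_k) - C_k times the integral of 1/Ne(u) over that interval. The function name `coalescent_loglik` and the `grid_points` parameter are hypothetical, and the integral is approximated with the trapezoidal rule. The GP prior and the MCMC or INLA machinery are not shown; a GP-based method would place a prior on Ne(t) (or a transformation of it) and combine it with a likelihood of this form.

```python
import math

import numpy as np


def coalescent_loglik(coal_times, ne, grid_points=1001):
    """Log-likelihood of coalescent times given an Ne(t) trajectory.

    Hypothetical helper for illustration. Assumes all n samples are taken
    at time 0 and that coal_times holds the n - 1 coalescence times,
    measured backwards from the present in strictly increasing order.
    `ne` must map a NumPy array of times to positive Ne values.
    """
    times = np.asarray(coal_times, dtype=float)
    if times.size == 0 or times[0] <= 0 or np.any(np.diff(times) <= 0):
        raise ValueError("coalescent times must be positive and strictly increasing")

    n = times.size + 1          # number of sampled lineages
    loglik = 0.0
    start = 0.0
    for i, end in enumerate(times):
        k = n - i                    # lineages present on (start, end]
        c_k = k * (k - 1) / 2.0      # binomial(k, 2) pairs that can coalesce
        # Trapezoidal approximation of the integral of 1 / Ne(u) over the interval.
        u = np.linspace(start, end, grid_points)
        ne_vals = np.asarray(ne(u), dtype=float)
        inv = 1.0 / ne_vals
        integral = float(np.sum((inv[1:] + inv[:-1]) * np.diff(u)) / 2.0)
        # Intensity at the coalescent event is c_k / Ne(end); linspace's last point is `end`.
        loglik += math.log(c_k) - math.log(ne_vals[-1]) - c_k * integral
        start = float(end)
    return loglik
```

For example, `coalescent_loglik([0.2, 1.0], lambda t: np.full_like(t, 2.0))` scores a three-sample genealogy under a constant Ne of 2 and returns log(3/4) - 0.7. Passing a time-varying function such as `lambda t: 10.0 * np.exp(-0.5 * t)` (Ne declining backwards in time, i.e. growth toward the present) scores the same coalescent times under a different trajectory, which is the quantity a nonparametric prior on Ne(t) would be combined with.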