University of California, Los Angeles - Department of Biomathematics
Signatures of spatial variation, left by evolutionary processes in genomic sequences, provide important information about the function and structure of genomic regions. I discuss statistical methods for detecting such signatures in a Bayesian framework. I start with phylogenetic analysis of recombination in the HIV genome. I present a recombination detection method that allows accurate estimation of recombination break-points from a molecular sequence alignment. Using a dual multiple change-point (DMCP) model, the method simultaneously accounts for discrepancies in phylogenies caused by recombination and for spatial variation in evolutionary pressure across the alignment. Next, I turn to mapping recombination hot-spots in the HIV genome. Based on the DMCP model, I build a hierarchical framework for simultaneous inference of break-point locations and spatial variation in recombination frequency from multiple putative recombinant sequences. The model allows information about spatial preferences of recombination to be shared among individual datasets. To overcome the sparseness of break-point data, dictated by the modest number of available recombinant sequences, I impose a biologically relevant a priori correlation structure on recombination location log-odds via a Gaussian Markov random field. Applied to HIV sequences from several epidemiological studies, this approach reveals a previously unknown recombination hot-spot. I conclude with posterior predictive model diagnostics for locating spatial patterns of variation in genomic sequences. Evolutionary biologists routinely use the number of a priori labeled mutations as a discrepancy measure for informal model diagnostics. However, such measures remain underutilized in formal statistical tests of evolutionary hypotheses because computing probabilistic properties of evolutionary counting processes in a model-based framework is a formidable task.
I take an algorithmic probability approach that allows exact and efficient computation of certain properties of evolutionary counting processes. I demonstrate that these properties allow detection of recombination and of periodic patterns of mutational rate variation in nucleotide sequence alignments.