Advisor: Vladimir Minin
The field of phylodynamics seeks to estimate fluctuations in effective population size from molecular sequences of individuals sampled from the population of interest. One way to accomplish this task is to formulate a likelihood for the observed sequence data by using a coalescent model for the sampled individuals' genealogy and then either integrating over all possible genealogies via Monte Carlo or, less efficiently, conditioning on one genealogy estimated from the sequence data. These strategies also work when molecular sequences are sampled serially through time. However, when analyzing serially sampled data, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Using a simulation study, we show that when sampling times depend probabilistically on the population size trajectory, estimates of effective population size may be systematically biased. We propose a new model that explicitly takes preferential sampling into account. We demonstrate that, in the presence of preferential sampling, our new model not only eliminates bias but also improves estimation precision. Finally, we compare the performance of currently used phylodynamic methods with that of our proposed model using multiple data sets that consist of molecular sequences of seasonal human influenza.