Advisors - Adrian Raftery and Ka Yee Yeung (UW Tacoma)
Abstract - The recent explosion in the availability of gene expression data has opened up new possibilities for advancing our understanding of the fundamental processes of life. To keep up with the increasing size of the datasets, new models and inference methods must be developed that can handle tens of thousands of genes efficiently, provide a systematic framework for integrating multiple data sources, and yield robust, accurate, and compact gene regulatory networks. In this thesis, I develop methods for different types of gene expression data that can handle large datasets and produce meaningful inferences about relationships between genes. When information from other data sources is available, these methods incorporate it in the form of an informative prior. I apply the methods to gene expression data from yeast and humans, as well as to synthetic benchmark data. I also examine data artifacts present in the LINCS L1000 data and propose a model-based clustering method for correcting them, and I show how the corrected data improve subsequent analyses. Finally, I propose a model for inferring network information from steady-state data. I prove some theoretical results about a constrained version of the model and explore the results of applying it to synthetic benchmark data from GeneNetWeaver.