EEB


Can machine learning survive the artificial intelligence revolution?

Time
Speaker
Francis Bach

Data and algorithms are ubiquitous in all scientific, industrial and personal domains. Data now come in multiple forms (text, image, video, web, sensors, etc.), are massive, and require increasingly complex processing, such as recognizing objects in images or translating texts, that goes beyond mere indexing or the computation of simple statistics.

Building
Room
105

Interpretable Prediction Models for Network-Linked Data

Time
Speaker
Liza Levina

Prediction problems typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects’ social networks is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account should allow us to improve prediction.

Building
Room
125

Applied Random Matrix Theory

Time
Speaker
Joel Tropp

Random matrices now play a role in many areas of theoretical, applied, and computational mathematics. Therefore, it is desirable to have tools for studying random matrices that are flexible, easy to use, and powerful. Over the last fifteen years, researchers have developed a remarkable family of results, called matrix concentration inequalities, that balance these criteria.

Building
Room
125

Some Algorithmic Challenges in Statistics: Convexity, Non-Convexity, and Depth

Time
Speaker
Sham Kakade

Faculty Host: Carlos Guestrin
Stat Liaison: Emily Fox

Abstract: Society is witnessing remarkable technological and scientific advances as numerous disciplines adopt more advanced statistical and computational methodologies. Along with this progress comes an increasing need for scalable algorithms with solid theoretical foundations; the hope is that algorithms which address efficiency (with regard to both statistical and computational perspectives) can further facilitate breakthroughs.

Building
Room
105

Functional Quantitative Genetics and the Missing Heritability Problem

Time
Speaker
Serge Sverdlov

In classical quantitative genetics, the correlation between the phenotypes of individuals with unknown genotypes and a known pedigree relationship is expressed in terms of probabilities of identity-by-descent (IBD) states. In existing models of the inverse problem, where genotypes are observed but pedigree relationships are not, probabilities and correlations have either a Bayesian or a hybrid interpretation. We introduce IBF (Identity by Function), a generative evolutionary model of the inverse problem based on the classic infinite allele mutation process.

Building
Room
042

Bayesian Structured Sparsity for Genetic Association Mapping

Time
Speaker
Barbara Engelhardt

In genomic sciences, the amount of data has grown faster than the statistical methodologies needed to analyze those data. Furthermore, the complex underlying structure of these data means that simple, unstructured statistical models do not perform well. We consider the problem of identifying multiple, functionally independent, co-localized genetic regulators of gene transcription. Sparse regression techniques have been critical to multi-SNP association mapping because of their computational tractability in large data settings.

Building
Room
037

Feature allocations, probability functions, and paintboxes

Time
Speaker
Tamara Broderick

Clustering involves placing entities into mutually exclusive categories. We wish to relax the requirement of mutual exclusivity, allowing objects to belong simultaneously to multiple classes, a formulation that we refer to as "feature allocation." The first step is a theoretical one. In the case of clustering, the class of probability distributions over exchangeable partitions of a dataset has been characterized (via exchangeable partition probability functions and the Kingman paintbox).

Building
Room
105

Adapting group sequential methods to observational drug and vaccine safety surveillance studies using large electronic healthcare data

Time
Speaker
Jennifer Nelson

Gaps in medical product safety evidence have spurred the development of new national post-licensure systems that prospectively monitor large observational cohorts of health plan enrollees. These multi-site systems, which include CDC’s Vaccine Safety Datalink (VSD) and FDA’s Mini-Sentinel (MS) Pilot Program for the Sentinel Initiative, attempt to leverage the vast amount of administrative and clinical information that is captured during the course of routine medical care and contained within computerized health plan databases.

Building
Room
037