Studying Statistics at the UW

Thinking about becoming a statistician?

Wondering what it is a statistician even does?

Curious what you might end up studying as a graduate student in the field of statistics?

Statistics is everywhere, as witnessed by the interesting and exciting research projects underway at the University of Washington Statistics Department. Below you'll find some of the topics that graduate students and faculty are currently working on. Areas of work include mathematical statistics, environmental statistics, statistical genetics, social sciences, finance, and artificial intelligence. If you are preparing for undergraduate studies, look at our ACMS Program. If you are thinking of graduate school, take a look at some of the research projects described below, at our faculty home pages, and at the current research opportunities for our graduate students.

The NSF VIGRE program supports undergraduate students, graduate students, and postdocs who want to pursue careers in statistics and the mathematical sciences by creating an exciting research and learning environment.

Finally, read on to learn about what life could be like once you have your degree. Graduates from the Statistics Department at UW work in industry and academia in a variety of fields including business, physics, medicine, ecology, public policy, earth science, and biology.


Spectral clustering in graphs

Marina Meila

Spectral clustering aims to group data points based on their similarity. Spectral methods discover the grouping by examining the eigenvalues and eigenvectors of a matrix formed from the pairwise similarities between the data points. It has been shown that there is a strong connection between similarity matrices and transition matrices of Markov chains. The research problems that arise are numerous, and all extremely interesting. What as-yet-unused information is there in the eigenvectors? (We have reason to believe that current algorithms do not use all the available information.) How can we use this information to derive better clustering algorithms? How can we find the number of clusters? Can we make the algorithms efficient for very large data sets? Can we use machine learning to obtain a good similarity function? How can we tailor spectral methods to specific applications such as text clustering, analysis of biological sequences, image analysis, and the study of web communities?
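To make the link between similarities, Markov chains, and eigenvectors concrete, here is a minimal sketch of a two-way spectral split. It is an illustration under stated assumptions, not the algorithm studied in this project: the Gaussian similarity, the bandwidth `sigma`, the function name `spectral_bipartition`, and the use of plain power iteration (chosen only to keep the example free of external libraries) are all choices made here.

```python
import math

def spectral_bipartition(points, sigma=1.0, iters=200):
    """Split points into two groups using the second eigenvector of the
    random-walk transition matrix P = D^{-1} S (S = Gaussian similarities).
    Assumes at least two points that do not all coincide."""
    n = len(points)
    # Pairwise Gaussian similarities; sigma is an assumed bandwidth.
    S = [[math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points]
         for p in points]
    degree = [sum(row) for row in S]
    # M = D^{-1/2} S D^{-1/2} is symmetric and has the same eigenvalues as P.
    M = [[S[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # The top eigenvector of M (eigenvalue 1) is proportional to sqrt(degree).
    total = math.sqrt(sum(degree))
    top = [math.sqrt(d) / total for d in degree]

    def remove_top(v):
        # Project v onto the complement of the known top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        return [a - dot * b for a, b in zip(v, top)]

    def normalize(v):
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v]

    def apply_M(v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    # Power iteration on the complement of `top` converges to the second
    # eigenvector, because M is positive semidefinite for a Gaussian kernel.
    v = normalize(remove_top([float(i + 1) for i in range(n)]))
    for _ in range(iters):
        v = normalize(remove_top(apply_M(v)))
    eigenvalue = sum(a * b for a, b in zip(v, apply_M(v)))
    # Map back to an eigenvector of P and assign clusters by sign.
    second = [v[i] / math.sqrt(degree[i]) for i in range(n)]
    labels = [1 if x > 0 else 0 for x in second]
    return labels, eigenvalue

if __name__ == "__main__":
    pts = [[0, 0], [0.2, 0], [0, 0.2], [3, 3], [3.2, 3], [3, 3.2]]
    print(spectral_bipartition(pts))
```

A second eigenvalue close to 1 indicates that a random walk on the data rarely crosses between the two groups, which is one way to read the Markov chain connection mentioned above. Which group receives label 0 or 1 is arbitrary, since an eigenvector is only defined up to sign.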


Image Analysis of Gene Expression Data

Raphael Gottardo, Julian Besag, Matthew Stephens, Alejandro Murua

DNA microarrays are an increasingly important tool that allows biologists to gain insight into the function of thousands of genes in a single experiment. Image analysis is critical in interpreting the results of these experiments. Statisticians analyze images like the one at right and work on problems such as identifying the spot locations, classifying pixels as signal or background, and estimating the signal and background intensities for each spot.


Visualizing Uncertainty in Weather Prediction

Veronica Berrocal, J. McLean Sloughter, Michael Polakowski, Tilmann Gneiting, Adrian E. Raftery

Current methods of weather prediction produce forecasts with unknown levels of uncertainty. Statisticians are working with atmospheric scientists and physicists to develop methods for estimating the uncertainty in weather predictions. They also work with psychologists to create tools for visualizing the uncertainty in these predictions and develop images like the one at left.


Estimating the Ages of Whales

Judy Zeh

Alaskan researchers hold a 13-foot baleen plate from a bowhead whale. UW research professor Judy Zeh, Statistics, and graduate student Susan Lubetkin, Quantitative Ecology and Resource Management, are developing statistical models for estimating ages of the whales from measurements of stable isotope ratios in the baleen.

A Semiparametric Regression Model for Panel Count Data

Jon A. Wellner, Ying Zhang, Hao Liu

For a counting process whose mean function is conditional on a vector of covariates, we study panel count data from the process. Our goal is to estimate the baseline mean function and the vector of regression parameters.
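A model of this kind can be written compactly. As a sketch, assume the proportional mean form that is commonly used for such problems; the symbols $N$, $Z$, $\Lambda_0$, and $\beta$ are notation introduced here for illustration and do not come from the text above:

```latex
E\{ N(t) \mid Z \} = \Lambda_0(t) \, \exp(\beta^{\top} Z)
```

Under this assumption, $N(t)$ is the number of events up to time $t$, $Z$ is the covariate vector, $\Lambda_0$ is the baseline mean function, and $\beta$ is the vector of regression parameters. With panel count data, $N$ is observed only at a finite set of examination times rather than continuously.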


Forecasting Wind Energy

Tilmann Gneiting

Statisticians working with the 3TIER Environmental Forecast Group Inc. are developing next-generation computer algorithms for forecasting the energy produced at large wind farms, like the one pictured at right. Current goals of the project focus on short-range wind forecasting, a critical issue for both system operators and energy power marketers. More accurate wind forecasts translate into higher system reliability and lower costs. Later research will focus on improving wind forecasts at longer lead times, an issue that will be of increasing importance as wind energy production levels increase in future years, especially in regions with significant hydropower assets.

Graphical Markov Models in Multivariate Analysis

Michael Perlman, Thomas Richardson, Sanjay Chaudhuri, Mathias Drton

A central aspect of statistical science is the assessment of dependence among stochastic variables. The familiar concepts of correlation, regression, and prediction are special cases, and identification of causal relationships ultimately rests on representations of multivariate dependence. Graphical Markov models (GMMs) use graphs, whether undirected, directed, or mixed, to represent multivariate dependences in a visual and computationally efficient manner. A GMM is usually constructed by specifying local dependences for each variable (equivalently, each node of the graph) in terms of its immediate neighbors and/or parents by means of undirected and/or directed edges. This simple local specification can represent a highly varied and complex system of multivariate dependences by means of the global structure of the graph, thereby obtaining efficiency in modeling, inference, and probabilistic calculations. For a fixed graph (equivalently, a fixed model), the classical methods of statistical inference may be used. In many applied domains, however, such as expert systems for medical diagnosis or weather forecasting, or the analysis of gene-expression data, the graph is unknown and is itself the first goal of the analysis.


Modeling HIV and STDs in Drug User and Sexual Networks

Mark Handcock

Infectious diseases are distinguished from other diseases by being transmissible. Our understanding of disease transmission, and the preventive strategies that arise from such understanding, are therefore rooted in an implicit or explicit theory of population transmission dynamics. For infectious diseases such as STDs and BBIs, which are transmitted only through the exchange of bodily fluids, the structure of the transmission network plays a particularly critical role. The epidemiology of these diseases (how quickly they spread and who gets infected) is driven by the network of person-to-person contact. Mathematical models of this process have provided a number of insights that have led to changes in STD control strategies. With the advent of HIV, however, new modeling challenges have emerged. In this research we develop new models for drug user and sexual networks as a means to understand the factors that influence the spread of HIV and other STDs.

Separating Fractal Dimensions and Hurst Effect

Tilmann Gneiting

Fractal behavior and long-range dependence have been observed in a large number of physical, biological, geological, and socio-economic systems. Time series, profiles, and surfaces have been characterized by their fractal dimension, a measure of roughness, and by the Hurst coefficient, a measure of long-memory dependence. For self-similar processes, a linear relationship between fractal dimension and Hurst coefficient links local and global behavior. However, there are stochastic models that separate fractal dimension and the Hurst effect. In this display, the fractal dimension varies from left to right (D = 2.75, 2.5, 2) but is constant along columns. The Hurst coefficient varies from top to bottom (H = 0.9875, 0.9, 0.55) but is constant along rows. The images are simulated realizations of stationary Gaussian random fields, and were generated by the contributed package RandomFields within the R environment.
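The separation described above can be illustrated with a short sketch. The text does not name the model behind the display, so the Cauchy class of correlation functions is assumed here as one example of a model with this property; the function names and the parameters `alpha` and `beta` are introduced only for illustration. In this class, `alpha` controls the fractal dimension and `beta` controls the Hurst coefficient, whereas a self-similar process in n dimensions ties the two together through D = n + 1 - H.

```python
def cauchy_correlation(h, alpha, beta):
    """Cauchy-class correlation c(h) = (1 + |h|^alpha)^(-beta / alpha)."""
    return (1.0 + abs(h) ** alpha) ** (-beta / alpha)

def fractal_dimension(alpha, n=2):
    """Fractal dimension D = n + 1 - alpha / 2, for 0 < alpha <= 2."""
    return n + 1 - alpha / 2.0

def hurst_coefficient(beta):
    """Hurst coefficient H = 1 - beta / 2, for 0 < beta < 1."""
    return 1.0 - beta / 2.0

def self_similar_dimension(hurst, n=2):
    """For self-similar processes, D is fixed by H: D = n + 1 - H."""
    return n + 1 - hurst

if __name__ == "__main__":
    # Under the assumed model, these parameter values reproduce the
    # D and H values quoted for the display (surfaces, so n = 2).
    for alpha in (0.5, 1.0, 2.0):
        print("alpha =", alpha, "D =", fractal_dimension(alpha))
    for beta in (0.025, 0.2, 0.9):
        print("beta =", beta, "H =", hurst_coefficient(beta))
```

Because `alpha` and `beta` can be chosen independently, any of the listed D values can be paired with any of the listed H values, which a self-similar model does not allow.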


Center for Statistics and the Social Sciences

There are a number of exciting research projects underway at the University of Washington Center for Statistics and the Social Sciences. Topics of some current projects include:

  • Marriage and Assortative Mating
  • Hybrid Population-Average and Individual-Specific Models for Clustered Longitudinal Data
  • Model-Based Clustering Methods for Medical Images
  • The Dimensions of Supreme Court Decision-Making, 1946-2000

Where to go from here?

Statisticians work in a wide range of fields, applying their analytical skills to cutting-edge problems across many disciplines. The UW Statistics alumni page contains a list of over 100 former graduate students who have provided us with information. Also, take a look at some of the things that UW Statistics alumni are working on right now:

Academia
Moulinath Banerjee, Assistant Professor in the Department of Statistics at the University of Michigan.
Samantha Bates, Assistant Professor in the Department of Statistics at Virginia Tech.
Sandra Catlin, Associate Professor in the Department of Mathematical Sciences at the University of Nevada.
Biology
Abhijit Dasgupta works in the Division of Cancer Epidemiology and Genetics at the National Cancer Institute.
Robert Gentleman has worked on the Bioconductor project.
Hongzhe Li is Associate Professor of Statistics and Human Genetics at the University of California, Davis School of Medicine.
Business
David Poole works in the Statistics Group at AT&T Labs.
Kristian Windfeld is a senior researcher at Novo Nordisk.
Jeremy York is with Amazon.com.
Consulting
Patrick Burns founded the Burns Company.
Chris Pounds is with the consulting company ZS Associates.
Earth Sciences
Enrica Bellone works on the Geophysical Statistics Project.
Ecology
David Higdon works on statistical problems in ecology, among many other things, at Los Alamos National Laboratory.
Jennifer Hoeting examines ecological issues in her research at the Department of Statistics at Colorado State University.
She writes, "Statistics is a great career because you learn about other fields of science while doing statistics. In my research I've learned a great deal about diverse areas of study, particularly ecology. For example, I've learned about monitoring rivers and streams for pollution, monitoring of rare species, the biology of chronic wasting disease in deer, and how dams impact sandbars in river ecosystems. It is rewarding to develop new methods to help scientists study these challenging problems."
Medicine
Hannah Payne is a statistician with Novartis Pharmaceuticals.
Ping Gao is Director of Biostatistics at The Medicines Company.
Brandon Whitcher works in the Research Statistics Unit at GlaxoSmithKline.
Mike Kahn works with Cancer Center Statistics at the Mayo Clinic.
Physics
Don Percival is a Principal Mathematician at the Applied Physics Laboratory at the University of Washington.
Public Policy
Gregory Ridgeway is a statistician in the RAND Statistics Group.
Statistical Software
Robert Gentleman has worked on the R project.
Derek Stanford works for the Insightful Corporation.