In this talk, I'd like to discuss the intertwining importance and connections of three principles of data science in the title and the PCS workflow that is built on the three principles. The principles will be demonstrated in the context of two collaborative projects in neuroscience and genomics for interpretable data results and testable hypothesis generation. If time allows, I will present proposed PCS inference that includes perturbation intervals and PCS hypothesis testing. The PCS inference uses prediction screening and takes into account both data and model perturbations.
In a seminal paper, Robins (1998) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. He established identification of MSM parameters under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time.
Note 2/7/2018: We are canceling this seminar as a precaution in anticipation of the expected Winter storm.
As the pace and scale of data collection continues to increase across all areas of biology, there is a growing need for effective and principled statistical methods for the analysis of the resulting data. In this talk, I'll describe two ongoing projects to help fill this gap.
A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse-Bayes technique with prior-predictive tail probabilities to define replication success. The method gives rise to a quantitative measure for replication success, called the sceptical p-value. The sceptical p-value integrates traditional significance of both the original and replication study with a comparison of the respective effect sizes.
Did you know that your skills in statistics can be applied to ensure natural resources, such as fish, wildlife and even ecosystems, remain resilient into the future? That your love of algebra can take you to wild, remote, and amazing places? That there are careers where you get to collaborate with a wide variety of dedicated scientists working to better understand the world, how it is changing, and what it will be like in the future?
In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or ``normal" behavior. In this talk, I will first discuss a principled Bayesian approach for estimating time varying functional connectivity networks from brain fMRI data. Dynamic functional connectivity, i.e., the study of how interactions among brain regions change dynamically over the course of an fMRI experiment, has recently received wide interest in the neuroimaging literature.
The asymptotics of the second-largest eigenvalue in random regular graphs (also referred to as the "Alon conjecture") have been computed by Joel Friedman in his celebrated 2004 paper. Recently, a new proof of this result has been given by Charles Bordenave, using the non-backtracking operator and the Ihara-Bass formula. In the same spirit, we have been able to translate Bordenave's ideas to bipartite biregular graphs in order to calculate the asymptotical value of the second-largest pair of eigenvalues, and obtained a similar spectral gap result.
Non-Gaussian spatial data arise in a number of disciplines. Examples include spatial data on disease incidences (counts), and satellite images of ice sheets (presence-absence). Spatial generalized linear mixed models (SGLMMs), which build on latent Gaussian processes or Markov random fields, are convenient and flexible models for such data and are used widely in mainstream statistics and other disciplines. For high-dimensional data, SGLMMs present significant computational challenges due to the large number of dependent spatial random effects.