We discuss several problems related to the challenge of making accurate inferences about a complex phenomenon given relatively little data. We show that in several fundamental and practically relevant settings, including estimating the intrinsic dimensionality of a high-dimensional distribution and learning a population of distributions given few data points from each distribution, it is possible to significantly "denoise" the empirical distribution.
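As a minimal sketch of what "denoising" can mean in the population-learning setting, suppose (an assumption for illustration) that each of many coins with unknown bias p is flipped only t times. The variance of the empirical frequencies x/t overstates the variance of the biases by the binomial noise E[p(1-p)]/t. Because E[X(X-1)/(t(t-1))] = p^2 for X ~ Binomial(t, p), averaging that statistic estimates the second moment of the population without the noise. This illustrates a moment-based idea only and is not presented as the method of the talk; the helper `population_variance` and the simulated population are hypothetical.

```python
import numpy as np

def population_variance(x, t):
    """Hypothetical helper: x holds one heads count per coin, each coin flipped t >= 2 times.

    Returns (naive, denoised) estimates of the variance of the coin biases.
    """
    freq = x / t
    naive = np.var(freq)                          # includes binomial sampling noise
    m1 = np.mean(freq)                            # unbiased for E[p]
    m2 = np.mean(x * (x - 1) / (t * (t - 1)))     # unbiased for E[p^2]
    return naive, m2 - m1 ** 2                    # plug-in estimate of Var(p)

# Assumed population: biases uniform on [0.2, 0.8], so Var(p) = 0.6**2 / 12 = 0.03.
rng = np.random.default_rng(0)
p = rng.uniform(0.2, 0.8, size=20_000)
x = rng.binomial(5, p)                            # only 5 flips per coin
naive, denoised = population_variance(x, 5)
print(f"naive {naive:.3f}, denoised {denoised:.3f}")
```

With these settings the naive estimate should land near 0.03 + E[p(1-p)]/5 ≈ 0.074, while the moment-based estimate should be close to the true 0.03.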
Fundamental to the study of inheritance is the partitioning of the total phenotypic variation into genetic and environmental components. Using twin studies, the phenotypic variance-covariance matrix can be parameterized to include an additive genetic effect and shared and non-shared environmental effects. The ratio of the additive genetic variance component to the total phenotypic variance is the proportion of phenotypic variation attributable to additive genetic effects and is termed the ‘narrow-sense heritability’.
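A short sketch of the classical twin (ACE) parameterization, with hypothetical variance components: monozygotic (MZ) twins share all additive genetic effects, dizygotic (DZ) twins share half of them on average, and both types share the common environment. Narrow-sense heritability is the additive share of the total variance, and Falconer's formula 2(r_MZ - r_DZ) recovers it from the twin correlations as a consistency check. The function name `twin_covariances` is illustrative only.

```python
import numpy as np

def twin_covariances(var_a, var_c, var_e):
    """ACE model: 2x2 phenotypic covariance matrices for MZ and DZ twin pairs."""
    total = var_a + var_c + var_e
    cov_mz = np.array([[total, var_a + var_c],
                       [var_a + var_c, total]])        # MZ twins share all additive effects
    cov_dz = np.array([[total, 0.5 * var_a + var_c],
                       [0.5 * var_a + var_c, total]])  # DZ twins share half on average
    return cov_mz, cov_dz

# Hypothetical components: additive genetic, shared and non-shared environment.
var_a, var_c, var_e = 0.5, 0.2, 0.3
cov_mz, cov_dz = twin_covariances(var_a, var_c, var_e)
h2 = var_a / (var_a + var_c + var_e)                    # narrow-sense heritability
r_mz = cov_mz[0, 1] / cov_mz[0, 0]                      # within-pair correlations
r_dz = cov_dz[0, 1] / cov_dz[0, 0]
falconer = 2 * (r_mz - r_dz)                            # Falconer's estimate of h2
print(h2, falconer)
```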
Functional data analysis has been increasingly used in biomedical studies, where the basic unit of measurement is a function, curve, or image. For example, in mobile health (mHealth) studies, wearable sensors collect high-resolution trajectories of physiological and behavioral signals over time. Functional linear regression models are useful tools for quantifying the association between functional covariates and scalar or functional responses, and a popular approach to fitting them is based on functional principal component analysis (FPCA).
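A minimal sketch of scalar-on-function regression via FPCA, assuming every curve is observed without noise on a common, equally spaced grid (a simplification; mHealth trajectories are usually noisy and sometimes irregular). The curves are centred, the discretized covariance is eigendecomposed to obtain eigenfunctions, each curve is summarized by its first K principal component scores, the response is regressed on those scores by least squares, and the coefficients are mapped back to a coefficient function β(t). The function `fpca_flr` and the simulated data are hypothetical.

```python
import numpy as np

def fpca_flr(X, y, t, n_components):
    """Hypothetical helper: scalar-on-function regression via FPCA.

    X: (n, T) curves on a common equally spaced grid t; y: (n,) scalar responses.
    Returns the estimated coefficient function beta_hat on the grid.
    """
    dt = t[1] - t[0]                         # quadrature weight for the equally spaced grid
    Xc = X - X.mean(axis=0)                  # centre the curves
    cov = Xc.T @ Xc / X.shape[0]             # discretized covariance surface
    evals, evecs = np.linalg.eigh(cov * dt)  # eigenproblem of the covariance operator
    order = np.argsort(evals)[::-1][:n_components]
    phi = evecs[:, order] / np.sqrt(dt)      # eigenfunctions with unit L2 norm
    scores = Xc @ phi * dt                   # FPC scores: integral of Xc(t) phi_k(t) dt
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return phi @ coef[1:]                    # beta_hat(t) = sum_k b_k phi_k(t)

# Simulated example (assumed): curves built from 4 sine basis functions.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
basis = np.array([np.sqrt(2) * np.sin(k * np.pi * t) for k in range(1, 5)])
true_scores = rng.normal(size=(300, 4)) * np.sqrt([4.0, 2.0, 1.0, 0.5])
X = true_scores @ basis
beta_true = 1.0 * basis[0] + 0.5 * basis[1]
y = X @ beta_true * (t[1] - t[0]) + rng.normal(scale=0.1, size=300)
beta_hat = fpca_flr(X, y, t, n_components=4)
rel_err = np.linalg.norm(beta_hat - beta_true) / np.linalg.norm(beta_true)
print(f"relative L2 error of beta_hat: {rel_err:.3f}")
```

Truncating at K components is the regularization: the coefficient function is only estimated within the span of the leading eigenfunctions, which is exact here only because the simulated curves live in a 4-dimensional space.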
Bayes Shrinkage at GWAS Scale: A Scalable Algorithm for the Horseshoe Prior with Theoretical Guarantees
The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models and has been shown to have minimax-optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimensions in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to the high computational cost per step and the poor mixing of existing MCMC algorithms.
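To make the per-step cost concrete, below is a sketch of a standard auxiliary-variable Gibbs sampler for the horseshoe in linear regression (the inverse-gamma mixture scheme of Makalic and Schmidt), written with NumPy. It is not the scalable algorithm of the talk; its β update factorizes a p × p matrix at every iteration, the O(p³) bottleneck that makes exact MCMC of this kind impractical at GWAS scale. The function names and simulated data are illustrative assumptions.

```python
import numpy as np

def inv_gamma(rng, shape, scale):
    """Draw from InvGamma(shape, scale): if G ~ Gamma(shape, rate=scale), return 1/G."""
    return 1.0 / rng.gamma(shape, 1.0 / scale)

def horseshoe_gibbs(X, y, n_iter=2000, burn=500, seed=0):
    """Illustrative Gibbs sampler for y ~ N(X beta, sigma2 I) with a horseshoe prior.

    beta_j ~ N(0, sigma2 * tau2 * lam2_j); lam_j and tau are half-Cauchy(0, 1),
    written as inverse-gamma mixtures with auxiliaries nu_j and xi;
    the prior on sigma2 is proportional to 1 / sigma2. Returns the posterior mean of beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    lam2, nu = np.ones(p), np.ones(p)
    tau2, xi, sigma2 = 1.0, 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}) with A = X'X + diag(1 / (tau2 lam2)).
        # This p x p Cholesky factorization costs O(p^3) per sweep.
        A = XtX + np.diag(1.0 / (tau2 * lam2))
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(L.T, np.linalg.solve(L, Xty))
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))

        resid = y - X @ beta
        penalty = np.sum(beta ** 2 / (tau2 * lam2))
        sigma2 = inv_gamma(rng, (n + p) / 2, (resid @ resid + penalty) / 2)
        lam2 = inv_gamma(rng, 1.0, 1.0 / nu + beta ** 2 / (2 * tau2 * sigma2))
        tau2 = inv_gamma(rng, (p + 1) / 2, 1.0 / xi + np.sum(beta ** 2 / lam2) / (2 * sigma2))
        nu = inv_gamma(rng, 1.0, 1.0 + 1.0 / lam2)
        xi = inv_gamma(rng, 1.0, 1.0 + 1.0 / tau2)
        if it >= burn:
            draws.append(beta)
    return np.mean(draws, axis=0)

# Hypothetical sparse example: 2 nonzero coefficients out of 20.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + rng.standard_normal(100)
beta_hat = horseshoe_gibbs(X, y)
print(np.round(beta_hat, 2))
```

For p = 20 each sweep is trivial, but the cubic cost of the β step grows quickly with p, which is one of the two obstacles (together with mixing) named above.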
Since their introduction into statistics through the seminal works of Julian Besag, Gaussian Markov random fields have become central to spatial statistics, with applications in agriculture, epidemiology, geology, image analysis and other areas of environmental science. Specified by a set of conditional distributions, these Markov random fields provide a very rich and flexible class of spatial processes, and their amenability to fast statistical computation, including Markov chain Monte Carlo, makes them very attractive to statisticians.
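The conditional specification is what makes Gaussian Markov random fields cheap to work with: for a Gaussian vector with precision matrix Q, each full conditional x_i | x_{-i} has variance 1/Q_ii and a mean that depends only on the sites j with Q_ij ≠ 0, i.e. the neighbours of i. The sketch below assumes a proper conditional autoregressive (CAR) model on a small rectangular lattice with hypothetical parameters `tau` and `rho`, and uses dense NumPy arrays for brevity; a practical implementation would store Q as a sparse matrix and use a sparse Cholesky factorization.

```python
import numpy as np

def lattice_adjacency(rows, cols):
    """Rook (4-neighbour) adjacency matrix W for a rows x cols grid, sites in row-major order."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:
                W[i, i + cols] = W[i + cols, i] = 1.0
            if c + 1 < cols:
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def car_precision(W, tau=1.0, rho=0.9):
    """Proper CAR precision Q = tau (D - rho W); positive definite when |rho| < 1."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

def full_conditional(Q, mu, x, i):
    """Mean and variance of x_i given all other sites, read directly off Q."""
    others = np.arange(len(x)) != i
    cond_mean = mu[i] - Q[i, others] @ (x[others] - mu[others]) / Q[i, i]
    return cond_mean, 1.0 / Q[i, i]

def sample_gmrf(Q, mu, rng):
    """Draw x ~ N(mu, Q^{-1}) from the Cholesky factor of the precision, Q = L L^T."""
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(len(mu))
    return mu + np.linalg.solve(L.T, z)  # Cov(L^{-T} z) = (L L^T)^{-1} = Q^{-1}

rng = np.random.default_rng(0)
Q = car_precision(lattice_adjacency(4, 5))
mu = np.zeros(Q.shape[0])
x = sample_gmrf(Q, mu, rng)
m, v = full_conditional(Q, mu, x, i=7)   # only the neighbours of site 7 contribute
print(m, v)
```

The same conditional mean and variance could be obtained by conditioning the joint covariance Q^{-1}, but that requires inverting a dense matrix; reading them off the sparse precision is what single-site Gibbs samplers exploit.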
This talk presents a variational framework for the asymptotic analysis of empirical risk minimization in general settings. In its most general form, the framework concerns a two-stage inference procedure. In the first stage, an average loss criterion is used to fit the trajectory of an observed dynamical system with a trajectory of a reference dynamical system. In the second stage, a parameter estimate is obtained from the optimal trajectory of the reference system.
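A toy instance of the two-stage procedure, under illustrative assumptions: the reference system is the hypothetical linear map x_{t+1} = θ x_t, so a reference trajectory is x_t = x_0 θ^t, and the loss is squared error averaged over time. Stage 1 searches a grid of θ values (with the least-squares x_0 in closed form for each θ) for the reference trajectory closest to the observations; stage 2 reads θ off that optimal trajectory. This is only a sketch of the structure described above, not the framework's general estimator, and the function names are hypothetical.

```python
import numpy as np

def reference_trajectory(theta, x0, n):
    """Hypothetical reference system x_{t+1} = theta * x_t, so x_t = x0 * theta**t."""
    return x0 * theta ** np.arange(n)

def two_stage_fit(y, theta_grid):
    n = len(y)
    best = None
    for theta in theta_grid:
        # Stage 1: for fixed theta, the least-squares initial state has a closed form.
        shape = reference_trajectory(theta, 1.0, n)
        x0 = (shape @ y) / (shape @ shape)
        loss = np.mean((y - x0 * shape) ** 2)   # average loss between trajectories
        if best is None or loss < best[0]:
            best = (loss, theta, x0)
    # Stage 2: the parameter estimate is read off the optimal reference trajectory.
    _, theta_hat, x0_hat = best
    return theta_hat, x0_hat

# Assumed data: a noisy observation of the reference system with theta = 0.9, x0 = 1.
rng = np.random.default_rng(1)
y = reference_trajectory(0.9, 1.0, 50) + rng.normal(scale=0.01, size=50)
theta_hat, x0_hat = two_stage_fit(y, np.arange(0.5, 1.0, 0.01))
print(theta_hat, x0_hat)
```

A grid search keeps the sketch transparent; with a richer reference family, stage 1 would typically be a numerical optimization over parameters and initial states.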
Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in the stability of statistical results under "reasonable" perturbations to the data and to the model used. The jackknife, the bootstrap, and cross-validation are based on perturbations to the data, while robust statistics deals with perturbations to the model. This talk makes the case for the importance of stability in statistics.