Traditional statistical prediction methods are primarily concerned with avoiding overfitting to the given data set. On the other hand, there is a vast literature on the estimation of causal parameters for prediction under interventions. However, both types of estimators can perform poorly when used for prediction on heterogeneous data. We show that the change in loss under certain perturbations (interventions) can be written as a convex penalty. This motivates anchor regression, a “causal” regularization scheme that encourages the estimator to generalize well to perturbed data.
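As a rough illustration of the idea (not the paper's exact procedure), the anchor-regression estimator can be written as least squares after a simple data transformation: with anchors A, projection P_A onto their column space, and penalty strength gamma, one fits OLS on the transformed data W X, W Y with W = I + (sqrt(gamma) - 1) P_A. The function name and interface below are illustrative.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Sketch of anchor regression via the data transformation
    W = I + (sqrt(gamma) - 1) * P_A, where P_A projects onto the
    column space of the anchor variables A.
    gamma = 1 recovers ordinary least squares."""
    P = A @ np.linalg.pinv(A)                    # projection onto span(A)
    W = np.eye(len(Y)) + (np.sqrt(gamma) - 1.0) * P
    Xt, Yt = W @ X, W @ Y                        # transformed design and response
    beta, *_ = np.linalg.lstsq(Xt, Yt, rcond=None)
    return beta
```

Larger gamma penalizes the component of the residuals explained by the anchors, which is what drives robustness to anchor-aligned perturbations.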
Scientific research is often concerned with questions of cause and effect. For example, does eating processed meat cause certain types of cancer? Ideally, such questions are answered by randomized controlled experiments. However, these experiments can be costly, time-consuming, unethical or impossible to conduct. Hence, often the only available data to answer causal questions is observational.
We discuss several problems related to the challenge of making accurate inferences about a complex phenomenon, given relatively little data. We show that for several fundamental and practically relevant settings, including estimating the intrinsic dimensionality of a high-dimensional distribution, and learning a population of distributions given few data points from each distribution, it is possible to “denoise” the empirical distribution significantly.
Fundamental to the study of inheritance is the partitioning of the total phenotypic variation into genetic and environmental components. Using twin studies, the phenotypic variance-covariance matrix can be parameterized to include an additive genetic effect and shared and non-shared environmental effects. The ratio of the genetic variance component to the total phenotypic variance is the proportion of genetically controlled variation and is termed the ‘narrow-sense heritability’.
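The classical back-of-the-envelope version of this decomposition uses Falconer-style estimates from twin correlations: monozygotic twins share all additive genetic variance and dizygotic twins half of it, so r_MZ = a² + c² and r_DZ = a²/2 + c². Solving these gives the sketch below (the function name is illustrative).

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Falconer-style ACE decomposition from twin correlations.
    With r_MZ = a2 + c2 and r_DZ = 0.5*a2 + c2, solving yields:"""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic proportion (narrow-sense h^2)
    c2 = 2.0 * r_dz - r_mz     # shared-environment proportion
    e2 = 1.0 - r_mz            # non-shared environment (incl. measurement error)
    return a2, c2, e2
```

For example, twin correlations r_MZ = 0.8 and r_DZ = 0.5 give h² = 0.6, c² = 0.2, e² = 0.2. Structural-equation modelling of the full variance-covariance matrix refines these moment estimates.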
Functional data analysis has been increasingly used in biomedical studies, where the basic unit of measurement is a function, curve, or image. For example, in mobile health (mHealth) studies, wearable sensors collect high-resolution trajectories of physiological and behavioral signals over time. Functional linear regression models are useful tools for quantifying the association between functional covariates and scalar/functional responses, where a popular approach is via functional principal component analysis.
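A minimal sketch of the FPCA approach for a scalar response, assuming densely observed curves on a common grid: center the curves, take the leading eigenfunctions from an SVD, and regress the response on the resulting principal-component scores. The function name and arguments are illustrative.

```python
import numpy as np

def fpca_scalar_regression(curves, y, n_components=3):
    """Scalar-on-function regression via functional PCA:
    project each centered curve onto its leading principal
    components and fit least squares on the scores.
    `curves` is (n_subjects, n_gridpoints); `y` is (n_subjects,)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Right singular vectors approximate the eigenfunctions
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:n_components].T        # FPC scores
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, Vt[:n_components], mean_curve
```

Truncating at a few components regularizes the ill-posed functional regression problem; the choice of `n_components` is typically driven by the fraction of variance explained.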
Bayes Shrinkage at GWAS Scale: A Scalable Algorithm for the Horseshoe Prior with Theoretical Guarantees
The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimensions in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and poor mixing of existing MCMC algorithms.
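For context (this is the standard horseshoe construction, not the talk's algorithm): each coefficient is conditionally Gaussian with its own half-Cauchy local scale, so the prior combines a sharp spike at zero with Cauchy-like tails. A sketch of drawing from the prior:

```python
import numpy as np

def sample_horseshoe(p, tau=1.0, rng=None):
    """Draw one coefficient vector from the horseshoe prior:
        beta_j | lambda_j, tau ~ N(0, lambda_j^2 * tau^2),
        lambda_j ~ half-Cauchy(0, 1),
    with tau the global shrinkage scale."""
    rng = np.random.default_rng(rng)
    lam = np.abs(rng.standard_cauchy(size=p))   # local shrinkage scales
    return rng.normal(scale=lam * tau, size=p)
```

The heavy-tailed local scales are exactly what makes naive MCMC expensive at GWAS scale: each step must update p local scales alongside the coefficients.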
Since their introduction in statistics through the seminal works of Julian Besag, Gaussian Markov random fields have become central to spatial statistics, with applications in agriculture, epidemiology, geology, image analysis and other areas of environmental science. Specified by a set of conditional distributions, these Markov random fields provide a very rich and flexible class of spatial processes, and their adaptability to fast statistical calculations, including those based on Markov chain Monte Carlo computations, makes them very attractive to statisticians.
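The computational attraction comes from the sparse precision matrix implied by the conditional specification: to sample x ~ N(0, Q⁻¹) one factors Q = L Lᵀ and solves a triangular system, which is fast when Q is sparse. A minimal dense-matrix sketch of this standard recipe (the random-walk precision with a small diagonal nugget `kappa` is an illustrative choice to make Q proper):

```python
import numpy as np

def sample_gmrf(Q, rng=None):
    """Sample from N(0, Q^{-1}) given a precision matrix Q,
    via the Cholesky factor Q = L L^T: solve L^T x = z with
    z ~ N(0, I), so Cov(x) = L^{-T} L^{-1} = Q^{-1}."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)

def rw1_precision(n, kappa=0.1):
    """First-order random-walk precision on a line (a path-graph
    Laplacian), made positive definite by the nugget kappa."""
    Q = np.zeros((n, n))
    idx = np.arange(n)
    Q[idx, idx] = 2.0 + kappa
    Q[idx[:-1], idx[:-1] + 1] = -1.0
    Q[idx[:-1] + 1, idx[:-1]] = -1.0
    Q[0, 0] = Q[-1, -1] = 1.0 + kappa
    return Q
```

In practice Q is stored and factored as a sparse matrix (e.g. with `scipy.sparse`), which is what lets GMRF computations scale to large spatial fields.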
This talk presents a variational framework for the asymptotic analysis of empirical risk minimization in general settings. In its most general form, the framework concerns a two-stage inference procedure. In the first stage of the procedure, an average loss criterion is used to fit the trajectory of an observed dynamical system with a trajectory of a reference dynamical system. In the second stage of the procedure, a parameter estimate is obtained from the optimal trajectory of the reference system.