Seminar Details

Aug 6

1:00 pm

Covariance Estimation in the Presence of Diverse Types of Data

Xiaoyue Niu

Final Exam

University of Washington - Department of Statistics

Advisor: Peter Hoff

Multivariate analysis often involves statistical models for the covariance matrix of random variables. Estimating the covariance matrix enables us to study the associations among random variables and provides standard error estimates for constructing confidence regions. Most existing multivariate methods are designed for homogeneous normal populations. However, multivariate data usually contain non-normal measurements of diverse types, including continuous, ordinal, and non-ordered categorical. In this dissertation, we discuss theory and methods for estimating the covariance matrix in the presence of diverse types of data, with two main deviations from the normal situation:

1. the marginal distributions of the multivariate data are not normal;

2. the population is heterogeneous due to some explanatory variables x.

In the first situation, we discuss the idea of copula models for estimating the association parameters for multivariate data. We mainly concentrate on the rank likelihood method proposed by Hoff (2007) and investigate its asymptotic properties. We compare the asymptotic results with other rank-based estimators for the bivariate Gaussian copula model.
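As a rough illustration of what a rank-based estimator for the bivariate Gaussian copula can look like, the sketch below inverts Kendall's tau using the standard relation rho = sin(pi * tau / 2). This is not the rank likelihood method of Hoff (2007); it is one of the simpler rank-based alternatives, and the function names, simulated data, and true correlation of 0.6 are assumptions made for the example.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a from pairwise sign comparisons (O(n^2); assumes no ties)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair is counted twice; the diagonal contributes zero.
    return float((sx * sy).sum() / (n * (n - 1)))

def copula_corr_from_tau(x, y):
    """Gaussian copula correlation estimate via rho = sin(pi * tau / 2)."""
    return float(np.sin(np.pi * kendall_tau(x, y) / 2.0))

# Hypothetical data: Gaussian copula with correlation 0.6, non-normal margins.
rng = np.random.default_rng(0)
rho = 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=500)
x = np.exp(z[:, 0])   # strictly increasing transform: log-normal margin
y = z[:, 1] ** 3      # strictly increasing transform: heavy-tailed margin

rho_hat = copula_corr_from_tau(x, y)
print(f"estimated copula correlation: {rho_hat:.3f}")
```

Because the estimator depends on the data only through ranks, it is unchanged by the monotone transformations applied to the margins, which is the property that makes rank-based approaches attractive when the marginal distributions are not normal.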

In the second situation, we propose a covariance regression model for the heterogeneous population, which describes the covariance matrix of continuous variables as a function of other variables, such as categorical variables. The proposed model is parsimonious and can be considered a natural analogue of linear regression for the mean. We present a geometric interpretation of the model, along with both maximum likelihood and Bayesian methods for parameter estimation. We demonstrate the application of the model using a very simple example with two response variables and one continuous explanatory variable. We then apply the covariance regression model to a large health dataset with four continuous response variables and four categorical variables. We discuss in detail several practical issues that arise when fitting the covariance regression model, such as model selection, interpreting the coefficients, presenting the fitted results, and model misspecification.
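To make the idea of a covariance matrix that varies with x concrete, here is a minimal sketch. The abstract does not state the functional form of the model, so the parameterization used here, a baseline matrix Psi plus the rank-one term (B x)(B x)^T, is an assumption, as are the names Psi, B, and cov_at and all numerical values. The setup loosely mirrors the simple example mentioned above: two responses and one continuous explanatory variable, with an intercept included in x.

```python
import numpy as np

def cov_at(x, Psi, B):
    """Covariance at explanatory vector x under the assumed form Psi + (B x)(B x)^T."""
    bx = B @ np.asarray(x, dtype=float)
    return Psi + np.outer(bx, bx)

# Hypothetical parameters: two responses, x = (1, x1) with one continuous covariate.
Psi = np.array([[1.0, 0.2],
                [0.2, 0.5]])   # baseline covariance, positive definite
B = np.array([[0.5, 1.0],
              [-0.3, 0.8]])    # 2 x 2 coefficient matrix (assumed values)

for x1 in (-1.0, 0.0, 1.0):
    S = cov_at([1.0, x1], Psi, B)
    corr = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
    print(f"x1={x1:+.1f}  var=({S[0, 0]:.3f}, {S[1, 1]:.3f})  corr={corr:+.3f}")
```

Under this assumed form, each value of x adds variance along the single direction B x on top of the baseline, so the result stays positive definite whenever Psi is, and both the variances and the correlation can change with x while the number of parameters stays small.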