Dr. Pei Wang
Fred Hutchinson Cancer Research Center - Associate Member, Biostatistics and Biomath Program, PHS
Recent proteomic studies have identified proteins related to specific phenotypes. In addition to marginal association analysis for individual proteins, analyzing pathways (functionally related sets of proteins) may yield additional valuable insights. Identifying pathways that differ between phenotypes can be conceptualized as a multivariate hypothesis testing problem: whether the mean vector of a p-dimensional random vector X equals mu0. This problem is complicated by the small sample sizes and substantial missing data typical of proteomic studies. To tackle these challenges, we first propose a regularized Hotelling's T2 (RHT) statistic together with a non-parametric testing procedure, which effectively controls the type I error rate and maintains good power in the presence of complex correlation structures and missing data patterns. We investigate asymptotic properties of the RHT statistic under pertinent assumptions and compare the test performance with existing methods through simulations and real data examples. In the second part of this talk, we further propose employing regularization in the EM algorithm to more accurately estimate the mean vector and covariance matrix, both when data are missing at random and when data are missing not at random.
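
As a rough illustration of the first part, the sketch below computes a ridge-type regularized Hotelling statistic and a resampling p-value for the null hypothesis that the mean vector equals mu0. The abstract does not give the exact form of the RHT statistic or of the non-parametric procedure, so everything here is an assumption: the statistic is taken to be n (xbar - mu0)' (S + lam I)^(-1) (xbar - mu0), the null distribution is approximated by random sign flips of the centered observations (which presumes symmetry about mu0), and the data are assumed complete, so missing values are not handled. The names rht_statistic, sign_flip_pvalue, and lam are hypothetical.

```python
import numpy as np


def rht_statistic(X, mu0, lam):
    """Ridge-regularized Hotelling-type statistic (assumed form, not the talk's).

    Computes n * d' (S + lam * I)^(-1) d, where d = sample mean - mu0 and
    S is the sample covariance. With lam > 0 the matrix is invertible
    even when the number of samples n is smaller than the dimension p.
    """
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return float(n * d @ np.linalg.solve(S + lam * np.eye(p), d))


def sign_flip_pvalue(X, mu0, lam, n_flips=999, seed=0):
    """Approximate p-value from random sign flips around mu0.

    Assumption: under the null, observations are symmetric about mu0,
    so flipping the sign of each centered row leaves the distribution
    of the statistic unchanged.
    """
    rng = np.random.default_rng(seed)
    centered = X - mu0
    observed = rht_statistic(X, mu0, lam)
    exceed = 0
    for _ in range(n_flips):
        # One random sign per sample (row), applied to all p coordinates.
        signs = rng.choice([-1.0, 1.0], size=(X.shape[0], 1))
        flipped = mu0 + signs * centered
        if rht_statistic(flipped, mu0, lam) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_flips + 1)
```

For example, with X an n-by-p array of complete measurements for one pathway, sign_flip_pvalue(X, np.zeros(p), lam=1.0) returns an approximate p-value. The penalty lam is a tuning parameter, and how it should be chosen is not specified in the abstract.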