Seminar Details


Nov 30

3:30 pm

A Semiparametric Method for Two Phase Studies

Nilanjan Chatterjee


University of Washington

Multi-phase designs with judicious stratification at various stages of sampling can yield efficient estimates of population parameters while minimizing the costs of data collection. Consider fitting regression models to data arising from two-phase designs for missing or mismeasured data problems, where certain covariates (X) are ascertained only for units selected in the second (validation) phase of sampling, while the response (Y) and some other covariates (Z) are ascertained for all units. Using a "pseudo-score" function for the regression coefficients, which depends on the observed conditional distribution function of X given Z in the phase II data, we construct semiparametric estimators for both discrete and continuous outcomes that are generally much simpler to implement than some recently developed "semiparametric efficient" procedures. Extensive simulation studies show that the proposed method clearly outperforms all other "inefficient" procedures proposed in the recent literature and achieves efficiency comparable to that of semiparametric maximum likelihood. These results are confirmed by an illustrative analysis of data from the third and fourth studies of the National Wilms Tumor Group. The new method will be of practical use for obtaining highly efficient estimates in situations where semiparametric efficient estimates are hard to compute.