Dec 4

11:30 am

## Inference for High-Dimensional Instrumental Variables Regression

### David Gold

General Exam

This work, conducted with Professors Johannes Lederer and Jing Tao, concerns inference for low-dimensional components of a high-dimensional regression parameter when the corresponding regressors are endogenous. Endogeneity, which denotes the failure of a moment orthogonality condition between the regressors of interest and a noise term, can arise from unmeasured confounders, measurement error, or the interplay of mutually influential processes that each exhibit random variation. The linear instrumental variables (IV) model, which is designed to handle endogeneity, has found wide use in economics and the social sciences. It stipulates that the endogenous regressors can themselves be modeled in terms of a further set of instrumental variables. Classical results from the 20th century establish root-n consistent estimation in settings in which the number of endogenous variables is fixed and the number of instrumental variables is either fixed or grows slowly with the number of observations. Recent work has treated the case in which the instrumental variables can be high-dimensional but the number of endogenous variables remains fixed.
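
For concreteness, the linear IV model described above can be written in a standard form. The notation here is assumed for illustration and does not come from the abstract: $y_i$ is the response, $x_i \in \mathbb{R}^{p_x}$ the endogenous regressors, $z_i \in \mathbb{R}^{p_z}$ the instruments, $\beta$ the second-stage parameter of interest, $\Pi$ the first-stage coefficient matrix, and $u_i$, $v_i$ noise terms.

$$
\begin{aligned}
y_i &= x_i^\top \beta + u_i, & \mathbb{E}[x_i u_i] &\neq 0 \quad \text{(endogeneity)},\\
x_i &= \Pi^\top z_i + v_i, & \mathbb{E}[z_i u_i] &= 0 \quad \text{(instrument exogeneity)}.
\end{aligned}
$$

In this notation, the classical results keep $p_x$ fixed and let $p_z$ be fixed or slowly growing, while the more recent work allows $p_z$ to be large relative to the sample size $n$ with $p_x$ still fixed.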

Our primary contribution is to develop a method for inference when both the endogenous and instrumental variables are high-dimensional. We propose a one-step update to a generic second-stage estimator and demonstrate that the updated estimator decomposes into a main term linear in the noise and four remainder terms. We show that if the remainder terms vanish in probability, the updated estimator provides an asymptotic pivot for components of the high-dimensional second-stage regression vector. We then study a two-stage Lasso estimation routine and obtain specific conditions under which the remainder terms vanish in probability. We complement our asymptotic theory with numerical studies, which demonstrate the relevance of our method in finite samples.
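
To illustrate what a two-stage Lasso routine can look like, here is a minimal sketch on simulated data. It is not the authors' implementation and it omits the one-step update entirely. The coordinate-descent solver, the function names `lasso_cd` and `two_stage_lasso`, the simulated design, and the tuning parameters `lam1` and `lam2` are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(A, b, lam, n_iter=100):
    """Minimize (1/(2n))||b - A w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = A.shape
    w = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0) / n
    resid = b.astype(float).copy()  # equals b - A w, since w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (w_j excluded).
            rho = A[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            resid += A[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w

def two_stage_lasso(y, X, Z, lam1, lam2):
    """Hypothetical plug-in routine: Lasso of each X column on Z, then Lasso of y on fitted X."""
    Pi_hat = np.column_stack([lasso_cd(Z, X[:, k], lam1) for k in range(X.shape[1])])
    X_hat = Z @ Pi_hat                    # first-stage fitted regressors
    beta_hat = lasso_cd(X_hat, y, lam2)   # second-stage estimate
    return beta_hat, Pi_hat

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p_z, p_x = 400, 30, 20
Z = rng.standard_normal((n, p_z))
Pi = np.zeros((p_z, p_x))
Pi[np.arange(p_x), np.arange(p_x)] = 1.0       # sparse first stage
V = rng.standard_normal((n, p_x))
u = 0.5 * V[:, 0] + 0.5 * rng.standard_normal(n)  # u correlated with V: endogeneity
X = Z @ Pi + V
beta = np.zeros(p_x)
beta[0], beta[1] = 1.0, -1.0                   # sparse second stage
y = X @ beta + u

beta_hat, Pi_hat = two_stage_lasso(y, X, Z, lam1=0.1, lam2=0.1)
print(np.round(beta_hat[:4], 2))
```

The second-stage Lasso estimate is shrunk by the penalty, so it does not by itself yield an asymptotic pivot; this is the gap that a one-step update of the kind described above is meant to address.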