Jan 25

3:30 pm

## Respondent-Driven Sampling: Realistic Models and Appropriately Conservative Variance Estimates, or RDS for Skeptics

### W. Whipple Neely

Seminar

University of Washington - Department of Statistics

In this talk we examine the problem of analyzing data collected using Respondent-Driven Sampling (RDS), with particular attention to making conservative variance estimates and dealing with situations in which the basic assumptions of the technique fail. RDS is a sampling design that has been widely applied as a method for studying networked "hidden" and "hard-to-reach" populations (Heckathorn, 1997; Salganik and Heckathorn, 2004; Volz and Heckathorn, 2008; Gile and Handcock, 2010). As a survey methodology, RDS has been widely employed as a means of recruiting study participants from populations at high risk of HIV infection, such as sex workers, injecting drug users and men who have sex with men (Johnston et al., 2008). RDS selects survey respondents from hidden populations by employing a chain-referral sampling method that can be highly successful at recruiting respondents from populations that are seemingly intractable to study by conventional techniques. However, the statistical techniques currently being used to analyze data collected through RDS are based on naive reasoning about the relationship between the data and the structure of an unobserved (and essentially unobservable) social network that is presumed to connect all members of the target population. Yet, regardless of the structure or existence of an underlying social network, all of the current RDS statistical theory can be derived from a trio of assumptions: (1) respondents' self-reported personal network size can serve as a proxy for an unknown sample inclusion probability, (2) any dependence between observations is completely explained by a homogeneous first-order Markov model, (3) inclusion probabilities are conditionally independent given any outcome of interest (Neely, 2009b).
The first of these assumptions is essentially untestable without confirmation by a conventional survey (Heimer, 2005); the second and third assumptions, however, are directly testable using data routinely collected in RDS surveys. In this talk we show how the standard RDS estimators for population proportions and means are derived under assumptions (1), (2) and (3), and then show how replacing assumptions (2) and (3) with models that reflect the observed data can lead to improved estimation and more conservative variance estimates. In the process we introduce a class of generalized models that can be used to carry out regression modeling with RDS data, and a sensitivity analysis that can be used to assess the impact of violations of assumption (1) (Neely, 2009a).
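
To make assumption (1) concrete, the following is a minimal sketch of an inverse-degree-weighted proportion estimate of the kind commonly associated with Volz and Heckathorn (2008). It treats each respondent's self-reported network size as proportional to their inclusion probability. The function name `inverse_degree_proportion` and the toy data are hypothetical illustrations, not the speaker's code or results.

```python
from typing import Sequence


def inverse_degree_proportion(degrees: Sequence[float],
                              has_trait: Sequence[bool]) -> float:
    """Estimate a population proportion by weighting each respondent
    by 1 / (self-reported network size).

    Assumes, per assumption (1), that inclusion probability is
    proportional to reported network size. Hypothetical helper.
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if len(degrees) == 0:
        raise ValueError("at least one respondent is required")
    if any(d <= 0 for d in degrees):
        raise ValueError("reported network sizes must be positive")

    # Inverse-degree weights: well-connected respondents are assumed to be
    # over-represented in the sample, so they are down-weighted.
    weights = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_with_trait / sum(weights)


# Toy data (invented for illustration): four respondents.
toy_degrees = [2, 4, 4, 8]
toy_trait = [True, False, True, False]
estimate = inverse_degree_proportion(toy_degrees, toy_trait)
naive = sum(toy_trait) / len(toy_trait)
print(f"naive sample proportion: {naive:.3f}")
print(f"inverse-degree-weighted estimate: {estimate:.3f}")
```

In this toy sample the respondents with the trait report smaller networks, so the weighted estimate is larger than the unweighted sample proportion. The sketch covers only the point estimate; it says nothing about variance, which is where assumptions (2) and (3) enter.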

This talk can serve as an introduction to the mathematical statistics of the classical RDS estimators and as an introduction to the problem of making conservative inferences based on data collected using RDS.

**References**

Gile, K. J., Handcock, M. S., 2010.

Respondent-Driven Sampling: An Assessment of Current Methodology. Sociological Methodology 40, to appear.

Heckathorn, D. D., 1997.

Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Social Problems 44 (2), 174-199.

Heimer, R., December 2005.

Critical Issues and Further Questions About Respondent-Driven Sampling: Comment on Ramirez-Valles, et al. (2005). AIDS and Behavior 9 (4), 403-408.

Johnston, L. G., Malekinejad, M., Rifkin, M. R., Rutherford, G. W., Kendall, C., 2008.

Implementation challenges to using respondent-driven sampling methodology for HIV biological and behavioral surveillance: Field experiences in international settings. AIDS and Behavior 12 (Supplement 1), 131-141.

Neely, W. W., 2009a.

Statistical Theory & Respondent-Driven Sampling, (under review).

Neely, W. W., 2009b.

Statistical Theory for Respondent-Driven Sampling. Ph.D. thesis, University of Wisconsin-Madison, Madison, Wisconsin.

Salganik, M. J., Heckathorn, D. D., 2004.

Sampling and Estimation in Hidden Populations using Respondent-Driven Sampling. Sociological Methodology 34, 193-239.

Volz, E., Heckathorn, D. D., 2008.

Probability Based Estimation Theory for Respondent-Driven Sampling. Journal of Official Statistics 24 (1), 79-97.