Syllabus: STAT/BIOST/CSSS 529 -- Spring 2013



Instructor:

Elena Erosheva

C 14C, Padelford Hall (CSSS)

elena at

Office hour:  Wednesday 2:30-3:30


Teaching Assistant:

Laina Mercer, Statistics

Padelford Hall

mercel at

Office hour: Wednesday 3:30-4:30, Padelford C14-A


  • Lecture: Tuesday and Thursday 10:30 - 11:50, Loew Hall 105
  • Lab: Thursday 3:30-4:30, SAV 121

·         Web: follow the class link from my homepage at

·         Questions by e-mail are welcome. They will often be answered quite quickly, but this is not guaranteed. In particular, I don't always check e-mail over weekends.

Course description

This is an applied statistical methods course that will cover the statistical design and analysis of complex surveys, with applications in the social and health sciences. In addition to traditional topics in survey analysis, we will cover data visualization, regression modeling of data from complex surveys, and the design and analysis of two-phase samples from existing cohorts.


·         Lumley, T. (2010) Complex Surveys: a guide to analysis using R.

Other recommended texts:

·         Cochran, W. (1977) Sampling Techniques

·         Korn and Graubard (1999) Analysis of Health Surveys

·         Groves et al. (2009) Survey Methodology


I plan to post lecture notes on the class web page the night before each lecture. Please print out your own copies, or plan to use electronic tools, if you would like to write notes on the same pages.

Quiz sections

Labs/quiz sections will aim to reinforce the material covered in lectures. The labs will provide hands-on experience with R, go over some mathematical derivations of important formulae, and serve as a forum for discussing homework solutions, assigned articles, and additional examples of sample survey design and analysis.


After successfully completing this course, students should ordinarily expect to be able to:

    • Define a probability sample and explain its importance in statistics. 
    • Describe situations where a probability sample can lead to greater precision than an attempt at complete enumeration.
    • Distinguish finite-population and superpopulation inference and give examples where each would be appropriate.
    • Determine whether a survey design uses a probability sample.
    • Define common features of complex surveys -- strata, clusters, unequal sampling probabilities -- and explain how they affect the cost of the survey and precision of estimates.
    • Write down the Horvitz--Thompson estimator of the population total and explain to a non-statistician why it gives an unbiased estimate.
    • Compute summary statistics and fit regression models to data from complex surveys using R. Describe these analyses in language suitable for an academic paper in a health sciences or social sciences journal.
    • Explain why assumptions about the distribution of data are not relevant to standard survey inference and what criteria are relevant for choosing summary statistics and models.  
    • Define post-stratification and raking, and explain how they can increase precision.
    • Describe some strategies for mitigating the bias from non-response.
    • Explain the advantages and disadvantages of including sampling weights in a regression model.
    • Describe case-cohort, case-control, two-phase case-control, and countermatching designs for sampling from a cohort, and how data from these designs can be analyzed.
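
As a brief illustration of the Horvitz--Thompson objective listed above, the estimator and its unbiasedness argument can be sketched as follows. The notation is an assumption made for this sketch and is not taken from the course notes: U is the finite population, y_i the value for unit i, S the random sample, pi_i the probability that unit i is included in S (assumed positive for every unit), and I_i the indicator that unit i is sampled.

```latex
% Assumed notation: population U = {1, ..., N}, values y_i, sample S,
% inclusion probabilities \pi_i = P(i \in S) > 0, indicators I_i = 1 if i \in S.
\hat{T}_{HT} = \sum_{i \in S} \frac{y_i}{\pi_i}
             = \sum_{i \in U} I_i \, \frac{y_i}{\pi_i},
\qquad
E\bigl[\hat{T}_{HT}\bigr] = \sum_{i \in U} E[I_i] \, \frac{y_i}{\pi_i}
  = \sum_{i \in U} \pi_i \, \frac{y_i}{\pi_i}
  = \sum_{i \in U} y_i = T .
```

In plain language: each sampled unit stands in for 1/pi_i units of the population, so averaged over all the samples the design could have produced, the weighted total equals the true total.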



We will use the survey package in R. The class ‘computing’ web page contains info and links to helpful resources. 


Students must have taken a graduate-level introductory course in applied statistics and have prior experience with statistical computing in R. A regression modeling course is recommended.

Homework assignments and grades

  • Final grades will be based on a midterm exam (30%), homework assignments (30%), short project presentations (10%), and a final poster project (30%).
  • I encourage you to work on the homework assignments with each other in small groups. However, each student is required to prepare and submit their own solution and write-up.
  • Aim to resolve all technical questions or problems you might have with running software at least 3 days before an assignment is due.
  • Homework assignments that are not handed in on time will receive zero points (except in cases of documented emergency).
  • Please hand in a hard copy of your homework. I will not accept electronic submissions (except in cases of documented emergency).
  • Please type up your homework assignments using a text editor (equations may be written in by hand, if necessary). Please label all graphs, tables, variables, etc., appropriately. Insert appropriate parts of the output into your write-up.
  • Include the R code you used for an assignment in an appendix at the end of your write-up. Unless specifically asked, do not insert raw R code in the body of your write-up.
  • The midterm exam is tentatively scheduled for April 25.
  • Short project presentations are tentatively scheduled for May 9 and May 14.
  • The final projects will be presented in poster sessions during class time on June 4 and June 6.
  • There is no need to get a fancy printed poster, but it is important to design the poster so that it can be read and understood by your target audience. I will invite Stat, Biostat, and CSSS faculty and students to the poster session. 

Students with Disabilities

If you would like to request academic accommodations due to a disability, please contact Disabled Student Services, 448 Schmitz, 543-8924 (V/TTY).  If you have a letter from Disabled Student Services indicating you have a disability that requires academic accommodations, please present the letter to me so we can discuss the accommodations you might need for this class.