# In class we saw that Neyman provides us with formulas for the optimal # sample size, and their allocation to the various strata. # The following contains Neyman's formulas. Ignore the details, for now. n = ((N1*sigma1 + N2*sigma2)^2) / ( (N*B/1.96)^2 + ( N1*sigma1^2 + N2*sigma2^2) ) n = round(n) # Just round it to an integer. n # This is the minimum sample size required # to have a margin of error of 3% (i.e., B). # # The following is the *allocation* of n to the 2 strata, according to Neyman: n1 = (n*N1*sigma1)/( (N1*sigma1 + N2*sigma2 ) ) n2 = (n*N2*sigma2)/( (N1*sigma1 + N2*sigma2 ) ) n1 = round(n1) n2 = round(n2) n1 n2 # Naively, one may have thought that n1 should be half of n2, because # N1 is half of N2. But that is not the case. Why? Under what condition # would it be? ###################################################################### # So far, we have no data at all; we have been only setting the # parameters of the problem. Now, let's make the population data. # Remember: What we are doing here is a *simulation*, which means # that we create a population whose mean we know! Then we try to # see if the theory (i.e., Neyman's formulas) give us the answer # we expect. set.seed(123) # so that we all get same answers. stratum1 = rnorm(N1,mu1,sigma1) # Populate stratum 1. stratum2 = rnorm(N2,mu2,sigma2) # and stratum 2, population = c(stratum1,stratum2) # and combine the two. pop.mean = mean(population) # Here is the true pop mean pop.mean # which we want to estimate. hist(stratum1,breaks=100) # Just look at the histograms. hist(stratum2,breaks=100) hist(population,breaks=100) # Note the two humps/strata. # So, like I always say, histogram! In this case, you would know that the # pop is stratified, i.e. has clusters/strata in it. So, you've got to # do either cluster or stratified sampling. Here, we'll do the latter. ##################################################################### # Let's start! First, let's take 1000 simple random samples (SRS) from the # population, and see how much their sample means vary: xbar.srs = numeric(1000) # Space for sample means. for(i in 1:1000){ sample1 = sample(population, n, replace=F) # SRS is w/o replacement. xbar.srs[i] = mean(sample1) } hist(xbar.srs,breaks=100) # Hist of the sample *means*. mean(xbar.srs) # This is the mean of *sample means* from SRS; # Note that it's close to pop.mean, as expected. sd(xbar.srs) # This is the *std dev* of the *sample means* from SRS, # Make a note of it : 0.02556108 #####################################################################