Sample Size and Statistical Theory - Stratified Sampling
In stratified sampling, the population is divided into subgroups or strata and a sample is taken from each. Stratified sampling is worthwhile when one or both of the following are true:
1. The population standard deviation differs by strata.
2. The interview cost differs by strata.
Suppose we wanted to estimate how much electricity is used to heat swimming pools. The population of swimming pools might be stratified into commercial pools at hotels and clubs and individual home swimming pools. The latter may have a small variation and thus would require a smaller sample. If, however, home-pool owners were less costly to interview, more of them could be interviewed than if the two groups had the same interview cost.
How does one determine the best allocation of the sampling budget to the various strata? This classic problem of sampling was solved in 1935 by Jerzy Neyman. His solution is represented by the following formula:
$$n_i = \frac{\pi_i \sigma_i / \sqrt{c_i}}{\sum_i \left( \pi_i \sigma_i / \sqrt{c_i} \right)} \, n$$
where
$n$ = the total sample size
$\pi_i$ = the proportion of the population in stratum i
$\sigma_i$ = the population standard deviation in stratum i
$c_i$ = the cost of one interview in stratum i
$\sum_i$ = the sum over all strata
$n_i$ = the sample size for stratum i
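The allocation formula can be tried out with a short Python sketch. The function name `neyman_allocation` and the three-stratum input values are hypothetical and chosen only for illustration; they are not the Figure 12-7 data.

```python
from math import sqrt

def neyman_allocation(n, proportions, std_devs, costs):
    """Split a total sample of size n across strata.

    Each stratum gets a share proportional to pi_i * sigma_i / sqrt(c_i).
    The results are left unrounded; a real survey would round them.
    """
    weights = [p * s / sqrt(c) for p, s, c in zip(proportions, std_devs, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical inputs (assumed for illustration, NOT taken from Figure 12-7).
proportions = [0.5, 0.3, 0.2]   # pi_i: share of the population in each stratum
std_devs = [1.0, 2.0, 4.0]      # sigma_i: standard deviation in each stratum
costs = [4.0, 4.0, 16.0]        # c_i: cost of one interview in each stratum

allocation = neyman_allocation(1000, proportions, std_devs, costs)
for i, n_i in enumerate(allocation, start=1):
    print(f"stratum {i}: {n_i:.1f}")
```

In this made-up case the third stratum has the largest standard deviation but also the highest interview cost, so its share of the sample is pulled in both directions, much as with the high-income stratum described in the text.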
Figure 12-7 presents information on a survey of the monthly usage of bank teller machines. The population is stratified by income. The high-income segment has both the highest variation and the highest interview cost. The low- and medium-income strata have the same interview cost but differ with respect to the standard deviation of bank teller usage. The column at the right shows the breakdown of the 1000-person sample into the three strata. Note that the high-income stratum is allocated 235 people. If a simple random sample of size 1000 had been taken from the population, around 200 would have been taken from the high-income group, since 20 percent of the population is from that stratum.
The formula shows how to allocate the sample size to the various strata; however, how does one determine the sample size in the first place? One approach is to assume that there is a budget limit. The sample size is simply adjusted upward until it hits the budget limit. The budget should be figured as follows:
$$\text{Budget} = \sum_i c_i n_i$$
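A minimal sketch of the budget approach follows, assuming the budget covers interview costs only (no fixed costs) and ignoring the rounding of stratum sizes. Because each stratum's size is a fixed share of n, the cost per sampled person is constant, so the largest affordable n can be computed directly instead of by trial and error. The function names and input values are hypothetical.

```python
from math import sqrt

def allocation_shares(proportions, std_devs, costs):
    """Fraction of the total sample assigned to each stratum."""
    weights = [p * s / sqrt(c) for p, s, c in zip(proportions, std_devs, costs)]
    total = sum(weights)
    return [w / total for w in weights]

def total_cost(sizes, costs):
    """Budget = sum over strata of c_i * n_i."""
    return sum(c * n for c, n in zip(costs, sizes))

def largest_affordable_n(budget, proportions, std_devs, costs):
    """Largest total sample size whose interview cost fits the budget."""
    shares = allocation_shares(proportions, std_devs, costs)
    cost_per_person = total_cost(shares, costs)  # average cost of one sampled person
    return int(budget // cost_per_person)

# Hypothetical inputs (assumed for illustration, NOT taken from Figure 12-7).
proportions = [0.5, 0.3, 0.2]
std_devs = [1.0, 2.0, 4.0]
costs = [4.0, 4.0, 16.0]
budget = 7500.0

n = largest_affordable_n(budget, proportions, std_devs, costs)
sizes = [n * share for share in allocation_shares(proportions, std_devs, costs)]
print(n, round(total_cost(sizes, costs), 2))
```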
The second approach is to determine the sampling error and decide whether or not it is excessive. If so, the sample size would be increased. The sampling error formula is:
$$\text{Sampling error} = z \, \sigma_{\bar{x}}$$
and is based on the standard error of $\bar{X}$, which is found as follows:
$$\text{Standard error of } \bar{X} = \sigma_{\bar{x}} = \sqrt{\sum_i \frac{\pi_i^2 \sigma_i^2}{n_i}}$$
It is based on the variances of the individual strata. In the example given in Figure 12-7, $\sigma_{\bar{x}} = .07$.
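The two formulas can be checked numerically with a short sketch. The inputs below are hypothetical and do not reproduce the Figure 12-7 result, and the value z = 1.96 (the usual multiplier for a 95 percent confidence level) is an assumption, since the text does not state which z is used.

```python
from math import sqrt

def stratified_standard_error(proportions, std_devs, sizes):
    """Standard error of the stratified mean: sqrt(sum of pi_i^2 * sigma_i^2 / n_i)."""
    return sqrt(sum(p**2 * s**2 / n for p, s, n in zip(proportions, std_devs, sizes)))

def sampling_error(z, standard_error):
    """Sampling error = z times the standard error."""
    return z * standard_error

# Hypothetical inputs (assumed for illustration, NOT taken from Figure 12-7).
proportions = [0.5, 0.3, 0.2]   # pi_i
std_devs = [1.0, 2.0, 4.0]      # sigma_i
sizes = [250, 360, 640]         # n_i

se = stratified_standard_error(proportions, std_devs, sizes)
error = sampling_error(1.96, se)  # z = 1.96 is an assumed 95 percent level
print(f"standard error = {se:.4f}, sampling error = {error:.4f}")
```

If the resulting sampling error is judged excessive, the stratum sizes are increased and the calculation is repeated.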
As was illustrated in the last chapter, the estimate of the population mean under stratified sampling is a weighted average of the sample means found in each stratum sample:
$$\text{Estimate of the population mean} = \sum_i \pi_i \bar{X}_i$$
where
$\bar{X}_i$ = the sample mean for stratum i
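As a small illustration of the weighted average, the sketch below combines stratum sample means using the population proportions as weights. The function name and the numbers are hypothetical.

```python
def stratified_mean(proportions, sample_means):
    """Weighted average of stratum sample means, weighted by population share pi_i."""
    return sum(p * x for p, x in zip(proportions, sample_means))

# Hypothetical inputs (assumed for illustration only).
proportions = [0.5, 0.3, 0.2]    # pi_i
sample_means = [2.0, 4.0, 7.0]   # sample mean found in each stratum

estimate = stratified_mean(proportions, sample_means)
print(f"estimated population mean = {estimate:.2f}")
```

Note that the weights are the population proportions, not the sample proportions, so a stratum that was oversampled does not get extra influence on the estimate.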