**Cluster Sampling**

**Problem 3.**

An oil company has gas stations in 56 cities and many stations in each city. A company official wants to estimate the average cleanliness rating (from 1 to 7) of the bathrooms of these gas stations. Two-stage cluster sampling is desirable in this case because of travel costs. An inspector randomly samples half of the gas stations in each of 6 randomly sampled cities. The total number of gas stations of the company is 1200. The data are given below:

City | M_{i} | m_{i} | Cleanliness rating of bathroom |

1 | 20 | 10 | 5, 7, 6, 5, 4, 7, 6, 5, 6, 6 |

2 | 21 | 10 | 7, 7, 7, 6, 5, 4, 7, 5, 7, 6 |

3 | 28 | 14 | 4, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6 |

4 | 27 | 14 | 6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 5 |

5 | 18 | 9 | 4, 5, 4, 6, 5, 6, 4, 4, 4 |

6 | 24 | 12 | 5, 7, 6, 4, 4, 3, 2, 5, 5, 6, 4, 3 |

- Estimate the mean cleanliness rating by the unbiased estimator. Also estimate the variance of the unbiased estimator.
- Check whether the ratio estimator is appropriate. Proceed with using the ratio estimator to estimate the mean. Also estimate the variance of the ratio estimator.
- For this problem, will you use the unbiased estimator or the ratio estimator?

**Problem 4.**

There are 40 plots in a large nursery. The plots vary a lot in size. There are a total of 1800 seedlings. The heights of the seedlings within each plot are fairly constant, but they may vary considerably from plot to plot. Two-stage cluster sampling, using probability proportional to size for the primary units, is carried out. The results are listed in the table below. Estimate the population mean using the p.p.s. estimator and estimate the variance of that estimator.

Plot | Number of seedlings | Number of seedlings sampled | Heights of seedlings (in inches) |

1 | 50 | 5 | 12, 13, 11, 12, 12 |

2 | 70 | 7 | 8, 8, 9, 7, 9, 7, 9 |

3 | 30 | 3 | 15, 14, 13 |

4 | 30 | 3 | 9, 10, 10 |

5 | 60 | 6 | 13, 13, 12, 12, 12, 11 |

**Problem 5.**

A social worker wants to estimate the total number of retired people residing in a small city. He blocked the city into 300 blocks and randomly sampled 5 blocks. From each sampled block, he randomly sampled 4 households. The data are given in the table below:

Block | Number of households | Number of households sampled | Number of retired residents per household |

1 | 18 | 4 | 0,0,0,0 |

2 | 15 | 4 | 0,0,1,0 |

3 | 12 | 4 | 1,1,2,1 |

4 | 10 | 4 | 2,2,2,1 |

5 | 16 | 4 | 1,0,1,0 |

- Estimate the total number of retired residents in the small city by the unbiased estimator. Also estimate the variance of the unbiased estimator.
- Check whether the ratio estimator is appropriate. Proceed with using the ratio estimator to estimate the population total. Also estimate the variance of the ratio estimator.
- For this problem, will you use the unbiased estimator or the ratio estimator?

**Solution**

**3.a.**

Here the two-stage cluster sample has already been drawn and is given to you: the cities are the first-stage clusters, and the gas stations inspected within each sampled city are the second-stage sample. To estimate the mean, load the data and run the following:

>y=c(5,7,6,5,4,…,6,4,3) # write out all the ratings one by one

>y1=c(5, 7, 6,…,6) # ratings for the first city, typed in by hand

>y2=c(…) # similarly load the observations for the second city, and so on for all 6 sampled cities, i.e. y3, y4, …, y6

>a=c(var(y1),var(y2),…,var(y6)) # within-city sample variances; type out every term from y1 to y6

>b=c(mean(y1),mean(y2),…,mean(y6)) # within-city sample means

>M=c(20,21,28,27,…,24) # load the M values one by one

>m=c(10,10,14,…,12) # load the m values one by one

>k=M*b # estimated city totals

>est=56/1200*(sum(k)/6) # gives the estimate of the population mean

>var_est=(56*50/6*var(k)+(56/6)*sum(M*(M-m)*a/m))/1200^2 # gives the estimate of the variance of the unbiased estimator

**3.b.**

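The ratio estimator is appropriate when the estimated city totals k are roughly proportional to the city sizes M; plot(M, k) gives a quick visual check.

r=sum(k)/sum(M) # ratio estimate of the mean rating

V1=(56/6*(50)*(1/5)*sum((k-r*M)^2)+56/6*sum(M*(M-m)*a/m))/1200^2 # estimator of the variance of the ratio estimator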
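The steps for 3.a and 3.b can be collected into one script. This is a minimal sketch, assuming the standard two-stage formulas for simple random sampling at both stages; the data are copied from the Problem 3 table, and names such as `ratings`, `within`, `mean_unb` and `var_ratio` are introduced here for illustration only.

```r
# Ratings for each sampled city, copied from the Problem 3 table
ratings <- list(
  c(5, 7, 6, 5, 4, 7, 6, 5, 6, 6),
  c(7, 7, 7, 6, 5, 4, 7, 5, 7, 6),
  c(4, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6),
  c(6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 5),
  c(4, 5, 4, 6, 5, 6, 4, 4, 4),
  c(5, 7, 6, 4, 4, 3, 2, 5, 5, 6, 4, 3)
)
N <- 56                          # cities in the population
n <- length(ratings)             # sampled cities
M_total <- 1200                  # gas stations in the population
M <- c(20, 21, 28, 27, 18, 24)   # stations in each sampled city
m <- lengths(ratings)            # stations inspected in each sampled city

ybar <- sapply(ratings, mean)    # city sample means
s2   <- sapply(ratings, var)     # city sample variances
k    <- M * ybar                 # estimated city totals

# Second-stage (within-city) term, shared by both variance estimates
within <- (N / n) * sum(M * (M - m) * s2 / m)

# 3.a: unbiased estimator of the mean and its estimated variance
mean_unb <- (N / M_total) * sum(k) / n
var_unb  <- (N * (N - n) / n * var(k) + within) / M_total^2

# 3.b: ratio estimator of the mean and its estimated variance
r_hat     <- sum(k) / sum(M)
s2_r      <- sum((k - r_hat * M)^2) / (n - 1)
var_ratio <- (N * (N - n) / n * s2_r + within) / M_total^2

# Informal check: are city totals roughly proportional to city sizes?
cor_Mk <- cor(M, k)

print(c(mean_unb = mean_unb, var_unb = var_unb,
        mean_ratio = r_hat, var_ratio = var_ratio, cor_Mk = cor_Mk))
```

With the table's data this should give roughly 5.74 (variance about 0.243) for the unbiased estimator and roughly 5.35 (variance about 0.081) for the ratio estimator, with a correlation of about 0.82 between city size and estimated city total.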

**3.c.**

Compare the two variance estimates: if the estimated variance in 3.a is greater than the one in 3.b, use the ratio estimator, and vice versa. So for this problem, type the R code given above into the computer and compare the two values it returns.
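For reference, the two variance estimates being compared can be written side by side. These are the usual textbook forms for two-stage sampling with simple random sampling at both stages, which is an assumption about what the solution intends; here $\hat{y}_i = M_i \bar{y}_i$ is the estimated total of city $i$ (`k` in the code), $s_i^2$ is the sample variance within city $i$, $N = 56$, $n = 6$ and $M = 1200$.

$$
\widehat{\mathrm{var}}(\hat{\mu}_u) = \frac{1}{M^2}\left[\frac{N(N-n)}{n}\, s_u^2 + \frac{N}{n}\sum_{i=1}^{n} M_i (M_i - m_i)\,\frac{s_i^2}{m_i}\right],
\qquad
s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}\Bigl(\hat{y}_i - \frac{1}{n}\sum_{j=1}^{n}\hat{y}_j\Bigr)^2
$$

$$
\widehat{\mathrm{var}}(\hat{\mu}_r) = \frac{1}{M^2}\left[\frac{N(N-n)}{n}\, s_r^2 + \frac{N}{n}\sum_{i=1}^{n} M_i (M_i - m_i)\,\frac{s_i^2}{m_i}\right],
\qquad
s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(\hat{y}_i - \hat{r} M_i\bigr)^2,
\quad
\hat{r} = \frac{\sum_{i} \hat{y}_i}{\sum_{i} M_i}
$$

The second-stage term is identical in both, so the comparison comes down to $s_u^2$ against $s_r^2$: the ratio estimator wins when the city totals are better explained by city size than by a common average.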

**4.**

The p.p.s. unbiased estimator of the population mean is the mean of the sample means of the sampled plots listed above:

y1=c(12, 13, 11, 12, 12)

y2=c(8, 8, 9, 7, 9, 7, 9)

y3=c(15, 14, 13)

y4=c(9, 10, 10)

y5=c(13, 13, 12, 12, 12, 11) # observations entered by hand

y=c(mean(y1),mean(y2),mean(y3),mean(y4),mean(y5)) # plot sample means

PPS=mean(y) # gives the PPS estimator

For the estimate of the variance of the PPS estimator:

var_pps=(1/5)*(1/4)*sum((y-PPS)^2) # gives the estimate of the variance of the PPS estimator
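In symbols, the calculation above corresponds to the following, assuming the plots were selected with probability proportional to size and with replacement (the problem does not say so explicitly, but the code matches that case). Here $n = 5$ is the number of sampled plots and $\bar{y}_i$ is the sample mean height in plot $i$.

$$
\hat{\mu}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i,
\qquad
\widehat{\mathrm{var}}(\hat{\mu}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\bigl(\bar{y}_i - \hat{\mu}_{pps}\bigr)^2
$$

For the five plots in the table this works out to about 11.20 inches, with an estimated variance of about 1.06.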

**5.**

To get the answers, run the following code.

**Qu.a**

>y=c(0,0,0,0,0,0,1,0,1,1,2,1,2,2,2,1,1,0,1,0) # write out all the data one by one

To get the estimate and the estimate of the variance of the unbiased estimator, load the data block by block and run this code:

>y1=c(0,0,0,0) # counts for the first block, typed in by hand

>y2=c(0,0,1,0) # similarly load the observations for the second block

>y3=c(1,1,2,1)

>y4=c(2,2,2,1)

>y5=c(1,0,1,0)

>l=c(mean(y1),mean(y2),mean(y3),mean(y4),mean(y5)) # block sample means

>v=c(var(y1),var(y2),var(y3),var(y4),var(y5)) # block sample variances

>M=c(18,15,12,10,16)

>m=rep(4,5)

>k=l*M # estimated block totals

>est=(300/5)*sum(k) # the value of the unbiased estimator of the total

>var_est=300*(300-5)/5*var(k)+60*sum(M*(M-m)*v/m) # gives you the estimate of the variance

**Qu.b**

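For the ratio estimator

r=(sum(k)/sum(M))*300*mean(M) # ratio estimate of the total; the total number of households is not given, so it is estimated by 300*mean(M)

V1=60*(295)*(1/4)*sum((k-(sum(k)/sum(M))*M)^2)+60*sum(M*(M-m)*v/m) # estimator of the variance of the ratio estimator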
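Both parts of Problem 5 can be run as one script. This is a minimal sketch under the same two-stage assumptions as before; the data are copied from the Problem 5 table, and names such as `retired`, `M_hat` and `total_ratio` are introduced here for illustration only. Because the total number of households in the city is not given, it is estimated by `N * mean(M)`, which is an assumption of this sketch.

```r
# Retired residents per sampled household, copied from the Problem 5 table
retired <- list(
  c(0, 0, 0, 0),
  c(0, 0, 1, 0),
  c(1, 1, 2, 1),
  c(2, 2, 2, 1),
  c(1, 0, 1, 0)
)
N <- 300                       # blocks in the city
n <- length(retired)           # sampled blocks
M <- c(18, 15, 12, 10, 16)     # households in each sampled block
m <- lengths(retired)          # households sampled in each block

ybar <- sapply(retired, mean)  # block sample means
s2   <- sapply(retired, var)   # block sample variances
k    <- M * ybar               # estimated block totals

# Second-stage (within-block) term, shared by both variance estimates
within <- (N / n) * sum(M * (M - m) * s2 / m)

# Qu.a: unbiased estimator of the total and its estimated variance
total_unb     <- (N / n) * sum(k)
var_total_unb <- N * (N - n) / n * var(k) + within

# Qu.b: ratio estimator of the total and its estimated variance
r_hat           <- sum(k) / sum(M)   # retired residents per household
M_hat           <- N * mean(M)       # estimated number of households (assumption)
total_ratio     <- r_hat * M_hat
s2_r            <- sum((k - r_hat * M)^2) / (n - 1)
var_total_ratio <- N * (N - n) / n * s2_r + within

# Informal check: are block totals roughly proportional to block sizes?
cor_Mk <- cor(M, k)

print(c(total_unb = total_unb, var_total_unb = var_total_unb,
        total_ratio = total_ratio, var_total_ratio = var_total_ratio,
        cor_Mk = cor_Mk))
```

Two things are worth noticing when this is run. With the household total estimated as `N * mean(M)`, the ratio point estimate is algebraically the same as the unbiased one (2655 here), so only the variance estimates differ. The correlation between block size and estimated block total comes out strongly negative (about -0.95), which suggests the proportionality that the ratio estimator relies on does not hold for these data.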

**Qu.c**

Again, compare the two variance estimates and use the estimator with the smaller one.

**Qu.2.**

The estimator of the population mean

>y1=c(7,5,3)

>y2=c(4,2,3)

>mean(c(mean(y1),mean(y2))) # gives the estimate of the population mean

>j=c(mean(y1),mean(y2))

>sb=2*var(j)

>sw=3/4*(var(y1)+var(y2))

>var=1/2*(1/3-1/6)*sw+(1/2-1/10)*sb #gives the estimate of variance