STAT6086 Sampling Techniques
Assignment 2022-23
You must submit one electronic copy of your report in PDF by 11.59pm on Tuesday 10th January
2023. You must submit your electronic copy (a single file) via the STAT6093 Blackboard website using
TurnItIn (in the Assignments folder). Your Student ID Number must appear in your electronic copy.
Make sure that your assignment fits in a single PDF document. A scanned handwritten document is not
allowed. Note that the file has to be smaller than 10 MB. If your Word file is larger than this, try
converting it to a readable PDF or saving your images (graphs, plots, …) as JPEG (instead of BMP).
It is the policy of the Department of Social Statistics that coursework should be anonymous; therefore,
only your Student ID Number should appear in your Word or PDF document. To maintain anonymity, please
do not put your name on any part of your submission. You must put your Student ID Number on the
first page of your coursework.
Note that it is not acceptable to read, or gain ideas for your coursework from, another student’s
finished work. It is very important that you carefully read the section “Academic Integrity and
Referencing” in the module outline (available on Blackboard).
Make sure that you have 4 sections called Task 1, Task 2, Task 3 and Task 4. Each subsection should
also be clearly labelled: 1a), 1b), ..., 2a), 2b), 2c), ...
The maximum number of words is 6000.
Information about coursework submission, penalties for late submission, the policy for over-length work,
the procedure for coursework extensions, feedback, and academic integrity and referencing can be found in
the module outline (available on Blackboard). It is very important that you read the module
outline carefully.
ASSIGNMENT
The target population consists of 1653 farms in Australia (file “OzFarm_Frame.xls” on blackboard).
For each farm, you have: (i) a farm ID, (ii) the variable STATE, (iii) the variable ZONE, (iv) the variable
REGION, (v) the variable INDUSTRY and (vi) the variable DSE. These variables are described as follows.
STATE:
1 New South Wales
2 Victoria
3 Queensland
4 South Australia
5 Western Australia
6 Tasmania
7 Northern Territory
ZONE:
1 Pastoral zone (inland)
2 Wheat-sheep zone (hinterland)
3 High rainfall zone (coastal)
REGION: Subdivision of State × Zone indicating a more homogeneous farming area (in terms of climate, soil
type, etc.) within a State. Three-digit code, with first digit = state, second digit = zone and third
digit = region.
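Because REGION packs STATE and ZONE into its first two digits, a small decoder can recover them. This is a minimal sketch, assuming REGION is stored as an integer such as 213 in the frame file; the helper name `decode_region` and the example code 213 are hypothetical, and only the label dictionaries come from the descriptions above.

```python
# Label dictionaries copied from the STATE and ZONE descriptions above.
STATE_NAMES = {
    1: "New South Wales", 2: "Victoria", 3: "Queensland", 4: "South Australia",
    5: "Western Australia", 6: "Tasmania", 7: "Northern Territory",
}
ZONE_NAMES = {
    1: "Pastoral zone (inland)",
    2: "Wheat-sheep zone (hinterland)",
    3: "High rainfall zone (coastal)",
}


def decode_region(region_code):
    """Hypothetical helper: split a three-digit REGION code into (state, zone, region)."""
    code = str(region_code)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected a three-digit code, got {region_code!r}")
    state, zone, region = (int(c) for c in code)
    return state, zone, region


# Hypothetical code 213: state 2, zone 1, region 3 within that state x zone.
s, z, r = decode_region(213)
print(STATE_NAMES[s], "|", ZONE_NAMES[z], "| region", r)
```

Checking that the first two digits of REGION agree with the STATE and ZONE columns is a cheap consistency check on the frame.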
INDUSTRY:
1 crops specialist farm
2 mixed livestock/crops farm
3 sheep farm
4 beef farm
5 sheep-beef farm
DSE: A measure of the size of a farm in terms of its productive capacity. DSE stands for "Dry Sheep
Equivalent" and is a linear combination of the numbers of sheep and beef cattle and the hectares of
crop area reported by the farm at the previous Agricultural Census.
TASK 1: (30%)
For this task, you need to use the size variable DSE to create strata.
1a) Create 4 strata using two different methods:
(i) the Dalenius and Hodges method (with classes of size 5000),
(ii) the cum( ) rule,
where the stratification variable is DSE. For each method, present the details of your calculation and
any analytic expressions needed.
[10%]
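The Dalenius and Hodges method (the cum √f rule) groups the stratification variable into equal-width classes, accumulates the square roots of the class frequencies, and places stratum boundaries at the class limits whose cumulative √f is closest to equally spaced targets. The sketch below covers only method (i); it assumes classes start at the minimum observed value (some texts start at 0), and the function name is hypothetical. The toy data are illustrative, not the farm frame.

```python
import math


def cum_sqrt_f_boundaries(values, class_width, n_strata):
    """Hypothetical helper implementing the Dalenius-Hodges cum sqrt(f) rule.

    Groups `values` into equal-width classes starting at min(values) (an assumption),
    accumulates sqrt(class frequency), and returns the n_strata - 1 class upper
    limits whose cumulative sqrt(f) is closest to h * total / n_strata.
    """
    lo = min(values)
    n_classes = int((max(values) - lo) // class_width) + 1
    freq = [0] * n_classes
    for v in values:
        freq[int((v - lo) // class_width)] += 1

    # Cumulative sqrt(f) over the ordered classes.
    cum, running = [], 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)

    step = cum[-1] / n_strata
    cuts = []
    for h in range(1, n_strata):
        target = h * step
        # Class whose cumulative sqrt(f) is nearest to the h-th target.
        k = min(range(n_classes), key=lambda i: abs(cum[i] - target))
        cuts.append(lo + (k + 1) * class_width)
    return cuts


# Toy data: class frequencies 1, 4, 9, 16 in classes of width 10.
toy = [0] + [12] * 4 + [25] * 9 + [35] * 16
print(cum_sqrt_f_boundaries(toy, class_width=10, n_strata=3))  # [20, 30]
```

For the task itself one would call it with the DSE values, `class_width=5000` and `n_strata=4`. With few classes, two targets can map to the same class, so the returned boundaries should be inspected before use.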
1b) Suppose you want to select a sample of size . What would be the optimal allocation
(according to the variable DSE) for the 2 methods of stratification (i) and (ii) described in 1a)? Provide
the details of your calculation and any analytic expressions needed. [5%]
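Optimal (Neyman) allocation, for equal per-unit costs, sets $n_h = n \, N_h S_h / \sum_k N_k S_k$, where $S_h$ is the within-stratum standard deviation of the allocation variable (here DSE, which is known for every farm in the frame). A minimal sketch with toy strata follows; the function names are hypothetical, and proportional allocation is included only for comparison.

```python
def neyman_allocation(n, N_h, S_h):
    """Hypothetical helper: n_h = n * N_h * S_h / sum_k N_k * S_k (unrounded)."""
    weights = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]


def proportional_allocation(n, N_h):
    """Hypothetical helper: n_h = n * N_h / N (unrounded), for comparison."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]


# Toy strata, not the farm frame.
print(neyman_allocation(100, N_h=[100, 200], S_h=[3.0, 1.0]))  # [60.0, 40.0]
print(proportional_allocation(100, N_h=[100, 200]))
```

The $n_h$ values are then rounded to integers that still sum to $n$. If some $n_h$ exceeds $N_h$, a common fix is to take that stratum completely and reallocate the remaining sample among the other strata.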
1c) Compute the variances of the mean of the variable DSE under the 2 methods of stratification (i) and
(ii), when and under optimal allocation. Provide the formulae and details of your calculation.
Which stratification method would you recommend? [5%]
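Under stratified simple random sampling without replacement, the variance of the stratified mean is $V(\bar y_{st}) = \sum_h W_h^2 (1 - f_h) S_h^2 / n_h$ with $W_h = N_h/N$ and $f_h = n_h/N_h$. Since DSE is known for the whole frame, $S_h^2$ can be computed exactly for each stratification. A minimal sketch with toy inputs; the function name is hypothetical.

```python
def var_stratified_mean(N_h, n_h, S2_h):
    """Hypothetical helper: V(ybar_st) = sum_h W_h^2 (1 - f_h) S_h^2 / n_h."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * (1 - nh / Nh) * S2 / nh
        for Nh, nh, S2 in zip(N_h, n_h, S2_h)
    )


# Toy strata: two equal strata, S_h^2 = 4, n_h = 10 (result approximately 0.18).
print(var_stratified_mean(N_h=[100, 100], n_h=[10, 10], S2_h=[4.0, 4.0]))
```

Comparing this quantity for the two sets of boundaries, each with its own optimal $n_h$, is one way to rank the stratifications.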
1d) What would be the minimal sample size needed to achieve a coefficient of variation (CV) of 5% for
the mean of the variable DSE, for the stratification obtained with the cum( ) rule and optimal
allocation? What would be the minimal sample size if you use proportional allocation instead of optimal
allocation? [10%]
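Requiring $CV = \sqrt{V}/\bar Y \le c$ fixes a target variance $V_0 = (c \bar Y)^2$. Solving the variance formula for $n$, with the finite population correction, gives $n = (\sum_h W_h S_h)^2 / (V_0 + \sum_h W_h S_h^2 / N)$ under Neyman allocation and $n = \sum_h W_h S_h^2 / (V_0 + \sum_h W_h S_h^2 / N)$ under proportional allocation. A minimal sketch; the function name and toy numbers are hypothetical, and the Neyman formula assumes no stratum is over-allocated ($n_h \le N_h$).

```python
import math


def min_n_for_cv(N_h, S_h, ybar, cv, allocation="neyman"):
    """Hypothetical helper: smallest n with CV(ybar_st) <= cv (fpc included)."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    V0 = (cv * ybar) ** 2                      # target variance
    sum_WS2 = sum(w * s ** 2 for w, s in zip(W, S_h))
    if allocation == "neyman":
        numerator = sum(w * s for w, s in zip(W, S_h)) ** 2
    elif allocation == "proportional":
        numerator = sum_WS2
    else:
        raise ValueError("allocation must be 'neyman' or 'proportional'")
    return math.ceil(numerator / (V0 + sum_WS2 / N))


# Toy strata (not the farm frame): population mean 100, target CV 5%.
print(min_n_for_cv([500, 500], [10.0, 30.0], ybar=100.0, cv=0.05))  # 16
print(min_n_for_cv([500, 500], [10.0, 30.0], ybar=100.0, cv=0.05,
                   allocation="proportional"))                        # 20
```

Rounding up with `math.ceil` ensures the CV target is met rather than narrowly missed.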
TASK 2: (40%)
A stratified sample of units has been selected from the 1653 farms in Australia. The
stratification is given by the variable ZONE (3 strata). The sample data can be found in the file
“OzFarm_Sample.xls” on blackboard. You will see that the sample data contains additional variables:
TCC Total Cash Costs of farm over financial year (A$)
TCR Total Cash Receipts of farm over financial year (A$)
EQUITY Value of farm assets less farm debt at end of financial year (A$)
DEBT Farm debt at end of financial year (A$)
2a) Which allocation has been used? Explain your answer. [3%]
2b) Estimate the population mean of TCC. Provide the variance estimates and 95% confidence
intervals. Provide the formulae used and the details of your calculation. [5%]
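The stratified estimator is $\bar y_{st} = \sum_h W_h \bar y_h$ with estimated variance $v(\bar y_{st}) = \sum_h W_h^2 (1 - n_h/N_h) s_h^2 / n_h$, and a normal-theory 95% interval is $\bar y_{st} \pm 1.96 \sqrt{v}$. A minimal sketch, assuming SRS without replacement within strata; the function name and toy data are hypothetical.

```python
import math
import statistics


def stratified_mean_ci(samples, N_h, z=1.96):
    """Hypothetical helper: stratified mean, its variance estimate and a CI.

    samples: per-stratum lists of sampled y values; N_h: stratum population sizes.
    """
    N = sum(N_h)
    est = var = 0.0
    for y, Nh in zip(samples, N_h):
        nh = len(y)
        Wh = Nh / N
        est += Wh * statistics.mean(y)
        # v(ybar_st) = sum_h W_h^2 (1 - n_h/N_h) s_h^2 / n_h, s_h^2 with divisor n_h - 1
        var += Wh ** 2 * (1 - nh / Nh) * statistics.variance(y) / nh
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)


# Toy data: estimate 3.5, variance 0.15.
print(stratified_mean_ci([[1, 2, 3], [4, 5, 6]], N_h=[30, 30]))
```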
2c) Estimate the population proportion of farms with DEBT < EQUITY. Provide the variance estimates
and 95% confidence intervals. Provide the formulae used and the details of your calculation.
[5%]
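A proportion is the mean of a 0/1 indicator, so $\hat p_{st} = \sum_h W_h \hat p_h$ and $v(\hat p_{st}) = \sum_h W_h^2 (1 - n_h/N_h) \hat p_h (1 - \hat p_h)/(n_h - 1)$. A minimal sketch with toy indicators; the function name is hypothetical, and the indicator would be built as `1 if debt < equity else 0` for each sampled farm.

```python
import math


def stratified_proportion_ci(indicators, N_h, z=1.96):
    """Hypothetical helper: stratified proportion, variance estimate and CI."""
    N = sum(N_h)
    p_st = var = 0.0
    for d, Nh in zip(indicators, N_h):
        nh = len(d)
        ph = sum(d) / nh
        Wh = Nh / N
        p_st += Wh * ph
        # Unbiased within-stratum variance term for a 0/1 variable.
        var += Wh ** 2 * (1 - nh / Nh) * ph * (1 - ph) / (nh - 1)
    half = z * math.sqrt(var)
    return p_st, var, (p_st - half, p_st + half)


# Toy indicators: estimate 0.5, variance 0.028125.
print(stratified_proportion_ci([[1, 0, 1, 1], [0, 0, 1, 0]], N_h=[40, 40]))
```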
2d) Estimate the population mean of TCC using the separate ratio estimator, with DSE as auxiliary
variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and
the details of your calculation. [7%]
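The separate ratio estimator applies a ratio within each stratum, $\bar y_{RS} = \sum_h W_h (\bar y_h / \bar x_h) \bar X_h$, which requires the population stratum means $\bar X_h$ of DSE (available from the frame). Its usual large-sample variance estimate is $\sum_h W_h^2 (1 - f_h) s_{dh}^2 / n_h$, where $s_{dh}^2$ is the sample variance of the residuals $y_i - \hat R_h x_i$. A minimal sketch under that approximation; the function name and toy data are hypothetical.

```python
import math
import statistics


def separate_ratio_mean(ys, xs, Xbar_h, N_h, z=1.96):
    """Hypothetical helper: separate ratio estimator of the mean of y.

    ys, xs: per-stratum sampled y and x values; Xbar_h: known stratum means of x.
    """
    N = sum(N_h)
    est = var = 0.0
    for y, x, Xh, Nh in zip(ys, xs, Xbar_h, N_h):
        nh = len(y)
        Wh = Nh / N
        Rh = sum(y) / sum(x)                 # ybar_h / xbar_h
        est += Wh * Rh * Xh
        d = [yi - Rh * xi for yi, xi in zip(y, x)]   # residuals around R_h
        var += Wh ** 2 * (1 - nh / Nh) * statistics.variance(d) / nh
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)


# Toy data where y = R_h * x exactly in each stratum: estimate 40.0, variance 0.0.
print(separate_ratio_mean([[2, 4, 6], [3, 6]], [[1, 2, 3], [1, 2]],
                          Xbar_h=[10.0, 20.0], N_h=[50, 50]))
```

Because it fits a separate ratio per stratum, this estimator can be biased when some $n_h$ are small, which is worth keeping in mind when comparing it with the combined version.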
2e) Estimate the population mean of TCC using the combined ratio estimator, with DSE as auxiliary
variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and
the details of your calculation. Compare your results with 2d). [10%]
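The combined ratio estimator forms one ratio from the stratified means, $\bar y_{RC} = (\bar y_{st} / \bar x_{st}) \bar X$, so it needs only the overall population mean $\bar X$ of DSE. Its usual large-sample variance estimate is $\sum_h W_h^2 (1 - f_h) s_{dh}^2 / n_h$, where $s_{dh}^2$ is the within-stratum sample variance of $y_i - \hat R_C x_i$. A minimal sketch under that approximation; the function name and toy data are hypothetical.

```python
import math
import statistics


def combined_ratio_mean(ys, xs, Xbar, N_h, z=1.96):
    """Hypothetical helper: combined ratio estimator of the mean of y.

    ys, xs: per-stratum sampled y and x values; Xbar: known population mean of x.
    """
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    ybar_st = sum(w * statistics.mean(y) for w, y in zip(W, ys))
    xbar_st = sum(w * statistics.mean(x) for w, x in zip(W, xs))
    Rc = ybar_st / xbar_st
    est = Rc * Xbar

    var = 0.0
    for y, x, w, Nh in zip(ys, xs, W, N_h):
        nh = len(y)
        # Residuals around the single combined ratio; variance() centres them within the stratum.
        d = [yi - Rc * xi for yi, xi in zip(y, x)]
        var += w ** 2 * (1 - nh / Nh) * statistics.variance(d) / nh
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)


# Toy data with y = 2x everywhere: estimate 10.0, variance 0.0.
print(combined_ratio_mean([[2, 4], [6, 8, 10]], [[1, 2], [3, 4, 5]], Xbar=5.0, N_h=[20, 20]))
```

With a single stratum the combined and separate estimators coincide, which gives a quick sanity check when comparing the two.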
2f) Estimate the population domain means of TCR for the 5 types of INDUSTRY. Explain why you
should use the combined ratio estimator. Compute the variance estimates and 95% confidence intervals.
Provide the formulae used and the details of your calculation for the first industry. Your final estimates,
variance estimates and confidence intervals should be given in a table. [10%]
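A domain mean can be written as a ratio of two stratified means: with $u_i = y_i \cdot 1(i \in d)$ and $a_i = 1(i \in d)$, $\hat{\bar Y}_d = \bar u_{st} / \bar a_{st}$. This is a combined ratio estimator whose denominator is estimated, so its linearised variance is divided by $\bar a_{st}^2$. A minimal sketch; the function name and toy data are hypothetical, and in the task the domain flag would be `INDUSTRY == k`.

```python
import math
import statistics


def domain_mean_ratio(ys, in_domain, N_h, z=1.96):
    """Hypothetical helper: domain mean as ubar_st / abar_st with linearised variance."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    u = [[yi if di else 0.0 for yi, di in zip(y, d)] for y, d in zip(ys, in_domain)]
    a = [[1.0 if di else 0.0 for di in d] for d in in_domain]
    ubar = sum(w * statistics.mean(uh) for w, uh in zip(W, u))
    abar = sum(w * statistics.mean(ah) for w, ah in zip(W, a))
    R = ubar / abar

    var = 0.0
    for uh, ah, w, Nh in zip(u, a, W, N_h):
        nh = len(uh)
        e = [ui - R * ai for ui, ai in zip(uh, ah)]  # linearised residuals
        var += w ** 2 * (1 - nh / Nh) * statistics.variance(e) / nh
    var /= abar ** 2
    half = z * math.sqrt(var)
    return R, var, (R - half, R + half)


# Toy data: the domain units have y = 10, 30, 50, so the estimate is about 30.
print(domain_mean_ratio([[10, 20, 30], [40, 50]],
                        [[True, False, True], [False, True]], N_h=[30, 20]))
```

When every unit belongs to the domain, the estimator reduces to the ordinary stratified mean and its variance estimate.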
TASK 3 (15%)
The sample dataset “Labor.xls” contains the following variables:
Cluster: cluster number
Person: person number
age: age of person
agecat: age category
1: 19 years and under
2: 20-24
3: 25-34
4: 35-64
5: 65 years and over
race: 1 for non-black and 2 for black
sex: 1 for male and 2 for female
HourPerWk: usual number of hours worked per week
Wkly Wage: usual amount of weekly wages (in 1976 US$)
We suppose that these sample data have been selected with a two-stage sampling design. For both
stages, simple random sampling has been used. The file “ClusterSize.xls” contains the (population)
sizes of the clusters. We suppose that there are 2 000 000 individuals and 30 000 clusters in the
population. Electronic copies of “Labor.xls” and “ClusterSize.xls” are
available on the module Blackboard site.
3a) Estimate the population mean of weekly wage (per individual). [5%]
3b) Compute the 95% confidence interval of the estimate found in 3a). [10%]
For 3a) and 3b): Provide details of your calculation. You should describe and justify the approach you
used. You should provide the analytic expressions of the estimator and variance estimator used. You
should also describe the key steps of your calculation.
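One standard approach for two-stage SRS is the unbiased estimator of the total, $\hat Y = (N/n) \sum_{i \in s} M_i \bar y_i$, divided by the known population size $M_0$. Its variance estimate is $v(\hat Y) = N^2 (1 - n/N) s_t^2 / n + (N/n) \sum_{i \in s} M_i^2 (1 - m_i/M_i) s_i^2 / m_i$, where $s_t^2$ is the sample variance of the estimated cluster totals $M_i \bar y_i$. A minimal sketch; the function name and toy data are hypothetical, and treating clusters with a single sampled person as contributing zero within-cluster variance is an assumption.

```python
import math
import statistics


def two_stage_mean(clusters, M, n_pop_clusters, pop_size, z=1.96):
    """Hypothetical helper: unbiased two-stage (SRS/SRS) estimate of a mean per individual.

    clusters: per sampled cluster, the y values of the sampled persons.
    M: population sizes M_i of those sampled clusters, in the same order.
    """
    n = len(clusters)
    N = n_pop_clusters
    t_i = [Mi * statistics.mean(y) for y, Mi in zip(clusters, M)]  # estimated cluster totals
    t_hat = N / n * sum(t_i)

    v_between = N ** 2 * (1 - n / N) * statistics.variance(t_i) / n
    within = 0.0
    for y, Mi in zip(clusters, M):
        mi = len(y)
        # Assumption: a cluster with one sampled person contributes 0 (s_i^2 not estimable).
        s2_i = statistics.variance(y) if mi > 1 else 0.0
        within += Mi ** 2 * (1 - mi / Mi) * s2_i / mi
    v_t = v_between + N / n * within

    est = t_hat / pop_size
    var = v_t / pop_size ** 2
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)


# Toy data (not Labor.xls): 2 of 10 clusters sampled, 40 people in total.
# Estimate 2.5, variance 0.25.
print(two_stage_mean([[1, 3], [2, 4]], M=[4, 4], n_pop_clusters=10, pop_size=40))
```

A ratio-type alternative, $\sum_{i \in s} M_i \bar y_i / \sum_{i \in s} M_i$, does not use the known population size; choosing between the two is part of justifying the approach.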
TASK 4 (15%)
Suppose that we use a sampling design without replacement to select a sample $s$ of size $n$ from a
population of size $N$. Let $\pi_i$ and $\pi_{ij}$ denote the first- and second-order inclusion
probabilities of the sampling design used. We suppose that $\pi_i > 0$ and $\pi_{ij} > 0$ for all
$i$ and $j$. Let $y_i$ denote the value of a variable of interest for the individual $i$ of the sample $s$.
Consider the following estimator:
For which population parameter is this estimator an unbiased estimator? Justify your answer.
[15%]
Dr. Yves Berger, December 2022