[Print: May 19, 2022]

Statistics Project Directions

Along with this set of directions you will find a .csv file containing three columns of numbers.
The name of the file is one of CUcccxx.csv, Gcccxx.csv, or Bcccxx.csv. The first letter(s) of the
file name indicate the type of distribution contained in the first column of the table (CU means
continuous uniform, G means gamma, and B means binomial). The characters ccc stand for some
number of other characters, and the digits xx (or in some cases only x) are an identification
number for the file.

Column A. When the file is opened with Excel (or any other spreadsheet that can read .csv
files) the first column should contain these items:
Distribution Name
Distribution Variance
<a blank cell>
data point 1
data point 2
...
data point n
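
As a rough illustration of this layout, the sketch below splits one column into its three leading cells and its data points. The function name parse_column and the tiny in-memory sample are made up for the example; they are not part of the project files. The sketch assumes that shorter columns are padded with empty cells, which is how spreadsheets usually export columns of unequal length.

```python
import csv
import io

def parse_column(rows, col):
    """Split one column into its three leading cells and its numeric data."""
    # A row may be shorter than expected, so treat a missing cell as empty.
    cells = [row[col].strip() if col < len(row) else "" for row in rows]
    header = cells[:3]  # name, variance (or blank), blank
    # Skip empty cells: the three columns do not all have the same length.
    data = [float(c) for c in cells[3:] if c]
    return header, data

# Made-up sample in the layout described above (not real project data).
sample = io.StringIO(
    "Gamma,Normal,?????\n"
    "4.5,,\n"
    ",,\n"
    "1.2,3.9,0.7\n"
    "2.8,4.1,\n"
)
rows = list(csv.reader(sample))
header_a, data_a = parse_column(rows, 0)
header_c, data_c = parse_column(rows, 2)
```

For a real file, the rows would come from something like `with open(path, newline="") as f: rows = list(csv.reader(f))`, where path is the location of your own .csv file.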

Column A Tasks.
A.1 For the type of distribution presented in the first column, plot a histogram (normalized to
unit area) of the data.
A.2 Estimate the mean and the variance for the column A distribution. For comparison, the
exact variance is given in the cell labeled (here) Distribution Variance.
A.3 The distribution found in the first column depends on two parameters as follows:
    Continuous Uniform:     a and b (the endpoints of the interval)
    Gamma Distribution:     α and β
    Binomial Distribution:  n and p
Estimate the two parameters associated with the distribution given in Column A and determine
a 96% confidence interval around each parameter.
A.4 How large a data set is needed to get 96% confidence intervals of width 0.01 or smaller around
the two parameters? (Assume X̄ and S² do not change significantly with N when N is large.)
A.5 Plot a graph of the density function for the distribution in column A using the estimated
parameter values determined in part A.3. Compare this graph to your normalized histogram.
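
One possible way to check hand computations for A.1 through A.3 is sketched below. It is only a sketch under stated assumptions: the function names are invented for the example, the parameter estimates use the method of moments (your course may expect a different estimator), and the gamma case assumes the parametrization with mean αβ and variance αβ². Confidence intervals are not covered here.

```python
import math
import statistics

def normalized_histogram(data, bins):
    """Return (edges, heights) with heights scaled so the total bar area is 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins  # assumes the data are not all identical
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    # Dividing by n * width turns counts into densities (area sums to 1).
    heights = [c / (len(data) * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, heights

def moment_estimates(data, family):
    """Method-of-moments estimates of the two parameters (an assumed approach)."""
    m = statistics.fmean(data)     # sample mean
    v = statistics.variance(data)  # sample variance S^2 (divides by n - 1)
    if family == "uniform":
        # mean = (a + b) / 2 and variance = (b - a)^2 / 12
        half = math.sqrt(3 * v)
        return m - half, m + half
    if family == "gamma":
        # ASSUMPTION: mean = alpha * beta and variance = alpha * beta^2
        return m * m / v, v / m
    if family == "binomial":
        # mean = n * p and variance = n * p * (1 - p)
        p = 1 - v / m
        return m / p, p
    raise ValueError("family must be 'uniform', 'gamma', or 'binomial'")
```

For binomial data, which take only integer values, equal-width bins can straddle the integers unevenly, so a bar per integer value may give a cleaner picture than this generic histogram.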

Column B. The second column is similar to the first and looks like this:
Normal
<a blank cell>
<a blank cell>
data point 1
data point 2
...
data point 10000

This column contains 10,000 values from some Normal distribution.

Column B Tasks.
B.1 From the B column select two non-overlapping chunks [1] of consecutive data points. The first
chunk should contain a large number of data points. The second chunk should contain exactly
25 data points. Estimate the mean and variance of this normal distribution using your first
(large) chunk of data.
B.2 Compute 98% confidence intervals around each of the parameters µ and σ based on the large
chunk of data.
B.3 Test the claim µ ≥ 4 using the second (small) chunk of data in a significance test. State the
null and research hypotheses. Describe the location of the critical region. Give the P-value
and the Z, T, or χ²-statistic as appropriate. Find the region where the power of the test exceeds
0.95.
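
The two computations at the heart of B.2 and B.3 can be sketched as follows, for checking work done by hand. The function names are invented for the example. The interval for µ assumes the large chunk is big enough that a standard normal quantile is adequate. The small-chunk statistic is a T statistic with n − 1 degrees of freedom, and its P-value still has to be looked up with TINV or the tables. An interval for σ is not shown.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_interval_for_mean(chunk, confidence):
    """Large-sample confidence interval for mu, using the sample standard deviation."""
    n = len(chunk)
    xbar, s = fmean(chunk), stdev(chunk)
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def t_statistic(chunk, mu0):
    """Return (T, degrees of freedom) for a one-sample test about the mean."""
    n = len(chunk)
    t = (fmean(chunk) - mu0) / (stdev(chunk) / math.sqrt(n))
    return t, n - 1
```

With the column B values in a list called column_b (a hypothetical name), one non-overlapping choice would be `column_b[:9000]` for the large chunk and `column_b[9000:9025]` for the small one. The calls would then be `z_interval_for_mean(large, 0.98)` and `t_statistic(small, 4.0)`.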

Column C. The third column is similar to the second column and looks like this:
?????
<a blank cell>
<a blank cell>
data point 1
data point 2
...
data point 1000

This column contains random values taken from a mystery distribution. The distribution is known
to be either gamma or normal. Read section 15.2 in the textbook concerning a significance test
called The Goodness of Fit test or consult the Statistical Goodies handout.
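
As a rough idea of what such a test computes, the sketch below builds a χ² goodness-of-fit statistic for the normal case only. It is an illustration under assumptions, not the textbook's procedure: it uses k bins of equal probability under the fitted normal, and it takes the degrees of freedom as k − 1 − 2 because two parameters are estimated from the data. The function name and the simulated data are made up for the example. The gamma case needs the gamma distribution function, which is not in the Python standard library, so it is not shown.

```python
import bisect
import random
from statistics import NormalDist, fmean, stdev

def normal_gof_statistic(data, k=10):
    """Chi-square goodness-of-fit statistic against a fitted normal distribution."""
    fitted = NormalDist(fmean(data), stdev(data))
    # Interior bin edges chosen so each bin has probability 1/k under the fit.
    edges = [fitted.inv_cdf(i / k) for i in range(1, k)]
    observed = [0] * k
    for x in data:
        observed[bisect.bisect_right(edges, x)] += 1
    expected = len(data) / k  # same expected count in every bin
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 3  # ASSUMPTION: df = k - 1 - (2 estimated parameters)

# Simulated data for demonstration only (not the project's column C).
random.seed(0)
simulated = [random.gauss(5.0, 2.0) for _ in range(1000)]
stat, df = normal_gof_statistic(simulated)
```

The P-value is the upper-tail area of the χ² distribution with df degrees of freedom beyond stat, which can be read from the tables.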

Column C Tasks.
C.1 Create a raw histogram of your data. On this basis make a guess at the distribution.
C.2 Assuming the distribution is normal, estimate the parameters that it would have. Use this
information to perform a Goodness of Fit test on the data under the assumption that the
data is normally distributed. State the research and null hypotheses. Describe the location of
the critical region. Give the P-value and the Z, T, or χ²-statistic as appropriate. State your
conclusions using 0.10 as the rejection level.
C.3 Assuming the distribution is gamma, estimate the parameters that it would have. Use this
information to perform a Goodness of Fit test on the data under the assumption that the
data is gamma distributed. State the research and null hypotheses. Describe the location of
the critical region. Give the P-value and the Z, T, or χ²-statistic as appropriate. State your
conclusions using 0.10 as the rejection/acceptance level.
C.4 If some P-values from steps C.2 and C.3 are above 90%, assume that the fit with the largest
of these P-values represents the distribution of the data. This is your candidate distribution.
If all of the P-values from steps C.2 and C.3 are below 90%, locate the analysis that had the
largest P-value and perform a goodness of fit hypothesis test on that data to resolve the
question of its distribution.
If you cannot get a positive resolution from either the significance or the hypothesis test,
report your findings and go on to step C.5; the distribution you found with the best
goodness of fit P-value will stand in for your candidate distribution.
C.5 Based on the Goodness of Fit test results, name the mystery distribution. Does this agree
with your initial guess? Plot a graph of the actual density and compare it with your Part
C.1 histogram normalized to unit area.

[1] A technical term... don't try to understand this!

The Work.

Divide up the responsibility for solving these problems among the team members. In your report,
explain who was responsible for which aspects of the work. In the end, it is important for everyone
to understand how the problems were solved — this will make the task of preparing for the final
exam easier.

Many spreadsheets contain packages that can do a lot of these calculations for you. Avoid using
these since a computer of any kind is unavailable for the final exam – it is best to learn how to
perform these computations from the basic ideas. You may use spreadsheet functions, such as
NORMINV, NORMDIST, CHIINV, TINV, AVERAGE, STDEV, (to name a few), in lieu of the tables in the
back of the book. Make sure you know what these functions are reporting. Do not use Excel data
analysis functions to determine histograms, confidence intervals, hypothesis/significance tests, and
so on. You should construct histograms and confidence intervals, and perform hypothesis tests,
from scratch.
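
To make concrete what a few of these functions report, here is a small sketch using rough Python standard-library counterparts. The pairing with the spreadsheet functions is given as an aid to understanding, and the sample values are made up. CHIINV and TINV have no standard-library counterpart, so they are left out.

```python
from statistics import NormalDist, fmean, stdev

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Like NORMINV(0.975, 0, 1): the value with 97.5% of the area to its left.
z_975 = std_normal.inv_cdf(0.975)

# Like NORMDIST(1, 0, 1, TRUE): the area to the left of 1.
p_below = std_normal.cdf(1.0)

values = [2.0, 4.0, 4.0, 6.0]  # made-up numbers for illustration
average = fmean(values)        # like AVERAGE
sample_sd = stdev(values)      # like STDEV: divides by n - 1, not n
```

Here z_975 is about 1.96 and p_below is about 0.8413, the same numbers the normal table in the back of the book gives.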

The Report.

Write up (in a few pages) who performed the analysis and how the data was analyzed. Describe
any estimators, equations, theorems, etc. used in performing the confidence interval estimates.
Explain what you are doing for a significance test and interpret the results. Include any remarks
about the data (or the results) that you feel are pertinent.

When you are ready to submit, print a paper copy of your report, check the copy one last time
for errors, and submit.

It isn’t necessary (or desired) to turn in pages and pages of spreadsheet computations. The results
and your interpretations of them must suffice. Use graphs, charts, and summary tables as needed
to support your work. Above all, make this report readable! Omit needless words; be succinct!
