2022 Final
equal to zero.
iii) Find the variance of Y and hence show that Y is the consistent estimator of $\lambda$.
a) Define chi-square variate. Show that the variance of chi-square distribution is greater than mean. 5
b) Define t-statistic. If $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, show that $\sqrt{n}(\bar{X} - \mu)/s$ follows t distribution with (n-1) degrees of freedom, where $\bar{X}$ is the sample mean and $s^2$ is the sample variance. 5
10. a) Define CBR, TFR, and ASFR indicating their merits and demerits as a measure of fertility. 3
b) Define NRR. How does it differ from GRR? Assuming the current TFR and the proportion of female births in Japan to be 4.09 and 0.46428, respectively, calculate GRR and interpret your result. Also, interpret NRR<1, NRR=1, and NRR>1. 3
c) What is meant by doubling time of population? The enumerated population of Bangladesh was reported to be 142319 (in thousands). If the population of Bangladesh is assumed to grow at a constant rate of 1.34%, calculate the time required for this population to be doubled by using exponential growth rate. 2
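The arithmetic in parts (b) and (c) of Question 10 can be sketched as follows. This is a worked outline rather than a model answer. It assumes the usual textbook relations GRR = TFR × (proportion of female births) and exponential growth $P_t = P_0 e^{rt}$, with $r = 0.0134$ per year.

```latex
\begin{align*}
\text{GRR} &= \text{TFR} \times \text{(proportion of female births)}
            = 4.09 \times 0.46428 \approx 1.90, \\
2P_0 &= P_0 e^{rt}
  \;\Longrightarrow\;
  t = \frac{\ln 2}{r} = \frac{0.6931}{0.0134} \approx 51.7 \text{ years}.
\end{align*}
```

Under these assumptions the doubling time does not depend on the starting population of 142319 thousand; only the growth rate enters.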
THE END.
Fourth Year BS Final Practical Examination 2021
Subject: Statistics
Course: Stat H-409: Statistical Computing VII (Multivariate Analysis and Experimental Design)
Time: 05 Hours
Total Marks: 30
3. Satellite applications motivated the development of a silver-zinc battery. Table 1 contains failure data collected to characterize the performance of the battery during its life cycle. Use these data to
Source: Selected from S. Sidik, H. Leibecki, and J. Bozek, Failure of Silver-Zinc Cells with Competing Failure Modes-Preliminary Data Analysis, NASA Technical Memorandum 81556 (Cleveland: Lewis Research Center, 1980).
Group-B (Experimental Design)
1. A Latin square was laid out to test the effects of fertilizers on the yield of potatoes. Relevant data are given below. 10
                Columns
        A 449   B 444   C 401   D 299   E 292
        C 375   D 323   E 264   A 415   B 463
Rows    B 425   C 393   D 353   E 278   A 404
        D 371   E 241   A 441   B 410   C 392
        E 258   A 430   B 450   C 385   D 347
a. State the necessary assumptions for analysis of variance and check whether the assumptions are violated for the data.
b. Carry out analysis of variance for the data.
c. Carry out analysis of variance pretending that there is no information about column in the data. That is, you are left with the yields corresponding to the treatments A, B, C, D, E and the five rows (block).
d. You are supposed to have information on neither row nor column. Now, carry out analysis of variance for the yields corresponding to the five treatments.
e. Compare the results obtained from Questions b, c, and d.
b 18 ab 30 b 23
(1) 28 a 32 ab 29
36 (1) 25 32
a. Draw interaction plot and interpret possible interaction effect from the plot.
b. Conduct analysis of variance and comment on your findings.
The End.
Fourth Year BS (Honors) Examination 2021
Department of Statistics, University of Dhaka
Course No.: Stat H-410 (Statistical Computing VIII: Survival Analysis and Time Series Analysis)
Full Marks: 30 Time: 5 Hours
Group A: Survival Analysis
Answer ALL questions. Numerals in the right margin indicate marks.
N.B.: You must write all required R codes and only relevant output.
1. Convert the data in time series using command, ts(data, start=1871). Plot the data and comment on stationarity.
2. Use Augmented Dickey-Fuller (ADF) test and verify your answer in question (1).
3. If your answer in (2) is "non-stationary", then use appropriate transformation to make the series stationary. Use ADF test to show that the transformed data is "stationary".
4. Draw ACF and PACF, and suggest appropriate model(s). Write down the model in ARIMA(p,d,q) notation.
5. If you have suggested more than one model in (4), fit each of them, compare AIC and select the best one as your final model.
6. Write down the best fitted model in terms of $Y_t$, the untransformed data, and $w_t$, the white noise at time point t.
7. Perform model diagnostic and comment whether your suggested model is adequate.
8. Forecast values of the series for the next 5 time points.
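Useful Codes
library(tseries)                          # provides adf.test()
data = scan("path")                       # read the series from file
plot(data); acf(data); adf.test(data)     # plot, ACF and ADF test
model = arima(data, order = c(p, d, q))   # fit ARIMA(p,d,q)
tsdiag(model)                             # residual diagnostics
predict(model, n.ahead = n)               # forecast n steps ahead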
The End.
Fourth Year B.S. (Hons.) Practical Examination, 2021
Subject: Statistics
Course: Stat H-411 Statistical Computing IX (Econometrics and Generalized Linear
Models)
Total marks: 30 Time: 5 hours
Group A (Econometrics)
1. You are given a data set named "Bangladesh" in the folder named "Econometrics". Draw rvf plot and make a comment about the assumptions of the regression model. Here gdp will be considered as dependent variable and carbon (carbon dioxide emission) will be considered as independent variable. (3)
2. You are given a data set named "LFP data". Fit the logit model and probit model. Interpret the results. In this data, hhouse indicates whether the respondent has a house or not: hhouse = 1 means the respondent has a house and hhouse = 0 means the respondent does not have a house. Here age means age of the respondent. Here gender = 1 means the respondent is a male and gender = 0 means the respondent is a female. LFP = 1 means the respondent is doing work. LFP = 0 means the respondent is not doing work. ( )
3. You are given a data set named "regression data". Here Y is considered as a dependent variable and the others (W, R, L, and k) are considered as independent variables. Run regression and write down the fitted model. Interpret the coefficients. (5)
Check the data for the following:
a) Multicollinearity,
b) Autocorrelation,
c) Heteroscedasticity,
d) Specification error.
e) And comment on the goodness of fit of the model.
Group B (Generalized Linear Models)
Answer all questions. Marks are given in the right margin.
1. On January 28, 1986 the space shuttle Challenger, shortly after its launch, exploded and crashed into the Atlantic Ocean, killing seven astronauts on board. The cause of the accident was traced to failure of the O-rings at low temperatures on the solid rocket booster. Data were collected on two variables: Temperature at launch in degrees Celsius (temp) and whether at least one of the O-rings suffered failure (failure) (1=yes, 0=no). 7.5
(a) Write down the appropriate generalized linear model (GLM) fitted, along with
the assumptions, and identify clearly all quantities. Also write R-code and link
function.
(b) On the launch day, the temperature was -0.6° C. Calculate the estimated
probability of O-rings failure at this temperature.
(c) Interpret the estimated effect of temperature on the odds of failure.
(d) Implement a change of deviance test to assess the significance of temp at the 5%
significance level.
(e) Describe the test associated with the p-value for temp in the output. How does
the conclusion compare to the answer from (d)?
2. Consider the data set medpar (Hilbe, 2011) and our main goal is to investigate the effect of died (1: died; 0: alive), hmo (1: patient belongs to a health maintenance organization; 0: private pay) and white (1: white; 0: non-white) on the los (length-of-stay spent in the hospital [number of days]). 2.5
(a) Write down the appropriate GLM with the necessary explanation of each term.
(b) Write down the R-code and link function for fitting an appropriate GLM.
(c) Predict the average length of stay for the person who is non-white, alive and a member of the health maintenance organization.
(d) Conduct an overall significance test (deviance test) for the model mentioning null and alternative hypotheses at 5% level.
(e) Compute incidence rate ratios for died and white. Interpret the results.
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-401: Multivariate Analysis
Total Marks: 70
Time: 3 Hours
a) What do you understand by multivariate analysis? Give an example. Also, discuss the features of multivariate data analysis. (5)
b) Let X (with p components) be distributed according to $N(\mu, \Sigma)$. Then show that $Y = AX$ is distributed as $N(A\mu, A\Sigma A')$ for A non-singular. (5)
a) What do you mean by distribution of a quadratic form? If $X \sim N_p(\mu, \Sigma)$, then show that $E(X'AX) = \operatorname{tr}(A\Sigma) + \mu'A\mu$. (4)
b) Show that if X is distributed as $X \sim N_p(\mu, I)$, then the necessary and sufficient condition that the quadratic form $X'AX$ will be distributed as non-central $\chi^2$ with r degrees of freedom and non-centrality parameter $\lambda = \mu'A\mu$ is that A is an idempotent matrix of rank r ($\le p$). (6)
a) Show how you can construct the multivariate confidence regions and simultaneous comparisons of component means. (5)
b) Discuss how you would do the test of hypothesis and find the confidence regions of a population mean vector when the sample size is large. (5)
6. a) Define generalized variance. How could you find the first two moments of generalized variance? (6)
b) Compute the maximum likelihood estimator of individual cell probabilities of multinomial distribution. (4)
9. Discuss a classical linear regression model with its application in real life. Also, state the Gauss least squares theorem with its uses.
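b) Let $Y = Z\beta + \varepsilon$, where Z has full rank $r+1$ and $\varepsilon$ is distributed as $N_n(0, \sigma^2 I)$. Then show that the maximum likelihood estimator of $\beta$ is the same as the least squares estimator $\hat{\beta}$. Moreover, $\hat{\beta} = (Z'Z)^{-1}Z'Y$ is distributed as $N_{r+1}(\beta, \sigma^2 (Z'Z)^{-1})$ and is distributed independently of the residuals $\hat{\varepsilon} = Y - Z\hat{\beta}$. Further, show that $n\hat{\sigma}^2 = \hat{\varepsilon}'\hat{\varepsilon}$ is distributed as $\sigma^2 \chi^2_{n-r-1}$, where $\hat{\sigma}^2$ is the maximum likelihood estimator of $\sigma^2$. (5)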
10. a) Discuss the general diagnostic purposes of residuals graphically. (5)
b) Discuss the multivariate multiple regression briefly. (5)
THE END.
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H 402 (Time Series Analysis)
Time: 3 hours
Full Marks: 70
a) How does time series data set differ from cross-sectional data set? Explain with a real life example. (4)
b) Introduce weakly and strictly stationary time series with example. (4)
c) Briefly discuss the importance of studying autocorrelation function (ACF) and partial (2)
autocorrelation function (PACF) in time series data analysis.
a) Discuss the importance and effect of differencing in time series analysis. Suppose $Y_t = \beta_0 + \beta_1 t + X_t$, here $X_t$ is a zero-mean stationary series with autocovariance function $\gamma_k$ and $\beta$'s are constants. Show that $Y_t$ is not stationary, but the series $Z_t = Y_t - Y_{t-1}$ is stationary. (6)
b) Show that, white noise process is strictly stationary. (4)
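For the differencing question in part (a) above, the key steps can be sketched as below. The sketch takes the model as $Y_t = \beta_0 + \beta_1 t + X_t$, assumes $\beta_1 \neq 0$, and writes $\gamma_k = \operatorname{Cov}(X_t, X_{t+k})$. It is an outline of one possible argument, not a full proof.

```latex
\begin{align*}
E(Y_t) &= \beta_0 + \beta_1 t
  \quad \text{(depends on } t \text{, so } Y_t \text{ is not stationary)}, \\
Z_t &= Y_t - Y_{t-1} = \beta_1 + X_t - X_{t-1}, \\
E(Z_t) &= \beta_1, \\
\operatorname{Cov}(Z_t, Z_{t+k})
  &= \operatorname{Cov}(X_t - X_{t-1},\; X_{t+k} - X_{t+k-1})
   = 2\gamma_k - \gamma_{k-1} - \gamma_{k+1}.
\end{align*}
```

Neither the mean nor the autocovariance of $Z_t$ involves $t$, which is the condition for weak stationarity.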
a) Write down the autoregressive process of order 1: AR(1) process. Find lag-k autocorrelation function (ACF) for AR(1) process. (5)
b) Find mean and autocorrelation function for second order moving average process: MA(2). (5)
a) Discuss stationarity and invertibility condition for AR(1) and MA(1) process, respectively. (6)
b) Show that MA(1) model can be expressed as AR($\infty$) model. (4)
8. a) Discuss several techniques of diagnostic checking for time series models, using plots of residuals. (6)
b) Write a short note on "Ljung-Box Test". (4)
a) Considering AR(1) process with a nonzero mean, find out $\hat{Y}_t(l)$, i.e., l-step ahead forecasted value of Y. Show that the corresponding error variance increases as the lead l increases. (6)
The maximum likelihood estimation results obtained from AR(1) model fitted to a time series data are partially shown below. (4)
Coefficients:
         ar1  intercept
      0.5705    74.3293
s.e.  0.1435     1.9151
[Figure: time series plot of CO2 against Time, 1994 to 1998]
(4)
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-403 (Design and Analysis of Experiment)
Time: 3 hours Full Marks: 70
Answer any seven (7) questions
1. a) What is meant by design of experiment? What are its main purposes? Discuss how replication is related to precision of the estimators in an experiment. (5)
b) What are the sources of experimental error? How can we control it? (5)
Estimate the parameters involved in the fixed effect linear model for Completely Randomized Design (CRD). ( )
"ANOVA F test is an upper tailed test" - discuss the statement briefly in the context of CRD. (3)
The End
University of Dhaka
Fourth Year B.S. (Hons.) Final Examination, 2021
Subject: Statistics
Course No: Stat H-404
Course Name: Econometrics
Full Marks: 70, Time: 04 Hours
All questions are of equal value. Answer any seven of the following questions.
1. (a) Define Econometrics. Also, define Economic variable. Give some examples of Macroeconomic variables. (3)
(b) What are the 3 most important economic indicators? What are the main aims of Econometrics? Discuss them. (1+3)
(c) Consider the following models. (3)
Model A: $Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_{1t}$
Model B: $(Y_t - X_{2t}) = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_{2t}$
I. Check if the OLS estimates of $\alpha_1$ and $\beta_1$ and the OLS estimates of $\alpha_3$ and $\beta_3$ are same? Give reasons for your answers.
II. What is the relationship between $\alpha_2$ and $\beta_2$?
III. Can you compare the $R^2$ terms of the two models? Why or why not?
(a) What is multicollinearity? Discuss Farrar-Glauber test to detect multicollinearity along with the hypothesis. (5)
(b) How can a priori information be helpful as a remedial measure of multicollinearity? Explain in detail. (2)
(c) Suppose in the model: (3)
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$
$X_2$ to $X_k$ are all uncorrelated.
i) What is the name of these variables ($X_2$ to $X_k$)?
ii) What will be the structure of the (X'X) matrix?
iii) What will be the nature of the var-cov matrix of $\hat{\beta}$?
(a) What is autocorrelation? Draw the graph which indicates positive autocorrelation and also draw the graph which indicates negative autocorrelation. (3)
(b) Discuss the Durbin-Watson test for the detection of autocorrelation along with the hypothesis. (4)
(c) Find out the value of the error terms when it is autocorrelated with the first order autoregressive scheme. (3)
(a) What do you mean by limited dependent variable? Explain with example. (2)
(b) What is the fundamental problem of linear probability model for which we consider logit model? Discuss estimation procedure of logit model, when we consider replicated data. (5)
(c) From the household budget survey of 2000 of the Dutch Central Bureau of Statistics, J.S. Cramer obtained the following logit model based on a sample of 2820 households. The purpose of the logit model was to determine car ownership as a function of (logarithm of) income. Car ownership was a binary variable: Y = 1, if a household owns a car, zero otherwise. (3)
Where, $\hat{L}_i$ = estimated logit and where ln income is the logarithm of income. The $\chi^2$ measures the goodness of fit of the model.
Interpret the estimated logit model.
Comment on the statistical significance of the estimated logit model.
(a) What do you mean by simultaneous-equation models? Discuss an example, which shows simultaneous equations bias. (3)
(b) Show that simultaneous relations produce biased and inconsistent estimates. (4)
(c) Consider the following modified Keynesian model of income determination: (3)
$C_t = \beta_{10} + \beta_{11} Y_t + u_{1t}$
$I_t = \beta_{20} + \beta_{21} Y_t + \beta_{22} Y_{t-1} + u_{2t}$
$Y_t = C_t + I_t + G_t$
Where,
C = Consumption Expenditure
I = Investment Expenditure
Y = Income
G = Govt. Expenditure
t = time
u = stochastic disturbances
$G_t$ and $Y_{t-1}$ are assumed predetermined.
Obtain the reduced form of the equations.
Which of the variables would you regard as endogenous and which as exogenous?
9. (a) What is residual analysis? Draw a residual plot in each case of the following: (4)
i) Model fits the data well.
ii) Model includes a square term.
iii) Model includes a square term as well as a cubic term.
iv) Heteroscedasticity is present in the model.
(b) In what situation will we use polynomial regression? Suppose that you are given a data set. How will you understand whether or not to use polynomial regression for this data? (3)
(c) If you have monthly data over a number of years, how many dummy variables (3)
will you introduce to test the following hypotheses:
a) All the 12 months of the year exhibit seasonal patterns.
b) Only February, April, June, August, October and December exhibit seasonal patterns.
In both hypotheses, consider the regression model including intercept term and
excluding intercept term.
THE END.
University of Dhaka
Fourth Year B.S. (Honors) Examination, 2021
Subject: Statistics
Course: Stat H-405 (Survival Analysis)
Total Marks: 70 Time: 3 Hours
Answer any seven questions.
1.
a) What do you mean by 'Survival analysis'? Write three real life situations (4)
where survival analysis can be conducted.
b) Define 'Event' and 'Time' in survival analysis. Identify the event of interest (4)
and the outcome variable from the examples you mentioned above.
c) Construct a table of survival time data for five individuals based on any of (2)
the examples mentioned in question 1(a).
2. a) Distinguish between left censoring and right censoring with a suitable (2)
example.
b) Define probability density function, survival function and hazard function. (6)
Describe their characteristics.
c) How do you estimate the hazard function if there is no censored observation? (2)
3. a) How do you get the life-table estimate of a survival function? How does it (4)
differ from Kaplan-Meier estimate?
b) Consider the following data on the survival times of some multiple myeloma (6)
patients.
6. 352». 10,4, 66,. 14, .4,16, 65,4. G59 10. 6.(5)
"S, 76+ 56)88, 24]514, 40+, 8, 18,5
Determine the life-table estimate of the survival function, and show the
estimated survival function graphically.
5. a) Mention the problems that you encounter in finding the confidence intervals (4)
for values of the survivor function. How do you address these problems?
b) Time in weeks to discontinuation of the use of an intrauterine device (IUD) (6)
is given below. Let us call this IUD data.
10, 13+, 18+, 19, 23+, 30, 36, 38+, 54+, 56+, 59, 75, 93, 97, 104+, 107,
107+, 107+
Note that the time origin corresponds to the first day in which a woman uses
the IUD, and the end-point is discontinuation because of bleeding problems.
Using this data construct 95% pointwise confidence intervals of the
survivor function.
6. a) What is a Kaplan-Meier type estimate of the hazard function? Find its standard error. (5)
b) Using the IUD data from Question 5(b), determine the Kaplan-Meier type estimate of the hazard function. Also plot the estimated hazard function. (5)
7. a) Explain the importance of comparing groups of survival data with an example. (3)
b) Find the log-rank test statistic for comparing two groups of survival data clearly stating the hypotheses and underlying assumptions. How do you extend this to compare three or more groups of survival data? (7)
9. Let T be the lifetime random variable with pdf $f(t) = \frac{1}{\theta} e^{-t/\theta}$; $\theta > 0$, $t > 0$.
a) Under type I censoring, obtain the maximum likelihood estimator (MLE) $\hat{\theta}$. Also find $\operatorname{var}(\hat{\theta})$. Show that the estimated standard error of $\hat{\theta}$ is $\hat{\theta}/\sqrt{r}$, where $r = \sum \delta_i$ is the observed number of complete lifetimes. (6)
b) Under type II censoring, $\hat{\theta} = Y/r$, where $Y = \sum_{i=1}^{r} T_{(i)} + (n-r)T_{(r)}$. It is known that $2Y/\theta$ follows $\chi^2$ distribution with 2r df. Obtain $100(1-\alpha)\%$ confidence interval of $t_p$, the p-th quantile. (4)
10. a) Define a location-scale model along with its survival function. Give some (3)
examples of location-scale and log-location-scale models.
b) Find the estimates of the parameters of location-scale model. How do you (7)
obtain the Wald type confidence intervals for the parameters of location-scale
model?
THE END.
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-406 - Stochastic Process
Time: 3 hours
Total Marks: 70
1. (a) Define stochastic process. How does it differ from deterministic process? (4)
(b) Give examples of different types of stochastic processes based on time and state (4)
space.
(c) Explain state space and sample space with an example. (2)
2. (a) Define Markov Process and Markov Chain with examples. (4)
(b) What is a transition probability matrix? What are the important properties of a transition probability matrix? Illustrate with an example. (3)
(c) Weather can be classified as sunny, cloudy or rainy. If it is sunny on a given day, then on the following day it is cloudy with probability 0.3 and rainy with probability 0.2. If it is cloudy on a given day, then on the following day it is sunny with probability 0.3 and rainy with probability 0.4. If it is rainy on a given day, then on the following day it is sunny with probability 0.2 and cloudy with probability 0.5. If it is sunny today, what is the probability that it will be rainy on the day after tomorrow? (3)
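A minimal sketch of the two-step calculation asked for in part (c), assuming the states are ordered (sunny, cloudy, rainy) and that each row of the transition matrix is completed so that it sums to one:

```latex
\[
P = \begin{pmatrix}
0.5 & 0.3 & 0.2 \\
0.3 & 0.3 & 0.4 \\
0.2 & 0.5 & 0.3
\end{pmatrix},
\qquad
P^{(2)}_{SR} = \sum_{k} P_{Sk}\,P_{kR}
 = (0.5)(0.2) + (0.3)(0.4) + (0.2)(0.3) = 0.28 .
\]
```

Here $P^{(2)}_{SR}$ denotes the (sunny, rainy) entry of $P^2$, that is, the probability of rain two days ahead given that it is sunny today.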
3. (a) Explain time-homogeneous and time non-homogeneous Markov Chains. (4)
(b) Let $\{X_n;\ n = 0, 1, 2, \ldots\}$ be a Markov Chain with state space S = {1, 2, 3}, initial distribution a = i4,5) and transition probability matrix (3)
$P = \begin{pmatrix} 0.2 & 0.2 & 0.6 \\ 0.2 & 0.4 & 0.4 \\ 0.1 & 0.2 & 0.7 \end{pmatrix}$
Obtain P(X = 2,X, = 3, X, = 1).
4. (a) Define irreducible Markov Chain, positive recurrent state and transient state with examples. (3)
(b) Define period of a Markov Chain with an example. When is a Markov Chain called ergodic? (3)
(c) Classify the states of a Markov Chain with the following TPM. Also, determine the period of each state. (4)
/0.5 0 0.5\
0.3 0 0.3 0.4 0
0.1 0.2 0.1 0.2 0.1 0.1
P=
0 0.2 0 0.2 0.6 0
o2 o.2
0.6 0 0.2 0.2 0
G\0.5 0 0 0.5/
(a) What is the difference between pure birth process and Yule process? (4)
(b) Write down the probability distribution of the population size at time t for a Yule process. (2)
(c) In a toy store, the number of toys sold follows a Yule process with initial sale 2 and rate 0.5/hour. (4)
(i) What is the probability that, after 3 hours, the number of toys sold will be 6?
(ii) What is the average number of toys sold during a 3-hour period?
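One way to set up part (c) is sketched below. It assumes that "the number of toys sold" is the Yule process $X(t)$ itself, with $X(0) = 2$ and rate $\lambda = 0.5$ per hour, and uses the standard negative binomial form for the size of a Yule process started from $i$ individuals. Other readings of the question are possible.

```latex
\begin{align*}
P\{X(t) = n \mid X(0) = i\}
  &= \binom{n-1}{i-1} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{n-i},
  \quad n \ge i, \\
P\{X(3) = 6 \mid X(0) = 2\}
  &= \binom{5}{1} e^{-3}\left(1 - e^{-1.5}\right)^{4} \approx 0.091, \\
E\{X(3) \mid X(0) = 2\}
  &= 2e^{\lambda t} = 2e^{1.5} \approx 8.96 .
\end{align*}
```

If part (ii) is instead read as the number sold in addition to the initial 2, the corresponding figure would be $2e^{1.5} - 2 \approx 6.96$.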
The End
University of Dhaka
Fourth Year B.S. (Honors.) Examination, 2021
Subject: Statistics
Course: Stat H-407 (Generalized Linear Models)
Total marks: 70 Time: 3 hours
Answer any 7(SEVEN) questions. Marks are given in the right margin.
1. (a) Consider a single random variable Y whose distribution depends on a single parameter $\theta$. Write down the form of the exponential family for $f(y; \theta)$ and hence show that Poisson($\theta$) is a one-parameter member of the exponential family. (4)
(b) Extend the single-parameter exponential family form for the distribution with two parameters and hence show that $N(\mu, \sigma^2)$ is a 2-parameter member of the exponential family. (6)
2. (a) What are the three components of a generalized linear model (GLM)? Explain the associated terms. (3)
(b) Write down the general form of a GLM for the pdf/pmf of a distribution. (3)
(c) Suppose $Y \sim$ Poisson($\mu$). Show that the pmf of Y can be written in the context of the general form of a GLM. (4)
4. (a) Briefly describe the role of the link function in a GLM. (2)
(b) Write down the name and form of the associated link function in the GLM framework with justification for the following: (8)
(i) $Y \sim$ Poisson($\mu$),
(ii) $Y \sim$ Bernoulli($\mu$) and
(iii) $Y \sim N(\mu, \sigma^2)$.
5. Let $Y_1, \ldots, Y_i, \ldots, Y_n$ be n independent response variables from a Bernoulli distribution with parameter $\pi$.
i) Show that this distribution belongs to the exponential family of distributions. Is it in canonical form? (4)
ii) Identify the natural parameter. Argue that this natural parameter can be used as a link function to construct the GLM for the given random variables. (3)
iii) Hence, find an expression for the mean of random variable using covariates and parameters. (3)
a) Derive the score equation, used for the estimation of a GLM, by using the chain rule of differentiation. (6)
b) Suppose $Y \sim$ Bernoulli($\theta$). Derive the asymptotic sampling distribution of the score statistic. ( )
(a) Derive the expression for the elements of the information matrix (I) in a GLM and hence write I in matrix notation mentioning the diagonal matrix. (6)
(b) Write down Wald and score statistics for testing the entire parameter vector of a GLM mentioning hypothesis of our interest. (4)
Suppose $Y_i$'s are independent and $Y_i \sim$ Poisson($\mu_i$). Let $\beta = (\beta_0, \ldots, \beta_p)'$ be a parameter vector and $x_i = (x_{i1}, \ldots, x_{ip})'$ is a covariate vector for ith individual.
(a) Obtain the expression for the maximized log-likelihood under the saturated model. (4)
(b) Derive the deviance and hence write down its expression used in the contingency tables and log-linear models. (4)
(c) Write down the expression for Pearson chi-squared standardized residuals. (2)
10. Suppose that a logistic model having the dependent variable hypertension status (0, 1) and a set of covariates age (continuous), smoking status SMK [0 (non-smoker), 1 (smoker)], gender [0 (female), 1 (male)], cholesterol level CHOL (continuous), occupation OCC [0 (non-worker), 1 (worker)] have been fitted and estimated coefficients are given in the following table:
Table: Regression coefficients with standard errors for Logistic regression Model
SMK Gender CHOL OCC
Variable Constant Age -0.327
-0.691 0.142 0.356 0.472 0.269
Coefficient
0.035 0.127 0.205 0.142 0.128
Standard 0.231
Error
a. Assuming a follow-up study design, compute and interpret the estimated risk of developing hypertension for a 40-year-old male worker who also smokes with CHOL=150. (2)
The End.
University of Dhaka
Second Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-408 (Comprehensive)
Total Marks: 70 Time: 4 Hours
$0 \le x \le 2,\ 0 < y \le 2$
f(x, y)
0, Otherwise
i) r = +1, ii) r = -1, iii) r close to +1, iv) r positive but close to zero, v) No linear relationship. 6
b) A restaurant owner wishes to model the association between mean daily costs (Costs), and the number of customers served (Covers). The owner collected a week of data and the values for seven days are shown in the accompanying table.
Costs (Y):   1000  2180  2240  2410  2590  2820  3060
Covers (X):     0    60   120   133   143   175   175
a) Find which one is the dependent variable and which is the independent variable.
b) Estimate the regression equation.
c) Suppose the number of customers served is 150. Calculate the fitted value of y and corresponding residuals.
6. a) Define Type I error and Type II error with practical examples. Distinguish between normal test and t-test. 4
b) A new software package has been developed by Microsoft Ltd., which seeks to reduce the time required by system analysts to design, develop and implement information systems. To test the effectiveness of the software, a random sample of 8 analysts using existing technology and another random sample of 22 analysts who are trained to use the new software package are selected. The two sets of analysts perform the work and the following results were obtained: 6
What can be said about the underlying mean time of these two systems in terms of developing and implementing an information system?
6) Define sampling unit and sampling frame. Write down the criteria of an ideal 3
sampling frame.
What do you mean by simple random sampling (SRS)? When is SRS used? Suppose that a SRS of size 2 is drawn from a population of size 4 without replacement. Also, suppose that values obtained from sampling units 1, 2, 3, and 4 are 5, 10, 15, and 20, respectively.
i) Find the population mean.
ii) Find all possible samples with sample values. Also, find all possible sample mean values.
iii) Show that sample mean is an unbiased estimator of population mean.
iv) Find the sampling distribution of sample mean. Hence show that it is an unbiased estimator of population mean.
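The enumeration behind parts i) to iv) can be sketched as follows, assuming each of the $\binom{4}{2} = 6$ unordered samples is equally likely under SRS without replacement:

```latex
\begin{align*}
\mu &= \tfrac{1}{4}(5 + 10 + 15 + 20) = 12.5, \\
\text{samples: } & (5,10),\ (5,15),\ (5,20),\ (10,15),\ (10,20),\ (15,20), \\
\bar{y}: \ & 7.5,\ 10,\ 12.5,\ 12.5,\ 15,\ 17.5
  \quad \text{(each sample with probability } 1/6\text{)}, \\
E(\bar{y}) &= \tfrac{1}{6}(7.5 + 10 + 12.5 + 12.5 + 15 + 17.5) = 12.5 = \mu .
\end{align*}
```

The sampling distribution of $\bar{y}$ therefore puts probability 1/6 on each of 7.5, 10, 15 and 17.5, and 2/6 on 12.5.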
Define stratified sampling. What are the allocation procedures used in
stratified sampling? Explain Neyman-allocation.
a) Let T be an estimator for $\theta$. Then prove the following identity: 2
$\text{MSE}(T) = \operatorname{var}(T) + [\text{bias}(T)]^2$
b) Suppose that $Y_1, Y_2, \ldots, Y_n$ are iid Bin(m, $\theta$) random variables where m is known.
i) Explain why Y follows Bin(mn, $\theta$).
ii) Show that is an unbiased estimator of $\theta$.
ii) Find the variance of. What is the value of ksuch that, *(m-
m²
s:an
unbiased estimator of this variance?
c) Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent Poisson($\lambda$) random variables. 3