
2022 Final

The document contains a series of statistical problems and practical examination questions related to multivariate analysis, experimental design, survival analysis, time series analysis, econometrics, and generalized linear models. It includes tasks such as calculating bias and variance, defining statistical terms, performing analyses of variance, estimating survival functions, and fitting regression models. The document emphasizes the use of statistical software like R and STATA for data analysis and interpretation.

Uploaded by

024. Al- Yasfy
Copyright © All Rights Reserved

i) Show that the bias of Y is equal to zero.
ii) Find the variance of Y and hence show that Y is a consistent estimator.
a) Define a chi-square variate. Show that the variance of the chi-square distribution is greater than its mean. (5)
b) Define the t-statistic. If $X_1, X_2, \dots, X_n$ is a random sample from $N(\mu, \sigma^2)$, show that $\sqrt{n}(\bar{X} - \mu)/s$ follows the t distribution with (n − 1) degrees of freedom, where $\bar{X}$ is the sample mean and $s^2$ is the sample variance. (5)
10. a) Define CBR, TFR and ASFR, indicating their merits and demerits as measures of fertility. (3)
b) Define NRR. How does it differ from GRR? Assume the current TFR and the proportion of female births in Japan to be 4.09 and 0.46428, respectively. Calculate the GRR and interpret your result. Also, interpret NRR < 1, NRR = 1 and NRR > 1. (3)
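The GRR in part (b) rests on the usual shortcut that the GRR is the TFR counted in daughters only, i.e. GRR ≈ TFR × (proportion of births that are female). A minimal Python sketch under that assumption, using the two figures quoted in the question:

```python
def gross_reproduction_rate(tfr: float, female_share: float) -> float:
    """Approximate GRR as TFR multiplied by the proportion of births that are female."""
    return tfr * female_share


grr = gross_reproduction_rate(tfr=4.09, female_share=0.46428)
print(f"GRR = {grr:.4f}")
```

A value of roughly 1.9 would mean that a woman passing through the childbearing ages under current fertility rates, ignoring mortality, bears on average about 1.9 daughters; the NRR then discounts this for female mortality.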
c) What is meant by the doubling time of a population? The enumerated population of Bangladesh was reported to be 142,319 (in thousands). If the population of Bangladesh is assumed to grow at a constant rate of 1.34%, calculate the time required for this population to double, using the exponential growth rate.
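Under exponential growth, $P(t) = P_0 e^{rt}$, so doubling requires $e^{rt} = 2$, i.e. $t = \ln 2 / r$, which does not depend on $P_0$. A minimal sketch with the figures from part (c):

```python
import math


def doubling_time(rate: float) -> float:
    """Years for P(t) = P0 * exp(r * t) to reach 2 * P0, i.e. t = ln(2) / r."""
    return math.log(2) / rate


P0 = 142_319   # enumerated population, in thousands
r = 0.0134     # constant annual growth rate (1.34%)
t2 = doubling_time(r)
print(f"Doubling time: {t2:.2f} years")
print(f"Population after that time: {P0 * math.exp(r * t2):,.0f} thousand")
```

The second print is only a consistency check: the population after $t_2$ years is exactly twice the enumerated figure.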

THE END.
Fourth Year BS Final Practical Examination 2021
Subject: Statistics
Course: Stat H-409: Statistical Computing VII (Multivariate Analysis and Experimental Design)
Total Marks: 30
Time: 05 Hours

Answer all the following questions:

Group A (Multivariate Analysis)

(NB: You must use both R and STATA. You can use R/STATA to solve at most two of the three questions.)

1. Draw a sample of 30 observations from a multivariate normal distribution $N(\mu, \Sigma)$, where $\mu' = (-3, 2, 1)$ is the mean vector and
$\Sigma = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$
is the covariance matrix. Then draw scatter plots and boxplots of the variables and discuss the aspects of multivariate data. [5]
2. (a) Evaluate $T^2$ for testing the hypothesis $H_0: \mu' = (0.8, 2, 60, 30, 2, 200, 3)$ vs $H_1: \mu' \neq (0.8, 2, 60, 30, 2, 200, 3)$ using the data in Table 1.
(b) Specify the distribution of $T^2$ for the situation in (a).
(c) Using (a) and (b), test $H_0$ at the level $\alpha = 0.01$. What conclusion do you reach?

3. Satellite applications motivated the development of a silver-zinc battery. Table 1 contains failure data collected to characterize the performance of the battery during its life cycle. Use these data to
(a) find the estimated linear regression of … on an appropriate ("best") subset of predictor variables.
(b) analyze the residuals and comment on your results.
Table 1: Battery-Failure Data
Z1 = charge rate (amps); Z2 = discharge rate (amps); Z3 = depth of discharge (% of rated ampere-hours); Z4 = temperature (°C); Z5 = end of charge voltage (volts); Y1 = cycles to failure; Y2 = cycles to replace

Z1      Z2      Z3      Z4    Z5      Y1     Y2
0.375   3.13    60.0    40    2.00    101    3
1.000   3.13    76.8    30    1.99    141    4
1.000   3.13    60.0    20    2.00     96    ...
1.000   3.13    60.0    20    1.98    125    ...
1.625   3.13    43.2    10    2.01     43    2
1.625   3.13    60.0    20    2.00     16    ...
1.625   3.13    60.0    20    2.02    188    ...
0.375   5.00    76.8    10    2.01     10    ...
1.000   5.00    43.2    10    1.99      3    6
1.000   5.00    43.2    30    2.01    386    ...
1.000   5.00   100.0    20    2.00     45    5
1.625   5.00    76.8    10    1.99      2    2
0.375   1.25    76.8    10    2.01     76    10
1.000   1.25    43.2    10    1.99     78    ...
1.000   1.25    76.8    30    2.00    160    9
1.000   1.25    60.0     0    2.00    ...    ...
1.625   1.25    43.2    30    1.99    216    2
1.625   1.25    60.0    20    2.00     73    3
0.375   3.13    76.8    30    1.99    314    12
0.375   3.13    60.0    20    2.00    170    4

Source: Selected from S. Sidik, H. Leibecki, and J. Bozek, Failure of Silver-Zinc Cells with Competing Failure Modes: Preliminary Data Analysis, NASA Technical Memorandum 81556 (Cleveland: Lewis Research Center, 1980).
Group B (Experimental Design)
1. A Latin square was laid out to test the effects of fertilizers on the yield of potatoes. Relevant data are given below. (10)

                  Columns
          1       2       3       4       5
Row 1     A 449   B 444   C 401   D 299   E 292
Row 2     B 463   C 375   D 323   E 264   A 415
Row 3     C 393   D 353   E 278   A 404   B 425
Row 4     D 371   E 241   A 441   B 410   C 392
Row 5     E 258   A 430   B 450   C 385   D 347

a. State the necessary assumptions for analysis of variance and check whether the assumptions are violated for the data.
b. Carry out analysis of variance for the data.
c. Carry out analysis of variance pretending that there is no information about columns in the data. That is, you are left with the yields corresponding to the treatments A, B, C, D, E and the five rows (blocks).
d. Suppose you have information on neither rows nor columns. Now carry out analysis of variance for the yields corresponding to the five treatments.
e. Compare the results obtained from Questions b, c and d.
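The Latin square ANOVA can be sketched from first principles in Python. This assumes the scanned layout is read as the cyclic arrangement in LAYOUT below (rows A B C D E, B C D E A, and so on); the names LAYOUT, YIELDS and latin_square_anova are introduced here for illustration, and the exam itself expects R or STATA.

```python
LAYOUT = ["ABCDE", "BCDEA", "CDEAB", "DEABC", "EABCD"]   # treatment letter in each cell
YIELDS = [
    [449, 444, 401, 299, 292],
    [463, 375, 323, 264, 415],
    [393, 353, 278, 404, 425],
    [371, 241, 441, 410, 392],
    [258, 430, 450, 385, 347],
]


def latin_square_anova(layout, y):
    """Sums of squares, degrees of freedom, treatment F ratio and treatment totals for a p x p Latin square."""
    p = len(y)
    grand = sum(map(sum, y)) / (p * p)
    row_mean = [sum(row) / p for row in y]
    col_mean = [sum(y[i][j] for i in range(p)) / p for j in range(p)]
    by_trt = {}
    for i in range(p):
        for j in range(p):
            by_trt.setdefault(layout[i][j], []).append(y[i][j])
    trt_mean = {k: sum(v) / p for k, v in by_trt.items()}

    cells = [(i, j) for i in range(p) for j in range(p)]
    ss = {
        "rows": p * sum((m - grand) ** 2 for m in row_mean),
        "columns": p * sum((m - grand) ** 2 for m in col_mean),
        "treatments": p * sum((m - grand) ** 2 for m in trt_mean.values()),
        # residual = y - row mean - column mean - treatment mean + 2 * grand mean
        "error": sum(
            (y[i][j] - row_mean[i] - col_mean[j] - trt_mean[layout[i][j]] + 2 * grand) ** 2
            for i, j in cells
        ),
        "total": sum((y[i][j] - grand) ** 2 for i, j in cells),
    }
    df = {"rows": p - 1, "columns": p - 1, "treatments": p - 1,
          "error": (p - 1) * (p - 2), "total": p * p - 1}
    f_trt = (ss["treatments"] / df["treatments"]) / (ss["error"] / df["error"])
    totals = {k: sum(v) for k, v in by_trt.items()}
    return ss, df, f_trt, totals


ss, df, f_trt, totals = latin_square_anova(LAYOUT, YIELDS)
for source in ("rows", "columns", "treatments", "error", "total"):
    print(f"{source:<11} df={df[source]:>2}  SS={ss[source]:10.2f}")
print(f"F (treatments) = {f_trt:.2f}")
```

Because rows, columns and treatments are mutually orthogonal in a Latin square, parts (c) and (d) follow from the same numbers: dropping columns pools SS(columns) into the error (16 df), and dropping both rows and columns pools SS(rows) + SS(columns) into the error (20 df), which is the one-way ANOVA.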

2. A 2² factorial experiment is conducted in 3 replicates. The results are as shown below: (5)

Replication 1    Replication 2    Replication 3
ab   31          b    19          (1)  27
b    18          ab   30          b    23
(1)  28          a    32          ab   29
a    36          (1)  25          a    32

a. Draw an interaction plot and interpret any possible interaction effect from the plot.
b. Conduct analysis of variance and comment on your findings.
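For part (b), Yates' contrasts on the treatment totals give the effects and sums of squares directly. A minimal sketch, assuming each replication contains each treatment combination once (so the missing labels in the scan are a = 36 in Replication 1 and a = 32 in Replication 3) and treating the three replicates as a completely randomized arrangement; if replicates are blocks, their SS would be removed from the error as well.

```python
# Observations for each treatment combination, in the order Rep 1, Rep 2, Rep 3
DATA = {
    "(1)": [28, 25, 27],
    "a": [36, 32, 32],
    "b": [18, 19, 23],
    "ab": [31, 30, 29],
}


def two_by_two_anova(data):
    """Effects and sums of squares for a 2^2 factorial from treatment totals (Yates' contrasts)."""
    n = len(data["(1)"])
    t = {k: sum(v) for k, v in data.items()}
    contrasts = {
        "A": t["a"] + t["ab"] - t["b"] - t["(1)"],
        "B": t["b"] + t["ab"] - t["a"] - t["(1)"],
        "AB": t["ab"] + t["(1)"] - t["a"] - t["b"],
    }
    effects = {k: c / (2 * n) for k, c in contrasts.items()}
    ss = {k: c ** 2 / (4 * n) for k, c in contrasts.items()}
    obs = [x for v in data.values() for x in v]
    cf = sum(obs) ** 2 / len(obs)                              # correction factor G^2 / N
    ss["Total"] = sum(x * x for x in obs) - cf
    ss["Error"] = ss["Total"] - ss["A"] - ss["B"] - ss["AB"]   # 4(n - 1) = 8 df
    return effects, ss


effects, ss = two_by_two_anova(DATA)
print("Effects:", effects)
print("Sums of squares:", ss)
```

Each effect SS has 1 df, so its F ratio is SS(effect) / (SS(Error) / 8).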

The End.
Fourth Year BS (Honors) Examination 2021
Department of Statistics, University of Dhaka
Course No.: Stat H-410 (Statistical Computing VIII: Survival Analysis and Time Series Analysis)
Full Marks: 30    Time: 5 Hours
Group A: Survival Analysis
Answer ALL questions. Numerals in the right margin indicate marks.
N.B.: You must write all required R codes and only relevant output.

1. A laboratory investigator interested in the relationship between diet and the development of tumors divided 90 rats into three groups and fed them low-fat, saturated fat, and unsaturated fat diets, respectively. The rats were of the same age and species and were in similar physical condition. An identical amount of tumor cells was injected into a foot pad of each rat. The rats were observed for 200 days. Many developed a recognizable tumor early in the study period. Some were tumor-free at the end of the 200 days. The tumor-free time (the time from injection to the time that a tumor develops, or to the end of the study) is given below.
Low Fat: 140, 177, 50, 65, 86, 153, 181, 191, 77, 84, 87, 56, 66, 73, 119, 140+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+, 200+
Saturated Fat: 124, 58, 56, 68, 79, 89, 107, 86, 142, 110, 96, 142, 86, 75, 117, 98, 105, 126, 43, 46, 81, 133, 165, 170+, 200+, 200+, 200+, 200+, 200+, 200+
Unsaturated Fat: 112, 68, 84, 109, 153, 143, 60, 70, 98, 164, 63, 63, 77, 91, 91, 66, 70, 77, 63, 66, 66, 94, 101, 105, 108, 112, 115, 126, 161, 178
(a) Estimate the survival (tumor-free) functions of the three diet groups using the Kaplan-Meier product-limit (PL) method.
(b) Estimate the median survival time for each of the groups. (2)
(c) The investigator's main interest was to compare the three diets' abilities to keep the rats tumor-free. Carry out a test of whether the differences among the groups are statistically significant. (3)
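The product-limit estimate for (a) and (b) can be sketched in plain Python (the exam expects R, where survival::survfit gives the same estimates and survival::survdiff the log-rank test for (c)). A trailing "+" marks a censored time, and the median is taken as the smallest event time at which the estimate falls to 0.5 or below, which is one common convention.

```python
LOW_FAT = ("140, 177, 50, 65, 86, 153, 181, 191, 77, 84, 87, 56, 66, 73, 119, 140+, "
           + ", ".join(["200+"] * 14))
SATURATED = ("124, 58, 56, 68, 79, 89, 107, 86, 142, 110, 96, 142, 86, 75, 117, "
             "98, 105, 126, 43, 46, 81, 133, 165, 170+, " + ", ".join(["200+"] * 6))
UNSATURATED = ("112, 68, 84, 109, 153, 143, 60, 70, 98, 164, 63, 63, 77, 91, 91, "
               "66, 70, 77, 63, 66, 66, 94, 101, 105, 108, 112, 115, 126, 161, 178")


def parse(spec):
    """Turn '140, 140+' style text into (times, events); a trailing '+' marks a censored time."""
    times, events = [], []
    for tok in spec.split(","):
        tok = tok.strip()
        times.append(float(tok.rstrip("+")))
        events.append(0 if tok.endswith("+") else 1)
    return times, events


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]   # everyone leaving the risk set at t
        d, m = sum(tied), len(tied)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= m                                  # events and censorings both leave
        i += m
    return curve


def km_median(curve):
    """Smallest event time with S(t) <= 0.5, or None if the curve never gets there."""
    return next((t for t, s in curve if s <= 0.5), None)


for name, spec in [("Low fat", LOW_FAT), ("Saturated", SATURATED), ("Unsaturated", UNSATURATED)]:
    curve = kaplan_meier(*parse(spec))
    print(f"{name}: median tumor-free time = {km_median(curve)}")
```

Censored observations tied with an event (140 and 140+ in the low-fat group) are treated as still at risk at that time, the usual product-limit convention.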
2. The following lifetimes are generated from a Weibull distribution having pdf
$f(t) = \alpha \beta^{\alpha} t^{\alpha - 1} \exp[-(\beta t)^{\alpha}], \quad t > 0, \ \alpha > 0, \ \beta > 0$:
1.35, 2.36, 1.70, 0.73, 3.93, 2.98, 2.59, 2.60, 0.97, 1.69, 0.89, 1.49, 1.70, 1.25, 2.65
(a) Find a 95% confidence interval (CI) for β. Interpret your result. (4)
(b) Test the hypothesis $H_0: \theta = (1, 2)'$ versus $H_1: \theta \neq (1, 2)'$, where $\theta = (\beta, \alpha)'$. (3)
Group B: Time Series Analysis
Use the dataset "tobacco.txt" located in the D: drive to answer the following questions:

1. Convert the data into a time series using the command ts(data, start = 1871). Plot the data and comment on stationarity.
2. Use the Augmented Dickey-Fuller (ADF) test to verify your answer in question (1).
3. If your answer in (2) is "non-stationary", then use an appropriate transformation to make the series stationary. Use the ADF test to show that the transformed data is "stationary".
4. Draw the ACF and PACF, and suggest appropriate model(s). Write down the model in ARIMA(p,d,q) notation.
5. If you have suggested more than one model in (4), fit each of them, compare AIC and select the best one as your final model.
6. Write down the best fitted model in terms of Y_t, the untransformed data, and w_t, the white noise at time point t.
7. Perform model diagnostics and comment on whether your suggested model is adequate.
8. Forecast values of the series for the next 5 time points.

Useful Codes
data = scan("path")
plot(data); acf(data)
library(tseries)   # adf.test() comes from the tseries package
adf.test(data)
model = arima(data, order = c(p, d, q))
tsdiag(model)
predict(model, n.ahead = n)
The End.
Fourth Year B.S. (Hons.) Practical Examination, 2021
Subject: Statistics
Course: Stat H-411 Statistical Computing IX (Econometrics and Generalized Linear Models)
Total marks: 30    Time: 5 hours

Group A (Econometrics)
1. You are given a data set named "Bangladesh" in the folder named "Econometrics". Draw the rvf plot and comment on the assumptions of the regression model. Here gdp will be considered as the dependent variable and carbon (carbon dioxide emission) as the independent variable. (3)

2. You are given data named "LFP data". Fit the logit model and the probit model, and interpret the results. In this data, hhouse indicates whether the respondent has a house: hhouse = 1 means the respondent has a house and hhouse = 0 means the respondent does not. age is the age of the respondent. gender = 1 means the respondent is male and gender = 0 means the respondent is female. LFP = 1 means the respondent is working and LFP = 0 means the respondent is not working.
3. You are given a data set named "regression data". Here Y is the dependent variable and the others (W, R, L and k) are independent variables. Run the regression, write down the fitted model and interpret the coefficients. (5)
Check the data for the following:
a) Multicollinearity,
b) Autocorrelation,
c) Heteroscedasticity,
d) Specification error.
e) Also comment on the goodness of fit of the model.
Group B (Generalized Linear Models)
Answer all questions. Marks are given in the right margin.
1. On January 28, 1986 the space shuttle Challenger, shortly after its launch, exploded and crashed into the Atlantic Ocean, killing seven astronauts on board. The cause of the accident was traced to failure of the O-rings at low temperatures on the solid rocket booster. Data were collected on two variables: temperature at launch in degrees Celsius (temp) and whether at least one of the O-rings suffered failure (failure: 1 = yes, 0 = no). (7.5)

(a) Write down the appropriate generalized linear model (GLM) to be fitted, along with the assumptions, and identify clearly all quantities. Also write the R code and the link function.
(b) On the launch day, the temperature was −0.6°C. Calculate the estimated probability of O-ring failure at this temperature.
(c) Interpret the estimated effect of temperature on the odds of failure.
(d) Implement a change of deviance test to assess the significance of temp at the 5% significance level.
(e) Describe the test associated with the p-value for temp in the output. How does the conclusion compare to the answer from (d)?
2. Consider the data set medpar (Hilbe, 2011). Our main goal is to investigate the effect of died (1: died; 0: alive), hmo (1: patient belongs to a health maintenance organization; 0: private pay) and white (1: white; 0: non-white) on los (length of stay in the hospital, in days). (2.5)
(a) Write down the appropriate GLM with the necessary explanation of each term.
(b) Write down the R code and link function for fitting an appropriate GLM.
(c) Predict the average length of stay for a person who is non-white, alive and a member of the health maintenance organization.
(d) Conduct an overall significance test (deviance test) for the model at the 5% level, mentioning the null and alternative hypotheses.
(e) Compute incidence rate ratios for died and white. Interpret the results.
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-401: Multivariate Analysis
Total Marks: 70
Time: 3 Hours

Answer any seven (7) questions (7 × 10 = 70) from the following:

1. a) What do you understand by multivariate analysis? Give an example. Also, discuss the features of multivariate data analysis. (5)
b) Let X (with p components) be distributed according to $N_p(\mu, \Sigma)$. Then show that $Y = AX$ is distributed as $N_p(A\mu, A\Sigma A')$ for A non-singular. (5)

2. a) Define Hotelling's $T^2$ statistic. Show that Hotelling's $T^2$ is a generalization of the univariate $t_{n-1}$. (4)
b) Show that Hotelling's $T^2$ is a function of the likelihood ratio criterion ($\Lambda$) for testing $H_0: \mu = \mu_0$. (6)
3. a) When do you need the Wishart distribution? Compute the mean and variance of the Wishart distribution. (5)
b) If $X_1, X_2, \dots, X_n$ is a random sample from $N_p(\mu, \Sigma)$, then find the sampling distribution of $A = \sum_{j=1}^{n} (X_j - \bar{X})(X_j - \bar{X})'$. (5)

4. a) What do you mean by the distribution of a quadratic form? If $X \sim N_p(\mu, \Sigma)$, then show that $E(X'AX) = \operatorname{tr}(A\Sigma) + \mu' A \mu$. (4)
b) Show that if $X \sim N_p(\mu, I)$, then the necessary and sufficient condition for the quadratic form $X'AX$ to be distributed as non-central $\chi^2$ with r degrees of freedom and non-centrality parameter $\lambda = \mu' A \mu$ is that A is an idempotent matrix of rank r ($r \le p$). (6)

5. a) Show how you can construct multivariate confidence regions and simultaneous comparisons of component means. (5)
b) Discuss how you would carry out the test of hypothesis and find the confidence regions for a population mean vector when the sample size is large. (5)

6. a) Define generalized variance. How could you find the first two moments of the generalized variance? (6)
b) Compute the maximum likelihood estimators of the individual cell probabilities of a multinomial distribution. (4)

7. a) Discuss the paired comparison procedure for the multivariate case. (5)
b) Discuss the multivariate one-way fixed effects model. (5)
8. a) Discuss how to construct simultaneous confidence intervals for treatment effects in MANOVA. (5)
b) Discuss how to test the parameters of a multivariate two-way fixed effects model with interaction. (5)

9. a) Discuss a classical linear regression model with its application in real life. Also, state the Gauss least squares theorem with its uses.
b) Let $Y = Z\beta + \varepsilon$, where Z has full rank r + 1 and $\varepsilon$ is distributed as $N_n(0, \sigma^2 I)$. Then show that the maximum likelihood estimator of $\beta$ is the same as the least squares estimator $\hat\beta$. Moreover, $\hat\beta = (Z'Z)^{-1} Z'Y$ is distributed as $N_{r+1}(\beta, \sigma^2 (Z'Z)^{-1})$ and is distributed independently of the residuals $\hat\varepsilon = Y - Z\hat\beta$. Further, show that $n\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon$ is distributed as $\sigma^2 \chi^2_{n-r-1}$, where $\hat\sigma^2$ is the maximum likelihood estimator of $\sigma^2$. (5)
10. a) Discuss, with graphs, the general diagnostic purposes of residuals. (5)
b) Discuss multivariate multiple regression briefly. (5)
THE END.
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-402 (Time Series Analysis)
Time: 3 hours
Full Marks: 70
1. a) How does a time series data set differ from a cross-sectional data set? Explain with a real life example. (4)
b) Introduce weakly and strictly stationary time series with examples. (4)
c) Briefly discuss the importance of studying the autocorrelation function (ACF) and partial autocorrelation function (PACF) in time series data analysis. (2)

2. a) Discuss the importance and effect of differencing in time series analysis. Suppose $Y_t = \beta_0 + \beta_1 t + X_t$, where $X_t$ is a zero-mean stationary series with autocovariance function $\gamma_k$ and the $\beta$'s are constants. Show that $Y_t$ is not stationary, but the series $Z_t = Y_t - Y_{t-1}$ is stationary. (6)
b) Show that a white noise process is strictly stationary. (4)

3. Let $\{X_t, t = 1, 2, \dots\}$ be a time series such that $X_t = m_t + s_t + Y_t$, where $m_t$ denotes a trend, $s_t$ denotes a seasonal effect with period length 12 and $Y_t$ denotes a zero-mean weakly stationary process. It is assumed that $s_t = s_{t-12}$.
a) Give an example of a weakly stationary process. (3)
b) Show that $\nabla_{12} X_t$ is a weakly stationary process when the trend is linear, and give its autocovariance function in terms of the process $Y_t$. (7)
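For 3(b), the key step can be sketched as follows, assuming a linear trend written as $m_t = \beta_0 + \beta_1 t$ (coefficient names chosen here for illustration) and writing $\gamma_Y(h)$ for the autocovariance function of $Y_t$:

```latex
\begin{aligned}
\nabla_{12} X_t &= X_t - X_{t-12}
  = (m_t - m_{t-12}) + (s_t - s_{t-12}) + (Y_t - Y_{t-12})
  = 12\beta_1 + Y_t - Y_{t-12},\\
E(\nabla_{12} X_t) &= 12\beta_1 \quad \text{(free of } t\text{)},\\
\operatorname{Cov}(\nabla_{12} X_{t+h}, \nabla_{12} X_t)
  &= 2\gamma_Y(h) - \gamma_Y(h-12) - \gamma_Y(h+12) \quad \text{(a function of } h \text{ only)}.
\end{aligned}
```

Since the mean is constant and the autocovariance depends only on the lag h, $\nabla_{12} X_t$ is weakly stationary.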

4. a) Write down the autoregressive process of order 1, the AR(1) process. Find the lag-k autocorrelation function (ACF) for the AR(1) process. (5)
b) Find the mean and autocorrelation function for the second-order moving average process, MA(2). (5)

5. a) Discuss the problem of overdifferencing. Discuss the logarithm transformation of a series with non-stationarity due to non-constant variance. (5)
b) Define the mixed autoregressive moving average (ARMA) process. Find the ACF for the ARMA(1,1) model. (5)

6. a) Discuss the stationarity and invertibility conditions for the AR(1) and MA(1) processes, respectively. (6)
b) Show that an MA(1) model can be expressed as an AR(∞) model. (4)

7. a) Discuss method of moments estimation for the AR(p) model. (5)
b) "The least squares and method-of-moments estimators are nearly identical, especially for large samples." Justify your answer considering the AR(1) model. (5)

8. a) Discuss several techniques of diagnostic checking for time series models, using plots of residuals. (6)
b) Write a short note on the "Ljung-Box test". (4)

9. a) Considering an AR(1) process with a nonzero mean, find $\hat{Y}_t(\ell)$, i.e., the $\ell$-step-ahead forecast of Y. Show that the corresponding error variance increases as the lead $\ell$ increases. (6)

b) The maximum likelihood estimation results obtained from an AR(1) model fitted to a time series data set are partially shown below: (4)

Coefficients:
           ar1    intercept
        0.5705      74.3293
s.e.    0.1435       1.9151

If the last observed value is 67, forecast 5 time periods ahead.
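For an AR(1) with mean μ, the forecast is $\hat{Y}_t(\ell) = \mu + \phi^{\ell}(Y_t - \mu)$ and the forecast error variance is $\sigma^2 (1 - \phi^{2\ell})/(1 - \phi^2)$. A minimal sketch, assuming ar1 = 0.5705 and intercept = 74.3293 as read from the partially shown output (R's arima reports the series mean under the label "intercept"); σ² is not shown, so the variance is expressed as a multiple of σ².

```python
PHI = 0.5705      # ar1 estimate
MU = 74.3293      # "intercept" from the ML fit, i.e. the estimated process mean
LAST = 67.0       # last observed value Y_t


def ar1_forecasts(last, phi, mu, horizon):
    """l-step-ahead forecasts Y_t(l) = mu + phi**l * (Y_t - mu), for l = 1..horizon."""
    return [mu + phi ** lead * (last - mu) for lead in range(1, horizon + 1)]


def ar1_error_variance_factor(phi, lead):
    """Var[e_t(l)] / sigma^2 = (1 - phi**(2l)) / (1 - phi**2), which grows with l."""
    return (1 - phi ** (2 * lead)) / (1 - phi ** 2)


for lead, value in enumerate(ar1_forecasts(LAST, PHI, MU, 5), start=1):
    print(f"l = {lead}: forecast = {value:.4f}, "
          f"error variance = {ar1_error_variance_factor(PHI, lead):.4f} * sigma^2")
```

The forecasts climb back towards the mean 74.33 geometrically, while the error variance factor increases towards $1/(1 - \phi^2)$.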
10. a) Write a short note on seasonal ARIMA models. (5)
b) The following figure shows the monthly carbon dioxide levels from January 1994 to December 1997. Suggest a possible model that can explain the behavior of this time series data. (5)

[Figure: monthly CO2 levels, January 1994 to December 1997 (y-axis: CO2; x-axis: Time)]
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-403 (Design and Analysis of Experiment)
Time: 3 hours    Full Marks: 70
Answer any seven (7) questions.

1. a) What is meant by design of experiment? What are its main purposes? Discuss how replication is related to the precision of the estimators in an experiment. (5)
b) What are the sources of experimental error? How can we control it? (5)
2. a) Estimate the parameters involved in the fixed effect linear model for the Completely Randomized Design (CRD). (7)
b) "The ANOVA F test is an upper-tailed test": discuss the statement briefly in the context of CRD. (3)

3. a) Discuss the concept of Randomized Block Design (RBD) with an example. Show the associated layout. (5)
b) Mentioning the necessary assumptions, show how to analyze data in RBD. (5)
4. a) For four arbitrary treatments, find the four standard Latin squares. Randomize any one of them. (5)
b) Why is LSD an incomplete three-way layout? Compare CRD, RBD and LSD in terms of reducing error variation. (5)
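For 4(a), a standard (reduced) Latin square has its first row and first column in natural order, and a small backtracking search confirms that there are exactly four of order 4. The randomization shown (independently permuting rows, columns and treatment labels) is one common textbook scheme, used here as an illustrative assumption; the function names are hypothetical.

```python
import random
from itertools import permutations


def standard_latin_squares(n):
    """All n x n Latin squares whose first row and first column are in natural order."""
    symbols = tuple(range(n))
    candidates = list(permutations(symbols))
    found = []

    def extend(rows):
        k = len(rows)
        if k == n:
            found.append([list(r) for r in rows])
            return
        for row in candidates:
            # standard form: row k starts with symbol k; no symbol repeats within a column
            if row[0] == k and all(row[j] != prev[j] for prev in rows for j in range(n)):
                extend(rows + [row])

    extend([symbols])
    return found


def randomize(square, rng):
    """Randomly permute rows, columns and treatment labels of a Latin square."""
    n = len(square)
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    labels = rng.sample("ABCDEFGHIJ"[:n], n)
    return [[labels[square[r][c]] for c in cols] for r in rows]


squares = standard_latin_squares(4)
for sq in squares:
    print([" ".join("ABCD"[x] for x in row) for row in sq])
print(randomize(squares[0], random.Random(2021)))
```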
5. a) Discuss the Graeco-Latin square design (GLSD) with a suitable example. (3)
b) What is a replicated Latin square design? What are the advantages of this design over LSD? Describe the analysis procedure of a replicated Latin square design where the row factors are different but the column factors are the same in all squares. (7)
6. a) Estimate a single missing observation in RBD. (6)
b) Show the analysis of data with a single missing value in RBD. (4)
7. a) Find the relative efficiency of CRD relative to RBD. Based on your result, comment on the situation when we should conduct an experiment in CRD rather than RBD. (4)
b) "If a missing value occurs in RBD, the property of orthogonality will be lost": prove the theorem. (6)

8. a) What is a factorial experiment? Give a practical example. (3)
b) Consider two factors A and B, both having two levels. Define the main effects and interaction effect by Yates' method. Also discuss the analysis procedure when the experiment is conducted in an RBD. (7)

9. a) What is confounding in a factorial experiment? Explain why confounding is used in factorial experiments. (3)
b) Choose two interaction effects to be simultaneously confounded in a 2⁵ factorial experiment. Find the generalized interaction and show the plan for the experiment. Also, discuss the analysis procedure. (7)
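For 9(b), the generalized interaction of two confounded effects is their product with exponents reduced mod 2, so letters common to both cancel. A sketch of the plan, assuming the illustrative choice ABC and CDE (the question leaves the choice open); treatment combinations are split into four blocks of eight by the parity of each confounded effect, and the block containing (1) is the principal block.

```python
from itertools import product

FACTORS = "ABCDE"
CONFOUNDED = ["ABC", "CDE"]   # illustrative choice, not prescribed by the question


def generalized_interaction(e1, e2):
    """Multiply two effects and reduce exponents mod 2: letters common to both cancel."""
    return "".join(sorted(set(e1) ^ set(e2)))


def block_plan(factors, confounded):
    """Group the 2^k treatment combinations by the parity of each confounded effect."""
    blocks = {}
    for levels in product((0, 1), repeat=len(factors)):
        combo = "".join(f.lower() for f, x in zip(factors, levels) if x) or "(1)"
        key = tuple(
            sum(x for f, x in zip(factors, levels) if f in effect) % 2
            for effect in confounded
        )
        blocks.setdefault(key, []).append(combo)
    return blocks


print("Also confounded:", generalized_interaction(*CONFOUNDED))
for key, combos in sorted(block_plan(FACTORS, CONFOUNDED).items()):
    print(key, " ".join(combos))
```

With this choice the generalized interaction ABDE is also lost, so only high-order interactions are sacrificed and all main effects and two-factor interactions remain estimable.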
10. a) What is an asymmetrical factorial experiment? Show the components of main effects and interactions by contrasts in a 2 × 3 factorial experiment. (4)
b) Why are multiple comparison methods necessary in experiments? Discuss Fisher's least significant difference method in short. (6)

The End
University of Dhaka
Fourth Year B.S. (Hons.) Final Examination, 2021
Subject: Statistics
Course No: Stat H-404
Course Name: Econometrics
Full Marks: 70, Time: 04 Hours

All questions are of equal value. Answer any seven of the following questions.
1. (a) Define Econometrics. Also, define economic variable. Give some examples of macroeconomic variables. (3)
(b) What are the 3 most important economic indicators? What are the main aims of Econometrics? Discuss them. (1+3)
(c) Consider the following models: (3)
Model A: $Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_{1t}$
Model B: $(Y_t - X_{2t}) = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_{2t}$
i. Check whether the OLS estimates of $\alpha_1$ and $\beta_1$, and the OLS estimates of $\alpha_3$ and $\beta_3$, are the same. Give reasons for your answers.
ii. What is the relationship between $\alpha_2$ and $\beta_2$?
iii. Can you compare the $R^2$ terms of the two models? Why or why not?

2. (a) What do you mean by model specification bias? Discuss model selection criteria in detail. (1+3)
(b) Discuss different types of specification errors along with proper examples. (3)
(c) Discuss Ramsey's RESET test to test whether there is specification error in the model or not. (3)

3. (a) What is multicollinearity? Discuss the Farrar-Glauber test to detect multicollinearity along with the hypothesis. (5)
(b) How can a priori information be helpful as a remedial measure of multicollinearity? Explain in detail. (2)
(c) Suppose that in the model (3)
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$,
$X_2$ to $X_k$ are all uncorrelated.
i) What is the name of these variables ($X_2$ to $X_k$)?
ii) What will be the structure of the $(X'X)$ matrix?
iii) What will be the nature of the var-cov matrix of $\hat\beta$?

4. (a) What is heteroscedasticity? Why do we want homoscedasticity? In which type of data is heteroscedasticity common? Explain. (5)
(b) Discuss the Glejser test for the detection of heteroscedasticity along with the hypothesis and decision rule. (4)
(c) State with reason whether the following statement is true, false, or uncertain: "Even though the disturbance term in the CLRM is not normally distributed, the OLS estimators are still unbiased." (1)

5. (a) What is autocorrelation? Draw a graph that indicates positive autocorrelation and another that indicates negative autocorrelation. (3)
(b) Discuss the Durbin-Watson test for the detection of autocorrelation along with the hypothesis. (4)
(c) Find the value of the error terms when they are autocorrelated according to the first-order autoregressive scheme. (3)
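For 5(b), the Durbin-Watson statistic is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, which is roughly $2(1 - \hat\rho)$: values near 0 suggest positive, values near 4 negative, and values near 2 no first-order autocorrelation (formal decisions use the tabulated bounds $d_L$ and $d_U$). A minimal sketch with made-up residuals, hypothetical and for illustration only:

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residual series, for illustration only
smooth = [1.0, 0.8, 0.6, 0.3, -0.1, -0.4, -0.6, -0.7]       # slow drift: positive autocorrelation
alternating = [1.0, -0.9, 0.8, -1.1, 0.7, -0.8, 1.0, -0.9]  # sign flips: negative autocorrelation
print(f"smooth: d = {durbin_watson(smooth):.3f}")
print(f"alternating: d = {durbin_watson(alternating):.3f}")
```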

6. (a) What do you mean by a limited dependent variable? Explain with an example. (2)
(b) What is the fundamental problem of the linear probability model for which we consider the logit model? Discuss the estimation procedure of the logit model when we consider replicated data. (5)
(c) From the household budget survey of 2000 of the Dutch Central Bureau of Statistics, J.S. Cramer obtained the following logit model based on a sample of 2820 households. The purpose of the logit model was to determine car ownership as a function of (the logarithm of) income. Car ownership was a binary variable: Y = 1 if a household owns a car, zero otherwise. (3)

$\hat{L}_i = -2.8 + 0.35 \ln \text{Income}$
         (−3.35)   (4.05)
$\chi^2$ (1 d.f.) = 16.681 (p-value = 0.0000)

where $\hat{L}_i$ is the estimated logit and ln Income is the logarithm of income. The $\chi^2$ measures the goodness of fit of the model. Interpret the estimated logit model. Comment on the statistical significance of the estimated logit model.
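Because the regressor is ln(Income), the slope acts as an elasticity of the odds: multiplying income by a factor k multiplies the odds of owning a car by $k^{0.35}$. A minimal sketch using the rounded coefficients as printed; income units are not stated in the source, so the probability function is illustrative only.

```python
import math

B0, B1 = -2.8, 0.35   # intercept and slope on ln(Income), as printed in the question


def odds_multiplier(income_ratio, slope=B1):
    """Factor by which the odds of owning a car change when income is multiplied by income_ratio."""
    return income_ratio ** slope


def prob_car(income, b0=B0, b1=B1):
    """Estimated P(Y = 1) at a given income (units unknown, so values are illustrative)."""
    logit = b0 + b1 * math.log(income)
    return 1 / (1 + math.exp(-logit))


print(f"Income +1%: odds x {odds_multiplier(1.01):.5f}")
print(f"Income doubles: odds x {odds_multiplier(2):.4f}")
```

On significance: the slope's ratio of 4.05 exceeds 1.96, and the $\chi^2$ of 16.681 on 1 df far exceeds the 5% critical value 3.841, so income is a significant determinant of car ownership.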
7. (a) What do you mean by simultaneous-equation models? Discuss an example which shows simultaneous-equation bias. (3)
(b) Show that simultaneous relations produce biased and inconsistent estimates. (4)
(c) Consider the following modified Keynesian model of income determination: (3)
$C_t = \beta_{10} + \beta_{11} Y_t + u_{1t}$
$I_t = \beta_{20} + \beta_{21} Y_t + \beta_{22} Y_{t-1} + u_{2t}$
$Y_t = C_t + I_t + G_t$
where C = consumption expenditure, I = investment expenditure, Y = income and G = government expenditure. $G_t$ and $Y_{t-1}$ are assumed predetermined.
Obtain the reduced form of the equations.
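For 7(c), substituting the consumption and investment equations into the income identity and solving for $Y_t$ gives the reduced form. A sketch of the algebra, assuming $D = 1 - \beta_{11} - \beta_{21} \neq 0$ (the symbol D is introduced here for brevity):

```latex
\begin{aligned}
Y_t &= \frac{\beta_{10} + \beta_{20}}{D} + \frac{\beta_{22}}{D} Y_{t-1} + \frac{1}{D} G_t
       + \frac{u_{1t} + u_{2t}}{D},\\
C_t &= \frac{\beta_{10}(1 - \beta_{21}) + \beta_{11}\beta_{20}}{D}
       + \frac{\beta_{11}\beta_{22}}{D} Y_{t-1} + \frac{\beta_{11}}{D} G_t
       + \frac{(1 - \beta_{21}) u_{1t} + \beta_{11} u_{2t}}{D},\\
I_t &= \frac{\beta_{20}(1 - \beta_{11}) + \beta_{21}\beta_{10}}{D}
       + \frac{\beta_{22}(1 - \beta_{11})}{D} Y_{t-1} + \frac{\beta_{21}}{D} G_t
       + \frac{\beta_{21} u_{1t} + (1 - \beta_{11}) u_{2t}}{D}.
\end{aligned}
```

As a check, adding the $C_t$ and $I_t$ equations to $G_t$ reproduces the $Y_t$ equation term by term, and every endogenous variable is now a function of the predetermined $Y_{t-1}$ and $G_t$ only.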

8. (a) Define identification. Explain with an example. Discuss the rank condition and order condition for identification in detail. (5)
(b) Consider the following model: (4)
$R_t = \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}$
$Y_t = \alpha_0 + \alpha_1 R_t + u_{2t}$
where M (money supply) is exogenous, R is the interest rate and Y is GDP. Identify each equation.
(c) G. Menges developed the following econometric model for the West German economy: (1)
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 I_t + u_{1t}$
…
$C_t = \beta_6 + \beta_7 Y_t + \beta_8 C_{t-1} + \beta_9 P_t + u_{3t}$
where Y = national income, I = net capital formation, C = personal consumption, Q = profits, P = cost of living index, R = industrial productivity, t = time and u = stochastic disturbances.
Which of the variables would you regard as endogenous and which as exogenous?

9. (a) What is residual analysis? Draw a residual plot for each of the following cases: (4)
i) The model fits the data well.
ii) The model includes a square term.
iii) The model includes a square term as well as a cubic term.
iv) Heteroscedasticity is present in the model.
(b) In what situation will we use polynomial regression? Suppose that you are given a data set. How will you decide whether or not to use polynomial regression for these data? (3)
(c) If you have monthly data over a number of years, how many dummy variables will you introduce to test the following hypotheses: (3)
a) All 12 months of the year exhibit seasonal patterns.
b) Only February, April, June, August, October and December exhibit seasonal patterns.
For both hypotheses, consider the regression model including the intercept term and excluding the intercept term.

10. (a) Define intrinsically linear regression model and intrinsically non-linear regression model with examples. (2)
(b) Discuss how to estimate the exponential regression model, in detail. (4)
(c) Define structural break. Discuss the Chow test for identifying structural break points along with the hypothesis and decision rule. (4)

THE END.
University of Dhaka
Fourth Year B.S. (Honors) Examination, 2021
Subject: Statistics
Course: Stat H-405 (Survival Analysis)
Total Marks: 70 Time: 3 Hours
Answer any seven questions.
1. a) What do you mean by "survival analysis"? Write three real life situations where survival analysis can be conducted. (4)
b) Define "event" and "time" in survival analysis. Identify the event of interest and the outcome variable from the examples you mentioned above. (4)
c) Construct a table of survival time data for five individuals based on any of the examples mentioned in question 1(a). (2)

2. a) Distinguish between left censoring and right censoring with a suitable example. (2)
b) Define the probability density function, survival function and hazard function. Describe their characteristics. (6)
c) How do you estimate the hazard function if there are no censored observations? (2)
3. a) How do you get the life-table estimate of a survival function? How does it differ from the Kaplan-Meier estimate? (4)
b) Consider the following data on the survival times of some multiple myeloma patients. (6)
6. 352». 10,4, 66,. 14, .4,16, 65,4. G59 10. 6.(5)
"S, 76+ 56)88, 24]514, 40+, 8, 18,5
Determine the life-table estimate of the survival function, and show the
estimated survival function graphically.

4. a) How do you derive Greenwood's formula? Show it in detail. (5)
b) Suppose a lifetime random variable X follows a certain distribution having a hazard rate of the form …, where α > 0, x > θ and θ > 0.
i) Find the pdf. (2)
ii) Determine the mean lifetime, mentioning any required assumption. (3)

5. a) Mention the problems that you encounter in finding confidence intervals for values of the survivor function. How do you address these problems? (4)
b) Time in weeks to discontinuation of the use of an intrauterine device (IUD) is given below. Let us call this the IUD data. (6)
10, 13+, 18+, 19, 23+, 30, 36, 38+, 54+, 56+, 59, 75, 93, 97, 104+, 107, 107+, 107+
Note that the time origin corresponds to the first day on which a woman uses the IUD, and the end-point is discontinuation because of bleeding problems. Using these data, construct 95% pointwise confidence intervals for the survivor function.
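For 5(b), a minimal sketch of the Kaplan-Meier estimate with Greenwood's standard error, $\widehat{\operatorname{SE}}[\hat S(t)] = \hat S(t)\sqrt{\sum_{t_j \le t} d_j / (n_j(n_j - d_j))}$, and the plain pointwise interval $\hat S(t) \pm 1.96\,\widehat{\operatorname{SE}}$ clipped to [0, 1]. Clipping is the crude fix for limits outside [0, 1]; the log-minus-log interval asked about in 5(a) avoids the problem altogether.

```python
import math

IUD = "10, 13+, 18+, 19, 23+, 30, 36, 38+, 54+, 56+, 59, 75, 93, 97, 104+, 107, 107+, 107+"


def km_with_greenwood(spec, z=1.96):
    """Kaplan-Meier rows (t, n, d, S, SE, lower, upper) with Greenwood SE and clipped plain limits."""
    data = sorted((float(tok.strip().rstrip("+")), 0 if tok.strip().endswith("+") else 1)
                  for tok in spec.split(","))
    at_risk, surv, gw_sum, rows, i = len(data), 1.0, 0.0, [], 0
    while i < len(data):
        t = data[i][0]
        m = sum(1 for tt, _ in data if tt == t)   # all subjects leaving the risk set at t
        d = sum(e for tt, e in data if tt == t)   # events at t
        if d:
            surv *= 1 - d / at_risk
            if at_risk > d:                        # term undefined when everyone at risk fails
                gw_sum += d / (at_risk * (at_risk - d))
            se = surv * math.sqrt(gw_sum)
            rows.append((t, at_risk, d, surv, se, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= m
        i += m
    return rows


print(f"{'t':>5} {'n':>3} {'d':>2} {'S(t)':>7} {'SE':>7} {'lower':>7} {'upper':>7}")
for t, n, d, s, se, lo, hi in km_with_greenwood(IUD):
    print(f"{t:5.0f} {n:3d} {d:2d} {s:7.4f} {se:7.4f} {lo:7.4f} {hi:7.4f}")
```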

6. a) What is a Kaplan-Meier type estimate of the hazard function? Find its standard error. (5)
b) Using the IUD data from Question 5(b), determine the Kaplan-Meier type estimate of the hazard function. Also plot the estimated hazard function. (5)
7. a) Explain the importance of comparing groups of survival data with an example. (3)
b) Find the log-rank test statistic for comparing two groups of survival data, clearly stating the hypotheses and underlying assumptions. How do you extend this to compare three or more groups of survival data? (7)

8. a) Find the hazard function of a lifetime random variable T having the following pdf: (3)
$f(t) = \alpha\beta (t - G)^{\alpha - 1} \exp[-\beta (t - G)^{\alpha}], \quad t > G, \ \alpha > 0, \ \beta > 0.$
b) Find the moment generating function of the standard extreme value distribution. Hence find the mean of this distribution. Now, based on this, mention the mean of the extreme value distribution. (7)

9. Let T be the lifetime random variable with pdf $f(t) = \frac{1}{\theta} e^{-t/\theta}$; $\theta, t > 0$.
a) Under type I censoring, obtain the maximum likelihood estimator (MLE) $\hat\theta$. Also find $\operatorname{var}(\hat\theta)$. Show that the estimated standard error of $\hat\theta$ is $\hat\theta/\sqrt{r}$, where $r = \sum \delta_i$ is the observed number of complete lifetimes. (6)
b) Under type II censoring, $\hat\theta = Y/r$, where $Y = \sum_{i=1}^{r} T_{(i)} + (n - r) T_{(r)}$. It is known that $2Y/\theta$ follows a $\chi^2$ distribution with 2r df. Obtain a $100(1-\alpha)\%$ confidence interval for $t_p$, the p-th quantile. (4)
10. a) Define a location-scale model along with its survival function. Give some examples of location-scale and log-location-scale models. (3)
b) Find the estimates of the parameters of a location-scale model. How do you obtain Wald-type confidence intervals for the parameters of a location-scale model? (7)

THE END.
University of Dhaka
Fourth Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-406 - Stochastic Process
Time: 3 hours
Total Marks: 70

[Answer any 7 questions.]

1. (a) Define stochastic process. How does it differ from a deterministic process? (4)
(b) Give examples of different types of stochastic processes based on time and state space. (4)
(c) Explain state space and sample space with an example. (2)

2. (a) Define Markov process and Markov chain with examples. (4)
(b) What is a transition probability matrix? What are the important properties of a transition probability matrix? Illustrate with an example. (3)
(c) Weather can be classified as sunny, cloudy or rainy. If it is sunny on a given day, then on the following day it is cloudy with probability 0.3 and rainy with probability 0.2. If it is cloudy on a given day, then on the following day it is sunny with probability 0.3 and rainy with probability 0.4. If it is rainy on a given day, then on the following day it is sunny with probability 0.2 and cloudy with probability 0.5. If it is sunny today, what is the probability that it will be rainy on the day after tomorrow? (3)
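Order the states as (sunny, cloudy, rainy). Filling each row with the remaining probability gives $P$ with rows (0.5, 0.3, 0.2), (0.3, 0.3, 0.4) and (0.2, 0.5, 0.3). The required two-step probability is the (sunny, rainy) entry of $P^2$: $0.5(0.2) + 0.3(0.4) + 0.2(0.3) = 0.28$. A small check:

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

STATES = ["sunny", "cloudy", "rainy"]
P = [[0.5, 0.3, 0.2],
     [0.3, 0.3, 0.4],
     [0.2, 0.5, 0.3]]
P2 = matmul(P, P)                 # two-step transition probabilities
print(P2[STATES.index("sunny")][STATES.index("rainy")])
```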
3. (a) Explain time-homogeneous and time non-homogeneous Markov chains. (4)
(b) Let $\{X_n; n = 0, 1, 2, \ldots\}$ be a Markov chain with state space $S = \{1, 2, 3\}$, initial distribution $a = (\ldots)$ and transition probability matrix
$$P = \begin{pmatrix} 0.2 & 0.2 & 0.6 \\ 0.2 & 0.4 & 0.4 \\ 0.1 & 0.2 & 0.7 \end{pmatrix}.$$
Obtain $P(X_{\ldots} = 2, X_{\ldots} = 3, X_{\ldots} = 1)$. (3)

(c) Refer to the problem above. Calculate $P(X_{\ldots} = 2 \mid X_{\ldots} = 1)$. (3)
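The initial distribution and the time indices are not legible in the source, so this sketch only shows the method. For $n_1 < n_2 < \cdots$:

- **Joint probability:** $P(X_{n_1}=s_1, X_{n_2}=s_2, \ldots) = (aP^{n_1})_{s_1}\,(P^{n_2-n_1})_{s_1 s_2}\cdots$
- **Conditional probability:** for a time-homogeneous chain, $P(X_{m+n}=j \mid X_m=i) = (P^n)_{ij}$.

The uniform initial distribution `a` and the indices in the example are hypothetical placeholders. States 1, 2, 3 map to list indices 0, 1, 2.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matpow(P, n):
    """P^n by repeated multiplication (n >= 0)."""
    size = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = matmul(R, P)
    return R

def joint_prob(a, P, times, states):
    """P(X_{times[0]} = states[0], X_{times[1]} = states[1], ...); times increasing, states 1-based."""
    dist = matmul([a], matpow(P, times[0]))[0]     # distribution of X_{times[0]}
    prob = dist[states[0] - 1]
    for (t0, s0), (t1, s1) in zip(zip(times, states), zip(times[1:], states[1:])):
        prob *= matpow(P, t1 - t0)[s0 - 1][s1 - 1]
    return prob

def cond_prob(P, n, i, j):
    """P(X_{m+n} = j | X_m = i), states 1-based."""
    return matpow(P, n)[i - 1][j - 1]

P = [[0.2, 0.2, 0.6], [0.2, 0.4, 0.4], [0.1, 0.2, 0.7]]
a = [1 / 3, 1 / 3, 1 / 3]   # hypothetical initial distribution
print(joint_prob(a, P, [1, 2, 3], [2, 3, 1]), cond_prob(P, 2, 1, 2))
```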

4. (a) Define irreducible Markov chain, positive recurrent state and transient state with examples. (3)
(b) Define the period of a Markov chain with an example. When is a Markov chain called ergodic? (3)
(c) Classify the states of a Markov chain with the following TPM. Also, determine the period of each state. (4)
P =
0.5 0 0.5
0.3 0 0.3 0.4 0
0.1 0.2 0.1 0.2 0.1 0.1
0 0.2 0 0.2 0.6 0
0.2 0.2
0.6 0 0.2 0.2 0
0.5 0 0 0.5
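Several entries of this TPM are not legible in the source, so the sketch below shows a general procedure on hypothetical matrices rather than solving the question. It works in three steps:

1. It finds the communicating classes by mutual reachability.
2. It labels a class recurrent if the class is closed (zero probability of leaving it). This test is valid for a finite chain.
3. It computes the period as the gcd of $\ell(u) + 1 - \ell(v)$ over all transitions $u \to v$ inside the class, where $\ell$ is the breadth-first-search level from a fixed state of the class.

A class with no return path has an undefined period, reported as `None`.

```python
from collections import deque
from math import gcd

def reachable(P, i):
    """States reachable from i in zero or more steps."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def classify(P):
    """List of (class states, 'recurrent'/'transient', period); states are 0-based."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    result, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = sorted(j for j in reach[i] if i in reach[j])   # communicating class of i
        assigned.update(cls)
        members = set(cls)
        closed = all(P[u][v] == 0 for u in cls for v in range(n) if v not in members)
        level, queue, d = {cls[0]: 0}, deque([cls[0]]), 0
        while queue:                                          # BFS inside the class
            u = queue.popleft()
            for v in cls:
                if P[u][v] > 0:
                    if v not in level:
                        level[v] = level[u] + 1
                        queue.append(v)
                    else:
                        d = gcd(d, level[u] + 1 - level[v])
        result.append((cls, "recurrent" if closed else "transient", d or None))
    return result

hypothetical = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0.25, 0.25, 0.25, 0.25], [0, 0, 0, 1]]
print(classify(hypothetical))
```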

5. (a) Explain the gambler's ruin problem with an example. (4)


(b) You bought a share of stock for $12. The stock price moves $1 each day as a simple random walk. The probability of an increase is 0.55. What is the probability that the stock price will reach $15 before decreasing to $5? (4)
(c) What are one-dimensional and two-dimensional random walks? Give examples. (2)
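Shift prices so that a price of 5 becomes state 0 and a price of 15 becomes state N = 10. The walk then starts at i = 7 with p = 0.55 and q = 0.45. The classical gambler's ruin result is $P(\text{reach } N \text{ before } 0) = \frac{1-(q/p)^i}{1-(q/p)^N}$ for $p \ne q$, or $i/N$ for $p = q$. Here it gives about 0.8717. A direct translation:

```python
def prob_reach_upper(start, lower, upper, p):
    """P(simple random walk hits `upper` before `lower`), lower < start < upper."""
    i, N = start - lower, upper - lower
    q = 1 - p
    if abs(p - q) < 1e-12:          # symmetric walk
        return i / N
    r = q / p
    return (1 - r ** i) / (1 - r ** N)

print(round(prob_reach_upper(12, 5, 15, 0.55), 4))
```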

6. (a) What is a queuing process? What are the characteristics of a queuing process? Obtain the steady-state solutions in the case of the queuing system M/M/1. (4)
(b) The arrivals at a telephone booth follow a Poisson process with a mean of 10 per hour, and the distribution of the length of telephone calls is exponential with mean 2 minutes. (6)
i. What is the probability that the arriving customer will find the telephone occupied?
ii. What is the average length of the queue?
iii. What is the average waiting time for a customer?
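For the M/M/1 system with $\rho = \lambda/\mu < 1$, the steady-state solution is $p_n = (1-\rho)\rho^n$, $n = 0, 1, 2, \ldots$. Here λ = 10 per hour and μ = 60/2 = 30 per hour, so ρ = 1/3. The sketch evaluates the standard formulas. "Average length of the queue" is sometimes read as the number waiting ($L_q$) and sometimes as the number in the system ($L$), so both are returned.

```python
def mm1(lam, mu):
    """Steady-state measures for an M/M/1 queue (rates per hour)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("steady state requires lam < mu")
    return {
        "p_busy": rho,                  # 1 - p_0: arriving customer finds server busy
        "Lq": rho ** 2 / (1 - rho),     # mean number waiting
        "L": rho / (1 - rho),           # mean number in system
        "Wq_hours": rho / (mu - lam),   # mean wait in queue
        "W_hours": 1 / (mu - lam),      # mean time in system
    }

def p_n(n, lam, mu):
    """Steady-state probability of n customers in the system."""
    rho = lam / mu
    return (1 - rho) * rho ** n

m = mm1(10, 30)
print(m, m["Wq_hours"] * 60, "minutes")
```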

7. (a) What is the difference between a pure birth process and a Yule process? (4)
(b) Write down the probability distribution of the population size at time t for a Yule process. (2)
(c) In a toy store, the number of toys sold follows a Yule process with initial sale 2 and rate 0.5/hour. (4)
(i) What is the probability that, after 3 hours, the number of toys sold will be 6?
(ii) What is the average number of toys sold during a 3-hour period?
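For a Yule process with birth rate λ and initial size $n_0$, the population size at time t has the negative binomial distribution $P\{X(t) = n\} = \binom{n-1}{n_0-1} e^{-n_0\lambda t}(1-e^{-\lambda t})^{n-n_0}$, $n \ge n_0$, with mean $n_0 e^{\lambda t}$. The sketch reads "initial sale 2" as $n_0 = 2$ and the number sold as the process value. With λt = 1.5, this gives $P\{X(3)=6\} \approx 0.0907$ and a mean of $2e^{1.5} \approx 8.96$. If the initial 2 are to be excluded, subtract $n_0$ from the mean.

```python
import math

def yule_pmf(n, n0, lam, t):
    """P{X(t) = n} for a Yule process started at X(0) = n0 (negative binomial)."""
    if n < n0:
        return 0.0
    p = math.exp(-lam * t)
    return math.comb(n - 1, n0 - 1) * p ** n0 * (1 - p) ** (n - n0)

def yule_mean(n0, lam, t):
    return n0 * math.exp(lam * t)

print(round(yule_pmf(6, 2, 0.5, 3), 4), round(yule_mean(2, 0.5, 3), 4))
```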

8. (a) Define a birth-death process with an example. (3)
(b) Show that a pure death process is a special case of a birth-death process. (3)
(c) There are 10 boxes of candies in a department store. The sale rate depends on the number of boxes remaining. For one box the rate is 0.5/hour. (4)
(i) What is the probability that, after 2 hours, the number of boxes remaining will be 5?
(ii) What is the probability that no box will be left after 4 hours?
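This sketch reads "for one box the rate is 0.5/hour" as a linear pure death process with death rate $n\mu$ when n boxes remain, with μ = 0.5. Under that reading, each box is still unsold at time t independently with probability $e^{-\mu t}$, so $X(t) \sim \mathrm{Bin}(N, e^{-\mu t})$ with N = 10. This gives about 0.1714 for part (i) and $(1-e^{-2})^{10} \approx 0.2336$ for part (ii).

```python
import math

def pure_death_pmf(k, N, mu, t):
    """P{X(t) = k} for a linear pure death process with X(0) = N and per-item rate mu."""
    p = math.exp(-mu * t)            # probability that a given item survives to time t
    return math.comb(N, k) * p ** k * (1 - p) ** (N - k)

print(round(pure_death_pmf(5, 10, 0.5, 2), 4), round(pure_death_pmf(0, 10, 0.5, 4), 4))
```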
9. (a) What is a Poisson process? Prove that the interval between two successive occurrences of a Poisson process $\{N(t), t \ge 0\}$ having parameter $\lambda$ has a negative exponential distribution with mean $1/\lambda$ and variance $1/\lambda^2$. (5)
(b) Customer arrivals at a store form a Poisson process with rate 2.5 per hour. What is the probability that exactly 3 customers will enter the store from 3:30 p.m. to 6:00 p.m.? Also, specify the distribution of the inter-arrival time. (5)
10. (a) Explain the M/M/s queue with an example. Discuss its traffic intensity and the existence of the limiting distribution. (4)
(b) A convenience store has 2 sales counters. People arrive in a Poisson process with a rate of 10/hour. Service time follows an exponential distribution with mean 5 minutes. (6)
(i) Under steady-state conditions, what is the probability that both counters will be busy?
(ii) What is the average number of customers in the system?
(iii) Given that a customer has to wait, what is his expected waiting time?
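With s = 2 servers, λ = 10 per hour and μ = 60/5 = 12 per hour, the offered load is a = λ/μ = 5/6. The traffic intensity is ρ = a/s = 5/12 < 1, so the limiting distribution exists. The sketch uses the standard M/M/s results:

- $p_0 = \left[\sum_{n=0}^{s-1} a^n/n! + a^s/\{s!(1-\rho)\}\right]^{-1}$
- $P(\text{wait}) = \frac{a^s}{s!(1-\rho)}\,p_0$
- $L_q = P(\text{wait})\,\rho/(1-\rho)$ and $L = L_q + a$
- the expected wait of a customer who must wait is $1/(s\mu - \lambda)$

```python
import math

def mms(lam, mu, s):
    """Steady-state measures for an M/M/s queue (rates per hour)."""
    a = lam / mu                 # offered load
    rho = a / s                  # traffic intensity
    if rho >= 1:
        raise ValueError("limiting distribution requires rho < 1")
    tail = a ** s / (math.factorial(s) * (1 - rho))
    p0 = 1 / (sum(a ** n / math.factorial(n) for n in range(s)) + tail)
    p_wait = tail * p0           # P(all s servers busy), the Erlang C probability
    Lq = p_wait * rho / (1 - rho)
    return {"p0": p0, "p_all_busy": p_wait, "Lq": Lq, "L": Lq + a,
            "wait_given_wait_hours": 1 / (s * mu - lam)}

res = mms(10, 12, 2)
print(res)
```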

The End
University of Dhaka
Fourth Year B.S. (Honors) Examination, 2021
Subject: Statistics
Course: Stat H-407 (Generalized Linear Models)
Total marks: 70 Time: 3 hours

Answer any 7(SEVEN) questions. Marks are given in the right margin.

1. (a) Consider a single random variable Y whose distribution depends on a single parameter $\theta$. Write down the form of the exponential family for $f(y; \theta)$ and hence show that Poisson($\theta$) is a one-parameter member of the exponential family. (4)
(b) Extend the single-parameter exponential family form to a distribution with two parameters and hence show that $N(\mu, \sigma^2)$ is a two-parameter member of the exponential family. (6)
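For reference, the two distributions decompose as shown below in the form $f(y;\theta) = \exp[a(y)b(\theta) + c(\theta) + d(y)]$. This is one common textbook parameterization; other texts write $\exp\{[y\theta - b(\theta)]/a(\phi) + c(y,\phi)\}$. In the two-parameter case the exponent becomes $\sum_j a_j(y)b_j(\theta) + c(\theta) + d(y)$.

```latex
\text{Poisson: } f(y;\theta) = \frac{\theta^{y}e^{-\theta}}{y!}
  = \exp\left[\, y\log\theta - \theta - \log y! \,\right],
\quad a(y)=y,\; b(\theta)=\log\theta,\; c(\theta)=-\theta,\; d(y)=-\log y!

\text{Normal: } f(y;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]
  = \exp\left[\, y\,\frac{\mu}{\sigma^2} + y^2\left(-\frac{1}{2\sigma^2}\right)
    - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \,\right],
\quad a_1(y)=y,\; b_1=\frac{\mu}{\sigma^2},\; a_2(y)=y^2,\; b_2=-\frac{1}{2\sigma^2}
```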
2. (a) What are the three components of a generalized linear model (GLM)? Explain the associated terms. (3)
(b) Write down the general form of a GLM for the pdf/pmf of a distribution. (3)
(c) Suppose $Y \sim \text{Poisson}(\mu)$. Show that the pmf of Y can be written in the general form of a GLM. (4)

3. (a) Briefly describe the role of the link function in a GLM. (2)
(b) Write down the name and form of the associated link function in the GLM framework, with justification, for the following: (8)
(i) $Y \sim \text{Poisson}(\mu)$,
(ii) $Y \sim \text{Bernoulli}(\mu)$ and
(iii) $Y \sim N(\mu, \sigma^2)$.

4. (a) For a GLM, prove that
(i) $E(Y) = b'(\theta)$ and
(ii) $V(Y) = a(\phi)\,b''(\theta)$
by assuming Y is continuous with pdf $f(y; \theta, \phi)$. (8)
(b) Suppose $Y \sim \text{Bernoulli}(p)$. Justify the mean and variance properties of a GLM. (2)

5. Let $Y_1, \ldots, Y_i, \ldots, Y_n$ be n independent response variables from a Bernoulli distribution with parameter $\pi$.
i) Show that this distribution belongs to the exponential family of distributions. Is it in canonical form? (4)
ii) Identify the natural parameter. Argue that this natural parameter can be used as a link function to construct the GLM for the given random variables. (3)
iii) Hence, find an expression for the mean of the random variable using covariates and parameters. (3)
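The Bernoulli pmf can be written as $\pi^y(1-\pi)^{1-y} = \exp[y\theta - \log(1+e^\theta)]$ with natural parameter $\theta = \log\{\pi/(1-\pi)\}$. This shows it is in canonical form with $b(\theta) = \log(1+e^\theta)$ and $a(\phi) = 1$.

- Then $b'(\theta) = \pi$ and $b''(\theta) = \pi(1-\pi)$, matching the GLM mean and variance properties.
- Setting $\theta_i = x_i'\beta$ (the logit link) gives $E(Y_i) = \pi_i = e^{x_i'\beta}/(1+e^{x_i'\beta})$.

A finite-difference check with a hypothetical π = 0.3:

```python
import math

def b(theta):
    """Cumulant function of the Bernoulli in canonical form."""
    return math.log1p(math.exp(theta))

def logit(p):
    return math.log(p / (1 - p))

def mean_from_covariates(x, beta):
    """E(Y_i) = exp(x'beta) / (1 + exp(x'beta)) under the logit (canonical) link."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return 1 / (1 + math.exp(-eta))

pi, h = 0.3, 1e-4
theta = logit(pi)
b1 = (b(theta + h) - b(theta - h)) / (2 * h)                # approximates b'(theta)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2  # approximates b''(theta)
print(b1, b2)  # close to pi and pi * (1 - pi)
```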

6. a) Derive the score equation, used for the estimation of a GLM, by using the chain rule of differentiation. (6)
b) Suppose $Y \sim \text{Bernoulli}(\theta)$. Derive the asymptotic sampling distribution of the score statistic.
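For $Y_1, \ldots, Y_n$ iid Bernoulli(θ), the score is $U = \sum_i (y_i - \theta)/\{\theta(1-\theta)\}$, with $E(U) = 0$ and $\mathrm{var}(U) = \mathcal{I} = n/\{\theta(1-\theta)\}$. By the central limit theorem, $U/\sqrt{\mathcal{I}} \to N(0,1)$ and $U^2/\mathcal{I} \to \chi^2_1$. The simulation below checks the first two moments of the standardized score; n, θ and the seed are hypothetical choices.

```python
import random
import statistics

def bernoulli_score(y, theta):
    """Score U(theta) for an iid Bernoulli sample."""
    return sum(yi - theta for yi in y) / (theta * (1 - theta))

def standardized_scores(n=50, theta=0.3, reps=5000, seed=1):
    rng = random.Random(seed)
    info = n / (theta * (1 - theta))          # Fisher information
    out = []
    for _ in range(reps):
        y = [1 if rng.random() < theta else 0 for _ in range(n)]
        out.append(bernoulli_score(y, theta) / info ** 0.5)
    return out

z = standardized_scores()
print(statistics.fmean(z), statistics.pvariance(z))  # roughly 0 and 1
```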
7. (a) Derive the expression for the elements of the information matrix ($\mathcal{I}$) in a GLM and hence write $\mathcal{I}$ in matrix notation, mentioning the diagonal matrix. (6)
(b) Write down the Wald and score statistics for testing the entire parameter vector of a GLM, mentioning the hypothesis of interest. (4)
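The information matrix has elements $\mathcal{I}_{jk} = \sum_i \frac{x_{ij}x_{ik}}{\mathrm{var}(Y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2$, that is, $\mathcal{I} = X^T W X$ with $W$ diagonal and $w_{ii} = (\partial\mu_i/\partial\eta_i)^2/\mathrm{var}(Y_i)$. For a Poisson GLM with the log link, $w_{ii} = \mu_i$.

The sketch does two things:

- It fits a one-covariate Poisson model by Fisher scoring, $b^{(m)} = b^{(m-1)} + \mathcal{I}^{-1}U$.
- It forms the Wald statistic $(b - \beta_0)^T \mathcal{I}(b)(b - \beta_0)$, which is approximately $\chi^2_p$ under $H_0: \beta = \beta_0$ (here p = 2).

The data and $\beta_0$ are hypothetical.

```python
import math

def score_and_info(b0, b1, x, y):
    """Score U and information X^T W X for a Poisson GLM with log link, eta = b0 + b1*x."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    u = (sum(yi - mi for yi, mi in zip(y, mu)),
         sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x)))
    # W = diag(mu_i), since (d mu / d eta)^2 / var(Y) = mu^2 / mu
    i00 = sum(mu)
    i01 = sum(mi * xi for mi, xi in zip(mu, x))
    i11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    return u, ((i00, i01), (i01, i11))

def fisher_scoring(x, y, iters=50, tol=1e-10):
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        (u0, u1), ((a, c), (_, d)) = score_and_info(b0, b1, x, y)
        det = a * d - c * c
        step0, step1 = (d * u0 - c * u1) / det, (a * u1 - c * u0) / det  # I^{-1} U
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

def wald(b, beta0, info):
    """Quadratic form (b - beta0)^T I (b - beta0)."""
    diff = (b[0] - beta0[0], b[1] - beta0[1])
    return sum(diff[j] * info[j][k] * diff[k] for j in range(2) for k in range(2))

x, y = [0, 1, 2, 3, 4], [1, 2, 2, 5, 8]   # hypothetical data
b = fisher_scoring(x, y)
_, info = score_and_info(*b, x, y)
print(b, wald(b, (0.0, 0.0), info))
```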

8. Suppose $Y_i \sim \text{Bin}(n_i, \pi_i)$ and the $Y_i$'s are independent. Let $\beta = (\beta_0, \ldots, \beta_p)'$ be a parameter vector and $x_i = (x_{i1}, \ldots, x_{ip})'$ be a covariate vector of the i-th observation.
(a) Write down the logistic model with the associated link function in the context of a GLM. (6)
(b) Write down the log-likelihood for $\beta$ and hence derive the deviance. (2)
(c) Consider the linear predictor $\eta = \alpha + \beta x$. How do you interpret $\beta$? Justify with a mathematical expression.
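For the binomial model, the deviance compares the fitted model with the saturated model ($\hat\pi_i = y_i/n_i$): $D = 2\sum_i \left[y_i\log\frac{y_i}{n_i\hat\pi_i} + (n_i-y_i)\log\frac{n_i-y_i}{n_i-n_i\hat\pi_i}\right]$, using $0\log 0 = 0$. For part (c), $\operatorname{logit}\pi(x+1) - \operatorname{logit}\pi(x) = \beta$. So β is the change in log-odds per unit increase in x, and $e^\beta$ is the corresponding odds ratio. The values α = −1 and β = 0.7 below are hypothetical.

```python
import math

def binomial_deviance(y, n, pi_hat):
    """Deviance of a binomial fit against the saturated model."""
    def term(obs, fitted):
        return obs * math.log(obs / fitted) if obs > 0 else 0.0  # 0 log 0 = 0
    return 2 * sum(term(yi, ni * p) + term(ni - yi, ni * (1 - p))
                   for yi, ni, p in zip(y, n, pi_hat))

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

alpha, beta = -1.0, 0.7
x = 2.0
log_odds_change = logit(inv_logit(alpha + beta * (x + 1))) - logit(inv_logit(alpha + beta * x))
print(log_odds_change, math.exp(beta))  # change in log-odds equals beta; e^beta is the odds ratio
```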

9. Suppose the $Y_i$'s are independent and $Y_i \sim \text{Poisson}(\mu_i)$. Let $\beta = (\beta_0, \ldots, \beta_p)'$ be a parameter vector and $x_i = (x_{i1}, \ldots, x_{ip})'$ be a covariate vector for the i-th individual.
(a) Obtain the expression for the maximized log-likelihood under the saturated model. (4)
(b) Derive the deviance and hence write down its expression used in contingency tables and log-linear models. (4)
(c) Write down the expression for the Pearson chi-squared standardized residuals. (2)
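Under the saturated model $\hat\mu_i = y_i$, so the maximized log-likelihood is $\sum_i (y_i\log y_i - y_i - \log y_i!)$. The deviance then becomes $D = 2\sum_i[y_i\log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)]$. When $\sum y_i = \sum\hat\mu_i$, as in log-linear models with an intercept, this reduces to $2\sum o_i\log(o_i/e_i)$ for contingency tables. Pearson residuals are $(y_i-\hat\mu_i)/\sqrt{\hat\mu_i}$, and their squares sum to the Pearson $X^2$. The counts below are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum[y log(y / mu) - (y - mu)], with 0 log 0 = 0."""
    def term(obs, fit):
        return (obs * math.log(obs / fit) if obs > 0 else 0.0) - (obs - fit)
    return 2 * sum(term(o, f) for o, f in zip(y, mu))

def pearson_residuals(y, mu):
    return [(o - f) / math.sqrt(f) for o, f in zip(y, mu)]

y, mu = [2, 4], [3.0, 3.0]   # hypothetical observed and fitted counts
r = pearson_residuals(y, mu)
print(poisson_deviance(y, mu), sum(ri ** 2 for ri in r))
```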

10. Suppose that a logistic model with the dependent variable hypertension status (0, 1) and a set of covariates, age (continuous), smoking status SMK [0 (non-smoker), 1 (smoker)], gender [0 (female), 1 (male)], cholesterol level CHOL (continuous) and occupation OCC [0 (non-worker), 1 (worker)], has been fitted, and the estimated coefficients are given in the following table:
Table: Regression coefficients with standard errors for Logistic regression Model
SMK Gender CHOL OCC
Variable Constant Age -0.327
-0.691 0.142 0.356 0.472 0.269
Coefficient
0.035 0.127 0.205 0.142 0.128
Standard 0.231
Error
a. Assuming a follow-up study design, compute and interpret the estimated risk of developing hypertension for a 40-year-old male worker who also smokes with CHOL = 150. (2)

b. Interpret the estimated logistic regression coefficient of gender, controlling for all other covariates. Conduct a test of hypothesis for this regression parameter (calculate the p-value). Do you think that gender is a risk factor? Why? Also, construct a 99% confidence interval for the regression parameter. Hence, argue whether gender is a risk factor or not. (4)

c. Compute and interpret the estimated odds ratio for the effect of smoking, controlling for all other covariates. Also, construct a 95% confidence interval for the odds ratio. Hence, argue whether smoking is a risk factor or not. What assumption will allow you to conclude that the estimated odds ratio is approximately a risk ratio estimate? (4)
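Parts (a) to (c) follow one pattern:

- **Risk:** $1/(1+e^{-\hat\eta})$ with $\hat\eta = x'\hat\beta$.
- **Wald test:** $z = \hat\beta/\mathrm{se}(\hat\beta)$, with two-sided p-value $2\{1-\Phi(|z|)\}$.
- **Confidence limits:** $\hat\beta \pm z_{1-\alpha/2}\,\mathrm{se}(\hat\beta)$, exponentiated for the odds ratio.

The column alignment of the coefficient table is not recoverable from the source. The coefficient and standard error used below (0.5 and 0.2) are therefore hypothetical placeholders, to be replaced with the table values.

```python
import math
from statistics import NormalDist

def risk(coefs, covariates):
    """Estimated P(Y = 1) from the logistic model; coefs[0] is the intercept."""
    eta = coefs[0] + sum(b * x for b, x in zip(coefs[1:], covariates))
    return 1 / (1 + math.exp(-eta))

def wald_summary(beta, se, level=0.95):
    """Wald z, two-sided p-value, and CIs for beta and the odds ratio exp(beta)."""
    z = beta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = beta - crit * se, beta + crit * se
    return {"z": z, "p": p_value, "ci_beta": (lo, hi),
            "OR": math.exp(beta), "ci_OR": (math.exp(lo), math.exp(hi))}

print(wald_summary(0.5, 0.2, level=0.99))
```

If the interval for β contains 0 (equivalently, the interval for the odds ratio contains 1), the data do not show the covariate to be a risk factor at that level. The odds ratio approximates the risk ratio when the outcome is rare.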


The End.
University of Dhaka
Second Year B.S. (Honors) Final Examination, 2021
Subject: Statistics
Course: Stat H-408 (Comprehensive)
Total Marks: 70 Time: 4 Hours

Answer any 7 (Seven) of the following questions.


1. a) Define statistics. Write down the characteristics of statistics. 2
b) What is a scale of measurement? Describe and compare the basic characteristics of the four measurement scales. 3
c) For the following variables, identify the appropriate scale of measurement. Also, write down which graphs can be used to represent these variables.
i) Economic status, ii) Eye color, iii) Wage, iv) Exam grade, v) Room number, vi) Family size, vii) Temperature, viii) Race, ix) IQ score, and x) Telephone number.
2. a) Define arithmetic mean (AM), median (Me), and mode (Mo). Identify the appropriate measure of central tendency for data on each measurement scale. For skewed and open-ended distributions, which measure is appropriate? 4
b) What are the absolute measures of dispersion? Compare them. 4
c) Describe Pearson's and Bowley's measures of skewness. 2
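As a small illustration with hypothetical data, the two coefficients can be computed as below:

- Pearson's second coefficient: $Sk_P = 3(\bar x - M_e)/s$
- Bowley's coefficient: $Sk_B = (Q_3 + Q_1 - 2M_e)/(Q_3 - Q_1)$

Quartile conventions differ between textbooks. `statistics.quantiles` uses the "exclusive" (n + 1)p method by default.

```python
import statistics

def pearson_skewness(data):
    """Pearson's second coefficient: 3 (mean - median) / s."""
    return 3 * (statistics.fmean(data) - statistics.median(data)) / statistics.stdev(data)

def bowley_skewness(data):
    """Bowley's quartile coefficient: (Q3 + Q1 - 2 Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

data = [2, 3, 3, 4, 4, 4, 5, 6, 9, 12]  # hypothetical, right-skewed
print(pearson_skewness(data), bowley_skewness(data))
```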

3. a) What do you mean by possibility and probability? Define mutually exclusive and exhaustive events. 3
b) Define a random variable. Write down the properties of a probability distribution/density function. 3
c) A bag contains 10 balls, of which 4 are black. If 3 balls are selected at random without replacement, obtain the probability distribution of the number of black balls drawn.
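The number X of black balls follows a hypergeometric distribution, $P(X=x) = \binom{4}{x}\binom{6}{3-x}/\binom{10}{3}$, x = 0, 1, 2, 3. This gives probabilities 1/6, 1/2, 3/10 and 1/30. An exact check with fractions:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, total=10, black=4, drawn=3):
    """P(x black balls) when drawing without replacement."""
    return Fraction(comb(black, x) * comb(total - black, drawn - x), comb(total, drawn))

dist = {x: hypergeom_pmf(x) for x in range(4)}
print(dist)
```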
4. a) The joint probability density of X and Y is given by
$$f(x, y) = \begin{cases} \ldots, & 0 \le x \le 2,\ 0 \le y \le 2, \\ 0, & \text{otherwise.} \end{cases}$$
i) Verify that it is a joint density function.
ii) Find the marginal density function of X.
iii) Find $P[X > \ldots]$ and
iv) Find the conditional density function of Y.
b) An appliance dealer sells three different models of upright freezers having 14.5, 16.5, and 19.4 cubic feet of storage space, respectively. Let X = the amount of storage space purchased by the next customer to buy a freezer. Suppose that X has the following probability mass function: 3

x       14.5   16.5   19.4
p(x)    0.3    0.5    0.2

Compute $E(X)$, $E(X^2)$ and $V(X)$. The price of a freezer having capacity X cubic feet is given by $Y = 15X - 7.5$. Find the expected price and the variance of the price paid by the next customer to buy a freezer.
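By linearity, $E(Y) = 15E(X) - 7.5$ and $V(Y) = 15^2 V(X)$. With the pmf above:

- $E(X) = 16.48$, $E(X^2) = 274.472$ and $V(X) = 2.8816$
- $E(Y) = 239.70$ and $V(Y) = 648.36$

A direct computation:

```python
values = [14.5, 16.5, 19.4]
probs = [0.3, 0.5, 0.2]

ex = sum(x * p for x, p in zip(values, probs))        # E(X)
ex2 = sum(x * x * p for x, p in zip(values, probs))   # E(X^2)
vx = ex2 - ex ** 2                                    # V(X)
ey, vy = 15 * ex - 7.5, 15 ** 2 * vx                  # Y = 15X - 7.5
print(round(ex, 4), round(ex2, 4), round(vx, 4), round(ey, 4), round(vy, 4))
```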
c) Under what condition does the negative binomial distribution turn into the geometric distribution? You are surveying people exiting from a polling booth and asking them if they voted independent. The probability that a person voted independent is 40%. What is the probability that 15 people must be asked before you can find 5 people who voted independent? 3
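With r = 1, the negative binomial distribution (number of trials needed for r successes) reduces to the geometric distribution. The sketch interprets the question as "the 5th independent voter is the 15th person asked". Then $P(X = 15) = \binom{14}{4}(0.4)^5(0.6)^{10} \approx 0.0620$.

```python
from math import comb

def neg_binom_trials_pmf(x, r, p):
    """P(the r-th success occurs on trial x)."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

print(round(neg_binom_trials_pmf(15, 5, 0.4), 4))
```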
5. a) What is a scatter diagram? What purpose does it serve? Draw scatter diagrams for the following situations about two variables: 4
i) r = +1, ii) r = −1, iii) r close to +1, iv) r positive but close to zero, v) no linear relationship.
b) A restaurant owner wishes to model the association between mean daily costs (Costs) and the number of customers served (Covers). The owner collected a week of data, and the values for the seven days are shown in the accompanying table: 6

Costs    1000   2180   2240   2410   2590   2820   3060
Covers      0     60    120    133    143    175    175

i) Find which one is the dependent variable and which is the independent variable.
ii) Estimate the regression equation.
iii) Suppose the number of customers served is 150. Calculate the fitted value of y and the corresponding residuals.
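Costs depend on the number of customers served, so Costs is the dependent variable (y) and Covers is the independent variable (x). The least-squares estimates are $b = S_{xy}/S_{xx}$ and $a = \bar y - b\bar x$.

- For the seven (Covers, Costs) pairs read from the table, this gives roughly $\hat y = 1191.9 + 9.87x$.
- The fitted value at x = 150 is about 2672.7.
- No day in the table has exactly 150 covers, so the residuals $e_i = y_i - \hat y_i$ are computed for the observed days.

```python
def least_squares(x, y):
    """Intercept a and slope b of the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

covers = [0, 60, 120, 133, 143, 175, 175]
costs = [1000, 2180, 2240, 2410, 2590, 2820, 3060]
a, b = least_squares(covers, costs)
fitted_150 = a + b * 150
residuals = [yi - (a + b * xi) for xi, yi in zip(covers, costs)]
print(round(a, 2), round(b, 4), round(fitted_150, 2))
print([round(e, 2) for e in residuals])
```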
6. a) Define Type I error and Type II error with practical examples. Distinguish between the normal (Z) test and the t-test. 4

b) A new software package has been developed by Microsoft Ltd., which seeks to reduce the time required by system analysts to design, develop and implement information systems. To test the effectiveness of the software, a random sample of 8 analysts using the existing technology and another random sample of 22 analysts who are trained to use the new software package are selected. The two sets of analysts perform the work, and the following results were obtained: 6

                      Existing technology   New software
Mean time (hours)            133                127
Standard deviation            15                 18

What can be said about the underlying mean times of these two systems in terms of developing and implementing an information system?
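This sketch assumes both samples come from normal populations with a common variance, a standard textbook choice for small samples. A Welch test is the alternative if the variances differ. The pooled two-sample t-test of $H_0: \mu_1 = \mu_2$ uses $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $t = (\bar x_1 - \bar x_2)/\{s_p\sqrt{1/n_1 + 1/n_2}\}$ with $n_1 + n_2 - 2 = 28$ df.

Here t ≈ 0.84, well below the tabulated two-sided 5% critical value $t_{0.025,28} \approx 2.048$, which is hard-coded below from t tables. So the data do not indicate a difference in mean time.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

t, df = pooled_t(133, 15, 8, 127, 18, 22)
T_CRIT_28 = 2.048  # two-sided 5% critical value for 28 df, from t tables
print(round(t, 4), df, abs(t) > T_CRIT_28)
```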
7. a) Define sampling unit and sampling frame. Write down the criteria of an ideal sampling frame. 3
b) What do you mean by simple random sampling (SRS)? When is SRS used? Suppose that an SRS of size 2 is drawn without replacement from a population of size 4. Also, suppose that the values obtained from sampling units 1, 2, 3, and 4 are 5, 10, 15, and 20, respectively.
i) Find the population mean.
ii) Find all possible samples with their sample values. Also, find all possible sample mean values.
iii) Show that the sample mean is an unbiased estimator of the population mean.
iv) Find the sampling distribution of the sample mean. Hence show that it is an unbiased estimator of the population mean.
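The population mean is (5 + 10 + 15 + 20)/4 = 12.5. Enumerating all $\binom{4}{2} = 6$ samples gives sample means 7.5, 10, 12.5, 12.5, 15 and 17.5, whose average is again 12.5. This confirms unbiasedness. The sampling distribution puts probability 1/6 on each of 7.5, 10, 15 and 17.5, and 2/6 on 12.5.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = {1: 5, 2: 10, 3: 15, 4: 20}           # unit -> value
pop_mean = Fraction(sum(population.values()), len(population))

samples = list(combinations(population, 2))          # all SRSWOR samples of size 2
means = [Fraction(population[i] + population[j], 2) for i, j in samples]
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}

print(pop_mean, samples, means)
print(sampling_dist, sum(m * p for m, p in sampling_dist.items()))
```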
c) Define stratified sampling. What are the allocation procedures used in stratified sampling? Explain Neyman allocation.
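Under Neyman (optimum, equal-cost) allocation, stratum h receives $n_h = n\,\frac{N_h S_h}{\sum_k N_k S_k}$, so larger and more variable strata get more units. The stratum sizes and standard deviations below are hypothetical. In practice the $n_h$ are rounded to integers.

```python
def neyman_allocation(n, sizes, sds):
    """Sample size per stratum, proportional to N_h * S_h."""
    weights = [N * S for N, S in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

alloc = neyman_allocation(100, sizes=[500, 300, 200], sds=[10, 20, 40])
print(alloc)
```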
8. a) Let T be an estimator of $\theta$. Then prove the following identity: 2
$\mathrm{MSE}(T) = \mathrm{var}(T) + [\mathrm{bias}(T)]^2$
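One way to write the proof is to add and subtract $E(T)$ inside the square. The cross term then vanishes because $E[T - E(T)] = 0$.

```latex
\begin{aligned}
\mathrm{MSE}(T) &= E\big[(T-\theta)^2\big]
  = E\Big[\big(\{T - E(T)\} + \{E(T) - \theta\}\big)^2\Big] \\
&= E\big[\{T - E(T)\}^2\big] + 2\{E(T) - \theta\}\,E\big[T - E(T)\big] + \{E(T) - \theta\}^2 \\
&= \mathrm{var}(T) + 0 + [\mathrm{bias}(T)]^2 .
\end{aligned}
```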
b) Suppose that $Y_1, Y_2, \ldots, Y_n$ are iid $\text{Bin}(m, \theta)$ random variables, where m is known.
i) Explain why $\sum_{i=1}^{n} Y_i$ follows $\text{Bin}(mn, \theta)$.
ii) Show that $\bar{Y}/m$ is an unbiased estimator of $\theta$.
iii) Find the variance of $\bar{Y}/m$. What is the value of k such that $k\bar{Y}(m - \bar{Y})$ is an unbiased estimator of this variance?
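This check assumes two things: the estimator in (ii) is $\hat\theta = \bar Y/m$, and the statistic in (iii) has the form $k\bar Y(m - \bar Y)$. Under these assumptions:

- $\mathrm{var}(\hat\theta) = \theta(1-\theta)/(mn)$
- $E[\bar Y(m-\bar Y)] = m\theta(1-\theta)(mn-1)/n$
- hence $k = 1/\{m^2(mn-1)\}$

The exact check below uses the fact that $\sum Y_i \sim \mathrm{Bin}(mn, \theta)$. The values of m, n and θ are arbitrary.

```python
from fractions import Fraction
from math import comb

def expectation_over_sum(m, n, theta, g):
    """Exact E[g(Ybar)] where sum(Y_i) ~ Bin(m*n, theta) and Ybar = sum / n."""
    N = m * n
    return sum(comb(N, s) * theta ** s * (1 - theta) ** (N - s) * g(Fraction(s, n))
               for s in range(N + 1))

m, n, theta = 3, 4, Fraction(3, 10)
k = Fraction(1, m ** 2 * (m * n - 1))
target = theta * (1 - theta) / (m * n)                  # var(Ybar / m)
mean_est = expectation_over_sum(m, n, theta, lambda yb: yb / m)
unbiased = expectation_over_sum(m, n, theta, lambda yb: k * yb * (m - yb))
print(mean_est == theta, unbiased == target)
```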
c) Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent Poisson($\lambda$) random variables. 3

