Practice Multiple Choice Questions and Feedback - Chapters 1 and 2
1. If a researcher uses daily data to examine a particular problem and creates a variable that
assigns a numerical value of 1 to Monday observations, what term would best describe this
type of number?
a) Continuous
b) Cardinal
c) Ordinal
d) Nominal
Correct! This would be a good example of a nominal number, since it does not even produce an
ordering - the numbers assigned to each day of the week are entirely arbitrary. There is no sense that
Tuesday is "better" than Monday because it is assigned a higher value. We could equally validly
have assigned the value 5 to Monday, 4 to Tuesday, and so on. Since a variable coded by day of the
week can take only 5 distinct values, it clearly would not be termed a continuous
variable.
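A minimal Python sketch of why the coding is arbitrary: the two day-of-week codings below (both hypothetical) are equally valid, because a nominal coding only needs to give each category its own distinct label, yet they rank the days in opposite orders.

```python
# Two equally valid nominal codings of the trading days (both hypothetical).
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
coding_a = {day: i + 1 for i, day in enumerate(DAYS)}  # Mon=1, Tue=2, ..., Fri=5
coding_b = {day: 5 - i for i, day in enumerate(DAYS)}  # Mon=5, Tue=4, ..., Fri=1


def is_valid_nominal_coding(coding):
    """A nominal coding only has to give each category its own distinct label."""
    return len(set(coding.values())) == len(coding)


print(is_valid_nominal_coding(coding_a), is_valid_nominal_coding(coding_b))  # True True
# The two codings rank Tuesday against Monday in opposite ways,
# so "higher number" carries no meaning for this variable.
print(coding_a["Tue"] > coding_a["Mon"], coding_b["Tue"] > coding_b["Mon"])  # True False
```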
3. Which of the following is NOT a feature of continuously compounded returns (i.e. log-returns)?
a) They can be interpreted as continuously compounded changes in the prices
b) They can be added over time to give returns for longer time periods
c) They can be added across a portfolio of assets to give portfolio returns
d) They are usually fat-tailed
Correct! Log-returns can indeed be interpreted as continuously compounded changes in the price or
index value over time. This is useful since it means we don't have to worry about the compounding
frequency. Log-returns can also be added up over time, so that the return over a year is simply the
sum of the daily returns for all trading days in that year. Asset returns are usually fat-tailed
(leptokurtic), and this is true whether they are measured as log-returns or simple returns. However,
log-returns cannot be aggregated across a portfolio to get a portfolio return. This would be possible
with simple returns, but it does not work for log-returns because taking the log is a non-linear
transformation: the log of a sum is not the same as the sum of the logs. In order to calculate
portfolio log-returns, it is necessary first to calculate the value of the whole portfolio at each
point in time and then to take the log of the changes in the portfolio value. Therefore c is the answer.
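Both claims can be checked numerically. The minimal Python sketch below uses made-up prices for two hypothetical assets: daily log-returns sum to the log-return over the whole period, while a value-weighted sum of the individual log-returns does not reproduce the log-return computed from the portfolio's value (simple returns, by contrast, do aggregate across assets).

```python
import math

# Hypothetical prices for two assets over four days (illustrative numbers only).
prices_a = [100.0, 102.0, 99.0, 105.0]
prices_b = [50.0, 49.0, 53.0, 52.0]
units_a, units_b = 1.0, 2.0  # holdings chosen so both positions start at 100


def log_returns(prices):
    """Continuously compounded returns r_t = ln(p_t / p_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def simple_return(p0, p1):
    """Simple (arithmetic) return over one period."""
    return p1 / p0 - 1.0


# 1) Aggregation over time works: the daily log-returns telescope,
#    so their sum equals the log-return over the whole period.
r_a = log_returns(prices_a)
print(math.isclose(sum(r_a), math.log(prices_a[-1] / prices_a[0])))  # True

# 2) Aggregation across assets does not: compare the portfolio log-return
#    (computed from the portfolio's value) with a weighted sum of log-returns.
values = [units_a * pa + units_b * pb for pa, pb in zip(prices_a, prices_b)]
w_a = units_a * prices_a[0] / values[0]  # initial portfolio weight of asset A
r_b = log_returns(prices_b)
portfolio_r1 = log_returns(values)[0]               # correct day-1 log-return
weighted_r1 = w_a * r_a[0] + (1.0 - w_a) * r_b[0]   # weighted sum of logs
print(portfolio_r1, weighted_r1)                    # 0.0 versus about -0.0002

# Simple returns, by contrast, aggregate exactly with the same weights.
weighted_simple = (w_a * simple_return(prices_a[0], prices_a[1])
                   + (1.0 - w_a) * simple_return(prices_b[0], prices_b[1]))
print(math.isclose(simple_return(values[0], values[1]), weighted_simple, abs_tol=1e-12))  # True
```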
4. Which of the following are alternative names for the dependent variable (usually denoted by
y) in linear regression analysis?
5. Which of the following are alternative names for the independent variable (usually denoted by
x) in linear regression analysis?
Correct! The independent variable, usually denoted by x, is also known as the regressor or the causal
variable. The regressand and effect variable are alternative names for y.
6. Which of the following statements is TRUE concerning the standard regression model?
a) y has a probability distribution
b) x has a probability distribution
c) The disturbance term is assumed to be correlated with x
d) For an adequate model, the residual (u-hat) will be zero for all sample data points
Correct! Since y depends on u as well as x, and since u is a random variable, y will also be a random
variable. x is assumed to be non-stochastic, i.e. fixed in repeated samples, and it is therefore not a
random variable. Since x is non-stochastic, it cannot be correlated with the random variable u; if it
were, it would itself be stochastic! A good model would be one where the residuals are as close to zero
as possible. However, unless there is a perfect relationship between y and x (i.e. all of the points lie
on a straight line), the residuals cannot all be zero.
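The points above can be illustrated with a small Monte Carlo sketch. The parameter values (alpha = 1, beta = 0.5, normal disturbances with standard deviation 2) are hypothetical choices for illustration. x is held fixed across 1,000 repeated samples and only u is redrawn, so the first observation of y varies from sample to sample even though x_1 never changes, and the OLS residuals from a simulated sample are not all zero.

```python
import random
import statistics

random.seed(0)
ALPHA, BETA, SIGMA = 1.0, 0.5, 2.0        # hypothetical population parameters
x = [float(t) for t in range(1, 21)]      # fixed regressor: identical in every sample


def ols(x, y):
    """Bivariate OLS with an intercept; returns (alpha_hat, beta_hat)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


first_y = []                              # y_1 across repeated samples
for _ in range(1000):
    # Only the disturbance u is redrawn; x never changes.
    y = [ALPHA + BETA * xi + random.gauss(0.0, SIGMA) for xi in x]
    first_y.append(y[0])

alpha_hat, beta_hat = ols(x, y)           # fit to the last simulated sample
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]

# y_1 has a distribution (its spread is roughly SIGMA) although x_1 is fixed,
# and the residuals are not all zero because the points do not lie on a line.
print(statistics.stdev(first_y))
print(any(abs(r) > 1e-9 for r in residuals))  # True
```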
9. Which one of the following statements best describes the algebraic representation of the fitted
regression line?
a)
b) $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$
c)
d) $y_t = \hat{\alpha} + \hat{\beta} x_t + \hat{u}_t$
Correct! The fitted value for y is obtained by taking the value of the explanatory variable for a
particular observation, multiplying it by the slope estimate and adding the intercept estimate. This
then gives a value for y-hat from the fitted line for that observation. The answers for a and c are not
plausible equations for anything, since the fitted value from the regression model cannot include
either a residual or a disturbance in its calculation. The equation in d is a valid equation that splits the
actual value y into a part that is explained by the model and a part which the model cannot explain
(the residual). However, the equation in d is not the equation for the fitted value.
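The difference between the fitted-value equation and the decomposition of the actual y can be checked in a few lines of Python. The five data points below are invented purely for illustration. The sketch computes the fitted values from the intercept and slope estimates alone, takes the residuals as whatever the fitted line leaves unexplained, and confirms that y equals fitted value plus residual for every observation.

```python
import statistics

# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values: intercept estimate plus slope estimate times x.
# Neither a residual nor a disturbance term enters this calculation.
y_hat = [alpha_hat + beta_hat * xi for xi in x]

# Residuals: the part of each actual y that the fitted line does not explain.
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

# The decomposition y = y_hat + u_hat holds for every observation.
print(all(abs(yi - (yh + uh)) < 1e-12 for yi, yh, uh in zip(y, y_hat, u_hat)))  # True
print(round(alpha_hat, 2), round(beta_hat, 2))  # 0.05 1.99
```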
10. Which of the following statements concerning the regression population and sample is
FALSE?
11. Which of the following statements is true concerning the population regression function
(PRF) and sample regression function (SRF)?
a) The PRF is the estimated model
b) The PRF is used to infer likely values of the SRF
c) Whether the model is good can be determined by comparing the SRF and the PRF
d) The PRF is a description of the process thought to be generating the data.
Correct! The PRF is the true population model for the relationship between the variables x and y.
Some researchers draw a distinction between the PRF and the data generating process, but the two
terms have been used synonymously on this course. The sample is used to estimate the SRF, which is
then used to infer the likely values of the population parameters described by the PRF, not the
other way around. Since the PRF itself is never observed, the SRF cannot be compared with it to judge
whether the model is good. Therefore a, b, and c are false and d is a true statement.
12. Which of the following models can be estimated using OLS, following suitable
transformations if necessary? (Note that "e" denotes the exponential.)
i)
ii)
iii)
iv) .
a) (i) only
b) (i) and (iii) only
c) (i), (iii), and (iv) only
d) (i), (ii), (iii), and (iv)
Correct! In fact, all of models (i) to (iv) can be estimated using OLS, following suitable
transformations where necessary. Clearly (i) is simply the standard model. For (ii), creating a new
variable (call it z) as z = e^x, would give the standard model as a regression of y on a constant and
z. In (iii), substituting Y = ln(y) and X = ln(x) and regressing Y on a constant and X would again
give the standard model. Finally, to estimate (iv), set z = x^2, and regress y on a constant and z.
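A minimal simulation sketch of the transformation idea, with all parameter values hypothetical. For (iii) it assumes a multiplicative form y = A x^beta e^u, which is one form consistent with the substitution Y = ln(y), X = ln(x) described above; the original equation is not reproduced here. In each case the ordinary bivariate OLS helper `ols` (defined in the sketch, not a library function) is applied to the transformed variables and recovers the parameters used to generate the data.

```python
import math
import random
import statistics


def ols(x, y):
    """Bivariate OLS with an intercept; returns (alpha_hat, beta_hat)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


random.seed(1)
x = [i / 10 for i in range(1, 51)]          # 0.1, 0.2, ..., 5.0 (all positive)

# (ii) y = alpha + beta * e^x + u: non-linear in x, but linear in z = e^x.
y2 = [2.0 + 0.5 * math.exp(xi) + random.gauss(0.0, 0.1) for xi in x]
a2, b2 = ols([math.exp(xi) for xi in x], y2)

# (iii), ASSUMING the multiplicative form y = A * x^beta * e^u:
# taking logs gives ln(y) = ln(A) + beta * ln(x) + u, so regress Y = ln(y) on X = ln(x).
y3 = [2.0 * xi ** 0.7 * math.exp(random.gauss(0.0, 0.05)) for xi in x]
a3, b3 = ols([math.log(xi) for xi in x], [math.log(yi) for yi in y3])

# (iv) y = alpha + beta * x^2 + u: regress y on a constant and z = x^2.
y4 = [1.0 + 3.0 * xi ** 2 + random.gauss(0.0, 0.1) for xi in x]
a4, b4 = ols([xi ** 2 for xi in x], y4)

print(a2, b2)   # near 2.0 and 0.5
print(a3, b3)   # near ln(2) = 0.693 and 0.7
print(a4, b4)   # near 1.0 and 3.0
```

The point of the sketch is that OLS only requires the model to be linear in the parameters; the variables themselves can be any known transformation of the raw data.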
13. Which of the following is an equivalent expression for saying that the explanatory variable
is "non-stochastic"?
15. If an estimator is said to have minimum variance, which of the following statements is NOT
implied?
a) The probability that the estimate is a long way away from its true value is minimised
b) The estimator is efficient
c) Such an estimator would be termed "best"
d) Such an estimator will always be unbiased
Correct! An estimator that has minimum variance would also be defined as efficient and "best" -
these terms are equivalent to one another. A minimum variance estimator means that the sampling
variation in the parameter estimates between one sample and another will be minimised. This is
also equivalent to stating that the probability that the estimate for any given sample is a long way
off from its true value will be minimised. However, an estimator can have minimum variance and still
be biased, so d is not implied. Typically there is an implicit trade-off between choosing an unbiased
but inefficient estimator and choosing an estimator with a smaller variance that is biased.
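A minimal Monte Carlo sketch of that trade-off, in a deliberately simple setting chosen for illustration (estimating a population mean of 1 from samples of size 10). The sample mean is unbiased; the hypothetical "shrunk" estimator 0.5 times the sample mean is biased towards zero but has exactly one quarter of the variance.

```python
import random
import statistics

random.seed(42)
MU, SIGMA, N, REPS = 1.0, 2.0, 10, 20_000   # hypothetical settings

unbiased, shrunk = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = statistics.fmean(sample)
    unbiased.append(m)          # sample mean: expected value MU
    shrunk.append(0.5 * m)      # shrunk estimator: expected value 0.5 * MU (biased)

for name, est in [("sample mean", unbiased), ("0.5 x sample mean", shrunk)]:
    bias = statistics.fmean(est) - MU
    print(f"{name:18s} bias={bias:+.3f} variance={statistics.variance(est):.3f}")
```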
16. Consider the OLS estimator for the standard error of the slope coefficient. Which of the
following statement(s) is (are) true?
(i) The standard error will be positively related to the residual variance
(ii) The standard error will be negatively related to the dispersion of the observations on
the explanatory variable about their mean value
(iii) The standard error will be negatively related to the sample size
(iv) The standard error gives a measure of the precision of the coefficient estimate.
Correct! All of statements (i) to (iv) are true. The bigger the residual variance is, the bigger must be
the RSS, and therefore the further away are the points from the line. Therefore, the bigger the
residual variance is, the bigger will be the coefficient standard errors. This can be seen since the
term "s" appears positively in the standard error formulae for the intercept and the slope. The more
dispersed are the observations on the explanatory variable (x) about its mean value, the more
precisely the coefficient estimates can be calculated since we would have information about the
relationship between y and x over a wider range of values for x. In the formulae, the variation of x
about its mean value enters into the denominator for both the slope and the intercept standard
errors, so the bigger the dispersion is, the smaller will be the standard errors. The bigger the
sample size, the more pieces of information are available from which to estimate the model
parameters. The number of observations appears explicitly in the formula for the intercept
standard error and implicitly in the formula for the slope standard error. In the latter case, the
standard error is inversely related to the sample size since the sum of the squares of the
observations on x about their mean value appears in the denominator, and the larger the sample
size is, the more terms will be included in this sum.
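The usual OLS expressions are SE(beta-hat) = s / sqrt(sum of (x_t - x-bar)^2) and SE(alpha-hat) = s * sqrt(sum of x_t^2 / (T * sum of (x_t - x-bar)^2)), where s^2 = RSS / (T - 2). The Python sketch below simply evaluates these expressions for invented inputs so that effects (i) to (iii) can be seen directly. As a simplification, s is supplied as a fixed number rather than estimated from residuals.

```python
import math
import statistics


def coefficient_ses(x, s):
    """Return (SE(alpha_hat), SE(beta_hat)) for residual standard deviation s."""
    T = len(x)
    x_bar = statistics.fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)        # dispersion of x about its mean
    se_alpha = s * math.sqrt(sum(xi ** 2 for xi in x) / (T * sxx))
    se_beta = s / math.sqrt(sxx)
    return se_alpha, se_beta


base = [1.0, 2.0, 3.0, 4.0, 5.0]                     # illustrative x values
_, se_base = coefficient_ses(base, s=1.0)

# (i) a larger residual standard deviation raises the standard error
_, se_noisier = coefficient_ses(base, s=2.0)
# (ii) more dispersed x (same mean, deviations doubled) lowers it
_, se_spread = coefficient_ses([3.0 + 2 * (xi - 3.0) for xi in base], s=1.0)
# (iii) a larger sample (each x value observed twice) lowers it
_, se_bigger = coefficient_ses(base * 2, s=1.0)

print(se_base, se_noisier, se_spread, se_bigger)
```

Doubling s doubles the standard error, doubling the spread of x halves it, and doubling the sample (with the same spread) divides it by the square root of 2.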
17. Which of the following statements is INCORRECT concerning the classical hypothesis testing
framework?
a) If the null hypothesis is rejected, the alternative is accepted
b) The null hypothesis is the statement being tested while the alternative encompasses the
remaining outcomes of interest
c) The test of significance and confidence interval approaches will always give the same
conclusions
d) Hypothesis tests are used to make inferences about the population parameters.
Correct! Hypothesis tests are used to make statements about the plausibility of certain values for
the population parameters given the estimates made from the sample. By definition, the null
hypothesis is the statement being tested while the alternative encompasses other outcomes of
interest. The test of significance and confidence interval approaches will always give the same
answer (so long as a fixed significance level is used for both) since one can be viewed as just a
rearrangement of the other. It is never said that the alternative hypothesis is accepted. The reason
that this is not done is that, in general terms, it is possible to reject a null hypothesis without the
alternative hypothesis being correct. Therefore a is the only incorrect statement.
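The two approaches agree because |(beta-hat - beta*) / SE(beta-hat)| > t_crit is algebraically the same condition as beta* lying outside beta-hat plus or minus t_crit times SE(beta-hat). A small Python check over a grid of hypothesised values beta* makes this concrete; the estimate, standard error and critical value are arbitrary illustrative numbers.

```python
def reject_by_t_test(beta_hat, se, beta_star, t_crit):
    """Test-of-significance approach: reject H0: beta = beta_star if |t| > t_crit."""
    return abs((beta_hat - beta_star) / se) > t_crit


def reject_by_interval(beta_hat, se, beta_star, t_crit):
    """Confidence-interval approach: reject if beta_star lies outside the interval."""
    lower, upper = beta_hat - t_crit * se, beta_hat + t_crit * se
    return not (lower <= beta_star <= upper)


# Illustrative numbers only.
beta_hat, se, t_crit = 0.5, 0.2, 1.98
grid = [i / 100 for i in range(-100, 201)]      # beta_star from -1.00 to 2.00
agree = all(
    reject_by_t_test(beta_hat, se, b, t_crit) == reject_by_interval(beta_hat, se, b, t_crit)
    for b in grid
)
print(agree)  # True
```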
18. Suppose that a hypothesis test is conducted using a 5% significance level. Which of the
following statements are correct?
(iii) 2.5% of the total distribution will be in each tail rejection region for a 2-sided test
(iv) 5% of the total distribution will be in each tail rejection region for a 2-sided test.
19. Consider an identical situation to that of question 21, except that now a 2-sided alternative
is used. What would now be the appropriate conclusion?
a) H0 is rejected
b) H0 is not rejected
c) H1 is rejected
d) There is insufficient information given in the question to reach a conclusion
Correct! Now, if a 2-sided test is used, the test statistic would still take the same value, and
rejection would occur if the test statistic fell in either region. Since the 5% 2-sided critical values
are close to -2 and +2, the statistic is clearly now in the rejection region, and hence a is correct.
20. Which one of the following would be the most appropriate as a 95% (two-sided) confidence
interval for the intercept term of the model given in question 21?
a) (-4.79,2.19)
b) (-4.16,4.16)
c) (-1.98,1.98)
d) (-5.46,2.86)
Correct! Recall that the formula for a confidence interval for the intercept parameter is
$\left(\hat{\alpha} - t_{crit} \times SE(\hat{\alpha}),\ \hat{\alpha} + t_{crit} \times SE(\hat{\alpha})\right)$
Substituting the relevant values gives the interval in this case as
(-1.3 - 1.98 × 2.1, -1.3 + 1.98 × 2.1) or (-5.46, 2.86). Therefore d is the correct answer. Errors that you
could have made would include using the one-sided 5% critical value, which would be about 1.66
instead of 1.98. This would have given answer a. The second possible error would be to forget to
add in the coefficient value, so that the interval would be wrongly calculated as (-1.98 × 2.1,
1.98 × 2.1), which would give answer b. Answer c would have been obtained if the critical values
alone had been used!
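The arithmetic above, including the mistakes that lead to the wrong answers, can be reproduced in a few lines of Python. The inputs are the intercept estimate -1.3, its standard error 2.1 and the critical values 1.98 and 1.66 quoted in the feedback; the helper `conf_interval` is our own illustrative function.

```python
def conf_interval(estimate, se, t_crit):
    """Two-sided interval estimate +/- t_crit * se, rounded to 2 decimal places."""
    return round(estimate - t_crit * se, 2), round(estimate + t_crit * se, 2)


alpha_hat, se_alpha = -1.3, 2.1
print(conf_interval(alpha_hat, se_alpha, 1.98))  # (-5.46, 2.86): correct, answer d
print(conf_interval(alpha_hat, se_alpha, 1.66))  # (-4.79, 2.19): one-sided value, answer a
print(conf_interval(0.0, se_alpha, 1.98))        # (-4.16, 4.16): estimate omitted, answer b
print(conf_interval(0.0, 1.0, 1.98))             # (-1.98, 1.98): critical values alone, answer c
```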
21. Which one of the following is the most appropriate definition of a 99% confidence interval?
a) 99% of the time in repeated samples, the interval would contain the true value of the
parameter
b) 99% of the time in repeated samples, the interval would contain the estimated value of the
parameter
c) 99% of the time in repeated samples, the null hypothesis will be rejected
d) 99% of the time in repeated samples, the null hypothesis will not be rejected when it was
false
Correct! Although from a philosophical perspective, some researchers would disagree with this
definition, on this course a 99% confidence interval is taken to mean that 99% of the time in
repeated samples, the interval would contain the true parameter value. Thus a is correct. Of
course, by construction the interval will always contain the parameter estimate exactly in the
middle, so b is incorrect. For a 99% confidence interval, we can say that 99% of the time the null
would not be rejected when the null was correct (i.e. we made the right decision), which is not the
formulation of d, so d is incorrect. We cannot say how often the null hypothesis will be rejected - it
depends on whether it is right or wrong! All we could say is how often the null would be rejected as
a result of chance alone. Therefore c is incorrect.
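The repeated-samples interpretation in a can be checked by simulation. The Python sketch below assumes, for simplicity, normally distributed data with a known standard deviation, so the normal critical value from `statistics.NormalDist` is used in place of a t value. It builds a 99% interval for a population mean in each of 10,000 hypothetical samples and counts how often the interval contains the true value.

```python
import random
import statistics

random.seed(7)
TRUE_MU, SIGMA, N, REPS = 5.0, 3.0, 25, 10_000   # hypothetical settings
z_crit = statistics.NormalDist().inv_cdf(0.995)  # about 2.576 for a 99% two-sided interval
half_width = z_crit * SIGMA / N ** 0.5

covered = 0
for _ in range(REPS):
    sample_mean = statistics.fmean(random.gauss(TRUE_MU, SIGMA) for _ in range(N))
    # The interval is always centred on the estimate; what varies from sample
    # to sample is whether it happens to contain the fixed true parameter.
    if sample_mean - half_width <= TRUE_MU <= sample_mean + half_width:
        covered += 1

print(covered / REPS)  # close to 0.99
```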
23. Suppose that a test statistic has associated with it a p-value of 0.08. Which one of the
following statements is true?
(i) If the size of the test were exactly 8%, we would be indifferent between rejecting and
not rejecting the null hypothesis
(ii) The null would be rejected if a 10% size of test were used
(iii) The null would not be rejected if a 1% size of test were used
24. Suppose that observations are available on the monthly bond prices of 100 companies for 5
years. What type of data are these?
a) Cross-sectional
b) Time-series
c) Panel
d) Qualitative
Correct! Since the data have the dimensions of both time series (5 years of observations) and of
cross-sections (100 companies), this would be known as a panel data set. A cross-sectional series
would not have data over a period of time, while a time-series data set would use information on
one company at a time. Bond prices are clearly an example of quantitative rather than qualitative
data, since they can take on any (non-negative) values and are not constrained to take on only
certain values as qualitative data would be.
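A short Python sketch of the panel layout described above: every observation is indexed by both a company and a month, so 100 companies observed monthly for 5 years give 100 × 60 = 6,000 company-month pairs. The company identifiers are hypothetical and no prices are filled in.

```python
from itertools import product

N_COMPANIES, N_YEARS = 100, 5
companies = [f"firm_{i:03d}" for i in range(1, N_COMPANIES + 1)]   # hypothetical IDs
months = [(year, month) for year in range(1, N_YEARS + 1) for month in range(1, 13)]

# Long-format panel index: one row per (company, month) pair.
panel_index = list(product(companies, months))
print(len(panel_index))  # 6000

# Fixing the company gives one time series; fixing the month gives one cross-section.
one_time_series = [m for c, m in panel_index if c == "firm_001"]   # 60 months
one_cross_section = [c for c, m in panel_index if m == (1, 1)]     # 100 firms
```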