Assignment3 05.01.24
Assignment 3
Due date: Friday, 05.01.2024 by 12 noon
Question 1 is compulsory for all Groups.
1. Highly publicized salaries of corporate chief executive officers (CEOs) in the US have
generated sustained interest in understanding the factors related to CEO compensation in
general. The EXCEL sheet contains data on the annual compensation of the CEOs of 50 large
publicly-traded corporations in the US in the previous year, as well as other information
that could be used to predict CEO compensation. An explanation of the entries in EXCEL
is as follows:
• Column 2 shows the total compensation of the CEO of each company. This
number includes direct salary plus bonuses, stock options, and other
compensation.
• Column 3 shows the number of years that the executive has been CEO of the
company.
• Column 4 shows the percentage change in the stock price of the company from
the previous year.
• Column 5 shows the percentage change in the company’s sales from the previous
year.
• Column 6 shows whether or not the CEO has an MBA. A “1” is used
to indicate that the CEO has an MBA, and a “0” is used to indicate that the CEO does
not have an MBA.
a. Comment on the extent to which you think that the data shown in the EXCEL sheet might or
might not be an appropriate choice for predicting CEO compensation.
b. Construct and run a regression model to predict CEO compensation as a function of the
independent variables indicated in the EXCEL sheet.
c. Evaluate the output of your regression model. Is there evidence of
multicollinearity?
d. Try constructing a smaller regression that uses fewer independent variables. You might
need to experiment with several different regression models using different combinations
of independent variables. Evaluate your best regression model by looking at the adjusted
R², the t-statistics, the regression residuals, etc. Are you satisfied with your regression
model?
e. Now, try interacting the independent variables of your best regression to check whether this
leads to an improved model, judged by the t-statistics, the F-statistic, the adjusted R², the
regression residuals, etc. Does the incorporation of the interaction terms lead to an
improved model?
f. Which are the critical factors that are good predictors of CEO compensation? Does
having an MBA have an effect on CEO compensation? Why or why not?
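For parts c and d, two standard diagnostics can be computed by hand from regression output: the adjusted R² and the variance inflation factor (VIF). The sketch below is only illustrative. The function names and all R² values are hypothetical and are not taken from the EXCEL data; only the sample size of 50 comes from the question.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k slope coefficients fitted on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def vif(aux_r2: float) -> float:
    """Variance inflation factor of one predictor.

    aux_r2 is the R^2 from regressing that predictor on all the other predictors.
    """
    if not 0.0 <= aux_r2 < 1.0:
        raise ValueError("auxiliary R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - aux_r2)


n = 50  # number of CEOs in the data set (from the question)

# Hypothetical R^2 values, for illustration only.
full_model = adjusted_r2(0.50, n, k=4)   # all four predictors
small_model = adjusted_r2(0.48, n, k=2)  # two predictors dropped

print(f"full model adjusted R^2:  {full_model:.4f}")
print(f"small model adjusted R^2: {small_model:.4f}")
print(f"VIF for an auxiliary R^2 of 0.80: {vif(0.80):.2f}")
```

With these made-up numbers the smaller model has the higher adjusted R² even though its raw R² is lower, which is the kind of trade-off part d asks you to weigh.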
2. The credit exposure of large financial institutions has increased in recent times. Standard
Lesotho Bank has just implemented the internal ratings based (IRB) foundation approach
of determining the likelihood of a customer defaulting. You have been tasked to develop an
appropriate model for estimating the probability of default of prospective loan applicants
(application scorecard) with the bank. You have been furnished with the following data set
on 20000 clients to assist you with the analysis.
Loan_status (1 = default, 0 = repaid)
Loan_amount (total loan amount in Maloti)
Rate (annual interest rate)
Emp_length (total years in employment)
Gender (1 = Male, 0 = Female)
Ratings (calculated score of creditworthiness, the higher the better)
Annual_income (Annual income in Maloti)
Age (age in years)
i) What type of regression is appropriate for estimating the probability of
default of a prospective loan applicant?
The following are the results of a regression run in Stata in which Loan_status is modeled
in terms of the other variables.
Regression results
Number of obs = 20,000
LR chi2(6) = 569.30
Prob > chi2 = 0.0000
Log likelihood = -6371.6288 Pseudo R2 = 0.0428
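The summary statistics in this header are linked to one another, which gives a quick consistency check. The sketch below assumes the usual Stata definitions: LR chi2 is twice the difference between the fitted and the intercept-only log likelihoods, and Pseudo R2 is McFadden's measure. The variable names are illustrative.

```python
# Values copied from the regression header above.
log_lik = -6371.6288   # log likelihood of the fitted model
lr_chi2 = 569.30       # likelihood-ratio chi-squared statistic

# LR chi2 = 2 * (log_lik - log_lik_null), so the intercept-only log likelihood is:
log_lik_null = log_lik - lr_chi2 / 2.0

# McFadden's pseudo R^2 = 1 - log_lik / log_lik_null
pseudo_r2 = 1.0 - log_lik / log_lik_null

print(f"implied null log likelihood: {log_lik_null:.4f}")
print(f"implied pseudo R^2:          {pseudo_r2:.4f}")
```

Under these definitions the implied pseudo R² rounds to the reported 0.0428.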
ii) Based on your regression results above, write down the drivers of loan default of a
prospective loan applicant and explain the reasons for your choice.
iii) The odds ratio of Loan_amnt is 0.9999866. What does this mean? What is an odds
ratio?
iv) How would you interpret the confidence interval for Int_rate?
v) You presented your results to your directors and they exclaimed that you left out one
important variable: Home Ownership. Do you believe their assertion? Explain your
answer.
vi) Suppose you grudgingly agreed to test Home Ownership on your results and that
available data for home ownership is in the form: RENT, OWN or Mortgage.
Explain how you will incorporate home ownership in your regression results.
vii) State two methods you would use to compare the quality of fit of your results that
include home ownership with that of your original results above.
Groups E, F, O, P, G, H, Q, R, T and V.
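For question 2, parts iii and vi, two small mechanics may be easier to see in code: rescaling a per-unit odds ratio to a larger change in the predictor, and turning a three-level categorical variable into 0/1 indicators. This is a sketch under stated assumptions. The helper names, the `home_` column prefix, the choice of RENT as the reference category, and the 10,000 Maloti increment are all hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio: float, units: float) -> float:
    """Odds ratio for a change of `units` in the predictor, given the per-unit odds ratio."""
    return math.exp(units * math.log(odds_ratio))


def dummy_code(value: str, levels: list[str], reference: str) -> dict[str, int]:
    """Return 0/1 indicators for every level except the reference category."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"home_{lvl.lower()}": int(value == lvl) for lvl in levels if lvl != reference}


# Per-Maloti odds ratio quoted in part iii.
per_maloti = 0.9999866
per_10k = rescale_odds_ratio(per_maloti, 10_000)  # hypothetical 10,000 Maloti increase
print(f"odds ratio per 10,000 Maloti: {per_10k:.4f}")

# Levels as listed in part vi; RENT is an arbitrary choice of reference category.
levels = ["RENT", "OWN", "Mortgage"]
row = dummy_code("OWN", levels, reference="RENT")
print(row)
```

Leaving one category out avoids perfect collinearity with the intercept; the coefficients on the remaining indicators are then read relative to the omitted category.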
3. For a child $i$ living in a particular school district, let $\mathit{voucher}_i$ be a dummy variable equal
to one if a child is selected to participate in a school voucher program, and let $\mathit{score}_i$ be
that child’s score on a subsequent standardized exam. Suppose that the participation
variable, $\mathit{voucher}_i$, is completely randomized in the sense that it is independent of both
observed and unobserved factors that can affect the test score.
a. If you run a simple regression of $\mathit{score}_i$ on $\mathit{voucher}_i$ using a random sample of size $n$,
does the OLS estimator provide an unbiased estimator of the effect of the voucher
program?
b. Suppose you can collect additional background information, such as family income,
family structure (e.g., whether the child lives with both parents), and parents’
education levels. Do you need to control for these factors to obtain an unbiased
estimator of the effects of the voucher program? Explain.
c. Why should you include the family background variables in the regression? Is there
a situation in which you would not include the background variables?
Groups D, N, Q, F, I, P, S and U.
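The setting in question 3 can be explored with a small simulation. The sketch below uses the fact that the OLS slope from a simple regression on a dummy variable equals the difference in group means. Every number in the data-generating process (a true effect of 5 points, the weight on the unobserved background factor, the noise levels, the sample sizes) is a hypothetical choice, not something given in the question.

```python
import random
from statistics import mean


def voucher_estimate(n: int, true_effect: float, rng: random.Random) -> float:
    """Difference in mean scores between voucher and non-voucher children.

    This equals the OLS slope from regressing score on the voucher dummy.
    """
    treated, control = [], []
    for _ in range(n):
        background = rng.gauss(0.0, 1.0)   # unobserved family background
        voucher = rng.random() < 0.5       # assigned independently of background
        score = 50.0 + true_effect * voucher + 3.0 * background + rng.gauss(0.0, 5.0)
        (treated if voucher else control).append(score)
    return mean(treated) - mean(control)


rng = random.Random(2024)  # fixed seed so the run is reproducible
true_effect = 5.0          # hypothetical effect of the voucher on the score
estimates = [voucher_estimate(500, true_effect, rng) for _ in range(200)]
average_estimate = mean(estimates)

print(f"average estimate over 200 samples: {average_estimate:.2f}")
```

Because assignment is independent of the background factor, the estimates should centre on the true effect even though background is never controlled for. Adding such a control to the regression would mainly be expected to tighten the spread of the estimates.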
4. To test the effectiveness of a job training program on the subsequent wages of workers, we
specify the model,
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{train} + \beta_2\,\mathit{edu} + \beta_3\,\mathit{exper} + \varepsilon$$
where $\mathit{train}$ is a binary variable equal to unity if a worker participated in the program.
Think of the error term $\varepsilon$ as containing unobserved worker ability. If less able workers
have a greater chance of being selected for the program, and you use an OLS analysis,
what can you say about the likely bias in the OLS estimator of $\beta_1$? (Hint: Refer back to
Chapter 3 of the text.)
Groups C, H, M, R, E, J, O and T.
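For question 4, the textbook omitted-variable bias formula gives a way to reason about the sign. The sketch below is a simplification, not the full argument: it drops $\mathit{edu}$ and $\mathit{exper}$ so that the two-variable formula applies, and it assumes the error can be written as $\varepsilon = \beta_a\,\mathit{abil} + u$. The symbols $\beta_a$, $\mathit{abil}$, $u$ and $\tilde{\delta}_1$ are introduced here for illustration and do not appear in the question.

```latex
% Requires amsmath.
% Assumption: \varepsilon = \beta_a\,\mathit{abil} + u, with u unrelated to train.
% Simplification: edu and exper are left out so the two-variable formula applies.
\begin{align*}
\log(\mathit{wage}) &= \beta_0 + \beta_1\,\mathit{train} + \beta_a\,\mathit{abil} + u \\
E(\tilde{\beta}_1)  &= \beta_1 + \beta_a\,\tilde{\delta}_1
\end{align*}
% \tilde{\beta}_1: OLS slope when abil is omitted.
% \tilde{\delta}_1: slope from regressing abil on train.
```

The sign of the bias term is the sign of $\beta_a\,\tilde{\delta}_1$, so it depends on how ability affects wages and on how ability is related to selection into the program. With $\mathit{edu}$ and $\mathit{exper}$ in the model the exact expression is more involved, so treat this as a guide to the likely direction only.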
5. In the simple regression model under MLR.1 through MLR.4, we argued that the slope
estimator, $\hat{\beta}_1$, is consistent for $\beta_1$. Using $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}_1$, show that $\operatorname{plim} \hat{\beta}_0 = \beta_0$. [You need
to use the consistency of $\hat{\beta}_1$ and the law of large numbers, along with the fact that $\beta_0 =
E(y) - \beta_1 E(x_1)$.]
Groups B, I, L, A, D, G, K, N and U.
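The argument in question 5 can be laid out as a short chain of equalities. This is an outline under the hints given in the question, writing $\bar{x}_1$ for the sample mean of $x_1$; each step still needs its justification stated in a full answer.

```latex
% Requires amsmath.
\begin{align*}
\operatorname{plim} \hat{\beta}_0
  &= \operatorname{plim}\left(\bar{y} - \hat{\beta}_1 \bar{x}_1\right) \\
  &= \operatorname{plim} \bar{y}
     - \operatorname{plim} \hat{\beta}_1 \cdot \operatorname{plim} \bar{x}_1
     && \text{(properties of plim)} \\
  &= E(y) - \beta_1 E(x_1)
     && \text{(law of large numbers, consistency of } \hat{\beta}_1\text{)} \\
  &= \beta_0
\end{align*}
```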