DADM Original cheat data
Q2
2. Categorical variables are called qualitative variables, and numerical variables are called quantitative variables.
a. Q2
b. A second quartile.
c. =QUARTILE.INC(data,2)
d. A data point such that 50% of all data are above this value.
4. For each of the charts listed below, indicate how many variables from your
sample data (i.e., how many columns from your Excel spreadsheet) you need
to construct this particular chart.
Question                       Correct Match
Relative frequency histogram   D. 1
Scatterplot                    E. 2
Bubble chart                   A. 3
Time series plot               E. 2
Pie chart                      D. 1
ordinal. 9.
False
Response Feedback: No, we first need to create a summary table.
10. A relative frequency histogram is used to illustrate the distribution of a quantitative variable. To
create a relative frequency histogram, you need data from 1 variable. On the vertical axis of a relative
frequency histogram we have the percentage / proportion / relative frequency / fraction of observations.
11. A time series plot can be created only for a quantitative variable. To create a time series plot, you need
data from 2 variables.
12. For left-skewed data, nearly always mean < median < mode.
13. Quantitative variables can be turned into qualitative variables using the technique called
binning. Qualitative variables can be turned into quantitative variables, but only ordinal
qualitative variables.
14. Categorical variables can be turned into numerical variables, but only when such categorical
variables are ordinal variables.
True
True
TRUE
False
TRUE
False
22. To model this type of relationship between X and Y, an appropriate regression equation is:
Predicted Y = a + b1 * X + b2 * X^2
23. The two most popular graphical methods to illustrate the distribution of a sample from a categorical
variable are the bar chart and the pie chart.
Average()
Average.S()
9. Which command computes the 75th percentile of the data?
PERCENTILE.INC(data,0.75)
Quartile(data,3)
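A rough Python sketch of the inclusive percentile idea behind the two commands above. The helper name `percentile_inc` and the sample data are made up for illustration; it assumes linear interpolation at position p*(n-1) in the sorted data, which is how the inclusive method is commonly described, so treat it as a study aid rather than a replica of Excel.

```python
def percentile_inc(data, p):
    """Inclusive percentile: interpolate linearly at position p*(n-1)."""
    if not data or not 0 <= p <= 1:
        raise ValueError("need non-empty data and 0 <= p <= 1")
    xs = sorted(data)
    pos = p * (len(xs) - 1)      # fractional index into the sorted data
    lower = int(pos)             # index of the point just below
    frac = pos - lower           # how far we are toward the next point
    if lower + 1 >= len(xs):     # p == 1 lands on the maximum
        return float(xs[-1])
    return xs[lower] + frac * (xs[lower + 1] - xs[lower])


sample = [1, 2, 3, 4]                 # hypothetical data
q3 = percentile_inc(sample, 0.75)     # third quartile (75th percentile)
q2 = percentile_inc(sample, 0.50)     # second quartile, i.e. the median
```

Here the 0.50 call plays the role of Q2: half the data lie above it and half below.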
10. What does COUNTIF do?
Counts the number of data points that match 1 condition.
11. What does COUNTIFS do?
Counts the number of data points that match 1, 2, 3, ... conditions.
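To see the difference between the two, here is a small Python analogy (not Excel itself). The rows, column names, and the `count_ifs` helper are all hypothetical; one condition mimics COUNTIF, several conditions mimic COUNTIFS.

```python
# Hypothetical spreadsheet rows.
rows = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 80},
    {"region": "East", "sales": 60},
    {"region": "East", "sales": 200},
]


def count_ifs(rows, *conditions):
    """Count the rows that satisfy every condition passed in."""
    return sum(1 for row in rows if all(cond(row) for cond in conditions))


# One condition, like COUNTIF.
east = count_ifs(rows, lambda r: r["region"] == "East")

# Two conditions at once, like COUNTIFS.
east_big = count_ifs(
    rows,
    lambda r: r["region"] == "East",
    lambda r: r["sales"] > 100,
)
```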
12. To create a pie chart, what do we need to do?
Make the summary table, then Insert -> Pie chart.
13. A scatter plot illustrates the relationship between
2 quantitative variables
14. A common method to detect outliers formally is:
Compute the Z-score and see if |Z| > 3
See if any data points are above the upper fence (UF) or below the lower fence (LF)
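Both checks can be tried out in a few lines of Python. This is only a sketch: the sample data are invented, and the fences are assumed to be the usual Q1 - 1.5*IQR and Q3 + 1.5*IQR, which these notes do not spell out.

```python
import statistics


def z_score_outliers(data, threshold=3.0):
    """Return points whose |Z| exceeds the threshold (sample std dev)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / sd) > threshold]


def fence_outliers(data, k=1.5):
    """Return points below LF or above UF (assumed: Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical sample with one suspicious value.
sample = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11, 10, 12, 11, 50]
by_z = z_score_outliers(sample)
by_fence = fence_outliers(sample)
```

Note that in very small samples |Z| can never exceed 3, so the fence check tends to be the more useful of the two there.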
15. An outlier is a data point that is unusual.
16. Sample std dev is ________ to outliers.
Sensitive
17. Z-score = -4.56
The data point is an outlier.
18. Pivot tables can be used for categorical data and quantitative variables
True
It depends
19. A pivot chart can only be a bar graph.
False
20. Pick up the method that will allow you to filter the data
—>0.4
31.
is 0.6 32.
The correlation between X and Y is 0.3
33.
37. Cheese consumption is positively correlated with the number of deaths from becoming tangled in bedsheets.
It is a spurious relationship.
47. In multiple linear regression, we evaluate the predictive power of the linear model
by looking at R-square and adjusted R-square.
49. In multiple linear regression, when we add a new variable, the R-square
always goes up? True (more precisely, it never goes down)
50. In multiple linear regression, when we add a new variable, the adjusted
R-square always goes up?
False
52. Yhat = a + b*Dummy, where Dummy is 0 = Female and 1 = Male. What is the
value of the dummy coefficient b?
ybar(male) - ybar(female), because a = ybar(female) and a + b = ybar(male)
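A quick numerical check of the dummy regression in question 52, using invented numbers. The sketch fits Yhat = a + b*Dummy by ordinary least squares with the textbook formulas b = Sxy/Sxx and a = ybar - b*xbar, then compares a and b with the group means.

```python
from statistics import mean

# Hypothetical data: Dummy is 0 for Female, 1 for Male.
dummy = [0, 0, 0, 1, 1]
y = [50, 60, 70, 80, 90]

x_bar, y_bar = mean(dummy), mean(y)
s_xy = sum((x - x_bar) * (v - y_bar) for x, v in zip(dummy, y))
s_xx = sum((x - x_bar) ** 2 for x in dummy)

b = s_xy / s_xx          # slope on the dummy
a = y_bar - b * x_bar    # intercept

ybar_female = mean(v for x, v in zip(dummy, y) if x == 0)
ybar_male = mean(v for x, v in zip(dummy, y) if x == 1)
```

With these numbers the intercept a matches the female mean, a + b matches the male mean, and b is the gap between the two group means.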
2. Y = MX + C
To interpret the intercept, consider X = 0, so C is the intercept value. For example:
When a borrower has zero years of education, FICO score is predicted to be 631.17.
To interpret the slope, consider X increasing by 1, so M is the slope value. For example:
For every additional year of education, FICO score is predicted to decrease by 0.5153.
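The two interpretations can be checked with a tiny Python sketch built from the numbers in the example (intercept 631.17, slope -0.5153). The function name `predicted_fico` is just an illustrative label.

```python
INTERCEPT = 631.17   # C: predicted FICO score at zero years of education
SLOPE = -0.5153      # M: change in predicted FICO per additional year


def predicted_fico(years_of_education):
    """Predicted Y = M*X + C for the FICO example."""
    return SLOPE * years_of_education + INTERCEPT


at_zero = predicted_fico(0)                               # the intercept
one_more_year = predicted_fico(13) - predicted_fico(12)   # the slope
```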
2. Use INDEX to return the value located at a given row and column.
1. Use an appropriate Excel command to figure out which row number contains Item No. 9966.
=MATCH(9966, C:C, 0)
+ 4.64 * CUSTOMER_Occasional
+ 2.81 * CUSTOMER_Frequent
− 0.70 * Num. Lipsticks
Solution: Y = -0.59 + 2.05*1 + 4.64*0 + 2.81*1 - 0.70*6 = 0.07
Probability = EXP(Y)/(1+EXP(Y)), where Y is the logit
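The arithmetic in the solution can be replayed in Python. This sketch plugs in only the numbers shown in the solution line and then applies EXP(Y)/(1+EXP(Y)) to turn Y into a predicted probability; the variable names are illustrative.

```python
import math

# Numbers copied from the solution line above.
y = -0.59 + 2.05 * 1 + 4.64 * 0 + 2.81 * 1 - 0.70 * 6

# Convert the linear score Y into a probability between 0 and 1.
probability = math.exp(y) / (1 + math.exp(y))
```

Since Y is only slightly above 0, the resulting probability is only slightly above 0.5.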
2. X can take values 1, 2, and 3. Respective probabilities are 0.455, 0.311, and 0.234.
This X variable is discrete
3. You are a risk-averse investor. (You avoid risk.) You prefer to pick two stocks that are...
negatively correlated
4. You are a risk-averse investor. You invest in 2 stocks. Your portfolio risk will ...
go up if the two stocks are positively correlated, go down if the two stocks are negatively
correlated
5. You are a risk-averse investor. You invest in 2 stocks. Your portfolio expected return will ...
not change even if the stocks are correlated
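Questions 3 to 5 can be illustrated with the standard two-asset formulas: expected return is the weighted average of the two returns, and variance is w1^2*s1^2 + w2^2*s2^2 + 2*w1*w2*rho*s1*s2. The weights, returns, and standard deviations below are hypothetical.

```python
import math


def portfolio_return(w1, r1, r2):
    """Expected return: a weighted average, so correlation never enters."""
    return w1 * r1 + (1 - w1) * r2


def portfolio_risk(w1, s1, s2, rho):
    """Portfolio standard deviation for two stocks with correlation rho."""
    w2 = 1 - w1
    variance = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * w1 * w2 * rho * s1 * s2
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


# Hypothetical 50/50 portfolio: returns 8% and 12%, std devs 20% and 30%.
expected = portfolio_return(0.5, 0.08, 0.12)
risk_negative = portfolio_risk(0.5, 0.20, 0.30, rho=-0.5)
risk_zero = portfolio_risk(0.5, 0.20, 0.30, rho=0.0)
risk_positive = portfolio_risk(0.5, 0.20, 0.30, rho=0.5)
```

Risk rises as rho moves from negative to positive, while the expected return is the same in all three cases.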
10. X follows Normal distribution with mu=10 and sigma=2. (Hint: Use Empirical Rule)
P(X > 14) ≈ 0.025
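A short check of question 10 with Python's standard library. 14 is 2 sigma above mu, so the Empirical Rule puts about 95% of the data within 8 to 14 and splits the remaining 5% between the two tails. The exact Normal tail is computed alongside for comparison.

```python
from statistics import NormalDist

x = NormalDist(mu=10, sigma=2)

# Empirical Rule: about 95% within mu +/- 2*sigma, so about 2.5% per tail.
empirical_tail = (1 - 0.95) / 2

# Exact tail probability P(X > 14) from the Normal CDF.
exact_tail = 1 - x.cdf(14)
```

The exact value comes out a little below 0.025, because 95% corresponds to about 1.96 sigma rather than exactly 2.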
14. In Central Limit Theorem, sample size is considered large when n ≥ 30.
True
15. The distribution of XBAR is ________ for larger samples (large n).
Narrower
16. Prob(MU-1 < XBAR < MU+1) is ________ for larger samples.
Higher
17. The probability that XBAR is within 10 points of MU depends on the value of MU.
FALSE
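Questions 15 to 17 can be checked numerically. The sketch assumes XBAR is approximately Normal with mean MU and standard deviation sigma/sqrt(n), as the Central Limit Theorem suggests for large n; the MU, sigma, and n values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_xbar_within(mu, sigma, n, distance):
    """P(MU - distance < XBAR < MU + distance) under the CLT approximation."""
    xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return xbar.cdf(mu + distance) - xbar.cdf(mu - distance)


small_n = prob_xbar_within(mu=50, sigma=10, n=30, distance=1)
large_n = prob_xbar_within(mu=50, sigma=10, n=100, distance=1)
other_mu = prob_xbar_within(mu=500, sigma=10, n=30, distance=1)
```

The probability is higher for the larger sample (narrower distribution of XBAR), and changing MU alone leaves it unchanged.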
18. Confidence interval that we learnt today is an interval for ____________ (xbar / mu).
population mean
19. We estimate population mean ________ (more / less) accurately if we use data from a
larger sample.
More
20. Confidence interval for MU is wider if it's based on a larger sample size. (true / false)
FALSE
Look at the formula for the margin of error: Zα/2 × σ/√n. Sample size n is in the denominator,
so if n is smaller, the margin of error is higher. So, the confidence interval is wider if it's
based on a smaller sample.
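The effect of n on the margin of error can be seen by coding the formula directly. This sketch assumes a known population standard deviation (so Z rather than t is used); the sigma and n values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Z(alpha/2) * sigma / sqrt(n), assuming sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for 95%
    return z * sigma / sqrt(n)


moe_small_sample = margin_of_error(sigma=10, n=25)
moe_large_sample = margin_of_error(sigma=10, n=100)
moe_higher_conf = margin_of_error(sigma=10, n=25, confidence=0.99)
```

Quadrupling n halves the margin of error, and raising the confidence level with n fixed makes it larger.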
21. Confidence interval for MU is wider if confidence level is higher. (true / false)
TRUE
"C.I. for MU is wider if confidence level is higher." Conf. level determines the Zα/2 value in the
formula. Higher conf. level ⇨ higher Z. (Recall from lecture: you are 99.999% confident that
22. Confidence interval for MU is: [$3,000 to $5,000]. Margin of error = ______________ .
$1,000
23. Confidence interval for MU is: [$3,000 to $5,000]. Sample mean (XBAR) =
______________ .
$4,000
Recall: XBAR is the point estimate of MU and lies exactly in the center of the interval. So, it's
4,000.
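Questions 22 and 23 reduce to two lines of arithmetic: the point estimate is the midpoint of the interval and the margin of error is half its width. The helper name below is illustrative.

```python
def center_and_margin(lower, upper):
    """Return (XBAR, margin of error) for an interval XBAR +/- margin."""
    return (lower + upper) / 2, (upper - lower) / 2


xbar, margin = center_and_margin(3000, 5000)
```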
24. We can estimate the population mean (μ) more accurately if... (pick any one that is
correct)
we have collected a large sample of data, the confidence interval is narrow
26. The confidence interval in the previous question was an interval for ...
population mean
Confidence interval is ALWAYS for population something (e.g., population mean, difference
in population means).
27. The interval [-13.5 , 28.7] is the confidence interval for... (2 correct answers)
Population mean, Difference between population means
28. 95% confidence interval for mu1-mu2 is: [0.53, 4.79]. What's the conclusion?
Population mean is higher for group 1 than for group 2
29. 95% confidence interval for mu1-mu2 is: [-1.79, -0.05]. What's the conclusion?
Population mean is higher for group 2 than for group 1
31. 95% confid. interval for mu1-mu2 is: [-0.05, 10.84]. What can we do to REVERSE the
conclusion?
collect larger samples
32. 95% confid. interval for mu1-mu2 is: [-0.05, 10.84]. What can we do to REVERSE the
conclusion?
decrease confidence level
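The reading of an interval for mu1-mu2 in questions 28 to 32 follows one rule: check whether 0 is inside the interval. A minimal sketch, with an illustrative function name and wording:

```python
def conclusion(lower, upper):
    """Interpret a confidence interval for mu1 - mu2."""
    if lower > 0:
        return "population mean is higher for group 1"
    if upper < 0:
        return "population mean is higher for group 2"
    return "no significant difference (interval contains 0)"


q28 = conclusion(0.53, 4.79)
q29 = conclusion(-1.79, -0.05)
q31 = conclusion(-0.05, 10.84)
```

For the last interval, a larger sample or a lower confidence level narrows the interval, which could push the lower limit above 0 and reverse the conclusion.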
33. We have 2 INDEPENDENT samples. n1=10, n2=15. What degrees of freedom should we
use to construct C.I. for μ1-μ2? (number)
23
34. We have 2 MATCHED samples. n1=10, n2=10. What degrees of freedom should we use
to construct C.I. for μ1-μ2? (number)
9
We have 2 matched samples. Each pair of data is an observation. We have a total of 10
observations (pairs that can't be broken). d.f.= n-1 = 10-1 = 9. If you're not convinced,
review how we solved the Sales Presentations problem and took differences.
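The two degrees-of-freedom answers side by side. The independent-samples rule n1 + n2 - 2 is assumed here to be the pooled (equal variances) version, since that is what gives 23; other versions of the two-sample interval use a different d.f. formula.

```python
def df_independent(n1, n2):
    """d.f. for two independent samples (assumed pooled-variance case)."""
    return n1 + n2 - 2


def df_matched(n_pairs):
    """d.f. for matched samples: each pair counts as one observation."""
    return n_pairs - 1


q33 = df_independent(10, 15)
q34 = df_matched(10)
```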
35. Which of these hypotheses is the RESEARCH HYPOTHESIS (i.e., the hypothesis that we
want to test)?
Alternative hypothesis
42. In regression output, when the p-value is close to 0, we say that this explanatory variable
is statistically significant (its coefficient is statistically significant), is a good linear predictor of the
dependent variable
43. In regression output, when the p-value is high, we say that this explanatory variable
is not statistically significant, is a poor linear predictor of the dependent variable
46. To create a cubic trend model, the explanatory variables in time-series regression must
include:
Trend t, t^2, t^3
47. Pick ALL MODELS that have a non-linear trend. (TO GET FULL CREDIT, YOU NEED TO
CLICK ON ALL CORRECT ANSWERS.)
Cubic trend model, Quadratic trend model, Exponential trend model
49. To forecast this monthly sales data < picture >, regression should include ________ .
11 dummies
51. To forecast this quarterly Amazon sales, regression should include _____________
trend and 3 dummies
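Why 11 dummies for monthly data and 3 for quarterly: with k seasons you include k-1 dummies and let the remaining season act as the baseline. A sketch of how one regression row could be built; the function names and the choice of the last season as baseline are assumptions.

```python
def seasonal_dummies(season_index, seasons_per_year):
    """k-1 dummies for a 0-based season index; the last season is baseline."""
    return [1 if season_index == s else 0 for s in range(seasons_per_year - 1)]


def design_row(t, seasons_per_year):
    """Explanatory variables for period t (1-based): trend plus dummies."""
    season_index = (t - 1) % seasons_per_year
    return [t] + seasonal_dummies(season_index, seasons_per_year)


monthly = seasonal_dummies(0, 12)     # dummies for a monthly model
quarterly_q1 = design_row(5, 4)       # period 5 falls in quarter 1
quarterly_q4 = design_row(4, 4)       # quarter 4 is the baseline: all zeros
```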
52. To capture the evolution of this stock price data, we need to include ________ trend.
53. For the HOUSE CONSTRUCTION regression model from today's lecture, the interaction
variable was:
trend * before 2005/after 2005 dummy
54. Autoregressive model AR(5) means that the model includes __________ lags. (type your
numerical answer)
5
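To make "5 lags" concrete, here is a sketch that builds the rows an AR(p) regression would use: each row pairs the current value with the p previous values. The series and the helper name are hypothetical.

```python
def lagged_rows(series, p):
    """Return (y_t, [y_(t-1), ..., y_(t-p)]) for every t with p lags available."""
    rows = []
    for t in range(p, len(series)):
        lags = [series[t - k] for k in range(1, p + 1)]
        rows.append((series[t], lags))
    return rows


series = [10, 12, 11, 13, 14, 15, 17, 16]   # hypothetical data
ar5_rows = lagged_rows(series, 5)
```

The first 5 observations are used up as lags, so a series of 8 values yields only 3 usable rows.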