DADM Original cheat data

The document covers various statistical concepts including the differences between population and sample, types of variables, and methods for data visualization. It discusses skewness in data, regression analysis, and the use of Excel for statistical computations. Additionally, it touches on confidence intervals, outliers, and the interpretation of correlation and regression results.

Uploaded by

Sushma Deshmuk
Copyright © All Rights Reserved

1. What is bigger? Population or sample?

Population is bigger than sample.

2. Categorical variables are called qualitative variables, and numerical variables are called quantitative variables.

3. 50th percentile is also known as

a. Q2

b. A second quartile.

c. =QUARTILE.INC(data,2)

d. A data point such that 50% of all data are above this value.

4. For each of the charts listed below, indicate how many variables from your sample data (i.e., how many columns from your Excel spreadsheet) you need to construct this particular chart.

Question                       Correct Match
Relative frequency histogram   D. 1
Scatterplot                    E. 2
Bubble chart                   A. 3
Time series plot               E. 2
Pie chart                      D. 1

5. Left-skewed data means that, compared to the main bulk of the data, there are very few data points that are low in magnitude.

6. Right-skewed data means that, compared to the main bulk of the data, there are very few data points that are high in magnitude.
7. The charts listed below are useful to plot the relationship between which types of variables and how many?

Question            Correct Match
Bubble chart        3 quantitative
Scatter plot        2 quantitative
Stacked histogram   1 categorical and 1 numerical
Stacked box plot    1 categorical and 1 numerical
Stacked bar chart   1 categorical and 1 numerical
Stacked pie chart   2 categorical

8. Categorical variables are classified into nominal and ordinal.

9. To create a bar graph in Excel, we do the following:

- Highlight the sample data column that we want to chart
- On the top, we go to INSERT
- Click on the column/bar chart icon

True or false?
False
Response Feedback: No, we first need to create a summary table.
10. A relative frequency histogram is used to illustrate the distribution of a quantitative variable. To create a relative frequency histogram, you need data from 1 variable. On the vertical axis of a relative frequency histogram, we have the percentage (proportion, relative frequency, or fraction) of observations.

11. A time series plot can be created only for a quantitative variable. To create a time series plot, you need data from 2 variables.

12. For left-skewed data, nearly always mean < median < mode.

13. Quantitative variables can be turned into qualitative variables using the technique called binning. Qualitative variables can be turned into quantitative variables, but only for ordinal qualitative variables.

14. Categorical variables can be turned into numerical variables, but only when such categorical variables are ordinal variables.

15. Pivot chart can be a time series plot.

True

16. Pivot chart can be a pie chart.

True

17. Independence implies no linear association

TRUE

18. Zero correlation implies independence.

False

19. Independence implies zero correlation.

TRUE

20. No linear association implies independence.

False
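
A small sketch of why zero correlation does not imply independence. The data are made up for illustration: Y is completely determined by X (Y = X^2), yet the linear correlation comes out as zero because the relationship is symmetric, not linear.

```python
# Hypothetical data: Y depends on X exactly, but not linearly.
def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # Y is a function of X, so they are dependent

r = pearson_r(xs, ys)
print(r)  # zero linear association, even though Y is fully determined by X
```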

21. In regression analysis, interaction is...

a way to capture how the effect of an explanatory variable on the dependent variable varies depending on the values of another explanatory variable.

22. To model this type of relationship between X and Y, an appropriate regression equation is:

Predicted Y = a + b1 * X + b2 * X^2

23. The two most popular graphical methods to illustrate the distribution of a sample from a categorical variable are the bar chart and the pie chart.

24. For the absolute z-score, if the value is between 0 and 2, then it is not an outlier.

If the absolute z-score is > 2 but <= 3, then it is a possible outlier.

If it is > 3, then it is an outlier.
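
The z-score rule can be sketched in a few lines. The function names here are made up for illustration; the cutoffs (2 and 3) are the ones used in these notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def classify_z(z):
    """Label a data point using the |z| cutoffs from the notes."""
    a = abs(z)
    if a <= 2:
        return "not an outlier"
    if a <= 3:
        return "possible outlier"
    return "outlier"

print(classify_z(z_score(9, 5, 2)))   # |z| = 2.0
print(classify_z(2.5))
print(classify_z(-4.56))
```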

1. Strongly agree/Agree/Neutral/Disagree/Strongly Disagree is an example of
Ordinal categorical
2. Housing prices are an example of what kind of data?
Numerical continuous
3. Dummy variable is:
Dichotomous variable
0/1 variable
Categorical variable that takes two values
4. Mean > Median:
Right skewed
5. Median > Mean:
Left skewed
6. Sample mean is ________ to outliers
Sensitive
7. Sample median is ________ to outliers
Not sensitive
8. Which command in Excel computes this?

Average()
Average.S()
9. Which command computes the 75th percentile of the data?
Percentile.INC(data,0.75)
Quartile(data,3)
10. COUNTIF does what?
Counts the number of data points that match 1 condition.
11. COUNTIFS does what?
Counts the number of data points that match 1, 2, 3, ... conditions
12. To create a pie chart, what do we need to do?
Make the summary table, then Insert -> Pie chart
13. Scatter plot illustrates the relationship between
2 quantitative variables
14. A common method to detect outliers formally is:
Compute the z-score and see if |Z| > 3
See if any data points are outside the upper fence (UF) and lower fence (LF)
15. An outlier is a data point that is ________
Unusual
16. Sample std dev is ________ to outliers
Sensitive
17. Z-score = -4.56
The data point is an outlier.
18. Pivot tables can be used for categorical data and quantitative variables
True
It depends
19. Pivot chart can only be a bar graph
False
20. Pick the method that will allow you to filter the data

21. What is the Excel filter tool used for?
Filter data, detect data entry errors, sort
22. Which Excel command is used to merge two data sets?
VLOOKUP, XLOOKUP
23. Data Validation tool can be used in Excel to
Detect data entry errors
Make sure that data is in the correct format.
24. Correct way to use the DATE command in Excel
DATE(YYYY,MM,DD)
25. Independence means zero correlation
True
26. Correlation = 0 means independence
False
27. When correlation = 0, that means X and Y are not related to each other.
False
28. When correlation = 0, that means there is no linear relationship between X and Y
True

# Right or left skewed

# Positive and negative skewed


# Types of Qualitative and Quantitative

# Uni and bimodal


# Time series and cross sectional data

29. Regression value of the following graph

Correlation between X and Y = 0

30. The correlation between X and Y

—>0.4

31. The correlation between X and Y is 0.6

32. The correlation between X and Y is 0.3

33. The correlation between X and Y is 0

34. The correlation between X and Y is -0.4

35. r = -0.85. What is the correct interpretation?

The linear relationship between X and Y is negative and strong.

36. Which of the following is not true about correlation?

Correlation implies causation

37. Cheese consumption is positively correlated with the number of deaths from becoming tangled in bedsheets.
It is a spurious relationship

38. A spurious relationship occurs when

2 variables are wrongly assumed to be related.

39. Dependent variable = Sales ($000), predictor = Advertising ($00). Yhat = 1.02 + 2.73x is:

A regression equation
A predictive model

40. Dependent variable = Sales ($000). Predictor = Advertising ($000). Yhat = 1.02 + 2.73x
Interpret the intercept.
When the advertising expense = $0, sales are predicted to be $1,020.

41. Interpret the slope for each additional $100 of advertising (X in $00)

Sales are predicted to go up by $2,730.

42. Yhat = 1.02 + 2.73X, Y = sales ($000), X = advertising ($00). We invest $800 in advertising.

Projected sales = $22,860 (X = 8, so Yhat = 1.02 + 2.73 * 8 = 22.86 thousand)
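
The arithmetic for the advertising example can be checked with a short sketch. It assumes sales are measured in $000 and advertising in $00; the function name is made up for illustration.

```python
def predict_sales_thousands(advertising_hundreds, intercept=1.02, slope=2.73):
    """Yhat = 1.02 + 2.73 * X, with X in $00 and Yhat in $000 (assumed units)."""
    return intercept + slope * advertising_hundreds

# Intercept: predicted sales when advertising is $0
sales_at_zero = predict_sales_thousands(0) * 1000

# Slope: change in predicted sales for one more $100 of advertising
slope_effect = (predict_sales_thousands(1) - predict_sales_thousands(0)) * 1000

# $800 of advertising means X = 8 (hundreds of dollars)
sales_dollars = predict_sales_thousands(800 / 100) * 1000

print(round(sales_at_zero), round(slope_effect), round(sales_dollars))
```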

43. Yhat = 1.02 + 2.73X. For this, R-square = 0.96

This is a good linear predictive model.
We can use advertising expenditure to predict sales well.

44. Y = beer sales ($000), X = outdoor temperature (F). Yhat = 15 + 10 ln(x). Interpret the intercept.

When temperature is 1F, sales are predicted to be $15,000 (since ln(1) = 0).

45. Y = beer sales ($000), X = outdoor temperature (F). Yhat = 15 + 10 ln(x). Interpret the slope.

When temperature increases by 1%, sales go up by about $100 (10/100 = 0.1 thousand dollars).
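
A quick numerical check of the 1% rule for the model Yhat = 15 + 10 ln(x). The starting temperature of 50F is an arbitrary choice for illustration; the same change results from any positive starting value.

```python
import math

def beer_sales_thousands(temp_f):
    """Yhat = 15 + 10 * ln(x), with Yhat in $000 (model from the notes)."""
    return 15 + 10 * math.log(temp_f)

base = beer_sales_thousands(50.0)            # hypothetical starting temperature
bumped = beer_sales_thousands(50.0 * 1.01)   # temperature 1% higher
change_dollars = (bumped - base) * 1000

print(round(change_dollars, 2))       # close to the rule-of-thumb value of $100
print(beer_sales_thousands(1.0))      # ln(1) = 0, so the prediction equals the intercept
```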

46. In multiple linear regression, there are several linear regression equations.
False

47. In multiple linear regression, we evaluate the predictive power of the linear model by looking at adjusted R-square and R-square.

48. In multiple regression, R-square is always
> adjusted R-square

49. In multiple linear regression, when we add a new variable, does R-square always go up?
True

50. In multiple linear regression, when we add a new variable, does adjusted R-square always go up?

False
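
A sketch of why adjusted R-square can fall when a weak variable is added. It uses the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of explanatory variables. The R-square values below are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 30
adj_3_vars = adjusted_r2(0.800, n, 3)   # hypothetical model with 3 variables
adj_4_vars = adjusted_r2(0.801, n, 4)   # R-square barely rises after adding a 4th

print(round(adj_3_vars, 4), round(adj_4_vars, 4))
```

Here R-square goes up slightly (0.800 to 0.801), but the penalty for the extra variable outweighs the gain, so adjusted R-square goes down.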

51. Yhat = a + b*Dummy, where Dummy is: 0 = Female and 1 = Male.
Intercept a = ybar(female)

52. Yhat = a + b*Dummy, where Dummy is: 0 = Female and 1 = Male. What is the value of the dummy coefficient b?
ybar(male) - ybar(female)
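
A minimal sketch of a regression on a single 0/1 dummy, using made-up numbers. With least squares, the intercept equals the mean of the group coded 0 and the dummy coefficient equals the difference between the two group means.

```python
def ols_simple(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

female = [50, 54, 58]   # hypothetical Y values, dummy = 0
male = [60, 66, 72]     # hypothetical Y values, dummy = 1
dummy = [0, 0, 0, 1, 1, 1]

a, b = ols_simple(dummy, female + male)
ybar_female = sum(female) / len(female)
ybar_male = sum(male) / len(male)

print(a, ybar_female)               # intercept matches ybar(female)
print(b, ybar_male - ybar_female)   # coefficient matches the difference in means
```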

53. Dummy shows the difference in slope, holding all else constant.
False

54. Dummy shows the difference in intercept, holding all else constant.
True

55. Interaction variable captures the difference in ________, holding all else constant.
Slopes

56. The relationship between X and Y shown in the scatterplot is
Quadratic, second-degree polynomial

57. X variable: India, China, USA, Russia, Korea. How many dummies?
4, 3, 2, 1

58. Y = quarterly sales. X variables = Price, Country of origin. You want to see sales per quarter.
Create 4 quarterly dummies and include any 3 in the regression

59. Predicted sales ($000) = 12 + 3*Mon - 2*Tue + 4*Wed + 6*Thurs + 10*Fri - 5*Sat + 10.75*Temp
On Friday, we predict sales to be 4k higher than on Thursday (10 - 6 = 4).
On Tuesday, we predict sales to be 2k lower than on Sunday (the omitted baseline day).

60. Which of the following cannot be modeled using logistic regression?
Y = starting salary
Y = ln(starting salary)

61. Mark any example where logistic regression can be appropriate

62. Y (1 = Like, 0 = Dislike) is linearly dependent on explanatory variables.
False

63. Logit is linearly dependent on explanatory variables.
True

64. We can tell that X increases the probability of Y = 1 when
the coefficient of X is positive

Types of Relationship

DADM cheat sheet


Tuesday, March 7, 2023
1:08 PM

Slope and intercept

1. Dependent is Y and other is X

For slope: =SLOPE(known y column, known x column)
For intercept: =INTERCEPT(known y column, known x column)

2. Y = MX + C
To interpret the intercept, set X = 0, so C is the intercept value. For example:
When a borrower has zero years of education, FICO score is predicted to be 631.17.

To interpret the slope, consider X increasing by 1, so M is the slope value. For example:

For every additional year of education, FICO score is predicted to decrease by 0.5153.
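
For reference, a rough Python equivalent of Excel's SLOPE and INTERCEPT (least-squares fit with one X). The education and FICO values below are made up so that the arithmetic is easy to follow; they are not the data behind the 631.17 and -0.5153 example.

```python
def slope(known_ys, known_xs):
    """Least-squares slope, argument order mirrors Excel's SLOPE(known_y's, known_x's)."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    return sxy / sxx

def intercept(known_ys, known_xs):
    """Least-squares intercept, mirrors Excel's INTERCEPT(known_y's, known_x's)."""
    n = len(known_xs)
    return sum(known_ys) / n - slope(known_ys, known_xs) * sum(known_xs) / n

education_years = [8, 12, 16, 20]   # hypothetical X column
fico = [627, 625, 623, 621]         # hypothetical Y column

m = slope(fico, education_years)
c = intercept(fico, education_years)
print(m, c)   # predicted FICO = c + m * years of education
```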

MATCH AND INDEX

1. Use INDEX to find the data in a given row and column.
Use an appropriate Excel command to extract the value located in cell C15.
= INDEX ( C:C , 15 , 1 )

2. Use MATCH to find the location (row or column position) of a data value.
Use an appropriate Excel command to figure out which row number contains Item No. 9966.
= MATCH ( 9966 , C:C , 0 )
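
The idea behind the two commands can be sketched in Python: INDEX goes from a position to a value, and MATCH (with match type 0, exact match) goes from a value to a position. The helper names and item numbers are made up for illustration, and positions are 1-based to mirror Excel. Where Excel's MATCH returns #N/A for a missing value, this sketch returns None.

```python
def index(column, row_num):
    """Return the value at a 1-based position, like INDEX(range, row_num)."""
    return column[row_num - 1]

def match(lookup_value, column):
    """Return the 1-based position of the first exact match, like MATCH(value, range, 0)."""
    for position, item in enumerate(column, start=1):
        if item == lookup_value:
            return position
    return None

item_numbers = [1203, 4410, 9966, 7731]   # hypothetical column of item numbers

row = match(9966, item_numbers)
print(row)                        # position of Item No. 9966
print(index(item_numbers, row))   # looking up that position returns 9966 again
```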

+ 4.64 * CUSTOMER_Occasional
+ 2.81 * CUSTOMER_Frequent
− 0.70 * Num. Lipsticks

- She is currently in a relationship,
- She is a frequent customer of Sephora,
- She has purchased 6 lipsticks from Sephora in the past three years.

Solution: Y = -0.59 + 2.05*1 + 4.64*0 + 2.81*1 - 0.70*6 = 0.07

Here Y is the logit. Probability = EXP(Y)/(1+EXP(Y))
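
The solution above can be reproduced step by step. The linear combination Y is the logit, and EXP(Y)/(1+EXP(Y)) converts it into a probability. The coefficients come from the solution line; the variable names are assumptions, since the first part of the equation is not shown in these notes.

```python
import math

# Inputs for this customer (names are hypothetical labels for the terms in the solution)
in_relationship = 1
customer_occasional = 0
customer_frequent = 1
num_lipsticks = 6

logit = (-0.59
         + 2.05 * in_relationship
         + 4.64 * customer_occasional
         + 2.81 * customer_frequent
         - 0.70 * num_lipsticks)

probability = math.exp(logit) / (1 + math.exp(logit))

print(round(logit, 2))        # the logit, 0.07
print(round(probability, 4))  # predicted probability that Y = 1
```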

Time series and other types


# One entry for each year: time series data
# Multiple entries for each year: time-series cross-sectional data

Right and left Skewed


Types of Qualitative and Quantitative
1. X can take values 1, 2, and 3. Respective probabilities are: 0.5, 0.3, and 0.1.
Impossible because the probabilities do not add up to 1 (they sum to 0.9)

2. X can take values 1, 2, and 3. Respective probabilities are 0.455, 0.311, and 0.234.
This X variable is discrete

3. You are a risk-averse investor. (You avoid risk.) You prefer to pick two stocks that are...
negatively correlated

4. You are a risk-averse investor. You invest in 2 stocks. Your portfolio risk will ...
go up if the two stocks are positively correlated, go down if the two stocks are negatively
correlated

5. You are a risk-averse investor. You invest in 2 stocks. Your portfolio expected return will ...
not change even if the stocks are correlated
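
A sketch of the two-stock results above, using the standard formulas for a portfolio with weights w1 and w2 = 1 - w1: expected return = w1*mu1 + w2*mu2, and variance = (w1*sd1)^2 + (w2*sd2)^2 + 2*w1*w2*rho*sd1*sd2. All numbers are hypothetical.

```python
from math import sqrt

def portfolio_stats(w1, mu1, mu2, sd1, sd2, rho):
    """Expected return and standard deviation (risk) of a two-stock portfolio."""
    w2 = 1 - w1
    expected_return = w1 * mu1 + w2 * mu2
    variance = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    return expected_return, sqrt(variance)

# Same two stocks, three different correlations (hypothetical values)
ret_neg, risk_neg = portfolio_stats(0.5, 0.08, 0.12, 0.20, 0.30, rho=-0.5)
ret_zero, risk_zero = portfolio_stats(0.5, 0.08, 0.12, 0.20, 0.30, rho=0.0)
ret_pos, risk_pos = portfolio_stats(0.5, 0.08, 0.12, 0.20, 0.30, rho=0.5)

print(ret_neg, ret_zero, ret_pos)      # expected return does not depend on rho
print(risk_neg, risk_zero, risk_pos)   # risk rises as correlation rises
```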

6. X = quarterly sales of pizza. Y = annual sales of pizza. Express Y in terms of X.


Y=X+X+X+X
7. Normal distribution is continuous.
True

8. X follows Normal distribution with mu=10 and sigma=2.


P(X ≥ 15) is equal to P(X > 15)

9. X follows Normal distribution with mu=10 and sigma=2. MEDIAN = __10_____

10. X follows Normal distribution with mu=10 and sigma=2. (Hint: Use Empirical Rule)
P(X > 14) ≈ 0.025
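
These normal-distribution answers can be checked with Python's standard library. The Empirical Rule gives roughly 0.025 for P(X > 14); the exact value is slightly smaller.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=2)

p_above_14 = 1 - X.cdf(14)   # 14 is two standard deviations above the mean
p_above_15 = 1 - X.cdf(15)   # same value for P(X >= 15) and P(X > 15), since X is continuous
median = X.inv_cdf(0.5)      # for a normal distribution the median equals mu

print(round(p_above_14, 4))
print(round(median, 4))
```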

11. Parameter = a characteristic of the entire population; a parameter is a constant. Variable = a characteristic that changes from one sample to the next.

12. Population mean is a .....................
Parameter

13. Sample mean is a .....................
Variable

14. In Central Limit Theorem, sample size is considered large when n ≥ 30.
True

15. The distribution of XBAR is ________ for larger samples (large n).
Narrower

16. Prob(MU-1 < XBAR < MU+1) is ________ for larger samples.
Higher

17. The probability that XBAR is within 10 points of MU depends on the value of MU.
FALSE
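
A sketch of questions 15 to 17, assuming XBAR is approximately normal with mean MU and standard error sigma/sqrt(n) (the Central Limit Theorem approximation). The value sigma = 5 is hypothetical. MU never enters the calculation, which is why the probability does not depend on it.

```python
from math import sqrt
from statistics import NormalDist

def prob_xbar_within(delta, sigma, n):
    """P(MU - delta < XBAR < MU + delta) under the normal approximation."""
    standard_error = sigma / sqrt(n)
    centered = NormalDist(0, standard_error)   # distribution of XBAR - MU
    return centered.cdf(delta) - centered.cdf(-delta)

p_small_sample = prob_xbar_within(1, sigma=5, n=30)
p_large_sample = prob_xbar_within(1, sigma=5, n=100)

print(round(p_small_sample, 4), round(p_large_sample, 4))
```

The larger sample has the smaller standard error (a narrower distribution of XBAR), so the probability of landing within 1 of MU is higher.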

18. Confidence interval that we learnt today is an interval for ____________ (xbar / mu).
population mean

19. We estimate population mean ________ (more / less) accurately if we use data from a
larger sample.
More

20. Confidence interval for MU is wider if it's based on a larger sample size. (true / false)
FALSE

Look at the formula for the margin of error: Zα/2 × σ/√n. Sample size n is in the denominator, so if n is smaller, the margin of error is higher. So, the confidence interval is wider if it's based on a smaller sample.

21. Confidence interval for MU is wider if confidence level is higher. (true / false)
TRUE

C.I. for MU is wider if confidence level is higher. Conf. level determines the Zα/2 value in the formula. Higher conf. level ⇨ higher Z. (Recall from lecture: you are 99.999% confident that the first person you see will be 4 to 9 feet tall.)

22. Confidence interval for MU is: [$3,000 to $5,000]. Margin of error = ______________ .
$1,000
23. Confidence interval for MU is: [$3,000 to $5,000]. Sample mean (XBAR) =
______________ .
$4,000
Recall: XBAR is the point estimate of MU and lies exactly in the center of the interval. So, it's
4,000.
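
A short sketch tying the last few answers together, assuming a known population standard deviation so that the margin of error is z * sigma / sqrt(n). The values sigma = 10 and the sample sizes are hypothetical; the interval [3,000 to 5,000] is the one from the questions above.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(conf_level, sigma, n):
    """z * sigma / sqrt(n), where z is the two-sided critical value."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * sigma / sqrt(n)

# Larger sample: narrower interval. Higher confidence: wider interval.
moe_n25 = margin_of_error(0.95, sigma=10, n=25)
moe_n100 = margin_of_error(0.95, sigma=10, n=100)
moe_99 = margin_of_error(0.99, sigma=10, n=25)

# Recovering XBAR and the margin of error from an interval
lower, upper = 3000, 5000
xbar = (lower + upper) / 2
moe_from_interval = (upper - lower) / 2

print(round(moe_n25, 2), round(moe_n100, 2), round(moe_99, 2))
print(xbar, moe_from_interval)
```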

24. We can estimate the population mean (μ) more accurately if... (pick any one that is
correct)
we have collected a large sample of data, the confidence interval is narrow

25. 99% C.I. for mu is: [ 3 , 10 ]. Interpret this interval.
I'm 99% confident that population mean (mu) is between 3 and 10.
The probability that population mean (mu) is between 3 and 10 is 0.99.

26. The confidence interval in the previous question was an interval for ...
population mean
Confidence interval is ALWAYS for population something (e.g., population mean, difference
in population means).

27. The interval [-13.5 , 28.7] is the confidence interval for... (2 correct answers)
Population mean, Difference between population means

28. 95% confidence interval for mu1-mu2 is: [0.53, 4.79]. What's the conclusion?
Population mean is higher for group 1 than for group 2

29. 95% confidence interval for mu1-mu2 is: [-1.79, -0.05]. What's the conclusion?
Population mean is higher for group 2 than for group 1

30. 95% confidence interval for mu1-mu2 is: [-0.05, 10.84]. What's the conclusion?
Inconclusive

31. 95% confid. interval for mu1-mu2 is: [-0.05, 10.84]. What can we do to REVERSE the
conclusion?
collect larger samples

32. 95% confid. interval for mu1-mu2 is: [-0.05, 10.84]. What can we do to REVERSE the
conclusion?
decrease confidence level

33. We have 2 INDEPENDENT samples. n1=10, n2=15. What degrees of freedom should we
use to construct C.I. for μ1-μ2? (number)
23

34. We have 2 MATCHED samples. n1=10, n2=10. What degrees of freedom should we use
to construct C.I. for μ1-μ2? (number)
9
We have 2 matched samples. Each pair of data is an observation. We have a total of 10
observations (pairs that can't be broken). d.f.= n-1 = 10-1 = 9. If you're not convinced,
review how we solved the Sales Presentations problem and took differences.

35. Which of these hypotheses is the RESEARCH HYPOTHESIS (i.e., the hypothesis that we
want to test)?
Alternative hypothesis

36. "Alpha" is a probability that is typically...


very low

37. When p-value < alpha, ...


We reject the null hypothesis

38. When p-value < alpha, there is


sufficient evidence to support alternative hypothesis

39. In Excel, to compute p-value, which command do we use?
NORM.DIST, T.DIST

40. p-value for a regression coefficient is associated with a _________ test.


Two-tailed

41. p-values in a regression show


how well each individual X variable predicts Y linearly

42. In regression output, when p-value is close to 0, we say that
this variable's coefficient is statistically significant
this explanatory variable is a good linear predictor of the dependent variable

43. In regression output, when p-value is high, we say that this explanatory variable
is not statistically significant
is a poor linear predictor of the dependent variable

44. STEPWISE REGRESSION. We add X2 to our regression. p-value of the coefficient is 0.0269. α=5%.
Keep this variable X2

45. BACKWARD ELIMINATION. A variable X5 is in our regression. p-value of the coefficient is 0.3274. α=5%.
Drop this variable X5

46. To create a cubic trend model, the explanatory variables in time-series regression must include:
Trend t, t^2, t^3

47. Pick ALL MODELS that have a non-linear trend. (TO GET FULL CREDIT, YOU NEED TO
CLICK ON ALL CORRECT ANSWERS.)
Cubic trend model, Quadratic trend model, Exponential trend model

48. How do we capture seasonal effects in time-series regression models?


include dummies

49. To forecast this monthly sales data < picture >, regression should include ________ .
11 dummies

50. To forecast quarterly Amazon sales, regression should include ____________ .


3 dummies

51. To forecast this quarterly Amazon sales, regression should include _____________
trend and 3 dummies
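
A sketch of what "trend and 3 dummies" looks like as regression inputs for quarterly data. Quarter 1 is treated as the omitted baseline here, which is an arbitrary choice; any one of the four quarters can be left out.

```python
def design_row(t, quarter):
    """Explanatory variables for period t: trend plus dummies for Q2, Q3, Q4."""
    return [t] + [1 if quarter == q else 0 for q in (2, 3, 4)]

# Eight consecutive quarters (two years), starting in Q1
rows = [design_row(t, (t - 1) % 4 + 1) for t in range(1, 9)]

for row in rows:
    print(row)   # columns: [trend, Q2 dummy, Q3 dummy, Q4 dummy]
```

For monthly data the same pattern would use 11 dummies, with one month left out as the baseline.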

52. To capture the evolution of this stock price data, we need to include ________ trend.

53. For the HOUSE CONSTRUCTION regression model from today's lecture, the interaction
variable was:
trend * before 2005/after 2005 dummy

54. Autoregressive model AR(5) means that the model includes __________ lags. (type your
numerical answer)
5
