DADM NOTES and Cheat Sheet

Uploaded by

shethpriyanka1

DADM NOTES, KAHOOT AND IMP FORMULAS

Notes:
The bar graph is the graphical representation of categorical data.
A histogram is the graphical representation of quantitative data.
Categorical Data (qualitative):

1. bar chart (cannot be skewed)


2. pie graph

Numerical Data (Quantitative):

1. Histogram - 1 quantitative variable


2. Boxplot - numerical and categorical
3. Time Series - 2 quantitative variables
4. Scatterplot - 2 quantitative variables

Cross-Sectional: The data is collected at one point in time.


Time Series (longitudinal): Data collected over a period of time
Cross-sectional time series: Data is collected over time; for every point of time there are multiple
observations

Symmetric: Mean ≈ Median


Right Skewed: Mean > Median
Left Skewed: Median > Mean

Mean is sensitive to outliers


Median is not sensitive to outliers
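
The skewness and outlier rules above can be checked with a quick sketch. The salary figures below are made up purely for illustration; only Python's standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical data: five salaries (in thousands), invented for this example.
salaries = [40, 42, 45, 47, 50]
# The same data with one extreme value added (an outlier on the right).
with_outlier = salaries + [500]

# Roughly symmetric data: mean and median are close (44.8 vs 45).
print(mean(salaries), median(salaries))

# One outlier drags the mean far to the right, while the median
# barely moves (46.0), so the data now looks right skewed: mean > median.
print(mean(with_outlier), median(with_outlier))
```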

Pivot tables are also known as cross-tabulations or contingency tables.

Covariance:
If x and y move in the same direction: positive covariance
If x and y move in different directions: negative covariance
Covariance does not tell us about the strength of the linear association

Correlation:
r can be positive or negative
-1<=r<=1
the closer r is to the extremes (-1 or 1), the stronger the LINEAR relationship is
if r is 0 or close to 0, there is no linear relationship
Drawbacks:
Correlation is very sensitive to outliers
a single outlier can change the value of r substantially
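
A minimal sketch of the covariance and correlation points above, assuming the sample (n - 1) versions of the formulas. The helper names `covariance` and `correlation` and all data values are hypothetical. It shows that covariance changes with the units of the data (so it says nothing about strength), that r does not, and that a single outlier can flip the sign of r.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson r = cov(x, y) / (sd_x * sd_y)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                 # y = 2x: a perfect linear relationship
y_scaled = [v * 100 for v in y]      # same relationship, different units

cov_xy = covariance(x, y)            # positive: x and y move together
cov_scaled = covariance(x, y_scaled) # 100 times larger, same relationship
r_xy = correlation(x, y)             # about 1
r_scaled = correlation(x, y_scaled)  # unchanged by the rescaling

# Add one extreme point: r changes sign even though five of the
# six points still lie exactly on the line y = 2x.
r_out = correlation(x + [6], y + [-20])

print(cov_xy, cov_scaled, r_xy, r_scaled, r_out)
```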
Causation:
Causal relationship - a change in one variable causes a change in the other variable
Correlation and Covariance do not imply causation
In spurious relations, two variables are wrongly assumed to be related to each other
In spurious relations, there is typically a 3rd lurking variable that drives both variables.

1. Simple linear regression = linear regression with only one predictor

2. Multiple linear regression = linear regression with more than one (quantitative or
categorical) predictors
3. Logistic regression = nonlinear regression with a categorical dependent variable

Simple Linear Regression:


Univariate: Regression (2 variables) (simple regression: 1 predictor)
Multivariate: Regression (multiple regression: 2 or more predictors)

R Square:
The coefficient of determination
0 ≤ r² ≤ +1
r² shows how much of the variability in Y is explained by variability in X (i.e., how much is
explained by the linear model). r² shows how good the linear model is
The closer to +1, the better the model
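
As a rough illustration of how r² is obtained for a simple linear regression, the sketch below fits a least-squares line by hand and computes r² as 1 - SS_res / SS_tot. The function names and the hours/score data are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def r_squared(xs, ys):
    """Share of the variability in Y explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, score)
r2 = r_squared(hours, score)
print(slope, intercept, r2)  # r2 is close to 1: the line fits well
```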

Multiple Regression:
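In multiple regression, R-square is always > Adjusted R-square (strictly, ≥: the two are equal
only for a perfect fit, R-square = 1).
R-square goes up (strictly, it never goes down) even if you add a bad predictor. However,
Adjusted R-square may go down if you add a bad predictor, because it penalizes the number of
predictors.
Because of this penalty, R-squared will always be at least as large as R-squared adjusted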
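
The behaviour described above follows from the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A small sketch, with R² values that are invented for illustration rather than taken from a fitted model:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors
    (p does not count the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical scenario: adding a weak third predictor nudges
# R-squared up from 0.800 to 0.801 on n = 30 observations.
before = adjusted_r_squared(0.800, n=30, p=2)
after = adjusted_r_squared(0.801, n=30, p=3)

# R-squared went up, but adjusted R-squared went down,
# and in both cases it stays below the plain R-squared.
print(before, after)
```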

Regression with categorical :


Coefficient of interaction variables captures the difference between slopes

Linear regression with categorical predictors:


nonlinear effects
a) Regressions with a quadratic term: for categorical ordinal variables only; also works for
numerical discrete or continuous variables
b) Regressions with multiple dummy variables: for categorical ordinal and categorical nominal
variables

Logistic regression:
Logistic regression is used to model situations when the dependent variable (Y) is categorical
and may take only two values.
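
A minimal sketch of the idea behind logistic regression, assuming one predictor and hypothetical coefficients `b0` and `b1` (not estimated from any data). The probability that Y = 1 follows an S-shaped curve in x, while the logit (log-odds) of that probability is linear in x.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return log(p / (1 - p))

def predicted_probability(x, b0, b1):
    """P(Y = 1) for a one-predictor logistic model."""
    z = b0 + b1 * x            # the logit: linear in x
    return 1 / (1 + exp(-z))   # the probability: nonlinear in x

b0, b1 = -3.0, 1.0             # hypothetical coefficients
probs = [predicted_probability(x, b0, b1) for x in range(7)]
logits = [logit(p) for p in probs]

print(probs)   # stays between 0 and 1 and rises in an S shape
print(logits)  # recovers b0 + b1 * x, a straight line in x
```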
Kahoot:

1. Pivot tables can be used for categorical and quantitative variables: (2 answers)

Ans: TRUE
IT DEPENDS

2. Pivot chart can only be bar graph:

Ans: False

3. Which Excel command do we use to merge 2 datasets?

Ans: =VLOOKUP(), XLOOKUP()

4. Data Validation tool in Excel can be used to..

Ans: make sure that the data is in correct format


detect data entry errors

5. Correct way to show Excel's Date command? Date(__/__/__)

Ans: year, month, day

6. Index command finds the value in a specified location

Ans: True

7. Match command finds the value in a specified location

Ans: False

8. Independence means correlation = 0

Ans: True
9. Correlation = 0 means independence

Ans: False

10. When correlation = 0, it always means that X and Y are not related to each other

Ans: False

11. When correlation = 0, it means there is no linear relationship between X and Y

Ans: True

12. Which of the following statements is NOT TRUE about correlation?

Ans: Correlation implies causation

13. Cheese consumption is positively correlated with # deaths by being tangled in bedsheets
because?

Ans: it's a spurious relationship

14. Spurious correlation occurs when

Ans: 2 variables are wrongly assumed to be related

15. Multiple linear regression means there are several linear regression equations

Ans: False

16. In multiple regression, always R-squared

Ans: > R-Squared adjusted

17. In a multiple regression, when we add a new X variable, R-squared ALWAYS goes up.

Ans: True
18. In a multiple regression, when we add new X variable, R-squared adjusted ALWAYS
goes up.

Ans: False

19. Which of the following cannot be modeled using logistic regression?

Ans: LN(Starting Salary)


Starting Salary
[basically, any numerical value]

20. In logistic regression, Y (1=Like, 0=Dislike) is linearly dependent on explanatory variables

Ans: False

21. Logit is linearly dependent on the explanatory variables

Ans: True
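
Questions 9 to 11 above can be illustrated with a small sketch: y is completely determined by x (y = x²), yet the correlation is 0 because the relationship is not linear. The helper `pearson_r` and the data are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # y depends entirely on x, but not linearly

r = pearson_r(x, y)
print(r)  # 0: no LINEAR relationship, although x and y are clearly related
```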
IMP Formulas:
