UDJ Cheat Sheet - Merged

1. There are several probability distributions that can be used to calculate probabilities, depending on whether the variable is discrete or continuous. The key distributions are binomial, Poisson, and normal.
2. Hypothesis testing involves stating the null and alternative hypotheses, calculating a test statistic such as a z-score, choosing a significance level, and comparing the test statistic to critical values to decide whether to reject or fail to reject the null hypothesis.
3. Confidence intervals provide a range of values that is likely to contain the true population parameter based on a sample. For means, the margin of error depends on the standard deviation and the sample size. For proportions, the margin of error decreases as the sample size increases.

Uploaded by

dew

What is the probability of…?

1. Identify whether you have a discrete or continuous variable.

2. If discrete, which distribution should you use? Make sure to state this.
   a. Binomial:
      • 2 possible outcomes – success/failure
      • p: probability of success – make sure p ≤ 0.5
      • q: probability of failure: q = 1 - p
      • n: the number of trials (the sample size)
      • x: outcome – can be any whole number from 0 to n
      • Summary: x ~ Bin(n; p)
      • Questions will be, given x ~ Bin(6; 0.3)
        o What is P(x=2)?
          Table: n=6, p=0.3, so P(x=2) = 0.3241 = 32.41%
        o What is P(x>4)? P(x>4) = P(x=5) + P(x=6)
          Table: n=6, p=0.3, so P(x=5) + P(x=6) = 0.0102 + 0.0007 = 0.0109 = 1.09%
   b. Poisson:
      • Events are coming at a constant rate λ during an interval I
      • λ (=µ): average number of events that occur during the initial interval
      • σ = √λ
      • x: number of actual events – x has no upper limit
      • Questions will be, given λ1 = 5 events in 3 hours
        o What is P(x=2) in 1 hour?
          Calculate λ2 = 5/3 ≈ 1.7
          Table: λ2 = 1.7, x = 2, so P(x=2) = 0.2640 = 26.40%
        o What is P(x>2) in 1 hour? P(x>2) = 1 - [P(x=0) + P(x=1) + P(x=2)]
3. If continuous, Normal distribution
   • x ~ Norm(µ; σ)
   • Questions will be, given x ~ Norm(3; 0.5)
     o What is P(x>2)? Note: P(x>2) = P(x≥2) and P(x=2) = 0
       Calculate z-score: Z = [x-µ]/σ = [2-3]/0.5 = -1/0.5 = -2
       Table: Z=-2, F(Z=-2) = 0.9772 = 97.72% (here F(z) is the upper-tail area P(Z>z) read from the table)
     o What is P(x<2)? 1-P(x>2) = 1-0.9772 = 2.28%

Combination of two random variables
• You have two random variables X (µX; σX) and Y (µY; σY)
• You are given the correlation coefficient -1 ≤ ρXY ≤ 1
  Note that X and Y independent means ρXY = 0
• You are given a combination of the two W such that W = a + bX + cY
• µW = a + b*E[X] + c*E[Y] = a + b*µX + c*µY
• σW² = (b*σX)² + (c*σY)² + 2*ρXY*b*c*σX*σY, and σW is the square root of σW²
• Questions will be, given X (5;2) and Y (6;1), ρXY = 0.5 and W = 2 + X - Y
  o What is P(W>2)?
    1. “If X and Y follow a Normal Distribution, you can assume that W also follows a Normal Distribution such that W ~ Norm(µW; σW)”
    2. Calculate µW; σW:
       µW = 2 + 1*5 + (-1)*6 = 1
       σW² = (1*2)² + (-1*1)² + 2*0.5*1*(-1)*2*1 = 4 + 1 - 2 = 3, so σW = √3 ≈ 1.73
    3. Calculate Z-score: Z = [x-µW]/σW = [2-1]/1.73 ≈ 0.58
    4. Table: F(Z=0.58) = 0.2810 = 28.10%

Hypothesis Testing
We are trying to decide whether what we are given (either µ0 or P0) is within a certain confidence interval of the true µ or P.
1. We state our hypothesis: H0: µ = µ0 or P = P0 and HA: µ ≠ µ0 or P ≠ P0
2. We calculate the test statistic:
   a. If we are looking at averages, σ known: Zobs. = [x̄ - µ0] / [σ/√n]
   b. If we are looking at averages, n>30, σ unknown: Zobs. = [x̄ - µ0] / [s/√n]
   c. [If we are looking at averages, n<30, σ unknown: tobs. = [x̄ - µ0] / [s/√n]] (will not be used)
   d. If we are looking at proportions, Zobs. = [(x/n) – P0] / √[P0*(1-P0) / n]
3. We choose a level of significance α - usually 5% or 1%
   a. For α=0.05, Zα/2 = 1.96
   b. For α=0.01, Zα/2 = 2.58
4. We compare our findings and decide.
   a. If Zobs. > Zα/2 or if Zobs. < -Zα/2, we reject H0.
   b. If -Zα/2 < Zobs. < Zα/2: “I do not have sufficient evidence to reject H0 therefore I accept it.”
5. Alternatively, you can use P-values. Note: the smaller the p-value, the less likely H0.
   a. P-value = 2*Prob(Z>|Zobs.|) = 2*F(|Zobs.|)
   b. P-value > α, “I do not have sufficient evidence to reject H0 therefore I accept it.”
   c. P-value < α, reject H0.
6. Certain questions will ask you to find the “level of significance” at which H0 can be accepted. Here you will need to find α < P-value.

Building Confidence Intervals
We are looking at a sample of a population.
Characteristic   Population   Sample
Size             N            n
Mean             µ            x̄
SD               σ            s
Proportion       P            P̂ = x/n
I. Averages
Two possibilities: n≥30 or n<30
1. n≥30
   a. “Per the Central Limit Theorem, if n is large (n≥30) then the variable x̄ follows a Normal distribution such that: x̄ ~ Norm(µ; σx̄ = σ/√n)”
   b. σ?
      i. We know σ of the population
         • Questions will be, given a sample n=32 such that x̄ ~ Norm(10; 2) (sample mean 10, sample SD s=2) and σ=2.5, build a 95% confidence interval for µ.
           o We are looking for: {x̄ – E; x̄ + E}
           o 95% confidence interval means α = 0.05 and α/2 = 0.025
           o Table: Zα/2 = 1.96
           o E = Zα/2 * σ/√n = 1.96 * 2.5/√32 = 0.866
           o {x̄ – E; x̄ + E} = {10 – 0.866; 10 + 0.866} = {9.134; 10.866}
      ii. We do not know σ of the population – we are given s
         • Questions will be, given a sample n=32 such that x̄ ~ Norm(10; 2), build a 95% confidence interval for µ.
           o We are looking for: {x̄ – E; x̄ + E}
           o 95% confidence interval means α = 0.05 and α/2 = 0.025
           o Table: Zα/2 = 1.96
           o “Because we do not know σ, we use s to approximate σ”
           o E = Zα/2 * s/√n = 1.96 * 2/√32 = 0.693
           o {x̄ – E; x̄ + E} = {10 – 0.693; 10 + 0.693} = {9.307; 10.693}
2. n<30
   a. “n is small, so we must assume that the population is normally distributed in order to apply the Central Limit Theorem.”
   b. σ?
      i. We know σ of the population
         • Questions will be, given a sample n=10 such that x̄ ~ Norm(10; 2) and σ=2.5, build a 95% confidence interval for µ.
         • Same as 1.b.i (E = Zα/2 * σ/√n).
      ii. We do not know σ of the population – we are given s
         • Questions will be, given a sample n=10 such that x̄ ~ Norm(10; 2), build a 95% confidence interval for µ.
           o “Because we have a small sample size, and because we do not know σ, we use the t-distribution and we use s to approximate σ.”
           o We are looking for: {x̄ – E; x̄ + E} where E = t(n-1; α/2) * s/√n
           o n-1 = 9, and 95% confidence interval means α = 0.05 and α/2 = 0.025
           o Table: t(9; 0.025) = 2.262
           o E = t(n-1; α/2) * s/√n = 2.262 * 2/√10 = 1.431
           o {x̄ – E; x̄ + E} = {10 – 1.431; 10 + 1.431} = {8.569; 11.431}
II. Proportions
We are looking at a sample proportion.
n will always be larger than 30.
• Questions will be, sample n=32 and we have noticed that P̂ = 0.3, build a 95% confidence interval for P.
  o We are looking for {P̂ – E; P̂ + E} where E = Zα/2 * √{[(x/n) * (1 - x/n)] / n}
  o 95% confidence interval means α = 0.05 and α/2 = 0.025
  o Table: Zα/2 = 1.96
  o E = 1.96 * √(0.3*0.7/32) = 0.1588
  o {P̂ – E; P̂ + E} = {0.3 – 0.1588; 0.3 + 0.1588} = {14.12%; 45.88%}
• Questions will be, keeping the 95% CI, what should the sample size n be to reduce the margin of error to 10%? E=0.1. Only for Proportions!
  o “Because we need a new n, we cannot use the original P̂ given. Let’s be conservative and use x/n = 0.5 (which maximizes E) and we apply the formula n = (Zα/2 / 2E)²”
  o n = (Zα/2 / 2E)² = (1.96/0.20)² = 96.04, so round up to n = 97
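
The table look-ups and margins of error in the worked questions above can be cross-checked numerically. The following is a minimal sketch using only the Python standard library, not part of the original sheet. The helper names (binom_pmf, poisson_pmf, upper_tail, margin) are made up for illustration. It assumes that F(z) on this sheet means the upper-tail area P(Z > z), which is consistent with F(Z=-2) = 0.9772. It also assumes the usual variance rule for a linear combination, Var(W) = (b*σX)² + (c*σY)² + 2*ρXY*b*c*σX*σY, and takes the t value 2.262 from the table as given.

```python
from math import ceil, comb, exp, factorial, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n; p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def upper_tail(z):
    """Upper-tail area P(Z > z) of the standard normal (assumed meaning of F)."""
    return 1 - STD_NORMAL.cdf(z)


def margin(critical, sd, n):
    """Margin of error E = critical value * sd / sqrt(n)."""
    return critical * sd / sqrt(n)


# Binomial, x ~ Bin(6; 0.3)
p_eq_2 = binom_pmf(2, 6, 0.3)                          # about 0.3241
p_gt_4 = binom_pmf(5, 6, 0.3) + binom_pmf(6, 6, 0.3)   # about 0.0109

# Poisson, rate rounded to 1.7 events per hour as in the table look-up
pois_eq_2 = poisson_pmf(2, 1.7)                        # about 0.2640
pois_gt_2 = 1 - sum(poisson_pmf(k, 1.7) for k in range(3))

# Normal, x ~ Norm(3; 0.5): P(x > 2)
norm_gt_2 = upper_tail((2 - 3) / 0.5)                  # about 0.9772

# Combination W = 2 + X - Y with X (5;2), Y (6;1), correlation 0.5
b, c, rho, sd_x, sd_y = 1, -1, 0.5, 2, 1
mu_w = 2 + b * 5 + c * 6
sd_w = sqrt((b * sd_x) ** 2 + (c * sd_y) ** 2 + 2 * rho * b * c * sd_x * sd_y)
w_gt_2 = upper_tail((2 - mu_w) / sd_w)

# Confidence intervals at the 95% level
z = STD_NORMAL.inv_cdf(0.975)             # about 1.96
e_sigma_known = margin(z, 2.5, 32)        # about 0.866
e_sigma_unknown = margin(z, 2, 32)        # about 0.693
e_small_sample = margin(2.262, 2, 10)     # t value taken from the table
e_prop = z * sqrt(0.3 * 0.7 / 32)         # about 0.1588
n_required = ceil((z / (2 * 0.1)) ** 2)   # conservative x/n = 0.5, E = 0.1

print(f"P(x=2) binomial: {p_eq_2:.4f}, P(x>4): {p_gt_4:.4f}")
print(f"95% CI, sigma known: ({10 - e_sigma_known:.3f}; {10 + e_sigma_known:.3f})")
print(f"Sample size for E = 0.1: {n_required}")
```

Exact values can differ from table values in the last digit because the tables are rounded (for example, Zα/2 = 1.96 against 1.959964).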
Building a regression model
Regression: a linear relationship between one variable you would like to predict (the dependent variable) and one or more
independent variables (X1, X2, X3…)
Y = A + B1X1 + B2X2 + B3X3 …
1. Test that all coefficients (B1, B2, B3…) are different from 0. This is to ensure there is a relationship between Xi and Y.
a. H0: Bi = 0 HA: Bi ≠ 0
b. Once you reject H0, you can say that yes, this independent variable Xi has an impact on Y.
c. t-statistic = (bi – Bi) / SE(bi)
However, because you are testing for Bi = 0, t-statistic = bi / SE(bi)
Dependent variable: Y
Ind. variable   Coeff.   SE       t-stat        p-value                                0.05 significance?
Xi              bi       SE(bi)   bi / SE(bi)   2*Prob(Z > |t-stat|) = 2*F(|t-stat|)   Can you reject H0?
• p-value > α: I cannot reject H0 and accept that Bi = 0: not significant.
• p-value < α: I reject H0 and accept that Bi ≠ 0: yes, Xi has an impact on Y.
“If X1 increases by 1 [unit], all other independent variables remaining constant, Y will increase on average by B1 [units].”
2. Deal with the risk of multicollinearity
a. Multicollinearity means that two independent variables explain the dependent variable in the same way, so one of the two is redundant. Keeping both in this case distorts the model rather than improving it.
b. There is a risk of multicollinearity if the correlation coefficient between two independent variables has an absolute value greater than 0.7: |r| > 0.7
c. For every pair of independent variables with a high correlation coefficient (above 0.7 in absolute terms), look at whether both are significant, only one is significant, or neither is significant.
d. Both are significant: consider whether the model makes sense (e.g., it is plausible that profit increases with the size of a business and decreases with the number of employees, so keeping both makes sense). If it doesn’t make sense, remove the one with the higher p-value.
e. Only one is significant: remove the one with the higher p-value from the model.
f. Neither is significant: remove the one with the higher p-value from the model.
• Questions will be, what can you infer from the correlation matrix of the variables?
3. Choose the best model among those that only contain significant variables by comparing the adjusted R². The best model is the one with the highest adjusted R². Note that the R-square measures how well the model explains the variations of the dependent variable. The closer the observations are to the regression line, the higher the R-square, and the better the model.
• Questions will be, what is the best model?
4. Compute the forecast.
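
The checks in steps 1 to 3 can be sketched in a few lines of code. This is an illustration under stated assumptions, not the sheet's own method: the function names and all data values are hypothetical, the p-value uses the normal approximation 2*Prob(Z > |t-stat|) quoted in the table, and the adjusted R² uses the standard definition 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k independent variables, which the sheet itself does not spell out.

```python
from math import sqrt
from statistics import NormalDist


def coefficient_p_value(b, se):
    """Return (t-stat, two-sided p-value) for H0: Bi = 0, normal approximation."""
    t_stat = b / se
    return t_stat, 2 * (1 - NormalDist().cdf(abs(t_stat)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.7):
    """List pairs of independent variables with |r| above the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                flagged.append((first, second, r))
    return flagged


def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Step 1: hypothetical coefficient 1.2 with standard error 0.5
t_stat, p_value = coefficient_p_value(1.2, 0.5)
significant = p_value < 0.05

# Step 2: hypothetical independent variables
columns = {
    "size": [1, 2, 3, 4, 5, 6],
    "employees": [2, 4, 5, 8, 10, 13],
    "age": [5, 1, 4, 2, 6, 3],
}
flagged = collinear_pairs(columns)

# Step 3: a larger model with a slightly higher R-square can still lose
big_model = adjusted_r2(0.80, n=30, k=5)
small_model = adjusted_r2(0.79, n=30, k=2)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant: {significant}")
print("Pairs at risk of multicollinearity:", [(a, b) for a, b, _ in flagged])
print(f"Adjusted R2: big {big_model:.4f}, small {small_model:.4f}")
```

With these made-up numbers only the pair (size, employees) is flagged, and the smaller model has the higher adjusted R² even though its plain R-square is lower.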

Looking at graphs
State:
1. What has been plotted
2. What assumption you are testing with this kind of plot.
3. Your conclusion based on this kind of plot.
4. What might be the reasons
Assumption on the errors (residuals)
Plot type: either:
(i) residuals ei against time or observation number; or
(ii) residuals versus lagged residuals
Testing: whether errors are random (i.e., not autocorrelated).
Conclusion: if, in plot type (i), the residuals fluctuate randomly around the horizontal axis or, in plot type (ii), you see a shapeless cloud of points, the errors are not autocorrelated.
Reasons (if the errors are autocorrelated): either (a) problem of non-linearity between the dependent variable and each independent variable – we should try a transformation; or (b) we are missing an independent variable in the model.
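
A numeric companion to plot type (ii) is the correlation between each residual and the one before it. The sketch below is an assumption-laden illustration: the function names and the residual values are hypothetical, and the sheet itself only asks for a visual reading of the plot.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lag1_correlation(residuals):
    """Correlation between residuals and the same residuals lagged by one step."""
    return pearson(residuals[1:], residuals[:-1])


# Hypothetical residuals that drift steadily upward over time
drifting = [-3, -2, -1, 0, 1, 2, 3]
# Hypothetical residuals that flip sign at every observation
alternating = [1, -1, 1, -1, 1, -1]

r_drifting = lag1_correlation(drifting)
r_alternating = lag1_correlation(alternating)

print(f"drifting: {r_drifting:.2f}, alternating: {r_alternating:.2f}")
```

Both made-up series are autocorrelated (one positively, one negatively), so neither would look like a shapeless cloud in plot type (ii). A value near 0 is what the "not autocorrelated" conclusion corresponds to.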

Plot type: residuals ei against predicted value

Testing: whether errors are homoscedastic (i.e., they have a constant variance)
Conclusion: If the errors all stay within one corridor of constant width, they are homoscedastic. If the corridor widens, they are heteroscedastic.
Reasons (if the errors are heteroscedastic): problem of non-linearity between the dependent variable and each independent variable – we should try a transformation and split the data into groups, running a separate regression for each group.
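
The "corridor" reading of this plot can be approximated numerically by comparing the spread of the residuals for low and high predicted values. This is only a rough sketch: the half-split rule, the function name spread_ratio, and all data values are hypothetical choices made for illustration, not a formal test from the sheet.

```python
from statistics import pstdev


def spread_ratio(predicted, residuals):
    """Residual SD for the upper half of predicted values divided by the lower half."""
    ordered = [e for _, e in sorted(zip(predicted, residuals))]
    half = len(ordered) // 2
    return pstdev(ordered[half:]) / pstdev(ordered[:half])


predicted = [1, 2, 3, 4, 5, 6, 7, 8]

# Hypothetical residuals in a widening corridor
widening = [0.1, -0.1, 0.2, -0.2, 1.0, -1.2, 1.5, -1.6]
# Hypothetical residuals in a corridor of constant width
constant = [0.5, -0.5, 0.4, -0.6, 0.5, -0.4, 0.6, -0.5]

ratio_widening = spread_ratio(predicted, widening)
ratio_constant = spread_ratio(predicted, constant)

print(f"widening corridor ratio: {ratio_widening:.2f}")
print(f"constant corridor ratio: {ratio_constant:.2f}")
```

A ratio close to 1 matches the homoscedastic picture. A ratio far from 1 matches a corridor that widens or narrows as the predicted value grows.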

Plot type: histogram of the residuals with the superimposed normal curve
Testing: whether errors are normally distributed with a mean of 0
Conclusion: “I assume that the errors are roughly normally distributed.”
Reasons (if the errors do not look normally distributed): not enough observations. Try to get more.
