Z-Test - T-Test - Pearson Product - Linear Regression PDF
Quantitative Methods in Public Policy and Administration
Merlyne M. Paunlagui
College of Public Affairs and Development
Students should not reproduce or distribute copies of the copyrighted materials
included in the course pack, since these are limited to personal use.
Learning Objectives
• At the end of the session, the students are expected to:
Z-TEST for a SINGLE SAMPLE TEST OF THE MEAN
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$$
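A minimal Python sketch of the two one-sample statistics above. The function names and all input numbers are hypothetical, chosen only for illustration. The slide's t formula divides s by √(n − 1), which matches computing s with divisor n; the sketch checks that this gives the same t as the more common form that uses the sample standard deviation (divisor n − 1) over √n.

```python
import math
import statistics

def z_one_sample(xbar, mu, sigma, n):
    """Z statistic when the population standard deviation sigma is known."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def t_one_sample(sample, mu):
    """t statistic in the slide's form: (xbar - mu) / (s / sqrt(n - 1)),
    where s is the standard deviation computed with divisor n."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s_n = statistics.pstdev(sample)  # divisor n
    return (xbar - mu) / (s_n / math.sqrt(n - 1))

# Hypothetical values, for illustration only.
z = z_one_sample(xbar=52.0, mu=50.0, sigma=5.0, n=25)
print(z)  # (52 - 50) / (5 / 5) = 2.0

sample = [52, 48, 55, 50, 51, 54, 47, 53]  # hypothetical sample
t_slide = t_one_sample(sample, mu=50)
# Same statistic using the sample SD (divisor n - 1) over sqrt(n):
t_usual = (statistics.fmean(sample) - 50) / (statistics.stdev(sample) / math.sqrt(len(sample)))
print(round(t_slide, 4), round(t_usual, 4))
```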
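Z-Test for a Difference of Proportions
$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where p is the pooled sample proportion.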
PART A: Difference of Proportion Z-test
Example question:
Testing two flu drugs A and B. Drug A works on 41 people out of a sample of
195. Drug B works on 351 people in a sample of 605. Are the two drugs
comparable? Use a 5% alpha level.
Source: https://www.statisticshowto.com/ztest/
Problem 1: Two-Tailed Test
• Suppose the Acme Drug Company develops a new drug, designed to prevent
colds. The company states that the drug is equally effective for men and
women. To test this claim, they choose a simple random sample of 100 women
and 200 men from a population of 100,000 volunteers.
• At the end of the study, 38% of the women caught a cold, and 51% of the men
caught a cold. Based on these findings, can we reject the company's claim that
the drug is equally effective for men and women? Use a 0.05 level of
significance.
• Solution: The solution to this problem takes four steps: (1) state the
hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4)
interpret results.
1. State the hypotheses. The first step is to state the null
hypothesis and an alternative hypothesis.
Null hypothesis: P1 = P2; Alternative hypothesis: P1 ≠ P2
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be
rejected if the proportion from population 1 is too big or if it is too small.
Scatter Plot Examples (continued): [panels of y plotted against x, including one showing no relationship]
Examples of Approximate r Values: [scatter plots of y against x for r = -1, r = -.6, r = 0, r = +.3, r = +1]
Problem 1: Two-Tailed Test (continued)
3. Analyze sample data. Using sample data, we calculate the pooled sample
proportion (p) and the standard error (SE). Using those measures, we compute the
z-score test statistic (z).
p = (p1 * n1 + p2 * n2) / (n1 + n2)
p = [(0.38 * 100) + (0.51 * 200)] / (100 + 200)
p = 140/300 = 0.467
SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }
SE = sqrt [ 0.467 * 0.533 * ( 1/100 + 1/200 ) ]
SE = sqrt [0.003733] = 0.061
z = (p1 - p2) / SE = (0.38 - 0.51)/0.061 = -2.13
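The step-3 arithmetic can be reproduced with a short Python sketch that follows the slide's pooled proportion and SE formulas. The function name `two_proportion_z` is hypothetical, and the two-tailed p-value (computed from the standard normal tail via `math.erfc`) goes one step beyond what this slide shows.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (pooled p, SE, z, two-tailed p-value)."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)               # pooled sample proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-tailed p-value
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p, se, z, p_value

# Problem 1: 38% of 100 women vs. 51% of 200 men caught a cold.
p, se, z, p_value = two_proportion_z(0.38, 100, 0.51, 200)
print(f"p = {p:.3f}, SE = {se:.3f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

For Problem 1 this gives z ≈ -2.13 and a two-tailed p-value of roughly 0.03, which is below the 0.05 significance level. The same function applied to the flu-drug example question, `two_proportion_z(41/195, 195, 351/605, 605)`, gives a z of about -9.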
PEARSON’S PRODUCT MOMENT CORRELATION
Regression Analysis
Correlation Coefficient
(continued)
• The population correlation coefficient ρ (rho) measures the strength of the
linear association between the variables
• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
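Calculating the Correlation Coefficient
Sample correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
or the algebraic equivalent:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$
[Figure: scatter plot of Tree Height, y against Trunk Diameter, x]
r = 0.886 → relatively strong positive linear association between x and y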
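As a check on the arithmetic, this Python sketch evaluates the computational formula from the sums used above (n = 8, Σx = 73, Σy = 321, Σxy = 3142, Σx² = 713, Σy² = 14111). The function name is hypothetical.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient r via the computational (sums) formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sums from the tree height / trunk diameter example (n = 8 trees).
r = pearson_r_from_sums(n=8, sum_x=73, sum_y=321, sum_xy=3142, sum_x2=713, sum_y2=14111)
print(round(r, 3))  # 0.886
```

When the raw (x, y) pairs are available, Python 3.10+ `statistics.correlation(x, y)` returns the same Pearson r; the raw tree data are not reproduced in these notes, so the sketch works from the sums.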
Excel Correlation Output
Tools / Data Analysis / Correlation…
Correlation between Tree Height and Trunk Diameter
Significance Test for Correlation
• Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
• Test statistic
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
(with n – 2 degrees of freedom)
Example: Tree Height and Trunk Diameter
Is there evidence of a linear relationship
between tree height and trunk diameter at
the .05 level of significance?
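$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68$$
Decision: Reject H0, since t = 4.68 exceeds the two-tailed critical value t = 2.447 (df = 6, α = .05). There is evidence of a linear relationship between tree height and trunk diameter.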
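A Python sketch of this significance test, assuming r = .886 and n = 8 as in the example. The function name is hypothetical, and the critical value 2.447 is taken from a standard t table (two-tailed, α = .05, df = 6) rather than computed.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = 0.886, 8
t = correlation_t(r, n)
t_critical = 2.447  # t table value: two-tailed, alpha = .05, df = 6
print(f"t = {t:.2f} with df = {n - 2}; reject H0: {abs(t) > t_critical}")
```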
• Correlation
– Measures the association between two variables that have a linear relationship
– Indicates the direction and strength of the relationship
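$$y = \beta_0 + \beta_1 x + \varepsilon$$
[Figure: population regression line with intercept β0 and slope β1. For a given xi, the random error εi is the vertical distance between the observed value of y for xi and the predicted value of y for xi.]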
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line
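$$\hat{y}_i = b_0 + b_1 x_i$$
where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$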
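The two estimator formulas translate directly into a small Python function. As an illustration it is applied to the fertilizer and yield data from the Parameter Estimation Thinking Challenge in this section; the function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Estimate b0 and b1 with the deviation formulas for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squared x-deviations
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Fertilizer (lb.) and yield (lb.) from the thinking challenge.
fertilizer = [4, 6, 10, 12]
crop_yield = [3.0, 5.5, 6.5, 9.0]
b0, b1 = least_squares(fertilizer, crop_yield)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

The result, ŷ = 0.80 + 0.65x, matches the Intercept (0.8) and X Variable 1 (0.65) coefficients in the Excel output for that data. Python 3.10+ also provides `statistics.linear_regression(x, y)`, which returns the slope and intercept.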
Simple Linear Regression Example
• A real estate agent wishes to examine the relationship
between the selling price of a home and its size (measured in
square feet).
               Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept      98.24833      58.03348        1.69296   0.12892   -35.57720   232.07386
Square Feet    0.10977       0.03297         3.32938   0.01039   0.03374     0.18580
Graphical Presentation
• House price model: scatter plot and regression line
[Figure: scatter plot with fitted regression line]
$$b = \frac{(16)(3960) - (96)(559)}{(16)(676) - (96)^2} = \frac{9216}{1600} = 5.76$$
$$a = \frac{559 - (5.76)(96)}{16} = 0.4$$
Summary Output

Regression Statistics
Multiple R            0.972499
R Square              0.945755
Adjusted R Square     0.941582
Standard Error        3.796237
Observations          15

ANOVA
              df    SS          MS          F          Significance F
Regression    1     3266.385    3266.385    226.6526   1.32E-09
Residual      13    187.3484    14.41141
Total         14    3453.733

             Coefficients  Standard Error  t Stat     P-value   Lower 95%   Upper 95%   Lower 95.0%  Upper 95.0%
Intercept    0.373989      2.467574        0.151562   0.88186   -4.95688    5.704857    -4.95688     5.704857
7            5.745957      0.381665        15.05499   1.32E-09  4.921421    6.570493    4.921421     6.570493
• The equation of the line is
Y = 0.4 + 5.76X
• Interpretation:
An increase of one in the number of TV ad runs is associated with an estimated increase of 5.76 in the
number of people visiting the family planning clinic. The family planning officer can
now proceed to evaluate the cost-effectiveness of the program ads.
Parameter Estimation Thinking Challenge
• You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)   Yield (lb.)
4                  3.0
6                  5.5
10                 6.5
12                 9.0

• What is the relationship between fertilizer and crop yield?
Regression Statistics
Multiple R            0.955779
R Square              0.913514
Adjusted R Square     0.87027
Standard Error        0.894427
Observations          4

ANOVA
              df    SS      MS      F        Significance F
Regression    1     16.9    16.9    21.125   0.044221
Residual      2     1.6     0.8
Total         3     18.5

                     Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept (b0)       0.8           1.2166          0.6576  0.5784   -4.4344    6.0344     -4.4344      6.0344
X Variable 1 (b1)    0.65          0.1414          4.5962  0.0442   0.0415     1.2585     0.0415       1.2585
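To see where the main numbers in this Excel output come from, the sketch below recomputes them from the four data points with the usual simple-regression formulas (R Square = SSR/SST, F = MSR/MSE, Standard Error = √MSE, SE(b1) = √(MSE/Sxx), t Stat = coefficient / standard error). It is a plain-Python check under those textbook formulas, not Excel's own implementation; p-values are left out because they need a t or F distribution function.

```python
import math

# Fertilizer (lb.) and yield (lb.) from the thinking challenge.
x = [4, 6, 10, 12]
y = [3.0, 5.5, 6.5, 9.0]
n = len(x)

# Least-squares estimates
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# ANOVA decomposition: SST = SSR + SSE
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_regression = ss_total - ss_residual
ms_regression = ss_regression / 1    # df = 1 for one predictor
ms_residual = ss_residual / (n - 2)  # df = n - 2

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)

# Standard errors and t statistics of the coefficients
se_b1 = math.sqrt(ms_residual / sxx)
se_b0 = math.sqrt(ms_residual * (1 / n + x_bar ** 2 / sxx))
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

print(f"Multiple R {math.sqrt(r_square):.6f}  R Square {r_square:.6f}  Adj R Square {adj_r_square:.5f}")
print(f"SSR {ss_regression:.1f}  SSE {ss_residual:.1f}  SST {ss_total:.1f}  F {f_stat:.3f}")
print(f"Standard Error {std_error:.6f}")
print(f"b0 {b0:.2f} (SE {se_b0:.4f}, t {t_b0:.4f});  b1 {b1:.2f} (SE {se_b1:.4f}, t {t_b1:.4f})")
```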
THANK YOU