Z-Test - T-Test - Pearson Product - Linear Regression PDF


PM299.

Quantitative Methods in Public
Policy and Administration
Merlyne M. Paunlagui
College of Public Affairs and Development

Students should not reproduce or distribute the copyrighted materials
included in this course pack; they are limited to personal use.
Learning Objectives
• At the end of the session the students are expected
to:
Z-TEST for a SINGLE SAMPLE TEST OF THE MEAN

T-TEST for a SINGLE SAMPLE TEST OF THE MEAN
Single Sample Test of the Mean
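1. Z-Test for a Single Sample Test of the Mean
A z-test is a statistical test used to determine whether a population
mean differs from a hypothesized value µ (or, more generally, whether
two population means are different) when the population variance is
known and the sample size is large (n ≥ 30).

$$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$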

See Example on pages 140-143
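The single-sample z statistic, Z = (x̄ − µ) / (σ / √n), can be sketched in a few lines of Python. The function names and the input values below are hypothetical (they are not taken from the course pack or the textbook example); the two-tailed 5% critical value of 1.96 is the one used elsewhere in these slides.

```python
import math


def one_sample_z(sample_mean, mu0, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n)), with sigma the known population SD."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def two_tailed_p(z):
    """Two-tailed p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs, chosen only to illustrate the arithmetic.
z = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64)
p = two_tailed_p(z)

print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
print("Reject H0" if abs(z) > 1.96 else "Do not reject H0")
```

With these made-up numbers the standard error is 8/√64 = 1, so z equals the raw difference between the sample mean and the hypothesized mean.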


Single Sample Test of the Mean
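2. T-Test for a Single Sample Test of the Mean
A t-test is a statistical test used to determine whether a population
mean differs from a hypothesized value µ (or, more generally, whether
two population means are different) when the population parameters are
unknown and the sample size is small.

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} $$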

See Example on pages 144-145
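A minimal sketch of the single-sample t statistic follows. It assumes that s in the slide's formula is the sample standard deviation computed with divisor n, in which case s / √(n − 1) equals the more familiar form that uses the n − 1 divisor and √n. The data and the hypothesized mean are hypothetical.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, degrees of freedom) using t = (x_bar - mu) / (s / sqrt(n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    # Assumption: s uses divisor n, so s / sqrt(n - 1) is the usual standard error.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / n)
    t = (x_bar - mu0) / (s / math.sqrt(n - 1))
    return t, n - 1


# Hypothetical sample and hypothesized mean (not from the course pack).
sample = [12, 15, 11, 14, 13, 16, 10, 13]
t, df = one_sample_t(sample, mu0=12)

# Cross-check with the n - 1 divisor form: t = (x_bar - mu) / (s / sqrt(n)).
t_check = (statistics.mean(sample) - 12) / (
    statistics.stdev(sample) / math.sqrt(len(sample))
)

print(f"t = {t:.3f} with {df} degrees of freedom (cross-check: {t_check:.3f})")
```

The computed t is then compared with the critical value from a t table at n − 1 degrees of freedom.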


Single Sample Test of Proportion
COMPARING TWO POPULATIONS
1. Difference of Means Test
2. Difference of Proportions Test

RELATIONSHIP BETWEEN TWO INTERVAL SCALE VARIABLES
1. Pearson Product Moment Correlation
2. Linear Regression
The Z-test for a Difference of Means Test

$$ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$
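As a rough illustration, the standard difference-of-means statistic, Z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2), can be computed as below. The function name and every input value are hypothetical; the slide itself gives no worked numbers for this test.

```python
import math


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z for a difference of means with known population variances."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


# Hypothetical group summaries, chosen only to illustrate the arithmetic.
z = two_sample_z(mean1=75.0, mean2=72.0, var1=100.0, var2=144.0, n1=50, n2=72)

print(f"z = {z:.2f}")
print("Reject H0" if abs(z) > 1.96 else "Do not reject H0")
```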
Z-Test for a Difference of Proportion Test
PART A: Difference of Proportion Z-test
Example question:
Testing two flu drugs A and B. Drug A works on 41 people out of a sample of
195. Drug B works on 351 people in a sample of 605. Are the two drugs
comparable? Use a 5% alpha level.

Step 1: Find the two proportions:


P1 = 41/195 = 0.21 (that’s 21%); P2 = 351/605 = 0.58 (that’s 58%)

Step 2: Find the overall sample proportion.


The numerator will be the total number of “positive” results for the two
samples and the denominator is the total number of people in the two
samples.
p = (41 + 351) / (195 + 605) = 0.49.
PART A: Difference of Proportion Z-test
Step 3: Insert the numbers from Step 1 and Step 2 into the test statistic formula:

$$ Z = \frac{P_1 - P_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$

Solving the formula, we get Z = -8.99, so |Z| = 8.99.

We need to find out if the z-score falls into the “rejection region.”
Step 4: Find the critical z-score associated with α/2. For a two-tailed test at a
5% alpha level, α/2 = 0.025 and the critical z-score is 1.96.


PART A: Difference of Proportion Z-test
Step 5: Compare the absolute value of the calculated z-score from Step 3 with the
table z-score from Step 4. If the calculated value is larger, you can reject the
null hypothesis.
8.99 > 1.96, so we can reject the null hypothesis that the two proportions are the same.
Decision: Reject the null hypothesis

https://www.statisticshowto.com/ztest/
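The five steps of the flu drug example can be checked with a short script. This is only a sketch: the function name is hypothetical, while the counts (41 of 195 and 351 of 605) and the 1.96 critical value come from the example above.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, pooled proportion) for a two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2              # Step 1: the two sample proportions
    pooled = (x1 + x2) / (n1 + n2)         # Step 2: overall sample proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled          # Step 3: test statistic


z, pooled = two_proportion_z(41, 195, 351, 605)
reject = abs(z) > 1.96                     # Steps 4 and 5: compare with the critical value

print(f"pooled p = {pooled:.2f}, z = {z:.2f}")
print("Reject the null hypothesis" if reject else "Do not reject the null hypothesis")
```

The sign of z is negative because drug A's proportion is subtracted from a larger one; the decision uses its absolute value.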
Problem 1: Two-Tailed Test
• Suppose the Acme Drug Company develops a new drug, designed to prevent
colds. The company states that the drug is equally effective for men and
women. To test this claim, they choose a simple random sample of 100 women
and 200 men from a population of 100,000 volunteers.

• At the end of the study, 38% of the women caught a cold; and 51% of the men
caught a cold. Based on these findings, can we reject the company's claim that
the drug is equally effective for men and women? Use a 0.05 level of
significance.

• Solution: The solution to this problem takes four steps: (1) state the
hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4)
interpret results.
Problem 1: Two-Tailed Test
1. State the hypotheses. The first step is to state the null
hypothesis and an alternative hypothesis.
Null hypothesis: P1 = P2; Alternative hypothesis: P1 ≠ P2
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be
rejected if the proportion from population 1 is too big or if it is too small.

2. Formulate an analysis plan. For this analysis, the significance
level is 0.05. The test method is a two-proportion z-test.
After this, you should be able to:
• Calculate and interpret the results of Z-test and t-test
• Calculate and interpret the simple linear regression equation for
a set of data
• Understand the assumptions behind regression analysis
• Determine whether a regression model is significant
RELATIONSHIP BETWEEN TWO INTERVAL SCALE VARIABLES
1. Pearson Product Moment Correlation
2. Linear Regression
Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to show the relationship
between two variables
• Correlation analysis is used to measure the strength of the association (linear
relationship) between two variables
• Only concerned with strength of the relationship
• No causal effect is implied
Scatter Plot Examples
[Figure: scatter plots of y against x showing linear and curvilinear relationships, strong and weak relationships, and no relationship]
Examples of Approximate r Values
[Figure: five scatter plots of y against x, with r = -1, r = -.6, r = 0, r = +.3, and r = +1]
Problem 1: Two-Tailed Test
3. Analyze sample data. Using sample data, we calculate the pooled sample
proportion (p) and the standard error (SE). Using those measures, we compute the
z-score test statistic (z).
p = (p1 * n1 + p2 * n2) / (n1 + n2)
p = [(0.38 * 100) + (0.51 * 200)] / (100 + 200)
p = 140/300 = 0.467
SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }
SE = sqrt [ 0.467 * 0.533 * ( 1/100 + 1/200 ) ]
SE = sqrt [0.003733] = 0.061
z = (p1 - p2) / SE = (0.38 - 0.51)/0.061 = -2.13
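Step 4 of the solution plan, interpreting the results, is not shown on these slides. One common way to finish, sketched here as an assumption rather than as the author's method, is to convert z into a two-tailed P-value and compare it with the 0.05 significance level. The inputs are the Problem 1 values.

```python
import math

# Problem 1 inputs: women (sample 1) and men (sample 2).
n1, n2 = 100, 200
p1, p2 = 0.38, 0.51
alpha = 0.05

# Step 3 repeated: pooled proportion, standard error, and z-score.
pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Step 4 (assumed approach): two-tailed P-value from the standard normal.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"pooled p = {pooled:.3f}, SE = {se:.3f}, z = {z:.2f}")
print(f"two-tailed P-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Because the P-value falls below 0.05, the null hypothesis P1 = P2 is rejected, which agrees with comparing |z| = 2.13 against the 1.96 critical value.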
PEARSON’S PRODUCT MOMENT CORRELATION
Regression Analysis
Correlation Coefficient (continued)
• The population correlation coefficient ρ (rho) measures the strength of the
association between the variables
• The sample correlation coefficient r is an estimate of ρ and is used to measure
the strength of the linear relationship in the sample observations
Features of ρ and r

• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear
relationship
• The closer to 0, the weaker the linear relationship
Calculating the Correlation Coefficient
Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$

or the algebraic equivalent:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} $$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
Calculation Example

Tree Height   Trunk Diameter
y             x                xy       y²        x²
35            8                280      1225      64
49            9                441      2401      81
27            7                189      729       49
33            6                198      1089      36
60            13               780      3600      169
21            7                147      441       49
45            11               495      2025      121
51            12               612      2601      144
Σ=321         Σ=73             Σ=3142   Σ=14111   Σ=713
Calculation Example (continued)

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$

[Figure: scatter plot of Tree Height, y, against Trunk Diameter, x]

r = 0.886 → relatively strong positive linear association between x and y
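The tree height calculation can be reproduced with the sketch below, which evaluates both the deviation form and the algebraic form of r on the eight observations from the example. The function names are hypothetical.

```python
import math

# Data from the calculation example.
height = [35, 49, 27, 33, 60, 21, 45, 51]      # y
diameter = [8, 9, 7, 6, 13, 7, 11, 12]         # x


def pearson_r_algebraic(x, y):
    """r from the raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_deviation(x, y):
    """r from deviations about the means."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y)
    )
    return num / den


r = pearson_r_algebraic(diameter, height)
r_dev = pearson_r_deviation(diameter, height)
print(f"r (algebraic) = {r:.6f}, r (deviation) = {r_dev:.6f}")
```

Both forms give the same value, which rounds to the 0.886 shown in the example.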
Excel Output
Excel Correlation Output
Tools / data analysis / correlation…

                  Tree Height   Trunk Diameter
Tree Height       1
Trunk Diameter    0.886231      1

The off-diagonal value, 0.886231, is the correlation between
Tree Height and Trunk Diameter
Significance Test for Correlation
• Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
• Test statistic

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

(with n – 2 degrees of freedom)
Example: Tree Height and Trunk Diameter
Is there evidence of a linear relationship
between tree height and trunk diameter at
the .05 level of significance?

H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
α = .05, df = 8 - 2 = 6

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$
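Example: Test Solution

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$

d.f. = 8 - 2 = 6, with α/2 = .025 in each tail
Critical values: -tα/2 = -2.4469 and tα/2 = 2.4469
Rejection rule: reject H0 if t < -2.4469 or t > 2.4469; otherwise do not reject H0

Decision: Reject H0, since t = 4.68 > 2.4469
Conclusion: There is evidence of a linear relationship at the 5% level of
significance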
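The significance test for the correlation can be sketched as follows. The function name is hypothetical; r = .886, n = 8, and the critical value 2.4469 (6 degrees of freedom, α/2 = .025) are taken from the example rather than computed.

```python
import math


def correlation_t(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.886, 8
t = correlation_t(r, n)
t_critical = 2.4469  # two-tailed critical value quoted in the example

print(f"t = {t:.2f}, df = {n - 2}")
print("Reject H0" if abs(t) > t_critical else "Do not reject H0")
```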
Regression analysis

• models a dependence (assumed cause-and-effect) relationship, where you can
predict the value of one variable given the values of the other variable/s.
A good fit alone does not prove causality.
Regression Analysis
• Regression analysis in policy analysis is usually used to
forecast certain events. Our trend line, for example, is
a regression analysis.
RECALL . . .

• Correlation
– Measure of association between two variables with linear
relationship
– Indicates direction and strength of the relationship

• Simple linear regression
– One predictor variable (x)
– Linear relationship with the dependent variable (y)
– Finding the line that has the least sum of squared errors
– Predict (y) using the information about (x)
Population Linear Regression (continued)

y = β0 + β1x + ε

[Figure: population regression line with intercept β0 and slope β1. For a given xi, the random error εi for this x value is the gap between the observed value of y and the predicted value of y.]
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line

ŷi = b0 + b1x

where:
ŷi = Estimated (or predicted) y value
b0 = Estimate of the regression intercept
b1 = Estimate of the regression slope
x = Independent variable

b0 is the estimated average value of y when the value of x is zero
b1 is the estimated change in the average value of y as a result of a
one-unit change in x

The individual random error terms ei have a mean of zero

The Least Squares Equation
• The formulas for b1 and b0 are:

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} $$
Simple Linear Regression Example
• A real estate agent wishes to examine the relationship
between the selling price of a home and its size (measured in
square feet)

• A random sample of 10 houses is selected
• Dependent variable (y) = house price in PhP1000s
• Independent variable (x) = square feet
Sample Data for House Price Model

House Price in PhP1000s (y)   Square Feet (x)
245                           1400
312                           1600
279                           1700
308                           1875
199                           1100
219                           1550
405                           2350
324                           2450
319                           1425
255                           1700
Regression Using Excel
• Tools / Data Analysis / Regression

Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
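The Excel coefficients can be cross-checked by applying the least squares formulas, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄, directly to the ten houses in the sample. The sketch below uses hypothetical function names, and the 2,000 square foot house in the prediction is an illustrative input that does not appear in the slides.

```python
# Sample data for the house price model.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]              # y, PhP1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]     # x


def least_squares(x, y):
    """Return (b0, b1) from the deviation form of the least squares formulas."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(sqft, price)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")

# Illustrative prediction for a hypothetical 2,000 square foot house.
predicted = b0 + b1 * 2000
print(f"predicted price: PhP{predicted * 1000:,.2f}")
```

A 2,000 square foot house lies inside the observed range of sizes (1,100 to 2,450), which is where the fitted line is meant to be used.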
Graphical Presentation
• House price model: scatter plot and regression line

[Figure: scatter plot of house price (in 1000s) against square feet with the fitted regression line; slope = 0.10977, intercept = 98.248]

house price = 98.24833 + 0.10977 (square feet)

Interpretation of the Intercept, b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated average value of Y when the value of X is zero
(if x = 0 is in the range of observed x values)
• Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that,
for houses within the range of sizes observed, PhP98,248.33 is the portion
of the house price not explained by square feet
Interpretation of the Slope Coefficient, b1

house price = 98.24833 + 0.10977 (square feet)

• b1 measures the estimated change in the average value of Y as a result
of a one-unit change in X
• Here, b1 = .10977 tells us that the average value of a house increases by
.10977(PhP1000) = PhP109.77 for each additional one square foot of size
Illustration:
• Knowing the effect of TV spot advertising on the number of people
visiting the Family Planning clinic would allow the population
commission official to decide rationally whether or not to increase
the amount to be spent on TV spot advertising. The officer would be
able to predict how many people the commission would be able to
attract to the Family Planning clinic if it increased the number of TV
ads run.
• The relationship between two variables (in our example, the
number of TV ad runs and the number of people visiting the
Family Planning clinic) can be summarized by a line. This is
called the regression line. This is the line that we will use to
predict the value of one variable, given the other.
Example:
Relationship between TV ads and number of people visiting the family
planning clinic:

Municipalities   Number of TV ads   Number of people visiting the clinic
1                7                  42
2                5                  32
3                1                  10
4                8                  40
5                10                 61
6                2                  8
7                6                  35
8                7                  39
9                8                  48
10               9                  51
11               5                  30
12               7                  45
13               8                  41
14               2                  7
15               6                  37
16               5                  33
Municipalities   Number of TV ads (X)   Number of people visiting the clinic (Y)   XY     X2
1                7                      42                                         294    49
2                5                      32                                         160    25
3                1                      10                                         10     1
4                8                      40                                         320    64
5                10                     61                                         610    100
6                2                      8                                          16     4
7                6                      35                                         210    36
8                7                      39                                         273    49
9                8                      48                                         384    64
10               9                      51                                         459    81
11               5                      30                                         150    25
12               7                      45                                         315    49
13               8                      41                                         328    64
14               2                      7                                          14     4
15               6                      37                                         222    36
16               5                      33                                         165    25
Total            96                     559                                        3930   676
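$$ b = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} = \frac{(16)(3930) - (96)(559)}{(16)(676) - (96)^2} = \frac{9216}{1600} = 5.76 $$

$$ a = \frac{\sum Y - b\sum X}{N} = \frac{(559) - (5.76)(96)}{16} = 0.3775 \approx 0.4 $$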
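Summary Output

Regression Statistics
Multiple R           0.972499
R Square             0.945755
Adjusted R Square    0.941582
Standard Error       3.796237
Observations         15

ANOVA
             df   SS         MS         F          Significance F
Regression   1    3266.385   3266.385   226.6526   1.32E-09
Residual     13   187.3484   14.41141
Total        14   3453.733

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   0.373989       2.467574         0.151562   0.88186    -4.95688    5.704857    -4.95688      5.704857
7           5.745957       0.381665         15.05499   1.32E-09   4.921421    6.570493    4.921421      6.570493

Note: this output reports 15 observations and labels the slope row "7" because the
first data row (7 TV ads, 42 visitors) was read as a column label. Its coefficients
(0.373989 and 5.745957) therefore differ slightly from the hand computation on all
16 municipalities (0.4 and 5.76).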
• The equation of the line is
Y = 0.4 + 5.76 X

• If X = 5, our predicted value for Y will be
Y = 0.4 + 5.76 (5) = 29.2

• If X = 7, our predicted value for Y will be
Y = 0.4 + 5.76 (7) = 40.7

• Interpretation:
Each additional TV ad run is associated with an estimated increase of 5.76 in the
number of people visiting the family planning clinic. The family planning officer can
now proceed with evaluating the cost effectiveness of the program ads.
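The hand computation of a and b can be verified with the raw-sum formulas, b = (NΣXY − ΣXΣY) / (NΣX² − (ΣX)²) and a = (ΣY − bΣX) / N. The sketch below uses a hypothetical function name and the 16 municipalities from the example. It also refits the line without the first row, on the assumption that this is why the Excel summary reports 15 observations.

```python
# TV ads (X) and clinic visitors (Y) for the 16 municipalities.
ads = [7, 5, 1, 8, 10, 2, 6, 7, 8, 9, 5, 7, 8, 2, 6, 5]
visits = [42, 32, 10, 40, 61, 8, 35, 39, 48, 51, 30, 45, 41, 7, 37, 33]


def fit_line(x, y):
    """Return (a, b) for Y = a + bX using the raw-sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_line(ads, visits)
print(f"All 16 rows: Y = {a:.4f} + {b:.2f} X")

# Assumption: the Excel run treated the first row as a label, leaving 15 rows.
a15, b15 = fit_line(ads[1:], visits[1:])
print(f"Rows 2 to 16: Y = {a15:.6f} + {b15:.6f} X")

# Predictions with the rounded equation used on the slide.
for x in (5, 7):
    print(f"X = {x}: predicted Y = {0.4 + 5.76 * x:.1f}")
```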
Parameter Estimation Thinking Challenge
• You’re an economist for the county cooperative.
You gather the following data:

Fertilizer (lb.)   Yield (lb.)
4                  3.0
6                  5.5
10                 6.5
12                 9.0

• What is the relationship between fertilizer & crop yield?
Regression Statistics
Multiple R           0.955779
R Square             0.913514
Adjusted R Square    0.87027
Standard Error       0.894427
Observations         4

ANOVA
             df   SS     MS     F        Significance F
Regression   1    16.9   16.9   21.125   0.044221
Residual     2    1.6    0.8
Total        3    18.5

                    Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept (b0)      0.8            1.2166           0.6576   0.5784    -4.4344     6.0344      -4.4344       6.0344
X Variable 1 (b1)   0.65           0.1414           4.5962   0.0442    0.0415      1.2585      0.0415        1.2585
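To see where the entries of this output come from, the sketch below fits the fertilizer data by least squares and then splits the total variation into regression and residual parts, which gives R Square and F. Variable names are hypothetical; the four data points are the ones in the thinking challenge.

```python
fertilizer = [4, 6, 10, 12]          # x, lb.
crop_yield = [3.0, 5.5, 6.5, 9.0]    # y, lb.
n = len(fertilizer)

x_bar = sum(fertilizer) / n
y_bar = sum(crop_yield) / n

# Least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fertilizer, crop_yield))
sxx = sum((x - x_bar) ** 2 for x in fertilizer)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# ANOVA decomposition: SST = SSR + SSE.
fitted = [b0 + b1 * x for x in fertilizer]
sst = sum((y - y_bar) ** 2 for y in crop_yield)
sse = sum((y - f) ** 2 for y, f in zip(crop_yield, fitted))
ssr = sst - sse

r_square = ssr / sst
f_stat = (ssr / 1) / (sse / (n - 2))   # 1 regression df, n - 2 residual df

print(f"yield = {b0:.2f} + {b1:.2f} (fertilizer)")
print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"R Square = {r_square:.6f}, F = {f_stat:.3f}")
```

The slope says that each additional pound of fertilizer is associated with an estimated 0.65 lb. increase in yield, within the observed range of 4 to 12 lb.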
THANK YOU
