Linear regression and correlation for MPH
NB: Grading will be as per the grading scale of the university registrar
Hypotheses:
H0: µ1 = µ2 or H0: µ1 − µ2 = 0
HA: µ1 ≠ µ2 or HA: µ1 − µ2 ≠ 0
How to run:
ttesti 50 76 8 65 68 9, level(99)
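The Stata immediate command above takes the sample size, mean, and standard deviation of each group. As a rough cross-check, the sketch below reproduces the arithmetic in Python, assuming the equal-variance (pooled) form of the two-sample t test. The function name `pooled_t_from_summary` is made up for this illustration.

```python
import math

def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic from summary data, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df

# Same inputs as the ttesti call: n, mean, SD for group 1, then group 2
t_stat, df = pooled_t_from_summary(50, 76, 8, 65, 68, 9)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The resulting t statistic is then compared with the critical t value for the chosen confidence level (99% in the command above).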
Hypotheses:
H0: π1978 = π1979 or π1978 − π1979 = 0
HA: π1978 ≠ π1979 or π1978 − π1979 ≠ 0
How to run:
prtesti 15000 .038 10000 .02
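The prtesti call above supplies the sample size and observed proportion for each year. A minimal Python sketch of the same two-proportion z test follows, assuming the usual pooled proportion under the null hypothesis. The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(n1, p1, n2, p2):
    """z statistic and two-sided p-value for H0: pi1 = pi2 (pooled estimate)."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_diff = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_diff
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same inputs as the prtesti call: n and proportion for each sample
z, p_value = two_proportion_z(15000, 0.038, 10000, 0.02)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```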
The residual or error term, e, for this subject.
Figure 1: A scatter plot of body mass index against hip circumference, for a sample of 412 women in a diet and health cohort study. The scatter of values appears to be distributed around a straight line. That is, the relationship between these two variables appears to be broadly linear.
Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.)
linear regression and correlation cont’d...
Figure 2: Scatter plot indicating the relationship between the heights of oldest sons and their fathers' heights.
BMI = α + b*HIP
Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression          653.333     1       653.333   13.923   .003a
   Residual            610.000    13        46.923
   Total              1263.333    14
2  Regression         1143.333     2       571.667   57.167   .000b
   Residual            120.000    12        10.000
   Total              1263.333    14
a. Predictors: (Constant), dummy 1
b. Predictors: (Constant), dummy 1, dummy 2
c. Dependent Variable: DV_score
Table 2: Dummy coding cont’d…
Table 4 provides R, R², and adjusted R²; the regression model was able to account for 91% of the variance.
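The 91% figure can be recovered from the sums of squares reported for model 2 in the ANOVA output. The sketch below is an illustrative check only: the function name `fit_summary` is hypothetical, and n = 15 is inferred from the total df of 14.

```python
def fit_summary(ss_regression, ss_residual, n, k):
    """R squared, adjusted R squared and F from an ANOVA table (k predictors)."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    # Adjusted R squared penalises the number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # F is the ratio of the regression and residual mean squares
    f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r2, adj_r2, f_stat

# Model 2: regression SS, residual SS, n = 15 subjects, k = 2 dummy variables
r2, adj_r2, f_stat = fit_summary(1143.333, 120.000, n=15, k=2)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, F = {f_stat:.3f}")
```

R² rounds to 0.91, matching the statement above, and F agrees with the 57.167 shown in the ANOVA output.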
Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients                      95% Confidence Interval for B
Model             B          Std. Error         Beta                        t         Sig.     Lower Bound   Upper Bound
1  (Constant)      19.000    2.166                                            8.771   .000      14.320        23.680
   dummy 1        -14.000    3.752              -.719                        -3.731   .003     -22.106        -5.894
2  (Constant)      26.000    1.414                                           18.385   .000      22.919        29.081
   dummy 1        -21.000    2.000              -1.079                      -10.500   .000     -25.358       -16.642
   dummy 2        -14.000    2.000              -.719                        -7.000   .000     -18.358        -9.642
a. Dependent Variable: DV_score
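With dummy coding, the constant is the mean of the reference group and each dummy coefficient is the difference between that group's mean and the reference mean. The short sketch below applies this to the model 2 coefficients. The function name and the label "reference" are hypothetical, since the slides do not name the groups.

```python
def group_means_from_dummy_coefficients(constant, dummy_coefficients):
    """Recover group means from a dummy-coded regression.

    constant: intercept, i.e. the mean of the reference group
    dummy_coefficients: mapping of dummy name -> regression coefficient B
    """
    means = {"reference": constant}
    for name, b in dummy_coefficients.items():
        # Each B is the group's mean minus the reference group's mean
        means[name] = constant + b
    return means

# Model 2 coefficients from the table above
means = group_means_from_dummy_coefficients(
    26.0, {"dummy 1": -21.0, "dummy 2": -14.0}
)
print(means)
```

Under this reading, the group flagged by dummy 1 has a mean DV_score of 5, the group flagged by dummy 2 has a mean of 12, and the reference group has a mean of 26.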
Table 2: Dummy coding cont’d…
– Effects coding,
– Orthogonal coding, and
– Criterion coding (also known as criterion scaling).
$$\alpha = \bar{y} - b\,\bar{x}$$
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
Or simply:
$$b = r \cdot \frac{s_y}{s_x}$$
r → linear correlation coefficient
s_y → standard deviation of the outcome variable
s_x → standard deviation of the x variable
Example 1: Heights of 10 fathers (X) together with their oldest sons (Y) are given below (in inches). Find the regression of Y on X.
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$\alpha = \bar{y} - b\,\bar{x}$$
$$b = \frac{10(45967) - (676 \times 679)}{10(45784) - (676)^2} = \frac{459670 - 459004}{457840 - 456976} = \frac{666}{864} = 0.77$$
$$\alpha = \frac{679}{10} - 0.77 \times \frac{676}{10} = 67.9 - 52.05 = 15.85$$
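The worked example can be checked from the summary totals alone (n = 10, ∑X = 676, ∑Y = 679, ∑XY = 45967, ∑X² = 45784). The sketch below is illustrative only, and the function name `regression_from_sums` is hypothetical.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares slope b and intercept alpha from raw sums."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    alpha = sum_y / n - b * sum_x / n
    return b, alpha

# Totals from Example 1 (fathers' heights X, oldest sons' heights Y)
b, alpha = regression_from_sums(10, 676, 679, 45967, 45784)
print(f"b = {b:.4f}, alpha = {alpha:.2f}")
```

Carrying the unrounded slope through gives an intercept of about 15.79. The slides round b to 0.77 first, which yields 15.85, so the fitted line is approximately Y = 15.85 + 0.77X.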
Figure (x-axis: Variable X): This line shows a perfect linear relationship between two variables. It is a perfect positive correlation (r = 1).
Figure (x-axis: Variable X): A perfect linear relationship, but with a negative correlation (r = -1).
Strength of relationship
– Correlations from 0 to 0.25 (or 0 to –0.25) indicate little or no relationship;
– Those from 0.25 to 0.50 (or –0.25 to –0.50) indicate a fair degree of relationship;
– Those from 0.50 to 0.75 (or –0.50 to –0.75) indicate a moderate to good relationship; and
– Those greater than 0.75 (or –0.75 to –1.00) indicate a very good to excellent relationship.
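The bands above can be written as a small lookup. This is a sketch under one assumption not stated in the slides: a value that falls exactly on a boundary (0.25, 0.50, 0.75) is assigned to the lower band. The function name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Map a correlation coefficient to the descriptive bands listed above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives direction, not strength
    if size <= 0.25:
        return "little or no relationship"
    if size <= 0.50:
        return "fair degree of relationship"
    if size <= 0.75:
        return "moderate to good relationship"
    return "very good to excellent relationship"

for r in (0.10, -0.40, 0.737, -0.964):
    print(r, "->", describe_strength(r))
```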
$$t_{cal} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$
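As an illustration of the test statistic for a correlation coefficient, the sketch below computes t = r√(n − 2)/√(1 − r²). The inputs are taken from the correlation output later in these slides (r = 0.737 with df = 48, so n = 50 is assumed). The function name `t_for_correlation` is hypothetical.

```python
import math

def t_for_correlation(r, n):
    """t statistic for H0: no linear correlation, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# r between health care funding and reported diseases; df = 48 implies n = 50
t_cal = t_for_correlation(0.737, 50)
print(f"t = {t_cal:.2f} on {50 - 2} degrees of freedom")
```

The calculated value is compared with the tabulated t for n − 2 degrees of freedom. A value this large is consistent with the significance of .000 reported for this pair.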
Properties
– −1 ≤ r ≤ 1
– r is a pure number without units
– If r is close to 1 ⇒ a strong positive relationship
– If r is close to −1 ⇒ a strong negative relationship
– If r = 0 ⇒ no linear correlation
Assumptions in correlation
– The assumptions needed to make inferences about the correlation coefficient are that the sample was randomly selected and that the two variables, X and Y, vary together in a joint distribution that is normally distributed (called the bivariate normal distribution).
$$CI = \left[\, b \pm t_{crit}\, SE_b \,\right]$$
Where:
$$t_{crit} = t_{\alpha,\; df = n-k}$$
k → number of variables
$$SE_b = \sqrt{\frac{RSS/(n-2)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
$$t_{cal} = \frac{b}{SE_b}$$
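To see the interval b ± t_crit × SE_b in action, the sketch below uses the dummy 1 row of model 2 from the coefficients output earlier in these slides (B = −21.000, standard error 2.000, residual df = 12). The critical value 2.179 is the two-sided 95% t value for 12 degrees of freedom taken from a t table, and the function names are hypothetical.

```python
def slope_confidence_interval(b, se_b, t_crit):
    """Confidence interval b +/- t_crit * SE_b for a regression coefficient."""
    margin = t_crit * se_b
    return b - margin, b + margin

def t_calculated(b, se_b):
    """Test statistic for H0: the coefficient equals zero."""
    return b / se_b

T_CRIT_95_DF12 = 2.179  # tabulated value, two-sided 95%, 12 degrees of freedom

lower, upper = slope_confidence_interval(-21.0, 2.0, T_CRIT_95_DF12)
t_cal = t_calculated(-21.0, 2.0)
print(f"95% CI: ({lower:.3f}, {upper:.3f}), t = {t_cal:.3f}")
```

The interval and t value agree with the bounds (−25.358, −16.642) and t = −10.500 reported in that output.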
Multiple linear regression
– Multivariate analysis refers to the analysis of data
that takes into account a number of explanatory
variables and one outcome variable
simultaneously.
– It allows for the efficient estimation of measures
of association while controlling for a number of
confounding factors.
– All types of multivariate analyses involve the
construction of a mathematical model to
describe the association between independent
and dependent variables.
• Where:
– The regression coefficients (b1, …, bn) represent the independent contributions of each explanatory variable to the prediction of the dependent variable.
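The model equation that "Where:" refers to does not appear in the text above. Assuming the standard form of the multiple linear regression model, with α as the intercept and e as the residual (the notation used earlier for simple regression), it would read:

```latex
Y = \alpha + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e
```

Here Y is the dependent variable and X_1, …, X_n are the explanatory variables. Each b_i is the expected change in Y for a one-unit change in X_i with the other explanatory variables held constant.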
The plot shows that the residuals are scattered randomly around the
horizontal dashed line at zero without any detectable pattern.
Multiple linear regression cont’d…
This figure shows a residual plot in which the variability of the residuals around the horizontal line changes from one region to another. Residuals become more dispersed around the horizontal line as we move from small to large predicted/fitted values.
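A crude numeric counterpart to reading a residual plot is to compare the spread of the residuals at low and high fitted values. The sketch below does this on a small synthetic data set, invented purely for illustration, whose scatter grows with x. The function names are hypothetical, and this is not a formal test of constant variance.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residual_spread_ratio(xs, ys):
    """Mean squared residual in the upper half of fitted values / lower half."""
    alpha, b = fit_line(xs, ys)
    # Pair each fitted value with its residual, then sort by fitted value
    pairs = sorted((alpha + b * x, y - (alpha + b * x)) for x, y in zip(xs, ys))
    half = len(pairs) // 2
    low = [res for _, res in pairs[:half]]
    high = [res for _, res in pairs[half:]]
    mean_sq = lambda values: sum(v * v for v in values) / len(values)
    return mean_sq(high) / mean_sq(low)

# Synthetic data (not from the slides): the scatter widens as x increases
xs = list(range(1, 11))
ys = [2 * x + 0.1 * x * (1 if x % 2 else -1) for x in xs]
ratio = residual_spread_ratio(xs, ys)
print(f"spread ratio (high / low fitted values) = {ratio:.2f}")
```

A ratio well above 1 points to the widening pattern described above, while a ratio near 1 is what a random scatter around zero would tend to give.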
                                                 Variance Proportions
Model  Dimension  Eigenvalue   Condition Index   (Constant)   Height of mother   Monthly family income   Period of gestation   Age of mother
                                                              (cms)(X2)          (Birr)(X5)              (days)(X6)            (years)(X3)
1      1          1.999          1.000           .00          .00
       2          .001          58.071           1.00         1.00
2      1          2.845          1.000           .00          .00                .01
       2          .154           4.294           .00          .00                .43
       3          .000         104.138           1.00         1.00               .56
3      1          3.829          1.000           .00          .00                .01                     .00
       2          .170           4.741           .00          .00                .42                     .00
       3          .000         116.493           .84          .19                .57                     .03
       4          7.58E-005    224.782           .16          .81                .00                     .97
4      1          4.806          1.000           .00          .00                .00                     .00                   .00
       2          .171           5.308           .00          .00                .41                     .00                   .00
       3          .023          14.410           .00          .00                .13                     .00                   .90
       4          .000         132.931           .87          .17                .45                     .03                   .03
       5          7.10E-005    260.214           .13          .83                .01                     .96                   .06
a. Dependent Variable: birth weight of the child (kgs)(X1)
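Each condition index in this output is the square root of the largest eigenvalue divided by the eigenvalue of that dimension. The sketch below recomputes the indices for model 4 from the printed eigenvalues. Dimension 4 is left out because its eigenvalue is printed as .000 and cannot be recovered. The function name is hypothetical, and small differences from the table are expected because the printed eigenvalues are rounded.

```python
import math

def condition_indices(eigenvalues):
    """Condition index per dimension: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / value) for value in eigenvalues]

# Model 4 eigenvalues for dimensions 1, 2, 3 and 5 (dimension 4 prints as .000)
eigenvalues_model4 = [4.806, 0.171, 0.023, 7.10e-5]
indices = condition_indices(eigenvalues_model4)
print([round(value, 2) for value in indices])
```

A commonly quoted rule of thumb, which is an assumption here rather than something stated in the slides, is that a condition index above about 30 signals serious collinearity, particularly when two or more variables have high variance proportions on that dimension.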
Correlations
Control Variables: -none- (a)
                                                                              Health care funding   Reported diseases   Visits to health care providers
                                                                              (amount per 100)      (rate per 10,000)   (rate per 10,000)
Health care funding (amount per 100)                Correlation               1.000                 .737                .964
                                                    Significance (2-tailed)   .                     .000                .000
                                                    df                        0                     48                  48
Reported diseases (rate per 10,000)                 Correlation               .737                  1.000               .762
                                                    Significance (2-tailed)   .000                  .                   .000
                                                    df                        48                    0                   48
Visits to health care providers (rate per 10,000)   Correlation               .964                  .762                1.000
                                                    Significance (2-tailed)   .000                  .000                .
                                                    df                        48                    48                  0
The zero-order correlation between health care funding and disease rates is, indeed, both fairly high (0.737) and statistically significant (p < 0.001).
Control Variables: Visits to health care providers (rate per 10,000)
                                                                   Health care funding   Reported diseases
                                                                   (amount per 100)      (rate per 10,000)
Health care funding (amount per 100)     Correlation               1.000                 .013
                                         Significance (2-tailed)   .                     .928
                                         df                        0                     47
Reported diseases (rate per 10,000)      Correlation               .013                  1.000
                                         Significance (2-tailed)   .928                  .
                                         df                        47                    0
a. Cells contain zero-order (Pearson) correlations.
Going back to the zero-order correlations, you can see that both health care funding rates and reported disease rates are highly positively correlated with the control variable, rate of visits to health care providers.
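The partial correlation of .013 reported when visits to health care providers is controlled can be approximated from the three zero-order correlations (.737, .964 and .762). The sketch below uses the standard first-order partial correlation formula. The function name is hypothetical, and the result differs slightly from the printed value because the inputs are rounded to three decimals.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return numerator / denominator

# x = health care funding, y = reported diseases, z = visits to providers
r_partial = partial_correlation(r_xy=0.737, r_xz=0.964, r_yz=0.762)
print(f"partial r = {r_partial:.3f}")
```

The correlation drops from 0.737 to roughly 0.01 once visits are held constant, which is the pattern the two correlation outputs show.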
Notations:
BW = Birth weight (kgs) of the child = X1
HEIGHT = Height of mother (cms) = X2
AGEMOTH = Age of mother (years) = X3
AGEFATH = Age of father (years) = X4
FAMINC = Monthly family income (Birr) = X5
GESTAT = Period of gestation (days) = X6
Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.), 4/7/2017
Answer the following questions based on the above data:
1. Check the association of each predictor with the dependent variable.
2. Fit the full regression model.
3. Fit the condensed regression model.
4. What do you understand from your answers in parts 1, 2 and 3?
5. Check the assumptions required and explain.