Stat Q4M3
PROBABILITY
REGRESSION ANALYSIS
Regression analysis is a statistical method used to determine
the form of the relationship between two variables (simple linear
regression) or among three or more variables (multiple regression).
The regression line is also called the line of best fit. It lets
us interpret trends in the data and make predictions based on
that data. Prediction is discussed further in the next lesson.
Take note that before doing regression, the following conditions
must be met:
a. There exists a relationship between the variables; and
b. The relationship has been tested and found to be significant.
If these conditions are not met, a regression analysis would be
pointless.
A scatterplot is one way of illustrating a line of best fit. The figure
below shows a scatterplot of data on two variables. Notice that
several lines can be drawn on the graph near the points. Among
these, the line of best fit is the one for which the sum of the
squares of the vertical distances from each point to the line is at
a minimum.
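To make the "best fit" criterion concrete, the short Python sketch below compares two candidate lines by their sum of squared vertical distances. The data points and the function name sum_squared_errors are hypothetical and chosen only for illustration; they are not taken from this module.

```python
# Hypothetical data for illustration only (not from this module).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from each point to the line y' = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

# Two candidate lines drawn near the points.
sse_guess = sum_squared_errors(1.0, 1.0, x, y)  # y' = 1.0 + 1.0x, an eyeballed line
sse_best = sum_squared_errors(2.2, 0.6, x, y)   # y' = 2.2 + 0.6x, the least-squares line for this data

print(sse_guess, sse_best)
```

For this sample data, the second line gives the smaller sum, which is what makes it the better fit under this criterion.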
The Equation of a Regression Line
Recall from algebra that the equation of a line is given by
y = mx + b, where m is the slope and b is the y-intercept.
Similarly, the equation of a regression line is given by y' = a + bx,
where b is the slope and a is the y-intercept. Note that b plays a
different role in each form.
Furthermore, the corresponding formulas for the y-intercept a
and the slope b are as follows:
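In terms of sums over the n data pairs, the standard least-squares formulas can be written as
a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²] and
b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²].
A minimal Python sketch of these formulas follows. The function name regression_coefficients and the sample data are hypothetical and are not the data of this module's example.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the regression line y' = a + b*x using the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi ** 2 for xi in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every x value is the same, so no slope can be computed.
        raise ValueError("x values must not all be equal")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b

# Hypothetical data for illustration only (not from this module).
a, b = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)
```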
Before we proceed with the computation, remember that in a regression analysis the data
must be correlated and the correlation must be significant. For this discussion, let us
assume that these requirements have been met.
Solution: We need to solve for the values of the y-intercept a and
the slope b.
Hence, the equation of the regression line y' = a + bx is
y' = 75.667 + 1.583x, where the slope is 1.583 and the y-intercept is
75.667.
Interpretation
In the regression line equation, the slope b is 1.583. This means
that for every one-hour increase in x, the number of study hours,
the predicted value of y, the score, increases by 1.583 units on
average. Similarly, the value of the y-intercept a is 75.667. This
means that the predicted score of a student with zero hours of
study is 75.667.
Now, since our main objective is to predict the value of y when
x is 14, we substitute 14 for x in the regression equation.
y'=75.667+1.583x
y'=75.667+1.583(14)
y'=75.667+22.162
y'=97.829
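The substitution above can also be checked with a few lines of Python. This is only a sketch: the coefficients are the ones obtained in the solution, while the names A, B, and predict_score are hypothetical.

```python
A = 75.667  # y-intercept a from the regression equation above
B = 1.583   # slope b from the regression equation above

def predict_score(hours):
    """Predicted score y' = a + b*x for a given number of study hours x."""
    return A + B * hours

predicted = round(predict_score(14), 3)
print(predicted)

# The slope is the change in the predicted score for one additional hour.
one_hour_change = predict_score(15) - predict_score(14)
```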