Module 5 Ge 4educ
EDUCATION DEPARTMENT
COURSE MODULE
MODULE 5:
OVERVIEW:
Data management tools play an important role in analyzing and interpreting data. Using these
tools clarifies theories that might otherwise be misunderstood. The theories come to life and
become more meaningful once these data management tools are applied.
Most datasets have a central value around which the observations are either narrowly or widely spread.
Drawing a bell-shaped curve on a histogram helps show whether the data follow a normal distribution, also
called the Gaussian distribution after Carl Friedrich Gauss.
A normal distribution is a continuous probability distribution. This means that it generally uses either
interval or ratio data. A histogram of the data can approximate the shape of the distribution, and drawing
a bell-shaped curve on the histogram shows whether the data follow a normal distribution. A bell-shaped
curve indicates that there is one central peak. The rest of the data lie on either side of the center,
tapering off toward the extremes.
“Molding Holistic Individuals for a Brighter Tomorrow”
Figures 1 and 2 show non-normal distributions. Figure 1 has two peaks, and there is also a gap in the
data. The peak of Figure 2 is not centered, which violates the shape of a bell. Figure 3, however, shows a
normal distribution.
A normal distribution has the following properties:
1. It is a bell-shaped curve.
2. The total area under a normal curve is 1.
3. The tails of the normal curve are asymptotic to the horizontal axis.
4. The curve is symmetric about the mean.
5. It is determined by the population mean, $\mu$, and the population standard deviation, $\sigma$. The mean
controls the center and the standard deviation controls the spread of the distribution.
6. The mean, median and mode have the same value.
The standard normal distribution has the same properties as that of the normal distribution except that the mean is 0
and the standard deviation is 1. The following figure shows the standard normal distribution.
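Two of the properties listed above, the total area of 1 and the symmetry about the mean, can be checked numerically. The sketch below is only an illustration: the function names, the integration range of 8 standard deviations, and the step count are assumptions chosen for this example and are not part of the module.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(mu=0.0, sigma=1.0, width=8.0, steps=20000):
    """Trapezoidal estimate of the area within `width` standard deviations of the mean."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

# Standard normal distribution: mean 0, standard deviation 1.
total_area = area_under_curve(mu=0.0, sigma=1.0)

# Symmetry: the curve has the same height at equal distances from the mean.
left_height = normal_pdf(-1.5)
right_height = normal_pdf(1.5)

print("total area:", total_area)
print("heights at -1.5 and +1.5:", left_height, right_height)
```

The tails never touch the horizontal axis, so the numerical area is only approximately 1, but the part left out beyond 8 standard deviations is negligible.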
To find the areas under the normal curve, three things must be done:
I. Draw the normal curve.
II. Shade the appropriate region.
III. Calculate the area by using the Table of Areas under the Normal Curve.
III. Therefore, the answer is 0.4664.
III. Since the mean is included in the shaded region and the area to the left of the mean is shaded, the
table area for z = 1.44 (0.4251) is added to 0.5. The answer is 0.9251.
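The table lookup for z = 1.44 can be cross-checked in code. This is a minimal sketch that assumes the printed table gives areas between the mean and z; the helper name `phi` is hypothetical, and it uses the error function from Python's standard library instead of the table.

```python
import math

def phi(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area between the mean (z = 0) and z = 1.44, as a table of areas would list it.
table_area = phi(1.44) - 0.5

# The whole left half of the curve (0.5) is shaded as well, so it is added.
left_area = 0.5 + table_area

print(round(table_area, 4), round(left_area, 4))
```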
If the areas are given, what are the values of z? Here are some examples:
1. Find $z_0$ such that $P(z > z_0) = 0.0125$.
Since the area given is less than 0.5, the shaded area is on the extreme left or extreme right.
Because the inequality is "greater than", the shaded area is at the extreme right.
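Going from an area back to a z value is an inverse lookup. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for reading the table backwards. The variable names are assumptions for illustration, with the unknown boundary written as `z0`.

```python
from statistics import NormalDist

right_tail = 0.0125            # given area in the extreme right tail

# Area between the mean and the boundary, which is what the table is searched for.
table_area = 0.5 - right_tail

# The area to the left of the boundary is 1 minus the right-tail area.
z0 = NormalDist().inv_cdf(1.0 - right_tail)

print(table_area, round(z0, 2))
```

With a printed table, the same result comes from locating the area 0.4875 in the body of the table and reading off the z value on its row and column.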
There are various applications of the normal distribution to real-life problems. To solve them, the
values are first transformed to the standard normal distribution using the formula:
$$z = \frac{x - \mu}{\sigma}$$
where: z = standard normal score
x = random variable
$\mu$ = population mean
$\sigma$ = population standard deviation
$$z = \frac{40 - 34.08}{7.62} = 0.78$$
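The transformation to a standard score is a one-line computation. In this sketch the function name `z_score` is hypothetical, and the reading of 40 as the observed value, 34.08 as the mean, and 7.62 as the standard deviation is an assumption taken from the positions of the numbers in the formula.

```python
def z_score(x, mu, sigma):
    """Standard normal score: how many standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mu) / sigma

# Assumed roles of the numbers: x = 40, mean = 34.08, standard deviation = 7.62.
z = z_score(40, 34.08, 7.62)
print(round(z, 2))
```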
2. The average age at which a Filipino man receives the sacrament of matrimony is 29 years, with a standard
deviation of 2.5 years. Richard, aged 26, is contemplating whether he should marry already. What is the
probability that he will marry before he reaches 30?
$$z_1 = \frac{26 - 29}{2.5} = -1.2 \quad \text{and} \quad z_2 = \frac{30 - 29}{2.5} = 0.4$$
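With the two z values in hand, the remaining step is to find the area between them. The sketch below is one way to do it, assuming the question is read as the probability that the age at marriage falls between 26 and 30. It uses `statistics.NormalDist` in place of the printed table, so the last decimal place may differ slightly from a table-based answer.

```python
from statistics import NormalDist

mu, sigma = 29.0, 2.5          # mean and standard deviation from the problem

z1 = (26 - mu) / sigma         # Richard's current age
z2 = (30 - mu) / sigma         # upper age limit in the question

standard_normal = NormalDist()  # mean 0, standard deviation 1
probability = standard_normal.cdf(z2) - standard_normal.cdf(z1)

print(round(z1, 2), round(z2, 2), round(probability, 2))
```

Using the table instead, the areas for z = 1.2 and z = 0.4 are added because the shaded region lies on both sides of the mean, which gives a probability of roughly 0.54.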
Correlation analysis is widely used in quantitative research. Relationships among variables
are very important because they can explain certain phenomena, and such explanations can eventually
contribute to the well-being of humanity.
Correlation analysis is the study of the relationship between the independent and dependent
variables. It measures the strength and direction of the relationship in continuous bivariate data.
Examples of bivariate data are time and academic performance, mass and width, etc.
The correlation coefficient, r, is used to determine if there is a linear relationship between two
variables. It has a value from –1 to +1. If the value of r is –1, then there is a perfect negative linear
relationship between the two variables; if the value of r is +1, then there is a perfect positive linear
relationship between the two variables; and if the value of r is 0, then there is no linear relationship
between the two variables. The closer the value of r is to –1 or +1, the stronger the negative or
positive linear relationship between the two variables.
A scatter plot is a visual representation of the linear relationship between the two variables. It is a
graph drawn on the x- and y-axes. The following scatter plots show different kinds of linear relationship
between two variables:
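The Pearson correlation coefficient is computed by:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
where:
n - number of paired observations
X - independent variable
Y - dependent variable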
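The formula can be turned into a short function that works on raw paired data. This is a minimal sketch: the function name `pearson_r` and the small data sets are made up for illustration only, and they show the two extreme cases, r = +1 and r = –1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the sums in the formula."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of observations")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: Y rises or falls in an exact straight line with X.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # perfect positive linear relationship
falling = [10 - 3 * x for x in xs]   # perfect negative linear relationship

print(pearson_r(xs, rising), pearson_r(xs, falling))
```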
Example:
To illustrate, assume that a proprietor of a fabrication shop wants to know if there is a relationship
between the number of hours on the lathe machine and the income (Php in hundred thousands) for each
month of a year. The results are as follows:
Constructing a scatterplot helps to see if there is a relationship between the two variables. The scatter
plot is drawn below.
It can be presumed that there is a positive relationship between the number of hours on the lathe
machine and the income per month. To verify this relationship, the Pearson's r is calculated.
Month        X        Y        XY         X²          Y²
January      6.0      6.00     36.00      36.00       36.00
February     4.5      5.50     24.75      20.25       30.25
March        5.75     4.00     23.00      33.0625     16.00
April        6.25     5.00     31.25      39.0625     25.00
May          4.0      3.75     15.00      16.00       14.0625
June         4.75     4.50     21.375     22.5625     20.25
July         6.25     8.00     50.00      39.0625     64.00
August       5.50     6.60     36.30      30.25       43.56
September    5.0      4.95     24.75      25.00       24.5025
October      4.50     3.90     17.55      20.25       15.21
November     4.50     4.60     20.70      20.25       21.16
December     5.25     6.00     31.50      27.5625     36.00
Totals: ΣX = 62.25, ΣY = 62.8, ΣXY = 332.175, ΣX² = 329.3125, ΣY² = 345.995
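$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{12(332.175) - (62.25)(62.8)}{\sqrt{\left[12(329.3125) - (62.25)^2\right]\left[12(345.995) - (62.8)^2\right]}}$$
$$r = \frac{3986.1 - 3909.3}{\sqrt{(3951.75 - 3875.0625)(4151.94 - 3943.84)}}$$
$$r = \frac{76.8}{\sqrt{(76.6875)(208.1)}}$$
$$r = \frac{76.8}{\sqrt{15958.67}}$$
$$r = \frac{76.8}{126.33}$$
$$r = 0.61$$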
As with the scatter plot, the direction of the obtained value is positive. Therefore, there is a positive
relationship between the number of hours on the lathe machine and the income per month.
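The hand computation can be cross-checked from the column totals alone. The sketch below plugs the totals of the table into the formula for r. The variable names are assumptions for this illustration.

```python
import math

# Column totals from the table (n = 12 months).
n = 12
sum_x, sum_y = 62.25, 62.8
sum_xy = 332.175
sum_x2, sum_y2 = 329.3125, 345.995

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 2))
```

Working from the totals mirrors the manual solution step by step, which makes it easy to spot an arithmetic slip in any intermediate line.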
Microsoft Excel can also be used to generate the Pearson correlation coefficient.
Simple linear regression analysis is slightly different from linear correlation analysis. The aim of
linear regression analysis is to develop an equation that describes the relationship between variables.
Simple linear regression analysis that builds on linear correlation analysis makes use of the coefficient
of determination, r². It is the percent of variation in the dependent variable that is explained by all the
independent variables put together. It tells how much of the variance in the values of one variable can be
explained by the values of the other.
Simple linear regression analysis seeks to develop an equation that will predict future values of the
dependent variable from values of the independent variable. For this lesson, the discussion is only on one
dependent variable and one independent variable, hence the term "simple".
The regression line or prediction line is drawn on the scatter plot. It is given by:
y = a + bx
Where
y = predicted value of the dependent variable y
a = intercept of the regression line
x = value of the independent variable
b = slope of the regression line
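$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2}$$
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$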
Note that before simple linear regression is done, a linear relationship between the two variables must
first be established.
Example:
To illustrate, refer to the number of hours on the lathe machine and the income per month of a year.
It was already established that there is a positive relationship between the two variables. Therefore, a
regression line can be developed for the bivariate data.
Month        X        Y        XY         X²          Y²
January      6.0      6.00     36.00      36.00       36.00
February     4.5      5.50     24.75      20.25       30.25
March        5.75     4.00     23.00      33.0625     16.00
April        6.25     5.00     31.25      39.0625     25.00
May          4.0      3.75     15.00      16.00       14.0625
June         4.75     4.50     21.375     22.5625     20.25
July         6.25     8.00     50.00      39.0625     64.00
August       5.50     6.60     36.30      30.25       43.56
September    5.0      4.95     24.75      25.00       24.5025
October      4.50     3.90     17.55      20.25       15.21
November     4.50     4.60     20.70      20.25       21.16
December     5.25     6.00     31.50      27.5625     36.00
Totals: ΣX = 62.25, ΣY = 62.8, ΣXY = 332.175, ΣX² = 329.3125, ΣY² = 345.995
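$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \frac{(62.8)(329.3125) - (62.25)(332.175)}{12(329.3125) - (62.25)^2}$$
a = 0.038
Solve for b:
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$b = \frac{(12)(332.175) - (62.25)(62.8)}{(12)(329.3125) - (62.25)^2}$$
$$b = \frac{3986.1 - 3909.3}{3951.75 - 3875.0625}$$
$$b = \frac{76.8}{76.6875}$$
b = 1.001
Therefore, the regression line is y = 0.038 + 1.001x.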
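The intercept and slope can be cross-checked from the same column totals. In the sketch below, the `predict` helper and the input of 5 hours are hypothetical additions used only to show how the regression line is applied.

```python
# Column totals from the table (n = 12 months).
n = 12
sum_x, sum_y = 62.25, 62.8
sum_xy = 332.175
sum_x2 = 329.3125

denominator = n * sum_x2 - sum_x ** 2

# Intercept (a) and slope (b) of the regression line y = a + bx.
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

def predict(x):
    """Predicted income (Php, in hundred thousands) for x hours on the lathe machine."""
    return a + b * x

print(round(a, 3), round(b, 3))
print(predict(5.0))  # hypothetical input: 5 hours
```

For the hypothetical 5 hours, the line predicts an income of roughly 5.05 hundred thousand pesos. Predictions are most trustworthy for hours inside the observed range of 4.0 to 6.25.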
Microsoft Excel can also be used to generate the values of a and b.
Note:
Simple linear regression analysis determines an equation that
predicts values of one variable from values of another variable. The
coefficient of determination also helps in determining the percent of
variation explained by the independent variable.
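The coefficient of determination for the lathe-machine data can be obtained in two ways that should agree: by squaring r, or by comparing the unexplained variation around the fitted line with the total variation in Y. The sketch below does both. The variable names are assumptions for this illustration, and the line is fitted with the equivalent deviation-from-the-mean form of the formulas for a and b.

```python
# Monthly data from the table: hours on the lathe machine (X) and income (Y).
hours = [6.0, 4.5, 5.75, 6.25, 4.0, 4.75, 6.25, 5.5, 5.0, 4.5, 4.5, 5.25]
income = [6.0, 5.5, 4.0, 5.0, 3.75, 4.5, 8.0, 6.6, 4.95, 3.9, 4.6, 6.0]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(income) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in income)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, income))

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Way 1: square of the correlation coefficient.
r_squared_from_r = sxy ** 2 / (sxx * syy)

# Way 2: share of the total variation in Y that the line explains.
unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, income))
r_squared_from_fit = 1.0 - unexplained / syy

print(round(r_squared_from_r, 2), round(r_squared_from_fit, 2))
```

Both routes give about 0.37, so roughly 37% of the variation in monthly income is explained by the hours on the lathe machine.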
Evaluation Activities
Solve the following. Show your solution.
1. A coffee machine dispenses coffee into 12-ounce cups. Tests show that the actual amount of coffee
dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz.
a. What percent of cups will receive less than 11.25 oz of coffee?
b. What percent of cups will receive between 11.2 oz and 11.55 oz of coffee?
c. If a cup is filled at random, what is the probability that the machine will overflow the cup?
2. Mr. Greco runs to keep himself physically fit. He wants to know if there is a relationship between the
time elapsed and the number of kilometers he ran.
Kilometers Ran    Time (in minutes)
10                90
10                60
16                150
16                90
21                160
21                180
a. Calculate the Pearson correlation coefficient.
b. Determine the regression line equation.
CHARLITA P. COLANGAN
Course Facilitator