
Module 5 Ge 4educ

This document is a course module for a Mathematics course at Colegio de Las Navas, focusing on data management tools and statistical concepts such as normal distribution and correlation analysis. It outlines learning outcomes, including the ability to analyze data using statistical tools and understand mathematical concepts in various real-life applications. The module includes lessons on normal distribution, linear regression, and correlation, providing examples and calculations to illustrate these concepts.

Republic of the Philippines

Province of Northern Samar


Municipality of Las Navas
COLEGIO DE LAS NAVAS
(The First Community College in the Island of Samar)

EDUCATION DEPARTMENT
COURSE MODULE

Course Code: GE 4
Course Title: MATHEMATICS IN THE MODERN WORLD
Name of Student:
Contact Number: 0946-595-3697

MODULE 5:

CHAPTER 5: DATA MANAGEMENT TOOLS

OVERVIEW:

Data management tools are important for analyzing and interpreting data. Using these tools clarifies
theories that might otherwise be misunderstood: the theories come to life and become more meaningful
once they are applied to actual data.

COURSE LEARNING OUTCOMES:

At the end of the semester, the students should be able to:


Knowledge
a. Discuss and argue about the nature of mathematics, what it is, how it is expressed, represented and
used.
b. Use different types of reasoning to justify statements and arguments made about mathematics and
mathematical concepts.
c. Discuss the language and symbols of mathematics.
Skills
d. Use a variety of statistical tools to process and manage numerical data.
e. Analyze codes and coding schemes used for identification, privacy, and security purposes.
f. Use mathematics in other areas such as finance, voting, health and medicine, business,
environment, arts and design, and recreation.
Values
g. Appreciate the nature and uses of mathematics in everyday life.
h. Affirm honesty and integrity in the application of mathematics to various human endeavors.

Lesson 1: Normal Distribution

Many datasets cluster around a central value, with the remaining values spread either narrowly or widely
around it. Drawing a bell-shaped curve over a histogram of the data helps show whether the data follow a
normal distribution, also called the Gaussian distribution after Carl Friedrich Gauss.
A normal distribution is a continuous probability distribution. This means that it generally applies to
interval or ratio data. A histogram of the data gives a good approximation of its shape. A bell-shaped
curve has one central peak; the rest of the data lie on either side of the center, tapering off toward
the extremes.

“Molding Holistic Individuals for a Brighter Tomorrow”

Figures 1 and 2 show non-normal distributions. Figure 1 has two peaks and a gap in the
data. The peak of Figure 2 is not centered, which violates the shape of a bell. Figure 3, however, shows a
normal distribution.
A normal distribution has the following properties:
1. It is a bell-shaped curve.
2. The total area under a normal curve is 1.
3. The tails of the normal curve are asymptotic to the horizontal axis.
4. The curve is symmetric about the mean.
5. It is determined by the population mean μ and the population standard deviation σ. The mean
controls the center and the standard deviation controls the spread of the distribution.
6. The mean, median and mode have the same value.
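
Properties 2 and 4 can be checked numerically. The following is a minimal Python sketch and not part of the module itself; the function names `normal_pdf` and `area_under_curve` and the values μ = 29 and σ = 2.5 are illustrative assumptions. It uses the standard formula for the normal density and a simple trapezoid rule to approximate the area.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(mu, sigma, width=8.0, steps=20000):
    """Trapezoid-rule estimate of the area from mu - width*sigma to mu + width*sigma."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

mu, sigma = 29.0, 2.5  # illustrative values only

total_area = area_under_curve(mu, sigma)
left_height = normal_pdf(mu - 1.3, mu, sigma)
right_height = normal_pdf(mu + 1.3, mu, sigma)

print(round(total_area, 6))                    # property 2: total area is 1
print(math.isclose(left_height, right_height))  # property 4: symmetric about the mean
```

Changing `mu` moves the curve left or right and changing `sigma` makes it narrower or wider, but the total area stays at 1.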

The standard normal distribution has the same properties as that of the normal distribution except that the mean is 0
and the standard deviation is 1. The following figure shows the standard normal distribution.



Because the normal distribution is symmetric about the mean, the area between the mean and a z-value
is the same whether the z-value is positive or negative. Hence, the area of –z is equal to the area of +z.
Areas under the normal curve are probabilities, and probabilities range from 0 to 1. This means
that the values of areas cannot be negative. Moreover, they cannot have values greater than 1.
The notations P(a < z < b), P(z < a) and P(z > a) will be used and their meanings are as follows:
• P(a < z < b) is read as "the probability or area of z between a and b."
• P(z < a) is read as "the probability or area of z less than a, or to the left of a."
• P(z > a) is read as "the probability or area of z greater than a, or to the right of a."
Note that the symbols ≤ and ≥ have the same meanings as < and > here, because the area at a single
point of a continuous distribution is zero. To find the areas, the Table of Areas under the Normal Curve
will be used. The table is also known as the z-table. Each entry in the table is the area between the
mean (z = 0) and the given z-value.
Using the z-table, the area of z = –0.46 is 0.1772 and the area of z = 0.52 is 0.1985.
Note that the total area under the normal curve is 1. Hence, if the entire region above the mean or the
entire region below the mean is shaded, then the area is 0.5. The figure below illustrates this concept.
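
When a printed z-table is not at hand, its entries can be approximated in code. The sketch below is an illustration only; the helper name `table_area` is an assumption, and it relies on the known identity that the area between the mean and z equals 0.5 · erf(|z| / √2).

```python
import math

def table_area(z):
    """Area under the standard normal curve between the mean (0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

area_neg = table_area(-0.46)
area_pos = table_area(0.52)

print(f"{area_neg:.4f}")  # 0.1772
print(f"{area_pos:.4f}")  # 0.1985
```

Taking the absolute value of z reflects the symmetry of the curve: a negative z-value has the same area as the matching positive one.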

To find the areas under the normal curve, three things must be done:
I. Draw the normal curve.
II. Shade the appropriate region.
III. Calculate the area by using the Table of Areas under the Normal Curve.

Here are some examples:

1. P(–0.72 < z < 0)
III. The shaded region runs from z = –0.72 to the mean, so the area of 0.72 in the z-table is the
answer. Therefore, the answer is 0.2642.

2. P(0 < z < 1.83)
III. The shaded region runs from the mean to z = 1.83. Therefore, the answer is 0.4664.

3. P(–2.58 < z < 2.58)
III. Since the mean is included in the shaded region, the areas must be added. Therefore,
0.4951 + 0.4951 = 0.9902.

4. P(z < 1.44)
III. Since the mean is included in the shaded region, and the entire area to the left of the mean is
shaded, the area of 1.44 is to be added to 0.5. The answer is 0.4251 + 0.5 = 0.9251.

5. P(z > 1.95)
III. Since the shaded area is on the extreme right, the area of 1.95 must be subtracted from 0.5.
Therefore, the answer is 0.5 – 0.4744 = 0.0256.
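
The add-or-subtract rules used in the five examples can be sketched in Python. This is an illustration, not the module's own method: `table_area` is a hypothetical helper that imitates a z-table by rounding each area to four decimal places, and `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def table_area(z):
    """Imitates a z-table entry: area between the mean and z, to 4 decimal places."""
    return round(abs(STANDARD_NORMAL.cdf(z) - 0.5), 4)

# 1. P(-0.72 < z < 0): the region ends at the mean, so one table area is enough.
ex1 = table_area(-0.72)
# 2. P(0 < z < 1.83): same rule on the right side of the mean.
ex2 = table_area(1.83)
# 3. P(-2.58 < z < 2.58): the mean is inside the region, so the two areas are added.
ex3 = round(table_area(-2.58) + table_area(2.58), 4)
# 4. P(z < 1.44): the whole left half (0.5) plus the area from the mean to 1.44.
ex4 = round(0.5 + table_area(1.44), 4)
# 5. P(z > 1.95): the right half (0.5) minus the area from the mean to 1.95.
ex5 = round(0.5 - table_area(1.95), 4)

print(ex1, ex2, ex3, ex4, ex5)  # 0.2642 0.4664 0.9902 0.9251 0.0256
```

Rounding each table entry before adding matters: an unrounded calculation of example 3 gives about 0.9901, which differs from the table-based 0.9902 only because of rounding.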

If the areas are given, what are the values of z? Here are some examples:
1. Find z₀ such that P(z > z₀) = 0.0125.
Since the area given is less than 0.5, the shaded area is on the extreme left or extreme right.
The direction of the inequality (z > z₀) shows that the shaded area is at the extreme right.

Since the shaded area is at the extreme right, the given area is to be subtracted from 0.5.
Therefore,
0.5 – 0.0125 = 0.4875
Obtaining the exact or closest value from the z-table, the z-score is 2.24.

2. Find the values of z₀ such that the area between –z₀ and +z₀ is 0.8452.
Since the area given is more than 0.5 and there are two values of z to be obtained, 0.8452 has to be
divided by 2:
0.8452 ÷ 2 = 0.4226
Therefore, obtaining the exact value or the value closest to 0.4226 from the z-table, the z-scores are
–1.42 and +1.42.

3. Find the value of z₀ if the highest 77% of the data is to be considered.
The shaded area covers the highest 77% of the data. Since this is more than half of the data, the
z-value is below the mean, so it is negative.
The shaded area from the mean to the extreme right is 0.5. To get the area between the z-value and
the mean,
0.77 – 0.5 = 0.2700
Obtaining the exact value or the value closest to 0.2700 from the z-table, the z-score is –0.74.
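
Looking up a z-score from a given area is the reverse of the usual table lookup. A hedged Python sketch of the three cases follows; it uses `NormalDist.inv_cdf` from the standard `statistics` module, which takes the area to the LEFT of z, so each problem is first restated as a left-hand area. The variable names are illustrative assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# 1. Right-tail area of 0.0125: the area to the left of the z-score is 1 - 0.0125.
z_right_tail = STANDARD_NORMAL.inv_cdf(1 - 0.0125)

# 2. Central area of 0.8452, split evenly around the mean:
#    the area to the left of the positive z-score is 0.5 + 0.8452 / 2.
z_central = STANDARD_NORMAL.inv_cdf(0.5 + 0.8452 / 2)

# 3. Highest 77% of the data: the remaining 23% lies to the left of the z-score.
z_top_77 = STANDARD_NORMAL.inv_cdf(1 - 0.77)

print(round(z_right_tail, 2))  # 2.24
print(round(z_central, 2))     # 1.42 (the other value is -1.42 by symmetry)
print(round(z_top_77, 2))      # -0.74
```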

The normal distribution has many applications to real-life problems. To use the z-table, a value
from any normal distribution is first transformed to the standard normal distribution with the formula:

\[ z = \frac{x - \mu}{\sigma} \]

where: z = standard normal score
x = random variable
μ = population mean
σ = population standard deviation

Note that the calculated value of z is to be rounded to the hundredths place.


Examples:
1. Thirteen students who took the final exam last term have a mean grade of 34.08 and a standard deviation of
7.62.
a. What is the probability that Edna will get more than 40 in the final exam?

\[ z = \frac{40 - 34.08}{7.62} = 0.78 \]

The area of 0.78 is 0.2823. Since the shaded region is on the extreme right, this area is to be
subtracted from 0.5. The answer is 0.5 – 0.2823 = 0.2177.
This means that Edna has a 21.77% chance
of getting more than 40 in the final exam.



b. What is the probability that Edna will get a score between 30 and 40?

\[ z_1 = \frac{30 - 34.08}{7.62} = -0.54 \quad \text{and} \quad z_2 = \frac{40 - 34.08}{7.62} = 0.78 \]

Since the mean lies between the two z-values, the areas of −0.54 and 0.78
are added. The answer is 0.2054 + 0.2823 = 0.4877. This means
that Edna has a 48.77% chance of getting a
score between 30 and 40.

2. The average age at which a Filipino man receives the sacrament of matrimony is 29 years, with a standard deviation of
2.5 years. Richard, aged 26, is contemplating whether he should marry already. What is the probability that he will
marry before he reaches 30?

\[ z_1 = \frac{26 - 29}{2.5} = -1.2 \quad \text{and} \quad z_2 = \frac{30 - 29}{2.5} = 0.4 \]

Since the mean lies between the two z-values, the areas of −1.2 and 0.4 are
added. The answer is 0.3849 + 0.1554 = 0.5403. This means that
Richard has a 54.03% chance of marrying
between 26 and 30 years old.
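
The two worked problems can be re-computed with a short Python sketch. It is an illustration under stated assumptions: `z_score` is a hypothetical helper that applies z = (x − μ)/σ and rounds to the hundredths place, and the areas come from `NormalDist.cdf` rather than a printed table, so the last decimal place may differ slightly from table-based answers.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mu, sigma):
    """Standard score of x, rounded to the hundredths place."""
    return round((x - mu) / sigma, 2)

# Example 1: exam grades with mean 34.08 and standard deviation 7.62.
z_40 = z_score(40, 34.08, 7.62)   # 0.78
z_30 = z_score(30, 34.08, 7.62)   # -0.54
p_more_than_40 = 1 - STANDARD_NORMAL.cdf(z_40)                       # about 0.2177
p_between_30_and_40 = STANDARD_NORMAL.cdf(z_40) - STANDARD_NORMAL.cdf(z_30)  # about 0.4877

# Example 2: age at marriage with mean 29 and standard deviation 2.5.
z_26 = z_score(26, 29, 2.5)       # -1.2
z_30_years = z_score(30, 29, 2.5)  # 0.4
p_between_26_and_30 = STANDARD_NORMAL.cdf(z_30_years) - STANDARD_NORMAL.cdf(z_26)  # about 0.5403

print(z_40, z_30, z_26, z_30_years)
print(p_more_than_40, p_between_30_and_40, p_between_26_and_30)
```

Subtracting two cumulative areas gives the same result as adding the two table areas on either side of the mean, because the table measures each area from the mean outward.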

Lesson 2: Linear Regression and Correlation

Correlation analysis is widely used in quantitative research. Relationships among variables
are important because they can help explain phenomena that affect the well-being of people.
Correlation analysis is the study of the relationship between an independent variable and a dependent
variable. It measures the strength and direction of the relationship in continuous bivariate data.
Examples of bivariate data are time and academic performance, mass and width, etc.
The correlation coefficient, r, is used to determine if there is a linear relationship between two
variables. It has a value from –1 to +1. If the value of r is –1, then there is a perfect negative linear
relationship between the two variables; if the value of r is +1, then there is a perfect positive linear
relationship between the two variables; and if the value of r is 0, then there is no linear relationship
between the two variables. The closer the value of r is to –1 or +1, the stronger the negative or
positive linear relationship between the two variables.
A scatter plot is a visual representation of the relationship between two variables. It is a
graph that uses the x-axis and the y-axis, with one point plotted for each pair of values. The following
scatter plots show different kinds of linear relationships between two variables:



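There are many methods to get the value of a correlation coefficient. However, the Pearson
product-moment correlation coefficient (or simply the Pearson correlation coefficient) will be used throughout this
lesson. The formula for the Pearson correlation coefficient is given by:

\[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \]

where:
X - independent variable
Y - dependent variable
n - number of paired observations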

Example:
To illustrate, assume that a proprietor of a fabrication shop wants to know if there is a relationship
between the number of hours on the lathe machine and the income (Php in hundred thousands) for each
month of a year. The results are as follows:

Month        Lathe (x)    Income (y)
January      6.00         6.00
February     4.50         5.50
March        5.75         4.00
April        6.25         5.00
May          4.00         3.75
June         4.75         4.50
July         6.25         8.00
August       5.50         6.60
September    5.00         4.95
October      4.50         3.90
November     4.50         4.60
December     5.25         6.00

Constructing a scatterplot helps to see if there is a relationship between the two variables. The scatter
plot is drawn below.

It can be presumed that there is a positive relationship between the number of hours on the lathe
machine and the income per month. To verify this relationship, the Pearson's r is calculated.
Month        X       Y       XY        X²         Y²
January      6.0     6.00    36.00     36.00      36.00
February     4.5     5.50    24.75     20.25      30.25
March        5.75    4.00    23.00     33.0625    16.00
April        6.25    5.00    31.25     39.0625    25.00
May          4.0     3.75    15.00     16.00      14.0625
June         4.75    4.50    21.375    22.5625    20.25
July         6.25    8.00    50.00     39.0625    64.00
August       5.50    6.60    36.30     30.25      43.56
September    5.0     4.95    24.75     25.00      24.5025
October      4.50    3.90    17.55     20.25      15.21
November     4.50    4.60    20.70     20.25      21.16
December     5.25    6.00    31.50     27.5625    36.00
Total        ΣX = 62.25    ΣY = 62.8    ΣXY = 332.175    ΣX² = 329.3125    ΣY² = 345.995

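With n = 12 months:

\[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \]

\[ r = \frac{12(332.175) - (62.25)(62.8)}{\sqrt{\left[12(329.3125) - (62.25)^2\right]\left[12(345.995) - (62.8)^2\right]}} \]

\[ r = \frac{3986.1 - 3909.3}{\sqrt{(3951.75 - 3875.0625)(4151.94 - 3943.84)}} \]

\[ r = \frac{76.8}{\sqrt{(76.6875)(208.1)}} \]

\[ r = \frac{76.8}{\sqrt{15958.67}} \]

\[ r = \frac{76.8}{126.33} \]

r = 0.61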
As with the scatter plot, the direction of the obtained value is positive. Therefore, there is a positive
relationship between the number of hours on the lathe machine and the income per month.
Microsoft Excel can also be used to generate the Pearson correlation coefficient.
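
As an alternative to a spreadsheet, the same computation can be sketched in Python. This is a minimal illustration, assuming the twelve monthly values from the table; the function name `pearson_r` and the variable names are not from the module. The function follows the sums-based formula for Pearson's r column by column.

```python
import math

# Hours on the lathe machine (X) and income in hundred thousands of pesos (Y),
# January to December, copied from the table.
hours = [6.0, 4.5, 5.75, 6.25, 4.0, 4.75, 6.25, 5.50, 5.0, 4.50, 4.50, 5.25]
income = [6.00, 5.50, 4.00, 5.00, 3.75, 4.50, 8.00, 6.60, 4.95, 3.90, 4.60, 6.00]

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the column totals of the table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(hours, income)
print(round(r, 2))  # 0.61
```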

Lesson 3: Simple Linear Regression Analysis

Simple linear regression analysis is slightly different from linear correlation analysis. The aim of
linear regression analysis is to develop an equation that describes the relationship between variables.
Simple linear regression analysis that follows from a linear correlation analysis makes use of the coefficient
of determination, r². It is the percentage of the variation in the dependent variable that is explained by the
independent variable. In other words, it tells how much of the variance in the values of one variable can be
explained by the values of another variable. The value of r² ranges from 0 to 1, which means it is never
negative. It is obtained by simply taking the square of the correlation coefficient.

Simple linear regression analysis seeks to develop an equation that will predict future values of the
dependent variable from values of the independent variable. For this lesson, the discussion is only on one
dependent variable and one independent variable, hence the term "simple".
The regression line or prediction line is drawn on the scatter plot. It is given by:
y = a + bx
Where
y = predicted value of the dependent variable y
a = intercept of the regression line
x = value of the independent variable
b = slope of the regression line

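The values for a and b are given below:

\[ a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^2 - \left(\sum X\right)^2} \]

\[ b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2} \]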

Note that before simple linear regression is done, a linear relationship between the two variables must
first be established.

Example:
To illustrate, refer to the number of hours on the lathe machine and the income per month of a year.
It was already established that there is a positive relationship between the two variables. Therefore, a
regression line can be developed for the bivariate data.

Month        X       Y       XY        X²         Y²
January      6.0     6.00    36.00     36.00      36.00
February     4.5     5.50    24.75     20.25      30.25
March        5.75    4.00    23.00     33.0625    16.00
April        6.25    5.00    31.25     39.0625    25.00
May          4.0     3.75    15.00     16.00      14.0625
June         4.75    4.50    21.375    22.5625    20.25
July         6.25    8.00    50.00     39.0625    64.00
August       5.50    6.60    36.30     30.25      43.56
September    5.0     4.95    24.75     25.00      24.5025
October      4.50    3.90    17.55     20.25      15.21
November     4.50    4.60    20.70     20.25      21.16
December     5.25    6.00    31.50     27.5625    36.00
Total        ΣX = 62.25    ΣY = 62.8    ΣXY = 332.175    ΣX² = 329.3125    ΣY² = 345.995

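Using the formulas for a and b with n = 12, the following are obtained:

Solve for a:

\[ a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^2 - \left(\sum X\right)^2} \]

\[ a = \frac{(62.8)(329.3125) - (62.25)(332.175)}{12(329.3125) - (62.25)^2} \]

\[ a = \frac{20680.825 - 20677.89375}{3951.75 - 3875.0625} \]

\[ a = \frac{2.93125}{76.6875} \]

a = 0.038

Solve for b:

\[ b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2} \]

\[ b = \frac{(12)(332.175) - (62.25)(62.8)}{(12)(329.3125) - (62.25)^2} \]

\[ b = \frac{3986.1 - 3909.3}{3951.75 - 3875.0625} \]

\[ b = \frac{76.8}{76.6875} \]

b = 1.001
Therefore, the regression line is y = 0.038 + 1.001x.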

Microsoft Excel can also be used to generate the values of a and b.
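
The intercept and slope can also be computed with a few lines of Python. The sketch below is an illustration rather than the module's own procedure; `fit_line` is a hypothetical helper that applies the sums-based formulas for a and b to the twelve monthly values from the table.

```python
# Hours on the lathe machine (X) and income in hundred thousands of pesos (Y),
# January to December, copied from the table.
hours = [6.0, 4.5, 5.75, 6.25, 4.0, 4.75, 6.25, 5.50, 5.0, 4.50, 4.50, 5.25]
income = [6.00, 5.50, 4.00, 5.00, 3.75, 4.50, 8.00, 6.60, 4.95, 3.90, 4.60, 6.00]

def fit_line(xs, ys):
    """Return (a, b) for the regression line y = a + bx, using column totals."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2   # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

a, b = fit_line(hours, income)
print(round(a, 3), round(b, 3))  # 0.038 1.001
```

Both formulas share the same denominator, so it is computed once and reused.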

This regression line is drawn on the scatterplot. This is shown below.



The regression line serves as the estimator. If the lathe machine is used for 4.25 hours, then the
estimated income is y = 0.038 + 1.001(4.25) ≈ 4.292 hundred thousand pesos, or about Php 429,225.00.
Since the correlation coefficient is 0.61, the coefficient of determination is 0.37. This means that
37% of the variation in the income per month is explained by the number of hours on the lathe machine.
The remaining 63% is due to other factors which are needed to explain the income per month of the fabrication shop.
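
A short Python sketch of the estimate and of the coefficient of determination follows. It is an illustration under two assumptions taken from the example: the rounded regression line y = 0.038 + 1.001x, and income measured in hundred thousands of pesos. The variable names are hypothetical.

```python
# Rounded values from the worked example.
a, b = 0.038, 1.001   # intercept and slope of the regression line
r = 0.61              # Pearson correlation coefficient

hours = 4.25
predicted_income = a + b * hours            # in hundred thousands of pesos
predicted_pesos = predicted_income * 100_000

r_squared = r ** 2                          # coefficient of determination
unexplained = 1 - r_squared                 # share left to other factors

print(round(predicted_income, 3))   # 4.292
print(round(predicted_pesos))       # 429225
print(round(r_squared, 2))          # 0.37
print(round(unexplained, 2))        # 0.63
```

The prediction is most trustworthy for hours inside the observed range of 4.0 to 6.25; the data say nothing about how the relationship behaves outside it.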

Note:
Simple linear regression analysis determines an equation that
predicts values of one variable against values of another variable. The
coefficient of determination also helps in determining the percent of
variation explained by the independent variable.

Self – Check Test

Find the areas of each of the following z-scores.


z-scores Area
1. 0.99 ____________________
2. – 0.52 ____________________
3. 0.66 ____________________
4. 1.87 ____________________
5. – 2.58 ____________________
6. 3.16 ____________________
7. – 0.12 ____________________
8. – 1.25 ____________________
9. 2.09 ____________________
10. 0.50 ____________________

Evaluation Activities
Solve the following. Show your solution.
1. A coffee machine dispenses coffee into 12-ounce cups. Tests show that the actual amount of coffee
dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz.
a. What percent of cups will receive less than 11.25 oz of coffee?
b. What percent of cups will receive between 11.2 oz and 11.55 oz of coffee?
c. If a cup is filled at random, what is the probability that the machine will overflow the cup?
2. Mr. Greco runs to keep himself physically fit. He wants to know if there is a relationship between the
time elapsed and the number of kilometers he ran.
Kilometers Run Time (in minutes)
10 90
10 60
16 150
16 90
21 160
21 180
a. Calculate the Pearson correlation coefficient.
b. Determine the regression line equation.

CHARLITA P. COLANGAN
Course Facilitator
