Lesson 4 MMW
UNIT 4: DATA MANAGEMENT
4.1. Introduction
In our day-to-day life, data such as numbers, words, patterns, and images are
everywhere. These data are gathered, recorded, evaluated, interpreted, and
eventually applied to make useful decisions. In other words, data are managed
to make work more efficient and life easier.
In this unit, we will study the basics of data management and use statistical
tools derived from mathematics that are useful in data management and
interpretation.
4.2.1. Data: Gathering and Organizing, Representing using Graphs and Charts,
Interpreting Organized Data
The study of data, known as statistics, is divided into two branches, as seen
in the figure below.
Branches of Statistics
Descriptive Statistics
It is the branch of statistics that involves organizing, displaying, and describing data.
3 | Mathematics in the Modern 2
World
In statistics, the term population is any specific collection of objects of interest.
A sample is any subset or subcollection of the population.
If the sample consists of the whole population, it is termed a census.
Data
Consider the following examples. These are taken from the Eastern Visayas
COVID-19 Cases Bulletin issued by the Department of Health – Eastern
Visayas on Tuesday, November 10, 2020.
When the data are effectively organized and presented, we can interpret them
to guide us in making important decisions.
For example, based on the presented data, the highest number of COVID-19
cases is in the Province of Samar. To slow the increase in the number of cases,
we can suggest that the provincial authorities strengthen the implementation
of health protocols in that area.
Self-Assessment
1. Make at least ten (10) interpretations from the data presented
regarding the COVID-19 Cases in Eastern Visayas in this section.
2. Identify each of the following data sets as either a population or a
sample:
a. The grade point averages (GPAs) of all students at a college.
b. The GPAs of a randomly selected group of students on a
college campus.
c. The ages of the College of Engineering Faculty
d. The gender of every second customer who enters a movie
theater.
e. The lengths of short mackerel “hasa-hasa” caught on a
fishing trip to the beach.
3. Identify the following measures as either quantitative or
qualitative:
a. The 30 high-temperature readings of the last 30 days.
b. The scores of 40 students on an English test.
c. The blood types of 120 teachers in a senior high school.
d. The last four digits of social security numbers of all
students in a class.
e. The numbers on the jerseys of 5 basketball players on a
team.
where:
x = data
w = weight
Solution:
Mean
a. $\bar{x} = \frac{\sum x}{n} = \frac{18+15+21+16+15+14+15+21}{8} = \frac{135}{8} = 16.875$
b. $\bar{x} = \frac{\sum x}{n} = \frac{2+5+8+9+11+4+7}{7} = \frac{46}{7} \approx 6.571$
Median
Rank the numbers from smallest to largest. If the number of data values is
odd, the middle number is the median. If the number of data values is even,
the median is the mean of the two middle numbers.
a. 18, 15, 21, 16, 15, 14, 15, 21
Arrange: 14, 15, 15, 15, 16, 18, 21, 21
Number of data: 8 (even)
Middle numbers: 15 and 16
Get the mean of the two middle numbers:
$\bar{x} = \frac{\sum x}{n} = \frac{15 + 16}{2} = 15.5$
Median is 15.5.
b. 2, 5, 8, 9, 11, 4, 7
Arrange: 2, 4, 5, 7, 8, 9, 11
Number of data: 7 – odd
Middle number: 7 (the fourth of the seven ranked values)
Median is 7.
Mode
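The mode is the data value that occurs most frequently.
a. 18, 15, 21, 16, 15, 14, 15, 21
Arrange: 14, 15, 15, 15, 16, 18, 21, 21
The number 15 occurs three times, more often than any other number, so
the mode is 15.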
b. 2, 5, 8, 9, 11, 4, 7
Arrange: 2, 4, 5, 7, 8, 9, 11
Each number on the list occurs only once. Because no number occurs
more often than the others, there is no mode.
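The three measures worked out above can be checked with a short Python sketch. It uses the standard `statistics` module; the helper `modes` is a hypothetical name introduced here, and it assumes the convention used in this section that a list has no mode when every value occurs equally often.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent values, or [] when no value stands out."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all share the same count, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


list_a = [18, 15, 21, 16, 15, 14, 15, 21]
list_b = [2, 5, 8, 9, 11, 4, 7]

for name, data in (("a", list_a), ("b", list_b)):
    print(name, round(mean(data), 3), median(data), modes(data))
```

For list a this reports a mean of 16.875, a median of 15.5, and a mode of 15; for list b it reports a mean of 6.571, a median of 7, and an empty list of modes.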
Example: Weighted Mean
2. Calculate Trimm’s GPA from the grades below. Use the weighted mean
formula, with the units of each course as the weights:
Course    Units    Grade
Math 1    4        2.4
Chem 1    4        2.1
GE 3      3        1.9
GE 10     3        1.7
Solution:
$\bar{x} = \frac{\sum wx}{\sum w}$
$\bar{x} = \frac{(4 \cdot 2.4) + (4 \cdot 2.1) + (3 \cdot 1.9) + (3 \cdot 1.7)}{4+4+3+3} = \frac{28.8}{14} \approx 2.06$
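The weighted mean computation for Trimm's grades can be sketched in Python as follows. The function name `weighted_mean` is an illustrative choice, not something defined in this module; the grades and units are taken from the table in the example.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grades = [2.4, 2.1, 1.9, 1.7]  # Math 1, Chem 1, GE 3, GE 10
units = [4, 4, 3, 3]           # the weight of each grade

gpa = weighted_mean(grades, units)
print(round(gpa, 2))
```

Rounded to two decimal places, the result agrees with the GPA of 2.06 obtained by hand.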
Data that has not been organized or manipulated in any manner is called raw
data. A large collection of raw data may not provide much pertinent
information that can be readily observed. A frequency distribution, which is a
table that lists observed events and the frequency of occurrence of each
observed event, is often used to organize raw data.
For instance, consider the following table, which lists the number of laptop
computers owned by families in each of 40 homes in a subdivision.
Table 4.1
The frequency distribution in Table 4.2 below was constructed using the data
from Table 4.1. The first column of the frequency distribution consists of the
numbers 0, 1, 2, 3, 4, 5, 6, and 7. The corresponding frequency of occurrence, f,
of each of the numbers in the first column is listed in the second column.
Table 4.2
The formula for a weighted mean can be used to find the mean of the data in a
frequency distribution. The only change is that the weights w1, w2, w3, ..., wn are
replaced with the frequencies f1, f2, f3, ..., fn.
To find the mean of the data in Table 4.2, we use the formula for the weighted
mean with the frequencies as the weights. The mean number of laptop
computers per household for the homes in the subdivision is 1.975.
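As a rough illustration of how a frequency distribution is built from raw data and then used to find the mean, consider the Python sketch below. The ten raw values are hypothetical and are not the data of Table 4.1; `frequency_mean` is likewise an illustrative name.

```python
from collections import Counter

# Hypothetical raw data: number of laptops in each of 10 homes
# (NOT the 40 homes of Table 4.1).
raw = [2, 0, 1, 2, 3, 1, 2, 1, 0, 2]

# Frequency distribution: observed value -> frequency of occurrence f
freq = Counter(raw)


def frequency_mean(distribution):
    """Weighted mean in which the frequencies play the role of the weights."""
    total = sum(distribution.values())
    return sum(x * f for x, f in distribution.items()) / total


for value in sorted(freq):
    print(value, freq[value])
print(frequency_mean(freq))
```

The mean computed from the frequency distribution is the same as the ordinary mean of the raw values, which is why the weighted mean formula applies.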
Self-Assessment
18. A housing division consists of 45 homes. The following frequency distribution shows the n
19. Find the mean, median and mode for the data in the following lists. a. 3, 3, 3, 3, 3, 4, 4, 5, 5,
b. 12, 34, 12, 71, 48, 93, 71
4.2.3. Measures of Dispersions: Range, Standard Deviation and Variance
In the previous section, we were introduced to three types of average values
for a data set: the mean, the median, and the mode. These measures of
central tendency describe only the central position, or average, of a given set
of data. They do not reflect the spread, or dispersion, of the data.
Measures of Dispersion
Example: Range
3. Find the range of the given data in the table.
Solution:
Machine 1
Max (x) = 10.07
Min (x) = 5.85
Range (x) = Max (x) – Min (x) = 10.07 – 5.85 = 4.22
Machine 2
Max (x) = 8.03
Min (x) = 7.95
Range (x) = Max (x) – Min (x) = 8.03 – 7.95 = 0.08
Step 2: For each number, calculate the deviation between the number and the
mean.
Step 3: Calculate the square of each of the deviations in Step 2, and find the
sum of these squared deviations.
Calculation of variance
Since the standard deviation is known, the variance is the square of the
standard deviation:
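The procedure can be sketched in Python as shown below. This is a sketch under stated assumptions: it uses the usual definitions, in which the sum of squared deviations is divided by n − 1 for a sample (or by n for a population) and the standard deviation is the square root of that quotient. The data values and the function name `dispersion` are hypothetical.

```python
from math import sqrt
from statistics import stdev


def dispersion(data, sample=True):
    """Return (range, standard deviation, variance) of the data."""
    n = len(data)
    mean = sum(data) / n                          # mean of the data
    deviations = [x - mean for x in data]         # Step 2: deviations
    sum_squares = sum(d * d for d in deviations)  # Step 3: squared deviations
    # Assumed convention: divide by n - 1 for a sample, by n for a population.
    variance = sum_squares / (n - 1 if sample else n)
    return max(data) - min(data), sqrt(variance), variance


scores = [2, 4, 7, 12, 15]  # hypothetical data, not from this module

data_range, std_dev, variance = dispersion(scores)
print(data_range, round(std_dev, 3), variance)
```

The standard deviation returned here matches `statistics.stdev` from the Python standard library, and squaring it gives back the variance, as stated above.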
Self-Assessment
A student has the following quiz scores: 5, 8, 16, 17, 18, 20. Find the standard deviation, ra
A consumer group has tested a sample of 8 size-D batteries from each of 3 companies. Th
table.
(a.) Compute for the standard deviation for each company.
(b.) According to these tests, which company produces batteries for
which the values representing hours of constant use have the smallest standard devia
4.2.4. Measures of Relative Position: Z-Scores, Percentiles, Quartiles, and
Box-and-Whiskers Plots
Measures of relative position are conversions of values that show where a
given specific value stands in relation to other values of the same grouping.
4.2.4.1 Z-Scores
Consider an Internet site that offers movie downloads. Based on data kept by
the site, an estimate of the mean time to download a certain movie is 12 min
with a standard deviation of 4 min.
When you download this movie, the download takes 20 min, and you think
that is an unusually long time for the download. On the other hand, when
your friend downloads the movie, the download takes only 6 min, and your
friend is pleasantly surprised at how quickly she receives the movie. The
point here is that, in each case, a data value far from the mean is unexpected.
The graph below shows the download times for this movie using two
different measures: the number of minutes a download time is from the mean
and the number of standard deviations the download time is from the mean.
Measuring the distance of a data value from the mean in standard deviation
units instead of in the units of the data (minutes in this example) is quite
useful. The number of standard deviations a data value is from the mean is
known as its z-score or standard score.
Example: Z-Score
5. Raul has taken two tests in his chemistry class. He scored 72 on the first
test, for which the mean of all scores was 65 and the standard deviation
was 8. He received a 60 on a second test, for which the mean of all
scores was 45 and the standard deviation was 12. In comparison to the
other students, did Raul do better on the first test or the second test?
Given:
First test: $x = 72$, $\bar{x} = 65$, $s = 8$
Second test: $x = 60$, $\bar{x} = 45$, $s = 12$
Solution:
$z_{\text{1st test}} = \frac{x - \bar{x}}{s} = \frac{72 - 65}{8} = 0.875$
$z_{\text{2nd test}} = \frac{x - \bar{x}}{s} = \frac{60 - 45}{12} = 1.25$
Raul scored 0.875 standard deviation above the mean on the first test
and 1.25 standard deviations above the mean on the second test. These
z-scores indicate that, in comparison to his classmates, Raul scored
better on the second test than he did on the first test.
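The z-score computations in this subsection can be reproduced with a few lines of Python. The function name `z_score` is an illustrative choice; the numbers come from Raul's two tests and from the movie download example (mean 12 min, standard deviation 4 min).

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Raul's two chemistry tests
z_first = z_score(72, mean=65, std_dev=8)
z_second = z_score(60, mean=45, std_dev=12)

# Movie download times
z_slow = z_score(20, mean=12, std_dev=4)  # your 20-minute download
z_fast = z_score(6, mean=12, std_dev=4)   # your friend's 6-minute download

print(z_first, z_second, z_slow, z_fast)
```

A positive z-score marks a value above the mean and a negative z-score a value below it, so the 6-minute download has a negative z-score.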
4.2.4.2 Percentiles
Most standardized examinations provide scores in terms of percentiles, which
are defined as follows:
The following formula can be used to find the percentile that corresponds to a
particular data value in a set of data.
Example: Percentile
6. On a reading examination given to 900 students, Elaine’s score of 602
was higher than the scores of 576 of the students who took the
examination. What is the percentile for Elaine’s score?
Solution:
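A minimal Python sketch of this example follows. It assumes the usual textbook formula, in which the percentile of a score x is the number of data values less than x divided by the total number of data values, multiplied by 100. The function name `percentile_of` is hypothetical.

```python
def percentile_of(number_below, total):
    """Percentile of a value, given how many data values lie below it."""
    return 100 * number_below / total


# Elaine's score was higher than 576 of the 900 scores.
elaine = percentile_of(576, 900)
print(elaine)
```

Under that assumption, Elaine's score of 602 is at the 64th percentile.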
4.2.4.3 Quartiles
The three numbers Q1, Q2, and Q3 that partition a ranked data set into four
(approximately) equal groups are called the quartiles of the data. For instance,
for the data set below, the values Q1 = 11, Q2 = 29, and Q3 = 104 are the
quartiles of the data.
The quartile Q1 is called the first quartile. The quartile Q2 is called the second
quartile. It is the median of the data. The quartile Q3 is called the third quartile.
Solution:
Step 3: There are 12 data values less than the median and 12 data values
greater than the median.
The first quartile is the median of the data values less than the median.
Thus, Q1 is the mean of the data values with ranks of 6 and 7.
$Q_1 = \frac{39 + 39}{2} = 39$
The third quartile is the median of the data values greater than the
median. Thus, Q3 is the mean of the data values with ranks of 19 and 20.
$Q_3 = \frac{50 + 53}{2} = 51.5$
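The median-of-halves method used above can be sketched in Python as follows. The data set below is hypothetical (the ranked list of this example is not reproduced here), and `quartiles` is an illustrative name. The sketch assumes at least two data values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(data)
    half = len(ranked) // 2
    lower = ranked[:half]  # values ranked below the median
    # With an odd count, the median itself belongs to neither half.
    upper = ranked[half + 1:] if len(ranked) % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)


values = [3, 5, 7, 8, 12, 13, 14, 18, 21]  # hypothetical data

q1, q2, q3 = quartiles(values)
five_number_summary = (min(values), q1, q2, q3, max(values))
print(five_number_summary)
```

The five numbers printed at the end (minimum, Q1, Q2, Q3, maximum) are the values a box-and-whisker plot displays.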
4.2.4.4 Box-and-Whiskers Plots
A box-and-whisker plot (sometimes called a box plot) is often used to
provide a visual summary of a set of data. A box-and-whisker plot shows the
median, the first and third quartiles, and the minimum and maximum values
of a data set. See the figure below.
Solution:
For the data set in Example 7, we determined that Q1 = 39, Q2 = 43, and
Q3 = 51.5. The minimum data value for the data set is 26, and the
maximum data value is 73. Thus, the box-and-whisker plot is as shown
on the next page.
Self-Assessment
22. Roland received a score of 70 on a test for which the mean score was
65.5. Roland has learned that the z-score for his test is 0.6. What is the standard deviation
On an examination given to 8600 students, Hal’s score of 405 was higher than the scores
The following table lists the weights, in ounces, of 15 avocados in
a random sample. Find the quartiles for the data.
The following rule, called the Empirical Rule, describes the percents of data
that lie within 1, 2, and 3 standard deviations of the mean in a normal
distribution.
To understand the empirical rule much better, watch the video entitled Empirical Rule (68-
Given:
𝜇 = 0.61 𝑙𝑏 𝜎 = 0.15 𝑙𝑏
Solution:
1. 0.76 lb. is 1 standard deviation above the mean of 0.61 lb.
𝜇 + 𝜎 = 0.61 𝑙𝑏 + 0.15 𝑙𝑏 = 0.76 𝑙𝑏
In a normal distribution, 34% of all data lie between the mean 𝜇 and
1 standard deviation above the mean (𝜇 + 𝜎), and 50% of all data lie
below the mean. Thus, 34% + 50% = 84% of the tomatoes weigh less
than 0.76 lb.
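2. In a shipment of 6000 tomatoes, how many tomatoes can be expected to
weigh more than 0.31 lb.?
0.31 lb. is 2 standard deviations below the mean of 0.61 lb.
𝜇 − 2𝜎 = 0.61 𝑙𝑏 − 2(0.15 𝑙𝑏) = 0.31 𝑙𝑏
In a normal distribution, 47.5% of all data lie between 𝜇 − 2𝜎 and the
mean 𝜇, and 50% of all data lie above the mean.
This gives a total of 97.5% of the tomatoes that weigh more than
0.31 lb. Therefore, 97.5% ∙ 6000 = 5850 tomatoes can be expected to
weigh more than 0.31 lb.
Likewise, 0.31 lb. and 0.91 lb. are each 2 standard deviations from the
mean, and 95% of all data lie within 2 standard deviations of the mean.
This gives a total of 95% of the tomatoes that weigh between 0.31 lb.
and 0.91 lb. Therefore, in a shipment of 4500 tomatoes,
95% ∙ 4500 = 4275 tomatoes can be expected to weigh in this range.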
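The tomato computations can be checked with the short Python sketch below. It hard-codes the Empirical Rule percentages (68%, 95%, and 99.7% within 1, 2, and 3 standard deviations), so it only handles whole numbers of standard deviations. The names `WITHIN` and `one_sided_share` are illustrative assumptions.

```python
# Approximate share of normally distributed data within k standard
# deviations of the mean, according to the Empirical Rule.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}


def one_sided_share(k):
    """Share of data below mean + k*sigma (equivalently, above mean - k*sigma)."""
    return 0.5 + WITHIN[k] / 2


mu, sigma = 0.61, 0.15  # mean and standard deviation of tomato weights, in lb

k_upper = round((0.76 - mu) / sigma)  # 0.76 lb is 1 standard deviation above
k_lower = round((mu - 0.31) / sigma)  # 0.31 lb is 2 standard deviations below

share_below_076 = one_sided_share(k_upper)
count_above_031 = round(one_sided_share(k_lower) * 6000)
count_in_range = round(WITHIN[2] * 4500)

print(round(share_below_076, 2), count_above_031, count_in_range)
```

The counts are numbers of tomatoes, not pounds: about 5850 of 6000 tomatoes weigh more than 0.31 lb, and about 4275 of 4500 weigh between 0.31 lb and 0.91 lb.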
4.2.5.1. The Standard Normal Distribution
It is often helpful to convert data values x to z-scores, as we did in the
previous section by using the z-score formulas:
Tables and calculators are often used to determine the area under a portion of
the standard normal curve. We will refer to this type of area as an area of the
standard normal distribution.
Table 4.3 gives the approximate areas of the standard normal distribution
between the mean 0 and z standard deviations from the mean. (See figure
beside Table 4.3). Table 4.3 indicates that the area A of the standard normal
distribution from the mean 0 up to z = 1.34 is 0.410 square unit.
Table 4.3
Because the standard normal distribution is symmetrical about the mean of 0, we
can also use Table 4.3 to find the area of a region that is located to the left of the
mean. This process is explained in the Example below.
Solution:
Because the standard normal distribution is symmetrical about the center
line z = 0, the area of the standard normal distribution between z = -1.44
and z = 0 is equal to the area between z = 0 and z = 1.44.
Solution:
Table 4.3 indicates that the area from z = 0 to z = 0.82
is 0.294 square unit. The area to the right of z = 0 is
0.500 square unit. Thus, the area to the right of z =
0.82 is 0.500 - 0.294 = 0.206 square unit.
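Areas of the standard normal distribution can also be computed directly instead of being read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names `area_from_mean` and `area_right_of` are hypothetical, and results are rounded to three decimal places to mirror a three-decimal table.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_from_mean(z):
    """Area between the mean 0 and z; symmetric, so the sign of z is irrelevant."""
    return abs(STANDARD.cdf(z) - 0.5)


def area_right_of(z):
    """Area of the standard normal distribution to the right of z."""
    return 1 - STANDARD.cdf(z)


print(round(area_from_mean(1.34), 3))   # mean up to z = 1.34
print(round(area_from_mean(-1.44), 3))  # z = -1.44 up to the mean
print(round(area_right_of(0.82), 3))    # right of z = 0.82
```

The first and third values agree with the table-based results above (0.410 and 0.206 square unit), and the second illustrates the symmetry argument for negative z.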
Example: Solve an Application
12. A soda machine dispenses soda into 12-ounce cups. Tests show that the actual
amount of soda dispensed is normally distributed, with a mean of 11.5 oz and
a standard deviation of 0.2 oz.
a. What percent of cups will receive less than 11.25 oz of soda?
b. What percent of cups will receive between 11.2 oz and 11.55 oz of soda?
c. If a cup is chosen at random, what is the probability that the machine
will overflow the cup?
Solution:
a. Recall that the formula for the z-score for a data value x is
$z_x = \frac{x - \bar{x}}{s}$
The z-score for 11.25 oz is
$z_{11.25} = \frac{11.25 - 11.5}{0.2} = -1.25$
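Table 4.3 indicates that 0.394 (39.4%) of the data in a normal
distribution are between z = 0 and z = 1.25. By symmetry, 39.4% of the
data are also between z = -1.25 and z = 0. The area to the left of z = 0 is
0.500, so the area to the left of z = -1.25 is 0.500 - 0.394 = 0.106.
Thus, 10.6% of the cups filled by the soda machine will receive less than
11.25 oz of soda.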
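b. The z-score for 11.2 oz is $z_{11.2} = \frac{11.2 - 11.5}{0.2} = -1.5$, and the
z-score for 11.55 oz is $z_{11.55} = \frac{11.55 - 11.5}{0.2} = 0.25$.
Table 4.3 indicates that 0.433 (43.3%) of the data in a normal distribution
are between z = 0 and z = 1.5. Because the normal distribution is
symmetrical, 43.3% of the data are also between z = 0 and z = -1.5. The area
between z = 0 and z = 0.25 is 0.099 (9.9%).
Thus, the percent of the cups that the vending machine will fill with
between 11.2 oz and 11.55 oz of soda is 43.3% + 9.9% = 53.2%.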
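c. A 12-ounce cup overflows when it receives more than 12 oz of soda. The
z-score for 12 oz is $z_{12} = \frac{12 - 11.5}{0.2} = 2.5$. The area between z = 0 and
z = 2.5 is 0.494, so the area to the right of z = 2.5 is 0.500 - 0.494 = 0.006.
Thus, 0.6% of the time the machine produces an overflow, and the
probability that a cup chosen at random will overflow is 0.006.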
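The three parts of this application can be checked without a table by modelling the amount dispensed as a normal distribution. The sketch below is an illustration that uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later), with the mean of 11.5 oz and standard deviation of 0.2 oz given in the problem; the variable names are arbitrary.

```python
from statistics import NormalDist

# Amount of soda dispensed, in ounces
soda = NormalDist(mu=11.5, sigma=0.2)

# a. Share of cups that receive less than 11.25 oz
less_than_11_25 = soda.cdf(11.25)

# b. Share of cups that receive between 11.2 oz and 11.55 oz
between = soda.cdf(11.55) - soda.cdf(11.2)

# c. Probability that a 12-ounce cup overflows (more than 12 oz dispensed)
overflow = 1 - soda.cdf(12)

print(round(less_than_11_25, 3), round(between, 3), round(overflow, 3))
```

Rounded to three decimal places, these values agree with the table-based answers of 10.6%, 53.2%, and 0.006.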
Self-Assessment
Find the area of the standard normal distribution between z = -0.67 and z = 0.
Find the area of the standard normal distribution to the left of z = -1.47.
A study of the careers of professional football players shows that the lengths of their careers are
years and a standard deviation of 1.8 years.
What percent of professional football players have a career of more than 9 years?
If a professional football player is chosen at random, what is the probability that the player will
4.2.6. Linear Regression and Correlation, Least-Squares Line, and Linear
Correlation Coefficient
In many applications, scientists try to determine whether two variables are
related. If they are related, the scientists then try to find an equation that can
be used to model the relationship.
The results of these experiments tended to support the idea that the speed y of
an animal is related to the animal’s stride length x. To better understand this
relationship, examine the data in Table 4.4, which are similar to, but less
extensive than, the data collected by Alexander and Jayes.
A graph of the ordered pairs in Table 4.4 is shown in the figure at the right.
In this graph, which is called a scatter diagram or scatter plot, the x-axis
represents the stride lengths in meters and the y-axis represents the average
speeds in meters per second.
After a relationship between paired data, which are referred to as bivariate
data, has been discovered, a scientist tries to model the relationship with an
equation.
The least-squares regression line is also called the least-squares line. The
approximate equation of the least-squares line for the bivariate data for the
dogs is 𝑦̂ = 3.2𝑥 − 1.1
In the figure at the right, the vertical deviations from the ordered pairs to
the graph of 𝑦̂ = 3.2𝑥 − 1.1 are 0, -0.06, 0.5, -0.52, -0.16, -0.6, 0.34, and 0.2.
The next formula can be used to determine the equation of the least-squares
line for a given set of ordered pairs.
In the formula for the least-squares regression line, ∑ 𝑥 represents the sum of
all the x values, ∑ 𝑦 represents the sum of all the y values, and ∑ 𝑥𝑦 represents
the sum of the n products x1y1, x2y2, ..., xnyn.
The notation 𝑥̅ represents the mean of the x values, and 𝑦̅ represents the mean
of the y values.
Solution:
The ordered pairs are (2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6), (3.8, 7.0), (4.0, 7.7),
(4.2, 8.3), (4.5, 8.7). The number of ordered pairs is n = 8.
Organize the data in four columns, as shown in table below. Then, find the
sum of each column.
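The column sums described above lead directly to the slope and intercept of the least-squares line. The Python sketch below assumes the standard least-squares formulas for a line ŷ = ax + b, namely a = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) and b = ȳ − a·x̄; the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(pairs):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)  # b = y-bar - a * x-bar
    return a, b


# (stride length in m, average speed in m/s) for an adult man
pairs = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
         (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]

a, b = least_squares_line(pairs)
print(round(a, 1), round(b, 1))
```

Rounded to the nearest tenth, the slope and intercept give the line 𝑦̂ = 2.7𝑥 − 3.3.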
Example: Use Least-Squares Line to Make Predictions
14. Use the equation of the least-squares line from Example 13 to predict the
average speed of an adult man for each of the following stride lengths. Round
your results to the nearest tenth of a meter per second.
a. 2.8 m
b. 4.8 m
Solution:
a. In Example 13, we found the equation of the least-squares line to be
𝑦̂ = 2.7𝑥 − 3.3. Substituting 2.8 for x gives:
𝑦̂ = 2.7(2.8) − 3.3 = 4.26
Rounding 4.26 to the nearest tenth produces 4.3. Thus, 4.3 m/s is the
predicted average speed for an adult man with a stride length of 2.8 m.
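Both predictions can be obtained by substituting into the equation of the least-squares line, as in the sketch below. The function name `predict_speed` is an illustrative choice, and the default slope and intercept are the rounded values from Example 13.

```python
def predict_speed(stride_m, slope=2.7, intercept=-3.3):
    """Predicted average speed (m/s) from y-hat = 2.7x - 3.3."""
    return slope * stride_m + intercept


for stride in (2.8, 4.8):
    print(stride, round(predict_speed(stride), 1))
```

Rounded to the nearest tenth, the predictions are 4.3 m/s for a stride of 2.8 m and 9.7 m/s for a stride of 4.8 m. The second stride length lies outside the range of the data used to fit the line (2.5 m to 4.5 m), so that prediction is an extrapolation and should be treated with more caution.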
If the linear correlation coefficient r is positive, the relationship between the
variables has a positive correlation. In this case, if one variable increases, the
other variable also tends to increase. If r is negative, the linear relationship
between the variables has a negative correlation. In this case, if one variable
increases, the other variable tends to decrease.
The figures below show some scatter diagrams along with the type of linear
correlation that exists between the x and y variables. The closer |r| is to 1, the
stronger the linear relationship between the variables.
Example: Find the Linear Correlation Coefficient
15. Find the linear correlation coefficient for stride length versus speed of an
adult man. Use the data in Table 4.4a. Round your result to the nearest
hundredth.
Solution:
The ordered pairs are
(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6), (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7).
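A Python sketch of the computation for these ordered pairs follows. It assumes the standard formula for the linear correlation coefficient, r = (n∑xy − ∑x∑y) / (√(n∑x² − (∑x)²) · √(n∑y² − (∑y)²)); the function name `linear_correlation` is hypothetical.

```python
from math import sqrt


def linear_correlation(pairs):
    """Return the linear correlation coefficient r for (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


pairs = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
         (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]

r = linear_correlation(pairs)
print(round(r, 2))
```

The result rounds to r ≈ 0.99, which is very close to 1 and indicates a strong positive linear correlation between stride length and speed.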
Self-Assessment
18. Find the equation of the least-squares line for the stride length and
speed of camels given in Table 4.4c.
Use the equation of the least-squares line from Self-Assessment No.18 to predict the av
2.7 m
4.5 m
Find the linear correlation coefficient for stride length versus speed of a camel as giv
hundredth.
Assessment
Exercise No.4:
Answer all the self-assessment questions in this unit, and write or encode
your answers on a sheet of bond paper.
Quiz No. 4:
1. How do we organize and present data?
2. Why do we have to organize and present data?
3. A consumer testing agency has tested the strengths of 3 brands of 1/8-
inch rope. The results of the tests are shown in the following table.
According to the sample test results, which company produces 1/8-
inch rope for which the breaking point has the smallest standard
deviation?
4. Calculate the variance for each company in problem no.3.
4.3. References
Aufmann, R., Lockwood, J., Nation, R., et al. (2018). Mathematics in the Modern
World, Philippine Edition. Rex Bookstore.
Aufmann, R., Lockwood, J., Nation, R., and Clegg, D. (2013). Mathematical
Excursions, 3rd Edition. Cengage Learning.
4.4. Acknowledgement
The images, tables, figures and information contained in this module were
taken from the references cited above.