Lesson 4 MMW

The document discusses data management and statistical tools used for data interpretation. It covers topics like gathering and organizing data using graphs and charts, measures of central tendency including mean, median and mode, and calculating weighted mean. Examples are provided to demonstrate calculating mean, median, mode and weighted mean of different data sets.

Mathematics in the Modern World
UNIT 4: DATA MANAGEMENT

4.0. Intended Learning Outcomes


By the end of this unit you should be able to:
a. Discuss various processes on how to organize, manage, and
interpret numerical data.
b. Solve problems using different statistical tools.
c. Solve mathematical problems using linear regressions and correlations.

4.1. Introduction
In our day-to-day life, data such as numbers, words, patterns, and
images are everywhere. These data are gathered, recorded, evaluated,
interpreted, and eventually applied to make useful decisions. In other
words, data are managed so that decisions become more efficient and life becomes easier.

This process of data management falls under the branch of science known as
statistics. By definition, statistics is the study of data; it involves the
collection, presentation, organization, evaluation, and interpretation of data.

In this unit, we will study the basics of data management and use statistical
tools derived from mathematics that are useful in data management and
interpretation.

4.2.1. Data: Gathering and Organizing, Representing using Graphs and Charts,
Interpreting Organized Data
The study of data, known as statistics, is divided into two branches, as seen in
the figure below.

Branches of Statistics

Descriptive Statistics: It is the branch of statistics that involves organizing, displaying, and describing data.
Inferential Statistics: It is the branch of statistics that involves drawing con…
In statistics, the term population is any specific collection of objects of interest.
A sample is any subset or subcollection of the population.

If the sample consists of the whole population, it is called a census.

General Distinction of Data

Qualitative Data: These are data for which there is no natural numerical scale, but which consist of attributes, l…
Quantitative Data: These are data in numerical measurements that aris…

Once data have been gathered, they are organized and presented in ways that
make them easy to understand. They can be presented in charts,
graphs, tables, etc.

Consider the following examples, taken from the Eastern
Visayas COVID-19 Cases Bulletin issued by the Department of Health – Eastern
Visayas on Tuesday, November 10, 2020.

When the data is effectively organized and presented, we can now interpret
this data to guide us in making important decisions.
For example, based on the presented data, the highest number of COVID-19 cases is
in the Province of Samar. To slow the increase in the number of cases, we can
suggest that the provincial authorities strengthen the implementation of
health protocols in that area.

Self-Assessment
1. Make at least ten (10) interpretations from the data presented
regarding the COVID-19 cases in Eastern Visayas in this section.
2. Identify each of the following data sets as either a population or a
sample:
a. The grade point averages (GPAs) of all students at a college.
b. The GPAs of a randomly selected group of students on a
college campus.
c. The ages of the College of Engineering Faculty
d. The gender of every second customer who enters a movie
theater.
e. The lengths of short mackerel “hasa-hasa” caught on a
fishing trip to the beach.
3. Identify the following measures as either quantitative or
qualitative:
a. The 30 high-temperature readings of the last 30 days.
b. The scores of 40 students on an English test.
c. The blood types of 120 teachers in a senior high school.
d. The last four digits of social security numbers of all
students in a class.
e. The numbers on the jerseys of 5 basketball players on a
team.

4.2.2. Measures of Central Tendency: Mean, Median, Mode, Weighted Mean


One of the most basic statistical concepts involves finding measures of central
tendency of a set of numerical data. Measures of central tendency describe a
set of data by identifying the central position in the data as a single value.

Measures of Central Tendency: Mean, Median, Mode

Mean
• The mean ($\bar{x}$) is the sum of all values (e.g. $x_1, x_2, \ldots$) divided by the total number of values.
• It is affected by extreme numbers.
• May not exist as a data point in the set.
• Most stable measure.
• Symbol: for a population, $\mu$; for a sample, $\bar{x}$.

Formula for mean:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
where:
x = data
n = number of data

Formula for weighted mean:
$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wx}{\sum w}$$
where:
x = data
w = weight

Median
• The middle value when all values are placed in order.
• May not exist as a data point in the set.
• Influenced by the position of items but not by their values.
• The median of a ranked list of n numbers is:
  o the middle number if n is odd.
  o the mean of the two middle numbers if n is even.

Mode
• The most frequent data point.
• It exists as a data point.
• It is unaffected by extreme values.

Example: Mean, Median & Mode


1. Find the mean, median, and mode for the data:
a. 18, 15, 21, 16, 15, 14, 15, 21
b. 2, 5, 8, 9, 11, 4, 7

Solution:

Mean
a. $\bar{x} = \frac{\sum x}{n} = \frac{18+15+21+16+15+14+15+21}{8} = \frac{135}{8} = 16.875$
b. $\bar{x} = \frac{\sum x}{n} = \frac{2+5+8+9+11+4+7}{7} = \frac{46}{7} \approx 6.571$

Median

Rank the numbers from smallest to largest. If the number of data is odd,
the middle number is the median; if the number of data is even, the
median is the mean of the two middle numbers.
a. 18, 15, 21, 16, 15, 14, 15, 21
Arrange: 14, 15, 15, 15, 16, 18, 21, 21
Number of data: 8 – even
Get the mean of the two middle numbers, 15 and 16:
$\frac{15 + 16}{2} = 15.5$
Median is 15.5

b. 2, 5, 8, 9, 11, 4, 7
Arrange: 2, 4, 5, 7, 8, 9, 11
Number of data: 7 – odd
Middle number: 7 (the fourth of the seven ranked values)
Median is 7.

Mode
The mode is the data value that occurs most frequently.

a. 18, 15, 21, 16, 15, 14, 15, 21
Arrange: 14, 15, 15, 15, 16, 18, 21, 21
The number 15 occurs more often than the other numbers.
Mode is 15

b. 2, 5, 8, 9, 11, 4, 7
Arrange: 2, 4, 5, 7, 8, 9, 11
Each number on the list occurs only once. Because no number occurs
more often than the others, there is no mode.
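The three measures in this example can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the helper name `modes` is an assumption introduced here, and it follows the convention used above that a list in which no value occurs more often than the others has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    # If every distinct value has the same count, no value stands out: no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


data_a = [18, 15, 21, 16, 15, 14, 15, 21]
data_b = [2, 5, 8, 9, 11, 4, 7]

for label, data in (("a", data_a), ("b", data_b)):
    # Prints the mean (3 decimals), the median, and the list of modes.
    print(label, round(mean(data), 3), median(data), modes(data))
```

For list a this reproduces a mean of 16.875, a median of 15.5, and a mode of 15; for list b it gives a mean of about 6.571, a median of 7, and an empty list of modes.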
Example: Weighted Mean
2. Calculate Trimm’s GPA from the grades below. Use the weighted mean formula,
with the units of each course as the weights:

Course   Units   Grade
Math 1   4       2.4
Chem 1   4       2.1
GE 3     3       1.9
GE 10    3       1.7

Solution:

$$\bar{x} = \frac{\sum wx}{\sum w} = \frac{(4 \cdot 2.4) + (4 \cdot 2.1) + (3 \cdot 1.9) + (3 \cdot 1.7)}{4 + 4 + 3 + 3} = \frac{28.8}{14} \approx 2.06$$

Trimm’s GPA is 2.06.
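The weighted mean computation can be sketched in a few lines of Python. The function name `weighted_mean` is illustrative only; the grades and units are those of the example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grades = [2.4, 2.1, 1.9, 1.7]  # Math 1, Chem 1, GE 3, GE 10
units = [4, 4, 3, 3]           # the units act as the weights

gpa = weighted_mean(grades, units)
print(round(gpa, 2))  # rounds 28.8 / 14 to two decimal places
```

Courses with more units pull the GPA toward their grades, which is why the result differs slightly from the plain mean of the four grades.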

Data that has not been organized or manipulated in any manner is called raw
data. A large collection of raw data may not provide much pertinent
information that can be readily observed. A frequency distribution, which is a
table that lists observed events and the frequency of occurrence of each
observed event, is often used to organize raw data.

For instance, consider the following table, which lists the number of laptop
computers owned by families in each of 40 homes in a subdivision.

Table 4.1

The frequency distribution in Table 4.2 below was constructed using the data
from Table 4.1. The first column of the frequency distribution consists of the
numbers 0, 1, 2, 3, 4, 5, 6, and 7. The corresponding frequency of occurrence, f,
of each of the numbers in the first column is listed in the second column.
Table 4.2

The formula for a weighted mean can be used to find the mean of the data in a
frequency distribution. The only change is that the weights w1, w2, w3, ..., wn are
replaced with the frequencies f1, f2, f3, ..., fn.
To find the weighted mean of Table 4.2, we use the formula for weighted mean.

The mean number of laptop computers per household for the homes in the
subdivision is 1.975.
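Written out, replacing the weights with frequencies in the weighted mean formula gives the expression below. This is a sketch of the substitution described above; the individual frequencies of Table 4.2 are not reproduced here, so only the totals implied by the text (40 homes and a mean of 1.975) are used.

```latex
\bar{x} = \frac{\sum x f}{\sum f},
\qquad
\sum f = 40
\;\Rightarrow\;
\sum x f = 1.975 \times 40 = 79
```

In other words, the stated mean corresponds to a total of 79 laptop computers across the 40 homes.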

Self-Assessment
18. A housing division consists of 45 homes. The following frequency distribution shows the n…

19. Find the mean, median and mode for the data in the following lists.
a. 3, 3, 3, 3, 3, 4, 4, 5, 5,…
b. 12, 34, 12, 71, 48, 93, 71
4.2.3. Measures of Dispersions: Range, Standard Deviation and Variance
In the previous section, we were introduced to the three types of average values
for a data set: the mean, the median, and the mode. These measures of
central tendency describe only the central position or the average of the values of
a given set of data; they do not reflect the spread or dispersion of the data.

Now, we will introduce statistical values known as range, standard
deviation, and variance. These are identified as the measures of dispersion
since they are used to measure the spread or dispersion of data.

Measures of Dispersion: Range, Standard Deviation, Variance

Range
• The range of a set of data values is the difference between the greatest data value and the least data value.
• Unstable measure.

$$Range(x) = Max(x) - Min(x)$$
Where:
Max (x) = greatest data value
Min (x) = least data value

Standard Deviation
• It is the most stable measure.
• It is affected by all data values.
• If $x_1, x_2, x_3, \ldots, x_n$ is a population of n numbers with a mean of $\mu$, then the standard deviation of the population is
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$
• If $x_1, x_2, x_3, \ldots, x_n$ is a sample of n numbers with a mean of $\bar{x}$, then the standard deviation of the sample is
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Where in both cases:
n = number of data
x = data
$\bar{x}$ = mean of a sample
$\mu$ = mean of a population

Variance
• The variance for a given set of data is the square of the standard deviation of the data.
• For a population with a standard deviation of $\sigma$, the variance is $\sigma^2$.
• For a sample with a standard deviation of $s$, the variance is $s^2$.

Example: Range
3. Find the range of the given data in the table.

Solution:
Machine 1
Max (x) = 10.07
Min (x) = 5.85
Range (x) = Max (x) – Min (x) = 10.07 – 5.85 = 4.22

Machine 2
Max (x) = 8.03
Min (x) = 7.95
Range (x) = Max (x) – Min (x) = 8.03 – 7.95 = 0.08

Example: Standard Deviation & Variance


4. The following numbers were obtained by sampling a
population. 2, 4, 7, 12, 15
Find the standard deviation and the variance of the sample.
Solution:
Step 1: The mean of the five (5) sample data is:
$$\bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = 8$$
Step 2: For each number, calculate the deviation between the number and the
mean.

Step 3: Calculate the square of each of the deviations in Step 2, and find the
sum of these squared deviations.

Step 4: Because we have a sample of n = 5 values, divide the sum 118 by n - 1,
which is 4.
$$\frac{118}{4} = 29.5$$
Step 5: The standard deviation of the sample is
$$s = \sqrt{29.5} \approx 5.43139$$

To the nearest hundredth, the standard deviation is s ≈ 5.43.

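Calculation of variance
The variance is the square of the standard deviation, which is the value
obtained in Step 4 before taking the square root:

$$variance = s^2 = 29.5$$

Squaring the rounded value 5.43 instead gives (5.43)² ≈ 29.48, so the unrounded value from Step 4 should be used.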
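The five steps of this example can be traced with a short Python sketch. The variable names are illustrative; the data are the five sample values from the problem, and the divisor n - 1 is used because the data are a sample.

```python
from math import sqrt

sample = [2, 4, 7, 12, 15]
n = len(sample)

x_bar = sum(sample) / n                      # Step 1: sample mean
deviations = [x - x_bar for x in sample]     # Step 2: deviation of each value from the mean
squared = [d ** 2 for d in deviations]       # Step 3: squared deviations
variance = sum(squared) / (n - 1)            # Step 4: divide by n - 1 for a sample
std_dev = sqrt(variance)                     # Step 5: square root of the variance

print(deviations)
print(sum(squared), variance, round(std_dev, 2))
```

The deviations are -6, -4, -1, 4, and 7, their squares sum to 118, and dividing by 4 gives the variance 29.5 directly, without squaring a rounded standard deviation.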



Self-Assessment
A student has the following quiz scores: 5, 8, 16, 17, 18, 20. Find the standard deviation, ra
A consumer group has tested a sample of 8 size-D batteries from each of 3 companies. Th
table.

(a.) Compute the standard deviation for each company.
(b.) According to these tests, which company produces batteries for
which the values representing hours of constant use have the smallest standard devia

4.2.4. Measures of Relative Position: Z-Scores, Percentiles, Quartiles, and Box-and-Whiskers Plots
Measures of relative position are conversions of values that show where a
given specific value stands in relation to other values of the same grouping.

4.2.4.1 Z-Scores
Consider an Internet site that offers movie downloads. Based on data kept by
the site, an estimate of the mean time to download a certain movie is 12 min
with a standard deviation of 4 min.

When you download this movie, the download takes 20 min, and you think
that is an unusually long time for the download. On the other hand, when
your friend downloads the movie, the download takes only 6 min, and your
friend is pleasantly surprised at how quickly she receives the movie. The
point here is that, in each case, a data value far from the mean is unexpected.

The graph below shows the download times for this movie using two
different measures: the number of minutes a download time is from the mean
and the number of standard deviations the download time is from the mean.

Measuring the distance of a data value from the mean in standard deviation
units instead of in the units of the data (minutes in this example) is quite
useful. The number of standard deviations a data value is from the mean is
known as its z-score or standard score.

Example: Z-Score
5. Raul has taken two tests in his chemistry class. He scored 72 on the first
test, for which the mean of all scores was 65 and the standard deviation
was 8. He received a 60 on a second test, for which the mean of all
scores was 45 and the standard deviation was 12. In comparison to the
other students, did Raul do better on the first test or the second test?

Given:
First test: 𝑥 = 72 𝑥̅ = 65 𝑠 = 8
Second test: 𝑥 = 60 𝑥̅ = 45 𝑠 = 12

Required: z 1st test and z 2nd test

Solution:
$$z_{1st\ test} = \frac{x - \bar{x}}{s} = \frac{72 - 65}{8} = 0.875$$
$$z_{2nd\ test} = \frac{x - \bar{x}}{s} = \frac{60 - 45}{12} = 1.25$$
Raul scored 0.875 standard deviation above the mean on the first test
and 1.25 standard deviations above the mean on the second test. These
z-scores indicate that, in comparison to his classmates, Raul scored
better on the second test than he did on the first test.
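As a quick check, the comparison can be sketched in Python. The function name `z_score` is an illustrative assumption; it simply divides the distance from the mean by the standard deviation, as in the solution above.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


z_first = z_score(72, 65, 8)     # first test
z_second = z_score(60, 45, 12)   # second test

print(z_first, z_second)
print("second test" if z_second > z_first else "first test")
```

Although the raw score on the first test is higher, the larger z-score belongs to the second test, which is the basis for the conclusion above.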
4.2.4.2 Percentiles
Most standardized examinations provide scores in terms of percentiles, which
are defined as follows:

The following formula can be used to find the percentile that corresponds to a
particular data value in a set of data.

Example: Percentile
6. On a reading examination given to 900 students, Elaine’s score of 602
was higher than the scores of 576 of the students who took the
examination. What is the percentile for Elaine’s score?

Solution:

Elaine’s score of 602 places her at the 64th percentile.
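The result can be reproduced with a small Python sketch. It assumes the usual percentile formula, namely the number of data values less than the given value divided by the total number of data values, times 100; the function name `percentile_of` is hypothetical.

```python
def percentile_of(count_below, total):
    """Percentile of a value, given how many data values lie below it."""
    return 100 * count_below / total


# 576 of the 900 students scored lower than Elaine.
elaine = percentile_of(576, 900)
print(elaine)
```

Under that assumption the computation is 576 / 900 · 100 = 64, which agrees with the 64th percentile stated above.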

4.2.4.3 Quartiles
The three numbers Q1, Q2, and Q3 that partition a ranked data set into four
(approximately) equal groups are called the quartiles of the data. For instance,
for the data set below, the values Q1 = 11, Q2 = 29, and Q3 = 104 are the
quartiles of the data.
The quartile Q1 is called the first quartile. The quartile Q2 is called the second
quartile. It is the median of the data. The quartile Q3 is called the third quartile.

The following method of finding quartiles makes use of medians.

Example: Quartiles Using Medians


7. The following table lists the calories per 100 ml of 25 popular sodas.
Find the quartiles for the data.

Solution:

Step 1: Rank the data as shown in the following table

Step 2: The median of these 25 data values has a rank of 13.
Thus, the median is 43. The second quartile Q2 is the median of the data,
so Q2 = 43.

Step 3: There are 12 data values less than the median and 12 data values
greater than the median.

The first quartile is the median of the data values less than the median.
Thus, Q1 is the mean of the data values with ranks of 6 and 7.

$$Q_1 = \frac{39 + 39}{2} = 39$$
The third quartile is the median of the data values greater than the
median. Thus, Q3 is the mean of the data values with ranks of 19 and 20.

$$Q_3 = \frac{50 + 53}{2} = 51.5$$
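The median-based procedure used in this example can be sketched as a small Python function. The function name `quartiles` and the seven-value list are hypothetical illustrations (the soda data table is not reproduced here); the logic follows the steps above, where the median itself is left out of both halves when the number of data values is odd.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the medians of the lower and upper halves."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                                   # values below the median
    upper = ranked[half + 1:] if n % 2 else ranked[half:]   # values above the median
    return median(lower), median(ranked), median(upper)


hypothetical = [7, 1, 5, 3, 6, 2, 4]  # made-up data, for illustration only
print(quartiles(hypothetical))
```

With 25 values, `lower` holds ranks 1 to 12 and `upper` holds ranks 14 to 25, so Q1 is the mean of the values ranked 6 and 7 and Q3 is the mean of the values ranked 19 and 20, as in the example.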
4.2.4.4 Box-and-Whiskers Plots
A box-and-whisker plot (sometimes called a box plot) is often used to
provide a visual summary of a set of data. A box-and-whisker plot shows the
median, the first and third quartiles, and the minimum and maximum values
of a data set. See the figure below.

Example: Box-and-Whiskers Plot


8. Construct a box-and-whisker plot for the data set in Example 7.

Solution:

For the data set in Example 7, we determined that Q1 = 39, Q2 = 43, and
Q3 = 51.5. The minimum data value for the data set is 26, and the
maximum data value is 73. Thus, the box-and-whisker plot is as shown
on the next page.

Self-Assessment
22. Roland received a score of 70 on a test for which the mean score was
65.5. Roland has learned that the z-score for his test is 0.6. What is the standard deviation
On an examination given to 8600 students, Hal’s score of 405 was higher than the scores
The following table lists the weights, in ounces, of 15 avocados in
a random sample. Find the quartiles for the data.

25. Construct a box-and-whisker plot for the following data.

4.2.5. Normal Distributions


The normal distribution, also known as the Gaussian distribution or bell curve, is the
most important probability distribution in statistics since it fits many natural
phenomena. It is a probability function that describes how the values of a
variable are distributed. Data such as heights, blood pressure, measurement
error, and IQ scores are examples that follow this distribution.

A normal distribution forms a bell-shaped curve
that is symmetric about a vertical line through
the mean of the data.

A graph of a normal distribution with a
population mean of 5 is shown at the right.
Consider the normal distribution at the
right. The area of the shaded region is
0.159 unit. This region represents the
fact that 15.9% of the data are greater than
or equal to 10. Because the area under
the curve is 1, the unshaded region
under the curve has area 1 - 0.159, or
0.841, representing the fact that 84.1% of
the data are less than 10.

The following rule, called the Empirical Rule, describes the percents of data
that lie within 1, 2, and 3 standard deviations of the mean in a normal
distribution.
To understand the empirical rule much better, watch the video entitled Empirical Rule (68-

Example: Use Empirical Rule to Solve Application


9. A vegetable distributor knows that during the month of August, the weights of
its tomatoes are normally distributed with a mean of 0.61 lb. and a standard
deviation of 0.15 lb.
a. What percent of the tomatoes weigh less than 0.76 lb.?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to
weigh more than 0.31 lb.?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to
weigh from 0.31 lb. to 0.91 lb.?

Given:
𝜇 = 0.61 𝑙𝑏 𝜎 = 0.15 𝑙𝑏

If we calculate the values shown in our
normal distribution curve at the right,
we will get:

𝜇 + 3𝜎 = 0.61 𝑙𝑏 + 3(0.15 𝑙𝑏) = 1.06 𝑙𝑏
𝜇 + 2𝜎 = 0.61 𝑙𝑏 + 2(0.15 𝑙𝑏) = 0.91 𝑙𝑏
𝜇 + 𝜎 = 0.61 𝑙𝑏 + 0.15 𝑙𝑏 = 0.76 𝑙𝑏
𝜇 = 0.61 𝑙𝑏
𝜇 − 𝜎 = 0.61 𝑙𝑏 − 0.15 𝑙𝑏 = 0.46 𝑙𝑏
𝜇 − 2𝜎 = 0.61 𝑙𝑏 − 2(0.15 𝑙𝑏) = 0.31 𝑙𝑏
𝜇 − 3𝜎 = 0.61 𝑙𝑏 − 3(0.15 𝑙𝑏) = 0.16 𝑙𝑏

Solution:
1. 0.76 lb. is 1 standard deviation above the mean of 0.61 lb.
𝜇 + 𝜎 = 0.61 𝑙𝑏 + 0.15 𝑙𝑏 = 0.76 𝑙𝑏

In a normal distribution, 34% of all data lie between the mean 𝜇 and
1 standard deviation above the mean (𝜇 + 𝜎), and 50% of all data lie
below the mean. Thus, 50% + 34% = 84% of the tomatoes weigh less
than 0.76 lb.
2. In a shipment of 6000 tomatoes, how many tomatoes can be expected to
weigh more than 0.31 lb.?

0.31 lb. is 2 standard deviations below the mean of 0.61 lb.
𝜇 − 2𝜎 = 0.61 𝑙𝑏 − 2(0.15 𝑙𝑏) = 0.31 𝑙𝑏

In a normal distribution, add the percentages of data above 0.31 lb.:
the 95% within 2 standard deviations of the mean, plus the 2.5% that lies
more than 2 standard deviations above the mean. That would be:

13.5% + 34% + 34% + 13.5% + 2.5% = 97.5%

This gives a total of 97.5% of the tomatoes that weigh more than
0.31 lb. Therefore, 97.5% ∙ (6000 tomatoes) = 5850 tomatoes.

3. In a shipment of 4500 tomatoes, how many tomatoes can be expected to
weigh from 0.31 lb. to 0.91 lb.?

In a normal distribution, add the percentages between 0.31 lb. and 0.91
lb., that would be:

13.5% + 34% + 34% + 13.5% = 95%

This gives a total of 95% of the tomatoes that weigh between 0.31 lb.
and 0.91 lb. Therefore, 95% ∙ (4500 tomatoes) = 4275 tomatoes.
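The Empirical Rule percentages are rounded values. As a rough cross-check, the sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) to compute the same three proportions from the normal curve itself; the variable names are illustrative.

```python
from statistics import NormalDist

# Tomato weights: mean 0.61 lb, standard deviation 0.15 lb
weights = NormalDist(mu=0.61, sigma=0.15)

below_076 = weights.cdf(0.76)                       # part a: less than 0.76 lb
above_031 = 1 - weights.cdf(0.31)                   # part b: more than 0.31 lb
between = weights.cdf(0.91) - weights.cdf(0.31)     # part c: from 0.31 lb to 0.91 lb

print(round(100 * below_076, 1), "% weigh less than 0.76 lb")
print(round(6000 * above_031), "of 6000 weigh more than 0.31 lb")
print(round(4500 * between), "of 4500 weigh from 0.31 lb to 0.91 lb")
```

The computed proportions are close to, but not identical with, the 84%, 97.5%, and 95% given by the Empirical Rule, so the expected counts differ slightly from 5850 and 4275.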
4.2.5.1. The Standard Normal Distribution
It is often helpful to convert data values x to z-scores, as we did in the
previous section by using the z-score formulas:

$$z_x = \frac{x - \mu}{\sigma} \ \text{(population)} \qquad z_x = \frac{x - \bar{x}}{s} \ \text{(sample)}$$

If the original distribution of x values is a normal distribution, then the
corresponding distribution of z-scores will also be a normal distribution. This
normal distribution of z-scores is called the standard normal distribution. See
figure below. It has a mean of 0 and a standard deviation of 1.

Conversion of a normal distribution to the standard normal distribution

Tables and calculators are often used to determine the area under a portion of
the standard normal curve. We will refer to this type of area as an area of the
standard normal distribution.

Table 4.3 gives the approximate areas of the standard normal distribution
between the mean 0 and z standard deviations from the mean. (See figure
beside Table 4.3). Table 4.3 indicates that the area A of the standard normal
distribution from the mean 0 up to z = 1.34 is 0.410 square unit.
Table 4.3
Because the standard normal distribution is symmetrical about the mean of 0, we
can also use Table 4.3 to find the area of a region that is located to the left of the
mean. This process is explained in the Example below.

Example: Use Symmetry to Determine an Area


10. Find the area of the standard normal distribution between z = -1.44 and z = 0.

Solution:
Because the standard normal distribution
is symmetrical about the center line z = 0,
the area of the standard normal
distribution between z = -1.44 and z = 0 is
equal to the area between z = 0 and z =
1.44.

The entry in Table 4.3 associated with z =
1.44 is 0.425. Thus, the area of the standard
normal distribution between z = -1.44 and z
= 0 is 0.425 square unit.

Example: Find the Area of a Tail Region


11. Find the area of the standard normal distribution to the right of z = 0.82.

Solution:
Table 4.3 indicates that the area from z = 0 to z = 0.82
is 0.294 square unit. The area to the right of z = 0 is
0.500 square unit. Thus, the area to the right of z =
0.82 is 0.500 - 0.294 = 0.206 square unit.
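The table entries quoted in this section can be approximated without the printed table. The sketch below assumes Python 3.8 or later and uses `NormalDist` from the standard `statistics` module; the helper name `area_from_mean` is hypothetical and returns the area between the mean 0 and z, using the symmetry of the curve for negative z.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_from_mean(z):
    """Area of the standard normal distribution between 0 and z."""
    return standard.cdf(abs(z)) - 0.5


print(round(area_from_mean(1.34), 3))        # entry quoted for z = 1.34
print(round(area_from_mean(-1.44), 3))       # same area as for z = 1.44, by symmetry
print(round(0.5 - area_from_mean(0.82), 3))  # tail area to the right of z = 0.82
```

Rounded to three decimal places, these agree with the values 0.410, 0.425, and 0.206 used in the text.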

Because the area of a portion of the standard normal distribution can be
interpreted as a percentage of the data or as a probability that the variable lies in
an interval, we can use the standard normal distribution to solve many
application problems.
Example: Solve an Application
12. A soda machine dispenses soda into 12-ounce cups. Tests show that the actual
amount of soda dispensed is normally distributed, with a mean of 11.5 oz and
a standard deviation of 0.2 oz.
a. What percent of cups will receive less than 11.25 oz of soda?
b. What percent of cups will receive between 11.2 oz and 11.55 oz of soda?
c. If a cup is chosen at random, what is the probability that the machine
will overflow the cup?

Solution:

a. Recall that the formula for the z-score for a data value x is
$$z_x = \frac{x - \bar{x}}{s}$$
The z-score for 11.25 oz is
$$z_{11.25} = \frac{11.25 - 11.5}{0.2} = -1.25$$
Table 4.3 indicates that 0.394 (39.4%) of the data in a normal
distribution are between z = 0 and z = 1.25.

Because the data are normally distributed,
39.4% of the data is also between z = 0 and
z = -1.25. The percent of data to the left of
z = -1.25 is 50% - 39.4% = 10.6%.

Thus, 10.6% of the cups filled by the soda machine will receive less than
11.25 oz of soda.

b. The z-score for 11.55 ounces is
$$z_{11.55} = \frac{11.55 - 11.5}{0.2} = 0.25$$

Table 4.3 indicates that 0.099 (9.9%) of the
data in a normal distribution is between z
= 0 and z = 0.25.

The z-score for 11.2 oz is
$$z_{11.2} = \frac{11.2 - 11.5}{0.2} = -1.5$$

Table 4.3 indicates that 0.433 (43.3%) of
the data in a normal distribution are
between z = 0 and z = 1.5. Because the
data are normally distributed, 43.3% of
the data is also between z = 0 and z = -1.5.

Thus, the percent of the cups that the vending machine will fill with
between 11.2 oz and 11.55 oz of soda is 43.3% + 9.9% = 53.2%.

c. A cup will overflow if it receives more than 12 oz of soda. The z-score
for 12 oz is
$$z_{12} = \frac{12 - 11.5}{0.2} = 2.5$$
Table 4.3 indicates that 0.494 (49.4%) of
the data in the standard normal
distribution are between z = 0 and z =
2.5. The percent of data to the right of z
= 2.5 is determined by subtracting 49.4%
from 50%.

Thus, 0.6% of the time the machine produces an overflow, and the
probability that a cup chosen at random will overflow is 0.006.
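The three answers can be cross-checked with a short Python sketch. It assumes Python 3.8 or later and uses `NormalDist` from the standard `statistics` module in place of Table 4.3, so the results match the table-based answers only after rounding; the variable names are illustrative.

```python
from statistics import NormalDist

# Amount of soda dispensed: mean 11.5 oz, standard deviation 0.2 oz
dispensed = NormalDist(mu=11.5, sigma=0.2)

less_than_1125 = dispensed.cdf(11.25)                       # part a
between = dispensed.cdf(11.55) - dispensed.cdf(11.2)        # part b
overflow = 1 - dispensed.cdf(12)                            # part c: more than 12 oz

for label, p in (("a", less_than_1125), ("b", between), ("c", overflow)):
    print(label, round(100 * p, 1), "%")
```

Rounded to the nearest tenth of a percent, the script gives 10.6%, 53.2%, and 0.6%, in agreement with the solution above.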

Self-Assessment
Find the area of the standard normal distribution between z = -0.67 and z = 0.
Find the area of the standard normal distribution to the left of z = -1.47.
A study of the careers of professional football players shows that the lengths of their careers are
years and a standard deviation of 1.8 years.
What percent of professional football players have a career of more than 9 years?
If a professional football player is chosen at random, what is the probability that the player will
4.2.6. Linear Regression and Correlation, Least-Squares Line, and Linear
Correlation Coefficient
In many applications, scientists try to determine whether two variables are
related. If they are related, the scientists then try to find an equation that can
be used to model the relationship.

For instance, the zoology professor R. McNeill Alexander wanted to
determine whether the stride length of a dinosaur, as shown by its fossilized
footprints, could be used to estimate the speed of the dinosaur.

Stride length for an animal is defined as the
distance x from a particular point on a
footprint to that same point on the next
footprint of the same foot. (See the figure at the
right.)

Because no dinosaurs were available,
Alexander and fellow scientist A. S. Jayes
carried out experiments with many types of
animals, including adult men, dogs, camels,
ostriches, and elephants.

The results of these experiments tended to support the idea that the speed y of
an animal is related to the animal’s stride length x. To better understand this
relationship, examine the data in Table 4.4, which are similar to, but less
extensive than, the data collected by Alexander and Jayes.

Table 4.4 Speed for Selected Stride Lengths


A graph of the ordered pairs in Table 4.4
is shown in the figure at the right. In this
graph, which is called a scatter diagram
or scatter plot, the x-axis represents the
stride lengths in meters and the y-axis
represents the average speeds in meters
per second.

The scatter diagram seems to indicate
that for each of the three species, a larger
stride length generally produces a faster
speed.
Also note that for each species, a straight line can be drawn such that all of the
points for that species lie on or very close to the line. Thus, the relationship
between speed and stride length appears to be a linear relationship.

After a relationship between paired data, which are referred to as bivariate
data, has been discovered, a scientist tries to model the relationship with an
equation.

One method of determining a linear
relationship for bivariate data is called
linear regression. To see how linear
regression is carried out, let us concentrate
on the bivariate data for the dogs, which are
shown by the green points in the figure at the
right. There are many lines that can be
drawn such that the data points lie close to
the line; however, scientists are generally
interested in the line called the line of best
fit or the least-squares regression line.

The least-squares regression line is also called the least-squares line. The
approximate equation of the least-squares line for the bivariate data for the
dogs is 𝑦̂ = 3.2𝑥 − 1.1
In the figure at the right, the vertical deviations
from the ordered pairs to the graph of 𝑦̂
= 3.2𝑥 − 1.1 are 0, -0.06, 0.5, -0.52, -0.16, -0.6,
0.34 and 0.2.

It is traditional to use the symbol
𝑦̂ (pronounced y-hat) in place of y in the
equation of a least-squares line. This also
helps us differentiate the line’s y-values from
the y-values of the given ordered pairs.

The next formula can be used to determine the equation of the least-squares
line for a given set of ordered pairs.

In the formula for the least-squares regression line, ∑ 𝑥 represents the sum of
all the x values, ∑ 𝑦 represents the sum of all the y values, and ∑ 𝑥𝑦 represents
the sum of the n products x1y1, x2y2, ..., xnyn.

The notation 𝑥̅ represents the mean of the x values, and 𝑦̅ represents the mean
of the y values.

Example: Find the equation of the Least-Squares Line


13. Find the equation of the least-squares line for the ordered pairs in the Table
4.4a below:
Solution:

The ordered pairs are (2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6), (3.8, 7.0), (4.0, 7.7),
(4.2, 8.3), (4.5, 8.7). The number of ordered pairs is n = 8.

Organize the data in four columns, as shown in table below. Then, find the
sum of each column.

If a and b are each rounded to the nearest
tenth, to reflect the accuracy of the original
data, then we have as our equation of the
least-squares line:
𝑦̂ = 𝑎𝑥 + 𝑏
𝑦̂ ≈ 2.7𝑥 − 3.3
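The coefficients can be recomputed from the eight ordered pairs with a short Python sketch. It assumes the standard least-squares formulas, a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = ȳ − a·x̄, which are taken here to be the ones intended by the text; the variable names are illustrative.

```python
# Ordered pairs (stride length in m, speed in m/s) from the example
pairs = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
         (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Slope a and intercept b of the least-squares line y-hat = a*x + b
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - a * (sum_x / n)

print(round(a, 1), round(b, 1))
```

Rounded to the nearest tenth, the slope and intercept agree with the equation ŷ ≈ 2.7x − 3.3 given above.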
Example: Use Least-Squares Line to Make Predictions
14. Use the equation of the least-squares line from Example 13 to predict the
average speed of an adult man for each of the following stride lengths. Round
your results to the nearest tenth of a meter per second.
a. 2.8 m
b. 4.8 m
Solution:
a. In Example 13, we found the equation of the least-squares line to be
𝑦̂ = 2.7𝑥 − 3.3. Substituting 2.8 for x gives:
𝑦̂ = 2.7(2.8) − 3.3 = 4.26
Rounding 4.26 to the nearest tenth produces 4.3. Thus, 4.3 m/s is the
predicted average speed for an adult man with a stride length of 2.8 m.

b. In Example 13, we found the equation of the least-squares line to be
𝑦̂ = 2.7𝑥 − 3.3. Substituting 4.8 for x gives:
𝑦̂ = 2.7(4.8) − 3.3 = 9.66
Rounding 9.66 to the nearest tenth produces 9.7. Thus, 9.7 m/s is the
predicted average speed for an adult man with a stride length of 4.8 m.

The procedure in Example 14a made use of an
equation to determine a point between given
data points. This procedure is referred to as
interpolation. In Example 14b, an equation was
used to determine a point to the right of the
given data points. The process of using an
equation to determine a point to the right or left
of given data points is referred to as
extrapolation.

To determine the strength of a linear relationship between two variables,
statisticians use a statistic called the linear correlation coefficient, which is
denoted by the variable r and is defined as follows.
If the linear correlation coefficient r is positive, the relationship between the
variables has a positive correlation. In this case, if one variable increases, the
other variable also tends to increase. If r is negative, the linear relationship
between the variables has a negative correlation. In this case, if one variable
increases, the other variable tends to decrease.

The figures below show some scatter diagrams along with the type of linear
correlation that exists between the x and y variables. The closer |r| is to 1, the
stronger the linear relationship between the variables.

The linear correlation coefficient indicates the strength of a linear relationship
between two variables; however, it does not indicate the presence of a
cause-and-effect relationship.
Example: Find the Linear Correlation Coefficient
15. Find the linear correlation coefficient for stride length versus speed of an
adult man. Use the data in Table 4.4a. Round your result to the nearest
hundredth.

Solution:
The ordered pairs are
(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6), (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7).

The number of ordered pairs is n = 8. Using the column sums from the table in Example 13, we find:

To the nearest hundredth, the linear correlation coefficient is 0.99.
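The coefficient can be recomputed with a short Python sketch. It assumes the standard formula for the linear correlation coefficient, r = (nΣxy − ΣxΣy) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)), which is taken here to be the definition referred to above; the variable names are illustrative.

```python
from math import sqrt

# Ordered pairs (stride length in m, speed in m/s) for an adult man
pairs = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
         (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_y2 = sum(y * y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Linear correlation coefficient r
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(round(r, 2))
```

A value this close to 1 indicates a strong positive linear correlation between stride length and speed, although, as noted earlier, it does not by itself establish cause and effect.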

Self-Assessment
18. Find the equation of the least-squares line for the stride length and
speed of camels given in Table 4.4c.

Use the equation of the least-squares line from Self-Assessment No.18 to predict the av
2.7 m
4.5 m
Find the linear correlation coefficient for stride length versus speed of a camel as giv
hundredth.
Assessment
Exercise No.4:
Answer all the self-check questions in this unit and write or encode your answers on a
sheet of bond paper.

Quiz No. 4:
1. How do we organize and present data?
2. Why do we have to organize and present data?
3. A consumer testing agency has tested the strengths of 3 brands of 1/8-inch
rope. The results of the tests are shown in the following table.
According to the sample test results, which company produces 1/8-inch
rope for which the breaking point has the smallest standard
deviation?
4. Calculate the variance for each company in problem no.3.

4.3. References

Aufmann, R., Lockwood, J., Nation, R., et.al. (2018). Mathematics in the Modern
World. Philippine Edition. Rex Bookstore.

Aufmann, R., Lockwood, J., Nation, R., and Clegg, K. (2013). Mathematical
Excursions, 3rd Edition. Cengage Learning.

Toledo98, Measures of Central Tendency and Variability. From
https://www.slideshare.net/toledo98/measures-of-central-tendency-and-variability

Frost, J. (14 Oct. 2020). Normal Distribution in Statistics. From
https://statisticsbyjim.com/basics/normal-distribution/

Emmanuel, J. (15 May 2016). Empirical Rule (68-95-99.7) for Normal
Distributions. From https://www.youtube.com/watch?v=Txylq6RLiK8

4.4. Acknowledgement

The images, tables, figures and information contained in this module were
taken from the references cited above.
