Math 1100 Module 3a.docx-Merged

This document provides an overview of inductive and deductive reasoning. It begins by defining inductive reasoning as generalizing from specific observations to form a conjecture. Deductive reasoning is applying a general statement to a specific case. The document then provides examples of applying each type of reasoning to mathematical problems. These include finding patterns, making conjectures, proving conjectures using counterexamples, applying theorems like the Pythagorean theorem, and assessing arguments. The goal is for students to understand and identify the different reasoning approaches.

Uploaded by

Joy Andrada

Department of Mathematics and Physics MATH 1100

MODULE 3a¹

Reasoning
Overview
Inductive reasoning generalizes from a pattern that has been recognized and
established; it is commonly used in solving puzzles. Deductive reasoning, on
the other hand, applies a general statement to a specific case, such as
applying a mathematical formula to a particular problem. Both approaches are
used in solving various mathematical problems, and practicing them will help
you develop your mathematical reasoning.

In this module, you will be introduced to two types of reasoning, inductive
and deductive reasoning, and you will apply them to specific mathematical
problems.

Time Allotment: 1 week

Objectives:
Upon completion of this module, you are expected to:
1. use different types of reasoning to justify statements and arguments
made about mathematics and mathematical concepts;
2. identify the type of reasoning used in solving different problems.

PRE-ASSESSMENT

1. What establishes a valid argument? Can a valid argument yield a false conclusion?
2. What occupations do you think require good problem-solving skills?

¹ This module is based on the book “Mathematics in the Modern World” by the Department of Mathematics and
Physics, CS, CLSU.

REASONING
Today, developing higher-order thinking skills (reasoning) and a positive attitude toward
mathematics is also given importance. This development is believed to be achieved if
students do not simply wait for the teacher to give directions and information. Students
have to be active problem solvers with a persevering attitude until a reasonable solution
is attained. Students should be encouraged to explore, reason out, and take the initiative
to investigate mathematical principles and create new ideas.

Reasoning requires a logical frame of mind. It is related to cognitive skills such as
discovering patterns, establishing and verifying tentative conclusions, and making
generalizations. Such skills can be best developed through problem solving activities and
investigation in a setting that is characterized by hands-on, minds-on, as well as
cooperative learning.

Reasoning starts with building an argument, which is a series of statements typically
used to persuade someone into accepting a conclusion. We discuss two types of
reasoning used to construct effective mathematical arguments. These are inductive
reasoning and deductive reasoning.

INDUCTIVE REASONING
Inductive reasoning is characterized by coming up with a conjecture. A conjecture is
generally an educated guess concluded from repeated observations of specific situations.

We say that a conjecture is valid if the conjecture always holds. We say that it is invalid
if we can find a specific situation that disproves the conjecture. To debunk its validity, it
only takes one counterexample.

A counterexample to a conjecture is a situation or a specific case which shows that the
conjecture is false.

Example 1. Consider the numbers 3, 5, 7, 11, 13, 17, … .
What can you say about them? Each of these numbers is both odd and prime.

Recall: A prime number is a counting number greater than 1 whose only factors are 1 and
the number itself. Odd numbers are integers that leave a remainder of 1 when divided by 2.

A student makes the following conjectures.

(i) Conjecture 1: “An odd number is a prime number.”
Is Conjecture 1 valid? No.
Counterexample: 9 is odd but not prime.

(ii) Conjecture 2: “Every prime number is an odd number.”
Is Conjecture 2 valid? No.
Counterexample: 2 is prime but not odd.
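
The search for a counterexample can also be automated. The following is a minimal Python sketch, not part of the original module; the helper names `is_prime` and `first_counterexample` and the search limit of 100 are assumptions chosen for illustration.

```python
def is_prime(n: int) -> bool:
    """Return True if n is a counting number greater than 1 whose only factors are 1 and n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def first_counterexample(conjecture, candidates):
    """Return the first candidate for which the conjecture fails, or None if none is found."""
    for n in candidates:
        if not conjecture(n):
            return n
    return None


# Conjecture 1: "An odd number is a prime number."
# The search starts at 3 to mirror the list in Example 1.
odd_numbers = range(3, 100, 2)
print(first_counterexample(is_prime, odd_numbers))

# Conjecture 2: "Every prime number is an odd number."
primes = [n for n in range(2, 100) if is_prime(n)]
print(first_counterexample(lambda p: p % 2 == 1, primes))
```

Finding no counterexample below the limit would not prove a conjecture; finding one is enough to disprove it.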

Words of caution:
Inductive reasoning usually leads to a valid conjecture if done carefully and
systematically. Nevertheless, the conjecture may still need to be proven by other means
such as deductive reasoning.

Example 2. Given the number pattern, what is the missing number 𝒚?

Order     1    2    3    4    ...   7    ...   19
Number    0    4    8    12   ...   24   ...   𝒚

Solution.
Order    Number    Pattern
1        0         4 × (1 – 1)
2        4         4 × (2 – 1)
3        8         4 × (3 – 1)
4        12        4 × (4 – 1)
…        …         …
7        24        4 × (7 – 1)
…        …         …
19       𝒚         4 × (19 – 1)

Conjecture: The missing number 𝒚 is 𝟕𝟐.
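
The conjectured rule can be checked against the given entries before it is used to predict the missing one. This is a small Python sketch under the assumption that the pattern 4 × (order – 1) continues; the function name `nth_number` is hypothetical.

```python
def nth_number(order: int) -> int:
    """Conjectured rule from the table: the number at a given order is 4 x (order - 1)."""
    return 4 * (order - 1)


# Entries given in the problem, as order -> number.
observed = {1: 0, 2: 4, 3: 8, 4: 12, 7: 24}

# The rule must reproduce every observed entry before we trust it for order 19.
assert all(nth_number(order) == number for order, number in observed.items())

print(nth_number(19))
```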

SAQ1: Follow-up question: Give the 41st and the 401st numbers.²

² The 41st number is 160; the 401st number is 1600.

Example 3. Find the sum of the first 50 positive odd numbers.


Solution. First, find a few sums.

Terms        Number of terms    Sum    Pattern
1            1                  1      $1^2$
1+3          2                  4      $2^2$
1+3+5        3                  9      $3^2$
1+3+5+7      4                  16     $4^2$

Conjecture: The sum of the first $n$ positive odd numbers is $n^2$.

Reasoning inductively, you would expect that the sum of the first 50 odd numbers is $50^2$
or 2500.
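
A short Python sketch can test the conjecture on many more cases than a hand-made table. The function name `sum_first_odds` and the range of cases tested are illustrative assumptions; agreement on these cases supports the conjecture but does not prove it.

```python
def sum_first_odds(n: int) -> int:
    """Add the first n positive odd numbers directly: 1 + 3 + ... + (2n - 1)."""
    return sum(2 * k - 1 for k in range(1, n + 1))


# Compare the direct sum with the conjectured value n^2 for a few cases.
for n in (1, 2, 3, 4, 50):
    print(n, sum_first_odds(n), n ** 2)
```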

Example 4. Consider the 8 × 8 chessboard in Figure 1. How many squares are there in
the 8 × 8 chessboard?

Figure 1. 8 × 8 chessboard

If your answer is 64, you might want to think it over again.
First, let us find the pattern from boards with a smaller number of divisions.

Board          Squares counted        Number of squares
1 × 1 board    1 (1 × 1 square)       $1$
2 × 2 board    1 (2 × 2 square)       $1 + 4 = 1 + 2^2 = 5$
               4 (1 × 1 squares)
3 × 3 board    1 (3 × 3 square)       $1 + 4 + 9 = 1 + 2^2 + 3^2 = 14$
               4 (2 × 2 squares)
               9 (1 × 1 squares)
4 × 4 board    1 (4 × 4 square)       $1 + 4 + 9 + 16 = 1 + 2^2 + 3^2 + 4^2 = 30$
               4 (3 × 3 squares)
               9 (2 × 2 squares)
               16 (1 × 1 squares)
⋮              ⋮                      ⋮
8 × 8 board                           $1 + 4 + 9 + 16 + 25 + 36 + 49 + 64$
                                      $= 1 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 = 204$

The number of squares in an 8 × 8 chessboard is 204.
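
The count can be confirmed by brute force: for every square size, try every position where its top-left corner fits on the board. This Python sketch is an illustration only; the function name `count_squares` is an assumption.

```python
def count_squares(n: int) -> int:
    """Count all k x k squares (k = 1..n) on an n x n board by listing their top-left corners."""
    total = 0
    for k in range(1, n + 1):
        # A k x k square fits if its top-left corner is in rows/columns 0 .. n - k.
        for row in range(n - k + 1):
            for col in range(n - k + 1):
                total += 1
    return total


for n in (1, 2, 3, 4, 8):
    print(n, count_squares(n), sum(k ** 2 for k in range(1, n + 1)))
```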

Example 5. This year, Jeanelle’s birthday is on a Wednesday. She observes that next
year it will be on a Thursday, and in two years it will be on a Friday. So she claims,
“My birthday will be on a Wednesday again in seven years.”

Did she use inductive reasoning? Explain.

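Solution. Yes, she used inductive reasoning. She made a conjecture by generalizing some
specific observations about the days of her birthday. But her conjecture is incorrect or
invalid: the day of the week advances by one day after an ordinary year but by two days
whenever a leap day (February 29) falls between two birthdays. Since a leap year generally
occurs every four years, the one-day-per-year pattern she considered does not hold.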
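
To see the pattern break, one can list the weekday of a birthday over several years. The sketch below uses Python's standard `datetime` module; the birthday (June 2) and the starting year (2021) are hypothetical values chosen only because that date fell on a Wednesday, as in the example.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def birthday_weekday(year: int, month: int = 6, day: int = 2) -> str:
    """Return the weekday name of the (hypothetical) birthday in the given year."""
    return WEEKDAYS[date(year, month, day).weekday()]


# List the birthday's weekday from the starting year up to seven years later.
for year in range(2021, 2029):
    print(year, birthday_weekday(year))
```

In this illustration the leap years 2024 and 2028 each push the weekday forward by two days, so the birthday does not return to Wednesday after exactly seven years.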

Inductive reasoning is a powerful method of drawing a conclusion, but it is
important to realize that there is no assurance that the observed conjecture is true;
inductive reasoning is rather probabilistic. For this reason, mathematicians are reluctant
to accept a conjecture as an absolute truth until it is formally proven using other methods
such as deductive reasoning.

DEDUCTIVE REASONING

Deductive reasoning is the process of reasoning logically from an established
generalization into making a conclusion. It is characterized by applying general principles
to specific situations; and for as long as the general principle being used is true for all
cases and the arguments are valid, then it is guaranteed that the conclusion is also true.

Example 1. Consider a very popular generalization in mathematics known
as the Pythagorean Theorem. It states that:
“In any right triangle, the sum of the squares of the legs (shorter
sides) is equal to the square of the hypotenuse (longest side).”

If we know that the lengths of the shorter sides are 8 cm and 15 cm, then we can deduce
the length $c$ of the longest side (in cm) to be
$c^2 = 8^2 + 15^2$
$c^2 = 64 + 225$
$c^2 = 289$
$c = 17$.
Observe that we used the general rule (Pythagorean Theorem) and applied it to the
specific situation. Thus, the result must be true.
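
Applying a general rule to a specific case is exactly what a function call does. As a small illustration (the function name `hypotenuse` is an assumption, not from the module), the theorem can be written once and applied to the legs 8 and 15:

```python
import math


def hypotenuse(leg_a: float, leg_b: float) -> float:
    """General rule (Pythagorean Theorem): c = sqrt(a^2 + b^2) for a right triangle."""
    return math.sqrt(leg_a ** 2 + leg_b ** 2)


# Specific case from the example: legs of 8 cm and 15 cm.
print(hypotenuse(8, 15))
```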

Example 2. Consider the following argument.

“All CLSU students are bright. Edwin is a CLSU student.
Therefore, Edwin is bright.”

The claim that “Edwin is bright” is obtained using deductive reasoning based on the
premises or assumptions that “All CLSU students are bright” and “Edwin is a CLSU student”.

Note that in the first assumption, we have the word “All”, which pertains to any student of
CLSU. Since Edwin is one of the students of CLSU, as stated in the second statement, the
claim is valid.

Example 3. The angles 𝛼 and 𝛽 are complementary angles with 𝛽 = 35°. Use deductive
reasoning to find 𝛼.
Solution. It is a fact that two angles are complementary if and only if their sum is 90°.
That is, 𝛼 + 𝛽 = 90°
𝛼 + 35° = 90°
𝛼 = 90° − 35°
𝛼 = 55°

Example 4. Use deductive reasoning to find the sum of the first 50 positive odd numbers.

Solution. The positive odd numbers 1, 3, 5, 7, 9, . . . form an arithmetic sequence, for
which it has been established that
$a_n = a_1 + (n - 1)d$    (1)
and
$S_n = \frac{n}{2}(a_1 + a_n)$    (2)
where $a_n$ := $n$th term
$a_1$ := 1st term
$n$ := number of terms
$d$ := common difference
$S_n$ := sum of the first $n$ terms

So from (2),
$S_{50} = \frac{50}{2}(1 + a_{50})$    (3)

From (1), $a_{50} = 1 + (50 - 1)(2) = 99$.

Substituting 99 for $a_{50}$ in (3),
$S_{50} = \frac{50}{2}(1 + 99)$
$S_{50} = 25(100) = 2500$

This agrees with the value expected from inductive reasoning in Example 3 of the previous
section.
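
The two established formulas can be written as functions and applied to this specific sequence. This is a minimal Python sketch; the names `nth_term` and `series_sum` are illustrative, and integer division is used on the assumption that the first term and common difference are integers.

```python
def nth_term(a1: int, d: int, n: int) -> int:
    """Formula (1): a_n = a_1 + (n - 1) d."""
    return a1 + (n - 1) * d


def series_sum(a1: int, d: int, n: int) -> int:
    """Formula (2): S_n = (n / 2)(a_1 + a_n); n (a_1 + a_n) is always even for integer terms."""
    return n * (a1 + nth_term(a1, d, n)) // 2


# Positive odd numbers: first term 1, common difference 2.
print(nth_term(1, 2, 50))
print(series_sum(1, 2, 50))
```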

SUMMARY

• Inductive reasoning is characterized by coming up with a conjecture.
• A conjecture is generally an educated guess concluded from repeated
  observations of specific situations.
• A conjecture is valid if it is always true; otherwise it is invalid. To debunk the
  validity of a conjecture, just give one counterexample.
• A counterexample to a conjecture is a situation or a specific case which shows
  that the conjecture is false.
• Deductive reasoning is the process of reasoning logically from an established
  generalization into making a conclusion.

POST-ASSESSMENT

Answer the following problems to train your mind.

I. Determine whether each of the following arguments is an example of inductive
reasoning or deductive reasoning.

1. During the past 15 years, a tree has produced guavas every other year. Last year
the tree did not produce guavas, so this year the tree will produce guavas.
2. All house renovations cost more than the estimate. The contractor estimated that
my house renovation will cost 500,000 pesos. Thus my house renovation will cost
more than 500,000 pesos.
3. All of Bob Ong’s books are worth reading. The book ABNKKBSNPLAKo is a Bob Ong
book. Thus ABNKKBSNPLAKo is worth reading.

II. Answer the following problems using inductive or deductive reasoning.

4. Use inductive reasoning to predict the next number or letter in the list
a. 3, 5, 9, 15, 23, 33, _____
b. 5, 11, 17, 23, 29, 35, _____
c. J, F, M, A, M, J, J, _____

5. Use deductive reasoning to show that the following procedure always produces
the number 5.
Procedure: Pick a number. Add 4 to the number and multiply the sum by 3.
Subtract 7 and then decrease this difference by the triple of the original number.

6. Each of four neighbors, Jorem, Jomer, Delia, and Imman, has a different
occupation (teacher, architect, engineer, or doctor). From the following clues,
determine the occupation of each neighbor.
• Jomer gets home from work after the architect but before the doctor.
• Delia, who is the last to get home from work, is not the teacher.
• The doctor and Delia leave for work at the same time.
• The architect lives next door to Imman.

III. What is your favorite number? Using your favorite number, create a problem similar
to item number 5. The procedure should always result in your favorite number.

REFERENCE

Aufman, R. N., Lockwood, J., & Richard, D. (2013). Logic. In Mathematical Excursions
(3rd ed.). Brooks/Cole, Cengage Learning.

MODULE 3b¹

Problem Solving
Overview
Problems in mathematics can be classified into two basic types:
routine and non-routine problems. The techniques or strategies in solving
problems are different for each type.

Routine problems are problems that can be solved using arithmetic
operations and that are useful for daily living, whereas non-routine
problems are mostly concerned with developing students’ critical and
mathematical reasoning.

In this module, you will be introduced to the two types of problems
and will apply the different strategies in solving routine and non-routine
problems.

Time Allotment: 1 week

Objectives:
Upon completion of this module, you are expected to:
1. Solve problems involving patterns and recreational problems
following Polya’s four steps;
2. Organize one’s methods and approaches for proving and solving
problems.

PRE-ASSESSMENT

1. What is the difference between an exercise and a problem?
2. When do you say that a question is a problem?
3. Give some strategies in problem solving.

¹ This module is based on the book “Mathematics in the Modern World” by the Department of Mathematics and
Physics, CS, CLSU.

Problem Solving

Among the popular proponents of problem solving, George Polya (1945) indicated that
“A question is considered a problem if the procedure or method of solution is not
immediately known but requires one to apply creativity and previous knowledge in new
and unfamiliar situation.”

According to the National Council of Teachers of Mathematics (NCTM, 2000, p. 52),
“problem solving means engaging in a task for which the solution is not known in
advance. In order to find a solution, students must draw on their knowledge or previous
experiences and through this process; they will often develop new mathematical
understandings.”

A problem can be classified as either a routine or a non-routine problem.

1. Routine Problems
A routine problem is one that may be solved by some algorithm or procedure that
involves the use of mathematical operations and applied to a particular situation. These
are the kinds of problems that are usually encountered in a typical mathematics
classroom. Often, solving a routine problem requires applying an established
generalization.

The following are examples of routine problems:

Example 1. Given $f(x) = x^2 - 5x + 4$, is $f(x + 3) = f(x) + f(3)$?

Solution. The solution requires knowing the concept of functions (see Module 2).
On one hand,
$f(x + 3) = (x + 3)^2 - 5(x + 3) + 4$
$f(x + 3) = (x^2 + 6x + 9) - 5x - 15 + 4$
$f(x + 3) = x^2 + x - 2$

On the other hand,
$f(x) + f(3) = (x^2 - 5x + 4) + ((3)^2 - 5(3) + 4)$
$f(x) + f(3) = (x^2 - 5x + 4) + (-2)$
$f(x) + f(3) = x^2 - 5x + 2$

Thus $f(x + 3) \neq f(x) + f(3)$.
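
A quick numerical check makes the inequality concrete. This Python sketch simply evaluates both sides at a few sample values of x; the sample range is an arbitrary choice for illustration.

```python
def f(x):
    """The function from the example: f(x) = x^2 - 5x + 4."""
    return x ** 2 - 5 * x + 4


# Compare f(x + 3) with f(x) + f(3) at a few integer inputs.
for x in range(-2, 3):
    print(x, f(x + 3), f(x) + f(3))
```

The two columns differ for every sample shown, which is consistent with the algebra: the two expressions differ by 6x − 4.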



Example 2. Pedro wants to fill a big rectangular box with small cubes having side lengths
3 cm. The box is 12 cm in length, 6 cm in width and 9 cm in height. How many cubes
will fit in the box?

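Solution. The solution requires recognizing that the problem is about the volume $V$ of a
box, which is always
$V = (\text{length})(\text{width})(\text{height})$.
We need to find the volume of the big box, and then divide it by the volume of one of
the small cubes. We have
$V_{box} = (12\ cm)(6\ cm)(9\ cm) = 648\ cm^3$
$V_{cube} = (3\ cm)(3\ cm)(3\ cm) = 27\ cm^3$

And so, the number of cubes that would fit into the box is
$\frac{648\ cm^3}{27\ cm^3} = 24$.

Dividing the volumes works here because each dimension of the box is a whole multiple of
3 cm: 4 cubes fit along the length, 2 along the width, and 3 along the height, and
(4)(2)(3) = 24.

A total of 24 of the small cubes having side lengths 3 cm would fit into the box.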
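
The count can also be computed one dimension at a time, which is safer than dividing volumes when a side of the box is not a multiple of the cube's side. The following Python sketch is illustrative; the function name `cubes_that_fit` and the second box (10 × 6 × 9) are hypothetical.

```python
def cubes_that_fit(length: int, width: int, height: int, side: int) -> int:
    """Whole cubes along each dimension, multiplied together."""
    return (length // side) * (width // side) * (height // side)


# Box from the example: every dimension is a multiple of 3, so both methods agree.
print(cubes_that_fit(12, 6, 9, 3), (12 * 6 * 9) // 3 ** 3)

# Hypothetical box whose length is not a multiple of 3: the volume ratio overestimates.
print(cubes_that_fit(10, 6, 9, 3), (10 * 6 * 9) // 3 ** 3)
```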

Example 3. Juan invested ₱25,000 at 4.5% compounded semi-annually for 4 years. How
much interest will he earn?

Solution. We need here the concept of compound interest:
$F = P\left(1 + \frac{r}{m}\right)^n$
where $F$ := final amount
$P$ := initial amount or principal
$r$ := annual rate of interest
$m$ := no. of compounding periods in a year
$n$ := total no. of compounding periods
$I$ := amount earned or interest; also the difference of the final amount $F$
and the principal or initial amount $P$.

And so, the interest earned is computed as
$I = F - P$
$I = F - 25\,000$    (1)

Meanwhile, $F$ is computed as
$F = P\left(1 + \frac{r}{m}\right)^n$
Since the investment is compounded semi-annually, $m = 2$.
Moreover, having 2 compounding periods in a year means that $n = (4)(2) = 8$. Here,
4 is the number of years the investment gains interest.
$F = 25\,000\left(1 + \frac{4.5\%}{2}\right)^8$
$F = 25\,000\left(1 + \frac{0.045}{2}\right)^8$
$F = 29\,870.78$

Using this value of $F$ in Equation (1),
𝐼 = ₱29 870.78 − ₱25 000 = ₱4 870.78
The interest earned is ₱4 870.78.
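
The computation above can be reproduced in a few lines of Python. This is a sketch under the same formula; the function name `future_value` and its parameter names are assumptions for illustration.

```python
def future_value(principal: float, annual_rate: float, periods_per_year: int, years: int) -> float:
    """F = P (1 + r/m)^n, with n = (periods per year) x (number of years)."""
    n = periods_per_year * years
    return principal * (1 + annual_rate / periods_per_year) ** n


final_amount = future_value(25_000, 0.045, 2, 4)
interest = final_amount - 25_000

# Round to centavos for display.
print(round(final_amount, 2), round(interest, 2))
```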

Example 4. Rhey has 96 meters of fencing material. Find the area of the largest
rectangular lot that he can fence off with it.

Solution. We need here the established result that the graph of a quadratic function
$f(x) = ax^2 + bx + c$
with constants $a, b, c$ is a parabola that opens upward if $a > 0$ and downward
if $a < 0$; and the vertex is always at $(x, f(x))$ where $x = -\frac{b}{2a}$.

Now let $x$ and $y$ be the length and width (in meters) of the rectangular lot.

Since Rhey wants to fence the rectangular lot, we are interested in the perimeter of
the lot, which is $2x + 2y$. Using the 96-meter fencing material, we have
$96 = 2x + 2y$
$96 - 2x = 2y$
$2y = 96 - 2x$
$y = 48 - x$

Now, writing the area $A(x)$ of the lot as a function of $x$, we have
$A(x) = xy$
$A(x) = x(48 - x)$
$A(x) = -x^2 + 48x$

Recognize that the graph of $A(x)$ is a downward parabola (here $a = -1$ and $b = 48$), so
that the highest point occurs at the vertex. At the vertex, we have
$x = -\frac{b}{2a}$
$x = -\frac{48}{2(-1)}$
$x = 24$
and
$A(24) = -(24)^2 + 48(24)$
$A(24) = 576\ m^2$

This means that the highest value of the area $A$ is $576\ m^2$. Using the 96-meter fence,
Rhey can fence off a rectangular lot with a maximum area of $576\ m^2$.
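
The vertex result can be cross-checked by trying many candidate lengths. The sketch below searches whole-meter lengths only, which is an assumption made to keep the example short; the function name `area` is illustrative.

```python
def area(x: float, fence: float = 96) -> float:
    """Area of a rectangle with one side x when the perimeter equals the fence length."""
    y = fence / 2 - x
    return x * y


# Vertex formula for A(x) = -x^2 + 48x, where a = -1 and b = 48.
a, b = -1, 48
x_vertex = -b / (2 * a)

# Brute-force check over whole-meter lengths from 0 to 48.
best_x = max(range(0, 49), key=area)

print(x_vertex, area(x_vertex))
print(best_x, area(best_x))
```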

As may have been observed in the preceding examples, routine problems are those that
we usually see in a classroom mathematics discussion. Their solutions typically involve
applications of concepts from specific mathematics subjects. Of course, some routine
problems may also be solved by a different strategy such as what we discuss next in the
following section. We now turn to and focus on non-routine problems.

2. Non-Routine Problems
Non-routine problems are those for which we do not readily have an idea of how to solve
them, or those that seem easy but are actually tricky; they are almost like puzzles. Such
problems may be solved in different ways or with different strategies, and some may have
more than one answer or solution. Solving a non-routine problem usually involves common
sense, observation, and the solver’s own strategy; it requires little or no use of algorithms.

Steps in Problem Solving


Polya (1945) suggested the following four general steps to solving a problem. It must be
emphasized that these steps must be taken only as a guide; there may be situations
where the listed steps overlap, or the steps are taken not in the same order as listed
below.
1. Understand the problem
2. Devise a plan
3. Carry out the plan
4. Look back and check

Step 1: Understand the Problem

Read, read again, and read the problem a third time if needed. If you don’t understand
a problem, do you think you can solve it? Figure out what kind of problem it is, take note
of what is asked for, what are the given values and/or conditions; keep only the relevant
ones.

Step 2: Devise a Strategy


Some possible strategies are the following.
1. Work Backwards
2. Sketch a Picture
3. Guess, Check, Revise
4. Find a Pattern
5. Eliminate Impossible Cases

On some occasions, when you try using one strategy and then realize that it does not
work, don’t hesitate to choose another strategy. In other cases, you may need to use a
combination of strategies or even devise your own strategy. As a rule, we should be
encouraged to consider alternative solutions to a problem. Indeed, it has been said that
it is far better to solve one problem in four ways than to solve four problems in only one
way.

Step 3: Carry Out the Strategy


Once a problem is fully understood and the chosen strategy is appropriate, the rest is
usually a simple exercise. If the original strategy does not work, it may need
to be modified, or a new strategy may be needed. We must realize that not every problem
will be solved on the first attempt. A failed attempt can be viewed as a learning
experience. Be patient and try to avoid getting frustrated or discouraged. Computers,
calculators, or other devices may be useful tools when routine tasks are involved.

Step 4: Look Back


Once an answer or solution is found, it is important to test that solution. Below are some
questions that you may find useful in the looking back process.
1. Is the answer reasonable? Does it satisfy the conditions in the problem?
2. Could there be more than one answer?
3. What is the appropriate unit of measurement?
4. Can another strategy be used?

The following are examples of some strategies. It must be pointed out that there may be
many ways to solve a problem, but we focus here only on illustrating some specific
strategies. The “Look Back” step is left as an exercise.

STRATEGIES IN PROBLEM SOLVING

1. Work Backwards
This strategy is most appropriate if the problem involves multiple steps, and we are given
the final result instead of the initial values. The trick is to reverse the operation while
working backwards.

Example 1. A barefoot penniless boy named JR found a wallet with some money in it.
Out of it, he bought a ₱65-pair of slippers and then paid ₱20 for his jeep fare home.
Then, he gave half of what remained to his mom. But his mom didn’t need it all so she
gave back ₱35 to him. The boy ended up with ₱170. How much was in the wallet?

Solution. Start with the end value, which is ₱170. As we work backwards, we must reverse
each operation as follows:

Action                             Operation      Reverse Operation
1. Mom gave back… (JR earned)      Addition       Subtraction
2. Gave half to Mom… (JR spent)    Division       Multiplication
3. Spent on fare… (JR spent)       Subtraction    Addition
4. Bought slippers… (JR spent)     Subtraction    Addition

Thus, starting with ₱170 and working backwards, we do the reverse operations as follows:

Action                        Reverse computation
1. Mom gave him ₱35:          170 − 35 = 135
2. He gave half to Mom:       (135)(2) = 270
3. Spent ₱20 fare:            270 + 20 = 290
4. Bought ₱65 slippers:       290 + 65 = 355

Hence, when JR found the wallet, it contained ₱355.

Is the answer correct? Check it out by retracing the boy’s steps.
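
Retracing the boy's steps can be done in code as well. This Python sketch writes the story forward as one function and the reversed operations as another; both function names are hypothetical.

```python
def amount_left(wallet: float) -> float:
    """Replay the story forward, starting from the amount found in the wallet."""
    money = wallet - 65      # bought slippers
    money = money - 20       # paid jeep fare
    money = money / 2        # gave half to Mom, kept the other half
    money = money + 35       # Mom gave back 35
    return money


def wallet_from_final(final: float) -> float:
    """Work backwards: undo each step in reverse order with the reverse operation."""
    money = final - 35       # undo Mom giving back 35
    money = money * 2        # undo giving away half
    money = money + 20       # undo the fare
    money = money + 65       # undo the slippers
    return money


print(wallet_from_final(170))
print(amount_left(wallet_from_final(170)))
```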



2. Sketch a Picture
This is a great strategy whenever the situation can be drawn, especially when the problem
is hard to visualize. If possible, make your sketch big enough and include in it only the
pertinent data.

Example 2. There are 5 posts on every side of a square. How many posts are there in
all?

Solution. From the sketch, the two horizontal sides have 10 posts. Each vertical side
shares its two corner posts with the horizontal sides, so it needs only 3 additional posts
in order to have 5 posts on it. And so, there are a total of
10 + 3 + 3 or 16 posts.
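
A sketch can also be made with coordinates instead of a drawing. The following Python example places the posts on the boundary of a grid and counts the distinct positions, so corner posts are counted once; the function name `posts_on_square` is an assumption.

```python
def posts_on_square(per_side: int) -> int:
    """Count distinct posts on the boundary of a square with per_side posts on every side."""
    last = per_side - 1
    posts = {
        (row, col)
        for row in range(per_side)
        for col in range(per_side)
        if row in (0, last) or col in (0, last)   # keep only boundary positions
    }
    return len(posts)


print(posts_on_square(5))
```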

Example 3. It costs ₱15 to have a long pipe cut into 3 pieces. How much would it cost
to have it cut into 6 pieces?

Solution. Note that what is paid for is the cost of cutting. Now, we need only 2 cuts in a
long pipe to have 3 pieces of it; this means that 1 cut costs ₱15 ÷ 2 = ₱7.50.

To have the pipe cut into 6 pieces, 5 cuts must be done.

And so, the cost is (5 cuts) × (₱7.50 per cut) = ₱37.50.
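
The key observation, that the number of cuts is one less than the number of pieces, can be captured in a short Python sketch. The function name `cutting_cost` is hypothetical.

```python
def cutting_cost(pieces: int, cost_per_cut: float) -> float:
    """A pipe cut into `pieces` pieces needs pieces - 1 cuts."""
    return (pieces - 1) * cost_per_cut


# Two cuts cost 15 pesos, so one cut costs 7.50 pesos.
cost_per_cut = 15 / (3 - 1)

print(cutting_cost(6, cost_per_cut))
```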



3. Guess, Check, Revise


This strategy is most appropriate when multiple related conditions need to be met. Start
by guessing intelligently an answer that meets one condition, then check if the other
conditions are also met. If yes, you got the answer. If not, revise your guess and repeat
the process.

Example 4. In a 20-item exam, the point system is 5 points for every correct answer
and a 2-point penalty for every wrong answer. You scored a total of 79 points after
answering all 20 items. How many correct answers did you make?

Solution.
It’s a good idea to tabulate (keep track of) the results of your guesses, as in the
following. Guess #1 at 20 correct answers yields a total score of 100, which is way above
the actual score of 79. We need to revise Guess #1. Realize that there must be some wrong
answers.

            Number of            Total Score                Need to
            Correct    Wrong     5(correct) – 2(wrong)      Revise?
Guess #1    20         0         5(20) − 2(0) = 100         Yes. Too high

We need additional intelligent guesses.

            Number of            Total Score                Need to
            Correct    Wrong     5(correct) – 2(wrong)      Revise?
Guess #1    20         0         5(20) − 2(0) = 100         Yes. Too high
Guess #2    14         6         5(14) − 2(6) = 58          Yes. Too low
Guess #3    18         2         5(18) − 2(2) = 86          Yes. Too high
Guess #4    17         3         5(17) − 2(3) = 𝟕𝟗          No. BINGO!

You made 17 correct answers.

Alternatively (more elegantly), start from Guess #1, which yielded 100 points. This result
is 21 points (that’s 100 – 79) more than the actual score of 79. We have to “uncorrect”
some answers and make them wrong.

Now, observe that “uncorrecting” 1 answer and making it wrong lowers the total
points by 7 (that’s 5 for “uncorrecting” and another 2 for making it wrong).

Thus, to lower the result of Guess #1 by 21, we need to “uncorrect” 21/7 or 3
answers.

That is, there must be 20 – 3 or 17 correct answers.
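
Guess, check, and revise maps naturally onto a loop. The following Python sketch starts from the highest guess and revises downward until the score matches; the function names `score` and `find_correct` are assumptions for illustration.

```python
def score(correct: int, total_items: int = 20) -> int:
    """5 points per correct answer, minus 2 points per wrong answer."""
    wrong = total_items - correct
    return 5 * correct - 2 * wrong


def find_correct(target: int, total_items: int = 20):
    """Guess the number of correct answers, check the score, and revise downward."""
    for guess in range(total_items, -1, -1):
        if score(guess, total_items) == target:
            return guess
    return None  # no number of correct answers produces the target score


print(find_correct(79))

# Shortcut from the text: each "uncorrected" answer lowers the score by 7.
print(20 - (score(20) - 79) // 7)
```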



Example 5. On a farm, there are dogs and ducks. All in all, there are 90 feet but
only 35 heads. How many dogs and how many ducks are there?

Solution.
A dog has 4 feet and a duck has 2 feet. Assume there are 35 dogs. This would yield
35(4) = 140 feet, which is 50 more than the given 90 feet. We need to “un-dog” some,
meaning choose them to be ducks.

To “un-dog” one and make it a duck would lower the number of feet by 2 (that’s
“minus 4” for “un-dog-ing” and “plus 2” for making it a duck).

Thus, we need to “un-dog” 50/2 or 25 and make them ducks.

“Un-dog-ing” 25 (of the 35) yields 10 dogs and 25 ducks on the farm.
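
The “un-dog” argument translates directly into a few lines of Python. This is a sketch of that reasoning; the function name `dogs_and_ducks` and the error check are illustrative additions.

```python
def dogs_and_ducks(heads: int, feet: int):
    """Assume every head is a dog, then turn dogs into ducks until the feet match."""
    excess_feet = 4 * heads - feet          # feet too many if all animals were dogs
    if excess_feet < 0 or excess_feet % 2 != 0 or excess_feet // 2 > heads:
        raise ValueError("no whole-number solution for these counts")
    ducks = excess_feet // 2                # each dog-to-duck swap removes 2 feet
    dogs = heads - ducks
    return dogs, ducks


print(dogs_and_ducks(35, 90))
```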

4. Find a Pattern

Example 6. For a school project, Leonora uses toothpicks to design what looks like the
following figure. If a box of toothpicks contains 100 pieces, how many boxes does she
need to build 50 house-alikes?

Solution. The first house-alike needs 6 toothpicks.

But for the 2nd, 3rd, 4th, ... , 50th house-alikes (49 of them), observe the pattern of
needing only 5 toothpicks for each of the succeeding house-alikes.

So, in order to build all of the 50 house-alikes, Leonora needs 6 + 5(49) = 251
toothpicks.

Since there are only 100 pieces in a box of toothpicks, she needs to buy 3 boxes of
toothpicks. That would be more than enough for her project.
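
The pattern and the rounding up to whole boxes can be expressed as follows. This Python sketch assumes, as in the solution, that every house-alike after the first needs 5 toothpicks; the function names are hypothetical.

```python
def toothpicks_needed(houses: int) -> int:
    """6 toothpicks for the first house-alike, 5 for each succeeding one."""
    return 6 + 5 * (houses - 1)


def boxes_needed(houses: int, per_box: int = 100) -> int:
    """Round up to whole boxes using integer ceiling division."""
    return -(-toothpicks_needed(houses) // per_box)


print(toothpicks_needed(50), boxes_needed(50))
```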

5. Eliminate Impossible Cases

Example 7. With 5 darts all hitting the dart board, each earning a corresponding score of
either 1, 3, 5, 7, or 9 depending on where the dart lands on the board, which of the following
are possible total scores: 𝟑, 𝟗, 𝟐𝟗, 𝟑𝟓, 𝟒𝟐, 𝟓𝟎?

[Figure: dart board with regions worth 1 pt, 3 pts, 5 pts, 7 pts, and 9 pts]

Solution. Since a dart may only earn an odd score (1, 3, 5, 7, 9), the total score for 5
darts must be odd. So, it is impossible to earn a total of 42 points.

Considering the extreme cases, 5 darts all landing in the 1-pt region earn a total of 5
points while 5 darts all landing in the 9-point region earn a total of 45 points. So, 3 and
50 must now join 42 in the eliminated cases. This leaves only 9, 29, and 35.

Now,
9 can possibly be a result of 1-1-1-3-3.
29 can possibly be a result of 3-5-7-7-7.
35 can possibly be a result of 7-7-7-7-7.

Hence, 9, 29, and 35 are possible total scores.
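
The elimination can be double-checked by listing every way 5 darts can land. The Python sketch below uses `itertools.combinations_with_replacement` from the standard library; the function name `ways_to_score` is an assumption.

```python
from itertools import combinations_with_replacement

REGION_SCORES = (1, 3, 5, 7, 9)


def ways_to_score(total: int, darts: int = 5):
    """List every unordered combination of region scores for the darts that adds up to total."""
    return [
        combo
        for combo in combinations_with_replacement(REGION_SCORES, darts)
        if sum(combo) == total
    ]


for candidate in (3, 9, 29, 35, 42, 50):
    print(candidate, len(ways_to_score(candidate)) > 0)
```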

Example 8. Find the last digit in $2013^{2020}$.

Solution. Since we are asked only for the last digit, the problem may be simplified by
considering only the last digit of the base, that is, by considering $3^{2020}$.
Looking at the last digits of some powers $3^n$,

𝒏                      1   2   3   4   5   6   7   8
Last digit of $3^n$    3   9   7   1   3   9   7   1

Observe that as the exponent increases, only 3, 9, 7, or 1 appears as the last digit.
This eliminates 0, 2, 4, 5, 6, and 8 as a possible last digit.

Moreover, observe that the pattern repeats in cycles of 4.

Dividing the actual exponent 2020 by 4 gives exactly 505. This means that when $n = 2020$,
the pattern 3 9 7 1 has completed exactly 505 full cycles.

That is, when $n = 2020$ it is at the end of a cycle. So, the last digit must be a 1.
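
Python's built-in three-argument `pow` computes a power modulo a number without forming the huge power itself, which gives a direct check of this result. The variable names below are illustrative.

```python
# Last digits of 3^1 through 3^8: the remainder after division by 10.
cycle = [pow(3, n, 10) for n in range(1, 9)]
print(cycle)

# Last digit of 2013^2020, computed with modular exponentiation.
last_digit = pow(2013, 2020, 10)
print(last_digit)
```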

Example 9. Rene is working on a cryptarithm, a puzzle in which each letter is to be
replaced with a distinct 1-digit number so that the addition is correct.

      B A T H
    + B A T H
    ---------
    H A R O T

Which number should replace each letter?

Solution. To replace the letters with the correct corresponding numbers, we choose from
among 0 1 2 3 4 5 6 7 8 9.

Note that the sum has 5 digits. This implies that B must be 5 or more. Whatever it is, B
+ B can’t be 20 or more, even if there is a carry from A + A.

So H has to be a 1, which forces T = 2 and O = 4. So, we now have

      B A 2 1
    + B A 2 1
    ---------
    1 A R 4 2

For the remaining letters, we now choose only from 0, 3, 5, 6, 7, 8, 9, since 1, 2, and 4
are already taken.
Now A can’t be a 0 (why? if A=0, R=0 but A≠R since the letters are distinct 1-digit
numbers). Moreover, A can’t be a 3 (why? If A=3, R=6. Now what should B be so that
B+B=13? No possible value for B then).

Trying A = 5, we are then forced to have R = 0 and a carry of 1 is brought into B + B.

      1
      B 5 2 1
    + B 5 2 1
    ---------
    1 5 0 4 2
We now have to have B = 7, since B + B + 1 must equal 15. Thus,

      B A T H
    + B A T H
    ---------
    H A R O T

is

      1
      7 5 2 1
    + 7 5 2 1
    ---------
    1 5 0 4 2

Note: The value of A can’t be a 6, 7, 8, or 9. Why? (Try them.)
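
Eliminating impossible cases by hand can be cross-checked with an exhaustive search over all digit assignments. This Python sketch uses `itertools.permutations` from the standard library; the function name `solve_bath` is hypothetical, and the rule that leading letters cannot be 0 is an assumption commonly made for cryptarithms.

```python
from itertools import permutations


def solve_bath():
    """Try every assignment of distinct digits to B, A, T, H, R, O and keep the valid ones."""
    solutions = []
    for b, a, t, h, r, o in permutations(range(10), 6):
        if b == 0 or h == 0:
            continue  # assumed rule: a number may not start with 0
        bath = 1000 * b + 100 * a + 10 * t + h
        harot = 10000 * h + 1000 * a + 100 * r + 10 * o + t
        if bath + bath == harot:
            solutions.append((bath, harot))
    return solutions


print(solve_bath())
```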

POST-ASSESSMENT

Answer the following problems to train your mind.

1. In the backyard, there are pigs and ducks. They have 29 heads and 92 legs. How
many animals are pigs and how many are ducks?
2. Place the numbers 1 to 9, one in each circle so that the sum of the four numbers
along any of the three sides of the triangle is 20. There are 9 circles and 9 numbers
to place in the circles. Each circle must contain a different number in it.
3. Find the digit represented by each letter in the coded letters. Each letter must
stand for a unique digit.

      P I T O
    +   I S A
    ---------
      W A L O

4. A mathematics test consists of ten items. Five points are given for each correct
answer and two points are deducted for each wrong answer. If Madelyn answered all
questions and scored 22, how many incorrect answers did she have?
5. Rose sells guavas and guyabanos in her fruit stand. Each guava costs one amount
and each guyabano costs another amount. Five guavas and 1 guyabano cost ₱60. Two
guavas and 3 guyabanos cost ₱61. At these prices, how many pesos do 12 guavas
and 5 guyabanos cost?

REFERENCE

Aufman, R. N., Lockwood, J., & Richard, D. (2013). Logic. In Mathematical Excursions
(3rd ed.). Brooks/Cole, Cengage Learning.
Central Luzon State University
Science City of Muñoz 3120
Nueva Ecija, Philippines

Instructional Module for


Mathematics in the Modern World

Chapter 4
Data Management
Overview

During the Crimean War in Victorian England, Florence Nightingale (1820–1910)
took on a mission to improve the squalid field hospital conditions of the British
army. She compiled massive amounts of data from the army files, which she used to
convince members of the British Parliament about the need to supply nursing and
medical care for soldiers in the field. Through a remarkable series of graphs, she
used statistics to demonstrate that most of the deaths in the war were due to illness
contracted outside of battle or from wounds that went untreated. Her compassion and
self-sacrificing nature, coupled with her ability to collect, arrange, and present large
amounts of data, led to her being regarded as the Passionate Statistician.
(https://www.coursehero.com/file/p6unj1f/Descriptive-statistics-utilizes-numerical-and-graphical-methods-to-look-for/)

The above story clearly illustrates the importance of being able to efficiently
collect, organize and manage data. In this chapter, we briefly discuss data
management, which is mainly a topic under the field of Statistics.

Objectives
On successful completion of the module, students will be able to:
1. Advocate the use of statistical data in making important decisions.
2. Discuss and interpret data.
3. Understand and interpret the different measures of central tendency,
measures of dispersion, and measures of relative position.
4. Use a variety of statistical tools to process and manage numerical data.

Statistics
Statistics is the science of collecting, organizing and summarizing recorded
information or data (descriptive statistics) in such a way that a valid conclusion and
meaningful predictions can be drawn from them (inferential statistics).
Mathematics in the Modern World | 4. Data Management

Types of Statistics
1. Descriptive statistics consists of methods concerned with the collection,
description and analysis of data without drawing conclusions or inferences about a
larger set. Its main concern is simply to describe the set of data such that
otherwise obscure information is brought out clearly.
2. Inferential statistics utilizes sample data to make estimates, decisions, predictions,
or other generalizations about a larger set of data.

Variables
In statistics, a variable refers to a specific characteristic (or attribute) of a
subject. Such an attribute may assume two or more different values. For example, the
"sex" of a person is a variable; its value is either 'male' or 'female'. Other examples of
variables are your course, citizenship, age, height and weight.

Types of Variables
1. Qualitative variables are those whose values are measured not in terms of
numbers, but categorically by means of description. Examples are "course",
"citizenship", "favorite color" and "place of birth".
2. Quantitative variables are those that are always associated with numbers or a
scale measure. Examples are “age”, “height”, “weight” and “population”.

The measurement of a variable may either be discrete (integer) or continuous,
and is classified into one of the following scales of measurement:
1. Nominal – characterized by data that consists of names, labels, codes or
categories only. These data cannot be arranged in an ordering scheme and cannot
be used for calculations. Examples are gender, citizenship, religion, house number,
plate number, ID number, and zip code.
2. Ordinal – involves data that may be arranged in some order. Examples are sizes
(small, medium, large), socio-economic class (working, middle, upper),
educational attainment, and the Likert scale (strongly disagree, disagree, neutral,
agree, strongly agree).
3. Interval – measurements where the difference between values is meaningful.
Examples are temperature in Celsius, pH level, and IQ.
4. Ratio – measurements are ordered according to the amount of the attribute they
possess. Equal differences in the attribute are represented by equal differences in
the numbers assigned. On a ratio scale, zero means the absence of something. Temperature
in Celsius or Fahrenheit is not on a ratio scale because 0 °C or 0 °F does not mean the
absence of temperature, while temperature in kelvins is an example of a ratio scale
since 0 K means an absence of heat. Examples are height, weight, age, and
cellphone load.

Nominal and ordinal are scales for qualitative variables, while interval and ratio are
scales for quantitative variables.


Population versus Sample


In statistics, a population refers to the entire set of all objects under study; while
a sample refers to any subset of the population.

Illustration 1:
Consider an upcoming election for Provincial Governor. A candidate spends time,
money and effort to conduct a survey on who is likely to be the next governor.
Statistically, the whole list of voters in the province is what is referred to as the
population for the survey. But inasmuch as it would be very costly and virtually
impossible to interview every voter in the province, only a few will be actually
interviewed. Such a few voters are what are referred to as the sample. Results from the
sample will then be used to project the trend of the whole population.
That is, data are collected from a sample and then summarized in order to draw a
conclusion that is taken to be true for the whole population. Thus, a good sample is one
that truly represents the population, so that conclusions made from the sample are valid
for the entire population. If a sample is bad, then conclusions from it may not be valid
for the population. The fact is, information could change from one sample to another
sample of the same population.

Illustration 2:
A student researcher wants to do a survey among CLSU students. Instead of
doing a survey of all the students in CLSU, he just chose and surveyed a group of 45
students (five students per college). In this scenario, the population is all the students
of CLSU, while the sample is the group of 45 students.

Organizing Data
Considered as Phase I of organizing data is data collection, where each element
of the data is called a data point. Generally in this phase, the raw data may not show
any apparent pattern or trend.

Illustration 3:
Phase I. The following data are the respective number of kids of 50 families.
0 2 1 0 3 2 0 1 1 0
0 1 1 2 4 1 0 1 1 0
2 1 0 0 3 0 0 1 2 1
0 0 2 4 1 1 0 1 2 0
1 1 0 3 5 1 2 1 3 2

The raw data, as presented, look like nothing more than a list of numbers. But if we
organize the data (Phase II), they become more meaningful.

Frequency Distribution Table


The most common way of organizing data is using a frequency distribution table
or FDT. It utilizes a table that lists all data points, along with how many times each data
point occurs (frequency, f) and its percentage of the total number of data points (relative
frequency).

Illustration 4: (Ungrouped Data)


Phase II. Frequency distribution of the data in Illustration 3
# of Kids Tally Frequency Relative Frequency
0 IIII – IIII – IIII - I 16 32 %
1 IIII - IIII – IIII - III 18 36 %
2 IIII – IIII 9 18 %
3 IIII 4 8%
4 II 2 4%
5 I 1 2%
Total = 50 100 %
Observe that the data have become more meaningful; for example, we can now
see that the majority (a total of 86%) of the families are small-sized, with only 2 or fewer
kids.
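The tallying in Phase II can also be sketched in a few lines of Python. This is only an illustration of the idea: the helper name `frequency_table` is hypothetical, and the data are the 50 values listed in Illustration 3.

```python
from collections import Counter

# Number of kids in each of the 50 families (Illustration 3).
kids = [
    0, 2, 1, 0, 3, 2, 0, 1, 1, 0,
    0, 1, 1, 2, 4, 1, 0, 1, 1, 0,
    2, 1, 0, 0, 3, 0, 0, 1, 2, 1,
    0, 0, 2, 4, 1, 1, 0, 1, 2, 0,
    1, 1, 0, 3, 5, 1, 2, 1, 3, 2,
]


def frequency_table(data):
    """Return rows of (value, frequency, relative frequency in percent)."""
    counts = Counter(data)
    n = len(data)
    return [(value, counts[value], 100 * counts[value] / n)
            for value in sorted(counts)]


for value, freq, rel in frequency_table(kids):
    print(f"{value:>9} {freq:>9} {rel:>8.0f}%")
```

The frequencies it reports (16, 18, 9, 4, 2, 1) match the table in Illustration 4.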
Note that in Illustration 3, there are only a few distinct data points (0, 1, 2, 3, 4,
or 5). If there are many distinct data points, it is better to group together the data that
belong to the same interval, as illustrated below.

Illustration 5. (Grouped Data)


Phase I. The following are examination scores of 42 mathematics students.
26 16 21 34 45 18 41
48 27 22 30 39 62 25
29 31 28 20 56 60 24
32 33 18 23 27 46 30
49 59 19 20 23 24 38
25 61 34 22 38 28 62

Phase II. We organize the raw data into a frequency distribution. First, we must decide
on how many groups to use. Customarily, the number of groups is any number from
4 to 8. Say we use 6 groups here. Second, we determine the class interval (the width
of each group). This is done by

$$\text{class interval} = \frac{\text{highest value} - \text{lowest value}}{\text{number of groups}} = \frac{62 - 16}{6} = \frac{46}{6} \approx 7.67$$

In order to be consistent with the data, which are integers, we round it off to 8.
Round off the class interval in such a way that it has the same number of decimal
places as the given data.

Determine and enumerate the class intervals. Each class interval is defined by its
lower and upper class limits. There must be enough classes to include the lowest and
the highest values. As a rule, the lowest value in the data becomes the lower limit (LL)
of the first class interval. Adding the class interval to the lower class limit of the
preceding class obtains the succeeding lower limits. Upper class limits (UL) are obtained
using the formula:

$$UL = LL + \text{class interval} - 1 \text{ unit of measure}$$

For example, the first class here runs from 16 to 16 + 8 – 1 = 23.

Hence, the frequency distribution is

Score (x)   Tally                 Frequency   Relative Frequency
16 - 23     IIII – IIII – I       11          26%
24 - 31     IIII – IIII – III     13          31%
32 - 39     IIII – II             7           17%
40 - 47     III                   3           7%
48 - 55     II                    2           5%
56 - 63     IIII – I              6           14%
Total                             n = 42      100%
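The grouping steps can be sketched in code as follows. This is a minimal illustration under stated assumptions: the data are integers (so one unit of measure is 1), the class interval is rounded up with `math.ceil`, and the helper name `grouped_frequencies` is hypothetical.

```python
import math

# Examination scores of the 42 students (Illustration 5).
scores = [
    26, 16, 21, 34, 45, 18, 41,
    48, 27, 22, 30, 39, 62, 25,
    29, 31, 28, 20, 56, 60, 24,
    32, 33, 18, 23, 27, 46, 30,
    49, 59, 19, 20, 23, 24, 38,
    25, 61, 34, 22, 38, 28, 62,
]


def grouped_frequencies(data, num_groups):
    """Return rows of (lower limit, upper limit, frequency) for integer data."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_groups)  # class interval, rounded up
    rows = []
    lower = low  # the lowest value is the first lower limit
    while lower <= high:
        upper = lower + width - 1  # one unit of measure is 1 for integers
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, freq))
        lower += width
    return rows


for lower, upper, freq in grouped_frequencies(scores, 6):
    print(f"{lower} - {upper}: {freq} ({100 * freq / len(scores):.0f}%)")
```

With 6 groups this reproduces the classes 16 - 23 through 56 - 63 and the frequencies in the table above. If the range divides evenly by the number of groups, the loop adds one extra class so that the highest value is still covered.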

Histogram

Data that are grouped in intervals can be depicted by a histogram, which is
actually a bar graph that shows how the data are distributed. The histogram for the
data in Illustration 5 is:

Figure 1. Frequency Distribution of Examination Scores of 42 Students

[Histogram: horizontal axis "Scores" with class boundaries 16, 24, 32, 40, 48, 56, 64;
vertical axis "Frequency (f)" marked at 3, 6, 9, 12, 15]

Note that a histogram should show an accurate comparison of the data. That is,
the heights of the rectangles must correspond to the frequencies of the intervals, and
the rectangles must all have the same width, since every class has the same
class interval.

Pie Charts

The data used in the preceding examples were all quantitative (numerical). For
qualitative (categorical) data especially, an easy way to summarize data is through the
use of a pie chart. Pie charts are used to clearly show what part of the whole is
accounted by a specific characteristic.

The sectors may be arranged either clockwise or counterclockwise. After deciding
on the direction, place the sector with the highest relative frequency starting from
12 o'clock, and arrange the remaining sectors in decreasing order of relative
frequency.


Illustration:
In Brgy. Bacal Cuatro, Talabira, the marital status of its adult population in 2020
is tabulated below:

Marital Status Frequency Relative Frequency


Single 50 25%
Married 113 56.5%
Widowed 28 14%
Separated 9 4.5%
Total 200 100%

A pie chart to summarize the tabulated data is:

Figure 2. Percent Distribution of Marital Status, Barangay Bacal Cuatro, Talabira: 2020

[Pie chart with four sectors: Married 56.5%, Single 25%, Widowed 14%, Separated 4.5%]

The whole reason for constructing a pie chart is to convey information visually; it
should enable the reader to compare easily the relative proportions of the categorical
data. Thus, every slice of the pie should correspond to the relative frequency, which is
also written in the label. Using different colors for every slice in the pie may also help.
And, if the names of the categories are too long, a legend may be used.
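As a rough sketch of how the slices can be prepared, the code below sorts the categories in decreasing order of frequency and computes each relative frequency. The central angle (relative frequency times 360 degrees) is an added assumption about how one would draw the slices; it is not stated in the module. The helper name `pie_sectors` is hypothetical.

```python
# Marital status frequencies from the table above.
marital_status = {"Single": 50, "Married": 113, "Widowed": 28, "Separated": 9}


def pie_sectors(frequencies):
    """Return (label, percent, angle in degrees), largest sector first."""
    total = sum(frequencies.values())
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [(label, 100 * count / total, 360 * count / total)
            for label, count in ordered]


for label, percent, angle in pie_sectors(marital_status):
    print(f"{label:<10} {percent:5.1f}%  {angle:6.1f} degrees")
```

The ordering it produces (Married, Single, Widowed, Separated) is the decreasing order of relative frequency described earlier.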


Measures of Central Tendency

A measure of central tendency is a value that indicates where the center of a
distribution tends to be located, or simply the average of the data. It is said to form the
basis of statistics. The most common measures of central tendency are the mean,
median, and mode. In a perfect normal distribution, all three measures of central
tendency are located at the same score, which is at the center of the normal
distribution.

Mean

The mean is the most commonly used measure of central tendency. The mean of
a data set is the sum of the data points divided by the number of data points, or simply
the average of the data points. Thus, it is strongly influenced by outliers (data points
that are extremely low or extremely high compared to other data points). The
population mean, denoted by $\mu$, is estimated by the sample mean, denoted by $\bar{x}$:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

where $x_1, x_2, \ldots, x_n$ are the data points and $n$ is the number of data points.

Some characteristics of the mean are the following:


1. The sum of deviations of the data points from the mean is zero. (A deviation is the
difference between a data point and a reference value, here the mean.)
2. The sum of the squared deviations of the data points is minimum when the
deviations are taken from the mean.
3. If a constant c is added to (or subtracted from) every data point, the new mean is the
original mean increased (or decreased) by c.
4. If every data point is multiplied (or divided) by a constant c, the new mean is the
original mean multiplied (or divided) by c.
5. Since the mean is a calculated number, it may not be an actual value in the data
points.
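The characteristics listed above can be checked numerically on a small data set. The sketch below uses four illustrative scores (43, 35, 39, 47) chosen so that the mean is a whole number; the helper names `mean` and `sum_squared_deviations` are hypothetical.

```python
def mean(data):
    """Sum of the data points divided by the number of data points."""
    return sum(data) / len(data)


def sum_squared_deviations(data, center):
    """Sum of squared deviations of the data points from a given center."""
    return sum((x - center) ** 2 for x in data)


scores = [43, 35, 39, 47]
m = mean(scores)

# 1. The deviations from the mean add up to zero.
deviations = [x - m for x in scores]
print(sum(deviations))

# 2. Squared deviations are smallest when taken from the mean.
print(sum_squared_deviations(scores, m),
      sum_squared_deviations(scores, m + 1),
      sum_squared_deviations(scores, m - 1))

# 3. Adding a constant c shifts the mean by c.
print(mean([x + 5 for x in scores]) - m)

# 4. Multiplying by a constant c multiplies the mean by c.
print(mean([2 * x for x in scores]) / m)
```

Whole-number data keep the arithmetic exact here. With decimal data, the sum of deviations may differ from zero by a tiny rounding error.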

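Example 1: The data below are the current diesel prices (in pesos/liter) in nearby gas
stations. Find the mean price.
43.80 44.10 42.95 43.80 44.30 39.00 44.30 43.80

Solution:
$$\bar{x} = \frac{43.80 + 44.10 + 42.95 + 43.80 + 44.30 + 39.00 + 44.30 + 43.80}{8} = \frac{346.05}{8}$$
$$\bar{x} \approx 43.26 \text{ pesos/liter}$$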


Example 2: Gabriel has a total of 4 quizzes. One quiz is missing while the scores of his
remaining quizzes are 43, 35 and 39. Calculate the score of the missing quiz if his
mean score is 41.

Solution:
Let $x$ denote Gabriel's score in his missing quiz.
$$\bar{x} = \frac{43 + 35 + 39 + x}{4} = 41$$
$$4(41) = 117 + x$$
$$x = 164 - 117 = 47$$

Example 3: In a class of 18 men and 22 women, the mean score of men in a quiz is 38
while the mean score of women is 35. Find the mean score of the whole class.

Solution:
$$\bar{x} = \frac{18(38) + 22(35)}{18 + 22} = \frac{684 + 770}{40} = \frac{1454}{40}$$
$$\bar{x} = 36.35$$

Mean of Grouped Data

In grouped data, we do not know the individual data points. In such situations,
we use the midpoints of the intervals to represent individual scores. Consequently, the
mean of the grouped data is only an approximation.
$$\bar{x} = \frac{\sum f x_m}{\sum f}$$
where $x_m$ is the midpoint of each interval and $f$ is the frequency of each interval.

Example 4: Find the mean score of 42 students from the following frequency
distribution:
Score Frequency
16 - 23 11
24 - 31 13
32 - 39 7
40 - 47 3
48 - 55 2
56 - 63 6

Solution:
Step 1: Add two columns for Midpoint ($x_m$) and $f x_m$, and compute their values. The
midpoint is half of the sum of the lower limit and the upper limit of each interval
(for example, (16 + 23)/2 = 19.5), while $f x_m$ is the product of the frequency
and the midpoint in each interval.
Step 2: Compute $\sum f$ and $\sum f x_m$.
Step 3: Use the formula $\bar{x} = \frac{\sum f x_m}{\sum f}$ to get the mean of the grouped frequency distribution.

Score     Midpoint ($x_m$)   Frequency ($f$)   $f x_m$
16 - 23   19.5               11                11(19.5) = 214.5
24 - 31   27.5               13                13(27.5) = 357.5
32 - 39   35.5               7                 248.5
40 - 47   43.5               3                 130.5
48 - 55   51.5               2                 103.0
56 - 63   59.5               6                 357.0
Total                        42                1411

Finally, $\bar{x} = \frac{1411}{42} \approx 33.60$
Note: Actually, the data in this example are those used in Illustration 5 of this chapter.
The reader is urged to compute the actual mean which is 33.64. It only shows that the
mean of a grouped data is just an approximation of the actual mean.
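The three steps above can be sketched in code. This is only an illustration: the helper name `grouped_mean` is hypothetical, and each class is written as a (lower limit, upper limit, frequency) triple taken from the table in Example 4.

```python
# (lower limit, upper limit, frequency) for each class in Example 4.
classes = [
    (16, 23, 11),
    (24, 31, 13),
    (32, 39, 7),
    (40, 47, 3),
    (48, 55, 2),
    (56, 63, 6),
]


def grouped_mean(class_rows):
    """Approximate the mean using class midpoints weighted by frequency."""
    total_f = sum(f for _, _, f in class_rows)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in class_rows)
    return total_fx / total_f


print(round(grouped_mean(classes), 2))
```

The result agrees with the value 1411/42 computed in the table and, as the note says, is close to but not equal to the mean of the raw scores.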

Median

The median is a value that separates an array of data points into two equal parts.
To find it, the data first need to be arranged in numerical order. If there is an odd
number of data points, then the median is the middle value. If there is an even number
of values in the data set, then the median is the average of the two middle values. The
median is commonly denoted by $\tilde{x}$.

Unlike the mean, the median is not affected by extreme values in the data points because
it only considers the middle values in the data set.

Example 5: Calculate the median age of the seven employees.


25 31 25 62 49 50 38

Solution: First, we need to arrange the data from lowest to highest.


25 25 31 38 49 50 62
Since there are 7 (odd) data points, the median is the middle value which is 38.

Example 6: The current crude oil prices (in pesos/liter) in nearby gas stations are listed
below. Find the median price.
43.80 44.10 42.95 43.80 44.30 39.00 44.30 43.90


Solution: First, arrange the data from lowest to highest.

39.00 42.95 43.80 43.80 43.90 44.10 44.30 44.30

Since there are 8 (even) data points, the median price is the average of the two middle
values, 43.80 and 43.90, which is 43.85 pesos/liter.
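As a quick check of Examples 5 and 6, Python's standard `statistics.median` follows the same rule: it sorts the data and returns the middle value, or the average of the two middle values when the count is even. The variable names below are illustrative.

```python
import statistics

# Example 5: ages of the seven employees (odd number of data points).
ages = [25, 31, 25, 62, 49, 50, 38]
median_age = statistics.median(ages)

# Example 6: crude oil prices in pesos/liter (even number of data points).
prices = [43.80, 44.10, 42.95, 43.80, 44.30, 39.00, 44.30, 43.90]
median_price = statistics.median(prices)

print(median_age, round(median_price, 2))
```

The data do not need to be sorted beforehand, since the function arranges them internally.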

Mode

The mode of a data set is the data point that occurs most often. If no data point is
repeated or every data point is repeated the same number of times, there is no mode.
If the mode of a data set exists, it may not be unique. A unimodal data set has one
mode, bimodal has two modes, trimodal has three modes and multimodal has many
modes. The mode can be used for qualitative as well as quantitative data.
The mode is not affected by extreme values in the data set, since it only considers
the most frequent data. The mode is commonly denoted by $\hat{x}$.

Example 7: Find the mode of each of the following data sets:


a. 1, 2, 3, 4, 5, 6, 7, 8 b. 1, 2, 3, 4, 1, 2, 3, 4 c. 5, 8, 4, 8, 6, 7, 5, 3

Solution:
a. There is no mode because no data point is repeated.
b. There is no mode because every data point occurs exactly twice.
c. The modes are 5 and 8, since each occurs twice while every other data point occurs once.

Example 8: Thirty students are asked about their favorite color. The data is summarized
by the frequency distribution table below. Find the mode.

Color Frequency
Yellow 2
Blue 5
Red 5
White 8
Black 10
The mode is black, since it has the highest frequency.
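The rule used in Examples 7 and 8 can be sketched as a small function. The helper name `modes` is hypothetical, and it follows this module's convention that a data set has no mode when every distinct value occurs equally often. A data set containing only one distinct value is treated here as having that value as its mode, which is an assumption.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of modes, or [] when there is no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # Convention from the text: equal counts everywhere means no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([1, 2, 3, 4, 5, 6, 7, 8]))
print(modes([1, 2, 3, 4, 1, 2, 3, 4]))
print(modes([5, 8, 4, 8, 6, 7, 5, 3]))

# Example 8: favorite colors, rebuilt from the frequency table.
color_frequencies = {"Yellow": 2, "Blue": 5, "Red": 5, "White": 8, "Black": 10}
colors = list(Counter(color_frequencies).elements())
print(modes(colors))
```

Because the function only counts occurrences, it works for qualitative data such as colors as well as for numbers.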


In some situations, the measures of central tendency cannot provide enough
information to lead to a valid conclusion, especially when two or more sets of
data need to be compared. In the following example, a weakness of the mean, median
and mode is illustrated.

Suppose that we are choosing between Jerico and Jerwin on who should represent
CLSU in an upcoming Inter-University Math Quiz Bee. To choose, their coach conducted
6 practice quizzes between them, and came up with the following scores:

Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 Quiz 6


Jerico 83 65 100 92 85 85
Jerwin 81 85 74 85 90 95

So, after the 6 quizzes, Jerico and Jerwin were tied at 3 wins and 3 losses. Who
should be chosen? Looking at their averages (verify):

Mean Median Mode


Jerico 85 85 85
Jerwin 85 85 85

Surprisingly, they are again tied in these measures. The mean, median, and
mode cannot help in deciding who should be sent to the Quiz Bee!
Another approach that could help is to look at their consistency, that is, to use a
measure of variability to see how spread apart or dispersed their scores are.
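The tie in all three averages can be verified with Python's standard `statistics` module. This is a small check of the table above; the dictionary layout and variable names are illustrative.

```python
import statistics

quiz_scores = {
    "Jerico": [83, 65, 100, 92, 85, 85],
    "Jerwin": [81, 85, 74, 85, 90, 95],
}

# (mean, median, mode) for each contestant.
summary = {
    name: (statistics.mean(s), statistics.median(s), statistics.mode(s))
    for name, s in quiz_scores.items()
}

for name, (mean, median, mode) in summary.items():
    print(name, mean, median, mode)
```

Both contestants come out with 85 for every measure, which is why a measure of spread is needed to separate them.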

Measures of Variability

A measure of variability (or dispersion) is a quantity that measures the spread of
scores in a given population. It indicates the extent to which observations in a data set
are scattered about the mean. Scores that are relatively close together have a lower
variation as compared to scores that are spread farther apart. To measure the spread
or dispersion of data, we use statistical values known as the range, variance and
standard deviation. These three are the most common measures of variability.

Range

The range, denoted by R, is the difference between the highest and the lowest
values in a data set. A weakness of the range is that an extreme value (outlier) can
greatly alter its value.
R = Highest Value – Lowest Value

For example, Jerico's range is 100 – 65 or 35; Jerwin's range is 95 – 74 or 21.
This indicates that the scores of Jerico are more spread apart.


Variance and Standard Deviation

First, we define deviation to be $x - \bar{x}$, where $x$ is a data point and $\bar{x}$ is the
mean. It is the difference of a data point from the mean.

Now, in order to test their consistency, it may be tempting to average their
deviations. But, as we can see in the following table, the sum of the deviations is
always 0. As a result, Jerico's and Jerwin's average deviations are both zero
as well.

Jerico ($\bar{x} = 85$)                  Jerwin ($\bar{x} = 85$)
Score    Deviation ($x - \bar{x}$)       Score    Deviation ($x - \bar{x}$)
83       –2                              81       –4
65       –20                             85       0
100      15                              74       –11
92       7                               85       0
85       0                               90       5
85       0                               95       10
Total    0                               Total    0

Generally, in any set of data, it can be shown algebraically that the sum of the
deviations is always 0. The negatives always cancel out the positives. So, in order to
use deviations effectively to study how the data are dispersed, the remedy is to square
each deviation. This leads to what is called the variance.

Variance is the mean of the squared deviations of the data points. The sample
variance (denoted by $s^2$) is an estimator of the population variance (denoted by $\sigma^2$). In
symbols, the sample variance of data points $x_1, x_2, \ldots, x_n$, where $n$ is the number of data
points, is defined as

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Note: 1. If the data points represent the entire population, the divisor used is $n$.
But for sample data points, the divisor is $n - 1$. It has been a general
observation, agreed upon by statisticians, that using $n - 1$ rather than $n$
produces a better estimate of the true population variance.

2. Remember that the variance of a sample is an estimate of the variance of
the population. Since there are far more data points in a population, the
population tends to vary more as compared to a sample. Thus, using n as
divisor in a sample tends to underestimate the true variance of the population.
Statisticians determined that using n – 1 would compensate for such an
underestimation.


3. Alternatively, the variance may be computed more quickly and easily by
the equivalent formula below. We don't need the mean in using this formula.

$$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$$

Variance is a tool that enables us to measure the typical deviation found in a set of
data, by using the individual deviations of the data points. Recall that the deviations
were squared in order to keep the negative deviations from cancelling out the positives.
Finally, we undo the squaring process by taking the square root. The result is
what is called the standard deviation.

Standard deviation is defined as the square root of the variance and is
denoted by $s$ (for a sample) or $\sigma$ (for a population). Thus,

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Example: Compute the respective (a) variance and (b) standard deviation of the scores
of Jerico and Jerwin.

Solution:

a. Using the formula $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$,

Jerico ($\bar{x} = 85$)                                   Jerwin ($\bar{x} = 85$)
Score   Deviation ($x - \bar{x}$)   $(x - \bar{x})^2$     Score   Deviation ($x - \bar{x}$)   $(x - \bar{x})^2$
83      –2                          4                     81      –4                          16
65      –20                         400                   85      0                           0
100     15                          225                   74      –11                         121
92      7                           49                    85      0                           0
85      0                           0                     90      5                           25
85      0                           0                     95      10                          100
$\sum (x - \bar{x})^2 = 678$                              $\sum (x - \bar{x})^2 = 262$
$s^2 = \frac{678}{6 - 1} = 135.6$                         $s^2 = \frac{262}{6 - 1} = 52.4$

And so, Jerico's variance is 135.6 and Jerwin's variance is 52.4.

Take note that each value in the Deviation column is computed by subtracting the
mean from the score, for example 83 – 85 = –2, 65 – 85 = –20, 100 – 85 = 15, and so on;
while each value in the $(x - \bar{x})^2$ column is computed by squaring the corresponding value
in the $x - \bar{x}$ column, for example $(-2)^2 = 4$, $(-20)^2 = 400$, $(15)^2 = 225$, and so on.


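Other solution: Using the alternative variance formula
$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$, we
need to find the sum of the data points and the sum of the squares of the data points. We
don't need the mean of the data.

Jerico                               Jerwin
Score ($x$)    $x^2$                 Score ($x$)    $x^2$
83             6 889                 81             6 561
65             4 225                 85             7 225
100            10 000                74             5 476
92             8 464                 85             7 225
85             7 225                 90             8 100
85             7 225                 95             9 025
Σx = 510       Σx² = 44 028          Σx = 510       Σx² = 43 612

Jerico: $s^2 = \frac{1}{6 - 1}\left[44\,028 - \frac{(510)^2}{6}\right] = \frac{1}{5}\left[44\,028 - 43\,350\right] = 135.6$

Jerwin: $s^2 = \frac{1}{6 - 1}\left[43\,612 - \frac{(510)^2}{6}\right] = \frac{1}{5}\left[43\,612 - 43\,350\right] = 52.4$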

Note that the two formulas for variance yield the same result. This is always the
case. In fact, it may be proven algebraically that the formulas are equivalent.

b. Finally, their respective standard deviations are

Jerico: $s = \sqrt{135.6} \approx 11.64$          Jerwin: $s = \sqrt{52.4} \approx 7.24$

Standard deviation (and variance) is a relative measure of the dispersion of a set
of data; the larger the standard deviation, the more spread out the data are. For a single
set of data, it may not be very informative. It is most useful in comparing the
(in)consistencies of two sets of data of the same type. The set with a lower standard
deviation contains data that are more consistent; the set with a higher standard
deviation contains data that are more spread out or dispersed (less consistent).

So, between Jerico and Jerwin in the example, Jerwin wins as far as
consistency is concerned because he has a lower standard deviation, 7.24, as compared
to Jerico's 11.64.
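The two variance formulas and the standard deviations can be checked with a short script. The function names `sample_variance` and `sample_variance_shortcut` are hypothetical helpers written for this sketch. The standard library functions `statistics.variance` and `statistics.stdev` also divide by n – 1, so they serve as a cross-check.

```python
import math
import statistics

jerico = [83, 65, 100, 92, 85, 85]
jerwin = [81, 85, 74, 85, 90, 95]


def sample_variance(data):
    """Definition: sum of squared deviations from the mean, over n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Equivalent formula that uses the sum of x and the sum of x squared."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


for name, data in (("Jerico", jerico), ("Jerwin", jerwin)):
    variance = sample_variance(data)
    print(name,
          round(variance, 2),
          round(sample_variance_shortcut(data), 2),
          round(statistics.variance(data), 2),
          round(math.sqrt(variance), 2))
```

All three variance computations agree for each contestant, and the square roots give the standard deviations used in the comparison above.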


Measures of Relative Position

As discussed earlier, the measures of central tendency, especially the mean and
the median, describe the 'center' of a distribution. Indeed, such a center is what is
usually used and needed to summarize a distribution. Occasionally, however, a different
part of the distribution is of more interest. The percentile, decile, and quartile are
used on such occasions, as they indicate the location of a data point relative to the other
data points.

Percentiles
Percentiles split the whole distribution into 100 subgroups. It is similar to cutting
a long pipe into 100 short pipes of equal lengths. In order to do this, it is necessary to
make 99 cuts. The points where the cuts are done correspond to percentile ranks or
scores. Thus, percentile ranks are from 1 to 99, which we hereby denote by P1, P2, P3,
…, P99. It makes no sense to have a P0 or a P100.

A percentile is a value that describes the percentage of data that falls below it.
For example, suppose you got a 99th percentile score in an exam. It means that 99% of
the examinees scored lower than you; it doesn't mean that you had a score of 99%. In
fact, your actual score is not at all indicated.

Illustration:
Suppose that Sonny is among the 15,000 high school graduates who took
the CLSU Admission Test, and he got a 48th percentile score.

His 48th percentile score means that 48% of the 15,000 examinees (7,200)
scored lower than Sonny. It doesn't mean that his actual score in the exam is 48.
On the other hand, his actual score is lower than that of 52% of the 15,000 examinees
(7,800).

Suppose another student Nick got a percentile score of 68. This means
that 68% of the 15,000 examinees (10,200) scored lower than Nick while 32%
or 4,800 examinees scored higher than him.

The actual scores of Sonny and Nick both remain unknown, until we do
some calculations that also involve the whole distribution of data points, their
percentile scores, and the number of data points.

Calculating Percentiles
To find the data point that corresponds to a percentile score p, the following steps
are suggested.
1. Arrange the data points numerically from lowest to highest.
2. Find the location $L_p$ of the data point by the formula

$$L_p = \frac{p}{100}(n + 1)$$

where n = number of data points.
3. Use $L_p$ to find the data point.
a. If the computed $L_p$ is an integer k, then the data point is in the kth
position of the arranged data.
b. If the value of $L_p$ includes a decimal such as k.d, then the data point Pp is
(kth data) + 0.d[(k+1)th data – kth data]

Example 1: Find P25 and P80 from the following data:


2 6 4 5 3 6 5 4 3 3 2 4 5 4 6

Solution: Note that P25 and P80 refer to the 25th and 80th percentiles, respectively.
Step 1. Arrange the data in ascending order:
Location     1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Data Point   2  2  3  3  3  4  4  4  4  5   5   5   6   6   6

Step 2. $L_{25} = \frac{25}{100}(15 + 1) = 4$ and $L_{80} = \frac{80}{100}(15 + 1) = 12.8$

Step 3. Since L25 = 4 (an integer), P25 = 4th data = 3.
Since L80 = 12.8 (with a decimal),
P80 = 12th data + 0.8(13th – 12th)
= 5 + 0.8(6 – 5)
= 5.8
The data points that correspond to the 25th and 80th percentiles are 3 and 5.8, respectively.
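The three steps can be sketched as one function. The name `percentile` is a hypothetical helper, p is assumed to be a whole number from 1 to 99, and the handling of locations that fall before the first or after the last data point (returning the end value) is an assumption not covered by the steps above.

```python
def percentile(data, p):
    """Data point at percentile p, using the location L_p = p(n + 1)/100."""
    ordered = sorted(data)                   # Step 1: arrange lowest to highest
    n = len(ordered)
    k, hundredths = divmod(p * (n + 1), 100)  # Step 2: L_p = k + hundredths/100
    if k < 1:                                # assumption: clamp to the first value
        return ordered[0]
    if k >= n:                               # assumption: clamp to the last value
        return ordered[-1]
    lower, upper = ordered[k - 1], ordered[k]
    # Step 3: kth data + decimal part * ((k+1)th data - kth data)
    return lower + (hundredths / 100) * (upper - lower)


data = [2, 6, 4, 5, 3, 6, 5, 4, 3, 3, 2, 4, 5, 4, 6]
print(percentile(data, 25), round(percentile(data, 80), 2))
```

For the data in Example 1 this gives 3 for the 25th percentile and 5.8 for the 80th percentile, as computed by hand.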

Example 2: The following are heights (in inches) of some students:


Sonny 59 Melch 62 Ronel 66 Jhun 67
Nick 61 Jade 59 JR 64 Edu 63
Ingrid 58 Ammi 64 Dinah 63 Rene 67
Rose 64 Delia 58 Ped 66 Rain 67
Chad 70 Angie 64 Edwin 64
Chito 66 Jorem 63 Al 67
a. Find P30 and P60.
b. JR's height corresponds to what percentile?

Solution:
Step 1. Arrange the data according to height (shortest to tallest).
1. Ingrid 58    7. Jorem 63    13. JR 64       19. Jhun 67
2. Delia 58     8. Dinah 63    14. Rose 64     20. Rene 67
3. Sonny 59     9. Edu 63      15. Chito 66    21. Rain 67
4. Jade 59      10. Ammi 64    16. Ronel 66    22. Chad 70
5. Nick 61      11. Angie 64   17. Ped 66
6. Melch 62     12. Edwin 64   18. Al 67


Step 2. $L_{30} = \frac{30}{100}(22 + 1) = 6.9$ and $L_{60} = \frac{60}{100}(22 + 1) = 13.8$

Step 3.
P30 = 6th data + 0.9(7th – 6th)          P60 = 13th data + 0.8(14th – 13th)
P30 = 62 + 0.9(63 – 62)                  P60 = 64 + 0.8(64 – 64)
P30 = 62.9                               P60 = 64
a. P30 = 62.9 and P60 = 64.

b. JR's height is in the 13th location, meaning $L_p = 13$. So,

$$L_p = \frac{p}{100}(n + 1)$$
$$13 = \frac{p}{100}(22 + 1)$$
$$p = \frac{13(100)}{23} \approx 56.52$$

Because a percentile describes the percentage of data that falls below a value, it is
safest to always round down any decimals. Thus, we can say that 56% of the students
are shorter than JR.
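Going from a location back to a percentile, as done for JR, simply solves the location formula for p. The sketch below illustrates that step; the helper name `percentile_of_location` is hypothetical, and rounding down follows the remark above.

```python
import math


def percentile_of_location(location, n):
    """Solve L_p = p(n + 1)/100 for p, given a 1-based location."""
    return 100 * location / (n + 1)


p = percentile_of_location(13, 22)
print(round(p, 2), math.floor(p))
```

For JR's location (13th of 22) this gives about 56.52, which rounds down to the 56th percentile.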

Deciles

Deciles split the whole distribution into 10 subgroups. It is similar to cutting a
long pipe into 10 shorter pipes of equal lengths. In order to do this, it is necessary to
make 9 cuts. The points where the cuts are done correspond to decile ranks or scores.
Thus, decile ranks are from 1 to 9, denoted by D1, D2, D3, up to D9. It makes no sense
to talk about D0 or D10.

Correspondingly, $D_k = P_{10k}$. That is,

D1 = P10 D4 = P40 D7 = P70
D2 = P20 D5 = P50 D8 = P80
D3 = P30 D6 = P60 D9 = P90

Consequently, computations for deciles may be done by using the corresponding
percentiles.

Example 3: In the preceding example (Example 2) of student heights, the 3rd decile D3
can be computed by considering P30, which was computed to be 62.9.
Furthermore, D6 = P60 = 64. Similarly, to find the 9th decile, D9 = P90.
Computing for P90,
Step 1. (See the arranged data in Example 2.)
Step 2. $L_{90} = \frac{90}{100}(22 + 1) = 20.7$
Step 3. P90 = 20th data + 0.7(21st – 20th)
= 67 + 0.7(67 – 67)
= 67
So, D9 = P90 = 67.

Quartiles

Quartiles split the whole distribution into 4 subgroups. It is similar to cutting a
long pipe into 4 shorter pipes of equal lengths. In order to do this, it is necessary to
make 3 cuts. The points where the cuts are done correspond to quartile ranks or
scores. Thus, quartile ranks are from 1 to 3, denoted by Q1, Q2, and Q3. It makes no
sense to talk about Q0 or Q4.

Correspondingly, $Q_k = P_{25k}$. That is,

Q1 = P25 Q2 = P50 = median Q3 = P75

Consequently, computations for quartiles may be done by using the
corresponding percentiles. In Example 1 on percentiles, the 1st quartile
Q1 = P25 = 3.

Example 4. The following are heights (in inches) of some students. Find Q1, Q2, and Q3.
Sonny 59 Melch 62 Ronel 66 Jhun 67
Nick 61 Jade 59 JR 64 Edu 63
Ingrid 58 Ammi 64 Dinah 63 Rene 67
Rose 64 Delia 58 Ped 66 Rain 67
Chad 70 Angie 64 Edwin 64
Chito 66 Jorem 63 Al 67

Solution: Since Q1 = P25, Q2 = P50 and Q3 = P75, we compute for the corresponding
percentiles.

Step 1. Arrange the data according to height (shortest to tallest).


1. Ingrid 58 7.Jorem 63 13 JR 64 19 Jhun 67
2. Delia 58 8. Dinah 63 14 Rose 64 20 Rene 67
3. Sonny 59 9. Edu 63 15 Chito 66 21 Rain 67
4. Jade 59 10 Ammi 64 16 Ronel 66 22 Chad 70
5. Nick 61 11 Angie 64 17 Ped 66
6. Melch 62 12 Edwin 64 18 Al 67

Step 2.
a. For Q1: $L_{25} = \frac{25}{100}(22 + 1) = 5.75$
b. For Q2: $L_{50} = \frac{50}{100}(22 + 1) = 11.5$
c. For Q3: $L_{75} = \frac{75}{100}(22 + 1) = 17.25$

Step 3.
a. P25 = 5th + 0.75(6th – 5th)
= 61 + 0.75(62 – 61)
= 61.75

b. P50 = 11th + 0.5(12th – 11th)
= 64 + 0.5(64 – 64)
= 64

c. P75 = 17th + 0.25(18th – 17th)
= 66 + 0.25(67 – 66)
= 66.25

Thus, Q1 = P25 = 61.75, Q2 = P50 = 64, and Q3 = P75 = 66.25.
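As a cross-check, Python's standard `statistics.quantiles` with `method="exclusive"` places its cut points at locations i(n + 1)/m for m equal groups and interpolates linearly. That is the same location rule used in this chapter, so it should reproduce the hand computations. The variable names are illustrative.

```python
import statistics

# Heights (in inches) of the 22 students.
heights = [59, 61, 58, 64, 70, 66,
           62, 59, 64, 58, 64, 63,
           66, 64, 63, 66, 64, 67,
           67, 63, 67, 67]

# Three cut points for quartiles, nine for deciles.
quartiles = statistics.quantiles(heights, n=4, method="exclusive")
deciles = statistics.quantiles(heights, n=10, method="exclusive")

print(quartiles)
print(round(deciles[2], 2), deciles[5], deciles[8])  # D3, D6, D9
```

The quartiles agree with Q1, Q2 and Q3 found above, and the third, sixth and ninth deciles agree with the values found in the deciles example.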


Normal Distribution

Many sets of data exhibit a pattern like the one in the following
histogram of some discrete data. Most of the data are concentrated toward the center
and taper off at either end; the data are almost symmetrical with respect to the “center”.

[Figure: histogram of the data; vertical axis: Frequency (f)]

This type of data distribution occurs in many situations. The
normal distribution or the Gaussian distribution (in honor of Gauss, 1777-1855) is the
most important distribution in statistics. Statisticians created an ideal bell-shaped curve
(also called the normal curve) to describe such normally distributed data. The normal
curve is symmetric about a vertical axis through the mean, the total area under the
curve is equal to 1, and the curve is asymptotic to the x-axis.

The Normal Curve

All data points are contained and spread under the bell shape, which is asymptotic
to the horizontal axis. Characteristically,

1. Data points are clustered toward the center; only a few are found toward the
two ends or tails.
2. The number of data points on both sides of the center is the same. Consequently,
the three measures of central tendency (mean, median and mode) all coincide at the
center.


A wide variety of data have been observed to manifest the normal distribution,
and statisticians have established the occurrence and location of data points under the
normal curve. With the population mean µ and population standard deviation σ, the
occurrence of data under the normal curve has been established as illustrated below:

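[Figure: normal curve with the horizontal axis marked at µ–3σ, µ–2σ, µ–σ, µ, µ+σ, µ+2σ, and µ+3σ; the central bands cover 68.26%, 95.44%, and 99.74% of the data]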

Note: 1. 68.26% of the data are located from µ – σ to µ + σ.
2. 95.44% of the data are located from µ – 2σ to µ + 2σ.
3. 99.74% of the data are located from µ – 3σ to µ + 3σ.

Illustration. Assume that the scores of all 32,000 civil service examinees this year are
normally distributed. Their mean score is 66.5 points and the standard deviation is
2.4 points.

Solution: Based on the given, µ = 66.5 and σ = 2.4.

a. µ – σ = 66.5 – 2.4 = 64.1 and µ + σ = 66.5 + 2.4 = 68.9
This means that 68.26% of the 32,000 examinees (21,843 examinees)
scored between 64.1 and 68.9 points.

b. µ – 2σ = 66.5 – 2(2.4) = 61.7 and µ + 2σ = 66.5 + 2(2.4) = 71.3
This means that 95.44% of the 32,000 examinees (30,540 examinees)
scored between 61.7 and 71.3 points.

c. µ – 3σ = 66.5 – 3(2.4) = 59.3 and µ + 3σ = 66.5 + 3(2.4) = 73.7
This means that 99.74% of the 32,000 examinees (31,916 examinees)
scored between 59.3 and 73.7 points.
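The same computation can be sketched in Python for any mean, standard deviation, and population size. The names `COVERAGE` and `empirical_rule` are made up for this illustration, and the counts are rounded down to whole examinees, which matches the figures quoted in the illustration.

```python
import math

# Share of the data within 1, 2, and 3 standard deviations of the mean,
# using the percentages quoted in the text.
COVERAGE = {1: 0.6826, 2: 0.9544, 3: 0.9974}


def empirical_rule(mu, sigma, population):
    """Return (k, low, high, count) for the intervals from mu - k*sigma to mu + k*sigma."""
    rows = []
    for k, share in COVERAGE.items():
        low = round(mu - k * sigma, 4)          # rounding hides floating-point noise
        high = round(mu + k * sigma, 4)
        count = math.floor(share * population)  # whole examinees, rounded down
        rows.append((k, low, high, count))
    return rows


for k, low, high, count in empirical_rule(66.5, 2.4, 32_000):
    print(f"within {k} sd: {low} to {high} -> {count} examinees")
```

Calling `empirical_rule(100, 15, 9800)` gives the corresponding intervals for a distribution with mean 100 and standard deviation 15.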

Example 1: In a recently concluded IQ Test among all 9,800 currently enrolled CLSU
students, results showed that the mean IQ is 100, with a standard deviation of
15. Assume that the scores are normally distributed. How many of the students
have an IQ

a) above 100 b) between 85 and 115 c) above 145?


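Solution: With the given µ = 100 and σ = 15, the distribution of the scores is as follows:

[Figure: normal curve with the horizontal axis marked at 55, 70, 85, 100, 115, 130, and 145; the bands 85 to 115, 70 to 130, and 55 to 145 contain 68.26%, 95.44%, and 99.74% of the scores, respectively]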

a. Above 100.
Note that 100 is the mean, and in a normal distribution the mean is at the center.
Since a normal curve is symmetric about the center (µ = 100), half or 50% of the
scores must lie above it. So, half of the 9800 students, that is 4900 students,
have an IQ above 100.

b. Between 85 and 115.
The interval runs exactly from µ–σ to µ+σ, which always accounts for 68.26% of
the population. So, 68.26% of 9800, or 6,689 of the 9800 students, have IQs
between 85 and 115.

c. Above 145.
Those whose scores fall from 55 (or µ–3σ) to 145 (or µ+3σ) account for
99.74% of the data. Hence, the rest, that is, those who scored above 145
(right tail) or below 55 (left tail), account for only 100% – 99.74% = 0.26%.
Since the normal curve is symmetric, only 0.13% are in each of the
two tails. Thus, 0.13% of 9800, which is approximately 12 students, have an
IQ above 145.

Notice that we round down the answer.

The Standard Normal Distribution

Observe in the preceding example that the numbers involved in the questions
(100, 145, 85, and 115) are precisely where µ, µ+3σ, µ–σ, and µ+σ are respectively
situated in the normal curve. Now, suppose there is a question such as “How many
students had an IQ above 120?”.


We see that 120 lies somewhere in the interval (µ+σ, µ+2σ), that is (115, 130).
In cases such as this, the z-distribution comes in.

The z-distribution is basically a standardized version of the normal distribution,
hence called the Standard Normal Distribution. With the aid of Calculus and
Probability, mathematicians and statisticians determined the percentages of the areas
of various intervals under the normal curve with respect to the area of the entire bell
figure. To achieve this, it was necessary to convert every data point x to its equivalent
z-score by the formula

z = (x – µ) / σ

This results in a normal distribution whose mean is 0 and standard deviation is 1, as
illustrated in the following z-curve.

[Figure: z-curve with the horizontal axis (z-score) marked at –3, –2, –1, 0, 1, 2, and 3]

Illustration 1: In the preceding example about the IQ Test of 9800 students, where µ = 100
and σ = 15, a score of 120 corresponds to a z-score of

z = (120 – 100) / 15 ≈ 1.33

For various z-scores, the following z-tables summarize the areas under the curve
as compared to the entire area, which is taken to be 1. A z-table, also called the
standard normal table, is a statistical table that allows us to know the percentage or
proportion of values below (or to the left of) a z-score in a standard normal
distribution. There are two z-tables: the negative z-table for negative z-scores and the
positive z-table for positive z-scores.


Table 1. Negative z-table. STANDARD NORMAL DISTRIBUTION (Source: Consumer Dummies)


Table 2. Positive z-table. STANDARD NORMAL DISTRIBUTION (Source: Consumer Dummies)


How to use the z-table?

i. Compute the z-score and round it off to two decimal places.
ii. Based on the computed z-score, use the corresponding z-table: the negative z-table
for a negative z-score and the positive z-table for a positive z-score. The z-table is
composed of rows and columns; the rows represent the whole number and the
first decimal of the z-score, and the columns represent the second decimal of
the z-score.
iii. Look for the intersection of the row and column that correspond to the
computed z-score. The value at the intersection represents the portion or
percentage that falls below (or to the left of) the given z-score.

Illustration: In Illustration 1, we calculated that a score of 120 corresponds to a
z-score of 1.33. The z-table gives 0.9082, which implies that 0.9082 or 90.82%
of the students have a score below 120.
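When a printed z-table is not at hand, the table value can be approximated in software. The sketch below is not part of the module: it relies on the standard identity that the area to the left of z equals 0.5(1 + erf(z/√2)), and the helper names `z_score` and `area_below` are hypothetical. Rounding z to two decimals and the area to four decimals imitates reading the table.

```python
import math


def z_score(x, mu, sigma):
    """Standardize x and round to two decimals, as in step i."""
    return round((x - mu) / sigma, 2)


def area_below(z):
    """Area under the standard normal curve to the left of z, to four decimals."""
    return round(0.5 * (1 + math.erf(z / math.sqrt(2))), 4)


z = z_score(120, 100, 15)
print(z, area_below(z))  # 1.33 0.9082
```

A negative z-score needs no separate treatment here; `area_below(-1.0)` returns the same value as the negative z-table.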

Example 2: In the recently concluded IQ Test among all 9,800 currently enrolled CLSU
students, results showed that the mean IQ is 100, with a standard deviation of
15. Assume that the scores are normally distributed. How many of the students
have an IQ
A: a) above 100 b) above 145 c) between 85 and 115
B: a) above 120 b) less than 90 c) between 80 and 130

Solution:
The solutions for the A problems were found earlier in Example 1, where it
wasn't necessary to use z-scores. We do them here again using z-scores.

a) x = 100: z = (100 – 100)/15 = 0.00
b) x = 145: z = (145 – 100)/15 = 3.00
c) x = 85: z = (85 – 100)/15 = -1.00; x = 115: z = (115 – 100)/15 = 1.00

Using now the z-table, noting that the values therein are areas under the curve from the
left up to z, we read off the following values:

a) 0.5000. Below z = 0 is 0.5000, which means that above z = 0 is also 0.5000, since
1 – 0.5000 = 0.5000. So, there are (0.5000)(9800) or 4,900 students.
b) 0.9987. Below z = 3 is 0.9987, which implies that above z = 3 must be
1 – 0.9987 or 0.0013. So, there are (0.0013)(9800) or 12 students.
c) 0.1587 and 0.8413. Below z = -1 is 0.1587 and below z = 1 is 0.8413. To get the area or
percentage for –1 < z < 1, we take the difference, 0.8413 – 0.1587 = 0.6826.
So, there are (0.6826)(9800) or 6,689 students.

Compare these results with the earlier solution.


Similarly now for the B problems,

a) x = 120: z = (120 – 100)/15 = 1.33
b) x = 90: z = (90 – 100)/15 = -0.67
c) x = 80: z = (80 – 100)/15 = -1.33; x = 130: z = (130 – 100)/15 = 2.00

Using now the z-table, noting that the values therein are areas under the curve from the
left up to z, we read off the following values:

a) 0.9082. Above z = 1.33 is 1 – 0.9082 = 0.0918.
So, there are (0.0918)(9800) or 899 students.
b) 0.2514. Below z = -0.67 is 0.2514.
So, there are (0.2514)(9800) or 2,463 students.
c) 0.0918 and 0.9772. The interval –1.33 < z < 2 has the area 0.9772 – 0.0918 or 0.8854.
So, there are (0.8854)(9800) or 8,676 students.
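Each of these problems follows the same pattern: convert the boundaries to z-scores, read the areas to the left, subtract, and multiply by the number of students. A minimal Python sketch of that pattern follows. The function names are hypothetical, the table lookup is replaced by the error-function identity 0.5(1 + erf(z/√2)) rounded to four decimals, and counts are rounded down as in the text.

```python
import math


def area_below(x, mu, sigma):
    """Table-style lookup: z rounded to 2 decimals, area rounded to 4 decimals."""
    z = round((x - mu) / sigma, 2)
    return round(0.5 * (1 + math.erf(z / math.sqrt(2))), 4)


def count_in_range(n, mu, sigma, low=None, high=None):
    """Number of the n scores expected between low and high (None means unbounded)."""
    left = 0.0 if low is None else area_below(low, mu, sigma)
    right = 1.0 if high is None else area_below(high, mu, sigma)
    return math.floor((right - left) * n)   # rounded down, as in the text


N, MU, SIGMA = 9800, 100, 15
print(count_in_range(N, MU, SIGMA, low=120))           # above 120 -> 899
print(count_in_range(N, MU, SIGMA, high=90))           # less than 90 -> 2463
print(count_in_range(N, MU, SIGMA, low=80, high=130))  # between 80 and 130 -> 8676
```

The A problems work the same way, for example `count_in_range(N, MU, SIGMA, low=85, high=115)`.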

Example 3: The times taken to answer a mathematics exam have a normal distribution
with a mean of 65 minutes and standard deviation of 5 minutes. There are 200
students who took the exam.
a. How many examinees finished their exam in less than 1 hour?
b. How many examinees finished their exam in 63 to 72 minutes?
c. If the exam is good for only 75 minutes, how many examinees failed to finish the
exam within the given time limit?

Solution: Given: µ = 65 and σ = 5. Let x be the time taken to answer the exam.
a. Consider the times below x = 60; we convert 1 hour to 60 minutes because µ and σ are
in terms of minutes.

z = (60 – 65)/5 = -1.00

• Using the z-table, below z = -1.00 is 0.1587.
• Hence, (0.1587)(200) = 31.74, or about 31 examinees finished the exam in less than an hour.

b. Consider the times between x = 63 and x = 72.

z = (63 – 65)/5 = -0.40 and z = (72 – 65)/5 = 1.40

• Using the z-table, below z = -0.40 is 0.3446 and below z = 1.40 is 0.9192.
• It implies that the portion between z = -0.40 and z = 1.40 is
0.9192 – 0.3446 = 0.5746.
• So, (0.5746)(200) = 114.92, or about 114 examinees finished the exam in 63 to 72 minutes.

c. Examinees who failed to finish the exam are those whose time is above x = 75.

z = (75 – 65)/5 = 2.00

• Using the z-table, below z = 2.00 is 0.9772.
• It implies that above z = 2.00 is 1 – 0.9772 = 0.0228.
• (0.0228)(200) = 4.56, or about 4 examinees failed to finish the exam within the time limit.
