Algebra Basic To Higher
Math 135:
Algebraic Reasoning
for Teaching Mathematics
Steffen Lempp
Department of Mathematics, University of Wisconsin,
Madison, WI 53706-1388, USA
E-mail address: [email protected]
URL: http://www.math.wisc.edu/~lempp
Key words and phrases. algebra, equation, function, inequality, linear
function, quadratic function, exponential function, student
misconception, middle school mathematics
These are the lecture notes for the course Math 135:
“Algebraic Reasoning for Teaching Mathematics”
taught at UW-Madison in spring 2008.
The preparation of these lecture notes was partially supported
by a faculty development grant of the College of Letters and Science
and by summer support by the School of Education,
both of the University of Wisconsin-Madison.
Feedback, comments, and corrections are welcome!
This text is currently in a constant state of change,
so please excuse what I am sure will be numerous errors and typos!
Thanks to Dan McGinn and my Math 135 students in spring 2008
(Rachel Burgan, Erika Calhoon, Lindsay Feest, Amanda Greisch,
Celia Hagar, Brittany Hughes, and Shannon Osgar)
for helpful comments and corrections.
© 2008 Steffen Lempp, University of Wisconsin-Madison
Preface
Introduction to the Instructor
Introduction to the Student
These lecture notes first review the laws of arithmetic and discuss
the role of letters in algebra, and then focus on linear, quadratic and ex-
ponential equations, inequalities and functions. But rather than simply
reviewing the algebra your students will have already learned in high
school, these notes go beyond and study in depth the concepts underlying
algebra, emphasizing the fact that there are very few basic
underlying ideas in algebra, which “explain” everything else there is to
know about algebra. These ideas center around
• the rules of arithmetic (more precisely, the ordered field ax-
ioms), which carry over from numbers to general algebraic
expressions, and
• the rules for manipulating equations and inequalities.
These basic underlying facts are contained in the few “Propositions”
sprinkled throughout the lecture notes.
These notes, however, attempt to not just “cover” the material of
algebra, but to put it into the right context for teaching algebra, by
focusing on how real-life problems lead to algebraic problems, multi-
ple abstract representations of the same mathematical problem, and
typical student misconceptions and errors and their likely underlying
causes.
These notes just provide a bare-bones guide through an algebra
course for future middle school teachers. It is important to combine
them with a lot of in-class discussion of the topics, and especially with
actual algebra problems from school books in order to generate these
discussions. For these, we recommend using the following Singapore
math schoolbooks:
• Primary Math Textbook 5A (U.S. Edition),
• Primary Math Textbook 6A (U.S. Edition),
• New Elementary Math Textbook 1 (Syllabus D),
• New Elementary Math Textbook 2 (Syllabus D),
• New Elementary Math Textbook 3A (Syllabus D), and
• New Syllabus Additional Mathematics Textbook.
Only a few pages of the first and last of these books need to be used;
the other four books should be used more extensively. The course web
page http://www.math.wisc.edu/~lempp/teach/135.html gives an
idea of how to integrate these various components.
We finally refer to Milgram [2, Chapter 8] and Wu [6] for similar
treatments of algebra, and to Papick [3] and Szydlik/Koker [5] for
very different approaches. A very careful and thorough introduction to
algebra can also be found in Gross [1].
Introduction to the Student
numbers. So the above examples turn into the following general rules:
(1.7) a + b = b + a
(1.8) a + 0 = a
(1.9) a × b = b × a
(1.10) a × 0 = 0
(1.11) a × 1 = a
(1.12) a × (b + c) = a × b + a × c
Two related issues arise now:
(1) For which numbers a, b, and c do these rules hold?
(2) How do we know these rules are true for all numbers a, b,
and c?
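Before looking at visual proofs, it can help to spot-check the rules numerically. The following Python sketch is not part of the original notes; the sample values and the helper name rules_hold are assumptions made for illustration. Passing such a check for a handful of numbers is, of course, evidence and not a proof.

```python
from fractions import Fraction
from itertools import product

# Sample values: whole numbers, a negative integer, and two fractions.
samples = [0, 1, 2, 7, -3, Fraction(1, 2), Fraction(-5, 3)]

def rules_hold(a, b, c):
    """Return True if rules (1.7)-(1.12) all hold for this choice of a, b, c."""
    return (
        a + b == b + a                      # (1.7)
        and a + 0 == a                      # (1.8)
        and a * b == b * a                  # (1.9)
        and a * 0 == 0                      # (1.10)
        and a * 1 == a                      # (1.11)
        and a * (b + c) == a * b + a * c    # (1.12)
    )

# Try every triple (a, b, c) drawn from the sample values.
all_ok = all(rules_hold(a, b, c) for a, b, c in product(samples, repeat=3))
print(all_ok)
```

Exact fractions are used instead of decimals so that rounding cannot produce a spurious failure.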
In elementary school, these rules can be “proved” visually for whole
numbers by coming up with models for addition and multiplication. A
common model for the addition of whole numbers a and b is to draw
a many “objects” (e.g., apples) in a row, followed by b many objects in
the same row, and then counting how many objects you have in total.
Similarly, a common model for the multiplication of whole numbers a
and b is to draw a rectangular “array” of a many rows of b many objects
(e.g., apples), and then counting how many objects you have in total.
All the above rules are now easily visualized. (E.g., see Figure 1.1 for
a visual proof of (1.1) and Figure 1.2 for a visual proof of (1.3).)
[Figure 1.1. A visual proof that 2 + 7 = 7 + 2]
[Figure 1.2: labels 3 and 5; caption missing]
for each to convince yourself that they are true. For (1.14), you will
have to go to a three-dimensional array.)
Exercise 1.1. Give pictures for each of (1.8) and (1.10)–(1.14) in
the whole numbers.
1.1.2. Subtraction and Division. At first, students are given
problems of the type 2 + 5 = __ or 3 × 4 = __, asking them to fill in
the answer. A natural extension of this is to then assign problems of
the form 2 + __ = 7 or 3 × __ = 12, asking them to fill in the missing
summand or factor, respectively.
Let’s look at the problem of the “missing summand” first: Solving
2 + __ = 7 corresponds, e.g., to the following word problem:
“Ann has two apples. Her mother gives her some more
apples, and now she has seven apples. How many
apples did her mother give her?”
Students will quickly learn that this requires the operation of subtraction:
Ann’s mother gave her 7 − 2, that is 5, apples. More generally,
the solution to a + __ = c is c − a. But then a problem arises: What
if a > c?¹ Obviously, you now have to go from the “objects” model
to a more “abstract” model, e.g., the money model with debt: If you
have $2 and spend $5, then you are $3 “in debt”, i.e., you have −$3.
(It is a good idea to emphasize the distinction in the use of the
symbol − in these two cases by reading 2 − 5 as “two minus five”, but
−3 as “negative three”.) So allowing subtraction for arbitrary numbers
forces us to allow a “new” kind of numbers, the negative numbers.
So we proceed from the whole numbers 0, 1, 2, . . . to the integers
0, 1, −1, 2, −2, . . . . You will need to allow your students to adjust to
this change, since now your answer to a question like “Can I subtract a
larger number from a smaller number?” will change, possibly confusing
your students. Of course, one should never answer “no” to this question
without some qualification; some students will know about the negative
numbers at an early age (especially in Wisconsin in the winter!). But
now the correct answer should be “it depends”, namely, it depends
on which number system we work in. At this point, you will want to
extend the number line to the left also (see Figure 1.3; it can also be
used to include fractions and negative fractions).
But this transition from the whole numbers to the integers imme-
diately raises three new questions about the arithmetical operations:
¹ I can’t resist telling you my favorite German math teacher joke at this point:
“What does a math teacher think if two students are in the classroom and five
students leave? – If three students come back, the room is empty!”
1.1. ARITHMETIC IN THE WHOLE NUMBERS AND BEYOND: A REVIEW 5
[Figure 1.3: the number line, marked at −3, −2, −1, 0, ½, 1, 2, 3]
This then also gives us an answer for item (3) above: We don’t have
to worry about new rules for subtraction, since we can use the old rules
for addition: It is fairly clear that
a × (b − c) = a × b − a × c
since this is simply
a × (b + (−c)) = a × b + (−(a × c))
(and since −(a × c) = a × (−c)).
On the other hand, in general a − b = b − a is false since it would
mean a + (−b) = (−a) + b. Similarly, a − (b − c) = (a − b) − c is false.
(How would that be rewritten?)
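The three claims above can be spot-checked with a few lines of Python. This is only an illustrative sketch that is not part of the original notes; the sample range and the specific counterexample values are my own choices.

```python
from itertools import product

values = range(-3, 4)

# a × (b − c) = a × b − a × c holds for every sample triple.
distributive_ok = all(a * (b - c) == a * b - a * c
                      for a, b, c in product(values, repeat=3))

# Subtraction is not commutative: 2 − 5 and 5 − 2 differ.
two_minus_five = 2 - 5    # −3
five_minus_two = 5 - 2    # 3

# Subtraction is not associative: compare a − (b − c) with (a − b) − c.
a, b, c = 7, 4, 1
left = a - (b - c)     # 7 − 3 = 4
right = (a - b) - c    # 3 − 1 = 2
```

A single counterexample is enough to show that a proposed rule is false, whereas no finite number of successful checks proves a rule true.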
Let’s move on to division: Again, division first comes up when a
student transitions from solving 3 × 4 = __ to solving 3 × __ = 12, and
solving for the missing factor corresponds, e.g., to the following word
problem:
“Ann’s mother has twelve apples. If she wants to give
each of her three children the same number of apples,
how many will each child get?”
Again, solving this problem amounts to dividing: The solution to
3 × __ = 12 is 12 ÷ 3, or 4. More generally, the solution to a × __ = c is c ÷ a.
But a problem similar to the one for subtraction arises: Here, a may not “evenly
divide” into c, e.g., in the above problem, when Ann’s mother has 13
apples. Here is where it becomes really confusing to your students:
There are two different ways in which school mathematics solves this
problem: division with remainder, and extending to fractions. In the
above example, this would mean that Ann’s mother could either give
four apples each to her three children and keep a “remainder” of one
apple, or that she could cut one apple into thirds and give each child
4⅓ apples.² Since both of these solutions are acceptable, you have
to make it clear to your students which solution you are aiming for.
(Generally, in the higher grades, you will want the second solution,
but division with remainder never disappears completely from your
students’ mathematics: In algebra and calculus, there is division of
polynomials with remainder, as we will see later in this course!)
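The two school solutions to 13 ÷ 3 correspond to two different computations. Here is a small Python sketch, not part of the original notes, using the built-in divmod and the standard fractions module; the variable names are mine.

```python
from fractions import Fraction

apples, children = 13, 3

# Division with remainder: each child gets 4 apples, and 1 apple is left over.
quotient, remainder = divmod(apples, children)

# Extending to fractions: each child gets 13/3 apples, i.e., 4 and 1/3.
share = Fraction(apples, children)
whole_part = share.numerator // share.denominator
fractional_part = share - whole_part
```

Both answers describe the same situation: 13 = 3 × 4 + 1 on the one hand, and 13 = 3 × 4⅓ on the other.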
In any case, we will for now focus on extending our numbers so
that we can always divide c ÷ a “evenly”. (Unless a = 0: Then it
² There is one other model of division: “Ann’s mother has 12 apples and wants
to give three apples to each of her children. How many children does she have?” If
you now increase the number of apples to 13, then only one kind of division makes
sense since cutting up children is generally frowned upon... So be careful before
you start a problem and want to vary it later!
(1.16) $a \times \frac{1}{a} = 1$
This is now a new rule for multiplication; and it works not only when a
is a positive whole number but indeed for any nonzero rational number.
And again, this removes the need for new rules for division since we
can reduce division to multiplication by defining a ÷ b as a × (1/b). (See
Exercise 1.4.)
We summarize all the rules about the arithmetical operations on
numbers which we have discovered in the following
Proposition 1.2. The following rules hold for all real numbers a,
b and c:
[Figure: bar diagram labeled “Ryan”, “$410”, “Juan”, “?”, and “$100”; caption missing]
For (1), you could tell me that Ann had 19 − (5 × b) many apples at
the beginning, where b is the number of apples Mary gives her each day.
So in this case, we treat a as an unknown and b as a parameter, and we
express the solution (i.e., in our case, a) in terms of the parameter b.
Why on earth would I want to know this? Well, I can now solve the
problem as soon as you tell me b, and I can solve it very quickly if you
want to know a for different values of b. So, at the beginning, Ann has
four apples if Mary gives her three each day, but Ann has nine apples
if Mary gives her only two each day.
On the other hand, for (2), you could tell me that Mary gives Ann
(19 − a)/5 many apples each day, where a is the number of apples Ann has at
the beginning. So in this case, the roles of a and b are reversed: Now a
is the parameter, and b is the unknown. And the solution b is given in
terms of a.
It is this switch in the roles of letters (as unknowns or parameters)
which is very confusing to students, so when you reach this stage, you
need to make this very clear and explicit.
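One way to make the switch of roles explicit is to write each point of view as its own function. The following Python sketch is not part of the original notes; the function and constant names are hypothetical, and the numbers 19 and 5 come from the apple problem above.

```python
from fractions import Fraction

TOTAL = 19   # apples Ann has at the end of the week
DAYS = 5     # days in the (five-day) week

def apples_at_start(b):
    """Unknown a in terms of the parameter b (apples Mary gives per day)."""
    return TOTAL - DAYS * b

def apples_per_day(a):
    """Unknown b in terms of the parameter a (apples Ann has at the start)."""
    return Fraction(TOTAL - a, DAYS)
```

In apples_at_start, the letter b is the input (the parameter) and a is the output; in apples_per_day, the roles are reversed, even though both functions describe the same equation.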
Note that the distinction between a letter as a variable and a letter
as a parameter is somewhat fluid, but roughly the following: A variable
is a number we don’t know and want to leave unspecified. A parameter
is a specific number which we know but which we have not yet specified.
So, in item (1) above, we can think of b as a specific number of apples
Mary gives Ann every day. (Then b is a parameter.³) Or we can
think of b as an unspecified number for which we want to study how
a = 19 − (5 × b) (as a dependent variable) depends on the independent
variable b. Thus the distinction of the role of a letter as a parameter
or as a variable very much depends on the context.
1.2.1.4. Letters as Constants: This last use of letters is rather ob-
vious in general: There are certain letters which are used for fixed
numbers. The most common one is π, defined as the ratio of the
circumference of a circle to its diameter. This is an irrational
number and thus a non-repeating decimal, and there is no way to write
it down “easily” using integers, fractions, or even roots. (The values
3.14, 22/7, or 3.14159 are only approximations to its true value.) Another
well-known constant in calculus is the number e, the infinite sum
$1 + \frac{1}{1} + \frac{1}{1 \cdot 2} + \frac{1}{1 \cdot 2 \cdot 3} + \dots$
which is again an irrational number and equals approximately 2.718.
It is very useful in calculus but not really before. A third well-known
³ E.g., b could be some really complicated number which, if plugged in now,
would make our computation very complicated.
12 1. NUMBERS AND EQUATIONS: REVIEW AND BASICS
number is the complex number i, the square root of −1. (This number
is used to allow us to take the square root of negative numbers and
is thus not a “real number”, i.e., a number which we can write as a
decimal or which we can identify with a point on the number line.) In
physics, etc., there are other constants, such as g (for the acceleration
due to gravity). The only thing to remember about these constants is that
one should not use the same letters for variables, unknowns or param-
eters when these constants are present in the context of a problem.
(And the letter π should probably never ever be used in any meaning
other than the usual one.)
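To see how quickly the infinite sum for e mentioned above approaches 2.718, one can add up its first few terms. The Python sketch below is not part of the original notes; the function name e_partial_sum is an assumption, and floating-point numbers are used, so the results are themselves only approximations.

```python
import math

def e_partial_sum(n_terms):
    """Add the first n_terms terms of 1 + 1/1 + 1/(1·2) + 1/(1·2·3) + ..."""
    total = 0.0
    term = 1.0             # the current term; the first term is 1
    for k in range(n_terms):
        total += term
        term /= (k + 1)    # dividing by k + 1 gives the next term
    return total

for n in (1, 2, 4, 8):
    print(n, e_partial_sum(n), math.e - e_partial_sum(n))
```

Every partial sum is a rational number, which illustrates that an irrational number can be approximated by fractions as closely as we like without ever being equal to one.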
Of course, the last step uses our identities (1.8) and (1.17). But the
main step in combining like terms consists in the use of the distributive
law (1.12) in the first equality sign of (1.21)!
1.4.3. Removing Parentheses. The other common step in sim-
plifying algebraic expressions is “removing” parentheses when adding
or subtracting several terms. Here are two typical examples:
(1.22) (a − b) + (c − d) = a − b + c − d
(1.23) (a − b) − (c − d) = a − b − c + d
The first identity (1.22) is pretty simple to justify using the defini-
tion of subtraction and our identities (1.7) and (1.13):
(a − b) + (c − d) = (a + (−b)) + (c + (−d)) = a + (−b) + c + (−d) = a − b + c − d
The second identity (1.23) is one of the most common sources of errors
in algebra; it requires lots of practice, and a good justification. Here it
is:
(a − b) − (c − d)
= (a − b) + [−(c + (−d))]        (by definition of subtraction)
= (a − b) + [(−1)(c + (−1)d)]    (by (1.17))
= (a − b) + [(−1)c + (−1)²d]     (by the distributive law)
= (a − b) + ((−c) + 1d)          (by (1.17) and arithmetic)
= a − b − c + d                  (by (1.7), (1.13) and (1.8))
As you can see, this justification, even for a “small” expression like
(a − b) − (c − d), can become quite tricky, and so it is no wonder that
removing parentheses after a minus sign is so hard for students!
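A numerical spot-check can help students see both that (1.22) and (1.23) hold and that the typical mistake does not. The Python sketch below is not part of the original notes; the random sample range, the seed, and the particular values in the counterexample are my own choices.

```python
import random

random.seed(0)  # fixed seed so the spot-check is reproducible

def check_once():
    """Test identities (1.22) and (1.23) for one random choice of a, b, c, d."""
    a, b, c, d = (random.randint(-20, 20) for _ in range(4))
    ok_122 = (a - b) + (c - d) == a - b + c - d   # identity (1.22)
    ok_123 = (a - b) - (c - d) == a - b - c + d   # identity (1.23)
    return ok_122 and ok_123

identities_hold = all(check_once() for _ in range(1000))

# The typical student error forgets to change the sign in front of d:
a, b, c, d = 10, 2, 5, 3
correct = (a - b) - (c - d)   # 8 − 2 = 6
wrong = a - b - c - d         # 10 − 2 − 5 − 3 = 0
```

The check does not replace the justification above; it only makes the error visible with concrete numbers.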
only state “properties” of sets.) In any case, let’s fix some notation
about sets. (We will restrict ourselves to sets of numbers in the follow-
ing. Not all of the notation introduced here will be used right away; so
consider this subsection more of a reference for later on!)
The simplest example of a set is a finite set, and we can simply list
all its elements. So the set containing exactly the numbers 0, 2, and 5
would be written as {0, 2, 5}. A special case of a finite set is the empty
set containing no elements at all, usually written as {} or ∅. We write
n ∈ S to denote that a number n is a member, or an element, of a
set S.
Infinite or very large sets cannot be written this way. (For very large
sets, this is at least not practical.) In that case, we will resort to using
“dots” and write, e.g., {0, 1, . . . , 100} for the set of whole numbers less
than or equal to 100. For the infinite set of all whole numbers, we can
use notation like {0, 1, 2, . . . }, etc.
Some sets of numbers are so common that one fixes standard nota-
tion: We denote
• by N the set of all whole numbers 0, 1, 2, . . . ;
• by Z the set of all integers 0, 1, −1, 2, −2, . . . ;
• by Q the set of all rational numbers (i.e., all numbers of the
form a/b where a, b ∈ Z and b ≠ 0); this set is also the set of all
real numbers which can be written as repeating decimals;
• by R the set of all real numbers (i.e., all numbers which can
be written as decimals, possibly non-repeating); and
• by C the set of all complex numbers. (We will introduce this
set in detail later on.)
(Here, N stands for “natural” number, which is what mathematicians
usually call a whole number; Z stands for the German word “Zahl”,
meaning “number”; and Q stands for the word “quotient”.)
Now that we have all this, we can use the “set-builder” notation
{. . . | . . .} to define new sets. E.g., the set of all positive integers can
be written as {n ∈ Z | n > 0}; and the set of all real numbers less
than 2 can be written as {r ∈ R | r < 2}.
Furthermore, later on, it will be very convenient to have notation
for intervals of real numbers, i.e., sets of real numbers less than, greater
than, or between fixed real numbers. Given real numbers a and b, we
use
• [a, b] for the set {r ∈ R | a ≤ r ≤ b};
• (a, b) for the set {r ∈ R | a < r < b};
• [a, ∞) for the set {r ∈ R | a ≤ r}; and
• (−∞, b] for the set {r ∈ R | r ≤ b}.
1.5. CONVENTIONS IN ALGEBRA 17
The sets [a, b), (a, b], (a, ∞), and (−∞, b) are defined similarly. (Note
that the symbols “∞” and “−∞” do not denote real numbers but are
merely used for convenience. Therefore, there can be no sets of the
form (a, ∞], [a, ∞], [−∞, b), or [−∞, b].)
Finally, we will sometimes use convenient notation to combine sets.
Given two sets A and B of numbers, we let
• A ∪ B = {n | n ∈ A or n ∈ B} be the union of the sets A
and B;
• A ∩ B = {n | n ∈ A and n ∈ B} be the intersection of the
sets A and B; and
• A − B = {n | n ∈ A and n ∉ B} be the difference of the sets A
and B.
For example, if A = {0, 1, 2} and B = {0, 2, 4}, then
A ∪ B = {0, 1, 2, 4}
A ∩ B = {0, 2}
A − B = {1}
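For finite sets, this notation has a direct counterpart in Python's built-in set type. The sketch below is not part of the original notes; it repeats the example above and adds a set-builder example restricted to a finite range (an assumption, since a computer cannot list all of Z).

```python
A = {0, 1, 2}
B = {0, 2, 4}

union = A | B          # A ∪ B
intersection = A & B   # A ∩ B
difference = A - B     # A − B

# Set-builder notation over a finite range: {n ∈ {−5, ..., 5} | n > 0}
positives = {n for n in range(-5, 6) if n > 0}
```

Note that a Python set, like a mathematical set, ignores order and repetition, so {0, 2} and {2, 0, 0} denote the same set.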
1.5.2. Order of Operations and Use of Parentheses. While
the last subsection may have been rough going for you, very quickly
defining a lot of notation which may be unfamiliar to you, this sub-
section will be fairly smooth sailing: We need to define, for numerical
or algebraic expressions, in what order the operations are to be per-
formed; and most of this will already be familiar to you from school,
even though you may never have thought about it this carefully and
explicitly. Note that everything we define in this subsection is only
“convention”, i.e., we arbitrarily define the order of operations a cer-
tain way to make things more convenient; we could have defined it
completely differently, but it’s certainly more convenient to have con-
ventions everyone agrees with!
So suppose you want to write an expression denoting the sum of
the difference of a and the product of 4 and b, and the difference of 2
and the quotient of 5 and c. (Note that it’s hard for anyone to even
comprehend what I just wrote here, so we desperately need a better
way to write expressions!) In any case, we could simply write this as
(1.24) (a − (4 × b)) + (2 − (5 ÷ c))
since the first rule on the order of operations is certainly that anything
inside parentheses must be evaluated first; and if parentheses are nested
one inside the other, then we start from the inside out.
But this expression is ridiculously redundant, of course, as you have
surely noticed; we have literally cluttered it with many unnecessary
we first add 2 and 3 and then take the square root. For exponentiation
(a fancy word for “taking the power of”), we agree that the power
is taken before any other operation is performed (unless parentheses
are involved). So, e.g., a + b² means that we first square b and then
add a, whereas (a + b)² means that we first add a and b and then
square. (A word on how to read these aloud: The first is usually read
as “a plus b squared”, whereas the latter is usually read as “a plus b
quantity squared”.)
We summarize the rules on the order of operations as follows:
(1) Any operation inside parentheses is performed before any operation
outside them. If nested parentheses are involved, we
perform the operations from the inside out. Any fraction $\frac{A}{B}$
(for expressions A and B) is taken to be (A) ÷ (B). Any root
$\sqrt[n]{A}$ (for an expression A) is taken to be $(A)^{\frac{1}{n}}$.
(2) If parentheses do not determine the order between operations,
then we first take powers, then perform any multiplications
and divisions in order from left to right, and then any additions
and subtractions in order from left to right.
Clearly, these conventions require quite a bit of practice before a
student becomes proficient at them!
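Most programming languages follow the same conventions for the four basic operations and powers, which gives a quick way to experiment with them. The Python sketch below is not part of the original notes; the sample values for a, b, and c are assumptions, and exact fractions are used so that division introduces no rounding.

```python
from fractions import Fraction

a, b, c = Fraction(7), Fraction(3), Fraction(2)

# Expression (1.24), fully parenthesized, and the same expression
# relying only on the order-of-operations conventions.
with_parens = (a - (4 * b)) + (2 - (5 / c))
without_parens = a - 4 * b + 2 - 5 / c

# Powers are taken before addition unless parentheses say otherwise.
power_first = a + b ** 2       # 7 + 9
sum_first = (a + b) ** 2       # 10 squared
```

Here with_parens and without_parens agree, whereas power_first and sum_first do not, mirroring the difference between “a plus b squared” and “a plus b quantity squared”.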
let’s give the answer and ask for one of the things in the question; let’s
solve 4 + (5 · __) = 19. The corresponding word problem now becomes
“Ann has four apples. Her friend Mary gives her the
same number of apples each day for a (five-day) week.
At the end of the week, she has 19 apples. How many
apples did Mary give Ann each day?”
Why do I call this “undoing an expression”? Let b be the number
of apples Mary gives Ann each day, and c the number of apples Ann
has at the end. Then we have the equation 4 + (5 · b) = c. The first
problem gives b and asks for c, which amounts to simply evaluating
the expression 4 + (5 · b) when b = 3. The second problem gives c and
asks for b. This is very different: It amounts to “solving” the equation
4 + (5 · b) = 19, i.e., “undoing” (rather than evaluating) the expression
4 + (5 · b) to find out for which b it evaluates to 19. In this case, the
“undoing” goes in two steps:
4 + (5 · b) = 19 turns into 5 · b = 19 − 4
5 · b = 19 − 4 turns into b = (19 − 4) ÷ 5
Now we have “undone” the expression 4+(5·b) and reduced the solution
to an arithmetic problem, namely, evaluating the numerical expression
(19 − 4) ÷ 5.
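The contrast between evaluating and “undoing” can be written as two small functions. This Python sketch is not part of the original notes; the names evaluate and undo and their default arguments are hypothetical and simply encode the numbers 4 and 5 from the word problem.

```python
from fractions import Fraction

def evaluate(b, start=4, days=5):
    """Evaluate the expression start + (days · b)."""
    return start + days * b

def undo(c, start=4, days=5):
    """Solve start + (days · b) = c for b by undoing one operation at a time."""
    after_subtracting = c - start            # 5 · b = c − 4
    b = Fraction(after_subtracting, days)    # b = (c − 4) ÷ 5
    return b
```

Note that undo reverses the operations of evaluate in the opposite order: evaluate multiplies first and adds last, so undo subtracts first and divides last.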
Many algebra problems we will encounter later on will be of this
nature. And, of course, we will encounter situations later on where it is
much harder to “undo” an expression, where there may be no solution
at all, or where there are several or possibly infinitely many solutions.
So let’s get started.
CHAPTER 2
Linear Equations
Note that the ratio notation can be taken further by relating not
just two quantities but several. E.g., let’s modify the ratio problem to
read:
“John has 18 apples, which are either green, red, or
yellow. For every green apple, he has three red apples
and two yellow apples. How many red apples does he
have?”
Now the ratio of the number of green apples to the number of red
apples to the number of yellow apples is given as 1 : 3 : 2 (and we can
still solve for the number of red apples as being 9).
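The computation behind this answer is to divide the total into 1 + 3 + 2 = 6 equal “units” and then take the right number of units for each color. A minimal Python sketch of this idea, which is not part of the original notes and uses a hypothetical helper name, is:

```python
def split_by_ratio(total, parts):
    """Split `total` into whole-number shares in the ratio given by `parts`."""
    unit, leftover = divmod(total, sum(parts))
    if leftover != 0:
        raise ValueError("total is not a whole multiple of the ratio sum")
    return [unit * p for p in parts]

# 18 apples in the ratio green : red : yellow = 1 : 3 : 2
green, red, yellow = split_by_ratio(18, [1, 3, 2])
```

The sketch deliberately refuses totals like 19, for which no whole-number answer exists.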
The concept of ratio directly leads to the concept of proportional
reasoning. Let’s look at the following table for the original version of
our problem:
Table 2.1. The number of green and red apples vs. all apples
and eleven red apples. Then the new ratio of the number of green
apples to the number of red apples is 5 : 11, which is different from the
old ratio 1 : 3.
Similar examples can be found in many other contexts, e.g., in
recipes, which call for the ingredients to be in a certain ratio (such as
two cups of flour to one cup of sugar, etc.).
Percentage (from the Latin “per centum”, i.e., per one hundred) is
a special way to express a ratio, where one thinks of one quantity in
a ratio as “the whole” and sets that number equal to 100; the other
quantities are then expressed in proportion to that number. E.g., in
our first word problem, the ratio of the number of green apples to the
number of red apples to the number of all apples is 25 : 75 : 100, and
so we can say that there are 25% green apples and 75% red apples.
Usually, the “whole” is the largest of the numbers (and the others are
considered “parts” of it), but this need not always be the case.
Exercise 2.1. Think of several everyday examples of ratio and
percentage where the largest number is not considered as 100%.
2.2. Velocity and Rate of Change
Another common “linear” phenomenon is velocity:
“A train moves at a constant velocity of 120 km/h.
How far will the train travel in half an hour? In two
hours? In 3½ hours?”
Again we can make up a table:
time traveled (in hours)     0    ½    1    2    3   3½
distance traveled (in km)    0   60  120  240  360  420
First a brief comment: Why didn’t I just call 120 km/h the speed
of the train rather than its velocity? There is a small but important
difference between these, and we’ll get to it in more detail later on. For
now, let’s just remember that velocity can be negative (when the train
moves backward), whereas speed is always positive (or zero, when the
train is not moving). So velocity takes into account the direction (in
one dimension, say, along a rail line), whereas speed does not.
In some sense, the above velocity problem is again a ratio problem,
since time and distance traveled are in ratio 1 : 120. But that is not how
(2.1) $\text{velocity} = \frac{\text{distance}}{\text{time}}$
Now Table 2.2 tells us that this quotient is fixed for any time.
(However, since time and distance are measured not only in whole
numbers, we don’t have the property of Table 2.1 any more that as we
move along a row, the numbers change by a fixed amount, since we
have deliberately added some half-hour intervals.)
A quotient as in (2.1) is just one example of a rate of change. And,
of course, a rate of change need not be constant at all: The train might
speed up or slow down. For now, however, we will concentrate on the
case of a constant rate of change. Examples of a constant rate of change
abound in everyday situations:
(1) the cost of (loose) fruit and vegetables at the grocery store is
computed by the weight (say, by the kilogram), so there is a
fixed cost per kilogram;
(2) the cost of electricity you use is computed by the kWh (kilowatt
hour), so there is a fixed cost per kilowatt hour;
(3) the water volume of a swimming pool usually increases at a
constant rate when it is filled (letting the time start when the
pool starts filling up); etc.
In each case, we can identify a constant quotient. Let’s make this
relationship more formal: Usually, there is one independent quantity
(often called x) and one dependent quantity (often called y) depending
on the independent quantity. In the train example, x is the time trav-
eled and y is the distance traveled. In the above examples (1)–(3), x is
the weight, the electricity used (more precisely, the electrical energy in
kWh), and the time since the pool starts filling up, respectively. And y
is the cost (in both (1) and (2)) and the water volume, respectively.
Since we are assuming that the ratio (or quotient) between these
two quantities x and y is constant, we’ll denote this ratio by k for now;
we thus have
(2.2) $k = \frac{y}{x}$
for any value of x and any “corresponding” y. In the train example, k is
the velocity; in the other three examples, it is the cost per kilogram,
the cost per kWh, and the increase in water volume (per time unit,
say, per minute), respectively.
We can now rewrite (2.2) as
(2.3) y = k · x
to make the dependence of y on x more explicit. For our train example,
switching to the more common notation of t for time, v for velocity,
and s for distance, we arrive at
(2.4) s = v · t
The equation (2.4) is just one way to represent this situation. An-
other representation is via a table as in Table 2.2. A third representa-
tion is visual via a graph as in Figure 2.2.
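The equation and the table are two views of the same information: the table can be generated from the equation. The following Python sketch is not part of the original notes; it uses the velocity of 120 km/h and the times from the train example, and the function name distance is an assumption.

```python
from fractions import Fraction

def distance(v, t):
    """Distance s = v · t traveled at constant velocity v during time t."""
    return v * t

v = 120  # km/h, as in the train example
times = [Fraction(n, 2) for n in (0, 1, 2, 4, 6, 7)]   # 0, ½, 1, 2, 3, 3½ hours
table = [(t, distance(v, t)) for t in times]

# For every t > 0, the quotient s / t is the same number, namely the velocity.
quotients = {s / t for t, s in table if t != 0}
```

That quotients contains only one number is exactly the statement that the rate of change is constant.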
number of minutes 1 2 3 4 5 6 7 8 9 10
cost in cents 50 60 70 80 90 100 110 120 130 140
(1) We still have the property as in Table 2.1 that each row in-
creases by a fixed amount (by 1 and 10, respectively).
(2) We no longer have the properties from Tables 2.1 and 2.2 that
each column has numbers in the same ratio: E.g., the ratios 1 :
50 and 2 : 60 from the second and third column are different.
(3) As in Table 2.1, but unlike in Table 2.2, it doesn’t make “real-
world” sense to look at the cost of a phone call lasting 0 min-
utes.
(4) As in Table 2.1, but unlike in Table 2.2, it doesn’t make sense
to look at the cost of a phone call lasting a number of minutes
which is not a whole number (such as half a minute). We’re
assuming here that the phone company charges the same for
a “fractional” minute (i.e., a time interval lasting less than a
minute) as for a whole minute. So, e.g., a phone call lasting 2
minutes and 1 second, and another call lasting 2 minutes and
59 seconds, will both be billed as a 3-minute call by the phone
company.
As you can see, there are a whole lot of new issues coming up, and
we need to separate them out one by one. We’ll defer the issue of
whole-number minutes in item (4) until Section 4.3.1; we’ll assume for
now that we just “round up” and so consider only whole numbers.
The issue in item (3) is fairly easy to resolve if we just look for
the “pattern” in Table 2.3: We could simply assume that the phone
company charges 40¢ for a 0-minute phone call. This is, of course, not
how this works out in the real world, but it makes the table follow the
pattern noted in item (1).
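The pattern just described, together with the “rounding up” of item (4), can be sketched in a few lines of Python. This sketch is not part of the original notes: the function names are hypothetical, the formula 40 + 10 · (number of minutes) is simply read off the pattern in the table, and the rounding rule is the assumption stated in item (4).

```python
def billed_minutes(seconds):
    """Round a call length up to whole minutes (the assumed billing rule)."""
    return (seconds + 59) // 60

def cost_in_cents(minutes):
    """Cost pattern read off the table: 40 cents plus 10 cents per minute."""
    return 40 + 10 * minutes

# Reproduce the cost row of the table for calls of 1 to 10 minutes.
costs = [cost_in_cents(t) for t in range(1, 11)]
```

For instance, calls of 2 minutes 1 second (121 seconds) and 2 minutes 59 seconds (179 seconds) both give billed_minutes equal to 3.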
So we are down to considering the issues from items (1) and (2).
Let’s look at the graph for the cost of the phone call (see Figure 2.3).
Note that in light of item (3) above, I didn’t draw this graph as a line
but only used dots for positive whole-number values of the number of
minutes.
What we would like, of course, is an equation describing this graph.
Let’s denote by t the number of minutes of the phone call, and by c the
cost of the phone call in cents. Using the information from the original
problem, we notice that t − 1 is the number of additional minutes, and
5m + 3
gives the mass of the ball in grams, since the volume of the nickel must
be 100 − V in cm³, and so, by proportional reasoning, the masses of the
iron and the nickel in the alloy are 7.9V and 8.9(100 − V) in grams,
respectively.¹
¹ Actually, the fact that iron and nickel form an alloy here changes the specific
density slightly, but not enough to matter for the computation.
We now compute the ratio (i.e., the quotient) of the y-change over
the x-change as
$\frac{y_2 - y_1}{x_2 - x_1} = \frac{(mx_2 + b) - (mx_1 + b)}{x_2 - x_1}$
$= \frac{mx_2 - mx_1}{x_2 - x_1}$
$= \frac{m(x_2 - x_1)}{x_2 - x_1}$
$= m$
Therefore, this ratio does not depend on the choice of the points A
and B on the graph of the equation (2.8) and in fact always equals the
parameter m.
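The computation above can be illustrated numerically: pick several points on the graph of y = mx + b and compute the quotient for every pair. The Python sketch below is not part of the original notes; the values m = ½ and b = ½ and the chosen x-values are assumptions made only for the example.

```python
from fractions import Fraction
from itertools import combinations

def slope(p, q):
    """Quotient of the y-change over the x-change between points p and q."""
    (x1, y1), (x2, y2) = p, q
    return Fraction(y2 - y1, x2 - x1)

m, b = Fraction(1, 2), Fraction(1, 2)            # sample parameters (an assumption)
points = [(x, m * x + b) for x in range(-3, 4)]  # seven points on the graph

# Compute the quotient for every pair of distinct points.
slopes = {slope(p, q) for p, q in combinations(points, 2)}
```

The set slopes contains the single number ½, no matter which two of the points are compared, in agreement with the algebraic argument.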
Now let’s pick three distinct points A, B, and B′ on the graph
(and assume they are arranged in this order from left to right, and
that m > 0 so that B′ and B both lie “above” A in the coordinate
plane). In order to show that the graph of our equation really does
form a line, we need to show that these three points are collinear. (For
an example, see Figure 2.7 for the graph of the equation y = ½x + ½
with the points A, B, and B′ on the graph. Don’t be fooled by the
picture, however! We haven’t shown yet that the three points A, B,
and B′ lie on a line! So far, we don’t know that yet!) If we let the
point C (C′, respectively) be the point where the vertical line through B
2.4. LINE EQUATIONS 33
crosses the y-axis. Therefore, b is called the y-intercept of the line, and
the equation
y = mx + b
is called the slope-intercept form of the line.
equations (2.10) and (2.11) to obtain the two-point form of a line equation:
(2.12) $\frac{y - y_0}{x - x_0} = \frac{y_1 - y_0}{x_1 - x_0}$
Again, with a little bit of algebra, this can be rewritten as
$y - y_0 = \frac{y_1 - y_0}{x_1 - x_0}(x - x_0)$
or, in slope-intercept form, as
$y = \frac{y_1 - y_0}{x_1 - x_0}x + \left(-\frac{y_1 - y_0}{x_1 - x_0}x_0 + y_0\right)$
And again, of course, in this last form, this is basically the same as
the slope-intercept form, except that now both parameters m and b
are very complicated. (Note that in equations (2.10)–(2.12), not only
the letter m, but also the letters x0 , x1 , y0 , and y1 are parameters, and
only the letters x and y are variables, something which many students
will find highly mystifying!)
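The passage from the two-point form to the slope-intercept form can also be sketched in a few lines of Python. The function name and the sample points (1, 2) and (3, 8) are assumptions made for illustration only; they do not come from the text.

```python
from fractions import Fraction

def slope_intercept_from_two_points(x0, y0, x1, y1):
    """Return (m, b) for the non-vertical line through (x0, y0) and (x1, y1)."""
    if x0 == x1:
        raise ValueError("vertical line: the two-point form does not apply")
    m = Fraction(y1 - y0) / Fraction(x1 - x0)  # slope (y1 - y0)/(x1 - x0)
    b = -m * x0 + y0                           # intercept -m*x0 + y0
    return m, b

# Hypothetical sample points.
m, b = slope_intercept_from_two_points(1, 2, 3, 8)
print(m, b)

# Both given points satisfy y = m*x + b.
print(m * 1 + b == 2, m * 3 + b == 8)
```

For the sample points this gives m = 3 and b = −1, so the "complicated" parameters in the last displayed equation are just ordinary numbers once x0, y0, x1, and y1 are fixed.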
The above three forms of the line equation all apply exactly to all
non-vertical lines. There is a fourth and more general form of the line
equation, which also applies to vertical lines and, not surprisingly, is
called the general form of the line equation:
(2.13) Ax + By + C = 0
Here A, B, and C are parameters, i.e., fixed numbers, and x and y are
variables, namely, the x- and y-coordinates of points in the coordinate
plane.
We will tacitly assume that at least one of A and B is nonzero,
since otherwise the equation (2.13) degenerates to the trivial equation
C = 0, which is either true or false, independent of the values of the
variables x and y, and thus does not describe a line. (The equation
C = 0 either describes the whole plane, if the parameter C is zero,
or the empty set otherwise.³) On the other hand, if at least one of A
and B is nonzero, then the equation (2.13) really does describe a line,
as we will now see.
We consider four cases for the general line equation (2.13) sepa-
rately. (Here, the third case will not exclude the first two cases!)
³ Note that in this equation C = 0, the letter C is the parameter and not the
unknown! The unknowns, or variables, in C = 0 are x and y!
setting y = 0 and solving for x, and then setting x = 0 and solving
for y, respectively, it is not hard to see that the line contains the two
(distinct) points $(-\frac{C}{A}, 0)$ and $(0, -\frac{C}{B})$ on the x- and y-axis, respectively.
So the x-intercept of the line is $-\frac{C}{A}$, and its y-intercept is $-\frac{C}{B}$. Since
Figure 2.9. The line $\frac{x}{-2} + \frac{y}{1} = 1$ with x-intercept a = −2
and y-intercept b = 1
Finally, let’s look at the five different forms of a line equation in
terms of a “real-life” example. In each case, a car will cross a state
line (where the mile markers start at 0) and travel at a constant speed.
We use the independent variable t for the time elapsed in hours since
noon, and the dependent variable s for the distance traveled in miles
starting at the state line (taking s to be negative before the car crosses
the state line). In all versions of this example, we will get an equation
for the same line; see Figure 2.10.
Slope-intercept form:
“A car travels at a constant speed of 60mph. At noon,
it has traveled 90 miles past the state line.”
Slope m = 60, s-intercept b = 90, equation
s = 60t + 90
Point-slope form:
“A car travels at a constant speed of 60mph. At
1 p.m., it has traveled 150 miles past the state line.”
Slope m = 60, point P0 = (1, 150), equation
$60 = \frac{s - 150}{t - 1}$ or s − 150 = 60(t − 1)
Two-point form:
“A car travels at a constant speed. At 1 p.m., it has
traveled 150 miles past the state line; at 2:30 p.m., it
More precisely, what we are saying is that the following three equa-
tions are equivalent (namely, have the same solutions in t):
(2.21) 50t + 1000 = 2500
(2.22) 50t = 1500
(2.23) t = 30
But why do we know that they are equivalent? Notice that we really
have to show two things:
(1) Any solution to equation (2.21) is also a solution to equa-
tion (2.23), and
(2) any solution to equation (2.23) is also a solution to equa-
tion (2.21).
Let’s address item (1) first: Equation (2.22) is obtained from equa-
tion (2.21) by subtracting 1000 “on both sides”. But, if we replace t
in equation (2.21) by any number which makes both sides of the equa-
tion equal, then this equality remains true if we subtract 1000 from
both sides. (After all, the two sides of equation (2.21) represent the
same number before the subtraction, so they must still represent the
same number after the subtraction!) The same is true in going from
equation (2.22) to equation (2.23): If we replace t in equation (2.22)
by any number which makes both sides of the equation equal, then this
equality remains true after we divide by 50 on both sides.
The argument for item (2) is similar, but with a crucial difference:
We first have to show that if we replace t in equation (2.23) by any
number which makes both sides of the equation equal, then this equal-
ity is true before we divide both sides by 50. There is only one way
to show this: Note that equation (2.22) can be obtained from equa-
tion (2.23) by multiplying both sides by 50! Now we can argue as for
item (1): If we replace t in equation (2.23) by any number which makes
both sides of the equation equal, then this equality remains true if we
multiply both sides by 50, and the resulting equation is equation (2.22).
A similar argument shows that any solution to equation (2.22) is a so-
lution to equation (2.21), since equation (2.21) can be obtained from
equation (2.22) by adding 1000 to both sides of the equation. Both
steps of the argument for item (2) can be viewed as “undoing” the
corresponding steps of the argument for item (1).
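One way to make the claim of equivalence concrete is to test many candidate values of t in all three equations. The following Python sketch does this; the helper name solution_set and the range of candidates (multiples of 1/2 between −100 and 100) are arbitrary choices for illustration. Such a finite search cannot prove equivalence; it can only fail to find a counterexample.

```python
from fractions import Fraction

def solution_set(lhs, rhs, candidates):
    """Return the candidates t for which both sides of the equation agree."""
    return {t for t in candidates if lhs(t) == rhs(t)}

# Candidate values for t: all multiples of 1/2 from -100 to 100 (an arbitrary choice).
candidates = [Fraction(n, 2) for n in range(-200, 201)]

sol_2_21 = solution_set(lambda t: 50 * t + 1000, lambda t: 2500, candidates)
sol_2_22 = solution_set(lambda t: 50 * t, lambda t: 1500, candidates)
sol_2_23 = solution_set(lambda t: t, lambda t: 30, candidates)

# The three solution sets agree among the candidates tested.
print(sol_2_21 == sol_2_22 == sol_2_23)
```

The argument in the text is what actually guarantees the result: each step (subtracting 1000, dividing by 50) can be undone by its opposite (adding 1000, multiplying by 50), so no solution is lost and none is gained.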
This leads us to two crucial observations about “solving equations”
in general:
Proposition 2.6. (1) Any solution to an equation A = B
(where A and B are algebraic expressions) remains a solution
2.5. SOLVING A LINEAR EQUATION IN ONE VARIABLE 45
Here are two fine points which you can ignore on a first reading,
but which are important in general:
Let’s first look at the caveat. Here are two very simple examples:
(1) If we add $\frac{1}{x}$ to both sides of the equation x = 0, say, then the new
equation $x + \frac{1}{x} = \frac{1}{x}$ obviously does not have the same solutions as x = 0;
but the proposition still holds for all values of x at which $\frac{1}{x}$ is defined,
namely, whenever x is nonzero: Both equations x = 0 and $x + \frac{1}{x} = \frac{1}{x}$
are false for all nonzero x, as the proposition states. (2) If we add
$\sqrt{x}$ to both sides of the equation x = −1, say, then the new equation
$x + \sqrt{x} = -1 + \sqrt{x}$ obviously does not have the same solutions as
x = −1; but the proposition still holds for all values of x at which $\sqrt{x}$
is defined, namely, whenever x is non-negative: Both x = −1 and
$x + \sqrt{x} = -1 + \sqrt{x}$ are false for all non-negative x, as the proposition
states.
As for the extension of Proposition 2.6 (2), here is a very simple ex-
ample: The equation x = 1 obviously does not have the same solutions
as the equation $x^2 = x$ (obtained by multiplying both sides by x).
However, if we only look at the solutions to both equations where x
is nonzero, then they do indeed have the same solutions, namely, only
x = 1.
Now for the proof of Proposition 2.6: Adding to, subtracting from,
multiplying by a nonzero expression, or dividing by a nonzero expres-
sion, both sides of an equation “preserves” all solutions of the original
equation; and since these steps can be reversed by subtracting, adding,
dividing, and multiplying, respectively, we also don’t get any extra
solutions.
Notice that the crucial part in Proposition 2.6 is the “vice versa”
part: Not only do we “preserve” a solution by adding, subtracting,
multiplying or dividing both sides of an equation; we also don’t get
any new solution. Of course, in part (2) of Proposition 2.6, we defi-
nitely need the restriction that we don’t multiply or divide by 0 (or an
expression which could equal 0): Dividing by 0 is not allowed anyway,
and multiplying an equation A = B by 0 leads to the equation 0 = 0,
which almost always has more solutions than the original equation,
namely, any number is a “solution” to 0 = 0. A more sophisticated ex-
ample (and an example of a very common error) is to divide both sides
of the equation $x^2 = x$ by x to “arrive” at the equation x = 1. Clearly,
x may be zero, so this would not be allowed under Proposition 2.6,
and it is also easy to see that Proposition 2.6 fails in this example:
The equation $x^2 = x$ has two solutions, namely, 0 and 1, whereas the
equation x = 1 only has the solution 1. (However, as noted above, the
extension of our proposition still allows us to state that the nonzero
solutions of $x^2 = x$ and x = 1 agree!)
More generally, why am I putting so much emphasis on the “vice
versa” part? After all, it’s obvious that you shouldn’t multiply both
sides of an equation by 0 while solving an equation. The reason is that
this “vice versa” part will become much more subtle later on and leads
to frequent mistakes. For example, if we square an equation A = B,
then any solution to the original equation A = B is still a solution to
the equation $A^2 = B^2$ after squaring, but not vice versa, as you can see
from the simple example x = 1: It has only one solution, namely, 1; but
the equation $x^2 = 1$ (i.e., after squaring) has two solutions, namely, 1
and −1. So as we extend Proposition 2.6 later on (e.g., in Proposi-
tion 5.3) to allow more “operations” on equations, we always have to
make sure that the “vice versa” part also holds, since it can fail in
subtle ways!
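The failure of the “vice versa” part is easy to observe by brute force. In the following Python sketch (the helper name solutions and the candidate range of integers from −5 to 5 are assumptions for illustration), we compare the solutions of x = 1 with those of the equations obtained by squaring both sides and by multiplying both sides by x.

```python
def solutions(equation, candidates):
    """Return the set of candidates that make the equation true."""
    return {x for x in candidates if equation(x)}

# Candidate values: the integers from -5 to 5 (an arbitrary choice).
candidates = range(-5, 6)

original = solutions(lambda x: x == 1, candidates)
squared = solutions(lambda x: x**2 == 1, candidates)   # after squaring both sides
times_x = solutions(lambda x: x**2 == x, candidates)   # after multiplying both sides by x

print(original, squared, times_x)
# Every original solution survives both operations...
print(original <= squared, original <= times_x)
# ...but each operation also lets in a new solution.
print(squared - original, times_x - original)
```

In both cases the original solution 1 survives, but an extra solution appears: −1 after squaring, and 0 after multiplying by x.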
Let’s now vary the pool problem a bit to arrive at the general form
of a linear equation:
intersect (as, e.g., in Figure 2.12, where one line is horizontal, or Fig-
ure 2.13, where neither line is horizontal).
Algebraically speaking, we first subtract (up to two times) to “get
all the x on one side” and “all the constants on the other side”. In order
to see what might happen, it’s easier to view this problem graphically
as finding the x-coordinate of the point where the two non-vertical
lines described by the equations in (2.28) intersect. There are now
three cases for the two lines described by the equations in (2.28):
Case 1: The lines intersect in a single point P, say: Then the two
lines must have different slopes, i.e., $m_0 \neq m_1$. In that case, there is
exactly one solution to the equation (2.27), namely the x-coordinate
and C = D.)
(2) Conversely, any simultaneous solutions of the equations A +
C = B +D and C = D (or A−C = B −D and C = D, respec-
tively) are also simultaneous solutions of the original equations
A = B and C = D. (Similarly, again for later reference, if C
and D are nonzero expressions, then any simultaneous solu-
tions of the equations AC = BD and C = D (or $\frac{A}{C} = \frac{B}{D}$ and
C = D, respectively) are also simultaneous solutions of the
original equations A = B and C = D.)
The proof of part (1) of Proposition 2.8 is simple: If we replace
the variables in A, B, C, and D by any fixed numbers, then the truth
of the equations A = B and C = D with the variables replaced by
these fixed numbers implies the truth of the equations A + C = B + D,
A − C = B − D, AC = BD, and $\frac{A}{C} = \frac{B}{D}$. For part (2) of Proposition 2.8,
we have to work just a little harder. E.g., for the case A + C = B + D and
C = D, we use the fact that we can “reverse” the operation of “adding
equations” by “subtracting”: From A + C = B + D and C = D, we
obtain (A + C) − C = (B + D) − D, which simplifies to the desired
equation A = B. Similarly, for the case AC = BD and C = D, we use
the fact that we can “reverse” the operation of “multiplying equations”
by “dividing”: From AC = BD and C = D, we obtain $\frac{AC}{C} = \frac{BD}{D}$,
which simplifies to the desired equation A = B, given that C and D
are nonzero expressions.
Let’s now return to our geometric intuition for solving two simul-
taneous linear equations in two unknowns again to see what else can
happen. We had said that, geometrically, solving two simultaneous
linear equations in two unknowns amounts to finding the point(s) of
intersection of the two lines described by the two equations. This is
easy by either of the two algebraic methods (substitution method or
elimination method) described above as long as the two lines are not
parallel and therefore intersect in exactly one point. Let’s vary our
original problem a bit as follows:
“I’m thinking of two numbers. The sum of the two
numbers is 12. The sum of twice each of the two
numbers exceeds 24 by 3. What are my numbers?”
We can quickly translate this problem into the following two simul-
taneous equations
(2.31)    x + y = 12
          2x + 2y − 3 = 24
we quickly see that these lines are parallel and do not intersect; so
it is also geometrically clear that they cannot have any simultaneous
solutions.
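The geometric possibilities for two lines (one point of intersection, no point of intersection, or all points in common) can be told apart mechanically. The following Python sketch is one hedged way to do so; it uses a determinant test, which is a different technique from the substitution and elimination methods of the text, and the function name classify_system and the two variations of the problem at the end are hypothetical. It assumes integer coefficients and that each equation really describes a line.

```python
from fractions import Fraction

def classify_system(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes integer coefficients, with a1, b1 not both zero and
    a2, b2 not both zero, so that each equation describes a line.
    """
    det = a1 * b2 - a2 * b1
    if det != 0:
        # The lines are not parallel: exactly one point of intersection.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        return "one solution", (x, y)
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        # Same direction and a common point: the two lines coincide.
        return "infinitely many solutions", None
    # Same direction but different lines: parallel, no intersection.
    return "no solution", None

# The system (2.31), rewritten as x + y = 12 and 2x + 2y = 27.
print(classify_system(1, 1, 12, 2, 2, 27))
# A hypothetical variation with a single solution: x + y = 12 and x - y = 2.
print(classify_system(1, 1, 12, 1, -1, 2))
# A hypothetical variation in which the second equation repeats the first.
print(classify_system(1, 1, 12, 2, 2, 24))
```

For the system (2.31) the sketch reports that there is no solution, in agreement with the observation that the two lines are parallel.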
Finally, it is possible that the two lines from two simultaneous linear
equations coincide, in which case there are infinitely many simultaneous
solutions. Here is a somewhat non-sensical example:
2.6. SOLVING LINEAR EQUATIONS IN TWO OR MORE VARIABLES 57
(3.3) a = b implies a + c = b + c
(3.4) a = b implies a − c = b − c
(3.5) c ≠ 0 and a = b implies ac = bc
(3.6) c ≠ 0 and a = b implies $\frac{a}{c} = \frac{b}{c}$
In fact, not only do these properties hold, but it’s also not hard to see
that their converses hold; i.e., the right-hand side of each statement im-
plies the left-hand side (always assuming that c ≠ 0 in (3.5) and (3.6)).
This is since we can “undo” the addition in (3.3) using (3.4); the sub-
traction in (3.4) using (3.3); the multiplication in (3.5) using (3.6); and
the division in (3.6) using (3.5).
62 3. ORDER AND LINEAR INEQUALITIES
There are now the following similar properties linking the order <
of the numbers and the arithmetical operations:
(3.7) a < b implies a + c < b + c
(3.8) a < b implies a − c < b − c
(3.9) c > 0 and a < b implies ac < bc
(3.10) c > 0 and a < b implies $\frac{a}{c} < \frac{b}{c}$
Why do properties (3.7)–(3.10) hold? This is easy and intuitive to
see for properties (3.7)–(3.8): Adding c to, or subtracting c from, both sides of
an inequality does not change its truth. Similarly, for properties (3.9)–
(3.10), multiplying or dividing both sides of an inequality by a positive
number c does not change its truth.
There are also more visual proofs: The implications (3.7) and (3.8)
hold since we can think of them as amounting to “shifting” a and b on
the number line to the right or left by an equal distance, given by c.
The implications (3.9) and (3.10) hold since we can think of them as
amounting to “scaling” a and b on the number line by the positive
factor or divisor c.
Again, we have the converses of these properties (the “right-to-left”
direction of these implications), “undoing” the operations as for (3.3)–
(3.6), still under the assumption that c > 0 in (3.9) and (3.10).
We can also state similar properties and their converses using ≤,
except that we still assume c > 0 in (3.13) and (3.14):
(3.11) a ≤ b implies a + c ≤ b + c
(3.12) a ≤ b implies a − c ≤ b − c
(3.13) c > 0 and a ≤ b implies ac ≤ bc
(3.14) c > 0 and a ≤ b implies $\frac{a}{c} \le \frac{b}{c}$
And again we have the converses of these properties (the “right-to-left”
direction of these implications), “undoing” the operations as for (3.3)–
(3.6), still under the assumption that c > 0 in (3.13) and (3.14).
But this leads to an obvious question, and one which is at the root of
many errors when solving inequalities: Why do we need the restriction
that c > 0 in properties (3.9)–(3.10) and (3.13)–(3.14)? We start with
the following easy but crucial observations:
(3.15) a < b implies −a > −b
(3.16) a ≤ b implies −a ≥ −b
3.2. ORDER AND THE ARITHMETICAL OPERATIONS 63
Note that (3.15) and (3.16) hold not only for positive numbers a and b,
but for 0 and negative numbers as well!
In order to see why (3.15) holds, we distinguish three cases:
Case 0 ≤ a < b: Then b lies to the right of a on the number line,
and both are ≥ 0. Since taking the negative of a number means to
reflect the points on the number line about 0, −b will now lie to the
left of −a on the number line, and both are ≤ 0. (See Figure 3.1.)
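[Figure 3.1: number line with the points −b, −a, 0, a, b marked from left to right]
[Figure: number line with the points a, b, 0, −b, −a marked from left to right]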
Case a < 0 < b: Then b lies to the right of 0, and a to the left of 0,
on the number line. Since taking the negative of a number means to
reflect the points on the number line about 0, −b will now lie to the
left of 0, and −a to the right of 0, on the number line. (See Figure 3.3.)
[Figure 3.3: number line with the points −b, a, 0, −a, b marked from left to right]
Let’s denote the time (in hours, from when she starts walking) by t
and the distance from her grandmother’s house (in kilometers) by s.
Then we have the following equation for time and distance:
(3.22) s = −3t + 5
The problem tells us that this quantity is at most 2 at the end of the
walk, leading to the inequality
−3t + 5 ≤ 2
Now we can first subtract 5 from both sides (using (3.12)) to obtain
−3t ≤ −3
and then divide by −3 (using (3.20)) to arrive at the solution
(3.23) t ≥ 1
Thus she will have walked at least one hour.
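The rule of switching the inequality sign when dividing by a negative number can be sketched in Python as follows. The function name solve_leq and its return convention (a pair consisting of the direction of the inequality and the bound) are hypothetical; the sketch handles only inequalities of the form at + b ≤ c with a ≠ 0.

```python
from fractions import Fraction

def solve_leq(a, b, c):
    """Solve a*t + b <= c for t, where a is nonzero.

    Returns (direction, bound), to be read as "t direction bound".
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    # Subtract b from both sides, then divide both sides by a.
    bound = Fraction(c - b) / Fraction(a)
    # Dividing by a negative number switches the inequality sign.
    direction = "<=" if a > 0 else ">="
    return direction, bound

# The inequality from the text: -3t + 5 <= 2.
direction, bound = solve_leq(-3, 5, 2)
print("t", direction, bound)

# Spot check on a few sample times: the original inequality holds exactly when t >= 1.
samples = [Fraction(n, 4) for n in range(-8, 13)]
print(all((-3 * t + 5 <= 2) == (t >= bound) for t in samples))
```

The spot check at the end compares the original inequality with the solved form on a handful of sample values; it is only a sanity check, not a proof.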
Note that solving linear inequalities is thus very similar to solving
linear equations, the one difference being that whenever we multiply or
divide by a negative number, we have to invert the inequality sign. This
is reflected in the following proposition, which closely follows Proposi-
tion 2.6:
Proposition 3.3. (1) Any solution to an inequality A < B
(where A and B are algebraic expressions) remains a solu-
tion if we add or subtract the same algebraic expression on
both sides of the inequality; and vice versa, any solution to
the inequality after adding or subtracting the same algebraic
expression is also a solution of the original inequality A < B.
(2) Any solution to an inequality A < B (where A and B are alge-
braic expressions) remains a solution if we multiply or divide
both sides of the inequality by the same positive algebraic ex-
pression; and vice versa, any solution to the inequality after
multiplying or dividing by the same positive algebraic expres-
sion is also a solution of the original inequality A < B. (Here,
an algebraic expression is positive if it is positive no matter
what numbers replace the letters in the expression.)
(3) Any solution to an inequality A < B (where A and B are al-
gebraic expressions) is a solution of the inequality AC > BC
(or $\frac{A}{C} > \frac{B}{C}$) obtained by multiplying (or dividing, respectively)
both sides of the inequality by the same negative algebraic ex-
pression and switching the inequality sign; and vice versa, any
solution to the inequality AC > BC (or $\frac{A}{C} > \frac{B}{C}$) after multi-
plying (or dividing, respectively) by the same negative algebraic
Now we can first subtract 5 from both sides (using (3.8)) to obtain
−3t < −6t + 3
then add 6t to both sides (using (3.7)) to obtain
3t < 3
and finally divide by 3 (using (3.10)) to arrive at the solution
(3.25) t < 1
Thus she will be safe at any time less than one hour.
There are two other nice ways (in addition to (3.23) and (3.25)) to
represent the solutions to inequalities like (3.22) and (3.24):
• We can represent the solutions as intervals using the notation
from subsection 1.5.1: The solutions to (3.22) are all t ≥ 1,
and so the solution set can be written as [1, ∞). Similarly, the
solutions to (3.24) are all t < 1, and so the solution set can be
written as (−∞, 1).
• We can represent the solutions on the number line, by let-
ting the points corresponding to solutions be denoted by the
thicker part of the number line, and by using solid dots for
interval endpoints which are a solution, and circles for interval
endpoints which are not a solution. The solutions to (3.22)
and (3.24) on the number line are now pictured in Figures 3.4
and 3.5, respectively.
[Figure: number line marked from −3 to 3]
Here is one final example, illustrating how more than one inequality
for a quantity works:
Little Red Riding Hood falls asleep in the forest, 5km
away from her grandmother’s house. The next morn-
ing, she walks toward her grandmother’s house at a
constant speed of 3km/h for some time. How long
[Figure: number line marked from −3 to 3]
[Figure: number line marked from −3 to 3, with the point 2/3 also marked]
Note that in this last example, the set of t which are not a solution
can be described by the statement
2 > −3t + 5 or 3 < −3t + 5
which is frequently (but incorrectly!) abbreviated as 3 < −3t + 5 <
2. This last statement makes no sense at all since it would imply
that 3 < 2. The “real” problem is that one cannot write the set of
“non”-solutions of (3.27) as a single interval; rather, it consists of two
intervals, namely, $(-\infty, \frac{2}{3})$ and (1, ∞). Using our set notation from
subsection 1.5.1, we can also write the set of “non”-solutions of (3.27)
as $(-\infty, \frac{2}{3}) \cup (1, \infty)$.
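The difference between the correct “or” statement and the incorrect chained abbreviation can be seen in a short Python sketch. Python happens to read a chained comparison such as 3 < v < 2 as “3 < v and v < 2”, which is exactly why it can never be true. The function names and the sample values of t (multiples of 1/12 between −2 and 3) are assumptions chosen for illustration.

```python
from fractions import Fraction

def value(t):
    """The quantity -3t + 5 from the example."""
    return -3 * t + 5

def is_non_solution(t):
    """The correct description: 2 > -3t + 5 or 3 < -3t + 5."""
    v = value(t)
    return 2 > v or 3 < v

def in_union(t):
    """Membership in the union of the intervals (-oo, 2/3) and (1, oo)."""
    return t < Fraction(2, 3) or t > 1

# Sample values of t: multiples of 1/12 from -2 to 3 (an arbitrary choice).
samples = [Fraction(n, 12) for n in range(-24, 37)]

# The "or" statement and the union of intervals describe the same samples.
agree = all(is_non_solution(t) == in_union(t) for t in samples)
# The chained form 3 < v < 2 means "3 < v and v < 2" and is never true.
chained_ever_true = any(3 < value(t) < 2 for t in samples)

print(agree, chained_ever_true)
```

On these samples, the “or” statement matches the union of the two intervals, whereas the chained form is false for every value of t.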
We will return to more complicated inequalities later on and then
discuss possible solution sets of inequalities in more detail. For now,
we turn to one common application of inequalities, estimation and
approximation.
3.4. Estimation and Approximation
In a “purely mathematical” setting, the number 2 means exactly
that: It’s the number 2, no more and no less. And writing it differently,
e.g., as 2.0 or 2.00, does not change that fact.
But in the “real world”, we can’t measure quantities exactly; we can
only measure them approximately. And there often is a “convention”
that 2 means 2 rounded to the nearest integer (i.e., 2 denotes a quan-
tity x which satisfies the inequality 1.5 ≤ x < 2.5), whereas 2.0 means 2
rounded to the nearest tenth (i.e., 2.0 denotes a quantity x which sat-
isfies the inequality 1.95 ≤ x < 2.05), and 2.00 means 2 rounded to the
nearest hundredth (i.e., 2.00 denotes a quantity x which satisfies the
inequality 1.995 ≤ x < 2.005).
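This rounding convention can be made explicit with Python’s decimal module, which keeps track of how many digits were written after the decimal point. The function name rounding_interval is hypothetical, and the sketch assumes a plain non-negative decimal numeral given as a string.

```python
from decimal import Decimal

def rounding_interval(numeral):
    """Return (low, high) such that the numeral stands for low <= x < high.

    Assumes a plain non-negative decimal numeral such as "2", "2.0" or "2.00".
    """
    value = Decimal(numeral)
    exponent = value.as_tuple().exponent          # 0, -1, -2 for "2", "2.0", "2.00"
    half_unit = Decimal((0, (5,), exponent - 1))  # 0.5, 0.05, 0.005
    return value - half_unit, value + half_unit

for numeral in ("2", "2.0", "2.00"):
    low, high = rounding_interval(numeral)
    print(f"{numeral} means {low} <= x < {high}")
```

The numeral has to be passed as a string: as numbers, 2, 2.0, and 2.00 are all the same, and the information about the intended accuracy would be lost.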
A better and more precise way to write these would be to write
2 ± .5; 2 ± .05; and 2 ± .005, respectively. One advantage is that it
allows us to write things like 2 ± .1 (i.e., 2 “rounded to the closest
should never give more significant digits for the output than for the in-
puts: Under this rule of thumb, we would have computed the perimeter
as 27 cm and the area as 44 cm² (rounding each to two significant dig-
its), which is not quite accurate by our computations above (since the
latter suggests that 26.5 ≤ p < 27.5 and 43.5 ≤ A < 44.5), but at least
it is “in the right ballpark” for the accuracy of the answer.
CHAPTER 4
but rather that, as a computer scientist might put it, we apply some
“rule” f to the input x in order to determine the output y.
The set of possible values for the quantity x is called the domain
of the function; it is determined either mathematically by when the
equation y = f (x) “makes sense” or “is defined”, or by the “real life”
context as to when it makes sense for the “real life” quantity x. The
critical feature of a function is, however, the fact that for each argu-
ment x in the domain of the function, there is exactly one value for y
such that y = f (x) holds, as already suggested by the way we write
this. But note that we could have written the relation between the
quantities x and y differently, and it wouldn’t have been so easy to
check whether the relation defines a function: E.g., the relation
(4.3) $x - y^2 = 0$
can be expressed as
$x = y^2$
and so we can think of it as describing a function for the quantity x
as y varies. (The fact that I “switched” the roles of x and y here is
deliberate: The choice of what to call the variables is also completely
arbitrary, and one should not always choose the same letter for the
independent and the dependent variable, respectively!) On the other
hand, (4.3) does not in general describe y as a function of x: E.g.,
setting x = 1, both y = 1 and y = −1 satisfy (4.3). However, we can
resolve this “problem” by restricting the allowable values of y: If we
specify in addition to (4.3) that y ≥ 0, then (4.3) can be rewritten as
$y = \sqrt{x}$
and y is a function of x. All this should convince you that when a
relationship between quantities is specified, determining whether or
not this relationship can be described as one quantity being a function
of another may depend on additional constraints being imposed, and
sometimes this is needed in order to obtain a function. And, of course,
the “real life” context may impose such or other constraints.
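The defining property of a function (exactly one value for each argument) can be tested on a finite sample of the relation (4.3). In the following Python sketch, the helper name is_function_of_first and the sample values of y (the integers from −3 to 3) are assumptions made for illustration.

```python
def is_function_of_first(pairs):
    """True if no first coordinate is paired with two different second coordinates."""
    seen = {}
    for first, second in pairs:
        if first in seen and seen[first] != second:
            return False
        seen[first] = second
    return True

# Sample pairs (x, y) satisfying x - y^2 = 0, for y = -3, ..., 3.
relation = [(y * y, y) for y in range(-3, 4)]

# y as a function of x fails: x = 1 goes with both y = 1 and y = -1.
print(is_function_of_first(relation))

# With the extra constraint y >= 0, y is a function of x.
restricted = [(x, y) for x, y in relation if y >= 0]
print(is_function_of_first(restricted))

# With the roles switched, x is a function of y without any constraint.
switched = [(y, x) for x, y in relation]
print(is_function_of_first(switched))
```

The three checks mirror the three observations in the text: y is not a function of x, y becomes a function of x once we require y ≥ 0, and x is a function of y in any case.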
So let’s look at the concept of a function more abstractly.¹ We’ll
also introduce some more handy notation going along with functions.
¹ In these lecture notes, we’ll restrict ourselves to functions from numbers to
numbers. More generally, one can consider functions from any set of objects to
any other set of objects. E.g., one could define a function with domain the set of
all Americans, which on the input “person” returns the value “eligible to vote” or
“not eligible to vote”. As long as we assume that for each American, exactly one
of “eligible to vote” or “not eligible to vote” is true, this would also be a function,
albeit admittedly much more abstract.
74 4. FUNCTIONS
First of all, let’s fix two sets of numbers X and Y ; we’ll call them the
domain and the codomain² of our function f. We then use
(4.4)    f : X → Y
f : R → R
    x ↦ 2x + 1
but we could also restrict the domain to the set of positive real numbers,
say, and write
f : (0, ∞) → R
    x ↦ 2x + 1
f : Z → Z
    x ↦ 2x + 1
Note in this notation that the two arrows are slightly different:
In the second line, we use the special arrow ↦ to indicate that x is
“mapped to” y; so a function is sometimes also called a map or a
mapping.
Often, in school mathematics, functions are given by simple formu-
las, such as f(x) = 2x + 1 or $f(x) = x^2$. But they do not have to be so
simple! Here are three “perfectly reasonable” functions for which there
line at x and checking how often the curve intersects the vertical line.
If the line and the curve intersect only once (for each argument x),
then the curve “passes the Vertical Line Test” and is the graph of a
function. This is because this unique point of intersection (x, y), say,
tells us that the value of the function f at x is y. See Figure 4.3 for
an example of a curve (described by the equation $x = y^2$) which does
not pass the Vertical Line Test: The line given by x = 1 intersects the
curve at the points (1, 1) and (1, −1). Algebraically, it’s also easy to
see that the curve has two parts, one described by $y = \sqrt{x}$ (the “upper
half of the curve”), the other described by $y = -\sqrt{x}$ (the “lower half of
the curve”); and both halves individually pass the Vertical Line Test,
as they should, since they are the graphs of the two functions given
by $y = \sqrt{x}$ and $y = -\sqrt{x}$, with domain the set of nonnegative real
numbers.
check how often the graph of the function intersects the
horizontal line. If the line and the graph intersect at most once
(for each value y), then the graph “passes the Horizontal Line
Test” and is the graph of a 1–1 function. This is because that
point of intersection (x, y), say, tells us that the value of the
function f equals y for at most one value x. (See Figure 4.4
for the example of the function f : R → R, x ↦ $x^2$, which fails
the Horizontal Line Test, e.g., at the line y = 1, and thus is
not 1–1. If we were to restrict the function f to the domain
[0, ∞) (i.e., the “right half” of the graph), or to the domain
(−∞, 0] (i.e., the “left half” of the graph), then the function
would pass the Horizontal Line Test and thus would be 1–1.)
What are the conditions now under which the inverse function of a
function f : X → Y exists? Note that we must ensure two things:
(1) For each y in the codomain Y of f , there is an argument x in
the domain X of f such that y = f (x); and
(2) for each y in the codomain Y of f , there is not more than one
argument x in the domain X of f such that y = f (x).
Now observe that condition (1) is exactly the condition that f is onto;
and condition (2) is the condition that f is 1–1! These two conditions
can be combined into one condition:
(3) For each y in the codomain Y of f , there is exactly one argu-
ment x in the domain X of f such that y = f (x).
It should be clear that condition (3) is exactly the condition on the ex-
istence of an inverse function which we are looking for: Condition (1)
is necessary since otherwise, for some y, we will not be able to define
$f^{-1}(y)$. Condition (2) is necessary as well since otherwise, for some y,
we will not know which value x to choose for $f^{-1}(y)$. On the other
hand, condition (3) is also sufficient to ensure the existence of an in-
verse function $f^{-1}$ : Y → X: Given y in Y, condition (3) ensures the
existence of exactly one x in X such that f(x) = y, and so it is both
natural and inevitable to define $f^{-1}(y) = x$.
There is also a very nice visual way to find (the graph of) the inverse
function of a function f: Recall that the graph of a function f : X → Y
is the set of all pairs (x, y) such that y = f(x), whereas the graph of
the inverse function $f^{-1}$ : Y → X is the set of all pairs (y, x) such
that $x = f^{-1}(y)$, i.e., such that y = f(x). So the graph of $f^{-1}$ is
obtained from the graph of f by simply interchanging the coordinates
of each point (x, y)! Graphically, this corresponds to reflecting the
graph of f across the “main diagonal” described by y = x; but, of
course, this works only after we rename the variables from $x = f^{-1}(y)$
to $y = f^{-1}(x)$. Figures 4.5 and 4.6 show two examples.
We conclude with another example of our concepts for functions
from this section: Let X = {0, 1, 2} and Y = {1, 2, 3} be two finite
sets of numbers. Let f and g be two functions from X to Y , defined
commonly write expressions like $\sin^2 x$ in place of $(\sin x)^2$, this notation can lead to
serious ambiguities, since we won’t know how to interpret $\sin^{-1} x$ unless the context
clarifies this, and I always recommend using the special notation arcsin x for the
inverse of the sine function in place of $\sin^{-1} x$; but we’ll leave these intricacies aside
for now.
4.2. RANGE, ONTO AND 1–1 FUNCTIONS, AND INVERSE 81
by
f (0) = 1 g(0) = 1
f (1) = 3 g(1) = 3
f (2) = 3 g(2) = 2
We can represent f and g visually as shown in Figure 4.7.
[Figure 4.7: the functions f and g, each shown as arrows from the elements 0, 1, 2 of X to the elements 1, 2, 3 of Y]
It is now easy to see that f is neither onto nor 1–1: f is not onto
since 2 is in the codomain but not in the range of f ; and it is not 1–1
since f (1) = f (2). It is therefore impossible to find an inverse function
for f : Given the value 2 in the codomain Y of f , we can’t find any x
in the domain X of f with f (x) = 2; furthermore, for the value 3 in
the codomain Y of f , there are two different x in the domain X of f
with f(x) = 3. So it is not possible to find exactly one value for $f^{-1}(y)$
for either y = 2 or y = 3. On the other hand, for each y in Y, there is
exactly one x in X such that g(x) = y, so we can define $g^{-1}$ : Y → X
with
$g^{-1}(1) = 0$
$g^{-1}(2) = 2$
$g^{-1}(3) = 1$
and g thus has an inverse function.
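For functions between finite sets, the conditions “onto” and “1–1” can be checked mechanically. The following Python sketch represents f and g as dictionaries; the helper names is_onto, is_one_to_one, and inverse are hypothetical, and the sketch assumes that all values of the dictionary lie in the stated codomain.

```python
def is_onto(func, codomain):
    """True if every element of the codomain is a value of func."""
    return set(func.values()) == set(codomain)

def is_one_to_one(func):
    """True if no two arguments of func share the same value."""
    return len(set(func.values())) == len(func)

def inverse(func, codomain):
    """Return the inverse function as a dictionary, or None if it does not exist."""
    if not (is_onto(func, codomain) and is_one_to_one(func)):
        return None
    return {y: x for x, y in func.items()}

Y = {1, 2, 3}
f = {0: 1, 1: 3, 2: 3}
g = {0: 1, 1: 3, 2: 2}

print(is_onto(f, Y), is_one_to_one(f), inverse(f, Y))
print(is_onto(g, Y), is_one_to_one(g), inverse(g, Y))
```

For f both checks fail and no inverse is returned, whereas for g the sketch returns the assignment 1 ↦ 0, 2 ↦ 2, 3 ↦ 1 found above.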
4.3. Some Functions Closely Related to Linear Functions
So far, we have mainly dealt with one kind of function, linear func-
tions, which can be expressed as f (x) = mx + b for fixed parameters m
Figure 4.8. The graph of the least integer function from (4.9)
“A pay phone charges 50¢ for the first minute and 10¢
for each additional minute of a long-distance call.”
Before, we had restricted the domain of the function describing the
cost of the phone call to the set of positive integers, so we defined this
function as
(4.10)    f : {1, 2, 3, . . . } → R
          n ↦ 10n + 40
where n is the length of the phone call (in minutes) and f (n) is its
cost (in cents), and its graph was given in Figure 2.3. But this is really
cheating, since the length of a phone call can be any positive real number (at least
in principle), and then the phone company rounds up the length of the
phone call to the next integer. So the cost function is really given by
(4.11)    g : (0, ∞) → R
          x ↦ 10 · ⌈x⌉ + 40
The graph of the “real” cost function g is given in Figure 4.9.
What are the pros and cons of describing the cost function accu-
rately as in (4.11) rather than just as f (x) = 10x + 40 (as in (2.6))?
Certainly, the former is more accurate; it is the actual cost function as
computed by the phone company. On the other hand, f (x) = 10x + 40
is a close approximation, which is much easier to work with: Unlike g,
f is 1–1 and onto and thus has an inverse function; so we can easily
compute the length of a 90-cent phone call, say, using f, or rather $f^{-1}$.
If we tried this for an 85-cent phone call, say, we’d still get an “answer”
using $f^{-1}$, namely 4.5 minutes, and even though the answer would be
wrong (there is no phone call costing exactly 85 cents!), it would tell
us that the phone call would be around 4 or 5 minutes.
So the upshot is that a linear function is sometimes just a useful
approximation to the real function, such as a step function, which is
much harder to deal with. School mathematics is often happy to just
deal with the approximation, but you as teachers should be aware of
this distinction, and the reasons why one uses approximations, since
it may confuse some of your students, and you should then be able to
give at least a “roughly correct” answer.
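To make the comparison concrete, here is a minimal Python sketch of the two cost functions from (4.10) and (4.11). The function names are chosen here for illustration only; the formulas are the ones given above.

```python
import math

def linear_cost(x):
    """Approximate cost in cents: f(x) = 10x + 40."""
    return 10 * x + 40

def actual_cost(x):
    """Billed cost in cents: g(x) = 10 * ceil(x) + 40, for a call of length x > 0."""
    if x <= 0:
        raise ValueError("call length must be positive")
    return 10 * math.ceil(x) + 40

def linear_inverse(cost):
    """Inverse of the linear approximation: call length (in minutes) for a given cost."""
    return (cost - 40) / 10

# Compare the approximation with the billed amount for a few call lengths.
for minutes in (0.5, 1, 4.2, 4.5, 5):
    print(minutes, linear_cost(minutes), actual_cost(minutes))

# The inverse of f still gives a rough answer for a cost that g never takes.
print(linear_inverse(85))
```

Every call longer than 4 minutes and at most 5 minutes is billed at 90 cents, so the step function g cannot be inverted, while the inverse of f still points to "around 4 or 5 minutes".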
and as can be seen also from the graph shown in Figure 4.10. Note that
at the point t = ½, we can compute s both as 2 · ½ and as 1 + 8(½ − ½);
so the function s “bends” but does not “jump” at t = ½, and we are
allowed to use 0 ≤ t ≤ ½ and ½ ≤ t ≤ 1½ in the first and second case
of (4.12) even though the cases “overlap” at t = ½. Similarly, at t = 1½,
we can compute s both as 1 + 8(1½ − ½) and as 9 + 20(1½ − 1½). The
speeds at the various times correspond to the slopes of the various line
segments in the graph.
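The claim that the pieces agree where they meet can be checked with exact fractions. The sketch below assumes that the three pieces of (4.12) are 2t, 1 + 8(t − ½) and 9 + 20(t − 1½), as read off from the expressions quoted in this paragraph; the names piece1, piece2 and piece3 are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ONE_AND_A_HALF = Fraction(3, 2)

# Assumed pieces of (4.12), read off from the expressions in the text.
def piece1(t):
    return 2 * t

def piece2(t):
    return 1 + 8 * (t - HALF)

def piece3(t):
    return 9 + 20 * (t - ONE_AND_A_HALF)

# At each breakpoint, the two neighboring pieces give the same value of s,
# so the graph bends but does not jump.
print(piece1(HALF), piece2(HALF))
print(piece2(ONE_AND_A_HALF), piece3(ONE_AND_A_HALF))
```

The slopes 2, 8 and 20 of the three pieces are the speeds on the three stretches.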
meter, is expressed by
|ℓ − 10| > 0.5
or equivalently by
ℓ < 9.5 or ℓ > 10.5
Here, P is the pressure (in pascals, say), V the volume (in m^3, say),
T the absolute temperature (in kelvins), and C is a constant
(which is fixed as long as we only consider a fixed amount of a particular
kind of gas).
The Combined Gas Law can now be rewritten, and stated as saying
the following:
(1) P = CT/V: The pressure varies directly with the temperature
(at constant volume), and inversely with the volume (at constant
temperature).
(2) V = CT/P: The volume varies directly with the temperature
(at constant pressure), and inversely with the pressure (at constant
temperature).
(3) T = PV/C: The temperature varies directly with the pressure
(at constant volume), and also directly with the volume (at
constant pressure).
Figure 4.13. The graph of the function y = 1/x
CHAPTER 5
Or, we could have a “pattern” which “oscillates” between the 1st and
the 4th shape, i.e., the 5th shape is the same as the 3rd, the 6th shape
is the same as the 2nd, the 7th shape is the same as the 1st, the 8th
shape is the same as the 2nd again, etc. There is no reason why one
pattern should be considered more “reasonable” than the others! This
is why a careful definition of the intended pattern is essential.
Now let’s look at the perimeter and the area of the nth shape in
Figure 5.1 as defined above (assuming that the side length of each
square is one unit length, and each square has area one unit square).
Table 5.1 below shows these for the first five shapes.
number of shape       1    2    3    4    5
perimeter of shape    4   10   16   22   28
area of shape         1    4    9   16   25
Table 5.1. Perimeter and area of the shapes from Figure 5.1
We will now see that perimeter and area show a very different pat-
tern:
Table 5.1 suggests that the perimeter increases by a fixed amount
(namely, 6) each time. We can also see this, more precisely and for
general n, from our definitions of the shapes: E.g., from the above
definition (1) for the shapes, we see that each time we add to the
pattern, we move the bottom edge down one unit; and then we add
a square on each side of the new bottom row, resulting in an increase
in perimeter by six edges of these two squares. We can also see this
from the above definition (2) for the shapes: Each time we add to the
pattern, we move all the edges on the right over by two units; and then
we add four edges at the top (including for the new top square) and
two edges at the bottom, resulting in an increase in perimeter by six
edges.
Therefore, the perimeter (let’s call it P (n)) of the nth shape is a
linear function of n, with slope 6 and the graph containing the point
(1, 4) (since the 1st shape has perimeter 4). By the point-slope form
for a line equation (see (2.11)), we arrive at the formula
(5.1) P (n) = 6(n − 1) + 4 = 6n − 2
Here is another way of looking at finding the expression for this
function: Let’s define a new function (called the first difference func-
tion) by
(5.2) P^1(n) = P(n + 1) − P(n)
94 5. QUADRATIC FUNCTIONS, EQUATIONS AND INEQUALITIES
So this function P^1 tells us not the function P itself but by how much
the function P increases from n to n + 1. As we showed above, this
increase is constant, namely, we have
(5.3) P^1(n) = 6
for all whole numbers n > 0. Note that (5.3) (together with (5.2)) does
not determine the function P; e.g., the function P′(n) = 6n also satisfies
both (5.3) and (5.2). But if we also specify that
(5.4) P(1) = 4
then we can compute the function P at all whole numbers n > 0
by starting with n = 1 in (5.4) and then repeatedly using that (5.3)
and (5.2) give us
P(n + 1) = P(n) + P^1(n) = P(n) + 6
to compute P(2), P(3), etc.¹
We note that any function P (defined on the positive whole numbers)
with constant first difference function P^1 must be linear: Suppose
that P^1(n) = c for all n > 0. Then we can compute
P (2) = P (1) + c
P (3) = P (2) + c = P (1) + 2c
P (4) = P (3) + c = · · · = P (1) + 3c
...
P (n) = P (n − 1) + c = · · · = P (1) + (n − 1)c
...
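The recursive description and the closed formula (5.1) can be compared directly. The following Python sketch uses function names chosen here for illustration; it starts from P(1) = 4 as in (5.4) and adds the constant first difference 6 from (5.3) at each step.

```python
def perimeter_recursive(n, first_value=4, step=6):
    """Compute P(n) from P(1) = first_value and P(k + 1) = P(k) + step."""
    value = first_value
    for _ in range(n - 1):
        value += step
    return value

def perimeter_closed_form(n):
    """The closed formula (5.1): P(n) = 6n - 2."""
    return 6 * n - 2

# Both descriptions produce the perimeter row of Table 5.1.
print([perimeter_recursive(n) for n in range(1, 6)])
print([perimeter_closed_form(n) for n in range(1, 6)])
```

Changing first_value while keeping step = 6 gives a different linear function with the same first difference function, which is why the starting value (5.4) is needed.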
On the other hand, the increase in area from one shape to the next
is not constant: The increase from the 1st to the 2nd shape is 3, from
the 2nd to the 3rd is 5, from the 3rd to the 4th is 7, etc. So, instead of
the increase being constant, it appears from Table 5.1 that the increase
increases by a constant, namely, 2. We can also see this more precisely
from our definitions of the shapes: E.g., from the above definition (1)
for the shapes, we see that each time we add to the pattern, we add
two more squares to the shape than we did the last time (since the new
bottom row is two units wider than the previous bottom row). And we
can see this from Figure 5.2: The squares marked “new” (in regular
font) are the same number as was added at the previous step; the two
squares marked “new” (in slanted font) are added in addition to the
¹ Such a definition of a function is sometimes called a definition by recursion.
5.1. INTRODUCTION TO QUADRATIC FUNCTIONS 95
other new squares; so we do indeed add two more squares at each step
than we did at the previous step.
This results in the following equation for the first difference function
A^1 of the area A:
So, by the argument on the previous page, the first difference function
A^1(n) = A(n + 1) − A(n) is a linear function, since it increases by 2
each time; namely, the so-called second difference function
A^2(n) = A^1(n + 1) − A^1(n)
is constant, namely, 2.
Whenever the second difference function is constant, the first dif-
ference function must be linear, as we saw before; and whenever the
first difference function is linear, then the function is quadratic, as we
will now show.²
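The first and second differences of the two rows of Table 5.1 can be computed mechanically. This is only a small Python sketch; the helper name differences is chosen here for illustration.

```python
def differences(values):
    """Return the list of successive differences values[k + 1] - values[k]."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

perimeters = [4, 10, 16, 22, 28]   # perimeter row of Table 5.1
areas = [1, 4, 9, 16, 25]          # area row of Table 5.1

print(differences(perimeters))           # first differences of the perimeter
print(differences(areas))                # first differences of the area
print(differences(differences(areas)))   # second differences of the area
```

The perimeter has constant first differences, while the area has constant second differences.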
First of all, a quadratic function is a function of the form
² If you happen to know some calculus, note the analogy: We consider here so-
called discrete functions, i.e., functions defined on the positive integers, say, in an
area called discrete mathematics. For such functions, the first difference function
is the analogue of the first derivative of a continuous function, and the second
difference function is the analogue of the second derivative. What we just said was
that a discrete function is linear exactly when its first difference function is constant;
and that a discrete function is quadratic exactly when its second difference function
is constant. Of course, the derivatives do not exist for all continuous functions; but
the difference functions of discrete functions always exist.
the area of the nth shape. But it’s not hard to see that the areas of
these shapes satisfy
(5.9) A(n + 1) − A(n) = n + 1
and so the second difference is constant 1. Comparing (5.9) with (5.7)
above, we arrive at
n + 1 = (2a)n + (a + b)
and so
1 = 2a
1 = a + b
which solves to
a = ½
b = ½
Now we still need to find c. Setting n = 1 gives
A(1) = 1 = ½ · 1^2 + ½ + c
and therefore again c = 0. Thus the area of the nth shape in Figure 5.1
is
(5.10) A(n) = ½n^2 + ½n
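As a check on this computation, the coefficients can be recomputed with exact fractions and the resulting formula tested against (5.9). The variable and function names in this Python sketch are chosen here for illustration.

```python
from fractions import Fraction

a = Fraction(1, 2)   # from 1 = 2a
b = 1 - a            # from 1 = a + b
c = 1 - a - b        # from A(1) = 1 = a * 1^2 + b * 1 + c

def area(n):
    """The quadratic function A(n) = a n^2 + b n + c with the coefficients above."""
    return a * n**2 + b * n + c

# Check (5.9): A(n + 1) - A(n) = n + 1 for the first few n.
for n in range(1, 6):
    print(n, area(n), area(n + 1) - area(n))
```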
Exercise 5.1. Find a formula for the perimeter of the shapes in
Figure 5.4, and prove both it and the area formula (5.10) geometrically
in at least two different ways.
5.2. Solving Quadratic Equations
We call an equation a quadratic equation (in an unknown x) if it is
of the form
(5.11) ax^2 + bx + c = 0
for some parameters a, b, c in R. We will also assume from now on that
a ≠ 0 since otherwise (5.11) reduces to a linear equation. Note that a
more general equation
ax^2 + bx + c = a′x^2 + b′x + c′
can easily be transformed into the format (5.11) by “subtracting the
right-hand side to the left”; so we can restrict ourselves in general to
equations of the form (5.11).
We will now consider increasingly sophisticated (and more compli-
cated) techniques to solve a quadratic equation, starting with a review
of the binomial laws, which will prove very useful.
5.2.1. Reviewing the Binomial Laws. The binomial laws (for
algebraic expressions a and b) state that
(5.12) (a + b)^2 = a^2 + 2ab + b^2
(5.13) (a − b)^2 = a^2 − 2ab + b^2
(5.14) (a + b)(a − b) = a^2 − b^2
and are simple but very useful consequences of the laws of arithmetic
from Chapter 1.
We will prove them in two different ways, algebraically from the
laws of arithmetic, and geometrically using models for addition and
multiplication.
Let’s start with an algebraic proof of (5.12):
(a + b)^2 = (a + b)(a + b)            (by definition of squaring)
          = a(a + b) + b(a + b)       (by distributive law)
          = (a^2 + ab) + (ba + b^2)   (by distributive law again)
          = a^2 + 2ab + b^2           (by comm./assoc./distrib. laws)
For a geometric proof of (5.12) (for a, b > 0), refer to Figure 5.5.
The area of the big square, which has side length a + b, is (a + b)^2, and is
the sum of the areas of the two smaller squares and the two rectangles,
which have areas a^2, b^2, ba and ab, respectively. The binomial law (5.12)
now follows immediately.
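[Figure 5.5: a square of side length a + b, divided into two squares of areas a^2 and b^2 and two rectangles of areas ab and ba]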
is that now we can “take the square root” on both sides and we arrive
at
x+1 = ±2 (which is an abbreviation for “x + 1 = 2 or x + 1 = −2”)
(Don’t forget the ± here! More about that in a minute.) Subtracting 1
on both sides then yields
x = −1 ± 2 (which is an abbreviation for “x = 1 or x = −3”)
Clearly, “taking the square root” here did the trick, but why was
that justified? This step, and the ± part of it, is a frequent source of
student errors, so let’s look at it more closely: What we really did was
to take the following steps, each resulting in an equivalent statement:
(5.20) (x + 1)^2 = 4
(5.21) |x + 1|^2 = |2|^2
(5.22) |x + 1| = |2|
(5.23) x + 1 = ±2 (i.e., x + 1 = 2 or x + 1 = −2)
Let’s analyze each step carefully: Going from (5.20) to (5.21) uses
that 4 = 2^2, that 2 = |2|, and the identity
A^2 = |A|^2
which holds for all algebraic expressions A and follows immediately
from the definition (4.13) of the absolute value. Going from (5.21)
to (5.22) is a new operation on equations for us: If both sides of an
equation A = B (for algebraic expressions A and B) are non-negative,
then the equations
A = B and A^2 = B^2
have the same solutions. Finally, the step from (5.22) to (5.23) follows
immediately from (4.15). Since this sequence of three steps is used
quite frequently, we state it as a proposition:
Proposition 5.3. The solutions to the equation
A^2 = B^2
(for algebraic expressions A and B) are the same as the solutions to
the statement
A = B or A = −B
The proof of Proposition 5.3 is the same as the proof for the special
case above: The following four statements are equivalent, by the same
[Figure 5.6: geometric picture of completing the square, with regions labeled x^2, px, ½px, ½px and ¼p^2]
square since we are adding just the right number in order to make the
left-hand side into an expression which can be written as a square, as
shown geometrically in Figure 5.6.
From the quadratic formula, we can see that one solution of (5.37)
is −3/2 + ½√5. (Of course, the “other” solution from the formula is
−3/2 − ½√5, but let’s ignore that for now.) We will now “divide” the
expression x^2 + 3x + 1 by the expression x + 3/2 − ½√5 (i.e., the expression
“x − solution”, for a reason we’ll see later), similarly to the way long
division works for whole numbers, except that now the “digits” of our
expressions are not numbers between 0 and 9 (as in usual long division)
but expressions of the form ax^2, bx or c where a, b and c are numbers.
This procedure is called polynomial division and works in our example
as shown in Table 5.2.
                       x + (3/2 + ½√5)
x + (3/2 − ½√5)  )  x^2 + 3x + 1
                  − (x^2 + (3/2 − ½√5)x)
                           (3/2 + ½√5)x + 1
                        − ((3/2 + ½√5)x + 1)
                                          0
Table 5.2. Dividing (x^2 + 3x + 1) ÷ (x + 3/2 − ½√5) by
polynomial division
as
(x^2 + 3x + 1) / (x + 3/2 − ½√5) = x + 3/2 + ½√5 or
x^2 + 3x + 1 = (x + 3/2 − ½√5)(x + 3/2 + ½√5)
So why all this work? We can now use (5.18): The solutions x to
the equation
(5.37) x^2 + 3x + 1 = 0
are exactly the solutions to the statement
x + 3/2 + ½√5 = 0 or x + 3/2 − ½√5 = 0
Thus equation (5.37) has exactly two solutions, namely, −3/2 − ½√5 and
−3/2 + ½√5.
So what have we accomplished? We started with a quadratic equa-
tion and one solution for it. We then used polynomial division and
computed another solution, and we also saw that these two solutions
are the only ones!
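Here is a minimal Python sketch of the same idea, using floating-point numbers instead of exact square roots. It divides a polynomial by “x − solution” with the standard shortcut known as synthetic division (a technique swapped in here for the long-division layout of Table 5.2); the function name divide_by_linear is hypothetical.

```python
import math

def divide_by_linear(coefficients, root):
    """Divide a polynomial by (x - root) using synthetic division.

    coefficients lists the coefficients from the highest power down.
    Returns (quotient coefficients, remainder).
    """
    carried = []
    value = 0.0
    for coefficient in coefficients:
        value = value * root + coefficient
        carried.append(value)
    return carried[:-1], carried[-1]

# One known solution of x^2 + 3x + 1 = 0.
known_solution = -3 / 2 + math.sqrt(5) / 2

# Divide x^2 + 3x + 1 by (x - known_solution).
quotient, remainder = divide_by_linear([1, 3, 1], known_solution)

# The quotient is linear, q1 * x + q0, so it contributes one more solution.
other_solution = -quotient[1] / quotient[0]
print(quotient, remainder, other_solution)
```

The remainder is zero up to rounding error, and the quotient x + (3/2 + ½√5) yields the second solution −3/2 − ½√5.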
The point of all this is that this can be done in general, leading us
not only to a way of finding one solution of a quadratic equation from
another, but, more importantly, a proof of the following
Proposition 5.4. Any quadratic equation ax^2 + bx + c = 0 (with
a ≠ 0) has at most two real solutions. (In fact, it has at most two
complex solutions.)
The parenthetical statement above can easily be seen once we have
covered the complex numbers; so let’s concentrate on the case for the
real numbers: First of all, we may assume (by dividing both sides by a)
that a = 1, so let’s only consider a quadratic equation of the form
(5.38) x^2 + px + q = 0
Furthermore, if this equation has no real solution at all, then we’re
done. So let’s fix one real solution and call it x_0. We will now attempt
polynomial division of x^2 + px + q by x − x_0 and see what happens, as
shown in Table 5.3.
Let’s go over this very abstract polynomial division in detail: We
want to divide x^2 + px + q by x − x_0. Again, just as in long division,
we do so “digit” by “digit”, except that instead of “digits”, we now
use the coefficients: In x^2 + px + q, the coefficient 1 of x^2 corresponds
to the “hundreds digit”, the coefficient p of x corresponds to the “tens
               x + (p + x_0)
x − x_0  )  x^2 + px + q
          − (x^2 − x_0 x)
                  (p + x_0)x + q
               − ((p + x_0)x − (p + x_0)x_0)
                               (p + x_0)x_0 + q
and
(−bi)^2 = (−b)^2 i^2 = a(−1) = −a
Thus the equation x^2 = −a (for positive a) has the two solutions bi
and −bi. While one does not in general define one of bi or −bi to be
“the” square root of −a, both bi and −bi serve as a good substitute for
the square root of −a. Whenever we write expressions like ±√(−a), it
doesn’t matter anyhow which of bi or −bi we use for √(−a).
Finally, we’ll define a complex number to be any number of the
form a + bi, where a and b are arbitrary real numbers. Addition,
subtraction and multiplication are defined on the complex numbers
just as expected:
(5.47) (a + bi) + (c + di) = (a + c) + (b + d)i
(5.48) (a + bi) − (c + di) = (a − c) + (b − d)i
(5.49) (a + bi)(c + di) = (ac − bd) + (ad + bc)i
Here, the formula (5.49) for multiplication just uses repeated application
of the distributive law for the left-hand side, plus the fact that
i^2 = −1. Division is somewhat more complicated since one has to make
the denominator a real number; this involves a slight trick and some
messy computations:
(5.50)    (a + bi)/(c + di) = (a + bi)(c − di) / ((c + di)(c − di))
                            = ((ac + bd) + (bc − ad)i) / (c^2 + d^2)
                            = (ac + bd)/(c^2 + d^2) + ((bc − ad)/(c^2 + d^2)) i
Note that the last line of (5.50) is well-defined as long as c + di ≠ 0, i.e.,
as long as at least one of c and d is nonzero, making c^2 + d^2 nonzero.
So, for c + di 6= 0, the formula (5.50) gives us again an expression of
the form A + Bi for real numbers A and B, i.e., a complex number in
our sense. (It is also possible to solve quadratic equations with complex
parameters a, b and c, but that would require the definition of square
roots of complex numbers, which is beyond the scope of these notes.)
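Formula (5.50) is easy to turn into a small program. The following Python sketch (with a function name chosen here for illustration) computes the real and imaginary parts of a quotient and compares the result with Python's built-in complex division.

```python
def divide_complex(a, b, c, d):
    """Return (A, B) with (a + bi)/(c + di) = A + Bi, following (5.50)."""
    denominator = c * c + d * d
    if denominator == 0:
        raise ZeroDivisionError("c + di must be nonzero")
    real_part = (a * c + b * d) / denominator
    imaginary_part = (b * c - a * d) / denominator
    return real_part, imaginary_part

# Example: (1 + 2i)/(3 + 4i) = (11 + 2i)/25
real_part, imaginary_part = divide_complex(1, 2, 3, 4)
print(real_part, imaginary_part)

# Python's built-in complex numbers give the same quotient.
print(complex(1, 2) / complex(3, 4))
```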
We now return to the quadratic formula
(5.36) x = −b/(2a) ± √(b^2 − 4ac)/(2a)
for the solution to the general quadratic equation
(5.30) ax^2 + bx + c = 0
5.4. GRAPHING QUADRATIC FUNCTIONS 113
(where a, b and c are now arbitrary real numbers with a ≠ 0). The
quadratic formula then gives us one or two complex solutions in all
cases; namely, one solution if b^2 − 4ac = 0, and two solutions otherwise.
We conclude with an example: Consider the quadratic equation
(5.51) x^2 + 2x + 2 = 0
Here, a = 1 and b = c = 2. So the solutions to (5.51) are
x = −2/2 ± √(2^2 − 8)/2
  = −1 ± √(−4)/2
  = −1 ± 2i/2
  = −1 ± i
We can check that these two solutions really work:
(−1 + i)^2 + 2(−1 + i) + 2 = (1 − 2i − 1) + (−2 + 2i) + 2
                           = (1 − 1 − 2 + 2) + (−2 + 2)i = 0
and
(−1 − i)^2 + 2(−1 − i) + 2 = (1 + 2i − 1) + (−2 − 2i) + 2
                           = (1 − 1 − 2 + 2) + (2 − 2)i = 0
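The same example can be reproduced in Python, whose standard cmath module provides a square root that also accepts negative numbers. The function name solve_quadratic is chosen here for illustration; the computation is just the quadratic formula (5.36).

```python
import cmath

def solve_quadratic(a, b, c):
    """Return both (possibly complex) solutions of ax^2 + bx + c = 0 for real a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_of_discriminant) / (2 * a), (-b - root_of_discriminant) / (2 * a)

x1, x2 = solve_quadratic(1, 2, 2)
print(x1, x2)

# Substitute both solutions back into x^2 + 2x + 2.
print(x1**2 + 2 * x1 + 2, x2**2 + 2 * x2 + 2)
```

If b^2 − 4ac = 0, the two returned values coincide, which corresponds to the case of a single solution.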
shown in Figure 5.7. (We only show the graph for values of t in the
interval [0, 10.2] since the graph appears to be below the t-axis outside
this interval, which would correspond to the arrow being below the
ground, contradicting the “real life” context.) It is now not hard to
check that the arrow is at maximum height at just over five seconds,
when it reaches a height of approximately 130 meters. Furthermore,
the arrow appears to hit the ground after just over ten seconds.
Could we have computed this without a picture of the graph? Of
course, but that requires a bit of thinking and computation. Let’s
address the second question from the problem first: When does the
arrow hit the ground again? This corresponds to a solution of the
quadratic equation
50t − 4.9t^2 = 0
since the arrow hits the ground when s equals 0 again. We can solve
this equation by factoring:
−4.9t^2 + 50t = −4.9t(t − 50/4.9) = 0
and so s equals 0 at time t = 0 (when the arrow is fired) and at time
t = 50/4.9 ≈ 10.2. The latter time (10.2 seconds after being fired) must
be the approximate time when the arrow hits the ground again.
The first question (How high will the arrow go, and when will it
be at maximum height?) is harder to answer.⁷ The solution uses our
technique of completing the square from subsection 5.2.4:
(5.54)    s = 50t − 4.9t^2
            = −4.9(t^2 − (50/4.9)t)
            = −4.9(t^2 − (50/4.9)t + (25/4.9)^2 − (25/4.9)^2)
            = −4.9(t^2 − (50/4.9)t + (25/4.9)^2) + 25^2/4.9
            = −4.9(t − 25/4.9)^2 + 25^2/4.9
Since −4.9(t − 25/4.9)^2 ≤ 0 for all t, we can now see that the height s
cannot exceed 25^2/4.9 ≈ 127.5 meters, and that this height is indeed the
maximum height, achieved at t = 25/4.9 ≈ 5.1 seconds.
Now let’s do the above in the abstract. Consider a quadratic func-
tion
(5.55) f(x) = ax^2 + bx + c
where a ≠ 0. We can rewrite this, completing the square, as
(5.56)    f(x) = a(x^2 + (b/a)x) + c
               = a(x^2 + (b/a)x + (b/(2a))^2 − (b/(2a))^2) + c
               = a(x^2 + (b/a)x + (b/(2a))^2) + c − b^2/(4a)
               = a(x + b/(2a))^2 + (c − b^2/(4a))
⁷ Calculus was invented to solve problems like this, but in our easy case, we
don’t need it!
and so
(5.57) f(x) − (c − b^2/(4a)) = a(x − (−b/(2a)))^2
Setting
x_0 = −b/(2a) and y_0 = c − b^2/(4a)
equation (5.57) simplifies to
(5.58) f(x) − y_0 = a(x − x_0)^2 or f(x) = a(x − x_0)^2 + y_0
The graph of the quadratic function is called a parabola, and the
point V with coordinates (x0 , y0 ) is called the vertex of the parabola.
(In the arrow example (5.53) above, the vertex is approximately the
point (5.1, 127.5).)
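The vertex coordinates can be computed directly from a, b and c. The following Python sketch (function names chosen here for illustration) applies the formulas for x_0 and y_0 to the arrow example s = 50t − 4.9t^2, i.e., a = −4.9, b = 50 and c = 0.

```python
def vertex(a, b, c):
    """Return (x0, y0) with a x^2 + b x + c = a (x - x0)^2 + y0, for a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    x0 = -b / (2 * a)
    y0 = c - b * b / (4 * a)
    return x0, y0

def height(t):
    """Height of the arrow in the example: s = 50t - 4.9t^2."""
    return 50 * t - 4.9 * t**2

t_max, s_max = vertex(-4.9, 50, 0)
print(t_max, s_max)

# The vertex form and the original form agree at sample times.
for t in (0, 1, 5, 10):
    print(t, height(t), -4.9 * (t - t_max)**2 + s_max)
```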
We are still assuming that a ≠ 0 and now need to distinguish two
cases before we can discuss the graph of a quadratic function in more
detail:
Case 1: a > 0: Then a(x − x_0)^2 + y_0 ≥ y_0 for all x, and so the minimal
value of f is y_0, which is achieved only at x = x_0. The equation
describes a parabola which “opens upward”. In order to solve
the quadratic equation
(5.59) a(x − x_0)^2 + y_0 = 0
we need to distinguish three subcases:
Case 1a: y_0 < 0: Then the equation (5.59) has two real solutions
x = x_0 ± √(−y_0/a)
since the graph of the function f(x) = a(x − x_0)^2 + y_0
intersects the x-axis at the two points (x_0 + √(−y_0/a), 0) and
(x_0 − √(−y_0/a), 0). The function achieves its minimum value
at the vertex (x_0, y_0). (See Figure 5.8 for an example with
x_0 = 2, y_0 = −1, and so the vertex is V = (2, −1); the
graph of f intersects the x-axis at (1, 0) and (3, 0).)
Case 1b: y_0 = 0: Then the equation (5.59) has only one real solution
x = x_0
since the graph of the function f(x) = a(x − x_0)^2 + y_0
intersects the x-axis only at the vertex (x_0, 0), at which
the function also achieves its minimum value. (See Figure 5.9
for an example with x_0 = 2, y_0 = 0, so the vertex
(5.59) a(x − x_0)^2 + y_0 = 0
Case 2a: y_0 > 0: Then the equation (5.59) has two real solutions
x = x_0 ± √(−y_0/a)
since the graph of the function f(x) = a(x − x_0)^2 + y_0
intersects the x-axis at the two points (x_0 + √(−y_0/a), 0) and
(x_0 − √(−y_0/a), 0). The function achieves its maximum value
at the vertex (x_0, y_0). (See Figure 5.11 for an example
with x_0 = −2, y_0 = 1, and so the vertex is V = (−2, 1);
the graph of f intersects the x-axis at (−3, 0) and (−1, 0).)
Case 2b: y_0 = 0: Then the equation (5.59) has only one real solution
x = x_0
since the graph of the function f(x) = a(x − x_0)^2 + y_0
intersects the x-axis only at the vertex (x_0, 0), at which
it also achieves its maximum value.
Case 2c: y_0 < 0: Then the equation (5.59) has no real solutions
since
f(x) = a(x − x_0)^2 + y_0 ≤ y_0 < 0
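All the subcases depend only on the sign of −y_0/a, which the following Python sketch makes explicit. The function name real_solutions is hypothetical, and the sample values a = 1 and a = −1 are assumptions chosen to match the intercepts of the examples with vertices (2, −1) and (−2, 1).

```python
import math

def real_solutions(a, x0, y0):
    """Return the real solutions of a (x - x0)^2 + y0 = 0, for a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    ratio = -y0 / a          # (x - x0)^2 must equal this number
    if ratio < 0:
        return []            # the parabola does not meet the x-axis
    if ratio == 0:
        return [x0]          # the parabola touches the x-axis at the vertex
    distance = math.sqrt(ratio)
    return [x0 - distance, x0 + distance]

print(real_solutions(1, 2, -1))    # a > 0, y0 < 0: two solutions
print(real_solutions(1, 2, 0))     # a > 0, y0 = 0: one solution
print(real_solutions(1, 2, 1))     # a > 0, y0 > 0: no real solution
print(real_solutions(-1, -2, 1))   # a < 0, y0 > 0: two solutions
print(real_solutions(-1, -2, -1))  # a < 0, y0 < 0: no real solution
```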
(5.52) s = v_0 t − ½gt^2
(5.54) s = −4.9(t − 25/4.9)^2 + 25^2/4.9
In this last chapter, we’ll cover the exponential functions and their
inverse functions, the logarithmic functions.
= a^{m+n}
And to see that (6.4) is true, we check
(a^m)^n = (a · a · a · · · a) · (a · a · a · · · a) · · · (a · a · a · · · a)    (n groups of m factors each)
        = a · a · a · · · a    (m · n factors)
        = a^{m·n}
Exercise 6.1. How would you prove (6.3) and (6.5) this way? Why
does it make sense to impose the restriction m > n in (6.3) for now?
So (6.2)–(6.5) follow immediately from our previous rules for mul-
tiplication. Wouldn’t it be nice to have these rules hold not just for
nonzero whole numbers as exponents? Of course, you know that these
rules are still true, right?
Well, there is a small problem: We haven’t even defined what a^m
means unless m is a nonzero whole number! We will do this in a
sequence of small steps, each time justifying our definition for more
and more possible exponents.
From now on, we will assume that the base a is nonzero. (This
will make sense in a moment when we see how division by a becomes
involved in our definition.) Now we can define exponentiation for an
exponent which is 0 or a negative integer as follows:
(6.6) a^0 = 1
(6.7) a^m = 1/a^{−m} (for any negative integer m)
First, note here that a^{−m} in (6.7) has already been defined in (6.1)
since −m is a nonzero whole number if m is a negative integer. More
importantly, note that we are forced to define exponentiation this way
if we want the analogue of (6.3) to still hold:
a^0 = a^1/a^1 = a/a = 1
a^m = a^0/a^{−m} = 1/a^{−m}
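The step-by-step definition can be mirrored in a short Python function. This is only a sketch under the assumption that (6.1) defines a^m for a nonzero whole number m as a product of m factors a; the function name power is chosen here for illustration, and exact fractions are used so that the rules can be checked without rounding.

```python
from fractions import Fraction

def power(a, m):
    """Compute a^m for nonzero a and an integer exponent m."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("the base must be nonzero")
    if m == 0:
        return Fraction(1)           # definition (6.6)
    if m < 0:
        return 1 / power(a, -m)      # definition (6.7)
    result = Fraction(1)
    for _ in range(m):               # repeated multiplication, m factors
        result *= a
    return result

print(power(2, 3), power(2, 0), power(2, -3))

# The quotient rule now also holds when the result has exponent 0 or a negative exponent.
print(power(3, 5) / power(3, 5), power(3, 5) / power(3, 7), power(3, -2))
```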
126 6. EXPONENTIAL AND LOGARITHMIC FUNCTIONS
(6.8) a^{1/p} = ᵖ√a
(6.9) a^{n/p} = (a^{1/p})^n
(a^{1/p})^p = a^{p/p} = a^1 = a
a^{n/p} = a^{(1/p)·n} = (a^{1/p})^n
a^{n/p} = (a^{1/p})^n
        = (a^{1/p})^{p·m}
        = ((a^{1/p})^p)^m
        = a^m
a^{n/p} = a^{n′/p′}
4Strictly speaking, what we are doing here is merely what you have already
done for the other arithmetical operations, like addition and multiplication, for
arbitrary real numbers. E.g., the sum r0 + r1 of two irrational numbers r0 and r1 ,
say, is “approximated” more and more “closely” by the sum q0 + q1 of rational
numbers q0 and q1 “close” to r0 and r1 , respectively.
Proposition 6.2. The following exponential rules hold for all pos-
itive real numbers a and b, and for all real numbers r and s:
(6.10) a^r · a^s = a^{r+s}
(6.11) a^r / a^s = a^{r−s}
(6.12) (a^r)^s = a^{r·s}
(6.13) (a · b)^r = a^r · b^r
If both r and s are integers, then these rules hold for arbitrary nonzero
real numbers a and b.
The formal proof of Proposition 6.2 is quite tedious, so we’ll skip it
here and simply accept the exponential rules as given from now on.
There is one special case of (6.11) which comes up frequently,
namely, the case when r = 0, so we’ll state this as a separate formula:
(6.14) a^{−s} = 1/a^s = (1/a)^s or, equivalently: a^t = 1/a^{−t} = (1/a)^{−t}
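While we skip the proof, the rules are easy to test numerically. The following Python sketch checks (6.10) through (6.14) for one arbitrary choice of positive bases and real exponents (the particular numbers are assumptions made here). Since floating-point arithmetic rounds, the comparison uses math.isclose rather than exact equality; a check like this illustrates the rules but does not prove them.

```python
import math

a, b = 2.5, 0.4      # arbitrary positive bases
r, s = 1.5, -0.75    # arbitrary real exponents

rule_checks = {
    "(6.10)": math.isclose(a**r * a**s, a**(r + s)),
    "(6.11)": math.isclose(a**r / a**s, a**(r - s)),
    "(6.12)": math.isclose((a**r)**s, a**(r * s)),
    "(6.13)": math.isclose((a * b)**r, a**r * b**r),
    "(6.14)": math.isclose(a**(-s), 1 / a**s) and math.isclose(a**(-s), (1 / a)**s),
}

for rule, holds in rule_checks.items():
    print(rule, holds)
```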
number of hours      0     1     2     3     4     5
number of fish       5     8    11    14    17    20

number of hours        0     1     2     3      4      5
number of bacteria   100   200   400   800  1,600  3,200
We see right away that the number of bacteria does not increase by
a fixed amount each hour. In fact, the increase itself increases rather
dramatically: From hour 0 to hour 1, the increase is by only 100,
but from hour 4 to hour 5, the increase is already by 1,600. In fact,
this leads us to the following observation: If we denote the number
of bacteria after n hours by B(n), and consider the “first difference
function” (as defined in section 5.1), namely,
B^1(n) = B(n + 1) − B(n)
then it turns out that in fact B^1(n) = B(n), i.e., the first difference
function B^1 equals the original function B. We could now try, as in
section 5.1, to take the second difference function
B^2(n) = B^1(n + 1) − B^1(n)
but, of course, this will give us the same function again! So not only is
the function B not linear; it is also not quadratic, by the discussion in
section 5.1.
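This observation can be checked mechanically. The Python sketch below assumes the formula B(n) = 100 · 2^n, which reproduces the bacteria table above; the names bacteria and first_difference are chosen here for illustration.

```python
def bacteria(n):
    """Number of bacteria after n hours, reproducing the table: 100, 200, 400, ..."""
    return 100 * 2**n

def first_difference(f):
    """Return the first difference function n -> f(n + 1) - f(n)."""
    return lambda n: f(n + 1) - f(n)

B1 = first_difference(bacteria)   # first difference function
B2 = first_difference(B1)         # second difference function

for n in range(6):
    print(n, bacteria(n), B1(n), B2(n))
```

All three columns agree, so taking differences never produces a constant function here.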
On the other hand, you will surely have noticed definite “patterns”
in Tables 6.1 and 6.2: In the former, the additive change is constant;
namely, each hour, the number of fish increases by a summand of 3.
On the other hand, in the latter, the multiplicative change is constant;
namely, each hour, the number of bacteria increases by a factor of 2.
It is then not hard to see that the function B can be described by
Note that there is some similarity between (6.15) and (6.16): The
“starting point” (at n = 0) is 5 and 100, respectively, and shows up
first in the expression on the right-hand side. Next comes the op-
eration symbol for the way in which the increase occurs, additively
in (6.15), and multiplicatively in (6.16). Next comes the “rate of
change”, namely, 3 in (6.15) for the additive change of 3 each hour,
and 2 in (6.16) for the multiplicative change of 2 each hour, respectively.
And finally comes n for the number of times by which this increase
occurs, preceded by the symbol for the “repeated operation” (namely,
multiplication as repeated addition in (6.15), and exponentiation as repeated
multiplication in (6.16), respectively).
In both our fish and our bacteria example, it is now easy to compute
the number of fish or bacteria, respectively, at time 10 hours, say: It
is 5 + 3 · 10 = 35 fish and 100 · 2^10 = 102,400 bacteria, respectively.⁵
But, as we mentioned already in section 2.5, such problems are really
problems of arithmetic: In order to solve them, we merely have to
evaluate the numerical expressions 5 + 3 · 10 and 100 · 2^10, respectively.
On the other hand, suppose that we are given the number of fish, or
bacteria, respectively, and want to compute the number of hours after
which there are this many fish or bacteria. Suppose I want to know
when there will be 50 fish. This is now a “real algebra” problem and
clearly amounts to solving the equation
5 + 3 · n = 50
3n = 50 − 5
n = (50 − 5)/3 = 15
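The difference between the “arithmetic” question and the “algebra” question shows up clearly in code: the first only evaluates an expression, while the second needs the equation solved first. A minimal Python sketch for the fish example, with function names chosen here for illustration:

```python
def fish(n):
    """Number of fish after n hours: 5 + 3n."""
    return 5 + 3 * n

def hours_until_fish(target):
    """Solve 5 + 3n = target for n, following the steps above."""
    return (target - 5) / 3

print(fish(10))               # the arithmetic question: evaluate
print(hours_until_fish(50))   # the algebra question: solve
```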
⁵ In the latter case, of course, we should really write “approximately 100,000
bacteria” since 102,400 gives a false sense of precision for our answer.
6.3. THE EXPONENTIAL FUNCTIONS 131
Proposition 6.3. For any base b > 0 (with b ≠ 1), the exponential
function h(x) = b^x has the following properties:
(1) The exponential function h(x) = b^x is defined for all real numbers
x, i.e., the domain of h is the set R of all real numbers.
(2) The graph crosses the y-axis at the point (0, 1).
(3) The graph lies entirely above the x-axis, i.e., b^x > 0 for all x.
since x_1 − x_0 > 0, and so b^{x_1} < b^{x_0}. Next, (5b) and (5c) can be reduced
to (4c) and (4b), respectively: 0 < b < 1 implies 1 < 1/b by (3.9). Now
apply (4c) to see that, as x grows larger and larger, b^x = (1/b)^{−x} gets
arbitrarily close to 0. Similarly, apply (4b) to see that, as x grows
smaller and smaller (i.e., as x “grows larger and larger in the negative
direction”), b^x = (1/b)^{−x} grows larger without bound.
Furthermore, (6) follows by (4a) and (5a) since x_0 < x_1 implies
b^{x_0} < b^{x_1} or b^{x_0} > b^{x_1} and thus b^{x_0} ≠ b^{x_1}.
Finally, (7) can be seen from (4b) and (4c) for b > 1, and from (5b)
and (5c) for 0 < b < 1, respectively.⁸
There is one issue we need to address still: How does the value of
the base b impact on the graph of the exponential function? Figure 6.3
shows some examples; note that for clarity, we used various dotted lines
for some of the graphs so you can quickly tell them apart. As you can
see from these graphs, for b > 1, the bigger b is, the more steeply the
graph of the exponential function increases. On the other hand, for
0 < b < 1, the smaller b, the more steeply the graph of the exponential
function decreases.
One final note: If you look at other algebra books, or at calculus
books, you will find a special exponential function being mentioned,
often called “the” exponential function. This is the exponential function
f(x) = e^x with special base e ≈ 2.718 (which is sometimes called
Euler’s number). The fact that this base is special does not become
apparent until one studies calculus; and using e as a base before then is
really more of a pain than a convenience since e is actually an irrational
number. Therefore, we have chosen to ignore the base e here.
the interest is added to the amount, and next time interest is paid,
interest is also added for the previous interest.
For a simple example, suppose I have $100 in a savings account,
and the bank pays me 5% interest at the end of each year. At the
end of the first year, I then have $105 (namely, the principal of $100
plus 5% of it in interest); at the end of the second year, I have $110.25
(namely, the previous amount of $105 plus 5% of it in interest); etc.
See Table 6.3.
number of years          0         1         2         3         4
amount in account  $100.00   $105.00   $110.25   $115.76   $121.55
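The entries of this table can be recomputed with a few lines of Python. The sketch assumes, as in the example, a principal of $100 and 5% interest added at the end of each year; the function name balance is chosen here for illustration.

```python
def balance(principal, rate, years):
    """Amount after the given number of years, with interest added once per year."""
    return principal * (1 + rate) ** years

# Amounts for years 0 through 4, rounded to cents.
table = [round(balance(100, 0.05, years), 2) for years in range(5)]
print(table)
```

Each year multiplies the previous amount by the same factor 1.05, so the amount is an exponential function of the number of years.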
Here, we have introduced the special symbol exp_b for the exponential
function with base b.
Now recall from subsection 4.2.3 that a function has an inverse function
exactly when the function is 1–1 and onto! This inverse function of
the exponential function exp_b(x) = b^x is usually called the logarithmic
function to base b; it is defined as
(6.23)    log_b : (0, ∞) → R
          y ↦ the unique x such that b^x = y
The first, called log, is really log_10 and sometimes called the common
logarithm or the decadic logarithm. Its importance is mainly historic,
as we will explain later on; but since it’s somewhat easier than the
other logarithm on your calculator, we will mainly use it. The other
logarithmic function on your calculator, called ln, is really log_e for
the number e ≈ 2.718 mentioned at the end of section 6.3.1 and is
usually called the natural logarithm (which explains the symbol ln); it
is somewhat cumbersome and not all that useful until calculus (when it
becomes very important, so important that most mathematicians call
it “the” logarithmic function and ignore all others!).⁹
There is now a handy little formula which allows you to compute
any logarithmic function from log_10:
(6.28) log_b(x) = log_10(x) / log_10(b)
This formula is somewhat tedious to deduce, so we’ll skip this here.¹⁰
(Incidentally, the same formula also works with e in place of 10, i.e.,
we also have log_b(x) = ln(x)/ln(b).)
Note that formula (6.28) is exactly what you’re looking for with
a calculator which can handle $\log_{10}$: The right-hand side only
involves computing $\log_{10}$ (and division). E.g., here is how we can
compute $\log_2(64)$ the hard way, using a calculator:
$$\log_2(64) = \frac{\log_{10}(64)}{\log_{10}(2)} \approx \frac{1.806}{0.301} = 6$$
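The same computation can be sketched in Python, with the standard `math.log10` function playing the role of the calculator's log button. The helper name `log_base` is an assumption made for this illustration.

```python
import math

def log_base(b, x):
    """Compute log_b(x) via formula (6.28), using only base-10 logarithms."""
    return math.log10(x) / math.log10(b)

# log_2(64) "the hard way": log10(64) / log10(2)
print(log_base(2, 64))

# The same formula with e in place of 10 (math.log is the natural logarithm ln)
print(math.log(64) / math.log(2))
```

Both lines print a value equal to 6 up to floating-point rounding, in agreement with $2^6 = 64$.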
We will see applications of the logarithmic functions in subsec-
tion 6.4.2 where the use of a calculator (and thus the use of for-
mula (6.28)) is inevitable.
But before we head into applications of the logarithmic functions,
let’s look at some properties of these functions. We summarize them in
the following proposition, which is the counterpart of Proposition 6.3
for the exponential functions. (Figure 6.4 shows the graphs of some
logarithmic functions and of the corresponding exponential functions;
⁹ Some calculators have only one button for logarithmic functions, typically
called log, and now you don’t necessarily know which of the two logarithmic
functions above that means, so be careful! You can test them as follows: If log(10)
computes as 1 on your calculator, then you have $\log_{10}$, since $10^1 = 10$. If it
computes as approximately 2.3, then you have $\log_e$, since $e^{2.3} \approx 2.7^{2.3} \approx 10$.
¹⁰ But if you insist, here is the deduction: First of all,
$x = b^{\log_b(x)} = \left(10^{\log_{10}(b)}\right)^{\log_b(x)}$ as well as
$x = 10^{\log_{10}(x)}$ by definition of $\log_b$ and $\log_{10}$, respectively.
By (6.12), this gives
$10^{\log_{10}(b) \cdot \log_b(x)} = \left(10^{\log_{10}(b)}\right)^{\log_b(x)} = 10^{\log_{10}(x)}$.
Since $\exp_{10}$ is 1–1, we obtain $\log_{10}(b) \cdot \log_b(x) = \log_{10}(x)$
and thus $\log_b(x) = \frac{\log_{10}(x)}{\log_{10}(b)}$.
note that for fixed base b, the graphs of the exponential and the loga-
rithmic functions to base b are the reflections of each other along the
“main diagonal” y = x as mentioned in subsection 4.2.3. In order to
better distinguish the graphs of the various functions in Figure 6.4, we
have drawn some of them with “dotted” lines; but note that in reality,
they should all be drawn with “solid” lines.)
Proposition 6.4. For any base $b > 0$ (with $b \neq 1$), the logarithmic
function $\log_b$ has the following properties:
(1) The logarithmic function $\log_b(x)$ is defined for all positive real
numbers $x$, i.e., the domain of $\log_b$ is the set $(0, \infty)$ of all
positive real numbers.
(2) The graph crosses the x-axis at the point (1, 0).
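Proposition 6.5. The following logarithmic rules hold for all positive
real numbers $a \neq 1$, all positive real numbers $r$ and $s$, and all real
numbers $y$:
$$\log_a(r \cdot s) = \log_a(r) + \log_a(s) \tag{6.29}$$
$$\log_a\left(\frac{r}{s}\right) = \log_a(r) - \log_a(s) \tag{6.30}$$
$$\log_a(r^y) = y \cdot \log_a(r) \tag{6.31}$$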
We will not prove these here, but merely list them for reference.¹¹
¹¹ They actually are not hard to show: In each case, we set $r = a^x$ and $s = a^y$.
Firstly, $\log_a(r \cdot s) = \log_a(r) + \log_a(s)$ holds since $a^x \cdot a^y = a^{x+y}$.
Next, $\log_a\left(\frac{r}{s}\right) = \log_a(r) - \log_a(s)$ holds since
$\frac{a^x}{a^y} = a^{x-y}$. And finally, $\log_a(r^y) = y \cdot \log_a(r)$ holds
since $(a^x)^y = a^{xy}$.
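The three logarithmic rules (for products, quotients, and powers) can also be spot-checked numerically. The following Python sketch uses arbitrarily chosen sample values for $a$, $r$, $s$ and $y$; the helper `log_a` is a name introduced here for illustration, and a numerical check for a few values is of course not a proof.

```python
import math

# Arbitrary sample values: a is a positive base other than 1,
# r and s are positive, y is any real number.
a, r, s, y = 3.0, 5.0, 7.0, 2.5

def log_a(x):
    """Logarithm of x to base a, computed from natural logarithms."""
    return math.log(x) / math.log(a)

# Product rule: log_a(r * s) = log_a(r) + log_a(s)
print(math.isclose(log_a(r * s), log_a(r) + log_a(s)))
# Quotient rule: log_a(r / s) = log_a(r) - log_a(s)
print(math.isclose(log_a(r / s), log_a(r) - log_a(s)))
# Power rule: log_a(r ** y) = y * log_a(r)
print(math.isclose(log_a(r ** y), y * log_a(r)))
```

`math.isclose` is used instead of `==` because floating-point arithmetic only approximates real-number arithmetic.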
The basic idea of the slide rule is to take advantage of the
logarithmic rules (6.29) and (6.30) in Proposition 6.5 in order to make
(approximate) manual computations easier by converting multiplication
to “graphical addition”, and division to “graphical subtraction”.¹²
Figure 6.5 shows a basic slide rule with a magnifying sliding cursor,
allowing you to better align numbers on the scales; we will mainly use
the middle two scales of it (called scales C and D on our slide rule) for
now. Note that these two scales are sliding number lines, each from 1
to 10, drawn not to scale like a usual number line, but such that
each number $x$ between 1 and 10 is drawn at a distance $\log_{10}(x)$ to the
right of 1. Note, therefore, that the distance from 1 to 4 is twice the
distance from 1 to 2 (since $\log_{10}(4) = \log_{10}(2^2) = 2 \cdot \log_{10}(2)$ by (6.31)),
and similarly the distance from 1 to 8 is three times the distance from 1
¹² This short subsection is not intended as a full-blown guide to using a slide
rule, but rather to give you the basic idea as to how and why a slide rule works.
You can practice with a virtual slide rule on your computer screen at the web site
http://www.antiquark.com/sliderule/sim/n909es/virtual-n909-es.html.
to 2 (since $\log_{10}(8) = \log_{10}(2^3) = 3 \cdot \log_{10}(2)$ by (6.31)). (You can check
this against scale L at the very bottom, which is the usual number line
from 0.0 to 1.0; its numbers represent the (base-10) logarithms of the
numbers on scale D, so $\log_{10}(1) = 0$, $\log_{10}(2) \approx 0.3$, $\log_{10}(4) \approx 0.6$,
$\log_{10}(8) \approx 0.9$, and $\log_{10}(10) = 1$.)
To illustrate the use of the slide rule, let’s start with a simple mul-
tiplication problem: Suppose you want to multiply 2 by 3. The slide
rule would accomplish this in the following steps (see Figure 6.5):
(1) Convert 2 and 3 to $\log_{10}(2)$ and $\log_{10}(3)$, represented by the
length 2 on scale D and the length 3 on scale C, respectively.
(2) Add these two lengths and read off their sum as the length
$\log_{10}(2) + \log_{10}(3) = \log_{10}(2 \cdot 3) = \log_{10}(6)$ on scale D
(using (6.29)).
(3) Convert $\log_{10}(6)$ back to $10^{\log_{10}(6)} = 6$.
Similarly, in order to divide 6 by 3, say, we would proceed as follows
(again refer to Figure 6.5):
(1) Convert 6 and 3 to $\log_{10}(6)$ and $\log_{10}(3)$, represented by the
length 6 on scale D and the length 3 on scale C, respectively.
(2) Subtract these two lengths and read off their difference as the
length $\log_{10}(6) - \log_{10}(3) = \log_{10}\left(\frac{6}{3}\right) = \log_{10}(2)$ on scale D
(using (6.30)).
(3) Convert $\log_{10}(2)$ back to $10^{\log_{10}(2)} = 2$.
There are many other computations which can be performed on
a slide rule, but since pocket calculators are nowadays a much more
efficient way to perform the same computations, we’ll leave it at that.
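The three steps of each slide rule computation (convert to logarithms, add or subtract the lengths, convert back) can be mimicked in Python. This is only a sketch of the principle: the function names are made up for this illustration, and a real slide rule gives far less precision than floating-point arithmetic.

```python
import math

def slide_rule_multiply(u, v):
    """Multiply u and v by adding their base-10 logarithms ("lengths")."""
    length = math.log10(u) + math.log10(v)  # steps (1) and (2): add the lengths
    return 10 ** length                     # step (3): convert back

def slide_rule_divide(u, v):
    """Divide u by v by subtracting their base-10 logarithms."""
    length = math.log10(u) - math.log10(v)  # steps (1) and (2): subtract the lengths
    return 10 ** length                     # step (3): convert back

print(slide_rule_multiply(2, 3))  # approximately 6
print(slide_rule_divide(6, 3))    # approximately 2
```

The results agree with 6 and 2 only up to rounding, just as readings on a physical slide rule are approximate.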
6.4.2.2. Radioactive Decay. The next application concerns radioactive
materials: As radioactive material decays, its mass shrinks in each
time period by a fixed proportion of the amount left.
Mathematically, this is similar to the balance of an interest-bearing
savings account, which grows in each time period by a fixed proportion
of the current balance, as we discussed in subsection 6.3.2, except that
the amount of radioactive material decreases, whereas the balance of an
interest-bearing savings account increases.
Now, just to confuse you, the decay of a radioactive material is
typically described by giving its half-life, namely, the time period it
takes for the amount of material to decrease to half of its original
amount. E.g., the half-life of plutonium is approximately 24,000 years;
Table 6.4 shows the amount left after multiples of 24,000 years, starting
with 1kg of plutonium. As you can see, the mass of the plutonium
shrinks in half every 24,000 years, as the name “half-life” indicates.
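giving
$$\begin{aligned}
t &= 24{,}000 \cdot \log_{1/2}(0.1) \\
  &= 24{,}000 \cdot \frac{\log(0.1)}{\log(0.5)} && \text{(by (6.28))} \\
  &\approx 24{,}000 \cdot \frac{-1}{-0.301} \\
  &\approx 24{,}000 \cdot 3.32 \approx 79{,}700
\end{aligned}$$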
So it will take approximately 79,700 years for 90% of the plutonium to
decay (and thus only 10% of it to be left).
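This computation can be repeated in Python for any fraction of material left. The sketch below assumes, in line with the half-life description, that the fraction left after $t$ years is $(1/2)^{t/24{,}000}$; the function name and its parameters are chosen here for illustration.

```python
import math

HALF_LIFE = 24_000  # approximate half-life of plutonium in years, as in the text

def years_until_fraction_left(fraction, half_life=HALF_LIFE):
    """Solve (1/2) ** (t / half_life) = fraction for t, using formula (6.28)."""
    return half_life * math.log10(fraction) / math.log10(0.5)

t = years_until_fraction_left(0.1)
print(round(t))  # close to the 79,700 years found above
```

Without rounding the intermediate value to 3.32, the result is about 79,726 years, which agrees with the estimate of 79,700 to three significant digits. As a sanity check, `years_until_fraction_left(0.5)` returns one half-life.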
6.4.2.3. Earthquakes and the Richter Scale. Let’s look at another
application of logarithms, which at first looks quite different: the Richter
scale for earthquakes. It is determined as follows: One first measures the
maximum amplitude $A$ (i.e., how far the ground “swings” from its resting
position) at a distance of 100 km from the epicenter;¹³ the amplitude
is measured in µm (read: micrometer, which is equal to 0.001 mm or
0.000001 m).
The problem with amplitude is that one has to deal with very large
numbers: The amplitude of a barely noticeable earthquake is about
100µm; the amplitude of a very destructive earthquake is as high as
100,000,000µm = 100m or more. As you can see, it’s not very practical
to use the amplitude; so one uses the (decadic) logarithm of the am-
plitude, called the Richter magnitude (or, more correctly, called local
magnitude)
¹³ There are formulas which one can use to convert the amplitude measured at
a different distance from the epicenter to the amplitude one would have measured
at a distance of 100 km; we will not worry about this here.
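To see how the logarithm tames these large numbers, here is a small Python sketch. It assumes, following the description above, that the Richter magnitude is simply the base-10 logarithm of the amplitude in µm measured at 100 km from the epicenter; the function name is made up for this illustration.

```python
import math

def richter_magnitude(amplitude_um):
    """Base-10 logarithm of the amplitude (in micrometers), per the description above."""
    return math.log10(amplitude_um)

print(richter_magnitude(100))          # barely noticeable earthquake
print(richter_magnitude(100_000_000))  # very destructive earthquake
```

The two sample amplitudes from the text, 100 µm and 100,000,000 µm, correspond to magnitudes 2 and 8: a millionfold increase in amplitude adds only 6 to the magnitude.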
¹⁴ There is also a unit called bel, which equals 10 decibels and is now rarely
used. It was defined as $\log_{10}(P)$. In a sense, bels are more similar to the definition
of the Richter magnitude, but for some reason, it is now common practice to use
the unit decibels.