
MECHANICS II

APPM2023

Course Notes by
Warren Carlson

2023
Preface

This is the set of course notes for APPM2023. The notes are intended to complement the lectures
and other course material, and are by no means intended to be complete. Students should consult
the various references that have been given to find additional material, and different views on the
subject matter.
This material is under development. Please report any errors or problems (no
matter how big or small). Any suggestions would be greatly appreciated.

School of Computer Science and Applied Mathematics,


University of the Witwatersrand,
Johannesburg,
South Africa

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs
License. To view a copy of this license, visit

• http://creativecommons.org/licenses/by-nc-nd/2.5/

or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105,
USA. In summary, this means you are free to make copies provided it is not for commercial
purposes and you do not change the material.

Contributors

This course material was prepared with the help of Dario Fanucchi, Matthew Woolway and
Sheldon Herbst. I am indebted to these individuals for their help in preparing this material.

Foreword

We live today in an era of discovery, where new phenomena are constantly being observed and
explained, where thousands of scientists are competing in an undying quest to discover the secret
language of the universe. We live in a world of intellectual abundance, a world where precious
gems of knowledge, mined for so conscientiously by our predecessors and contemporaries are
just a mouse-click away. It is certainly exciting to witness this lightning-quick development of
human understanding, which expands into new domains with more pace and agility than the
most riveting game of sports. What, then, might lead us to the next breakthrough? What might
decode the beautiful language of that grand and silent masterpiece - the universe? Perhaps it is
prudent to heed the words of Newton that we can only progress by ‘Standing on the Shoulders of
Giants’.
With this inspiration, let us embark on a journey of the mind to the very origins of our scientific
understanding. Our journey begins in the year 2010 and we travel slowly backwards, noticing
as we go the thousands of journal articles and PhD theses published in the time we are winding back.
Soon we have passed to a time before social networking was a concept, and before cell phones had
colour screens. Our journey begins to speed up now, and we see the information revolution and
the internet disappearing. Let us go further still, before humans had travelled into space, before
television, before the Wright Brothers developed the first Aeroplane, before radio was discovered,
before highways and skyscrapers and cars and trucks. Let us go back to before Charles Babbage
first conceived his Analytical Engine, and continue still, to before electricity was understood and
before railways were ever built. Now we are travelling at a breathtaking speed beyond the first
telescope, crafted by Galileo Galilei, and beyond the first use of a magnetic compass. We are
reaching the furthest stretches of known history - around us is a desolate plain. Human beings
are living in caves and hiding out in the vast unknown. Let us pause here, and begin our walk
back to the present.
Humans, from the very outset, have been driven by the urge to understand the world. Even
in the distant past we observe the prehistoric man carving out a tool from rock and wielding
it to best his prey. Further forward we see the discovery of fire, the taming of wolves and the
logging of the stars above. We now jump to the time of the Ancient Greeks. We see sundials and
water-watches for telling the time of day, pulleys and levers for moving heavy objects, channels of
water irrigating farms and cities, and the great Library of Alexandria - a veritable treasure trove
of all the knowledge of the time. Archimedes, who lived at this time, is regarded now as one of
the three most influential figures in the history of mathematics (the other two being Newton

and Gauss). It was Archimedes who first gave serious thought to the language of the universe,
in devising methods for obtaining volumes and areas of various solids, and considering basic
ideas in mechanics and mathematics. These ideas were sporadic, but nonetheless foundational
to future understanding of the world. One important contribution to our understanding of the
universe by the Ancient Greeks in general is the formal study of Geometry. Euclid’s Geometry of
flat space was to become the cornerstone of classical mechanics in later years.
Moving forward to the 16th century, we skip past ages of development in various different
parts of the world. Here we meet a man by the name of Galileo Galilei, well regarded as one of the
greatest thinkers of all time, who reasoned deeply about the nature of the physical world. Galileo’s
famous observation that objects of different mass fall with the same acceleration (approximately
9.8 m s−2 near the Earth’s surface) is still quoted today. One of the most paradigm-changing
notions introduced by Galileo is that of
an inertial reference frame: The physics of a system, it seems, obeys the same laws no matter
what angle we look at it from, no matter where we look at it from, and no matter how fast we are
moving, provided that we are moving at a constant velocity. These are the symmetries of classical
mechanics, and it was Galileo’s discovery of the last one in particular that led to Newton’s First
Law of Motion. In the late 16th and early 17th century, we pause to consider the work of one
Johannes Kepler, who charted the motions of the planets through space, and first posed the theory
that they followed elliptical orbits around the sun. We notice the theory of the geocentric universe
losing all credibility, and being replaced with a heliocentric model of the solar system. Yet Kepler
has no theoretical framework that explains the motions of the planets.
Now we take our journey forward to the late 17th century, where we meet the Eponymous
Hero of all Newtonian Mechanics: Isaac Newton Himself. Newton was an extremely able
mathematician, who independently developed Calculus (concurrently with Leibniz) and who
invented much of the Physics that is now regarded as classical optics and classical mechanics. He
developed the first sound theory of gravitation, and formulated classical mechanics in three
fundamental laws. The first of these is a restatement of Galileo’s inertial reference frames, the
second is the equation of motion obeyed by all classical systems, and the third gives meaning to
interactions between different objects in a system and different systems. Newton’s gravitational
and mechanical theories make predictions that agree astonishingly well with observation.
Indeed, for several centuries after this point, the scientific community believed his work to be the
language of the universe. Not until the advent of Relativity and Quantum Mechanics in the early
20th century was this to be disputed. Newton introduced to the world a physical theory both
rich and accurate, and naturally this served as a starting point for an explosion of research and
discovery. The field of Classical Mechanics, founded upon Newton’s Three Laws, flourished.
Several thinkers - Bernoulli, d’Alembert, Snell, Lagrange and Hamilton among others - extended
Newton’s ideas to different circumstances and reformulated his laws into mathematical
frameworks that gave them greater versatility and applicability.
Indeed, moving forward to the 20th century we see that it is Hamilton’s formulation of Classical
Mechanics that inspires the mathematical formulation of Quantum Mechanics by Schrödinger,
Heisenberg, Dirac and others. Moving further forward still we see Richard Feynman, inspired by

the Lagrangian approach to classical mechanics, and using it to develop his elegant Path Integral
approach to Quantum Physics.
And so we return to the present, back past all the welcome discoveries and developments
that we left behind earlier and back to our opening question. How then do we make the next
breakthrough? Perhaps, like Feynman, we might draw some inspiration from the elegant language
of Lagrangian Mechanics. Like so many of the most powerful theories in Mathematics and
Physics, Lagrangian Mechanics has a depth to it that can only be fully appreciated after a long
and thoughtful perusal of its workings. Thus I encourage you to explore this rich field thoroughly,
and to find the hidden connections between this and other branches of mathematics (notably
the calculus of variations) and physics (notably quantum physics). May this be the beginning of
your journey.
Dario Fanucchi

Notes on the topics covered
I provide here some remarks, comments and discussions on the topics covered in the first part
of the course. These notes are not comprehensive and are sometimes a little off-topic, but are
intended to complement and clarify what was covered in the lectures. The level of detail here is
variable. In places there is just a sketch of something covered in more detail elsewhere (like the
lectures), and at times there is much more detail. I’ve also tried to supply pointers herein as to
where one might obtain further enrichment on the topics covered. In the Historical Anecdote,
which begins these notes, I have given a personal motivation for this course and Lagrangian
Mechanics in general. The pragmatist can skip this section and not risk losing any important
knowledge.
It is important to keep in mind that the physics we study in this course is, in effect, Newtonian
Physics. We will rework the familiar F⃗ = m a⃗ into very elegant forms with far reaching
consequences so that it is barely recognisable in the end; but nonetheless we will not escape from
the fundamental shortcomings of Classical Physics. The physics we study breaks down at speeds
close to the speed of light, and at scales where quantum effects are important. Nevertheless,
classical physics is a very good approximation to reality for a very large class of problems in our
world. Moreover, the elegant new language we will develop here to speak about classical physics
is precisely the most natural language in which to speak about all of the more sophisticated
modern theories today. In this sense, we will reveal in this course very deep ideas that run
beneath much of modern physics and mathematics.

Contents

Preface i

Contributors ii

Foreword iii

Contents vii

1 Introduction 1
1.1 Development of a Physical Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Frames of Reference and Galilean Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Newton’s Laws of Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Classical Mechanics as a State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Algebra and Geometry 12


2.1 Coordinate Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Euclidean Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Coordinate Systems and Their Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 The 2-Dimensional Cartesian Coordinate System . . . . . . . . . . . . . . . . . . . 23
2.3.2 The 3-Dimensional Cartesian Coordinate System . . . . . . . . . . . . . . . . . . . 27
2.3.3 Other Linear Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.4 Curvilinear Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.5 Transformation Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4 Coordinate Curves and Coordinate Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.1 Parametric Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.2 Tangent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.3 Cotangent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.4 Tangent and Cotangent Vector Component Relations . . . . . . . . . . . . . . . . 39
2.4.5 The Metric Tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.6 2-Dimensional Plane Polar Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.7 2-Dimensional Elliptical Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4.8 3-Dimensional Polar Cylindrical Coordinates . . . . . . . . . . . . . . . . . . . . 49

2.4.9 3-Dimensional Polar Spherical Coordinates . . . . . . . . . . . . . . . . . . . . . . . 52
2.4.10 Other Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5 Coordinate Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Chapter 1

Introduction

We understand our environment by constructing mental models that capture some important
aspect of the environment. A model of a system helps us to understand that system by making
predictions that we can observe and test. In this way, we develop expectations of the behaviour
of the system. A physical theory is an organisation of mental models for a given system. A good
theory should not only describe a given system well, it should also give insight into the behaviour
of another system. Below is a description of the development of a physical theory.

1.1 Development of a Physical Theory


The formal method to define a new theory begins with an informed guess of the rules that govern
the behaviour of a given system. The data that informs this guess comes from the collection of prior
knowledge acquired from observations of other systems and possibly adapting existing models
from other systems. A theory is a philosophy by which we may organise our understanding and
expectations. If a theory gives the incorrect prediction for a physically observed phenomenon
then the theory is wrong and a new guess is needed. The British statistician George Box famously
stated “All models are wrong, but some models are useful,” and we should be cognisant of this
when building our models. The construction of useful models is part of the process of building
better theories. Figure 1.1 gives a graphical representation of this process. This course shall focus
on the initial guess and the intermediate step of computing the consequences of this guess
for the behaviour of mechanical systems.

Guess → Compute Consequences → Test/Check

Figure 1.1: A diagrammatic representation of the development and evaluation of a physical theory
starting with an initial guess, followed by the determination of the logical consequences of the
guess and comparison of the expected consequences with the measurements or observations.

A theory is a conceptual abstraction of the elements needed to describe a system. It is possible
to produce multiple abstractions that describe a single phenomenon. These abstractions can
be conceptually and mathematically distinct. Two theories that imply identical consequences
would be physically indistinguishable, even though the characters of these theories could be very
different.
The value of a theory lies not only in how well it might describe one given system, but in how
it might inform the description of other systems. Two conceptually and mathematically distinct
theories of a single phenomenon motivate distinct patterns of thought and provide different
insights into that phenomenon. Each theory will admit different modifications and motivate
distinct ideas for new theories. Most importantly, different theories admit different ideas of
natural modifications and extensions. An extension of one theory to a broader collection of
phenomena might require less effort than what is required to extend another theory. As such,
the value of one theory might exceed that of another by measure of its usefulness in guessing
new, better theories. The collection of ideas shared by multiple theories can itself provide
the inspiration for a more general theory. It is therefore advantageous to know multiple theories
at any instant.
A single theory might possess aspects that we might improve upon. Given two theories, one
might be considered better than the other, depending on how one evaluates each with regard to
the following questions:

1. Which theory makes more accurate predictions?

2. Which theory is closer to the truth?

3. Which theory allows more easily computed consequences?

4. Which theory is most easily understood?

5. Which theory is most easily extended?

The relative rankings of two theories with respect to these questions might lead us to judge one
theory as better than the other, but such a ranking should not necessarily lead us to disregard one
theory in favour of another. In general, more powerful formulations should explain more while
using fewer assumptions and admitting fewer exceptions.
Given a collection of facts about some system, how much about that system do we actually
know? Answering this question requires us to understand the distinction between information,
knowledge and the philosophy of the models that we build. Information about a system is
a collection of uncurated facts about that system. Knowledge is the collection of ideas that we
generate from the data by abstracting the patterns in the collection of facts. A philosophy is a
formal arrangement of the collection of ideas about the system and how these ideas are supported
by the facts. In this sense information, knowledge and philosophy are distinct levels of abstraction
of the description of a system and its relation to other systems.

These notes contain a description of the relevant topics necessary to build a new formulation
of Classical Mechanics to compete with the standard Newtonian formulation. The new and
standard formulations will be physically indistinguishable. The purpose of these notes is to make
clear the numerous advantages of this new formulation over the standard formulation of Classical
Mechanics by providing a formulation wherein

1. physical consequences of mechanical configurations are more quickly and easily computed
than in the standard formulation;

2. the behaviours of mechanical systems are more easily understood than in the standard
formulation;

3. extensions are possible that are either more difficult to achieve or not possible in the
standard formulation.

This new formulation is not a replacement of the standard formulation, but one with a number
of conceptual and computational advantages over the standard formulation in certain problems.
The first task in developing this new formulation is to understand the formal structure of the
standard Newtonian Mechanics and build new abstractions from this formal structure.

1.2 Frames of Reference and Galilean Symmetries


The laws of classical physics are a mathematical abstraction of observed facts about classical
physical systems. These laws are invariant under a collection of transformations known
collectively as the Galilean Group,

Translation in space: x⃗′ = x⃗ + c⃗.

Translation in time: t′ = t + k.

Rotation: x⃗′ = R x⃗, where R is an orthogonal matrix with det(R) = 1.

Velocity transformation (boost): x⃗′ = x⃗ + v⃗ t.

We often refer to the invariance of physical laws under these transformations as the symmetries
of classical physics. By translational symmetry, there is no fixed ‘origin’ in the universe that can
be explicitly labelled with the zero vector (in space or space-time). This leads to the following
definition.

Definition 1 (Affine Space) An affine space is a space comprising a set of points where the
difference vector between any two points in an affine space is defined, but the sum of two points is
not.

Remark 1 We think of the classical universe as an Affine Space.

Generally, when solving problems we specify an origin by singling out a point in the affine
space. All other points are defined relative to that origin by the difference vectors, and so the
position of a point x in space is given by a sequence of N real valued numbers that label that
point. The collection of all such sequences of N numbers defines an N -dimensional vector space
RN .
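The distinction between points of an affine space and difference vectors can be made concrete in code. Below is a minimal illustrative sketch (the class names are my own, not part of the notes): subtracting two points yields a vector, and a point plus a vector is a point, but the sum of two points is deliberately undefined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    """A displacement; vectors may be added to each other."""
    x: float
    y: float

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

@dataclass(frozen=True)
class Point:
    """A location in an affine plane; no origin is privileged."""
    x: float
    y: float

    def __sub__(self, other):
        # The difference of two points is a well-defined vector.
        return Vector(self.x - other.x, self.y - other.y)

    def __add__(self, other):
        # Only Point + Vector makes sense; Point + Point is an error.
        if not isinstance(other, Vector):
            raise TypeError("cannot add two points of an affine space")
        return Point(self.x + other.x, self.y + other.y)

p, q = Point(1.0, 2.0), Point(4.0, 6.0)
d = q - p        # the displacement from p to q, a Vector
r = p + d        # translating p by d recovers q
assert r == q
```

Choosing an origin O amounts to identifying each point P with the vector P − O, which is exactly how the affine space is labelled by elements of RN above.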
The fact that the laws of physics are invariant under velocity transformations leads to the
concept of an inertial reference frame, wherein any two observers moving relative to one another
with a constant velocity will observe the same forces on some system that they are both watching.
This is the substance of the Galilean principle of inertia. An inertial reference frame is one that
is not accelerating. All such reference frames are equivalent by the fourth transformation of the
Galilean Group, and hence the laws of physics will be the same in any inertial reference frame.
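The invariance under velocity transformations can be checked numerically. The following is a hedged sketch (the numbers and helper functions are illustrative, not from the notes) showing that a Galilean boost x′ = x + vt shifts positions and velocities but leaves the measured acceleration, and hence the inferred force, unchanged.

```python
def trajectory(t, x0=0.0, u=2.0, a=-9.8):
    """1-D position under constant acceleration (illustrative values)."""
    return x0 + u * t + 0.5 * a * t**2

def boost(x, t, v):
    """Galilean velocity transformation: x' = x + v t."""
    return x + v * t

def second_derivative(f, t, h=1e-4):
    """Numerical acceleration via a central difference."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

v = 5.0  # relative velocity of the two inertial observers

# Acceleration of the particle as measured in each frame at t = 1.
a_rest    = second_derivative(lambda t: trajectory(t), 1.0)
a_boosted = second_derivative(lambda t: boost(trajectory(t), t, v), 1.0)

# Both observers measure the same acceleration (here close to -9.8),
# so via NII they infer the same force on the particle.
assert abs(a_rest - a_boosted) < 1e-4
```

The boost adds only a term linear in t, and any term linear in t has vanishing second derivative; this is why acceleration, unlike position and velocity, is frame independent under the Galilean Group.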
A non-inertial reference frame is one that is accelerating. The laws of physics are not the
same in reference frames that are accelerating relative to each other. To account for this in
an accelerating reference frame, ‘fictitious forces’ are introduced into the system as a result of
the motion of the system. An observer in a rotating reference frame observes the deflected
motion of objects moving relative to the observer. This deflected motion is commonly interpreted
as an applied force that causes a deviation in the motion of objects from straight-line motion.
Example 1.1 gives a demonstration of this phenomenon.

Example 1.1 (Coriolis Effect) Suppose observers a and b stand at diametrically opposed points
on a rotating disc, see Figure 1.2. Observer a throws a ball to observer b . In the stationary
reference frame, the ball follows a straight line from a to b . In the time it takes the ball to move
from a to b , observer a moves to a′ and observer b moves to b′ . Now consider the flight path
of the ball from the perspectives of a and b . Since a and b remain at fixed points on the rotating disc,
a = a′ and b = b′ in the co-rotating frame; however, the trajectory followed by the ball now deviates
from the original straight path, since a and b move relative to the path of the ball. From the
perspective of the observers in the rotating reference frame, the path of the ball is deflected
away from b and toward a . This deflection appears as the consequence of a ‘fictitious force’
known as the Coriolis Effect.
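The geometry of Example 1.1 is easy to reproduce numerically. The sketch below (illustrative values; the helper names are my own) takes a straight-line path in the stationary frame and expresses it in co-rotating coordinates, where it is visibly deflected.

```python
import math

omega = 1.0  # rotation rate of the disc (rad/s), illustrative

def lab_path(t):
    """Straight line across the disc, from (-1, 0) toward (1, 0),
    traversed in unit time as seen in the stationary frame."""
    return (-1.0 + 2.0 * t, 0.0)

def to_rotating(x, y, t):
    """Rotate lab coordinates by -omega*t to obtain the coordinates
    used by observers fixed on the rotating disc."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    return (c * x + s * y, -s * x + c * y)

# In the stationary frame the path has y = 0 throughout; in the
# rotating frame the y'-coordinate is nonzero along the flight,
# i.e. the path bends, which the rotating observers attribute to
# a fictitious (Coriolis) force.
ys = [to_rotating(*lab_path(t), t)[1] for t in (0.25, 0.5, 0.75)]
assert any(abs(y) > 0.1 for y in ys)
```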

The Galilean Group describes the symmetries of Classical Physics. The symmetries of
relativistic physics are described by the Lorentz Group, which accounts for the behaviour of
physical systems when relative speeds approach the speed of light.

1.3 Newton’s Laws of Motion


Newtonian Mechanics is a means of reasoning about the motion of objects. To make this clear, it
is necessary to define the following concepts.

Definition 2 (Particle) A particle is an idealised object without physical extent, having at any
given instant, a well defined position and momentum. A particle has no dimension.

[Figure: two panels, Stationary (left) and Co-rotating (right), showing observers a and b and their displaced positions a′ and b′ on a disc rotating at rate ω.]

Figure 1.2: The relative motion of a ball thrown from an observer a to an observer b as seen in a
stationary reference frame and a reference frame co-rotating at a rate ω.

Definition 3 (State) The state of a collection of particles is the intrinsic information needed to
determine the motion of each particle in the collection.

We shall refer to a collection of particles as a system of particles or a system. In Newtonian
Mechanics, the state of a system is a function of the position and momentum of each particle.

Remark 2 The parameters that label a particle in a system may change in time. As such, a particle
has a state. Additionally, a particle may have a non-zero mass. This mass parameter may vary.
Therefore, it is sometimes useful to define the mass of a particle as part of the state of the particle.
At speeds much less than the speed of light, the momentum of a particle is approximately equal to
the product of its mass and velocity. If the mass is not constant, then it is more accurate to consider
the state of the particle as its position and momentum than its position and velocity. For a many
particle system, we consider the state of the system to be the positions and velocities (or momenta)
of all the particles in the system.

The state of a particle is all the intrinsic information that we need to fully determine its motion.
In Newtonian Mechanics, the state of a particle is characterised by its position and momentum
at a given instant in time. Assuming we know all the (external) actions applied to a particle at
a given instant, as well as the (intrinsic) state of the particle, we can completely determine its
motion. It is a property of nature that position and momentum are the most general state variables
needed to characterise the state of a system. This is true even in Relativistic extensions to Classical
Mechanics, where the mass of a particle is not constant in time.

Remark 3 It is possible to imagine a world in which something else (say acceleration) was also
important for state, but it is not so in reality. In Aristotelian Mechanics, only position was thought
to be necessary to characterise the state of a particle.

The mathematical equations of motion are intricately connected to the definition of state.
In Newtonian Mechanics, the equations of motion are second order differential equations in
position. Hence they can be used to solve for the motion of a particle if the initial position and
velocity of the particle are known. In an alternate universe in which the equations of motion were
third-order, the initial acceleration would also be required, and thus acceleration would also be
part of the state of a particle. In Aristotle’s universe, the equations of motion are first order, and
hence the state is determined by the position alone.
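The connection between second-order equations of motion and the (position, velocity) state can be illustrated with a short numerical sketch (a simple spring force and explicit Euler stepping, both chosen purely for illustration): the initial pair (x0, v0) alone determines the subsequent motion.

```python
import math

def simulate(x0, v0, m=1.0, k=1.0, dt=1e-3, steps=1000):
    """Integrate m x'' = -k x (a unit spring) from the state (x0, v0)."""
    x, v = x0, v0
    for _ in range(steps):
        a = -k * x / m                   # NII: acceleration from the force law
        x, v = x + v * dt, v + a * dt    # one explicit Euler step
    return x, v

# Starting from x(0) = 1, v(0) = 0 the exact motion is x(t) = cos(t),
# so after t = 1 the numerical state should be close to (cos 1, -sin 1).
# No initial acceleration is supplied: the second-order equation
# needs only position and velocity to fix the future.
x1, v1 = simulate(1.0, 0.0)
assert abs(x1 - math.cos(1.0)) < 1e-2
assert abs(v1 + math.sin(1.0)) < 1e-2
```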
The transitions between states in Newtonian Mechanics are governed by a set of rules:

NI: A particle will continue to move in a ‘straight line’ with constant velocity unless acted upon
by a force.

NII: When a resultant force is applied to a particle, the rate of change of the momentum of the
particle is equal in magnitude to the resultant force, and acts in the direction of that
force.

NIII: For every force or action there is an equal but opposite force or reaction.

These rules are commonly known as Newton’s Laws of Motion.


NI is a statement about how a system maintains its state. It suggests that the ‘state of motion’
of the particle remains constant unless some force acts on it. We can always pick an inertial
reference frame in which such an object is stationary. The notion of state and the notion of an
inertial reference frame are both implied by NI (Galileo’s Law of Inertia).
NII is a statement about the resistance to change in state of a system when it is acted upon by
an agent of change. It suggests that an agent F⃗ that acts to change the state of motion of a particle
effects a change in the momentum p⃗ as

F⃗ = d p⃗ / d t .
A particle tends to resist such a change more if it is heavier and less if it is lighter. When the mass
of the particle is constant, we get the more familiar result,

F⃗ = m a⃗

which implies that the acceleration is proportional to the applied force, and inversely proportional
to the mass of the particle.
NIII is a statement about the consequence of changing the state of a system. Of these rules,
NIII provides a rule that governs the interactions in a system. It gives a rule for dealing with forces
exerted by particles on each other. This is often explained by thinking of pushing a car down the
road. The force we exert on the car to make it move is the same as the resisting force exerted by
the car on us, making it difficult to push.
More information on Newton’s Laws, particularly with regard to problem solving, can be found
in a Physics text like [?]. The remainder of this course shall focus mainly on NII and formulating
equations for the motion of systems that are compatible with NII.

1.4 Classical Mechanics as a State Machine
The state description of classical mechanics describes the mechanism by which a system
occupying one well defined state at one instant will, under the action of a given rule, transition
to a new well defined state. There may be many such states and transition rules that govern
the state of a system. We consider in detail the consequences of this formal construction of
Classical Mechanics as a mathematical machine that manages system states. Before continuing,
let’s introduce some definitions.

Definition 4 (State Machine) A state machine is a collection of ‘states’ and ‘rules’ for moving
between these states. A finite state machine is a state machine with a finite number of states. An
infinite state machine is a state machine with an infinite number of states. The ‘state’ of a state
machine is exactly the information required to determine what will happen next.

[Figure: six states with directed edges 1 → 2 → 3 → 4 → 1, 5 → 3, and a self-loop at state 6.]
Figure 1.3: A finite state machine with 6 states. The states are drawn as vertices (dots), and the
rules for transition between the states are drawn as directed edges (arrows) between them. If the
system begins in state 1, it will transition to state 2 at the first time step. At the second time step it
will transition to state 3, and eventually to state 4, after which it will return to state 1 and repeat
this process. If the system begins in state 5, it will transition to state 3 in the first time step, and to
state 4 in the second time step, and so on. If the system begins in state 6, it will remain in state 6
forever.
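The transition rules of Figure 1.3 can be written down directly as a lookup table; the following sketch reproduces the behaviour described in the caption.

```python
# The six-state machine of Figure 1.3 as a transition table:
# each state maps to the single state it steps to next.
RULES = {1: 2, 2: 3, 3: 4, 4: 1, 5: 3, 6: 6}

def run(state, steps):
    """Apply the transition rule repeatedly, recording each visited state."""
    history = [state]
    for _ in range(steps):
        state = RULES[state]
        history.append(state)
    return history

assert run(1, 5) == [1, 2, 3, 4, 1, 2]   # the 1 -> 2 -> 3 -> 4 cycle repeats
assert run(5, 2) == [5, 3, 4]            # state 5 feeds into the cycle
assert run(6, 3) == [6, 6, 6, 6]         # state 6 is a fixed point
```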

Definition 5 (Deterministic State Machine) A state machine is called deterministic if there is at
most one possible transition from every state. Given the current state of a deterministic system, it is
possible to completely determine what all future states of that system will be.

Definition 6 (Reversible State Machine) A state machine is called reversible if there is at most
one possible transition into every state. Given the current state of a reversible system, it is possible
to work backwards and completely determine what all past states of that system were.

Remark 4 The labels deterministic and reversible are used to describe state machines.

[Figure: two three-state diagrams labelled Non-Deterministic (left) and Non-Reversible (right).]
Figure 1.4: The non-deterministic state machine has at least one state with more than one
transition rule from that state, while a non-reversible state machine has at least one state with
more than one transition into that state.

Figure 1.4 demonstrates the key difference between non-deterministic and non-reversible state machines. Figure 1.3 depicts a state machine that is deterministic but not reversible (if the system is in state 3 we cannot tell whether it came from state 2 or state 5). It is possible for a state machine to be both deterministic and reversible, deterministic but not reversible, reversible but not deterministic, or neither deterministic nor reversible.

Definition 7 (Phase Space) The set of all states in a state machine is called the phase space of the
state machine. This terminology is most often used for infinite state machines.

A discrete infinite state machine is an infinite collection of states and a set of transition rules
for stepping between these states at each time step. A continuous infinite state machine is an
infinite collection of states and a collection of rules for determining the state at any real value of
time t > 0 from a given initial state. Classical Mechanics is a continuous infinite state machine
that is both deterministic and reversible. The state of a particle is its position and momentum;
and the transition rule is Newton’s second law of motion (NII).

The phase space of a particle is the set of its paired positions and momenta. It can be useful to consider the path a particle traverses through its phase space as a function of time. The state of a system of particles is the collection of position and momentum pairs of all the particles in the system. The entire (classical) universe can be thought of as having some state at a given point in time: the collection of the positions and momenta of every particle in the universe. In our classical picture of the universe, time is thought of as a continuous parameter, and we think of the universe as a succession of ‘states’ for different values of this parameter. The laws of physics tell us how to move up and down in this succession. Figure 1.5 shows the phase-space for a damped oscillator, including the path of the oscillator through this space.

Figure 1.5: The phase-space of a damped oscillator of fixed mass contains the collection of all state pairs $s(t) = (x(t), \dot{x}(t))$, where $x(t)$ and $\dot{x}(t)$ are the position and the velocity of the oscillator at a time $t$, respectively. Grey curves denote the collection of possible paths through phase-space that an oscillator might follow. The red curve corresponds to the ordered sequence of states traversed by a specific oscillator with state $s(0) = (x(0), \dot{x}(0))$ at time $t = 0$. This oscillator traverses a time-ordered sequence of states on the time interval $t \in (0, \infty)$. This sequence of states terminates at the critical point $p_c$ as $t \to \infty$.

Remark 5 In general, the state pair s = ( x~ , p~ ) of a system keeps track of the position and the momentum. It is often convenient to represent the phase-space of a particle of fixed non-zero mass as the collection of all pairs s̃ = ( x~ , x~˙ ), keeping track of only the positions and velocities. This is valid when the particle has fixed mass and moves at low (non-relativistic) speeds. This is depicted in Figure 1.5.

1.5 Exercises
Exercise 1.1 Draw a deterministic state machine and a non-deterministic state machine, with at
least five states.

Exercise 1.2 Draw a reversible state machine and an irreversible state machine, with at least five
states.

Exercise 1.3 Classify the state machine associated with each of the following graphical
representations:

1. [A directed graph on six states, 1 to 6.]

2. [A directed graph on six states, 1 to 6.]

3. [A phase-space plot in the $(x, \dot{x})$ plane.]

4. [A phase-space plot in the $(x, \dot{x})$ plane.]

Exercise 1.4 Consider each of the following systems as a state machine and give a graphical
representation of each, including all necessary labels. In each case, classify the state machine as
either finite or infinite, and state whether it is deterministic and/or reversible.

1. A traffic signal on a South African road. (Hint: What is the sequence of the lights?)

2. A lamp switch. (Hint: What are the possible settings for a lamp switch?)

3. An oven thermostat. (Hint: How is an oven thermostat different from a lamp switch?)

4. A cart rolling along a rail and slowing to a stop, in 1 dimension. (Hint: What does the position of the cart do relative to its speed?)

5. A paperclip sliding across a smooth table top, at constant speed, in 2 dimensions. (Hint: What does constant motion look like in phase space?)

6. A mass oscillating up and down at the end of a spring that hangs from a fixed mount point, in 1 dimension. (Hint: What does oscillatory motion look like in phase space?)

7. An asteroid in a decaying orbit about the sun, in 3 dimensions. (Hint: Asteroids orbit in a 2-dimensional plane.)

Exercise 1.5 Show that Classical Mechanics is deterministic, then determine the conditions under
which Classical Mechanics is reversible.

Exercise 1.6 Show that Newton’s Second Law of Motion is invariant under Galilean transformations, and a constant force.

Chapter 2

Algebra and Geometry

Geometry and algebra form the basis of the quantitative study of the world. Geometry defines the
spatial relation among points and curves. Algebra is the formal set of rules by which computations
involving geometry are made. In this chapter we shall discuss the geometric and algebraic basis
for the description of mechanical systems. We shall begin with the coordinate free description of
length in a geometric context, then coordinate systems as a method for describing the relative
position of a point in space. We then discuss the algebraic formulation of length measure given a
coordinate system and promote the algebra to a general setting.

2.1 Coordinate Grids


Our ultimate goal in Mechanics is to understand the mechanism by which systems go. This immediately implies that we should have some good description of the system in terms of its location in space and time. The objective of this section is to get some understanding of the conditions under which we can give a good description of a system that we want to study.
René Descartes was the first to introduce the concept of coordinate systems to geometry. He
assigned a pair of real numbers to each point in the plane - an x - and y -coordinate pair. The
plane thus parameterised is known as the Cartesian plane. Here we shall study several examples
of coordinates on the Cartesian plane and other spaces.
In the simplest case, we can study the motion of an object that moves along a straight line. As
a precursor to a more general discussion to follow, we study the motion of a small bead that moves
along a straight wire with infinite length. In this case, we might ask where the bead is located at
some point in time, and then again at another point in time. If we are to make intuitive sense
of how the bead moves, we should have a way to denote the position of the bead along the wire.
The most natural description of the position of the bead in its motion is the distance of the bead
from some predefined point. We shall call this reference point the origin or datum point. With
this choice of origin, we can define the position of the bead a using a single (signed) real valued
number. We write a ∈ R where a positive value of a denotes a position to the right-hand side of
the origin, and a negative value of a denotes a position to the left-hand side of the origin.

Since the real numbers $\mathbb{R}$ have a natural ordering, it is possible to decide if the bead has moved toward the left or toward the right of its previous position by simply comparing the current value $a_f$ of the bead position with a previously measured value $a_i$. We compute the displacement vector
$$\vec{a} = a_f - a_i$$
defined as the directed line segment whose initial point and final point are $a_i$ and $a_f$, respectively. When this directed line segment is placed with its initial point at $a_i$, the final point will correspond to $a_f$. In this way, we can locate the final position of the bead $a_f$ using only its initial data $a_i$ and the displacement vector $\vec{a}$.
The set of all displacement vectors $\vec{a}$ and $\vec{b}$ along the wire defines a vector space $V$ where
$$\alpha\vec{a} + \beta\vec{b} \in V$$
when $\alpha, \beta \in \mathbb{R}$. This means that there exists a vector $\vec{c} \in V$ such that
$$\vec{c} = \alpha\vec{a} + \beta\vec{b}.$$
Note that each of these displacement vectors and, indeed, the vector space $V$ is defined in terms of the differences between the points in $\mathbb{R}$ corresponding to points along the wire. The space $\mathbb{R}$ is an example of an affine space.

Definition 8 (Affine Space) An affine space is a space comprising a set of points where the difference
(vector) between any two points is well defined, but the sum is not defined.

The definition of an affine space uses the idea of differences between points rather than their sums. This makes intuitive sense when considering displacement vectors, but the following question now arises: why should we need a definition of this form if we already know how to add real numbers? The answer is quite simple: for any pair of numbers a and b, we can compute their sum c = a + b and claim that the sum of the two points is now well defined. However, we could equally compute the difference a = c − b or b = c − a. This demonstrates that the definition is, at least, compatible with the summation of real numbers in R. However, an affine space gives us something more general, which is still valid when the above argument is not. For example, we can consider a displacement vector connecting two points in the plane. There exists no composition rule to combine two points in the plane to get a third point; however, there does exist a composition rule to add two displacement vectors in the plane to get a third displacement vector. The interpretation of the third displacement vector is unchanged from that in the one dimensional case, where the vector defines a displacement between points in the plane. Therefore, once the collection of all displacement vectors is defined, the collection of points needed to define them is no longer needed. This is not the case for the space of points themselves. A byproduct of this definition is that there is also no preferred choice of origin. Can you explain why this might be the case?
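The affine-space distinction between points and displacement vectors can be made concrete. In this sketch (the class names are my own, for a 1-dimensional affine space) the difference of two points is a vector, a point plus a vector is a point, and the sum of two points is rejected:

```python
class Vector:
    """A displacement; vectors may be added and rescaled freely."""
    def __init__(self, dx):
        self.dx = dx
    def __add__(self, other):
        return Vector(self.dx + other.dx)
    def __rmul__(self, scalar):
        return Vector(scalar * self.dx)

class Point:
    """A location in a 1-dimensional affine space."""
    def __init__(self, a):
        self.a = a
    def __sub__(self, other):            # Point - Point -> Vector
        return Vector(self.a - other.a)
    def __add__(self, displacement):     # Point + Vector -> Point
        if not isinstance(displacement, Vector):
            raise TypeError("the sum of two points is not defined")
        return Point(self.a + displacement.dx)

a_i, a_f = Point(2.0), Point(5.0)
d = a_f - a_i            # the displacement vector
print((a_i + d).a)       # 5.0: the initial point plus the displacement
```

Attempting `a_i + a_f` raises a TypeError, mirroring the definition: points subtract, but do not add.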

Remark 6 Physical space, the space where we live, is an affine space.

At this point we should note that we have not specified how the position a has been assigned.
To do this, we should introduce a ruler, with a regular increment, to demarcate the position of the
bead. A ruler with a regular increment has the benefit that increasing the geometric distance of
the bead from the origin by some fixed factor is associated with a re-scaling of the numerical value of a by the same factor. Additionally, we might choose to transfer the distance information on
the ruler, that we will call the metric, onto the wire where the bead moves. Transferring the metric
information from the ruler to the wire corresponds to placing a unique numerical value to each
point on the wire, while maintaining the ordering of the numbers on the ruler, and hence allows us
to read off the position information of the bead using only the numbers associated with the points on
the wire, without the need for the ruler. Additionally, the distance between any two bead positions
is simply the magnitude of the displacement vector connecting the bead positions. We call this
process of assigning a unique numerical value to each point on the wire coordinatisation. At this
point we have associated the space of positions along the wire with the space of real numbers, R.
It should be clear that choosing coordinates is not a unique process. As an example, we could
swap out the original ruler that has some predefined unit, perhaps millimeters, with another ruler
with a different unit, say inches. Since we know that one inch corresponds to 25.4 millimeters, we
can use a conversion factor 25.4 to translate between the different distance measurement scales
on the millimeter ruler and the inch ruler. This conversion factor is the factor associated with
changing the metric from millimeters to inches. If the bead is at a position $a$ measured using the inch ruler, then it will have a position
$$a' = 25.4\,a$$
as measured on the millimeter ruler. For each different ruler, there will be a new conversion
factor. We can think of the exchange of rulers as a transformation f on the metric information
that we place on the wire. Later we shall see that there are choices of coordinatisation where the
conversion factor changes depending on where the coordinates are studied. This rescaling gives
a coordinate transformation
$$f : \mathbb{R} \to \mathbb{R} \quad \text{where} \quad a \mapsto f(a)$$
subject to the following restriction
$$a < b \iff f(a) < f(b) \quad \text{for all } a, b \in \mathbb{R}.$$

This means that if a point a is to the left of a point b on the wire before the change of coordinate
is applied, then the transformed coordinate f (a ) is also to the left of the point f (b ) under the new
coordinate system. When f is also linear, that is, for any scalar c

f (c a ) = c f (a )

then f is an example of an affine transformation.
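As a quick numerical check of the ruler exchange above (the function name is my own), the conversion from the inch ruler to the millimetre ruler is linear and preserves the ordering of bead positions:

```python
def to_mm(a_inches):
    """Change of coordinates from the inch ruler to the millimetre ruler."""
    return 25.4 * a_inches

positions = [-2.0, 0.5, 1.0, 3.0]        # bead positions read off the inch ruler
converted = [to_mm(a) for a in positions]

assert converted == sorted(converted)    # a < b  <=>  f(a) < f(b)
assert to_mm(2 * 1.0) == 2 * to_mm(1.0)  # f(c a) = c f(a): the map is linear
print(converted)
```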

Definition 9 (Affine Transformation) Suppose $X$ is an affine space with $a, b \in X$ and let $f$ be a function
$$f : X \to X$$
$$b - a \mapsto f(b) - f(a)$$
then $f$ is an affine transformation.

Remark 7 An affine transformation describes any function that preserves lines, parallelism and
relative scales but not distance information or angles.

The exchange of the millimeter ruler for an inch ruler in the description of the position of the bead on the wire is an example of an affine transformation on the coordinate grid on the wire that allows us to describe the position of the bead. We shall encounter many such transformations, and it is important to understand the connection between different coordinate descriptions of physical systems and the effects of changing coordinate systems when describing certain physical quantities. Before continuing with coordinate transformations, let’s consider some interesting examples of coordinate systems on some well known spaces.
Example 2.1 continues the discussion of placing coordinates onto a 1-dimensional space.
Example 2.1 continues the discussion of placing coordinates onto a 1-dimensional space.

Example 2.1 (Coordinate grid on a Circle) The circle $S^1$ is a 1-dimensional space. This means that every point on the circle is described by a single number. Since there is no preferred point of origin on $S^1$, we can choose an origin denoted 0 and proceed to assign coordinates to the rest of the space, see Figure 2.1. There are many ways to coordinatize $S^1$, but we shall leave these technical details for later in this text.
The simplest way to assign coordinates on $S^1$ is to associate points on the circumference with the distance along the circumference from the chosen origin using a mapping function $f$. Suppose that $S^1$ has a circumference of $2\pi$. We can map a finite interval corresponding to the semi-open subset $I = [0, 2\pi) \subset \mathbb{R}$ to $S^1$, see Figure 2.1. By this mapping
$$f : \mathbb{R} \to S^1, \qquad f(x) = x \bmod 2\pi \quad \text{and} \quad x \overset{f}{\sim} x + 2\pi$$
and the length of the interval $I$ matches the circumference of the circle. The equivalence relation $x \overset{f}{\sim} x + 2\pi$ means that the points $x$ and $x + 2\pi$ are mapped to the same point on $S^1$. This means that $f$ is a many-to-one mapping and that $f$ describes a valid one-to-one mapping only on the restricted subset $[0, 2\pi) \subset \mathbb{R}$.
After the interval $[0, 2\pi)$ is mapped to the circle, $f$ maps $[2\pi, 4\pi)$ onto $S^1$, and so on for each interval $[2\pi n, 2\pi(n + 1))$. The negative real numbers are similarly mapped to the circle under the generalization to $n \in \mathbb{Z}$. Clearly, $f$ is a many-to-one mapping, and $f$ maps the entire space $\mathbb{R}$ onto $S^1$ by winding the real line multiple times over the circle.
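The winding map of Example 2.1 is easy to experiment with; a minimal sketch using the modulo operation:

```python
import math

def f(x):
    """Map a real number onto the circle S^1 of circumference 2*pi."""
    return x % (2 * math.pi)

# x and x + 2*pi*n land on the same point of S^1: f is many-to-one.
x = 1.25
images = {round(f(x + 2 * math.pi * n), 12) for n in range(-3, 4)}
print(images)  # a single image point for all seven preimages
```

The rounding only absorbs floating-point noise; all seven preimages collapse to the one point 1.25 on the circle.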

Example 2.2 extends the ideas from coordinatizing the 1-dimensional line to the 2-dimensional
plane. This is done by constructing a regular coordinate grid containing multiple copies of R.

Figure 2.1: A mapping of the semi-open interval [0, 2π) to S 1 . Starting at x = 0 the positive real
line is wrapped around the circumference of the circle, such that the point x = 2π coincides with
the image of x = 0.

Example 2.2 (Coordinate grid on the Cartesian Plane) The 2-dimensional Cartesian plane can be covered by a 2-dimensional grid that assigns a pair of numbers to each point on the plane. The assignment of points follows from the placement of a single copy of the real line $\mathbb{R}$ on the plane. Then, at each point $x$ along this line, a second copy of $\mathbb{R}$ is placed such that each point $y$ on the subsequent copies is aligned to form a grid. This grid is formed by the Cartesian product of the two copies of $\mathbb{R}$ to form a new space $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$. Each point on the plane corresponds to a unique point $(x, y) \in \mathbb{R}^2$, an ordered sequence where the first entry, $x$, is taken from the first copy of $\mathbb{R}$ and the second entry, $y$, is taken from the second copy of $\mathbb{R}$ in the product $\mathbb{R} \times \mathbb{R}$. Figure 2.2 shows the grid formed by the multiple copies of $\mathbb{R}$ that covers the 2-dimensional plane.

Figure 2.2: Coordinate grid on the Cartesian plane built from copies of the real line $\mathbb{R}$ to form the 2-dimensional product space $\mathbb{R}^2$ that covers the entirety of the 2-dimensional plane. Each horizontal line (blue) corresponds to one copy of $\mathbb{R}$ and each vertical line (red) corresponds to another copy of $\mathbb{R}$, and the arrows at the end of each line indicate that the lines representing each copy extend to an infinite distance in each direction.

An arbitrary point in the plane corresponds to a point on the grid with an associated coordinate pair $(x, y)$. Since each copy of the real line in the pair assigns a unique point in $\mathbb{R}^2$, each pair $(x, y)$ uniquely defines a point on the Cartesian plane. This one-to-one mapping between points in the plane and points in $\mathbb{R}^2$ ensures that this coordinatization is valid everywhere on the plane.

There is no unique process to assign coordinates to a space. An example of this fact is


demonstrated in Example 2.3, where a non-uniform coordinate grid is assigned to the two
dimensional plane.

Example 2.3 (Riemann Normal Coordinates on the Cartesian Plane) An alternative coordinate
system on the 2-dimensional Cartesian plane can be constructed by choosing an origin on the plane
and then laying out a copy of the positive real line. This single copy R defines a 1-dimensional
subspace of the plane. Then, using a protractor, we can lay out additional copies of $\mathbb{R}$, each having an infinitesimal angular separation from the last, forming a radial grid, see Figure 2.3.
This coordinate system is called the Riemann Normal Coordinates (RNC) and corresponds to the
familiar plane polar coordinates on R2 .

Figure 2.3: Riemann normal coordinates correspond to the standard plane polar coordinates in
two dimensions. Radial lines (blue) emanating from the origin denote distance from the origin,
while the angular displacement between adjacent radial lines correspond to a given angle of
rotation about the origin. Circular (red) lines correspond to the loci of points at fixed radial
distance from the origin.

Notice that the RNC fail to be one-to-one at the origin, since this point has a unique radial
coordinate r = 0, but non-unique angular position.
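The failure of one-to-one behaviour at the origin can be seen numerically. This sketch assumes the usual polar-to-Cartesian relations $x = r\cos\theta$ and $y = r\sin\theta$:

```python
import math

def from_polar(r, theta):
    """Cartesian point named by the polar (Riemann normal) coordinates (r, theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

# Away from the origin, distinct coordinate pairs name distinct points ...
p1 = from_polar(1.0, 0.0)
p2 = from_polar(1.0, math.pi / 2)
print(p1 != p2)  # True

# ... but at r = 0 every angular coordinate names the same point:
origin_images = {from_polar(0.0, theta) for theta in (0.0, 1.0, 2.0)}
print(len(origin_images))  # 1
```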

The Riemann Normal Coordinates are a good starting point to assign coordinates to a patch of a given space. In general, we do not expect any single coordinate system to cover the whole of the space we want to study, but in the case of the 2-dimensional plane and the sphere $S^2$, we can use these coordinates to cover the whole space, see Example 2.4. In this case, we see that RNC assign coordinates to the whole of $S^2$, but this coordinate system is not one-to-one at two points.

Example 2.4 (Coordinate grid on the Sphere) A non-trivial example of the Riemann normal coordinates from Example 2.3 follows by considering the case of the sphere $S^2$. Choose as the origin the north pole of the sphere $S^2$, and lay out an instance of the Riemann Normal coordinates. This corresponds, under an appropriate rescaling, to the standard spherical polar coordinates on the sphere, with fixed radial position.

Figure 2.4: Polar coordinate grid on a sphere S 2 . Radial lines (blue) emanating from the origin
denote distance from the origin, while the angular displacement between adjacent radial lines
correspond to a given angle of rotation about the origin. Circular (red) lines correspond to the
loci of points at fixed radial distance from the origin.

Notice that in the case of $S^2$, the antipodal point of the origin, corresponding to the south pole, is at a fixed “radial” distance from the origin. Beyond this antipodal point, the radial lines wrap back along $S^2$. This means that the Riemann normal coordinates provide a one-to-one mapping between a finite subset of $\mathbb{R}^2$ and $S^2$. There are two points on $S^2$ where the coordinate mapping breaks down: the north pole (with many possible angular displacement assignments) and the south pole (with many possible radial distance assignments). More generally, RNC can be extended to more than two dimensions.

The torus is another example of a 2-dimensional space with interesting geometric and
topological properties that is easily coordinatized using a rectangular coordinate grid. This is
demonstrated in Example 2.5.

Example 2.5 (Coordinate grid on the Torus) The 2-dimensional surface of the torus $T^2$ corresponds to a product of copies of the circle, attached in a specific way so that the joined copies form a surface with a handle. If we zoom into any single point on $T^2$, we find a local description that suggests that the space is formed by a product of two copies of the circle, that is $T^2 \simeq S^1 \times S^1$. However, this is not a good description of the whole torus since there is a special arrangement of circles needed to form this surface. Figure 2.5 gives one choice of coordinates on the surface of the torus.

Figure 2.5: Coordinate grid on a torus. The red circle with circumference cred and blue circle
with circumference cblue define the size of the torus. “Cutting” the torus along these two lines
transforms the 2-dimensional surface of the torus in three dimensions into a flat, 2-dimensional
rectangle corresponding to a subset of the 2-dimensional Cartesian plane.

We can add parameters to the surface of the torus by “cutting” the surface along the red line and straightening the resulting shape to form a cylinder, and then cutting along the blue line and flattening the torus into a rectangle. We can then assign edge lengths $c_{\text{red}}$ and $c_{\text{blue}}$, corresponding to the circumferences of the red and blue circles drawn on $T^2$. We can identify the red edges of this rectangle with a pair of adjacent vertical lines in Figure 2.2 and, similarly, identify the blue edges of this rectangle with a pair of adjacent horizontal lines in Figure 2.2. The surface $T^2$ now corresponds to a rectangular piece of the Cartesian plane when the red edges are identified (this means that the red line on the left-hand side of the rectangle is the red line on the right-hand side of the rectangle) and, similarly, the blue edges are identified. Clearly, there is more than one choice of rectangle in Figure 2.2 that can be matched, so there is a many-to-one matching between $\mathbb{R}^2$ and $T^2$, and each $c_{\text{red}} \times c_{\text{blue}}$ rectangular block on the Cartesian plane is a good coordinate patch for $T^2$.
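The identification of rectangles in Example 2.5 amounts to reducing each coordinate modulo the corresponding circumference. A minimal sketch with illustrative circumferences $c_{\text{red}} = 3$ and $c_{\text{blue}} = 2$:

```python
def torus_coords(x, y, c_red=3.0, c_blue=2.0):
    """Representative of the plane point (x, y) in the fundamental
    c_red-by-c_blue rectangle that coordinatises the torus T^2."""
    return (x % c_red, y % c_blue)

# Points differing by whole multiples of the circumferences are identified:
# the covering of the torus by the plane is many-to-one.
print(torus_coords(0.5, 0.25))   # (0.5, 0.25)
print(torus_coords(3.5, 4.25))   # (0.5, 0.25): the same point of the torus
```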

The length along a path that coincides with one of the coordinate curves on the space is easily determined by simply computing the difference between the final and initial position along that path. This is identical to the process by which we measure the length of an object using a ruler, since the construction of the coordinate grid transfers the information from the ruler onto the space we wish to study. However, if the coordinate curve does not have a constant sized increment, or if the path deviates from the coordinate curve, then the path length is not determined by this simple procedure and a new method of computing lengths is needed. We shall discuss this method in the next section.

2.2 Euclidean Geometry


The solution to the problem of measuring the distance between marked points on a triangle
drawn on a flat plane was known in the ancient world to the Greeks, Egyptians, Mesopotamians
and Babylonians. The problem is described as follows. Given a right-angled triangle with given
length and height, what is the extent of the diagonal edge? There exist many constructions for
this length. An intuitive construction of this length relies on only the relation between similar
triangles in R2 .
Consider the triangle in Figure 2.6. Label the vertices of a triangle; an edge joining vertices in a triangle is a directed line element that is labelled by the vertices it intersects. In general, we may refer to a given edge $\vec{AB}$ with length $AB$. We can prove by construction the similarity of these triangles as follows. Consider triangles $\triangle ABC$ and $\triangle ABD$; then

$$C\hat{A}B = D\hat{A}B \quad \text{[common angle]}$$
$$A\hat{B}C = A\hat{D}B \quad [90^\circ]$$
$$A\hat{C}B = A\hat{B}D \quad \text{[sum of internal angles in a triangle]}.$$

By equality of internal angles, we conclude that triangles $\triangle ABC$ and $\triangle ADB$ are similar and write $\triangle ABC \sim \triangle ADB$. Similar arguments supply $\triangle ABC \sim \triangle DCB$ and $\triangle ABD \sim \triangle DCB$. The size of an angle is a measure of the displacement of a point in space relative to some given reference point, as seen from a third point. This displacement defines the ratios of the edge lengths between these points. The similarity of triangles in the collection $\{\triangle ABC, \triangle ABD, \triangle DCB\}$ implies a corresponding set of relations among ratios of edge lengths from one triangle with each of the other triangles in the collection. So,
$$\frac{AB}{AC} = \frac{AD}{AB} \quad \text{and} \quad \frac{DC}{BC} = \frac{BC}{AC}$$
and so
$$AB^2 = AD \cdot AC \quad \text{and} \quad BC^2 = DC \cdot AC.$$

and the sum of these becomes
$$AB^2 + BC^2 = AD \cdot AC + DC \cdot AC = (AD + DC) \cdot AC,$$
where $AD + DC = AC$, from which we determine
$$AB^2 + BC^2 = AC^2. \tag{2.1}$$
Equation (2.1) is the celebrated Pythagorean Theorem in Euclidean geometry.

Figure 2.6: The triangle $\triangle ABC$ with right angle $A\hat{B}C$ has internal angles equal to those in triangles $\triangle ADB$ and $\triangle DCB$. Equality of the internal angles among these triangles ensures that the ratios of the appropriate edge lengths among the triangles are maintained.

Remark 8 (Pythagorean Triple) An integral solution to (2.1) is called a Pythagorean Triple. Pythagorean triples are generated by the relation
$$\left(n^2 - m^2\right)^2 + \left(2nm\right)^2 = \left(n^2 + m^2\right)^2 \tag{2.2}$$
for $n, m \in \mathbb{Z}$. Note that this formulation follows directly from the factorisation of quartic polynomials.
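Relation (2.2) can be checked directly. The short sketch below generates triples from a few integer pairs (n, m) and verifies that each solves (2.1):

```python
def triple(n, m):
    """Pythagorean triple generated from integers n > m > 0 via relation (2.2)."""
    return (n * n - m * m, 2 * n * m, n * n + m * m)

for n, m in [(2, 1), (3, 2), (4, 1)]:
    a, b, c = triple(n, m)
    assert a * a + b * b == c * c    # each triple solves (2.1)
    print((a, b, c))                 # prints (3, 4, 5), (5, 12, 13), (15, 8, 17) in turn
```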

We shall use (2.1) as the basis of measurement in Euclidean space. We shall use this idea of length to define an operation that maps vectors to scalars that is compatible with our usual ideas of length. To construct an expression for the lengths of vectors in the plane, we may use the Law of Cosines, which follows from the geometry of the plane. Given the labelled triangle construction in Figure 2.7, we find the following relations among the edges,
$$(\vec{b} - \vec{a}) \cdot (\vec{b} - \vec{a}) = \|\vec{b} - \vec{a}\|^2$$
$$\vec{a} \cdot \vec{a} + \vec{b} \cdot \vec{b} - 2\,\vec{a} \cdot \vec{b} = \|\vec{a}\|^2 + \|\vec{b}\|^2 - 2\,\|\vec{a}\|\,\|\vec{b}\|\cos(\theta)$$
$$\vec{a} \cdot \vec{b} = \|\vec{a}\|\,\|\vec{b}\|\cos(\theta). \tag{2.3}$$

By the Law of Cosines, we find a natural relationship between the magnitudes of two vectors and the angle between them, and the component-wise multiplication and sum of two vectors. We call this the dot or inner product. Clearly, the inner product of a vector with itself returns the Pythagorean measure of its length. We have used only algebra and the Euclidean law of cosines to reach this outcome. We can extend this definition to more than two dimensions by adding more components to each vector, without changing any part of the formal relation or reference to coordinate directions.

The definition of relative distance defined by (2.1) and the Law of Cosines shall form the basis
of all measurements of length to follow in these notes. In particular the dot product shall define
the mathematical machinery that we shall use to measure lengths of objects.
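The inner product and the angle formula (2.3) translate directly into code. A minimal sketch (function names my own) computing the component-wise inner product, the Pythagorean length, and the angle between two vectors; it works unchanged in any number of dimensions:

```python
import math

def dot(a, b):
    """Component-wise inner product of two vectors of equal dimension."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    """Pythagorean length: the square root of the inner product of a with itself."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle between two vectors, from the Law of Cosines (2.3)."""
    return math.acos(dot(a, b) / (norm(a) * norm(b)))

a, b = (3.0, 0.0), (0.0, 4.0)
print(norm(a), norm(b))   # 3.0 4.0
print(angle(a, b))        # pi/2, since the vectors are orthogonal
```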

Figure 2.7: The Law of Cosines implies that the lengths of adjacent edges and the angle between them encode the length of the third edge of any triangle in the plane.

Remark 9 Notice that the construction of (2.1) did not rely on the orientation of the triangle in Figure 2.6 in the plane. Similarly, the Law of Cosines formulation in (2.3) did not rely on the orientation of the triangle in Figure 2.7, but rather only the relative orientation of the vectors $\vec{a}$ and $\vec{b}$. Indeed, the derivation of the relations between these quantities did not need any mention of special points or point descriptions. The only assumption needed to derive these quantities is that we work in the flat, 2-dimensional Euclidean plane. The independence of these results from any choice of description highlights a fundamental insight into the nature of these quantities: their existence is independent of how we choose to describe them. Moreover, since these statements are true independently of our choice of coordinate description, these quantities must hold true for any choice of coordinates, and changing from one coordinate description to another will not change these quantities. We shall consider this idea again later when we consider coordinate transformations, which will naturally lead to the concept of tensors.

In Euclidean geometry, we have the familiar descriptions of length, area and volume. Moreover, when the quantities we want to measure are conveniently aligned with the coordinate axes, we recover the familiar descriptions of these quantities. For example, when the edge of an object we wish to study is aligned with one of the coordinate axes, we can use the unit of measurement provided by the coordinate grid to “measure” the corresponding length. We can find correspondingly simple descriptions for area and volume.

2.3 Coordinate Systems and Their Properties

The state of a particle depends on its position and momentum. For us to reason about these
properties, we need a coordinate system that maps spatial positions onto a mathematical
framework. Recall that the classical universe is an Affine Space. In whichever coordinate system
we work, we must first select a reference point in our system and call it the origin.
René Descartes was the first to introduce the concept of coordinate systems to geometry. He
assigned a pair of real numbers (x , y ) to each point in the plane. The plane thus parameterised is
known as the Cartesian plane.

2.3.1 The 2-Dimensional Cartesian Coordinate System
We begin by choosing two orthogonal directions in the space R2 and label them with the
corresponding unit vectors x̂ and ŷ directions. The unit vectors in these two directions, being
orthogonal, are linearly independent. Thus, since R2 is a 2-dimensional space, these unit vectors
form an orthonormal basis for the plane. Every point in the plane can then be expressed as some
linear combination of these unit vectors.
Figure 2.8 shows how the coordinate position of a marked point in $\mathbb{R}^2$ is described by a position vector $\vec{p}$ with a given extent in each of the coordinate unit vector directions. The general position $\vec{p}$ can be written as
$$\vec{p} = a\hat{x} + b\hat{y},$$

where $a = \vec{p} \cdot \hat{x}$ and $b = \vec{p} \cdot \hat{y}$. The pair $(a, b)$ are called the coordinates of $\vec{p}$. In a purely symbolic sense, we may think of the above as
$$\vec{p} = \begin{pmatrix} \hat{x} & \hat{y} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$$
where standard row and column matrix multiplication is used to evaluate this product. The unit vectors $\hat{x}$ and $\hat{y}$ above are usually omitted on account of an implicit understanding of which basis is being used. Indeed, since every vector in the plane can now be uniquely identified with its coordinates, we can simply identify $\vec{p}$ with the tuple of coordinates $(a, b)$,
$$\vec{p} := \begin{pmatrix} a \\ b \end{pmatrix}$$

where the position in the column of numbers is sufficient to define the different coordinate
direction components.
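The coordinates $a = \vec{p} \cdot \hat{x}$ and $b = \vec{p} \cdot \hat{y}$ can be recovered numerically. A small sketch using the standard basis of the plane and an illustrative position vector:

```python
# An orthonormal basis for the plane and an illustrative position vector p.
xhat, yhat = (1.0, 0.0), (0.0, 1.0)
p = (2.0, -3.0)

def dot(u, v):
    """Component-wise inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))

a = dot(p, xhat)   # a = p . xhat
b = dot(p, yhat)   # b = p . yhat

# p is recovered as the linear combination a*xhat + b*yhat:
recovered = tuple(a * xi + b * yi for xi, yi in zip(xhat, yhat))
print((a, b), recovered)  # (2.0, -3.0) (2.0, -3.0)
```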

When constructing a 2-dimensional Cartesian coordinate system to describe a given problem,


we must choose three parameters, namely, the origin, the x̂ direction and the ŷ direction. Notice
that the requirement that the x̂ and ŷ directions are orthogonal implies that any choice of one
leaves one of two choices for the other (see Figure 2.9).

Assume we have chosen a single point in the Affine Space of the Classical Universe to represent
the origin of our coordinate system. There are infinitely many 2-dimensional Cartesian Coordinate
Systems rooted at this single point. Consider first coordinate systems that are related by a rotation
operation. Consider three 2-dimensional Cartesian coordinate systems that are rotated with
respect to each other, as in Figure 2.10. Without loss of generality, assume that one of these is the
standard (x , y ) Cartesian coordinate system, which is represented with a horizontal x -axis and a
vertical y -axis. Consider a point p with coordinates (x , y ) relative to a set of coordinate axes x̂ and ŷ ,


Figure 2.8: The coordinate direction axes in a 2-dimensional coordinate system that define the
coordinate position of a marked point with position vector p~ = (a , b )⊤.

Figure 2.9: The choice of orthogonal coordinate direction axes in a 2-dimensional coordinate
system.

(x ′, y ′) relative to a set of coordinate axes x̂ ′ and ŷ ′, and (x ′′, y ′′) relative to coordinate axes x̂ ′′, ŷ ′′,
where x̂ ′ and ŷ ′ are obtained by rotating x̂ and ŷ through an angle θ , and x̂ ′′ and ŷ ′′ are obtained
by rotating x̂ ′ and ŷ ′ through an angle φ. Let p~ be the directed line segment starting at the origin
of the coordinate system and ending at p . Then p~ = p p̂ , where p = ‖p~ ‖ is the length of the vector
p~ . This is presented in Figure 2.10.

Next, consider the relative angular displacements of each set of coordinate axes. The
following identities are useful in the discussion to follow,
$$\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)$$
$$\sin(\alpha + \beta) = \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta).$$

Figure 2.10: Multiple rotations of a coordinate system with a designated point P marked by the
displacement vector p~ . The (x̂ ′, ŷ ′)-coordinate system is rotated by θ relative to the (x̂ , ŷ )-
coordinate system, and the (x̂ ′′, ŷ ′′)-coordinate system is rotated by φ relative to the (x̂ ′, ŷ ′)-
coordinate system. The vector p~ now has a different component representation in each coordinate
system.

From Figure 2.10 we find


x = p cos (α) and y = p sin (α) ,
and
x ′ = p cos (α − θ ) and y ′ = p sin (α − θ ) .
We can decompose the compound angle expressions using the angular composition identities

$$\begin{aligned}
x' &= p\,(\cos(\alpha)\cos(-\theta) - \sin(\alpha)\sin(-\theta)) \\
   &= p\cos(\alpha)\cos(\theta) + p\sin(\alpha)\sin(\theta) \\
   &= x\cos(\theta) + y\sin(\theta)
\end{aligned}$$

and

$$\begin{aligned}
y' &= p\,(\sin(\alpha)\cos(-\theta) + \cos(\alpha)\sin(-\theta)) \\
   &= p\sin(\alpha)\cos(\theta) - p\cos(\alpha)\sin(\theta) \\
   &= -x\sin(\theta) + y\cos(\theta).
\end{aligned}$$

We can express these relations in matrix form as


$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$

The matrix,
$$R(\theta) = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}$$

specifies the rotation. Note that R (θ ) carries the coordinate pair (x , y ) through an angle θ to (x ′, y ′)
in the new coordinate system. The matrix R is called a rotation matrix. Note also, R (−θ ) carries the
coordinate pair (x ′, y ′) back through an angle θ to (x , y ) in the original coordinate system, such that a
rotation through an angle θ followed by a rotation through an angle −θ leaves the point (x , y )
unchanged. Additionally,
$$R(-\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}^{\!\top}.$$

Therefore,
$$R(-\theta)R(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}\begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
or,
$$R^\top(\theta)R(\theta) = 1,$$
which implies that
$$R^\top(\theta) = R^{-1}(\theta).$$
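The orthogonality relation above is easy to verify numerically. The following is a minimal Python sketch, using only the standard library; the helper names `R`, `transpose` and `matmul` are chosen here purely for illustration. It builds the rotation matrix as defined above and checks both R⊤(θ)R(θ) = 1 and R(−θ) = R⊤(θ):

```python
import math

def R(theta):
    # the rotation matrix defined above: carries (x, y) to the rotated axes
    return [[math.cos(theta),  math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

theta = 0.7
I = matmul(transpose(R(theta)), R(theta))   # should be the 2x2 identity
assert all(abs(I[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))

# R(-theta) coincides with the transpose of R(theta)
assert all(abs(R(-theta)[i][j] - transpose(R(theta))[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The same check works for any value of θ, since the assertions only use the trigonometric identity cos² θ + sin² θ = 1.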

Successive rotations can be represented by successive multiplications of the rotation matrix.


If P has coordinates (x ′′, y ′′) with respect to x̂ ′′ and ŷ ′′, where these axes make an angle φ with x̂ ′
and ŷ ′, then,
$$x'' = x'\cos(\phi) + y'\sin(\phi) \qquad \text{and} \qquad y'' = -x'\sin(\phi) + y'\cos(\phi),$$
which is now a rotation of (x ′, y ′) through an angle φ. In matrix form


$$\begin{pmatrix} x'' \\ y'' \end{pmatrix} = \begin{pmatrix} \cos(\phi) & \sin(\phi) \\ -\sin(\phi) & \cos(\phi) \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos(\phi) & \sin(\phi) \\ -\sin(\phi) & \cos(\phi) \end{pmatrix}\begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$
which is equal to a rotation through an angle φ + θ. Then,
$$\begin{pmatrix} x'' \\ y'' \end{pmatrix} = \begin{pmatrix} \cos(\phi + \theta) & \sin(\phi + \theta) \\ -\sin(\phi + \theta) & \cos(\phi + \theta) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$
and φ + θ is the angle between the (x̂ , ŷ ) axes and the (x̂ ′′, ŷ ′′) axes.

Remark 10 Rotating a vector about the origin of a fixed coordinate system through an angle θ is
equivalent to fixing that point and rotating the coordinate axes about the origin through an angle
−θ .
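Remark 10 can be checked directly in components: rotating the vector itself (an "active" rotation) gives the same numbers as rotating the coordinate axes the opposite way (a "passive" rotation). A minimal Python sketch, with the illustrative names `active` and `passive` for the two matrices:

```python
import math

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def passive(theta):
    # coordinates of a fixed point relative to axes rotated by theta
    return [[math.cos(theta),  math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

def active(theta):
    # rotates the point itself by theta about the origin of fixed axes
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

v = [2.0, 1.0]
theta = 0.4
rotated_point = matvec(active(theta), v)    # rotate the vector through theta
rotated_axes = matvec(passive(-theta), v)   # rotate the axes through -theta
assert all(abs(rotated_point[i] - rotated_axes[i]) < 1e-12 for i in range(2))
```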

Not all 2-dimensional Cartesian coordinate systems rooted at the same origin can be related
by a rotation. For instance, the coordinate system resulting from swapping the x and y axes is not
a rotation of the original coordinate system. The transformation matrix in this case is given by
$$\begin{pmatrix} x_p \\ y_p \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x_p' \\ y_p' \end{pmatrix}.$$

It is clear that any 2-dimensional Cartesian coordinate system rooted at the same origin as the
original (x , y )-system is either a rotation of the (x , y )-system, a swap of axes, or both a swap of
axes and some rotation of the result. This latter transformation is simply the composition of the
other two,
$$\begin{pmatrix} x_p \\ y_p \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} \cos(\phi) & -\sin(\phi) \\ \sin(\phi) & \cos(\phi) \end{pmatrix}\begin{pmatrix} x_p' \\ y_p' \end{pmatrix}.$$
Schematically, such transformations can be written as
$$\vec{r}_p = M\,\vec{r}_p^{\,\prime},$$

where M is either the rotation matrix presented earlier or the composition matrix above. Notice
that M > M = 1, so M is an orthogonal matrix. We write M ∈ O (2) to mean M is an orthogonal
matrix of size 2. O (2) is the orthogonal group of order 2. Observe that if M is a pure rotation
then det (M ) = 1, and if M involves a switching of axes, then det (M ) = −1. The class of all pure
rotations of two dimensional Cartesian coordinate systems is called S O (2), the special orthogonal
group of order 2.
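The determinant criterion distinguishing S O(2) from the rest of O(2) can be illustrated numerically. The sketch below (pure Python, names chosen for illustration) checks that a pure rotation has determinant +1 while the axis-swap matrix has determinant −1:

```python
import math

def det2(A):
    # determinant of a 2x2 matrix
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

theta = 1.2
rot = [[math.cos(theta),  math.sin(theta)],
       [-math.sin(theta), math.cos(theta)]]   # a pure rotation
swap = [[0, 1],
        [1, 0]]                               # swapping the x and y axes

assert abs(det2(rot) - 1.0) < 1e-12   # rotation: in SO(2)
assert det2(swap) == -1               # axis swap: in O(2) but not SO(2)

# composing the swap with a rotation still has determinant -1
comp = [[swap[i][0] * rot[0][j] + swap[i][1] * rot[1][j] for j in range(2)]
        for i in range(2)]
assert abs(det2(comp) + 1.0) < 1e-12
```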

2.3.2 The 3-Dimensional Cartesian Coordinate System


A 3-dimensional Cartesian coordinate system is fully specified by four parameters. It is necessary to
assign to some point in the system the special property of being the origin of the system. Thinking
of all other points in the universe by their difference vector from the origin we now have a vector
space R3 . The other three parameters are the orthogonal unit vectors x̂ , ŷ and ẑ . As in two
dimensions the coordinates of any point can be written in terms of these unit vectors,
$$\vec{p} = p_x\hat{x} + p_y\hat{y} + p_z\hat{z} = \begin{pmatrix} \hat{x} & \hat{y} & \hat{z} \end{pmatrix}\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix} := \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}.$$

As before, once we have chosen two of these (say x̂ and ŷ ) we have two possible choices for the
other (say ẑ ). In this instance there is a physical significance attached to this choice - it determines
the handedness of the resulting coordinate system.

Remark 11 (Handedness of a Coordinate System) In a Right-Handed coordinate system, if you


place your right hand at the origin, and point your fingers down the x axis, and then curl your hand
towards the y -axis, your thumb will point up the z axis. In a Left-Handed coordinate system we
apply the same rule replacing the right hand with the left hand.

The right-hand rule is used to determine the direction of the third axis given two other
coordinate axis directions in three dimensions. We use this to give an orientation among triples
of orthogonal coordinate unit vectors and define the vector cross-product of two vectors to give a

Figure 2.11: The relative orientation of the left-handed and right-handed coordinate systems.

formal mathematical operation that generates this orientation. We usually prefer to work in a
Right-Handed coordinate system. In a right-handed system x̂ × ŷ = ẑ holds, while in a
left-handed system x̂ × ŷ = −ẑ holds. The usual cross-product formulas are phrased for
right-handed coordinate systems.
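The handedness convention can be checked component-wise. The following minimal Python sketch implements the usual right-handed cross-product formula and verifies the orientation relations quoted above:

```python
def cross(a, b):
    # the usual right-handed cross product formula
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

xhat, yhat, zhat = [1, 0, 0], [0, 1, 0], [0, 0, 1]

assert cross(xhat, yhat) == zhat         # x-hat x y-hat = z-hat (right-handed)
assert cross(yhat, xhat) == [0, 0, -1]   # reversing the order flips the sign,
                                         # as in a left-handed system
assert cross(yhat, zhat) == xhat         # the relation is cyclic
```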
If we fix the origin, two sets of 3-dimensional Cartesian Coordinate systems are related to each
other by a 3-dimensional Orthogonal Transformation

r~p = M r~p0 ,

where M ∈ O (3). This fact is easy to take on face-value as a generalization of the result for 2-
dimensional Cartesian Coordinates, and indeed, this result generalizes correctly to n -dimensional
space. This generalisation deserves some discussion; to this end, suppose we connect two sets of
Cartesian Coordinates in n dimensions by an invertible linear transformation, Q . Then distances
between points in space should be preserved by Q . No matter what n -dimensional Cartesian
space we consider, the distance d(a~ , b~ ) between two prescribed points, with position vectors a~
and b~ , should be the same. Thus, we require
$$d(\vec{a}, \vec{b}) = d(Q\vec{a}, Q\vec{b}).$$

Stated differently,
$$(\vec{a} - \vec{b})^\top(\vec{a} - \vec{b}) = (Q\vec{a} - Q\vec{b})^\top(Q\vec{a} - Q\vec{b}).$$
Simplifying this yields
$$(\vec{a} - \vec{b})^\top(\vec{a} - \vec{b}) = (\vec{a} - \vec{b})^\top Q^\top Q\,(\vec{a} - \vec{b}),$$
and since this must be true for all a~ , b~ ∈ Rⁿ it follows that
$$\vec{x}^\top\vec{x} = \vec{x}^\top Q^\top Q\,\vec{x} \qquad \forall\, \vec{x} \in \mathbb{R}^n.$$

Let us name this central matrix L = Q >Q . Choose x~ = (0, 0, . . . , 0, 1, 0, 0, . . . 0)> with the 1 in the i -th
position. Then,
1 = x~ > x~ = x~ > L x~ = L i i .

Since this holds for all i , the diagonal elements of L must all be 1’s.
We can learn more about the structure of the transformation matrix L by considering the
following neat construction. Choose x~ = (0, 0, . . . , 0, 1, 0, . . . , 0, 1, 0, . . . , 0)> with 1’s in the i -th and
j -th positions and 0 everywhere else. This yields

2 = x~ > x~ = x~ > L x~ = L i i + L j j + L i j + L j i .

But we know from the previous result that L i i + L j j = 1 + 1 = 2. Combining these statements
results in
L i j + L j i = 0.

But L = Q >Q and so L > = L , meaning L i j = L j i . Putting these last two results together, we get
L i j = 0, i 6= j . Thus L is a diagonal matrix with 1s on the diagonal - the identity matrix. We
have shown that Q >Q = I , and hence Q must be an orthogonal matrix to preserve distances.
This confirms that in general transformations between n-dimensional Cartesian (orthogonal)
coordinate Systems must be via orthogonal matrices. On the other hand, for every orthogonal
transformation, the angle between two vectors is preserved by the transformation. This is because
the inner product is preserved,

x > y = x > 1 y = x >Q >Q y = (Q x )> (Q y ).

So, if we started with an orthonormal basis, an orthogonal transformation will keep our basis
orthonormal, and hence every orthogonal transformation takes a Cartesian coordinate system to
another Cartesian Coordinate System.
In summary, every transformation between Cartesian coordinate systems is an orthogonal
transformation, and every orthogonal transformation maps Cartesian coordinate systems to
Cartesian coordinate systems. In other words the transformations mapping Cartesian coordinate
systems to Cartesian coordinate systems are precisely the orthogonal transformations.
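The preservation of distances and inner products by an orthogonal transformation can be illustrated with a concrete Q ∈ O(3). A minimal Python sketch, taking a rotation about the z-axis as the example (the names `matvec`, `dist` and `dot` are illustrative):

```python
import math

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

t = 0.9
# an orthogonal matrix Q: a rotation about the z-axis, so Q^T Q = 1
Q = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0,          0.0,         1.0]]

a = [1.0, 2.0, 3.0]
b = [-2.0, 0.5, 1.0]
Qa, Qb = matvec(Q, a), matvec(Q, b)

dist = lambda u, v: math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

assert abs(dist(a, b) - dist(Qa, Qb)) < 1e-12   # d(a, b) = d(Qa, Qb)
assert abs(dot(a, b) - dot(Qa, Qb)) < 1e-12     # inner product, hence angles, preserved
```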
We remark further that the orthogonal matrices always have a determinant of either +1 or −1,
$$\det(Q)^2 = \det(Q)\det(Q) = \det(Q^\top)\det(Q) = \det(Q^\top Q) = \det(1) = 1.$$

Thus det (Q ) = ±1. Those with determinant +1 can be thought of as rotations. We call the collection
of all of these matrices the special orthogonal group S O (n ). Those with determinant −1 can be
thought of as first interchanging axes and then rotating. In 3-dimensions, transformations of
determinant +1 preserve the handedness of the coordinate system, whilst transformations of
determinant −1 reverse the handedness of the coordinate system.

2.3.3 Other Linear Coordinate Systems


In linear algebra, any set of linearly independent vectors that spans the space of interest can be
thought of as a basis for that space. These vectors need not be of unit length, and they need not be
mutually orthogonal. Any point in space can be expressed as a linear combination of these basis
vectors. The coefficients in this expansion are the coordinates of the point in the linear basis.

We now consider the transformation equations between standard Cartesian coordinates and
some linear coordinate system with basis {v~1 , v~2 , v~3 }. These vectors need not be of unit length and
they need not be mutually orthogonal. The only requirement is that they are linearly independent.
Let us consider an arbitrary point in space, p~ . This point is written in both the original
Cartesian coordinates and in the linear coordinate system as follows,

$$\vec{p} = p_1\hat{x}_1 + p_2\hat{x}_2 + p_3\hat{x}_3,$$
$$\vec{p} = p_1'\vec{v}_1 + p_2'\vec{v}_2 + p_3'\vec{v}_3.$$

Operating with x̂₁⊤ in both equations yields
$$p_1 = p_1'\,\hat{x}_1^\top\vec{v}_1 + p_2'\,\hat{x}_1^\top\vec{v}_2 + p_3'\,\hat{x}_1^\top\vec{v}_3.$$
Similarly,
$$p_2 = p_1'\,\hat{x}_2^\top\vec{v}_1 + p_2'\,\hat{x}_2^\top\vec{v}_2 + p_3'\,\hat{x}_2^\top\vec{v}_3,$$
$$p_3 = p_1'\,\hat{x}_3^\top\vec{v}_1 + p_2'\,\hat{x}_3^\top\vec{v}_2 + p_3'\,\hat{x}_3^\top\vec{v}_3.$$

Thus,
$$\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} \hat{x}_1^\top\vec{v}_1 & \hat{x}_1^\top\vec{v}_2 & \hat{x}_1^\top\vec{v}_3 \\ \hat{x}_2^\top\vec{v}_1 & \hat{x}_2^\top\vec{v}_2 & \hat{x}_2^\top\vec{v}_3 \\ \hat{x}_3^\top\vec{v}_1 & \hat{x}_3^\top\vec{v}_2 & \hat{x}_3^\top\vec{v}_3 \end{pmatrix}\begin{pmatrix} p_1' \\ p_2' \\ p_3' \end{pmatrix} = \begin{pmatrix} \hat{x}_1^\top \\ \hat{x}_2^\top \\ \hat{x}_3^\top \end{pmatrix}\begin{pmatrix} \vec{v}_1 & \vec{v}_2 & \vec{v}_3 \end{pmatrix}\begin{pmatrix} p_1' \\ p_2' \\ p_3' \end{pmatrix} = X^\top V \begin{pmatrix} p_1' \\ p_2' \\ p_3' \end{pmatrix}.$$

In the above, X is the matrix whose column vectors are x̂1 , x̂2 and x̂3 while V is the matrix whose
column vectors are v~1 , v~2 and v~3 . We are free to write the matrices X and V in any coordinate
system, as long as we use the same one for both. If we use the original Cartesian Coordinates,
then X = 1, and the columns of V are the coordinates of the vectors v~1 , v~2 and v~3 in our original
Cartesian Coordinate System.
With X and V defined as above, the same result can be derived directly,
$$\vec{p} = X\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = V\begin{pmatrix} p_1' \\ p_2' \\ p_3' \end{pmatrix}.$$

And so,
$$\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = X^{-1}V\begin{pmatrix} p_1' \\ p_2' \\ p_3' \end{pmatrix}. \qquad (2.4)$$

Since X is orthogonal, X −1 = X > . In the general case of transforming between any two linear
coordinate systems, the last step no longer applies, but the remainder of the argument is valid,
and (2.4) gives the transformation rule between any two sets of linear coordinate systems. We
can continue in this line to show in a different way that the matrix of transformation between two
Orthonormal coordinate systems is orthogonal.
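The transformation rule (2.4) is easy to exercise numerically in the simplest case X = 1. The sketch below uses a hypothetical pair of basis vectors v₁ = (2, 0)⊤ and v₂ = (1, 1)⊤, chosen only for illustration (linearly independent, but neither orthogonal nor of unit length), and recovers the coordinates of a point in that basis:

```python
# hypothetical basis vectors v1 = (2, 0) and v2 = (1, 1): linearly
# independent, but neither orthogonal nor of unit length
V = [[2.0, 1.0],
     [0.0, 1.0]]   # columns are v1 and v2

def inv2(A):
    # inverse of a 2x2 matrix via the adjugate formula
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d,  A[0][0] / d]]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

p = [3.0, 1.0]                 # Cartesian coordinates of the point (X = 1)
p_prime = matvec(inv2(V), p)   # coordinates in the basis {v1, v2}
assert abs(p_prime[0] - 1.0) < 1e-12 and abs(p_prime[1] - 1.0) < 1e-12
# indeed p = 1*v1 + 1*v2 = (2, 0) + (1, 1) = (3, 1)

# consistency check: V p' recovers p
p_back = matvec(V, p_prime)
assert abs(p_back[0] - 3.0) < 1e-12 and abs(p_back[1] - 1.0) < 1e-12
```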

2.3.4 Curvilinear Coordinates
When dealing with orthonormal coordinate systems it is quite clear what the coordinate axes are.
This concept generalizes to all coordinate systems. In particular, it generalizes to non-orthogonal
coordinate systems as well. If we set all the coordinates to zero except for one, and allow that one
coordinate to vary, we trace out the coordinate axis of that coordinate. Check that this concept
yields the standard x −, y − and z -axes in the Cartesian Coordinate systems. In arbitrary linear
coordinate systems, the coordinate axes are lines emanating from the origin along the directions
of the basis vectors.
A related concept is that of coordinate curves. Instead of setting the other coordinates to zero
if we simply fix them as some prescribed constants, while allowing our chosen coordinate to vary,
we obtain a coordinate curve for that coordinate. Each coordinate is then associated with a family
of coordinate curves. In linear coordinate systems, these curves are lines parallel to the basis
vectors.
Finally, we consider the idea of coordinate surfaces. In a 3-dimensional coordinate system, we
can set one of the coordinates to be a constant value and allow the other two to vary. The figure
that they trace out is called a coordinate surface. It should be clear that the coordinate surfaces of
linear coordinate systems are planes.
So far we have covered only linear coordinate systems. That is, we’ve considered coordinate
systems that made use of the vector-space nature of 2-dimensional and 3-dimensional space
to assign coordinates to each point. In all of these systems the coordinate curves are lines and
the coordinate surfaces are planes. However, it is not necessary to confine ourselves to linear
representations of space. Indeed, any parameters that unambiguously label every point in space
can be thought of as coordinates. It is possible, and often useful, to use nonlinear coordinate
systems for solving problems.
We consider here a family of coordinate systems collectively known as Curvilinear
Coordinates. The name Curvilinear Coordinates is derived from the fact that the coordinate
curves and coordinate surfaces are not necessarily straight lines and planes in these coordinate
systems, but curves and curved surfaces. For every set of curvilinear coordinates we construct,
we will consider the following aspects: transformation equations, directional derivatives and
length measurements along curves. The translation between different coordinate systems
corresponds to finding transformation equations between coordinate systems and taking into
account the change in geometry of the coordinate curves.

2.3.5 Transformation Equations


Consider the curvilinear coordinates s 1 , s 2 , . . . , s n . In order for a unique set of curvilinear
coordinates to correspond to every point in space, we need to stipulate a correspondence
between the Cartesian Coordinates of some point (in a well-specified Cartesian System), and the
Curvilinear Coordinates of the same point. We achieve this through transformation equations

$$x^1 = x^1(s^1, s^2, \dots, s^n),$$
$$x^2 = x^2(s^1, s^2, \dots, s^n),$$
$$\vdots$$
$$x^n = x^n(s^1, s^2, \dots, s^n).$$

It is customary to refer to the coordinates with superscripts (s^i) rather than subscripts (s_i). There
is a reason for this, but for now we simply accept this as convention.
We require that these transformation equations are well-behaved. We mean by this

1. The equations must be locally invertible. In some neighbourhood of every point there is an
expression for the curvilinear coordinates in terms of the Cartesian Coordinates. If this was
not the case the coordinates would not ‘uniquely’ label the points in some region of space.

2. The Jacobian Matrix of the transformation must be non-singular. This is necessary to
ensure that the number of curvilinear coordinates at a point matches the number of
Cartesian coordinates that describe that point, and ensures that the transformation is well defined.

3. The transformation equations must be differentiable, and the derivatives must be


continuous. Often they are smooth (infinitely differentiable). This requirement ensures
that the coordinate curves and surfaces are indeed curved, as opposed to jagged.

The inverse function theorem tells us that point 1 above follows from points 2 and 3. We can
therefore always invert the equations of transformation locally at some given point in space, so that we
can write
$$x^1 = x^1(s^1, s^2, \dots, s^n) \qquad\qquad s^1 = s^1(x^1, x^2, \dots, x^n)$$
$$\vdots \qquad\qquad \text{and} \qquad\qquad \vdots$$
$$x^n = x^n(s^1, s^2, \dots, s^n) \qquad\qquad s^n = s^n(x^1, x^2, \dots, x^n).$$

An informal motivation for these facts will follow after we’ve defined tangent vectors. Next we
consider two familiar examples of invertible coordinate transformations.

Example 2.6 (Rectilinear Coordinate to Rectilinear Coordinate Inversion) Consider the


following reparametrisation of the standard x − y -coordinate system

x 0 (x , y ) = x − y
y 0 (x , y ) = x + y .

We can invert this coordinate transformation directly to find
$$x(x', y') = \frac{1}{2}\left(x' + y'\right), \qquad y(x', y') = \frac{1}{2}\left(y' - x'\right).$$

Clearly, the forward and reverse transformations are simple linear transformations of the coordinates
on either side of the equality sign. We can associate to this transformation a simple matrix transform
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$
for which we now have the inverse transformation
$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix}.$$
In the case of this transformation, it follows that the representation of any vector r~ in the x − y -
coordinate system has a corresponding representation r~ ′ in the x ′ − y ′-coordinate system, where
$$\vec{r}^{\,\prime} = A\,\vec{r} \qquad \text{and} \qquad \vec{r} = A^{-1}\vec{r}^{\,\prime},$$
while the primed basis vectors are obtained from the unprimed basis vectors using A⁻¹.
The proof of these statements is straightforward and is left as an exercise to the reader. It is now
simple to verify that the primed coordinate system corresponds to a rotation of the unprimed
coordinate system through an angle of −π/4, combined with a uniform scaling by a factor of √2.
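The forward and inverse transformations of Example 2.6 can be checked numerically. The sketch below (pure Python; the factor of 1/2 in the inverse is essential) verifies that the two matrices compose to the identity and that a sample point round-trips correctly:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, -1.0],
     [1.0,  1.0]]      # (x', y') = A (x, y)
A_inv = [[ 0.5, 0.5],
         [-0.5, 0.5]]  # (x, y) = A_inv (x', y'), including the factor 1/2

prod = matmul(A_inv, A)
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))

# round trip on a sample point
x, y = 3.0, 1.0
xp, yp = x - y, x + y            # forward: x' = x - y, y' = x + y
x_back = 0.5 * (xp + yp)         # inverse: x = (x' + y') / 2
y_back = 0.5 * (yp - xp)         # inverse: y = (y' - x') / 2
assert abs(x_back - x) < 1e-12 and abs(y_back - y) < 1e-12
```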

Example 2.7 (Plane Polar Coordinate to Rectilinear Coordinates Inversion) Consider the
standard plane polar coordinates on the x − y -plane,

x (r, θ ) = r cos (θ )
y (r, θ ) = r sin (θ ) .

We can invert this coordinate transformation by noting that
$$\frac{y}{x} = \frac{r\sin(\theta)}{r\cos(\theta)} = \tan(\theta) \qquad \text{or} \qquad \theta = \arctan\left(\frac{y}{x}\right).$$
Now, we find
$$r = \frac{x}{\cos(\theta)} = \frac{x}{\cos\left(\arctan\left(\frac{y}{x}\right)\right)} = x\sqrt{1 + \left(\frac{y}{x}\right)^2} = \sqrt{x^2 + y^2},$$
or alternatively,
$$r = \frac{y}{\sin(\theta)} = \frac{y}{\sin\left(\arctan\left(\frac{y}{x}\right)\right)} = x\sqrt{1 + \left(\frac{y}{x}\right)^2} = \sqrt{x^2 + y^2},$$
and we have used the identities
$$\cos(\arctan(\theta)) = \frac{1}{\sqrt{1 + \theta^2}} \qquad \text{and} \qquad \sin(\arctan(\theta)) = \frac{\theta}{\sqrt{1 + \theta^2}}.$$
In each case, we recover the familiar coordinate transformation functions
$$r(x, y) = \sqrt{x^2 + y^2} \qquad \text{and} \qquad \theta(x, y) = \arctan\left(\frac{y}{x}\right).$$

At this point it is important to note the subtlety of the inverse trigonometric function arctan(y/x),
which returns values for the angle θ only on the open interval (−π/2, π/2). The
interested reader should consider a reformulation of this inverse function which will return the
correct value of θ on the complete angular interval [0, 2π).
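One such reformulation is available in most programming languages as the two-argument arctangent. The following minimal Python sketch uses the standard-library `math.atan2`, which inspects the signs of x and y separately and therefore resolves the quadrant, and folds the result into [0, 2π):

```python
import math

def to_polar(x, y):
    # invert x = r cos(theta), y = r sin(theta); atan2 resolves the quadrant
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2.0 * math.pi)   # fold into [0, 2*pi)
    return r, theta

r, theta = to_polar(-1.0, 1.0)   # a point in the second quadrant
assert abs(r - math.sqrt(2.0)) < 1e-12
assert abs(theta - 3.0 * math.pi / 4.0) < 1e-12  # plain arctan(y/x) would give -pi/4

# round trip back to Cartesian coordinates
x, y = r * math.cos(theta), r * math.sin(theta)
assert abs(x + 1.0) < 1e-12 and abs(y - 1.0) < 1e-12
```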

2.4 Coordinate Curves and Coordinate Surfaces
We will normally distinguish a particular curvilinear coordinate system according to the shapes
of the coordinate curves (in the 2-dimensional case) or coordinate surfaces (in the 3-dimensional
case). We will derive these objects for each coordinate system we discuss.

2.4.1 Parametric Curves


Parametric curves give a simple way to assign coordinates from one space to another. We can
assign coordinates on the real line R to the real line R by linear transformation

f :R→R
f (x ) = a x + b = x 0

where a , b ∈ R. The mapping f assigns a unique image x ′ ∈ R to each x ∈ R.


For each choice of a , b ∈ R there corresponds a different map f . Clearly, the choice of f is not
unique. When the spaces involved in the coordinate mapping are more complicated than R, we
should expect that the mapping function f will be more complicated, too. As an example of a
more complicated mapping, suppose we consider a coordinatisation map f that assigns to each
point in R a point on the unit circle S 1 . Next, we consider one example of such a mapping known
as a projective map.

Example 2.8 (The Projective Mapping R → S 1 ) An efficient way to assign numbers to each point
on the circle using a continuous mapping from the real line is by considering displacement vectors
that connect points on the line to points on the circle. Suppose we mark the origin O as the zero
point on the real line R and as the centre of the unit circle S 1 , choose the point P as the point on the
circle directly above O , and let Z be the point at position r ∈ R on the line. Clearly
$$O = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad P = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \text{and} \qquad Z = \begin{pmatrix} r \\ 0 \end{pmatrix}.$$

We can think of P as the ‘north pole’ of the circle. Now construct the following position vectors
$$\vec{p} = P - O = \begin{pmatrix} 0 \\ 1 \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \text{and} \qquad \vec{v} = Z - P = \begin{pmatrix} r \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} r \\ -1 \end{pmatrix}.$$

Notice that p~ connects the origin to the north pole and v~ connects the north pole to the point Z on
the line R. The line segment P Z will intersect the circle S¹ at some point on its circumference. This
segment is directed along v~ but has a length that depends on the position of Z . We can
construct the position vector of a general point on the line through P and Z as a linear combination
of the vectors p~ and v~ ,
$$\vec{w} = \vec{p} + \lambda\vec{v} = \begin{pmatrix} \lambda r \\ 1 - \lambda \end{pmatrix}, \qquad \text{subject to} \qquad \lVert\vec{w}\rVert = 1,$$

where λ is a scaling parameter which adjusts the contribution of v~ in the definition of w~ and
moves the endpoint of w~ along the line connecting the north pole and Z . An appropriate choice of λ
makes w~ the position vector of the point where this line crosses the circle. The
parameter λ is called a Lagrange Multiplier.

If we enforce the requirement that the radius of the circle is 1, then we can determine the value
of the multiplier λ. We compute the squared magnitude of w~ as a function of λ and set it equal to 1,
$$\lVert\vec{w}\rVert^2 = \vec{w} \cdot \vec{w} = \lambda^2 r^2 + (1 - \lambda)^2 = 1.$$
Solving for λ in terms of the parameters of the problem, and discarding the trivial root λ = 0
(which returns the north pole itself), gives
$$\lambda = \frac{2}{r^2 + 1}.$$
Substituting this value back, we recover a component-wise expression for w~ ,
$$\vec{w} = \begin{pmatrix} \dfrac{2r}{r^2 + 1} \\[2ex] \dfrac{r^2 - 1}{r^2 + 1} \end{pmatrix}.$$
Now for every value of the parameter r , we assign a unique point on the circle. This argument
can be generalised to higher dimensions.

Remark 12 Lagrange Multipliers are often used in optimisation problems. A single Lagrange
multiplier is used in Example 2.8 to assign a specific value to a component in a vector sum, subject
to a geometrical restriction. Lagrange Multipliers are especially useful in problems where we
compare the lengths of parallel vectors. We shall revisit this concept later on when considering the
constrained motion of objects.

Suppose we consider some general functions f that take input parameters r , s and t
and output values in R, R² or R³. The generic functions f map elements from one space with
some number of dimensions (the number of inputs to f ) into another space with a different number
of dimensions (the number of outputs from f ). The functions a , b and c describe the components of
the outputs of f . In each case, the number of input parameters determines the dimension of the
image of f , regardless of the number of dimensions of the space where this output lives. For
simplicity, only the input and output sequences are listed.

t ↦ f (t ): f maps the real number line, R, back to itself, where f can only stretch or compress
parts of it.

(r, s ) ↦ (a (r, s ), b (r, s )): f maps the 2-dimensional plane back to the 2-dimensional plane,
where f can rotate, stretch or compress parts of it.

(r, s ) ↦ (a (r, s ), b (r, s ), c (r, s )): f maps the 2-dimensional plane into a subspace of 3-dimensional
Euclidean space. The outcome is some continuous surface in 3 dimensions.

t ↦ (a (t ), b (t ), c (t )): f maps the real line to a 1-dimensional curve in 3-dimensional Euclidean
space. The outcome of this mapping is some smooth, continuous line in 3 dimensions.

In general, we shall use the word curve to mean a continuously connected, smooth, subspace of
another space. In particular, we shall consider curves in R3 . It is important to note that there is
a class of coordinate transformation for each of the parameters r, s and t that leave the output
unchanged and we consider these next.

Remark 13 We have already encountered the idea of an affine transformation. We can think of an
affine transformation more generally as any transformation that preserves collinearity (i.e., all points
lying on a line initially still lie on a line after transformation) and ratios of distances (e.g., the
midpoint of a line segment remains the midpoint after transformation).

In particular, we may transform the parameter t along 1-dimensional curves such that it lies in
the range from 0 to 1 and think of this number as marking the fraction of the total length of the
curve traversed (t = 0 at the beginning of the curve and t = 1 at the end).

2.4.2 Tangent Vectors


One of the reasons we require the transformation equations to be differentiable is that the
derivatives give us useful information about the structure of the transformation. In particular,
consider what happens when we fix all the coordinates at some point and allow one of them to
vary infinitesimally. The motion thus produced is in the ‘characteristic direction’ of that
coordinate. It is tangent to the coordinate curve at that point. Formally, we write the tangent
vector associated with the coordinate s i as

$$\vec{e}_i = \frac{\partial}{\partial s^i}\begin{pmatrix} x^1(s^1, s^2, \dots, s^n) \\ x^2(s^1, s^2, \dots, s^n) \\ \vdots \\ x^n(s^1, s^2, \dots, s^n) \end{pmatrix} = \frac{\partial \vec{x}}{\partial s^i}.$$

This idea is best understood through examples. The most basic example is Cartesian coordinates.
Example 2.9 demonstrates the most simple transformation equations corresponding to the identity
transformation of the coordinates curves. In this case the transformation equations are trivial.

Example 2.9 (Trivial Cartesian Coordinate Transformation) Consider the identity coordinate
transformation of the Cartesian coordinate system,

x = x (x , y , z ) = x , y = y (x , y , z ) = y , and z = z (x , y , z ) = z .

So, the tangent vectors are
$$\vec{e}_x = \frac{\partial}{\partial x}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \hat{x}, \qquad \vec{e}_y = \frac{\partial}{\partial y}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \hat{y}, \qquad \vec{e}_z = \frac{\partial}{\partial z}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \hat{z}.$$

These are the unit vectors in the x , y and z directions respectively.

For a general linear coordinate system with basis vectors {v~1 , v~2 , v~3 }, the tangent vectors at
any point can be easily shown to be exactly the basis vectors. This makes sense - the coordinate
curves run parallel to the basis vectors.
The tangent vectors are not always unit vectors. When they are not unit vectors, we can
normalize them to obtain unit vectors in the coordinate directions. The sizes of the tangent
vectors are called the metric coefficients, hi , of the coordinate system. We write,

e~i = hi ŝi ,

where ŝi is a unit vector in the same direction as e~i that we can think of as the unit vector in the
i -th coordinate direction.
Consider the Jacobian Matrix, J , of the coordinate transformation,
$$J = \frac{\mathrm{d}\vec{x}}{\mathrm{d}\vec{s}} = \begin{pmatrix} \frac{\partial x^1}{\partial s^1} & \cdots & \frac{\partial x^1}{\partial s^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x^n}{\partial s^1} & \cdots & \frac{\partial x^n}{\partial s^n} \end{pmatrix}.$$
Clearly, the columns of J are the tangent vectors,
$$J = \begin{pmatrix} \vec{e}_1 & \vec{e}_2 & \cdots & \vec{e}_{n-1} & \vec{e}_n \end{pmatrix}.$$

The matrix will be singular if and only if the tangent vectors are linearly dependent; but this
means that the tangent space would collapse (its dimension would decrease), and the coordinate
curves would coincide. We do not want this to occur because it implies that locally (very zoomed
in), the coordinate space itself will collapse and not be invertible. This explains why a sensible
transformation must have a non-singular Jacobian Matrix at all points, and keeps our intuition
sharp about a requirement that might otherwise seem arbitrary and unnatural. The interested
reader should read up on the Inverse Function Theorem (or the Constant Rank Theorem) to formally
understand this point.
For the well behaved coordinate systems that we will consider, it follows that the tangent
vectors are always linearly independent and will hence always form a basis for the tangent space,
which is always n -dimensional. When we zoom in very close to a point in coordinate space,
the space begins to look like the tangent space. If we consider the region of space enclosed by
varying each coordinate over an increasingly small interval, this region will begin to resemble the
fundamental parallelogram or parallelepiped of the tangent space. The (signed) area/volume
of this entity is given by the determinant of the matrix formed by placing the tangent vectors

as column vectors i.e. the Jacobian Matrix. This reasoning leads us to a result well known in
Multivariable Calculus

$$\mathrm{d}x^1\,\mathrm{d}x^2 \cdots \mathrm{d}x^n = \lvert\det(J)\rvert\,\mathrm{d}s^1\,\mathrm{d}s^2 \cdots \mathrm{d}s^n.$$

In many of the examples we study, the tangent vectors will be orthogonal. In this case the unit
vectors will be orthonormal, and we will have
$$\det(J) = \det\begin{pmatrix} \vec{e}_1 & \cdots & \vec{e}_n \end{pmatrix} = \det\begin{pmatrix} h_1\hat{s}_1 & \cdots & h_n\hat{s}_n \end{pmatrix} = h_1 \cdots h_n \det\begin{pmatrix} \hat{s}_1 & \cdots & \hat{s}_n \end{pmatrix}.$$

This last matrix of unit vectors is an orthogonal matrix; we can see this by pre-multiplying it by
its transpose,
$$\begin{pmatrix} \hat{s}_1^\top \\ \hat{s}_2^\top \\ \vdots \\ \hat{s}_n^\top \end{pmatrix}\begin{pmatrix} \hat{s}_1 & \hat{s}_2 & \cdots & \hat{s}_n \end{pmatrix} = \begin{pmatrix} \hat{s}_1^\top\hat{s}_1 & \hat{s}_1^\top\hat{s}_2 & \cdots & \hat{s}_1^\top\hat{s}_n \\ \hat{s}_2^\top\hat{s}_1 & \hat{s}_2^\top\hat{s}_2 & \cdots & \hat{s}_2^\top\hat{s}_n \\ \vdots & \vdots & \ddots & \vdots \\ \hat{s}_n^\top\hat{s}_1 & \hat{s}_n^\top\hat{s}_2 & \cdots & \hat{s}_n^\top\hat{s}_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = 1.$$

Now the determinant of an orthogonal matrix is either +1 or −1, with the sign depending on the
order in which we supply the rows. The absolute value of this is always +1, and thus we get

|det (J )| = h1 h2 . . . hn ,

with the convention that we have chosen all the hi to be positive. Thus when the tangent vectors
are orthogonal, we can compute the area/volume elements very easily. We simply take the product
of the metric coefficients.
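These statements can be checked concretely for plane polar coordinates, where the tangent vectors are orthogonal with metric coefficients h_r = 1 and h_θ = r, so the area element is |det(J)| = r. A minimal Python sketch (assuming the polar transformation of Example 2.7; the helper name `tangent_vectors` is illustrative):

```python
import math

def tangent_vectors(r, theta):
    # tangent vectors of plane polar coordinates x = r cos(theta), y = r sin(theta)
    e_r     = [math.cos(theta),      math.sin(theta)]       # d(x, y)/dr
    e_theta = [-r * math.sin(theta), r * math.cos(theta)]   # d(x, y)/dtheta
    return e_r, e_theta

r, theta = 2.0, 0.6
e_r, e_theta = tangent_vectors(r, theta)

h_r = math.hypot(e_r[0], e_r[1])              # metric coefficient h_r = 1
h_theta = math.hypot(e_theta[0], e_theta[1])  # metric coefficient h_theta = r
assert abs(h_r - 1.0) < 1e-12
assert abs(h_theta - r) < 1e-12

# |det J| with the tangent vectors as the columns of the Jacobian
detJ = e_r[0] * e_theta[1] - e_theta[0] * e_r[1]
assert abs(abs(detJ) - h_r * h_theta) < 1e-12   # the familiar area element r dr dtheta
```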

2.4.3 Cotangent Vectors


It is worth remarking at this point that there are actually two kinds of direction vectors associated
with any change of coordinates. The first kind is the tangent vectors discussed above, which we can
think of as the partial derivatives $\partial/\partial s^i$. The second kind is a set of vectors known as the cotangent
vectors. These are associated with the differentials $ds^i$. From the chain rule we can write out the
differential as
$$ds^i = \left(\frac{\partial s^i}{\partial x^1}\right)dx^1 + \left(\frac{\partial s^i}{\partial x^2}\right)dx^2 + \cdots + \left(\frac{\partial s^i}{\partial x^n}\right)dx^n.$$

This last expression looks like a dot product, and can be rewritten as
$$ds^i = \begin{pmatrix} \frac{\partial s^i}{\partial x^1} \\ \frac{\partial s^i}{\partial x^2} \\ \vdots \\ \frac{\partial s^i}{\partial x^n} \end{pmatrix} \cdot \begin{pmatrix} dx^1 \\ dx^2 \\ \vdots \\ dx^n \end{pmatrix}.$$

This leads us to directly associate the differential d s i with the vector
$$\vec e^{\,i} = \begin{pmatrix} \frac{\partial s^i}{\partial x^1} \\ \frac{\partial s^i}{\partial x^2} \\ \vdots \\ \frac{\partial s^i}{\partial x^n} \end{pmatrix} = \frac{\partial s^i}{\partial \vec x}.$$

We call this vector, evaluated at some point, the i -th cotangent vector at the given point. For
Cartesian coordinates, the tangent and cotangent vectors coincide.
The inner products between the cotangent vectors are known as the components of the
contravariant metric tensor
$$g^{ij} = \vec e^{\,i} \cdot \vec e^{\,j}.$$

The cotangent vectors are always linearly independent for the well behaved systems that we will
consider. This follows from the result of the next section.

2.4.4 Tangent and Cotangent Vector Component Relations


In general it is not necessary for the tangent vectors to be orthogonal. Likewise the cotangent
vectors are not necessarily orthogonal. However, the tangent and cotangent vectors satisfy a
condition of mutual orthogonality
$$\vec e_i \cdot \vec e^{\,j} = \vec e^{\,j} \cdot \vec e_i = \begin{pmatrix} \frac{\partial s^j}{\partial x^1} \\ \vdots \\ \frac{\partial s^j}{\partial x^n} \end{pmatrix} \cdot \begin{pmatrix} \frac{\partial x^1}{\partial s^i} \\ \vdots \\ \frac{\partial x^n}{\partial s^i} \end{pmatrix} = \sum_{k=1}^{n} \frac{\partial s^j}{\partial x^k}\frac{\partial x^k}{\partial s^i} = \frac{\partial s^j}{\partial s^i}.$$

Here the second-to-last step follows from the chain rule, and the last follows from the fact that the
$s^i$ are functionally independent (not constrained or related to each other by any equations). This
relationship can be written more elegantly by making use of the Kronecker $\delta$-function,
$$\delta_i^{\,j} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \tag{2.5}$$

We can state the mutual orthogonality relation as

$$\vec e_i \cdot \vec e^{\,j} = \delta_i^{\,j}.$$

Because of this relation, the tangent and co-tangent vectors are called reciprocal bases. As will
become apparent shortly, this is a very useful result. We now obtain some geometric insight into
the mutual orthogonality identity in Figure 2.12.


Figure 2.12: Mutual orthogonality of tangent and cotangent vectors. In figure (a) we begin with our
original basis vectors (the tangent vectors $\vec e_1$, $\vec e_2$). Figure (b) shows how the mutual orthogonality
condition completely determines the vector $\vec e^{\,2}$; firstly, the constraint that it must be orthogonal to $\vec e_1$
forces it to lie on the blue line illustrated above. Secondly, the constraint that the dot product $\vec e^{\,2} \cdot \vec e_2 = 1$
forces it to make an acute angle with the vector $\vec e_2$ and determines its length completely. Finally,
figure (c) illustrates the same idea for choosing the other reciprocal vector, $\vec e^{\,1}$.

It is clear from the example that the mutual orthogonality condition completely specifies the
basis $\{\vec e^{\,1}, \dots, \vec e^{\,n}\}$ once the basis $\{\vec e_1, \dots, \vec e_n\}$ is known. This geometric construction
does not rely on the direct calculation of the cotangent vectors. Notice how neither set of
basis vectors is orthogonal in this example, but mutual orthogonality still applies.
An algebraic treatment generalizes this construction to any number of dimensions. We
consider the Jacobian matrix, $J$, whose columns are the tangent vectors. Because the coordinate
transformations are well behaved, $J$ is invertible and so $J^{-1}$ exists and is unique. Let the rows of
$J^{-1}$ be $\vec r_1^{\,\top}, \vec r_2^{\,\top}, \dots, \vec r_n^{\,\top}$. Then,
$$1 = J^{-1}J = \begin{pmatrix} \vec r_1^{\,\top} \\ \vec r_2^{\,\top} \\ \vdots \\ \vec r_n^{\,\top} \end{pmatrix}\begin{pmatrix} \vec e_1 & \vec e_2 & \cdots & \vec e_n \end{pmatrix} = \begin{pmatrix} \vec r_1^{\,\top}\vec e_1 & \vec r_1^{\,\top}\vec e_2 & \cdots & \vec r_1^{\,\top}\vec e_n \\ \vec r_2^{\,\top}\vec e_1 & \vec r_2^{\,\top}\vec e_2 & \cdots & \vec r_2^{\,\top}\vec e_n \\ \vdots & & \ddots & \vdots \\ \vec r_n^{\,\top}\vec e_1 & \vec r_n^{\,\top}\vec e_2 & \cdots & \vec r_n^{\,\top}\vec e_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \\ \vdots & & \ddots & \\ 0 & & & 1 \end{pmatrix}.$$
It clearly follows that the rows of $J^{-1}$ are mutually orthogonal to the columns of $J$, and that this
property characterises $J^{-1}$, so that these are the unique vectors satisfying it. Clearly then the rows of $J^{-1}$
are the cotangent vectors, $\vec r_i = \vec e^{\,i}$.
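The identification of the cotangent vectors with the rows of $J^{-1}$ is easy to verify numerically. The sketch below (assuming NumPy is available) does so for plane polar coordinates, where the gradient $\nabla\theta = (-\sin\theta/r,\ \cos\theta/r)$ is known in closed form.

```python
import numpy as np

r, theta = 1.5, 0.4
# Tangent vectors of plane polar coordinates, as the columns of J
J = np.array([[np.cos(theta), -r * np.sin(theta)],
              [np.sin(theta),  r * np.cos(theta)]])
# The cotangent vectors are the rows of the inverse Jacobian
cotangent = np.linalg.inv(J)
# Mutual orthogonality: each row of J^{-1} against each column of J gives delta
assert np.allclose(cotangent @ J, np.eye(2))
# The second row reproduces the closed-form gradient of theta
assert np.allclose(cotangent[1], [-np.sin(theta) / r, np.cos(theta) / r])
```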
One consequence of the above argument is that, for our well behaved coordinate systems
$$\begin{pmatrix} \frac{\partial x^1}{\partial s^1} & \cdots & \frac{\partial x^1}{\partial s^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x^n}{\partial s^1} & \cdots & \frac{\partial x^n}{\partial s^n} \end{pmatrix}^{-1} = \begin{pmatrix} \frac{\partial s^1}{\partial x^1} & \cdots & \frac{\partial s^1}{\partial x^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial s^n}{\partial x^1} & \cdots & \frac{\partial s^n}{\partial x^n} \end{pmatrix}.$$

Stated differently, the Jacobian of the inverse of some transformation is the inverse of the Jacobian
of the transformation.
As a corollary we find that if the tangent vectors are orthonormal, then J −1 = J > , and so the
tangent vectors and the cotangent vectors coincide. What happens when the tangent vectors are
orthogonal, but not orthonormal?

2.4.5 The Metric Tensor
The Cartesian tangent vectors are orthogonal, but clearly the tangent vectors are not orthogonal
for all coordinate systems. In a linear coordinate system, for instance, the tangent vectors are only
orthogonal if the basis vectors are orthogonal. The inner products between the tangent vectors
are known as the components of the covariant metric tensor

$$g_{ij} = \vec e_i \cdot \vec e_j. \tag{2.6}$$


Clearly $h_i = \sqrt{g_{ii}}$. For an orthogonal system, $g_{ij}$ is zero whenever $i \neq j$. For a general coordinate
system, the tangent vectors at a given point form a basis for a linear space rooted at that point.
We call this space the tangent space of the coordinate space at that point. The tangent vectors are
sometimes referred to as the direction vectors at a point. The tangent space can be thought of
as a very zoomed in picture of coordinate space near a point. Formally, it is the linearisation of
coordinate space at the point. As we zoom closer and closer to the point, the coordinate space
‘flattens out’ into the tangent space.
We may construct an object g whose components are exactly the g i j of (2.6). By indexing into
g we may extract each component g i j of g in a given computation. Now, the dot product of a
vector u~ with a vector v~ in N -dimensions, in a given coordinate basis {e~k } is expressed as

$$\begin{aligned} \vec u \cdot \vec v &= \left(u^1\vec e_1 + u^2\vec e_2 + \cdots + u^N\vec e_N\right)\cdot\left(v^1\vec e_1 + v^2\vec e_2 + \cdots + v^N\vec e_N\right) \\ &= \sum_{i=1}^{N}\sum_{j=1}^{N} u^i v^j\,\vec e_i \cdot \vec e_j \\ &= \sum_{i=1}^{N}\sum_{j=1}^{N} g_{ij}\,u^i v^j. \end{aligned}$$

It is common to omit the summation symbols and write instead

$$\vec u \cdot \vec v = g_{ij}\,u^i v^j,$$

where the summation over the indices i and j is implicit. This implicit summation is referred to
as a summation convention.
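The summation convention maps directly onto `numpy.einsum`. The sketch below (assuming NumPy is available) evaluates $g_{ij}u^iv^j$ for a deliberately non-orthogonal linear basis and checks it against the ordinary Cartesian dot product; the particular basis and components are arbitrary illustrative values.

```python
import numpy as np

# A skewed (non-orthogonal) linear coordinate system in 2 dimensions
e1 = np.array([1.0, 0.0])
e2 = np.array([1.0, 1.0])          # deliberately not orthogonal to e1
E = np.column_stack([e1, e2])      # tangent vectors as columns

g = E.T @ E                        # covariant metric g_ij = e_i . e_j

u = np.array([2.0, 1.0])           # contravariant components of u
v = np.array([1.0, 3.0])           # contravariant components of v

# Summation convention g_ij u^i v^j, written with einsum
dot_metric = np.einsum('ij,i,j->', g, u, v)

# The same dot product computed from Cartesian components
dot_cartesian = (E @ u) @ (E @ v)
assert np.isclose(dot_metric, dot_cartesian)
```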
A vector has two basic sets of components:

1. Components in terms of the tangent vectors (known as the contravariant components of the vector),
$$\vec v = v^1\vec e_1 + v^2\vec e_2 + \cdots + v^m\vec e_m.$$
Conventionally these components are written with a superscript, $v^i$.

2. Components in terms of the cotangent vectors (known as the covariant components of the vector),
$$\vec v = v_1\vec e^{\,1} + v_2\vec e^{\,2} + \cdots + v_m\vec e^{\,m}.$$
Conventionally these components are written with a subscript, $v_i$.

This is nothing especially new - we know from linear algebra that we can write the same vector in
terms of two different bases. What is special is the two particular bases we have chosen. Because of
mutual orthogonality, we can determine the coordinates very easily. We simply take dot products
with the tangent and cotangent vectors, respectively,

$$\vec e_i \cdot \vec v = \vec e_i \cdot \left(v_1\vec e^{\,1} + v_2\vec e^{\,2} + \cdots + v_m\vec e^{\,m}\right) = v_i,$$
$$\vec e^{\,i} \cdot \vec v = \vec e^{\,i} \cdot \left(v^1\vec e_1 + v^2\vec e_2 + \cdots + v^m\vec e_m\right) = v^i.$$

Thus finding the covariant or contravariant components of some vector is as easy as taking the
dot product with the tangent or cotangent vectors, respectively. Clearly, it follows from the
relationships of the dot products that for given vectors $\vec u$ and $\vec v$,

$$\vec u \cdot \vec v = g_{ij}\,u^i v^j = g^{ij}\,u_i v_j = u_i v^i = u^i v_i. \tag{2.7}$$

We collect the contravariant components of $\vec u$ and $\vec v$ into column matrices. Then we may replace
the explicit reference to summation indices in the summation convention for computing inner
products with the multiplication rules of linear algebra and write instead
$$\vec u \cdot \vec v = U^\top g\, V, \tag{2.8}$$
where $U$ is the column matrix with the components of $\vec u$, $V$ is the column matrix with the components of $\vec v$,
and $g$ is the metric tensor. This result will be useful when we are transforming vector equations
from one coordinate system to another. We can replace the explicit reference to summation
indices by employing matrix algebra.
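The matrix form of (2.8), together with the lowering of an index by the metric, can be sketched as follows (assuming NumPy is available; the metric and components are arbitrary illustrative values for a skewed 2-dimensional basis).

```python
import numpy as np

# Metric of a skewed 2-d basis and contravariant component columns
g = np.array([[1.0, 1.0],
              [1.0, 2.0]])
U = np.array([[2.0], [1.0]])       # contravariant components of u, as a column
V = np.array([[1.0], [3.0]])       # contravariant components of v, as a column

# Matrix form of the inner product, u . v = U^T g V
dot = (U.T @ g @ V).item()

# Lowering an index with the metric gives the covariant components u_i,
# after which the inner product is just u_i v^i
U_cov = g @ U
assert np.isclose(dot, (U_cov.T @ V).item())
```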

Example 2.10 (Simple Matrix Dot Product) Consider the dot product between the basis unit
vectors $\hat x$ and $\hat y$ in the standard 2-dimensional rectilinear coordinate system. Clearly, $\hat x \cdot \hat y = 0$. We
may rewrite these vectors as column matrices
$$\hat x = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \hat y = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
and use the 2-dimensional identity matrix
$$1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \hat x \cdot \hat x & \hat x \cdot \hat y \\ \hat y \cdot \hat x & \hat y \cdot \hat y \end{pmatrix}$$
as the metric tensor to write
$$\hat x \cdot \hat y = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0.$$
Similarly, we find
$$\hat x \cdot \hat x = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1$$
and
$$\hat y \cdot \hat y = \begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 1,$$
as expected. More generally, in an orthonormal coordinate basis (defined using a system of mutually
orthogonal unit vectors), $\vec e_i \cdot \vec e_j = \delta_{ij}$, so that
$$g^{ij} = g_{ij} \quad\text{and}\quad u^i = u_i,$$
and so
$$\vec u \cdot \vec v = \begin{pmatrix} u^1 & u^2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} v^1 \\ v^2 \end{pmatrix} = \begin{pmatrix} u^1 & u^2 \end{pmatrix}\begin{pmatrix} v^1 \\ v^2 \end{pmatrix} = u^1 v^1 + u^2 v^2 = u_1 v_1 + u_2 v_2,$$
as expected, and the matrix multiplication is explicit. By a similar argument,
$$\vec u \cdot \vec u = \begin{pmatrix} u^1 & u^2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} u^1 \\ u^2 \end{pmatrix} = \begin{pmatrix} u^1 & u^2 \end{pmatrix}\begin{pmatrix} u^1 \\ u^2 \end{pmatrix} = \left(u^1\right)^2 + \left(u^2\right)^2 = \|\vec u\|^2,$$
again, as expected.

The first interesting geometric quantity that we want to compute is the length of a curve given
by a function y = f (t ). We make use of the vector calculus of the preceding sections to write
the position vector of a point on the curve and then construct a sequence of line elements at
each point along that curve whose length we can sum. Suppose that the parametrisation is such
that the end points of the curve correspond to t = 0 and t = 1. The length of a 1-dimensional
parametric curve, parameterised by t , with tangent v~(t ) is simply the integral of the magnitude of
the tangent vector. We start by constructing a vector that defines a point on the curve at a time t ,

$$\vec a(t) = t\,\hat x + f(t)\,\hat y \tag{2.9}$$

and then compute the velocity of the point $\vec a$ along the path,
$$\vec v(t) = \frac{d\vec a}{dt},$$
corresponding to the tangent vector along the curve at a time $t$. The length $\Delta\ell(t)$ that is
traversed along the path by a point starting at $\vec a(t)$, moving at a speed $\|\vec v(t)\|$ for a period of time $\Delta t$, is given
by
$$\Delta\ell(t) = \|\Delta\vec a(t)\| = \left\|\frac{d\vec a(t)}{dt}\right\|\Delta t = \|\vec v(t)\|\,\Delta t.$$
We rewrite this portion of the path in the limit that $\Delta t \to 0$ and recover the infinitesimal element of
the path,
$$d\ell^2 = \vec v(t)\cdot\vec v(t)\,dt^2 = g_{ij}\,v^i(t)\,v^j(t)\,dt^2.$$
Integrating over all such infinitesimal path elements returns the length of the entire path,
$$\ell = \int_{y=f(x)} d\ell = \int_0^1 dt\,\sqrt{\vec v(t)\cdot\vec v(t)}, \tag{2.10}$$
and the metric tensor is necessary to compute the dot product of the tangent vector with itself.
The formulation can be extended to any smooth parametric curve $\vec a(t)$ in any number of
dimensions.
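As a numerical illustration of (2.10), the sketch below (assuming NumPy and SciPy are available) computes the length of the parabolic arc $y = t^2$, $t \in [0,1]$, for which the tangent vector is $(1, 2t)$ and a closed form exists for comparison.

```python
import numpy as np
from scipy.integrate import quad

# Arc length of the parametric curve a(t) = (t, f(t)) via l = ∫ |a'(t)| dt,
# here for f(t) = t^2 on t in [0, 1]
def speed(t):
    # tangent vector is (1, 2t); its magnitude is sqrt(1 + 4 t^2)
    return np.sqrt(1.0 + 4.0 * t * t)

length, _ = quad(speed, 0.0, 1.0)

# Closed form for comparison: sqrt(5)/2 + arcsinh(2)/4
exact = np.sqrt(5.0) / 2.0 + np.arcsinh(2.0) / 4.0
assert np.isclose(length, exact)
```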
We now consider several example calculations for the length, area and volume in different
coordinates systems.

2.4.6 2-Dimensional Plane Polar Coordinates


There is always more than one way to assign coordinates to a space. It will be useful to be able to
translate between these different coordinate systems. The classic example is 2-dimensional plane
polar coordinates. We call these curvilinear coordinates in 2-dimensions since there exist
functions that translate the coordinate curves in one choice of coordinates to coordinate curves
in another. Here we consider two parameters, the distance of a point from the origin, $r$, and its
angle with the $x$-axis, $\theta$. The parameters $r$ and $\theta$ completely
characterise any point in the plane. We can write transformation equations for this coordinate
system
x = r cos (θ ) and y = r sin (θ ) . (2.11)

These equations can be inverted - thus every point corresponds to an unique pair (r, θ ). It is
understood that we restrict θ to be in some interval of length 2π so that no two values of θ
correspond to the same physical angle. Figure 2.3 demonstrates the standard (x , y ) and (r, θ )
coordinate description of a point in the 2-dimensional plane.
Are these transformation equations well-behaved? Certainly they are invertible - without even
using equations we can see that every point in the plane has a unique polar representation and a
unique Cartesian representation, and thus there is a one-to-one correspondence between them.
The equations are also differentiable. As for the Jacobian Matrix, we shall leave the answer to this
question to the discussion of tangent vectors. It turns out that it is always non-singular (which
comes as no surprise, since we convinced ourselves that they were invertible). Recall that these
are the family of curves formed when we let one coordinate vary while keeping the other constant.
If we vary r while keeping θ constant, we get a ray emanating from the origin at the angle θ
out to infinity. These are the radial coordinate curves depicted in Figure 2.3. These curves are
lines, and thus we see that there is no curvature associated with the r coordinate. If we vary θ
while keeping r constant we get a circle of radius r and centre the origin. These are the circular
coordinate curves depicted in Figure 2.3. Clearly these curves are not straight lines, so there is
curvature associated with the θ coordinate. Fundamental geometry then tells us that the angle
between the two families of curves is always 90◦ . We will see this more formally when we consider
the tangent vectors for polar coordinates.
Notice that the coordinate curves for each coordinate are a family of curves, not just one curve.
When we fix the other parameter, then we get a specific curve in the family. For instance the
θ -curves are the circles centre the origin, but if we specifically look at the curve when r = 5, then

we single out the circle of radius 5. Similarly the r -curves are the rays from the origin, but if we
specifically consider the curve where θ = 90◦ then we single out the positive y -axis.
We often make use of polar coordinates when solving problems involving circular motion or
circular symmetry. Circles are coordinate curves in polar coordinates, so motion on a circle is
described by just one of coordinate. Thus, it is convenient to reason in polar coordinates about
physical systems where there is circular motion or circular symmetry.
It is a very simple matter to obtain the tangent vectors for any coordinate system once we
have written down the transformation equations
$$\vec e_r = \frac{\partial}{\partial r}\begin{pmatrix} x(r,\theta) \\ y(r,\theta) \end{pmatrix} = \frac{\partial}{\partial r}\begin{pmatrix} r\cos(\theta) \\ r\sin(\theta) \end{pmatrix} = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix},$$
$$\vec e_\theta = \frac{\partial}{\partial \theta}\begin{pmatrix} x(r,\theta) \\ y(r,\theta) \end{pmatrix} = \frac{\partial}{\partial \theta}\begin{pmatrix} r\cos(\theta) \\ r\sin(\theta) \end{pmatrix} = r\begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix}.$$
We notice that e~r is a unit vector, and e~θ has length r . So, we can write
$$\hat r = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} \quad\text{and}\quad \hat\theta = \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix},$$
and
$$\vec e_r = \hat r \quad\text{and}\quad \vec e_\theta = r\,\hat\theta.$$
The unit vectors r̂ and θ̂ are often used when solving problems about circular motion. The former
points radially outwards, and the latter points tangentially and counter-clockwise. In Figure 2.13,
notice how the unit vectors are indeed tangent to the corresponding coordinate curves.

Figure 2.13: Unit vectors tangent to coordinate curves in 2-dimensions.

Earlier we remarked that the two classes of coordinate curves in polar coordinates are
orthogonal. This is easy to show by using the tangent vectors
$$\vec e_r \cdot \vec e_\theta = \begin{pmatrix} \cos(\theta) & \sin(\theta) \end{pmatrix}\begin{pmatrix} -r\sin(\theta) \\ r\cos(\theta) \end{pmatrix} = -r\cos(\theta)\sin(\theta) + r\cos(\theta)\sin(\theta) = 0.$$

Thus the unit vectors are orthogonal for all values of r and θ , and hence everywhere in space.
Clearly the two families of coordinate curves are always orthogonal.
We noticed earlier that $\hat r$ is a unit vector, and hence the $r$ coordinate does not curve (or
stretch) space. Indeed the coordinate curves are straight rays. We also saw that $\vec e_\theta$ has a
coefficient of curvature of $r$. Thus $\theta$ is a curved coordinate. This makes sense physically, as $\theta$ is
an angle and its corresponding coordinate curves are circles.

Example 2.11 (Circumference of a Circle) We can determine the circumference of a circle in the
(x , y )-plane by starting with a polar coordinate representation of a position of a point on the circle
using (2.11). In this coordinate system, a point $\vec p(t) = r\cos(\theta)\,\hat x + r\sin(\theta)\,\hat y$ traces a circular path
of radius $r$, centered at the origin of the coordinate system, that is parameterised by $t$. Tangents to this
path are given by the parametric curve $\dot{\vec p}(t)$. Then, it follows
$$\vec p(t) = r\begin{pmatrix} \cos(\theta(t)) \\ \sin(\theta(t)) \end{pmatrix} = r\,\hat r \quad\text{and}\quad \dot{\vec p}(t) = \dot r\begin{pmatrix} \cos(\theta(t)) \\ \sin(\theta(t)) \end{pmatrix} + r\dot\theta\begin{pmatrix} -\sin(\theta(t)) \\ \cos(\theta(t)) \end{pmatrix} = \dot r\,\hat r + r\dot\theta\,\hat\theta.$$

The radial coordinate is constant for motion along the circumference of a circle, so $\dot r = 0$ and we
have immediately
$$\dot{\vec p}(t) = r\dot\theta\,\hat\theta.$$

Next we use (2.10) to integrate the lengths of each tangent vector along the path defined by the
circumference of the circle,
$$\ell = \int_0^1 dt\,\sqrt{\dot{\vec p}(t)\cdot\dot{\vec p}(t)} = \int_0^1 dt\,\sqrt{r^2\dot\theta^2\,\hat\theta\cdot\hat\theta} = r\int_0^1 dt\,\dot\theta = r\int_0^1 dt\,\frac{d\theta}{dt} = r\int_0^{2\pi} d\theta,$$
where the change of coordinates $\theta = 2\pi t$ ensures that a single circuit of the circle occurs in the time
interval $t \in [0,1]$. Then,
$$\ell = r\int_0^{2\pi} d\theta = 2\pi r,$$
which is exactly what we should expect from the standard definition of the circumference of a circle.

Example 2.12 (Area of a disk) We now consider the area element in 2-dimensional Polar
Coordinates. We may obtain this in several ways. Our first method is a purely geometric argument.
Consider the area element enclosed between coordinate curves of infinitesimal distance apart as
depicted in Figure 2.14.

Figure 2.14: The 2-dimensional polar coordinate area element.

The area element is highlighted in Figure 2.14. We notice that because the curves are at 90
degrees to each other, the area element starts to resemble a rectangle as we make its sides very
small (the curvature on the ‘circle’ side flattens out in the limit). Thus we can write the area of the
infinitesimal element as the product of the lengths of the sides. The area element is clearly given by

d A = d r (r d θ ) = r d r d θ .

Notice how the orthogonality of the coordinate curves was required for this argument to work. Had
the curves not been orthogonal, simply multiplying the side lengths in the limit would not give the
correct area element. Notice also that the ‘side lengths’ of our infinitesimal area elements are our
coefficients of curvature. This is no coincidence. With these two points in mind, we can now see that
our geometric argument is in fact analogous to the argument presented in general in the preceding
section, that the Jacobian Determinant can be given as the product of the coefficients of curvature
provided that the tangent vectors are orthogonal. Thus in this case we could immediately write out

d A = (1)(r ) d r d θ = r d r d θ .

The third way of deriving this result is to evaluate the Jacobian determinant directly. This yields the
same answer. As an example of the use of the area element, we will now compute the area of the
disk of radius R ,
$$\text{Area} = \int_S dA = \int_0^R\!\!\int_0^{2\pi} dr\,d\theta\,r = 2\pi\int_0^R dr\,r = \pi R^2.$$

We will see more of this type of integral when we study rigid bodies later in the course.
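The disk integral above is easy to reproduce with a midpoint rule on the $(r, \theta)$ grid; the sketch below (assuming NumPy is available) sums the area element $r\,dr\,d\theta$ over a disk of radius $R = 2$.

```python
import numpy as np

# Midpoint-rule check that Area = ∫∫ r dr dθ gives pi R^2 for a disk
# of radius R = 2, using the polar area element dA = r dr dθ
R, n = 2.0, 400
dr, dtheta = R / n, 2 * np.pi / n
r = (np.arange(n) + 0.5) * dr        # midpoints of the r cells
# Sum r * dr * dθ over the whole grid; the θ sum contributes a factor n
area = r.sum() * dr * n * dtheta
assert np.isclose(area, np.pi * R**2)
```

Since the integrand is linear in $r$ and constant in $\theta$, the midpoint rule here is exact up to rounding.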

2.4.7 2-Dimensional Elliptical Coordinates


An example of a coordinate system in which the tangent vectors are not in general orthogonal is
the elliptic coordinate system (u , φ), in which for some prescribed a and b , the transformation

equations are given by
 
x = a u cos φ and y = b u sin φ .

Here we allow φ to vary between 0 and 2π and u > 0. It should be clear that polar coordinates
are a special case of elliptic coordinates in which a = b = 1 and r = u . We get a stretched polar
coordinate system when a = b 6= 1. The name should leave little surprise that the coordinate
curves in elliptic coordinates are rays and ellipses. This is depicted in Figure 2.15.

Figure 2.15: 2-dimensional elliptic coordinate curves. Notice that this coordinate system has a
distorted area element. It is not in general the same as the product of the coefficients of curvature.
This is because the coordinate curves intersect at an angle, and the infinitesimal unit of area is
now a parallelogram.

It is evident in Figure 2.15 that the curves are not in general orthogonal. This is best seen by
computing the tangent vectors
$$\vec e_u = \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix}, \qquad \vec e_\phi = \begin{pmatrix} -ua\sin(\phi) \\ ub\cos(\phi) \end{pmatrix}.$$

The dot product is
$$\vec e_u \cdot \vec e_\phi = \begin{pmatrix} a\cos(\phi) & b\sin(\phi) \end{pmatrix}\begin{pmatrix} -au\sin(\phi) \\ bu\cos(\phi) \end{pmatrix} = \frac{1}{2}u\left(b^2 - a^2\right)\sin(2\phi).$$

Clearly, this inner product is zero for all $\phi$ when $a = b$. When $a \neq b$, the coordinate tangent
vectors are orthogonal only when $\phi = \pi n$ or $\phi = \frac{\pi}{2} + \pi n$, with $n \in \mathbb{Z}$.

Remark 14 There is no known closed form solution for the circumference of an ellipse in terms of
its semi-axis lengths $a$ and $b$. This quantity must be determined numerically, usually using
computer-based numerical methods that implement the arc-length formula of (2.10).
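The numerical computation mentioned in the remark is straightforward: parameterise the boundary as $x = a\cos t$, $y = b\sin t$ and apply (2.10). The sketch below (assuming NumPy is available) uses a midpoint rule and compares the result against Ramanujan's well-known approximation.

```python
import numpy as np

# Numerical circumference of an ellipse via the arc-length formula,
# parameterising the boundary as x = a cos(t), y = b sin(t)
a, b, n = 2.0, 1.0, 10_000
t = (np.arange(n) + 0.5) * (2 * np.pi / n)
# |tangent| = sqrt(a^2 sin^2 t + b^2 cos^2 t)
speed = np.sqrt(a**2 * np.sin(t)**2 + b**2 * np.cos(t)**2)
circumference = speed.sum() * (2 * np.pi / n)

# Ramanujan's approximation as a sanity check
approx = np.pi * (3 * (a + b) - np.sqrt((3 * a + b) * (a + 3 * b)))
assert abs(circumference - approx) < 1e-3
```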

Example 2.13 (Area of an Ellipse in 2-Dimensional Elliptic Coordinates) The coordinate grid
in Figure 2.15 is distorted in a way that distinguishes it from the polar coordinate grid in
Figure 2.3. The coordinate curves intersect such that the dot product between tangent vectors
along the coordinate curves varies along the path of each coordinate curve. Clearly $\vec e_u \cdot \vec e_\phi$ is only
zero across all values of $u$, $\phi$ if we set $a = b$. For all other elliptical systems, the coordinate curves
are not generally orthogonal. There will be places where the coordinate curves are orthogonal. (Can
you see this geometrically? Can you obtain these positions algebraically?) This means that the
coefficients of curvature cannot be directly used in this case to obtain the area element. Instead we
must compute the determinant of the Jacobian matrix directly,
$$\det\begin{pmatrix} \vec e_u & \vec e_\phi \end{pmatrix} = \det\begin{pmatrix} a\cos(\phi) & -ua\sin(\phi) \\ b\sin(\phi) & ub\cos(\phi) \end{pmatrix} = abu\cos^2(\phi) + abu\sin^2(\phi) = abu.$$

So the area element in elliptic coordinates is $dA = abu\,du\,d\phi$. We compute the area of an ellipse
with an $x$-radius of $a$ and a $y$-radius of $b$:

$$\text{Area} = \int_S dA = \int_0^1\!\!\int_0^{2\pi} du\,d\phi\,abu = 2\pi ab\int_0^1 du\,u = \pi ab.$$

Notice that while this is easy to compute by other methods, with elliptic coordinates we could
just as easily compute the area of an elliptical segment between two specified angles:
$$\text{Area} = \int_S dA = \int_0^1 du\int_\alpha^\beta d\phi\,abu = ab\int_0^1 du\,u\int_\alpha^\beta d\phi = \frac{\beta-\alpha}{2}\,ab.$$

We next consider some examples in 3-dimensions.

2.4.8 3-Dimensional Polar Cylindrical Coordinate


The 3-dimensional Polar Cylindrical Coordinate system uses three coordinates (φ, ρ, z ) as shown
in Figure 2.16. The coordinate φ defines an angle between the x -axis and some plane of interest
(which passes through the z -axis) and is commonly referred to as the azimuthal angle or azimuthal
coordinate. The coordinate ρ then gives the perpendicular distance from the z-axis, measured in that plane. Finally,
the coordinate z gives the height of the point of interest.

Using these definitions and Figure 2.16, we can write down the transformation equations
 
x = ρ cos φ , y = ρ sin φ , z = z,

where it is understood that ρ ≥ 0, 0 ≤ φ ≤ 2π. We can think of polar cylindrical coordinates as a


projection of 2-dimensional polar coordinates into 3-dimensions by the addition of the z

Figure 2.16: 3-dimensional polar cylindrical coordinate unit vectors.

coordinate. These transformation equations are clearly well-behaved. Now consider the
coordinate curves for cylindrical coordinates. If we keep both ρ and z fixed and vary φ, we get a
circle of an arbitrary radius centred at an arbitrary point on the z -axis. This is the φ-coordinate
curve. If we keep both φ and z fixed and allow ρ to vary, we get a ray emanating from an arbitrary
point on the z -axis in some arbitrary direction in a plane parallel to the x − y plane. This is the ρ
- coordinate curve. If we keep both ρ and φ constant and allow z to vary we get a vertical line
passing through an arbitrary point in the x − y plane specified by our choice of ρ and φ. This is
the z -coordinate curve.

Remark 15 For each of the ‘coordinate curves’ above, we’ve described a member of a family of
curves. The coordinate curves for a coordinate are always a family of curves.

Now we consider the coordinate surfaces for cylindrical coordinates. If we keep ρ fixed and
allow φ and z to vary, then we get a cylinder of radius ρ and infinite height centred on the z -axis.
For different values of ρ we will get cylinders of different radius, and this family of surfaces is the
family of coordinate surfaces associated with varying φ and z . We can call it the φ, z family of
coordinate surfaces. This coordinate surface is what gives cylindrical coordinates their name.
Keeping φ fixed and varying the other two parameters produces a half-plane with one side along
the z -axis. Similarly, keeping z fixed and varying the other two parameters produces a plane
parallel to the $x$-$y$ plane.
When motion or symmetry exists along one of these coordinate surfaces or coordinate curves,
then cylindrical coordinates will be a good choice of coordinates for the problem. Notice that the
coordinate surfaces that were planes aren’t interesting to us as we can consider planar motion
using one of our 2-dimensional coordinate systems. Thus the cylinder is the truly interesting

object here. When our system is constrained to move in a spiral or some other motion on the
surface of a cylinder, then Polar Cylindrical Coordinates may be a good choice.
As in any other coordinate system, we can obtain the tangent vectors directly from the
transformation equations
$$\vec e_\rho = \begin{pmatrix} \cos(\phi) & \sin(\phi) & 0 \end{pmatrix}^\top = \hat\rho,$$
$$\vec e_\phi = \begin{pmatrix} -\rho\sin(\phi) & \rho\cos(\phi) & 0 \end{pmatrix}^\top = \rho\,\hat\phi,$$
$$\vec e_z = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}^\top = \hat z.$$

It is also relatively immediate what the sizes of these vectors are, and hence their relationships to
the corresponding unit vectors. We notice in particular that the only coordinate with curvature is
φ. This makes sense as it is the only coordinate for which the coordinate curve was not a line or ray.
The unit vectors in cylindrical coordinates are illustrated in Figure 2.16. As per usual these vectors
are tangential to the coordinate curves and in the direction of increase of their corresponding
coordinates. (ẑ points upwards, ρ̂ points outwards, φ̂ points counter-clockwise).
Figure 2.16 suggests that the tangent vectors are orthogonal. Indeed, this should be quite clear
from thinking of cylindrical coordinates as polar coordinates with an additional coordinate z .
Formally, we can compute the pair-wise inner products between the tangent vectors
 
$$\vec e_\rho \cdot \vec e_\phi = \begin{pmatrix} \cos(\phi) & \sin(\phi) & 0 \end{pmatrix}\begin{pmatrix} -\rho\sin(\phi) \\ \rho\cos(\phi) \\ 0 \end{pmatrix} = 0,$$
$$\vec e_\rho \cdot \vec e_z = \begin{pmatrix} \cos(\phi) & \sin(\phi) & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = 0,$$
$$\vec e_z \cdot \vec e_\phi = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} -\rho\sin(\phi) \\ \rho\cos(\phi) \\ 0 \end{pmatrix} = 0.$$
Thus indeed we see that the tangent vectors are orthogonal at all points in space. The families of
coordinate curves are then orthogonal at all points in space.

Example 2.14 (Volume of a Cylinder in 3-Dimensional Polar Cylindrical Coordinates) In
3-dimensional polar cylindrical coordinates we consider three coordinates $(\phi, \rho, z)$; see
Figure 2.16. The orthogonality of the tangent vectors enables us to write down the volume element
directly in terms of a product of the coefficients of curvature,
$$dV = (\rho\,d\phi)(1\,d\rho)(1\,dz) = \rho\,d\phi\,d\rho\,dz.$$

As a simple application we compute the volume of a cylinder of radius R and height H


$$\text{Volume} = \int_{\text{cylinder}} dV = \int_0^H\!\!\int_0^R\!\!\int_0^{2\pi} dz\,d\rho\,d\phi\,\rho = \int_0^H dz\int_0^R d\rho\,\rho\int_0^{2\pi} d\phi = \pi R^2 H.$$

Figure 2.17: The volume element in cylindrical coordinates.
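The volume integral of Example 2.14 can be checked with a midpoint rule in $\rho$; the sketch below (assuming NumPy is available) sums the volume element $\rho\,d\phi\,d\rho\,dz$ for $R = 1.5$ and $H = 2$.

```python
import numpy as np

# Midpoint-rule check of Volume = ∫∫∫ rho drho dphi dz
# for a cylinder of radius R = 1.5 and height H = 2
R, H, n = 1.5, 2.0, 100
drho, dphi, dz = R / n, 2 * np.pi / n, H / n
rho = (np.arange(n) + 0.5) * drho    # midpoints of the rho cells

# phi and z contribute only their total ranges; the integrand is rho
volume = rho.sum() * drho * (n * dphi) * (n * dz)
assert np.isclose(volume, np.pi * R**2 * H)
```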

2.4.9 3-Dimensional Polar Spherical Coordinates


In 3-dimensional polar spherical coordinates we consider coordinates $(\rho, \theta, \phi)$, where $\phi$ is once again the
angle between the $x$-axis and the plane that the point of interest makes with the $z$-axis, and is again
called the azimuthal angle or azimuthal coordinate; $\theta$ is the angle within this plane between the
$z$-axis and the point of interest, and is called the polar angle or polar coordinate. The coordinate
$\rho$ in this case is the distance from the origin to the point of interest and is commonly called the
radial position or radial coordinate. Note in particular that $\rho$ for spherical coordinates is different
from $\rho$ for cylindrical coordinates. See Figure 2.18.

As before, we shall relate this spherical coordinate system to other coordinate systems. From
the Figure 2.18 and some elementary trigonometry, we arrive at the following transformation
equations for spherical coordinates
 
x = ρ sin (θ ) cos φ , y = ρ sin (θ ) sin φ , and z = ρ cos (θ ) ,

where it is understood that ρ ≥ 0, 0 ≤ φ ≤ 2π and 0 ≤ θ ≤ π. (Challenge question, why do we not


allow θ to vary all the way to 2π?) Notice that as we now have two angles we expect both of them
to have a curvature.
If we keep $\theta$ and $\phi$ constant and allow $\rho$ to vary, we obtain a ray from the origin extending
to infinity. This is a $\rho$-coordinate curve for spherical coordinates. If we keep $\rho$ and $\theta$ constant
and allow $\phi$ to vary, we get a circle centred somewhere on the $z$-axis and in a plane parallel to the

Figure 2.18: 3-Dimensional Polar Spherical Coordinate System.

z = 0 plane. This is a φ-coordinate curve. If we keep φ and ρ constant and allow θ to vary we
obtain a semi-circle centred at the origin and with diameter on the z -axis (radius ρ, in a plane at
angle φ with the x -axis). This is a θ -coordinate curve.
If we keep $\phi$ constant and allow the other two parameters to vary, we obtain a half-plane
with one edge along the $z$-axis. If we keep $\rho$ constant and allow the other two parameters to vary, we
obtain a sphere centred at the origin. If we keep $\theta$ constant and allow the other two parameters
to vary, we obtain a cone with its apex at the origin and its axis of symmetry along the z -axis.
These last two coordinate surfaces are of interest to us in problem solving. The sphere gives the
coordinates the name Spherical Coordinates. But unlike with Cylindrical Coordinates there is a
second non-trivial coordinate surface - the Cone. These coordinates are thus also suitable for
motion that is constrained to a cone, or for conical spirals etc, as well as problems with conical
symmetry.
As always we obtain the tangent vectors by taking partial derivatives of the transformation
equations
   
$$\vec e_\rho = \frac{\partial}{\partial\rho}\begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix} = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix} = \hat\rho,$$
$$\vec e_\theta = \frac{\partial}{\partial\theta}\begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix} = \begin{pmatrix} \rho\cos(\theta)\cos(\phi) \\ \rho\cos(\theta)\sin(\phi) \\ -\rho\sin(\theta) \end{pmatrix} = \rho\begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} = \rho\,\hat\theta,$$
$$\vec e_\phi = \frac{\partial}{\partial\phi}\begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix} = \begin{pmatrix} -\rho\sin(\theta)\sin(\phi) \\ \rho\sin(\theta)\cos(\phi) \\ 0 \end{pmatrix} = \rho\sin(\theta)\begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix} = \rho\sin(\theta)\,\hat\phi.$$

Here we see the coefficients of curvature are non-trivial for both θ and φ, as expected.
We wish to compute the angles between the tangent vectors. Before we begin we observe that
the corresponding unit vectors have the same direction as the tangent vectors and are easier to
work with. So we compute the inner products between the unit vectors
 
€ Š cos (θ ) cos φ
ρ̂ > θ̂ = sin (θ ) cos φ sin (θ ) sin φ cos (θ )  cos (θ ) sin φ 
  

− sin (θ )
 
= sin (θ ) cos (θ ) cos2 φ + sin2 φ − cos (θ ) sin (θ )
= sin (θ ) cos (θ ) − cos (θ ) sin (θ )
= 0.
 
€ Š − sin φ
ρ̂ > φ̂ = sin (θ ) cos φ sin (θ ) sin φ cos (θ )  cos φ 
  

0
   
= sin (θ ) − sin φ cos φ + sin φ cos φ
= 0.
 
€ Š − sin φ 
θ̂ φ̂ = cos (θ ) cos φ cos (θ ) sin φ − sin (θ )  cos φ 
>
 

0
   
= cos (θ ) − cos φ sin φ + sin φ cos φ
= 0.

The tangent vectors are orthogonal.

Example 2.15 (Volume of a Sphere in 3-Dimensional Polar Spherical Coordinates) The
orthogonality of the tangent vectors enables us to think of the infinitesimal volume element as a
rectangular box (as opposed to a parallelepiped), and most importantly to write the volume element
in terms of the product of the coefficients of curvature,
$$dV = \rho^2\sin(\theta)\,d\rho\,d\theta\,d\phi.$$

As a simple application we compute the volume of a sphere of radius R

Z Z2πZπ ZR Zπ
1 4
Volume = dV = d ρ d θ d φ ρ 2 sin (θ ) = 2πR 3 d θ sin (θ ) = πR 3 .
Sphere
3 3
0 0 0 0
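As a sanity check on the volume element, a simple midpoint-rule integration in Python (an illustrative sketch, not part of the course software) reproduces the volume of the unit ball; the φ integral contributes only a factor of 2π since the integrand does not depend on φ:

```python
import math

# Midpoint-rule approximation of the integral of rho^2 sin(theta)
# over a ball of radius R; the phi integral contributes a factor 2*pi.
R, n = 1.0, 60
drho, dtheta = R / n, math.pi / n
volume = 0.0
for i in range(n):
    rho = (i + 0.5) * drho
    for j in range(n):
        theta = (j + 0.5) * dtheta
        volume += rho**2 * math.sin(theta) * drho * dtheta * (2 * math.pi)

exact = 4 * math.pi * R**3 / 3
print(volume, exact)  # the two values agree to a few parts in 10^4
```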

2.4.10 Other Coordinate Systems


It is impossible to provide an exhaustive list of all possible coordinate systems and their
associated transformation equations. There are literally an infinite number of well behaved

Figure 2.19: The volume element in spherical polar coordinates.

transformation equations and consequent coordinate systems. Below are some examples of
coordinate transformation equations slightly off the beaten track. For each of these coordinate
systems, it is left to the reader to perform the following steps,

1. Identify the correct ranges for the coordinates.

2. Identify what part of the x − y plane or x − y − z space is being mapped (if not the whole
thing).

3. Identify coordinate curves/surfaces.

4. Find tangent vectors and cotangent vectors.

5. Investigate orthogonality of the tangent vectors and other properties.

6. Find area/volume elements.

Try using the Mathematica instruction

ParametricPlot[{formula for x, formula for y}, {x, xmin, xmax}, {y, ymin, ymax}]

to view the region covered by the coordinates.

1. Canonical Hyperbolic Coordinates in 2-dimensions,

x = r (cosh (t ) − 1) and y = sinh (t ) ,

where r, t ∈ R are varying.

2. Modified Hyperbolic Coordinates in 2-dimensions,

x = a r (cosh (t ) − 1) and y = b sinh (t ) ,

where a , b > 0 are fixed.

3. Ellipsoid Coordinates in 3-dimensions,


 
x = a ρ sin (θ ) cos φ , y = b ρ sin (θ ) sin φ and z = c ρ cos (θ )

where a , b , c > 0 are fixed.
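As a sketch of step 4 for the ellipsoid coordinates, the following Python snippet (with illustrative values a = 2, b = 3, c = 5, and helper names of our choosing) approximates the Jacobian determinant by central differences and compares it with the expected volume element a b c ρ² sin(θ):

```python
import math

def ellipsoid_map(rho, theta, phi, a=2.0, b=3.0, c=5.0):
    """Ellipsoid coordinates with fixed (illustrative) scale factors a, b, c."""
    return (a * rho * math.sin(theta) * math.cos(phi),
            b * rho * math.sin(theta) * math.sin(phi),
            c * rho * math.cos(theta))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def jacobian_det(rho, theta, phi, h=1e-6):
    """Determinant of the Jacobian at (rho, theta, phi): the columns are the
    tangent vectors, approximated here by central differences."""
    cols = []
    for k in range(3):
        u_plus, u_minus = [rho, theta, phi], [rho, theta, phi]
        u_plus[k] += h
        u_minus[k] -= h
        p, m = ellipsoid_map(*u_plus), ellipsoid_map(*u_minus)
        cols.append(tuple((pi - mi) / (2 * h) for pi, mi in zip(p, m)))
    # Scalar triple product of the three tangent (column) vectors.
    return sum(a * b for a, b in zip(cols[0], cross(cols[1], cols[2])))

# The volume element should be a*b*c*rho^2*sin(theta) drho dtheta dphi.
rho, theta, phi = 1.2, 0.8, 0.4
jac = jacobian_det(rho, theta, phi)
expected = 2.0 * 3.0 * 5.0 * rho**2 * math.sin(theta)
print(jac, expected)
```

Setting a = b = c = 1 recovers the spherical polar volume element from Example 2.15.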

2.5 Coordinate Transformations


Recall that a scalar field assigns a value to each point in some domain. A vector field is a function
that assigns a vector to every point in some domain. Formally,

V : S → T,

where S and T are open subsets of vector spaces. To each x⃗ ∈ S, we assign y⃗ ∈ T according to
the rule y⃗ = V(x⃗). When we associate some coordinate system with S and T, then we can think
of V as associated with the function that takes the coordinates of x⃗ and maps them onto the
coordinates of y⃗,
\[
\vec{V}\left(x^1, x^2, \ldots, x^n\right) =
\begin{pmatrix}
y^1\left(x^1, x^2, \ldots, x^n\right) \\
y^2\left(x^1, x^2, \ldots, x^n\right) \\
\vdots \\
y^m\left(x^1, x^2, \ldots, x^n\right)
\end{pmatrix}.
\]
One way of visualizing a vector field is by attaching a small vector to every point in space. So at
the x~ point we would ‘draw’ the vector V~ ( x~ ). Consider the following vector fields,
\[
\vec{V}(x, y) = \begin{pmatrix} y^1(x, y) \\ y^2(x, y) \end{pmatrix}
= \begin{pmatrix} 2 \\ -1 \end{pmatrix},
\tag{2.12}
\]
which is simply the constant vector field that assigns a single fixed vector value to each point in
R². A more complicated example is
\[
\vec{V}(x, y) = \begin{pmatrix} y^1(x, y) \\ y^2(x, y) \end{pmatrix}
= \begin{pmatrix} \frac{y}{\sqrt{x^2 + y^2}} \\[4pt] -\frac{x}{\sqrt{x^2 + y^2}} \end{pmatrix},
\tag{2.13}
\]
which is an example of a unit (normalised) vector field in 2-dimensions. Similarly,


\[
\vec{V}(x, y, z) = \begin{pmatrix} y^1(x, y, z) \\ y^2(x, y, z) \\ y^3(x, y, z) \end{pmatrix}
= \begin{pmatrix}
\frac{-2y}{\sqrt{x^4 + 4y^2 + z^2}} \\[4pt]
\frac{x^2}{\sqrt{x^4 + 4y^2 + z^2}} \\[4pt]
\frac{-z}{\sqrt{x^4 + 4y^2 + z^2}}
\end{pmatrix},
\tag{2.14}
\]
is an example of a unit (normalised) vector field in 3-dimensions. Graphical representations of the
vector fields in (2.12), (2.13) and (2.14) are depicted in Figure 2.20, Figure 2.21 and Figure 2.22,
respectively.
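The claim that (2.14) is a unit field can be checked directly: the squares of the components sum to (4y² + x⁴ + z²)/(x⁴ + 4y² + z²) = 1 at every point away from the origin. A quick Python check at a sample point (the function name `field_2_14` is ours):

```python
import math

def field_2_14(x, y, z):
    """The 3-dimensional vector field of equation (2.14)."""
    n = math.sqrt(x**4 + 4 * y**2 + z**2)
    return (-2 * y / n, x**2 / n, -z / n)

# The norm is exactly 1 since (-2y)^2 + (x^2)^2 + (-z)^2 = x^4 + 4y^2 + z^2.
v = field_2_14(1.5, -0.3, 2.0)
norm = math.sqrt(sum(c * c for c in v))
print(norm)
```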

Figure 2.20: The constant 2-Dimensional vector field defined in (2.12).

Figure 2.21: The 2-Dimensional normalised vector field defined in (2.13).

In all three examples above, we have implicitly made the assumption that both the original
space and the target space are parameterized with Cartesian coordinates. This enabled us to

Figure 2.22: The 3-Dimensional normalised vector field defined in (2.14). The different arrow
colours correspond to different z-values in 3-dimensional space.

write out equations for our vector fields taking coordinates as arguments and returning vectors
in coordinate representation. Other examples of vector fields include the flow of a fluid in some
container (at each point the fluid has a vector direction/velocity of flow), the electric field around
some charge, and the ‘wind velocity’ field often shown on weather reports.
Recall that when transforming scalar fields we transformed what we put in as an argument to the
function. For vector fields we must also transform what comes out of the function. Depending on
whether we are considering covariant or contravariant components, this will transform differently.
Let us begin by considering the first example above,
\[
\vec{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}.
\]
We can think of this as two scalar components,
\[
f_x(x, y) = x \quad \text{and} \quad f_y(x, y) = y.
\]

We know that these are both covariant and contravariant components for f in Cartesian
coordinates. Let us now convert this to polar coordinates. First we transform the scalar
components. This part is familiar from transforming scalar functions,
\[
f_x(r, \theta) = f_x\big(x(r, \theta), y(r, \theta)\big) = f_x\big(r\cos(\theta), r\sin(\theta)\big) = r\cos(\theta),
\]
\[
f_y(r, \theta) = f_y\big(x(r, \theta), y(r, \theta)\big) = f_y\big(r\cos(\theta), r\sin(\theta)\big) = r\sin(\theta).
\]

But the components are still Cartesian. To convert these to polar coordinates we have to decide
whether we wish to consider the covariant components or the contravariant components. We
will generally be interested in the covariant components

\[
f_r = \vec{e}_r \cdot \vec{f}
= \big(\cos(\theta), \sin(\theta)\big) \cdot \big(r\cos(\theta), r\sin(\theta)\big)
= r\cos^2(\theta) + r\sin^2(\theta)
= r,
\]
and
\[
f_\theta = \vec{e}_\theta \cdot \vec{f}
= \big(-r\sin(\theta), r\cos(\theta)\big) \cdot \big(r\cos(\theta), r\sin(\theta)\big)
= -r^2\sin(\theta)\cos(\theta) + r^2\sin(\theta)\cos(\theta)
= 0.
\]

As we expect from the shape of the vector field, the tangential component is zero.
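These covariant components can also be verified numerically. The Python sketch below (the helper name `covariant_polar` is ours) dots the Cartesian components of f⃗(x, y) = (x, y) with the polar tangent vectors at a sample point, recovering f_r = r and f_θ = 0:

```python
import math

def covariant_polar(fx, fy, r, theta):
    """Covariant components f_r = e_r . f and f_theta = e_theta . f of a
    field with Cartesian components (fx, fy), evaluated at (r, theta)."""
    f_r = math.cos(theta) * fx + math.sin(theta) * fy
    f_t = -r * math.sin(theta) * fx + r * math.cos(theta) * fy
    return f_r, f_t

# The field f(x, y) = (x, y) at a sample point: radial part r, tangential part 0.
r, theta = 1.7, 0.6
fx, fy = r * math.cos(theta), r * math.sin(theta)
f_r, f_t = covariant_polar(fx, fy, r, theta)
print(f_r, f_t)
```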

2.6 Exercises
Exercise 2.1 Show that the exchange of a millimeter ruler with an inch ruler corresponds to
applying an affine transformation to the space whose original metric is given in millimeters.

Exercise 2.2 Embed T^2 into R^3 and then give an explicit realization of a coordinate
mapping from a rectangular subset U ⊂ R^2 onto T^2. Show that this mapping is many-to-one
outside of U.

Exercise 2.3 Show by direct construction that the RNC provide a many-to-one mapping between
R^2 and S^2 and then define a subset of R^2 where the RNC provide a one-to-one mapping from R^2
onto S^2. Embed S^2 into R^3 and then give an explicit coordinatization of S^2 for an appropriate
transformation of Riemann normal coordinates.

Exercise 2.4 Show that the spherical polar coordinates in R3 coincide with the RNC on S 2 , centered
at the origin, when the radial coordinate ρ = 1 is fixed. Find the relations between the RNC
coordinates and the angular variables in the spherical polar coordinates.

Exercise 2.5 Use the standard definition of the vector dot product in N-dimensional Euclidean
space in rectilinear coordinates to prove that
\[
\vec{a} \cdot \vec{b} = \sum_{i=1}^{N} a_i b_i.
\]

Exercise 2.6 Consider a marked point p ∈ R^3. Let α, β and γ denote the angles subtended at
the origin by the vector p⃗ and each of the coordinate axes x̂, ŷ and ẑ. Show that
\[
\cos^2(\alpha) + \cos^2(\beta) + \cos^2(\gamma) = 1.
\]
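A quick numerical check (not a proof) of this identity at a sample point: the direction cosines are just the components of p⃗ divided by its length, so their squares sum to 1 by construction.

```python
import math

# Direction cosines of an arbitrary nonzero point p relative to the axes.
p = (3.0, -2.0, 6.0)
norm = math.sqrt(sum(c * c for c in p))  # here norm = 7
cos_a, cos_b, cos_g = (c / norm for c in p)
total = cos_a**2 + cos_b**2 + cos_g**2
print(total)
```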

Exercise 2.7 Compute the cotangent vectors in elliptic coordinates directly and then by inverting
the Jacobian matrix. Make sure you get the same results. Verify that they are mutually orthogonal
with the tangent vectors.

Exercise 2.8 Show that the cotangent vectors in an arbitrary linear system are the rows of the
inverse transformation matrix associated with that system (hint: V^{-1}V = I). Clearly these do not
coincide with the tangent vectors in general.

Exercise 2.9 Show that the matrix of transformation between two orthonormal coordinate systems
is orthogonal.

Exercise 2.10 Verify that the area of the parallelogram whose adjacent edges are given by the vectors
a⃗ and b⃗ is
\[
A = \|\vec{a}\| \, \|\vec{b}\| \sin(\theta),
\]
where θ is the smallest angle between a⃗ and b⃗.

Exercise 2.11 Suppose A is a linear transformation (matrix) and the vector r⃗ transforms to a vector
r⃗′ by
\[
\vec{r}\,' = A \vec{r}.
\]
Show that this corresponds to a transformation of the coordinate axes given by
\[
\vec{X}' = A^{-1} \vec{X}.
\]

Exercise 2.12 Consider the points a = (a 1 , a 2 ) and b = (b1 , b2 ) in R2 , in x y -coordinates, and the
displacement vector
v~ = b − a

joining a to b . Answer the following questions. (Hint: Passive (coordinate axis) transformations
act opposite to active (point coordinate) transformations.)

1. Construct the metric tensor g in this coordinate system. (Hint: It is the 2-dimensional identity
matrix.)

2. Use the metric tensor g to compute the length of v⃗. (Hint: Use matrix multiplication.)

3. Shift the origin in the xy-coordinate system by a constant vector s⃗ = α x̂ + β ŷ, for fixed α and
β, to define new x′- and y′-coordinate axes.

   a) compute the positions of a and b with respect to the x′y′-coordinate system.
   b) determine v⃗ with respect to the x′y′-coordinate system.
   c) compute the metric g′ with respect to the x′y′-coordinate system.
   d) Use the metric tensor g′ to compute the length of v⃗ in the x′y′-coordinate system.
   e) Do the points a and b change when changing from the xy-coordinate system to the
      x′y′-coordinate system?
   f ) Do the points a and b have different descriptions in the xy- and x′y′-coordinate
      systems?
   g) Does the length of v⃗ differ in the xy- and x′y′-coordinate systems?

4. Define a rotation matrix that will rotate the x- and y-coordinate axes, through some fixed
angle θ, to define the x″- and y″-coordinate axes. (Hint: Use matrix multiplication to get
the new point and vector component values.)

   a) compute the positions of a and b with respect to the x″y″-coordinate system.
   b) determine v⃗ with respect to the x″y″-coordinate system.
   c) compute the metric g″ with respect to the x″y″-coordinate system.
   d) Use the metric tensor g″ to compute the length of v⃗ in the x″y″-coordinate system.
   e) Do the points a and b change when changing from the xy-coordinate system to the
      x″y″-coordinate system?
   f ) Do the points a and b have different descriptions in the xy- and x″y″-coordinate
      systems?
   g) Does the length of v⃗ differ in the xy- and x″y″-coordinate systems?

5. Rescale the x- and y-coordinate axes by some fixed factor K to define new x‴- and y‴-
coordinate axes.

   a) compute the positions of a and b with respect to the x‴y‴-coordinate system.
   b) determine v⃗ with respect to the x‴y‴-coordinate system.
   c) compute the metric g‴ with respect to the x‴y‴-coordinate system.
   d) Use the metric tensor g‴ to compute the length of v⃗ in the x‴y‴-coordinate system.
   e) Do the points a and b change when changing from the xy-coordinate system to the
      x‴y‴-coordinate system?
   f ) Do the points a and b have different descriptions in the xy- and x‴y‴-coordinate
      systems?
   g) Does the length of v⃗ differ in the xy- and x‴y‴-coordinate systems?

6. Use the forms of the metric tensors and the vector v⃗ in each of the different coordinate systems
to explain why the length of v⃗ is identical in each coordinate system.

Exercise 2.13 Suppose a particle moves along the surface of a ball with position, in the standard
rectilinear xyz-coordinate system, given by
\[
\vec{p} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Answer the following questions.

1. Rewrite the position of the particle with respect to the corresponding x̂ , ŷ and ẑ unit vectors.

2. Show, by direct calculation, that the tangents to the x-, y- and z-coordinate curves are
\[
\vec{e}_x = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
\vec{e}_y = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad
\vec{e}_z = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

What do these tangent vectors tell us? Are these tangent vectors also unit vectors?

3. Show, by direct calculation, that the metric tensor in the xyz-coordinate system is
\[
g = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

4. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that

x (t ) = 0, y (t ) = 0 and z (t ) = 1 − 2t .

Describe the motion of the particle along this path. Show, by direct calculation, that this path
has length ℓ = 2. Does this make sense? Explain your answer.

5. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that
\[
x(t) = \frac{2t}{t^2 + 1}, \quad y(t) = 0 \quad \text{and} \quad z(t) = \frac{t^2 - 1}{t^2 + 1}.
\]
Describe the motion of the particle along this path. Show, by direct calculation, that this path
has length ℓ = π/2. Does this make sense? Explain your answer.
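A numerical check of the length claimed in item 5 (the path traces a quarter of the unit circle in the x-z plane, since x² + z² = 1) can be made by summing chord lengths over a fine partition; a Python sketch:

```python
import math

def p(t):
    """The path x = 2t/(t^2+1), y = 0, z = (t^2-1)/(t^2+1); it traces a
    quarter of the unit circle in the x-z plane as t runs from 0 to 1."""
    d = t * t + 1
    return (2 * t / d, 0.0, (t * t - 1) / d)

# Approximate the arc length by the total length of a fine inscribed polyline.
n = 20000
length = 0.0
prev = p(0.0)
for k in range(1, n + 1):
    cur = p(k / n)
    length += math.dist(prev, cur)
    prev = cur
print(length)  # close to pi/2
```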

Exercise 2.14 Suppose a particle moves along the surface of a ball with position, in the standard
rectilinear xyz-coordinate system, given by
\[
\vec{p} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix},
\]
where ρ, θ and φ are the radial, polar- and azimuthal-angle positions of the particle on the sphere.
Answer the following questions.

1. Show, by direct calculation, that the tangents to the ρ-, θ- and φ-coordinate curves are
\[
\vec{e}_\rho = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix}, \quad
\vec{e}_\theta = \rho \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} \quad \text{and} \quad
\vec{e}_\phi = \rho\sin(\theta) \begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix}.
\]

2. Show, by direct calculation, that the radial, polar- and azimuthal-unit vectors are
\[
\hat{\rho} = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix}, \quad
\hat{\theta} = \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} \quad \text{and} \quad
\hat{\phi} = \begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix}.
\]

3. Rewrite the position of the particle with respect to the ρ̂, θ̂ and φ̂.

4. Re-interpret p⃗(ρ, θ, φ) as a column vector with coordinate components ρ, θ and φ. Write p⃗ as
a column vector in the ρθφ-coordinate system. Compare this column vector representation
of p⃗ with that in the xyz-coordinate system and explain why
\[
\vec{p} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \rho \\ \theta \\ \phi \end{pmatrix}
\quad \text{but} \quad x \neq \rho, \; y \neq \theta \; \text{and} \; z \neq \phi
\]
is a consistent statement.

5. Show that the metric tensor in the ρθφ-coordinate system is
\[
g = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \rho^2 & 0 \\ 0 & 0 & \rho^2\sin^2(\theta) \end{pmatrix}.
\]

6. Why is the metric tensor in the xyz-coordinate system different from that in the ρθφ-
coordinate system?

7. Suppose that ρ = ρ(t), θ = θ(t) and φ = φ(t) and show that the velocity of the particle is
\[
\dot{\vec{p}}(t) = \dot{\rho}(t)\,\hat{\rho} + \rho(t)\dot{\theta}(t)\,\hat{\theta} + \rho(t)\sin(\theta)\dot{\phi}(t)\,\hat{\phi},
\]
and give an interpretation for each of the quantities ρ̇(t), ρ(t)θ̇(t) and ρ(t) sin(θ) φ̇(t).

8. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that
\[
\rho(t) = 1, \quad \theta(t) = \frac{\pi}{2} \quad \text{and} \quad \phi(t) = 8t.
\]
Describe the motion of the particle in 3 dimensions, as it travels along this path. Show, by
direct calculation, that this path has length ℓ = 8. Does this make sense? Explain your answer.

9. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that
\[
\rho(t) = 1, \quad \theta(t) = \pi t \quad \text{and} \quad \phi(t) = 0.
\]
Describe the motion of the particle in 3 dimensions, as it travels along this path. Show, by
direct calculation, that this path has length ℓ = π.
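The two path lengths in items 8 and 9 can be checked against the ρθφ metric of item 5: the speed along a path is √(ρ̇² + ρ²θ̇² + ρ²sin²(θ)φ̇²). A Python sketch (helper names and the midpoint-rule integrator are ours):

```python
import math

def speed(rho, theta, drho, dtheta, dphi):
    """|p'(t)| from the spherical metric
    ds^2 = drho^2 + rho^2 dtheta^2 + rho^2 sin^2(theta) dphi^2."""
    return math.sqrt(drho**2 + rho**2 * dtheta**2
                     + rho**2 * math.sin(theta)**2 * dphi**2)

def path_length(path, n=20000):
    """Midpoint-rule integral of the speed over t in [0, 1]; `path` returns
    (rho, theta, drho/dt, dtheta/dt, dphi/dt) at time t."""
    h = 1.0 / n
    return sum(speed(*path((k + 0.5) * h)) * h for k in range(n))

# Item 8: rho = 1, theta = pi/2, phi = 8t (four laps around the equator).
equator = lambda t: (1.0, math.pi / 2, 0.0, 0.0, 8.0)
# Item 9: rho = 1, theta = pi*t, phi = 0 (pole to pole along a meridian).
meridian = lambda t: (1.0, math.pi * t, 0.0, math.pi, 0.0)

L1 = path_length(equator)
L2 = path_length(meridian)
print(L1, L2)
```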

Exercise 2.15 Show that the area of the outer curved part of a hollow paper cylinder of radius R
and height h in 3-dimensions is
A = 2πR h .

Hint: use the following procedure

1. use the cylindrical coordinate system to determine the vectors tangent to the surface of the
cylinder. Suppose that the base of the cylinder corresponds to z = 0.

2. determine the area element by computing the cross product of the vectors tangent to the curved
part of the cylinder.

3. integrate the area element over the curved surface of the cylinder.

4. cut the cylinder along its edge and press it flat to form a rectangle, then measure the lengths
of each side of this rectangle, compute its area and compare this answer with that obtained by
performing the integral. (Should these answers coincide?)
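Steps 1-3 above can be sketched in Python (illustrative values R = 1.5, h = 4; helper names are ours). The cross product of the tangent vectors has constant magnitude R, so the surface integral collapses to R · 2π · h:

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

R, h = 1.5, 4.0
phi = 0.9  # arbitrary sample point on the curved surface
e_phi = (-R * math.sin(phi), R * math.cos(phi), 0.0)  # tangent around the rim
e_z = (0.0, 0.0, 1.0)                                 # tangent up the side
da = math.sqrt(sum(c * c for c in cross(e_phi, e_z)))  # |e_phi x e_z| = R
area = da * 2 * math.pi * h  # the integrand is constant, so integration is immediate
print(area)
```

Unrolling the cylinder into a 2πR by h rectangle gives the same area, as step 4 anticipates.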

Exercise 2.16 Show that the volume of a solid ball of radius R in 3-dimensions is
\[
V = \frac{4}{3}\pi R^3.
\]
Hint: use the following procedure
Hint: use the following procedure

1. use the spherical polar coordinate system to determine the vectors tangent to the surface of the
sphere. Suppose that the north pole corresponds to the point θ = 0.

2. compute the metric tensor g in this coordinate system.

3. determine the volume element by computing the scalar triple product of the tangent vectors.

4. compare the volume element with \(\sqrt{|\det(g)|}\), where |det(g)| is the absolute value of the
determinant of the metric tensor (when written as a matrix).

5. integrate the volume element of the sphere over the northern hemisphere and then double the
value of the integral to get the volume of the entire sphere.

6. integrate the volume element of the sphere over the entire sphere and compare your answer with
that obtained by doubling the volume of the northern hemisphere. (Should these answers
coincide?)
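A numerical check of step 4 for spherical polar coordinates: the scalar triple product of the tangent vectors should agree with √|det(g)| = ρ² sin(θ). A Python sketch (helper names are ours):

```python
import math

def sqrt_det_g(rho, theta):
    """sqrt(|det g|) for the spherical metric g = diag(1, rho^2, rho^2 sin^2 theta)."""
    det = 1.0 * rho**2 * (rho**2 * math.sin(theta)**2)
    return math.sqrt(abs(det))

def triple_product(rho, theta, phi):
    """Scalar triple product e_rho . (e_theta x e_phi) of the tangent vectors."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    e1 = (st * cp, st * sp, ct)
    e2 = (rho * ct * cp, rho * ct * sp, -rho * st)
    e3 = (-rho * st * sp, rho * st * cp, 0.0)
    cx = (e2[1] * e3[2] - e2[2] * e3[1],
          e2[2] * e3[0] - e2[0] * e3[2],
          e2[0] * e3[1] - e2[1] * e3[0])
    return sum(a * b for a, b in zip(e1, cx))

rho, theta, phi = 2.0, 0.7, 1.3
tp = triple_product(rho, theta, phi)
sd = sqrt_det_g(rho, theta)
print(tp, sd)  # both equal rho^2 sin(theta)
```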
