An Introduction To Vectors, Vector Operators and Vector Analysis by Pramod S. Joag
Conceived as a supplementary text and reference book for undergraduate and graduate
students of science and engineering, this book aims to communicate the fundamental
concepts of vectors and their applications. It is divided into three units. The first unit deals
with basic formulation, both conceptual and theoretical. It discusses applications of
algebraic operations, Levi-Civita notation and curvilinear coordinate systems such as
spherical polar and parabolic systems. The structure and analytical geometry of curves and
surfaces are covered in detail.
The second unit discusses algebra of operators and their types. It explains the equivalence
between the algebra of vector operators and the algebra of matrices. The formulation of
eigenvectors and eigenvalues of a linear vector operator is discussed using vector algebra.
Topics including Mohr’s algorithm, Hamilton’s theorem and Euler’s theorem are discussed
in detail. The unit ends with a discussion on transformation groups, rotation group, group
of isometries and the Euclidean group, with applications to rigid displacements.
The third unit deals with vector analysis. It discusses important topics including vector
valued functions of a scalar variable and functions of vector argument (both scalar valued
and vector valued), thus covering both the scalar and vector fields, as well as vector integration.
Pramod S. Joag is presently working as CSIR Emeritus Scientist at Savitribai Phule
Pune University, India. For over 30 years he has been teaching classical mechanics,
quantum mechanics, electrodynamics, solid state physics, thermodynamics and statistical
mechanics at undergraduate and graduate levels. His research interests include quantum
information, and more specifically measures of quantum entanglement and quantum
discord, production of multipartite entangled states, entangled Fermion systems, models
of quantum nonlocality etc.
An Introduction to Vectors, Vector
Operators and Vector Analysis
Pramod S. Joag
4843/24, 2nd Floor, Ansari Road, Daryaganj, Delhi - 110002, India
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107154438
© Pramod S. Joag 2016
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2016
Printed in India
A catalogue record for this publication is available from the British Library
Library of Congress Cataloging-in-Publication Data
Names: Joag, Pramod S., 1951- author.
Title: An introduction to vectors, vector operators and vector analysis /
Pramod S. Joag.
Description: Daryaganj, Delhi, India : Cambridge University Press, 2016. |
Includes bibliographical references and index.
Identifiers: LCCN 2016019490| ISBN 9781107154438 (hardback) | ISBN 110715443X
(hardback)
Subjects: LCSH: Vector analysis. | Mathematical physics.
Classification: LCC QC20.7.V4 J63 2016 | DDC 512/.5–dc23 LC record available at
https://lccn.loc.gov/2016019490
ISBN 978-1-107-15443-8 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To Ela and Ninad
who made me write this document
Contents
Figures xiii
Tables xx
Preface xxi
Nomenclature xxv
I Basic Formulation
1 Getting Concepts and Gathering Tools 3
1.1 Vectors and Scalars 3
1.2 Space and Direction 4
1.3 Representing Vectors in Space 6
1.4 Addition and its Properties 8
1.4.1 Decomposition and resolution of vectors 13
1.4.2 Examples of vector addition 16
1.5 Coordinate Systems 18
1.5.1 Right-handed (dextral) and left-handed coordinate systems 18
1.6 Linear Independence, Basis 19
1.7 Scalar and Vector Products 22
1.7.1 Scalar product 22
1.7.2 Physical applications of the scalar product 30
1.7.3 Vector product 32
1.7.4 Generalizing the geometric interpretation of the vector product 36
1.7.5 Physical applications of the vector product 38
1.8 Products of Three or More Vectors 39
1.8.1 The scalar triple product 39
1.8.2 Physical applications of the scalar triple product 43
1.8.3 The vector triple product 45
1.9 Homomorphism and Isomorphism 45
II Vector Operators
8 Preliminaries 215
8.1 Fundamental Notions 215
8.2 Sets and Mappings 216
8.3 Convergence of a Sequence 217
8.4 Continuous Functions 220
Appendices
Bibliography 515
Index 517
Figures
1.1 (a) A line indicates two possible directions. A line with an arrow specifies
a unique direction. (b) The angle between two directions is the amount by
which one line is to be rotated so as to coincide with the other along with
the arrows. Note the counterclockwise and clockwise rotations. (c) The angle
between two directions is measured by the arc of the unit circle swept by the
rotating direction. 5
1.2 We can choose the angle between directions ≤ π by choosing which direction
is to be rotated (counterclockwise) towards which. 5
1.3 Different representations of the same vector in space 7
1.4 Shifting origin makes (a) two different vectors correspond to the same point
and (b) two different points correspond to the same vector 8
1.5 Vector addition is commutative 9
1.6 (a) Addition of two vectors (see text). (b) Vector AE equals a + b + c + d.
Draw different figures, adding a, b, c, d in different orders to check that this
vector addition is associative. 10
1.7 αa + αb = α (a + b) 10
1.8 Subtraction of vectors 11
1.9 a, b, αa + βb are in the same plane 11
1.10 An arbitrary triangle ABC formed by addition of vectors a, b; c = a + b. The
angles at the respective vertices A, B, C are denoted by the same symbols. 12
1.11 Dividing P Q in the ratio λ : (1 − λ) 13
1.12 Addition of forces to get the resultant. 17
1.13 (a) The velocity of a shell fired from a moving tank relative to the ground.
(b) The southward angle θ at which the shell will fire from a moving tank so
that its resulting velocity is due west. 17
1.14 (a) Left handed screw motion and (b) Left handed coordinate system.
(c) Right handed screw motion and (d) Right handed (dextral) coordinate
system. Try to construct other examples of the left and right handed
coordinate systems. 19
1.15 Scalar product is commutative. The projections of a on b and b on a give
respectively a· b̂ = |a| cos θ and b· â = |b| cos θ. Multiplication on both sides
of the first equation by |b| and the second by |a| results in the symmetrical
form a · b = |b| a · b̂ = |a| b · â 23
1.16 The scalar product is distributive with respect to addition 25
1.17 Lines joining a point on a sphere with two diametrically opposite points are
perpendicular 27
1.18 Getting coordinates of a vector v (see text) 27
1.19 Euclidean distance for vectors 28
1.20 Work done on an object as it is displaced by d under the action of force F 30
1.21 Potential energy of an electric dipole p in an electric field E 31
1.22 Torque on a current carrying coil in a magnetic field 31
1.23 Vector product of a and b : |a×b| = |a||b| sin θ is the area of the parallelogram
as shown 32
1.24 Generalizing the geometric interpretation of vector product 36
1.25 Geometrical interpretation of coordinates of a vector product 37
1.26 Moment of a force 38
1.27 Geometric interpretation of the scalar triple product (see text) 40
1.28 The volume of a tetrahedron as the scalar triple product 41
1.29 See text 50
1.30 Spherical polar coordinates 58
1.31 Coordinate surfaces are x2 + y 2 + z2 = r 2 (spheres r = constant) tan θ =
(x2 + y 2 )1/2 /z (circular cones, θ = constant) tan φ = y/x (half planes φ =
constant) 60
1.32 Differential displacement corresponds to |ds| = |dr| (see text) 62
1.33 Parabolic coordinates (µ, ν, φ). Coordinate surfaces are paraboloids of
revolution (µ = constant, ν = constant) and half-planes (φ = constant) 63
1.34 Cylindrical coordinates (ρ, φ, z ). Coordinate surfaces are circular cylinders
(ρ = constant), half-planes (φ = constant) intersecting on the z-axis, and
parallel planes (z = constant) 64
1.35 Prolate spheroidal coordinates (η, θ, φ). Coordinate surfaces are prolate
spheroids (η = constant), hyperboloids (θ = constant), and half-planes
(φ = constant) 65
1.36 Oblate spheroidal coordinates (η, θ, φ). Coordinate surfaces are oblate
spheroids (η = constant), hyperboloids (θ = constant), and half-planes
(φ = constant) 66
1.37 (a) Positively and (b) negatively oriented triplets (a, b, c), (c) Triplet (b, a, c)
has orientation opposite to that of (a, b, c) in (a) 68
2.1 Line L with directance d = x − (û · x)û 76
2.2 |m| = |x × û| = |d| for all x on the line L 76
2.3 See text 77
2.4 See text 78
2.5 With A and B defined in Eq. (2.8) (a) |a×b| = |A| + |B| and (b) |a×b| = |B|−
|A|. These equations can be written in terms of the areas of the corresponding
triangles 80
2.6 A0 = (x − c) × (b − c) and B0 = (a − c) × (x − c) (see text) 81
2.7 Case of c parallel to x 82
2.8 See text 83
2.9 A plane positively oriented with respect to the frame (î, ĵ, k̂) 83
2.10 Every line in the plane is normal to a 85
2.11 As seen from the figure, for every point on the plane k̂ · r = constant 86
2.12 Shortest distance between two skew lines 88
2.13 A spherical triangle 89
2.14 Depicting Eq. (2.22) 91
2.15 Conics with a common focus and pericenter 92
3.1 Isomorphism between the complex plane Z and E2 95
3.2 Finding evolute of a unit circle 95
3.3 Finding √i 96
3.4 Finding nth roots of unity 97
3.5 z, z∗ , z ± z∗ 97
3.6 Depicting Eq. (3.3) 99
3.7 If D is real, z1 , z2 , z3 , z4 lie on a circle 100
3.8 The argument ∆ of ω defined by Eq. (3.9) 102
3.9 Constant angle property of the circle 104
3.10 Constant power property of the circle 104
3.11 Illustrating Eq. (3.10) 105
3.12 Both impedance and admittance of this circuit are circles 106
3.13 Boucherot’s circuit 107
3.14 Four terminal network 108
3.15 Geometrical meaning of ωz2 = ω0 ω∞ 109
3.16 Point by point implementation of transformation Eq. (3.14) 110
3.17 An ellipse and a hyperbola 110
4.1 Inverse of a mapping. A one to one and onto map f : X 7→ Y has the unique
inverse f −1 : Y 7→ X 118
5.1 u · eiθ v = e−iθ u · v 141
5.2 Symmetric transformation with principal values λ1 > 1 and λ2 < 1 145
5.3 An ellipsoid with semi-axes λ1 , λ2 , λ3 146
5.4 Parameters in Mohr’s algorithm 150
5.5 Mohr’s Circle 151
5.6 Verification of Eq. (5.48) 152
6.1 Reflection of a vector in a plane 161
6.2 Reflection of a particle with momentum p by an unmovable plane 162
6.3 See text 164
6.4 Shear of a unit square 169
6.5 Rotation of a vector 170
6.6 Infinitesimal rotation δθ of x about n̂ 171
6.7 Vectors dx and arc length ds as radius |x| sin θ is rotated through angle δθ.
As δθ 7→ 0 dx becomes tangent to the circle. 172
6.8 Orthonormal triad to study the action of the rotation operator 174
6.9 Equivalent rotations: One counterclockwise and the other clockwise 178
6.10 Composition of rotations. Rotations do not commute. 180
6.11 Active and passive transformations 182
6.12 Euler angles 184
6.13 Rotations corresponding to Euler angles 186
6.14 Roll, pitch and yaw 187
7.1 (a) Symmetry elements of an equilateral triangle i) Reflections in three planes
shown by ⊥ bisectors of sides. ii) Rotations through 2π/3, 4π/3 and 2π
(= identity) about the axis ⊥ to the plane of the triangle passing through the
center. (b) Isomorphism with S3 (see text). 194
7.2 (a) Symmetry elements of a square (group D4 ) i) Reflections in planes
through the diagonal and bisectors of the opposite sides. ii) Rotations about
the axis through the center and ⊥ to the square by angles π/2, π, 3π/2 and
2π (= identity). (b) D4 is isomorphic with a subgroup of S4 (see text). 195
7.3 Translation of a physical object by a 199
7.4 A rigid displacement is the composite of a rotation and a translation. The
translation vector a need not be in the plane of rotation. 201
7.5 Equivalence of a rotation/translation in a plane to a pure rotation 203
8.1 A converging sequence in E3 218
9.1 Geometry of the derivative 222
9.2 Parameterization by arc length 226
9.3 The Osculating circle 228
9.4 Curvature of a planar curve 229
10.11 The network of coordinate lines and coordinate surfaces at an arbitrary
point, defining a curvilinear coordinate system 314
10.12 (a) Evaluating x · da (b) Flux through the opposite faces of a volume element 318
10.13 Circulation around a loop 320
11.1 Defining the line integral 323
11.2 x(t ) = cos t î + sin t ĵ 325
11.3 A circular helix 326
11.4 In carrying a test charge from a to b the same work is done along either path 326
11.5 Line integral over a unit circle 328
11.6 Line integral around a simple closed curve as the sum of the line integrals
over its projections on the coordinate planes 330
11.7 Illustrating Eq. (11.13) 333
11.8 Each winding of the curve of integration around the z axis adds 2π to its
value 335
11.9 Illustration of a simply connected domain 337
11.10 The closed loop for integration 340
11.11 The geometry of Eq. (11.17) 342
11.12 A spherically symmetric mass distribution 347
11.13 Variables in the multipole expansion 352
11.14 Earth’s rotation affected its shape in its formative stage 354
11.15 Area integral 358
11.16 Area swept out by radius vector along a closed curve. Cross-hatched region
is swept out twice in opposite directions, so its area is zero. 359
11.17 Directed area of a self-intersecting closed plane curve. Vertical and
horizontal lines denote areas with opposite orientation, so cross-hatched
region has zero area. 360
11.18 Interior and exterior approximations to the area of the unit disc |x| ≤ 1 for
n = 0, 1, 2, where A0− = 0, A1− = 1, A2− = 2, A2+ = 4.25, A1+ = 6, A0+ = 12 361
11.19 Evaluation of a double integral 364
11.20 Subdivision by polar coordinate net 367
11.21 General convex region of integration 374
11.22 Non-convex region of integration 375
11.23 Circular ring as a region of integration 375
11.24 Triangle as a region of integration 376
11.25 The right triangular pyramid 378
11.26 Changing variables of integration (see text) 379
11.27 Tangent plane to the surface 385
11.28 Divergence theorem for connected regions 396
Tables
Preface
vector operators and the third on vector analysis. The following is a brief description of
each of them.
The first part gives the basic formulation, both conceptual and theoretical. The first
chapter builds basic concepts and tools. The first three sections are the result of my
experience with students; I have found that these matters should be dealt with explicitly
for a correct understanding of the subject. I hope that the first three sections will clear
up the confusion and misconceptions regarding many basic issues in the minds
of students. I have also given applications and examples of every algebraic operation,
starting from vector addition. Levi-Civita notation is introduced in detail and used to get
the vector identities. The metric space structure is introduced and used to understand
vectors in the context of the physical quantities they represent. Apart from the essential
structures like basis, dimension, coordinate systems and the consequences of linearity, the
curvilinear coordinate systems like spherical polar and parabolic systems are developed
systematically. Vector fields are defined and their basic structure is given. The orientation
of a linearly independent triplet of vectors is then discussed, also including the orientation
of a triplet relative to a coordinate system and the related concept of the orientation of a
plane, which is later used to understand the orientation of a surface. The second chapter
deals with the analytical geometry of curves and surfaces emphasizing vector methods.
The third chapter uses complex algebra for manipulating planar vectors and for the
description and transformations of the plane curves. In this chapter I follow the treatment
by Zwikker [26] which is a complete and rigorous exposition of these issues.
The second part deals with operators on vectors. Everything about vector operators is
formulated using vector algebra (scalar and vector products) and matrices. The fourth
chapter gives the algebra of operators and various types of operators, and proves and
emphasizes the equivalence between the algebra of vector operators and the algebra of
matrices representing them. The fifth chapter gives general formulation of getting
eigenvectors and eigenvalues of a linear operator on vectors using vector algebra. The
properties of the spectrum of a symmetric operator are also obtained using vector algebra.
Thus, extremely useful and general methods are accessible to the students using
elementary vector algebra. A powerful algorithm to diagonalize a positive operator acting
on a 2-D space, called Mohr’s algorithm, is then described. Mohr’s algorithm has been
routinely used by engineers via its graphical implementation, as explained in the text. The
sixth chapter develops in detail orthogonal transformations as rotations or reflections. The
generic forms for operators of reflection and rotation, as well as the matrices for the
rotation operator are obtained. The relationship between rotation and reflection is
established via Hamilton’s theorem. The active and passive transformations and their
connection with symmetry is discussed. The concept of broken symmetry is briefly
discussed. The Euler angle construction for arbitrary rotation is then derived. The
problem of finding the axis and the angle of rotation corresponding to a given orthogonal
matrix is solved as the Euler’s theorem. The second part ends with the seventh chapter on
transformation groups and deals with the rotation group, group of isometries and the
Euclidean group, with applications to rigid displacements.
The third part deals with vector analysis. This is a vast subject and a personal flavor in
the choice of topics is inevitable. For me the guiding question was, what vector analysis a
graduating student in science and engineering must have? Again, the variety of answers to
this question is limited only by the number of people addressing it. Thus, the third part
gives my version of the answer to this question and the resulting vector analysis. I
primarily develop the subject from a geometric point of view, making as much contact with
applications as possible. My aim is to enable the student to independently read,
understand and use the literature based on vector analysis for the applications of his
interest. Whether this aim is met can only be decided by the students who learn and try to
use this material. This part is divided into five chapters (Chapters 8–12). The eighth chapter
outlines fundamental notions and preliminaries, and also sets out the objectives. The
ninth chapter consists of the vector valued functions of a scalar variable. Theories of space
curves and of plane curves are developed from scratch with some physical applications.
This chapter ends with the integration of such functions with respect to their scalar
argument and their Taylor series expansion. The tenth chapter deals with the functions of
vector argument, both scalar valued and vector valued, thus covering both the scalar and
vector fields. Again, everything is developed from scratch, starting with the directional
derivative, partial derivatives and continuity of such functions. A part of this development
is inspired by the geometric calculus developed by D. Hestenes and others [7, 10, 11]. To
summarize, this chapter consists of different forms of derivatives of these and inverse
functions, and their geometric/physical applications. A major omission in this chapter is
that of the systematic development of differential forms, which may not be required in an
undergraduate course. The eleventh chapter concerns vector integration. This is done in
three phases: the line, the surface and the volume integral. All the standard topics are
covered, emphasizing geometric aspects and physical applications. While writing this part,
I have made use of many books, especially the book by Courant and John [5] and that by
Lang [15], for the simple reason that I have learnt my calculus from these books, and I
have no regrets about that. In particular, my treatment of multiple integrals and matrices
and determinants in Appendix A is inspired by Courant and John’s book. I find in their
book, the unique property of building rigorous mathematics, starting from an intuitive
geometric picture. Also, I follow Griffiths while presenting the divergence and the curl of
vector fields, which, I think, is possibly one of the most compact and clear treatments of
this topic. The subsections 11.1.1 and 11.8.1 and a part of section 9.2 are based on ref
[22]. The twelfth and last chapter of the book presents an assorted collection of
applications involving rotational motion of a rigid body, projectile motion, satellites and
their orbits etc, illustrating coordinate-free analysis using vector techniques. This chapter,
again, is influenced by Hestenes [10].
Appendix A develops the theory of matrices and determinants emphasizing their
connection with vectors, also proving all results involving matrices and determinants used
in the text. Appendix B gives a brief introduction to Dirac delta function.
The whole book is interspersed with exercises, which form an integral part of the text.
Most of these exercises are illustrative or they explore some real life application of the
theory. Some of them point out the subtleties involved. I recommend that all students attempt
all exercises without looking at the solutions beforehand. When you read a solution after
an attempt to get there yourself, you understand it better. Also, do not be miserly about
drawing figures; a figure can show you a way that a thousand words may not.
I cannot end this preface without expressing my affection towards my friend and my
deceased colleague Dr Narayan Rana, who re-kindled my interest in mechanics. Long
evenings that I spent with him discussing mechanics and physics in general, sharing and
laughing at various aspects of life from a distance, are the treasures of my life. We entered
a rewarding and fruitful collaboration of writing a book on mechanics [19]. This
collaboration and Hestenes’ book [10] motivated me to formulate mechanics in a
coordinate free way using vector methods. Apart from the book by Hestenes and his other
related work, the book by V. I. Arnold on mechanics [3] has made an indelible impact on
my understanding and my global view of mechanics, although its influence is not quite
apparent in this book. I have always enjoyed discussing mechanics and physics in general
with my colleagues Rajeev Pathak, Anil Gangal, C. V. Dharmadhikari, P. Durganandini,
and Ahmad Sayeed. The present book is produced in LaTeX and I thank our students,
Dinesh Mali, Mukesh Khanore and Mihir Durve for their help in drawing figures and also
as TeXperts.
Nomenclature
α, β, γ, δ Scalars
∠ (a, b) Angle between vectors a, b
a, b, x, y Vectors
θ, φ, ψ, χ Angles
R Region of 3-D space/plane
LHS Left hand side
RHS Right hand side
R3 Vector space comprising ordered triplets of real numbers
E3 3-D vector space
|a|, a Magnitude of a
||a|| Norm of a
A, B Matrices
|A|, |B| Determinants
R(z ), I (z ) Real and imaginary parts of a complex number
CM Center of mass
µ Magnetic moment
L Magnitude of angular momentum; a linear differential form
h Angular momentum
Part I
Basic Formulation
1
Getting Concepts and Gathering Tools
In this book we use boldfaced letters for vectors. A symbol which is not bold may
represent the magnitude of the corresponding vector, or a scalar.
S = rθ.
Any arbitrary circle drawn in the specified plane can be used to get the value of the angle θ via
the above equation (θ = S/r). In other words, the radius r is arbitrary. It is convenient
to choose a unit circle, that is, a circle with radius unity (r = 1), so that the arc length
and the angle swept by the rotating line are numerically equal (see Fig. 1.1(c)). Such an
arc-length measure of angle is called the ‘radian measure’. Since the length of the circumference
of a unit circle is 2π, the angle corresponding to one complete rotation is 2π. The angle
corresponding to half the circumference is π and so on.
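The radius-independence of the radian measure is easy to spot-check numerically. The following short Python sketch (the book itself contains no code; the function name is ours) recovers θ = S/r for circles of different radii:

```python
import math

def angle_from_arc(arc_length, radius):
    """Radian measure: the angle subtended equals arc length divided by radius."""
    return arc_length / radius

# The angle obtained is independent of the radius of the circle used to measure it:
theta = math.pi / 3
for r in (1.0, 2.5, 10.0):
    s = r * theta                       # S = r * theta
    assert math.isclose(angle_from_arc(s, r), theta)

# On the unit circle (r = 1), arc length and angle are numerically equal:
print(angle_from_arc(math.pi, 1.0))     # half the circumference -> pi
```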
This procedure still leaves an ambiguity in defining the angle between two directions.
We can rotate one of the directions (so as to coincide with the other direction) in two ways.
The sense of one rotation is the reverse of the other. Each of these rotations corresponds
to a different angle, say θ and 2π − θ (see Fig. 1.1(b)). Which of these rotations do we
choose? We place a clock with its center at the point of intersection of the two lines so as to
view it from the top. We then choose the rotation in the sense opposite to that of the hands
of the clock. This is called counterclockwise rotation.
Fig. 1.1 (a) A line indicates two possible directions. A line with an arrow specifies
a unique direction. (b) The angle between two directions is the amount
by which one line is to be rotated so as to coincide with the other
along with the arrows. Note the counterclockwise and clockwise rotations.
(c) The angle between two directions is measured by the arc of the unit
circle swept by the rotating direction.
The angle swept by a counterclockwise rotation is taken to be positive, while the angle swept
by a clockwise rotation is negative. Note that we can always choose the angle between two
directions to be ≤ π by choosing which direction is to be rotated counterclockwise towards
which (see Fig. 1.2).
Fig. 1.2 We can choose the angle between directions ≤ π by choosing which
direction is to be rotated (counterclockwise) towards which.
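The sign convention above (counterclockwise positive, angle folded to magnitude ≤ π) can be sketched in Python; this illustrative snippet, not from the book, uses `math.atan2` to measure each 2-D direction from the x-axis:

```python
import math

def signed_angle(a, b):
    """Counterclockwise angle (taken positive) needed to rotate direction a onto b.
    Directions are 2-D vectors (x, y); the result lies in (-pi, pi]."""
    ang = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])
    # Fold into (-pi, pi]: this is the freedom of choosing which direction is
    # rotated counterclockwise towards which, so that |angle| <= pi.
    while ang <= -math.pi:
        ang += 2 * math.pi
    while ang > math.pi:
        ang -= 2 * math.pi
    return ang

east, north = (1.0, 0.0), (0.0, 1.0)
assert math.isclose(signed_angle(east, north), math.pi / 2)   # counterclockwise: positive
assert math.isclose(signed_angle(north, east), -math.pi / 2)  # clockwise: negative
```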
The angle between two directions is used to specify one direction relative to the other. If
you reflect on your experience, you will realize that the only way to specify a direction is to
specify it relative to some other reference direction which you can determine by observing
something like a magnetic needle. To appreciate this, imagine that you are on a ship sailing
in the mid-Pacific. Suppose that you have no device like a magnetic compass or a gyroscope
on the ship (I do not recommend this!) and that clouds block your vision of the pole star
and the other stars. Then it is impossible to tell in which direction your ship is moving.
Exercise Consider three different non-coplanar lines 1 intersecting at a point O. Take a
point P which is not on any of these three lines. Put arrows on these three lines to specify
three directions (Draw a figure). Construct a path starting at O and ending at P on which
you are moving either in or opposite to one of the three directions you have specified by
putting arrows on the three lines. Convince yourself that this is always possible. In the light
of the statements made in the first paragraph of this section, this exercise demonstrates that
our space is three dimensional.
1 Any number of lines all of which fall on the same plane are called coplanar. A collection of lines which are not coplanar is
called non-coplanar. A pair of intersecting lines is coplanar.
a = |a|â,
The geometric interpretation of the set of real numbers is a straight line, that is, the set
of real numbers is in one to one correspondence with the points on the line. Similarly, the
set of vectors is in one to one correspondence with the points in the three dimensional
space R3 . To see this one to one correspondence, consider the set of vectors comprising all
possible values of some vector quantity. We can construct the set containing the
representatives of these vectors in space. One to one correspondence between these two
sets is obvious by construction. To transfer this correspondence to the points in R3 we
take an arbitrary point in space say O, called origin and represent every vector with O as
the base point. Since the vectors have all possible magnitudes and directions, every point
in space is at the tip of some vector based at O, representing a possible value of the vector
quantity. In this way, a unique magnitude and direction is assigned to every point in space,
establishing the one to one correspondence between the set of vectors and the set of points
in space. We could have chosen any other point, say O′, as the origin and base all vectors at
O′. This gives a new representation for each vector in the set of vectors, obtained by
parallelly transporting each vector based at O to that based at O′. These two are the
representations of the same set of vectors (values of a vector quantity). However, they
generate two different one to one correspondences with the points in R3 as can be seen
from Fig. 1.4. We see that changing the origin from O to O′ makes a vector correspond to
two different points in space (or, makes a point in space correspond to two different
vectors) as we assign a vector (based at O or O′) to a point in space. Thus, changing the
origin changes the one to one correspondence between the set of vectors and the points in
space. Later, we will have a closer look at the one to one correspondence between R3 and
the set of vectors (values of a vector quantity).
Fig. 1.4 Shifting origin makes (a) two different vectors correspond to the same point
and (b) two different points correspond to the same vector
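The effect of shifting the origin can be made concrete with coordinates. In this small Python sketch (the point and origins are hypothetical, chosen only for illustration), a point P is represented by the vector from the chosen origin to P:

```python
# Representing a point P by the vector from the chosen origin to P.
# Shifting the origin from O to O' changes which vector corresponds to P.
def vector_to(origin, point):
    return tuple(p - o for p, o in zip(point, origin))

P = (2.0, 3.0, 1.0)              # a fixed point in space (hypothetical coordinates)
O = (0.0, 0.0, 0.0)              # first choice of origin
O_prime = (1.0, 1.0, 1.0)        # second choice of origin

v_from_O = vector_to(O, P)              # (2.0, 3.0, 1.0)
v_from_O_prime = vector_to(O_prime, P)  # (1.0, 2.0, 0.0)

# The same point corresponds to two different vectors: the one to one
# correspondence between vectors and points depends on the origin chosen.
assert v_from_O != v_from_O_prime
```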
The vector a + b is sometimes called the resultant of a and b. The rule of adding two or
more vectors is motivated by the net displacement of an object in space, resulting due to
many successive displacements. Thus, if we go from A to B by travelling 10 km NE (vector
a) and then from B to C by travelling 6 km W (vector b), the net displacement, 8 km due
North from A to C (vector c), is obtained as depicted in Fig. 1.6(a), which is the same as
that given by c = a + b. Figure 1.6(b) shows the net displacement (f) after four successive
displacements (a, b, c, d) which is consistent with f = a + b + c + d.
We can now list the properties of vector addition and multiplication by a scalar.
(1) Closure If a, b are in R3 then a + b is also in R3 . That is, addition of two vectors
results in a vector.
(2) Commutativity a + b = b + a (see Fig. 1.5).
(3) Associativity For all vectors a, b, c in R3 , a + (b + c) = (a + b) + c. Thus, while
adding three or more vectors, it does not matter which two you add first, which two
next etc, that is, the order in which you add does not matter (see Fig. 1.6(b)).
(4) Identity There is a unique vector 0 such that for every vector a in R3 , a + 0 = a.
(5) Inverse For every vector a ≠ 0 in R3 , there is a unique vector −a such that a +
(−a) = 0; for the zero vector, −0 = 0.
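The five properties above can be checked numerically for vectors in R3 modelled as tuples; a minimal Python sketch with illustrative values of our own choosing (not from the text):

```python
# Vectors in R^3 modelled as 3-tuples; illustrative values only.

def add(a, b):
    """Componentwise vector addition."""
    return tuple(x + y for x, y in zip(a, b))

def neg(a):
    """Additive inverse -a."""
    return tuple(-x for x in a)

a, b, c = (1.0, 2.0, 3.0), (-4.0, 0.5, 2.0), (0.0, 1.0, -1.0)
zero = (0.0, 0.0, 0.0)

assert len(add(a, b)) == 3                     # closure: the sum is again in R^3
assert add(a, b) == add(b, a)                  # commutativity
assert add(a, add(b, c)) == add(add(a, b), c)  # associativity
assert add(a, zero) == a                       # identity
assert add(a, neg(a)) == zero                  # inverse
```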
To every pair α and a where α is a scalar (i.e., a real number) and a in R3 there is a vector
αa in R3 . If we denote by |a| the magnitude of a, then the magnitude of αa is |α| |a|. If
α > 0, the direction of αa is the same as that of a, while if α < 0 then the direction of αa
is opposite to that of a. If α > 0, then αa is said to be the scaling of a by α. Note that α =
1/|a| produces the unit vector â in the direction of a. We have, for the scalar multiplication,
10 An Introduction to Vectors, Vector Operators and Vector Analysis
Fig. 1.6 (a) Addition of two vectors (see text). (b) Vector AE equals a + b + c + d.
Draw different figures, adding a, b, c, d in different orders to check that this
vector addition is associative.
(3) α (a + b) = αa + αb.
(4) (α + β )a = αa + βa.
Fig. 1.7 αa + αb = α (a + b)
Note that these properties are shared by all vectors independent of the context in which
they are used and independent of which vector quantity they correspond to. As explained
in section 1.3, this is true of all the algebra of vectors and operations on vectors we develop
in this book and will not be stated explicitly again.
Referring to Fig. 1.7, similar triangles give

AE/AC = AD/AB = α|b|/|b| = α.
Substituting AC = |a + b| in the above equation, we get AE = |C| = α|a + b| = α|c|.
However, the vectors c and C are in the same direction, so that C = αc = α (a + b).
Finally, C = αa + αb giving αa + αb = α (a + b).
To subtract vector b from vector a we add vector −b to vector a, as shown in Fig. 1.8
a − b = a + (−b).
Given any two non-zero vectors a and b, their linear combination αa + βb, (α, β scalars)
is a vector in the plane defined by a and b (see Fig. 1.9). Given any set of N vectors
{x1 , x2 , . . . , xN }, their linear combination is defined iteratively. The resulting vector
∑_{i=1}^{N} αi xi is common to the planes formed by all the pairs of vectors
( ∑_{i≠k} αi xi , xk ), k = 1, . . . , N . You can verify this for N = 3.
where ∠(a, b) is the angle between the directions of a and b. This also gives, for the angle
between a and b,

cos ∠(a, b) = (c² − a² − b²)/(2ab).

If a and b are orthogonal, this gives c² = a² + b², which is nothing but the statement of the
Pythagorean theorem. Let us now find the angle
made by the vector c = a + b with a say, in terms of the attributes of vectors a and b.
Here again, we make use of the fact that the triplet {a, b, c} forms a triangle. Applying the
trigonometric law of sines to this triangle, we get

a/ sin(∠(b, c)) = b/ sin(∠(c, a)) = c/ sin(∠(a, b)),

where (a, b, c ) are the magnitudes of the corresponding vectors and the angles involved are
between the directions of the vectors. Having calculated the value of c, we can use the last
equality to get,
sin(∠(c, a)) = (b/c) sin(∠(a, b)).

This gives ∠(c, a) as required. Again, if a and b are orthogonal, we can simplify by noting
sin(∠(c, a)) = b/c, or tan(∠(c, a)) = b/a.
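This relation is easy to cross-check numerically; a short Python sketch with an arbitrarily chosen pair a, b (illustrative values, not from the text):

```python
import math

def mag(v):
    """Magnitude |v|."""
    return math.sqrt(sum(x * x for x in v))

def angle(u, v):
    """Angle between the directions of u and v, in radians."""
    dot = sum(x * y for x, y in zip(u, v))
    return math.acos(dot / (mag(u) * mag(v)))

a = (3.0, 0.0, 0.0)
b = (1.0, 2.0, 0.0)
c = tuple(x + y for x, y in zip(a, b))        # c = a + b

# sin(angle between c and a) = (|b|/|c|) sin(angle between a and b)
lhs = math.sin(angle(c, a))
rhs = (mag(b) / mag(c)) * math.sin(angle(a, b))
assert abs(lhs - rhs) < 1e-12
```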
Exercise If a and b are position vectors of points P and Q, based at the origin O, then
show that the position vector x of a point X dividing P Q in the ratio λ : (1 − λ) is given by
(1 − λ)a + λb.
For what values of λ does the position vector correspond to the point on the ray in the
direction of Q from P ?
Solution We have (see Fig. 1.11)

x − a = λ PQ = λ(b − a),

which gives

x = (1 − λ)a + λb.

For the second part, write x = a + λ(b − a), where b − a = PQ, to see that the point lies on
the ray from P towards Q exactly when λ > 0.
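The section formula x = (1 − λ)a + λb can be verified numerically; a minimal Python sketch with illustrative points:

```python
def divide(a, b, lam):
    """Position vector of the point dividing PQ in the ratio lam : (1 - lam),
    i.e. x = (1 - lam) a + lam b, for position vectors a of P and b of Q."""
    return tuple((1 - lam) * p + lam * q for p, q in zip(a, b))

a, b = (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)
mid = divide(a, b, 0.5)          # midpoint of PQ
assert mid == (0.5, 1.0, 0.0)

# lam > 1 lands beyond Q, lam < 0 behind P; lam > 0 stays on the ray from P towards Q
beyond = divide(a, b, 2.0)       # x = a + 2(b - a)
assert beyond == (-1.0, 4.0, 0.0)
```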
Exercise Two spheres of masses m1 and m2 are rigidly connected by a massless rod. The
system is rotating freely about its center of mass. Find the total angular momentum of the
system about CM.
Answer Let the position vector of m1 relative to m2 be r, let the velocity of m1 relative
to CM be v and let µ = m1 m2 /(m1 + m2 ) be the reduced mass of the system. Then the total
angular momentum is L = 2µ r × v.
(in fact, uncountably many), pairs of vectors into which a given vector can be decomposed
or resolved. In order to resolve a given vector c into a set of N vectors we first choose
arbitrary sets {αi ≠ 0} and {xi ≠ 0}, i = 1, . . . , N − 1 of N − 1 scalars and vectors
respectively and find the vector x = ∑_{i=1}^{N−1} αi xi . Then, we choose αN and xN to satisfy
αN xN = c − x. Thus, any vector can be resolved or decomposed into a set of N vectors in
infinitely (in fact, uncountably) many ways.
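The resolution procedure can be sketched in code: the first N − 1 pieces are chosen arbitrarily (here pseudo-randomly, with the scalars absorbed into the vectors for simplicity) and the last one is forced to close the sum. Illustrative values only:

```python
import random

def resolve(c, n, rng):
    """Resolve c into n vectors summing to c: pick the first n-1
    arbitrarily, then set the last one to c minus their sum."""
    parts = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
             for _ in range(n - 1)]
    partial = tuple(sum(p[i] for p in parts) for i in range(3))
    parts.append(tuple(c[i] - partial[i] for i in range(3)))
    return parts

rng = random.Random(0)
c = (1.0, -2.0, 0.5)
parts = resolve(c, 5, rng)
total = tuple(sum(p[i] for p in parts) for i in range(3))
assert all(abs(total[i] - c[i]) < 1e-9 for i in range(3))
```

Each fresh random seed gives a different decomposition of the same c, illustrating that there are uncountably many of them.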
Exercise Draw figures illustrating c = αa + βb and d = αa + βb + γc for different sets
of scalars and vectors satisfying these equations.
Exercise Given a vector c find two vectors a, b of given magnitudes a, b respectively, such
that c is the resultant of a, b. When is this impossible?
Answer Squaring both sides of b = c − a we get, for the angle between a and c, ĉ · â =
cos θ = (c² + a² − b²)/2ca. Thus, if we draw vectors c = AC and a = AB making angle
θ = cos⁻¹ [(c² + a² − b²)/2ca] with each other at A, then the vector BC gives the required
vector b. This will fail if the vectors a, b, c cannot make a triangle, that is, when a + b < c.
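A sketch of the angle computation in this construction; the 3–4–5 triangle and the impossible case below are illustrative choices:

```python
import math

def construction_angle(c_len, a_len, b_len):
    """Angle theta between c and a such that a triangle with sides
    c, a and closing side b exists; math.acos raises ValueError
    when no such triangle is possible."""
    cos_theta = (c_len**2 + a_len**2 - b_len**2) / (2 * c_len * a_len)
    return math.acos(cos_theta)

theta = construction_angle(5.0, 3.0, 4.0)     # 3-4-5 right triangle
# cos(theta) = (25 + 9 - 16)/30 = 0.6
assert abs(math.cos(theta) - 0.6) < 1e-12

# a + b < c: no triangle, the acos argument leaves [-1, 1]
try:
    construction_angle(10.0, 2.0, 3.0)
    ok = False
except ValueError:
    ok = True
assert ok
```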
Exercise Given a vector a ≠ 0 and N non-zero vectors xi , i = 1, . . . , N , no two of which
are parallel and no three of which are coplanar, show that the linear combination of {xi }’s
that equals a is unique.
Solution We first show it for N = 2. Let a = λx1 + µx2 . Note that both the coefficients
cannot be zero, otherwise a = 0. Now suppose that some other linear combination equals
a, say a = λ1 x1 + µ1 x2 . Subtracting these two equations we get (λ − λ1 )x1 + (µ − µ1 )
x2 = 0. Either both of these coefficients are non-zero, or both are zero, otherwise one of
the vectors x1 , x2 is zero, contradicting the assumption that both are non-zero. If both
the coefficients are non-zero, then the vectors x1 , x2 are simply proportional to each other,
which means that they are parallel, in contradiction with the assumption that they are not.
Therefore, both the coefficients (λ − λ1 ) and (µ − µ1 ) must vanish, proving that the linear
combination of x1 and x2 which equals a is unique. This also means that a given linear
combination specifies a unique vector a. Now let a equal a linear combination of three
non-zero and non-coplanar2 vectors, say a = λ1 x1 + λ2 x2 + λ3 x3 . We know that the first
two terms in this linear combination add up to a unique vector say x12 = λ1 x1 + λ2 x2 .
Therefore, we can equivalently write this linear combination as a = x12 + λ3 x3 involving
only two vectors, which are not collinear because the three vectors x1 , x2 , x3 are not coplanar,
so that we know it to be unique. This fixes the coefficient λ3 and hence makes the linear
combination of three vectors giving a, unique. Iterating the same argument, we can show
that a linear combination of N non-zero, pairwise non-parallel and non-coplanar vectors
which equals a is unique.
Exercise The center of mass of the vertices of a tetrahedron P QRS (each with unit mass)
may be defined as the point dividing MS in the ratio 1 : 3, where M is the center of mass
of the vertices P QR. Show that this definition is independent of the order in which the
vertices are taken and it agrees with the general definition of the center of mass.
2 Note that if three vectors are non-coplanar, then no two of them can be parallel.
m1 a1 + m2 a2 + · · · + mn an = 0.
Solution By definition, the left side (divided by the total mass) gives the position vector of
the center of mass G, which is chosen to be the origin.
c = αa + βb (1.1)
for some non-zero scalars α and β. This means that the vectors c, αa and βb form a triangle.
Since a triangle is a planar figure, we conclude that the vectors a, b, c are coplanar. A more
useful form of Eq. (1.1) is
αa + βb + γc = 0. (1.2)
On the other hand if a, b, c are given to be coplanar, it is possible to resolve one of them
along the other two vectors, as shown at the beginning of this subsection, so that they
satisfy Eq. (1.1) or Eq. (1.2) with α, β, γ not all zero. Thus, three non-zero vectors are
coplanar if and only if they satisfy Eq. (1.2) with two or more non-zero coefficients.
It follows immediately that if three non-zero vectors satisfy Eq. (1.2) only when all the
coefficients are zero, then they ought to be non-coplanar.
Exercise Show that three points with position vectors a, b, c are collinear if and only if
there exist three non-zero scalars α, β, γ, α ≠ ±β, such that
αa + βb + γc = 0
and
α + β + γ = 0.
Hint From a previous exercise we can infer that if three points are collinear, the position
vector of the middle point is b = (αc + γa)/(α + γ ), giving γa + αc − (α + γ )b = 0, with
β = −(α + γ ) so that β + α + γ = 0. If the given conditions are assumed, we can show that
b divides the line joining a and c in the ratio α : γ.
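The collinearity criterion can be checked numerically; the points and the dividing ratio below are illustrative choices:

```python
def divide_point(a, c, lam):
    """Point b dividing the segment from a to c in ratio lam : (1 - lam)."""
    return tuple((1 - lam) * x + lam * y for x, y in zip(a, c))

a, c = (1.0, 1.0, 0.0), (4.0, -2.0, 6.0)
lam = 0.25
b = divide_point(a, c, lam)          # collinear with a and c by construction

# coefficients with alpha + beta + gamma = 0 and alpha*a + beta*b + gamma*c = 0
alpha, beta, gamma = 1 - lam, -1.0, lam
assert abs(alpha + beta + gamma) < 1e-12
comb = tuple(alpha * x + beta * y + gamma * z for x, y, z in zip(a, b, c))
assert all(abs(t) < 1e-12 for t in comb)
```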
Exercise Four points P , Q, R, S have position vectors a, b, c, d respectively, no three of
which are collinear. Show that P , Q, R, S are coplanar if and only if there exist four scalars
α, β, γ, δ not all zero, satisfying
αa + βb + γc + δd = 0 and α + β + γ + δ = 0. (1.3)
Solution Let the given P , Q, R, S be coplanar and let the lines P Q and RS intersect at
A with position vector r, such that P A : AQ = λ : µ and RA : AS = ρ : τ. By the previous
exercise this gives,
(µa + λb)/(λ + µ) = r = (τd + ρc)/(ρ + τ ),

or

[µ/(λ + µ)] a + [λ/(λ + µ)] b − [ρ/(ρ + τ )] c − [τ/(ρ + τ )] d = 0.
Replacing the coefficients by α, β, γ, δ we see that conditions in Eq. (1.3) are satisfied. The
proof of sufficiency is left to you.
Vector methods employed to prove simple results in Euclidean geometry may be found
in [23].
As our first example, we calculate the acceleration of a particle of mass 0.2 kg moving
on a frictionless, horizontal and rectangular table when subjected to a force of F1 = 3N
along the breadth and F2 = 4 N along the length of the table. We know that forces are
vector quantities and the force F experienced by a particle subjected to several forces
F1 , F2 , . . . , FN is simply the sum F1 + F2 + · · · + FN . Thus, the force on the particle is
F = F1 + F2 and the magnitude of the resultant F is |F| = (F1² + F2²)^1/2 = 5 N, and it acts in a
direction making angle φ with the breadth of the table, where tan φ = 4/3 (see Fig. 1.12). By
Newton’s law, F = ma, so the acceleration is in the same direction, with the magnitude
F/m = 25 m/s².
In our second example we make use of the following principle. If an observer moving with
velocity v0 with respect to the ground sees an object moving with an apparent velocity
va , then the velocity of the object with respect to ground say vg is vg = va + v0 . Thus,
consider a tank travelling due north at v0 = 10 m/s firing a shell at va = 200 m/s in a
direction which appears due west to an observer on the tank. Then, the ground velocity of
the shell vg has the magnitude vg = (200² + 10²)^1/2 ≈ 200.2 m/s and a direction making
an angle φ north of due west where tan φ = 10/200 = 0.05 (see Fig. 1.13(a)). A more
relevant question is to ask about the direction in which the gun should be aimed so as to
hit a target due west of the tank. Here, the gun must be fired in a direction θ south of due
west so that the total velocity is in the direction due west. Consulting Fig. 1.13(b) we see
that the required angle is given by sin θ = 0.05.
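A sketch of both computations for the tank example, using the same numbers as in the text:

```python
import math

v0, va = 10.0, 200.0       # tank speed (due north), shell speed relative to the tank

# firing due west as seen from the tank
vg = math.hypot(va, v0)                   # ground speed, about 200.25 m/s
phi = math.atan2(v0, va)                  # angle north of due west
assert abs(math.tan(phi) - 0.05) < 1e-12

# aiming to hit a target due west: angle theta south of west with sin(theta) = v0/va
theta = math.asin(v0 / va)
assert abs(math.sin(theta) - 0.05) < 1e-12
assert abs(vg - math.sqrt(200.0**2 + 10.0**2)) < 1e-9
```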
Fig. 1.13 (a) The velocity of a shell fired from a moving tank relative to the ground.
(b) The southward angle θ at which the shell must be fired from a moving tank
so that its resulting velocity is due west.
Exercise A river flows with a speed of 1 m/s. A boy wishes to swim across the river to the
point exactly opposite to him on the other bank. He can swim relative to the water at a
speed of 2 m/s. At what angle θ should he aim relative to the bank?
Exercise You travel from A to B with velocity 30î and travel back from B to A with
velocity −70î, both measured in the same units. Find your (a) average velocity (b) average
speed.
Answer (a) 0 because the net displacement is 0. (b) Average speed = distance
travelled/time of travel = 42.
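The average-speed computation generalizes to any two speeds over equal distances; a one-function sketch:

```python
def average_speed(v1, v2):
    """Average speed over two equal distances travelled at speeds v1 and v2:
    total distance / total time, i.e. the harmonic mean of v1 and v2."""
    return 2.0 * v1 * v2 / (v1 + v2)

assert average_speed(30.0, 70.0) == 42.0   # not the arithmetic mean, 50
```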
î, ĵ, k̂ along their axes are chosen so that a rotation from î to ĵ about z axis should advance
a right handed screw in the direction of k̂ along the z axis. In the last statement you can
cyclically permute î −→ ĵ −→ k̂ −→ î, with the corresponding change in the axis about
which rotation takes place. The coordinate system so chosen is known as the right handed
or dextral system. As against this, we can fix the î, ĵ, k̂ vectors such that a rotation from î
towards ĵ advances a left handed screw in the direction of k̂. As you may know, the same
sense of rotation advances right handed and left handed screws in opposite directions.
This choice results in the left handed coordinate system. Having fixed the î, ĵ, k̂ vectors,
their directions are called the positive directions of the corresponding axes. All this is
depicted in Fig. 1.14.
Fig. 1.14 (a) Left handed screw motion and (b) Left handed coordinate system.
(c) Right handed screw motion and (d) Right handed (dextral) coordinate
system. Try to construct other examples of the left and right handed
coordinate systems.
(see subsection 1.4.1, Fig. 1.9) and can never be made to coincide with the non-coplanar
vector k̂, irrespective of the values of α1 and α2 we choose. In other words, none of the
non-coplanar vectors î, ĵ, k̂ can be expressed as a linear combination of the remaining
ones. Such a set of vectors is called a set of linearly independent vectors. If a set of vectors
{v1 , v2 , v3 }, vi ≠ 0; i = 1, 2, 3 is linearly independent, then the equation
α1 v1 + α2 v2 + α3 v3 = 0 (1.4)
is satisfied only when all scalars are zero. Suppose some αi ≠ 0, i ∈ {1, 2, 3}, still satisfy
Eq. (1.4) (note that at least two of them have to be non-zero for this); then we can divide
Eq. (1.4) by a non-zero coefficient (say α1 ) making the coefficient of the corresponding
vector (v1 ) equal unity. We can then take all the other terms from LHS to RHS so that this
vector (v1 ) is expressed as the linear combination of the remaining ones. Thus, these two
definitions of linear independence are equivalent.
Exercise Show that two linearly dependent vectors are parallel to each other.
What is most interesting is that the maximum number of linearly independent vectors we
can find in R3 is three. In other words, any set of ≥ 4 vectors in R3 is linearly dependent,
that is, one or more of the vectors in this set can be expressed as a linear combination of
the remaining ones. (Compare with our discussion about the dimension of the ‘space we
live in’ in the second para of section 1.2). We identify this maximal number of linearly
independent vectors, namely three, to be the dimension of R3 . (In general, the maximum
number of linearly independent vectors in an n dimensional space is n. We assume that n
is finite). Any set of three non-coplanar vectors in R3 can be used to express any vector v
in R3 as their linear combination in the following way. Consider the set {e1 , e2 , e3 , v} of
which the first three vectors are linearly independent, that is, non-coplanar. Since the
dimension of space is three, the above set comprising four vectors has to be linearly
dependent. Therefore, in the equation
α1 e1 + α2 e2 + α3 e3 + α4 v = 0,
not all of the scalar coefficients αi , i = 1, . . . , 4 can be zero. If α4 = 0, the equation reduces
to
α1 e1 + α2 e2 + α3 e3 = 0
and not all these α’s can be zero. This contradicts the fact that the set {ei , i = 1, 2, 3} is
linearly independent. Therefore, α4 ≠ 0 and we can write
v = −(1/α4 ) [α1 e1 + α2 e2 + α3 e3 ] .
Note that we can trivially generalize this argument to any n dimensional space where the
maximum number of linearly independent vectors is n.
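The argument can be made concrete with determinants: three vectors are non-coplanar (linearly independent) exactly when the determinant of their components is non-zero, and the coefficients expressing v then follow, e.g., by Cramer's rule. A sketch with an illustrative, non-orthogonal basis:

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w;
    non-zero exactly when u, v, w are non-coplanar (linearly independent)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

def coefficients(v, e1, e2, e3):
    """Scalars (c1, c2, c3) with v = c1 e1 + c2 e2 + c3 e3 (Cramer's rule);
    requires e1, e2, e3 linearly independent."""
    d = det3(e1, e2, e3)
    return (det3(v, e2, e3) / d, det3(e1, v, e3) / d, det3(e1, e2, v) / d)

e1, e2, e3 = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)
assert det3(e1, e2, e3) != 0.0                # non-coplanar

v = (2.0, 3.0, 4.0)
c1, c2, c3 = coefficients(v, e1, e2, e3)
recon = tuple(c1 * p + c2 * q + c3 * r for p, q, r in zip(e1, e2, e3))
assert all(abs(x - y) < 1e-12 for x, y in zip(recon, v))
```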
We restrict ourselves to the case where the three linearly independent vectors î, ĵ, k̂ are
also mutually orthogonal, (although the following discussion in this section does
not require it) and set up the corresponding Cartesian coordinate system. Given any
vector v we want to find three scalars vx , vy and vz such that the linear combination
vx î + vy ĵ + vz k̂ equals v. The successive terms in this linear combination are called the
x, y, and z components of v or the components along the x, y, z axes respectively. The
scalars vx , vy , vz are the coordinates of the tip of the vector v, (or the coordinates of v for
brevity) based at the origin of the coordinate system corresponding to the mutually
orthogonal unit vectors î, ĵ, k̂ we have set up. (A way to get these coordinates is given in
the next section). Given v, the scalars vx , vy , vz defined by the linear combination of v in
terms of the three linearly independent vectors are unique. Suppose vx1 , vy1 , vz1 and vx2 ,
vy2 , vz2 are two sets of scalars such that both the corresponding linear combinations equal
v. This means

(vx1 − vx2 )î + (vy1 − vy2 )ĵ + (vz1 − vz2 )k̂ = 0.

Since î, ĵ, k̂ are linearly independent, the last equation is satisfied only when
(vx1 − vx2 ) = 0 etc, that is, when vx1 = vx2 , vy1 = vy2 and vz1 = vz2 . Thus, every vector
in R3 corresponds to a unique triplet of scalars (real numbers, motivating the notation
R3 ) once we fix the mutually orthogonal set of vectors î, ĵ, k̂. (e.g., the triplet 0, 0, 0
corresponds to the origin). The set of vectors {î, ĵ, k̂} has two properties: It is a maximal
set of linearly independent vectors (i.e., contains three vectors) and every vector in R3 can
be written as a unique linear combination of this set of vectors. Such a set of vectors is
called a basis. Note that we may add a vector to the set of basis vectors and express every
vector in R3 as a linear combination of this expanded set, but this linear combination can
be written as a linear combination of the basis vectors alone, because the expanded set is a
linearly dependent set of vectors. On the other hand, as we have seen above, given a
linearly independent set smaller than a basis, we can find vectors that are not equal to any
linear combination of vectors from this smaller set. Thus, a basis (that is, a maximal set of
linearly independent vectors) is the minimal set of vectors required to span the space.
Further, there are infinitely many possible bases, as we can choose infinitely many sets of three
mutually orthogonal vectors and each of them can be a basis, defining the corresponding
coordinate system and the corresponding linear combinations for the vectors in R3 . For
different bases (coordinate systems), the linear combinations of basis vectors which equal
a given vector are different, resulting in different coordinates for the same vector in
different coordinate systems. A basis comprising three mutually orthogonal unit vectors is
called an orthonormal basis.
Exercise Let a = ∑_{k=1}^{3} αk îk and b = ∑_{k=1}^{3} βk îk with respect to the same
orthonormal basis.
Exercise If any subset of a set of vectors is linearly dependent, then show that the whole
set is linearly dependent.
Solution Let {xi , i = 1, . . . , k} out of {xi , i = 1, . . . , n}, k < n, be linearly dependent, so that
∑_{i=1}^{k} αi xi = 0 with not all αi = 0. Consider ∑_{i=1}^{k} αi xi + ∑_{j=k+1}^{n} 0 · xj = 0, which is
a linear combination of all the n vectors equated to zero such that not all the coefficients
equal zero. Therefore, the whole set is linearly dependent.
From this result we conclude that every subset of a linearly independent set of vectors is
linearly independent. Thus, any three linearly independent vectors have to be
non-coplanar, which in turn ensures that no two of them are collinear and hence no two
of them are linearly dependent.
a · b = |a||b| cos θ = ab cos θ,

where a = |a| (b = |b|) is the magnitude of a (b) and θ is the angle between the directions of a
and b. (To get this angle, we have to base both the vectors at the same point.) Note that
the scalar product has different signs for θ < π/2 and θ > π/2. We can always take
θ < π by choosing which direction is to be rotated counterclockwise towards which. If
one of the two vectors (say b̂) is a unit vector, then a · b̂ is the projection of a on the
direction defined by b̂. Thus, the scalar product is the product of the projection of a on
the direction defined by b̂ with the magnitude of b which is the same as the product of the
projection of b on the direction defined by â with the magnitude of a. This demonstrates
the obvious symmetry of the result
a · b = b · a.
This shows that the dot product is commutative (see Fig. 1.15).
The magnitude of a is also called the norm of a and denoted ||a||. Note that a · a = a² =
||a||², so that

||a|| = +√(a · a).
If a and b are parallel or antiparallel, their scalar product evaluates to ±ab respectively. In
particular, a · a = a².
If a and b are orthogonal,
π
a · b = ab cos = 0.
2
Thus, the scalar product of two orthogonal vectors vanishes. Conversely, a · b = 0 does not
necessarily imply either a = 0 or b = 0.
The inverse of a non-zero vector a with respect to the dot product is

a⁻¹ = a/|a|²,

because a⁻¹ · a = 1 = a · a⁻¹. We will denote by a⁻¹ a vector like a/|a|² even if it does not
occur as a factor in a dot product.
Exercise Let a and b be two non-zero non-parallel vectors. Show that

c = a − (a · b/|b|²) b = a − (a · b) b⁻¹

is perpendicular to b. The vector c is called the component of a perpendicular to b.
Solution Note that c ≠ 0, otherwise a will be proportional to b, contradicting the
assumption that they are not parallel. Now check that b · c = 0.
v = vx î + vy ĵ + vz k̂. (1.6)
Dotting both sides with î, ĵ, k̂ successively and using orthonormality of the basis (Eq.
(1.5)), we get,
vs = v · n̂ ; s = x, y, z ; n̂ = î, ĵ, k̂. (1.7)
Direction cosines
Given a vector v and an orthonormal basis î1 , î2 , î3 we define the quantities

ξk = (v · îk )/||v|| = vk /||v|| = v̂ · îk , k = 1, 2, 3.
If α1 , α2 , α3 are the angles made by the direction of v with î1 , î2 , î3 respectively, then
ξk = cos αk , k = 1, 2, 3.
ξk are called the direction cosines of the vector v with respect to the orthonormal basis
î1 , î2 , î3 . Direction cosines unambiguously specify the direction of a non-zero vector. In
particular, two or more vectors having the same direction cosines with respect to some
orthonormal basis have the same directions. The only vector with all the direction cosines
zero, and hence having no direction, is the zero vector. Note that the coordinates of a unit
vector are its direction cosines:

v̂ = cos α1 î1 + cos α2 î2 + cos α3 î3 ,

where α1 , α2 , α3 are the angles made by v̂ with î1 , î2 , î3 respectively, or with the positive
directions of the x, y, z-axes respectively.
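A small sketch computing direction cosines for an illustrative vector; note that the squares of the direction cosines always sum to 1:

```python
import math

def direction_cosines(v):
    """Direction cosines of a non-zero vector v with respect to the
    standard orthonormal basis: the coordinates of the unit vector v/|v|."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

xi = direction_cosines((1.0, 2.0, 2.0))
assert xi == (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
assert abs(sum(x * x for x in xi) - 1.0) < 1e-12   # cos^2 a1 + cos^2 a2 + cos^2 a3 = 1
```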
Distributive property
The dot product is distributive, that is,
(α1 a1 + α2 a2 ) · b = α1 a1 · b + α2 a2 · b.
This is seen from Fig. 1.16 where the projection of (α1 a1 + α2 a2 ) on b equals the sum of
the projections of α1 a1 and α2 a2 on b. From Fig. 1.16 we get,
α1 a1 · b + α2 a2 · b = (α1 a1 · b̂ + α2 a2 · b̂)|b|
= ((α1 a1 + α2 a2 ) · b̂)|b|
= (α1 a1 + α2 a2 ) · b.
We can use the distributive property of the dot product to express it in terms of the
coordinates of the factors with respect to an orthogonal Cartesian coordinate system.
Thus, let {x1 , x2 , x3 } and {y1 , y2 , y3 } be the coordinates of vectors a and b with respect to
an orthogonal Cartesian coordinate system. Then we have,

a · b = x1 y1 + x2 y2 + x3 y3 ,
where we have used the distributive property and Eq. (1.5), that is, orthonormality of the
basis. This is the desired result. Note that for unit vectors â and b̂, we can replace the LHS
of this equation by cos θ, θ being the angle between â and b̂, and their coordinates by their
respective direction cosines, say, (λ1 , µ1 , ν1 ) and (λ2 , µ2 , ν2 ). Thus, we get
cos θ = λ1 λ2 + µ1 µ2 + ν1 ν2 .
This equation expresses the well known relation in Solid Geometry that the cosine of the
angle between two straight lines equals the sum of the products of the pairs of cosines
of the angles made by the straight lines with each of the three (mutually perpendicular)
coordinate axes.
Exercise (law of cosines) Consider triangle ABC. We denote by A, B, C the angles
subtended at the vertices A, B, C respectively. Let a, b, c be the lengths of the sides opposite
to the vertices A, B, C respectively (see Fig. 1.10). Show that
c² = a² + b² − 2ab cos C.
Exercise Show that the lines joining a point on a sphere to two diametrically opposite
points are perpendicular (see Fig. 1.17). Solution Let p and q be the position vectors of the
diametrically opposite points and r that of the point on the sphere, all based at the center,
so that |r|², |p|², |q|² are each equal to the square of the radius and q = −p. Consequently,

(r − p) · (r − q) = (r − p) · (r + p) = |r|² − |p|² = 0.
Polar coordinates
Let us find a way to get the coordinates vx = v · î, vy = v · ĵ, vz = v · k̂ of a vector v based
at the origin of a dextral rectangular Cartesian coordinate system. You have to refer to
Fig. 1.18 to understand whatever is said until Eq. (1.8). Let v make angle θ with the positive
direction of the z axis. This angle is called the polar angle. Take the projection of v on the
x − y plane and call the resulting vector vp . Let the angle made by vp with the positive
direction of the x axis be φ. This angle is called the azimuthal angle. The magnitude vp
of vp is v cos(π/2 − θ ) = v sin θ. Project vp on the x axis to get vx = vp cos φ = v sin θ cos φ.
Project vp on the y axis to get vy = vp cos(π/2 − φ) = vp sin φ = v sin θ sin φ. Now project v
on z axis to get vz = v cos θ. Thus the equation,
Fig. 1.17 Lines joining a point on a sphere with two diametrically opposite points are
perpendicular
v = vx î + vy ĵ + vz k̂
can be written as

v = v (sin θ cos φ î + sin θ sin φ ĵ + cos θ k̂). (1.8)

If we use in Eq. (1.8) the unit vector v̂ specifying the direction of v we get

v̂ = sin θ cos φ î + sin θ sin φ ĵ + cos θ k̂,
since |v̂| = 1. This equation tells us that a direction in space is completely specified by fixing
the values of two parameters, namely, the polar angle θ and the azimuthal angle φ.
Exercise Show that all points on the unit sphere centered at the origin are scanned by
varying 0 ≤ θ ≤ π and 0 ≤ φ < 2π.
Solution Variation of φ over its range for a fixed value of θ traces out a circle on the unit
sphere. As θ is varied over its range, this circle, starting from the north pole, moves over
the whole sphere to reach the south pole.
This exercise shows that all directions passing through a point are spanned as θ and φ vary
over their ranges.
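The parametrization of directions by θ and φ, and the recovery of the angles from the components, can be sketched as:

```python
import math

def unit_from_angles(theta, phi):
    """Unit vector with polar angle theta and azimuthal angle phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

v = unit_from_angles(math.pi / 3, math.pi / 4)
assert abs(sum(x * x for x in v) - 1.0) < 1e-12     # lies on the unit sphere

# the angles are recovered from the components
theta = math.acos(v[2])
phi = math.atan2(v[1], v[0])
assert abs(theta - math.pi / 3) < 1e-12 and abs(phi - math.pi / 4) < 1e-12
```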
Cauchy–Schwarz inequality
In R3 the Cauchy–Schwarz inequality is almost obvious. For any two vectors x and y we have
|x · y| ≤ ||x|| ||y||,
Properties (i), (ii) and (iv) are obvious from the definition of d (x, y). We need to prove
property (iii). Here is the proof.
We observe that
||x + y||² = (x + y) · (x + y) = ||x||² + 2 x · y + ||y||².
The last expression can now be tamed using the Schwarz inequality, so that
||x + y||² ≤ ||x||² + 2 ||x|| ||y|| + ||y||² = (||x|| + ||y||)²,
giving the triangle inequality ||x + y|| ≤ ||x|| + ||y||.
Hint These results follow directly from the previous exercise in which you proved law of
cosines.
Exercise Show that (a) |a + b| ≤ |a| + |b| and (b) |a − b| ≥ |a| − |b|.
Solution Part (a) is simply the statement of triangle inequality for the triangle formed
by the vectors a, b, a + b. To get (b), we write |a| = |(a − b) + b| and apply (a) to get
|a − b| + |b| ≥ |a|, or, |a − b| ≥ |a| − |b|. If |a| > |b|, (b) follows. Otherwise interchange a
and b.
A distance function obeying conditions (i)–(iv) above is called a metric. The distance
between two vectors, as we have defined via Eq. (1.9), is called the Euclidean metric. In
3-D space, it follows from its definition that the curve with minimum Euclidean distance
joining two points is a straight line. Given a smooth surface in 3-D space (see
section 10.12), the curve with ‘shortest distance’ joining two points on the surface is
constrained to lie wholly on the surface. This restriction does not allow, in general, the
curve with the shortest distance on a surface to be a straight line. However, given a smooth
surface S, we can find a unique curve with shortest distance joining two distinct points on
the surface, called a geodesic on S. Thus, if we stretch a thread between two points on a
sphere S then this thread will lie along a great circle joining these two points and this is a
geodesic on the sphere.
W = (F cos θ )d = Fd cos θ = F · d.
Fig. 1.20 Work done on an object as it is displaced by d under the action of force F
Exercise A horse tows a barge along a towpath, walking at 1 m/s. The tension in the rope
is 300 N and the angle between the rope and the walk direction is 30°. How much work is
done by the horse per second? (That is, find the power produced by the horse.)
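A sketch of the power computation for this exercise, taking the walk direction as the x axis (only the tension component along the walk does work):

```python
import math

def power(force, velocity):
    """Rate of doing work, P = F . v."""
    return sum(f * u for f, u in zip(force, velocity))

T, ang = 300.0, math.radians(30.0)
F = (T * math.cos(ang), T * math.sin(ang), 0.0)   # rope tension
v = (1.0, 0.0, 0.0)                               # horse's walking velocity
P = power(F, v)
assert abs(P - T * math.cos(ang)) < 1e-9          # about 259.8 W
```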
When the work is done on the object, its energy is increased. This energy may be kinetic
(if the object accelerates) or potential (e.g., energy stored due to the change of position) or
it may be dissipated while doing work against frictional (dissipative) forces. Thus energy,
in whatever form, is a scalar quantity. In many cases the potential energy is written as the
scalar product of vector quantities. Examples are the potential energy of an electric dipole
p in an electric field E, (see Fig. 1.21)
V = −p · E,

and the potential energy of a magnetic dipole of moment µ in a magnetic field B,

V = −µ · B.
W = 2F (L/2) = BiL²
Alternatively, we can use the expression of the potential energy involving the magnetic
moment µ. We refer to Fig. 1.22(c). The change in the potential energy V is related to the
work done on the loop by
The magnitude of the magnetic moment µ is given by µ = iA, where A (= L² here) is the area of
the loop, and its direction is perpendicular to the loop as shown in Fig. 1.22(c). µ starts
perpendicular to B and finishes parallel to B. This gives V (initial) = 0 and
V (final) = −µB = −iAB, and we have

W = 0 − (−iAB) = BiL²
Since both the expressions for W agree, we have better confidence in the formula
V = −µ · B.
b × a = −a × b,
because if we rotate a right handed screw from b to a it advances in the direction opposite
to that in which it advances when rotated from a to b.
Fig. 1.23 Vector product of a and b : |a × b| = |a||b| sin θ is the area of the
parallelogram as shown
shows that a × b = 0 if and only if a ≠ 0 and b ≠ 0 are proportional to each other, that is,
are linearly dependent.
The vector product is not associative as can be seen from

(a × a) × b ≠ a × (a × b)

with a ≠ 0, b ≠ 0, as the LHS is always zero, while the RHS is never zero unless b = αa. The
RHS is a vector in the plane of a, b with magnitude a²b sin θ (θ: angle between a and b).
The vector product is distributive, that is,
a × (b + c) = a × b + a × c,
and (a + b) × c = a × c + b × c.
b × a = −a × b. (1.10)
Exercise Show that (a) (a · b)² + (a × b)² = a²b² and (b) (a · b)² − (a × b)² = a²b² cos 2θ,
where θ is the angle between a and b. Part (a) immediately leads to the Cauchy–Schwarz
inequality,
|a · b| ≤ |a| |b|
with an additional piece of information that equality holds if and only if the vectors a and
b are linearly dependent.
Exercise If a⊥ and b⊥ are the components of a and b perpendicular to a vector c then
show that (a) a × c = a⊥ × c and (b) (a + b) × c = (a⊥ + b⊥ ) × c.
Solution Note that c, a and a⊥ are coplanar with a and a⊥ on the same side of c (Draw a
figure) and a⊥ ×c and a×c have the same direction. Let θ be the angle between a and c and
let the angle between a and a⊥ be φ. Note that θ + φ = π/2. Therefore, for the magnitudes,
we get
a⊥ = a cos φ = a sin θ,
leading to |a × c| = |a⊥ × c| so that (a) is proved. To get (b), note that a⊥ + b⊥ is the
component of a + b perpendicular to c and apply (a).
Consider an orthonormal basis î, ĵ, k̂ forming a right handed coordinate system. From
the definitions of the vector product and a right handed coordinate system it immediately
follows that
î × î = ĵ × ĵ = k̂ × k̂ = 0
and
î × ĵ = −ĵ × î = k̂,

ĵ × k̂ = −k̂ × ĵ = î,

k̂ × î = −î × k̂ = ĵ. (1.11)
Note that we can obtain the second and the third equations above from the first by cyclically permuting the vectors î, ĵ, k̂, i.e., by simultaneously changing î ↦ ĵ, ĵ ↦ k̂, k̂ ↦ î. This useful property holds for any vector relation involving an orthonormal basis.
For a left handed coordinate system the vectors î × ĵ, ĵ × k̂, k̂ × î are in directions opposite to the basis vectors k̂, î, ĵ respectively. Therefore, the equations in (1.11) change to
î × ĵ = −k̂
ĵ × k̂ = −î
k̂ × î = −ĵ. (1.12)
Equations (1.11) and (1.12) are often taken to be the definitions of the right handed and
the left handed coordinate systems respectively.
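These sign patterns are easy to verify numerically. A small sketch in plain Python, using the standard component formula for the vector product (the helper and the basis tuples are our own choices):

```python
def cross(u, v):
    # Component formula for the vector product of two 3-vectors.
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Right handed triad: each product gives the next basis vector cyclically, Eq. (1.11).
assert cross(i, j) == k
assert cross(j, k) == i
assert cross(k, i) == j

# For the left handed triad (i, j, m) with m = -k, the product i x j points
# opposite to the third basis vector, which is the sign pattern of Eq. (1.12).
m = (0, 0, -1)
assert cross(i, j) == tuple(-c for c in m)
```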
Exercise Prove that

a × b = (a2 b3 − a3 b2 ) σ̂1 − (a1 b3 − a3 b1 ) σ̂2 + (a1 b2 − a2 b1 ) σ̂3 ,

where each coefficient is the 2 × 2 determinant formed from the corresponding components of a and b.
This expression for the vector product in component form contains no easily accessible
information about the magnitude and the direction of the vector product a × b. Also, it
depends on the coordinate system used as the components of the factors change if we use
another orthonormal basis, (that is, another coordinate system). On the other hand,
expressions involving vectors (and not their components) are invariant under the change
of coordinate system and each term in them has the same value in all coordinate systems.
Thus, if we can model a physical situation or a process using vectors and expressions
involving vectors alone, we are free of the limitation of viewing the process with reference
to a particular coordinate system and of extra baggage of transforming the expressions
from one coordinate system to the other as and when required. The most important
advantage of vectors is this coordinate-free approach they offer. In this book we will
exclusively follow this coordinate-free approach, although we will spend some time with
some of the important coordinate systems.
The components of a × b with respect to an orthonormal basis î, ĵ, k̂ (and the
corresponding coordinate system) can be expressed more conveniently in the form
          | î   ĵ   k̂  |
a × b =   | ax  ay  az | .
          | bx  by  bz |
Exercise (Law of sines) Refer to the exercise where you are asked to prove the law of cosines for a triangle ABC and Fig. 1.10. Prove Eq. (1.14).
Solution Take the vector product of c = a + b successively with vectors a,b,c to get
a × c = a × b = c × b.
Equating the magnitudes of these vectors and dividing by abc gives a relation true for any
triangle,
sin A / a = sin B / b = sin C / c . (1.14)
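The key step, that a × c, a × b and c × b all have the same magnitude when c = a + b, can be checked directly; dividing that common magnitude by the product of the side lengths then yields Eq. (1.14). A small sketch (the triangle and the helpers are our own arbitrary choices):

```python
import math

def norm(u):
    # Euclidean length of a 3-vector.
    return math.sqrt(sum(c * c for c in u))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

# Sides of a triangle as vectors, with c = a + b.
a = (3.0, 0.0, 0.0)
b = (1.0, 2.0, 0.0)
c = tuple(ai + bi for ai, bi in zip(a, b))

# a x c = a x b = c x b, so the three magnitudes coincide.
m1, m2, m3 = norm(cross(a, c)), norm(cross(a, b)), norm(cross(c, b))
assert math.isclose(m1, m2) and math.isclose(m2, m3)
```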
If we reflect a vector in the origin, it is expected to change sign. A vector which changes
sign under reflection in the origin (called inversion) is called a polar vector. However, this
change of sign under inversion is not carried over to the vector product of two polar vectors.
That is, if a and b are polar vectors then their vector product does not change sign under
inversion of both a and b. Due to this property a vector product of two polar vectors is
called a pseudo vector or an axial vector.
figure from which the description of its contour appears counterclockwise. This marks an
important and useful generalization of the geometrical interpretation of a vector product.
Geometric interpretation of the coordinates of the vector product
Let a ≡ (a1 , a2 , a3 ) , b ≡ (b1 , b2 , b3 ) be two non-zero vectors with a non-zero vector
product. The individual Cartesian components of the vector product (a × b) ≡
(z1 , z2 , z3 ) have a geometrical interpretation related to that of (a × b) itself (see
Fig. 1.25). We have,
       | a1  b1 |
z3 =   | a2  b2 | = a1 b2 − a2 b1 . (1.15)
However, the right side of this equation is the magnitude of the vector product of the
vectors with Cartesian components (a1 , a2 , 0) and (b1 , b2 , 0), so that its absolute value
|z3 | = |a1 b2 − a2 b1 | must equal the area of the parallelogram spanned by these vectors.
The sign of z3 is determined by the direction of the corresponding vector product:
Whether it is in the positive or negative direction of the z-axis. Now the vectors (a1 , a2 , 0)
and (b1 , b2 , 0) are simply the projections of the vectors a and b on the xy plane. Thus, |z3 |
is the area of the parallelogram obtained by projecting the parallelogram spanned by the
vectors a and b on the xy plane. (see Fig. 1.25). Similarly, |z1 | and |z2 | are the areas of the
projections of the parallelogram spanned by the vectors a, b on the yz and xz planes
respectively. If α1 , α2 , α3 are the angles made by the direction of the vector a × b with the positive directions of the x, y, z-axes respectively, then

z1 = |a × b| cos α1 , z2 = |a × b| cos α2 , z3 = |a × b| cos α3 .
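This projected-area reading of the components can be checked directly: z3 equals the 2 × 2 determinant a1 b2 − a2 b1 of the xy projections. A small sketch (vectors chosen arbitrarily):

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

a = (1.0, 2.0, 3.0)
b = (4.0, 5.0, 6.0)
z = cross(a, b)

# Project a and b onto the xy plane and take the vector product: its z
# component reproduces z3, the signed area of the projected parallelogram.
a_xy, b_xy = (a[0], a[1], 0.0), (b[0], b[1], 0.0)
assert cross(a_xy, b_xy)[2] == z[2] == a[0]*b[1] - a[1]*b[0]
```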
M = r × F.
To get the direction of M from the vector product we must base r and F at the same point
and take θ < π. In fact, the definition of torque in terms of the vector product is completely
general. The torque about any axis, not necessarily perpendicular to the plane containing r
and F is given by the component of M in the direction of the axis.
The next important physical quantity defined by the vector product is angular
momentum. A particle of mass m moving with velocity v has the angular momentum L
about the origin given by
L = mr × v = r × p,
F = qv × B.
3 A rigid body is one for which the distance between every pair of its particles remains invariant throughout its motion. Thus, there cannot be any relative motion between different parts of a rigid body and it cannot be deformed by applying external forces. The motion of a rigid body is composed solely of its translation and rotation as a whole. Of course, an ideal rigid body is a fiction; however, in many situations we can approximate the motion of a solid body by that of a perfectly rigid body to get the required characteristics of the actual motion.
T = p × E.
This can be easily understood by taking the dipole as two charges +q and −q separated
by a small distance 2d as in Fig. 1.21. The force on each charge has a magnitude qE. The
resulting torque is given by
M = T = 2d × (qE) = 2qd × E.
T = µ × B.
a · b × c = c · a × b = b · c × a.
Thus, the scalar triple product is invariant under the cyclic permutation of its factors, a → b → c → a. For example, note that
a · a × b = b · a × a = 0. (1.16)
In fact, while keeping the cyclic order if we change the · and the × in the triple product, its
value remains the same. For example,
a · b × c = c · a × b = a × b · c,
where the last equality follows because the scalar product of two vectors is independent of
the order of the vectors. Thus, the scalar triple product depends only on the cyclic order
abc and not on the position of · and × in the product. The sign of the scalar triple product
is reversed if the cyclic order is broken by permuting two of the vectors.
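All three properties, cyclic invariance, the sign flip under a transposition, and the interchangeability of · and ×, can be checked numerically. A minimal sketch (helpers and vectors are our own illustrative choices):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def triple(a, b, c):
    # Scalar triple product a . (b x c).
    return dot(a, cross(b, c))

a, b, c = (1.0, 0.0, 2.0), (0.0, 3.0, 1.0), (2.0, 1.0, 0.0)

t = triple(a, b, c)
assert t == triple(c, a, b) == triple(b, c, a)   # cyclic invariance
assert triple(b, a, c) == -t                     # transposing two factors flips the sign
assert t == dot(cross(a, b), c)                  # . and x may be interchanged
```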
Fig. 1.27 Geometric interpretation of the scalar triple product (see text)
Exercise Show that a · b × c = 0 if and only if the vectors a, b, c are coplanar, that is, are
linearly dependent.
Answer a·b×c = 0 if and only if the volume of the corresponding parallelepiped is zero,
if and only if a, b, c are coplanar.
Suppose that a, b, c are mutually orthogonal vectors forming a left handed system. Then,
the signs of a and b × c will be opposite and the value of a · b × c will be negative. The
same conclusion applies even if a, b, c are not mutually orthogonal, provided b × c makes an obtuse angle with a. In this case, the negative sign is interpreted as the negative
orientation of the volume of the parallelepiped formed by the vectors a, b, c and their
scalar triple product is said to equal the volume of their parallelepiped having negative
orientation. Thus, in general, a scalar triple product is said to equal the oriented volume of
the parallelepiped formed by its factors. The fact that the transition from a right handed to
left handed system (or vice versa) changes the sign of the scalar triple product is expressed
by saying that the scalar triple product is not a genuine scalar (whose value is invariant under any transformation of the basis) but a pseudo-scalar. The right handed ↔ left handed transition can be carried out by reflecting all the basis vectors in the origin. In fact, a
scalar triple product changes sign under the reflection of all of its factors (which form a
basis unless its value is zero) in the origin: −a · (−b × −c) = −a · b × c.
Exercise The scalar triple product can also be geometrically interpreted as the volume
of a tetrahedron. Consider a tetrahedron OABC with one of its vertices at the origin O
(see Fig. 1.28). Show that its volume is given by (1/6)[a · (b × c)] where all the vectors are as defined in Fig. 1.28.
Solution The volume of the tetrahedron is one third of the base area times the height:

V = (1/3) · (1/2)|b × c| · |a| cos θ = (1/6)[a · (b × c)].
Exercise Let a, b, c be non-coplanar. For an arbitrary non-zero vector d show that

[a · (b × c)] d = (d · c)(a × b) + (d · a)(b × c) + (d · b)(c × a).

Hint First note that the vectors a × b, b × c, c × a are non-coplanar because their scalar triple product is not zero. Therefore, these vectors form a basis in which an arbitrary vector d can be expanded. The coefficients in this expansion are determined by taking its scalar product successively with c, a and b.
Exercise Express the scalar triple product in its component form,

             | a1  a2  a3 |
a · b × c =  | b1  b2  b3 | .
             | c1  c2  c3 |
Exercise Show that the area of the parallelogram spanned by a, b, namely, |a × b|, can be expressed by

                                                | a · a   a · b |
|a × b|² = (a · a)(b · b) − (a · b)(b · a) =    | b · a   b · b | · (1.17)
Hint Treat the rows and columns forming the determinants of factors as matrices, and
find the determinant of the product of matrix of one factor and the transpose of the matrix
of the other factor. This works because the determinant of the product of matrices is the
product of their determinants and the determinant of a matrix is invariant under transpose
of that matrix.
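Eq. (1.17), the Gram-determinant form of the parallelogram area, can be verified numerically. A minimal sketch (vectors and helpers are our own arbitrary choices):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

a = (2.0, -1.0, 3.0)
b = (1.0, 4.0, 0.0)

# |a x b|^2 equals the 2x2 Gram determinant of a and b, Eq. (1.17).
lhs = dot(cross(a, b), cross(a, b))
gram = dot(a, a) * dot(b, b) - dot(a, b) * dot(b, a)
assert lhs == gram
```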
Let us now prove that the vector product is distributive. Let {a, b, c} be three arbitrary
vectors and let x̂ be an arbitrary direction. Using the fact that the scalar triple product is
invariant under the cyclic permutation of its factors, we can write
x̂ · a × (b + c) = (b + c) · (x̂ × a)
= b · (x̂ × a) + c · (x̂ × a)
= x̂ · (a × b + a × c).

Since x̂ is an arbitrary direction, the components of a × (b + c) and of a × b + a × c along every direction coincide, so that
4 You are now advised to read the appendix on matrices and determinants, which will be used in the rest of the book.
a × (b + c) = a × b + a × c,
(a + b) × c = a × c + b × c
R = n1 a1 + n2 a2 + n3 a3
where {n1 , n2 , n3 } are integers. The whole lattice is given by the set of vectors {R}
generated by giving the triplet {n1 , n2 , n3 } all possible integer values. Note that the volume
of a primitive cell is given by the scalar triple product a1 · a2 × a3 or any of its cyclic
permutations.
Consider a set of points {R} constituting the Bravais lattice of a crystal in which a plane wave e^{ik·r} is excited. Here, k is a wave vector and r is an arbitrary point in the crystal. We seek the set of wave vectors {K} for which the plane wave excitation has the same periodicity as the Bravais lattice of the crystal, that is,

e^{iK·(r+R)} = e^{iK·r} ,

which means

e^{iK·R} = 1 (1.18)
for all {R} in the Bravais lattice. The set of vectors {K} satisfying Eq. (1.18) is called the
reciprocal lattice of the given Bravais lattice. The corresponding Bravais lattice is called the
direct lattice.
If K1 , K2 satisfy Eq. (1.18), then so do their sum and difference, which simply means
that the set of reciprocal vectors form a Bravais lattice. We show that the primitive vectors
of the reciprocal lattice are given by
b1 = 2π (a2 × a3 ) / (a1 · a2 × a3 ),

b2 = 2π (a3 × a1 ) / (a1 · a2 × a3 ),

b3 = 2π (a1 × a2 ) / (a1 · a2 × a3 ). (1.19)
Consider an arbitrary wave vector expanded in the basis {b1 , b2 , b3 },

k = k1 b1 + k2 b2 + k3 b3 ,

and a direct lattice vector

R = n1 a1 + n2 a2 + n3 a3 .

Since Eq. (1.19) gives bi · aj = 2π δij , we get

k · R = 2π (k1 n1 + k2 n2 + k3 n3 ).
We conclude from this equation that e^{ik·R} is unity for all R only when the coefficients {k1 , k2 , k3 } are integers (i.e., when k · R is an integral multiple of 2π). Thus, we must have,
for a reciprocal lattice vector K,
K = k1 b1 + k2 b2 + k3 b3 ,
where {k1 , k2 , k3 } are integers. Thus, the reciprocal lattice is a Bravais lattice and the bi can be taken to be its primitive vectors. The bi form the adjacent sides of a parallelepiped which is the primitive cell of the reciprocal lattice.
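The defining property bi · aj = 2π δij, which follows directly from Eq. (1.19), can be checked numerically. A minimal sketch; the direct-lattice primitive vectors below are our own deliberately non-orthogonal choice, not from the text:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

# Primitive vectors of a (non-orthogonal) direct lattice; arbitrary sample values.
a1, a2, a3 = (1.0, 0.0, 0.0), (0.5, 1.0, 0.0), (0.0, 0.3, 1.2)

vol = dot(a1, cross(a2, a3))      # primitive cell volume a1 . (a2 x a3)

def recip(u, v):
    # 2*pi (u x v) / (a1 . a2 x a3), as in Eq. (1.19).
    return tuple(2.0 * math.pi * c / vol for c in cross(u, v))

b1, b2, b3 = recip(a2, a3), recip(a3, a1), recip(a1, a2)

# bi . aj = 2*pi*delta_ij, hence K.R is an integral multiple of 2*pi
# whenever K has integer coefficients in the {b1, b2, b3} basis.
for i, bi in enumerate((b1, b2, b3)):
    for j, aj in enumerate((a1, a2, a3)):
        target = 2.0 * math.pi if i == j else 0.0
        assert math.isclose(dot(bi, aj), target, abs_tol=1e-12)
```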
Apart from the enormous variety of situations in which the scalar triple product makes its appearance, it is an important tool for the development of the theory of vector operators,
as we shall see in the next chapter. Also, the scalar triple product is the basis of a new and
powerful notation for vector algebra and calculus, namely the Levi-Civita symbols (see
section 1.11).
ϕ (a ◦ b ) = ϕ (a) × ϕ (b ). (1.20)
In other words, the image of the product of a and b in S1 is the product of their images
ϕ (a) and ϕ (b ) in S2 .
Example Consider (Z, +) and ({1, −1}, ·), where Z is the set of integers and + is the usual addition on it, while · is the usual multiplication on the two element set {1, −1}. Define a map ϕ by
ϕ (n) = (−1)^n ,

that is, ϕ (n) = +1 if n is even and ϕ (n) = −1 if n is odd. Since (−1)^{m+n} = (−1)^m (−1)^n , the map ϕ takes the sum of two integers to the product of their images, as required by Eq. (1.20).
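A homomorphism of this simple kind can be checked by brute force over a small range of integers; a minimal sketch (the range is an arbitrary choice):

```python
def phi(n):
    # The map phi(n) = (-1)**n from (Z, +) to ({1, -1}, *).
    return (-1) ** n

# Homomorphism property: phi(m + n) = phi(m) * phi(n) for all integers m, n.
checked = all(phi(m + n) == phi(m) * phi(n)
              for m in range(-5, 6) for n in range(-5, 6))
assert checked
```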
Let us choose an orthonormal basis {î, ĵ, k̂} based at some origin O and the
corresponding Cartesian coordinate system. We now use it to assign the Cartesian
coordinates to all points in space. This procedure assigns a unique triplet of real numbers
to every point in space. In fact the set of all triplets of real numbers can be identified with
the set of all points in space. We call this set of triplets R3 . Although, R3 stands for the set
of all real number triplets, its one to one correspondence with the set of points in real
space justifies naming real space by R3 as we have been doing until now.
We have already established a one to one correspondence between the set of vectors
(representations of values of vector quantities in space) and R3 , (for a given basis), using
the fact that every vector can be written as a unique linear combination of the basis vectors
(see sections 1.3 and 1.6). Here, we want to show something more. First, we define the addition in R3 componentwise:

(a1 , a2 , a3 ) + (b1 , b2 , b3 ) = (a1 + b1 , a2 + b2 , a3 + b3 ).
Consider two vectors a and b with coordinates {a1 , a2 , a3 } and {b1 , b2 , b3 } respectively. This
means
a = a1 î + a2 ĵ + a3 k̂
b = b1 î + b2 ĵ + b3 k̂.
Using the distributive law for the multiplication of vectors by scalars and the commutativity and the associativity of the vector addition we can write

a + b = (a1 + b1 )î + (a2 + b2 )ĵ + (a3 + b3 )k̂.

Thus, the coordinates of a + b are simply the sums of the corresponding coordinates of a and b. Then, we have the following association.
of b. Then, we have the following association.
a ↔ (a1 , a2 , a3 ),
b ↔ (b1 , b2 , b3 ),
a + b ↔ (a1 + b1 , a2 + b2 , a3 + b3 ). (1.21)
Let us define the scalar product in R3 as the product of a 1 × 3 and a 3 × 1 matrix (a row vector and a column vector),

[a1 a2 a3 ][b1 b2 b3 ]ᵀ = Σᵢ ai bi , (1.22)
where the superscript T denotes the transpose of a matrix. Thus, we see that the
correspondence Eq. (1.21) preserves the scalar product of vectors.
Thus, we have a one to one map between the two sets: The set of vectors (whose
elements are the ‘values’ of one or more vector quantities) and R3 , (whose elements are
the triplets of real numbers) which preserves the addition on individual sets in the sense of
Eq. (1.20). Thus, the one to one map defined by Eq. (1.21) is an isomorphism between
these two sets. Two isomorphic sets are algebraically identical and it is enough to study
only one of them. Even the scalar and vector products can be expressed and processed in
terms of the components of vectors, which are triplets of real numbers. So you may come
up with the idea that we can just do away with the set of vectors and do everything using
the set of triplets of real numbers, namely R3 . This will free us from dealing with vectors
altogether. A nice idea, but it has the following problem. At the end of section 1.3 we saw
that the one to one correspondence between vectors and R3 depends on the origin and the
basis chosen. There is a different isomorphism for each possible origin and each possible
basis because a change in the basis/origin changes the coordinates of every vector (see
Fig. 1.4). Since there could be uncountably many origins and bases, there are uncountably
many isomorphisms possible between the set of vectors and R3 . It is then impossible to keep track of which isomorphism is being used and to transform between them. On the
other hand, the coordinate free approach, in which we directly deal with the set of vectors,
frees us from this problem of keeping track of bases and transforming between them. It
also enables us to reach conclusions that are independent of any particular basis or the
coordinate system. Thus, the coordinate-free approach turns out to be more fruitful in many
applications. On the other hand, an intelligent choice of the coordinate system, basically
guided by the symmetry in the problem, can drastically reduce the algebra and can
sharpen the understanding of the physics of the situation. Therefore, a judicious choice
between these methods, depending on the problem, turns out to be rewarding.
A set V and the associated set of scalars S, with the operations of addition and scalar
multiplication defined on V , which have all the properties of vector addition and scalar
multiplication as listed in section 1.4, is called a linear space. If, in addition, we define a
scalar product and the resulting metric (a distance function giving distance between every
pair of elements), then it is called a metric space. Thus, a set of vectors is a metric space
with a Euclidean metric. Let us call the 3-D space comprising all vectors (that is, all values
of one or more vector quantities) E3 . Both E3 and R3 are metric spaces with Euclidean
metric (see Eq. (1.9) and the exercise following it). If a subset of a metric space is closed
under addition, that is, the addition of every pair of vectors in the subset gives a vector
in the same subset, then such a subset is a metric space in its own right and is called a
subspace of the parent metric space. A basis in a subspace can always be extended to that
of the whole space. The dimension of a subspace is always ≤ that of the whole space. Thus,
for example, a set of planar vectors (a plane) and a set of vectors on a straight line (a straight
line) are the 2-D and 1-D subspaces of E3 (R3 ) respectively.
Since R3 and E3 are isomorphic linear spaces, they can be used interchangeably in all
contexts. However, we will basically refer to the space E3 as we intend to deal directly with
the vectors, although we will make judicious use of R3 as well (when we operate by matrices
on 3-D column vectors comprising the coordinates of vectors, see the next chapter).
Exercise Any set on which addition and scalar multiplication operations (with all the
properties stated in section 1.4) are defined, is a linear space. Show that (i) The set of real
numbers forms a one dimensional linear space where addition of “vectors” is ordinary
addition and multiplication by scalars is ordinary multiplication. (ii) The set of positive
real numbers forms a linear space where addition of vectors is ordinary multiplication and
scalar multiplication is appropriately defined.
Solution (ii) The zero vector is the real number 1. “Multiplication” of the vector a by the scalar λ means raising a to the power λ. Thus, if the addition is denoted by ⊕ and scalar multiplication by ⊙, then

λ ⊙ (a ⊕ b) = (ab)^λ = a^λ b^λ = (λ ⊙ a) ⊕ (λ ⊙ b).
Exercise Verify that the complex numbers form a two dimensional linear space where
the addition is ordinary addition and scalars are real numbers.
The Levi-Civita symbol εijk is defined via the scalar triple product

εijk = î · (ĵ × k̂), (1.23)

where each of the vector variables î, ĵ, k̂ takes values in the orthonormal basis set {1̂, 2̂, 3̂}. Note the one to one correspondence between the index set {i, j, k} and the unit vector variables {î, ĵ, k̂}. Thus, different values for the index string ijk, drawn from the set {1, 2, 3}, uniquely decide the value of εijk by giving the corresponding values to the vector variables î, ĵ, k̂ in Eq. (1.23), drawn from the set {1̂, 2̂, 3̂}.
Exercise Show that the number of strings of length n, such that each symbol in the string is drawn from a set of m symbols, is mⁿ.
Solution Each symbol can be chosen in m independent ways, so n symbols can be chosen in mⁿ independent ways.
In our case, we ask for the number of strings of length 3 made out of three symbols {1, 2, 3}. By the above exercise, there are in all 3³ = 27 such strings, or, in other words, there are 27 symbols εijk in total, which can be explicitly constructed by giving values from the set {1̂, 2̂, 3̂} to the variables î, ĵ, k̂ in Eq. (1.23). By Eqs (1.16) and (1.23), if any two or more of the variables î, ĵ, k̂ have the same value from the set {1̂, 2̂, 3̂}, then εijk = 0. In other words, εijk = 0 whenever any two or more of the indices ijk have the same value.
Exercise Show that exactly 21 of the εijk are zero.

Hint The number of εijk with the indices {i, j, k} all distinct equals the number of permutations of (123), namely 3! = 6.
When all of î, ĵ, k̂ have different values (î ≠ ĵ ≠ k̂), using Eq. (1.23), εijk = ±1 depending
on whether {ijk} is a cyclic permutation of {123} or not. This follows from Eq. (1.11) and
the fact that the scalar triple product changes sign if the cyclic order of its factors is changed
(see subsection 1.8.1). Thus, ε312 = 3̂· (1̂× 2̂) = +1 while ε132 = 1̂· (3̂× 2̂) = −1. Further,
εijk is invariant under the cyclic permutation of its indices because the scalar triple product
defining it is invariant under the cyclic permutation of its factors. εijk can be viewed as a
scalar valued function of three vector variables {î, ĵ, k̂} defined on the set {1̂, 2̂, 3̂}. When
we write all 27 values of εijk as a three dimensional (3 × 3 × 3) array, each element having
three indices, we call it a tensor. εijk is an antisymmetric tensor because all its non-zero
elements change sign under the exchange of two of their indices.
Incidentally, any two of the vector variables say î, ĵ can be used to give an operative
definition of the Kronecker delta symbol δij as
δij = î · ĵ.
This is because whenever î and ĵ pick up different values from the orthonormal set {1̂, 2̂, 3̂},
î · ĵ vanishes, while whenever î and ĵ have the same value î · ĵ is unity. Using this definition
we immediately see that
δji = δij .
Also, we have,
Σ³ⱼ₌₁ δij δjk = δi1 δ1k + δi2 δ2k + δi3 δ3k = δik . (1.24)
The last equality follows because the sum in the middle is unity when i and k have the same
value out of {1, 2, 3}, while it vanishes if i and k have different values.
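Eq. (1.24) can be confirmed by direct summation over all index values; a minimal sketch:

```python
def delta(i, j):
    # Kronecker delta: 1 if the indices agree, 0 otherwise.
    return 1 if i == j else 0

# Eq. (1.24): summing delta_ij * delta_jk over j collapses to delta_ik.
ok = all(sum(delta(i, j) * delta(j, k) for j in range(1, 4)) == delta(i, k)
         for i in range(1, 4) for k in range(1, 4))
assert ok
```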
We will now prove an identity involving Levi-Civita symbols and some of the special
cases of this identity which turn out to be very useful in getting vector identities (see the
next section) and also in the development of vector calculus. This is
             | δil   δim   δin |
εijk εlmn =  | δjl   δjm   δjn | ,
             | δkl   δkm   δkn |
where the elements of the determinant on the right are the Kronecker deltas we already
know. Here, the equality means that the action of the LHS on an expression depending on
the indices {ijk} and {lmn}, (taking values in {1, 2, 3}), is the same as that of the
determinant expression involving Kronecker deltas on the RHS. This gives a powerful way
to simplify the expressions involving the products of Levi-Civita symbols.
To prove this identity, we first note that the indices {ijk} and {lmn} correspond to two
sets of vector variables {î, ĵ, k̂} and {l̂, m̂, n̂} respectively, both taking values in the
orthonormal basis set {1̂, 2̂, 3̂}. As shown in Fig. 1.29 we refer to another orthonormal
basis {σ̂ 1 , σ̂ 2 , σ̂ 3 }. By Eq. (1.23) and the determinant giving scalar triple product we can
write
                      | î · σ̂1   î · σ̂2   î · σ̂3 |
εijk = î · (ĵ × k̂) =  | ĵ · σ̂1   ĵ · σ̂2   ĵ · σ̂3 | = |A|, say,
                      | k̂ · σ̂1   k̂ · σ̂2   k̂ · σ̂3 |
and
                      | l̂ · σ̂1   l̂ · σ̂2   l̂ · σ̂3 |
εlmn = l̂ · (m̂ × n̂) =  | m̂ · σ̂1   m̂ · σ̂2   m̂ · σ̂3 | = |B|, say.
                      | n̂ · σ̂1   n̂ · σ̂2   n̂ · σ̂3 |
Here, |A| and |B| are the determinants of the corresponding matrices. Using the fact
that the determinant of a matrix is the same as that of its transpose and that the product of
the determinants of two matrices is the determinant of their product, (see Appendix A),
we get,
                                             | î · l̂   î · m̂   î · n̂ |
εijk εlmn = |A| · |B| = |A| · |Bᵀ| = |ABᵀ| =  | ĵ · l̂   ĵ · m̂   ĵ · n̂ | ·
                                             | k̂ · l̂   k̂ · m̂   k̂ · n̂ |
To understand the last equality, note that a typical element of ABT is (see Fig. 1.29)
(î · σ̂ 1 )(l̂ · σ̂ 1 ) + (î · σ̂ 2 )(l̂ · σ̂ 2 ) + (î · σ̂ 3 )(l̂ · σ̂ 3 ) = îx l̂x + îy l̂y + îz l̂z = î · l̂.
Since the variables {î, ĵ, k̂} and {l̂, m̂, n̂} take values in the orthonormal basis set {1̂, 2̂, 3̂},
we have î · l̂ = δil etc, giving us the desired identity.
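The identity just proved can also be verified exhaustively over all 3⁶ index combinations. A minimal sketch; the closed-form product formula used for εijk is a standard one, not from the text:

```python
from itertools import product

def delta(i, j):
    return 1 if i == j else 0

def eps(i, j, k):
    # Closed-form Levi-Civita symbol for indices in {1, 2, 3}:
    # +1 / -1 for even / odd permutations of (1, 2, 3), 0 on repeats.
    return (j - i) * (k - i) * (k - j) // 2

def det3(m):
    # Determinant of a 3x3 matrix given as a list of rows.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# eps_ijk * eps_lmn equals the 3x3 determinant of Kronecker deltas, checked
# over all 3^6 = 729 index combinations.
ok = all(eps(i, j, k) * eps(l, m, n) ==
         det3([[delta(i, l), delta(i, m), delta(i, n)],
               [delta(j, l), delta(j, m), delta(j, n)],
               [delta(k, l), delta(k, m), delta(k, n)]])
         for i, j, k, l, m, n in product(range(1, 4), repeat=6))
assert ok
```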
Before proceeding further, we need to introduce a convention, called Einstein
summation convention, regarding the sum over a term in an expression whose terms
depend on some index set, say {i, j, k}. As per this convention, a term in which an index
say i is repeated is to be summed over that index. Thus, for example, εijk εilm = Σ³ᵢ₌₁ εijk εilm , a sum in which at most one term survives. Also,
δkk = Σ³ₖ₌₁ δkk = δ11 + δ22 + δ33 = 3.
Henceforth, in this book, whenever applicable, Einstein summation convention will always
be assumed to apply, unless stated otherwise. So you will have to be alert about this.
We can now obtain some special cases of the result we just proved. Thus, the
determinant for εijk εilm can be obtained from that for εijk εlmn by replacing l̂ by î. Since
all the indices {i, j, k} must be different, (otherwise εijk = 0), we must have î · î = 1 and
ĵ · î = 0 = k̂ · î. Substituting these values in the determinant and evaluating it we get

εijk εilm = δjl δkm − δjm δkl .

Next, consider

εijk εijl = δjj δkl − δjl δkj = 3δkl − δkl = 2δkl .

Here, we have used δjj = 3 and δkj δjl = δkl which we proved above (see Eq. (1.24)). Finally, we have,

εijk εijk = 2δkk = 6.
Let us try and express the vector product in terms of the Levi-Civita symbols. Using Eqs (1.7) and (1.13) we can express the ith component of a × b as

(a × b)i = εijk aj bk .

In the last term, a sum over indices j = 1, 2, 3 and k = 1, 2, 3 is implied, which is a sum of nine terms. However, seven out of these nine terms vanish, because the corresponding εijk vanish due to repeated indices, so that only two terms survive. Thus,

(a × b)1 = ε123 a2 b3 + ε132 a3 b2 = a2 b3 − a3 b2 ,

and similarly for the other two components.
We have,

[a × (b × c)]i = εijk aj (b × c)k = εijk εklm aj bl cm
               = εkij εklm aj bl cm
               = (δil δjm − δim δjl ) aj bl cm
               = (a · c)bi − (a · b)ci .
Thus, the ith components (i = 1, 2, 3) of both the sides are equal, which proves the
identity. This identity tells us that the vector product of a polar and an axial vector equals
the difference of two polar vectors and hence is itself a polar vector. By permuting a, b, c
in cyclic order in the identity a × (b × c) = (a · c)b − (a · b)c we get two more identities,
b × (c × a) = (a · b)c − (b · c)a
c × (a × b) = (b · c)a − (c · a)b.
a × (b × c) + c × (a × b) + b × (c × a) = 0.
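The expansion a × (b × c) = (a · c)b − (a · b)c and the Jacobi identity above can both be checked numerically. A minimal sketch (helpers and sample vectors are our own choices):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def scale(s, u):
    return tuple(s * c for c in u)

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

a, b, c = (1.0, 2.0, 0.0), (0.0, 1.0, 3.0), (2.0, 0.0, 1.0)

# a x (b x c) = (a.c) b - (a.b) c
lhs = cross(a, cross(b, c))
rhs = sub(scale(dot(a, c), b), scale(dot(a, b), c))
assert lhs == rhs

# Jacobi identity: the three cyclic double products sum to zero.
terms = [cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b))]
total = tuple(sum(t[i] for t in terms) for i in range(3))
assert total == (0.0, 0.0, 0.0)
```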
We have,
(a × b) × (c × d) = (a · c × d)b − (b · c × d)a
= (a · b × d)c − (a · b × c)d.
Throughout the remaining text, all these identities will be used very frequently. We
recommend that you practice these identities by using them in as large a variety of problems
as possible. For future convenience we list these identities once again, separately. In the
remaining part of the book we will refer to these identities by their Roman serial numbers
in this list.
(a × b) · (c × d) × (e × f) = [abd][cef] − [abc][def],
= [abe][fcd] − [abf][ecd],
= [cda][bef] − [cdb][aef],
where we have used Grassmann notation for the scalar triple product.
λx + µa = αx + βb
(ii) The vector equation λx + µa = νb where λ ≠ 0, µ, ν are given constant scalars and a, b are constant vectors has a unique solution x = (1/λ)(νb − µa).
The fact that this equation admits a solution can be trivially checked. We have to
subtract µa on both sides and then divide by λ on both sides to get the given solution.
Properties of vector addition and scalar multiplication allow these operations. Next
we can substitute the given solution for x in the equation and check that it satisfies
the equation. Thus the given solution is a solution of the given equation. To see that
this solution is unique, assume two solutions x1 and x2 , substitute in the equation
and equate the two resulting expressions to show that x1 = x2 .
(iii) λa + µb = c to be solved for two unknown scalars λ, µ where all the three vectors are
given constant non-zero vectors.
Taking cross product by b on both sides of the equation from right we get
λa × b = c × b.
λ = |c × b|² / [(a × b) · (c × b)] ,
assuming that the pairs a, b and also b, c are not parallel to each other. If a and b are parallel, we have a = νb and the equation reduces to (λν + µ)b = c. This shows that b and c are also parallel, hence there are infinitely many solutions for λ and µ. To
get µ, we take cross product by a on both sides of the equation and proceed exactly as
before. The result is
µ = |c × a|² / [(b × a) · (c × a)] .
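These closed forms can be sanity-checked by building c from known coefficients and recovering them. A small sketch (all numbers are arbitrary choices):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

a = (1.0, 0.0, 2.0)
b = (0.0, 3.0, 1.0)

# Build a consistent right-hand side c = lam*a + mu*b from known coefficients.
lam_true, mu_true = 2.0, -1.5
c = tuple(lam_true * ai + mu_true * bi for ai, bi in zip(a, b))

# Recover lam and mu from the closed forms derived in the text.
cxb, axb = cross(c, b), cross(a, b)
lam = dot(cxb, cxb) / dot(axb, cxb)

cxa, bxa = cross(c, a), cross(b, a)
mu = dot(cxa, cxa) / dot(bxa, cxa)

assert lam == lam_true and mu == mu_true
```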
(iv) The equation x · a = λ where λ is a known scalar and a is a known non-zero vector.
We rewrite the equation as

x · a = λa⁻¹ · a, or (x − λa⁻¹) · a = 0,

where a⁻¹ = a/|a|² so that a⁻¹ · a = 1. Thus, x − λa⁻¹ is orthogonal to a and can be written as a × b for some vector b, giving

x − λa⁻¹ = a × b,

or

x = λa⁻¹ + a × b
(v) The equation x × a = b, where a and b are given non-zero vectors with a · b = 0. Substitute the expansion

x = λa + µb + ν (a × b)

(with λ, µ, ν scalars), in the equation. We get, after some algebra and using a · b = 0,
µ(b × a) + [ν (a · a) − 1]b = 0.
Since the vectors b × a and b are linearly independent, both the coefficients must
vanish separately, giving
µ = 0 and ν|a|2 − 1 = 0
which means ν = 1/|a|² and leads to

x = λa + (1/|a|²)(a × b) = λa + (a⁻¹ × b),
which satisfies the given equation irrespective of the value of the scalar λ.
(vi) The equations x · a = λ and x × b = c where a, b, c are given vectors with a, b non-orthogonal (a · b ≠ 0) uniquely determine the vector x.
Crossing the second equation on the left by a we get
a × (x × b) = a × c,
(a · b)x − (a · x)b = a × c
or, using x · a = λ,

x = (1/(a · b)) (λb + a × c),
which satisfies both the equations.
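The closed-form solution can be sanity-checked by generating consistent data (λ, c) from a known x and reconstructing it. A small sketch (all vectors are arbitrary choices):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

a = (1.0, 2.0, 2.0)
b = (3.0, 0.0, 1.0)          # a.b = 5, non-orthogonal as required
x_true = (0.5, -1.0, 2.0)    # the unknown, used only to generate the data

lam = dot(x_true, a)         # x.a = lambda
c = cross(x_true, b)         # x x b = c

# Reconstruction from item (vi): x = (lambda*b + a x c) / (a.b)
axc = cross(a, c)
x = tuple((lam * bi + ci) / dot(a, b) for bi, ci in zip(b, axc))
assert all(abs(xi - ti) < 1e-12 for xi, ti in zip(x, x_true))
```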
To get the uniqueness, suppose that two vectors x1 , x2 satisfy the given equations.
This leads to

(x1 − x2 ) · a = 0 and (x1 − x2 ) × b = 0.

Therefore, the vector a is perpendicular to, and the vector b is parallel to, the vector x1 − x2 .
This makes vectors a and b mutually orthogonal, contradicting the assumption that
they are not. Thus, we must require
x1 − x2 = 0 or, x1 = x2 .
Exercise Solve for y the simultaneous equations

a · y = α, (α ≠ 0),

a × y = b.
Solution Since

a · b = a · (a × y) = 0,

we have

a × (b × a⁻¹) = b − (a · b)a⁻¹ = b.

Thus,

y = b × a⁻¹

solves a × y = b, and since a × a = 0,

y = b × a⁻¹ + λa

is also a solution.
Exercise Show that a vector is uniquely determined if its dot products with three non-
coplanar vectors are known.
Hint Expand the vector in the basis comprising the given three non-coplanar vectors.
Exercise The resultant of two vectors is equal in magnitude to one of them and is
perpendicular to it. Find the other vector.
Hint Let a + b = c with |a| = |c| = λ say and let |b| = µ. Also, a · c = 0. (Draw a figure).
Take the unit vectors along a and c as the orthonormal basis. Express a, b and c in terms of
this basis and use the first equation. Find b in terms of the angle θ it makes with c and its
magnitude λ.
Answer θ = π/4, µ = √2 λ. You can get this answer just by drawing the figure.
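The answer can also be confirmed numerically (a small check added here, with a convenient orthonormal choice of directions):

```python
import numpy as np

lam = 3.0
a = lam * np.array([0.0, 1.0, 0.0])
c = lam * np.array([1.0, 0.0, 0.0])   # |c| = |a| and a . c = 0, as given
b = c - a                              # the required "other vector"

mu = np.linalg.norm(b)
theta = np.arccos(np.dot(b, c) / (mu * np.linalg.norm(c)))  # angle with c
```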
points O and P . This line is the r axis and distance OP is the r coordinate of P . Note that
r is always non-negative, r ≥ 0. Let r̂ be the unit vector based at P and pointing away
from O along the r coordinate line. Now draw the circle of radius r with center at O and
lying in the plane defined by the unit vectors r̂ and k̂. As we go along this circle, only the
polar angle θ, namely the angle between r̂ and k̂ (which defines the positive direction of
the z axis), changes, while r and the third coordinate φ (see below) do not change. This
is the θ coordinate line, which is actually a circle of radius r. Now, draw a circle in the
plane parallel to the x − y plane passing through P , with its center on the z axis and with
radius r sin θ. (see Fig. 1.30 to check that this circle passes through P ). We can measure the
angular coordinate of a point on this circle, say φ, as the angle made by the radius of this
circle passing through that point with î which defines the positive direction of the x axis
(the azimuthal angle). As we go along this circle, only the coordinate φ changes, while the
other two, r and θ do not. This is the φ coordinate line, again a circle. Every point in R3
corresponds to a unique triplet of values of the (r, θ, φ) coordinates. Now draw the unit
vectors, θ̂ and φ̂ tangent (at P ) to the θ circle and φ circle respectively, so that the triplet
(r̂, θ̂, φ̂) forms a right handed system. Note that different points in space have different
triplets of vectors (r̂, θ̂, φ̂). We cannot express every vector as a linear combination of the
vectors from the same triplet. A vector like α θ̂ (α a scalar), which would appear in such a
linear combination, is a vector of length |α| and tangent (at P ) to the θ circle. However, the
change in the θ coordinate corresponds to the angular advance of the vector r = OP along
the θ circle and not along a vector tangent at P to this circle. The vector r = OP equals r r̂,
where r̂ belongs to the triplet (r̂, θ̂, φ̂) defined at P and r is the magnitude of r, or the length
of the vector OP.
To find the relation between the Cartesian (x, y, z) and spherical polar (r, θ, φ)
coordinates, replace v by r̂ (magnitude r = 1) in Eq. (1.8). We get,
x = r sin θ cos φ,
y = r sin θ sin φ,
z = r cos θ. (1.27)
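Equation (1.27) is easy to exercise numerically; the inverse relations used below are standard results stated here only for the check, and the function names are illustrative:

```python
import numpy as np

def spherical_to_cartesian(r, theta, phi):
    # Eq. (1.27)
    return (r * np.sin(theta) * np.cos(phi),
            r * np.sin(theta) * np.sin(phi),
            r * np.cos(theta))

def cartesian_to_spherical(x, y, z):
    # Standard inverse relations (assumed, not from the text), with r >= 0,
    # 0 <= theta <= pi and 0 <= phi < 2*pi
    r = np.sqrt(x * x + y * y + z * z)
    theta = np.arccos(z / r)
    phi = np.arctan2(y, x) % (2.0 * np.pi)
    return r, theta, phi
```

A round trip through both maps returns the original (r, θ, φ).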
Exercise Convince yourself that the polar coordinates of all the vectors in R3 lie within
0 ≤ r < ∞, 0 ≤ θ ≤ π and 0 ≤ φ < 2π. (use Fig. 1.30)
Exercise Show that
and
x² + y² + z² = r² (spheres, r = constant).
From Eq. (1.27) we see that the θ coordinate surface is generated by all points whose (x, y, z)
coordinates satisfy
x² + y² = z² tan² θ,
which are circular cones θ = constant. The φ coordinate surfaces are generated by all
points whose (x, y, z) coordinates satisfy
tan φ = y/x
and z is arbitrary. These are half planes, that is, the planes which terminate at the z axis,
because the other half plane, on the other side of the z axis corresponds to π + φ. All these
coordinate surfaces are depicted in Fig. 1.31.
Given any point in space, with coordinates R, θ0 , φ0 , the coordinate surfaces r = R,
θ = θ0 and φ = φ0 pass through that point. The φ coordinate line is the intersection of
the r = R, θ = θ0 surfaces and lies in a plane parallel to xy plane while θ coordinate line
is the intersection of r = R and φ = φ0 surfaces and lies in a plane normal to xy plane.
Therefore, the vectors θ̂,φ̂ tangent to these curves must be mutually perpendicular. The
plane containing these two vectors is tangent to the sphere r = R at the given point, so that
the unit vector r̂ must be normal to both θ̂,φ̂. Thus, the vectors r̂,θ̂,φ̂ form an orthonormal
basis. Such a system is called an orthogonal curvilinear coordinate system.
has parabolic coordinates denoted by (µ, ν, φ), (µ, ν ≥ 0, 0 ≤ φ < 2π). These two sets of
coordinates are related by
x = µν cos φ,
y = µν sin φ,
z = (1/2)(µ² − ν²). (1.28)
These equations have all the information regarding the geometry of the parabolic
coordinate system. To get it, we first identify the coordinate φ with the azimuthal angle
defined above in the context of polar coordinates. Then, the first two of Eq. (1.28) tell us
that the φ coordinate line is a circle of radius µν, passing through the given point and the
corresponding basis vector φ̂ must be tangent to this circle at the given point. The φ
coordinate plane passes through the z axis, making an angle φ with the positive direction
of the x axis.
To get the coordinate lines for µ and ν, we first fix the azimuthal angle φ = 0. This
means we choose the xz or y = 0 plane to see the variations of µ and ν. We assume that
the given point lies in the y = 0 plane. We now give some constant value to ν, say ν = ν0 .
With φ = 0 and ν = ν0 the first of Eq. (1.28) gives µ = x/ν0 and the third of Eq. (1.28)
becomes
z = (1/2)(x²/ν0² − ν0²). (1.29)
This is a parabola flattened by dividing each value of x² by the constant 2ν0² and shifted
downwards from the origin by (1/2)ν0². By choosing the value of ν0 properly, we can make
this parabola pass through the given point giving ν0 as the value of its ν coordinate. This
parabola is the coordinate line for µ, because only µ varies on it, while both ν = ν0 and
φ = 0 are constants. To get the coordinate line for ν, we make µ a constant, µ = µ0 ,
so that the third of Eq. (1.28) becomes
z = (1/2)(µ0² − x²/µ0²). (1.30)
This is an inverted parabola flattened by the division by the constant 2µ0² and shifted
upwards from the origin by (1/2)µ0². By suitably choosing µ0 , we can make this parabola pass
through the given point, making chosen µ0 to be the value of its µ coordinate. This
parabola is the coordinate line for ν, on which only ν varies, while µ = µ0 and φ = 0 are
constants.
Let us now show that these parabolas intersect normally at the given point. Let dr1 and
dr2 be the differential displacements along the ν = ν0 and µ = µ0 parabolas respectively.
dr1 · dr2 = dx1 dx2 + dz1 dz2 = dx² (1 − x0²/(µ0² ν0²)) = 0, (1.31)
where we have taken dx1 = dx = dx2 , differentiated Eqs (1.29) and (1.30) to get dz1 and
dz2 and used the fact that by the first of Eq. (1.28), x0² = µ0² ν0² for φ = 0. Geometrically, this
means that the tangent vectors to the two parabolas at the intersection point are orthogonal
to each other. Since the tangent vector to the φ coordinate circle at the given point is normal
to the y = 0 plane it is normal to the tangent vectors to the two parabolas. Thus, the basis
vectors for the parabolic coordinate system form an orthonormal triad (µ̂, ν̂, φ̂) which are
the tangent vectors to the three coordinate lines at the given point such that they form a
right handed system (Fig. 1.33).
If we change the azimuthal angle φ from zero, the y = 0 plane rotates through the same
angle, without changing the µ and ν parabolas in any way. This completes the construction
of the parabolic coordinate system.
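The orthogonality argued above can also be seen by differentiating Eq. (1.28) directly. This numerical sketch (added for illustration; the helper name is mine) computes the three tangent vectors and checks that they are mutually perpendicular at a generic point:

```python
import numpy as np

def parabolic_tangents(mu, nu, phi):
    # Partial derivatives of Eq. (1.28) with respect to mu, nu and phi:
    # (unnormalized) tangents to the three coordinate lines.
    e_mu = np.array([nu * np.cos(phi), nu * np.sin(phi), mu])
    e_nu = np.array([mu * np.cos(phi), mu * np.sin(phi), -nu])
    e_phi = np.array([-mu * nu * np.sin(phi), mu * nu * np.cos(phi), 0.0])
    return e_mu, e_nu, e_phi
```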
Note, again, that the basis triad (µ̂, ν̂, φ̂) changes from point to point. Therefore, for
the same reasons as explained in the case of polar coordinates, not every vector can be
expanded in terms of the same basis triad. To get the coordinate surfaces, say for constant
µ (µ = µ0 ) we note that for an arbitrary value of φ say φ = φ0 , the first two of Eq. (1.28)
give x² + y² = µ²ν² and hence the equation for the parabola with µ = µ0 in the plane
corresponding to φ = φ0 is obtained by replacing x² in Eq. (1.29) by x² + y². This equation
is independent of φ and hence applies to every value of φ. Thus, all the points (x, y, z)
satisfying
z = (1/2)(µ0² − (x² + y²)/µ0²)
Fig. 1.33 Parabolic coordinates (µ, ν, φ). Coordinate surfaces are paraboloids of
revolution (µ = constant, ν = constant) and half-planes (φ = constant)
or,
x² + y² = µ0²(µ0² − 2z)
for constant µ = µ0 lie on the paraboloid of revolution obtained by revolving the parabola (that is,
covering all values of φ) about the z axis. On this surface µ = µ0 and (ν, φ) can take all
possible values. The surface for constant φ = φ0 is a half plane, that is, the plane
terminating at the z axis, because the half plane on the other side of the z axis corresponds
to φ = π + φ0 . Thus, the families of coordinate surfaces are given by
x² + y² = µ0²(µ0² − 2z) (1.32)
x = ρ cos φ,
y = ρ sin φ,
z = z, (1.35)
Fig. 1.34 Cylindrical coordinates (ρ, φ, z ). Coordinate surfaces are circular cylinders
(ρ = constant), half-planes (φ = constant) intersecting on the z-axis, and
parallel planes (z = constant)
x² + y² = ρ² (cylinders),
tan φ = y/x (half planes),
z = constant (planes).
Exercise Find the coordinate lines and the coordinate surfaces for the prolate spheroidal
coordinates (0 ≤ η < ∞, 0 ≤ θ ≤ π, 0 ≤ φ < 2π) given by (see Fig. 1.35)
x = a sinh η sin θ cos φ,
y = a sinh η sin θ sin φ,
z = a cosh η cos θ,
Fig. 1.35 Prolate spheroidal coordinates (η, θ, φ). Coordinate surfaces are prolate
spheroids (η = constant), hyperboloids (θ = constant), and half-planes
(φ = constant)
where the coordinate surfaces are
x²/(a² sinh² η) + y²/(a² sinh² η) + z²/(a² cosh² η) = 1 (prolate spheroids, η = constant),
−x²/(a² sin² θ) − y²/(a² sin² θ) + z²/(a² cos² θ) = 1 (hyperboloids, θ = constant),
tan φ = y/x (half planes).
Fig. 1.36 Oblate spheroidal coordinates (η, θ, φ). Coordinate surfaces are oblate
spheroids (η = constant), hyperboloids (θ = constant), and half-planes
(φ = constant)
Exercise Similarly, find the coordinate lines and the coordinate surfaces for the oblate
spheroidal coordinates (η, θ, φ) given by (see Fig. 1.36)
x = a cosh η sin θ cos φ,
y = a cosh η sin θ sin φ,
z = a sinh η cos θ,
where the coordinate surfaces are
x²/(a² cosh² η) + y²/(a² cosh² η) + z²/(a² sinh² η) = 1 (oblate spheroids, η = constant),
x²/(a² sin² θ) + y²/(a² sin² θ) − z²/(a² cos² θ) = 1 (hyperboloids, θ = constant),
tan φ = y/x (half planes).
Fig. 1.37 (a) Positively and (b) negatively oriented triplets (a, b, c), (c) Triplet (b, a, c)
has orientation opposite to that of (a, b, c) in (a)
We shall now show that the necessary and sufficient condition for a triplet (a, b, c) to
be positively oriented is that c · (a × b) or any of its cyclic permutations exceeds zero.
Suppose (a, b, c) are positively oriented. Then from the definitions of the positive
orientation and the vector product we see that both (a × b) and c are on the same side of
the (a, b) plane. This implies that the angle between (a × b) and c is less than π/2 which
means c · (a × b) > 0.
Suppose c · (a × b) > 0. This means the angle between (a × b) and c is less than π/2,
or, (a × b) and c are on the same side of the (a, b) plane, or the rotation from a toward b
advances a right handed screw on the same side of the (a, b) plane to which c points. In
other words, (a, b, c) are positively oriented.
Since the scalar triple product is invariant under cyclic permutations of its factors, the
above proof applies to all cyclic permutations of c · (a × b). Thus, we conclude that the
orientation of (a, b, c) is invariant under cyclic permutations of (a, b, c).
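This criterion translates directly into a sign test on the scalar triple product; a short numerical illustration (added here, with a helper name of my choosing):

```python
import numpy as np

def orientation(a, b, c):
    # +1 if (a, b, c) is positively oriented, -1 if negatively oriented:
    # the sign of c . (a x b)
    return int(np.sign(np.dot(c, np.cross(a, b))))
```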
Triplets (a, b, c) and (d, e, f) are oriented (mutually) positively (negatively) with
respect to each other if they have the same (opposite) orientations. In particular, (a, b, c)
is oriented positively (negatively) with respect to an orthonormal basis (ê1 , ê2 , ê3 ) or
the corresponding coordinate axes (x, y, z ) if (a, b, c) and (ê1 , ê2 , ê3 ) have the same
(opposite) orientations. Whether a given triplet (a, b, c) is oriented positively or
negatively with respect to an orthonormal basis (ê1 , ê2 , ê3 ) is decided, respectively, by the
positive or negative sign of
a1 a2 a3
det(a, b, c) = b1 b2 b3 , (1.38)
c1 c2 c3
where each row consists of the components of the corresponding vector with respect to
the orthonormal basis (ê1 , ê2 , ê3 ) (see the second exercise on page 39). Exchanging the
first two columns of this determinant amounts to exchanging x, y axes or changing over to
a coordinate system with different handedness. This changes the sign of the determinant,
so that orientation of (a, b, c) with respect to the new coordinate system becomes
opposite to that with respect to the previous one. Thus, the sign of the determinant
comprising the components of a given triplet of vectors (a, b, c) decides the orientation of
(a, b, c) with respect to the corresponding orthonormal basis (ê1 , ê2 , ê3 ), or, as
sometimes said, with respect to the (x, y, z ) coordinates or axes. Thus, the sign of the
determinant in Eq. (1.38) does not have a geometrical meaning independent of a
coordinate system. However, a statement like ‘two non-coplanar ordered triplets have the
same or the opposite orientation’ has a coordinate free geometrical meaning.
Consider two ordered triplets of non-coplanar vectors a1 , a2 , a3 and b1 , b2 , b3 . The two
sets have the same orientation, that is, are both positively or both negatively oriented with
respect to a common coordinate system (x1 , x2 , x3 ), if and only if the condition
det(a1 , a2 , a3 ) det(b1 , b2 , b3 ) > 0
is satisfied. Using identity (A.31), we can write this condition in the form
[a1 , a2 , a3 ; b1 , b2 , b3 ] > 0, (1.39)
where the symbol on the left denotes a function of six vector variables defined by
a1 · b1 a1 · b2 a1 · b3
[a1 , a2 , a3 ; b1 , b2 , b3 ] = a2 · b1 a2 · b2 a2 · b3 . (1.40)
a3 · b1 a3 · b2 a3 · b3
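The matrix of dot products in Eq. (1.40) is A Bᵀ, where the rows of A and B are the two triplets, so its determinant factors as det(A) det(B), in line with the identity (A.31) cited above. A numerical confirmation (added here, with random triplets):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(3, 3))   # rows: a1, a2, a3
B = rng.normal(size=(3, 3))   # rows: b1, b2, b3

gram = A @ B.T                # (i, j) entry is a_i . b_j, as in Eq. (1.40)
```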
The last three equations are meaningful even if we do not assign a numeric value to the
individual orientation Ω. Equation (1.43) associates a value ±1 to the ratio of two
orientations, while Eqs (1.41) and (1.42) express equality or inequality of orientations. It is
possible to specify two possible orientations of triplets of vectors completely by assigning
numerical values say Ω = ±1 to these orientations by arbitrarily choosing the standard
value +1 for the orientation of the basis vectors (ê1 , ê2 , ê3 ) defining the coordinate
system. Such a situation arises in science and engineering in the context of every
measurable quantity. For example, equality of distances between points in space or even
the ratio of distances have meaning even if no numerical values are assigned to the
individual distances. It is of course possible to assign numerical values to individual
distances such that the ratio of distances equals the ratio of the corresponding real
numbers. This requires an arbitrary selection of a “standard distance” or a unit of distance
to which all other distances are referred. Thus, Eq. (1.41) is analogous to saying that the
distances between two pairs of points are equal without giving them specific values.
The triplet a1 , a2 , a3 is oriented positively or negatively with respect to (x1 , x2 , x3 )
coordinates according to whether they are oriented positively or negatively with respect to
the corresponding orthonormal basis (ê1 , ê2 , ê3 ), that is, whether
or
Sometimes, we denote the orientation of the coordinate system Ω(ê1 , ê2 , ê3 ) by
Ω(x1 , x2 , x3 ). Since the value of the determinant in Eq. (1.38) gives the signed volume of
the parallelepiped spanned by a triplet of linearly independent vectors, for two such
triplets of vectors we have,
[a1 , a2 , a3 ; b1 , b2 , b3 ] = ε1 ε2 V1 V2 , (1.46)
where V1 and V2 are, respectively the volumes of the parallelepipeds spanned by the two
triplets and the factors ε1 , ε2 depend on their orientations with respect to the basis
(ê1 , ê2 , ê3 ) defining the coordinate system:
ε1 ε2 = sgn[a1 , a2 , a3 ; b1 , b2 , b3 ] (1.48)
is independent of the choice of the coordinate system and has the value +1 if the
parallelepipeds have the same orientation but −1 if they have the opposite orientations. If
the two triplets refer to two different coordinate systems with the orthonormal bases
(ê1 , ê2 , ê3 ) and (ĥ1 , ĥ2 , ĥ3 ) then,
and the relative orientation of the two triplets, independent of the coordinate systems is
given by
ε1 ε2 µ = sgn[a1 , a2 , a3 ; b1 , b2 , b3 ] (1.50)
and
However, if it is possible to choose the two coordinate systems to be positively oriented
with respect to each other, so as to ensure µ = +1, then Eq. (1.48) applies and decides the
relative orientation of the two triplets.
Our method of deciding the orientation of ordered sets of vectors by the sign of their
determinants can be applied to the doublets of non-collinear vectors spanning a plane. We
just have to find the sign of
[a1 , a2 ; b1 , b2 ] = (a1 · b1 )(a2 · b2 ) − (a1 · b2 )(a2 · b1 ), (1.52)
which decides whether the two doublets (a1 , a2 ) and (b1 , b2 ) have the same or opposite
orientations.
Exercise Let ê1 , ê2 be an orthonormal basis in a plane. Show that the doublets ê1 , ê2 and
ê2 , ê1 have opposite orientations.
Solution We have
[ê1 , ê2 ; ê2 , ê1 ] = (ê1 · ê2 )(ê2 · ê1 ) − (ê1 · ê1 )(ê2 · ê2 ) = −1, (1.54)
so that,
Any two linearly independent vectors (a1 , a2 ) in the plane are oriented positively if
Thus, all doublets positively oriented with respect to the basis (ê1 , ê2 ) are positively
oriented with respect to π∗ .
An oriented plane π∗ can be characterized by a distinguished positive sense of rotation.
If a pair of vectors a, b is oriented positively with respect to π∗ , the positive sense of rotation
of π∗ is the sense of rotation by an angle less than π radians that takes the direction of a
into that of b.
Just as we can orient a plane, we can orient a 3-D region σ by specifying an orthonormal
basis (ĥ1 , ĥ2 , ĥ3 ) and defining the orientation of the oriented region σ ∗ by
All triplets which are positively oriented with respect to this basis are positively oriented
with respect to σ ∗ . When an oriented plane π∗ lies in an oriented 3-D region σ ∗ , we can
define the positive and negative sides of π∗ . We take two independent vectors b and c in π∗
that are positively oriented:
Ω(b, c) = Ω(π∗ ).
Then, a vector a based on π∗ points to its positive side if
Ω(a, b, c) = Ω(σ ∗ ).
Since σ ∗ is oriented positively with respect to a Cartesian coordinate system, we can replace
this condition by
det(a, b, c) > 0.
If σ ∗ is oriented positively with respect to a right handed coordinate system, then the
positive side of an oriented plane π∗ is the one from which the positive sense of rotation in
π∗ appears counterclockwise.
2
Vectors and Analytic Geometry 75
Exercise From Eq. (2.2) derive the following equations for the line L in terms of
rectangular coordinates in E3 :
(x1 − a1 )/u1 = (x2 − a2 )/u2 = (x3 − a3 )/u3 ,
where xk = x · σ̂k , ak = a · σ̂k , uk = u · σ̂k , k = 1, 2, 3 and σ̂1 , σ̂2 , σ̂3 is an orthonormal basis.
Hint [(x − a) × u] · σ̂3 = (x1 − a1 )u2 − (x2 − a2 )u1 etc.
Exercise
(a) Show that Eq. (2.2) is equivalent to the parametric equation
x = a + λû.
x = a + t2u
x × û = a × û.
We take the vector product on both sides from the left with û and use identity I to get
û × (x × û) = x − (û · x)û = û × (a × û) ≡ d,
which we take to be the definition of the vector d. Noting that d · û = 0 we get, for the
length of the vector x,
x² = d² + (û · x)².
m = x × û
is called the moment of the line L . Figure 2.2 shows that the magnitude |m|, which is the
area of the parallelogram spanned by x and û, is the same for all points x on the line and
equals the distance d = |d| of the line from the origin O. Thus, any oriented line L is
uniquely determined by specifying the direction û and its moment m, or by specifying a
single quantity L = û + d × û.
collinear. If x, a, b are any three points on the line, the collinearity of the segments x − a
and b − a is expressed by the equation
(x − a) × (b − a) = 0. (2.4)
This differs from Eq. (2.2) in that u is replaced by the segment b − a which is proportional
to u. Thus, Eqs (2.2) and (2.4) are equivalent provided a and b are distinct points on the
line.
Exercise Find the directance to the line through points a and b (a) from the origin and
(b) from an arbitrary point c.
Answer
Exercise Show that the distance from an arbitrary point A to the line BC is
|a × b + b × c + c × a|
|b − c|
where a, b, c are the position vectors of points A, B, C respectively, with respect to some
origin O, (see Fig. 2.3).
Solution Let d be the vector from A perpendicular to b − c (see Fig. 2.3). We want to find
|d|. We can write
|d| = |d × (c − b)|/|b − c|
d × c = −a × c + λ(b × c),
d × b = −a × b + c × b − λ(c × b).
Thus,
d × (c − b) = c × a + a × b + b × c
and substituting in the equation for |d| above, the result follows.
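The distance formula itself can be exercised numerically (a check added here, with points chosen so the answer is obvious):

```python
import numpy as np

def dist_to_line(a, b, c):
    # Distance from the point a to the line through the points b and c
    num = np.linalg.norm(np.cross(a, b) + np.cross(b, c) + np.cross(c, a))
    return num / np.linalg.norm(b - c)
```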
Exercise Let û, v̂, ŵ be the directions of three coplanar lines. The relative directions of
lines are then specified by α = v̂ · ŵ, β = û · ŵ, γ = û · v̂. Show that 2αβγ = α² + β² +
γ² − 1.
Solution Since û, v̂, ŵ are coplanar, they are linearly dependent, so that their Gram
determinant must vanish (see Appendix A). Thus we have,
1 γ β
Γ (û, v̂, ŵ) = γ 1 α = 0
β α 1
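Expanding this Gram determinant gives 1 + 2αβγ − α² − β² − γ² = 0, which is the stated identity. A numerical spot check (added here, with three coplanar unit vectors chosen arbitrarily):

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Three coplanar directions (all in the z = 0 plane)
u = unit([1.0, 2.0, 0.0])
v = unit([-1.0, 1.0, 0.0])
w = unit([3.0, 0.5, 0.0])

alpha, beta, gamma = np.dot(v, w), np.dot(u, w), np.dot(u, v)
```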
Exercise Find the parametric values λ1 , λ2 for which the line x = x(λ) = a + λû
intersects the circle whose equation is x² = r² and show that λ1 λ2 = a² − r² for every
line through a which intersects the circle.
s = t̂ × [t̂ × (b − a)].
−s = b − a − [(b − a) · t̂]t̂
(1/2)(a × b) = (1/2)(a × x) + (1/2)(x × b). (2.6)
Now, (1/2)(a × b) is the directed area of a triangle with vertices a, 0, b and sides given by a,
b, b − a. The other two terms in Eq. (2.6) can be interpreted similarly. We note that any
two of these three triangles have one side in common. Thus, Eq. (2.6) expresses the area of
a triangle as the sum of the areas of two triangles into which it can be decomposed. This
is depicted in Fig. 2.5(a) when x lies between a and b and in Fig. 2.5(b) when it does not.
From Eq. (2.6), taking the dot product with x, we get
(a × b) · x = 0 (2.7)
which means that all three vectors and the three triangles they determine are in the same
plane. We define the vectors
B ≡ (1/2)(a × x),
A ≡ (1/2)(x × b), (2.8)
Fig. 2.5 With A and B defined in Eq. (2.8), (a) (1/2)|a × b| = |A| + |B| and
(b) (1/2)|a × b| = |B| − |A|. These equations can be written in terms of
the areas of the corresponding triangles
whose magnitudes equal the areas of the corresponding triangles. These areas are depicted
in Figs 2.5(a) and 2.5(b). Note that the orientation of A and hence, the sign of A is opposite
in the two figures.
Since the segments of a line are all collinear, we can write
a − x = λ(x − b) (2.9)
|λ| = |a − x|/|x − b|, or, λ = ± |a − x|/|x − b|.
Again, the vector product of Eq. (2.9) with x gives
a × x = λ(x × b).
λ = ± |a − x|/|x − b| = ± |B|/|A| = ± B/A, (2.10)
where the positive sign applies if x is between a and b and the negative sign applies if it is
not. The point x is called the point of division for the oriented line segment [a, b] and as
per Eq. (2.10), x is said to divide [a, b] in the ratio B/A. The division ratio λ parameterizes
the segment from a to b to give
x = (a + λb)/(1 + λ), (2.11)
as can be obtained by solving Eq. (2.9). Thus, the midpoint of the segment [a, b] is defined
by λ = 1 and is given by (1/2)(a + b).
Equation (2.11) can be written as
x = (Aa + Bb)/(A + B). (2.12)
The scalars A and B in Eq. (2.12) are called homogeneous (line) coordinates for the point
x. They are also called barycentric coordinates because of the similarity of Eq. (2.12) to the
formula for center of mass of a rigid body. Unlike mass, however, the scalars A and B can
be negative and can be interpreted geometrically as oriented areas.
Exercise Prove that three points a, b, c lie on a line if and only if there are non-zero
scalars α, β, γ such that αa + βb + γc = 0 and α + β + γ = 0.
Hint This is an immediate consequence of Eq. (2.12).
The parameter λ is invariant under the shift in origin from O to O′ by a vector c as depicted
in Fig. 2.6. We have, with respect to the new origin,
λ = ± |(a − c) − (x − c)|/|(x − c) − (b − c)| = ± |a − x|/|x − b|.
This means (see Fig. 2.6)
λ = B/A = B′/A′ = (B ± B′)/(A ± A′). (2.13)
λ′ = ± |c − x|/|x| = B/B′ = A/A′ = (A ± B)/(A′ ± B′). (2.14)
The point x is the point of intersection of the line through points [c, 0] with the line through
the points [a, b]. To get it we proceed as follows. Since c is collinear with x, we have,
λ′ = ± |c − x|/|x| = (c − x) · x̂/|x| ,
or, rearranging the terms and again using the fact that c is collinear with x,
x = c/(1 + λ′ ), (2.15)
which gives us the point of intersection in terms of the vector c and the ratio λ′ . From
Fig. 2.8 we see that
A + B = (1/2)|a × b|
and
A′ + B′ = (1/2)|(a − c) × (b − c)|.
λ′ = |a × b|/|(a − c) × (b − c)|. (2.16)
Equations (2.15) and (2.16) determine x in terms of vectors a, b, c. They determine point x
in Fig. 2.8 and by interchanging a and c, they determine the point y in the same figure.
2.2 Planes
The algebraic description of a plane is similar to that of a line.
We set up an orthonormal basis in the plane, say σ̂1 , σ̂2 . We call such a plane the σ̂1 , σ̂2
plane. Let a denote a fixed point on the plane. Then, every point x on the plane must satisfy
(see Fig. 2.9)
(x − a) · (σ̂ 1 × σ̂ 2 ) = 0. (2.17)
Fig. 2.9 A plane positively oriented with respect to the frame (î, ĵ, k̂)
a1 x1 + a2 x2 + a3 x3 = c, (2.18)
where a1 , a2 , a3 do not all vanish. Introducing the vector a ≡ (a1 , a2 , a3 ), (a ≠ 0) and the
position vector x = OP ≡ (x1 , x2 , x3 ) of the point P , we can write Eq. (2.18) as a vector
equation:
a·x = c (2.19)
Let y = OQ ≡ (y1 , y2 , y3 ) be the position vector of a particular point Q on the plane so
that a · y = c. Subtracting this from Eq. (2.19) we see that the points P of the plane satisfy
0 = a · (x − y) = a · PQ. (2.20)
Thus, the vector a is perpendicular to the line joining any two points on the plane. The
plane consists of the points obtained by advancing from any one of its points Q in
all directions perpendicular to a. The direction of a is called normal to the plane
(see Fig. 2.10).
The plane described by Eq. (2.19) divides space into two open half-spaces given by a·x <
c and a · x > c. The vector a points into the half space a · x > c. Thus, a ray from a point Q
of the plane in the direction of a comprises points whose position vectors x satisfy a · x > c.
The position vectors x of points P on such a ray are given by
x = OP = OQ + λa = y + λa
where y is the position vector of Q and λ is a positive number. Dotting this equation by a
gives,
a · x = c + λ|a|² > c.
In general, any vector b forming an acute angle with a points into the half space a · x > c,
since a · b > 0 means
a · x = a · y + λa · b > c.
If c > 0, the half-space a · x < c contains the origin as a · 0 = 0 < c. Then the direction of a
or the direction of the normal is away from the origin.
Equation (2.19) describing a given plane is not unique. It can be replaced by (λa) · x = λc,
λ ≠ 0. We can choose λ = sgn(c)/|a| to cast the equation to the given plane in the
normal form
â · x = d
where d > 0 is a constant and â is the unit normal vector pointing away from the origin.
The constant d is the distance of the plane from the origin. To see this, note that the distance
of an arbitrary point on the plane with position vector x is |x| ≥ â · x = d, where equality
holds for x = d â. The distance d(Q) of a point Q in space with position vector y from the
plane is then |â · y − d|. As an example, consider a plane wave with wave vector k propagating
in the direction k̂. The phase of a plane wave is given by k · r where r is the position vector
of a point on the wave. For a plane wave a surface of constant phase is a plane, because the
equation to such a surface must be k̂ · r = c. Such a plane is perpendicular to k̂ as shown
in Fig. 2.11.
Exercise Find the equation to the plane passing through (4, −1, 2) and perpendicular to
the planes 2x − 3y + z = 4 and x + 2y + 3z = 5.
Solution The equation to the plane can be written in the form
x · n̂ = d,
where x is the position vector of a point on the plane, n̂ is unit vector normal to the plane
and pointing into the region x· n̂ > d and d is the distance of the plane from the origin. n̂ is
given to be perpendicular to the vectors 2î − 3ĵ + k̂ and î + 2ĵ + 3k̂, so that its dot product
with these vectors must vanish, which means
2n1 − 3n2 + n3 = 0,
n1 + 2n2 + 3n3 = 0
Fig. 2.11 As seen from the figure, for every point on the plane k̂ · r = constant
and n̂ being a unit vector, n1² + n2² + n3² = 1. Solving this system we get,
n1 = ±11/√195 , n2 = ±5/√195 , n3 = ∓7/√195 .
Thus, the required equation becomes
x · n̂ = (11x + 5y − 7z)/√195 = d.
Since the point (4, −1, 2) lies on the plane,
d = ±(44 − 5 − 14)/√195 = ±25/√195 .
Choosing the sign that makes d positive, the required equation is 11x + 5y − 7z = 25.
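The linear system above is solved, up to sign, by the unit vector along the cross product of the two given normals; a numerical re-derivation of the same plane (added here for illustration):

```python
import numpy as np

n = np.cross([2.0, -3.0, 1.0], [1.0, 2.0, 3.0])   # perpendicular to both normals
n_hat = n / np.linalg.norm(n)                      # = -(11, 5, -7)/sqrt(195)
d = np.dot([4.0, -1.0, 2.0], n_hat)                # plane contains (4, -1, 2)
```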
Exercise Find the equation to a plane which passes through the line of intersection of
two planes which are equidistant from the origin.
Solution The equations to the given planes are x · n̂1 = d = x · n̂2 since both are
equidistant from the origin. The points lying on both these planes satisfy the linear
combination of their equations
x · (n̂1 + µn̂2 ) = (1 + µ)d,
where µ is a parameter. The symmetric choices are µ = ±1. In the first case (µ = +1) the
required equation becomes x · (n̂1 + n̂2 ) = 2d, while in the second case (µ = −1) the
equation is x · (n̂1 − n̂2 ) = 0, which is a plane passing through the origin.
Exercise Find an expression for the angle between two planes given by x · n̂1 = d1 and
x · n̂2 = d2 .
Solution The angle between two planes is the angle θ between their unit normals and is
given by cos θ = n̂1 · n̂2 = λ1 λ2 + µ1 µ2 + ν1 ν2 , where (λ1 , µ1 , ν1 ) and (λ2 , µ2 , ν2 ) are
the direction cosines of n̂1 and n̂2 respectively.
Exercise Find the equation to a plane containing a line and parallel to a vector.
Solution Let the plane contain the line x = u + λv, λ being a parameter, and be parallel
to a given vector ω. Then, the plane passes through a point with position vector u and is
perpendicular to v × ω. Its equation is
(x − u) · (v × ω ) = 0 or x · v × ω = u · v × ω.
Exercise Find the shortest distance between two skew lines as well as the equation to the
corresponding line.
Solution Skew lines are a pair of lines which are neither parallel nor intersecting. Let L1
and L2 be two skew lines with equations
L1 : x = u + λs and L2 : x = v + µt,
λ, µ being parameters. Thus, L1 passes through the point A with position vector u and is
parallel to vector s and L2 passes through the point B with position vector v and is parallel
to vector t (see Fig. 2.12). Let the segment P Q, joining points P and Q on the lines L1
and L2 respectively, give the shortest distance between them. Then P Q is perpendicular
to both the lines and hence, it is parallel to the cross product of the vectors s and t. The
segment P Q perpendicular to both the lines is unique, because if there were another such
(v − u) · (s × t) = 0, or v · (s × t) = u · (s × t),
(x − u) × p̂ = 0,
Exercise Find the radius vector s of the point of intersection of three planes (x − a) · n̂ = 0,
(x − b) · m̂ = 0 and (x − c) · p̂ = 0, where n̂, m̂, p̂ are the unit vectors normal to the respective
planes and n̂ · (m̂ × p̂) ≠ 0.
Answer s = [(a · n̂) m̂ × p̂ + (b · m̂) p̂ × n̂ + (c · p̂) n̂ × m̂]/[n̂ · (m̂ × p̂)].
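The quoted answer is easy to verify numerically: substitute s back into the three plane equations. A sketch with randomly generated planes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three planes (x - a).n = 0, (x - b).m = 0, (x - c).p = 0 with unit normals.
a, b, c = rng.normal(size=(3, 3))
n, m, p = (v / np.linalg.norm(v) for v in rng.normal(size=(3, 3)))

vol = np.dot(n, np.cross(m, p))     # n . (m x p), nonzero for generic normals

# The formula quoted in the Answer above.
s = (np.dot(a, n) * np.cross(m, p)
     + np.dot(b, m) * np.cross(p, n)
     + np.dot(c, p) * np.cross(n, m)) / vol

# s must lie on all three planes.
residuals = [abs(np.dot(s - a, n)), abs(np.dot(s - b, m)), abs(np.dot(s - c, p))]
```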
2.3 Spheres
Spheres form another instance of elementary geometrical figures. A sphere with radius r
and center c is the set of all points x ∈ E3 satisfying the equation
|x − c| = r or (x − c)2 = r2 . (2.21)
The vectors {x − c} satisfying Eq. (2.21) together with the constraint r̂ · (x − c) = constant, where
r̂ is a unit vector based at the center, trace out a circle on the sphere; these two equations
together can be taken to define the circle.
As an example of applying vectors to a sphere, we derive a basic result in spherical
trigonometry. For simplicity we deal with a unit sphere S with its center at the origin O
given by r² = 1. If A, B, C are any three points on S, then we call the intersection of the
planes OAB, OAC, OBC with S a spherical triangle (see Fig. 2.13).
The metric we adopt on S is that of the Euclidean space embedding S, so that the ‘length’
of the side AB is determined by the angle AOB = γ. In fact, these angles α, β, γ, which are
subtended by the sides BC, CA and AB at O give precisely the desired lengths if they are
expressed in radians that is, as a fraction of 2π (see section 1.2). We define the angle A at
the vertex A of the spherical triangle ABC to be that between the tangents AD and AE to
the great circles AB and AC. Note that the complementary parts of the great circles passing
through AB, BC and CA also form a spherical triangle ABC. We can specify the triangle
in Fig. 2.13 by requiring that every angle of the triangle ABC has to be less than π.
We wish to prove the identity cos α = cos β cos γ + sin β sin γ cos A.
To this end we use identity II by replacing c by a and d by c. We get, remembering that all
vectors are unit vectors,
(â × b̂) · (â × ĉ) = (b̂ · ĉ) − (â · b̂)(â · ĉ).
The angle between (â × b̂) and (â × ĉ) is the dihedral angle between the planes OAC and
OAB, that is, angle A, so that (â × b̂) · (â × ĉ) = sin γ sin β cos A. Further,
b̂ · ĉ = cos α, â · b̂ = cos γ and â · ĉ = cos β,
so that sin β sin γ cos A = cos α − cos β cos γ, which is the required identity.
Exercise Prove the sine rule for a spherical triangle: sin A/ sin α = sin B/ sin β = sin C/ sin γ.
Hint First get |(â × b̂) × (â × ĉ)| = |â × b̂||â × ĉ| sin A = sin γ sin β sin A. Then evaluate
|(â × b̂) × (â × ĉ)| differently to obtain a quantity σ which is invariant under any permutation
of the vectors â, b̂, ĉ.
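The identity just derived (the cosine rule of spherical trigonometry) can be tested with three random unit vectors; a numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
a_, b_, c_ = (v / np.linalg.norm(v) for v in rng.normal(size=(3, 3)))

# Sides of the spherical triangle: angles subtended at the centre O.
alpha = np.arccos(np.clip(np.dot(b_, c_), -1, 1))   # side BC
beta = np.arccos(np.clip(np.dot(a_, c_), -1, 1))    # side CA
gamma = np.arccos(np.clip(np.dot(a_, b_), -1, 1))   # side AB

# Vertex angle A: angle between the planes OAB and OAC.
n1, n2 = np.cross(a_, b_), np.cross(a_, c_)
cosA = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))

lhs = np.cos(alpha)
rhs = np.cos(beta) * np.cos(gamma) + np.sin(beta) * np.sin(gamma) * cosA
```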
We prefer the following alternative definition because it leads to the generic parametric
equation which applies to all the conic sections. A conic is the set of all points in the
Euclidean plane E2 with the following property: The distance of each point from a fixed
point (the focus) is in fixed ratio (the eccentricity) to the distance of that point from a
fixed line (the directrix). This definition can be expressed as the equation defining the
conic in the following way. Denote the eccentricity by e, the directance from the focus to
the directrix by d = d ê (ê² = 1) and the directance from the focus to any point on the
conic by r (see Fig. 2.14). The defining condition for the conic can then be written
e = |r|/(d − r · ê).
Solving this for r = |r| and introducing the eccentricity vector e = eê along with the so
called semi-latus rectum l = ed, we get the equation
r = l/(1 + e · r̂). (2.22)
This expresses the distance r from the focus to a point on the conic as a function of the
direction r̂ to that point. Equation (2.22) can also be expressed as a parametric equation
for r as a function of the angle θ between e and r̂. This equation is obtained by substituting
e · r̂ = e cos θ into Eq. (2.22). We get
r = l/(1 + e cos θ). (2.23)
This is the standard equation for conics; however, we usually prefer Eq. (2.22) as it is an
explicit function of vectors and their scalar product, so that it shows the dependence of r
on the directions ê and r̂ explicitly.
Equation (2.22) traces a curve when r is restricted to the directions in a plane; however,
if r is allowed to range over all directions in E3 then Eq. (2.22) describes a two
dimensional surface called a conicoid. Our definition of a conic can be used for a conicoid
by redefining the directrix as a plane instead of a line. Different ranges of values of the
eccentricity correspond to different conics or conicoids as shown in Table 2.1.
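Equation (2.23) can be checked against the focus-directrix definition directly; in the sketch below e and d are illustrative values for an ellipse:

```python
import math

e, d = 0.6, 2.0      # eccentricity and focus-to-directrix distance (an ellipse)
l = e * d            # semi-latus rectum

# For each direction theta from the focus, r = l/(1 + e cos(theta)) should put
# the point at distance (d - r cos(theta)) from the directrix, in the ratio e.
ratios = []
for theta in (0.3, 1.2, 2.5, 4.0):
    r = l / (1 + e * math.cos(theta))
    dist_to_directrix = d - r * math.cos(theta)
    ratios.append(r / dist_to_directrix)
```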
Figure 2.15 shows the 1-parameter family of conics with a common focus and pericenter.
The pericenter is the point on the conic at which r has the minimum value. For a hyperbola,
there are two pericenters, one on each branch of hyperbola. Only one of these is shown in
Fig. 2.15. If the conics in Fig. 2.15 are rotated about the axis joining the focus and the
pericenter, they “sweep out” corresponding conicoids.
Exercise Parametric curves x = x(λ) of the second order are defined by the equation
x = (a0 + a1 λ + a2 λ²)/(α0 + α1 λ + α2 λ²).
Note that this generalizes Eq. (2.11) for a line. By the change of parameter λ → λ − α1/(2α2),
this can be reduced to the form
x = (a0 + a1 λ + a2 λ²)/(α + λ²).
Show that
(a) For α = 1, the change of parameters λ = tan(φ/2) can be used to put this equation in
the form
x = a cos φ + b sin φ + c
(b) For α = −1, a similar change of parameters puts the equation in the form
x = a cosh φ + b sinh φ + c.
Actually, the ultimate conclusion you may draw turns out to be true: all conics are second
order curves, and conversely.
Hint (a) a0 = a + c, a1 = 2b, a2 = c − a; cos φ = (1 − λ²)/(1 + λ²), sin φ = 2λ/(1 + λ²).
Conics and conicoids can be described in many different ways which disclose a variety of
their remarkable properties. However, any discussion of these issues will take us far away
from the main theme of this book. These are discussed at length in various books on
mechanics and geometry [9, 18].
3
Planar Vectors and Complex Numbers
The purpose of this chapter is to demonstrate how the geometry on a plane can be
effectively described using the set of complex numbers in place of planar vectors. We
choose the circle as the planar curve to be analysed for this purpose.
1 We assume that the reader is familiar with the algebra of complex numbers.
z1 + z2 ↔ r1 + r2 and az ↔ ar,
where a is a scalar (real number) and r and z are the images of each other under the
isomorphism.
Thus, the set of vectors on a plane can be replaced by the set of complex numbers having
richer algebraic structure, as each complex number has a multiplicative inverse and there is
a unique identity element with respect to their product (z1 z2 = 1 implies z2 = 1/z1 and
z1 = 1/z2 ). Due to this isomorphism, we may use the same symbol z to denote a complex
number as well as a planar vector.
factors. In particular, squaring a vector z doubles the argument, while taking the square
root halves the argument. As an example we multiply the function f (u ) = 1 − iu by the
function exp(iu ) (u real). The graph of 1 − iu is a straight line parallel to the y-axis
passing through the point z = 1. This line is tangent to the unit circle at the point z = 1,
as depicted in Fig. 3.2. When this line is rotated over the angle u it remains a tangent,
moving the point A in Fig. 3.2 to the point C. Since BC equals u, the arc length of the
circle, the locus of the point C, represented by the equation
z = (1 − iu )exp(iu ),
is the involute of the circle.
Fig. 3.3 Finding √i
Exercise Show that the numbers whose nth power is unity are given by z = exp(i2πk/n)
(k = 1, . . . , n). These are the n values of the nth root of unity, ⁿ√1.
Hint Divide the circumference of the unit circle by n to find the points whose nth
power is unity, obtained by performing one or more complete turns over the unit circle
(see Fig. 3.4).
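A direct check of this exercise using Python's cmath module:

```python
import cmath

n = 7
roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(1, n + 1)]

# Each number, raised to the nth power, must return to 1.
errors = [abs(z ** n - 1) for z in roots]

# The roots are distinct, and k = n gives z = 1 itself.
distinct = len({complex(round(z.real, 9), round(z.imag, 9)) for z in roots})
```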
The complex conjugate of a complex number z = x + iy = r exp(iθ ) is given by z∗ = x −
iy = r exp(−iθ ). The point z∗ is obtained by reflecting the point z in the x axis, as shown
in Fig. 3.5. We easily check that the real and imaginary parts of z are x = R(z ) = (z + z∗ )/2
and y = I (z ) = −i (z − z∗ )/2.
Fig. 3.5 z, z∗ , z ± z∗
x = cos u + u sin u,
y = sin u − u cos u.
The sum of any two conjugate numbers or functions is real while their difference is
imaginary. For any complex valued function f (u ), z = exp[f (u ) − f ∗ (u )] and
z = exp[i (f (u )+ f ∗ (u ))] are points on the unit circle.
Again, we easily find that the modulus or the absolute value of a complex number z =
|z|exp(iθ ) is given by |z| = √(zz∗ ) and its argument is obtained from exp(iθ ) = √(z/z∗ ).
Note that any function which is the quotient of two conjugate functions must have unit
modulus: |z/z∗ | = 1, because (z/z∗ )(z∗ /z ) = 1.
For the function z = (1 − iu )exp(iu ) considered above,
|z|² = 1 + u²
and
exp(iθ ) = √((1 − iu )/(1 + iu )) exp(iu ).
Note that the function √((1 − iu )/(1 + iu )) has unit modulus.
The inverse of a complex number z = |z|exp(iθ ), with respect to the product of complex
numbers, is given by 1/z = (1/|z|) exp(−iθ ) because their product is z (1/z ) = 1. Thus, the
quotient of two vectors z1 and z2 is given by z1 /z2 = (|z1 |/|z2 |) exp{i (θ1 − θ2 )}. If the two
vectors are parallel, θ1 − θ2 = 0 and the quotient is purely real. The imaginary part vanishes
in this case, so that
z1 /z2 − z1∗ /z2∗ = 0 or z1 z∗2 − z∗1 z2 = 0. (3.1)
Similarly, if the two vectors are perpendicular, the quotient is purely imaginary and its real
part vanishes, so that
z1 /z2 + z1∗ /z2∗ = 0 or z1 z∗2 + z∗1 z2 = 0. (3.2)
However,
z1 z2∗ − z1∗ z2 = 2i|z1 | |z2 | sin(θ1 − θ2 ),
z1∗ z2 = B + iA,
where A = x1 y2 − x2 y1 and B = x1 x2 + y1 y2 .
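With planar vectors written as complex numbers, the scalar product B and the cross product A appear together as the real and imaginary parts of z1∗ z2 (note that it is the first factor that is conjugated); a minimal check:

```python
z1 = 3.0 + 4.0j
z2 = -1.0 + 2.0j

B = z1.real * z2.real + z1.imag * z2.imag    # x1*x2 + y1*y2, the scalar product
A = z1.real * z2.imag - z2.real * z1.imag    # x1*y2 - x2*y1, the cross product

w = z1.conjugate() * z2                      # packages both: w = B + iA
```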
and that the enclosed angles are equal: θ1 − θ2 = θ3 − θ4 . The two triangles constructed
on z1 , z2 and z3 , z4 are similar. For equality of angles, it is enough to require
z1 /z2 ∝ z3 /z4 ,
or
z1 z4 ∝ z2 z3
with a real constant of proportionality. In the special case of z2 = z3 the two remaining
vectors make equal angles with the middle vector if
z1 z4 ∝ z2²
with a real constant of proportionality. These rules can always be employed to prove the
equality of angles in geometrical figures. Thus, for example, z2 = √f (u ) bisects the angle
between z1 = f (u ) and the real axis z3 = 1. Similarly, z2 = √(if (u )) bisects the angle
between z1 = f (u ) and the imaginary axis z3 = i.
D is in general complex and its argument is the difference of the arguments of
(z1 − z3 )/(z1 − z4 ) and (z2 − z3 )/(z2 − z4 ). If this difference is zero, that is (see Fig. 3.7),
if ∠z3 z1 z4 = ∠z3 z2 z4 , D is real.
In this case, the four points will be situated on a circle and the criterion for the
concyclic configuration of four points is the reality of the cross ratio. Let three of the four
points be fixed on the circle and let z4 move over it. Then, D assumes all the positive and
negative real values. The circle is then parameterized by D and the formula for the
circle passing through z1 , z2 , z3 is
D = [(z1 − z3 )/(z1 − z )] ÷ [(z2 − z3 )/(z2 − z )]. (3.5)
The value of the cross ratio depends on the order in which we take the four points. We
denote the sequence by writing D (1234) for the sequence chosen in the definition. We see
that interchanging 1 and 2 or 3 and 4 inverts the value. Interchanging 2 and 3 or 1 and 4
changes D into 1 − D, as can be checked by calculation. This leads to the rules named after
Möbius:
D (1324) = [(z1 − z2 )/(z1 − z4 )] ÷ [(z3 − z2 )/(z3 − z4 )] = (AB · CD)/(AC · BD) = 1 − δ. (3.7)
Since the sum is 1 we get
AD · BC + AB · CD = AC · BD. (3.8)
In words: The product of the diagonals of a quadrilateral inscribed in a circle equals the
sum of the products of the opposite sides.
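Both the reality of the cross ratio for concyclic points and relation (3.8) can be confirmed numerically; in the sketch below the radius, centre and angles are arbitrary choices:

```python
import cmath

R, zc = 2.0, 1 + 0.5j
angles = [0.2, 1.1, 2.7, 4.5]            # increasing order around the circle
z1, z2, z3, z4 = (zc + R * cmath.exp(1j * t) for t in angles)

# Cross ratio D(1234): real for concyclic points.
D = ((z1 - z3) / (z1 - z4)) / ((z2 - z3) / (z2 - z4))

# Ptolemy: AD.BC + AB.CD = AC.BD for the cyclic quadrilateral z1 z2 z3 z4.
lhs = abs(z1 - z4) * abs(z2 - z3) + abs(z1 - z2) * abs(z3 - z4)
rhs = abs(z1 - z3) * abs(z2 - z4)
```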
The modulus of ω is the ratio of the lengths of the vectors z − z0 and z + z0 and we know
from elementary geometry that this ratio is constant along a circle (Circle of Apollonius)
with its center on the straight line through −z0 and +z0 .
As in the ω plane the lines |ω| = constant (circles around the origin) and the lines ∆ =
constant (radii) are two orthogonal sets of curves, the two sets of circles in the z plane
for |ω| = constant and ∆ = constant must be orthogonal by the property of conformal
transformations.
This example leads to the following two geometrical conclusions:
(a) The circle passing through the points z1 and z2 such that the chord z1 z2 subtends a
constant angle ∆ at any point of the arc z1 z2 of the circle is given by the equation:
u exp(i∆) = (z − z1 )/(z − z2 ).
(b) The circle of Apollonius, for which the ratio of the distances of any point of the circle
to the two fixed points z1 and z2 is a constant, say a, is given by the equation
a exp(iu ) = (z − z1 )/(z − z2 ).
One of the most important transformations is the inversion:
ω = 1/z∗ ,
which leaves the argument the same while inverting the modulus.
Exercise Show that the inversion of the vertical straight line z = 1 + iu is a circle passing
through the origin.
Solution The real and imaginary parts of the inversion 1/(1 − iu ) are given by
x = 1/(1 + u² ), y = u/(1 + u² ),
which are seen to satisfy the equation (x − 1/2)² + y² = 1/4, which is the Cartesian equation
of the circle with center at (1/2, 0) and radius 1/2. We call this an O-circle.
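A numerical sketch of this solution: images of sample points of the line z = 1 + iu under the inversion all satisfy (x − 1/2)² + y² = 1/4, the O-circle through the origin.

```python
# Invert a sample of points of the vertical line z = 1 + iu under w = 1/z*.
images = [1 / complex(1, u).conjugate() for u in (-5.0, -1.0, -0.2, 0.0, 0.7, 3.0)]

# Each image must lie on the circle with centre (1/2, 0) and radius 1/2.
errors = [abs((w.real - 0.5) ** 2 + w.imag ** 2 - 0.25) for w in images]
```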
Exercise Show that all straight lines in the complex z plane (z = u + i (mu + c ), u, m, c
real) can be converted to O-circles and conversely, the angle between two of these straight
lines being equal to the angle at the intersection of the two corresponding O-circles.
Under inversion the cross ratio of four points goes over to
D (ω ) = [(ω1 − ω3 )/(ω1 − ω4 )] ÷ [(ω2 − ω3 )/(ω2 − ω4 )] = [(z1∗ − z3∗ )/(z1∗ − z4∗ )] ÷ [(z2∗ − z3∗ )/(z2∗ − z4∗ )],
which is the conjugate value of the original D (z ). The cross ratio is, in general, changed
by inversion; however, it will remain the same if it is real. In other words, if the four points
are on a circle before the transformation, they will still be on a circle after the
transformation, straight lines being included as circles of infinite radius.
There are pairs of curves which are mutual inversions, e.g., parabola and cardioid,
orthogonal hyperbola and lemniscate. All the properties concerning angles between
straight lines related to one member of a pair can immediately be converted to the
properties of angles between O-circles related to the second one. The concyclic location of
four or more points is invariant with respect to inversion.
(Real function) exp(iα ) = [r exp(iu ) − r exp(iφ)]/[r exp(iu ) − r ].
Dividing this by the conjugate equation we get exp(2iα ) = exp(iφ) implying α = φ/2
which is the constant angle property of the circle.
Next consider a circle with center at origin O and choose a point A on the negative real
axis at a distance a from O (see Fig. 3.10). Draw a secant through A whose formula is
z = −a + s exp(iφ).
For a point of intersection with the circle z = r exp(iu ) we must have
−a + s exp(iφ) = r exp(iu ),
and taking the squared modulus of both sides,
a² − 2 as cos φ + s² = r².
This equation has two roots s1 and s2 , the product of which equals a2 − r 2 , independent of
the choice of φ which proves the constant power property of the circle.
In order to get the radius r and the center zc of the circle represented by Eq. (3.10), we
solve this equation for u:
−u = (z1 − zz3 )/(z2 − zz4 ).
As u is real, it must equal the conjugate of the right side so that the circle is represented by
the equation:
Comparison with the standard form
(z − zc )(z∗ − zc∗ ) = r²
yields
zc = (z1 z4∗ − z2 z3∗ )/(z3 z4∗ − z3∗ z4 ) and |zc |² − r² = (z1 z2∗ − z1∗ z2 )/(z3 z4∗ − z3∗ z4 ).
Exercise Interpret the last expression in terms of the power of the circle (see above and
Fig. 3.11).
Connecting R, L and C in series gives the impedance
z = f (u ) = R + iu, u = ωL − 1/(ωC ),
which is a straight line in the complex plane. The admittance 1/z = 1/(R + iu ) is then a circle.
Connecting R, C and L in parallel leads to an admittance
1/z = f (u ) = 1/R + iu,
where u = ωC − 1/(ωL), and represents a straight line in the complex plane. However, the
impedance z will now be a circle.
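This geometry is easy to confirm numerically. Assuming an illustrative resistance value, the admittance points 1/(R + iu) lie on a circle through the origin, centred at 1/(2R) with radius 1/(2R):

```python
import numpy as np

R = 50.0                           # series resistance (illustrative value)
u = np.linspace(-200.0, 200.0, 9)  # reactance sweep, u = wL - 1/(wC)

z = R + 1j * u                     # impedance: a vertical straight line
y = 1 / z                          # admittance

# |y - 1/(2R)| should equal 1/(2R) for every point of the sweep.
radius_errors = np.abs(np.abs(y - 1 / (2 * R)) - 1 / (2 * R))
max_error = float(radius_errors.max())
```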
Fig. 3.12 Both impedance and admittance of this circuit are circles
There are circuits for which both impedance and admittance are circles. For the circuit in
Fig. 3.12 the admittance is
1/R1 + 1/(R2 + iωL) = (R1 + R2 + iωL)/(R1 R2 + iωL R1 )
and this as well as its inversion represent a circle.
There are circuits for which the variable parameter is not the frequency but some
other quantity pertaining to the circuit. In Boucherot's circuit (see Fig. 3.13) the variable
parameter is the resistance u. The impedance is
z = a²/[i (b − a) + u ].
As another example, the circle diagram named after Heyland is obtained by plotting the
admittance of a motor as a function of load.
The reason why circle diagrams occur so often in electrical engineering is the linear
character of the fundamental equations. As the mechanical vibrations follow similar
equations, the field of application includes mechanics and acoustics.
v1 = av2 + bj2
v2 = av1 − bj1
must hold. Eliminate v1 from the first of Eq. (3.12) and the first of Eq. (3.13) to get
j1 = [(a² − 1)/b] v2 + aj2
and identify this with the second of Eq. (3.12). Comparing the corresponding coefficients
we see that for a symmetrical network the coefficients must satisfy
a = d and a² − bc = 1.
Imposing these conditions on Eq. (3.11) we see that, for a symmetrical network, the
transformation is
ω = (az + b)/(cz + a). (3.14)
The characteristic value z = ∞ corresponds to the open output (j2 = 0) condition, while
z = 0 corresponds to the shorted output terminals. The corresponding values of ω,
denoted ω∞ and ω0 respectively, are
ω∞ = a/c ; ω0 = b/a.
The case where ω = z is of importance. The corresponding value of z is called the wave
impedance. An arbitrary number of such networks, put in cascade, would not change this
impedance. From Eq. (3.14) it follows that this value is √(b/c), which we denote by ωz .
Note that ωz² = ω0 ω∞ , which means geometrically that (see Fig. 3.15) the triangles
ω∞ Oωz and ωz Oω0 are similar.
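These relations can be sketched numerically for a symmetric network, with b and c chosen arbitrarily and a fixed by the condition a² − bc = 1:

```python
import math

b, c = 3.0, 2.0
a = math.sqrt(1 + b * c)            # enforce a^2 - bc = 1

def w(z):
    # Transformation of a symmetric four-terminal network, Eq. (3.14).
    return (a * z + b) / (c * z + a)

w_inf = a / c                       # open output (z -> infinity)
w_0 = b / a                         # shorted output (z = 0)
w_z = math.sqrt(b / c)              # wave impedance

fixed_point_error = abs(w(w_z) - w_z)                # w = z at the wave impedance
geometric_mean_error = abs(w_z ** 2 - w_0 * w_inf)   # w_z^2 = w_0 * w_inf
```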
Equation (3.14) can be rewritten as
(ω − a/c)(z + a/c) = b/c − a²/c²
or,
(ω − ω∞ )(z + ω∞ ) = ωz² − ω∞²
or,
(ω − ω∞ )/(ωz − ω∞ ) = (ωz + ω∞ )/(z + ω∞ )
which means the triangles ω, −ω∞ , ωz and ω, ω∞ , z are similar (see Fig. 3.16).
As ω∞ , −ω∞ , ωz are fixed points in the plane, this offers us a method to construct point ω
for any given value of z, thus performing the transformation point by point.
Exercise Let x and y be the rectangular coordinates of a point x. Show that the equations
to an ellipse and a hyperbola, in terms of these coordinates, are
x²/a² + y²/b² = 1, x²/a² − y²/b² = 1
respectively. These parameters are related to those in Eq. (2.23) by
a = l/|1 − e²|, b² = al, x = r + ae.
The curves and related parameters are shown in Figs 3.17(a),(b). Use the equations in terms
of coordinates to show that an ellipse has a parametric equation x = x(φ):
x = a cos φ + b sin φ,
and a hyperbola has a parametric equation
x = a cosh φ + b sinh φ,
where the vectors a and b satisfy |a|² = a², |b|² = b² and a · b = 0.
Hint Treat these as the curves on a complex plane and use complex algebra. Write the
equations to ellipse and hyperbola as z = a cos φ + ib sin φ and z = a cosh φ + ib sinh φ
respectively.
The theory of plane curves is a subject in itself and we recommend reference [26] for further
study.
Part II
Vector Operators
4
Linear Operators
A function f of vectors satisfying f (αx + βy) = αf (x) + βf (y), where α and β are scalars, is called a linear operator, or operator for
brevity. In different contexts, such a function is also called a linear transformation or a
tensor. The term ‘tensor’ is used for describing certain properties of a physical system.
Thus, the ‘inertia tensor’ is a property of a rigid body and the ‘strain tensor’ is a property of
an elastic body. These are never called an ‘inertia or strain linear transformation’. On the
other hand, the term ‘transformation’ suggests a change of state of a physical system or an
equivalence of one state with another. The term ‘linear operator’ is generally used when
the emphasis is on the mathematical structure. Finally, we note that an operator is
essentially a mapping or association between the elements of two sets or between the
elements of the same set. Henceforth, in this book, whenever we refer to an operator, we
mean it to be a linear operator, unless otherwise specified.
Two simple examples of linear operators are α (x) = αx (the scalar multiplication
operator) and f (x) = a · x. In the first example, α is a fixed scalar and the operator maps a
vector x to a vector αx. Also, here the symbol α is used as the operator as well as a scalar.
In the second example, a is a fixed vector and the operator maps a vector x to the scalar
a · x. Note that if we change the fixed vector a in the second operator, we get a new
operator giving a new value for every vector x. This is often expressed by saying that the
operator parametrically depends on a. Similarly, the first operator parametrically depends
on α.
Exercise Check that the operators in these examples are linear operators.
The set of vectors on which a given operator f acts is called its domain. The set of vectors
or scalars generated by the action of f on its domain is called its range. All the operators
we deal with act on E3 . For a real life application, E3 consists of vector values of one or
more vector quantities e.g., electric and magnetic fields. When a linear operator f acts on
a vector x ∈ E3 , it either returns a vector y ∈ E3 , or a scalar α ∈ R. y (or α) is called the
image of x under f . In the first case, we denote f : E3 7→ E3 and in the second case
f : E3 7→ R. We assume that the domain of an operator we deal with is whole of E3 and
its range is either a subset of E3 or a subset of R. Two operators f : E3 7→ E3 or R and
g : E3 7→ E3 or R are equal if they have common domain (E3 ) and range (a subset of
E3 or R) and if f (x) = g (x) for all x ∈ E3 .
The product of two linear operators, in a given order, is a linear operator in itself and is
defined as an operator obtained by successively applying the two operators in the given
order. Thus, in order to get the action of the product f g on vector x we have to act first by
the operator g on x to get the vector g (x) and then act by the operator f on the vector
g (x) to get the vector f g (x). In general, the product of two linear operators is not
commutative.1 Such a product is written in many different ways like
g (f (x)) = g (f x) = gf (x) = gf x.
Note that the product of operators f : E3 7→ R and g : E3 7→ E3 is defined only in the order
f g : E3 7→ R. The general condition for the existence of the product of the two operators,
f g, is that the range of g must be a subset of the domain of f . Note that, two commuting
operators must be defined on a common set of vectors, forming the domain as well as the
range for both.
Exercise Show that the product of two linear operators is a linear operator. Check this
for the two operators defined in the above examples. Also check that two operators defined
via scalar multiplication as in the first example above, commute. In fact, check that the
operator of scalar multiplication α (x) = αx commutes with all linear operators.
The addition of two linear operators is defined by
(f + g )(x) = f (x) + g (x)
for all x ∈ E3 and is itself a linear operator (check this). The operators being added must
be either E3 7→ E3 or E3 7→ R. Both the product and the addition of linear operators are
associative. That is, for three linear operators f , g and h we have f (gh) = (f g )h and f + (g + h) = (f + g ) + h.
1 Two operators f and g are said to commute if f g (x) = gf (x) for all x ∈ E3 .
This follows easily from the definitions of the product and the addition of operators. Using
the linearity of operators and the definition of their product we can show that the product
of operators is distributive with respect to addition, that is,
h(g + f ) = hg + hf .
Identity operator
The identity operator I is defined via
I (x) = x
for all x ∈ E3 . The scalar multiplication operator we saw above can also be defined as
(αI )(x) = αI (x) = αx. It is trivial to check that for every operator f
If = f = f I .
The adjoint of an operator f is the operator f † defined via f (x) · y = x · f † (y)
for all vectors x and y in E3 . You will know its
utility after we use it in the sequel.
Exercise Show that (f † )† = f .
Consider two operators f and g and their product f g. Given any two vectors x, y ∈ E3 we
can write,
f gx · y = x · (f g )† y (4.1)
and
f gx · y = gx · (f )† y = x · (g )† (f )† y. (4.2)
Since the LHS of Eqs (4.1) and (4.2) are the same, their RHS must also be equal. Since
x, y ∈ E3 are arbitrary, this leads to the operator equality
(f g )† = (g )† (f )† .
The image set of f need not equal E3 (or R); it can be a proper subset of E3 (or of R). This can happen when two
or more elements of E3 have the same image under f . However, when this image set equals
E3 , (or R), that is, for every y ∈ E3 (or y ∈ R) there is a x ∈ E3 such that f (x) = y (or
f (x) = y), we call the operator ‘onto’. If two different elements of E3 always have different
images under f then the operator f is said to make a one to one mapping (or one to one
correspondence) between E3 and its image set under f . If f is both onto and one to one,
then we can define its inverse operator f −1 : E3 (or R) 7→ E3 as follows. For each y ∈ E3
or y ∈ R we find that unique element x ∈ E3 such that f (x) = y or y (x exists and is
unique since f is onto and one to one). We then define x = f −1 (y or y). This equation is
the result of solving y = f (x) (or y = f (x)) for x in just the same way as x = log(y ) is
the result of solving y = ex for x. Below we give two examples to illustrate this. Figure 4.1
illustrates the concept of the inverse of a mapping.
Fig. 4.1 Inverse of a mapping. A one to one and onto map f : X 7→ Y has the
unique inverse f −1 : Y 7→ X
If f −1 exists, we call the operator f invertible. It follows directly from its definition
based on f being a one to one correspondence that f −1 , if it exists, is unique.
Using the definition of the inverse, we can write,
for all x ∈ E3 and similarly for f f −1 (y), for all y ∈ E3 . This gives us the operator equation
f −1 f = I and f f −1 = I. (4.3)
The identity operators in Eq. (4.3) may act on different spaces. Thus, if f : E3 7→ R is
invertible with f −1 : R 7→ E3 then the product f −1 f : E3 7→ E3 is an operator on E3 while
f f −1 : R 7→ R is an operator on R. Both are identity operators on respective spaces.
We now check whether the inverse of a linear operator is linear. The answer is yes. We
have, with y1 = f (x1 ) and y2 = f (x2 ),
f −1 (y1 + y2 ) = f −1 (f (x1 ) + f (x2 )) = f −1 f (x1 + x2 ) = I (x1 + x2 ) = x1 + x2 = f −1 (y1 ) + f −1 (y2 )
and, if f (a) = 0 for some a,
0 = f −1 (0) = f −1 (f (a)) = a,
where the first equality follows because f −1 is a linear operator, so that f −1 (0) = 0.
Next we show that for an invertible operator f the set {f (x), f (y), f (z)} is linearly
independent (non-coplanar) provided the set {x, y, z} is linearly independent
(non-coplanar). We see that the equation
αf (x) + βf (y) + γf (z) = f (αx + βy + γz) = 0
implies that all the coefficients α, β, γ vanish, because x, y, z are linearly independent.
Here, we have used f (a) = 0 implies a = 0 for an invertible operator. The same argument
shows that if x, y, z are linearly dependent, then so are f (x), f (y), f (z).
For arbitrary x ∈ E3 , let f gx = y, so that x = (f g )−1 y. Successively multiplying both
sides by (f )−1 and (g )−1 we get x = (g )−1 (f )−1 y. This leads to the operator equality
(f g )−1 = (g )−1 (f )−1 .
The equation f (x) · f (y) × f (z) = (det f ) x · y × z defines the proportionality factor det f which depends exclusively on the
operator f and is an important characteristic of f . det f is called the determinant of the
operator f . Note that for an invertible operator f , det f ≠ 0 because the vectors
f (σ̂ 1 ), f (σ̂ 2 ), f (σ̂ 3 ) are linearly independent, that is non-coplanar. Given any set {x, y, z}
of linearly independent (non-coplanar) vectors, the number of unit cubes that can be
accommodated in the parallelepiped with adjacent sides x, y, z is given by its volume
or,
det f = [f (x) · f (y) × f (z)]/(x · y × z). (4.4)
The determinant det f of an invertible linear operator is invariant under the change of
orthonormal basis. We shall see later that any two triads of orthonormal unit vectors can
be made to coincide by three successive independent rotations called Euler rotations (see
section 6.5). Under these rotations the volume of the unit cube scanned by one orthonormal
triad does not change. Since the determinant of f is simply the volume of the deformed
unit cube under the action of f , we see that det f is invariant under the change of basis,
which amounts to the rotation of one orthonormal triad of vectors to the other.
If f is invertible, then we know that any non-coplanar triad (x, y, z) is mapped to
another non-coplanar triad (f (x), f (y), f (z)). This makes both the numerator and the
denominator on the RHS of Eq. (4.4) non-zero, that is, det f ≠ 0. Thus, if f is invertible,
then det f ≠ 0.
If f is not invertible, then there exist two vectors x, y ∈ E3 , x ≠ y, such that f (x) =
f (y). We can make a linearly independent triad (x, y, z) by adding a non-coplanar vector
z to the set {x, y}. Using this triad in Eq. (4.4), we see that
det f = 0. This proves that det f ≠ 0 implies f is invertible.
The last two paragraphs together imply that a linear operator f is invertible if and only
if det f ≠ 0.
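Equation (4.4) can be illustrated by letting a random 3 × 3 matrix stand in for f and comparing the triple-product value with a library determinant; a numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.normal(size=(3, 3))          # matrix of a linear operator f

e1, e2, e3 = np.eye(3)               # orthonormal triad sigma_1, sigma_2, sigma_3

# det f = f(x) . f(y) x f(z) / (x . y x z), Eq. (4.4); here x . (y x z) = 1.
det_triple = np.dot(F @ e1, np.cross(F @ e2, F @ e3)) / np.dot(e1, np.cross(e2, e3))

det_numpy = float(np.linalg.det(F))
```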
Many simple properties of the determinant det f now follow. First, it is trivial to check
that det I = 1. Next, consider the product gf of two linear invertible operators g and f .
We have, using an orthonormal basis {σ̂ 1 , σ̂ 2 , σ̂ 3 },
det(gf ) = gf (σ̂ 1 ) · gf (σ̂ 2 ) × gf (σ̂ 3 ) = (det g ) f (σ̂ 1 ) · f (σ̂ 2 ) × f (σ̂ 3 ) = det g · det f .
Thus, the determinant of the product is the product of determinants. This result can be
used to write det f −1 · det f = det(f −1 f ) = det I = 1, that is, det f −1 = (det f )−1 .
If det f < 0, the operator f not only scales the volume of the parallelepiped formed by
(x, y, z) but also changes its orientation. That is, f changes a right handed system
formed by (x, y, z) to a left handed one, or the acute angle between x and y × z to an obtuse
angle between them (see the interpretation of the scalar triple product in subsection 1.8.1).
Further, det f is defined via a scalar triple product, so that interchange of any two factors
changes its sign. This is not surprising, because interchanging any two of the three linearly
independent vectors changes them from a right handed to a left handed system and vice-
versa (see section 1.16).
4.1.4 Non-singular operators
An operator f with det f = 0 is called singular. If det f ≠ 0, then f is called non-singular.
We can now prove that the following three statements are equivalent. We have proved some
parts of this in the last two sections; however, it is worth putting everything in one place.
(a) f is non-singular.
(b) f (x) = 0 implies x = 0.
(c) f is invertible.
We first prove (a) ⇒ (b). Let {σ̂ k }, k = 1, 2, 3 be an orthonormal basis. Assume that
f (x) = 0 for some x ≠ 0. This means
f (x) = Σ_k xk f (σ̂ k ) = 0.
Since x ≠ 0, not all the xk can be zero. Therefore, the above equation means that the vectors
f (σ̂ k ), k = 1, 2, 3 are linearly dependent (coplanar). Therefore, det f = 0, which
contradicts the assumption that f is non-singular.
(b) ⇒ (c): Suppose that f is not invertible, that is, it is not a one to one correspondence
between x and f (x), so that there are two different non-zero vectors x1 and x2 (x1 ≠ x2 )
satisfying
f (x1 ) = y = f (x2 ),
which gives
f (x1 ) − f (x2 ) = f (x1 − x2 ) = 0,
which means that there is a non-zero vector z = x1 − x2 with f (z) = 0. This contradicts
assumption (b).
(c) ⇒ (a) That f is invertible implies it is non-singular is proved in subsection 4.1.3.
4.1.5 Examples
We find the inverses of the following linear operators.
122 An Introduction to Vectors, Vector Operators and Vector Analysis
(a) f (x) = αx + a(x · b),
(b) g (x) = αx + b × x.
(a) Let
y = f (x) = αx + a(x · b). (4.5)
Dotting both sides with b, we get
y · b = αx · b + (a · b)(x · b),
or,
x · b = (y · b)/(α + a · b).
Multiply both sides by a to get
a(x · b) = a(y · b)/(α + a · b).
Using Eq. (4.5) we get,
y − αx = a(y · b)/(α + a · b),
or,
x = y/α − a(y · b)/[α (α + a · b)] = f −1 (y).
(b) Let
y = g (x) = αx + b × x. (4.6)
Dotting with b (and using b · (b × x) = 0),
b · y = αb · x. (4.7)
Crossing with b,
b × y = αb × x + b × (b × x). (4.8)
Using Eqs (4.6), (4.7) and the identity b × (b × x) = (b · x)b − b² x, this becomes
b × y = α (y − αx) + (b · x)b − b² x
= αy − (α² + b² )x + α −1 (b · y)b,
or,
x = [αy + α −1 (b · y)b − b × y]/(α² + b² ) = g −1 (y).
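This inverse can be checked the same way; a short Python sketch with sample α and b (illustrative choices, any α ≠ 0 works):

```python
def add(u, v):   return tuple(p + q for p, q in zip(u, v))
def scale(c, u): return tuple(c * p for p in u)
def dot(u, v):   return sum(p * q for p, q in zip(u, v))
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

alpha, b = 1.5, (1.0, 2.0, -1.0)   # sample data, alpha != 0

def g(x):
    # g(x) = alpha x + b x x
    return add(scale(alpha, x), cross(b, x))

def g_inv(y):
    # x = [alpha y + (b . y) b / alpha - b x y] / (alpha^2 + b^2)
    num = add(add(scale(alpha, y), scale(dot(b, y) / alpha, b)),
              scale(-1.0, cross(b, y)))
    return scale(1.0 / (alpha**2 + dot(b, b)), num)

x = (0.5, -1.0, 2.0)
assert all(abs(p - q) < 1e-12 for p, q in zip(g_inv(g(x)), x))
```

Note that α² + b² > 0 whenever α ≠ 0, so g is always invertible in that case.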
Exercise In these two examples, check that f −1 f (x) = x and f f −1 (y) = y. Also,
check whether both these operators are non-singular.
(c) We solve the vector equation
α1 a1 + α2 a2 + α3 a3 = c (4.9)
for αi s; {ai }, i = 1, 2, 3 and c being given, using vector methods. We then compare
our solution with that obtained by Cramer’s rule for solving simultaneous equations.
Cross the given equation with a3 to get
α1 (a3 × a1 ) + α2 (a3 × a2 ) = a3 × c.
Dotting both sides with a2 kills the α2 term and gives
α1 = [a2 · (a3 × c)] / [a2 · (a3 × a1 )].
Similarly,
α2 = [a3 · (a1 × c)] / [a3 · (a1 × a2 )]
and
α3 = [a1 · (a2 × c)] / [a1 · (a2 × a3 )].
The given vector equation is equivalent to the system of simultaneous equations
Σ_j aij αj = ci ; i = 1, 2, 3,
where aij is the ith component of aj and ci is the ith component of c with respect to some
orthonormal basis. By Cramer's rule, its solution is
       | c1  a12  a13 |
       | c2  a22  a23 |
       | c3  a32  a33 |
α1 =  --------------------- ,
       | a11  a12  a13 |
       | a21  a22  a23 |
       | a31  a32  a33 |
where in the upper determinant the 1st column in [aij ] is replaced by [c1 , c2 , c3 ]T and
similarly for α2 and α3 . It is straightforward to check that the two solutions are equivalent.
If we try to apply the vector method given above to an equation with more than three
variables (αi , i = 1, . . . , 4, say), it fails. We can make one of the four terms vanish by taking
a suitable cross product and treat the resulting equation in three unknowns by the method
given above. However, the vectors in the three term equation are all coplanar making the
scalar triple product like a1 · (a2 × a3 ) vanish. Thus, a generalization of our method needs
a more general kind of algebraic setting, than the vector compositions based on dot and
cross products. Geometric algebra is such an algebra in which the above method can be
generalized. We refer to references [10, 7, 11] for a comprehensive treatment of geometric
algebra.
Given a basis (frame) {ek } of E3 , the reciprocal frame {e^k } is defined by the equations
e^k · ej = δjk ; j, k = 1, 2, 3,
where δjk = 1 if j = k and zero otherwise. To solve for e^k we note that it is a vector normal
to both of the vectors ej , j ≠ k, and its scalar product with ek must be +1. Such a vector is
uniquely given by the vector product of the ej , j ≠ k, taken in the cyclic order {123}. Thus, the
unique solution to these equations is given by
e^1 = (e2 × e3 )/e,
e^2 = (e3 × e1 )/e,
e^3 = (e1 × e2 )/e,
where e = e1 · (e2 × e3 ) is the scalar triple product of the frame vectors.
Any vector a ∈ E3 can be expanded in the frame {ek } as
a = a^1 e1 + a^2 e2 + a^3 e3 = a^k ek , (4.10)
where the summation convention is used on the right. The coefficients a^k are called the
contravariant components of the vector a (with respect to the frame {ek }). We note that
Eq. (4.10) is the same as Eq. (4.9) with α1,2,3 replaced by a^{1,2,3} , a1,2,3 replaced by
e1,2,3 and c replaced by a. Making these substitutions, we get the following solutions for
Eq. (4.10).
a^1 = [a · (e2 × e3 )] / [e1 · (e2 × e3 )],
a^2 = [a · (e3 × e1 )] / [e1 · (e2 × e3 )],
a^3 = [a · (e1 × e2 )] / [e1 · (e2 × e3 )]. (4.11)
Similarly, a can be expanded in the reciprocal frame as
a = a1 e^1 + a2 e^2 + a3 e^3 = ak e^k ,
where the coefficients ak , given by
ak = ek · a; k = 1, 2, 3,
are called the covariant components of the vector a (with respect to
the frame {ek }).
Exercise Show that the contravariant components a^k are given by
a^k = e^k · a.
Exercise Let î, ĵ, k̂ be an orthonormal basis and define a non-orthonormal frame by e1 =
î+3ĵ, e2 = 4ĵ and e3 = k̂. Find the corresponding reciprocal frame. Find the contravariant
and covariant components of a = 7î + 2ĵ + k̂ with respect to these frames. Draw figures
to depict both the frames and the contravariant and covariant components of a.
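One way to check this exercise numerically (a sketch, using the frame stated in the exercise; the helper names are mine):

```python
def dot(u, v): return sum(p * q for p, q in zip(u, v))
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

# frame from the exercise: e1 = i + 3j, e2 = 4j, e3 = k
e1, e2, e3 = (1.0, 3.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 1.0)
vol = dot(e1, cross(e2, e3))                 # e = e1 . (e2 x e3)

r1 = tuple(p / vol for p in cross(e2, e3))   # reciprocal vector e^1
r2 = tuple(p / vol for p in cross(e3, e1))   # e^2
r3 = tuple(p / vol for p in cross(e1, e2))   # e^3

# defining property e^k . e_j = delta_jk
for j, ej in enumerate((e1, e2, e3)):
    for k, rk in enumerate((r1, r2, r3)):
        assert abs(dot(rk, ej) - (1.0 if j == k else 0.0)) < 1e-12

a = (7.0, 2.0, 1.0)                          # a = 7i + 2j + k
contra = [dot(r, a) for r in (r1, r2, r3)]   # a^k = e^k . a
cova   = [dot(e, a) for e in (e1, e2, e3)]   # a_k = e_k . a

# the contravariant components reconstruct a in the frame {e_k}
recon = tuple(sum(ck * ek[i] for ck, ek in zip(contra, (e1, e2, e3)))
              for i in range(3))
assert all(abs(p - q) < 1e-12 for p, q in zip(recon, a))
```

For this frame e = 4, and the reconstruction a = a^k ek confirms the expansion (4.10).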
Exercise Show that the primitive bases of the Bravais lattice of a crystal and of its reciprocal
lattice form reciprocal frames.
Consider the operator defined by
A x = x × (a × b). (4.12)
For any two vectors x, y we have
y · A x = y · (x × (a × b)) = (b · x)(a · y) − (a · x)(b · y)
= −x · (y × (a × b)) = −x · A y, (4.13)
giving
A † = −A ,
where we have used the definition of the adjoint of an operator. Now we choose
a × b = (1/2) Σ_k (ak × σ̂ k ), where ak = A σ̂ k ,
and consider
σ̂ j × (a × b) = (1/2) Σ_k σ̂ j × (ak × σ̂ k )
= (1/2) Σ_k [ (σ̂ j · σ̂ k )ak − σ̂ k (σ̂ j · ak ) ]
= (1/2) Σ_k [ δjk (A σ̂ k ) − σ̂ k Ajk ]
= (1/2)(A σ̂ j − A † σ̂ j ) = A σ̂ j . (4.14)
Here, we have used identity I and the orthonormality of the basis {σ̂ k }. As an example,
the magnetic force due to a magnetic field on a charged particle is a skewsymmetric linear
operator on the particle velocity, given by F = B v = (q/c) v × B, acting via the pseudovector B.3
Example
We find the adjoint as well as the symmetric and the skewsymmetric parts of the operator
f x = αx + a(b · x) + x × (c × d). (4.15)
We find
f † x = αx + b(a · x) + (c × d) × x,
f+ x = αx + (1/2)[a(b · x) + b(a · x)],
and
f− x = (1/2)[a(b · x) − b(a · x)] + x × (c × d)
= (1/2) x × (a × b) + x × (c × d), (4.16)
where f+ and f− are the symmetric and skewsymmetric parts of f respectively.
Exercise Obtain these expressions for the adjoint, the symmetric and the skewsymmetric
parts of f given in Eq. (4.15).
Consider next the operator f (x) = a × x, the vector product with a fixed vector a.
Note that this operator maps to 0 all (non-zero) vectors x which are parallel or antiparallel
to a, so that it is not invertible.
Let {a1 , a2 , a3 }, {x1 , x2 , x3 } and {f1 , f2 , f3 } be the components of the vectors a, x and
f (x) with respect to some orthonormal basis. Then expressing the vector product in
terms of the Levi-Civita symbols we get
fi = εijk aj xk ,
where we have used the summation convention. Using the values of the antisymmetric
tensor εijk we can write this equation in the matrix form

f ≡ a× ↔ [εijk aj ] =
[  0   −a3   a2 ]
[  a3   0   −a1 ]
[ −a2   a1    0 ] ·
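A quick numerical sanity check that this skewsymmetric matrix really reproduces a × x (the sample vectors are illustrative):

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def skew(a):
    # the matrix [eps_ijk a_j] representing the operator x -> a x x
    a1, a2, a3 = a
    return ((0.0, -a3,  a2),
            ( a3, 0.0, -a1),
            (-a2,  a1, 0.0))

a, x = (1.0, -2.0, 3.0), (4.0, 0.5, -1.0)
Mx = tuple(sum(m * xi for m, xi in zip(row, x)) for row in skew(a))
assert all(abs(p - q) < 1e-12 for p, q in zip(Mx, cross(a, x)))
```

The vanishing diagonal and the antisymmetry of this matrix reflect A† = −A for the cross product operator.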
where xk = σ̂ k · x k = 1, 2, 3 are the components of x in the basis {σ̂ k }. We can expand the
vectors fk = f (σ̂ k ) in the basis {σ̂ k } to get
fk = f (σ̂ k ) = Σ_j σ̂ j (σ̂ j · fk ) = Σ_j σ̂ j fjk .
There are three coefficients fjk for each value of k (that is, each fk ) so that for k = 1, 2, 3
there are nine coefficients fjk . We arrange them in a 3 × 3 matrix with j running over rows
and k over columns. We have,
[f ] = [fjk ] =
[ f11  f12  f13 ]
[ f21  f22  f23 ]
[ f31  f32  f33 ] ·
The coefficients
fjk = σ̂ j · f (σ̂ k ) = σ̂ j · fk
which form a 3 × 3 matrix as above, are called the matrix elements of the linear operator f .
The matrix formed by fjk is called the matrix representing f in the basis {σ̂ k }. If we change
over to some other orthonormal basis say {êk }, the matrix representing f in the basis {êk }
is in general different than that representing f in the basis {σ̂ k }. Later in this discussion,
we shall relate these two matrix representatives of the same operator f . By [f ]{·} we denote
the matrix representing operator f using the basis {·}. Whenever the basis is fixed, we shall
drop the suffix {·}.
A linear operator is completely determined by its matrix in a given basis. To see this,
consider the action of f on an arbitrary vector x ∈ E3 . We have,
f (x) = Σ_k f (σ̂ k )xk = Σ_j Σ_k σ̂ j fjk xk .
Thus, the jth component of the equation
f (x) = y
is
(f (x))j = Σ_k fjk xk = yj .
There is one such equation for each value of j = 1, 2, 3 so that the vector equation f (x) = y
is equivalent to the set of three simultaneous equations
Σ_k fjk xk = yj ; j = 1, 2, 3,
completely determined by the matrix [fjk ]. Written in matrix form, these equations read
F X = Y , where F = [fjk ] and X, Y are the column matrices of the components of x and y.
For the sum of two operators we have
(f + g )jk = σ̂ j · (f + g )(σ̂ k ) = σ̂ j · f (σ̂ k ) + σ̂ j · g (σ̂ k ) = fjk + gjk ,
where the first equality follows from the definition of the matrix element of an operator.
Thus, the matrix element of the addition of two operators equals the addition of the
matrix elements of the operators, or,
[f + g ] = [f ] + [g ].
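This correspondence between operators and matrices is easy to exercise numerically; a minimal sketch (operators and basis chosen for illustration):

```python
def dot(u, v): return sum(p * q for p, q in zip(u, v))
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

BASIS = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def matrix_of(f):
    # f_jk = sigma_j . f(sigma_k)
    return [[dot(BASIS[j], f(BASIS[k])) for k in range(3)]
            for j in range(3)]

a = (1.0, 2.0, 3.0)
f = lambda x: cross(a, x)                             # f(x) = a x x
g = lambda x: tuple(2.0 * p for p in x)               # g(x) = 2x
h = lambda x: tuple(p + q for p, q in zip(f(x), g(x)))  # (f + g)(x)

F, G, H = matrix_of(f), matrix_of(g), matrix_of(h)
assert all(abs(H[j][k] - F[j][k] - G[j][k]) < 1e-12
           for j in range(3) for k in range(3))        # [f+g] = [f]+[g]
```

The same matrix_of helper, applied to a composition x → g(f(x)), would likewise reproduce the matrix product [g][f] derived next.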
For the product of two linear operators say gf consider (work this out),
gf (σ̂ k ) = Σ_j g (σ̂ j )fjk = Σ_i Σ_j σ̂ i gij fjk .
Compare with
gf (σ̂ k ) = Σ_i σ̂ i (σ̂ i · gf (σ̂ k ))
to get
Σ_j gij fjk = σ̂ i · gf (σ̂ k ).
The RHS of this equation is the ikth element of the matrix of the operator gf , while the
LHS is the ikth element of the product of the matrices of the operators g and f in that
order. Thus, we see that
[gf ] = [g ][f ]. (4.17)
Next, consider the identity operator I. We have
σ̂ i · I (σ̂ k ) = σ̂ i · σ̂ k = δik
because the basis {σ̂ k } is orthonormal. Thus, the matrix representing the identity operator,
(which is the identity with respect to operator multiplication), is the unit matrix I, (which
is the identity with respect to matrix multiplication).
For an invertible operator f , using Eq. (4.17), we have,
I = [f −1 f ] = [f −1 ][f ] (4.18)
which simply means that the matrix representing f −1 is the inverse of the matrix
representing f . Since f −1 is assumed to exist, det f ≠ 0. Since det f is the same as the
determinant of the matrix representing f , the latter is non-zero and Eq. (4.18) is meaningful. We
have already seen that det f is invariant under the change of orthonormal basis so that
Eq. (4.18) holds irrespective of the orthonormal basis used. In fact we shall independently
prove that the determinant of [f ] is invariant under the change of basis. In particular, the
determinant of the operator f can be alternatively defined as the determinant of its matrix
in any orthonormal basis.
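The basis-independence of the determinant can be illustrated numerically. A sketch, using a sample symmetric matrix and a rotation of the basis about the z axis (both illustrative choices):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(M):
    return (M[0][0] * (M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1] * (M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2] * (M[1][0]*M[2][1] - M[1][1]*M[2][0]))

F = [[4.0, -1.0, -1.0], [-1.0, 4.0, -1.0], [-1.0, -1.0, 4.0]]

t = 0.7                                   # rotate the basis about z
Q = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]
Qt = [list(row) for row in zip(*Q)]       # Q^{-1} = Q^T for a rotation

Fp = matmul(matmul(Q, F), Qt)             # F' = Q F Q^{-1}
assert abs(det3(Fp) - det3(F)) < 1e-9     # same determinant in either basis
```

Here det(F′) = det(Q) det(F) det(Q⁻¹) = det(F), exactly the invariance claimed above.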
Thus, we have shown that the set of linear operators on E3 and the set of matrices
representing them (with respect to a fixed orthonormal basis) are isomorphic under the
binary operations of addition and multiplication defined on these sets. This fact is
expressed by saying that the algebra of linear operators on E3 and that of their matrix
representatives are equivalent4 .
4 To establish this equivalence both sets must have the algebraic structure called ring with respect to the multiplications defined
on them, which is known to be true. We shall not discuss this point any further.
We establish the relation between the matrix representing an operator and that
representing its adjoint. We have,
(f † )jk = σ̂ j · f † (σ̂ k ) = f (σ̂ j ) · σ̂ k = σ̂ k · f (σ̂ j ) = fkj ,
which means
[f † ] = [f ]T (4.19)
Next, we find how the matrix representing an operator changes under a change of
orthonormal basis. We write f (x) = y in matrix form as
F X = Y , (4.20)
where F is the matrix of the operator f in {σ̂ k } and X and Y are the column (3 × 1) matrices
comprising the coordinates of the vectors x and y in the basis {σ̂ k } (see the last equation in
section 4.4). The same equation f (x) = y, written via matrices in the basis {êk }, reads
F′ X′ = Y ′ . (4.21)
It is straightforward to check, by expanding the basis {êk } using the basis {σ̂ k }, that for a
vector x ∈ E3 ,
X′ = QX, (4.22)
where Q is the orthogonal matrix with elements Qij = êi · σ̂ j . Substituting in Eq. (4.21),
we get
F′QX = QY ,
or,
(Q−1 F′Q )X = Y . (4.23)
Comparing with Eq. (4.20), we get
F = Q−1 F′Q,
or, equivalently,
F′ = QFQ−1 .
det(F′) = det(QFQ−1 ) = det(Q) det(F ) det(Q−1 ) = det(F ),
and hence
det[f ]{êk } = det[f ]{σ̂ k } = det f ,
where det[·] is the determinant of the matrix representative of the corresponding operator.
If all the eigenvalues of an operator are
distinct (no degeneracy), then the operator can be proved to be diagonalizable. Even if
degeneracy is present, we can find the maximal set of linearly independent eigenvectors,
that is, the corresponding operator can be diagonalized. All the information of a
diagonalizable operator is contained in its eigenvalues and eigenvectors because its action
on any vector, (by virtue of its linearity and by the fact that its eigenvectors form a basis),
can be expressed in terms of these quantities in the simplest possible way. The differential
or integral equations, which are the principal mathematical models in physics and
engineering, are often expressed or related to the eigenvalue problem of operators on
different kinds of spaces called function spaces. These are some of the reasons why the
eigenvalue problem is of such a paramount importance in mathematical modeling of real
life processes. Here, we shall confine ourselves to the case of operators on E3 with real
eigenvalues. As we shall see later, these are symmetric operators. We shall touch upon the
case of complex eigenvalues later.
The basis formed by the eigenvectors of an operator on E3 gives a coordinate frame in
E3 . Its coordinate axes are called principal axes and the frame is called the principal axes
system.
Typically, the operator is given in its matrix form [fjk ], that is, we are given the vectors
fk = f (σ̂ k ) = Σ_{j=1}^{3} σ̂ j fjk ,
(f − λI )u = 0. (5.2)
Equation (5.2) tells us that the operator (f − λI ) must be singular, because it maps a non-zero
vector u ≠ 0 to the zero vector, so that its determinant must vanish.
If we expand the LHS of Eq. (5.3), successively applying the distributive law for the scalar
and the vector products, we can transform it to
λ3 − α1 λ2 + α2 λ − α3 = 0, (5.4)
where,
α1 = Σ_k σ̂ k · fk = f11 + f22 + f33 ,
g1 u1 + g2 u2 + g3 u3 = 0, (5.6)
where the vectors
gk = fk − λσ̂ k ; k = 1, 2, 3 (5.7)
are known for each eigenvalue λ, and the scalar components uk = u · σ̂ k of the eigenvector
are to be determined2 for each eigenvalue λ. We can solve Eq. (5.6) for the ratios of the uk as
follows. Cross Eq. (5.6) with g3 to get
u1 (g1 × g3 ) + u2 (g2 × g3 ) = 0. (5.8)
Dotting with g2 × g3 and solving,
u2 /u1 = [(g3 × g1 ) · (g2 × g3 )] / |g2 × g3 |² · (5.9)
Similarly, crossing Eq. (5.6) with g2 and dotting with g2 × g3 ,
u3 /u1 = [(g1 × g2 ) · (g2 × g3 )] / |g2 × g3 |² · (5.10)
We have already seen that if u satisfies Eq. (5.2), so does any of its scalar multiples. This
means that the length or the sense (orientation) of u is not determined by the eigenvector
equation (Eq. (5.2)). Therefore, it is not a surprise that Eq. (5.6) fixes only the ratios of the
components of u and we are free to fix the sign and magnitude of u by assigning any
convenient value to the component u1 . After u1 is assigned a value, Eq. (5.9) and
Eq. (5.10) determine u2 and u3 uniquely. Here, we have assumed that every pair of vectors
2 Note that the vectors (g1 , g2 , g3 ) must be coplanar; otherwise, they form a linearly independent set of vectors and Eq. (5.6)
has only the trivial solution ui = 0, i = 1, 2, 3.
formed out of (g1 , g2 , g3 ) is linearly independent, if not, all of them will be proportional
to each other3 , in which case the ratios u2 /u1 and u3 /u1 obtained via Eq. (5.9) or
Eq. (5.10) will become indeterminate and Eqs (5.9), (5.10) do not apply. In such a case we
can proceed as follows. Since gk s are proportional to each other, we can put g2 = cg1 and
g3 = dg1 in Eq. (5.6) to get
(u1 + cu2 + du3 )g1 = 0,
or, since g1 ≠ 0,
u1 + cu2 + du3 = 0. (5.11)
Thus, we can give arbitrary values to any two of the components of u and the remaining
component is fixed via Eq. (5.11). We can choose two sets of ui values in such a way that
the resulting eigenvectors (via Eq. (5.6) or Eq. (5.11)) are linearly independent. Setting
u1 = u2 = 1 in Eq. (5.6), for example, gives
g1 + g2 + u3 g3 = 0, (5.12)
while setting u1 = u3 = 1 gives
g1 + u2 g2 + g3 = 0. (5.13)
To get the respective eigenvectors, we have to solve Eq. (5.12) for u3 and Eq. (5.13) for u2
respectively, which is trivially done using known g1 ∝ g2 ∝ g3 . These eigenvectors are
trivially seen to be linearly independent. Any eigenvector corresponding to a different
choice of components will be a linear combination of these two eigenvectors. Thus,
linearly dependent pairs of gk s (so that they are mutually proportional) imply that the
eigenvectors belonging to the corresponding eigenvalue span a 2-D space i.e., a plane. This
is to be contrasted with the fact that when every pair of the gk s is linearly independent, the
eigenvectors belonging to the corresponding eigenvalue span a 1-D space.
It turns out that if λ is a simple root of the secular equation (for a symmetric linear
operator f ), then every two of the three vectors gk = fk − λσ̂ k are necessarily linearly
independent. Thus, the eigenvectors belonging to a simple root λ span a 1-D space i.e.,
a real line in E3 . If λ is a double root, every pair of gk s is linearly dependent so that the
eigenvectors belonging to a double root λ span a 2-D space, i.e., a plane in E3 . Note that any
two linearly independent (i.e., non-collinear) vectors in this plane can be the eigenvectors.
A multiple root of a secular equation is said to be k-fold degenerate if the root has
multiplicity k. To an eigenvalue with multiplicity k there correspond exactly k linearly
independent eigenvectors (in E3 , provided f is symmetric).
Eigenvalues of a symmetric operator are real. To get a flavor of the complex eigenvalues,
(i.e., complex roots of the secular equation) consider the skewsymmetric operator
f x = x × (σ̂ 1 × σ̂ 2 )
3 This is because (g1 , g2 , g3 ) have to satisfy Eq. (5.6) with one or more ui ≠ 0.
Its action on the basis vectors is
f (σ̂ 1 ) = σ̂ 1 × (σ̂ 1 × σ̂ 2 ) = −σ̂ 2 = −i σ̂ 1 ,
f (σ̂ 2 ) = σ̂ 2 × (σ̂ 1 × σ̂ 2 ) = σ̂ 1 = −i σ̂ 2 ,
f (σ̂ 3 ) = σ̂ 3 × (σ̂ 1 × σ̂ 2 ) = 0, (5.14)
and its secular equation is
λ(λ² + 1) = 0,
with roots λ = 0, ±i.
The root λ = 0 corresponds to the eigenvector σ̂ 3 in Eq. (5.14). The eigenvalue equations
for the eigenvalue −i are the first two of Eq. (5.14). The last equalities in these equations
derive from the fact that multiplication of a vector in the complex plane by −i results in
the clockwise rotation of that vector through π/2. In general, multiplication by eiθ results
in the counterclockwise rotation through θ. Thus, we see that complex eigenvalues result
in the rotation of the eigenvectors.
5.1.1 Examples
We obtain the eigenvalues and eigenvectors of the operator f represented by the matrix
[f ] =
[  4  −1  −1 ]
[ −1   4  −1 ]
[ −1  −1   4 ]
so that
f (σ̂ 1 ) = 4σ̂ 1 − σ̂ 2 − σ̂ 3 = f1 ,
f (σ̂ 2 ) = −σ̂ 1 + 4σ̂ 2 − σ̂ 3 = f2 ,
f (σ̂ 3 ) = −σ̂ 1 − σ̂ 2 + 4σ̂ 3 = f3 . (5.15)
Using Eqs (5.15) and (5.16) we get the values of the coefficients in the secular equation,
α1 = 4 + 4 + 4 = 12
α2 = 15 + 15 + 15 = 45
α3 = f1 · (f2 × f3 ) = 50 (5.17)
λ3 − 12λ2 + 45λ − 50 = 0
(λ − 2)(λ − 5)2 = 0.
For λ = 2 we get
g1 = f1 − 2σ̂ 1 = 2σ̂ 1 − σ̂ 2 − σ̂ 3 ,
g2 = f2 − 2σ̂ 2 = −σ̂ 1 + 2σ̂ 2 − σ̂ 3 ,
g3 = f3 − 2σ̂ 3 = −σ̂ 1 − σ̂ 2 + 2σ̂ 3 ,
which gives
g1 × g2 = 3(σ̂ 1 × σ̂ 2 + σ̂ 2 × σ̂ 3 + σ̂ 3 × σ̂ 1 ) = g2 × g3 = g3 × g1 .
Using Eqs (5.9) and (5.10) we get u2 /u1 = 1 = u3 /u1 , so that
u1 = σ̂ 1 + σ̂ 2 + σ̂ 3 (5.19)
is an eigenvector with eigenvalue λ = 2. For the double root λ = 5 we find
g1 = g2 = g3 = −(σ̂ 1 + σ̂ 2 + σ̂ 3 ),
so that Eq. (5.11) becomes u1 + u2 + u3 = 0. Choosing u1 = u2 = 1, u3 = −2 gives
u2 = σ̂ 1 + σ̂ 2 − 2σ̂ 3 (5.20)
as one eigenvector, while choosing u1 = 1, u2 = −1, u3 = 0,
u3 = σ̂ 1 − σ̂ 2 (5.21)
is the other eigenvector. Therefore, every vector in the plane defined by u2 and u3 is an
eigenvector with eigenvalue λ = 5. (Operate by f on any linear combination of u2 and u3 ).
Although, our method to find the eigenvalues and eigenvectors is sufficiently general, it
may cost us more work than necessary in special cases. Often, an eigenvector is known in
advance. Then, the corresponding eigenvalue is easily obtained via
λ = (f (u) · u)/|u|²,
instead of using the secular equation. More often than not, an eigenvector can be identified
easily from the symmetries in the given problem. Thus, a perusal of Eq. (5.15) shows that
adding its three equations gives
f (σ̂ 1 + σ̂ 2 + σ̂ 3 ) = 2(σ̂ 1 + σ̂ 2 + σ̂ 3 ),
identifying u1 = σ̂ 1 + σ̂ 2 + σ̂ 3 as an eigenvector with eigenvalue 2. A second eigenvector,
of the form u2 = σ̂ 1 + σ̂ 2 + u3 σ̂ 3 say, must be orthogonal to u1 :
u1 · u2 = (σ̂ 1 + σ̂ 2 + σ̂ 3 ) · (σ̂ 1 + σ̂ 2 + u3 σ̂ 3 ) = 2 + u3 = 0.
This gives u3 = −2, so u2 = σ̂ 1 + σ̂ 2 − 2σ̂ 3 which coincides with Eq. (5.20). From
Eq. (5.15) we now find f (u2 ) = 5u2 so the eigenvalue is 5. The vector u1 × u2 =
−3(σ̂ 1 − σ̂ 2 ) is orthogonal to both u1 and u2 and is proportional to the eigenvector u3 in
Eq. (5.21).
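The eigenpairs found in this example are easy to confirm numerically; a short Python check of f uk = λk uk for the matrix above:

```python
M = ((4.0, -1.0, -1.0),
     (-1.0, 4.0, -1.0),
     (-1.0, -1.0, 4.0))

def apply(M, x):
    return tuple(sum(m * xi for m, xi in zip(row, x)) for row in M)

pairs = [(2.0, (1.0, 1.0, 1.0)),    # u1, Eq. (5.19)
         (5.0, (1.0, 1.0, -2.0)),   # u2, Eq. (5.20)
         (5.0, (1.0, -1.0, 0.0))]   # u3, Eq. (5.21)

for lam, u in pairs:
    fu = apply(M, u)
    assert all(abs(p - lam * q) < 1e-12 for p, q in zip(fu, u))
```

Any linear combination of u2 and u3 passes the same check with λ = 5, illustrating that the λ = 5 eigenvectors span a plane.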
Exercise Obtain the eigenvectors and eigenvalues of the operators represented by the
following matrices in an orthonormal basis.
(i) [f ] =
[ 1   0  5 ]
[ 0  −2  0 ]
[ 5   0  1 ] ·

(ii) [f ] =
[  7    √6   −√3  ]
[  √6    2   −5√2 ]
[ −√3  −5√2  −3   ] ·

(iii) [f ] =
[ 1   2   0 ]
[ 2   6  −2 ]
[ 0  −2   5 ] ·
Fig. 5.1 u · (e^{iθ} v) = (e^{−iθ} u) · v
However, the alternative scalar product is just (e−iθ u) · v. Thus, we have the general result
u · (λv) = (λ∗ u) · v.
Now let u and v be eigenvectors of a symmetric operator S with eigenvalues λ1 and λ2 ,
that is, S (u) = λ1 u and S (v) = λ2 v. Then
u · (S v) = (S u) · v, (5.22)
where we have used the fact that S is symmetric. Remember that when λ is complex, the
vectors u and λu are not collinear. Equation (5.22) gives
(λ∗2 − λ1 ) u · v = 0. (5.23)
Two cases arise. In the first case, λ1 = λ2 = λ and the scalar product in Eq. (5.23) is non-
zero. This gives λ = λ∗ . This proves that the eigenvalues of a symmetric operator are real.
In the second case, λ1 ≠ λ2 , so that the scalar product of the two eigenvectors u and v must
vanish. This simply means that the eigenvectors belonging to two different eigenvalues of a
symmetric operator are orthogonal.
We now show that for every symmetric operator on E3 , there exists a set of eigenvectors
which are mutually orthogonal. The axes of the resulting frame are called the principal
axes. If all the three eigenvalues of the given symmetric operator are distinct, then this
statement follows from the fact that the eigenvectors belonging to different eigenvalues of a
symmetric operator must be orthogonal. Further, in this case, the eigenvectors are unique
upto multiplication by a scalar (there is no degeneracy) so that all the principal axes are
unique.
Now suppose λ1 ≠ λ2 , λ1 ≠ λ3 but λ2 = λ3 = λ, say. Since λ1 is distinct from λ2
and λ3 , the eigenvector u1 belonging to λ1 must be orthogonal to both the eigenvectors
belonging to the degenerate eigenvalue λ. Further, we know that the eigenvectors belonging
to λ are linearly independent (non-collinear) and every vector in the plane spanned by two
linearly independent eigenvectors for λ is also an eigenvector. Thus, we can take any vector
in the plane normal to u1 as one of the eigenvectors, say u2 of the eigenvalue λ and the
third eigenvector u3 can be obtained from
u3 = u1 × u2 .
If all the three eigenvalues are equal to say λ, three linearly independent eigenvectors
belong to this common eigenvalue λ and every linear combination of them is also an
eigenvector. In other words, every vector in E3 is an eigenvector belonging to λ.
Obviously, any orthonormal triad of vectors (u1 , u2 , u3 ) gives a principal axes system.
The fact that a symmetric operator S has three orthogonal principal axes is expressed
by the equations
S uk = λk uk ; k = 1, 2, 3
and
uj · uk = 0 if j ≠ k.
Thus, a symmetric operator is not only diagonalizable, but its eigenvectors naturally form
an orthogonal basis.
Thus, we see that the eigenvectors of a symmetric operator form an orthogonal basis
of E3 . This basis is called the eigenbasis of the symmetric operator and the corresponding
eigenvectors are called principal vectors and eigenvalues are called the principal values. If
we denote this basis by (u1 , u2 , u3 ) we can write an arbitrary vector x ∈ E3 as a linear
combination of the eigenvectors as
x = α1 u1 + α2 u2 + α3 u3 .
Dotting both sides with uk , k = 1, 2, 3, using the orthogonality of the eigenvector basis and
dividing both sides by |uk |2 we get
αk = (uk · x)/|uk |² .
Acting by S on x then gives
S x = Σ_k λk (ûk · x) ûk , (5.24)
where ûk is a unit vector in the direction of uk . Equation (5.24) is often written in terms of
the so called projection operators. Thus,
S x = Σ_{k=1}^{3} λk Pk x, (5.26)
where the projection operator Pk , which projects any vector x ∈ E3 onto the kth principal
axis along uk , is given by
Pk x = (ûk · x) ûk .
The canonical form Eq. (5.24) or Eq. (5.26) is called the spectral decomposition (or the
spectral form) of the symmetric operator S. Note that if we use an eigenvector uk in place
of x in the spectral decomposition, the eigenvalue equation for uk emerges trivially.
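The spectral form can be verified against the worked example of section 5.1.1; a minimal sketch (eigenvalues and eigenvectors taken from that example):

```python
def dot(u, v): return sum(p * q for p, q in zip(u, v))

# eigenbasis and eigenvalues of the example operator of section 5.1.1
eig = [(2.0, (1.0, 1.0, 1.0)),
       (5.0, (1.0, 1.0, -2.0)),
       (5.0, (1.0, -1.0, 0.0))]

def S(x):
    # S x = sum_k lambda_k P_k x, with P_k x = (u_k . x / |u_k|^2) u_k
    out = [0.0, 0.0, 0.0]
    for lam, u in eig:
        coef = lam * dot(u, x) / dot(u, u)
        for i in range(3):
            out[i] += coef * u[i]
    return tuple(out)

M = ((4.0, -1.0, -1.0), (-1.0, 4.0, -1.0), (-1.0, -1.0, 4.0))
x = (0.3, -1.2, 2.5)
Mx = tuple(sum(m * xi for m, xi in zip(row, x)) for row in M)
assert all(abs(p - q) < 1e-9 for p, q in zip(S(x), Mx))
```

The assertion confirms that the spectral decomposition reproduces the original matrix action on an arbitrary vector.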
The matrix [S ] representing a symmetric operator S in an orthonormal basis is symmetric:
Sjk = σ̂ j · S (σ̂ k ) = S (σ̂ j ) · σ̂ k = Skj ,
where we have used the fact that S is symmetric. On the other hand, the matrix [A ]
representing a skewsymmetric operator A in an orthonormal basis is skewsymmetric. We
have, for the ijth element of the matrix [A ],
Aij = σ̂ i · A (σ̂ j ) = −A (σ̂ i ) · σ̂ j = −Aji ,
where we have used the fact that A is skewsymmetric. Obviously, a matrix representing
a skewsymmetric operator has vanishing diagonal elements because they have to satisfy
Aii = −Aii . That means, the pairs (σ̂ i , A σ̂ i ), i = 1, 2, 3 are orthogonal.
As you may have noticed, all the matrices given in the exercise at the end of the last
section are symmetric.
From the spectral decomposition for a non-singular symmetric operator S, we can
write, for the inverse operator,
S −1 = Σ_k (1/λk ) Pk . (5.27)
Exercise Using the spectral decomposition of S (Eq. (5.26)) and that of S −1 (Eq. (5.27))
verify explicitly that S −1 S = I = SS −1 .
We can show that the inverse of a symmetric operator is also symmetric. We have, for all
x ∈ E3 , (SS −1 )† x = I x = x, which implies (S −1 )† S x = x = S −1 S x, which means (S −1 )† = S −1 ,
because the inverse is unique.
A symmetric operator S is called positive, if all its eigenvalues λk > 0, k = 1, 2, 3
and non-negative if λk ≥ 0, k = 1, 2, 3. A positive symmetric operator S is also non-negative;
however, the converse is not true. A general linear operator f is called positive
(non-negative) if f (x) · x > 0, (≥ 0) for every x , 0.
Exercise Show that a non-negative symmetric operator S has a unique non-negative square root, given by
S 1/2 = Σ_k λk^{1/2} Pk .
Fig. 5.2 Symmetric transformation with principal values λ1 > 1 and λ2 < 1
A positive symmetric operator S+ on E3 transforms the unit sphere into an ellipsoid. The
transformation is,
x = S+ n̂ (5.28)
where n̂ is any unit vector. This is a parametric equation for the ellipsoid with vector
parameter n̂. A non-parametric equation can be obtained by eliminating n̂ as follows.
Inverting Eq. (5.28),
(S+^{-1} x)² = n̂² = 1.
Since S+^{-1} is symmetric, we have,
(S+^{-1} x)² = (S+^{-1} x) · (S+^{-1} x) = x · S+^{-2} x = 1. (5.29)
Now using the spectral decomposition of S+^{-1} and the properties of the
projection operators, we can write Eq. (5.29) in the form
x1²/λ1² + x2²/λ2² + x3²/λ3² = 1, (5.30)
where xk = x · ûk . Equation (5.30) is the standard equation for an ellipsoid with semiaxes
λ1 , λ2 , λ3 (see Fig. 5.3).
In some situations, eigenvalues and eigenvectors are supplied as the initial information,
so that the corresponding symmetric operator can be constructed directly from its
spectral decomposition. Some variants of the spectral form are more convenient in certain
applications. All these variants are, of course, constructed from the eigenvectors and
eigenvalues.
x · (Sx) = 1 (5.31)
is equivalent to the standard coordinate form for each of the quadratic surfaces (ellipsoid,
hyperboloid of one sheet, hyperboloid of two sheets), depending on the signs of the
eigenvalues of S.
Next, consider the equation
[f (x − a)]² = 1,
that is,
f (x − a) · f (x − a) = 1,
or,
(x − a) · f † f (x − a) = 1.
Since S = f † f is symmetric, this is
(x − a) · S (x − a) = 1. (5.32)
We know from the previous exercise that Eq. (5.32) corresponds to that for an ellipsoid if
all the eigenvalues of S are positive, hyperboloid of one sheet if one eigenvalue is negative
and hyperboloid of two sheets if two of the eigenvalues are negative. Obviously, there is no
solution for all negative eigenvalues of S. Note that, for Eq. (5.32), all the quadratic surfaces
are centered at a.
A simpler eigenvalue problem we may face is when one of the three eigenvectors of a symmetric operator is known and
the other two are to be found. The remaining eigenvectors lie in the plane normal to the
known eigenvector, so the problem reduces to that of finding the spectrum of an operator
acting on a plane. Although we can employ the general method to do this job, for a positive
symmetric operator S+ an efficient algorithm called Mohr’s algorithm is available. We first
state the algorithm and then justify it.4
The algorithm comprises the following.
Choose any convenient unit vector b̂ in the plane and compute the two vectors
b± = S+ (b̂) ∓ iS+ (i b̂),
where multiplication by e^{iθ} rotates a vector counterclockwise through angle θ in the plane
on which the operator S+ acts. Then, for b+ × b− ≠ 0, the vectors
u± = α (b̂+ ± b̂− ), b̂± = b± /|b± |,
(α being a normalization constant) satisfy
S+ u± = λ± u± ,
where u+ and u− are the principal vectors corresponding to the principal values
λ± = (1/2)(|b+ | ± |b− |).
Since S+ is a given operator, the vector S+ b̂, resulting due to its action on any given
unit vector b̂ in the plane is known. We write u = û+ and decompose b̂ into components
b∥ and b⊥ , parallel and orthogonal to u respectively. We have,
S+ b̂ = S+ (b∥ + b⊥ ) = λ+ b∥ + λ− b⊥ , (5.34)
Writing b∥ = (u · b̂)u and b⊥ = b̂ − b∥ , Eq. (5.34) becomes
S+ b̂ = (1/2)(λ+ + λ− )b̂ + (1/2)(λ+ − λ− )x̂, (5.35)
where
x̂ = 2(u · b̂)u − b̂
is the reflection of b̂ in the direction of u. Thus, Eq. (5.34) involves three unknowns λ+ , λ−
and φ (through x̂), so we need another equation to solve for these unknowns. We have (Exercise)
−iS+ (i b̂) = (1/2)(λ+ + λ− )b̂ − (1/2)(λ+ − λ− )x̂. (5.36)
Combining Eqs (5.35) and (5.36) we get
b+ = S+ (b̂) − iS+ (i b̂) = (λ+ + λ− )b̂,
b− = S+ (b̂) + iS+ (i b̂) = (λ+ − λ− )x̂. (5.37)
Without losing generality, we assume λ+ ≥ λ− , so Eq. (5.37) shows that the principal
values are determined by the magnitudes |b± | = λ+ ± λ− of the known vectors b+ and
b− , produced by the known action of S+ on the vectors b̂ and i b̂. Dotting the unit vector
equation b̂− = x̂ with u we have,
b̂− · u = x̂ · u = 2(u · b̂)(u · u) − b̂ · u = u · b̂. (5.39)
Equation (5.39) tells us that the direction of u is half way between the directions of b̂− (or x̂) and
b̂ = b̂+ . Therefore,
u± = α (b̂+ ± b̂− ). (5.42)
If b̂+ × b̂− = 0, then b̂ is parallel or antiparallel to one of the principal vectors. Then x̂ =
±b̂ and Eq. (5.42) yields only that vector. The other eigenvector is perpendicular to the
one found.
This completes the proof of Mohr’s algorithm. Figure 5.4 depicts the parameters in
Mohr’s algorithm.
To construct Mohr's circle, take the x axis of the plane along the chosen unit vector b̂, and let φ denote the (unknown) angle between b̂ and a principal axis. This fixes b̂. Knowing the action of S+ on b̂, we can find the value of Z (φ ) = b̂ · S+ b̂.
We can calculate the value of Z⊥ (φ) = Z (φ + π2 ) in the same way, by noting that the
corresponding b̂ vector is orthogonal to the x axis. Now, we draw a straight line making an
angle 2φ with the positive direction of x axis, intersecting it at a point O. Then, we mark
out two points S1 and S2 on this line, which are equidistant from O and are at distances
Z (φ) and Z⊥ (φ) from the origin respectively. While doing this, we may have to slide this
line parallel to itself along the x axis. Finally, we draw a circle (Mohr’s circle) with its center
at O and radius OS1,2 . This circle cuts the x axis at two points at distance λ− (closer)
and λ+ (farther) from the origin, giving us the required eigenvalues. All this is depicted
in Fig. 5.5. One eigenvector is along the bisector of the angle 2φ made with the positive
direction of x axis and the second eigenvector is orthogonal to it.
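The algebraic steps of Mohr's algorithm can be transcribed compactly by treating plane vectors as complex numbers (multiplication by i is the counterclockwise quarter turn). The sketch below is my transcription, with helper names of my own choosing and a sample symmetric matrix:

```python
def apply2(S, z):
    # act with the symmetric 2x2 matrix S on the plane vector z = x + iy
    x, y = z.real, z.imag
    return complex(S[0][0]*x + S[0][1]*y, S[1][0]*x + S[1][1]*y)

def mohr(S, bhat=1 + 0j):
    bp = apply2(S, bhat) - 1j * apply2(S, 1j * bhat)  # b+, |b+| = l+ + l-
    bm = apply2(S, bhat) + 1j * apply2(S, 1j * bhat)  # b-, |b-| = l+ - l-
    lp = (abs(bp) + abs(bm)) / 2
    lm = (abs(bp) - abs(bm)) / 2
    if abs(bm) < 1e-12:            # degenerate case: every direction works
        return lp, lm, bhat, 1j * bhat
    u_plus = bp / abs(bp) + bm / abs(bm)  # halfway between b^+ and b^-
    u_plus /= abs(u_plus)
    return lp, lm, u_plus, 1j * u_plus

S = [[2.0, 1.0], [1.0, 2.0]]       # sample matrix, eigenvalues 3 and 1
lp, lm, up, um = mohr(S)
assert abs(lp - 3.0) < 1e-12 and abs(lm - 1.0) < 1e-12
assert abs(apply2(S, up) - lp * up) < 1e-12   # S u+ = l+ u+
```

Note the u_plus construction assumes b̂+ and b̂− are not antiparallel; if b̂ happens to be orthogonal to the λ+ axis, a different starting b̂ should be chosen.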
Exercise Where exactly would Mohr’s algorithm fail if the operator S+ was not positive?
5.3.1 Examples
(a) Using Mohr’s algorithm, we solve the eigenvalue problem for the operator
S+ (b) = (a × b) × a + (c × b) × c, (5.46)
To understand Eq. (5.48), make use of Fig. 5.6 and construct the vector
on this figure. Convince yourself that the result holds for arbitrary a and c. Otherwise,
using the standard identity for the vector triple product and noting that â · i â = 0, we
have, from Eq. (5.49),
−iS+ (i â) = −i [ a² (i â) + c² (i â) − c² (i â · ĉ) ĉ ]. (5.50)
The vector formed by the last two terms in Eq. (5.50) is orthogonal to c in the
direction −i ĉ with magnitude c2 sin θ where θ is the angle between i â and c.
Writing sin θ = sin(φ − π/2) = − cos φ = −(ĉ · â) (see Fig. 2.7) we see that the last
two terms in the bracket in Eq. (5.50) correspond to the vector −c2 (ĉ · â)(−i ĉ). We
now multiply by −i in Eq. (5.50) to get
By the triple product identity, Eq. (5.46) gives
S+ (b) = (a² + c² )b − a(a · b) − c(c · b),
so that
S+ (â) = c² â − (c · â)c
and
−iS+ (i â) = a² â + (c · â)c.
This gives
a+ = S+ (â) − iS+ (i â) = (a² + c² )â
and
a− = S+ (â) + iS+ (i â) = (c² − a² )â − 2(c · â)c.
Therefore,
λ± = (1/2)(|a+ | ± |a− |)
= (1/2)(a² + c² ) ± (1/2) { [ a² + (c · â)² − |c × â|² ]² + 4 (c · â)² |c × â|² }^{1/2} . (5.57)
The angle 2φ fixing the principal directions satisfies
tan 2φ = c² sin 2θ / (a² + c² cos 2θ ),
where θ is the angle between â and ĉ.
(b) Next, consider the operator
S+ (e) = (a × e) × a + (b × e) × b + (c × e) × c, (5.58)
where
a + b + c = 0.
Note that the condition a + b + c = 0 makes the vectors a, b, c coplanar and the given
operator then acts on the plane containing a, b, c.
Exercise Show that S+ in Eq. (5.58) is both, symmetric and positive.
Hint Use identity I.
Thus, Mohr's algorithm applies. We proceed on the same lines as the previous example.
We get,
S+ (â) = (b² + c² )â − (â · b)b − (â · c)c
and
−iS+ (i â) = a² â + (â · b)b + (â · c)c.
This gives,
a+ = S+ (â) − iS+ (i â) = (a² + b² + c² )â
and
a− = S+ (â) + iS+ (i â) = (b² + c² − a² )â − 2(â · b)b − 2(â · c)c. (5.62)
We have,
Exercise Find the eigenvalues and the eigenvectors of the positive symmetric
operator
S+ (b ) = a (c · b ) + c (a · b ).
λ± = (1/2)[Tr(A) ± ((Tr(A))² − 4 det(A))^{1/2} ]

where Tr(A) and det(A) mean the trace and the determinant of A respectively. This
immediately gives, for the eigenvalues of the symmetric matrix [S ] above,

λ± = (1/2)(S11 + S22 ) ± (1/2)[(S11 − S22 )² + 4S12² ]^{1/2} .
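This closed-form expression is easy to check numerically; the sketch below (Python with NumPy, with arbitrary example entries) compares it against a direct diagonalization:

```python
import numpy as np

# Example entries of a symmetric 2x2 matrix [S] (values are arbitrary)
S11, S12, S22 = 3.0, 1.5, 1.0
S = np.array([[S11, S12], [S12, S22]])

# Closed-form eigenvalues from the trace and determinant
disc = np.sqrt((S11 - S22)**2 + 4*S12**2)
lam_plus = 0.5*(S11 + S22) + 0.5*disc
lam_minus = 0.5*(S11 + S22) - 0.5*disc

# Direct numerical diagonalization for comparison (ascending order)
lam_numeric = np.linalg.eigvalsh(S)
```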
To get the eigenvectors, we assume the plane to be the complex plane and identify the
orthonormal basis (σ̂ 1 , σ̂ 2 ) with (σ̂ 1 , i σ̂ 1 ). In this basis, σ̂ 1 has coordinates (1, 0) and
σ̂ 2 = i σ̂ 1 has coordinates (0, 1). Therefore,
S (σ̂ 1 ) = [S11, S12 ; S12, S22 ][1 ; 0] = [S11 ; S12 ]
and
S (i σ̂ 1 ) = S (σ̂ 2 ) = [S11, S12 ; S12, S22 ][0 ; 1] = [S12 ; S22 ].
The vector −iS (i σ̂ 1 ) is obtained by rotating S (i σ̂ 1 ) clockwise through π/2. This will
interchange its coordinates so that
−iS (i σ̂ 1 ) = [S22 ; S12 ]
and we get,
b+ = S (σ̂ 1 ) − iS (i σ̂ 1 ) = [S11 + S22 ; 2S12 ],
and
b− = S (σ̂ 1 ) + iS (i σ̂ 1 ) = [S11 − S22 ; 0].
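The half-angle step of the construction can be checked numerically. In the sketch below the angle 2φ is taken as the polar angle of the point (S11 − S22, 2S12); sign and phase conventions for Mohr's construction vary, so treat this as one possible convention:

```python
import numpy as np

S11, S12, S22 = 3.0, 1.5, 1.0          # arbitrary example entries
S = np.array([[S11, S12], [S12, S22]])

# 2*phi is the polar angle of the point (S11 - S22, 2*S12);
# halving it gives the direction of the eigenvector for lam_plus
two_phi = np.arctan2(2*S12, S11 - S22)
phi = 0.5*two_phi
u_plus = np.array([np.cos(phi), np.sin(phi)])

lam_plus = 0.5*(S11 + S22) + 0.5*np.hypot(S11 - S22, 2*S12)
residual = S @ u_plus - lam_plus*u_plus   # should vanish if u_plus is an eigenvector
```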
5.5 Spectrum of Sⁿ
We define Sⁿ = S ◦ S ◦ · · · ◦ S (n times), where S ◦ S (x) = S (S (x)). Let S be a symmetric
operator on E3 with eigenvalues {λk }. We show that Sⁿ is symmetric with eigenvalues λkⁿ
and that Sⁿ has the same eigenvectors as S.
Eigenvalues and Eigenvectors 157
where we have used the property Pj Pk = δjk Pk of the projection operators. Thus, we have
shown that if S l is a symmetric operator (because of its spectral representation in terms
of projectors) with eigenvalues (λk )l then S l +1 is a symmetric operator with eigenvalues
(λk )l +1 . However, we know that Eq. (5.65) is true for l = 1 which is simply the spectral
representation of the symmetric operator S. Therefore, by induction, Eq. (5.65) must be
true for any value l = n.
To show that S n and S share the same set of eigenvectors, consider an eigenvector u of
S with eigenvalue λ. We have,
This can be done by using the spectral representation of S and the expansion of a, b, c in
the eigenbasis {uk } of S.
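A quick numerical illustration of this result (a sketch, with an arbitrary symmetric matrix standing in for S):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A + A.T                        # a symmetric operator on E3

w, V = np.linalg.eigh(S)           # eigenvalues w, orthonormal eigenvectors V
S3 = S @ S @ S                     # S composed with itself: S^3

# S^3 is symmetric, its eigenvalues are w**3, and the eigenvectors are unchanged
w3 = np.linalg.eigvalsh(S3)
```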
6 Rotations and Reflections
x∗ · f (y) = x∗ · y (6.2)
is satisfied only by the vectors y whose projection along x∗ equals that of f (y) along x∗
and not by all vectors in E3 . All vectors y and f (y) satisfying Eq. (6.2) for any given x,
correspond to points which lie on a plane normal to x, so that an orthogonal operator
restricted to act on a plane will leave invariant all vectors on a line normal to this plane
which is a 2-D subspace of E3 and we call it E2 . To see this in another way, consider an
orthogonal operator f on a plane. It has one fixed point on the plane namely the origin
on the plane. As a subspace of E3 , this plane can be translated parallel to itself so that the
origin traces a line normal to the plane, all points on which are invariant under the action
Rotations and Reflections 159
of this f . Thus, an orthogonal operator acting on E3 leaves only one vector, namely the
origin or the zero vector, invariant, while an orthogonal operator acting on a plane leaves
invariant all points on a line normal to this plane. If we club this observation with the fact
that an orthogonal operator preserves the length of a vector as well as the angle between
vectors we see that an orthogonal operator corresponds to rotation (either about a point or
an axis) or reflection in the origin or in a plane as we shall see below.
We now prove different properties of an orthogonal operator.
Equation (6.1) can be rewritten
f † = f −1 . (6.3)
Thus, an orthogonal operator is a non-singular operator for which the inverse equals its
adjoint. The same property holds for the matrix representing an orthogonal operator. To
see this, consider the jkth element of the matrix for the operator f † f ,
which means

[f † f ] = [f † ][f ] = I (6.5)

or,

[f † ] = [f ]−1 . (6.6)
where we have used [f † ] = [f ]T (Eq. (4.19)). The matrix satisfying Eq. (6.7) consists of
columns which are mutually orthogonal and individually normalized. Such a matrix is
called orthogonal. Thus, an orthogonal operator is represented by an orthogonal matrix.
Exercise Show that the inverse of an orthogonal operator (matrix) is an orthogonal
operator (matrix).
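A numerical sketch of these properties, using a rotation about the z axis as an example orthogonal matrix:

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
# A rotation about the z axis: one example of an orthogonal matrix
f = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])

f_inv = np.linalg.inv(f)
```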
where 0 ≤ θ1 , θ2 < 2π are the angles between x and y and f (x), f (y) respectively. Thus,
an orthogonal transformation preserves angle between vectors. For fk = f σ̂ k , Eq. (6.1)
implies
fj · fk = σ̂ j · σ̂ k = δjk .
or,
Condition Eq. (6.9) distinguishes between two kinds of orthogonal transformations. An orthogonal
transformation is said to be proper if det f = +1 and improper if det f = −1. The proper
orthogonal transformations preserve the handedness of an orthonormal basis triad, while
the improper orthogonal transformations change the handedness of an orthonormal basis
triad. The handedness of a basis triad is changed if all the basis vectors are reflected in the
origin. If we replace the basis in a given linear combination for a vector x by the basis
reflected in the origin, the resulting linear combination gives the vector −x obtained by
reflecting x in the origin. If we reflect one of the basis vectors in the plane normal to it, the
handedness of the basis is changed and a general vector x gets reflected in that plane.
Thus, we see that the improper orthogonal transformation corresponds to reflection either
in the origin or in a plane. In fact, inversion of a vector x in the origin is the product of
reflections of x in the orthogonal planes as we shall see below. Since a transformation
leaving only the origin invariant has to be either a reflection or a rotation, a proper
orthogonal transformation must correspond to a rotation. The fact that it preserves the
handedness of the orthonormal basis is consistent with this conclusion.
p′ = U (p). (6.10)
Note that the parentheses in the cross product term are necessary, because the cross product is
not associative. Comparing with Fig. 6.1, we see that the first term is the projection of x in
the plane normal to n̂ (say x⊥ ) and the second term is the projection of x along n̂ (say x∥ ).
Thus we have,
U (x) = x⊥ − x∥ ,
which is simply the vector we get by reflecting x in the plane normal to n̂.
U (x) · U (y) = x · y.
This can simply be done by evaluating the LHS using the definition of U (x) in Eq. (6.11).
(Hint: use identity II.)
To show that det U = −1, we take a right handed orthonormal triad {σ̂ 1 , σ̂ 2 , σ̂ 3 } with
σ̂ 1 = n̂. Then, it is trivial to see that
U σ̂ 1 = −σ̂ 1 , U σ̂ 2 = σ̂ 2 , U σ̂ 3 = σ̂ 3 .
Therefore,

det U = U σ̂ 1 · (U σ̂ 2 × U σ̂ 3 ) = −σ̂ 1 · (σ̂ 2 × σ̂ 3 ) = −1 × (+1) = −1 (6.12)
From Fig. 6.2 we see that Eq. (6.13) means θ = θ ′ or the angle of reflection equals the angle
of incidence. Crossing Eq. (6.10) with n̂ we get
n̂ × p′ = n̂ × p,
which simply means that p, n̂ and p′ lie in the same plane, determined by n̂ and p. Thus,
Eq. (6.10) is the full description of reflection, or the complete statement of the law of
reflection.
To find the inverse of U we note that y = U (x) implies x = U (y), that is, x and
y are mutual images under reflection. (This establishes the operator equation U 2 = I.)
Therefore, we have
U −1 = U . (6.14)
U † = U −1 = U (6.15)
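The canonical form Eq. (6.11) is not reproduced in this excerpt; the sketch below uses the equivalent expression U(x) = x − 2n̂(n̂ · x) for reflection in the plane normal to n̂, and checks the properties just listed:

```python
import numpy as np

n = np.array([1.0, 2.0, 2.0])
n = n / np.linalg.norm(n)            # unit normal n-hat of the mirror plane

# Reflection in the plane normal to n: U(x) = x - 2 n (n . x),
# i.e., the matrix U = I - 2 n n^T (Householder form)
U = np.eye(3) - 2.0*np.outer(n, n)

v_in_plane = np.array([2.0, -1.0, 0.0])   # orthogonal to n by construction
```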
Exercise Prove that the product of three elementary reflections in orthogonal planes is
an inversion, the linear transformation that reverses the direction of every vector.
Solution We denote by Un̂ (x) the operator for reflection of x in the plane normal to n̂.
Let {σ̂ 1 , σ̂ 2 , σ̂ 3 } be an orthonormal triad of vectors. Note that
Now let
x = σ̂ 1 x1 + σ̂ 2 x2 + σ̂ 3 x3
U1 U2 (x) ≠ U2 U1 (x).
Exercise Show that the reflections defined via Eq. (6.16) commute.
Now consider two reflections which commute, that is, the corresponding reflection
operators satisfy
U1 U2 (x) = U2 U1 (x)
where we have used that U1 and U2 are symmetric and that they commute. Thus, if two
reflections commute, their product is a symmetric operator. Physically, this means that the
effect of two successive reflections can be obtained via a single reflection.
R (θ ) = Uv̂ Uû+v̂
which proves the theorem. Here, R (θ ) is the orthogonal operator for rotation about n̂
through angle θ.
as in Eq. (6.16). We know that Uej reverses the direction of ej and that their products are
orthogonal and symmetric. Now consider the spectral representation of S,
S = ∑k λk Pk
S = IS+ .
(ii) One eigenvalue (say jth) < 0. (λj < 0). We write
S = Uej S+ .
(iii) Two eigenvalues (say the ith and jth) < 0 (λi < 0, λj < 0). We write
S = Uei Uej S+ .
There are no other cases and in each of the above cases we have shown that the symmetric
operator S can be written as the product of a symmetric orthogonal operator and a positive
symmetric operator.
Next, we obtain a unique rotation R for an arbitrary improper orthogonal operator I
satisfying
I = RU ,
where U is a simple reflection in the plane normal to any direction û as expressed by its
canonical form Eq. (6.11).
We use the fact that U 2 = I to write
I = (I U )U = RU ,
where R = I U is orthogonal with det R = (det I )(det U ) = (−1)(−1) = 1, that is, R is a rotation.
Next, we prove the Polar Decomposition Theorem which states that every non-singular
operator f has a unique decomposition in the form
f = RS = I R, (6.18)
S = (f † f )1/2
I = (f f † )1/2 . (6.19)
S ′ = f † f
is symmetric. Further,
x · (f † f x) = (f x)² > 0 if x ≠ 0,
which makes S ′ positive. Therefore, the square root of S ′ = f † f is well defined and unique:
S = (f † f )1/2 .
R = f S −1 = f (f † f )−1/2 . (6.20)
We have,
det f † = det f ,
or,
det(f † f ) = (det f )2 ,
or,
or,
which shows that R is a rotation. The other part of Eq. (6.18) namely,
f =I R
is proved similarly.
The eigenvalues and eigenvectors of S decide the basic structural properties of f (see
below for a geometric interpretation), because the other factor is just a rotation. They are
sometimes called principal vectors and principal values of f to distinguish them from
eigenvectors and eigenvalues of f which may, in general, be complex and are not related in
a simple way with the principal values which are always real. Of course, there is no
distinction if f itself is symmetric. Equation (6.18) clearly tells us that complex
eigenvalues correspond to rotations as we have seen before (see section 5.2).
The polar decomposition, Eq. (6.18), provides a simple geometrical interpretation for
any linear operator f . Consider the action of f on points x of a 3-D body or a geometrical
figure. According to Eq. (6.18), the body is first stretched and/or reflected along the
principal directions of f . Then, the deformed body is rotated about an axis and through an
angle, both specified by R .
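The decomposition can be sketched numerically by taking the square root of f†f through its spectral representation (here f is an arbitrary non-singular matrix; its overall sign is chosen so that det f > 0 and the orthogonal factor is a proper rotation):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal((3, 3))
if np.linalg.det(f) < 0:
    f = -f        # keep det f > 0 so the orthogonal factor is a proper rotation

# S = (f^T f)^{1/2} via the spectral representation of the positive operator f^T f
w, V = np.linalg.eigh(f.T @ f)       # w > 0 because f is non-singular
S = V @ np.diag(np.sqrt(w)) @ V.T
R = f @ np.linalg.inv(S)             # R = f S^{-1}
```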
6.2.1 Examples
(a) We find the polar decomposition of the skewsymmetric transformation
f † f (x) = (a × b) × (x × (a × b))
= x · f †f y (6.23)
which means that f † f is a symmetric operator. Further, x · f † f x > 0 for x ≠ 0 making
f † f a positive operator. It is easily verified that the square root operator is given by
f x = x + 2α σ̂ 1 (σ̂ 2 · x) (6.25)
is called a shear. Figure 6.4 shows the effect of f on a unit square in the σ̂ 1 σ̂ 2 plane.
We find the eigenvectors, eigenvalues, principal vectors and principal values of f in
this plane. We also find the angle of rotation in the polar decomposition of f .
It is easily seen that the only eigenvector of f in Eq. (6.25) is σ̂ 1 satisfying
f (σ̂ 1 ) = σ̂ 1 (6.26)
1 Note that f (x) is perpendicular to both a × b and S (x).
To get the principal vectors and principal values of f we must find the operator f † f .
Note that
as can be seen from y · f x = x · f † (y) with f and f † as in Eqs (6.25) and (6.27)
respectively. We operate by f † f on the basis vectors (σ̂ 1 , σ̂ 2 ) to get
f † f σ̂ 1 = σ̂ 1 + 2α σ̂ 2
f † f σ̂ 2 = 2α σ̂ 1 + (1 + 4α² )σ̂ 2 . (6.28)
S = (f † f )1/2 .
Exercise Employ Mohr’s algorithm to find the eigenvectors of f † f , using Eq. (6.28)
with σ̂ 2 = i σ̂ 1 .
Answer
u± = σ̂ 1 ± λ± σ̂ 2 .
tan θ = −2α λ+² /(1 + 2α λ+ + λ+² ).
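These results for the shear can be checked numerically (α = 1/2 is an arbitrary example value):

```python
import numpy as np

alpha = 0.5
# Matrix of the shear f(x) = x + 2*alpha*sig1*(sig2 . x) in the (sig1, sig2) plane
f = np.array([[1.0, 2*alpha],
              [0.0, 1.0]])

ftf = f.T @ f                        # compare with Eq. (6.28)
w, V = np.linalg.eigh(ftf)
principal_values = np.sqrt(w)        # eigenvalues of S = (f^T f)^{1/2}

e1 = np.array([1.0, 0.0])            # sigma_1: the only eigenvector of f
```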
6.3 Rotations
We need a canonical form of an operator which gives the vector x0 obtained as a result of
rotating a vector x about the direction implied by a unit vector n̂. Proceeding on the lines
similar to reflection (subsection 6.1.1), we arrive at the following canonical form for the
rotation operator
This operator can be understood by analyzing Fig. 6.5. First, we resolve x in its
components x∥ and x⊥ lying in the plane normal to n̂ and along n̂ respectively. The first
term in the expression for R (x) is x⊥ which remains invariant under rotation while the
second term corresponds to the vector obtained by rotating x∥ counterclockwise through
angle θ (x′∥ in Fig. 6.5). Here, we treat the plane normal to n̂ to be the complex plane and
multiplication by eiθ rotates a vector counterclockwise by an angle θ. Since we have
introduced a complex coefficient in the expression for the operator, the rule for the
invariance of the scalar product has to be replaced by
f ∗ x · f y = x · y. (6.32)
where f ∗ is obtained from f by complex conjugation. The operator f satisfying Eq. (6.32)
is called unitary.
Exercise Show that the operator R (x) in Eq. (6.31) satisfies Eq. (6.32). (Hint: Use
identity II).
Thus, the rotation operator in Eq. (6.31) preserves scalar products as it should. That the
determinant of R (x) in Eq. (6.31) is +1 can be proved along the same lines as we did for
the reflection operator. This establishes the operator R (x) in Eq. (6.31) as the rotation
operator. However, if we wish to carry on with the operator in Eq. (6.31) to get the
structure and properties of rotation, we need a general algebraic setting incorporating the
multiplication of a vector by a complex number as an integral part of it. Such an algebra is
the geometric algebra which can be used to model rotations in a general and elegant
manner [10, 7, 11]. Nevertheless we can develop the theory of rotations using only the
algebra of vectors we have learnt. We proceed to do that.
We first study infinitesimal rotations and then build up finite rotations as a succession of
infinitesimal rotations. Consider an infinitesimal rotation of a vector x about the direction
implied by a unit vector n̂, through an angle δθ (see Fig. 6.6). The tip of vector x then moves
over an infinitesimal arc length ds of a circle of radius |x| sin φ giving ds = |x| sin φδθ
(Fig. 6.6). Since the circle is a smooth curve, we can choose the arc length ds generated
by the rotation to be so small that the change dx in vector x due to rotation (see Fig. 6.7)
can replace the arc length ds with a totally negligible (see discussion after Eq. (6.33)) error.
Further, when the sense of rotation is positive or counterclockwise, a right handed screw
advances in the direction of n̂ and the sense in which the rotating vector x traces the arc ds
corresponds to the direction of the vector n̂ × x. Thus, we can take
dx = (n̂ × x)δθ (6.33)
for every possible infinitesimal rotation. In fact this equation quantitatively defines an
infinitesimal rotation and the resulting infinitesimal arc length ds. The quantity dx is
called the differential of x(θ ) which is a vector valued function of θ. In the limit as
δθ → 0, dx/δθ = n̂ × x becomes a vector tangent to the circle of rotation. Thus,
corresponding to an infinitesimal rotation the differential dx has magnitude |dx| = ds
and direction perpendicular to the plane defined by x and n̂ and tangent to the circle of
rotation as shown in Figs 6.6, 6.7. This differential has to be added to x to get the rotated
vector x0 (see sections 9.1 and 9.2). Therefore,
Fig. 6.7 Vectors dx and arc length ds as radius |x| sin θ is rotated through angle
δθ . As δθ → 0, dx becomes tangent to the circle.
As we shall see later, (see section 9.6), the first equality in Eq. (6.33) becomes exact for
any angle of rotation θ if we replace its RHS by the Taylor series of the function x(θ )
whose successive terms involve successive powers of θ. Thus, the RHS of the first equality
in Eq. (6.33) is obtained by truncating this Taylor series after the term linear in θ which
is justified if the angle of rotation is small, so that the higher powers θ 2 , θ 3 · · · are orders
of magnitude smaller than θ and hence can be neglected. In such a case, we replace θ by
δθ to emphasize the smallness of the angle of rotation. Thus, the first equality in Eq. (6.33)
essentially corresponds to an infinitesimal rotation.
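The size of the neglected terms can be seen numerically. The helper rotate below is the standard closed form for a finite rotation (equivalent to the operator derived later in this section) and is an assumption of this sketch:

```python
import numpy as np

def rotate(n, theta, x):
    """Exact rotation of x about the unit vector n through angle theta."""
    n = np.asarray(n, float) / np.linalg.norm(n)
    return (x*np.cos(theta) + np.cross(n, x)*np.sin(theta)
            + n*np.dot(n, x)*(1.0 - np.cos(theta)))

n = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 2.0, 0.5])

dtheta = 1e-4
exact = rotate(n, dtheta, x)
approx = x + dtheta*np.cross(n, x)     # keep only the term linear in dtheta
err = np.linalg.norm(exact - approx)   # of order dtheta**2
```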
Let the vector x be rotated by an infinitesimal angle δθ1 about the direction given by a
unit vector n̂1 to get a vector x′ . Next rotate x′ through angle δθ2 about the direction given
by unit vector n̂2 to get the vector x′′ . Using Eq. (6.33) and keeping the terms linear in δθ1
and δθ2 we get (do this algebra),
Now we reverse the order of rotations: Rotate x about n̂2 by δθ2 to get x′ and rotate x′
about n̂1 by δθ1 to get x′′ . Going through the same algebra as above, keeping terms linear
in δθ1 and δθ2 , one can check that x′′ is again given by Eq. (6.34) which proves that
infinitesimal rotations commute. The fact that finite rotations do not commute will
become clear below.
Now let a vector x be rotated about a unit vector n̂ through a finite angle θ to get a
vector x′ . The process is depicted in Fig. 6.5. As is shown in Fig. 6.5, we resolve x into two
components, x∥ in the plane of rotation and x⊥ normal to this plane, i.e., in the direction
of n̂. Rotation affects only the component x∥ while x⊥ remains invariant.
We imagine that the rotation of xk through θ is effected by N successive rotations
about n̂, each of magnitude θ/N . We assume that N is so large (or θ/N is so small) that
Eq. (6.33) applies to each of these rotations. Denote by x1 , x2 , . . . , xN = x′∥ the successively
rotated vectors. We have,

x1 = (θ/N )(n̂ × x∥ ) + x∥ ,

x2 = (θ/N )(n̂ × x1 ) + x1
= (θ/N ) n̂ × [(θ/N )(n̂ × x∥ ) + x∥ ] + (θ/N )(n̂ × x∥ ) + x∥
= [(θ/N )² (n̂×)² + 2(θ/N )(n̂×) + 1] x∥ .

Continuing in this way and using the binomial expansion,

x′∥ = [1 + N (θ/N )(n̂×) + (N (N − 1)/2!)(θ/N )² (n̂×)² + · · · + (θ/N )^N (n̂×)^N ] x∥
= (1 + (θ/N ) n̂×)^N x∥ . (6.35)
Note that Eqs (6.35) and (6.36) define the operators (1 + (θ/N )n̂×)^N and eθ n̂× respectively, on
E3 . The action of eθ n̂× on any vector x can be obtained by expanding it in powers of θ. We
have,
eθ n̂× ≡ 1 + θ (n̂×) + (θ²/2!)(n̂×)² + (θ³/3!)(n̂×)³ + · · · . (6.37)
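Both the N → ∞ limit in Eq. (6.35) and the series Eq. (6.37) can be checked numerically. Using (n̂×)³ = −(n̂×), the series sums to the closed form 1 + sin θ (n̂×) + (1 − cos θ)(n̂×)²; a sketch:

```python
import numpy as np

n = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)   # unit axis
theta = 0.9

# Matrix of the cross-product operator x -> n x x
K = np.array([[0.0, -n[2], n[1]],
              [n[2], 0.0, -n[0]],
              [-n[1], n[0], 0.0]])

# (1 + (theta/N) n x)^N for large N, as in Eq. (6.35)
N = 200000
R_limit = np.linalg.matrix_power(np.eye(3) + (theta/N)*K, N)

# Summing the exponential series in closed form, using K^3 = -K
R_exact = np.eye(3) + np.sin(theta)*K + (1.0 - np.cos(theta))*(K @ K)
```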
Fig. 6.8 Orthonormal triad to study the action of the rotation operator
n̂ × σ̂ 1 = σ̂ 2 ; n̂ × σ̂ 2 = −σ̂ 1 (6.38)
To see the effect of eθ n̂× on σ̂ 1 we evaluate RHS of Eq. (6.37) acting on σ̂ 1 and use
Eq. (6.38) to get
eθ n̂× σ̂ 1 = σ̂ 1 + θ σ̂ 2 − (θ²/2!) σ̂ 1 − (θ³/3!) σ̂ 2 + (θ⁴/4!) σ̂ 1 + · · ·
Collecting the coefficients of σ̂ 1 and σ̂ 2 we get
Similarly,
To get the result of eθ n̂× x∥ we resolve x∥ with respect to the basis (σ̂ 1 , σ̂ 2 , σ̂ 3 ):
Operating on the RHS of Eq. (6.41) by eθ n̂× and using Eq. (6.39) and Eq. (6.40) we get
or,
x∥ = −n̂ × (n̂ × x∥ ).
Since x⊥ and n̂ are parallel, we can add (n̂ × x⊥ ) = 0 = n̂ × (n̂ × x⊥ ) on the RHS and
x⊥ (= x′⊥ ) on both sides of the above equation, finally giving the desired result,
Equation (6.43) is equivalent to the operator identity, defining the rotation operator R
where [σ̂ j ]T and [êk ]T are the row (1 × 3) matrices with elements as the basis vectors {σ̂ j }
and {êk } respectively. Equation (6.46) gives, ejk = σ̂ j · êk , or,
Let θk = n̂ · σ̂ k denote the direction cosines of n̂ with respect to the basis {σ̂ k }. We can
then write
Note that this matrix relates the rotated vector x0 obtained by rotating the basis vectors
{σ̂ 1,2 } in the plane normal to n̂ = σ̂ 3 that is, by operating the corresponding rotation
operator on x = ∑k xk σ̂ k , while its transpose relates the coordinates of the same vector
with respect to {σ̂ k } and {êk } respectively.
Exercise Show that the components of x given by column vectors [xj0 ] and [xj ] with
respect to the orthonormal bases {êk } and {σ̂ k } respectively are related by
or,
xj = ∑k x′k ejk ,
or,
[x′k ] = [ejk ]T [xj ]. (6.50)
R (n̂, θ )x · R (n̂, θ )y = x · y
for all x, y ∈ E3 . This also proves that the matrix [ejk ] representing R (n̂, θ ) is orthogonal,
because we have already proved that a matrix representing an orthogonal operator is
orthogonal. If we denote by S the orthogonal matrix of the rotation operator defined by
Eq. (6.48), then the orthogonality condition means,
S T S = I = SS T or S T = S −1 , (6.51)
by choosing n̂ = σ̂ 1 as we did for the reflection operator. This means that the matrix [ekj ] in
Eq. (6.48) representing the rotation operator has determinant +1, because, we have proved
in section 4.4 that the determinant of the matrix representing a linear operator is identical
with the determinant of the operator. There is a one to one correspondence between the
set of rotations and the set of 3 × 3 orthogonal matrices with determinant +1. To see this,
note that the equality of matrices [R1 ] = [R2 ] representing rotations R1 and R2 implies
equality of rotations R1 = R2 because the equality of matrices would mean, via Eq. (6.46),
that the action of R1 and R2 on an orthonormal basis is identical and by linearity of the
operators this implies R1 x = R2 x for all x ∈ E3 . This establishes the required one to
one correspondence. In section 4.5 we have already seen that the matrix representing a
product of operators is the product of the matrices representing the individual operators.
This means, coupled with their one to one correspondence, that the set of 3 × 3 orthogonal
matrices with determinant +1 is isomorphic with the set of rotations.
Note that the operator in Eqs (6.43), (6.44), which gives the counterclockwise rotation
of the vector x by an angle θ also gives the clockwise rotation of x through the angle 2π −θ
(see Fig. 6.9), because the operator remains the same if we replace in its expression θ by
−(2π − θ ). This is in conformity with whatever we have said while dealing with rotation
as the means of changing direction (section 1.2). Thus, these two rotations give rise to the
same matrix representative apparently destroying the one to one correspondence between
the rotations and the set of 3 × 3 orthogonal matrices with determinant +1. However,
without losing generality we can stick only to the counterclockwise rotations alone, which
establishes the required one to one correspondence.
Fig. 6.9 Equivalent rotations: One counterclockwise and the other clockwise
Exercise The sum of the diagonal matrix elements fkk of a linear transformation f is
called the trace of f and denoted T r f . Show that the trace of rotation R (n̂, θ ) is given by
Tr R = ∑k σ̂ k · (R σ̂ k ) = 1 + 2 cos θ (6.52)
Hint This result follows trivially by explicitly summing the diagonal matrix elements of
R (n̂, θ ), remembering that ∑j θj² = 1.
Note that the trace is independent of the basis used to set up the matrix of R . In fact this
result is quite general.
Exercise Show that the trace of a linear operator f is independent of the basis used to
compute it.
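A numerical sketch of both exercises (the rodrigues helper builds R(n̂, θ) from the closed form of eθn̂×, an assumption of this sketch):

```python
import numpy as np

def rodrigues(axis, theta):
    n = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(theta)*K + (1.0 - np.cos(theta))*(K @ K)

theta = 1.2
R = rodrigues([2.0, -1.0, 0.5], theta)
trace_R = np.trace(R)

# Basis independence: conjugating by any invertible P leaves the trace unchanged
rng = np.random.default_rng(2)
P = rng.standard_normal((3, 3))
trace_conj = np.trace(np.linalg.inv(P) @ R @ P)
```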
We define the composition of two rotations (in the same way as the composition of two
operators) as their successive application to a vector and denote it by a ◦ separating two
rotations. We have already seen that the set of rotations on E3 and that of 3 × 3 orthogonal
matrices with determinant +1 are in one to one correspondence. Taking the composition
of rotations and the matrix multiplication as the respective binary operations on these
sets, we see that this one to one correspondence is actually an isomorphism. This is
because x′′ = R2 x′ = R2 ◦ R1 x corresponds to the following equation involving the
matrix representatives and the column matrices for the vectors [x′′ ] = [R2 ][x′ ] =
[R2 ][R1 ][x]. It is easy to see that the product of two orthogonal matrices with
determinant +1 is an orthogonal matrix with determinant +1 (Exercise). This product
matrix must correspond to a single rotation about some axis through some angle, because
of the one to one correspondence between these two sets. Thus, the matrix representing
the result of the composition of rotations is the product of the matrices representing the
individual rotations. This establishes the required isomorphism. As a byproduct we have
found that the set of rotations is closed under their composition. Also, it is easy to see that
if R (n̂1 , θ1 )x = x′ and R (n̂2 , θ2 )x′ = x′′ then the single rotation corresponding to
their composition R (n̂, θ ) is the one about the unit vector n̂ normal to the plane
containing the vectors x and x′′ and through the angle given by

cos θ = (x · x′′ )/(|x||x′′ |).
The fact that two finite rotations say R (n̂1 , θ1 ) and R (n̂2 , θ2 ) do not commute in general,
that is,
is amply clear from Fig. 6.10. To see this analytically, we make use of the isomorphism
between the set of rotations (with 0 ≤ θ < 2π) and the set of 3 × 3 orthogonal matrices
with determinant +1 representing them. Since the multiplication of matrices is not
commutative, the matrices representing the LHS and the RHS of Eq. (6.53) are, in general,
different, corresponding to different rotations.
Exercise Show that two rotations about the same axis commute.
Hint Just visualize it! Note that the matrices for both the rotations have the form given by
the matrix representing a rotation about σ̂ z , if we take σ̂ z along the axis of rotation. Show
explicitly that these matrices commute.
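Both facts can be verified numerically (the rodrigues helper is an assumed closed form for R(n̂, θ)):

```python
import numpy as np

def rodrigues(axis, theta):
    n = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(theta)*K + (1.0 - np.cos(theta))*(K @ K)

z = [0.0, 0.0, 1.0]
A, B = rodrigues(z, 0.4), rodrigues(z, 1.1)                      # same axis
C, D = rodrigues([1.0, 0, 0], 0.4), rodrigues([0, 1.0, 0], 1.1)  # different axes

same_axis_commute = np.allclose(A @ B, B @ A)
different_axes_commute = np.allclose(C @ D, D @ C)
```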
a coordinate system with the coordinates of the same vector with respect to a new
coordinate system obtained by rotating the initial one about the same unit vector n̂ by the
same angle. This transformation, which does not involve an actual rotation of the vector
(so that there is no change in the state of the physical system), is called a passive
transformation. Whenever the successive application of the active and the passive
transformations amounts to the application of the identity transformation, we say that the
corresponding rotation (about the given axis through the given angle) is a symmetry or a
symmetry element for the physical system.
For example, the figure at the tip of vector F in Fig. 6.11(a), is actively rotated (about
the axis perpendicular to the xy plane and passing through the origin) with no change of
shape into a new position with position vector F0 . The components of the rotated vector
are related to those of the initial vector by (see Eq. (6.49)),

[F′x ; F′y ] = [cos θ, − sin θ ; sin θ, cos θ ][Fx ; Fy ].
In Fig. 6.11(b) the figure (and the vector F) is not rotated however, the coordinate axes
are, by the same angle and in the same sense. This is the passive transformation and the
coordinates of the vector along the new axes are (see Eq. (6.50)),

[F′x ; F′y ] = [cos θ, sin θ ; − sin θ, cos θ ][Fx ; Fy ].
Note that the transformation matrices are orthogonal and are transpose and hence inverses
of each other. We have already proved this fact generally in Eq. (6.50). Therefore, if both
transformations are successively performed, as in Fig. 6.11(c), we get
F′′x = Fx ,
F′′y = Fy . (6.54)
Thus, the numerical values of the new components are the same as those of the old
components. Therefore, a mere knowledge of these values does not indicate whether the
transformation was performed. This indistinguishability is due to a physical property of
plane surfaces: It is possible to rigidly rotate any plane figure. On the other hand, an
irregular surface does not allow any rigid motion. It still allows the passive coordinate
transformations which amount to mere relabeling of its points. However, there are no
corresponding active transformations, which leave the displaced body unaltered. For
example, suppose you are in a ship on the open sea and mark your position with respect to
some reference ship at a distance. If your ship and the reference ship are both rotated
about the same axis by the same angle, your position relative to the reference ship is
unaltered. This invariance is sometimes expressed by saying that the hallmark of a
symmetry is the impossibility of acquiring some physical knowledge.
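In matrix form the argument reads as follows: the active and passive matrices are transposes of each other, so their successive application is the identity (a sketch):

```python
import numpy as np

theta = 0.6
c, s = np.cos(theta), np.sin(theta)

active = np.array([[c, -s],            # rotate the vector F (Fig. 6.11(a))
                   [s,  c]])
passive = np.array([[c,  s],           # rotate the axes instead (Fig. 6.11(b))
                    [-s, c]])

F = np.array([3.0, 4.0])
F_after_both = passive @ (active @ F)  # active followed by passive
```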
In the above analysis, we have taken the basis vector σ̂ 3 (or the z axis) along the axis of
rotation. This is not necessary. Even if we take an arbitrary orthonormal basis and the
corresponding coordinate system, the matrices for the active and passive transformations
are transposes and inverses of each other, so that applying them in succession is the same as
applying the identity transformation. Thus, whether a given rotation is a symmetry element
does not depend on the orthonormal basis chosen to implement the active and passive
transformations.
Exercise If a right hand glove is turned inside out, it becomes a left hand glove. This is
an example of an active transformation (assume that inside and outside textures and colors
are identical). What is the corresponding passive transformation?
Consistent with our Newtonian view of space as a continuum of points making up an inert
vacuum is the assumption that the whole space is like an ideal rigid body, that is, the
distance between every pair of points in space remains constant despite all events taking
place in it. Thus, when a single vector is rotated about any axis by any angle, the whole
space is rotated along with the vector. The subsequent passive transformation relabels all
the points in space to reproduce the initial situation. Therefore, for a single vector in space
(which could be the position vector of a single particle in space), every possible rotation or
reflection (which is equivalent to two rotations by Hamilton’s theorem), is a symmetry
transformation. If we consider a system of non-interacting particles we can apply the
symmetry transformations to each particle separately, independent of other particles, so
that the same conclusion applies to such a system. Thus, the system of non-interacting
particles such as an ideal gas possesses the highest symmetry. In contrast, the symmetry
elements of a figure like an equilateral triangle or a square, or a cube or a tetrahedron act
only on the points making up the figure and not on the rest of space. However, after the
succession of the active and the passive transformations, the whole space, including the
figure, must reproduce the initial situation. This is possible, only when the active
transformation reproduces the initial configuration of the figure. Only a finite set of
rotations and reflections meets this requirement. Thus, the symmetry elements of a solid
which leave its unit cell invariant, form a finite set. Thus, when a gas or a liquid condenses
to make a solid, the symmetry of the system is drastically reduced. This phenomenon is
called ‘symmetry breaking’. Generally, such a transition from liquid to solid phase, called a
phase transition, occurs at a particular temperature at which the symmetry breaks
spontaneously. Spontaneous symmetry breaking is responsible for the fact that the
quantities like volume, magnetization, mole numbers of chemical species etc are
macroscopically observable, that is, these variables are time independent on the atomic
scale of time and spatially homogeneous on the atomic scale of distance. On the other
hand, symmetries themselves are of far reaching significance as they give rise to all the
conserved quantities like energy, angular momentum, linear momentum etc, which make
the understanding of the dynamics of the system possible. For example, the dynamics of a
particle driven by a central force is completely known because of the conservation of
energy and angular momentum of such a particle. Further, Kepler’s laws of planetary
motion can be easily obtained using an additional conserved quantity, namely the
Runge–Lenz vector. The underlying symmetries and symmetry breaking are crucial for the
understanding of our physical world.
Exercise Find all the symmetry elements of an equilateral triangle and a square (see
Figs 7.1 and 7.2).
We now show that if the rotations R1 (n̂1 , θ1 ) and R2 (n̂2 , θ2 ) are symmetry elements for
a system, then so is their composition. Let R1 rotate a vector F to F0 and R2 rotate a vector
F0 to F00 . The composite rotation R12 (n̂, θ ) = R2 (n̂2 , θ2 ) ◦ R1 (n̂1 , θ1 ) must rotate F
to F00 . The matrix for the corresponding active transformation is the product [R2 ][R1 ]
and that for the passive transformation is the inverse of this product. Thus, applying the
composite active and passive transformation gives us the identity transformation. This
proves the result.
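This closure property is easy to check numerically. The sketch below (illustrative Python, not from the text; the helper names are our own) builds two rotation matrices from the Rodrigues formula, composes them, and verifies that the product is again orthogonal with determinant +1, i.e., a rotation:

```python
import math

def rot(axis, angle):
    # 3x3 matrix for a rotation by `angle` about the unit vector `axis`,
    # via the Rodrigues formula R = I + sin(t) N + (1 - cos(t)) N^2,
    # where N is the skew (cross-product) matrix of the axis.
    x, y, z = axis
    n = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    s, c = math.sin(angle), math.cos(angle)
    return [[(1.0 if i == j else 0.0) + s * n[i][j]
             + (1.0 - c) * sum(n[i][k] * n[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Compose R2 after R1; the product is orthogonal with determinant +1,
# so symmetry transformations compose to a symmetry transformation.
r1 = rot((0.0, 0.0, 1.0), 0.7)
r2 = rot((1.0, 0.0, 0.0), 1.1)
r12 = matmul(r2, r1)
rtr = matmul([list(row) for row in zip(*r12)], r12)  # R^T R
assert all(abs(rtr[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(3) for j in range(3))
assert abs(det3(r12) - 1.0) < 1e-12
```

Applying the corresponding passive transformation (the inverse matrix) after this active one gives the identity, which is exactly the argument made above.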
184 An Introduction to Vectors, Vector Operators and Vector Analysis
Thus, to every triple of Euler angles φ, θ, ψ the above construction associates a rotation
of 3-D space taking frame {σ̂ k } k = 1, 2, 3 into the frame {êk } k = 1, 2, 3. The ranges of
the Euler angles are 0 ≤ φ < 2π, 0 ≤ θ ≤ π, 0 ≤ ψ < 2π.
Thus, by continuously varying the Euler angles we can generate all possible rotations; the
set of all rotations about a point is parameterized by three Euler angles varying in their
specified ranges.
The net rotation is given by the composition of Euler rotations in the order stated above.
We have,
Let us now set up the matrix representing R (n̂, χ ) in terms of its Euler angles. To do this,
we have to expand the vectors {êk } k = 1, 2, 3 in terms of the basis {σ̂ k } k = 1, 2, 3. To get
ê3 we have first to evaluate e^{φσ̂3×} σ̂1 and then evaluate e^{θn̂×} σ̂3 , where n̂ = e^{φσ̂3×} σ̂1 , using
Eq. (6.43) or Eq. (6.45). Carrying out this calculation we get,
Evaluating ê1 and ê2 in the same way, we get, for the matrix representing R (n̂, χ ) in terms
of its Euler angles,
( cos ψ cos φ − sin ψ sin φ cos θ    −sin ψ cos φ − cos ψ sin φ cos θ     sin θ sin φ )
( cos ψ sin φ + sin ψ cos φ cos θ    −sin ψ sin φ + cos ψ cos φ cos θ    −sin θ cos φ ) ·
( sin θ sin ψ                         sin θ cos ψ                          cos θ       )
If we multiply the row vector [σ̂ 1 σ̂ 2 σ̂ 3 ] on the right by this matrix, then we get the row
vector [ê1 ê2 ê3 ]. The Euler rotations corresponding to arbitrary rotation, defined above,
are marred by the fact that their axes of rotation are not fixed directions in space. We can
define Euler rotations using a construction by which every rotation R (n̂, χ ) is reduced to
a composition of rotations about fixed axes of a standard basis. In this construction, an
arbitrary rotation is decomposed into Euler rotations as
Thus, the first rotation is about σ̂ 3 by an angle ψ, the second rotation is about σ̂ 1 by an angle
θ and the third one is about σ̂ 3 by an angle φ. Note that êk = R σ̂ k = Rφ Rθ Rψ σ̂ k so
that it is quite easy to calculate the matrix elements of a rotation in terms of Euler angles.
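The closed-form Euler-angle matrix can be checked numerically against the product of the three elementary rotations R(σ̂3, φ)R(σ̂1, θ)R(σ̂3, ψ). In the Python sketch below (illustrative, not part of the text) `rz` and `rx` are the elementary rotation matrices about σ̂3 and σ̂1, and `euler_matrix` transcribes the closed-form entries; the two agree to machine precision:

```python
import math

def rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_matrix(phi, theta, psi):
    # Closed-form matrix of the text: the jk entry is sigma_j . R sigma_k.
    cf, sf = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(psi), math.sin(psi)
    return [[cp * cf - sp * sf * ct, -sp * cf - cp * sf * ct, st * sf],
            [cp * sf + sp * cf * ct, -sp * sf + cp * cf * ct, -st * cf],
            [st * sp, st * cp, ct]]

phi, theta, psi = 0.4, 1.2, -0.9
composed = matmul(rz(phi), matmul(rx(theta), rz(psi)))  # R_z(phi) R_x(theta) R_z(psi)
closed = euler_matrix(phi, theta, psi)
assert all(abs(composed[i][j] - closed[i][j]) < 1e-12
           for i in range(3) for j in range(3))
```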
Consider, for example, the rotation of σ̂ 3 . Rψ is a rotation about σ̂ 3 and hence will leave
σ̂ 3 invariant. Next, we have, using Eq. (6.43) or Eq. (6.45),
e^{θσ̂1×} σ̂ 3 = σ̂ 3 cos θ − σ̂ 2 sin θ.
Therefore,
From this, the matrix elements ej3 = σ̂ j · ê3 can be read off directly. We get exactly the
same matrix representing R as before. Figure 6.13 shows the Euler rotations of a standard
basis one after the other, in the given order.
Note that the order of Euler rotations in Eq. (6.55) is opposite to that in Eq. (6.56).
However, both the expressions describe the same rotation R (n̂, χ ). We see that the same
set of Euler angles can be used to give two different parameterizations of the same rotation
with two different sequences of Euler rotations. The first parameterization is preferred by
astronomers because σ̂ 3 and ê3 can be associated with easily measured directions. On the
other hand Eq. (6.56) has the advantage of fixed rotation axes for Euler rotations even when
the Euler angles change with time (R (n̂, χ ) depends on time).
To show the equivalence of Eq. (6.55) with Eq. (6.56) we note that e^{θêN×} =
e^{φσ̂3×} e^{θσ̂1×} e^{−φσ̂3×} and e^{ψê3×} = e^{φσ̂3×} e^{θσ̂1×} e^{ψσ̂3×} e^{−θσ̂1×} e^{−φσ̂3×} . Substituting in Eq. (6.55)
and noting that the successive rotations by equal and opposite angles about the same axis
result in identity transformation, we get Eq. (6.56).
Exercise In addition to Euler rotations engineers use three independent rotations called
roll, pitch and yaw, as shown in Fig. 6.14, to implement arbitrary rotation of the body via
where ψ, θ, φ are the angles of rotation corresponding to roll, pitch and yaw respectively.
Show that the transformed basis is given by
R vi = λi vi (i = 1, 2, 3). (6.59)
Since R is an orthogonal matrix, we have (see section 6.1) ‖Rvi ‖ = ‖vi ‖ where for any
vector v, ‖v‖ denotes the Euclidean length (|vx |² + |vy |² + |vz |²)^{1/2} . Therefore, by
Eq. (6.59),
det(λI − R) = 0. (6.61)
λ1 λ2 λ3 = det R = 1. (6.62)
At least one of the roots is real. To see this note that for large enough |λ|, the cubic term
dominates, so that the sign of the cubic polynomial in Eq. (6.61) is the same as that of λ.
This means that the graph of the cubic polynomial (which is a continuous function) has
to cut the λ axis at least once. If the other two eigenvalues (say λ2 , λ3 ) are complex, then
λ3 = λ∗2 (superfix ∗ denotes complex conjugation) and by Eq. (6.60) λ2 λ3 = 1, hence by
Eq. (6.62) λ1 = 1. If all the three roots are real, they can be (1, 1, 1) or (1, −1, −1). In any
case there is always one root, say λ1 , equal to +1, hence
R v1 = v1 (6.63)
which shows that the straight line through the origin in the direction of v1 , is invariant
under the transformation x 7→ R x. Obviously, this is the axis of rotation.
Let λ1 = 1, λ2 = e^{iθ} , λ3 = e^{−iθ} and let v1 , v2 , v3 form an orthonormal set. The
eigenvectors of an orthogonal matrix can always be orthonormalized. Call
u1 = v1

u2 = (1/√2)(v2 + v3 )

u3 = (i/√2)(v2 − v3 ). (6.64)
The ui form an orthonormal set (check it!) and can be taken to be real, because v2 and v3
can be taken to be complex conjugates.2 From Eq. (6.64) and the values of λi , i = 1, 2, 3,
we get,
R u1 = u1 ,
R u2 = u2 cos θ + u3 sin θ,
R u3 = −u2 sin θ + u3 cos θ.

In the basis {u1 , u2 , u3 }, R is thus a rotation by θ about u1 . Taking the trace, which is
basis independent, R11 + R22 + R33 = 1 + 2 cos θ, or,

cos θ = (1/2)(R11 + R22 + R33 − 1).
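The relation cos θ = (1/2)(R11 + R22 + R33 − 1) gives a practical recipe for extracting the rotation angle, and, away from θ = 0 and θ = π, the axis, from any 3 × 3 rotation matrix. A small Python sketch (illustrative, not from the text; the function names are our own):

```python
import math

def rot_z(a):
    # Elementary rotation about the z axis, used here as a test case.
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def angle_axis(r):
    # cos(theta) = (R11 + R22 + R33 - 1)/2; for 0 < theta < pi the axis
    # direction can be read off the antisymmetric part of R.
    cos_t = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    ax = (r[2][1] - r[1][2], r[0][2] - r[2][0], r[1][0] - r[0][1])
    n = math.sqrt(sum(c * c for c in ax))
    return theta, tuple(c / n for c in ax)

theta, axis = angle_axis(rot_z(0.8))
assert abs(theta - 0.8) < 1e-12
assert all(abs(a - b) < 1e-12 for a, b in zip(axis, (0.0, 0.0, 1.0)))
```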
Let the axis of rotation be in the direction of the eigenvector v, that corresponds to the
eigenvalue λ = 1, so that R v = v. Since R is orthogonal, R T R = I, hence v = R T v.
2 Just take the complex conjugate of R v2 = λ2 v2 and compare with R v3 = λ3 v3 , noting that λ3 = λ∗2 .
Transformation Groups
(a ◦ b ) ◦ c = a ◦ (b ◦ c ) (7.1)
a ◦ x = b and y ◦ a = b (7.2)
If the elements are numbers, vectors, matrices etc, the composition a◦b may either be
the sum or the product of a and b. In the case of mappings, transformations, rotations,
permutations, etc., the law is understood to be the usual law of composition; if a, b are
transformations, then a ◦ b is the transformation which results from performing b
first, then a.
Exercise Show that the set of all rotations in a plane {Rφ : 0 ≤ φ < 2π} forms a
group.
Hint All the rotations are about the same axis, perpendicular to the plane, so that
Rφ1 ◦ Rφ2 = Rφ1 +φ2 .
Exercise Prove the following laws which are the consequences of axioms (i), (ii),
(iii) above.
(iv) (Law of cancellation) If a, b, c ∈ G then
a ◦ b = a ◦ c implies b = c
b ◦ a = c ◦ a implies b = c (7.3)
Hint Use axiom (iii) and the fact that the elements x and y defined in (iii) are unique.
(v) (Identity) There is a unique element e ∈ G such that
a◦e = a = e◦a
for all a ∈ G.
Hint Use (iii) with b replaced by a to get

a ◦ e = a and e′ ◦ a = a.

(vi) (Inverse) For every a ∈ G there is a unique element a−1 ∈ G such that

a−1 ◦ a = e = a ◦ a−1 .
(vii) (Extended associative law)
(a ◦ (b ◦ (c ◦ (· · · ))) · · · ) ◦ h = a ◦ b ◦ c ◦ · · · ◦ h
Note that the law of composition need not be commutative, that is, in general, a ◦ b ≠ b ◦ a.
a, b ∈ G are said to commute if a ◦ b = b ◦ a. If all pairs of elements of G commute, then G
is said to be commutative or Abelian.
Let a ∈ G and m ≥ 0 be an integer. Then, aᵐ is defined by a⁰ = e, aᵐ = a ◦ aᵐ⁻¹ for m > 0, and a⁻ᵐ = (a⁻¹)ᵐ .
If all the elements aⁿ (n = 0, ±1, ±2, · · · ) are distinct, then the element a is said to be of
infinite order; otherwise, there is a smallest positive integer l, called the order of a, such
that aˡ = e. Then, aᵐ = e provided l is a divisor of m and every power of a equals one
of the elements e, a, a², . . . , aˡ⁻¹ . The group comprising e, a, a², . . . , aˡ⁻¹ is called the cyclic
group of l elements.
If a subset G′ ⊆ G of a group G is a group with the same law of composition as G, it is
called a subgroup of G. For example, the rotations about a fixed axis form a subgroup of
the group of rotations on E3 . The distinct powers of an element a form a subgroup called
the subgroup generated by the element a. This could be a cyclic subgroup of finite or infinite
order. The order of a group is the number of elements in it, which can be finite or infinite.
If G′ is a subgroup of G we write G′ < G. In any case, G < G and {e} < G. If G′ ≠ G, G′ is a
proper subgroup; if G′ = {e}, G′ is the trivial subgroup.
Examples
(i) The vector space E3 is an additive Abelian group containing infinitely many elements. This is
obvious from the properties of vector addition listed in section 1.4.
(ii) Let G denote the following set of 2 × 2 real matrices,
e = ( 1 0 )    a = ( 0 −1 )    b = ( −1  0 )    c = (  0 1 )
    ( 0 1 ) ,      ( 1  0 ) ,      (  0 −1 ) ,      ( −1 0 ) ·
It is straightforward to check that this set forms a group under matrix multiplication.
For example, a ◦ c = e and a−1 = c.
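The group axioms for this four-element set can be verified mechanically; a small Python sketch (illustrative, not from the text) checks closure and the stated relations:

```python
def mul(p, q):
    # 2x2 matrix product on tuples of tuples (hashable, so usable in a set).
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

e = ((1, 0), (0, 1))
a = ((0, -1), (1, 0))
b = ((-1, 0), (0, -1))
c = ((0, 1), (-1, 0))
g = {e, a, b, c}

# Closure: every product lands back in the set.
assert all(mul(p, q) in g for p in g for q in g)
# The relations quoted in the text: a o c = e (so a^-1 = c) and a^2 = b.
assert mul(a, c) == e and mul(a, a) == b
```

Since a² = b and a³ = c, every element is a power of a, which is the point of the exercise below identifying this group with the cyclic group of four elements.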
(iii) Let C4 denote the group of the rotational symmetries of a square, under the
composition of rotations, namely,
e = identity (rotation through 0)
a = counterclockwise rotation through π/2
b = counterclockwise rotation through π
c = counterclockwise rotation through 3π/2 (clockwise rotation through π/2)
Exercise Show that the groups in examples (ii) and (iii) are simply two different
realizations of ‘cyclic group of four elements’.
(iv) The set Z2 of integers modulo 2 and the set ({1, −1}, ·) are groups under the respective
binary operations and are isomorphic. We name them Z2 and C2 respectively. Both
are cyclic groups of two elements {e, a} with a2 = e. The three element group C3 is
given by {1, ω, ω2 } where ω = e2πi/3 . This is isomorphic with the group of three
rotations of angles 0, 2π/3, 4π/3 in the plane, which account for all the rotations
forming the symmetry elements of an equilateral triangle centered at the origin.
(v) We can consider the group of all symmetries of the equilateral triangle (see Fig. 7.1).
Thus, we allow reflections about the perpendicular bisectors as well. This is a six
element group and we denote it by S3 . Labeling the vertices {1, 2, 3} we can link every
element in S3 with some permutation of the vertices of the triangle. Let (12) denote
the permutation which interchanges vertices 1 and 2 while leaving the vertex 3 fixed.
This permutation is obtained by the reflection in the perpendicular bisector of the
edge joining 1 to 2. Similarly, the permutation (123), sending vertex 1 into 2, 2 into
3 and 3 into 1 is obtained by rotating the triangle through 120◦ . The permutation
(132) sending vertex 1 into 3, 3 into 2 and 2 into 1 is obtained by rotating the triangle
through 240◦ . Thus, we see that the group of symmetries of an equilateral triangle is
the same as the group of all permutations on three symbols.
(vii) We now deal with groups with an infinite number of elements. Let SL(2, C) denote the
set of 2 × 2 matrices with complex entries, whose determinant equals 1. Thus, an
element of SL(2, C) is given by
A = ( a b )
    ( c d ) ,

ad − bc = 1.

Exercise Show that SL(2, C) forms a group under matrix multiplication.
Hint Since the determinant of the product of matrices is the product of their
determinants, SL(2, C) is closed under matrix multiplication. Further, matrix
product is associative. Since det A = 1, A is invertible, and det A−1 = 1/ det A = 1,
implying A−1 exists and is in SL(2, C). The identity is given by
e = ( 1 0 )
    ( 0 1 ) ·
(viii) SU (n) denotes the set of all n×n unitary matrices with determinant 1 and is a group
under matrix multiplication. SU (n) is closed under matrix multiplication because
given two unitary matrices U1 , U2 we see that their product is also unitary,
and the determinant of the product of matrices is the product of their determinants.
Further, matrix product is associative. The unit n × n matrix, which is the multiplicative
identity, is unitary, and the inverse of a unitary matrix is its adjoint, which is again unitary.
For example, the group SU (2) consists of all 2 × 2 matrices of the form
( a    b  )
( −b∗  a∗ ) ,  where |a|² + |b|² = 1.
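That such a matrix is indeed unitary with determinant 1 can be checked directly; a small Python sketch (illustrative, not from the text):

```python
import cmath

def su2(a, b):
    # The matrix ((a, b), (-conj(b), conj(a))); with |a|^2 + |b|^2 = 1
    # this is a general element of SU(2).
    return ((a, b), (-b.conjugate(), a.conjugate()))

a, b = cmath.exp(0.3j) * 0.6, 0.8j          # |a|^2 + |b|^2 = 0.36 + 0.64 = 1
u = su2(a, b)

# Determinant: a*conj(a) - b*(-conj(b)) = |a|^2 + |b|^2 = 1.
det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
assert abs(det - 1.0) < 1e-12

# Unitarity: U^dagger U = I, checked entrywise.
udag = tuple(tuple(u[j][i].conjugate() for j in range(2)) for i in range(2))
prod = tuple(tuple(sum(udag[i][k] * u[k][j] for k in range(2)) for j in range(2))
             for i in range(2))
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```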
ejk = σ̂ j · R σ̂ k ,
where ejk is the jkth element of the matrix. By subsection 6.3.1 we know that every such
matrix is a 3 × 3 orthogonal matrix with determinant +1 and that the sets of rotations and
their matrix representatives are isomorphic. The last exercise tells us that the set of all
orthogonal matrices with determinant +1 is a group under matrix multiplication which
we call SO (3). All this just means that the group of 3 × 3 matrices with determinant +1,
SO (3), is a faithful matrix representation of the rotation group O + (3). Thus, these two
groups have the same structure and properties and it is enough to study SO (3) to
understand rotations in E3 . In fact each 3 × 3 real matrix A defines a linear map
f : x 7→ Ax on E3 , so that, by the isomorphism between SO (3) and O + (3), the group
formed by the maps x 7→ Ax with A ∈ SO (3) is just O + (3). Since O + (3) is a three
parameter continuous group, so must be the isomorphic group SO (3).
Exercise Show that SO (3) is a three parameter group.
Solution The conditions of orthogonality on a 3 × 3 matrix A = [akj ] are, by Eq. (6.7),
Σ³ₖ₌₁ a²ₖⱼ = 1, j = 1, 2, 3,

Σ³ₖ₌₁ aₖᵢ aₖⱼ = 0, (i, j ) = (1, 2), (1, 3), (2, 3). (7.5)

These are six independent conditions on the nine elements of A, leaving 9 − 6 = 3 free
parameters.
τa (x) = x + a (7.6)
We show that the set of all translations in E3 forms an Abelian group. We have,

τa ◦ τb = τa+b = τb ◦ τa , τ0 = I, (τa )−1 = τ−a .

This proves what we wanted. All these properties follow from those of vector addition in
E3 . In fact the translation group is isomorphic with the group formed by E3 under vector
addition (Exercise).
An isometry of Euclidean space E3 is a bijective (one to one and onto) transformation
σ : E3 → E3 such that d (σ (x), σ (y)) = d (x, y) for all x, y ∈ E3 , where
d (x, y) = +√((x − y) · (x − y)) is the Euclidean distance between x and y.
We first show that all the isometries {σ } form a group.

(i) (Closure) If σ1 and σ2 are isometries, then d ((σ1 σ2 )(x), (σ1 σ2 )(y)) =
d (σ2 (x), σ2 (y)) = d (x, y) for all x, y ∈ E3 , so σ1 σ2 is an isometry.

(ii) (Associativity) Let σ1 , σ2 , σ3 be isometries. Then, both (σ1 σ2 )σ3 and σ1 (σ2 σ3 )
have to be obtained by successively applying σ3 , σ2 , σ1 (in that order), making
them equal.
(iii) (Identity) The identity transformation I (x) = x is an isometry.
(iv) (Inverse) By the bijection property, every isometry σ has an inverse, σ −1 and since
σ is an isometry
so that σ −1 is an isometry.
Items (i)–(iv) above show that the set of all isometries in E3 form a group.
We now obtain some of the basic properties of an isometry.
Consider an orthonormal basis {ê1 , ê2 , ê3 } and an isometry σ which leaves the vectors
{0, ê1 , ê2 , ê3 } invariant. Then, we want to show that σ is the identity. Let x, x′ ∈ E3 and
σ (x) = x′ . Since σ (0) = 0 we have d (x, 0) = d (σ (x), σ (0)) = d (x′ , 0). This gives,

x² = (x′ )². (7.7)

Similarly, invariance of the êk gives d (x, êk ) = d (x′ , êk ), which, together with Eq. (7.7),
implies x · êk = x′ · êk , k = 1, 2, 3. Since the components of x and x′ in the basis {êk }
coincide,

x = x′

or, σ (x) = x for all x ∈ E3 , giving σ = I. Note that this conclusion is trivial for a linear
operator as it follows directly from linearity. However, isometry is not linear in general.
Let σ be an isometry which leaves 0 invariant, that is, σ leaves one point in E3 fixed.
Then we know that σ is an orthogonal transformation. In fact from Eq. (7.7) we know that
σ (0) = 0 implies σ (x) · σ (x) = x · x, or, σ preserves the length of vectors in E3 . Hence, σ
is an orthogonal transformation.
Let σ be an isometry with σ (0) = a. Then
σ (x) = Ax + a, x ∈ E3 . (7.9)

In fact every isometry is given by the form in Eq. (7.9), because when a ≠ 0 (a = 0) in
σ (0) = a, it is given by Eq. (7.9) (Eq. (7.9) with a = 0) and there are no other cases.
We can now conclude that the group of isometries is a six parameter group: three
parameters are required to fix the orthogonal transformation A while three more are
required to fix the translation a. We are interested in the subgroup consisting of isometries
given by the product of a rotation and a translation, called the Euclidean group. Each such
isometry is physically realized by a displacement of a rigid body. A rigid body is a system
of particles with fixed distances from one another, so every displacement of a rigid body
must be an isometry. A finite rigid body displacement must unfold continuously, so it
must be continuously connected to the identity. In the last subsection we saw that this
property is possessed by rotations, which are the elements of SO(3). Thus, only the isometries
composed of a rotation and a translation have this property. An isometry of this kind is
called a rigid displacement. Thus, all rigid displacements form a continuous group of
isometries having the canonical form (see Fig. 7.4)

σ (x) = R (x) + a, (7.11)

Fig. 7.4 A rigid displacement is the composite of a rotation and a translation. The
translation vector a need not be in the plane of rotation.

where R ∈ SO (3) is a rotation. Note that the rotation R is about an axis through the
origin so the origin is a distinguished point in this representation of the rigid displacement.
However, the choice of origin was completely arbitrary in getting Eq. (7.11), so different
choices of the origin give different decompositions of a rigid displacement into a rotation
and a translation. Next, we show how these are related.
Let Rb denote a rotation about a point b and let R0 = R denote the same rotation
about the origin 0. The rotation about the point b can be effected via the following sequence
of operations. (i) Translate the body by −b to shift the point b to the origin. (ii) Perform
the rotation R about the origin. (iii) Translate by b to shift the origin back to the point b.
The resulting transformation is given by

Rb (x) = R (x − b) + b. (7.12)

The points on the axis of rotation through b satisfy
Rb (x) = x (7.13)
The points x satisfying Eq. (7.13) are the fixed points of Rb . Combining Eqs (7.12) and
(7.13) we get
R (x − b) + b = x (7.14)
We now ask when the rigid displacement R (x) + a is a pure rotation, that is, when we can
write

R (x) + a = Rb (x), (7.15)

where Rb is a rotation about point b. The vector b can be decomposed into components
bk and b⊥ , parallel and perpendicular to the axis of rotation respectively, to give

b = bk + b⊥ . (7.16)

Since R (bk ) = bk , Eq. (7.12) gives

Rb (x) = R (x) + b − R (b) = R (x) + b⊥ − R (b⊥ ). (7.17)
Comparison between Eqs (7.15) and (7.17) tells us that the following condition must be
satisfied by the required vector b.
a = b⊥ − R (b⊥ )
The vector on the RHS of this equation lies in a plane perpendicular to the rotation
axis determined by R . We can conclude from the above condition on b that a rigid
displacement R (x) + a is a rotation if and only if the translation vector a = a⊥ is
perpendicular to the axis of rotation. To emphasize this fact, we rewrite the condition on
b as
a⊥ = b⊥ − R (b⊥ ) (7.18)
We note that both the axes of rotation, through the origin 0 and through b are parallel and
share the same plane of rotation perpendicular to both of them. Both vectors a⊥ and b⊥
lie in the plane of rotation which we can view as a complex plane and replace the rotation
operator R in Eq. (7.18) by eiφ where φ is the angle of rotation and treat vectors a⊥ and
b⊥ like complex numbers. This gives
b⊥ = a⊥ /(1 − e^{iφ} ) = (a⊥ /2)(1 + i cot(φ/2)) = (1/2)(a⊥ + cot(φ/2) n̂ × a⊥ ), (7.19)
where n̂ is the unit vector defining the axis of rotation. Note that the transformation
R (x) + a⊥ leaves every plane perpendicular to the rotation axis invariant and it consists
of a rotation-translation in each such plane. Thus, we have proved that every
rotation-translation R (x) + a⊥ in a plane is equivalent to the rotation centered at the
point b⊥ given by Eq. (7.19) as shown in Fig. 7.5. Our proof fails if there is no rotation
(φ = 0), in which case we have pure translation. Thus, we have proved that every rigid
displacement in a plane is either a rotation or a translation.
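The reduction of a planar rigid displacement to a rotation about b⊥ is easy to verify with complex arithmetic, exactly as in the derivation of Eq. (7.19); an illustrative Python sketch (not from the text):

```python
import cmath

# Treat the plane of rotation as the complex plane: the rigid displacement
# z -> exp(i*phi)*z + a has the fixed point b = a / (1 - exp(i*phi)),
# and equals the pure rotation about b, z -> exp(i*phi)*(z - b) + b.
phi = 1.3
a = 2.0 + 0.5j
rot = cmath.exp(1j * phi)
b = a / (1.0 - rot)

assert abs((rot * b + a) - b) < 1e-12           # b is a fixed point
z = -1.7 + 3.2j
assert abs((rot * z + a) - (rot * (z - b) + b)) < 1e-12

# The alternative closed form b = (a/2)(1 + i*cot(phi/2)) agrees.
alt = (a / 2.0) * (1.0 + 1j / cmath.tan(phi / 2.0))
assert abs(alt - b) < 1e-12
```

For φ = 0 the denominator 1 − e^{iφ} vanishes and no fixed point exists, recovering the pure-translation case noted above.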
It is immediate from Eq. (7.17) that b⊥ = 0 implies Rb = R . Thus, the rotations differing
by the shift of origin along the rotation axis are equivalent. Indeed, no parameters defining
the rotation are changed by a translation along the axis of rotation.
σ (x) = R (x) + a⊥ + ak = Rb (x) + ak = (τak ◦ Rb )(x), (7.21)

where τak is the translation parallel to the rotation axis of Rb . Equation (7.21) proves
Chasles' theorem: Any rigid displacement can be expressed as a screw displacement. A
screw displacement consists of a product of rotation with a translation along the axis of
rotation (the screw axis). We have done more than proving Chasles' theorem; we have
shown how to find the screw axis of a given rigid displacement. Although elegant, Chasles'
theorem is seldom used in practice. Equation (7.11) is usually more useful, because the
center of rotation (the origin) can be specified at will to simplify the problem at hand.
Finally, note that b = bk (i.e., b⊥ = 0 in Eq. (7.16)) gives, via Eq. (7.12),

Rb (x) = R (x − bk ) + bk = R (x),

as it should.
Exercise A rigid displacement σ (x) = R (x) + a can be expressed as a product of a
translation τc and a rotation Rb centered at a specified point b. Determine the translation
vector c.
Hint Using Eq. (7.12) R (x) + a = τc Rb can be reduced to c = a − b + R (b). a, b
may be specified as column or row matrices and R ∈ SO (3) as a 3 × 3 special orthogonal
matrix. Otherwise R may be given as a rotation operator.
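The decomposition in the hint can be verified numerically. The sketch below (illustrative Python, not from the text) works in the plane for simplicity and checks that the rotation about b followed by the translation τc, with c = a − b + R(b), reproduces R(x) + a:

```python
import math

phi = 0.9
c_, s_ = math.cos(phi), math.sin(phi)

def rot(p):
    # Planar rotation by phi about the origin.
    return (c_ * p[0] - s_ * p[1], s_ * p[0] + c_ * p[1])

a = (1.0, -2.0)
b = (0.5, 3.0)
c = (a[0] - b[0] + rot(b)[0], a[1] - b[1] + rot(b)[1])   # c = a - b + R(b)

def rigid(x):               # sigma(x) = R(x) + a
    r = rot(x)
    return (r[0] + a[0], r[1] + a[1])

def rot_about_b_then_c(x):  # tau_c(R_b(x)) with R_b(x) = R(x - b) + b
    r = rot((x[0] - b[0], x[1] - b[1]))
    return (r[0] + b[0] + c[0], r[1] + b[1] + c[1])

x = (2.2, -0.7)
assert all(abs(u - v) < 1e-12 for u, v in zip(rigid(x), rot_about_b_then_c(x)))
```

The algebra behind the check: τc Rb (x) = R(x − b) + b + c = R(x) − R(b) + b + (a − b + R(b)) = R(x) + a.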
Exercise A subgroup H of group G is called an invariant subgroup if g −1 hg ∈ H for
every h ∈ H and every g ∈ G. Show that the translations T form an invariant subgroup of
the group E of isometries on E3 .
Solution Let σ ∈ E and τa ∈ T . Writing σ (x) = A(x) + c as in Eq. (7.9), with A
orthogonal,

(σ −1 τa σ )(x) = x + A−1 (a),

which is the translation τA−1 (a) , so σ −1 τa σ ∈ T .
Exercise Let S denote the reflection in the plane normal to a non-zero vector a. If τa is
the translation by a then Sa = τa Sτ−a is the reflection S shifted to the point a. Show that
SS−a = τ2a .

Hint Since S (a) = −a, we get

S−a (x) = τ−a Sτa (x) = S (x + a) − a = S (x) + S (a) − a = S (x) − 2a,

giving

(SS−a )(x) = S (S (x) − 2a) = x − 2S (a) = x + 2a = τ2a (x).
A similarity of E3 is a transformation of the form

Σ : x ↦ λA(x) + a, λ ∈ R, λ ≠ 0, A orthogonal.
Both isometries and similarities are subgroups of a more general group of transformations
called collineations which transform lines into lines. All transformations of the form
A : x ↦ A(x) + a, A invertible,

are collineations and are called affine transformations. Affine transformations form a
group called the affine group. Note that both isometries and similarities are affine
transformations.
Let G be the affine group and let Ω be the set which is either E3 or a class of figures
in E3 but not both. We define a relation ≡ on Ω namely, α ≡ β if and only if there exists
σ ∈ G such that σ (α ) = β.
Exercise Show that ≡ is an equivalence relation.
Hint Again, this follows from the fact that affine transformations form a group. So
proceed just the way we showed congruence to be an equivalence relation.
Consider a subset of Ω consisting of all elements which are related via ≡. Such a subset is
called an equivalence class of ≡. To construct such a subset, pick an element in Ω and
collect all elements of Ω related to it. If the complement of this subset in Ω is not empty,
pick an element from the complement and collect all elements related to it. Repeat this
procedure until all of Ω is exhausted. Obviously, all these subsets, or equivalence classes,
are mutually exclusive, because if any two of them have an element in common, by
transitivity property it will be related to all the elements of both the subsets, so that their
union will form a single equivalence class. Thus, Ω is partitioned by its equivalence
classes, that is, two equivalence classes have empty intersection and the union of all of
them is Ω.
When G is the affine group the elements of the equivalence class of ≡ on Ω via G are
called affine equivalent.
Instead of defining via the affine group, we can define ≡ via the similarity group or the
isometry group to get the same results.
We now classify the set of all central conics (defined below), which are the orbits of
particles driven by the inverse-square law of force, using the group of affine transformations
or the groups of isometries and similarities.
Conics are the loci of the second degree, that is, the non-empty point sets in E2 , given by

Γ = {(x, y ) | ax² + 2hxy + by² + 2gx + 2f y + c = 0, a ≠ 0 or h ≠ 0 or b ≠ 0}. (7.23)

Under an affine map u = u′ S + w the defining equation transforms as

(u′ S + w )A(S^T (u′ )^T + w^T ) + 2(u′ S + w )k^T + c = 0,

which is again of the second degree,

u′ A′ (u′ )^T + 2u′ (k′ )^T + c′ = 0, (7.24)

so that the image is again a conic,

Γ′ = {(x, y ) | a′x² + 2h′xy + b′y² + 2g′x + 2f ′y + c′ = 0, a′ ≠ 0 or h′ ≠ 0 or b′ ≠ 0}.
To find the affine equivalence class of central conics, we have to find criteria which
guarantee (or otherwise) the existence of an affine transformation connecting the given
conics Γ and Γ′ . That is, given Γ and Γ′ , as in Eqs (7.23) and (7.24), when can one find an
invertible matrix S transforming Γ′ to Γ ? We defer this question until we have obtained
the effect of the Euclidean transformations (isometries) on central conics and found its
equivalence classes.
When σ : u ↦ u′ is an isometry, the above analysis goes through, with the reservation
that the matrix S defined by u = u′ S + w must be orthogonal. We are interested in the
isometries continuously connected to the identity, so we restrict to the Euclidean group and
require S to be special orthogonal (det S = +1). Since A is symmetric and S is special
orthogonal, we can choose S such that the matrix A′ = SAS^T is diagonal with the diagonal
elements as the eigenvalues of A. Thus, we can write A′ = diag(λ, µ) where λ, µ are the
αx2 + βy 2 = 1 (7.27)
γx2 + δy 2 = 1 (7.28)
which is possible if and only if {γ, δ} = {α, β}. Thus, two central non-degenerate (i.e.,
∆ ≠ 0) conics are Euclidean equivalent if and only if they have the same values for α and β
or, equivalently, for α + β and αβ, with α and β given by Eq. (7.27). In other words, the
quantity

(α + β )²/(αβ ) = (a + b )²/(ab − h²) (7.30)

is the required invariant under the similarity group.
Under the affine group, the conics with Eqs (7.27) and (7.28) are equivalent if and only
if αβ and γδ have the same sign, because in this case (with U an invertible matrix not
necessarily orthogonal) the determinants of the corresponding matrices are related by
γδ = (det U )2 αβ (7.31)
and since the conic is central, αβ ≠ 0. We also note that α and β cannot both be < 0
because in that case, no (x, y ) can satisfy Eq. (7.27). Thus, Eq. (7.31) does imply that αβ
and γδ have the same sign. There are thus only two affine equivalent classes of central
non-degenerate conics, namely those for which ab − h2 > 0 (ellipses) and those for which
ab − h2 < 0 (hyperbolae). Note that in the affine geometry, any two ellipses are equivalent,
while in Euclidean geometry they are equivalent if they have the same pair of Euclidean
invariants given by Eq. (7.29), which means that the two ellipses must be of the same size.
All ellipses are affine equivalent to the locus of the equation x² + y² = 1, that is, the unit
circle. All hyperbolae are affine equivalent to the locus of the equation x² − y² = 1. This is
a disconnected set with two components, namely,
Finally, we note that the Euclidean equivalent figures have the following property: One
figure can be superposed on the other by rigid displacement. Thus, the group of rigid
displacements describes all possible relations of congruency. These relations underlie all
physical measurements. A ruler is a rigid body and any measurement of length involves
rigid displacements to compare a ruler with the object being measured.
Exercise This is a small project for the students:
Discuss the Euclidean, similarity and affine equivalence classes of non-singular central
quadrics in E3 , i.e., the loci

{(x, y, z ) | ax² + by² + cz² + 2f yz + 2gzx + 2hxy + 2ux + 2vy + 2wz + d = 0}

with

| a h g |          | a h g u |
| h b f | ≠ 0,     | h b f v | ≠ 0,
| g f c |          | g f c w |
                   | u v w d |
where vertical bars mean the determinants of the corresponding matrices. Show in
particular, that there are three affine equivalent classes and find simple canonical
representatives of these classes.
Part III
Vector Analysis
tn = s^{2n+1} / [(2² + 2) · (4² + 4) · · · ((2n)² + 2n) · r^{2n} ].

The rule further says:

jiva = s − t1 + t2 − t3 + t4 − t5 + · · ·

     = s − s³/[r²(2² + 2)] + s⁵/[r⁴(2² + 2)(4² + 4)] − · · ·
Substituting

(i) jiva = r sin θ,
(ii) s = rθ, so that s^{2n+1} /r^{2n} = rθ^{2n+1} , and noticing that
(iii) (2k )² + 2k = 2k · (2k + 1), so that
(iv) (2² + 2) · (4² + 4) · · · ((2n)² + 2n) = (2n + 1)!,
and cancelling r from both sides, we see that the infinite series for Jiva is entirely equivalent
to the well known Taylor series for sin θ :
sin θ = θ − θ³/3! + θ⁵/5! − θ⁷/7! + · · ·
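The equivalence of the jiva series with the sine series is easy to confirm numerically; an illustrative Python sketch (not from the text) sums the series term by term using the recurrence t_{k+1} = −t_k θ²/((2k + 2)(2k + 3)):

```python
import math

def madhava_sin(theta, terms=10):
    # Partial sum of the series sin(theta) = sum_k (-1)^k theta^(2k+1)/(2k+1)!,
    # generated by the term recurrence rather than factorials.
    total, term = 0.0, theta
    for k in range(terms):
        total += term
        term *= -theta * theta / ((2 * k + 2) * (2 * k + 3))
    return total

for t in (0.1, 0.7, 1.3):
    assert abs(madhava_sin(t) - math.sin(t)) < 1e-10
```

Ten terms already reproduce sin θ to better than ten decimal places on [0, π/2], consistent with the accuracy of Madhava's tables mentioned below.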
It is now well known that calculus was developed in India starting mid-fifth century
(Aryabhata in Bihar) until mid-fourteenth century (Madhava in Kerala) with a long list of
brilliant mathematicians filling in the gap. Indians invented powerful techniques to
accelerate convergence of a series and to sum a given series to the required accuracy [18].
Thus, Madhava produced a table of values of sin θ and cos θ accurate up to ten decimal digits
by summing their Taylor series (better called Madhava series!). Values to this accuracy
were required for navigation (locating ships and finding directions on open sea) and
timekeeping (yearly scheduling of agricultural activities, vis-a-vis rainy season, to
maximize production).
8
Preliminaries
I : 0, 1, −1, 2, −2, 3, −3, · · ·
N : 1, 2, 3, 4, 5, 6, 7, · · ·
Answer
f (n) = n/2 (n even),
f (n) = −(n − 1)/2 (n odd).
This example shows that an infinite set can be put into 1–1 correspondence with one of its
proper subsets. This is not possible for finite sets.
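The correspondence above is easy to implement and test; an illustrative Python sketch (not from the text):

```python
def f(n):
    # Maps N = {1, 2, 3, ...} onto the integers: 1 -> 0, 2 -> 1, 3 -> -1, 4 -> 2, ...
    return n // 2 if n % 2 == 0 else -(n - 1) // 2

values = [f(n) for n in range(1, 21)]
assert len(set(values)) == len(values)      # injective on this range
assert set(values) == set(range(-9, 11))    # hits 0, +-1, ..., +-9, 10 with no gaps
```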
Since R, R3 and E3 are continua, we expect each of them to form an uncountable set.
Also, every subset of these spaces, which forms a continuous region of space must also be
an uncountable set. We accept this to be true without supplying any proofs.
from which the result follows. Thus, a sequence in E3 converging to a vector in E3 is also a
sequence in R3 converging to a point represented by the vector at the limit.
Uniqueness of the limit of a converging sequence enables us to re-define its convergence as
follows.
A sequence {xi } in a metric space X is a sequence converging to x∗ if for every ε > 0
there is an integer n0 > 0 such that d (xn , x∗ ) < ε whenever n > n0 . The fact that x∗ is the
limit of a converging sequence {xk } is summarily expressed as limk→∞ xk = x∗ .
Exercise Suppose {xn } is in R and limn→∞ xn = x∗ . Show that limn→∞ 1/xn = 1/x∗ ,
provided xn ≠ 0, (n = 1, 2, . . .) and x∗ ≠ 0.
Exercise Show that a sequence {xn } in R3 converges to x if and only if its components
converge, that is, limn→∞ αj,n = αj , j = 1, 2, 3, where αj,n and αj are the components of
xn and x respectively.
Solution The inequalities

|αj,n − αj | ≤ |xn − x|, j = 1, 2, 3,
which follow immediately from the definition of the norm in R3 show that
limn→∞ αj,n = αj , j = 1, 2, 3.
Conversely, if limn→∞ αj,n = αj , j = 1, 2, 3, then to each ε > 0 there is an integer
N such that n ≥ N implies

|αj,n − αj | < ε/√3, j = 1, 2, 3.

Hence, n ≥ N implies

|xn − x| = ( Σ³ⱼ₌₁ |αj,n − αj |² )^{1/2} < ε.
lim f (x ) = f (x∗ ),
x→x∗
or,
lim d (f (x ), f (x∗ )) = 0,
x→x∗
Exercise Show that if the functions f (x ) and g (x ) are continuous at x∗ then so is their
sum f (x ) + g (x ) and their product f (x )g (x ).
In general, we say that
lim f (x )
x→x∗
exists if for every sequence {xn } converging to x∗ , the corresponding sequence {f (xn )}
converges to the same limit. In terms of this definition, the result of the third exercise of
this section can be used to get
and
We start with the functions in the first of the three categories described above, namely, the
vector valued functions of a scalar variable, denoted f(t ).
Otherwise, the RHS of Eq. (9.1) will blow up as ∆t → 0 because the numerator remains
finite while the denominator tends to zero.
The derivative ḟ(t ) is a function of t in its own right, therefore we can differentiate it
by applying Eq. (9.1) to it, provided the corresponding limit exists. The resulting derivative
function is called the second derivative of f(t ) and is denoted f̈(t ) or d²f(t )/dt² . Continuing
in this way we can define the third and higher order derivatives of f(t ).
As an important application, we consider a particle moving along a path which is a
continuous and differentiable curve, that is, the curve is the graph of a continuous and
222 An Introduction to Vectors, Vector Operators and Vector Analysis
differentiable function x(t ) of time t, giving the position vector of the particle at time t on
the path. The derivative ẋ = ẋ(t ) is called the velocity of the particle, defined by Eq. (9.1),
which we can abbreviate as
ẋ = dx/dt = lim∆t→0 ∆x/∆t, where ∆x = x(t + ∆t) − x(t).
The curve and the vectors involved in the derivative are shown in Fig. 9.1. Note that the
derivative ẋ or the velocity vector is always tangent to the curve. The derivative of the
velocity,
ẍ = d²x/dt² = lim∆t→0 ∆ẋ/∆t,
is called the acceleration of the particle. For the sum of two vector valued functions f(t) and g(t) we have
d/dt (f(t) + g(t)) = df(t)/dt + dg(t)/dt = ḟ(t) + ġ(t) (9.2)
and for two scalar valued functions of a scalar variable f (t ) and g (t ) we get
d/dt (f(t)g(t)) = (df(t)/dt) g(t) + f(t) (dg(t)/dt) = ḟ(t)g(t) + f(t)ġ(t). (9.3)
Using the definition of the dot product in terms of vector components and Eq. (9.3) we can
write
d/dt (f(t) · g(t)) = d/dt (fx(t)gx(t) + fy(t)gy(t) + fz(t)gz(t)) = ḟ(t) · g(t) + f(t) · ġ(t).
In particular, for a particle with velocity v(t ) and speed function v (t ) = |v(t )| we get
(d/dt) v²(t) = (d/dt)(v(t) · v(t)) = 2v̇(t) · v(t).
This equation relates the rate of change of kinetic energy of a particle with its velocity and
acceleration. On the other hand, if the particle is moving along a straight line, so that its
direction v̂ is constant while its speed changes with time, (v̇(t) = v̇(t)v̂), then,
(d/dt) v² = 2v v̇ = 2v v̇ v̂ · v̂ = 2v̇ · v.
We shall now show that a vector valued function v(t ) has constant magnitude if and only
if there is a vector ω satisfying
v̇ = ω × v (9.5)
To show that Eq. (9.5) implies constant magnitude for v, we just dot both sides by v to get
v · (ω × v) on the RHS, which is zero. This means 2v̇ · v = (d/dt)v² = 0, or |v| is constant.
To show that constant magnitude of v, that is, 2v̇ · v = (d/dt)v² = 0, implies the existence
of some ω satisfying Eq. (9.5), we choose ω = (v̂ × v̇)/v. Using identity I and the fact that
v̇ · v = 0 we can easily check that this ω satisfies Eq. (9.5).
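This construction is easy to sanity-check numerically. The sketch below uses a hypothetical rotating vector of constant magnitude, chosen purely for illustration, and verifies that ω = (v̂ × v̇)/v does satisfy v̇ = ω × v:

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def norm(a):
    return math.sqrt(sum(x*x for x in a))

t = 0.7
# a vector of constant magnitude 2 rotating about the z axis (illustrative choice)
v    = (2*math.cos(t), 2*math.sin(t), 0.0)
vdot = (-2*math.sin(t), 2*math.cos(t), 0.0)

# omega = (v_hat x v_dot)/|v|
vhat = tuple(x/norm(v) for x in v)
omega = tuple(x/norm(v) for x in cross(vhat, vdot))

# check that v_dot = omega x v, componentwise
wxv = cross(omega, v)
assert all(abs(wxv[i] - vdot[i]) < 1e-12 for i in range(3))
```

For this example ω comes out as the unit vector along the rotation axis, as expected for uniform rotation.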
Exercise If n̂ is a unit vector function of the scalar variable t, then show that
|n̂ × dn̂/dt| = |dn̂/dt|.
Solution We make use of the fact that a vector of constant magnitude is perpendicular
to its derivative. Thus, n̂ is perpendicular to dn̂/dt. Therefore, we have,
|n̂ × dn̂/dt| = |n̂| |dn̂/dt| sin(π/2) = |dn̂/dt|
since |n̂| = 1.
Exercise Let u = u(t) be a vector valued function and write u = |u|. Show that
(d/dt) û(t) = (d/dt)(u/u) = ((u × u̇) × u)/u³.
Solution Consider
((u × u̇) × u)/u³ = (u²u̇ − (u · u̇)u)/u³ = (u u̇ − u̇ u)/u²,
where the last equality follows from u · u̇ = (1/2)(d/dt)u² = u u̇.
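A finite-difference check of this formula is straightforward; the sketch below uses an arbitrary sample curve u(t) (my choice, not from the text) and compares the numerical derivative of û against ((u × u̇) × u)/u³:

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def norm(a):
    return math.sqrt(sum(x*x for x in a))

def u(t):      # arbitrary sample curve for illustration
    return (math.cos(t), math.sin(t), t)

def udot(t):   # its derivative
    return (-math.sin(t), math.cos(t), 1.0)

def uhat(t):   # the unit vector u/|u|
    n = norm(u(t))
    return tuple(x/n for x in u(t))

t, h = 1.3, 1e-5
# central-difference derivative of u_hat
num = tuple((uhat(t + h)[i] - uhat(t - h)[i])/(2*h) for i in range(3))
# the formula: ((u x u_dot) x u)/|u|^3
ut, ud = u(t), udot(t)
formula = tuple(x/norm(ut)**3 for x in cross(cross(ut, ud), ut))
assert all(abs(num[i] - formula[i]) < 1e-8 for i in range(3))
```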
Exercise Show that the conservation of angular momentum (h) of a particle driven by a
central force, (ḣ = 0), implies that both the magnitude and the direction of h are conserved
separately. Use this to show that the orbit of the earth around the sun never changes the
direction of its circulation about the sun.
Solution To prove the first part, consider: ḣ = 0 implies h · ḣ = 0, which implies
(d/dt)(h · h) = (d/dt)(h²) = 0, where h = |h|. Thus, the magnitude of h is conserved separately.
Now, h = constant and hĥ = constant together imply ĥ = constant so that the direction
of h is separately conserved.
To get the second part, note that, for constant magnitude of h,
h = |r × ṙ| = r²θ̇, (9.6)
where r is the distance of the particle from the center of force. Equation (9.6) implies that
θ̇ ≥ 0 always, in a dextral (that is, right handed) frame, so θ = θ(t) increases
monotonically with time if h ≠ 0. In a left handed frame θ̇ ≤ 0. What is important (and
physical) is that θ̇ cannot ever change its sign. This means that the orbit of the earth in the
central force field of the sun never changes the direction of its circulation, as the angular
momentum of its orbital motion around the sun is conserved. Note that this result applies
to all central forces.
Let us now see the effect of differentiation on the vector product of two functions and
the product of a vector valued and the scalar valued function. Let A(t ) and B(t ) be two
vector valued functions of a scalar variable t and φ(t ) be a scalar valued function of t.
Differentiating (A(t ) × B(t ))i = εijk Aj (t )Bk (t ) we get,
d/dt (A(t) × B(t)) = (dA/dt) × B + A × (dB/dt). (9.7)
Also, by differentiating the product of functions we get,
d/dt (φA) = (dφ/dt) A + φ (dA/dt). (9.8)
We can summarily conclude
• If (dA/dt) · A = 0 then |A| is constant.
• If A × (dA/dt) = 0, A ≠ 0, then dA/dt is parallel to A, implying that A has constant direction.
Vector Valued Functions of a Scalar Variable 225
Exercise A circular helix is given by x(t) = a cos t î + a sin t ĵ + bt k̂, where k̂ is along
the axis of the helix. Provide the equation for the circular helix with (i) z coordinate and (ii) arc length s as a parameter.
Solution A circular helix is a curve which winds on a circular cylinder of radius a with its
axis along the z axis. When a point moving along the helix completes one turn, t increases
by 2π; x and y coordinates assume their original values, and z is increased by 2πb. As
dx/dt ≠ 0 for all t, all points of the helix are regular for the parameter t.
Let z be the new parameter and b ≠ 0. Then t = z/b and the equation to the helix
becomes
x = a cos(z/b) î + a sin(z/b) ĵ + z k̂.
Since t is an analytic function of z and dt/dz = 1/b ≠ 0, every point of the helix is a
regular point for the new parameter z.
Now for the parameter arc length s, we know that,
ds = √[(dx/dt)² + (dy/dt)² + (dz/dt)²] dt.
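For the circular helix this integrand is the constant √(a² + b²), so the arc length grows linearly with t. A short numeric check (a sketch; chord-length summation is my choice of approximation):

```python
import math

a, b = 2.0, 1.0            # helix radius and pitch parameters (sample values)
c = math.sqrt(a*a + b*b)   # ds/dt for the helix

def x(t):
    return (a*math.cos(t), a*math.sin(t), b*t)

# approximate arc length over 0 <= t <= T by summing chord lengths
T, N = 3.0, 20000
s = 0.0
prev = x(0.0)
for k in range(1, N + 1):
    cur = x(k*T/N)
    s += math.dist(prev, cur)
    prev = cur

assert abs(s - c*T) < 1e-6   # s = sqrt(a^2 + b^2) * t
```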
The derivative dx/ds then becomes a unit vector tangential to the path at the point x(s0), pointing along the
direction given by the increasing values of s. Denoting this tangential unit vector by t̂ we
can write
t̂ = dx/ds.
Since t̂ is a unit vector we have t̂ · t̂ = 1, which gives
(dt̂/ds) · t̂ = 0,
that is, the vector dt̂/ds is orthogonal to t̂. This vector measures the amount by which the
direction of t̂ changes as s increases, i.e., as the particle moves along the path. We write
dt̂/ds = |dt̂/ds| n̂ = κn̂ (9.9)
where n̂ is the unit vector in the direction of dt̂/ds and κ = |dt̂/ds| is the rate of change of
direction of t̂ with s. κ is called the curvature of the path at the point x(s0). n̂ is called the
principal normal unit vector. Note that n̂ is always in the direction of dt̂/ds as κ is chosen to
be non-negative.
The equation κ = 1/ρ defines the radius of curvature ρ at the corresponding point. A
straight line is a curve with zero curvature and infinite radius of curvature. In this case t̂
is along the line and n̂ can be in any direction perpendicular to t̂. The vector X = x + ρn̂
determines C, the center of curvature. The circle with center at C, radius ρ and in the plane
determined by n̂ and t̂ is called the circle of curvature or the osculating circle (see Fig. 9.3).
For the circular helix, the curvature works out to be κ = a/(a² + b²), and
n̂ = −cos(s/√(a² + b²)) î − sin(s/√(a² + b²)) ĵ.
Note that the curvature is the same for all points of the helix, while n̂ changes as we go
along the helix.
Exercise Obtain the parameterization of a circle of radius R by arc length. Find the
vectors t̂ and n̂ and hence the curvature and the radius of curvature at a point on the
circle. Show that these quantities are the same for the whole circle.
dt̂/ds = −(1/R)(cos(s/R) î + sin(s/R) ĵ),
so that the curvature κ = |dt̂/ds| = 1/R and the radius of curvature is R. Since these quantities
depend only on the circle radius R, they are the same for all points of the circle,
characterizing the circle as a whole.
Exercise
(a) For a scalar valued function of a scalar variable, y(x), which is continuous and has a
continuous first derivative, the curvature κ is defined by |dα/ds|, where s is the arc length
parameter of the graph of y(x) versus x and α(s) is the angle made by the tangent to the
graph at s with the positive direction of the x axis (see Fig. 9.4). Show that
κ = y″/(1 + y′²)^(3/2) (9.10)
where prime denotes differentiation with respect to x.
(b) Show that, for a path given parametrically by (x(t), y(t)),
κ = (ẋÿ − ẏẍ)/(ẋ² + ẏ²)^(3/2). (9.11)
Solution
(a) Since y(x) is continuous and differentiable, the piece of curve traversed by a small
enough increment ds can be approximated by a straight line, in which case we have
(see Fig. 9.4),
ds = √(dx² + dy²) = dx √(1 + y′²).
Therefore,
dα/ds = (dα/dx)(dx/ds)
= [d(arctan y′)/dx] / √(1 + y′²)
= y″/(1 + y′²)^(3/2). (9.12)
(b) To get Eq. (9.11) just note that y′ = ẏ/ẋ = sin α/cos α = tan α and transform Eq. (9.10).
Note that cos α = ±ẋ/√(ẋ² + ẏ²) and sin α = ±ẏ/√(ẋ² + ẏ²) (where the same sign must be
taken in both the formulas) are the direction cosines of the tangent vector ẋ ≡ (ẋ, ẏ)
to the path at (x(t), y(t)). The claim in the next question is satisfied if the speed of the
particle along the path is constant and equals unity, so that the parameters s and t become
identical (because s = vt for constant v) and we have ds/dt = |dx/dt| = √(ẋ² + ẏ²) = 1.
Finally, we note that we take ẋ² + ẏ² ≠ 0, that is, the tangent always exists at all
points of the path. It is horizontal if ẏ = 0 and vertical if ẋ = 0.
Exercise Find the curvature and the radius of curvature of a circle of radius R using
Eq. (9.11).
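Both curvature formulas are easy to test numerically. The sketch below assumes the parametric form κ = |ẋÿ − ẏẍ|/(ẋ² + ẏ²)^(3/2) and the graph form κ = y″/(1 + y′²)^(3/2); it recovers κ = 1/R for the circle and κ = 2 at the vertex of the parabola y = x²:

```python
import math

def kappa_param(xd, yd, xdd, ydd):
    # curvature from parametric derivatives (x_dot, y_dot, x_ddot, y_ddot)
    return abs(xd*ydd - yd*xdd) / (xd*xd + yd*yd)**1.5

# circle of radius R: x = R cos t, y = R sin t
R, t = 3.0, 0.9
k_circle = kappa_param(-R*math.sin(t), R*math.cos(t),
                       -R*math.cos(t), -R*math.sin(t))
assert abs(k_circle - 1.0/R) < 1e-12

# graph y = x^2 at x = 0: y' = 0, y'' = 2, so kappa = 2
k_parab = 2.0 / (1.0 + 0.0**2)**1.5
assert abs(k_parab - 2.0) < 1e-12
```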
A third unit vector, orthogonal to both t̂ and n̂ is uniquely defined as
b̂ = t̂ × n̂
We see that the triplet {t̂, n̂, b̂} forms a right handed system of orthonormal vectors at each
point of the curve. Since the triplet {t̂, n̂, b̂} changes from point to point on the curve, the
corresponding coordinate system also changes and is called a moving trihedral.
Since t̂ · b̂ = 0, we have,
0 = (dt̂/ds) · b̂ + t̂ · (db̂/ds) = t̂ · (db̂/ds),
implying t̂ and db̂/ds are orthogonal. Since b̂ · b̂ = 1, b̂ · (db̂/ds) = 0. Thus, db̂/ds is a vector
perpendicular to both t̂ and b̂, so that the vector db̂/ds is along n̂ and measures the rotation
of b̂ in the plane of b̂ and n̂ perpendicular to t̂, as the particle moves along the curve, or as
s changes. We write
db̂/ds = τ n̂ (9.13)
and call τ the torsion of the curve.
Exercise Find the binormal vector and the torsion for the circular helix.
Answer Using the previously obtained expressions for t̂ and n̂ for the helix,
b̂ = (b/c) sin(s/c) î − (b/c) cos(s/c) ĵ + (a/c) k̂,
where c = √(a² + b²). Further,
τ = −b/c².
Exercise A helix is defined to be a curve with non-zero curvature, such that the tangent
at every point makes the same angle with a fixed line in space called the axis. Show that
a necessary and sufficient condition that a curve be a helix is that the ratio of torsion to
curvature is constant.
Solution We first show that all tangents making the same angle with the axis implies
a constant ratio of κ and τ. This condition can be expressed as
t̂ · ê = cos θ = c,
where t̂ is a unit tangent vector to the helix, ê is a unit vector along the axis and θ is
the (constant) angle between the tangent and the axis. Differentiating this equation with
respect to s gives
(dt̂/ds) · ê = κ n̂ · ê = 0.
Since κ ≠ 0 we must have n̂ · ê = 0. Hence, ê is in the plane spanned by t̂ and b̂ and can be
expressed as a linear combination of them. Since t̂ · ê = cos θ and ê is a unit vector,
ê = cos θ t̂ + sin θ b̂.
Differentiating with respect to s we get, since the derivatives of t̂ and b̂ are both
proportional to n̂,
(κ cos θ + τ sin θ) n̂ = 0,
or,
κ/τ = −tan θ = constant.
We now assume that
κ/τ = −tan θ = −sin θ/cos θ = constant.
This means we can write
κ cos θ + τ sin θ = 0.
Now, we substitute the derivatives of t̂ and b̂ for κn̂ and τ n̂ respectively and then integrate
with respect to s to get,
cos θ t̂ + sin θ b̂ = ê,
a constant unit vector making the constant angle θ with t̂, so the curve is a helix.
Since n̂ = b̂ × t̂, we get
dn̂/ds = (db̂/ds) × t̂ + b̂ × (dt̂/ds)
= τ n̂ × t̂ + κ b̂ × n̂
= −τ b̂ − κt̂. (9.14)
Exercise Show that we can cast the Frenet–Serret formulae in the form
dt̂/ds = d̂ × t̂, dn̂/ds = d̂ × n̂, db̂/ds = d̂ × b̂,
where d̂ = κb̂ − τ t̂ is the Darboux vector of the curve. (With the sign convention db̂/ds = τ n̂ of Eq. (9.13), the Darboux vector must carry −τ for all three formulae to hold.)
We can express the instantaneous velocity and acceleration of the particle as it moves
along a smooth path in terms of the orthonormal basis (t̂, n̂, b̂). From the definition of the
parameter s we see that the quantity ds/dt is simply the instantaneous speed v of the particle.
We then have, for the instantaneous velocity of the particle
v = dx/dt = (dx/ds)(ds/dt) = v t̂. (9.15)
Thus, the direction of the instantaneous velocity is always along the unit tangent vector to
the path in the direction of motion of the particle.
We get the acceleration of the particle by differentiating Eq. (9.15).
a = dv/dt
= (dv/dt) t̂ + v (dt̂/dt)
= (d²s/dt²) t̂ + v (ds/dt)(dt̂/ds)
= (d²s/dt²) t̂ + v²κn̂
= (dv/dt) t̂ + v²κn̂. (9.16)
Thus, the acceleration has two components, one given by the rate of change of
instantaneous speed along the direction of motion and the other, with magnitude v 2 κ,
called centripetal acceleration, along the principal normal. We have thus connected the
kinematical quantities velocity and acceleration of the particle with the local geometry of
its path given by the triad (t̂, n̂, b̂).
Exercise A kinematical quantity called jerk (denoted j) is defined as the third order
derivative of the position vector with respect to time. Show that,
j ≡ d³x/dt³ = −κ² t̂ + (dκ/ds) n̂ − κτ b̂ (9.17)
The acceleration does not involve the torsion of the orbit, but the jerk does. Show further
that,
v · (a × j) = −κ²τv⁶ (9.18)
and
|v × a| = v 3 κ. (9.19)
These equations can be used to find the curvature κ and the torsion τ at any point of the
orbit by using the kinematical values v, a and j at that point.
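As a sketch of this procedure on the circular helix x(t) = (a cos t, a sin t, bt), using κ = |v × a|/v³ from Eq. (9.19) and τ = −v·(a × j)/(κ²v⁶) (my rearrangement of Eq. (9.18)); the results should match the helix values κ = a/(a² + b²) and τ = −b/(a² + b²) found earlier:

```python
import math

def cross(u, v):
    return (u[1]*v[2]-u[2]*v[1], u[2]*v[0]-u[0]*v[2], u[0]*v[1]-u[1]*v[0])

def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

a, b, t = 2.0, 1.0, 0.4
v   = (-a*math.sin(t),  a*math.cos(t), b)     # velocity
acc = (-a*math.cos(t), -a*math.sin(t), 0.0)   # acceleration
j   = ( a*math.sin(t), -a*math.cos(t), 0.0)   # jerk

speed = norm(v)
kappa = norm(cross(v, acc)) / speed**3                 # from Eq. (9.19)
tau = -dot(v, cross(acc, j)) / (kappa**2 * speed**6)   # from Eq. (9.18)

c2 = a*a + b*b
assert abs(kappa - a/c2) < 1e-12
assert abs(tau + b/c2) < 1e-12    # tau = -b/(a^2 + b^2), the book's sign convention
```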
Exercise Find the curvature and the torsion of the spiralling path of a charged particle in
a uniform magnetic field B.
Solution The Newtonian equation of motion is
m (dv/dt) = e (v × B),
dt
which implies
v · (dv/dt) = 0,
so that |v| = v0 is a constant. The solution of the equation of motion is
v = v0 + (e/m){(x − x0) × B},
where v0 and x0 are the constants of integration. This gives
v · B = v0 · B = v0 B cos θ
where θ is the angle between v0 and B. Taking the vector product of v on both sides of the
equation of motion, we get, using identity I,
v × (dv/dt) = (e/m)[(v · B)v − v²B].
Similarly, differentiating the equation of motion once with respect to t we get
j = d²v/dt² = (e/m)²[(v · B)B − B²v].
We can now use Eqs (9.18), (9.19) to get the curvature κ and torsion τ as
κ = (e/m)(B sin θ/v0)
and
τ = (e/m)(B cos θ/v0).
Exercise A spaceship of mass m0 moves in the absence of external forces with a constant
velocity v0 . To change the motion direction, a jet engine is switched on. It starts ejecting a
gas jet with velocity u which is constant relative to the spaceship and at right angle to the
spaceship motion. The engine is shut down when the mass of the spaceship decreases to m.
Through what angle θ does the direction of the motion of the spaceship deviate due to the
jet engine operation?
Solution Figure 9.5 shows a possible path of the satellite when the jet engine is on
(the actual path will depend on v0 ). Since there are no external forces, the equation of
motion is
m (dv/dt) + u (dm/dt) n̂ = 0,
where u = u n̂ is the velocity of the gas jet relative to the satellite and n̂ is the principal
normal. However, we know, via Eq. (9.16), that
dv/dt = v²κn̂ + (dv/dt) t̂,
where κ is the curvature and s is the length along the path of the satellite (arc length).
Dotting the equation of motion with v and noting that v · n̂ = 0 we get v · (dv/dt) = (1/2) d(v²)/dt = 0,
which means that the speed of the satellite as it moves along its path is constant in time.
This follows also from the fact that there are no external forces. Thus, only the centripetal
acceleration survives, giving dv/dt = v²κn̂. When substituted in the equation of motion it
becomes
mv²κ = −u (dm/dt),
or,
mv²/R = −u (dm/dt),
or,
dt = −(uR/v²)(dm/m).
Here, we have used κ = 1/R where R is the radius of curvature. In order to get the angular
advance of the satellite we transform this equation using vdt = Rdθ (which is justified
because the path is continuous and differentiable) to get
(R dθ)/v = dt = −(uR/v²)(dm/m),
or
dθ = −(u/v)(dm/m).
Integrating, we get the required angular advance,
θ = ∫ dθ = −(u/v) ∫ from m0 to m of dm/m = (u/v) ln(m0/m).
The planes through a point xp of the curve, normal to t̂, n̂ and b̂ respectively, are the normal plane
(x − xp) · t̂ = 0,
the rectifying plane
(x − xp) · n̂ = 0,
and the osculating plane
(x − xp) · b̂ = 0.
Using the definitions of t̂ and n̂, we see that b̂ is parallel to x0p × x00p where prime denotes
the differentiation with respect to s, and this notation will be used subsequently. Thus, the
equation to the osculating plane gets the form
(x − xp) · (x′p × x″p) = 0,
or, in terms of derivatives with respect to an arbitrary parameter t,
(x − xp) · (ẋp × ẍp) = 0.
If the curve is a straight line, or a point, ẋp and ẍp are parallel, so that equation to
the osculating plane is satisfied by every x in space which means that the equation does
not determine the osculating plane. For a straight line, the osculating plane is determined
by the choice of the principal normal n̂ (see the text below the place where we have
defined n̂).
In Cartesian coordinates x, y, z, the equation to the osculating plane becomes
| x − xp   y − yp   z − zp |
| ẋp       ẏp       żp     | = 0.
| ẍp       ÿp       z̈p     |
Exercise Find the equation of the osculating plane to the circular helix.
Answer
| x − a cos t   y − a sin t   z − bt |
| −a sin t      a cos t       b      | = 0,
| cos t         sin t         0      |
or,
b(x sin t − y cos t) + a(z − bt) = 0.
A plane is said to have contact of order n with the curve at the point x(s0) if the distance δ(s) of the point x(s) on the curve from the plane satisfies
δ(k)(s0) = 0, k = 0, 1, . . . , n,
δ(n+1)(s0) ≠ 0.
For the osculating plane, δ(s) = ±[x(s) − x(s0)] · b̂, so that
δ′(s0) = ±t̂ · b̂ = 0 and δ″(s0) = ±κn̂ · b̂ = 0,
since the first and the second derivatives of x(s) with respect to s equal t̂ and κn̂
respectively and t̂, n̂, b̂ form an orthonormal triad. Hence, the osculating plane has
contact of at least order two with the curve.
Now consider a second plane tangent to the curve at P . The distance function δ (s ) for
this plane is
δ (s ) = ±[x(s ) − x(s0 )] · ĉ
where ĉ is a unit vector normal to the plane. The first two derivatives of δ(s) at P are
δ′(s0) = ±t̂ · ĉ = 0 (the plane is tangent to the curve at P)
and
δ″(s0) = ±κ n̂ · ĉ.
Therefore, the second derivative is non-zero unless ĉ is parallel to b̂, making the two planes coincide.
Thus, the order of contact of any plane other than the osculating plane is less than two.
Exercise Find the order of contact of the osculating plane to the circular helix.
Hint We know that the order of contact is at least two. Using the equation of the helix with
arc length as parameter, show that δ(3) = ±x‴(s0) · b̂ = ±ab/c⁴ ≠ 0, where c = √(a² + b²).
Therefore, the required order of contact is two.
f′(ξ1) = f′(ξ2) = 0, s0 ≤ ξ1 ≤ s1 ≤ ξ2 ≤ s2,
f″(ξ3) = 0, ξ1 ≤ ξ3 ≤ ξ2.
In the limit s1, s2 → s0 these give, for the sphere of radius a with center at x0 passing through x(s0),
f(s0) = (x(s0) − x0)² − a² = 0,
f′(s0) = 2 t̂ · (x(s0) − x0) = 0,
f″(s0) = 2[x″(s0) · (x(s0) − x0) + 1] = 0.
The second of these shows that x(s0) − x0 lies in the plane of n̂ and b̂, so that
x(s0) − x0 = αn̂ + βb̂. (9.20)
Since x″(s0) = κn̂ and x′(s0) = t̂, the third of the above equations gives,
n̂ · (x(s0) − x0) + ρ = 0, (9.21)
where ρ is the radius of curvature. Dotting Eq. (9.20) with n̂ and then using Eq. (9.21) we
get α = −ρ. Squaring each side of Eq. (9.20) and using f(s0) = 0 (the first of the above three
equations) we get β = ±√(a² − ρ²). Using Eq. (9.20) (with the corresponding expressions
for α and β) we see that, for a > ρ, there are two limiting spheres, the position vectors of
the centers of which are given by
x0 = x(s0) + ρn̂ ± √(a² − ρ²) b̂. (9.22)
If we select a = ρ then the sphere has its center in the osculating plane. The intersection of
this sphere and the osculating plane is a circle of radius ρ and is called the osculating circle,
or the circle of curvature.
We define the order of contact between two curves in the same way as we did for a
curve and a plane. It turns out that the order of contact between the osculating circle and
the space curve is at least two.
κ = κ(s), τ = τ(s),
which are called the natural, or intrinsic equations of a curve. We know that two congruent
curves have the same natural equations. We now show that the reverse implication is also
true: Two curves having the same natural equations are congruent.
Let the two curves be x = x1 (s ) and x = x2 (s ). By a rigid motion, we can make the
points corresponding to s = 0 coincide such that the moving trihedrals at these points
coincide. Now using Eqs (9.9), (9.13) and (9.14) it is straightforward to show that
(d/ds)(t̂1 · t̂2 + n̂1 · n̂2 + b̂1 · b̂2) = 0,
or,
t̂1 · t̂2 + n̂1 · n̂2 + b̂1 · b̂2 = constant.
Therefore, since the moving trihedrals coincide at s = 0,
t̂1 · t̂2 + n̂1 · n̂2 + b̂1 · b̂2 = 3. (9.25)
Since t̂, n̂, b̂ are unit vectors, it follows from Eq. (9.25) that each of the three dot products
separately equals unity, and that Eq. (9.24) applies for all s; not only at s = 0. From t̂1 · t̂2 = 1 we get,
x′1 = x′2
so that
x1 = x2 + c,
where c is the constant of integration. The initial condition x1 (0) = x2 (0) gives c = 0.
Therefore, for all s
x1 = x2
x(s) = x(0) + s x′(0) + (s²/2) x″(0) + (s³/3!) x‴(0) + · · · .
Again, expressing the derivatives of x(s ) in terms of curvature and torsion via Eqs (9.9),
(9.13) and (9.14), we get,
x(s) = x(0) + s t̂(0) + (s²/2) κ(0)n̂(0) + (s³/3!)[−κ²(0)t̂(0) + κ′(0)n̂(0) + κ(0)τ(0)b̂(0)] + · · · .
This Taylor series is equivalent to three scalar equations in terms of the components
x1 (s ), x2 (s ), x3 (s ) of x(s ) along triad basis t̂, n̂, b̂ with origin at s = 0. These are
x1(s) = s − κ²(0)s³/6 + · · ·
x2(s) = κ(0)s²/2 + κ′(0)s³/6 + · · ·
x3(s) = κ(0)τ(0)s³/6 + · · · (9.26)
We can use Eq. (9.26) to get the equations to the projections of the space curve on the
coordinate planes corresponding to the t̂, n̂, b̂ basis in the neighborhood of s = 0. Keeping
only the first terms in Eq. (9.26), the projections on the osculating plane, the rectifying
plane and the normal plane respectively are given by
x2(s) = (κ(0)/2) x1²(s),
x3(s) = (κ(0)τ(0)/6) x1³(s),
x3²(s) = (2τ²(0)/9κ(0)) x2³(s). (9.27)
Fig. 9.6 Projections of a space curve on the coordinate planes of a moving trihedral
Exercise Find the natural equations for the cycloid, parametrically given by (see the next
section on plane curves)
x(t) = a(t − sin t), y(t) = a(1 − cos t).
Solution Since the curve is planar, the unit vector b̂ is a constant vector always
perpendicular to the plane of the curve. Therefore, d b̂/ds = 0, that is, τ = 0. To get the
equation in κ, we have to find the arc length measured from some fixed point on the
curve, using the given parametric equations. We have,
ds = √(dx² + dy²) = 2a sin(t/2) dt,
giving
s = ∫ ds = ∫ from π to t of 2a sin(t/2) dt = −4a cos(t/2),
where s is measured from the top of the cycloid, that is, s = 0 at t = π. From Eq. (9.19),
the parametric equation of the curve and v = ẋ, a = ẍ we get
1/κ = 4a sin(t/2), that is, 1/κ² = 16a² sin²(t/2) = 8a²(1 − cos t).
Further,
s² = 16a² cos²(t/2) = 8a²(1 + cos t),
giving us the required equation,
1/κ² + s² = 16a².
Fig. 9.7 A construction for finding the equation of an involute C2 for a given evolute C1
and vice versa
or,
1 + u′ = 0,
or, integrating,
u = c − s,
where c is the constant of integration. Hence, the equation for the involute is
X = x(s ) + (c − s )t̂ (9.29)
for any given evolute x = x(s). Actually, for each value of c there will be an involute, so for
a given evolute there exists an infinite family of involutes. The same is true for a given involute.
Let X1 and X2 be two points on two involutes for c = c1 and c = c2 in Eq. (9.29)
corresponding to a point P on the evolute curve x(s ). Subtracting the equation of one
involute from that of the other, we get
X1 − X2 = (c1 − c2 )t̂
or,
|X1 − X2 | = |c1 − c2 |.
Exercise Show that the curvature K of the involute (9.29) is given by
K² = (κ² + τ²)/(κ²(c − s)²).
Solution From the Frenet formula for the involute,
dT̂/dS = K N̂.
Equation (9.29), coupled with T̂ · t̂ = 0, tells us that T̂ = ±n̂. Then
dT̂/dS = ±(dn̂/ds)(ds/dS).
Further, u = c − s coupled with Eq. (9.29) gives
T̂ = κ(c − s) n̂ (ds/dS),
from which we get
T̂ · T̂ = 1 = ±κ(c − s)(ds/dS).
Therefore,
dT̂/dS = ±(dn̂/ds) · [±1/(κ(c − s))] = (−κt̂ − τ b̂)/(κ(c − s)) = K N̂,
and
K² = (κ² + τ²)/(κ²(c − s)²).
We now solve the reversed problem: Given a space curve, C2 , to find space curves, denoted
C1 , of which the given curve is an involute. We follow the same notational convention as
before: lower case letters for the evolute C1 and capital letters for the involute C2 .
From the definition of the involute, we know that C2 must be perpendicular to every
tangent to the curve C1 we are seeking. Therefore, t̂ lies in the plane of N̂ and B̂. From
Eq. (9.29) we see that the targeted curve C1 , if it exists, is given by (see Fig. 9.9)
x = X − u t̂.
Since t̂ lies in the plane of N̂ and B̂, we can write this as
x = X + α N̂ + β B̂, (9.30)
dx/dS = dX/dS + α (dN̂/dS) + β (dB̂/dS) + (dα/dS) N̂ + (dβ/dS) B̂.
Using Eqs (9.9), (9.13) and (9.14) we get,
dx/dS = (1 − αK) T̂ + (dα/dS + Tβ) N̂ + (dβ/dS − Tα) B̂, (9.31)
where K and T are respectively the curvature and the torsion of the involute C2 . As
dx/dS = (dx/ds)(ds/dS) = t̂ (ds/dS)
and as t̂ must be a linear combination of N̂ and B̂, it follows that the coefficient of T̂ in
Eq. (9.31) must vanish, leading to
α = 1/K.
Thus, α (S ) is simply the radius of curvature of the involute at S.
Next convince yourself that X − x is parallel to dx/dS. Hence the coefficients of N̂ and
B̂ in Eqs (9.30) and (9.31) must be in the same ratio. We have,
α/β = (dα/dS + Tβ)/(dβ/dS − Tα),
or,
" Z !#
β = α tan T dS + C ,
where C is a constant of integration. Substituting the values of α and β in Eq. (9.30) we get
the equation of the evolute:
" Z !#
1 1
x = X + N̂ + tan T dS + C B̂. (9.32)
K K
Note that for a point P of the involute, the corresponding points Q1(x1), Q2(x2), . . . on the
evolutes for different values of C: C1, C2, . . . lie on a straight line parallel to the binormal
B̂ at P, because xi − xj, i ≠ j, i, j = 1, 2, . . . is proportional to B̂. Further, this line is at
a distance of 1/K (the radius of curvature at P) from P, because x − X has a component
along N̂, normal to B̂, of magnitude 1/K.
Exercise Obtain the equations for the evolutes of the circular helix.
Hint Specialize Eq. (9.32) to circular helix. All the required results are available in previous
exercises.
x²/a² + y²/b² = 1,
where (x, y ) are the coordinates of a point on the ellipse with respect to the coordinate
system based at the center of the ellipse and x, y axes along its major and minor axes
respectively. a and b are the lengths of the semi-major and the semi-minor axes
respectively. Introduce the parameter u by x = a cos u to get, via its non-parametric
equation above, y = b sin u. These are the parametric equations to the ellipse in terms of
the eccentric angle u. The position vector of any point P on the ellipse is given by
r = a cos u + b sin u, (9.33)
where a and b are vectors along the positive x and y directions with |a| = a and |b| = b so
that a · b = 0. This is depicted in Fig. 9.10.
The above equations to the ellipse tell us that an ellipse is obtained by reducing the
ordinates (y values) of all points on the circumscribing circle (see Fig. 9.10) by the factor b/a.
Thus, the ellipse can be viewed as the projection of a circle placed in an inclined position
with respect to the x − y plane. We see that the area of the ellipse is b/a times that of a circle
with radius equal to its semi-major axis a, that is, A = πab.
mr̈ + kr = 0,
mẍ + kx = 0 ; mÿ + ky = 0.
We can easily calculate |r1| = a + c cos u and |r2| = a − c cos u, giving us an important
geometric property of the ellipse,
|r1| + |r2| = 2a.
Thus, the sum of the distances from the two foci to any point of the ellipse is constant and
equals 2a. Indeed, the ellipse is popularly defined to be the locus of the points for which the
sum of the distances to two fixed points is constant. This property is used in the so called
gardener’s construction (Fig. 9.12(a)). Attach the ends of a cord of constant length 2a to
two fixed points and draw the curve by keeping the cord stretched by a lead pencil.
For the ends of the minor axis, |r1 | = a = |r2 |. Figure 9.12(b) illustrates the relation
a² = b² + c². For a point vertically above one of the foci, the coefficient of â is zero. In
this case, cos u = ±c/a = ±e. Consequently, sin u = √(1 − e²) = b/a and the value of y
is b sin u = b²/a. This value is denoted by p and is called the parameter or the semilatus-rectum
of the ellipse (see Fig. 9.12(b)). (2p is the latus-rectum.) The eccentricity e and the
parameter p are sufficient to fix the shape and size of the ellipse, just as are a and b.
Fig. 9.12 (a) Drawing ellipse with a pencil and a string (b) Semilatus rectum
(c) Polar coordinates relative to a focus
To get the equation of the ellipse in polar coordinates we identify r = |r1| to get, as derived
before,
r = a + c cos u. (9.35)
To get the θ coordinate (focal azimuth, see Fig. 9.12(c)) we note that
cos θ = x/r = (coefficient of â in r1)/r = (a cos u + c)/(a + c cos u). (9.36)
Eliminating u from Eqs (9.35) and (9.36) we find the polar equation to the ellipse,
1/r = (1/p)(1 − e cos θ). (9.37)
Exercise Extend the position vector r1 of a point P on the ellipse in the opposite direction
to get a chord of the ellipse. Let the chord be divided by the focus in the intercepts r1 and
r2 . Show that
1/r1 + 1/r2 = 2/p.
Hint Use Eq. (9.37) for r1 and r2 and note that cos(θ + π) = −cos θ. Thus, each chord
passing through the focus is divided by it into two parts such that their harmonic mean is
constant and equals p.
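A quick numeric check of this focal-chord property using the polar equation (9.37) (a sketch, with arbitrary ellipse parameters):

```python
import math

e, p = 0.6, 2.0   # eccentricity and semilatus rectum (illustrative values)

def r(theta):
    # polar equation relative to a focus: 1/r = (1 - e cos(theta))/p
    return p / (1 - e*math.cos(theta))

for theta in (0.3, 1.1, 2.5):
    r1, r2 = r(theta), r(theta + math.pi)   # the two intercepts of a focal chord
    assert abs(1/r1 + 1/r2 - 2/p) < 1e-12   # harmonic mean of r1, r2 equals p
```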
Thus, we have parameterized ellipse using the eccentric angle u with origin at the center
of the ellipse and using the polar coordinate θ with origin at one of the foci. We can
parameterize ellipse by the time of travel of a particle moving on it, tracing it in the sense
of increasing t, taking t = 0 when the particle was at the pericenter. Typical realization of
this situation is the motion of planets along their elliptic orbits around the sun, which sits
at one of the foci and interacts gravitationally with the planet. Actually, we are going to
re-parameterize the elliptical path of the planet by expressing t in terms of the eccentric
anomaly u.
We start with re-writing Eq. (9.33) as
r = z − ea, z = a cos u + b sin u, (9.38)
which is a parametric equation r = r(u) of the elliptic orbit. Now the task is to determine
the parameter u as a function of time u = u (t ) so that Eq. (9.38) directly gives the
dependence of r on t, r = r(t ). From Eq. (9.38) we get, for the specific angular
momentum (angular momentum per unit mass),
H = r × ṙ = z × ż − ea × ż,
or,
H dt = z × dz − ea × dz.
Now,
z × dz = a × b du.
Similarly,
a × dz = a × b cos u du.
Hence,
H dt = (1 − e cos u) a × b du. (9.42)
We now make use of the fact that H = |H| is twice the areal velocity of the planet, so that
HP = 2πab, where P is the period of the orbit. Substituting for H from this equation into
Eq. (9.42), integrating, and dividing out by ab, we finally get the desired equation relating
u and t, which can be combined with Eq. (9.38) to get the parameterization of the elliptic
orbit in time,
2πt/P = u − e sin u. (9.43)
This equation is called Kepler’s equation and can be used to obtain the position of the
planet on its orbit at a given time. To make use of this equation, we have to solve it for u
as a function of t. Unfortunately, the equation is transcendental and the solution cannot be
expressed in terms of elementary functions. It can be solved numerically using the method
of successive approximations [2, 17].
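A minimal sketch of such a successive-approximation (fixed point) solver, iterating u ← M + e sin u with mean anomaly M = 2πt/P:

```python
import math

def solve_kepler(M, e, tol=1e-12, max_iter=200):
    """Solve u - e*sin(u) = M for u by successive approximations."""
    u = M                       # starting guess
    for _ in range(max_iter):
        u_next = M + e*math.sin(u)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    return u

M, e = 1.2, 0.3                 # mean anomaly and eccentricity (sample values)
u = solve_kepler(M, e)
assert abs(u - e*math.sin(u) - M) < 1e-10
```

The iteration converges because the map u ↦ M + e sin u is a contraction for e < 1; for orbits of high eccentricity, a Newton step on f(u) = u − e sin u − M converges much faster.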
where n̂ is the vector normal to the plane defining the axis of rotation. Using Eq. (6.45) we
get, for the position vector of P at time t,
Writing x(t) = x(t)î + y(t)ĵ and equating the corresponding coefficients we get the
parametric equations for the cycloid,
x(t) = a(t − sin t), y(t) = a(1 − cos t).
Fig. 9.14 Epicycloid. Vectors are (i): c, (ii): a, (iii): a + c, (iv): −R(t, n̂)c, (v): R(t, n̂)(a + c), (vi): R((a/c)t, n̂)(−R(t, n̂)c), (vii): x(t)
Resolving this into components we get the parametric equations for the epicycloid,
x(t) = (a + c) cos t − c cos((a + c)t/c),
y(t) = (a + c) sin t − c sin((a + c)t/c). (9.45)
When a = c the curve is called a cardioid (Fig. 9.15) and is given by the parametric
equations
x(t) = 2a cos t − a cos(2t), y(t) = 2a sin t − a sin(2t).
A third kind of cycloid is the so called hypocycloid which is obtained exactly like the
epicycloid, except that the rolling circle of radius c is interior to the fixed circle of radius a
(see Fig. 9.16). Assuming that the initial position of the rolling point P is at the tip of the
vector a and proceeding exactly as in the case of the epicycloid, we find for c = a/2 that
x(t) = a cos t, y(t) = 0,
and the hypocycloid degenerates into the diameter of the fixed circle, traced out back and
forth (see Fig. 9.17). It is interesting to note that this example provides a way to draw a
straight line merely by means of circular motions.
For the case c = a/3 the parametric equations for the hypocycloid become
x(t) = (2/3)a cos t + (1/3)a cos(2t),
y(t) = (2/3)a sin t − (1/3)a sin(2t).
This can be converted to
x² + y² = (5/9)a² + (4/9)a² cos(3t),
so that the hypocycloid meets the fixed circle exactly at three points and the corresponding
curve appears in Fig. 9.16.
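The identity above is easy to confirm numerically. The following sketch (with an arbitrary test value of a, our choice) samples the c = a/3 hypocycloid and checks both the identity and the three contact points with the fixed circle.

```python
import math

a = 2.0  # radius of the fixed circle (arbitrary test value)

def hypocycloid(t):
    # the c = a/3 hypocycloid (the three-cusped deltoid)
    x = (2/3) * a * math.cos(t) + (1/3) * a * math.cos(2*t)
    y = (2/3) * a * math.sin(t) - (1/3) * a * math.sin(2*t)
    return x, y

# Verify x^2 + y^2 = (5/9) a^2 + (4/9) a^2 cos(3t) at many points.
for k in range(200):
    t = 2 * math.pi * k / 200
    x, y = hypocycloid(t)
    rhs = (5/9) * a**2 + (4/9) * a**2 * math.cos(3*t)
    assert abs(x*x + y*y - rhs) < 1e-12

# The curve touches the fixed circle (x^2 + y^2 = a^2) exactly
# where cos(3t) = 1, i.e., at t = 0, 2*pi/3, 4*pi/3.
for t in (0.0, 2*math.pi/3, 4*math.pi/3):
    x, y = hypocycloid(t)
    assert abs(x*x + y*y - a*a) < 1e-12
```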
Fig. 9.17 A point P on the rim of a circle rolling inside a circle of twice the radius
describes a straight line segment
x(t) = at − (a + c) sin t, y(t) = a − (a + c) cos t,
where a is the radius of the circle and a + c is the distance of P from its center. Note that, at t = 0 the position of P is (0, −c), or the vector c0 in Fig. 9.18. These curves appear as the
brachistochrones and tautochrones inside a gravitating homogeneous sphere [19].
As an example of a curve with a loop, consider the curve given by the parametric
equations x(t) = t² − 1, y(t) = t³ − t. As t varies from −∞ to +∞, the curve passes through the origin twice, for t = −1 and t = +1, while the point x(t) is unique for all other values of t
(Fig. 9.19). The interval −1 < t < +1 corresponds to a loop of the curve. The sense of
increasing t defines the sense of traversing the curve if we imagine the points
corresponding to t = −1 and t = +1 as distinct, one lying on top of the other.
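The double point at the origin can be checked directly from the parametric equations; this short sketch (our own illustration) does so.

```python
def curve(t):
    # the loop curve x = t^2 - 1, y = t^3 - 1*t from the text
    return (t**2 - 1, t**3 - t)

# The curve passes through the origin for both t = -1 and t = +1,
# the two parameter values bounding the loop.
assert curve(-1.0) == (0.0, 0.0)
assert curve(1.0) == (0.0, 0.0)

# Inside the loop (-1 < t < 1) the points are distinct from the origin.
x, y = curve(0.5)
assert (x, y) != (0.0, 0.0)
```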
The whole oriented curve can be decomposed into simple arcs, for example, into the
arcs corresponding to n ≤ t ≤ n + 1 where n runs over all integers. The standard example
of a closed curve is a circle parameterized by x (t ) = a cos t, y (t ) = a sin t, which
physically describes the uniform motion of a particle on a circle of radius a with t as time.
If t varies in any half-open interval α ≤ t < α + 2π the point P (x, y ) traverses the circle
counterclockwise exactly once. In general, a pair of continuous functions x (t ), y (t )
defined in a closed interval a ≤ t ≤ b represents a closed curve provided x (a) = x (b ) and
y(a) = y(b). The closed curve will be simple if (x(t1), y(t1)) = (x(t2), y(t2)) implies t1 = t2 whenever a ≤ t1, t2 < b.
The positive sense of traversing a closed curve is defined by the ordering of the points
P0 P1 P2 corresponding to t0 < t1 < t2 respectively (see Fig. 9.20). Note that any cyclic
permutation of the points P0 P1 P2 does not change the sense of traversing a closed curve.
If the curve C is a simple closed curve, it divides all points of the plane into two classes,
those interior to C and those exterior to C. We say that C has counterclockwise orientation
if its interior lies on the positive (that is, left) side (Fig. 9.22).
If the closed curve C consists of several loops, then it is not always possible to describe C
such that all enclosed regions are on the positive side of C (see Fig. 9.23).
(where the same sign must be taken in both the formulas) correspond to directions in which the tangent can be traversed. The corresponding angles α differ by an odd multiple of π. One of the two directions corresponds to increasing t, while the other corresponds to decreasing t. Since y′ = ẏ/ẋ = sin α/cos α, the positive direction of the tangent that corresponds to increasing values of t is the one that forms with the positive direction of the x axis an angle α for which cos α has the same sign as ẋ and sin α has the same sign as ẏ. The corresponding direction cosines are given by

cos α = ẋ/√(ẋ² + ẏ²) and sin α = ẏ/√(ẋ² + ẏ²).
If ẋ > 0, then the direction of increasing t on the tangent is that of increasing x and the
angle α has a positive cosine. Similarly, the normal direction resulting due to the rotation
of the positive tangent (given by increasing t) in the counterclockwise sense by π2 has the
unambiguous direction cosines
cos(α + π/2) = −ẏ/√(ẋ² + ẏ²), sin(α + π/2) = ẋ/√(ẋ² + ẏ²).
It is called positive normal direction and points to the ‘positive side’ of the curve (see
Fig. 9.24).
If we introduce a new parameter τ = χ(t) on the curve, then the values of cos α and sin α remain unchanged if dτ/dt > 0 and they change sign if dτ/dt < 0; that is, if we change the sense of the curve, then the positive sense of tangent and normal is likewise changed.
Sign of curvature
We know that the curvature of a plane curve is defined as the rate of change of direction of the tangent to the curve with the arc length parameter s, measured by dα/ds, where α is the angle made by the tangent with the positive direction of the x-axis. Since the absolute value of the difference between two values of s has an invariant geometric meaning, namely the distance between two points of the curve measured along the curve, the absolute value of κ, namely |κ| = |dα/ds|, does not depend on the choice of parameter. However, the sign of the difference of α values must always be taken to be the same as the sign of the difference of the corresponding s values.
Since we defined s to be an increasing function of t, the sign of κ depends on the sense
of the curve corresponding to increasing t. Obviously, κ > 0 if α increases with s, that is,
if the tangent to the curve turns counterclockwise as we trace the curve with increasing s
or t. This happens when the curve is convex towards the x-axis and the sense of increasing
s is from left to right, while the tangent turns clockwise, when traced in the same sense of
increasing s, if the curve is concave towards the x-axis. When κ > 0, the orientation of the curve C is such that the positive side of C is also the inner side of C, that is, the side towards which C curves (see Fig. 9.25).
Fig. 9.25 (a) A convex function with positive curvature, and (b) a concave function
with negative curvature
Exercise Find the curvature of the function y = x3 and find its sign in the regions x < 0
and x > 0. Check how the tangent turns as x increases in these regions.
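For a graph y = f(x), the signed curvature is κ = y″/(1 + y′²)^{3/2}; the following sketch (our own check of the exercise) evaluates it for y = x³.

```python
def curvature(x):
    # signed curvature of the graph y = x^3:
    # k = y'' / (1 + y'^2)^(3/2) with y' = 3x^2, y'' = 6x
    yp = 3 * x**2
    ypp = 6 * x
    return ypp / (1 + yp**2) ** 1.5

# For x < 0 the curvature is negative: the tangent turns clockwise.
assert curvature(-0.5) < 0
# For x > 0 it is positive: the tangent turns counterclockwise.
assert curvature(0.5) > 0
```

At x = 0 the curvature vanishes, which is the inflection point of y = x³ where the sense of turning of the tangent reverses.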
Note that f(s(t)) = f(s) since the corresponding values match. By Eq. (8.2) we can write

df/dt = [lim_{∆t→0} ∆f(s(t))/∆s(t)] [lim_{∆t→0} ∆s(t)/∆t]
= [lim_{∆t→0} ∆f(s)/∆s] [lim_{∆t→0} ∆s(t)/∆t]
= (df/ds)(ds/dt). (9.48)
Equation (9.48) gives us a rule for differentiating a compound function called the chain
rule.
and
∫_a^b [f(t) + g(t)] dt = ∫_a^b f(t) dt + ∫_a^b g(t) dt. (9.50)
∫_a^b a × f(t) dt = a × [∫_a^b f(t) dt] = −[∫_a^b f(t) dt] × a. (9.52)
Further, we have the “fundamental formula for the integral calculus” which evaluates the
integral of a derivative.
∫_a^b (df(t)/dt) dt = f(t)|_a^b = f(b) − f(a). (9.53)
Another fundamental result is the following formula for the derivative of an integral.
(d/dt) ∫_a^t f(s) ds = f(t), (9.54)
where s is the dummy variable of integration.
f(t + s) = f(t) + s ḟ(t) + (s²/2!) f̈(t) + · · · = Σ_{k=0}^{∞} (s^k/k!) (d^k/dt^k) f(t). (9.55)
Thus, if we know the values of the function and all its derivatives at t then its value at (t + s )
can be obtained as a power series in s. Such a function is said to be analytic at t. A function
analytic at all points in an interval is called analytic in that interval. A function analytic
over its entire domain is called an entire function. The Taylor series is very useful in applications because a complicated analytic function can be approximated by the polynomial obtained by truncating its Taylor series, ensuring the required accuracy. For a given analytic function, it is always possible to find the minimum number of terms in its Taylor series whose sum will give the value of the function within the required accuracy.
We can use the fundamental formula, Eq. (9.53), to obtain the Taylor expansion of an
analytic function along with the remainder after k terms.
The fundamental formula gives us
I = ∫_t^{t+s} ḟ(v) dv = f(t + s) − f(t).

This can be rewritten as I = ∫_0^s ḟ(t + s − u) du via the change of variables v = t + s − u. We can now integrate by parts to get, with ḟ(t) = (df/du)|_{u=t},
I = [u ḟ(t + s − u)]_0^s + ∫_0^s u f̈(t + s − u) du = s ḟ(t) + ∫_0^s u f̈(t + s − u) du.
f(t + s) = f(t) + I
= f(t) + s ḟ(t) + (s²/2!) f̈(t) + ∫_0^s (u²/2!) (d³/du³) f(t + s − u) du, (9.56)
giving the first three terms in the Taylor series, the last integral being the remainder term.
k − 1 successive integrations by parts give the first k terms of the series with the
corresponding remainder term involving kth derivative of f(t ). The remainder term can
be used to estimate the truncation error incurred by truncating the series after k terms.
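The truncation-error estimate can be checked numerically. The sketch below (our own illustration; the helper names are ours) truncates the Taylor series of cos, whose derivatives are all bounded by 1, so the remainder after k terms is bounded by |s|^k/k!.

```python
import math

def taylor_value(derivs, t, s, k):
    """Sum the first k terms  sum_{j<k} (s**j / j!) f^(j)(t)  of the
    Taylor series, where derivs(j, t) returns the j-th derivative
    of f at t."""
    return sum(s**j / math.factorial(j) * derivs(j, t) for j in range(k))

# The derivatives of cos repeat with period 4: cos, -sin, -cos, sin.
def cos_derivs(j, t):
    return [math.cos, lambda v: -math.sin(v),
            lambda v: -math.cos(v), math.sin][j % 4](t)

t, s = 0.3, 0.5
exact = math.cos(t + s)

# With enough terms the truncated series reproduces the function,
assert abs(taylor_value(cos_derivs, t, s, 14) - exact) < 1e-12
# and the error after k terms obeys the remainder bound |s|**k / k!
# (all derivatives of cos are bounded by 1).
for k in range(1, 10):
    assert abs(taylor_value(cos_derivs, t, s, k) - exact) \
        <= s**k / math.factorial(k) + 1e-15
```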
10 Functions with Vector Arguments
We now deal with the functions of vector arguments. These functions are either scalar
valued or vector valued, with corresponding one to one or many to one maps given by
f : E3 7→ R or by f : E3 7→ E3 . A vector valued function of vector argument is equivalent
to a triplet of scalar valued functions of vector argument given by
f(x) = f1(x)î + f2(x)ĵ + f3(x)k̂,
where f1,2,3(x) are the scalar valued functions of x given by the components of f(x) with respect to some orthonormal basis {î, ĵ, k̂}.
A scalar valued function of vector argument is equivalent to a function of three variables. A given function of three variables f(x, y, z) can be reduced to a function of a single variable by giving constant fixed values to any two of the variables, say y and z, and treating x as the only variable, varying over the allowed domain of x values. Such a function of a single variable, say x, can then be differentiated by using the standard definition of the derivative, assuming that it is a continuous and differentiable function of x.
This derivative is called the partial derivative of f(x, y, z) with respect to x. If we fix z = z0 then f(x, y, z) is reduced to the function f(x, y, z0) = f(x, y), which defines a surface in R³. If we now fix y = y0 we get the function f(x, y0, z0) = f(x), whose graph is the curve giving the intersection of the surface f(x, y) and the plane y = y0. Geometrically, the partial derivative of f(x, y, z) with respect to x is given by the tangent of the angle between a line parallel to the x axis and the tangent line to the curve u = f(x, y0, z0). It is, therefore, the slope of the surface u = f(x, y, z0) in the direction of the x axis (see Fig. 10.1).
Thus, the partial derivatives of f (x, y, z ) with respect to x, y, z are given by
lim_{∆x→0} [f(x + ∆x, y, z) − f(x, y, z)]/∆x = (∂f/∂x)(x, y, z) = fx(x, y, z),

and similarly for y; in particular,

lim_{∆z→0} [f(x, y, z + ∆z) − f(x, y, z)]/∆z = (∂f/∂z)(x, y, z) = fz(x, y, z). (10.2)
(∂f(x, y)/∂x)|_{x=1,y=2} = fx(1, 2) = (2x + 2y)|_{x=1,y=2} = 6.

We should not write this simply as ∂f(1, 2)/∂x, since f(1, 2) = 21 is a constant and has 0 as its x-derivative.
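The "hold the other variables fixed" prescription can be mimicked by a finite difference. The worked example does not reprint f itself; any f with fx = 2x + 2y and f(1, 2) = 21, for instance f(x, y) = x² + 2xy + y⁴, is consistent with it, and that is the (assumed) choice in this sketch.

```python
def f(x, y):
    # a sample function consistent with the worked example:
    # f_x = 2x + 2y, so f_x(1, 2) = 6, and f(1, 2) = 21
    return x**2 + 2*x*y + y**4

def partial_x(f, x, y, h=1e-6):
    # central-difference approximation to df/dx at (x, y):
    # y is held fixed and only x is varied
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

assert f(1, 2) == 21
assert abs(partial_x(f, 1.0, 2.0) - 6.0) < 1e-6
```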
Since the partial derivatives fx,y,z (x, y, z ) are the functions of three variables, they can
be again partially differentiated with respect to x, y, z. Assuming that the order of
differentiation does not matter, we get the six derivatives, namely, fxx , fxy , fxz , fyy , fyz , fzz
where fxx = ∂²f/∂x² = ∂fx/∂x, fxy = ∂²f/∂x∂y = ∂fy/∂x etc.
Exercise Assuming that the order of differentiation does not matter, how many partial
derivatives of order r of a function of n variables are possible?
Solution Let r1, r2, . . . , rn denote the number of occurrences of the variables x1, x2, . . . , xn in a possible rth order partial derivative of a function f(x1, x2, . . . , xn). We must have r1 + r2 + · · · + rn = r. A general arrangement can be viewed as r stars separated by n − 1 bars. For example, the eighth order partial derivative ∂⁸f/∂x1³∂x2∂x6⁴ of a six variable function corresponds to (r = 8 and n = 6) ∗∗∗|∗||||∗∗∗∗, where the string of stars ending at the kth bar (1 ≤ k ≤ n − 1) gives the order of differentiation with respect to the variable xk, and the string of stars following the (n − 1)th bar gives the order of differentiation with respect to the variable xn. If a pair of adjacent bars does not sandwich any stars, or if the last string of stars is absent, the differentiation with respect to the corresponding variable is absent. The total number of distinct distributions is then given by the number of ways of selecting r places out of n + r − 1 places to be filled by stars, the rest of the places being filled by bars (there are n + r − 1 stars and bars together). This is given by the binomial coefficient C(n + r − 1, r). Thus, there are C(n + r − 1, r) rth order partial derivatives of a function of n variables. A function of three variables has fifteen derivatives of fourth order and 21 derivatives of fifth order.
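The stars-and-bars count is a one-line computation; this sketch (the function name is ours) reproduces the two values quoted above.

```python
from math import comb

def num_partial_derivatives(n, r):
    # number of distinct r-th order partial derivatives of a function
    # of n variables, assuming the order of differentiation is
    # immaterial: C(n + r - 1, r) by the stars-and-bars argument
    return comb(n + r - 1, r)

# A function of three variables:
assert num_partial_derivatives(3, 4) == 15   # fourth order
assert num_partial_derivatives(3, 5) == 21   # fifth order
assert num_partial_derivatives(3, 2) == 6    # fxx, fxy, fxz, fyy, fyz, fzz
```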
In the last section we saw that the existence of the derivative of a function of a single scalar
variable guarantees the continuity of the function. In contrast to this, the existence of the
partial derivatives fx,y,z (x, y, z) does not imply the continuity of f(x, y, z). Thus, for example, the function u(x, y) = 2xy/(x² + y²) for (x, y) ≠ (0, 0), u(0, 0) = 0, is continuous as a function of x for any fixed y and is also continuous as a function of y for any fixed x, so that it has partial derivatives everywhere. However, it is discontinuous at (0, 0), as its value at all points on the line x = y is 1 except at (0, 0). However, we have the
following results, which we state here without proof.
If a function f (x, y, z ) has partial derivatives fx , fy and fz everywhere in an open set
R and these derivatives everywhere satisfy the inequalities
Further, if both the partial derivative of order r and the partial derivative obtained by
changing the order of differentiation in any way are continuous in a region R, then both
these derivatives are equal in R, that is, the order of differentiation is immaterial. This
makes the number of partial derivatives of rth order of f (x, y, z ) decidedly smaller than
otherwise expected, as we have calculated in the previous exercise.
ux = (∂u/∂ξ)(∂ξ/∂x) + (∂u/∂η)(∂η/∂x) + · · · = uξ ξx + uη ηx + · · ·
uy = (∂u/∂ξ)(∂ξ/∂y) + (∂u/∂η)(∂η/∂y) + · · · = uξ ξy + uη ηy + · · ·
uz = (∂u/∂ξ)(∂ξ/∂z) + (∂u/∂η)(∂η/∂z) + · · · = uξ ξz + uη ηz + · · · (10.3)
In order to prove Eq. (10.3) all that we need to use is that all functions involved are
differentiable. We have,
ξ(x + ∆x, y + ∆y, z + ∆z) − ξ(x, y, z)
= ([ξ(x + ∆x, y + ∆y, z + ∆z) − ξ(x, y + ∆y, z + ∆z)]/∆x) ∆x
+ ([ξ(x, y + ∆y, z + ∆z) − ξ(x, y, z + ∆z)]/∆y) ∆y
+ ([ξ(x, y, z + ∆z) − ξ(x, y, z)]/∆z) ∆z. (10.5)
By differentiability of ξ(x, y, z) we mean that replacing the difference quotients on the RHS of this equation by the respective partial derivatives introduces an error which is
• proportional to the Euclidean distance traversed as we go from (x, y, z) to (x + ∆x, y + ∆y, z + ∆z), that is, the error is given by ερ, where ρ = √(∆x² + ∆y² + ∆z²), and
• such that ε → 0 as ρ → 0, that is, the error goes to zero faster than ρ.
Thus, we can write, up to first order of smallness in ρ (that is, neglecting the terms of second and higher order in ρ in the expression for the error, if any),
∆ξ = ξx ∆x + ξy ∆y + ξz ∆z.
This is exactly the same as replacing the distance traversed between two points along a path by the Euclidean distance between these two points, a procedure we have employed before.
Similarly, we get,
∆η = ηx ∆x + ηy ∆y + ηz ∆z.
∆u = uξ ∆ξ + uη ∆η + · · ·
∆u = ux ∆x + uy ∆y + uz ∆z.
Comparing the last two equations for ∆u we get Eq. (10.3), which is called the chain rule
for differentiating a compound function of several variables.
Exercise Find expressions for all second order derivatives of u.
Exercise Find all partial derivatives of the first and the second order with respect to x
and y for the following functions of x, y:
Functions with Vector Arguments 271
(i) u = v log w where v = x² and w = 1/(1 + y).
(ii) u = e^{vw}, where v = ax and w = cos y.
(iii) u = v tan⁻¹ w where v = xy/(x − y) and w = x²y + y − x.
(iv) u = g(x² + y², e^{x−y}).
(v) u = tan(x tan⁻¹ y).
Equations (10.6), (10.7) define a new operator, called the ‘grad’ or ‘del’ operator, which operates on a scalar valued function f(x) and returns a vector valued function (∇f)(x) via

(∇f)(x) = (∂f/∂x)(x) î + (∂f/∂y)(x) ĵ + (∂f/∂z)(x) k̂, (10.7)
where î, ĵ, k̂ is the orthonormal basis in which coordinates of x are (x, y, z ). Note that the
notation ∇(f (x)) is meaningless because f (x) is a number and the del operator does not
act on a number.
For this definition to be useful, we must show that it is invariant under a change of basis, that is, it is the same for all orthonormal bases, so that (∇f)(x) has the same value at x irrespective of the basis used to evaluate it. We do this by treating ∇f as a vector with coordinates u1 = ∂f/∂x1, u2 = ∂f/∂x2, u3 = ∂f/∂x3 with respect to the coordinate system corresponding to the basis î, ĵ, k̂. Let the
coordinates of a vector x in this coordinate system be x1 , x2 , x3 . We know that a new
coordinate system is obtained from the old one by rotating and/or translating it. Hence,
the new coordinates say x10 , x20 , x30 are related to the old ones by
xj′ = Σ_{k=1}^{3} ajk xk + bj,
where [ajk ] is an orthogonal matrix, whose inverse equals its transpose, and b is the vector
by which the origin of the old system is translated. Due to orthogonality of the
transformation [ajk ], the old coordinates can be re-expressed in terms of the new ones as
xk = Σ_{j=1}^{3} ajk (xj′ − bj), k = 1, 2, 3.
where we have used the chain rule. Thus, under the coordinate transformation the operator
∇f transforms like a vector and its components in the transformed system are given by the
partial derivatives of the transformed function with respect to the transformed coordinates.
Given a vector x the vector (∇f )(x) is the same irrespective of the coordinate system used
to evaluate it.
We are now equipped to define the directional derivative of a function of three variables.
Given a scalar valued function f (x) ≡ f (x, y, z ), its derivative in a direction â is given by
(â · ∇)f = nx (∂f/∂x) + ny (∂f/∂y) + nz (∂f/∂z), (10.8)
where (nx , ny , nz ) are the direction cosines of â with respect to the basis î, ĵ, k̂. This is
called the directional derivative of f (x) in the direction â. Henceforth, we shall drop the
parentheses in the expressions for the del operator and the directional derivative, implying
their actions implicitly. Also, we may allow replacing the unit vector â by a general vector a
(with magnitude different than unity) in the definition of the directional derivative. In that
case the direction cosines in Eq. (10.8) are replaced by the components of a.
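Equation (10.8) can be checked against the limit definition of the directional derivative. The sketch below uses a sample scalar field of our own choosing, with its gradient written out by hand, and a small but finite τ in place of the limit.

```python
import math

def f(x, y, z):
    # sample scalar field (our choice, not from the text)
    return x**2 * y + math.sin(z)

def grad_f(x, y, z):
    # its gradient, computed by hand
    return (2*x*y, x**2, math.cos(z))

def directional(f, p, a, tau=1e-6):
    # limit definition:  a . grad f  =  lim ( f(x + a tau) - f(x) ) / tau
    x, y, z = p
    ax, ay, az = a
    return (f(x + ax*tau, y + ay*tau, z + az*tau) - f(x, y, z)) / tau

p = (1.0, 2.0, 0.5)
a = (0.6, 0.8, 0.0)          # a unit vector: components = direction cosines
g = grad_f(*p)
via_grad = sum(ai * gi for ai, gi in zip(a, g))   # Eq. (10.8)
assert abs(directional(f, p, a) - via_grad) < 1e-4
```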
Exercise Let f(x) ≡ f(x1, x2, x3) be a differentiable scalar valued function with {x1, x2, x3} referring to the orthonormal basis {î1, î2, î3}. Show that ∂f/∂xk, k = 1, 2, 3 are the directional derivatives along {î1, î2, î3} respectively.
Solution Notice that ∂f/∂xk = îk · ∇f, k = 1, 2, 3.
Exercise Find the directional derivative of a scalar field f (x) along a continuous and
differentiable curve parameterized by time t.
Answer This is simply the total time derivative of the function f (x(t )) with respect to t
as can be seen from (see section 10.7)
df/dt = ẋ · ∇f(x).
The RHS is just the directional derivative of f (x) in the direction of the velocity or the
tangent vector to the curve.
The concept of the directional derivative can be quite simply generalized to vector fields as
follows. The directional derivative of a vector field f(x) ≡ (f1 (x), f2 (x), f3 (x)) along â is
given by a vector with components (â · ∇f1 (x), â · ∇f2 (x), â · ∇f3 (x)) in the same basis in
which the field f(x) is resolved. Since each component of ∇f is invariant under a change of basis, so is ∇f itself.
Another elegant approach called geometric calculus is developed by D. Hestenes and
collaborators in the context of functions with multivector arguments. This approach can
be adapted to both the scalar as well as vector valued functions of vector arguments, vectors
being a special case of multivectors. This is a coordinate-free approach, where arguments
of functions are treated as vectors as such, without resolving them into components using
a particular basis. The increment ∆x in the vector argument x is decomposed as aτ where
the vector a gives the direction and τ is a scalar variable. The directional derivative is then
defined as1
a · ∇f(x) = lim_{τ→0} [f(x + aτ) − f(x)]/τ. (10.9)
Note that this definition is meaningful even if the function f is vector valued, because
the limit defining it is meaningful in that case. We will show below, for a scalar valued
function, that the definitions of the directional derivative given by Eqs (10.8) and (10.9) are
equivalent. Thus, the LHS of Eq. (10.9) can be viewed as the dot product of a and ∇f (x).
In this section, unless stated otherwise, the same symbol f will represent both the scalar
and vector valued function and the corresponding result applies in both the cases.
We now obtain some basic results regarding the directional derivative. Consider
(a + b) · ∇f(x) = lim_{τ→0} [f(x + aτ + bτ) − f(x)]/τ
= lim_{τ→0} {[f(x + aτ + bτ) − f(x + aτ)]/τ + [f(x + aτ) − f(x)]/τ}
= b · ∇f(x) + a · ∇f(x), (10.10)

and

(ca) · ∇f(x) = c lim_{cτ→0} [f(x + cτa) − f(x)]/(cτ) = c (a · ∇f(x)). (10.11)
(ii) Assuming either f or g or both to be scalar valued, or, by replacing the product f g
by the dot product f · g if both are vector valued, show that
a · ∇(f g ) = (a · ∇f )g + f (a · ∇g ). (10.13)
We refer to this as the “product rule”. It is trivial to check that a·∇cf (x) = ca·∇f (x). This
equation and Eq. (10.12) together show that the directional derivative is a linear operator.
Now let f be a scalar valued function of a scalar argument and let λ(x) be a scalar
valued function of a vector argument. Then, the directional derivative of the compound
function f (λ(x)) is
a · ∇f = lim_{τ→0} {[f(λ(x + aτ)) − f(λ(x))]/[λ(x + aτ) − λ(x)]} {[λ(x + aτ) − λ(x)]/τ}
= [lim_{∆λ→0} (f(λ + ∆λ) − f(λ))/∆λ] [lim_{τ→0} (λ(x + aτ) − λ(x))/τ]
= (a · ∇λ) (df/dλ). (10.14)
This is the chain rule for the directional derivative of a compound function.
The directional derivative of the vector valued and the scalar valued constant functions
f(x) = b and f (x) = c are trivially zero, as seen from its definition. We have
a · ∇b = 0 = a · ∇c.
It follows directly from the definition of the directional derivative that the directional
derivative of the identity function I (x) = x is
a · ∇x = a.
We can use general rules Eqs (10.12) and (10.13) to find the derivatives of more
complicated functions. The derivatives of algebraic functions of x can be obtained in this
way. For example, we note that the “magnitude function” |x| is related to x by the algebraic equation |x|² = x · x. Using Eq. (10.13) we can write a · ∇|x|² = 2|x| (a · ∇|x|) = a · ∇(x · x) = 2 a · x, so that a · ∇|x| = a · x̂. Then,

a · ∇(x/|x|) = (a · ∇x)/|x| − (x/|x|²)(a · ∇|x|) = a/|x| − (a · x̂) x/|x|².

Hence,

a · ∇x̂ = [a − (a · x̂) x̂]/|x|. (10.17)
Exercise Find the derivatives
(a) a · ∇(x × b), where b is a constant vector independent of x.
Answer a × b. Follows from the definition of the directional derivative.
(b) a · ∇(x × (x × b)).
Answer (a · b)x + (x · b)a − 2(a · x)b.
Hint Use identity I and the product rule.
Exercise Let r = r(x) = x − x0, r = |r| = |x − x0|, where x0 is independent of x. Show that
(a) a · ∇r = a · r̂.
(b) a · ∇r̂ = [a − (a · r̂) r̂]/r.
(c) a · ∇(r̂ · a) = [a² − (a · r̂)²]/r.
(d) a · ∇(r̂ × a) = (a · r̂)(a × r̂)/r.
(e) a · ∇|r̂ × a| = −(r̂ · a)|r̂ × a|/r.
(f) a · ∇(r̂/r) = [a − 2(a · r̂) r̂]/r².
(g) a · ∇(1/r²) = −2 (a · r̂)/r³.
(h) (1/2)(a · ∇)²(1/r²) = [3(a · r̂)² − |r̂ × a|²]/r⁴.
(i) (1/6)(a · ∇)³(1/r²) = 4 (a · r̂)[|r̂ × a|² − (a · r̂)²]/r⁵.
(j) a · ∇ log r = (a · r̂)/r.
(k) a · ∇r^{2k} = 2k (a · r) r^{2(k−1)}.
(l) a · ∇(r^{2k} r) = r^{2k} (a + 2k (a · r̂) r̂).
In the last two cases k ≠ 0 is an integer and r ≠ 0 if k < 0.
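Identities of this kind are easy to spot-check numerically from the limit definition (10.9). The following sketch (the point, direction, and helper names are our own test choices) verifies items (a) and (g) with a small finite τ.

```python
import math

def dot(u, v): return sum(p*q for p, q in zip(u, v))
def norm(u): return math.sqrt(dot(u, u))

def directional(F, x, a, tau=1e-7):
    # a . grad F by the limit definition (10.9), for scalar valued F
    xs = tuple(xi + ai*tau for xi, ai in zip(x, a))
    return (F(xs) - F(x)) / tau

x0 = (0.2, -0.1, 0.4)          # fixed point; r = x - x0 as in the exercise
a = (0.3, 0.5, -0.2)

def rmag(x):
    return norm(tuple(xi - x0i for xi, x0i in zip(x, x0)))

x = (1.0, 0.7, -0.3)
rm = rmag(x)
rhat = tuple((xi - x0i)/rm for xi, x0i in zip(x, x0))

# (a)  a . grad r = a . rhat
assert abs(directional(rmag, x, a) - dot(a, rhat)) < 1e-5
# (g)  a . grad (1/r^2) = -2 (a . rhat) / r^3
assert abs(directional(lambda y: 1/rmag(y)**2, x, a)
           + 2*dot(a, rhat)/rm**3) < 1e-4
```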
It is quite easy to see that the definition of the directional derivative in Eq. (10.8) follows from that in Eq. (10.9). Choosing an orthonormal basis {ê1, ê2, ê3} and denoting the components of x with respect to this basis by x1, x2, x3, we get

â · ∇f = (n1 ê1 + n2 ê2 + n3 ê3) · ∇f = n1 (ê1 · ∇f) + n2 (ê2 · ∇f) + n3 (ê3 · ∇f)
= n1 (∂f/∂x1) + n2 (∂f/∂x2) + n3 (∂f/∂x3). (10.18)
The second equality follows from Eq. (10.10) and the last equality follows because the
directional derivative (Eq. (10.9)) along the direction of one of the basis vectors reduces to
the corresponding partial derivative (Eq. (10.8)).
We note that the action of the ‘del ’ operator on a scalar valued function f (x) is the
same as that of the linear operator
∇ ≡ Σ_{j=1}^{3} êj (∂/∂xj) = ê1 (∂/∂x1) + ê2 (∂/∂x2) + ê3 (∂/∂x3). (10.19)
Using the linearity of the directional derivative and Eq. (10.18) we can express the
directional derivative of a vector valued function in terms of the partial derivatives of its
component functions in an orthonormal basis (ê1 , ê2 , ê3 ). We have,
â · ∇f = (n1 ∂f1/∂x1 + n2 ∂f1/∂x2 + n3 ∂f1/∂x3) ê1 + (n1 ∂f2/∂x1 + n2 ∂f2/∂x2 + n3 ∂f2/∂x3) ê2
+ (n1 ∂f3/∂x1 + n2 ∂f3/∂x2 + n3 ∂f3/∂x3) ê3. (10.20)
We can replace the unit vector â by a vector a of arbitrary (usually small) magnitude in the
same direction, without losing generality. In this case the components of the unit vector â,
namely, the direction cosines n1 , n2 , n3 are replaced by the components of a that is, by
a1 , a2 , a3 . The components of the directional derivative of the vector valued function f(x),
usually called a vector field, are completely specified by the matrix product
is called the Jacobian matrix of the differentiable map x 7→ f(x). When evaluated at a
particular value of x, we get a matrix whose elements are numbers, called Jacobian matrix
at x, denoted J (x). The Jacobian matrix plays the role of the derivative of the vector valued
function of a vector variable, because it gives a linear approximation to f at x, just as the derivative does for a function of a single variable. Extending this analogy further, we call the
linear map defined by the Jacobian J (x) to be the map tangent to f at x. The Jacobian
matrix can be generalized to a differentiable map f : Rn 7→ Rm defining the derivative of
such a map. When m = n (in our case m = n = 3) we can evaluate the determinant of the
Jacobian J (x) which is called the Jacobian determinant of f(x) at x, denoted |J (x)|.
x = r cos θ, y = r sin θ
which maps a rectangle into a circular sector (see Fig. 10.2). Find the Jacobian matrix and
the Jacobian determinant of this mapping. Find all points (r, θ ) where the Jacobian
determinant vanishes.
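A minimal numerical sketch of this exercise follows; it builds the Jacobian matrix of the polar-coordinate map column by column with central differences (the helper names and the test point are ours). Analytically J = [[cos θ, −r sin θ], [sin θ, r cos θ]], so |J| = r, which vanishes only at r = 0.

```python
import math

def polar_map(r, theta):
    # the mapping x = r cos(theta), y = r sin(theta)
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(r, theta, h=1e-6):
    # central-difference Jacobian matrix of (x, y) w.r.t. (r, theta)
    def col(i):
        dr, dth = (h, 0.0) if i == 0 else (0.0, h)
        xp, yp = polar_map(r + dr, theta + dth)
        xm, ym = polar_map(r - dr, theta - dth)
        return ((xp - xm) / (2*h), (yp - ym) / (2*h))
    (xr, yr), (xth, yth) = col(0), col(1)
    return [[xr, xth], [yr, yth]]

J = jacobian(2.0, 0.7)
det = J[0][0]*J[1][1] - J[0][1]*J[1][0]
assert abs(det - 2.0) < 1e-6    # Jacobian determinant equals r
```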
G (τ ) = f (x + aτ ).
Using this definition of G (τ ) and the definition of the directional derivative (Eq. (10.9)) it
is clear that
(dG/dτ)(0) = a · ∇f(x),
(d²G/dτ²)(0) = a · ∇(a · ∇f(x)) ≡ (a · ∇)² f(x),
. . .
(dᵏG/dτᵏ)(0) = a · ∇((a · ∇)^{k−1} f(x)) = (a · ∇)ᵏ f(x). (10.21)
We now expand the function with scalar argument G (τ ) in Taylor series about τ = 0 and
evaluate it at τ = 1. We get
G(1) = G(0) + (dG/dτ)(0) + (1/2!)(d²G/dτ²)(0) + · · · = Σ_{k=0}^{∞} (1/k!)(dᵏG/dτᵏ)(0).
Using Eq. (10.21) we can express this Taylor series in terms of f (x). This gives the desired
Taylor expansion
f(x + a) = f(x) + a · ∇f(x) + [(a · ∇)²/2!] f(x) + · · ·
= Σ_{k=0}^{∞} [(a · ∇)ᵏ/k!] f(x) ≡ e^{a·∇} f(x), (10.22)
Each element in the second term is simply the dot product of a and ∇fk (x), k = 1, 2, 3.
The second term is given by the product of the Jacobian matrix and the vector
a ↔ [a1 a2 a3 ]T .
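Expansion (10.22) can be verified directly for a field whose directional derivatives are easy to iterate. In the sketch below (our own test case), f(x) = exp(x·c) satisfies (a·∇)ᵏf = (a·c)ᵏ f, so the operator series sums to e^{a·c} f(x) = f(x + a).

```python
import math

# f(x) = exp(x . c):  (a . grad)^k f = (a . c)^k f, so the series
# in Eq. (10.22) sums to exp(a . c) f(x) = f(x + a).
c = (0.4, -0.3, 0.2)
def f(x):
    return math.exp(sum(xi*ci for xi, ci in zip(x, c)))

x = (1.0, 2.0, -1.0)
a = (0.5, 0.1, 0.3)
a_dot_c = sum(ai*ci for ai, ci in zip(a, c))

# partial sums of  sum_k (a . grad)^k / k!  applied to f at x
series = sum(a_dot_c**k / math.factorial(k) * f(x) for k in range(30))
shifted = f(tuple(xi + ai for xi, ai in zip(x, a)))
assert abs(series - shifted) < 1e-12
```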
such that g (∆x) → 0 as |∆x| → 0. Note that the first term on RHS is simply the
directional derivative in the direction of ∆x. This equation implies that, for any required
accuracy, we can choose |a| small enough so as to make the first two terms of the Taylor
series in Eq. (10.22) give the value of f (x + a) within the required accuracy. In other
words, the remainder term in the series obtained after truncating it after the second term,
namely the last term in the equation
can be chosen as small as we please by choosing |a| small enough.2 We call the directional
derivative appearing in this equation, namely, a · ∇f (x), the differential of the function
f (x). For a scalar valued function, f (x), Eq. (10.23) is the equation for a line in 3-D in
the range x and x + dx. Thus, we see that, for any scalar valued differentiable function, the
differential provides a linear approximation to that function in a small enough range of its
argument. For a vector valued function f(x) Eq. (10.24) now becomes a vector equation,
with f replaced by
f(x + a) = [f1(x), f2(x), f3(x)]ᵀ + [a · ∇f1(x), a · ∇f2(x), a · ∇f3(x)]ᵀ + |a| [g1(a), g2(a), g3(a)]ᵀ,
where the scalar valued functions fk (x), k = 1, 2, 3 are the components of f(x) with
respect to some orthonormal basis. The term in the middle, involving the gradients, is
precisely equal to the product of the Jacobian matrix times a ≡ [a1 a2 a3 ]T . Since
lim|a|→0 gk (a) = 0; k = 1, 2, 3 we can make the linear approximation to f(x) at x by the
Jacobian matrix J (x) as accurate as we please by making |a| small enough.
To appreciate the importance of the differential (which is the same as the directional derivative of a scalar valued or vector valued function f(x) in the direction of a fixed vector a), we view it as a function of a for fixed x, say F(a). We have already shown that the
differential is a linear function of a (see Eqs (10.10),(10.11)). Expanding the Taylor series
about the point x0 and putting ∆x = x − x0 we have, for small enough |∆x|,
Note that the vector x − x0 in the differential (x − x0 ) · ∇f (x0 ) plays the role of vector a in
a · ∇f (x) which is a linear function of a. Therefore, using this linearity and Eq. (10.25) we
get, to the first order in ∆x
f(x) − f(x0) = (x − x0) · ∇f(x0) = x · ∇f(x0) − x0 · ∇f(x0) = F(x) − F(x0). (10.26)
If we couple linearity of F (x) with Eq. (10.26), we see that the differential provides a linear
approximation to any differentiable function. Since linear functions are simple enough to
be analyzed completely, Eq. (10.26) establishes the importance of the differential. Note
that Eqs (10.25) and (10.26) apply to both the scalar valued as well as the vector valued
function f (x).
2 Compare with f(a + h) = f(a) + h (df/dx)|_{x=a} + |h| g(h), where lim_{h→0} g(h) = 0.
However, both x and ∆x are now not arbitrary, but x must satisfy x = x(t ) and ∆x must
join the point x(t ) and a neighboring point on the path given by x(t + ∆t ), that is,
∆x = x(t + ∆t ) − x(t ). Therefore, the variation of f (x) along the curve is given by
Now we subtract f (x(t )) from both sides, divide by ∆t on both sides and take the limit as
∆t → 0 on both sides to get the desired result,
(df/dt)(x(t)) = ẋ(t) · ∇f(x(t)) = Σ_{i=1}^{3} ẋi(t) (∂f/∂xi), (10.27)

where ẋi, i = 1, 2, 3 and xi, i = 1, 2, 3 are the components of ẋ(t) and x(t) respectively,
with respect to some orthonormal basis. Thus, the time rate of change of a function of the
position vector of a particle, as it moves along its path, is given by the directional derivative
of this function along the direction of the velocity vector, which is tangent to the path in
the same sense as traversed by the particle.
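Equation (10.27) can be checked on a concrete path. In the sketch below (field, path, and point are our own choices) a particle moves on a helix, and the chain-rule value ẋ·∇f is compared with a direct numerical derivative of the composite map t ↦ f(x(t)).

```python
import math

def f(x, y, z):
    # sample scalar field (our choice)
    return x*y + z**2

def grad_f(x, y, z):
    return (y, x, 2*z)

def path(t):
    # particle moving on a helix
    return (math.cos(t), math.sin(t), t)

def velocity(t):
    return (-math.sin(t), math.cos(t), 1.0)

t = 0.8
# chain rule (10.27): df/dt = xdot . grad f, evaluated on the path
chain = sum(v*g for v, g in zip(velocity(t), grad_f(*path(t))))
# direct central-difference derivative of t -> f(x(t))
h = 1e-6
direct = (f(*path(t + h)) - f(*path(t - h))) / (2*h)
assert abs(chain - direct) < 1e-6
```

Here f(x(t)) = cos t sin t + t², so both computations should give cos 2t + 2t.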
Exercise The Lagrangian of a system with n degrees of freedom is a function of 2n + 1
variables, namely, L (q(t ); q̇(t ); t ) where q(t ) ≡ q1 (t ), q2 (t ), . . . , qn (t ) are the
generalized coordinates and q̇(t ) ≡ q̇1 (t ), q̇2 (t ), . . . , q̇n (t ) are the generalized velocities.
The motion is viewed as the path traced by a point in the configuration space spanned by
the n generalized coordinates. Similarly, the Hamiltonian of such a system is given as a
function of 2n + 1 coordinates, namely, H (q(t ); p(t ); t ) where p(t ) ≡ p1 , p2 , . . . , pn are
the generalized momenta. The motion is viewed as that of a point in phase space spanned
by n generalized coordinates and n generalized momenta. Find the expressions for dL/dt and dH/dt.
Answer

dL/dt = Σ_{i=1}^{n} (∂L/∂qi) q̇i + Σ_{i=1}^{n} (∂L/∂q̇i) q̈i + ∂L/∂t,

where q̈i = (d/dt) q̇i, and

dH/dt = Σ_{i=1}^{n} (∂H/∂qi) q̇i(p(t)) + Σ_{i=1}^{n} (∂H/∂pi) ṗi + ∂H/∂t,
where the generalized velocities are the functions of generalized momentum vector
p(t ).
If the function f(x(t)) is vector valued, Eq. (10.27) can be expressed by invoking the Jacobian. We can write

df/dt (x(t)) = | ẋ(t) · ∇f1(x(t)) |
               | ẋ(t) · ∇f2(x(t)) |
               | ẋ(t) · ∇f3(x(t)) | ,

where fi(x(t)), i = 1, 2, 3, are the components of the function f(x(t)). The RHS of this
equation is simply the product of the Jacobian matrix of the function f(x(t )) and the
column matrix comprising the components of the vector ẋ(t ). Thus, we have found the
Chain rule for differentiating the composite function f (x(t )) or f(x(t )).
Suppose a vector field f(x) is the gradient of a scalar valued function φ(x),

f = ∇φ.

φ is called the potential of f. We know that â · ∇φ(x) is the directional derivative of the function φ(x) in the direction of the unit vector â, that is, the rate at which the value of φ changes in that direction. The quantity â · ∇φ has its maximum value when â and ∇φ are in the same direction, in which case â · ∇φ = |∇φ|. Thus, the gradient ∇φ(x) specifies both the direction as well as the magnitude of the maximum change in the value of φ(x) at any point x in the domain of φ. In general, the rate of change of φ in any given direction â, based at a point x, is given by the scalar product of â with ∇φ(x).
Fig. 10.3 The gradient vector is orthogonal to the equipotential at every point
∇(x · b) = b,  ∇x² = 2x. (10.28)

These formulas enable us to determine the gradients of certain functions without referring to the directional derivative at all. Thus, if f(|x|) is a function of the magnitude of x alone, then, by using the second of Eq. (10.28) while applying the chain rule Eq. (10.14), we get

∇f = x̂ ∂f/∂|x|. (10.29)
Later we will meet potential functions in connection with the line integrals over vector
fields derivable from a potential.
said to be C 1 -invertible if the image set f(U ) is an open set V and if there exists a C 1 -map
g : V 7→ U such that g ◦ f and f ◦ g are the respective identity maps on U and V . For
example, if f : E3 7→ E3 is given by f(x) = x + b where b is a fixed vector, then f is
C 1 -invertible, its inverse being the translation by −b.
Exercise Let U be the subset of R2 consisting of all pairs (r, θ ) with r > 0 and 0 < θ < π.
Let

f(r î + θ ĵ) = (r cos θ) î + (r sin θ) ĵ,

with x = r cos θ and y = r sin θ. Show that this is a C¹-map and find the image set f(U).
Show that the inverse map is given by
g(x î + y ĵ) = √(x² + y²) î + cos⁻¹( x/√(x² + y²) ) ĵ,

with r = √(x² + y²) and θ = cos⁻¹( x/√(x² + y²) ).
Answer The image of U is the upper half plane consisting of all (x, y ) such that y > 0,
and arbitrary x. Inverse can be checked explicitly.
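The inverse pair in this exercise is easy to check numerically; the sketch below (in my own notation, treating the maps as functions of two real variables) verifies that g recovers (r, θ) on U.

```python
import numpy as np

# f(r, theta) = (r cos theta, r sin theta); g recovers (r, theta) for 0 < theta < pi.
def f(r, theta):
    return r * np.cos(theta), r * np.sin(theta)

def g(x, y):
    r = np.hypot(x, y)
    return r, np.arccos(x / r)

r, theta = 2.5, 1.1           # a point of U: r > 0, 0 < theta < pi
x, y = f(r, theta)            # lands in the upper half plane (y > 0)
r2, theta2 = g(x, y)
print(bool(np.isclose(r2, r) and np.isclose(theta2, theta)))
```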
In many cases a map may not be invertible over the whole space or over arbitrary subsets
of it, but can still be C 1 -invertible locally in the following sense. Let a point x ∈ U . We say
that a map f is locally C 1 -invertible at x if there exists an open set U1 satisfying x ∈ U1 ⊂ U
such that f is C 1 -invertible on U1 .
Exercise Show that the map given by Eq. (10.30) is not C 1 -invertible on all of R2 , but
given any point, it is locally invertible at that point.
Hint If we take r < 0, the inverse map given in the previous exercise does not work. However, we can locally invert by choosing r = −√(x² + y²) in the inverse map at a point with r < 0.
In most cases the locally invertible map cannot be expressed in closed form. However, there is a very important result which gives a computable criterion for local invertibility of a map.
so that

|JF(a, b)| = ∂f/∂y (a, b),
which, by assumption, is not zero and the inverse mapping theorem then implies what we
are asked to prove.
The result of this exercise can be used to discuss implicit functions. We assume that the
function f : U 7→ R defined in the exercise has the value c at (a, b ), or, f (a, b ) = c.
We wish to find out whether there is some differentiable function y = φ(x ), defined near
x = a, such that φ(a) = b and
f (x, φ(x )) = c
for all x near a. If such a function φ exists, we say that y = φ(x ) is the function determined
implicitly by f .
We know that F(a, b) = (a, c) and that there exists a C¹-inverse G defined locally near (a, c). We can write

G(x, z) = (x, g(x, z))

for some function g. This equation shows that we have put z = f(x, y) and y = g(x, z).
We define
φ(x ) = g (x, c ).
This proves that f (x, φ(x )) = c. Furthermore, by definition of an inverse map, G (a, c ) =
(a, b ) so that φ(a) = b. This proves the implicit function theorem in two dimensions.
Exercise Show that the function f (x, y ) = x2 + y 2 implicitly defines a function y =
φ(x ) near x = 1. Find this function. Take (i) (a, b ) = (1, 1), (ii) (a, b ) = (−1, −1).
Answer

(i) c = f(1, 1) = 2. ∂f/∂y |_(1,1) = 2 ≠ 0, so the implicit function y = φ(x) near x = 1 exists. It can be found by explicitly solving 2 = x² + y²: y = √(2 − x²).

(ii) y = −√(2 − x²).
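A two-line check of this answer, for the branch (a, b) = (1, 1): φ should satisfy f(x, φ(x)) = c for x near 1, and φ(1) = 1.

```python
import math

def f(x, y):
    return x * x + y * y

def phi(x):                      # implicit function near x = 1, branch y > 0
    return math.sqrt(2 - x * x)

for x in (0.9, 1.0, 1.1):
    assert abs(f(x, phi(x)) - 2.0) < 1e-12   # f(x, phi(x)) = c = 2
assert abs(phi(1.0) - 1.0) < 1e-12           # phi(a) = b for (a, b) = (1, 1)
print("ok")
```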
In general, the equation f (x, y ) = c defines some curve as in Fig. 10.4(a). As indicated in
Fig. 10.4(b), we see that there is an implicit function near the point (a, b ), which exists only
for points near x = a and not for all x values. It is straightforward to generalize the implicit
function theorem to higher dimensional functions f : Rn 7→ R but we will not pursue it
here.
Fig. 10.4 Neighborhood of point (a, b) on f(x, y) = c is locally given by the implicit function y = φ(x)
Consider a transformation u = f(x) given in components by

u = φ(x, y, z),  v = ψ(x, y, z),  w = χ(x, y, z),

with x ≡ (x, y, z) and u ≡ (u, v, w). That is, we want to solve the equation u = f(x) for x, where u is a point near u0 = f(x0), and at x = x0 we must have, for the Jacobian determinant |Jf(x0)|,

           | φx(x0) φy(x0) φz(x0) |
|Jf(x0)| = | ψx(x0) ψy(x0) ψz(x0) | ≠ 0.
           | χx(x0) χy(x0) χz(x0) |
The differentials dx, dy, dz and du, dv, dw satisfy the linear relations (see Eq. (10.23))
du = dφ = φx dx + φy dy + φz dz
dv = dψ = ψx dx + ψy dy + ψz dz
dw = dχ = χx dx + χy dy + χz dz (10.31)
or,
du = Jf (x)dx (10.32)
where

        | φx(x) φy(x) φz(x) |
Jf(x) = | ψx(x) ψy(x) ψz(x) |
        | χx(x) χy(x) χz(x) |
or,

√( (∆x · ∇f(x)) · (∆x · ∇f(x)) )
= √( (hφx + kφy + lφz)² + (hψx + kψy + lψz)² + (hχx + kχy + lχz)² ), (10.33)

where φx,y,z, ψx,y,z, χx,y,z are the partial derivatives giving the row-wise elements of the Jacobian matrix and h, k, l are the components of ∆x. Let M denote an upper bound on the absolute values of all the elements of the Jacobian matrix taken at all points of the segment joining x and x + ∆x. This gives

√( (hφx + kφy + lφz)² + (hψx + kψy + lψz)² + (hχx + kχy + lχz)² )
≤ √3 M (|h| + |k| + |l|) ≤ 3M √(h² + k² + l²). (10.34)
|x − x0 | < δ (10.36)
of the point x0 in the domain R of f. Let u0 = f(x0 ). For a fixed u we write the equation
u = f(x) which is to be solved for x, in the form
x = g(x), (10.37)
where

g(x) = x + A(u − f(x)), (10.38)

with A an appropriately chosen fixed non-singular operator (or matrix) whose inverse is denoted by A⁻¹. Thus, Eq. (10.37) is equivalent to A(u − f(x)) = 0, which by multiplication with A⁻¹ yields

A⁻¹A(u − f(x)) = I(u − f(x)) = u − f(x) = 0,

where I is the identity operator represented by the unit matrix. Thus, a solution x of Eq. (10.37), that is, a fixed point of the map g, furnishes a solution of u = f(x).
We show that a fixed point of the map g can be reached as the limit of the sequence xn defined by the recursion formula

xn+1 = g(xn), n = 0, 1, 2, . . . , (10.39)

provided the Jacobian matrix, which in this case we denote by g′(x), representing the derivative of the vector mapping g, is of sufficiently small size. This procedure is popularly known as the method of successive approximations. Making the ‘small size’ requirement more precise, we require that for all x in the neighborhood of x0 given by Eq. (10.36), the largest element of the matrix g′ is less than 1/6 in absolute value and that
|g(x0) − x0| < (1/2)δ.
The last equation is the condition on the initial value from which to start the iteration.
First, we prove by induction that, under the assumptions stated, the recursion formula in Eq. (10.39) successively gives vectors satisfying Eq. (10.36). This assures us that the xn lie in the domain of g so that the sequence can be continued indefinitely. From Eq. (10.35) with M = 1/6 we see that,

|g(y) − g(x)| ≤ (1/2)|y − x|  for |x − x0| < δ, |y − x0| < δ. (10.40)
Now the inequality in Eq. (10.36) holds trivially for x = x0 . If it holds for x = xn , we find
for the vector xn+1 defined by Eq. (10.39) that
|xn+1 − x0| ≤ |xn+1 − x1| + |x1 − x0| = |g(xn) − g(x0)| + |g(x0) − x0| ≤ (1/2)|xn − x0| + (1/2)δ.
This proves that |xn − x0 | < δ for all n.
To see that the sequence {xn } converges, we observe that by Eq. (10.40),
|xn+1 − xn| = |g(xn) − g(xn−1)| ≤ (1/2)|xn − xn−1|.
In the same way,
|xn − xn−1| ≤ (1/2)|xn−1 − xn−2|,

|xn−1 − xn−2| ≤ (1/2)|xn−2 − xn−3|
and so on. These inequalities together imply
|xn+1 − xn| ≤ (1/2ⁿ)|x1 − x0| ≤ δ/2ⁿ⁺¹. (10.41)
Since the distance between successive iterates decreases geometrically, the sequence {xn} is a Cauchy sequence and must converge to a limit, say x∗. In this limit, the distance between successive iterates goes to zero. Therefore, the substitution of this limit x∗ in g(x) must return the same vector x∗. In other words, this limit x∗ solves Eq. (10.37). Another way to see this is the following. Since g(x) is continuous, if the sequence {xk}, k = 1, 2, . . . , converges to x∗ then the sequence g(xk), k = 1, 2, . . . , must converge to g(x∗). However, by virtue of Eq. (10.39) these two sequences are identical, making their limits the same, that is, x∗ = g(x∗).
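The whole scheme is short enough to sketch in Python. The map f below is an arbitrary near-identity example of my own choosing; the iteration xₙ₊₁ = xₙ + A(u − f(xₙ)) with the fixed matrix A = [f′(x0)]⁻¹ should converge to a solution of u = f(x).

```python
import numpy as np

# An illustrative map (my choice), close to the identity near x0.
def f(x):
    return np.array([x[0] + 0.1 * np.sin(x[1]),
                     x[1] + 0.1 * np.cos(x[0])])

def fprime(x):                                # Jacobian matrix of f
    return np.array([[1.0, 0.1 * np.cos(x[1])],
                     [-0.1 * np.sin(x[0]), 1.0]])

x0 = np.array([0.5, 0.5])
A = np.linalg.inv(fprime(x0))                 # fixed operator chosen once, at x0
u = f(x0) + np.array([0.01, -0.02])           # a target value near u0 = f(x0)

x = x0.copy()
for _ in range(50):                           # x_{n+1} = g(x_n) = x_n + A(u - f(x_n))
    x = x + A @ (u - f(x))

print(bool(np.allclose(f(x), u)))             # fixed point solves u = f(x)
```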
Since the function g depends continuously on u, the xn obtained successively by the recursion formula Eq. (10.39) also depend continuously on u. Since the convergence of the sequence {xn} does not depend on u, it follows that its limit x∗ is a continuous function of u. Also, we have |x∗ − x0| ≤ δ because |xn − x0| < δ for all n. If there existed a second solution x′ with x′ = g(x′) and |x′ − x0| ≤ δ, we would find from Eq. (10.40) that

|x′ − x∗| = |g(x′) − g(x∗)| ≤ (1/2)|x′ − x∗|,

which makes |x′ − x∗| = 0 and x′ = x∗.
Thus, we have established the existence, uniqueness and continuity of a solution x∗ of the equation u = f(x), for which |x∗ − x0| ≤ δ, provided the function g(x) defined by Eq. (10.38) has the derivative g′ with elements less than 1/6 in absolute value for |x − x0| ≤ δ and provided |g(x0) − x0| < (1/2)δ. These requirements can be satisfied for all u sufficiently close to u0 by a suitable choice of the matrix A. By the definition of g (Eq. (10.38)),

g′(x) = I − A f′(x),

so we choose A = [f′(x0)]⁻¹, making g′(x0) = 0.
The existence of this inverse is guaranteed by our basic assumption that the matrix f′(x0) has a non-vanishing determinant, that is, the Jacobian of the mapping f does not vanish at the point x0. The assumed continuity of the first derivatives of the mapping f implies that g′(x) depends continuously on x; hence the elements of g′(x) are arbitrarily small, for instance less than 1/6, for sufficiently small |x − x0|, say for |x − x0| ≤ δ. Moreover, by Eq. (10.38),
|g(x0) − x0| = |A(u − f(x0))| = |A(u − u0)| < (1/2)δ,
or,

x(v) − x(u) = [f′(x)]⁻¹ (v − u) + |v − u| ε(v − u),

where

lim_{v→u} ε(v − u) = 0.

This equation just says that the vector x satisfying u = f(x) is a differentiable function of the vector u and that the Jacobian matrix of x with respect to u is the inverse of the matrix f′(x) = J(x).
We substitute the inverse functions x = g(u, v, w), y = h(u, v, w), z = k(u, v, w) in the given functions to get the compound functions

u = φ(g(u, v, w), h(u, v, w), k(u, v, w)),
v = ψ(g(u, v, w), h(u, v, w), k(u, v, w)),
w = χ(g(u, v, w), h(u, v, w), k(u, v, w)).
3 The transformation f(x) could be passive, that is, the one which changes the coordinates of the same vector referring to a
different basis.
These equations are identities as they hold for all values of u, v, w. We now differentiate each of these equations with respect to u, v and w regarding them as independent variables and apply the chain rule to differentiate the compound functions. We then obtain the system of equations

1 = φx gu + φy hu + φz ku    0 = φx gv + φy hv + φz kv    0 = φx gw + φy hw + φz kw
0 = ψx gu + ψy hu + ψz ku    1 = ψx gv + ψy hv + ψz kv    0 = ψx gw + ψy hw + ψz kw
0 = χx gu + χy hu + χz ku    0 = χx gv + χy hv + χz kv    1 = χx gw + χy hw + χz kw
Solving these equations for nine unknowns gu,v,w , hu,v,w , ku,v,w we get the partial
derivatives of the inverse functions x = g (u, v, w ), y = h(u, v, w ), z = k (u, v, w ) with
respect to u, v, w expressed in terms of the derivatives of the original functions φ(x, y, z ),
ψ (x, y, z ), χ (x, y, z ) with respect to x, y, z, namely,
gu = (1/D)[ψy χz − ψz χy],  gv = (1/D)[χy φz − χz φy],  gw = (1/D)[φy ψz − φz ψy],

hu = (1/D)[ψz χx − ψx χz],  hv = (1/D)[χz φx − χx φz],  hw = (1/D)[φz ψx − φx ψz],

ku = (1/D)[ψx χy − ψy χx],  kv = (1/D)[χx φy − χy φx],  kw = (1/D)[φx ψy − φy ψx]. (10.43)
This justifies calling the Jacobian the derivative of a differentiable map f : E³ ↦ E³. For a 2-D map, Eq. (10.43) reduces to

gu = ψy/D,  gv = −φy/D,  hu = −ψx/D,  hv = φx/D, (10.44)

where the Jacobian determinant D is given by

D = | φx φy |
    | ψx ψy | .
θx = −y/(x² + y²) = −y/r²,  θy = x/(x² + y²) = x/r². (10.45)
From the formulae for the derivatives of the inverse functions (Eq. (10.44)) for the 2-D case, we find that the Jacobian determinant of the functions x = x(u, v) and y = y(u, v) (where the coordinates themselves replace the function names g and h) with respect to u and v is given by

d(x, y)/d(u, v) = xu yv − xv yu = (ux vy − uy vx)/D² = 1/D = ( d(u, v)/d(x, y) )⁻¹. (10.46)
Thus, the Jacobian determinant of the inverse system of functions is the reciprocal of the Jacobian determinant of the original system.4 This is not surprising, because these Jacobians are the inverses of each other, as we have shown above (see the last paragraph before the present subsection).
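This reciprocity is easy to confirm numerically; the sketch below uses the polar-coordinate map (my own choice of example) and finite-difference Jacobians.

```python
import numpy as np

def fwd(r, t):                 # (r, theta) -> (x, y); Jacobian determinant is r
    return r * np.cos(t), r * np.sin(t)

def inv(x, y):                 # (x, y) -> (r, theta); determinant should be 1/r
    return np.hypot(x, y), np.arctan2(y, x)

def jac_det(F, a, b, h=1e-6):  # central-difference 2x2 Jacobian determinant
    col_a = (np.array(F(a + h, b)) - np.array(F(a - h, b))) / (2 * h)
    col_b = (np.array(F(a, b + h)) - np.array(F(a, b - h))) / (2 * h)
    return col_a[0] * col_b[1] - col_b[0] * col_a[1]

r, t = 2.0, 0.6
x, y = fwd(r, t)
D1 = jac_det(fwd, r, t)        # approximately r
D2 = jac_det(inv, x, y)        # approximately 1/r
print(bool(np.isclose(D1 * D2, 1.0, atol=1e-6)))
```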
Exercise Find the second order derivatives for a 2-D map, xuu = ∂²x/∂u² = guu and yuu = ∂²y/∂u² = huu.
Hint Differentiate the equations (with ux = φx , xu = gu etc.)
1 = ux xu + uy yu
0 = vx xu + vy yu (10.47)
again with respect to u and use the chain rule. Then, solve the resulting system of linear
equations regarding the quantities xuu and yuu as unknowns and then replace xu and yu
by the expressions already known for them. Note that the determinant of the doubly
differentiated system is again D and hence, by hypothesis, is not zero.
Answer

xuu = −(1/D³) | uxx vy² − 2uxy vx vy + uyy vx²   uy |
              | vxx vy² − 2vxy vx vy + vyy vx²   vy |

and

yuu = (1/D³) | uxx vy² − 2uxy vx vy + uyy vx²   ux |
             | vxx vy² − 2vxy vx vy + vyy vx²   vx | .
4 This is the analogue of the rule for the derivative of the inverse of a function of a single variable. See, for example, [5]
volume I.
we can compose these two maps to get a differentiable and 1 − 1 map from the open set R1 to the open set R3 as g ∘ f(x) = g(f(x)). If the components of f(x) are ξ = φ(x, y, z), η = ψ(x, y, z), ζ = χ(x, y, z) and those of g are u = Φ(ξ, η, ζ), v = Ψ(ξ, η, ζ), w = Ω(ξ, η, ζ), then the chain rule gives
∂u/∂x = Φξ φx + Φη ψx + Φζ χx,  ∂u/∂y = Φξ φy + Φη ψy + Φζ χy,  ∂u/∂z = Φξ φz + Φη ψz + Φζ χz,

∂v/∂x = Ψξ φx + Ψη ψx + Ψζ χx,  ∂v/∂y = Ψξ φy + Ψη ψy + Ψζ χy,  ∂v/∂z = Ψξ φz + Ψη ψz + Ψζ χz,

∂w/∂x = Ωξ φx + Ωη ψx + Ωζ χx,  ∂w/∂y = Ωξ φy + Ωη ψy + Ωζ χy,  ∂w/∂z = Ωξ φz + Ωη ψz + Ωζ χz. (10.48)
Equation (10.48) can be written in the matrix form,
| ∂u/∂x ∂u/∂y ∂u/∂z |   | Φξ Φη Φζ | | φx φy φz |
| ∂v/∂x ∂v/∂y ∂v/∂z | = | Ψξ Ψη Ψζ | | ψx ψy ψz | ·
| ∂w/∂x ∂w/∂y ∂w/∂z |   | Ωξ Ωη Ωζ | | χx χy χz |
Since the determinant of the product of matrices is the product of their determinants, we conclude that the Jacobian determinant of the composition of two transformations is the product of the Jacobian determinants of the individual transformations.
m = g′(t0)/f′(t0). (10.50)
Similarly, the slope of the image curve is

µ = (dη/dt)/(dξ/dt) = (ψx f′ + ψy g′)/(φx f′ + φy g′) = (c + dm)/(a + bm). (10.52)
Since

dµ/dm = (ad − bc)/(a + bm)²,
we find that µ is an increasing function of m if ad − bc > 0 and a decreasing function if ad − bc < 0. More precisely, this holds locally, excluding the directions where m or µ become infinite.
Increasing slopes correspond to increasing angles of inclination or to counterclockwise rotation of the corresponding directions. Thus, dµ/dm > 0 implies that the counterclockwise sense of rotation is preserved, while it is reversed for dµ/dm < 0. Now ad − bc is just the Jacobian determinant

d(ξ, η)/d(x, y) = | φx φy |
                  | ψx ψy | .
10.12 Surfaces
As for curves, in most cases the parametric representation is found suitable for surfaces [5].
Since a surface is a two dimensional object, it requires two parameters to fix a point on it, as
against one parameter required to fix a point on a curve. Thus, a parametric representation
of a surface is given by parameterizing the position vector x ≡ (x, y, z ) of a point on the
surface,
do not all vanish at once. We can summarize this condition in a single inequality,

(yu zv − yv zu)² + (zu xv − zv xu)² + (xu yv − xv yu)² > 0. (10.55)

If the inequality Eq. (10.55) is satisfied, in some neighbourhood of each point on the surface given by the R² ↦ R³ map in Eq. (10.53), it is certainly possible to express one of the three coordinates in terms of the other two.
At each point on the surface with parameters u, v we can partially differentiate the position vector to give the tangent vectors

xu = ∂x/∂u = (xu, yu, zu),  xv = ∂x/∂v = (xv, yv, zv).
The three determinants Eq. (10.54) are just the components of the vector product xu × xv .
The expression on the left of the inequality in Eq. (10.55) is the square of the length of the
vector xu × xv so that condition Eq. (10.55) is equivalent to
xu × xv ≠ 0 (10.58)
where v = θ is the “polar inclination” or the polar angle and u = φ is the “longitude” or
the azimuthal angle made by the point on the sphere. Note that the functions relating x, y, z
to u, v are single valued and cover all the sphere. As v runs from π/2 to π the point x, y, z
spans the lower hemisphere, that is,
z = −√(r² − x² − y²)
while the values of v from 0 to π/2 give the upper hemisphere. Thus, for the parametric
representation it is not necessary, as it is for the representation
z = ±√(r² − x² − y²),
to apply two single valued branches of the function in order to span the whole sphere.
We obtain another parametric representation of the sphere by means of stereographic
projection. In order to project the sphere x2 + y 2 + z2 − r 2 = 0 stereographically from the
north pole (0, 0, r ) on the equatorial plane z = 0, we join each point of the surface to the
north pole N by a straight line and call the intersection of this line with the equatorial plane
the stereographic image of the corresponding point of the sphere (see Fig. 10.5). We thus
obtain a 1 − 1 correspondence between the points of the sphere and the points of the plane,
except for the north pole N . Using elementary geometry, we find that this correspondence
is expressed by
x = 2r²u/(u² + v² + r²),  y = 2r²v/(u² + v² + r²),  z = (u² + v² − r²)r/(u² + v² + r²), (10.60)
where (u, v ) are the rectangular (cartesian) coordinates of the image point in the plane.
These equations can be regarded as the parametric representation of the sphere, the
parameters (u, v ) being the rectangular coordinates in the u, v (equatorial) plane.
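One can verify directly that Eq. (10.60) always produces a point of the sphere; a minimal Python check (the sample (u, v) values are arbitrary):

```python
import numpy as np

def stereo(u, v, r):
    # stereographic parametrization of the sphere, Eq. (10.60)
    d = u * u + v * v + r * r
    return (2 * r * r * u / d,
            2 * r * r * v / d,
            (u * u + v * v - r * r) * r / d)

r = 3.0
for u, v in [(0.0, 0.0), (1.5, -2.0), (10.0, 4.0)]:
    x, y, z = stereo(u, v, r)
    assert abs(x * x + y * y + z * z - r * r) < 1e-9  # lies on the sphere
print("ok")
```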
As a further example, we give a parametric representation of the surfaces

x²/a² + y²/b² − z²/c² = 1  and  x²/a² + y²/b² − z²/c² = −1,
called the hyperboloid of one sheet and the hyperboloid of two sheets respectively
(see Fig. 10.6). The hyperboloid of one sheet is represented by
x = a cos u cosh v,
y = b sin u cosh v,
z = c sinh v (10.61)
where 0 ≤ u < 2π; −∞ < v < +∞ and the hyperboloid of two sheets by
x = a cos u sinh v,
y = b sin u sinh v,
z = ±c cosh v (10.62)
To each point of a region R of the (u, v) plane there corresponds one point of the surface and typically the converse is also true.5
Fig. 10.6 (a) Hyperboloid of one sheet and (b) Hyperboloid of two sheets
Just as we can parameterize a surface by mapping a region in the u, v plane via Eq. (10.53),
we can parameterize a curve on a surface by mapping an appropriate curve in the u, v plane
onto the given curve on the surface. Thus, a curve u = u (t ), v = v (t ) in the u, v plane
corresponds, by virtue of Eq. (10.53), to the curve
x(t) = φ(u(t), v(t)),  y(t) = ψ(u(t), v(t)),  z(t) = χ(u(t), v(t)) (10.63)
on the surface. Thus for example, the coordinate lines passing through a point on the sphere
have the parametric equations u = φ = constant (longitudes) and v = θ = constant
(latitudes). Corresponding curves in the u, v plane are the lines parallel to v and u axes
respectively. The net of parametric curves (the mesh of latitudes and longitudes on the
sphere) corresponds to the net of parallels to the axes in the u, v plane.
The tangent to the curve on the surface corresponding to the curve u = u(t), v = v(t) in the u, v plane has the direction of the vector xt = dx/dt, that is,

xt = (xt, yt, zt) = ( xu du/dt + xv dv/dt, yu du/dt + yv dv/dt, zu du/dt + zv dv/dt ) = xu du/dt + xv dv/dt. (10.64)
At a given point on the surface, the tangential vectors xt of all curves on the surface passing through that point are linear combinations of the two vectors xu, xv which respectively are tangential to the parametric lines v = constant and u = constant passing through that
5 This is not always the case. For example, in the representation Eq. (10.59) of the sphere by spherical coordinates, the poles
of the sphere correspond to the whole line segments given by v = 0 and v = π.
point. (e.g., the vectors φ̂ and θ̂ for the spherical polar coordinates on a sphere.) This means
that the tangents all lie in the plane through the point spanned by the vectors xu and xv ,
that is, the tangent plane to the surface at that point. The normal to the surface at that
point is perpendicular to all tangential directions, in particular to the vectors xu and xv .
Thus, the surface normal is parallel (or antiparallel) to the direction of the vector product
xu × xv = (yu zv − yv zu , zu xv − zv xu , xu yv − xv yu ). (10.65)
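For the sphere this is easy to check: with the (assumed) parametrization x(u, v) = r(sin v cos u, sin v sin u, cos v), the normal xu × xv of Eq. (10.65), computed by finite differences, should be parallel to the position vector itself.

```python
import numpy as np

r = 2.0
u, v = 0.8, 1.2     # arbitrary point; u = azimuth, v = polar angle (my convention)
h = 1e-6

def X(u, v):        # sphere of radius r
    return r * np.array([np.sin(v) * np.cos(u),
                         np.sin(v) * np.sin(u),
                         np.cos(v)])

xu = (X(u + h, v) - X(u - h, v)) / (2 * h)
xv = (X(u, v + h) - X(u, v - h)) / (2 * h)
n = np.cross(xu, xv)                 # surface normal, Eq. (10.65)
# parallel (or antiparallel) to X  =>  n x X vanishes
print(bool(np.allclose(np.cross(n, X(u, v)), 0.0, atol=1e-4)))
```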
One of the most important keys to the understanding of the given surface is the study of
the curves that lie on it. Here, we give the expression for the arc length s of such a curve.
We start with

(ds/dt)² = (dx/dt)² + (dy/dt)² + (dz/dt)² = xt · xt
= (xu du/dt + xv dv/dt)² + (yu du/dt + yv dv/dt)² + (zu du/dt + zv dv/dt)²
= E (du/dt)² + 2F (du/dt)(dv/dt) + G (dv/dt)². (10.66)
Here, the coefficients E, F, G, the Gaussian fundamental quantities of the surface, are
given by
E = (∂x/∂u)² + (∂y/∂u)² + (∂z/∂u)² = xu · xu,

F = (∂x/∂u)(∂x/∂v) + (∂y/∂u)(∂y/∂v) + (∂z/∂u)(∂z/∂v) = xu · xv,

G = (∂x/∂v)² + (∂y/∂v)² + (∂z/∂v)² = xv · xv. (10.67)
These depend only on xu , xv and therefore on the surface and its parametric
representation and not on the particular choice of the curve on the surface. The expression
Eq. (10.66) for the derivative of the length of arc s with respect to the parameter t usually
is written symbolically without reference to the parameter used along the curve. One says that the line element ds is given by the quadratic differential form (“fundamental form”)

ds² = E du² + 2F du dv + G dv².
Our original condition on the parametric representation (inequality Eq. (10.55)) can now
be formulated as the condition
EG − F 2 > 0 (10.70)
xt = xu du/dt + xv dv/dt.
If we now consider a second curve, u = u (τ ), v = v (τ ) on the surface referred to a
parameter τ, its tangent has the direction of the vector
xτ = xu du/dτ + xv dv/dτ.
If the two curves pass through the same point on the surface, the cosine of the angle of
intersection ω is the same as the cosine of the angle between xt and xτ . Hence,
xt · xτ
cos ω = .
|xt ||xτ |
We have,

xt · xτ = ( xu du/dt + xv dv/dt ) · ( xu du/dτ + xv dv/dτ )
= E (du/dt)(du/dτ) + F [ (du/dt)(dv/dτ) + (du/dτ)(dv/dt) ] + G (dv/dt)(dv/dτ). (10.72)
Consequently, the cosine of the angle between two curves on the surface is given by

cos ω = [ E (du/dt)(du/dτ) + F( (du/dt)(dv/dτ) + (du/dτ)(dv/dt) ) + G (dv/dt)(dv/dτ) ]
        / [ √( E (du/dt)² + 2F (du/dt)(dv/dt) + G (dv/dt)² ) √( E (du/dτ)² + 2F (du/dτ)(dv/dτ) + G (dv/dτ)² ) ]. (10.73)
We end this subsection by giving one more example of parametrization of a surface which comes up frequently in applications. We consider the torus. This is obtained by rotating a circle about a line which lies in the plane of the circle, but does not intersect it (see Fig. 10.7). We take the axis of rotation as the z-axis and choose the y-axis so as to pass through the center of the circle, whose y-coordinate we denote by a. If the radius of the circle is r < |a|, we obtain

y = a + r cos θ,  z = r sin θ,  0 ≤ θ < 2π,

as a parametric representation of the circle in the y–z plane. Now letting the circle rotate
about the z-axis, we find that for each point on the circle x² + y² remains constant; that is, x² + y² = (a + r cos θ)². If φ is the angle of rotation about the z-axis, we have

x = (a + r cos θ) sin φ,
y = (a + r cos θ) cos φ,
z = r sin θ (10.74)
with 0 ≤ φ < 2π, 0 ≤ θ < 2π as a parametric representation of the torus in terms of the
parameters θ and φ. In this representation the torus appears as the image of the square of
side 2π in the θ, φ plane. Any pair of boundary points of this square lying on the same line
θ = constant or φ = constant corresponds to only one point on the surface and the four
corners of the square all correspond to the same point on the surface.
Equation (10.67) gives, for the line element on the torus,

ds² = r² dθ² + (a + r cos θ)² dφ².
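The Gaussian fundamental quantities of the torus are easy to confirm numerically: with the parametrization of Eq. (10.74) one expects E = r², F = 0, G = (a + r cos θ)², θ and φ playing the roles of u and v (the sample values a = 3, r = 1 below are arbitrary).

```python
import numpy as np

a, r = 3.0, 1.0
th, ph = 0.7, 2.1   # arbitrary point on the torus
h = 1e-6

def X(th, ph):      # torus, Eq. (10.74)
    return np.array([(a + r * np.cos(th)) * np.sin(ph),
                     (a + r * np.cos(th)) * np.cos(ph),
                     r * np.sin(th)])

x_th = (X(th + h, ph) - X(th - h, ph)) / (2 * h)
x_ph = (X(th, ph + h) - X(th, ph - h)) / (2 * h)
E, F, G = x_th @ x_th, x_th @ x_ph, x_ph @ x_ph   # Eq. (10.67)
assert np.isclose(E, r**2, atol=1e-6)             # E = r^2
assert abs(F) < 1e-6                              # F = 0
assert np.isclose(G, (a + r * np.cos(th))**2, atol=1e-5)
print("ok")
```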
The divergence of a vector field is the scalar product of the del operator with the vector valued function f(x) defining the field,

∇ · f = ∂f1/∂x1 + ∂f2/∂x2 + ∂f3/∂x3,

where f1,2,3(x) are the scalar valued component functions of the vector valued function f(x) with respect to some orthonormal basis (see Eq. (10.1)). If we fix a position vector x, then we get the corresponding vector f(x) giving us the unique value of the divergence ∇ · f(x). Thus, the divergence of a vector field is itself a scalar field and we can calculate ‘the
divergence at a point’. The value of the divergence of a vector field at a point is a measure of
how much a vector f(x) spreads out from (or flows into) the point x in question. Thus, the
vector function in Fig. 10.8(a) has large positive divergence (if the arrows pointed inward
it would be a large negative divergence), the function in Fig. 10.8(b) has zero divergence
and Fig. 10.8(c) again shows a function of positive divergence. Here is a nice possible
observation of the divergence phenomenon [9]. Imagine standing at the edge of a pond.
Sprinkle some sawdust on the surface. If the material spreads out then you have dropped
it at a point of positive divergence; if it collects together, you have dropped it at a point of
negative divergence. The vector function v in this model is the velocity of the water. This
is a two-dimensional example but it helps give us a feel for the meaning of divergence. A point of positive divergence is a source or ‘faucet’; a point of negative divergence is a sink or ‘drain’.
Exercise If the functions in Fig. 10.8 are va = r = x x̂ + y ŷ + z ẑ, vb = ẑ and vc = z ẑ,
calculate the divergences.
Answer ∇ · va = 3, ∇ · vb = 0, ∇ · vc = 1.
In fact, the first result can be generalized to n dimensions as ∇ · x = Σ_{k=1}^{n} ∂xk/∂xk = n.
Fig. 10.8 Vector fields given by (a) va (b) vb (c) vc as defined in this exercise
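The first of the divergences quoted in the exercise above (∇ · va = 3) can be checked with a small finite-difference helper (my own sketch):

```python
import numpy as np

def div(F, p, h=1e-6):
    # central-difference divergence of a vector field F at the point p
    n = len(p)
    s = 0.0
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        s += (F(p + e)[i] - F(p - e)[i]) / (2 * h)
    return s

va = lambda x: x                       # the radial field v_a = r
p = np.array([0.3, -1.2, 2.0])         # arbitrary sample point
print(round(div(va, p), 6))            # 3.0, as claimed
```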
Exercise Sketch the vector function v = r̂/r² and compute its divergence except at r = 0.⁶

Hint Write

r̂/r² = x/(x² + y² + z²)^{3/2} x̂ + y/(x² + y² + z²)^{3/2} ŷ + z/(x² + y² + z²)^{3/2} ẑ

and evaluate ∇ · v.

Answer ∇ · (r̂/r²) = 0.
The result of the above exercise can be explained as follows. The flux of a vector field across
the surface enclosing a volume is simply the integral of the corresponding vector valued
function on the surface. If we enclose the point of interest in an infinitesimal cube, then,
as we will see later, this flux equals ∇ · v dV where v defines the field and dV is the volume of the infinitesimal cube. For v = r̂/r², looking at its expression with respect to the cartesian system x̂, ŷ, ẑ, it is clear that the fluxes through the opposite faces of the cube cancel each other so that the net flux through the cube is zero. Since dV ≠ 0 we must have ∇ · (r̂/r²) = 0.
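The same conclusion follows numerically at any point away from the origin (the sample point below is my own choice):

```python
import numpy as np

def v(x):                       # rhat / r^2 = x / |x|^3
    r = np.linalg.norm(x)
    return x / r**3

def div(F, p, h=1e-5):          # central-difference divergence at p
    s = 0.0
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        s += (F(p + e)[i] - F(p - e)[i]) / (2 * h)
    return s

p = np.array([1.0, 2.0, -0.5])  # any point with r != 0
print(abs(div(v, p)) < 1e-6)
```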
Note that the divergence of a vector field can be non-zero even if the magnitude of the field changes only along a single direction. Thus, for the field given by v(x) = cos(πx) x̂ the divergence is ∇ · v = −π sin(πx) and varies sinusoidally with x. At any point the field
6 To find what happens at r = 0 read the appendix on Dirac delta function.
flows into the point along the x axis if sin(πx) > 0 and out of it if sin(πx) < 0. For the field v = r̂/r², the field lines spread out over an area growing as r² as we go away from the origin, but the magnitude of the field falls as 1/r², so that its divergence is zero.
Since the operator del transforms like a vector under the rotation and translation of a
coordinate system, the divergence ∇ · v of a vector field v transforms like the scalar product
of two vectors, that is, like a scalar.
Exercise In two dimensions, show that the divergence transforms as a scalar under
rotation.
Hint Use the rotation (about the z-axis) matrix explicitly to transform (vx , vy ) and (x, y ),
then use the chain rule to show that the expression for ∇ · v remains invariant.
The curl
The curl of a vector field is the vector product of the del operator with the vector valued
function defining the field, say v. It can be conveniently defined using Levi-Civita symbols,
(∇ × v)i = Σ_{j,k} εijk ∂vk/∂xj ,  i = 1, 2, 3. (10.77)
Here εijk are the Levi-Civita symbols, v1,2,3 (x) and x1,2,3 are the components of v(x) and
x respectively with respect to some orthonormal basis.
Exercise Write down ∇ × v explicitly in terms of its components.

Answer

∇ × v = ( ∂vz/∂y − ∂vy/∂z ) x̂ + ( ∂vx/∂z − ∂vz/∂x ) ŷ + ( ∂vy/∂x − ∂vx/∂y ) ẑ. (10.78)
The value of ∇ × v(x) at a point x is a measure of how much the vector v(x) “curls around”
the point x in question. Thus, the three functions in Fig. 10.8 all have zero curl while the
functions in Fig. 10.9 have a substantial curl, pointing in the z direction, as the rule of
fixing the direction of a cross product would suggest. In analogy with the illustration for divergence, imagine that you are standing at the edge of a pond. Float a small paddle wheel (like a cork with toothpicks pointing out radially); if it starts to rotate, you have placed it at a point of non-zero curl. A whirlpool would be a region of large curl. To furnish intuition further, we can read Eq. (10.78) geometrically.
Thus, in Fig. 10.10(a), the signs of ∂vz/∂y and ∂vy/∂z are opposite, enhancing the first term in Eq. (10.78). In Fig. 10.10(b) these signs are the same, weakening the first term. Figure 10.10(c) shows that the sign of the gradient of a component along the corresponding axis can be determined by the change in its value along that axis, thus deciding its contribution to the curl of the field.
Exercise Suppose the function sketched in Fig. 10.9(a) is va = y x̂ − x ŷ and that in Fig. 10.9(b) is vb = y x̂. Calculate their curls and divergences.

Answer ∇ × va = −2ẑ and ∇ × vb = −ẑ. Both have zero divergence. This is consistent with Fig. 10.9, which shows fields which are not spreading out, but are only curling around.
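These curls can be cross-checked with a finite-difference version of Eq. (10.77) (my own helper, not from the text):

```python
import numpy as np

def curl(F, p, h=1e-6):
    J = np.zeros((3, 3))                 # J[k, j] = dF_k / dx_j at p
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        J[:, j] = (F(p + e) - F(p - e)) / (2 * h)
    return np.array([J[2, 1] - J[1, 2],
                     J[0, 2] - J[2, 0],
                     J[1, 0] - J[0, 1]])

va = lambda x: np.array([x[1], -x[0], 0.0])   # v_a = y xhat - x yhat
p = np.array([0.4, 1.0, -0.2])                # arbitrary point
print(bool(np.allclose(curl(va, p), [0.0, 0.0, -2.0])))
```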
After defining the divergence and the curl, we need to obtain rules for their action on
expressions involving vector valued functions and also their combined action on such
functions. For completeness we also state here the corresponding rules for the action of
the del operator on the scalar valued functions. We have, for the scalar valued functions
f (x), g (x) and the vector valued functions A(x), B(x)
∇(f + g ) = ∇f + ∇g, ∇ · (A + B) = ∇ · A + ∇ · B
∇ × (A + B) = ∇ × A + ∇ × B,
and
as can be easily checked using their definitions and the linearity of differentiation. Different
rules apply for different types of products of functions, that is, scalar valued products f g
and A · B and the vector valued products f A and A × B. This leads to six product rules:
two for gradients, for example
∇(A · B) = A × (∇ × B) + B × (∇ × A) + (A · ∇)B + (B · ∇)A,
two for divergences,
∇ · (f A) = f (∇ · A) + A · (∇f ), (10.82)
∇ · (A × B) = B · (∇ × A) − A · (∇ × B), (10.83)
and two for curls, one of which is
∇ × (f A) = f (∇ × A) − A × (∇f ). (10.84)
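The product rules (10.82)–(10.84) can be verified mechanically for concrete fields. A small sympy sketch (the test fields f, A, B below are arbitrary choices made for this check, not taken from the text):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
X = (x, y, z)

grad = lambda s: tuple(sp.diff(s, c) for c in X)
div  = lambda F: sum(sp.diff(F[i], X[i]) for i in range(3))
curl = lambda F: tuple(sp.diff(F[(i + 2) % 3], X[(i + 1) % 3])
                       - sp.diff(F[(i + 1) % 3], X[(i + 2) % 3]) for i in range(3))
dot   = lambda P, Q: sum(p*q for p, q in zip(P, Q))
cross = lambda P, Q: tuple(P[(i + 1) % 3]*Q[(i + 2) % 3]
                           - P[(i + 2) % 3]*Q[(i + 1) % 3] for i in range(3))

f = x*y*z                          # arbitrary scalar field
A = (x*y, y*z, z*x)                # arbitrary vector fields
B = (sp.sin(x), x*y, sp.exp(z))

zero = lambda e: sp.simplify(e) == 0

# (10.82): div(fA) = f div A + A . grad f
assert zero(div(tuple(f*a for a in A)) - (f*div(A) + dot(A, grad(f))))
# (10.83): div(A x B) = B . curl A - A . curl B
assert zero(div(cross(A, B)) - (dot(B, curl(A)) - dot(A, curl(B))))
# (10.84): curl(fA) = f curl A - A x grad f, checked componentwise
lhs = curl(tuple(f*a for a in A))
rhs = tuple(c*f - d for c, d in zip(curl(A), cross(A, grad(f))))
assert all(zero(l - r) for l, r in zip(lhs, rhs))
print("product rules (10.82)-(10.84) check out")
```

Since the rules are identities, any sufficiently smooth choice of f, A and B would do equally well here.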
To verify the rule for ∇(A · B), write the ith component of A × (∇ × B); we get
[A × (∇ × B)]i = Σj Aj (∂Bj /∂xi − ∂Bi /∂xj ).
Similarly,
[B × (∇ × A)]i = Σj Bj (∂Aj /∂xi − ∂Ai /∂xj ).
Putting all terms together, we get, for the ith component of the RHS,
Σj (Aj ∂Bj /∂xi + Bj ∂Aj /∂xi ) = [∇(A · B)]i .
Similarly, expanding ∇ · (A × B) = εijk ∂(Aj Bk )/∂xi with the product rule gives
∇ · (A × B) = εijk (∂Aj /∂xi )Bk + εijk Aj (∂Bk /∂xi )
= (∇ × A)k Bk − (∇ × B)j Aj
= B · (∇ × A) − A · (∇ × B). (10.86)
All the above rules for differentiating expressions of functions are valid for all
differentiable functions, scalar or vector valued, as the case may be. Therefore, these rules
can be treated as vector identities involving differential operators. You may try and prove
all these identities using Levi-Civita symbols.
Second derivatives
Up to now we obtained rules to find different types of derivatives of expressions involving
various types of functions. We shall now find rules to evaluate second derivatives obtained
by combining different types of first derivatives, namely, the gradient, the divergence and
the curl. Since ∇f is a vector for a scalar valued function f , we can take the divergence and
the curl of it. We have,
(i) Divergence of the gradient: ∇2 f ≡ ∇ · (∇f ).
(ii) Curl of gradient: ∇ × (∇f ).
The divergence ∇ · v is a scalar, so we can take its gradient:
(iii) Gradient of divergence: ∇(∇ · v).
The curl ∇ × v is a vector, so we can take its divergence and curl :
(iv) Divergence of curl: ∇ · (∇ × v).
(v) Curl of a curl: ∇ × (∇ × v).
These are all the possibilities and we consider them one by one.
For (i), in Cartesian coordinates,
∇2 f = ∂2 f/∂x2 + ∂2 f/∂y2 + ∂2 f/∂z2 . (10.88)
For (ii), the curl of a gradient is always zero:
∇ × (∇f ) = 0. (10.90)
7 For curvilinear coordinates, where the unit vectors themselves depend on position, they too must be differentiated.
In this double sum, pairs of terms like ∂2 f/∂x1 ∂x2 and ∂2 f/∂x2 ∂x1 occur with opposite
signs and cancel, and all terms can be paired this way. Hence, the sum vanishes and
we get
[∇ × (∇f )]i = 0, i = 1, 2, 3.
(iii) ∇(∇ · v) seldom occurs in physical applications. Note that ∇2 v ≠ ∇(∇ · v).
(iv) The divergence of a curl, like the curl of a gradient, is always zero.
∇ · (∇ × v) = 0. (10.91)
∇ · (∇ × v) = εijk ∂2 vk /∂xi ∂xj .
In this triple sum, for a fixed value of k, two terms occur with interchanged values of
indices i and j. These terms are identical but with opposite signs and hence cancel.
All terms occur in such pairs so that the sum vanishes, thus proving Eq. (10.91).
(v) The curl of curl operator can be decomposed into the gradient of divergence and the
vector Laplacian as follows.
∇ × (∇ × v) = ∇(∇ · v) − ∇2 v. (10.92)
[∇ × (∇ × v)]i = εkij εklm ∂2 vm /∂xj ∂xl .
Using the identity εkij εklm = δil δjm − δim δjl , this becomes
∂/∂xi (∂vj /∂xj ) − ∂2 vi /∂xj2 = [∇(∇ · v)]i − [∇2 v]i .
Note that Eq. (10.92) can be taken to be a coordinate free definition of ∇2 v in preference
to Eq. (10.89) which depends on Cartesian coordinates.
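These second derivative identities lend themselves to a direct symbolic check. A sketch with sympy (the test fields f and v are arbitrary smooth choices, not from the text):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
X = (x, y, z)

grad = lambda s: tuple(sp.diff(s, c) for c in X)
div  = lambda F: sum(sp.diff(F[i], X[i]) for i in range(3))
curl = lambda F: tuple(sp.diff(F[(i + 2) % 3], X[(i + 1) % 3])
                       - sp.diff(F[(i + 1) % 3], X[(i + 2) % 3]) for i in range(3))
lap  = lambda s: sum(sp.diff(s, c, 2) for c in X)

f = sp.exp(x)*sp.sin(y)*z                  # arbitrary scalar field
v = (x*y**2, sp.cos(z), x*z)               # arbitrary vector field

# curl of a gradient and divergence of a curl vanish, Eqs (10.90), (10.91)
assert all(sp.simplify(c) == 0 for c in curl(grad(f)))
assert sp.simplify(div(curl(v))) == 0

# Eq. (10.92): curl of a curl = gradient of divergence minus vector Laplacian
lhs = curl(curl(v))
rhs = tuple(g - lap(c) for g, c in zip(grad(div(v)), v))
assert all(sp.simplify(l - r) == 0 for l, r in zip(lhs, rhs))
print("second derivative identities verified")
```

The componentwise comparison mirrors the Levi-Civita derivation above, with sympy doing the index bookkeeping.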
Exercise In what follows r denotes a position vector, r = |r| is its magnitude, A(r) and
B(r) are vector fields, φ(r) is a scalar field and f (r ) is a function of r. All fields and
functions have continuous first derivatives. Using Levi-Civita symbols or otherwise, prove
the following.
(i) ∇ × (∇ × A) = ∇(∇ · A) − ∇2 A.
(ii) A × (∇ × B) = ∇B (A · B) − (A · ∇)B where ∇B operates on B only.
(iii) Given ∇ × A = 0 = ∇ × B show that ∇ · (A × B) = 0.
(iv) For constant a and b show that ∇ × [(a × r) × b] = a × b.
(v) ∇ · r = 3, ∇ × r = 0, ∇(A · r) = A, (A · ∇)r = A.
(vi) ∇r n = nr n−2 r, ∇ · (f (r )r̂) = (1/r 2 ) d(r 2 f )/dr.
We will use any one or more of these results in the sequel, as and when required.
Exercise A particle performs uniform circular motion on a circle of radius r with position
vector r relative to the centre. Show that (a) ∇ × v = 2ω and (b) ∇ · v = 0, where v is the
linear velocity and ω is the (constant) rotational velocity of the particle.
Solution
∇ × v = ∇ × (ω × r) = ω (∇ · r) − (ω · ∇)r.
However, ∇ · r = 3 and (ω · ∇)r = ω, which gives (a). Thus, we see that the curl operator
transforms the velocity vector into twice the rotational velocity vector.
(b) ∇ · v = ∇ · (ω × r) = r · (∇ × ω) − ω · (∇ × r) = 0, since ω is a constant vector and
∇ × r = 0.
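The result of this exercise can be checked directly: with a constant ω, the velocity field v = ω × r has curl 2ω and zero divergence. A sympy sketch (the symbols w1, w2, w3 stand for the constant components of ω):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
w1, w2, w3 = sp.symbols('w1 w2 w3')      # constant rotational velocity components
X = (x, y, z)

cross = lambda P, Q: tuple(P[(i + 1) % 3]*Q[(i + 2) % 3]
                           - P[(i + 2) % 3]*Q[(i + 1) % 3] for i in range(3))
curl  = lambda F: tuple(sp.diff(F[(i + 2) % 3], X[(i + 1) % 3])
                        - sp.diff(F[(i + 1) % 3], X[(i + 2) % 3]) for i in range(3))
div   = lambda F: sum(sp.diff(F[i], X[i]) for i in range(3))

v = cross((w1, w2, w3), X)               # v = omega x r
print(curl(v))                           # (2*w1, 2*w2, 2*w3), i.e., 2 omega
print(div(v))                            # 0
```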
Thus, there are basically two types of second derivatives, the Laplacian, which is of
fundamental importance and the gradient of divergence which we seldom encounter.
Since second derivatives suffice for practically all physical applications, we do not pursue
higher derivatives here.
Fig. 10.11 The network of coordinate lines and coordinate surfaces at an arbitrary
point, defining a curvilinear coordinate system
∂x/∂u · ∂x/∂v = ∂x/∂u · ∂x/∂w = ∂x/∂v · ∂x/∂w = 0.
We are interested in the differential displacement dx as we go from x(u, v, w ) to x(u +
du, v + dv, w + dw ). We have, in terms of the corresponding Jacobian matrix,

⎡ds1⎤   ⎡∂s1/∂u  ∂s1/∂v  ∂s1/∂w⎤ ⎡du⎤
⎢ds2⎥ = ⎢∂s2/∂u  ∂s2/∂v  ∂s2/∂w⎥ ⎢dv⎥
⎣ds3⎦   ⎣∂s3/∂u  ∂s3/∂v  ∂s3/∂w⎦ ⎣dw⎦

that is,

dx = (∂x/∂u ) du + (∂x/∂v ) dv + (∂x/∂w ) dw = xu du + xv dv + xw dw, (10.95)

where s1,2,3 (u, v, w ) = x(u, v, w ) · {û, v̂, ŵ} are the components of x in the û, v̂, ŵ mutually
orthogonal directions. This defines the line element ds = |dx| via

ds2 = dx · dx = xu · xu du 2 + xv · xv dv 2 + xw · xw dw2 . (10.96)
Thus, the volume of the rectangular parallelepiped with sides ds1 , ds2 , ds3 is given by
dV = ds1 ds2 ds3 = h1 h2 h3 dudvdw.
The product h1 h2 h3 ensures that the last term has the dimension of volume, as the
curvilinear coordinates can be dimensionless quantities like angles.
The u coordinate surface passing through the point (u0 , v0 , w0 ) is the collection of
points (x, y, z ) satisfying u (x, y, z ) = u0 and similarly, the v and w coordinate surfaces
are given by v (x, y, z ) = v0 and w (x, y, z ) = w0 , where (x, y, z ) are the Cartesian
coordinates with respect to some rectangular Cartesian coordinate system. We can vary
the point (u0 , v0 , w0 ) over the region for which the curvilinear coordinate
transformations, Eqs (10.93), (10.94), are defined. Therefore, we can replace u0 , v0 , w0 in
the equations defining the coordinate surfaces by u, v, w and say that a particular triad of
coordinate surfaces emerges when particular values of u, v and w are substituted on the
RHS of these equations. Thus, we write, for the equations defining the coordinate surfaces,
u (x, y, z ) = u, v (x, y, z ) = v, w (x, y, z ) = w.
The normals to the coordinate surfaces are given by ∇u, ∇v, ∇w (see section 10.8) which,
owing to orthogonality, must satisfy
0 = ∇u · ∇v = ∇u · ∇w = ∇v · ∇w.
The vectors normal to the coordinate surfaces are tangent to the corresponding coordinate
curves so that we can define the fundamental triad for the curvilinear coordinates as
û = ∇u/|∇u|, v̂ = ∇v/|∇v|, ŵ = ∇w/|∇w|. (10.97)
Let ds1 = ds1 û, ds2 = ds2 v̂, ds3 = ds3 ŵ be the differential displacement along the
û, v̂, ŵ directions. Since ∇u, ∇v, ∇w have the same values in all the orthonormal basis
triads and since ∇u, ds1 , ∇v, ds2 and ∇w, ds3 are the pairs of parallel vectors, we can
write,
du = ∇u · ds1 = |∇u||ds1 |,
dv = ∇v · ds2 = |∇v||ds2 |,
dw = ∇w · ds3 = |∇w||ds3 |.
This gives,
ds2 = ds12 + ds22 + ds32 = du 2 /|∇u|2 + dv 2 /|∇v|2 + dw2 /|∇w|2 . (10.99)
Comparing equations Eqs (10.96) and (10.99) we get
h1 = 1/|∇u| = √(xu · xu ), implying û = h1 ∇u,
h2 = 1/|∇v| = √(xv · xv ), implying v̂ = h2 ∇v,
h3 = 1/|∇w| = √(xw · xw ), implying ŵ = h3 ∇w. (10.100)
Example For spherical polar coordinates we identify u = r, v = θ and w = φ, where
r = √(x2 + y 2 + z2 ), θ = cos−1 (z/√(x2 + y 2 + z2 )) and φ = tan−1 (y/x ).
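For these spherical polar coordinates the scale factors follow from h = √(xu · xu ). A quick sympy check (comparing squares to avoid absolute values in the simplification):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
# position vector in terms of (u, v, w) = (r, theta, phi)
X = (r*sp.sin(th)*sp.cos(ph), r*sp.sin(th)*sp.sin(ph), r*sp.cos(th))

def h_squared(u):
    # x_u . x_u for the coordinate u
    return sp.simplify(sum(sp.diff(c, u)**2 for c in X))

print(h_squared(r), h_squared(th), h_squared(ph))
# expected scale factors: h_r = 1, h_theta = r, h_phi = r sin(theta)
assert h_squared(r) == 1
assert h_squared(th) == r**2
assert sp.simplify(h_squared(ph) - r**2*sp.sin(th)**2) == 0
```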
∇φ = (∂φ/∂u )∇u + (∂φ/∂v )∇v + (∂φ/∂w )∇w
= (1/h1 )(∂φ/∂u ) û + (1/h2 )(∂φ/∂v ) v̂ + (1/h3 )(∂φ/∂w ) ŵ, (10.102)
where n̂ is the unit outward normal to the surface, defined via its direction cosines
Fig. 10.12 (a) Evaluating x·da (b) Flux through the opposite faces of a volume element
Let us now consider a differential volume of the shape of a rectangular parallelepiped with
sides ds1 , ds2 , ds3 defined above, so that its volume is dV = ds1 ds2 ds3 = h1 h2 h3 dudvdw.
Let the pairs −û, û be the outward normals to the front and the back sides, −v̂, v̂ the
outward normals to the left and the right sides and −ŵ, ŵ the outward normals to the
bottom and the top sides of the box. Then for the front face,
da = −h2 h3 dvdw û and f · da = −(h2 h3 f1 )dvdw, where f1,2,3 are the components of f
along û, v̂, ŵ respectively, and the product h2 h3 f1 is to be evaluated at u. On the back face,
the product h2 h3 f1 is to be evaluated at u + du, so that f · da = (h2 h3 f1 + ∂(h2 h3 f1 )/∂u du )dvdw.
Therefore, the net flux through the front and back pair of faces is
[∂(h2 h3 f1 )/∂u ] dudvdw = (1/(h1 h2 h3 )) ∂(h2 h3 f1 )/∂u dV .
In the same way, the right and the left sides give
(1/(h1 h2 h3 )) ∂(h3 h1 f2 )/∂v dV
and the bottom and the top sides contribute
(1/(h1 h2 h3 )) ∂(h1 h2 f3 )/∂w dV .
Thus, the total flux through the box is given by
(∇ · f)dV = (1/(h1 h2 h3 )) [∂(h2 h3 f1 )/∂u + ∂(h3 h1 f2 )/∂v + ∂(h1 h2 f3 )/∂w ] dV .
This gives
∇ · f = (1/(h1 h2 h3 )) [∂(h2 h3 f1 )/∂u + ∂(h3 h1 f2 )/∂v + ∂(h1 h2 f3 )/∂w ] . (10.105)
Combining Eq. (10.105) with Eq. (10.102) we get, for the Laplacian operator,
∇2 φ = ∇ · ∇φ = (1/(h1 h2 h3 )) [∂/∂u ((h2 h3 /h1 ) ∂φ/∂u ) + ∂/∂v ((h3 h1 /h2 ) ∂φ/∂v ) + ∂/∂w ((h1 h2 /h3 ) ∂φ/∂w )] . (10.106)
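Formula (10.106) can be exercised with the spherical scale factors h1 = 1, h2 = r, h3 = r sin θ obtained earlier. The sketch below applies it to the test field φ = r2 cos θ (an arbitrary choice for this check) and compares against the Cartesian Laplacian of the same field at a sample point:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x, y, z = sp.symbols('x y z')
h1, h2, h3 = 1, r, r*sp.sin(th)                  # spherical scale factors

def laplacian_curvilinear(f):
    # Eq. (10.106) with (u, v, w) = (r, theta, phi)
    return (sp.diff((h2*h3/h1)*sp.diff(f, r), r)
            + sp.diff((h3*h1/h2)*sp.diff(f, th), th)
            + sp.diff((h1*h2/h3)*sp.diff(f, ph), ph)) / (h1*h2*h3)

f_sph = r**2*sp.cos(th)                          # test field: r^2 cos(theta) = z*r
lhs = sp.simplify(laplacian_curvilinear(f_sph))  # simplifies to 4*cos(theta)

f_cart = z*sp.sqrt(x**2 + y**2 + z**2)           # the same field in Cartesian form
lap_cart = sum(sp.diff(f_cart, s, 2) for s in (x, y, z))

pt = {x: 1.0, y: 2.0, z: 0.5}                    # arbitrary sample point
rr = float(sp.sqrt(sum(v**2 for v in pt.values())))
lhs_num = float(lhs.subs({th: sp.acos(0.5/rr)}))
rhs_num = float(lap_cart.subs(pt))
print(abs(lhs_num - rhs_num) < 1e-9)             # True
```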
Our last task in this subsection is to express the curl ∇ × f of a vector field f in terms of the
derivatives with respect to the curvilinear coordinates u, v, w. The principle we follow is
that the component of ∇ × f along the normal to an infinitesimal surface element da equals
the circulation of f around the boundary of the element divided by its area.
The sense of circulation is that which makes a right handed screw advance in the
direction of da. The required circulation can be explicitly calculated for an infinitesimal
loop of rectangular shape. For each side of the rectangle, we have to find the scalar
product of f with the vector along the side and in the direction consistent with the sense of
circulation. In the first place, the surface enclosed by an infinitesimal loop can be taken to
be a plane. Consider such a rectangular loop in the û, v̂ plane, with ŵ normal to it
(see Fig. 10.13). From Fig. 10.13 and ŵ pointing out of the page, it is clear that the sense of
circulation which makes a right handed screw advance in ŵ direction is counterclockwise,
as shown. The vector on the side along û is ds1 = h1 du û, that on the side along v̂ is
ds2 = h2 dv v̂, and the area is
da = h1 h2 dudv ŵ.
Along the bottom side,
f · ds1 = h1 f1 du,
with h1 f1 evaluated at v.
Along the top side, the sign is reversed and h1 f1 is evaluated at v + dv rather than v. Both
sides together give
[−(h1 f1 )|v +dv + (h1 f1 )|v ] du = −[∂(h1 f1 )/∂v ] dudv.
The two sides along v̂ similarly contribute [∂(h2 f2 )/∂u ] dudv. The total circulation,
divided by the coefficient h1 h2 dudv of ŵ in da, gives the w component of the curl.
Constructing the u and v components in the same way, we get
∇ × f = (1/(h2 h3 )) [∂(h3 f3 )/∂v − ∂(h2 f2 )/∂w ] û + (1/(h3 h1 )) [∂(h1 f1 )/∂w − ∂(h3 f3 )/∂u ] v̂
+ (1/(h1 h2 )) [∂(h2 f2 )/∂u − ∂(h1 f1 )/∂v ] ŵ. (10.107)
Exercise Express the vector derivatives, that is, gradient, divergence, curl and Laplacian
in terms of (a) spherical polar and (b) cylindrical coordinates for a scalar field u (x) and a
vector field v(x).
Answer
(a) Gradient:
∇u = (∂u/∂r ) r̂ + (1/r )(∂u/∂θ ) θ̂ + (1/(r sin θ ))(∂u/∂φ ) φ̂.
Divergence:
∇ · v = (1/r 2 ) ∂(r 2 vr )/∂r + (1/(r sin θ )) ∂(sin θ vθ )/∂θ + (1/(r sin θ )) ∂vφ /∂φ.
Curl:
∇ × v = (1/(r sin θ )) [∂(sin θ vφ )/∂θ − ∂vθ /∂φ ] r̂ + (1/r ) [(1/ sin θ ) ∂vr /∂φ − ∂(rvφ )/∂r ] θ̂
+ (1/r ) [∂(rvθ )/∂r − ∂vr /∂θ ] φ̂.
Laplacian:
∇2 u = (1/r 2 ) ∂/∂r (r 2 ∂u/∂r ) + (1/(r 2 sin θ )) ∂/∂θ (sin θ ∂u/∂θ ) + (1/(r 2 sin2 θ )) ∂2 u/∂φ2 .
(b) Gradient:
∇u = (∂u/∂ρ ) ρ̂ + (1/ρ )(∂u/∂φ ) φ̂ + (∂u/∂z ) ẑ.
Divergence:
∇ · v = (1/ρ ) ∂(ρvρ )/∂ρ + (1/ρ ) ∂vφ /∂φ + ∂vz /∂z.
Curl:
∇ × v = [(1/ρ ) ∂vz /∂φ − ∂vφ /∂z ] ρ̂ + [∂vρ /∂z − ∂vz /∂ρ ] φ̂ + (1/ρ ) [∂(ρvφ )/∂ρ − ∂vρ /∂φ ] ẑ.
Laplacian:
∇2 u = (1/ρ ) ∂/∂ρ (ρ ∂u/∂ρ ) + (1/ρ2 ) ∂2 u/∂φ2 + ∂2 u/∂z2 .
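As a consistency check on the cylindrical formulas, the sketch below takes an arbitrary field given by its cylindrical components, computes the divergence from the formula above, converts the field to Cartesian components, and compares the two divergences at a sample point:

```python
import sympy as sp

p, f, w = sp.symbols('rho phi zc', positive=True)   # cylindrical coordinate symbols
x, y, z = sp.symbols('x y z')

v_p, v_f, v_z = p**2, p*w, w**2                     # arbitrary cylindrical components

# divergence in cylindrical coordinates, as given above
div_cyl = sp.diff(p*v_p, p)/p + sp.diff(v_f, f)/p + sp.diff(v_z, w)

# same field in Cartesian components: rho-hat = (cos f, sin f), phi-hat = (-sin f, cos f)
vx = v_p*sp.cos(f) - v_f*sp.sin(f)
vy = v_p*sp.sin(f) + v_f*sp.cos(f)
to_cart = {p: sp.sqrt(x**2 + y**2), f: sp.atan2(y, x), w: z}
Vx, Vy, Vz = (e.subs(to_cart) for e in (vx, vy, v_z))
div_cart = sp.diff(Vx, x) + sp.diff(Vy, y) + sp.diff(Vz, z)

pt = {x: 1.0, y: 2.0, z: 0.5}                       # arbitrary sample point
cyl_pt = {p: sp.sqrt(5.0), f: sp.atan2(2.0, 1.0), w: 0.5}
print(float(div_cyl.subs(cyl_pt)), float(div_cart.subs(pt)))  # both ≈ 7.708
```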
11
Vector Integration
In this chapter we learn how to integrate a vector field, or a vector valued function f(x),
over x.
We are interested in three possibilities. First, the variable of integration, x, can vary over a
continuous region R of volume V in space. Second, x is confined to vary over a piece of a
smooth surface, that is, a surface parameterized by x(u, v ) which has continuous partial
derivatives ∂x/∂u and ∂x/∂v. Third, x is constrained to vary over a piece of a smooth curve,
parameterized, say by x(t ), which is a continuously differentiable function of t. The first
option is called a volume or a triple integral, the second option is called a surface integral
and the last option is called a line integral. We learn about these integrals one by one,
starting with the line integral.
In the limit as n → ∞ the vectors ∆xk become tangent to the curve, so we are projecting the
field values f(xk ) along the tangent to the curve at each point and summing the
corresponding products along the curve. Thus, the value of the line integral is influenced by both the field
as well as the curve along which the integral is taken. Later, we will obtain conditions under
which the value of a line integral depends only on the field values at the end points and not
on the curve joining them.
The line integral in Eq. (11.1) can be transformed using the fact that the curve is
parameterized by a continuously differentiable function x(t ). Let x0 = x(T1 ) and
xn = x(T2 ) correspond to the end points P0 and P1 respectively. We choose values
t0 = T1 , t1 , t2 , . . . , tn = T2 in the closed interval [T1 , T2 ] and let xk = x(tk ). We define
∆xk = ∆x(tk ) = x(tk ) − x(tk−1 ) and ∆tk = tk − tk−1 . Then the line integral in Eq. (11.1)
gets transformed to
∫_{T1}^{T2} f(x(t )) · ẋ(t )dt = lim_{n→∞} Σ_{k=1}^{n} f(x(tk )) · (∆x(tk )/∆tk ) ∆tk , (11.2)
where ẋ(t ) = dx(t )/dt is the velocity or the tangent vector to the curve at the point x(t ). If we
resolve the field along some fixed orthonormal basis then the line integral becomes
Σ_{i=1}^{3} ∫_{T1}^{T2} fi (x1 (t ), x2 (t ), x3 (t ))ẋi (t )dt (11.3)
where f1,2,3 and x1,2,3 are the components of f(x) and x respectively with respect to the
fixed orthonormal basis. In particular, if the field is the gradient ∇f of a scalar valued
function f (x), the line integral becomes
∫_{T1}^{T2} (d/dt ) f (x1 (t ), x2 (t ), x3 (t )) dt = f (P1 ) − f (P0 ), (11.4)
whose value depends only on the end points and not on the path connecting them.
The curve C over which we want to integrate a vector field f(x) may consist of many
smooth oriented arcs C1 , C2 , . . . CN joined at their end points where their derivatives may
not match, so that the whole path can be parameterized by continuous functions with
finite jump discontinuities in the derivative at a finite number of points where the smooth
arcs join. In such a case we can write
∫_C f(x) · dx = ∫_{C1} f(x) · dx + ∫_{C2} f(x) · dx + · · · + ∫_{CN} f(x) · dx. (11.5)
Another possibility is that C is a closed curve. We assume that the curve is oriented
counterclockwise as the parameter t increases.
Exercise Evaluate the integral in Eq. (11.2) for the planar field f(x) = −y î − xy ĵ on the
circular arc C shown in Fig. 11.2 from P0 to P1 .
Solution We parameterize C by x(t ) = cos t î + sin t ĵ, 0 ≤ t ≤ π/2. Therefore,
f(x(t )) = − sin t î − cos t sin t ĵ.
Differentiating x(t ) we get ẋ(t ) = − sin t î + cos t ĵ. Therefore the integral becomes
∫_{T1}^{T2} f(x(t )) · ẋ(t )dt = ∫_0^{π/2} (sin2 t − cos2 t sin t )dt = π/4 − 1/3 ≈ 0.4521.
Exercise Evaluate the integral in Eq. (11.2) for the field f(x) = zî + xĵ + y k̂ on the helix
C shown in Fig. 11.3,
from P0 to P1 .
Solution With the helix parameterized as x(t ) = cos t î + sin t ĵ + 3t k̂, 0 ≤ t ≤ 2π, we get
f(x(t )) · ẋ(t ) = −3t sin t + cos2 t + 3 sin t. Hence, the required integral is
∫_0^{2π} (−3t sin t + cos2 t + 3 sin t )dt = 7π ≈ 21.99.
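The value 7π can be confirmed numerically by a straightforward trapezoidal evaluation of the parameter integral (plain numpy; the sample count is an arbitrary choice):

```python
import numpy as np

t = np.linspace(0.0, 2*np.pi, 200001)
integrand = -3*t*np.sin(t) + np.cos(t)**2 + 3*np.sin(t)
dt = t[1] - t[0]
# composite trapezoidal rule
val = dt*(0.5*integrand[0] + integrand[1:-1].sum() + 0.5*integrand[-1])
print(val, 7*np.pi)   # both ≈ 21.9911
```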
Exercise Find the work done by the electrostatic field due to a point charge q on a test
charge as it traverses the paths shown in Fig. 11.4(a) and (b).
Fig. 11.4 In carrying a test charge from a to b the same work is done along either
path
Hint E = (1/(4πε0 ))(q/r 2 ) r̂ where r is the radial distance of the test charge from the source q. Work
done along the circular arcs is zero.
Answer W = −∫_a^b E · ds = (q/(4πε0 ))(1/ra − 1/rb ) for both the paths.
Exercise For the field f(x) = xy î + (x2 + y 2 )ĵ find ∫_Γ f(x) · dx where Γ is
(ii) Along the x axis y = 0 = dy and along the vertical line x = 5 we have dx = 0. This gives
∫_Γ f(x) · dx = ∫_0^{21} (25 + y 2 )dy = 3612.
We see that the two values do not agree, so that the integral depends on the path. Thus, as
explained below, the field is not conservative.
Sometimes we may have to evaluate the line integral separately on different parts of the
given curve, as the following exercise shows.
Exercise Evaluate ∫_Γ f · dx, where f = xĵ − y î and Γ is the unit circle about the origin.
Solution We note that f · dx = xdy − ydx. We can parameterize the unit circle by x as
y 2 = 1 − x2 , but then y is not a single valued function of x. We can circumvent this by
viewing the curve as made up of two parts (see Fig. 11.5), Γ1 and Γ2 where Γ1 is the upper
semi-circle and Γ2 the lower, arrows indicating the positive direction along Γ as shown in
Fig. 11.5.
On Γ1 :
y = √(1 − x2 ), dy = −x dx/√(1 − x2 ),
and on Γ2 :
y = −√(1 − x2 ), dy = x dx/√(1 − x2 ).
∫_Γ f · dx = ∫_1^{−1} (−x2 /√(1 − x2 ) − √(1 − x2 )) dx + ∫_{−1}^{1} (x2 /√(1 − x2 ) + √(1 − x2 )) dx = 2π.
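The same answer follows at once from the single-valued parameterization x = cos t, y = sin t, which avoids splitting the circle into two arcs; a quick numerical check:

```python
import numpy as np

t = np.linspace(0.0, 2*np.pi, 200001)
xc, yc = np.cos(t), np.sin(t)
# f . dx = x dy - y dx becomes (x y' - y x') dt along the parameterization
integrand = xc*np.cos(t) - yc*(-np.sin(t))      # = cos^2 t + sin^2 t = 1
dt = t[1] - t[0]
val = dt*(0.5*integrand[0] + integrand[1:-1].sum() + 0.5*integrand[-1])
print(val / np.pi)    # ≈ 2.0
```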
The following three rules for evaluation of line integrals can be easily checked.
(i) ∫_C (kf) · dx = k ∫_C f · dx, where k is a (scalar) constant.
(ii) For two vector fields f and g
∫_C (f + g) · dx = ∫_C f · dx + ∫_C g · dx.
(iii) Any two parameterizations of C giving the same orientation on C yield the same value
of the line integral Eq. (11.1).
Exercise Prove rule (iii).
Solution Let the curve C be parameterized by x(t ), a ≤ t ≤ b and also by x∗ (t ∗ ),
a∗ ≤ t ∗ ≤ b∗ , and let these be related by t = φ(t ∗ ). We are given that dt/dt ∗ > 0. Thus,
x(t ) = x(φ(t ∗ )) = x∗ (t ∗ ) and dt = (dt/dt ∗ )dt ∗ . Therefore, the line integral over C can
be written
∫_C f(x∗ ) · dx∗ = ∫_{a∗}^{b∗} f(x(φ(t ∗ ))) · (dx/dt )(dt/dt ∗ ) dt ∗ = ∫_a^b f(x(t )) · (dx/dt ) dt = ∫_C f(x) · dx.
Note that f(x(t )) and f(x(φ(t ∗ ))) are different functions of their arguments but their
values match at t and t ∗ satisfying t = φ(t ∗ ), both corresponding to the same point P on
the curve of integration.
We now give two results often used while evaluating line integrals. Let {î, ĵ, k̂} be an
orthonormal basis and {x, y, z} be the corresponding Cartesian coordinate system. Let a
vector field f(x) have components f1,2,3 (x) along î, ĵ, k̂ respectively and let Γ be some
smooth curve in space.
Z Z Z
= f1 (x)dx + f2 (x)dy + f3 (x)dz (11.6)
Γ Γ Γ
where we have used the orthonormality of the basis. Thus, a line integral over a vector
field along a curve Γ is the sum of the line integrals over the components of the field
along Γ .
Exercise Evaluate ∫_Γ f(x) · dx for f(x) = x2 y 2 î + y ĵ + zy k̂, where Γ is the arc of the
parabola y 2 = 4x in the xy plane from (0, 0, 0) to (4, 4, 0).
Solution Note that the curve is on the xy plane and z = 0 = dz along the curve.
We have,
∫_Γ f(x) · dx = ∫_Γ (x2 y 2 î + y ĵ) · (dxî + dy ĵ) = ∫_Γ x2 y 2 dx + ∫_Γ ydy
and, using y 2 = 4x on the curve so that x2 y 2 = 4x3 ,
∫_Γ x2 y 2 dx = ∫_0^4 4x3 dx = 256, ∫_0^4 ydy = 8.
Therefore,
∫_Γ f(x) · dx = 264.
(ii) Now let Γ be a smooth and simple closed curve oriented positively, that is,
counterclockwise. Let Γ1 , Γ2 , Γ3 be the projections of Γ on xy, yz and zx planes
respectively, all oriented positively. Thus, in Fig. 11.6, Γ is the oriented curve ABCA,
Γ1 is oriented as OABO, Γ2 as OBCO and Γ3 as OCAO. We have,
∫_{Γ1} f · dx + ∫_{Γ2} f · dx + ∫_{Γ3} f · dx = ∫_{AB,BO,OA} f · dx + ∫_{BC,CO,OB} f · dx + ∫_{CA,AO,OC} f · dx
= ∫_{AB} f · dx + ∫_{BC} f · dx + ∫_{CA} f · dx
= ∫_Γ f · dx (11.7)
because all integrals except those on the arcs of Γ cancel as each of them is traversed
twice in opposite directions (see Fig. 11.6). Equation (11.7) is always valid whenever
Γ is a simple closed curve.
Fig. 11.6 Line integral around a simple closed curve as the sum of the line integrals
over its projections on the coordinate planes
This gives,
dF (t ) = (dF/dt ) dt = (f1 (x(t )) dx/dt + f2 (x(t )) dy/dt + f3 (x(t )) dz/dt ) dt, (11.8)
where f1,2,3 (x(t )) are the components of f(x(t )) at the point P corresponding to t on the
curve joining P0 and P1 along which the line integral is evaluated. Thus, for any two points
P and P ′ on the curve of integration we can write, by elementary integration,
∫_P^{P ′} dF = F (P ′ ) − F (P ) = F (t ′ ) − F (t ), (11.9)
where t ′ and t are the parameter values corresponding to P ′ and P respectively. Here, we
assume that t ′ > t so that the sense of traversal from P to P ′ gives the orientation of the
curve of integration.
We emphasize that the differential dF (t ) is that of a scalar valued function of a single
scalar variable t. Function F (t ) depends on the parameterization x(t ) and hence on the
curve joining P0 and P1 along which the integration is carried out. Therefore, the value
of the integral essentially depends on the curve of integration. Equations (11.8), (11.9) are
completely general and every line integral can be expressed as in Eq. (11.9).
Taking a cue from the above observations we can define what is called a Linear
Differential Form at all points in the domain of the field f(x) (and not necessarily along
some curve) as
L = A(x)dx + B(x)dy + C (x)dz, (11.10)
where A(x), B(x), C (x) are the scalar valued functions giving components of f(x) in all
of its domain. This is called a Linear Differential Form because of its linear dependence
on the differentials dx, dy, dz while their coefficients are functions of x. The advantage of
introducing the differential form L is that along a curve parameterized say by t, it naturally
reduces to the differential of a scalar valued function. Thus, every line integral along a curve
joining two points say P0 and P has the form
F (P ) = ∫_{P0}^{P} L = ∫_{T0}^{t} (A(x(t )) dx/dt + B(x(t )) dy/dt + C (x(t )) dz/dt ) dt = F (t ) (11.11)
We will assume that the functions A(x), B(x), C (x) are C 1 , that is, they have continuous
first derivatives throughout the domain of the field f(x).
We are interested in finding out the class of fields, the value of whose line integral
depends only on the end points irrespective of the curve joining the end points used to
evaluate the integral. This happens when the field is conservative, that is, the field is the
gradient of a potential φ(x) so that
f(x) = ∇φ(x)
at all points at which f(x) is defined. Then using Eq. (10.27) we can write
∫_C f(x) · dx = ∫_{T1}^{T2} ∇φ(x(t )) · ẋ(t )dt = ∫_{T1}^{T2} (dφ/dt ) dt = φ(x(t ))|_{T1}^{T2} = φ(P1 ) − φ(P0 ).
Thus, if the field is the gradient of a potential, then its line integral depends only on the
values of the potential at the end points, independent of the curve joining the end points.
It turns out that the reverse implication is also true. That is, if the line integral of a vector
field over a smooth arc joining any two points in its domain depends only on the end
points, then the field must be conservative, that is, the gradient of some scalar field
φ(x). A vector field being the gradient of some scalar field is equivalent to its linear form
being a perfect differential, that is, there is a scalar valued function φ(x) satisfying
A(x)dx + B(x)dy + C (x)dz = (∂φ/∂x )dx + (∂φ/∂y )dy + (∂φ/∂z )dz. (11.12)
Note that we require this equation to be valid at every point in the domain of f and not
only at the points on some curve in the domain. The RHS of Eq. (11.12) is easily
recognized as the differential dφ of the scalar valued function φ. Now assume that the
line integral of f over some smooth oriented arc Γ depends only on the end points of Γ .
We want to show that there is a scalar function φ(x) defined on the domain of f such that
dφ = L where L = A(x)dx + B(x)dy + C (x)dz is the linear differential form giving the
integrand of the line integral. Without losing generality we can assume that any two points
in the domain can be connected by a smooth oriented arc. We fix a point P0 in the domain
and define the function φ(x) = φ(P ) at any point P as the value of the line integral over
any smooth oriented (from P0 to P ) curve joining P0 and P . To get the partial derivatives
of φ consider any point (x, y, z ) ≡ P and a smooth oriented curve, say Γ , joining P0 and P .
Since the domain is an open set, all points (x + ∆x, y, z ) = P ′ are in the domain, provided
|∆x| is sufficiently small. Let γ be the oriented straight line segment joining P and P ′ (see
Fig. 11.7). We can arrange, without losing generality, that the curve Γ + γ is a simple
oriented polygonal arc without any knots and overlaps, with initial point P0 and final point
P ′ . It follows, then, by Eq. (11.5) that
φ(x + ∆x, y, z ) − φ(x, y, z ) = φ(P ′ ) − φ(P ) = ∫_{Γ +γ} L − ∫_Γ L = ∫_γ L
= ∫_x^{x +∆x} A(t, y, z )dt = A(x, y, z )∆x. (11.13)
Dividing by ∆x and letting ∆x → 0, we get
∂φ/∂x = A,
and similarly, ∂φ/∂y = B and ∂φ/∂z = C. This shows that dφ = L as we wanted.
Exercise Show that a vector field is conservative if and only if its line integral over every
closed loop is zero.
We have proved that the conservative property of a vector field and dependence of its line
integral only on the end points of the curve of integration are equivalent. However, this
result is not of much practical value unless we find out some independent criteria to
determine whether a given vector field is conservative or not. Equivalently, we have to find
out whether a given differential form L is a perfect differential or not, that is, whether
there is a function φ(x) satisfying L = ∇φ · dx.
The necessary condition for a vector field to be conservative is that its curl vanishes
everywhere in its domain. Since the field is given to be conservative, we have,
∇ × f(x) = ∇ × (∇φ(x)) = 0 (11.14)
for all x in the domain of f(x) because we have shown before that the curl of a gradient is
always zero (see Eq. (10.90)). It is useful to state this necessary condition in terms of the
linear differential form which, for a conservative field, ought to be a perfect differential:
L = A(x)dx + B(x)dy + C (x)dz = ∇φ(x) · dx = (∂φ/∂x ) dx + (∂φ/∂y ) dy + (∂φ/∂z ) dz,
which means,
A = ∂φ/∂x, B = ∂φ/∂y, C = ∂φ/∂z.
Suitably differentiating both sides of these equations and assuming that the order of
differentiation does not matter, we get the following necessary conditions for a vector field
to be conservative.
Bz − Cy = ∂B/∂z − ∂C/∂y = 0, Cx − Az = ∂C/∂x − ∂A/∂z = 0, Ay − Bx = ∂A/∂y − ∂B/∂x = 0. (11.15)
Consider, as an example, the differential form
L = (x dy − y dx)/(x2 + y 2 ),
whose coefficients are defined except for points on the z-axis (x = y = 0). Thus, the domain of
definition of this differential form, or the corresponding field, is all space except the z-axis.
We show below that this differential form satisfies Eq. (11.15) and is a perfect differential
but there exists a class of simple closed curves in its domain such that the integral of this
differential form around such a curve does not vanish. In order to see that this is a perfect
differential, we introduce the polar angle θ of a point P (x, y, z ) by
cos θ = x/√(x2 + y 2 ), sin θ = y/√(x2 + y 2 ),
that is, the angle formed with the x, z-plane by the plane through P and passing through
the z-axis. Then,
dθ = d tan−1 (y/x ) = L,
so that L is represented as the total differential of the function u = θ. Nevertheless, for a
closed curve C winding around the z-axis we get
∮_C L = ∮_C dθ = 2πn ≠ 0,
where n is the number of times the closed curve of integration winds around the z axis:
each winding adds 2π on the RHS of the above equation (see Fig. 11.8).
Fig. 11.8 Each winding of the curve of integration around the z axis adds 2π to its
value
Therefore, the value of ∫_{P0}^{P} dθ taken for two different paths with end points P0 , P is the same
only if going along one path from P0 to P and returning along the other path to P0 we go
zero times around the z-axis. We can avoid any path going around the z-axis by avoiding
all paths crossing the half plane y = 0, x ≤ 0, that is, we remove this half plane from the
region R over which the field is defined. To every point on the allowed path we can assign
a unique value of θ with −π < θ < π. Therefore, the integral ∫_{P0}^{P} dθ has a unique value
θ (P ) − θ (P0 ), which does not depend on a particular path. Similarly, the integral over a
closed path in this region has value zero.
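The winding behaviour can also be seen numerically: integrating L = (x dy − y dx)/(x2 + y 2 ) around a unit circle traversed n times gives 2πn. A sketch (trapezoidal rule; the sample count is arbitrary):

```python
import numpy as np

def L_integral(n_windings, samples=200001):
    """Integrate (x dy - y dx)/(x^2 + y^2) around the unit circle n_windings times."""
    t = np.linspace(0.0, 2*np.pi*n_windings, samples)
    xc, yc = np.cos(t), np.sin(t)
    integrand = (xc*np.cos(t) - yc*(-np.sin(t))) / (xc**2 + yc**2)   # = 1 on this curve
    dt = t[1] - t[0]
    return dt*(0.5*integrand[0] + integrand[1:-1].sum() + 0.5*integrand[-1])

print(L_integral(1)/np.pi, L_integral(3)/np.pi)   # ≈ 2.0 and 6.0
```

Each extra winding around the z-axis adds 2π, exactly as in Fig. 11.8.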
A set R is called convex if, given any two points P0 and P1 in R, the straight line segment
joining them, (1 − t )P0 + t P1 , lies entirely in R for 0 ≤ t ≤ 1.
Examples of convex sets are solid spheres or cubes. Examples of connected but not
convex sets are solid torus, a spherical shell (space between two concentric spheres) and
the outside of a sphere or cylinder. A set R which is not connected consists of connected
subsets called the components of R. Examples of disconnected sets are the set of points
not belonging to a spherical shell, or a set of points none of whose coordinates are
integers.
Now let C0 and C1 be any two paths in R, given by (x0 (t ), y0 (t ), z0 (t )) and
(x1 (t ), y1 (t ), z1 (t )) respectively. Let their end points P ′ and P ″ , corresponding to t = 0
and t = 1 respectively, be the same. The connected set R is simply connected, if we can
deform C0 into C1 by means of a continuous family of paths Cλ with common end points
P ′ , P ″ . This means that there exist continuous functions (x (t, λ), y (t, λ), z (t, λ)) of the
two variables t, λ for 0 ≤ t ≤ 1, 0 ≤ λ ≤ 1 such that the point P (t, λ) = (x (t, λ),
y (t, λ), z (t, λ)) always lies in R and such that P (t, λ = 0) coincides with P (t ) = (x0 (t ),
y0 (t ), z0 (t )), P (t, λ = 1) coincides with P (t ) = (x1 (t ), y1 (t ), z1 (t )), P (t = 0, λ)
coincides with P ′ and P (t = 1, λ) coincides with P ″ . For each fixed λ the functions
(x (t, λ), y (t, λ), z (t, λ)) determine a path Cλ in R that joins the end points P ′ and P ″ .
As λ varies from 0 to 1, the path Cλ changes continuously from C0 to C1 . This defines the
“continuous deformation” of C0 into C1 (see Fig. 11.9).
As can be easily seen, convex sets are simply connected. The family of curves Cλ
continuously deforming C0 to C1 , all curves with common end points P ′ , P ″ , is given by
x (t, λ) = (1 − λ)x0 (t ) + λx1 (t ), y (t, λ) = (1 − λ)y0 (t ) + λy1 (t ), z (t, λ) = (1 − λ)z0 (t ) + λz1 (t ).
Thus, Cλ is obtained by joining the points of C0 and C1 that belong to the same t by a line
segment and taking the point that divides the segment in the ratio λ/(1 − λ). The points obtained
in this way all lie in R because of its convexity. A different type of simply connected set is
given by a spherical shell. A region R in space obtained after removing the z-axis is not
simply connected because two semicircular paths joining (1, 0, 0) and (−1, 0, 0), one
through y > 0 and the other through y < 0, have the same end points but cannot be
deformed into each other without crossing the z-axis.
We shall now prove the following theorem:
If the coefficients of the differential form L = A(x)dx + B(x)dy + C (x)dz corresponding
to the field f have continuous first derivatives in a simply connected domain R and satisfy
conditions Eq. (11.15), namely,
Bz − Cy = 0, Cx − Az = 0, Ay − Bx = 0,
then L is a perfect differential; that is, there is a function φ(x) with
A = φx , B = φy , C = φz .
It is enough to prove that ∫_{P ′}^{P ″} L over any simple polygonal arc joining P ′ and P ″ has a value
that depends only on P ′ and P ″ . We represent two oriented arcs C0 and C1 parametrically
where (x, y, z ) are the functions of t, λ forming the continuous family of paths. We assume
that these functions have continuous first and mixed second derivatives with respect to t
and λ for 0 ≤ t ≤ 1 and 0 ≤ λ ≤ 1. Then by elementary integration,
∫_{C1} L − ∫_{C0} L = ∫_0^1 dt ∫_0^1 ∂(Axt + Byt + Czt )/∂λ dλ.
Now, using the chain rule and the conditions Eq. (11.15) we get the identity
∂(Axt + Byt + Czt )/∂λ = ∂(Axλ + Byλ + Czλ )/∂t,
so that, interchanging the order of integration, the inner integral becomes
∫_0^1 ∂(Axλ + Byλ + Czλ )/∂t dt = [Axλ + Byλ + Czλ ]_{t=0}^{t=1} = 0,
since xλ , yλ , zλ vanish for t = 0, 1 because the end points are independent of λ. This
completes the proof.
We see the important part played by the assumption that the region R is simply
connected: It enables us to convert the difference of the line integrals into a double integral
over some intermediate region. The above proof can be extended to the case where the
intermediate paths are continuous but may not be differentiable with respect to λ and also
to the case where C0 and C1 are only sectionally smooth, that is, polygonal arcs.
Exercise Find out whether the field f(x) = exy î + ex+y ĵ is conservative.
Solution The coefficients in the linear form are A(x, y ) = e^{xy} and B(x, y ) = e^{x+y} and
conditions Eq. (11.15) reduce to ∂A/∂y = ∂B/∂x. Evaluating both sides, x e^{xy} and e^{x+y}
are not equal. Hence, the field is not conservative.
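The comparison in this solution is immediate with sympy (a one-off check, not part of the text):

```python
import sympy as sp

x, y = sp.symbols('x y')
A, B = sp.exp(x*y), sp.exp(x + y)
dA_dy, dB_dx = sp.diff(A, y), sp.diff(B, x)
print(dA_dy, dB_dx)                         # x*exp(x*y) and exp(x + y)
print(sp.simplify(dA_dy - dB_dx) == 0)      # False: the field is not conservative
```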
Exercise Find whether the following fields are conservative.
Solution For the field f(x) = 2xy î + (x² + 3y²) ĵ, integrating A = 2xy with respect to x gives φ = x²y + u(y), and the requirement ∂φ/∂y = B gives

$$\frac{\partial}{\partial y}\left(x^2 y + u(y)\right) = x^2 + 3y^2,$$

so that u′(y) = 3y² and u(y) = y³. Hence

$$\varphi(x, y) = x^2 y + y^3.$$
Exercise Show that f(x) = (sin y + z)î + (x cos y − z)ĵ + (x − y)k̂ is conservative and find the function φ such that f(x) = ∇φ.
Solution We check that ∇ × f = 0 so that this field is conservative.
To find a potential φ we equate the components of f = ∇φ. We get

(i) $f_x = \dfrac{\partial\varphi}{\partial x} = \sin y + z$,

(ii) $f_y = \dfrac{\partial\varphi}{\partial y} = x\cos y - z$,

(iii) $f_z = \dfrac{\partial\varphi}{\partial z} = x - y$.
Integrating f_x, f_y, f_z with respect to x, y, z respectively, we obtain

(iv) φ = x sin y + xz + f(y, z),

(v) φ = x sin y − yz + g(x, z),

(vi) φ = xz − yz + h(x, y).
Since the derivatives are partial derivatives, the “constants” of integration are functions of
variables which are not integrated over. Note that (iv),(v),(vi) each represent φ. Therefore,
f (y, z ) must occur in (v). The only possibility is to identify f (y, z ) with −yz plus some
function of z but not involving x. By (vi) we see that f (y, z ) must simply be −yz + C
where C is a constant. Thus,
φ = x sin y + xz − yz + C.
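The result can be verified numerically; the following sketch compares a central-difference gradient of the potential found above with the given field at an arbitrary point:

```python
import math

def phi(x, y, z):
    # The potential found above (the constant C is dropped)
    return x * math.sin(y) + x * z - y * z

def grad(f, p, h=1e-6):
    """Central-difference gradient of a scalar function of three variables."""
    x, y, z = p
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2 * h))

def f_field(x, y, z):
    return (math.sin(y) + z, x * math.cos(y) - z, x - y)

p = (1.2, 0.7, -0.4)
print(all(abs(a - b) < 1e-5 for a, b in zip(grad(phi, p), f_field(*p))))  # True
```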
Exercise Let f(x) = x² î + xy ĵ and let the path C consist of the segment of the parabola y = x² between (0, 0) and (1, 1) (C₁) and the line segment from (1, 1) to (0, 0) (C₂) (see Fig. 11.10). Find $\int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x}$.
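A numerical sketch of this line integral (the midpoint-rule discretization and point count are our choices); the analytic values are $\int_{C_1} = 11/15$ and $\int_{C_2} = -2/3$, so the total is 1/15:

```python
def line_integral(path, f, n=20000):
    """Approximate the line integral of f along path(t), 0 <= t <= 1 (midpoint rule)."""
    total = 0.0
    for k in range(n):
        t0, t1 = k / n, (k + 1) / n
        xm, ym = path(0.5 * (t0 + t1))
        x0, y0 = path(t0)
        x1, y1 = path(t1)
        fx, fy = f(xm, ym)
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

f = lambda x, y: (x * x, x * y)
C1 = lambda t: (t, t * t)         # parabola y = x^2 from (0, 0) to (1, 1)
C2 = lambda t: (1 - t, 1 - t)     # line segment from (1, 1) back to (0, 0)

I = line_integral(C1, f) + line_integral(C2, f)
print(round(I, 4))                # 0.0667, i.e., 11/15 - 2/3 = 1/15
```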
where the integration is taken in the positive (counterclockwise) sense. Note that, in
general, this integral depends on the direction n̂ because if we change n̂ (by rotating the
plane say) the integrand and hence the integral will change. The limit Γ → P requires that
every point of Γ approaches P . If this limit exists, then Gn is independent of Γ . As we show
below, if Γ is a planar curve and f(x) has Taylor series expansion around P , then the limit
exists and is independent of Γ .
We choose the origin at P and let a point on Γ have position vector x relative to the
origin at P. We expand f(x) around P, that is, around 0. We get

$$\mathbf{f}(\mathbf{x}) = \mathbf{f}(0) + (\mathbf{x}\cdot\nabla)\mathbf{f} + \mathbf{R},$$

where R is the remainder containing all the second and higher order terms (x · ∇)², … (see Eq. (10.22)). We set up a rectangular Cartesian coordinate system (ξ, η, ζ) with its origin
at P such that (ξ, η ) plane contains Γ (see Fig. 11.11). The vector x has the components
(ξ, η, ζ) and let the components of f(x) be f_ξ(x), f_η(x), f_ζ(x). This gives, for each component,

$$f_\xi(\mathbf{x}) = f_\xi(0) + \xi\frac{\partial f_\xi}{\partial\xi} + \eta\frac{\partial f_\xi}{\partial\eta} + \zeta\frac{\partial f_\xi}{\partial\zeta} + \cdots,$$

where all the partial derivatives are evaluated at the origin. Along Γ, dx ≡ (dξ, dη, 0), so that

$$(\mathbf{x}\cdot\nabla\mathbf{f})\cdot d\mathbf{x} = \left(\xi\frac{\partial f_\xi}{\partial\xi} + \eta\frac{\partial f_\xi}{\partial\eta}\right)d\xi + \left(\xi\frac{\partial f_\eta}{\partial\xi} + \eta\frac{\partial f_\eta}{\partial\eta}\right)d\eta$$

and

$$\mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \mathbf{f}(0)\cdot d\mathbf{x} + \left(\xi\frac{\partial f_\xi}{\partial\xi} + \eta\frac{\partial f_\xi}{\partial\eta}\right)d\xi + \left(\xi\frac{\partial f_\eta}{\partial\xi} + \eta\frac{\partial f_\eta}{\partial\eta}\right)d\eta + \mathbf{R}\cdot d\mathbf{x}.$$
Integrating along Γ,

$$\int_\Gamma \mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \int_\Gamma \mathbf{f}(0)\cdot d\mathbf{x} + \frac{\partial f_\xi}{\partial\xi}\int_\Gamma \xi\,d\xi + \frac{\partial f_\xi}{\partial\eta}\int_\Gamma \eta\,d\xi + \frac{\partial f_\eta}{\partial\xi}\int_\Gamma \xi\,d\eta + \frac{\partial f_\eta}{\partial\eta}\int_\Gamma \eta\,d\eta + \int_\Gamma \mathbf{R}\cdot d\mathbf{x}. \quad (11.18)$$
Since Γ is closed,

$$\int_\Gamma \xi\,d\xi = 0 = \int_\Gamma \eta\,d\eta$$

and

$$\int_\Gamma \xi\,d\eta = -\int_\Gamma \eta\,d\xi = S,$$

the area enclosed by Γ. Also, $\int_\Gamma \mathbf{f}(0)\cdot d\mathbf{x} = \mathbf{f}(0)\cdot\int_\Gamma d\mathbf{x} = 0$. Dividing by S,

$$\frac{1}{S}\int_\Gamma \mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \left(\frac{\partial f_\eta}{\partial\xi} - \frac{\partial f_\xi}{\partial\eta}\right) + \frac{1}{S}\int_\Gamma \mathbf{R}\cdot d\mathbf{x}. \quad (11.19)$$
In the last term the integral is of the order of |x|3 as R is of the order of |x|2 . Therefore, the
last term is of the order of |x| and vanishes in the limit Γ → P or |x| → 0. Therefore,
$$G_n = \lim_{\Gamma\to P}\frac{1}{S}\int_\Gamma \mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \frac{\partial f_\eta}{\partial\xi} - \frac{\partial f_\xi}{\partial\eta}.$$
Projecting the area S onto the coordinate planes,

$$S_1 = S\,\hat{\mathbf{i}}\cdot\hat{\mathbf{n}}, \quad S_2 = S\,\hat{\mathbf{j}}\cdot\hat{\mathbf{n}}, \quad S_3 = S\,\hat{\mathbf{k}}\cdot\hat{\mathbf{n}},$$

and we can write

$$G_n = \mathbf{G}_0\cdot\hat{\mathbf{n}},$$

where the components of G₀ are

$$G_0^i = \lim_{\Gamma_i\to P}\frac{1}{S_i}\int_{\Gamma_i}\mathbf{f}(\mathbf{x})\cdot d\mathbf{x}, \quad i = 1, 2, 3,$$

and

$$G_0^1 = \lim_{\Gamma_1\to P}\frac{1}{S_1}\int_{\Gamma_1}\mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \frac{\partial f_z}{\partial y} - \frac{\partial f_y}{\partial z},$$

$$G_0^2 = \lim_{\Gamma_2\to P}\frac{1}{S_2}\int_{\Gamma_2}\mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \frac{\partial f_x}{\partial z} - \frac{\partial f_z}{\partial x},$$

$$G_0^3 = \lim_{\Gamma_3\to P}\frac{1}{S_3}\int_{\Gamma_3}\mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \frac{\partial f_y}{\partial x} - \frac{\partial f_x}{\partial y}.$$
We immediately identify G0 with curl f or ∇ × f. Thus, the curl of a vector field that can be
Taylor expanded around a point P can be approximated by its line integral around a simple
closed curve Γ surrounding the point P . The approximation gets better as the size of Γ gets
smaller but the quantitative estimate of the error will involve the field.
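The limiting statement can be illustrated numerically. The sketch below (the field and the radii are our choices) takes f = e^x ĵ in the plane, so (∇ × f) · k̂ = e^x equals 1 at P = (0, 0), and shows the circulation-to-area ratio approaching this value as Γ shrinks:

```python
import math

def f(x, y):
    # Planar field with (curl f).k = d(f_y)/dx - d(f_x)/dy = e^x, equal to 1 at P = (0, 0)
    return (0.0, math.exp(x))

def circulation_over_area(rho, n=20000):
    """(1/S) times the closed line integral of f around a circle of radius rho about P."""
    total = 0.0
    for k in range(n):
        t0 = 2 * math.pi * k / n
        t1 = 2 * math.pi * (k + 1) / n
        tm = 0.5 * (t0 + t1)
        fx, fy = f(rho * math.cos(tm), rho * math.sin(tm))
        total += fx * rho * (math.cos(t1) - math.cos(t0))
        total += fy * rho * (math.sin(t1) - math.sin(t0))
    return total / (math.pi * rho ** 2)

for rho in (1.0, 0.1, 0.01):
    print(rho, circulation_over_area(rho))   # ratio tends to 1 as the curve shrinks
```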
$$\mathbf{g}_1(\mathbf{x}, t) = -G m_1 \frac{\mathbf{x} - \mathbf{x}_1(t)}{|\mathbf{x} - \mathbf{x}_1(t)|^3}. \quad (11.20)$$
The particle at x1 = x1 (t ) is called the source of the field and the mass m1 is the source
strength. The field g1 is a map (actually a one parameter family) assigning a definite vector
g1 (x, t ) to every point x in space, at a given instant of time. Note that the time dependence
is solely due to the motion of the source.
If a particle of mass m is placed at a point x in the gravitational field g₁, we say that the field exerts a force f = m g₁(x, t) on it. This field can be derived from a potential, g₁(x, t) = −∇ₓ φ₁(x, t) with φ₁(x, t) = −G m₁/|x − x₁(t)|, where ∇ₓ is the derivative (gradient) with respect to the field variable x. Henceforth, we leave the suffix x to be understood. The gravitational potential energy of a particle with mass m at x is given by
$$V_1(\mathbf{x}, t) = m\varphi_1(\mathbf{x}, t) = \frac{-G m m_1}{|\mathbf{x} - \mathbf{x}_1(t)|}. \quad (11.24)$$
It is important to clearly distinguish between potential and potential energy. The latter is the shared energy of two interacting objects, while the former is a characteristic of a single object, namely its source.
The gravitational field g(x, t ) of a N particle system is given by the superposition of
fields
$$\mathbf{g}(\mathbf{x}, t) = \sum_{k=1}^{N}\mathbf{g}_k(\mathbf{x}, t) = -G\sum_{k=1}^{N} m_k \frac{\mathbf{x} - \mathbf{x}_k(t)}{|\mathbf{x} - \mathbf{x}_k(t)|^3}. \quad (11.25)$$
A particle of mass m at x experiences the force f = m g(x, t) due to the field in Eq. (11.25), which is consistent with the law of superposition of forces. This field can be derived from a potential,

$$\mathbf{g}(\mathbf{x}, t) = -\nabla\varphi(\mathbf{x}, t), \quad (11.27)$$

where
$$\varphi(\mathbf{x}, t) = \sum_k \varphi_k(\mathbf{x}, t) = -G\sum_k \frac{m_k}{|\mathbf{x} - \mathbf{x}_k(t)|}. \quad (11.28)$$
Note that this does not include the potential energy of interaction between the particles
producing the field. The internal energy can be ignored as long as we are concerned only
with the influence of the system on external objects.
The gravitational field of a continuous body is obtained from that of a system of N
particles via the following limiting process. We divide the body into small parts which can
be regarded as particulate and in the limit of infinitely small subdivision the sum in
Eq. (11.25) becomes the integral,
$$\mathbf{g}(\mathbf{x}, t) = -G\int \frac{\mathbf{x} - \mathbf{x}'(t)}{|\mathbf{x} - \mathbf{x}'(t)|^3}\, dm' \quad (11.30)$$
where dm′ = dm(x′, t) is the mass given by the differential of the mass distribution m(x′, t), supposed to be known. In other words, this is the mass of a small enough corpuscle at point x′ at time t. A similar limiting process for Eq. (11.28) gives us the gravitational potential of a continuous body
$$\varphi(\mathbf{x}, t) = -G\int \frac{dm'}{|\mathbf{x} - \mathbf{x}'(t)|}. \quad (11.31)$$
|x − x0 (t )|
Henceforth we shall not write the time dependence explicitly.
Equation (11.27) applies with φ(x, t ) given by Eq. (11.31) so that we find the field g by
differentiating Eq. (11.31).
For a spherically symmetric mass distribution, the integral in Eq. (11.31) can be easily evaluated. We place the origin at the body's centre of mass and denote the position vectors with respect to the centre of mass by r and r′ instead of x and x′, which we use in the case of an external inertial frame (see Fig. 11.12).
A spherically symmetric mass density is a function of radial distance alone. Thus,
$$dm' = \rho(r')\, r'^2\, dr'\, d\Omega \quad (11.32)$$
so that

$$\varphi(\mathbf{r}) = -G\int \frac{dm'}{|\mathbf{r} - \mathbf{r}'|} = -G\int \rho(r')\, r'^2\, dr' \int \frac{d\Omega}{|\mathbf{r} - \mathbf{r}'|}.$$
For r > r′ (field point external to the body), we can easily evaluate the integral

$$\int \frac{d\Omega}{|\mathbf{r} - \mathbf{r}'|} = 2\pi\int_0^\pi \frac{\sin\theta\, d\theta}{\left[r^2 + r'^2 - 2 r r'\cos\theta\right]^{1/2}} = \frac{4\pi}{r} \quad (11.33)$$
and the remaining integral simply gives the total mass of the body

$$M = \int dm' = 4\pi\int_0^R \rho(r')\, r'^2\, dr'.$$
Therefore,

$$\varphi(\mathbf{r}) = -G\int \frac{dm'}{|\mathbf{r} - \mathbf{r}'|} = -\frac{GM}{r}. \quad (11.34)$$
This is just the potential of a point particle with mass M (the mass of the body) placed at the centre of mass of the spherically symmetric body. Obviously, the gravitational field of a spherically symmetric body (g = −∇φ) is also the same as that of a particle with mass M placed at its centre. Since many celestial bodies are nearly spherically symmetric, this is
an excellent first approximation to their gravitational fields. Indeed, in many cases it is
sufficient to apply Eq. (11.34).
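A numerical sketch of this point-mass equivalence for a uniform sphere (the units G = ρ = R = 1 are conventional choices of ours, not from the text):

```python
import math

G, rho, R = 1.0, 1.0, 1.0            # units chosen for convenience
M = 4 * math.pi * rho * R ** 3 / 3   # total mass of the uniform sphere

def potential(r, nr=200, nth=400):
    """phi(r) = -G * integral of dm' / |r - r'| over a uniform sphere (midpoint rule)."""
    total = 0.0
    for i in range(nr):
        rp = (i + 0.5) * R / nr
        for j in range(nth):
            th = (j + 0.5) * math.pi / nth
            dist = math.sqrt(r * r + rp * rp - 2 * r * rp * math.cos(th))
            total += rho * rp * rp * math.sin(th) / dist
    return -G * 2 * math.pi * total * (R / nr) * (math.pi / nth)

r = 2.0
print(potential(r), -G * M / r)      # both close to -2.0944: point-mass equivalence
```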
To get a more accurate description of gravitational fields produced by non-spherical
bodies, we employ perturbation methods which enable us to systematically evaluate the
effects of deviations from spherical symmetry. The basic idea is to expand the potential of a
given body in Taylor series about its centre of mass. Obviously, we need a series expansion
for the scalar valued function 1/|r − r′|. For r > r′ we have the following well known result, which we derive at the end.
$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r}\left(1 + \sum_{n=1}^{\infty}\left(\frac{r'}{r}\right)^n P_n(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}')\right), \quad (11.35)$$
where the Pₙ are the Legendre polynomials; the first few are

$$P_1(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}') = \hat{\mathbf{r}}\cdot\hat{\mathbf{r}}',$$

$$P_2(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}') = \frac{1}{2}\left(3(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}')^2 - 1\right),$$

$$P_3(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}') = \frac{1}{2}\left(5(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}')^3 - 3(\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}')\right). \quad (11.36)$$
A variant of Eq. (11.35) is

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r}\left(1 + \sum_{n=1}^{\infty} r^{-2n} P_n(\mathbf{r}\cdot\mathbf{r}')\right), \quad (11.37)$$

where
$$P_1(\mathbf{r}\cdot\mathbf{r}') = \mathbf{r}\cdot\mathbf{r}',$$

$$P_2(\mathbf{r}\cdot\mathbf{r}') = \frac{1}{2}\left(3(\mathbf{r}\cdot\mathbf{r}')^2 - r^2 r'^2\right),$$

$$P_3(\mathbf{r}\cdot\mathbf{r}') = \frac{1}{2}\left(5(\mathbf{r}\cdot\mathbf{r}')^3 - 3 r^2 r'^2\,(\mathbf{r}\cdot\mathbf{r}')\right). \quad (11.38)$$
It is clear from Eq. (11.35) that the magnitude of the nth term in the expansion is of the order of (r′/r)ⁿ, so the series converges rapidly at a distance r which is large compared to the dimensions of the body. Series (11.37) gives a series for the potential
$$\varphi(\mathbf{r}) = -\frac{G}{r}\left\{M + \frac{1}{r^2}\int P_1(\mathbf{r}\cdot\mathbf{r}')\, dm' + \frac{1}{r^4}\int P_2(\mathbf{r}\cdot\mathbf{r}')\, dm' + \cdots\right\}.$$
By Eq. (11.38),

$$\int P_1(\mathbf{r}\cdot\mathbf{r}')\, dm' = \mathbf{r}\cdot\left[\int \mathbf{r}'\, dm'\right] = \mathbf{r}\cdot\mathbf{0} = 0.$$

Here, ∫ r′ dm′ gives the position vector of the centre of mass, which vanishes because the centre of mass is at the origin.
It is convenient to express the next term in the expansion (involving P₂(r · r′)) in terms of the inertia operator I : R³ → R³, or the moment of inertia tensor of the body, defined by (remember that I r ∈ R³ is a vector)
$$I\mathbf{r} = \int dm'\, \mathbf{r}'\times(\mathbf{r}\times\mathbf{r}') = \int dm'\left(r'^2\,\mathbf{r} - (\mathbf{r}'\cdot\mathbf{r})\,\mathbf{r}'\right). \quad (11.39)$$
Taking the trace of Eq. (11.39),

$$\mathrm{Tr}\, I = 2\int dm'\, r'^2 = I_1 + I_2 + I_3,$$

where I₁, I₂, I₃ are the principal moments of inertia, which are the eigenvalues of the inertia operator. The last equality follows because the trace is seen to be independent of the basis used to compute it.
Therefore,
$$\int P_2(\mathbf{r}\cdot\mathbf{r}')\, dm' = \frac{1}{2}\int dm'\left(3(\mathbf{r}\cdot\mathbf{r}')^2 - r^2 r'^2\right) = \frac{1}{2}\left[r^2\,\mathrm{Tr}\, I - 3\mathbf{r}\cdot I\mathbf{r}\right] = \frac{1}{2}\mathbf{r}\cdot Q\mathbf{r}, \quad (11.41)$$

where

$$Q\mathbf{r} = \mathbf{r}\,\mathrm{Tr}\, I - 3 I\mathbf{r}. \quad (11.42)$$
This gives

$$\varphi(\mathbf{r}) = -\frac{G}{r}\left[M + \frac{1}{2r^2}\,\hat{\mathbf{r}}\cdot Q\hat{\mathbf{r}} + \cdots\right]. \quad (11.43)$$
Following the well known terminology from electromagnetic theory, we call Q the
gravitational quadrupole tensor. (Again, remember that LHS of Eq. (11.42) is a vector
in R3 .)
This is called a harmonic or multipole expansion of the potential. The quadrupole term
describes the first non-zero correction to the potential of a spherically symmetric body.
The gravitational field (g = −∇φ) can be obtained from Eq. (11.43) with the help of

$$\nabla\left(\frac{1}{2}\mathbf{r}\cdot Q\mathbf{r}\right) = Q\mathbf{r}, \qquad \nabla r^n = n r^{n-1}\hat{\mathbf{r}}.$$
Thus,

$$\mathbf{g}(\mathbf{r}) = -\frac{G}{r^2}\left\{M\hat{\mathbf{r}} - \frac{1}{r^2}\left[Q\hat{\mathbf{r}} - \frac{5}{2}(\hat{\mathbf{r}}\cdot Q\hat{\mathbf{r}})\,\hat{\mathbf{r}}\right] + \cdots\right\}. \quad (11.44)$$
For an axially symmetric body the inertia operator reduces to

$$I\mathbf{r} = I_1\mathbf{r} + (I_3 - I_1)(\mathbf{r}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}}, \quad (11.45)$$

where I₁ = I₂ is the moment of inertia about any axis in the plane normal to the symmetry axis and passing through the centre of mass, called the equatorial moment of inertia, I₃ is the moment of inertia about the symmetry axis, or the so called polar moment of inertia, and û is the direction of the symmetry axis.
Exercise Prove Eq. (11.45).
Solution Let {σ̂ 1 , σ̂ 2 , û} be the eigenbasis of the inertia operator of the axially symmetric
body and let r = r1 σ̂ 1 + r2 σ̂ 2 + r3 û be a position vector. Due to symmetry about the axis
given by û, the eigenvalues corresponding to {σ̂ 1 , σ̂ 2 } must be equal, giving the eigenvalues
to be I₁, I₁, I₃. We get

$$I\mathbf{r} = I_1(r_1\hat{\boldsymbol{\sigma}}_1 + r_2\hat{\boldsymbol{\sigma}}_2) + I_3 r_3\hat{\mathbf{u}} = I_1\mathbf{r} + I_3 r_3\hat{\mathbf{u}} - I_1 r_3\hat{\mathbf{u}} = I_1\mathbf{r} + (I_3 - I_1)(\mathbf{r}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}},$$

since r₃ = r · û, which is Eq. (11.45).
From Eq. (11.44), then, the gravitational field of an axially symmetric body is

$$\mathbf{g}(\mathbf{r}) = -\frac{MG}{r^2}\left\{\hat{\mathbf{r}} + \frac{3}{2} J_2\left(\frac{R}{r}\right)^2\left[\left(1 - 5(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right)\hat{\mathbf{r}} + 2(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}}\right] + \cdots\right\}. \quad (11.47)$$
where the Jₙ are dimensionless constant coefficients. As stated above, J₂ measures the oblateness of the body and is related to the moments of inertia via Eq. (11.48). The constant J₃ measures the extent to which the body is “pear-shaped” (i.e., the southern hemisphere fatter than the northern hemisphere). The advantage of Eq. (11.49) is that it can be immediately written down once the axial symmetry is assumed, and the constants Jₙ can be determined empirically, in particular by fitting Eq. (11.49) to data on orbiting satellites. For the earth, J₂ ≈ 1.08 × 10⁻³ and J₃ ≈ −2.5 × 10⁻⁶.
Using the identity

$$\hat{\mathbf{u}}\cdot\nabla\hat{\mathbf{r}} = \frac{\hat{\mathbf{u}} - (\hat{\mathbf{u}}\cdot\hat{\mathbf{r}})\,\hat{\mathbf{r}}}{r}, \quad (11.50)$$
we can differentiate the term (n = 3) in Eq. (11.49) to get its contribution to the
gravitational field as
$$\mathbf{g}_3(\mathbf{r}) = -\frac{GM}{r^2}\,\frac{5}{2} J_3\left(\frac{R}{r}\right)^3\left\{\left[3(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}}) - 7(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^3\right]\hat{\mathbf{r}} + \left[3(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2 - \frac{3}{5}\right]\hat{\mathbf{u}}\right\}. \quad (11.51)$$

The contribution of the term with n = 2 has already been obtained in Eq. (11.47).
Differentiating, in this way, term by term in Eq. (11.49), we can express the gravitational field of an axially symmetric body as

$$\mathbf{g}(\mathbf{r}) = -\frac{GM}{r^3}\,\mathbf{r} + \sum_{n=2}^{\infty}\mathbf{g}_n(\mathbf{r}). \quad (11.52)$$
Finally, we establish Eq. (11.35). Using the law of cosines (see Fig. 11.13),

$$|\mathbf{r} - \mathbf{r}'|^2 = r^2\left(1 + \left(\frac{r'}{r}\right)^2 - 2\,\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}'\left(\frac{r'}{r}\right)\right)$$

or

$$|\mathbf{r} - \mathbf{r}'| = r\sqrt{1 + \epsilon},$$

where

$$\epsilon = \left(\frac{r'}{r}\right)\left(\frac{r'}{r} - 2\,\hat{\mathbf{r}}\cdot\hat{\mathbf{r}}'\right). \quad (11.53)$$
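Equation (11.35) can also be verified numerically; this sketch builds Pₙ by the Bonnet recursion (our implementation choice) and compares the truncated series with 1/|r − r′| for r > r′:

```python
import math

def legendre_P(n, x):
    """Legendre polynomial via the Bonnet recursion:
    (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def series(r, rp, cos_g, nmax=30):
    """Right-hand side of Eq. (11.35), truncated at nmax."""
    s = 1.0
    for n in range(1, nmax + 1):
        s += (rp / r) ** n * legendre_P(n, cos_g)
    return s / r

r, rp, cos_g = 2.0, 0.7, 0.3
exact = 1.0 / math.sqrt(r * r + rp * rp - 2 * r * rp * cos_g)
print(exact, series(r, rp, cos_g))   # the two values agree to many digits
```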
Exercise Develop the multipole expansion for the electrostatic potential at r due to an arbitrary localized charge distribution, in powers of 1/r. This is analogous to the above development of the multipole expansion for the gravitational potential of an arbitrary localized mass distribution. Give the geometric interpretation of the terms proportional to 1/r², 1/r³, 1/r⁴. Compare these two cases. (Consult ref [9].)
We shall now obtain the equation to the surface of the earth by assuming it to be an
equipotential for the effective gravitational potential
$$\Phi(\mathbf{r}) = V(\mathbf{r}) - \frac{1}{2}(\boldsymbol{\Omega}\times\mathbf{r})^2, \quad (11.55)$$
where V (r) is the true gravitational potential at the earth’s surface and the last term is the
centrifugal potential. We do this by expressing Φ(r) in terms of the ellipticity parameter ε for the earth, given by

$$\epsilon = \frac{a - c}{c},$$
a, c being the equatorial and polar radii of the earth respectively. We show that the resulting
shape of the earth is an approximately oblate spheroid. We differentiate the geopotential Φ(r) to express the equatorial and polar gravitational accelerations, g_e and g_p respectively, in terms of the ellipticity parameter ε. We use the observed values of g_e and g_p, namely,
g_e = 978.039 cm/sec²,
g_p = 983.217 cm/sec².

Since the earth behaved like a fluid in its formative stage, the effective gravitational acceleration

$$\mathbf{g} = -\nabla\Phi \quad (11.56)$$

must be normal to the surface. If it had a tangential component, it would make the fluid flow on the surface. This means that the surface of the earth is an equipotential surface defined by
Φ (r) = Φ0 , (11.57)
Fig. 11.14 Earth’s rotation affected its shape in its formative stage
Due to axial symmetry in the problem, earth’s gravitational potential V can be described by
the Legendre expansion Eq. (11.49). Therefore, to the second order, earth’s shape is given
explicitly by the equation
$$\Phi(\mathbf{r}) = -\frac{GM_\oplus}{r}\left\{1 - \frac{1}{2} J_2\left(\frac{a}{r}\right)^2\left[3(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2 - 1\right]\right\} - \frac{1}{2}\Omega^2 r^2\left[1 - (\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right] = \Phi_0, \quad (11.58)$$
where û = Ω̂ specifies the rotation axis, a is the equatorial radius of the earth and we
have used identity II. The surface described by this equation is called the geoid. Its deviation
from the sphere is characterized by the so called ellipticity (or flattening) parameter ε, defined by

$$\epsilon = \frac{a - c}{c}, \quad (11.59)$$
with c as the earth’s polar radius. To evaluate the constant Φ0 in Eq. (11.57) we set r = c
and r̂ · û = 1 in Eq. (11.58) giving
$$\Phi_0 = -\frac{GM_\oplus}{c}\left\{1 - J_2\frac{a^2}{c^2}\right\}. \quad (11.60)$$
To express the ellipticity parameter ε in terms of other parameters we set r = a and r̂ · û = 0 in Eq. (11.58) to get

$$-\frac{GM_\oplus}{a}\left\{1 + \frac{1}{2} J_2\right\} - \frac{1}{2}\Omega^2 a^2 = -\frac{GM_\oplus}{a}\left\{1 - J_2\frac{a^2}{c^2}\right\}(1 + \epsilon), \quad (11.61)$$
where we have used a/c = 1 + ε and Eq. (11.60). Since ε and J₂ are known to be small quantities, it suffices to solve this equation for ε to the first order, so that

$$\epsilon = \frac{3}{2} J_2 + \frac{1}{2}\beta, \quad (11.62)$$
where

$$\beta = \frac{\Omega^2 a^3}{GM_\oplus} = \frac{\Omega^2 a}{GM_\oplus/a^2} \quad (11.63)$$
is the ratio of the centripetal to the gravitational acceleration at the equator.
The potential Φ(r) can now be expressed in terms of ε and β,

$$\Phi(\mathbf{r}) = -\frac{GM_\oplus}{r}\left\{1 + \left(\epsilon - \frac{1}{2}\beta\right)\left(\frac{a}{r}\right)^2\left[\frac{1}{3} - (\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right] + \frac{1}{2}\beta\left(\frac{r}{a}\right)^3\left[1 - (\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right]\right\}. \quad (11.64)$$
To get the equation for the geoid, to the first order in ε, we approximate a/r in Eq. (11.64) by a/c = 1 + ε, use the binomial theorem and simplify Eq. (11.64), keeping only the first order terms. We then equate the resulting expression to that for Φ₀ obtained by expressing the LHS of Eq. (11.61) in terms of ε and β, namely,
$$\Phi_0 = -\frac{GM_\oplus}{a}\left[1 + \frac{1}{3}(\epsilon + \beta)\right]. \quad (11.65)$$
This gives the equation for the geoid, to the first order in the ellipticity parameter ε, as

$$r = a\left[1 - \epsilon\,(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right]. \quad (11.66)$$

This is, to first order, the equation of an oblate spheroid. Indeed, an ellipsoid of revolution with equatorial radius b and polar radius c satisfies

$$1 = \frac{(\mathbf{r}\cdot\hat{\mathbf{u}})^2}{c^2} + \frac{(\mathbf{r}\times\hat{\mathbf{u}})^2}{b^2} = \frac{r^2}{b^2}\left[1 + \left(\frac{b^2 - c^2}{c^2}\right)(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right], \quad (11.67)$$

which can be solved for r, for small ε′ ≡ (b − c)/c, as

$$r = \frac{b}{\left[1 + 2\epsilon'(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right]^{1/2}} \approx b\left[1 - \epsilon'(\hat{\mathbf{r}}\cdot\hat{\mathbf{u}})^2\right]. \quad (11.68)$$
In terms of the latitude λ (so that r̂ · û = sin λ), the geopotential is

$$\Phi(r, \lambda) = -\frac{GM_\oplus}{r} + \frac{GM_\oplus a^2}{2 r^3} J_2\left(3\sin^2\lambda - 1\right) - \frac{1}{2}\Omega^2 r^2\cos^2\lambda, \quad (11.69)$$
and the magnitude of the acceleration g is given by

$$g = \left[\left(\frac{\partial\Phi}{\partial r}\right)^2 + \left(\frac{1}{r}\frac{\partial\Phi}{\partial\lambda}\right)^2\right]^{1/2}. \quad (11.70)$$
Due to the smallness of ε, g is almost normal to the spherical earth, although it is strictly normal to the geoid. Thus, g deviates from the radial direction (which defines λ) only by a small angle of the order of ε. Therefore, (1/r) ∂Φ/∂λ is of the order of εg, making the second term in Eq. (11.70) of the order of ε² and hence negligible. Therefore,
$$g = \frac{\partial\Phi}{\partial r} = \frac{GM_\oplus}{r^2} - \frac{3}{2}\frac{GM_\oplus a^2}{r^4} J_2\left(3\sin^2\lambda - 1\right) - \Omega^2 r\left(1 - \sin^2\lambda\right). \quad (11.71)$$
From Eq. (11.66) we substitute the value of r on the geoid at arbitrary latitude λ,

$$r = a\left(1 - \epsilon\sin^2\lambda\right), \quad (11.72)$$
and expand to first order in the small quantities ε, J₂ and β to get

$$g = \frac{GM_\oplus}{a^2}\left(1 + 2\epsilon\sin^2\lambda\right) - \frac{3}{2}\frac{GM_\oplus}{a^2} J_2\left(3\sin^2\lambda - 1\right) - \Omega^2 a\left(1 - \sin^2\lambda\right). \quad (11.74)$$
Putting λ = 0 we get the value of g at the equator,

$$g_e = \frac{GM_\oplus}{a^2}\left(1 + \frac{3}{2} J_2 - \beta\right) = \frac{GM_\oplus}{a^2}\left(1 + \epsilon - \frac{3}{2}\beta\right). \quad (11.75)$$

Similarly, putting λ = π/2 in Eq. (11.74) we get the value at the poles,

$$g_p = \frac{GM_\oplus}{a^2}(1 + \beta). \quad (11.76)$$
Using the given experimental values of ge and gp and the known values of a and Ω we can
solve the simultaneous Eqs (11.75) and (11.76) to get
We can substitute these values of ε and β in Eq. (11.62) to get the value of J₂ which agrees
with the value of J2 mentioned above, which was obtained using satellite data, within one
percent. This gives us a check on the internal consistency of the theory.
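This internal-consistency check can be sketched in code. The values of a and Ω below are standard figures assumed by us (the text does not list them in this passage); GM⊕/a² is eliminated using Eq. (11.76):

```python
# Observed equatorial and polar accelerations (m/s^2), from the text, together
# with assumed standard values of the equatorial radius a and rotation rate Omega
g_e, g_p = 9.78039, 9.83217
a, Omega = 6.378137e6, 7.2921159e-5

# Eq. (11.76): g_p = (GM/a^2)(1 + beta) with beta = Omega^2 a / (GM/a^2),
# so GM/a^2 = g_p - Omega^2 * a
gamma = g_p - Omega ** 2 * a          # GM/a^2
beta = Omega ** 2 * a / gamma
# Eq. (11.75): g_e = gamma * (1 + (3/2) J2 - beta)  =>  solve for J2
J2 = (g_e / gamma - 1 + beta) * 2 / 3
# Eq. (11.62): epsilon = (3/2) J2 + beta / 2
eps = 1.5 * J2 + 0.5 * beta

print(f"beta = {beta:.6f}, epsilon = {eps:.6f}, J2 = {J2:.6f}")
# J2 comes out near 1.09e-3, within one percent of the satellite value 1.08e-3
```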
The shape of the earth given by the geoid Eq. (11.66) agrees with measurements of
sea level to within a few meters. However, radar ranging to measure the height of the ocean
is accurate to a fraction of a meter. This shows the need to develop more refined models
for the shape of the earth. The principal deviation from the geoid is an excessive bulge
around the equator. This is attributed to a retardation of the rotating earth over the past million years. For a detailed exposition of the physics of the earth, the reader may consult refs [16]
and [24].
where the + sign applies if the rotation of x1 towards x2 is counterclockwise and − sign
applies if it is clockwise. This definition of the area vector suggests the following
construction of an area integral:
$$\mathbf{A} = \frac{1}{2}\int_a^b \mathbf{x}\times d\mathbf{x} = \frac{1}{2}\lim_{n\to\infty}\sum_{k=1}^{n}\mathbf{x}_k\times\Delta\mathbf{x}_k \quad (11.79)$$
with $\sum_{k=1}^{n}\Delta\mathbf{x}_k = \mathbf{b} - \mathbf{a}$. Note that this is entirely a vector relation in which differential area
vectors are added to give the resulting area vector in the limit as |∆x| → 0. If n is large
enough, we can approximate this area integral by the sum
$$\mathbf{A} \approx \frac{1}{2}\sum_{k=1}^{n}\mathbf{x}_k\times\Delta\mathbf{x}_k = \frac{1}{2}\mathbf{x}_0\times\mathbf{x}_1 + \frac{1}{2}\mathbf{x}_1\times\mathbf{x}_2 + \cdots + \frac{1}{2}\mathbf{x}_{n-1}\times\mathbf{x}_n. \quad (11.80)$$
As depicted in Fig. 11.15, each term in this sum is the area vector of a triangle with one
vertex at the origin. The magnitude of the kth term approximates the area swept out by
the line segment represented by the vector variable x as its tip moves continuously along
the curve joining a and b from xk−1 to xk with its tail at the origin, while the direction of
the corresponding area vector is consistent with the sense of rotation of x from xk−1 to xk .
Thus, the sum in Eq. (11.80) approximates the area vector corresponding to the area swept
out as the variable x moves from a to b. Thus, the integral Eq. (11.79) is the area vector for
the total area swept out by the vector variable x as it moves continuously along the curve
from a to b. Thus, the value of the area integral Eq. (11.79) is not path independent as the
area swept out depends on the path from a to b.
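The sum Eq. (11.80) can be tried directly on a closed curve; this sketch (discretizing a unit circle, our choice) recovers the enclosed area π:

```python
import math

def swept_area(points):
    """z-component of 0.5 * sum x_{k-1} x x_k over consecutive points, Eq. (11.80)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x0 * y1 - y0 * x1)   # z-component of the cross product
    return area

n = 20000
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n + 1)]            # closed, counterclockwise
print(swept_area(circle))                   # close to pi = 3.14159...
```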
If the curve is represented by the parametric equation x = x(t ), with x(0) = a, then the
corresponding area vector can be obtained as a parametric function A = A(t ) as
$$\mathbf{A}(t) = \frac{1}{2}\int_{\mathbf{x}(0)}^{\mathbf{x}(t)}\mathbf{x}\times d\mathbf{x} = \frac{1}{2}\int_0^t \mathbf{x}\times\dot{\mathbf{x}}\, dt', \quad (11.81)$$

with x and ẋ both functions of t′.
for the kth element of the area, hence from Eq. (11.83),
$$|\mathbf{A}| = \frac{1}{2}\int_C |\mathbf{x}\times d\mathbf{x}|. \quad (11.84)$$
We emphasize that Eq. (11.84) follows from Eq. (11.83) only when all coplanar elements of area have the same orientation, as in Fig. 11.16(a). This condition is not met if the curve C is self-intersecting or does not enclose the origin.
Fig. 11.16 Area swept out by radius vector along a closed curve. Cross-hatched
region is swept out twice in opposite directions, so its area is zero.
The area integral Eq. (11.83) is independent of the origin although the values of the vector variable x depend on the origin. To see this, displace the origin inside the curve C in
Fig. 11.16(a) to a place outside the curve as shown in Fig. 11.16(b). Choosing the points a
and b on C we separate C into two pieces C1 and C2 , so the area integral becomes
$$\mathbf{A} = \frac{1}{2}\int_C \mathbf{x}\times d\mathbf{x} = \frac{1}{2}\int_{C_1}\mathbf{x}\times d\mathbf{x} + \frac{1}{2}\int_{C_2}\mathbf{x}\times d\mathbf{x}.$$
Referring to Fig. 11.16(b), we see that the coordinate vector sweeps over the region inside C once as it goes between a and b along C, but it sweeps over the meshed region to the left of C₂ twice, once as it traverses C₂ and again as it traverses C₁. Since the sweeps over the latter region are in opposite directions, their contributions to the integral have the same magnitude but opposite signs, and hence cancel. We are thus left with the area vector corresponding to C, as claimed.
For a general proof that the closed area integral is independent of the origin, we displace
the origin by a vector c by making the change of variables x → x0 = x − c. Then,
$$\int_C \mathbf{x}'\times d\mathbf{x}' = \int_C (\mathbf{x} - \mathbf{c})\times d\mathbf{x} = \int_C \mathbf{x}\times d\mathbf{x} - \mathbf{c}\times\int_C d\mathbf{x}.$$
However, the last term vanishes because $\int_C d\mathbf{x} = 0$, so the independence of the area integral of the choice of origin is proved. Note that the cancellation of the parts of the integral proving its independence of the origin remains valid even if the origin is chosen out of the plane containing the curve C. Thus, the value of the area integral over a closed plane curve is independent of the origin, even if the origin is taken out of the plane containing the curve.
The area integral of a closed planar curve can be evaluated to give the area enclosed by a self-intersecting plane curve such as the one shown in Fig. 11.17. The signs of the area integral for the subregions are indicated in the figure, with zero for subregions which are swept
out twice with opposite signs.
The integral Eq. (11.79) or Eq. (11.83) also applies to curves in space which do not lie in a plane, giving the area of the surface swept out by the vector variable x while traversing the
curve. Such integrals may find application in Computer Aided Design, for example, applied
to the design of automobile parts.
Fig. 11.17 Directed area of a self-intersecting closed plane curve. Vertical and
horizontal lines denote areas with opposite orientation, so cross-hatched
region has zero area.
Fig. 11.18 Interior and exterior approximations to the area of the unit disc |x| ≤ 1 for n = 0, 1, 2, where A₀⁻ = 0, A₁⁻ = 1, A₂⁻ = 2, A₂⁺ = 4.25, A₁⁺ = 6, A₀⁺ = 12
We divide the plane into squares by first drawing x, y axes and then drawing the sequences
of parallel lines to x and y axis respectively at a separation of one unit of length. The
coordinates of the points of intersection of this mesh are x = 0, ±1, ±2, . . . and
y = 0, ±1, ±2, . . .. This mesh covers the whole plane by closed unit squares without a gap
or overlap. Also, the interiors of any two squares of this mesh are disjoint. Let A₀⁺(S) be the number of squares having points in common with S and A₀⁻(S) be the number of squares totally contained in S. Note that A₀⁺(S) and A₀⁻(S) also give the areas of the figures formed by these squares, because the area of a single square is unity. Next, divide each square into four equal squares of side 1/2 and area 1/4. Let A₁⁺(S) be the area covered by such squares (each of area 2⁻¹ × 2⁻¹ = 2⁻²) overlapping S and A₁⁻(S) be the area covered by such squares contained in S. Since the area of the individual squares is now reduced by a factor of 2⁻², one or more such smaller squares may get accommodated in the interior portion of S lying between the boundary of the figure corresponding to A₀⁻(S) and the boundary of S. This increases the interior area covered
2 Calculus with functions of three variables is called calculus of three variables. Calculus of three variables and vector calculus are two sides of the same coin. The former is carried out in R³ while the latter is carried out in E³.
by the smaller squares in comparison to that covered by the larger squares. On the other
hand, a larger square, overlapping S but not contained in S, when divided into four
smaller squares of equal area will generate one or more smaller squares with no overlap
with S at all, thus reducing the area of the figure corresponding to A0+ (S ). Thus, we see
that,
$$A_0^-(S) \leq A_1^-(S) \quad \text{and} \quad A_0^+(S) \geq A_1^+(S). \quad (11.85)$$

We iterate this process n times, each time halving the side of the squares of the previous iterate, so that after the nth iteration each square has side 2⁻ⁿ and area 2⁻²ⁿ. Reiterating exactly the same argument which led to the inequalities Eq. (11.85), we get, at the nth step (see Fig. 11.18),

$$A_{n-1}^-(S) \leq A_n^-(S) \quad \text{and} \quad A_{n-1}^+(S) \geq A_n^+(S). \quad (11.86)$$
It is clear that the values Aₙ⁺(S) form a monotonically decreasing and bounded sequence converging to a value A⁺(S), while the Aₙ⁻(S) increase monotonically and converge to a value A⁻(S). The value A⁻(S) represents the inner area, the closest we can approximate the area of S from below by congruent squares contained in S, while the outer area A⁺(S) gives the closest approximation from above obtained by covering S by congruent squares. If both these values are the same, we say that S is Jordan measurable and call the common value A⁺(S) = A⁻(S) = A(S) the content or the Jordan measure of S. We express the fact that S is Jordan measurable by saying that S has an area A(S).
The difference Aₙ⁺(S) − Aₙ⁻(S) gives the total area of those squares after the nth iteration that overlap S but are not completely contained in S. All these squares contain boundary points of S, so that

$$A_n^+(S) - A_n^-(S) \leq A_n^+(\partial S),$$

where ∂S is the boundary of S. If the boundary of S has zero area, then we find that

$$A^+(S) - A^-(S) \leq A^+(\partial S) = 0,$$

which means A⁺(S) = A⁻(S) = A(S), that is, S has area A(S). Thus, S has an area if its boundary ∂S has zero area. We can also show that if S has an area then A⁺(∂S) = 0.
The criterion A+ (∂S ) = 0 is sufficient to show that most of the planar regions we
encounter in practice have definite area. This is certainly true if ∂S consists of a finite
number of arcs described by a function f (x ) or g (y ) with f or g continuous over a finite
closed interval. The uniform continuity³ of continuous functions over a bounded closed interval immediately shows us that these arcs can be covered by a finite number, say n, of rectangles of arbitrarily small area ε² each. Therefore,

$$A^+(\partial S) \leq n\epsilon^2,$$

which can be made arbitrarily small.
Everything we have said above about the areas of planar sets carries over immediately to volumes in three dimensions. In order to define the volume V(S) of a bounded set S in 3-D space, we have to use subdivisions of space into cubes of side 2⁻ⁿ. The set S has
a volume if its boundary can be covered by a finite number of these cubes with arbitrarily
small total volume. This is true for all bounded sets S whose boundary consists of a finite
number of surfaces each of which is represented by a continuous function f (x), x varying
over a closed planar set.
Let f(x, y) > 0 be a continuous function on a closed and bounded region R and consider the set S of points

(x, y) ∈ R; 0 ≤ z ≤ f(x, y).

The surfaces enclosing this set are (i) z = f(x, y), (ii) R (z = 0) and (iii) (x, y) ∈ ∂R; 0 ≤ z ≤ f(x, y). We define the double integral of f as the volume V(S) of the set S, which can be obtained as follows.
3 Uniform continuity of f(x) means that for every ε > 0, there is a Δ > 0 such that d(x₁, x₂) < Δ implies d(f(x₁), f(x₂)) < ε for every (x₁, x₂). This means that a finite arc given by f(x) can be covered by a finite number, say n, of squares of size ε², for every ε > 0.
We subdivide R into subsets R₁, …, R_N and let hᵢ and Hᵢ be the minimum and maximum values of f on Rᵢ. Then

$$\sum_{i=1}^{N} h_i\, A(R_i) \leq V(S) \leq \sum_{i=1}^{N} H_i\, A(R_i). \quad (11.87)$$

The sums in this inequality are respectively called the lower sum and the upper sum. We now make the subdivision of R finer and finer, such that the number of subdivisions tends to infinity and the largest diameter of the Rᵢ, i = 1, …, N, tends to zero.
continuous function f (x, y ) is uniformly continuous in the closed and bounded set R, so
that the maximum difference Hi − hi tends to zero with the maximum diameter over the
sets Ri of the subdivision. The differences over the upper and the lower sum also tend to
zero, since,
$$\sum_{i=1}^{N} H_i\, A(R_i) - \sum_{i=1}^{N} h_i\, A(R_i) = \sum_{i=1}^{N}(H_i - h_i)\, A(R_i) \leq \max_i\,(H_i - h_i)\sum_{k=1}^{N} A(R_k) = \max_i\,(H_i - h_i)\, A(R). \quad (11.88)$$
It follows from inequality Eq. (11.87) that the upper and lower sum both converge to the
limit V (S ) as the number of subdivisions N → ∞ or the largest diameter tends to zero.
We obtain the same limiting value if we take the value of the function f (xi , yi ) at a point
(xi , yi ) ∈ Ri , instead of hi or Hi . We call the limit V (S ) the double integral of f over the
set R and write
$$V(S) = \iint_R f(x, y)\, dR. \quad (11.89)$$
Suppose, we now lift the restriction z = f (x, y ) > 0. Due to continuity of f (x, y ) the
surface (x, y ) ∈ R ; z = f (x, y ) may cut the x, y plane in some continuous curve and the
set S defined above is divided into two (or more, but we assume two) sets, one above and
the other below the x, y plane, each corresponding to two distinct parts, R+ and R− of the
domain R. These are the set S + given by (x, y ) ∈ R ; z = f (x, y ) > 0 and the set S − given
by (x, y ) ∈ R ; z = f (x, y ) < 0. We define a new set S ∓ by (x, y ) ∈ R ; z = −f (x, y ) > 0.
Both of these are sets of points above the x, y plane, so that

$$\iint_{R^+} f(x, y)\, dR = V(S^+) \quad \text{and} \quad \iint_{R^-}\left(-f(x, y)\right) dR = V(S^\mp) = V(S^-).$$

This means

$$\iint_R f(x, y)\, dR = V(S^+) - V(S^-).$$
We can summarize as follows. Consider a closed and bounded set R with area A(R) = ∆R
and a function f (x, y ) that is continuous everywhere in R including its boundary. We
subdivide R into N non-overlapping Jordan measurable subsets R1 , R2 , . . . , RN with areas
∆R1 , ∆R2 , . . . , ∆RN . In Ri we choose an arbitrary point (xi , yi ) where f (xi , yi ) = fi and
form the sum

$$V_N = \sum_{i=1}^{N} f_i\, A(R_i).$$
Since A(∂R) = 0 we can choose all Ri to lie entirely in the interior of R having no points
common with the boundary of R.
Consider first the case where R is a rectangle a ≤ x ≤ b; c ≤ y ≤ d, subdivided by lines parallel to the axes with spacings

$$\Delta x = \frac{b - a}{n} \quad \text{and} \quad \Delta y = \frac{d - c}{m}.$$
Let the points of subdivision be x₀ = a, x₁, x₂, …, xₙ = b and y₀ = c, y₁, y₂, …, y_m = d.
We have N = nm. Every subregion is a rectangle with area A(Ri ) = ∆Ri = ∆x∆y. For the
point (xi , yi ) we take any point in the corresponding rectangle Ri and then form the sum
$$\sum_i f(x_i, y_i)\, \Delta x\, \Delta y$$
over all the rectangles of the subdivision. If we now let both m and n simultaneously tend
to infinity, the sum tends to the integral of the function f over the rectangle R.
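A minimal sketch of this limiting process for the rectangle subdivision (the integrand f = x + y and the rectangle [0, 1] × [0, 2] are our choices):

```python
def double_riemann(f, a, b, c, d, n, m):
    """Sum f at lower-left corners times dx dy over an n x m subdivision."""
    dx, dy = (b - a) / n, (d - c) / m
    total = 0.0
    for nu in range(n):
        for mu in range(m):
            total += f(a + nu * dx, c + mu * dy) * dx * dy
    return total

f = lambda x, y: x + y                    # exact double integral over the box is 3
for n in (10, 100, 1000):
    print(n, double_riemann(f, 0, 1, 0, 2, n, n))   # converges to 3
```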
These rectangles can also be characterized by two suffixes µ and ν corresponding to the
coordinates x = a + ν∆x and y = c + µ∆y of the lower left hand corner of the rectangle in
question. Here, 0 ≤ ν ≤ (n − 1) and 0 ≤ µ ≤ (m − 1). With this identification of rectangles
with suffixes ν and µ we may write the sum as the double sum
$$\sum_{\nu=0}^{n-1}\sum_{\mu=0}^{m-1} f(x_\nu, y_\mu)\, \Delta x\, \Delta y. \quad (11.90)$$
where h and k are numbers chosen conveniently. We call Ri the rectangles of the division
that lie entirely within R. Ri do not completely fill the region R. However, as we have
noted above, we can calculate the integral of the function f over R by summing only over
interior rectangles and then passing to the limit. Whenever we use a rectangular grid with lines parallel to the x and y axes, we replace the differential dR in the integral by dx dy. Thus,
$$\iint_R f(x, y)\, dR = \iint_R f(x, y)\, dx\, dy.$$
Further, the dummy variables of integration x, y can be replaced, in the integral, by any
other pair of variables (u, v ), (ξ, η ) etc.
Vector Integration 367
The subdivision by the polar coordinate net (see Fig. 3.20) also finds frequent application.
We subdivide the entire angle 2π into n parts ∆θ = 2π/n and also choose a mesh width ∆r for the r coordinate.
for the r coordinate. We draw the lines θ = ν∆θ (ν = 0, 1, 2, . . . , n − 1) through the origin
and also the concentric circles rµ = µ∆r, (µ = 0, 1, 2, . . .). We denote by Ri the patches
formed by their intersection which lie entirely in the interior of R, and the areas of Ri by
∆Ri . Then, the integral of the function f (x, y ) is given by the limit of the sum
$$\sum_i f(x_i, y_i)\,\Delta R_i,$$
and the double integral of f over R is obtained in the limit n → ∞ (or equivalently ∆r → 0
and ∆θ → 0) of this sum.
As an example, consider f (x, y ) = 1 over some bounded region R in the x, y plane. Then, the double integral of f (x, y ) equals the volume of the cylinder of unit height erected over R, which is f (x, y ) · A(R) = 1 · A(R) = A(R).
Thus, we get the result
$$\iint_R dR = A(R).$$
Our next example is the double integral of f (x, y ) = xy over the rectangle
a ≤ x ≤ b ; c ≤ y ≤ d, or, more generally, any function f (x, y ) that can be decomposed as
a product of a function of x and a function of y in the form f (x, y ) = φ(x )ψ (y ). We use
the same division of the rectangle as in Eq. (11.90) and the value of the function at the
lower left hand corner of the sub-rectangle in the summand. The integral is then the
limit of the sum
$$\sum_{\nu=0}^{n-1}\sum_{\mu=0}^{m-1} \phi(x_\nu)\,\psi(y_\mu)\,\Delta x\,\Delta y = \left(\sum_{\nu=0}^{n-1}\phi(x_\nu)\,\Delta x\right)\left(\sum_{\mu=0}^{m-1}\psi(y_\mu)\,\Delta y\right).$$
From the definition of the ordinary integral, as ∆x → 0 and ∆y → 0 these factors tend to
the integrals of the corresponding functions over the respective intervals from a to b and
from c to d. Thus, we get a general rule that the double integral of a function satisfying
f (x, y ) = φ(x )ψ (y ) over a rectangle a ≤ x ≤ b ; c ≤ y ≤ d can be resolved into the
product of two integrals
$$\iint_R f(x,y)\,dx\,dy = \int_a^b \phi(x)\,dx \cdot \int_c^d \psi(y)\,dy.$$
This rule and the summation rule (see below) yield the integral of any polynomial over a rectangle with sides parallel to the axes.
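A sketch of this product rule in code (an illustration, not from the text): the double integral of f(x, y) = φ(x)ψ(y), computed as a repeated integral, agrees with the product of the two single integrals. Here φ(x) = x² and ψ(y) = sin y over 0 ≤ x ≤ 2, 0 ≤ y ≤ π; both values should equal 16/3.

```python
import math

def simpson(f, lo, hi, n=200):
    # Composite Simpson rule (n must be even).
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3.0

phi = lambda x: x ** 2
psi = math.sin

# Double integral evaluated as a repeated integral ...
double_integral = simpson(lambda x: simpson(lambda y: phi(x) * psi(y), 0.0, math.pi), 0.0, 2.0)
# ... and as the product of two single integrals.
product = simpson(phi, 0.0, 2.0) * simpson(psi, 0.0, math.pi)
```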
In our last example, we use a subdivision by a polar coordinate net. Let the region R be
the unit disc centered at the origin, given by x2 + y 2 ≤ 1 and let
$$f(x,y) = \sqrt{1 - x^2 - y^2}.$$
where we have taken the value of the function at an intermediate circle with the radius
ρµ = (rµ+1 + rµ )/2. All subregions that lie in the same ring have the same contribution
and since there are n = 2π/∆θ such regions the contribution of the whole ring is
$$2\pi\rho_\mu\sqrt{1-\rho_\mu^2}\;\Delta r.$$
Thus, for regions that are joined together, the corresponding integrals are added. Further, if f (x, y ) ≤ 0 everywhere in R, then

$$\iint_R f(x,y)\,dR \le 0.$$

If

$$f(x,y) \ge g(x,y)$$

everywhere in R, then $\iint_R f\,dR \ge \iint_R g\,dR$, and in particular

$$\iint_R f(x,y)\,dR \ge -\iint_R |f(x,y)|\,dR.$$
If m is the greatest lower bound and M is the least upper bound of the function f (x, y ) in
R, and ∆R is the area of R, then,
$$m\,\Delta R \le \iint_R f(x,y)\,dR \le M\,\Delta R.$$
Hence we may write $\iint_R f(x,y)\,dR = \mu\,\Delta R$ with µ lying between m and M; the value of µ cannot, in general, be specified more precisely. This equation is called the mean value theorem of integral calculus. Generalizing, we can
say that for an arbitrary positive continuous function p (x, y ) on R,
$$\iint_R p(x,y)\,f(x,y)\,dR = \mu\iint_R p(x,y)\,dR,$$
where µ is a number between the greatest and the lowest values of f (x, y ) on R that cannot
be further specified.
We close by making the following two observations. The first is that a double integral on
R varies continuously with the function to be integrated. This means, given two functions
f and g satisfying
where ∆Ri is now the volume of the region Ri . The sum may be taken over all regions Ri ,
or, over those Ri which are interior to R. If we now take the limit as N → ∞ such that the
largest of the diameters of Ri tends to zero, then the sum tends to a limiting value which is
independent of the mode of subdivision or the choice of the intermediate points. We call
this limit the integral of the function f (x, y, z ) over the region R and write it as
$$\iiint_R f(x,y,z)\,dR.$$
In particular, if we subdivide R into rectangular boxes with sides ∆x, ∆y, ∆z then the
volumes of all the inner regions Ri have the same value ∆x∆y∆z and the corresponding
integral is written as
$$\iiint_R f(x,y,z)\,dx\,dy\,dz.$$
Apart from the changes in notation all that has been said about the double integral is valid
for the triple integral.
For fixed y, the integral $\int_a^b f(x,y)\,dx$ is a function of y, which we integrate between the limits c and d to obtain the double integral. In symbols,

$$\iint_R f(x,y)\,dx\,dy = \int_c^d \phi(y)\,dy, \qquad \phi(y) = \int_a^b f(x,y)\,dx,$$

or,

$$\iint_R f(x,y)\,dx\,dy = \int_c^d dy\int_a^b f(x,y)\,dx. \tag{11.92}$$
That is, in the repeated integration of a continuous function with constant limits of
integration, the order of integration can be reversed. This facility of changing the order of
integration is particularly useful in the explicit calculation of simple definite integrals for
which no indefinite integral can be found.
Exercise Evaluate $I = \displaystyle\int_0^\infty \frac{e^{-ax} - e^{-bx}}{x}\,dx$.

Solution We can write

$$I = \lim_{T\to\infty}\int_0^T dx\int_a^b e^{-xy}\,dy,$$

Hint Write

$$I = \int_0^\infty dx\int_b^a f'(xy)\,dy$$
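The exercise above can be checked numerically. For the exponential integrand the value works out to ln(b/a); the following sketch (mine, not the book's solution) truncates the infinite range at T = 50, where the integrand is negligibly small:

```python
import math

def integrand(x, a, b):
    # (e^{-ax} - e^{-bx}) / x has the finite limit b - a as x -> 0.
    if x == 0.0:
        return b - a
    return (math.exp(-a * x) - math.exp(-b * x)) / x

def simpson(f, lo, hi, n):
    # Composite Simpson rule (n must be even).
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3.0

a, b = 1.0, 2.0
I = simpson(lambda x: integrand(x, a, b), 0.0, 50.0, 100_000)
# I should lie close to log(b/a) = log 2.
```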
We can resolve a double integral into a succession of single integrals even if the region of
integration is not a rectangle. We first consider a convex region R. A line parallel to x or
y axis cuts the boundary of such a region in not more than two points unless it forms a
part of the boundary (see Fig. 11.21). We can draw the so called lines of support giving the
circumscribing rectangle as shown in Fig. 11.21, at x = x0 , x = x1 , y = y0 , y = y1 . As we move, for example, the line x = x0 towards the right, it cuts the boundary of R at two points whose y coordinates are functions of x, say ψ1 (x ) and ψ2 (x ), as shown in Fig. 11.21.
Similarly, as we move the line y = y0 upwards, it cuts the boundary of R at two points
whose x coordinates are functions of y say φ1 (y ) and φ2 (y ) as shown in Fig. 11.21. Thus,
if we want to integrate f (x, y ) over x for a fixed value of y = yc we must integrate between
φ1 (yc ) and φ2 (yc ). Treating y as a parameter, then, the integral
$$\int_{\phi_1(y)}^{\phi_2(y)} f(x,y)\,dx$$
is a function of y and similarly, the integral
$$\int_{\psi_1(x)}^{\psi_2(x)} f(x,y)\,dy$$
is a function of the parameter x.
The resolution of the double integral over R into repeated single integrals is then given by
the equations
$$\iint_R f(x,y)\,dR = \int_{y_0}^{y_1} dy\int_{\phi_1(y)}^{\phi_2(y)} f(x,y)\,dx = \int_{x_0}^{x_1} dx\int_{\psi_1(x)}^{\psi_2(x)} f(x,y)\,dy. \tag{11.95}$$
The generalization to the case of a non-convex region R (see Fig. 11.22) is straightforward. A line x = constant may now intersect the boundary of R in more than two points, giving rise to more than one segment over which we have to integrate f (x, y ) with respect to y. Each pair of points of intersection of the line x = constant gives rise to a pair of functions of x. By $\int f(x,y)\,dy$ we then mean the sum of the integrals of the function f (x, y ) for a fixed x, taken over all the intervals that the line x = constant has in common with the closed region.
It is possible to evaluate the double integral by dividing R into subregions, each corresponding to a fixed number of terms in such a sum. The integral over x ranges from x0 to x1 , the abscissae of the circumscribing vertical lines for R, that is, over the whole interval on which the region R lies.
Answer

(a) $\displaystyle\iint_R f(x,y)\,dR = \int_{-1}^{+1} dx\int_{-\sqrt{1-x^2}}^{+\sqrt{1-x^2}} f(x,y)\,dy.$

(b) $\displaystyle\iint_R f(x,y)\,dR = \int_{-2}^{-1} dx\int_{-\sqrt{4-x^2}}^{+\sqrt{4-x^2}} f(x,y)\,dy + \int_{1}^{2} dx\int_{-\sqrt{4-x^2}}^{+\sqrt{4-x^2}} f(x,y)\,dy$
$\displaystyle\qquad\quad + \int_{-1}^{+1} dx\int_{+\sqrt{1-x^2}}^{+\sqrt{4-x^2}} f(x,y)\,dy + \int_{-1}^{+1} dx\int_{-\sqrt{4-x^2}}^{-\sqrt{1-x^2}} f(x,y)\,dy. \tag{11.96}$
Answer

$$\iint_R f(x,y)\,dR = \int_0^a dx\int_0^x f(x,y)\,dy = \int_0^a dy\int_y^a f(x,y)\,dx. \tag{11.97}$$
where
$$\iint_B f(x,y,z)\,dx\,dy$$
In this repeated integral we could have carried out integration in any order, (say first with
respect to x, then with respect to y and finally with respect to z) giving the same triple
integral. Thus, we can conclude that a repeated integral of a continuous function
throughout a closed rectangular region is independent of the order of integration.
Exercise Express the triple integral of a function f (x, y, z ) continuous on the closed
spherical region x2 + y 2 + z2 ≤ 1 in terms of repeated single integrals.
Answer
$$\iiint_R f(x,y,z)\,dx\,dy\,dz = \int_{-1}^{+1} dx\int_{-\sqrt{1-x^2}}^{+\sqrt{1-x^2}} dy\int_{-\sqrt{1-x^2-y^2}}^{+\sqrt{1-x^2-y^2}} f(x,y,z)\,dz.$$
Exercise Find the mass of the right pyramid with a right-triangular base of perpendicular sides a and height 3a/2, with uniform density ρ (see Fig. 11.25).
Solution Denoting the volume of the pyramid by V and its mass by M we know that
M = ρV . Thus, we have to find the volume of the pyramid given by the triple integral
$$\iiint_P dP,$$
where P is the pyramidal region of integration. To evaluate this triple integral, we convert it
to three repeated single integrals. We vary z from 0 to 3a/2. For a fixed z, using similarity
of triangles AOC and ADE (see Fig. 3.55) we find that y varies from 0 to a − 2z/3.
Now fixing both z and y and again using similarity of triangles which we leave for you
to find, we see that x varies from 0 to a − 2z/3 − y. Thus we get,
$$M = \rho\int_0^{3a/2} dz\int_0^{a-(2z/3)} dy\int_0^{a-(2z/3)-y} dx = \frac{1}{4}\,a^3\rho.$$
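The nested limits above can be checked with a simple midpoint-rule evaluation (an illustrative sketch; the innermost x integral is just the interval length a − 2z/3 − y):

```python
def pyramid_volume(a, n=200):
    # V = int_0^{3a/2} dz  int_0^{a-2z/3} dy  int_0^{a-2z/3-y} dx,
    # evaluated with the midpoint rule in z and y; the x integral is exact.
    hz = (1.5 * a) / n
    vol = 0.0
    for i in range(n):
        z = (i + 0.5) * hz
        ymax = a - 2.0 * z / 3.0
        hy = ymax / n
        for j in range(n):
            y = (j + 0.5) * hy
            vol += (ymax - y) * hy * hz   # innermost integral = a - 2z/3 - y
    return vol

rho = 2.0
mass = rho * pyramid_volume(1.0)   # approaches rho * a^3 / 4
```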
x = φ(u, v ), y = ψ (u, v )
be a 1 − 1 mapping of R onto the closed region R̃ in the u, v plane. We assume that both
φ and ψ are C 1 functions and their Jacobian determinant
$$D = \begin{vmatrix} \phi_u & \phi_v \\ \psi_u & \psi_v \end{vmatrix} = \phi_u\psi_v - \psi_u\phi_v$$
is never zero in R. In other words, the functions x = φ(u, v ) and y = ψ (u, v ) possess a
unique inverse u = g (x, y ) and v = h(x, y ). Moreover, the two families of curves u =
constant and v = constant form a net over the region R. Each curve in the family u =
constant corresponds to a fixed value of u and is parameterized by v. We have,
We can construct the mesh of curves on the x, y plane as follows. We first cover the u, v
plane by the rectangular mesh of straight lines u = ν∆u and v = µ∆v, ν, µ = 0, ±1,
±2, . . . and then map each of these curves on the x, y plane by x = φ(u, v ), y = ψ (u, v )
giving the mesh on the x, y plane by the curves defined in Eq. (11.98). This mesh
subdivides the region of integration R into subregions Ri which are not, in general,
rectangular (see Fig. 11.16(b)). However, the subregions R̃i into which the region R̃ gets
divided are rectangular (see Fig. 11.16(a)). To find the double integral, we have to find the
area of the subregion Ri , multiply by the value of the function f at a point in Ri , sum this
product over Ri lying entirely within R and then take the limit of this sum as
∆u → 0, ∆v → 0.
The way we have constructed the subregions Ri tells us that the curves defining its
boundary are separated pairwise by the parameter values ∆u and ∆v. The coordinates of
the vertices of R̃i are (uν , vµ ), (uν + ∆u, vµ ), (uν , vµ + ∆v ), (uν + ∆u, vµ + ∆v ) and the
x, y coordinates of the vertices of Ri are obtained by mapping these coordinates by φ and
ψ respectively. If Ri were a parallelogram joining these vertices, instead of being bounded by curves, then the area of Ri would be given by the absolute value of the determinant (or the absolute value of the cross product of the corresponding vectors)
absolute value of the cross product of the corresponding vectors)
$$\begin{vmatrix} \phi(u_\nu+\Delta u, v_\mu) - \phi(u_\nu, v_\mu) & \;\phi(u_\nu, v_\mu+\Delta v) - \phi(u_\nu, v_\mu) \\ \psi(u_\nu+\Delta u, v_\mu) - \psi(u_\nu, v_\mu) & \;\psi(u_\nu, v_\mu+\Delta v) - \psi(u_\nu, v_\mu) \end{vmatrix}.$$
Since φ and ψ are C 1 , we can approximate, for example, φ(uν + ∆u, vµ ) − φ(uν , vµ ) by φu (uν , vµ )∆u, and similarly for the other differences, so that the area of Ri is approximated by the absolute value of
$$\begin{vmatrix} \phi_u(u_\nu, v_\mu) & \phi_v(u_\nu, v_\mu) \\ \psi_u(u_\nu, v_\mu) & \psi_v(u_\nu, v_\mu) \end{vmatrix}\,\Delta u\,\Delta v = D\,\Delta u\,\Delta v.$$
Thus, forming the required sum and passing to the limit as ∆u → 0, ∆v → 0, we obtain
the expression for the double integral transformed to the new variables,
$$\iint_{\tilde{R}} f(\phi(u,v), \psi(u,v))\,|D|\,du\,dv.$$
We will not pause here to show that the area of Ri coincides with that of the corresponding parallelogram in the limit ∆u → 0, ∆v → 0, and simply state the final result:
If the transformation x = φ(u, v ); y = ψ (u, v ) represents a continuous 1 − 1 mapping
of the closed Jordan measurable region R of the x, y plane to a region R̃ of the u, v plane
and if the functions φ and ψ are C 1 and their Jacobian
$$\frac{d(x,y)}{d(u,v)} = \phi_u\psi_v - \psi_u\phi_v$$
is everywhere different from zero, then
$$\iint_R f(x,y)\,dx\,dy = \iint_{\tilde{R}} f(\phi(u,v), \psi(u,v))\left|\frac{d(x,y)}{d(u,v)}\right| du\,dv. \tag{11.99}$$
We may add that the transformation formula is valid even if the Jacobian determinant
vanishes without reversing its sign at a finite number of isolated points in the region of
integration. In this case we cut these points out of R by enclosing them in small circles of
radius ρ. Equation (11.99) is valid for the remaining region. If we then let ρ → 0
Eq. (11.99) continues to be valid for the region R by virtue of the continuity of all
functions involved.
We can obtain the same result for the transformation of a triple integral over a three
dimensional region R which can be stated as follows.
If a closed Jordan measurable region R of x, y, z space is mapped on a region R̃ of
u, v, w space by a 1 − 1 transformation
The whole x, y plane is spanned by 0 ≤ r < ∞ and 0 ≤ θ < 2π, so that for a disc of radius R centered at the origin the integral on the RHS can be replaced by
$$\int_0^R\int_0^{2\pi} r\,f(r\cos\theta, r\sin\theta)\,dr\,d\theta.$$
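As an illustration (my example, not the book's), the factor r makes an otherwise awkward integral easy: for f(x, y) = e^(−(x²+y²)) over the disc of radius R, the transformed integral ∫∫ r e^(−r²) dr dθ evaluates to π(1 − e^(−R²)). A numerical sketch:

```python
import math

def simpson(f, lo, hi, n=200):
    # Composite Simpson rule (n must be even).
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3.0

R = 1.0
# Integrand in polar coordinates: r * f(r cos t, r sin t) = r * exp(-r^2),
# independent of t here, but integrated over both variables for the demo.
polar = simpson(lambda r: simpson(lambda t: r * math.exp(-r * r), 0.0, 2.0 * math.pi), 0.0, R)
exact = math.pi * (1.0 - math.exp(-R * R))
```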
with 0 ≤ r < ∞, 0 ≤ θ ≤ π and 0 ≤ φ < 2π. We obtain for the Jacobian determinant,
$$\frac{d(x,y,z)}{d(r,\theta,\phi)} = \begin{vmatrix} \sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\ \sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\ \cos\theta & -r\sin\theta & 0 \end{vmatrix} = r^2\sin\theta.$$
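The determinant can be spot-checked numerically with central differences (an illustrative sketch): at a sample point, the finite-difference Jacobian of (x, y, z) with respect to (r, θ, φ) should match r² sin θ.

```python
import math

def spherical_to_cartesian(r, th, ph):
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def jacobian_det(r, th, ph, h=1e-6):
    # Columns of the Jacobian by central differences, one coordinate at a time.
    cols = []
    for i in range(3):
        up, um = [r, th, ph], [r, th, ph]
        up[i] += h
        um[i] -= h
        p, m = spherical_to_cartesian(*up), spherical_to_cartesian(*um)
        cols.append([(p[k] - m[k]) / (2.0 * h) for k in range(3)])
    # a[row][col]: rows are x, y, z; columns are d/dr, d/dtheta, d/dphi.
    a = [[cols[c][row] for c in range(3)] for row in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
          - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
          + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

r0, th0, ph0 = 1.3, 0.7, 2.1
numeric = jacobian_det(r0, th0, ph0)
exact = r0 ** 2 * math.sin(th0)
```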
Exercise Find the transformed triple integral for f (x, y, z ) over the whole space, in the
cylindrical coordinates ρ, θ, z related to cartesian coordinates by x = ρ cos θ, y =
ρ sin θ, z = z, where 0 ≤ ρ < ∞, 0 ≤ θ < 2π and −∞ < z < +∞.
Solution We easily find that $\dfrac{d(x,y,z)}{d(\rho,\theta,z)} = \rho$. This gives

$$\iiint_R f(x,y,z)\,dx\,dy\,dz = \int_0^\infty\int_0^{2\pi}\int_{-\infty}^{+\infty} \rho\,F(\rho,\theta,z)\,d\rho\,d\theta\,dz, \tag{11.102}$$

where $F(\rho,\theta,z) \equiv f(\rho\cos\theta, \rho\sin\theta, z)$.
$$\frac{x^2+y^2}{a^2} + \frac{z^2}{b^2} = 1$$

in the form

$$z = \pm\frac{b}{a}\sqrt{a^2 - x^2 - y^2}.$$
The volume of half of the ellipsoid above the x, y plane is given by the double integral
$$V = \frac{b}{a}\iint_R \sqrt{a^2 - x^2 - y^2}\,dx\,dy$$
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$
we make the transformation
d (x, y )
x = aρ cos θ, y = bρ sin θ, = abρ
d (ρ, θ )
to get, for half the volume
$$V = c\iint_R \sqrt{1 - \frac{x^2}{a^2} - \frac{y^2}{b^2}}\;dx\,dy = abc\iint_{\tilde{R}} \rho\sqrt{1-\rho^2}\,d\rho\,d\theta,$$
where the region R̃ is the rectangle 0 ≤ ρ ≤ 1, 0 ≤ θ ≤ 2π. Thus,
$$V = abc\int_0^{2\pi} d\theta\int_0^1 \rho\sqrt{1-\rho^2}\,d\rho = \frac{2}{3}\pi abc.$$
Therefore, the full volume Ve is
$$V_e = 2V = \frac{4}{3}\pi abc.$$
Finally, we calculate the volume of the pyramid enclosed by the three coordinate planes
and the plane hx + ky + lz = 1 where we assume that h, k, l are positive. This volume is
given by
$$V = \frac{1}{l}\iint_R (1 - hx - ky)\,dx\,dy,$$
where the region of integration is the triangle $0 \le x \le \frac{1}{h}$, $0 \le y \le \frac{1-hx}{k}$ in the x, y plane. Therefore,

$$V = \frac{1}{l}\int_0^{1/h} dx\int_0^{(1-hx)/k} (1 - hx - ky)\,dy.$$
Integration with respect to y gives $\frac{(1-hx)^2}{2k}$, and we integrate again by substituting 1 − hx = t to get

$$V = \frac{1}{6hkl}.$$
This agrees with the rule that the volume of a pyramid is one third of the product of its base area and its height. Note that, in the single crystal scenario, the intercepts 1/h, 1/k, 1/l that this plane makes on the crystal axes are the reciprocals of the Miller indices (h, k, l ) (positive integers with no common factors) of a crystal lattice plane.
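A quick numerical check (illustrative): the repeated integral above, evaluated by Simpson's rule (which happens to be exact here, since the integrands are polynomials of low degree), reproduces 1/(6hkl).

```python
def simpson(f, lo, hi, n=200):
    # Composite Simpson rule (n must be even); exact for cubics.
    step = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * step)
    return s * step / 3.0

h, k, l = 1.0, 2.0, 3.0
V = (1.0 / l) * simpson(
    lambda x: simpson(lambda y: 1.0 - h * x - k * y, 0.0, (1.0 - h * x) / k),
    0.0, 1.0 / h)
exact = 1.0 / (6.0 * h * k * l)
```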
In many instances, the volume triple integral is evaluated by converting it to a
succession of single integrals over spherical polar or cylindrical coordinates. As a generic
application, we calculate the volume of a solid of revolution obtained by rotating a curve
x = φ(z ) about the z-axis. We assume that the curve does not cross the z-axis and that the
solid of revolution is bounded above and below by the planes z = constant. Therefore, the
inequalities defining the solid are of the form a ≤ z ≤ b and 0 ≤ x2 + y 2 ≤ (φ(z ))2 .
In terms of the cylindrical coordinates
$$z, \qquad \rho = \sqrt{x^2+y^2}, \qquad \theta = \cos^{-1}\frac{x}{\rho} = \sin^{-1}\frac{y}{\rho}$$
the volume triple integral becomes
$$\iiint_R dx\,dy\,dz = \int_a^b dz\int_0^{2\pi} d\theta\int_0^{\phi(z)} \rho\,d\rho.$$
This gives, after integration,
$$V = \pi\int_a^b \phi(z)^2\,dz. \tag{11.103}$$
This integral can be interpreted as the sum of the volumes of the discs of radii φ(z ) and
width ∆z stacked together to fill the region of integration, in the limit ∆z → 0.
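For instance (an illustration), a ball of radius R is generated by rotating φ(z) = √(R² − z²), −R ≤ z ≤ R, and Eq. (11.103) gives π ∫ (R² − z²) dz = 4πR³/3:

```python
import math

def simpson(f, lo, hi, n=200):
    # Composite Simpson rule (n must be even); exact for cubics.
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3.0

R = 2.0
# phi(z)^2 = R^2 - z^2, so no square roots are needed.
V = math.pi * simpson(lambda z: R * R - z * z, -R, R)
exact = 4.0 / 3.0 * math.pi * R ** 3
```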
Next, let the region R contain the origin O of the spherical polar coordinate system
(r, θ, φ) and let r = f (θ, φ) be the surface defining the boundary of R. Then, the volume
of R is given by
$$V = \int_0^{2\pi} d\phi\int_0^\pi \sin\theta\,d\theta\int_0^{f(\theta,\phi)} r^2\,dr.$$
Carrying out the integration over r, we get

$$V = \frac{1}{3}\int_0^{2\pi} d\phi\int_0^\pi f^3(\theta,\phi)\,\sin\theta\,d\theta. \tag{11.104}$$

If R were a closed spherical ball of radius R, so that f (θ, φ) = R is constant, Eq. (11.104) yields the volume $\frac{4}{3}\pi R^3$.
Area of a curved surface
We wish to find an expression for the area of a curved surface by means of a double integral.
We construct a polyhedron circumscribing the given surface such that each of its polygonal
faces is tangent to the surface at one point, as follows.
We assume that the surface is represented by a function z = f (x, y ) with continuous derivatives on a region R of the x, y plane. We subdivide R into n subregions Rν , ν = 1, 2, . . . , n with areas ∆Rν , and in these subregions we choose points (ξν , ην ), ν = 1, 2, . . . , n. At the point (ξν , ην , ζν = f (ξν , ην )) on the surface, we construct the tangent plane to the surface and find the area of the portion of this plane lying above the region Rν
(see Fig. 11.27). Let βν be the angle that the tangent plane
makes with the x, y plane and let ∆τν be the area of the portion τν of the tangent plane above Rν . Then, the region Rν is the projection of τν on the x, y plane. Therefore, ∆Rν = ∆τν cos βν .
To get cos βν note that βν is also the angle between the normals to the planes φ1 (x) = z = 0 and φ2 (x) = (z − ζν ) − fx (ξν , ην )(x − ξν ) − fy (ξν , ην )(y − ην ) = 0, that is, between the gradients of φ1 (x) and φ2 (x). The vectors ∇φ1 and ∇φ2 are (0, 0, 1) and (−fx (ξν , ην ), −fy (ξν , ην ), 1) respectively. Evaluating their dot product via components and via their magnitudes and equating these, we get

$$1 = |\nabla\phi_1|\,|\nabla\phi_2|\cos\beta_\nu = \sqrt{1 + f_x^2(\xi_\nu,\eta_\nu) + f_y^2(\xi_\nu,\eta_\nu)}\;\cos\beta_\nu,$$

or,
$$\cos\beta_\nu = \frac{1}{\sqrt{1 + f_x^2(\xi_\nu,\eta_\nu) + f_y^2(\xi_\nu,\eta_\nu)}}.$$
Therefore,
$$\Delta\tau_\nu = \sqrt{1 + f_x^2(\xi_\nu,\eta_\nu) + f_y^2(\xi_\nu,\eta_\nu)}\;\Delta R_\nu.$$
We now form the sum $\sum_{\nu=1}^n \Delta\tau_\nu$ and let n → ∞ while the diameter of the largest subdivision simultaneously tends to zero. This sum then has a limit, independent of the way we subdivide R,
$$A = \iint_R \sqrt{1 + f_x^2 + f_y^2}\;dR. \tag{11.105}$$
We use this integral to define the area of the given surface. Note that if the surface happens
to be a plane surface, for example z = f (x, y ) = 0, we have
$$A = \iint_R dR,$$
which agrees with our definition of the area of a planar region. Sometimes we call
$$d\sigma = \sqrt{1 + f_x^2 + f_y^2}\;dR = \sqrt{1 + f_x^2 + f_y^2}\;dx\,dy$$
the element of area of the surface z = f (x, y ). The area integral can be written
symbolically in the form
$$A = \iint_R d\sigma.$$
As an example, consider the sphere of radius R, whose upper half is given by $z = f(x,y) = \sqrt{R^2 - x^2 - y^2}$. We find

$$\frac{\partial z}{\partial x} = -\frac{x}{\sqrt{R^2-x^2-y^2}}\,; \qquad \frac{\partial z}{\partial y} = -\frac{y}{\sqrt{R^2-x^2-y^2}}\,,$$

so that $\sqrt{1+f_x^2+f_y^2} = R/\sqrt{R^2-x^2-y^2}$, and the area of the whole sphere is twice the integral of this expression,
where the region of integration is the circle of radius R lying in the x, y plane with its
origin at the center. Introducing polar coordinates and resolving into single integrals
we get
$$A = 2R\int_0^{2\pi} d\theta\int_0^R \frac{r\,dr}{\sqrt{R^2-r^2}} = 4\pi R\int_0^R \frac{r\,dr}{\sqrt{R^2-r^2}} = 4\pi R^2,$$

where the last integral on the right can be easily evaluated by substituting $R^2 - r^2 = u^2$.
If the equation of the surface is given in the form φ(x, y, z ) = 0 then we get another expression for its area. Assuming this equation gives z implicitly as a function of the independent variables x and y, and also that φz ≠ 0, we get

$$\frac{d\phi}{dx} = \frac{\partial\phi}{\partial x} + \frac{\partial\phi}{\partial z}\frac{\partial z}{\partial x} = 0\,; \quad \text{or,} \quad f_x = \frac{\partial z}{\partial x} = -\frac{\phi_x}{\phi_z}.$$
Similarly,
$$f_y = \frac{\partial z}{\partial y} = -\frac{\phi_y}{\phi_z}.$$
Substituting in Eq. (11.105), we get

$$A = \iint_R \sqrt{\phi_x^2 + \phi_y^2 + \phi_z^2}\;\frac{1}{\phi_z}\,dx\,dy \tag{11.106}$$

for the area, where the region R is again the projection of the surface on the x, y plane.
If, instead of z = z(x, y ), the surface were given by x = x (y, z ) then the expression for the area would be

$$A = \iint \sqrt{1 + x_y^2 + x_z^2}\;dy\,dz = \iint \sqrt{\phi_x^2 + \phi_y^2 + \phi_z^2}\;\frac{1}{\phi_x}\,dy\,dz, \tag{11.107}$$
or, if the surface were given by y = y (z, x ) then the expression for the area would be

$$A = \iint \sqrt{1 + y_x^2 + y_z^2}\;dz\,dx = \iint \sqrt{\phi_x^2 + \phi_y^2 + \phi_z^2}\;\frac{1}{\phi_y}\,dz\,dx. \tag{11.108}$$
Equations (11.106), (11.107), (11.108) define the same area. To see this, apply the transformation

$$x = x(y,z), \quad y = y, \qquad \frac{d(x,y)}{d(y,z)} = \frac{\phi_z}{\phi_x},$$
so that
$$\iint_R \sqrt{\phi_x^2+\phi_y^2+\phi_z^2}\;\frac{1}{\phi_z}\,dx\,dy = \iint_{\tilde{R}} \sqrt{\phi_x^2+\phi_y^2+\phi_z^2}\;\frac{1}{\phi_x}\,dy\,dz,$$
after expressing the area of the surface as an integral over the appropriate parameter domain. More generally, the surface may be given parametrically by C 1 functions x = φ(u, v ), y = ψ (u, v ), z = χ(u, v ); then a definite region in the (u, v ) plane corresponds to the surface. Without going into the details we simply state the expression for the area of a surface in terms of its parametric description:
$$A = \iint_R \sqrt{(\phi_u\psi_v - \psi_u\phi_v)^2 + (\psi_u\chi_v - \chi_u\psi_v)^2 + (\chi_u\phi_v - \phi_u\chi_v)^2}\;du\,dv. \tag{11.109}$$
This can also be written as $A = \iint \sqrt{EG - F^2}\,du\,dv$, where E, F, G are the coefficients of the line element given by Eqs (10.66), (10.67), (10.68).
Exercise Using the parametric representation of a sphere of radius R via the spherical
polar coordinates θ, φ, 0 ≤ θ ≤ π, 0 ≤ φ ≤ 2π, show that
$$d\sigma = R^2\sin\theta\,d\theta\,d\phi$$
Here, the r coordinate of a point on the curve is the radius of the circle it traces out as it rotates about the z-axis. This gives

$$E = 1 + \phi'^2(r), \qquad F = 0, \qquad G = r^2$$
where r is the distance from the z-axis to the point on the rotating curve corresponding
to s.
Exercise Use the above integral with respect to the arc length parameter to calculate
the surface area of the torus obtained by rotating the circle (x − a)2 + z2 = r 2 about the
z-axis.
Solution We introduce arc length as parameter, so that the distance u of a point on the
circle from the z-axis is given by u = a + r cos(s/r ). The area is, therefore,
$$2\pi\int_0^{2\pi r} u\,ds = 2\pi\int_0^{2\pi r}\left(a + r\cos\frac{s}{r}\right)ds = 2\pi a\cdot 2\pi r.$$
The area of the torus is therefore equal to the product of the circumference of the
generating circle and the length of the path traced out by the center of the circle.
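A numerical sketch of the computation above (illustrative): integrating 2πu along the generating circle reproduces the product 2πa · 2πr.

```python
import math

def simpson(f, lo, hi, n=2000):
    # Composite Simpson rule (n must be even).
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3.0

a, r = 3.0, 1.0
# u(s) = a + r cos(s/r): distance from the z-axis of the point at arc length s.
A = 2.0 * math.pi * simpson(lambda s: a + r * math.cos(s / r), 0.0, 2.0 * math.pi * r)
exact = (2.0 * math.pi * a) * (2.0 * math.pi * r)
```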
Finally, the moments of a curve x (s ), y (s ), z (s ) in space with mass density µ(s ) are
defined by
$$T_x = \int_{s_0}^{s_1} \mu x\,ds, \qquad T_y = \int_{s_0}^{s_1} \mu y\,ds, \qquad T_z = \int_{s_0}^{s_1} \mu z\,ds,$$
That is, the center of mass has the coordinates

$$\{\xi, \eta, \zeta\} = \frac{1}{M}\iiint_R \mu\,\{x, y, z\}\,dx\,dy\,dz, \qquad \text{where} \quad M = \iiint_R \mu\,dx\,dy\,dz.$$
If the mass distribution is homogeneous, that is, µ = constant the center of mass of
the region is called its centroid. The centroid is clearly independent of the choice of the
constant positive value of the mass density. Thus, the centroid becomes a geometrical
concept associated only with the shape of the region R, independent of the mass
distribution.
Exercise Find the center of mass of a homogeneous hemispherical region H with mass
density 1.
Solution The region is given by x2 + y 2 + z2 ≤ 1 ; z ≥ 0. The two moments Tx and Ty
are zero as the respective integrations with respect to x and y vanish. For
$$T_z = \iiint_H z\,dx\,dy\,dz,$$

we transform to cylindrical coordinates

$$z = z, \quad x = r\cos\theta, \quad y = r\sin\theta$$

to get
$$T_z = \int_0^1 z\,dz\int_0^{\sqrt{1-z^2}} r\,dr\int_0^{2\pi} d\theta = 2\pi\int_0^1 z\,\frac{1-z^2}{2}\,dz = \frac{\pi}{4}.$$
Since the total mass is $\frac{2\pi}{3}$, the coordinates of the center of mass are $\left(0, 0, \frac{3}{8}\right)$.
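The value 3/8 can also be spot-checked by Monte Carlo sampling of the half-ball (an illustration; with a fixed seed the run is reproducible):

```python
import random

random.seed(0)
N = 200_000
z_sum, count = 0.0, 0
for _ in range(N):
    x = random.uniform(-1.0, 1.0)
    y = random.uniform(-1.0, 1.0)
    z = random.random()           # z >= 0: upper half-ball
    if x * x + y * y + z * z <= 1.0:
        z_sum += z
        count += 1
z_bar = z_sum / count             # estimate of the centroid height, near 3/8
```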
Exercise Find the center of mass of a hemispherical surface of unit radius over which a
mass of unit density is uniformly distributed.
Solution For the parametric representation x = sin θ cos φ, y = sin θ sin φ, z = cos θ, with 0 ≤ θ ≤ π/2, 0 ≤ φ ≤ 2π, we have dσ = sin θ dθ dφ. This leads to Tx = 0 = Ty because these involve integrating cos φ and sin φ over a single period, and

$$T_z = \int_0^{\pi/2}\sin\theta\cos\theta\,d\theta\int_0^{2\pi} d\phi = \pi.$$

Since the total mass is 2π, we see that the coordinates of the center of mass are $\left(0, 0, \frac{1}{2}\right)$.
Moment of inertia
The moment of inertia plays the role of mass for the rotational motion of a rigid body. The kinetic energy of a body rotating uniformly about an axis equals half the product of the moment of inertia and the square of the angular velocity. The moment of inertia of a continuous
mass distribution with density µ(x) = µ(x, y, z ) over a region R with respect to the x-axis
is given by
$$I_x = \iiint_R \mu\,(y^2 + z^2)\,dx\,dy\,dz.$$
This is simply the integral of the squared distance of every point (x, y, z ) in R from the x-axis, weighted by the mass density µ(x, y, z ). The moments of inertia about the other two axes
are defined similarly. The moment of inertia about a point, say the origin is defined to be
$$\iiint_R \mu\,(x^2 + y^2 + z^2)\,dx\,dy\,dz$$
and the moment of inertia with respect to a plane say the y, z plane is
$$\iiint_R \mu\,x^2\,dx\,dy\,dz.$$
A complete description of the arbitrary rotational motion of a rigid body requires the so
called products of inertia
$$I_{xy} = -\iiint_R \mu\,xy\,dx\,dy\,dz = I_{yx},$$

$$I_{yz} = -\iiint_R \mu\,yz\,dx\,dy\,dz = I_{zy},$$

$$I_{zx} = -\iiint_R \mu\,zx\,dx\,dy\,dz = I_{xz}. \tag{11.114}$$
The three quantities Ix , Iy , Iz and the six products of inertia are sufficient to describe
arbitrary rotational motion of a rigid body. These nine quantities, written as a symmetric
matrix, are collectively called the moment of inertia tensor. The mutually perpendicular
axes with respect to which the moment of inertia tensor becomes diagonal are called the
principal axes. Generally, these are determined by the symmetry elements of the rigid body.
The moment of inertia with respect to an axis parallel to x-axis and passing through the
point (ξ, η, ζ ), is given by the expression
$$\iiint_R \mu\,[(y-\eta)^2 + (z-\zeta)^2]\,dx\,dy\,dz.$$

Expanding, and taking the center of mass of the body to be at the origin (so that $\iiint_R \mu y\,dx\,dy\,dz = 0 = \iiint_R \mu z\,dx\,dy\,dz$), this becomes

$$\iiint_R \mu\,(y^2 + z^2)\,dx\,dy\,dz + (\eta^2 + \zeta^2)\iiint_R \mu\,dx\,dy\,dz.$$
Since any arbitrary axis of rotation can be chosen to be the x-axis, the result we have got
can be expressed as follows.
The moment of inertia of a rigid body with respect to an arbitrary axis of rotation is
equal to the moment of inertia of the body about a parallel axis through its center of mass
plus the product of the total mass and the square of the distance between the center of mass
and the axis of rotation.
Finally, the moment of inertia of a surface distribution, with respect to the x-axis, is
given by
$$\iint_S \mu\,(y^2 + z^2)\,d\sigma.$$
For a ball of unit radius and unit density centered at the origin, the moment of inertia about any of the coordinate axes is the same:

$$I = \iiint_V (y^2 + z^2)\,dx\,dy\,dz = \iiint_V (x^2 + z^2)\,dx\,dy\,dz = \iiint_V (x^2 + y^2)\,dx\,dy\,dz. \tag{11.115}$$
Transforming to spherical polar coordinates and using this symmetry,

$$I = \frac{2}{3}\int_0^1 r^4\,dr\int_0^\pi \sin\theta\,d\theta\int_0^{2\pi} d\phi = \frac{8\pi}{15}.$$
Exercise For a beam with edges a, b, c parallel to x-axis, y-axis, z-axis respectively, with
unit density and the center of mass at the origin, find the moment of inertia about the x, y
plane.
Solution

$$\int_{-a/2}^{a/2} dx\int_{-b/2}^{b/2} dy\int_{-c/2}^{c/2} z^2\,dz = ab\,\frac{c^3}{12}.$$
Exercise Find the moment of inertia tensor of a right triangular pyramid with constant
density ρ shown in Fig. 11.25 about the origin O. Diagonalize this matrix using the
technique developed in Chapter 2 to find its eigenvalues and eigenvectors, which define
the principal values and principal axes of this moment of inertia tensor.
Solution With i, j = x, y, z we can write, for the elements of the moment of inertia
tensor,
$$I_{ij} = \rho\int_0^{3a/2} dz\int_0^{a-(2z/3)} dy\int_0^{a-(2z/3)-y} dx \begin{pmatrix} y^2+z^2 & -xy & -zx \\ -xy & z^2+x^2 & -yz \\ -zx & -yz & x^2+y^2 \end{pmatrix}. \tag{11.116}$$
We have already found that the total mass of the pyramid is $M = \frac{1}{4}a^3\rho$, or, $\rho = \frac{4M}{a^3}$.
Carrying out the integrations in the above equation and using the expression for ρ in terms
of the total mass M we get
$$I_{ij} = \frac{Ma^2}{40}\begin{pmatrix} 13 & -2 & -3 \\ -2 & 13 & -3 \\ -3 & -3 & 8 \end{pmatrix}. \tag{11.117}$$
In order to obtain the principal moments of inertia about the origin O and the principal
axes of inertia, we diagonalize the inertia tensor using the methods of section 5.1. The result
is
$$I_{ij}^{(p)} = \frac{Ma^2}{40}\begin{pmatrix} 15 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 14 \end{pmatrix}, \tag{11.118}$$
with the corresponding principal axes

$$\hat{i}_p = \frac{1}{\sqrt{2}}(\hat{i} - \hat{j}), \qquad \hat{j}_p = \frac{1}{\sqrt{6}}(\hat{i} + \hat{j} + 2\hat{k}), \qquad \hat{k}_p = -\frac{1}{\sqrt{3}}(\hat{i} + \hat{j}) + \frac{1}{\sqrt{3}}\hat{k}, \tag{11.119}$$
where î, ĵ, k̂ are the unit vectors along the x, y, z-axes shown in Fig. 11.25.
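The diagonalization can be verified directly (a sketch in plain Python, with the matrix of Eq. (11.117) taken in units of Ma²/40): the principal values 15, 5, 14 pair with the orthonormal eigenvectors below, each satisfying A v = λ v.

```python
from math import sqrt

# Inertia tensor of Eq. (11.117) in units of M a^2 / 40.
A = [[13.0, -2.0, -3.0],
     [-2.0, 13.0, -3.0],
     [-3.0, -3.0,  8.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# (eigenvalue, unit eigenvector) pairs.
pairs = [
    (15.0, [1 / sqrt(2), -1 / sqrt(2), 0.0]),
    (5.0,  [1 / sqrt(6), 1 / sqrt(6), 2 / sqrt(6)]),
    (14.0, [-1 / sqrt(3), -1 / sqrt(3), 1 / sqrt(3)]),
]

# Residual |A v - lam v| (max component) for each claimed eigenpair.
residuals = [max(abs(av - lam * vi) for av, vi in zip(matvec(A, v), v))
             for lam, v in pairs]
```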
where the integral is expressed in terms of the values of f (x ) at the boundary points.
The corresponding result in two dimensions connects the integral of the divergence of a 2-D vector field f(x) = f (x)î + g (x)ĵ over a 2-D region R with the line integral of the normal component of this field along the boundary curve of R taken in the positive sense, which we denote by C+ . When stated in the form involving functions of several variables, this result is called Gauss's theorem, or the divergence theorem; when stated in the vector form, the same result is called Stokes theorem. The divergence theorem in 2-D is thus stated as
$$\iint_R [f_x(x,y) + g_y(x,y)]\,dx\,dy = \int_{C_+}[f(x,y)\,dy - g(x,y)\,dx], \tag{11.120}$$
The divergence theorem holds good for any 2-D open set R bounded by one or more closed
curves, each consisting of a finite number of smooth arcs and for functions f and g which
are C 1 throughout R and on C (see Fig. 11.28). We do not give the proof of this theorem
in detail, suffice it to say that the proof is based on the method of expressing the double
integral as successive single integrals, which we have seen in detail before. We will explore
some of the applications of this all-important theorem.
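A numerical sketch of Eq. (11.120) (an illustration, not from the text) for f = x³, g = y³ on the unit disc: the divergence side gives ∫∫ 3(x² + y²) dx dy = 3π/2, and the boundary side ∮(x³ dy − y³ dx) over the positively oriented unit circle gives the same value.

```python
import math

def simpson(f, lo, hi, n=2000):
    # Composite Simpson rule (n must be even).
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3.0

# Left side: f_x + g_y = 3x^2 + 3y^2 = 3r^2, integrated in polar coordinates.
lhs = 2.0 * math.pi * simpson(lambda r: 3.0 * r ** 2 * r, 0.0, 1.0)

# Right side: x = cos t, y = sin t, so dx = -sin t dt, dy = cos t dt.
rhs = simpson(lambda t: math.cos(t) ** 3 * math.cos(t)
                        - math.sin(t) ** 3 * (-math.sin(t)),
              0.0, 2.0 * math.pi)
```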
Stokes theorem
As we have already noted, Stokes theorem in 2-D is obtained by casting the divergence
theorem in the vector form. Thus, let the functions f (x, y ) and g (x, y ) be the components
of a 2-D vector field f. Then, the integrand of the double integral in Eq. (11.120) simply
becomes the divergence of f. In order to get a vector expression for line integral in
Eq. (11.120), we parameterize the oriented boundary curve C+ by the arc length s. Here,
the sense of increasing s corresponds to the positive orientation of the boundary curve C.
The RHS of Eq. (11.120) then becomes

$$\int_C [f(x,y)\,\dot{y} - g(x,y)\,\dot{x}]\,ds,$$

where $\dot{x} = dx/ds$ and $\dot{y} = dy/ds$. The vector n̂ with components (ẏ, −ẋ ) is a unit vector obtained by a clockwise π/2 rotation of the unit tangent t̂. Thus, n̂ is the normal pointing to the right side of the oriented curve C (see subsection 9.2.5). Since in this case C+ is oriented so as to have region R on the left side of C+ , we see that n̂ is the outward normal to the region R (see Fig. 11.29). The components ξ, η of the unit vector n̂ are the direction cosines of the outward normal:

$$\xi = \cos\theta\,; \qquad \eta = \sin\theta$$
Here, the integrand on the right is the scalar product f · n̂ of the vector f with components f , g and the vector n̂ with components dx/dn, dy/dn. Since n̂ is a unit vector, the scalar product f · n̂ represents the component fn of the vector f in the direction of n̂. Thus, the divergence
theorem takes the form
$$\iint_R \nabla\cdot\mathbf{f}\,dx\,dy = \int_C \mathbf{f}\cdot\hat{n}\,ds = \int_C f_n\,ds. \tag{11.122}$$
In words, the double integral of the divergence of a plane vector field over a set R is equal
to the line integral, along the boundary C of R, of the component of the vector field in the
direction of the outward normal.
There is another form of Stokes theorem in the plane offering an entirely different vector interpretation. To get to it, we put

$$a(x,y) = -g(x,y), \qquad b(x,y) = f(x,y).$$
We take the two functions a and b to be the components of a vector field g, where g is
obtained at each point from vector f by its counterclockwise π/2 rotation. We see that
$$a\dot{x} + b\dot{y} = \mathbf{g}\cdot\hat{t} = g_t,$$
where gt is the tangential component of the vector g. The integrand of the double integral in Eq. (11.123) is the z component (∇ × g)z of curl g, provided we assume the field g continued into the whole of 3-D space so as to coincide with g ≡ (a(x, y ), b (x, y ), 0) on the x, y plane. The Stokes theorem now takes the form,
$$\iint_R (\nabla\times\mathbf{g})_z\,dx\,dy = \int_C g_t\,ds.$$
Since any plane in space can be taken to be the x, y plane of a suitable coordinate system,
we arrive at the following general formulation of Stokes theorem.
$$\iint_R (\nabla\times\mathbf{g})_n\,dA = \int_C g_t\,ds, \tag{11.124}$$
where R is any plane region in space, bounded by the curve C and (∇×g)n is the component
of the vector ∇×g or curlg in the direction of the normal n̂ to the plane containing R. Here
C has to be oriented in such a way that the tangent vector t̂ points in the counterclockwise
direction as seen from the side of the plane toward which n̂ points. In other words, the
corresponding rotation of a right handed screw should advance it in the direction of n̂.
If the complete boundary C of R consists of several closed curves (see Fig. 11.28), these
results remain valid provided we extend the line integral over each of these curves oriented
properly, so as to leave R on its left side. If the functions a and b satisfy the condition
$$a_y = b_x,$$
the expression a dx + b dy becomes a perfect differential. Since a_y = b_x, the double integral over R in Eq. (11.123) vanishes, so that
$$\int_C (a\,dx + b\,dy) = 0 \tag{11.125}$$
for every closed curve C in R. Equivalently, the integral of a dx + b dy over a path joining two end points P0 and P1 has the same value for all paths in R joining the end points P0 and P1, provided R is simply connected (see section 11.1).
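The path independence asserted by Eq. (11.125) can be sketched numerically. Below (a hypothetical example, with a = y, b = x, so that a_y = b_x = 1 and a dx + b dy = d(xy)) the same line integral is evaluated along a straight line and along a parabola joining (0, 0) to (1, 1); both should give the value xy = 1 at the common end point.

```python
def line_integral(path, n=20000):
    # path: t in [0, 1] -> (x, y); integrate y dx + x dy by the midpoint rule
    total = 0.0
    x0, y0 = path(0.0)
    for k in range(1, n + 1):
        x1, y1 = path(k / n)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        total += ym * (x1 - x0) + xm * (y1 - y0)
        x0, y0 = x1, y1
    return total

straight = lambda t: (t, t)        # straight line from (0,0) to (1,1)
parabola = lambda t: (t, t * t)    # parabolic path, same end points
print(abs(line_integral(straight) - line_integral(parabola)) < 1e-6)  # True
```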
Exercise Use the divergence theorem in the plane to evaluate the line integral
$$\int_C (f\,du + g\,dv)$$
for the following functions and paths taken in the counterclockwise sense about the given
region.
Exercise Assuming the conditions for the divergence theorem hold, derive the following
expressions in polar coordinates for the area of a region R with boundary C:
$$A = \frac{1}{2}\int_{C^+} r^2\,d\theta, \qquad A = -\int_{C^+} r\theta\,dr,$$
where in the second formula we assume that R does not contain the origin.
Hint Note that $A = \frac{1}{2}\int_{C^+}(x\,dy - y\,dx)$.
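The first polar area formula is easy to test numerically. The sketch below (an illustrative choice, not from the text) applies A = (1/2)∮ r² dθ to the cardioid r(θ) = 1 + cos θ, whose exact area is 3π/2.

```python
import math

def polar_area(r, n=20000):
    # midpoint-rule evaluation of (1/2) * integral of r(theta)^2 dtheta
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * (k + 0.5) / n
        total += 0.5 * r(theta) ** 2 * (2 * math.pi / n)
    return total

A = polar_area(lambda t: 1 + math.cos(t))
print(abs(A - 1.5 * math.pi) < 1e-6)  # True
```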
Solution We first make the region interior to C simply connected by means of cuts as
shown in Fig. 11.30. Let Γ be the boundary of the simply connected region so formed, say
R, oriented positively, that is, by traversing it keeping region R on the left. The positive
sense on Γ is indicated by the arrows in Fig. 11.30. We have,
$$\int_\Gamma (a\,dx + b\,dy) = \int_C (a\,dx + b\,dy) - \int_{C_1}(a\,dx + b\,dy) - \cdots - \int_{C_n}(a\,dx + b\,dy)$$
$$= \int_C (a\,dx + b\,dy) - \sum_{i=1}^{n} m_i. \tag{11.127}$$
combined with the rule for differentiating a product immediately gives a prescription for
integrating by parts that is basic to the theory of partial differential equations. We substitute
for both f and g a product of functions, namely, f (x, y ) = a(x, y )u (x, y ) and g (x, y ) =
b(x, y)v(x, y), where the functions a, b, u, v are C¹ in R as well as on C.
Recognizing that
$$\omega_{xx} + \omega_{yy} = \nabla^2\omega \equiv \Delta\omega$$
and that dx/dn and dy/dn are the direction cosines of the outward normal to the boundary C of R, so that
$$\omega_x\frac{dx}{dn} + \omega_y\frac{dy}{dn} = \frac{d\omega}{dn}$$
is the directional derivative of ω in the direction of the outward normal to C, we obtain, for Green's first theorem,
$$\iint_R (u_x\omega_x + u_y\omega_y)\,dx\,dy = \int_C u\frac{d\omega}{dn}\,ds - \iint_R u\,\Delta\omega\,dx\,dy. \tag{11.130}$$
Subtracting the two equations gives an equation symmetric in u and ω known as Green’s
second theorem:
$$\iint_R (\omega\,\Delta u - u\,\Delta\omega)\,dx\,dy = \int_C \left(\omega\frac{du}{dn} - u\frac{d\omega}{dn}\right)ds. \tag{11.131}$$
402 An Introduction to Vectors, Vector Operators and Vector Analysis
These two theorems of Green are basic in solving the partial differential equation (Laplace
equation) ∇2 u = uxx + uyy = 0 (see books on Electrodynamics like [9, 13]).
Fig. 11.31 Amount of liquid crossing segment I in time dt for uniform flow of velocity v
We take this area of the parallelogram swept by the liquid crossing the segment I in time
dt (∠(n̂, v) < π/2) to be positive while the corresponding area for the unit vector n̂ such
that ∠(n̂, v) > π/2 is taken to be negative. If ρ is the density of the liquid, then (v · n̂)ρsdt
is the mass of the liquid that crosses I toward the side to which n̂ points.
Now let C be a curve in the x, y plane. We select one of the two possible unit normals along C and call it n̂. In a flow with velocity and density depending on x and t, the integral
$$\int_C (\mathbf{v}\cdot\hat{\mathbf{n}})\rho\,ds \tag{11.132}$$
Vector Integration 403
represents the mass of the liquid crossing C in unit time toward the side of C to which n̂ points. This follows by approximating C by a polygon and approximating the flow by one whose velocity is constant across each side of the polygon.
If C is the boundary of a region R and if n̂ is the outward normal, the integral
represents the mass of the liquid leaving R in unit time. Applying the divergence theorem
as in Eq. (11.122)we can express the flow through C as a double integral
$$\int_C (\mathbf{v}\cdot\hat{\mathbf{n}})\rho\,ds = \int_C (\rho\mathbf{v})\cdot\hat{\mathbf{n}}\,ds = \iint_R \nabla\cdot(\rho\mathbf{v})\,dx\,dy. \tag{11.133}$$
We can compare this flow of mass through C out of R with the change in mass contained
in R. The total mass of the liquid contained in the region R at time t is
$$\iint_R \rho(\mathbf{x}, t)\,dx\,dy.$$
If we assume that the mass is conserved, then mass can only be lost from R by passing
through the boundary C. Hence, by Eq. (11.133) we have
$$\iint_R \left(\nabla\cdot(\rho\mathbf{v}) + \frac{\partial\rho}{\partial t}\right)dx\,dy = 0. \tag{11.134}$$
Since this identity holds for arbitrary R, if we progressively reduce the area of R the integral
will have the value given by the product of the integrand evaluated at some arbitrary point
in R and the area of R. Since area of R > 0, the integrand must vanish at all points at
which the velocity field is defined. Stated more rigorously, if we divide Eq. (11.134) by the
area of R then in the limit as area of R tends to zero, we get
$$\nabla\cdot(\rho\mathbf{v}) + \frac{\partial\rho}{\partial t} = 0. \tag{11.135}$$
This differential equation expresses the law of conservation of mass in the flow. In terms of
the components (v1 , v2 ) of the velocity vector we can write Eq. (11.135) as
$$\frac{\partial\rho}{\partial t} + v_1\frac{\partial\rho}{\partial x} + v_2\frac{\partial\rho}{\partial y} + \rho\left(\frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y}\right) = 0. \tag{11.136}$$
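The conservation law (11.135) can be checked by finite differences for a known flow. The sketch below (a hypothetical example, not from the text) uses the uniformly expanding flow v = (x, y)/(1 + t) with density ρ = 1/(1 + t)², for which the residual of the continuity equation should vanish identically.

```python
def rho(x, y, t):
    # density of the expanding flow: mass spreads as the flow dilates
    return 1.0 / (1.0 + t) ** 2

def v(x, y, t):
    # velocity field of the uniform expansion
    return x / (1.0 + t), y / (1.0 + t)

def continuity_residual(x, y, t, h=1e-5):
    # d(rho)/dt + d(rho*v1)/dx + d(rho*v2)/dy by central differences
    drho_dt = (rho(x, y, t + h) - rho(x, y, t - h)) / (2 * h)
    dfx = (rho(x + h, y, t) * v(x + h, y, t)[0]
           - rho(x - h, y, t) * v(x - h, y, t)[0]) / (2 * h)
    dfy = (rho(x, y + h, t) * v(x, y + h, t)[1]
           - rho(x, y - h, t) * v(x, y - h, t)[1]) / (2 * h)
    return drho_dt + dfx + dfy

print(abs(continuity_residual(0.3, -0.7, 1.2)) < 1e-6)  # True
```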
For a liquid of constant density ρ, Eq. (11.136) reduces to the incompressibility condition
$$\nabla\cdot\mathbf{v} = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} = 0. \tag{11.137}$$
Combining Eqs (11.133) and (11.137) we see that the total amount of an incompressible
liquid crossing a closed curve C is zero:
$$\int_C (\mathbf{v}\cdot\hat{\mathbf{n}})\,ds = 0. \tag{11.138}$$
Stokes theorem, in the form of Eq. (11.124), applied to the vector field v also has interesting consequences for the liquid flow. The integral over a closed oriented curve C, namely,
$$\int_C \mathbf{v}\cdot\hat{\mathbf{t}}\,ds,$$
where t̂ is the unit tangent vector corresponding to the orientation of C, is called the circulation of the liquid around C. By Stokes theorem, this circulation is equal to the
double integral
$$\iint_R (\nabla\times\mathbf{v})_z\,dx\,dy$$
over the enclosed region R. Hence, the quantity
$$(\nabla\times\mathbf{v})_z = \frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}, \tag{11.139}$$
called the vorticity of the motion, measures the density of circulation at the point
x ≡ (x, y ) in the sense that the area integral of the vorticity gives the circulation around
the boundary. A flow is called irrotational if the vorticity vanishes everywhere, that is, if
$$(\nabla\times\mathbf{v})_z = \frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y} = 0. \tag{11.140}$$
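The statement that vorticity is the density of circulation can be illustrated with the rigid rotation v = (−y, x) (angular velocity 1, an arbitrary choice for this sketch): its vorticity (11.139) is 2, so the circulation around a circle of radius r should be 2 times the enclosed area, that is, 2πr².

```python
import math

def circulation(r, n=20000):
    # line integral of v·t̂ ds around a counterclockwise circle of radius r,
    # for the rigid-rotation field v = (-y, x)
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * (k + 0.5) / n
        x, y = r * math.cos(t), r * math.sin(t)
        tx, ty = -math.sin(t), math.cos(t)   # unit tangent, counterclockwise
        ds = 2 * math.pi * r / n
        total += (-y * tx + x * ty) * ds
    return total

r = 0.8
print(abs(circulation(r) - 2 * math.pi * r * r) < 1e-9)  # True
```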
By Stokes theorem, the circulation around a closed curve C vanishes if C is the boundary of a region where the motion is irrotational. Since Eq. (11.140) is the condition for v₁ dx + v₂ dy to be a perfect differential, there exists for an irrotational flow, in every simply connected region, a scalar valued function φ(x, t), the velocity potential, such that
$$v_1 = -\varphi_x,\qquad v_2 = -\varphi_y. \tag{11.141}$$
For an incompressible flow, Eq. (11.137) then shows that φ satisfies the Laplace equation
$$\Delta\varphi = \varphi_{xx} + \varphi_{yy} = 0.$$
Consider, for example, the flow with velocity potential φ = a log r, a solution of the Laplace equation. By Eq. (11.141) the velocity field has the components
$$v_1 = -\frac{ax}{r^2},\qquad v_2 = -\frac{ay}{r^2}$$
and is singular at the origin (see Fig. 11.32(a)). All velocity vectors point towards the origin
for a > 0, away from the origin for a < 0. The velocity of the liquid at a given location does
not change with time, although the velocities at different points are different. Such a flow is
said to be a steady flow. The circulation around any closed curve not passing through the
origin vanishes, since vorticity is zero as can be easily checked so that
$$\int_C \mathbf{v}\cdot\hat{\mathbf{t}}\,ds = \int_C (v_1\,dx + v_2\,dy) = -\int_C d\varphi = 0.$$
The amount of liquid passing outward through a simple closed curve C in unit time is
$$\rho\int_C \mathbf{v}\cdot\hat{\mathbf{n}}\,ds = \rho\int_C \left(v_1\frac{dy}{ds} - v_2\frac{dx}{ds}\right)ds = \rho\int_C (v_1\,dy - v_2\,dx) = -a\rho\int_C \frac{x\,dy - y\,dx}{x^2 + y^2},$$
Fig. 11.32 (a) Flow with sink and (b) Flow with vortex
Exercise Show that
$$\int_C \frac{x\,dy - y\,dx}{x^2 + y^2} = \int_C d\theta.$$
If C does not enclose the origin, the starting and finishing values of θ are the same as we trace the simple closed curve C once, making the limits of the integral the same, so that the value of the integral is zero. Therefore,
$$\rho\int_C \mathbf{v}\cdot\hat{\mathbf{n}}\,ds = \begin{cases} 0 & \text{if } C \text{ does not enclose the origin},\\ -2\pi a\rho & \text{if } C \text{ encloses the origin}. \end{cases}$$
Thus, the amount of mass flowing through every simple closed curve C enclosing the origin
in unit time is the same. For a > 0 the origin acts as a sink where mass disappears at the
rate of 2πaρ units in unit time. For a < 0 there is a source of mass at the origin, giving out
mass at the same rate.
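A short numerical sketch (the values of a, ρ and the radii below are arbitrary choices) confirms that the sink flow v = (−ax/r², −ay/r²) carries the same mass flux −2πaρ through every circle about the origin, independent of its radius.

```python
import math

def mass_flux(a, rho, r, n=20000):
    # outward mass flux of v = (-a*x/r^2, -a*y/r^2) through a circle of radius r
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * (k + 0.5) / n
        x, y = r * math.cos(t), r * math.sin(t)
        nx, ny = math.cos(t), math.sin(t)        # outward unit normal
        v1, v2 = -a * x / r ** 2, -a * y / r ** 2
        total += rho * (v1 * nx + v2 * ny) * (2 * math.pi * r / n)
    return total

a, rho = 1.5, 2.0
f1, f2 = mass_flux(a, rho, 0.5), mass_flux(a, rho, 3.0)
print(abs(f1 + 2 * math.pi * a * rho) < 1e-9, abs(f1 - f2) < 1e-9)  # True True
```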
Let us now consider a steady flow given by the velocity potential
$$\varphi = c\theta = c\tan^{-1}\frac{y}{x}.$$
Despite φ being multiple valued, the corresponding velocity field is single valued:
$$v_1 = \frac{cy}{r^2},\qquad v_2 = -\frac{cx}{r^2}.$$
The vector field v is everywhere normal to the radii from the origin (see Fig. 11.32(b)).
Again, the velocity field is singular at the origin.
The circulation around a closed curve C has the value
$$\int_C (v_1\,dx + v_2\,dy) = -\int_C d\varphi = -c\int_C d\theta.$$
Thus, the circulation is zero for a simple closed curve not enclosing the origin. For a simple
closed curve encircling the origin in the counterclockwise sense we find the value −2πc for
the circulation. This corresponds to a vortex of strength −2πc concentrated at the origin.
On the other hand, the flow of mass in unit time through any closed curve C not passing
through the origin is zero, since here
$$\rho\int_C \mathbf{v}\cdot\hat{\mathbf{n}}\,ds = c\rho\int_C \frac{x\,dx + y\,dy}{x^2 + y^2} = c\rho\int_C \frac{dr}{r} = 0.$$
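The circulation of this vortex flow can also be verified numerically. The sketch below (with arbitrary values for c and the radius) integrates v · t̂ ds around a counterclockwise circle about the origin, which should give −2πc regardless of the radius.

```python
import math

def vortex_circulation(c, r, n=20000):
    # circulation of v = (c*y/r^2, -c*x/r^2) around a counterclockwise circle
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * (k + 0.5) / n
        x, y = r * math.cos(t), r * math.sin(t)
        tx, ty = -math.sin(t), math.cos(t)       # unit tangent, counterclockwise
        r2 = x * x + y * y
        total += (c * y / r2 * tx - c * x / r2 * ty) * (2 * math.pi * r / n)
    return total

c = 0.7
print(abs(vortex_circulation(c, 2.5) + 2 * math.pi * c) < 1e-9)  # True
```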
We consider a 2-D surface in the 3-D space which is piecewise smooth, that is, every
point P0 (x0 ) of the surface has a neighborhood S which can be represented by a vector
valued function of two parameters x(u, v ) having continuous partial derivatives with
respect to u and v in S. All points in S are covered by varying x(u, v ) as the parameters
u, v vary over an open set γ in the u, v plane such that different (u, v ) correspond to
different points on S. Further, we want the function x(u, v ) to have the derivatives
xu (u, v ) and xv (u, v ) with respect to u, v in γ that are continuous and linearly
independent. We call such a representation a regular local representation of the surface.
We have seen that the condition for such a representation to be regular can be written as
$$\mathbf{x}_u\times\mathbf{x}_v \neq 0 \quad\text{or}\quad |\mathbf{x}_u\times\mathbf{x}_v|^2 = \begin{vmatrix} \mathbf{x}_u\cdot\mathbf{x}_u & \mathbf{x}_u\cdot\mathbf{x}_v\\ \mathbf{x}_v\cdot\mathbf{x}_u & \mathbf{x}_v\cdot\mathbf{x}_v \end{vmatrix} > 0,$$
where the determinant in this equation, denoted Γ(x_u, x_v), is called the Gram determinant. Γ(x_u, x_v) = 0 implies that x_u × x_v = 0, that is, x_u, x_v are collinear and hence linearly dependent. Conversely, if |x_u × x_v|² = Γ(x_u, x_v) > 0, then x_u × x_v ≠ 0, so that x_u, x_v
are not collinear and hence are linearly independent. Therefore, the fact that x(u, v ) is a
regular local parameterization of the surface implies that Γ (xu , xv ) > 0, so that xu , xv are
linearly independent.
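The identity |x_u × x_v|² = Γ(x_u, x_v) is Lagrange's identity, and it can be spot-checked numerically. The sketch below (an illustrative choice of surface and point, not from the text) evaluates both sides at one point of the unit sphere x(u, v) = (cos u cos v, sin u cos v, sin v).

```python
import math

def cross(a, b):
    # vector product of two 3-vectors
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# tangent vectors of the unit sphere at (u, v) = (0.7, 0.3)
u, v = 0.7, 0.3
xu = (-math.sin(u) * math.cos(v), math.cos(u) * math.cos(v), 0.0)
xv = (-math.cos(u) * math.sin(v), -math.sin(u) * math.sin(v), math.cos(v))

gram = dot(xu, xu) * dot(xv, xv) - dot(xu, xv) ** 2   # Gram determinant
print(abs(dot(cross(xu, xv), cross(xu, xv)) - gram) < 1e-12)  # True
```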
The vectors xu (u, v ) and xv (u, v ) at a point P = x(u, v ) of S are tangential to S at P
and span the tangent plane π (P ) of S at P . Thus, every point of the tangent plane has the
position vector xT (u, v ) = x(u, v ) + λxu (u, v ) + µxv (u, v ) with suitable coefficients λ
and µ.
In order to orient the surface S, we first assign an orientation to the tangent plane π (P ).
Orienting a plane means specifying one of the two sides of it. This can be done by specifying
one of the two unit normals to the plane. In order to specify one of the two unit normals
to the tangent plane π (P ) and make it the oriented tangent plane π∗ (P ), we specify an
ordered pair of linearly independent vectors ξ (P ) and η (P ) in π (P ). The order of these
vectors, ξ (P ), η (P ) or η (P ), ξ (P ) decides which of the two possible directions along the
line normal to the plane π (P ) at P is the direction of the corresponding vector product
ξ (P ) × η (P ) or η (P ) × ξ (P ) (see Fig. 11.33).
Thus, the orientation of π∗ (P ) is specified in terms of the direction of the vector product of
ξ (P ), η (P ), with the order of factors being the same as that of the ordered pair of ξ (P ) and
η (P ) chosen to specify the orientation of π∗ (P ). Thus, the oriented tangent plane π∗ (P )
can be specified by the pair (π(P), n̂) or (π(P), −n̂), where
$$\hat{\mathbf{n}} = \frac{\xi(P)\times\eta(P)}{|\xi(P)\times\eta(P)|} \tag{11.142}$$
or
$$-\hat{\mathbf{n}} = \frac{\eta(P)\times\xi(P)}{|\eta(P)\times\xi(P)|} \tag{11.143}$$
is the unit vector giving the direction of the vector product of ξ (P ) and η (P ) in the
chosen order. Since there are only two possible orientations of π (P ), they can be specified
via a dichotomic function Ω(π∗(P)) = Ω(ξ(P), η(P)) = ±1, where each of the two values
corresponds to one of the two possible orientations, but which value corresponds to which
orientation is arbitrary (see subsection 1.16.1). Any other ordered pair of independent
tangential vectors ξ′(P), η′(P) at P determines the same orientation if the angle between
the corresponding vector products is less than π/2, that is,
$$[\xi(P),\eta(P);\xi'(P),\eta'(P)] = (\xi(P)\times\eta(P))\cdot(\xi'(P)\times\eta'(P)) = \begin{vmatrix} \xi\cdot\xi' & \xi\cdot\eta'\\ \eta\cdot\xi' & \eta\cdot\eta' \end{vmatrix} > 0,$$
so that
$$\Omega(\xi'(P),\eta'(P)) = \mathrm{sgn}\big([\xi(P),\eta(P);\xi'(P),\eta'(P)]\big)\,\Omega(\xi(P),\eta(P)), \tag{11.144}$$
where sgn(x) equals −1 for x < 0 and +1 for x ≥ 0. Equivalently, the ordered pairs of tangential vectors ξ, η and ξ′, η′ give the same orientation to π if
$$\hat{\mathbf{n}}\cdot\hat{\mathbf{n}}' > 0,$$
where n̂ and n̂′ are the unit vectors specifying the directions of ξ × η and ξ′ × η′ respectively. Since ξ(P), η(P) and ξ′(P), η′(P) belong to the same plane, there are only two possibilities: n̂ · n̂′ = +1 (n̂′ = n̂) or n̂ · n̂′ = −1 (n̂′ = −n̂).
We now use the orientation of the tangent plane to the surface S at a point P on S to
define the orientation of the surface S in the following way. We say that the unit normals
defining the orientations of the tangent planes π∗ (P ) depend continuously on P , when
these normals to the planes π∗(P) at points close to each other (in the Euclidean sense) are themselves close to each other (in the Euclidean sense). That is, given ε > 0, however small, there exists δ > 0 such that $\sqrt{(u - u_1)^2 + (v - v_1)^2} < \delta$ implies |n̂(x(u, v)) − n̂(x(u₁, v₁))| < ε. This is expressed by saying that the orientation Ω(π∗(P)) of the tangent plane at P varies continuously as P varies on S. An oriented surface
S ∗ is defined as a surface S with continuously oriented tangent planes π∗ (P ).
It is possible to find another criterion to ascertain the unit vector deciding the
orientation of a tangent plane. If {ξ, η} stands for one of the two ordered pairs drawn out
of ξ (P ) and η (P ), then the corresponding unit normal n̂ deciding the orientation of
π∗ (P ) is the one which makes the triplet {ξ, η}, n̂ positively oriented (see section 1.16).
This is equivalent to the following inequality:
$$(\xi\times\eta)\cdot\hat{\mathbf{n}} > 0. \tag{11.145}$$
Thus, out of the two possible unit vectors perpendicular to π(P) at P, the unit vector satisfying inequality Eq. (11.145) decides the orientation of π(P). As we show in the next paragraph, this vector also specifies the orientation of a connected surface S. Let ê₁, ê₂, ê₃ be
the orthonormal basis to which all vectors are referred. Then, if the triplets (ξ, η, n̂) and
ê1 , ê2 , ê3 , have the same orientation, we can write (see section 1.16),
and we call this vector n̂, defining the orientation of S ∗ , the unit normal vector pointing to
the positive side of the oriented surface S ∗ or the positive unit normal to S ∗ .
We can now understand how to assign an orientation to a connected surface S. We
choose a point P on S, and the pair (ξ, η ) in the tangent plane π (P ), which decides, via
Eq. (11.142), one of the two possible unit vectors n̂ and n̂0 specifying the orientation of
S at P . This unit vector actually specifies the orientation of the whole surface S, as the
following argument shows. At P we have n̂′ = εn̂, where ε = ε(P) = ±1. Since the unit vectors n̂, n̂′ are assumed to vary continuously with P, the same is true for ε(P) = n̂ · n̂′. Thus, ε is a continuous function on S having only the values +1 or −1. If ε(P) ≠ ε(Q) for any two distinct points P and Q on S, it follows from the continuity of ε that ε = 0 at some point along a curve on S joining P and Q, contradicting the definition of ε. As a result, ε has the same value at all points on S. Thus, any orientation of S is given by either the
unit normal n̂(P) or n̂′(P) = −n̂(P). If the positive unit normal corresponding to S∗ is n̂, the other possible orientation, corresponding to −n̂ as its positive unit normal, is called −S∗.
From Eq. (11.144) we see that
Ω(−S ∗ ) = −Ω(S ∗ ),
where Ω(S ∗ ) = Ω(ξ (P ), η (P )) for some tangent plane π (P ) on S. Thus, the orientation
of the positive normal n̂ to a connected surface S at a single point P uniquely determines
the positive normal at any other point Q and hence determines the orientation of S. All
that we need to do is to continuously carry the positive unit normal at P to Q along a curve
on S joining P and Q, so that it coincides with the positive unit normal at Q to S. There
are connected surfaces on which a positive unit normal at a point cannot be transported
along a curve on the surface, to coincide with the positive unit normal at some other point
on the surface. Such a surface cannot be assigned any orientation and is not orientable. The
Möbius strip is the most celebrated example of a connected surface that is not orientable.
Orientation of a surface S becomes quite simple if it forms the boundary of a region R
in space. Such a surface can be oriented even if it is not connected, as, for example, the
surface forming the boundary of a spherical shell. At each point P on S we can distinguish
an interior normal pointing into R from an exterior normal pointing away from R. Both
these normals vary continuously with P . We can take the exterior normal as the positive
normal to define an orientation of S. We call the resulting oriented surface S∗ oriented positively with respect to R. Thus, for example, for a spherical shell
a ≤ |x| ≤ b
For a surface S with a regular parametric representation x(u, v), the expression
$$\hat{\mathbf{z}} = \frac{\mathbf{x}_u\times\mathbf{x}_v}{|\mathbf{x}_u\times\mathbf{x}_v|} \tag{11.146}$$
defines a unit normal vector for (u, v) in γ. If n̂ is the positive unit normal to S∗, we have
$$\hat{\mathbf{n}} = \epsilon\hat{\mathbf{z}}$$
with ε = ε(u, v) = ±1. By continuity of n̂ and ẑ, ε is continuous which, when coupled with the fact that ε = ±1, means that ε is constant on every connected component of γ. For ε = 1, that is, for
Ω(S ∗ ) = Ω(xu , xv ),
we say that S∗ is positively oriented with respect to the parameters u, v and write Ω(S∗) = Ω(u, v).
If the same part of S ∗ has another regular parametric representation in terms of parameters
ū, v̄ varying over the region γ̄, we have, by Eq. (10.65),
$$\mathbf{x}_u\times\mathbf{x}_v = \left(\frac{d(y,z)}{d(u,v)},\ \frac{d(z,x)}{d(u,v)},\ \frac{d(x,y)}{d(u,v)}\right),$$
or,
$$\mathbf{x}_u\times\mathbf{x}_v = \frac{d(\bar u,\bar v)}{d(u,v)}\,(\mathbf{x}_{\bar u}\times\mathbf{x}_{\bar v}).$$
Thus, the unit normals ẑ and z̄̂ for the two parametric representations are related by
$$\hat{\mathbf{z}} = \mathrm{sgn}\left(\frac{d(\bar u,\bar v)}{d(u,v)}\right)\hat{\bar{\mathbf{z}}},$$
so that the two representations define the same orientation of S∗ if and only if
$$\frac{d(\bar u,\bar v)}{d(u,v)} > 0.$$
As an illustration, we consider the unit sphere S ∗ with center at the origin, oriented
positively with respect to its interior. With u = x and v = y as parameters for z , 0, we
have,
$$\mathbf{x} = \left(u,\ v,\ \epsilon\sqrt{1 - u^2 - v^2}\right), \quad\text{where } \epsilon = \mathrm{sgn}\,z.$$
The assumption a < 1 keeps the surface from intersecting itself. The resulting strip has the
parametric representation
$$\mathbf{x} = \left(\left(1 + v\sin\frac u2\right)\cos u,\ \left(1 + v\sin\frac u2\right)\sin u,\ v\cos\frac u2\right) \tag{11.147}$$
with v restricted to −a < v < a. The points (u, v ), (u + 4π, v ), (u + 2π, −v ) in the u, v
plane correspond to the same point on the surface. Making a definite choice of parameters
u0 , v0 for an arbitrary point P on the surface, Eq. (11.147) gives a regular local parametric
representation of S for (u, v) ∈ γ, a sufficiently small neighborhood of (u₀, v₀).
Along the center line v = 0 on the surface, Eq. (11.146) defines a unit normal vector
$$\hat{\mathbf{z}} = \left(\cos u\cos\frac u2,\ \sin u\cos\frac u2,\ -\sin\frac u2\right)$$
that varies continuously with u. Starting out with the unit normal ẑ = (1, 0, 0) at the point
(1, 0, 0) of S corresponding to u = 0 and letting u increase from 0 to 2π, we complete a
circuit along the center line of the surface, returning to the same point but with the opposite
unit normal ẑ = (−1, 0, 0). We find similarly that carrying a small oriented tangential curve
along the circuit, we return to the same point with its orientation reversed. Thus, it is not
possible to choose a continuously varying unit normal, or a side of S in a consistent way.
In other words, the Möbius strip is not orientable.
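The reversal of the normal after one circuit of the center line can be made concrete with a few lines of code. The sketch below implements the center-line normal ẑ(u) = (cos u cos(u/2), sin u cos(u/2), −sin(u/2)) derived above and checks that ẑ(2π) = −ẑ(0).

```python
import math

def z_hat(u):
    # unit normal along the center line v = 0 of the Möbius strip
    return (math.cos(u) * math.cos(u / 2),
            math.sin(u) * math.cos(u / 2),
            -math.sin(u / 2))

z_start = z_hat(0.0)            # (1, 0, 0)
z_end = z_hat(2 * math.pi)      # approximately (-1, 0, 0)
# after one full circuit the normal has reversed: z_end == -z_start
print(all(abs(a + b) < 1e-12 for a, b in zip(z_start, z_end)))  # True
```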
Exercise Let S be the Möbius strip with the parametric representation given by
Eq. (11.147). (a) Show that the line v = a/2 divides S into an orientable and a
non-orientable set. (b) Show that the line v = 0 does not divide S, that is, the set S1
obtained by removing all points with v = 0 from S is still connected. (c) Show that S1 is
orientable.
Solution
(a) The line v = a/2 divides S into a part S′ given by a/2 < v < a (or, equivalently, −a < v < −a/2), which is oriented by ξ = x_u, η = x_v, and a part S′′ given by −a/2 < v < a/2, which is just another Möbius strip.
(b) S1 is representable by Eq. (11.147) with v restricted to the interval 0 < v < a, where the point P = x(u, v) varies continuously over S1. Obviously, any two points on S1 can be
joined by a curve on S1 which is the image of the corresponding points (u, v ) in the
parameter plane.
(c) S1 is oriented by ξ = xu , η = xv .
Summary
We now summarize the relevant points covered in this section. First, an orientable surface
S has two possible orientations which are given by the two possible normals to the surface
at some point P on it. These are obtained by choosing two non-collinear (linearly
independent) vectors ξ and η based at P , spanning the tangent plane to S at P . We form
two possible ordered pairs (ξ, η ) and (η, ξ ) to define two possible unit normals to the
tangent plane or to the surface S at P :
$$\hat{\mathbf{n}} = \frac{\xi\times\eta}{|\xi\times\eta|}; \qquad -\hat{\mathbf{n}} = \frac{\eta\times\xi}{|\eta\times\xi|}.$$
We denote the corresponding oriented surface by S ∗ . From the above equations it is clear
that the triplet (ξ, η, n̂) is positively oriented, while the triplet (ξ, η, −n̂) is negatively
oriented. In order to decide which of these unit normals define the positive orientation of
S ∗ , we first orient the 3-D space containing S ∗ as follows. We choose a coordinate system
defined by the orthonormal basis (ê1 , ê2 , ê3 ) to resolve the vectors in some region of space
R and say that R is positively oriented if this coordinate system is right handed and
negatively oriented if it is left handed. If the coordinate system is right handed, it has the
same orientation as the triplet (ξ, η, n̂) and we say that n̂ defines the positive orientation
of S ∗ and −n̂ defines its negative orientation. On the other hand, if the coordinate system
is left handed, it has the same orientation as (ξ, η, −n̂) and we say that −n̂ defines the
positive orientation of S ∗ and n̂ defines its negative orientation. In general, a unit vector n̂
defining the positive orientation of S ∗ is said to be on the positive side of S ∗ . For a closed
surface S at the boundary of a region R we choose the coordinate system such that the
unit normal defining the positive orientation of S ∗ is its outward normal.
If a surface S is parameterized by a C 1 function x = x(u, v ), we can replace the pair
(ξ, η ) by the pair of tangent vectors (xu , xv ) to define the unit vector, via their vector
product, giving the orientation of S ∗ . If this unit vector ẑ defines the positive orientation of
S ∗ , we say that S ∗ is positively oriented with respect to the parameters u, v.
Now, consider an oriented surface S ∗ with an oriented and closed boundary curve C ∗ .
Let the unit normal vector n̂ at point P on S decide the orientation of S ∗ . We drop a
perpendicular from P to the plane containing the curve C ∗ to meet this plane at point O.
Let P1 and P2 be the points on C ∗ such that traversing C ∗ from P1 toward P2 is in the same
sense defining the orientation of C∗. Then C∗ is positively oriented with respect to S∗ if the triplet $(\overrightarrow{OP_1}, \overrightarrow{OP_2}, \hat{\mathbf{n}})$ is positively oriented. Further, we say that S∗ is positively oriented with respect to the x, y axes if the triplet (ê₁, ê₂, n̂) is positively oriented.
The value of the integral $\int_{I^*} f(x)\,dx$ over an oriented interval I∗ with end points a and b is the one given by the limit of the Riemann sum (positive for positive f) when the
orientation of I ∗ corresponds to the sense of increasing x, that is, for a < b. Interchanging
the end points of I ∗ converts I ∗ into the interval −I ∗ , with opposite orientation, so that
Eq. (11.147) can also be written as
$$\int_{-I^*} f(x)\,dx = -\int_{I^*} f(x)\,dx. \tag{11.148}$$
A similar situation prevails regarding the integral over an oriented region R∗ in the x, y
plane. When R∗ is oriented positively with respect to the ê1 , ê2 basis defining the
coordinate system, Ω(R∗ ) = Ω(ê1 , ê2 ), the differential area dxdy is positive and the
double integral
$$\iint_{R^*} f(x,y)\,dx\,dy$$
is the limit of the Riemann sums obtained from the subdivisions of the plane into squares of area $2^{-2n}$. The integral has a non-negative value for a non-negative f. In case Ω(R∗) =
−Ω(ê1 , ê2 ) = Ω(ê2 , ê1 ) resulting in a negative value for the differential area dydx we get
$$\iint_{R^*} f\,dx\,dy = -\iint_{R^*} f\,dy\,dx,$$
where the integral on the right has the usual meaning as the limit of sums. Thus, we have
the rule that
$$\iint_{-R^*} f\,dx\,dy = -\iint_{R^*} f\,dx\,dy,$$
where −R∗ is obtained by changing the orientation of R∗ . The substitution formula given
by Eq. (11.99) becomes, for the oriented region R∗ ,
$$\iint_{R^*} f(x,y)\,dx\,dy = \iint_{T^*} f(x(u,v), y(u,v))\,\frac{d(x,y)}{d(u,v)}\,du\,dv,$$
for smooth 1–1 mappings
$$x = x(u,v),\qquad y = y(u,v)$$
of T ∗ onto R∗ as long as the Jacobian determinant d (x, y )/d (u, v ) has the same sign
throughout T ∗ . The sign given by the orientation of R∗ or that of T ∗ to the corresponding
integrals is determined as follows. The rule is that the orientation of R∗ attributes a
positive sign to dxdy if the x, y coordinate system has the orientation of R∗ and negative
one otherwise. The sign attributed by the orientation of T ∗ to dudv is then the one that
agrees with the relation
$$dx\,dy = \frac{d(x,y)}{d(u,v)}\,du\,dv.$$
Once the proper sign is attached to the differential area dS = dxdy or dT = dudv, the
rest of the integration amounts to the evaluation of the corresponding double integral.
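The substitution rule dx dy = (d(x, y)/d(u, v)) du dv can be sketched for the familiar polar map x = u cos v, y = u sin v (an illustrative choice), whose Jacobian determinant is u: integrating it over 0 < u < 1, 0 < v < 2π recovers the area π of the unit disk.

```python
import math

def disk_area(n=400):
    # midpoint-rule integration of the Jacobian u over the parameter
    # rectangle 0 < u < 1, 0 < v < 2*pi
    total = 0.0
    for i in range(n):            # u-direction (radius)
        u = (i + 0.5) / n
        for j in range(n):        # v-direction (angle)
            total += u * (1.0 / n) * (2 * math.pi / n)   # Jacobian * du * dv
    return total

print(abs(disk_area() - math.pi) < 1e-9)  # True
```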
While learning about line integrals, we came across linear differential forms, also
called first order differential forms, which are expressions linear in the differentials
dx, dy, dz. A second order differential form is an expression quadratic in the differentials dx, dy, dz and has the form
$$\omega = a\,dy\,dz + b\,dz\,dx + c\,dx\,dy,$$
where a, b, c are C¹ functions over their domain. Here, we obtain a general form of the
surface integral of the second order differential form over an oriented surface S ∗ in terms
of the surface integral of functions over the unoriented surface S. We already know that if
S has the parametric representation x = x(u, v), its area is given by
$$A = \iint_R \sqrt{\xi^2 + \eta^2 + \zeta^2}\,du\,dv,\qquad \xi = \frac{d(y,z)}{d(u,v)},\ \eta = \frac{d(z,x)}{d(u,v)},\ \zeta = \frac{d(x,y)}{d(u,v)}.$$
Here, the integral is over the region R in the u, v plane corresponding to S. The integral is understood in the sense of a double integral with the surface element
$$dS = \sqrt{\xi^2 + \eta^2 + \zeta^2}\,du\,dv$$
being treated as a positive quantity or, equivalently, R is given the positive orientation with respect to the u, v system. Orientability of S is not essential for the definition of A.
Exercise Express the total area of the Möbius strip as an integral, using its parametric
representation given by Eq. (11.147).
More generally, for a function f (x) defined on the surface S, we can form the integral of f
over the surface:
$$\iint_S f\,dS = \iint_R f\sqrt{\xi^2 + \eta^2 + \zeta^2}\,du\,dv. \tag{11.150}$$
The value of this integral is independent of the particular parametric representation used
for S and does not involve any orientation of S. It is positive for positive f .
In order to relate the integral of a second order differential form over an oriented
surface S ∗ to the surface integrals of functions over the unoriented surface S as defined by
Eq. (11.150), we introduce the direction cosines of the positive normal to S∗:
$$\cos\alpha = \frac{\xi}{\sqrt{\xi^2+\eta^2+\zeta^2}},\quad \cos\beta = \frac{\eta}{\sqrt{\xi^2+\eta^2+\zeta^2}},\quad \cos\gamma = \frac{\zeta}{\sqrt{\xi^2+\eta^2+\zeta^2}}.$$
We can then write
$$\omega = K\,du\,dv,$$
where
$$K = \frac{\omega}{du\,dv} = a\frac{d(y,z)}{d(u,v)} + b\frac{d(z,x)}{d(u,v)} + c\frac{d(x,y)}{d(u,v)}, \tag{11.151}$$
so that
$$\iint_{S^*}\omega = \iint_{R^*} K\,du\,dv = \iint_{R^*}\left(a\frac{d(y,z)}{d(u,v)} + b\frac{d(z,x)}{d(u,v)} + c\frac{d(x,y)}{d(u,v)}\right)du\,dv. \tag{11.152}$$
Exercise Show that the value of this integral of ω over the oriented surface S ∗ is
independent of the particular parametric representation for S ∗ .
From Eqs (11.151) and (11.152) we can write
$$K = \frac{\omega}{du\,dv} = (a\cos\alpha + b\cos\beta + c\cos\gamma)\sqrt{\xi^2 + \eta^2 + \zeta^2}.$$
By Eq. (11.152),
$$\iint_{S^*}\omega = \iint_{R^*} K\,du\,dv = \iint_S (a\cos\alpha + b\cos\beta + c\cos\gamma)\,dS, \tag{11.153}$$
which expresses the integral of the differential form ω over the oriented surface S ∗ as an
integral over the unoriented surface S or over the unoriented region R in the parameter
plane. Note, however, that here the integrand depends on the orientation of S ∗ , since it
involves the direction cosines of the normal n̂ to S ∗ pointing to its positive side. If the
oriented surface S ∗ comprises many parts Si∗ each having a parametric representation x =
x(u, v ) we apply identity Eq. (11.153) to each part and add over different parts to get the
same identity for the integral of ω over the whole surface S ∗ .
The direction cosines of the normal n̂ pointing to the positive side of S∗ can be identified with the derivatives of x, y, z in the direction of n̂, so that⁴
$$\iint_{S^*}\omega = \iint_S\left(a\frac{dx}{dn} + b\frac{dy}{dn} + c\frac{dz}{dn}\right)dS \tag{11.154}$$
or, in vector notation
or, in vector notation
$$\iint_{S^*}\omega = \iint_S \mathbf{v}\cdot\hat{\mathbf{n}}\,dS, \tag{11.155}$$
where n̂ ≡ (cos α, cos β, cos γ ) is the unit normal vector on the positive side of S ∗ and v(x)
is the vector field with components (a(x), b (x), c (x)).
The concept of a surface integral can be interpreted in terms of the 3-D flow of an
incompressible fluid of unit density. Let the vector field v(x) be the velocity field of this
flow. Then at each point of the surface S ∗ the product v · n̂ gives the component of the
velocity of the flow in the direction of the normal n̂ to the surface. The expression v · n̂dS
can then be identified with the amount of fluid that flows across the element of surface dS
from the negative side of S ∗ to the positive side in unit time. Note that this quantity may be
negative. The surface integral in Eq. (11.155) therefore represents the total amount of fluid
flowing across the surface S ∗ from the negative to the positive side in unit time. Note the
fundamental part played by the orientation (distinction between the positive and negative
sides) of S ∗ in the description of the motion of the fluid.
We may also consider the field defined by the integrand of Eq. (11.155) as the field of
force F(x). The direction of the vector F then gives the direction of the lines of force and its
⁴ We have dx/dn = n̂ · ∇x = [cos α  cos β  cos γ][1  0  0]ᵀ = cos α, etc.
magnitude gives the magnitude of the force. The integral in Eq. (11.155) is then interpreted
as the total flux of force across the surface from the negative to the positive side.
Consider the limit
$$\lim_{S\to P}\frac{1}{V}\int_S \mathbf{f}\cdot d\mathbf{s}, \tag{11.156}$$
where S is a closed surface enclosing volume V. The point P is interior to or on the surface
S. The limit S → P means every point on S approaches P . If this limit exists, the integral
in Eq. (11.156) is independent of S and defines the divergence of f at P . We show that the
limit exists if f can be expanded in Taylor series in the neighborhood of P .
We construct a Cartesian coordinate system (ξ, η, ζ ) with its origin at P . As in
subsection 11.1.1 we expand f(x) with x on the surface S in Taylor series around the
origin 0 at P. We have,
$$\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{0}) + (\mathbf{x}\cdot\nabla)\mathbf{f} + \mathbf{R},$$
where R is of the order of |x|² and all the derivatives are evaluated at the origin, that is, at
point P . Therefore, integrating over the surface S we get
$$\int_S \mathbf{f}(\mathbf{x})\cdot d\mathbf{s} = \int_S \mathbf{f}(\mathbf{0})\cdot d\mathbf{s} + \int_S (\mathbf{x}\cdot\nabla)\mathbf{f}\cdot d\mathbf{s} + \int_S \mathbf{R}\cdot d\mathbf{s}.$$
We first resolve the vector ds along the basis (î, ĵ, k̂) (see Fig. 11.36),
$$d\mathbf{s} = \hat{\mathbf{i}}\,ds_\xi + \hat{\mathbf{j}}\,ds_\eta + \hat{\mathbf{k}}\,ds_\zeta,$$
where the components of ds are the projections of ds on the yz, zx and xy planes respectively.
As in subsection 11.1.1 we express (x · ∇)f(x) · ds in terms of the derivatives with respect
to (ξ, η, ζ ) to get
where (Sξ , Sη , Sζ ) are the projections of S on the coordinate planes and the last integral
goes as |x|4 . We shall show later in an exercise that
$$\int_S d\mathbf{s} = 0.$$
Further,
$$\int_{S_\xi} \xi\,ds_\xi = V,$$
since $\int_{S_\xi}\xi\,ds_\xi$ gives the volume under the upper part minus that under the lower part (see subsection 11.4.2). Similarly,
$$\int_{S_\eta} \eta\,ds_\eta = V = \int_{S_\zeta} \zeta\,ds_\zeta.$$
Moreover, the integrals of the form $\int_{S_\xi}\eta\,ds_\xi$ vanish. Putting everything together, we get
Divide both sides by V (so that the last term is O(|x|) and goes to zero as |x| → 0) and
take the limit as |x| → 0 to get
Thus, the limit exists and does not depend on S. It depends only on the derivatives of f at
point P .
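The limit definition of the divergence can be sketched numerically (a hypothetical example, not from the text): for the field f = (x², y², z²), the outward flux through a small cube centered at P, divided by the cube's volume, should approach div f = 2x + 2y + 2z at P as the cube shrinks.

```python
def flux_over_volume(p, h, m=50):
    # outward flux of f = (x^2, y^2, z^2) through the six faces of the cube
    # of side 2h centered at p, by midpoint quadrature, divided by (2h)^3
    f = lambda x, y, z: (x * x, y * y, z * z)
    flux = 0.0
    dA = (2 * h / m) ** 2
    for axis in range(3):
        for sign in (-1.0, 1.0):
            for i in range(m):
                for j in range(m):
                    a = -h + (i + 0.5) * 2 * h / m
                    b = -h + (j + 0.5) * 2 * h / m
                    q = list(p)
                    q[axis] += sign * h                 # move to the face
                    q[(axis + 1) % 3] += a              # tangential offsets
                    q[(axis + 2) % 3] += b
                    flux += sign * f(*q)[axis] * dA     # f·n̂ dA on this face
    return flux / (2 * h) ** 3

p = (0.4, -0.2, 0.9)
exact = 2 * sum(p)                     # div f at p
approx = flux_over_volume(p, 1e-3)     # flux estimate for a small cube
print(abs(approx - exact) < 1e-5)  # True
```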
Exercise Let S be a closed surface and let P be an interior point of S or a point on S. For
a scalar field f and a vector field F show that
$$\nabla f = \lim_{S\to P}\frac{1}{V}\int_S f\,d\mathbf{s}$$
and
$$\nabla\times\mathbf{F} = \lim_{S\to P}\frac{1}{V}\int_S d\mathbf{s}\times\mathbf{F}.$$
over the region R, oriented positively with respect to the x, y, z coordinate system. Due to the
assumption made regarding the mesh of straight lines parallel to the axes and the region R,
such a region R can be described by the inequalities
z0 (x, y ) ≤ z ≤ z1 (x, y )
where (x, y ) varies over the projection B of R on the x, y plane. We assume that B has an
area and that the functions z0 (x, y ) and z1 (x, y ) are C 1 in B. We can express the volume
integral over R as the succession of integrals
∫∫∫_R f dxdydz = ∫∫_B dxdy ∫_{z0}^{z1} f dz.
Here, f = ∂c/∂z so that the integral over z can be carried out, giving
∫_{z0}^{z1} (∂c/∂z) dz = c(x, y, z1) − c(x, y, z0) = c1 − c0,
so that,
∫∫∫_R (∂c/∂z) dxdydz = ∫∫_B c1 dxdy − ∫∫_B c0 dxdy.
If we assume that the boundary surface S is positively oriented with respect to the region
R, then the part of the oriented boundary surface S ∗ comprising points of entry
z = z0 (x, y ) has a negative orientation with respect to x, y coordinates when projected on
the x, y plane. On the other hand the part z = z1 (x, y ) consisting of points of exit has a
positive orientation. To understand this, note that the triplets (ê1 , ê2 , n̂), one with n̂ at the
entry point has negative orientation and the one with n̂ at the exit point has positive
orientation (see the summary in section 11.7). Hence, the last two integrals combine to form the integral

∫∫_{S∗} c(x, y, z) dxdy

over the oriented boundary surface S∗. Together with the analogous identities for the other two components, this yields Eq. (11.158), which is known as Gauss's theorem, or the divergence theorem. Using Eq. (11.153) we can write this in the form
∫∫∫_R [a_x + b_y + c_z] dxdydz = ∫∫_S (a cos α + b cos β + c cos γ) dS = ∫∫_S (a dx/dn + b dy/dn + c dz/dn) dS,   (11.159)
where, α, β, γ are the angles made by the outward normal n̂ with the positive coordinate
axes, corresponding to the positive orientation of S ∗ with respect to R.
We can lift the restriction stated at the beginning, namely that the region R can be covered by a mesh of straight lines with each line intersecting the boundary surface at exactly two points, provided the region R can be divided into subregions which separately satisfy this restriction and each of which is bounded by an orientable surface. Then Gauss's theorem holds separately for each subregion. Upon adding, on the left we get a triple integral over the whole region R, while on the right some of the surface integrals combine to form the integral over the oriented surface S and the others, which arise from the extra surfaces required to cover each subregion, cancel one another. Assuming that we get the same integral independent of the way we divide the region R into subregions, this procedure generalizes Gauss's theorem to more general regions in space.
Exercise Use Gauss’s theorem to get the volume of a region R bounded by the surface S ∗
oriented positively with respect to R.
Answer
V = ∫∫∫_R dxdydz = ∫∫_{S∗} x dydz = ∫∫_{S∗} z dxdy = ∫∫_{S∗} y dzdx.
Hint To get the equality V = ∫∫_{S∗} z dxdy, for example, put a = 0, b = 0, c = z in Eq. (11.158).
To get the vector form of the divergence theorem, let v be the vector field with component
functions a(x), b (x), c (x). Then, the integrand on the left of Eq. (11.159) is simply the
divergence of this field and the integrand on the right is its component along the outward
normal, so that
∫∫∫_R ∇ · v dV = ∫∫_{S∗} v · n̂ dS,   (11.160)
Exercise Show that

∫_V A · (∇f ) dτ = ∫_S f A · da − ∫_V f (∇ · A) dτ,   (11.161)

where f and A are scalar and vector valued functions respectively, da = da n̂ is the vector differential area and the surface S encloses volume V .
Hint Use ∇ · (f A) = f (∇ · A) + A · (∇f ) and the divergence theorem.
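The vector form of the divergence theorem lends itself to a direct symbolic check. A sketch (not from the book; the field v = (x², y², z²) and the unit cube are our own choices) comparing both sides of ∫∫∫_R ∇ · v dV = ∫∫_{S∗} v · n̂ dS:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
v = (x**2, y**2, z**2)

# volume integral of the divergence over the unit cube
div_v = sp.diff(v[0], x) + sp.diff(v[1], y) + sp.diff(v[2], z)
lhs = sp.integrate(div_v, (x, 0, 1), (y, 0, 1), (z, 0, 1))

# outward flux through the six faces; on the face var = 1 the outward
# normal is +ê_i, on var = 0 it is −ê_i
flux = 0
for i, (s, t) in zip(range(3), [(y, z), (x, z), (x, y)]):
    var = (x, y, z)[i]
    comp = v[i]
    flux += sp.integrate(comp.subs(var, 1), (s, 0, 1), (t, 0, 1))
    flux -= sp.integrate(comp.subs(var, 0), (s, 0, 1), (t, 0, 1))

print(lhs, flux)   # → 3 3
```

Both sides come out equal, as the theorem requires.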
bounded by the surface S then the total mass of fluid that flows across a small area ∆S of
S from interior to exterior of R in unit time is approximately ρv · n̂∆S where v · n̂ is the
component of the velocity v in the direction of the outward normal n̂ at a point on the
surface element defined by ∆S. Thus, the total amount of fluid flowing across the
boundary S of R from inside to outside in unit time is given by the integral
∫∫_S ρv · n̂ dS
over the whole boundary S. By Gauss’s theorem, the amount of fluid leaving R in unit time
through the boundary is
∫∫∫_R ∇ · (ρv) dxdydz.
By the law of conservation of mass, in the absence of sources or sinks of mass in R, the
amount of mass of fluid leaving R through surface S must be exactly equal to the loss of
mass of fluid contained in R. We must then have,
∫∫∫_R ∇ · (ρv) dxdydz = − ∫∫∫_R (∂ρ/∂t) dxdydz
at any time t for any region R. Dividing both sides of this identity by the volume of R and
taking the limit as the size of R goes to zero, (as we did in the 2-D case), we get the three
dimensional continuity equation:
∇ · (ρv) + ∂ρ/∂t = 0,
or,
∂ρ/∂t + ∂(ρu)/∂x + ∂(ρv)/∂y + ∂(ρw)/∂z = 0
where u (x), v (x), w (x) are the components of v(x). The continuity equation expresses the
law of conservation of mass for the motion of fluids.
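The continuity equation is easy to verify symbolically for a concrete flow. A sketch (not from the book; the expanding flow u = x/(1+t), v = w = 0 with ρ = ρ0/(1+t) is our own choice of a mass-conserving example):

```python
import sympy as sp

x, y, z, t, rho0 = sp.symbols('x y z t rho0', positive=True)

rho = rho0 / (1 + t)          # density thins out in time ...
u, v, w = x / (1 + t), 0, 0   # ... as the fluid spreads along x

# residual of  ∂ρ/∂t + ∂(ρu)/∂x + ∂(ρv)/∂y + ∂(ρw)/∂z
residual = (sp.diff(rho, t) + sp.diff(rho * u, x)
            + sp.diff(rho * v, y) + sp.diff(rho * w, z))
print(sp.simplify(residual))   # → 0
```

The residual vanishes identically, so this flow conserves mass with no sources or sinks.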
In the presence of sources or sinks, the expression

∇ · (ρv) + ∂ρ/∂t

measures the amount of mass created (or annihilated if negative) in unit time per unit volume.
Of particular interest is the case of a homogeneous and incompressible fluid, for which the density is constant both in space and time. For such a constant ρ, we deduce from the continuity equation that ∇ · v = 0. Applying the divergence theorem to a region R bounded by two surfaces S1 and S2 with the same boundary curve C∗ then gives

∫_{S1} ρv · n̂ dS + ∫_{S2} ρv · n̂ dS = 0,
where, on both S1 and S2 , n̂ denotes the normal pointing away from R. We can make both
S1 and S2 into oriented surfaces S1∗ and S2∗ in such a way that the orientation of C ∗ is positive
with respect to both S1∗ and S2∗ . On both these surfaces, let n̂∗ be the unit normal pointing
to the positive side. For a right handed orientation of space, this implies that n̂∗ points to
that side of the surface from which the orientation of C ∗ appears counterclockwise. Then,
necessarily, n̂∗ = n̂ on one of the surfaces S1 , S2 and n̂∗ = −n̂ on the other. It follows from
the last equation that
∫_{S1} ρv · n̂∗ dS = ∫_{S2} ρv · n̂∗ dS.
In words, if the fluid is incompressible and homogeneous and mass is conserved, then the same amount of fluid flows across any two surfaces with the same boundary curve C∗ that together bound a three dimensional region in space. Since this amount of fluid does not depend on the precise form of the surfaces, it is plausible that it is determined by the boundary curve C∗ alone. We will confirm this in the next subsection by means of Stokes' theorem.
We can express this by saying that the forces in a fluid due to a pressure p (x) may, on the
one hand, be regarded as surface forces (pressure) that act with density p (x) perpendicular
to each surface element through the point (x) and on the other hand, as space forces, that
is, the forces that act on every element of volume with volume density −∇p.
Consider a fluid in equilibrium under the joint action of forces due to pressure and
gravity. Then, the force F due to pressure must balance the total attractive force G on the
fluid contained in R:
F + G = 0.
If the gravitational force acting on a unit mass at the point x is given by the vector g(x), we
have,
G = ∫∫∫_R g(x)ρ(x) dxdydz.
From equation F + G = 0, valid for any portion R of the fluid, we conclude, as we did
previously while deriving continuity equations, that the corresponding relation holds for
the integrands, that is, that at each point of the fluid the equation
−∇p + ρg = 0 (11.163)
applies. Since the gradient of a scalar φ is perpendicular to the level surfaces of the scalar
(given by φ = constant), we conclude that for a fluid in equilibrium under pressure and
gravity, the gravitational force at each point of a surface of constant pressure p (isobaric
surface) is perpendicular to the surface. If we assume, as is customary, that the gravitational force per unit mass near the surface of the earth is given by g = (0, 0, −g), where g is the (constant) gravitational acceleration, we find from Eq. (11.163) that
px = 0, py = 0, pz = −gρ. (11.164)
Along any surface of constant pressure we have 0 = dp = px dx + py dy + pz dz = −gρ dz, so that the isobaric surfaces are the horizontal planes z = constant. Integrating pz = −gρ downward from the free surface z = z0 , where p = 0, we find that at the depth z0 − z = h the pressure has the value gρh. For a solid partly or
wholly immersed in the liquid, let R denote the portion of the solid lying below the free
surface z = z0 . We find from Eqs (11.162) and (11.164) that the resultant of the pressure
forces acting on the solid equals the buoyancy force with components
Fx = 0, Fy = 0, Fz = ∫∫∫_R gρ dxdydz.
This force is directed vertically upward and its magnitude equals the weight of the displaced
liquid (Archimedes’ principle).
(1) Show that ∫_S ds = 0 over a closed surface.
Solution Let a be an arbitrary constant vector. Then, by the divergence theorem,
∫_S a · ds = a · ∫_S ds = ∫_V ∇ · a dτ.
Since a is constant, ∇ · a = 0, so that a · ∫_S ds = 0. Since a is arbitrary, it follows that ∫_S ds = 0.
(2) Show that the volume V enclosed by a closed surface S is given by (1/3) ∫_S x · ds.
Solution By the divergence theorem, since ∇ · x = 3,

(1/3) ∫_S x · ds = (1/3) ∫_V ∇ · x dτ = ∫_V dτ = V.
(3) Show that

∫_S f n̂ ds = ∫_V ∇f dτ.   (11.165)

Solution Let a be an arbitrary constant vector. Then,

∇ · (f a) = f ∇ · a + a · ∇f.
The first term on RHS is zero as a is a constant vector. Therefore, integrating we get
∫_V ∇ · (f a) dτ = ∫_V a · ∇f dτ.

By the divergence theorem, the left hand side equals ∫_S f a · n̂ ds = a · ∫_S f n̂ ds.
Thus,
a · [∫_S f n̂ ds − ∫_V ∇f dτ] = 0.
Since a is arbitrary, the second factor in the dot product must vanish, proving
Eq. (11.165).
(4) Show that

∫_S n̂ × F ds = ∫_V ∇ × F dτ.

Solution Let a be an arbitrary constant vector and apply the divergence theorem to the vector F × a. We get,
∫_S n̂ · (F × a) ds = ∫_V ∇ · (F × a) dτ,
or,
∫_S a · (n̂ × F) ds = ∫_V (a · ∇ × F − F · ∇ × a) dτ = a · ∫_V ∇ × F dτ.
Taking a· out of these integrals and collecting all the terms on one side we get the
result.
We apply Eq. (11.170) to a wedge shaped region R̃ described by inequalities of the form r1 ≤ r ≤ r2 , θ1 ≤ θ ≤ θ2 , φ1 ≤ φ ≤ φ2 .
The boundary S of R̃ consists of six faces along each of which one of the coordinates r, θ, φ
has constant value. Applying the formula for transformation of triple integrals we write the
left side of Eq. (11.170) as
∫∫∫_R ∆U dxdydz = ∫∫∫_{R̃} ∆U [d(x, y, z)/d(r, θ, φ)] drdθdφ = ∫∫∫_{R̃} ∆U r² sin θ drdθdφ.   (11.171)
R̃
In order to transform the surface integral in Eq. (11.170) we introduce the position vector x = x(r, θ, φ) and its partial derivatives xr , xθ , xφ , which satisfy

xr · xθ = 0, xθ · xφ = 0, xφ · xr = 0,
xr · xr = 1, xθ · xθ = r², xφ · xφ = r² sin²θ.   (11.172)
Thus, at each point the vector xr is normal to the coordinate surface r = constant passing
through that point, the vector xθ normal to the surface θ = constant and the vector xφ
normal to the surface φ = constant. (In other words, the unit vectors in the direction
of these vectors form the r̂, θ̂, φ̂ basis at that point). More precisely, on one of the faces
r = constant = rk , k = 1, 2 of the region R̃ defined above, the outward normal unit
vector n̂ is given by (−1)k xr . Hence, on these faces
∇U · n̂ = (−1)^k ∇U · xr = (−1)^k ∂U/∂r.
Using θ, φ as parameters on the face r = rk , we get, for the element of area (see
section 10.12)
dS = √(EG − F²) dθdφ = √((xθ · xθ)(xφ · xφ) − (xθ · xφ)²) dθdφ = r² sin θ dθdφ.
Thus, the contribution of the two faces r = r1 and r = r2 to the integral of dU /dn over S
is represented by the expression
∫∫_{r=r2} r² sin θ (∂U/∂r) dθdφ − ∫∫_{r=r1} r² sin θ (∂U/∂r) dθdφ.

Similarly, on a face θ = constant = θk we have

n̂ = (−1)^k (1/r) xθ , dS = r sin θ dφdr, dU/dn = ((−1)^k /r) ∂U/∂θ,
and on a face φ = constant = φk ,

n̂ = (−1)^k (1/(r sin θ)) xφ , dS = r drdθ, dU/dn = ((−1)^k /(r sin θ)) ∂U/∂φ.
Comparing with Eq. (11.171), dividing by the volume of the wedge R̃ and taking the limit as this volume tends to zero, we can equate the corresponding integrands to get the desired expression for the Laplace operator in spherical coordinates:
∆U = (1/(r² sin θ)) [∂/∂r (r² sin θ ∂U/∂r) + ∂/∂θ (sin θ ∂U/∂θ) + ∂/∂φ ((1/sin θ) ∂U/∂φ)].   (11.173)
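The spherical form of the Laplacian can be checked symbolically against the Cartesian one. A sketch (not from the book; the test function U = x²y is our own choice), applying Eq. (11.173) to U written in spherical coordinates and comparing with ∆(x²y) = 2y:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x = r * sp.sin(th) * sp.cos(ph)
y = r * sp.sin(th) * sp.sin(ph)
U = x**2 * y                       # test function in spherical form

# Eq. (11.173) applied term by term
lap = (1 / (r**2 * sp.sin(th))) * (
    sp.diff(r**2 * sp.sin(th) * sp.diff(U, r), r)
    + sp.diff(sp.sin(th) * sp.diff(U, th), th)
    + sp.diff(sp.diff(U, ph) / sp.sin(th), ph)
)

# Cartesian result is ∆(x^2 y) = 2y; print the residual at a sample point
residual = lap - 2 * y
print(residual.evalf(subs={r: 1.3, th: 0.7, ph: 0.4}))   # → ≈ 0
```

The residual vanishes (up to numerical precision at the sample point), confirming Eq. (11.173) for this test function.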
∇ · f(r) = d(r), ∇ × f(r) = c(r).   (11.174)
Since the divergence of curl is always zero, the second of the above equations gives
∇ · c = 0. (11.175)
The question we are interested in is this: knowing the functions d (r) and c(r), can we use
Eqs (11.174) and (11.175) to uniquely specify the field f(r)? The answer is yes, provided
d (r) and c(r) tend to zero faster than 1/r 2 as r → ∞. It turns out that
f = −∇u + ∇ × w, (11.176)
where

u(r) = (1/4π) ∫ [d(r′)/γ] dτ′,   (11.177)

and

w(r) = (1/4π) ∫ [c(r′)/γ] dτ′,   (11.178)
where the integrals are over all space, dτ 0 is the differential volume element and
γ = |r − r0 |. If f is given by Eq. (11.176), then its divergence is given by, (since divergence
of curl is zero), (see Appendix),
∇ · f = −∇²u = −(1/4π) ∫ d(r′) ∇²(1/γ) dτ′ = ∫ d(r′) δ³(r − r′) dτ′ = d(r).
Regarding the curl of the field, we have, since the curl of a gradient is zero,

∇ × f = ∇ × (∇ × w) = −∇²w + ∇(∇ · w).

The term −∇²w produces c(r), by the same delta function argument used for the divergence above. Thus, we need to show that ∇(∇ · w) vanishes. Using integration by parts, Eq. (11.161),
and noting that the derivatives of γ with respect to primed coordinates differ by a sign from
those with respect to unprimed coordinates, we get
4π ∇ · w = ∫ c · ∇(1/γ) dτ′ = − ∫ c · ∇′(1/γ) dτ′ = ∫ (1/γ) ∇′ · c dτ′ − ∮ (1/γ) c · da.   (11.180)
However, the divergence of c is zero by Eq. (11.175), and the surface integral vanishes as γ → ∞ as long as c(r) goes to zero sufficiently rapidly. The rate at which d(r) and c(r) fall off as r → ∞ is important for the convergence of the integrals in Eqs (11.177) and (11.178). In the large r′ limit, where γ ≈ r′, the integrals are of the form
∫^∞ [X(r′)/r′] r′² dr′ = ∫^∞ r′ X(r′) dr′,
where X stands for d or c as the case may be. If X ∼ 1/r′ the integrand is constant, so the integral blows up; if X ∼ 1/r′² the integral is a logarithm, which again blows up. Evidently, the divergence and the curl of f must vanish more rapidly than 1/r′² as r′ → ∞ for the above proof to hold.
Assuming that the required conditions on d (r) and c(r) are satisfied, is the solution
(11.176) unique? Not in general, because we can add to f any vector function with
vanishing divergence and curl to get the same solution. However, it turns out that there is
no function with vanishing divergence and curl everywhere and goes to zero at infinity.
So, if we include the requirement that f(r) → 0 as r → ∞ then solution (11.176) is
unique. For example, generally we do expect the electromagnetic fields to go to zero far
away from the charge and current distributions which produce them.
We can thus state the all-important Helmholtz theorem rigorously as follows.
If the divergence d (r) and the curl c(r) of a vector field f(r) are specified and if they
both go to zero faster than 1/r 2 as r → ∞, and if f(r) goes to zero as r → ∞, then f is
given uniquely by Eq. (11.176).
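The decomposition f = −∇u + ∇ × w rests on the two identities used in the proof: the curl of a gradient and the divergence of a curl both vanish. A symbolic check for arbitrary smooth u and w:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
u = sp.Function('u')(x, y, z)
w = [sp.Function(f'w{i}')(x, y, z) for i in range(3)]

grad_u = [sp.diff(u, v) for v in (x, y, z)]
# curl of the gradient, component by component
curl_grad_u = [sp.diff(grad_u[2], y) - sp.diff(grad_u[1], z),
               sp.diff(grad_u[0], z) - sp.diff(grad_u[2], x),
               sp.diff(grad_u[1], x) - sp.diff(grad_u[0], y)]

curl_w = [sp.diff(w[2], y) - sp.diff(w[1], z),
          sp.diff(w[0], z) - sp.diff(w[2], x),
          sp.diff(w[1], x) - sp.diff(w[0], y)]
# divergence of the curl
div_curl_w = sum(sp.diff(curl_w[i], v) for i, v in enumerate((x, y, z)))

print([sp.simplify(c) for c in curl_grad_u], sp.simplify(div_curl_w))
# → [0, 0, 0] 0
```

Both vanish identically because mixed partial derivatives commute for smooth functions.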
From Helmholtz's theorem it follows that a vector field with vanishing curl is derivable from a scalar potential, while a field with vanishing divergence can be expressed as the curl of some other vector field. For example, in electrostatics, ∇ · E = ρ/ε0 , where ρ is the given charge distribution, and ∇ × E = 0, so the field is derivable from a scalar potential, E = −∇V. Similarly, since ∇ · B = 0, the magnetic field can be written as the curl of a vector potential,

B(r) = ∇ × A,
Stokes' theorem can be made plausible by using the fact that it is true for plane surfaces. If S is a polyhedral surface composed of plane polygonal faces, so that the boundary curve C is a polygon, we can apply Stokes' theorem to each of the plane faces and add the corresponding contributions. Then, the line integrals along all the interior edges of the polyhedron cancel and we obtain Stokes' theorem for the polyhedral surface. In order to prove the general statement of Stokes' theorem, we only have to pass to the limit, leading from approximating polyhedra to arbitrary surfaces S bounded by arbitrary curves C. The rigorous validation of this passage to the limit can be cumbersome, so the proof is generally carried out by transforming the whole surface S into a plane surface and proving that the theorem is preserved under such transformations. We omit the details of this proof and assume the theorem.
We can now settle the question raised in the discussion of the incompressible and homogeneous fluid in section 11.10. Since the fluid is incompressible, the divergence of its velocity field is everywhere zero, so by Helmholtz's theorem the field must be the curl of some vector field, v = ∇ × A. Applying Stokes' theorem we can write

∫∫_S v · n̂ dS = ∫∫_S (∇ × A) · n̂ dS = ∮_C A · t̂ ds.
Thus, the total amount of fluid passing through any two surfaces with the same boundary
curve C is determined by the curve C alone.
Exercise Show that the arguments leading to Eqs (10.105) and (10.107) can be extended to prove the divergence theorem and Stokes' theorem respectively (see Griffiths [9]). Compare with our proofs of these theorems.
The following two exercises give two fundamental results based on the Helmholtz theorem and Stokes' theorem.
Exercise Curl-less or irrotational fields. Let F be a vector field. Show that the following
conditions are equivalent.
(a) ∇ × F = 0 everywhere.
(b) F is the gradient of some scalar, F = −∇V (x).
(c) ∫_a^b F · dx is independent of path in a simply connected region and depends only on the end points.
(d) ∮ F · dx = 0 for any closed loop.
Solution (a) ⇒ (b): By Helmholtz's theorem. (b) ⇒ (c) is proved in section 11.1. (c) ⇒ (d): Take any two distinct points P1 and P2 on the closed loop. Then,
∮ F · dx = ∫_{P1}^{P2} F · dx + ∫_{P2}^{P1} F · dx = ∫_{P1}^{P2} F · dx − ∫_{P1}^{P2} F · dx = 0.
Exercise Divergence-less or solenoidal fields. Let F be a vector field. Show that the following conditions are equivalent.
(a) ∇ · F = 0 everywhere.
(b) F = ∇ × A for some vector field A.
(c) ∫∫_S F · n̂ dS is independent of the surface S for any given boundary curve, being equal to the integral of A along the boundary curve in the positive sense with respect to the surface.
(d) ∫∫_S F · n̂ dS = 0 for any closed surface.
Solution (a) ⇒ (b): By Helmholtz's theorem; since d(r) = ∇ · F = 0, Eq. (11.177) gives u(r) = 0 in Eq. (11.176).
(b) ⇒ (c): By Stokes' theorem the integral over the surface reduces to that over the boundary curve. (c) ⇒ (d): View the closed surface as two surfaces with a common boundary curve. The integral in (c) reduces to integrals over the boundary curve traversed in opposite senses for the two surfaces, because the positive orientations of the boundary curve with respect to the two surfaces are opposite, so these integrals cancel each other.
(d) ⇒ (a): By divergence theorem, condition (d) means the volume integral of ∇ · F over
the region R enclosed by the surface vanishes. Since this is true for any closed surface
enclosing any region R, we can divide by the volume of R and take the limit as this
volume tends to zero to yield condition (a).
We call the line integral of v taken over an oriented closed curve C∗ the circulation of the flow along this curve. Stokes' theorem states that the circulation along C∗ equals the integral
∫∫_S (∇ × v) · n̂ dS,
where S is any orientable surface bounded by C and n̂ is the unit normal to S making it the
oriented surface S ∗ such that the curve C ∗ is oriented positively with respect to S ∗ . Suppose
we divide the circulation around C by the area of the surface S bounded by C and pass to
the limit by making C shrink to a point while remaining the boundary of the surface. For
the surface integral of the normal component of curl v divided by the area, this limit gives
the value (∇ × v) · n̂ at the limit point. Thus, we can regard the component of curl v in the
direction of the surface normal n̂ as the circulation density of the flow across the surface at
the corresponding point.
The vector curl v is called the vorticity of the fluid motion. Therefore, the circulation
around a curve C equals the integral of the normal component of the vorticity over a surface
bounded by C. The motion is called irrotational if the vorticity vector vanishes at every
point occupied by the fluid, that is, if the velocity field satisfies the relation ∇ × v = 0 throughout the fluid.
As a result of Stokes theorem, the circulation in an irrotational motion vanishes along any
curve C that bounds a surface contained in the region filled with the fluid.
By the above exercise we know that an irrotational vector field is also conservative.
That is,
∇ × v = 0 implies v = ∇φ.
Thus, the velocity field of an irrotational fluid flow in a simply connected region implies the existence of a velocity potential φ(x) satisfying

v(x) = ∇φ(x).
If, in addition, the fluid is incompressible, we also have ∇ · v = 0.
In the following exercises, Γ is a simple closed curve and S is either a surface with Γ as its boundary or a closed surface enclosing the interior with volume V . n̂ is the outward normal to S.
(1) Verify Stokes' theorem for the field

F = z î + x ĵ + y k̂,

where Γ is the unit circle in the xy plane bounding the hemisphere z = √(1 − x² − y²).
Solution
∫_Γ F(x) · dx = ∫_Γ z dx + ∫_Γ x dy + ∫_Γ y dz.
On Γ, z = 0 = dz, so that
∫_Γ F(x) · dx = ∫_Γ x dy = π.
Now
∇ × F = î + ĵ + k̂
so that
∫_S (∇ × F) · ds = ∫_S î · ds + ∫_S ĵ · ds + ∫_S k̂ · ds = π
because
∫_S î · ds = 0 = ∫_S ĵ · ds
and
∫_S k̂ · ds = π,
as the integrals represent the projected areas of the hemisphere on the coordinate planes. Alternatively, we can express F and ∇ × F in terms of spherical polar coordinates and integrate over sin θ dθdφ, 0 ≤ θ ≤ π/2, 0 ≤ φ < 2π. This establishes Stokes' theorem for the given field.
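The circulation just computed can also be checked numerically. A sketch (not from the book), evaluating ∮_Γ F · dx for F = (z, x, y) around the unit circle by a rectangle rule, which is essentially exact for smooth periodic integrands:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 4001)
dt = t[1] - t[0]

# unit circle in the xy plane and its tangent vector
x, y, z = np.cos(t), np.sin(t), np.zeros_like(t)
xp, yp, zp = -np.sin(t), np.cos(t), np.zeros_like(t)

integrand = z * xp + x * yp + y * zp      # F · dx/dt
circulation = np.sum(integrand[:-1]) * dt  # rectangle rule over one period
print(circulation, np.pi)                  # → ≈ π  π
```

The numerical circulation agrees with the value π obtained above.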
(2) Evaluate ∫_Γ F · dx for F = (x² − y²) î + xy ĵ, where Γ is the arc of y = x³ from (0, 0) to (2, 8).
Solution First check that the given field is not conservative, so that the integral depends on the given curve. However, we can evaluate the given integral using the closed curve

C = C1 + C2 − Γ,

where C1 is the segment from (0, 0) to (2, 0) along the x axis and C2 is the segment from (2, 0) to (2, 8), by Stokes' theorem. This gives

∫_Γ F · dx = − ∫_S n̂ · ∇ × F ds + ∫_{(0,0)}^{(2,0)} F · dx + ∫_{(2,0)}^{(2,8)} F · dx
= − ∫_0^2 ∫_0^{x³} 3y dydx + ∫_0^2 x² dx + ∫_0^8 2y dy = 824/21,
where we have evaluated the first integral as repeated single integrals.
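The same line integral can be computed directly along the curve, without Stokes' theorem, as an exact symbolic check (not from the book): parameterize Γ by x, with y = x³ and dy = 3x² dx.

```python
import sympy as sp

x = sp.symbols('x')
y = x**3                       # the curve Γ
Fx = x**2 - y**2               # components of F on the curve
Fy = x * y

# ∫_Γ F·dx = ∫_0^2 [Fx + Fy dy/dx] dx
val = sp.integrate(Fx + Fy * sp.diff(y, x), (x, 0, 2))
print(val)                     # → 824/21
```

This reproduces the value 824/21 obtained above via Stokes' theorem, a useful consistency check on the signs and orientations.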
(3) Let F(x) = 0 at every point on a surface S. Show that ∇ × F is tangent to S at every
point on it.
(4) Show that

∫_Γ f (x) dx = ∫_S (n̂ × ∇f ) ds.   (11.184)

Solution Let a be an arbitrary constant vector and apply Stokes' theorem to f a; the terms involving derivatives of a drop out, as ∇ × a = 0. This gives
a · [∫_Γ f (x) dx − ∫_S (n̂ × ∇f ) ds] = 0.
Since a is arbitrary, the second factor in the dot product must vanish, proving
Eq. (11.184).
(5) Show that

∫_Γ dx × F = ∫_S (n̂ × ∇) × F ds.   (11.185)
By Stokes' theorem,

∫_Γ a × F · dx = ∫_S n̂ · ∇ × (a × F) ds
= ∫_S n̂ · [a(∇ · F) − (a · ∇)F] ds
= a · ∫_S [(∇ · F)n̂ − ∇(F · n̂)] ds
= −a · ∫_S (n̂ × ∇) × F ds.
All these steps can be proved using Levi-Civita symbols, noting that ∇ does not operate on n̂. The last two equations give
a · [∫_Γ dx × F − ∫_S (n̂ × ∇) × F ds] = 0.
Since a is arbitrary, the second factor of the dot product must vanish, proving
Eq. (11.185).
(6) Show that the magnitude of ∫_Γ dx × x, where Γ is a closed curve in the xy plane, is twice the enclosed area A.
Solution In Eq. (11.185) we replace n̂ by k̂ which is the vector normal to xy plane
and F by x. We get,
∫_Γ dx × x = ∫_S (k̂ × ∇) × x ds
= ∫_S [∇(k̂ · x) − k̂(∇ · x)] ds
= ∫_S (k̂ − 3k̂) ds = −2k̂ ∫_S ds = −2k̂ A,
where the second equality can be proved using Levi-Civita symbols and noting that
k̂ is a constant vector. Thus,
|∫_Γ dx × x| = | − 2k̂ A| = 2A.
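This area formula is easy to test numerically. A sketch (not from the book), taking Γ to be the unit circle, for which A = π, and evaluating ∮_Γ dx × x by a rectangle rule:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 2001)
dt = t[1] - t[0]

# points on the unit circle and the tangent dx/dt
xs = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
dxdt = np.stack([-np.sin(t), np.cos(t), np.zeros_like(t)], axis=1)

integral = np.sum(np.cross(dxdt, xs)[:-1], axis=0) * dt
print(integral)      # → ≈ (0, 0, −2π), i.e. −2k̂A with A = π
```

The result is −2π k̂, whose magnitude is 2A, matching the sign −2k̂A found above for this (clockwise-with-respect-to-k̂) convention.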
(7) For two scalar fields f (x) and g (x) show that
∫_Γ f (x)∇g (x) · dx + ∫_Γ g (x)∇f (x) · dx = 0.   (11.186)
Solution By Stokes' theorem,

∫_Γ f ∇g · dx = ∫_S n̂ · ∇ × (f ∇g) ds = ∫_S n̂ · (∇f × ∇g) ds + ∫_S f n̂ · (∇ × ∇g) ds = ∫_S n̂ · (∇f × ∇g) ds,
because ∇ × ∇g = 0. Similarly,
∫_Γ g∇f · dx = − ∫_S n̂ · (∇f × ∇g) ds.
(8) Show that ∇ · ∇ × F = 0 using the divergence theorem and Stokes' theorem.
Solution Enclose a volume V by a closed surface S, divided into two caps S1 and S2 with the common boundary curve Γ. By Stokes' theorem,

∫∫_{S1} n̂ · ∇ × F ds1 = ∫_Γ F · dx

and
∫∫_{S2} n̂ · ∇ × F ds2 = − ∫_Γ F · dx.
The sign is reversed while transforming the second integral because the positive
directions around the boundaries of the two surfaces are opposite. Hence,
∫_V ∇ · ∇ × F dτ = 0.
Since this equation holds for all volume elements it follows that ∇ · ∇ × F = 0.
(9) For a closed surface S, show that
(i) ∫_S n̂ · (∇ × F) ds = 0
and
(ii) ∫_S n̂ × ∇f ds = 0.
Hints (i) Divide S into two parts, S1 and S2 , with the common boundary curve Γ
(see Fig. 11.37) and write
∫_S n̂ · (∇ × F) ds = ∫_{S1} n̂ · (∇ × F) ds1 + ∫_{S2} n̂ · (∇ × F) ds2.
Now, apply Stokes' theorem to both terms on the RHS, keeping in mind that the positive sense of traversing Γ as the boundary of S1 is opposite to that of traversing Γ as the boundary of S2. The two terms on the RHS therefore cancel after applying Stokes' theorem, and the result follows.
(ii) Following the hint for part (i) we can write
∫_S n̂ × ∇f ds = ∫_{S1} n̂ × ∇f ds1 + ∫_{S2} n̂ × ∇f ds2.
Now use Eq. (11.184) for the two terms on the RHS and then follow the rest of the hint for part (i).
(10) Show that (with Γ as the boundary of S)
∫_Γ f F · dx = ∫_S n̂ · (∇f × F + f ∇ × F) ds.
(11) If F is continuous and ∫_Γ F × dx = 0 for every closed curve Γ, show that F is constant.
12
and solve for Ω. Note that the time dependence must reside in the operator S. Obviously,

x(t) = S(t)x0

solves Eq. (12.1), giving ẋ = Ṡx0 , where Ṡ is the time derivative of the operator S(t). Note
that we are using the same symbol for the operator and its matrix. This is justified because
they are isomorphic. Differentiating SS T = I we get
ṠS T + S Ṡ T = 0. (12.3)
Here, we have used the fact that the operations of transpose and differentiation commute.
Equations (12.2) and (12.3) lead to
Ωx = ω × x ∀ x ∈ R³ ,

where the matrix of Ω is

⎛  0   −ω3   ω2 ⎞
⎜  ω3    0  −ω1 ⎟
⎝ −ω2   ω1    0 ⎠ .
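The correspondence Ωx = ω × x between the skew-symmetric matrix and the vector ω can be checked directly. A numerical sketch (not from the book; the random test vectors are our own):

```python
import numpy as np

def skew(w):
    # the skew-symmetric matrix built from ω = (ω1, ω2, ω3), as displayed above
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

rng = np.random.default_rng(0)
w = rng.standard_normal(3)
x = rng.standard_normal(3)

print(np.allclose(skew(w) @ x, np.cross(w, x)))   # → True
```

Matrix multiplication by skew(ω) and the cross product ω × x give the same vector, for any x.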
x = e^{θn̂×} x0 .

Writing A = θn̂×, this becomes

x = e^A x0 .   (12.5)
Ω(λ) = (∂e^{λA}/∂t) e^{−λA} ,
where λ is a parameter independent of t. We have Ω(0) = 0 and
∂Ω(λ)/∂λ = Ȧ + [A, Ω(λ)].
∂²Ω(λ)/∂λ² = [A, ∂Ω(λ)/∂λ] = [A, Ȧ] + [A, [A, Ω(λ)]],
∂³Ω(λ)/∂λ³ = [A, [A, Ȧ]] + [A, [A, [A, Ω(λ)]]],
and proceeding iteratively, we get, for the nth derivative,

∂ⁿΩ(λ)/∂λⁿ = [A, [A, [· · · [A, Ȧ] · · · ]]] + [A, [A, [· · · [A, Ω(λ)] · · · ]]],

with (n − 1) factors of A in each nested commutator.
Expanding Ω(λ) in Taylor series in λ about λ = 0 and using the above derivatives we get
Ω(λ) = λȦ + (λ²/2!)[A, Ȧ] + (λ³/3!)[A, [A, Ȧ]] + · · ·
and Eq. (12.7) follows with λ = 1.
We must now evaluate the commutator of two skewsymmetric matrices. This is also a
skewsymmetric matrix. Further, given a vector x ∈ R3 and A, B skewsymmetric operators,
Ax = a×x and Bx = b×x implies [A, B]x = (a×b) ×x, as you can check. Thus, Eq. (12.7)
can be written as (remember A = θ n̂×)
Ωx = ω × x = Σ_{m=0}^{∞} {[(θ×)^m θ̇] / (m + 1)!} × x = Σ_{m=0}^{∞} {[θ^{m+1} (n̂×)^m n̂˙] / (m + 1)!} × x + θ̇ n̂ × x.   (12.8)
Using this equation in Eq. (12.8) and collecting the coefficients of n̂ × n̂˙ and n̂ × (n̂ × n̂˙) we get
ω = θ̇ n̂ + (1 − cos θ) n̂ × n̂˙ − sin θ [n̂ × (n̂ × n̂˙)],

or,

ω = θ̇ n̂ + (1 − cos θ) n̂ × n̂˙ + sin θ n̂˙.   (12.9)
We see that ω ≠ θ̇ n̂ unless n̂˙ = 0. Thus, it is more appropriate to call ω the ‘rotational velocity’ rather than the ‘angular velocity’ of the body.
Odds and Ends 447
Now apply Eq. (12.9) to get the rotational velocity of the blade,
+ω1 (1 − cos ω2 t )(cos ω1 t n̂1 × n̂3 + sin ω1 t n̂2 × n̂3 ) + ω1 sin ω2 t n̂3
where we have put î1 = cos ω1 t n̂1 + sin ω1 t n̂2 , used Eq. (12.10) and the fact that
n̂1 , n̂2 , n̂3 form a right handed system.
Exercise Suppose a rigid body is rotating in space and you know its instantaneous
rotational velocity ω. This does not mean that you know the instantaneous axis of
rotation, because ω specifies only its direction, which corresponds to a continuum of
parallel lines in space. Obtain the equation of the instantaneous axis of rotation in terms
of ω and the instantaneous (inertial) position and velocity vectors of a particle in the rigid
body, which is not on the instantaneous axis of rotation.
Solution [20] In what follows we refer to Fig. 12.2 and use symbols and the quantities
specified in this figure, without defining them in the text, as they are self explanatory. Thus,
you have to read the solution jointly with Fig. 12.2.
Let r and v denote the position and velocity of the particle as specified in the problem. As
all of the velocity v is taken to be rotational, it follows that
ω × Rc = v.

Taking the cross product of both sides with ω we get

ω × (ω × Rc ) = ω × v,

or, expanding the triple product,

ω (ω · Rc ) − Rc (ω · ω ) = ω × v.
Since ω · Rc = 0, this gives

Rc = −(ω × v)/|ω|².
Now
r0 = Rc + λω/|ω| = λω/|ω| − (ω × v)/|ω|²,
where λ is a scalar parameter with the dimensions of length. Hence, we have, with R =
r − r0 ,
R = r − λω/|ω| + (ω × v)/|ω|².
As λ varies, the locus of the tip of this vector generates the line of the instantaneous axis of
rotation for a point moving with velocity v and position vector r in the rigid body. Since the
instantaneous rotational velocity ω is common to the whole rigid body, the instantaneous
axis of rotation we have found is also common to all points in the body.
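The recovery of the axis from ω and one particle's position and velocity can be tested numerically. A sketch (not from the book; the axis point p and the particle position r are our own test choices):

```python
import numpy as np

rng = np.random.default_rng(1)
omega = rng.standard_normal(3)   # instantaneous rotational velocity
p = rng.standard_normal(3)       # some point on the true axis
r = rng.standard_normal(3)       # particle position
v = np.cross(omega, r - p)       # purely rotational velocity of the particle

# the perpendicular vector from the axis to the particle, as derived above
Rc = -np.cross(omega, v) / np.dot(omega, omega)

# the foot of the perpendicular should lie on the true axis,
# i.e. (foot − p) must be parallel to ω
foot = r - Rc
print(np.allclose(np.cross(foot - p, omega), 0.0))   # → True
```

Varying λ along ω from the point `foot` then sweeps out the whole instantaneous axis, as in the text.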
f(0) = 0.
The first term on the RHS vanishes because the origin is an equilibrium point, while the second term is linear in r, that is, it equals (r · ∇)f(0).
Third and further terms are of higher order of smallness and can be neglected. Stability of
the equilibrium point is ensured if we impose r · r̈ < 0, or, equivalently, we use the
following stability condition,
r · ∇f(0) ≤ 0,
mr̈ = r · ∇f(0).
Since this is a second order equation, it has two linearly independent solutions (that is,
they are not proportional to each other) say r1 = r1 (t ) and r2 = r2 (t ). Since it is linear,
any linear combination of these two linearly independent solutions say rn = αn r1 (t )+
βn r2 (t ) is also a solution. This system, governed by a linear force obeying the stability
condition, is called a harmonic oscillator. This superposition principle makes the analysis
of harmonic oscillator manageable. As we will see below, if the force satisfies one
additional requirement of being isotropic, the harmonic oscillator equation can be
integrated to get exact solutions. On the other hand, if we add the third term in the Taylor
series to the equation of motion, the resulting differential equation ceases to be linear. The
mathematical analysis of this so called anharmonic oscillator becomes very difficult and is
generally analysed using perturbation techniques, where the anharmonic term is treated
as a small perturbation to the harmonic one.
Let us now specialize to the case where the force f is not only linear, but also isotropic
or central, that is, it is only a function of the magnitude of r and not of its direction. Thus,
f(r) ≡ (f (r ), 0, 0), expressed in the r̂, θ̂, φ̂ basis. It is straightforward to check that in this
case
r · ∇f(0) = −kr r̂ = −kr, where −k = [df(r)/dr]_{r=0} .
k is called the force constant and gives the strength of the isotropic binding force. Note that, if V (r) is the potential function for the isotropic force, [df (r)/dr]_{r=0} = −[d²V (r)/dr²]_{r=0} = −k. By Lagrange's theorem the potential V (r) has a local minimum at r = 0, so that [d²V (r)/dr²]_{r=0} > 0, making k > 0. This makes −k < 0 and satisfies the stability condition. The force −kr, (k >
0), is commonly called Hooke’s law force after Robert Hooke, who invented it to explain
the elastic force causing oscillations of a spring. However, one has to remember that the
general form of the Hooke's law force is given by the second term in the Taylor expansion of any force field near a stable equilibrium position, thus giving a universal approximation to any force field having a Taylor expansion near a stable equilibrium point. This explains why Hooke's law is so ubiquitous in physics and engineering applications. By the same argument, Hooke's law is not a fundamental force law, but only a very useful approximation.
Thus, we have to solve the equation
r̈ + ω0² r = 0 where ω0² = k/m.   (12.11)
It turns out that bounded orbits of the attractive central force −kr are closed [3, 10, 19], so
that the motion in the vicinity of a stable equilibrium point under such a force is periodic.
Further, note that the torque exerted by −kr on the particle is −kr × r = 0. Therefore, the
angular momentum of the particle must be conserved. This fixes the angular momentum
vector mr×v in space confining the position vector r and the velocity vector v of the particle
to a plane perpendicular to the angular momentum vector. Thus, the motion under such a
central force is planar.
In order to get two linearly independent solutions of Eq. (12.11) let us choose one of
them as the circular orbit obtained by rotating a vector a+ counterclockwise in the plane
of the orbit through the angle ω0 t in time t about the unit vector n̂ perpendicular to the
plane of the orbit. Using Eq. (6.45) we get,

r1 (t) = a+ cos ω0 t + (n̂ × a+ ) sin ω0 t.   (12.12)
To construct the other linearly independent solution we take a vector a− in the plane of the orbit and rotate it clockwise by the angle ω0 t in time t. This amounts to replacing n̂ by −n̂ and a+ by a− in Eq. (12.12). We get,

r2 (t) = a− cos ω0 t − (n̂ × a− ) sin ω0 t.   (12.13)
To get a general solution we add these two linearly independent solutions. We get,

r(t) = a0 cos ω0 t + b0 sin ω0 t,   (12.14)

where a0 = a+ + a− and b0 = n̂ × (a+ − a− ).
The two constant vectors a0 and b0 can have any values. Therefore, Eq. (12.14) is the
general solution of Eq. (12.11). If either a0 = 0 or b0 = 0, the motion becomes one
dimensional with oscillations along the line of the surviving vector. Thus, the motion
ceases to be planar and the unit vector n̂ is not uniquely defined, but any unit vector
normal to a± will do.
The vector coefficients a0 and b0 can be expressed in terms of initial conditions, that
is, the values of the position and velocity vectors at t = 0. Putting t = 0 in the expressions
for r(t ) as in Eq. (12.14) and ṙ(t ) obtained by differentiating Eq. (12.14) with respect to t,
we get
r0 = r(0) = a0, (12.15)
v0 = ṙ(0) = ω0 b0. (12.16)
Exercise Writing the general solution Eq. (12.14) in the form
r(t) = a cos(ω0t + φ0) + b sin(ω0t + φ0) with a · b = 0, (12.17)
find the major axis a and the minor axis b from the initial conditions r0 = r(0) and v0 = ṙ(0). Show that, for 0 < φ0 < π/2,
a = r0 cos φ0 − (v0/ω0) sin φ0,
b = r0 sin φ0 + (v0/ω0) cos φ0,
and that φ0 is given by
tan 2φ0 = 2ω0 r0 · v0 / (v0² − ω0² r0²).
Hint Expand the trigonometric functions in the expression for r(t ) and compare with
Eq. (12.14) to get expressions for a0 and b0 in terms of a and b and invert these equations
to get the result. To get the equation for φ0 use a · b = 0.
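The hint can be checked numerically. The sketch below (plain Python, with illustrative initial conditions of our own choosing, not data from the text) computes φ0, a and b from r0 and v0 and verifies that a · b = 0 and that the initial state is reproduced.

```python
import math

# Check of the exercise (hypothetical initial conditions, chosen so that
# r0.v0 > 0 and hence 0 < phi0 < pi/2).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w0 = 1.0                      # omega_0
r0 = (1.0, 0.0)               # initial position
v0 = (1.0, 2.0)               # initial velocity

# tan(2 phi0) = 2 w0 (r0.v0) / (v0^2 - w0^2 r0^2)
phi0 = 0.5 * math.atan2(2 * w0 * dot(r0, v0), dot(v0, v0) - w0**2 * dot(r0, r0))

a = tuple(ri * math.cos(phi0) - (vi / w0) * math.sin(phi0) for ri, vi in zip(r0, v0))
b = tuple(ri * math.sin(phi0) + (vi / w0) * math.cos(phi0) for ri, vi in zip(r0, v0))

ab = dot(a, b)                # should vanish: a and b are the semi-axes

# r(t) = a cos(w0 t + phi0) + b sin(w0 t + phi0) must reproduce the initial state
r_check = tuple(ai * math.cos(phi0) + bi * math.sin(phi0) for ai, bi in zip(a, b))
v_check = tuple(w0 * (-ai * math.sin(phi0) + bi * math.cos(phi0)) for ai, bi in zip(a, b))
```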
We may eliminate φ0 by taking φ0 = ω0 t0 and then shifting the origin in time to t0 . You
may recognize Eq. (12.17) to be the equation to an ellipse parameterized by φ which we
have encountered before (see section 2.4). a and b respectively give the major axis and the
minor axis of this ellipse (see Fig. 12.3). As we have mentioned above, we can now see that
the elliptic orbit of an isotropic harmonic oscillator is periodic in space, that is, the particle
acquires the same position vector r after a fixed period of time T. However, something more
is true. Both the state variables r and ṙ have exactly the same values at any two times
separated by a fixed time interval T called the period of the motion. We express this by
saying that the motion of the isotropic harmonic oscillator is periodic. For the elliptical
motion, the period T is related to the natural frequency of the oscillator ω0 by
ω0 T = 2π.
The motion over a single period is called an oscillation. The constant φ0 is called phase
of an oscillation beginning at t = 0. The maximum displacement from the equilibrium
point during an oscillation is called its amplitude. For the elliptical motion, the amplitude
is A = |a|.
As we have seen, Eq. (12.14) represents the elliptical motion as a superposition of two
uniform circular motions with opposite senses. This is illustrated in Fig. 12.4. As we can
see from the figure, this relation provides a practical way to construct an ellipse from two
circles.
Exercise Show that the total energy of the oscillator E = ½mṙ² + ½kr² is constant in
time, and hence a constant of the motion. Show further, that E = ½k(a² + b²). In fact
energy is an additive constant of motion, that is, the energy of n > 1 oscillators is the
sum of the energies of individual oscillators. Such additive constants of motion are called
conserved quantities.
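A quick numerical check of the first part (m, k, a0, b0 below are illustrative values, not data from the text): sampling the analytic solution of Eq. (12.11) shows that E is constant and equal to ½k(a0² + b0²).

```python
import math

# Sketch: sample r(t) = a0 cos(w0 t) + b0 sin(w0 t), a solution of Eq. (12.11),
# and evaluate E = (1/2) m |r'|^2 + (1/2) k |r|^2 along the motion.

m, k = 2.0, 8.0
w0 = math.sqrt(k / m)
a0 = (1.5, 0.0)
b0 = (0.0, 0.7)

def energy(t):
    c, s = math.cos(w0 * t), math.sin(w0 * t)
    r = [a * c + b * s for a, b in zip(a0, b0)]
    v = [w0 * (-a * s + b * c) for a, b in zip(a0, b0)]
    return 0.5 * m * sum(x * x for x in v) + 0.5 * k * sum(x * x for x in r)

energies = [energy(0.1 * i) for i in range(100)]
expected = 0.5 * k * (sum(x * x for x in a0) + sum(x * x for x in b0))
```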
Exercise Learn about the damped harmonic oscillator (an oscillator oscillating in a
resistive medium) from a suitable book and try to formulate and solve it using vector
methods. Differentiate between three cases: Light damping, heavy damping and critical
damping.
f(e1 ) = −k1 e1
f(e2 ) = −k2 e2
f(e3 ) = −k3 e3
where k1,2,3 are the positive force constants giving the strength of the binding force along
the three principal directions. Now, the superposition principle tells us that we can resolve
the general motion along the three principal directions. If ri is the component of
displacement along ei , i = 1, 2, 3, then we can write for the equation of motion,
mr̈ = m(r̈1ê1 + r̈2ê2 + r̈3ê3) = −k1r1ê1 − k2r2ê2 − k3r3ê3 = − Σi ki ri êi.
Since ri are orthogonal and êi , i = 1, 2, 3 do not change with time, each component must
independently satisfy
mr̈i = −ki ri , i = 1, 2, 3,
whose solutions must be of the same form as those for the isotropic oscillator restricted to
one dimensional motion. Thus, the general solution to the anisotropic oscillator is
The vector ⟨v⟩(t) is the average velocity of the ball. Note that ⟨v⟩(t) and r(t) have the
same direction. Comparing Eqs (12.19) and (12.21), we get a simple relation between the
actual velocity and the average velocity:
v(t) = ⟨v⟩(t) + ½gt = r/t + ½gt. (12.22)
Figure 12.6 depicts the hodograph given by Eq. (12.21) and also displays Eq. (12.22). We
see that the increment in the velocity of the ball in equal intervals of time is equal.
Fig. 12.7 contains all the information about the projectile motion, so all questions
regarding the motion can be answered by dealing with the triangles in the figure
graphically or algebraically.
First, consider the question of determining the range r of a target sighted in a direction r̂
(not necessarily along the horizontal) which is hit by a projectile, launched with velocity
v0 . This can be done graphically by using the properties of Fig. 12.7. Having laid out v0
on a graph paper (by choosing appropriate units and scale!) as indicated in Fig. 12.7, one
extends a line from the base of v0 , in the direction r̂, to its intersection with the vertical line
extending from the tip of v0 . The length of the two sides of the triangle thus constructed
are then measured, say v1 and v2, to get the magnitudes of ½gt and r/t respectively. This gives
the time of flight t = 2v1 /g and the range r = 2(v1 v2 )/g. How to get the final velocity is
also evident from Fig. 12.7.
To get to our problem, we find the range r algebraically. Crossing Eq. (12.21) with r
we get,
½t(g × r) = r × v0,
giving
t = 2|r × v0|/|g × r|. (12.23)
Again, crossing Eq. (12.21) with (−gt), after some simplification, using Eq. (12.23) for t,
we get
r = (2v0²/g) ((r̂ × v̂0) · (ĝ × v̂0))/|ĝ × r̂|² (12.24)
  = (2v0²/(g cos²φ)) [cos(θ0 − φ) cos(π/2 − θ0) − cos(π/2 − φ)], (12.25)
where θ0 and φ are the angles respectively made by v̂0 and r̂ with the horizontal, as shown
in Fig. 12.7.
Fig. 12.8 Graphical determination of the displacement r, time of flight t and final
velocity v
Now we complete the job in the following two steps. First, for a given v0 and r̂, we find v̂0
which maximizes the range r in the direction r̂ and also find this maximum range,√say rmax .
Using this v̂0 and (r, r̂) as given, we solve for v0 with rmax = r. Note that r = h2 + L2
and r̂ is specified by tan(φ) = h/L.
To find the direction v̂0 which maximizes the range r along r̂, we note that r is
maximum when the RHS of Eq. (12.25) is maximum. Since r̂ and −ĝ are fixed directions,
we have to maximize the second term on the RHS of Eq. (12.25). This is maximum when
π/2 − θ0 = θ0 − φ, which implies θ0 = π/4 + φ/2. Thus, v̂0 is directed along the line bisecting
the angle between r̂ and −ĝ (see Fig. 12.8).
Thus,
v̂0 = (r̂ − ĝ)/|r̂ − ĝ|. (12.26)
Substituting Eq. (12.26) in Eq. (12.24) we get,
rmax = (2v0²/g)(1/|r̂ − ĝ|²) = (v0²/g)(1/(1 + sin φ)). (12.27)
We leave the last equality for you to check. Solving Eq. (12.27) for v0 with rmax = √(h² + L²) = r0 say, (note that sin φ = h/r0), we get
v0 = √(g(r0 + h)).
Using θ0 = π/4 + φ/2 and φ = arctan(h/L) we get
θ0 = π/4 + ½ arctan(h/L).
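The two-step recipe above is easy to test numerically. The sketch below uses assumed target data h = 30 m, L = 40 m (not from the book) and checks that a launch with θ0 = π/4 + ½ arctan(h/L) and v0 = √(g(r0 + h)) indeed hits the target.

```python
import math

# Illustrative target: horizontal distance L, height h.
g = 9.8
h, L = 30.0, 40.0
r0 = math.hypot(h, L)                    # 50 m for these numbers
theta0 = math.pi / 4 + 0.5 * math.atan2(h, L)
v0 = math.sqrt(g * (r0 + h))             # 28 m/s for these numbers

t_hit = L / (v0 * math.cos(theta0))      # time to cover the horizontal distance
y_hit = v0 * math.sin(theta0) * t_hit - 0.5 * g * t_hit**2   # height at that time
```

For these values the target sits exactly at the maximum range in the direction r̂, consistent with rmax = v0²/(g(1 + sin φ)).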
FD = Cv = −mγv.
v̇ = g − γv, (12.28)
or,
(v̇ + γv) = g.
v(t) = g (1 − e^{−γt})/γ + v0 e^{−γt}. (12.29)
The constant γ⁻¹ is called the relaxation time, which is a measure of the time it takes for the
retarding force to make the particle forget its initial conditions. If t ≫ γ⁻¹, then e^{−γt} ≪ 1
so that the first term on the RHS of Eq. (12.29) dominates all others, irrespective of the
value of v0 , giving
v = v∞ = γ −1 g.
The value v∞ is called the terminal velocity, which can also be obtained by putting v̇ = 0
in the equation of motion.
The displacement r of the ball from the origin is found by directly integrating
Eq. (12.29). This gives
r = g (e^{−γt} + γt − 1)/γ² + v0 (1 − e^{−γt})/γ. (12.30)
Let the plane of motion of the ball be the x−y plane with x axis horizontal. Equation (12.30)
gives rise to the equations
x = v0x (1 − e^{−γt})/γ, (12.31)
y = g (e^{−γt} + γt − 1)/γ² + v0y (1 − e^{−γt})/γ. (12.32)
At the end of its range, the ball touches the ground, so y = 0, making the RHS of
Eq. (12.32) equal to zero. This gives a transcendental equation for the time of flight t
which does not have a closed form solution. Assuming t to be sufficiently large so as to
make e−γt small enough, we expand e−γt in powers of t and retain terms only up to
second order so that contribution due to gravity is properly included. We now find the
positive root of the resulting quadratic in t and substitute in Eq. (12.31). Putting
v0x = v0 cos θ0 and v0y = v0 sin θ0, where θ0 is the angle at which the ball is projected
and v0 = |v0|, we find that we have now got an equation expressing the range x as a
function of θ0. To find θ0 for the maximum range, we solve dx/dθ0 = 0. Using the given
data, we get θ0max = 32°.
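Since the chapter's numerical data are not reproduced here, the following sketch uses assumed values of v0 and γ, and solves the transcendental time-of-flight equation numerically (bisection) instead of truncating the exponential; the maximum-range angle comes out below the drag-free 45°, as expected.

```python
import math

# Assumed data (not from the book): v0 = 28 m/s, gamma = 0.3 s^-1.
g, gamma, v0 = 9.8, 0.3, 28.0

def time_of_flight(theta):
    v0y = v0 * math.sin(theta)
    def y(t):   # height from Eq. (12.32), with y measured upward
        return v0y * (1 - math.exp(-gamma * t)) / gamma \
               - g * (math.exp(-gamma * t) + gamma * t - 1) / gamma**2
    lo, hi = 1e-6, 2 * v0y / g + 10     # y(lo) > 0, y(hi) < 0
    for _ in range(80):                  # bisection for y(t) = 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if y(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def rng(theta):  # range from Eq. (12.31)
    t = time_of_flight(theta)
    return v0 * math.cos(theta) * (1 - math.exp(-gamma * t)) / gamma

best_deg = max(range(1, 90), key=lambda d: rng(math.radians(d)))
```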
Equation (12.28) is useful in the analysis of microscopic motions also. For example,
consider an electron (with mass m and charge e) moving in a conductor under the
influence of a constant electric field E. The electron’s motion is retarded by the collisions
with the lattice. We may represent the retardation by the resistive force proportional to the
velocity of the electron. If the resistance is independent of the direction in which the
electron moves, we say that the conductor is an isotropic medium. We can then write the
resistive force in the form −µv, where µ is a scalar constant. We are thus led to the
equation, (compare with Eq. (12.28)),
mv̇ = eE − µv. (12.33)
For times large compared to the relaxation time τ = m/µ, the electron reaches the
terminal velocity
v = (e/µ) E (12.34)
and the result is a steady current in the conductor. The electric current density J is given by
J = N ev, (12.35)
where N is the number density of electrons. Substituting Eq. (12.34) in Eq. (12.35) we get
Ohm’s law
J = σ E, (12.36)
σ = Ne²/µ. (12.37)
Ohm’s law holds remarkably well for many conductors over a wide range of currents. The
conductivity σ and the electron density N can be measured, so µ can be calculated from
Eq. (12.37). Then, the relaxation time can also be calculated and compared with the
measured values. These are in general agreement with the extremely short relaxation
times observed in metals. Thus, Eq. (12.33) is vindicated to some degree. However, we
note that the velocity v in Eq. (12.33) cannot be regarded as the velocity of an individual
electron, whose trajectory must be very irregular as it collides repeatedly with the massive
atoms in the lattice. Thus, v in Eq. (12.33) must be a kind of average electron velocity.
Thus, our classical analysis can describe, if at all, only the average motion in the
microscopic domain. Derivation and explanation of equations like Eq. (12.33), pertaining
to the electron’s motion in a metal, requires statistical mechanics and the basic equations
of quantum mechanics.
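As a rough numerical illustration (the copper values below are standard room-temperature figures assumed for this sketch, not data from the text), the relaxation time indeed comes out extremely short:

```python
# From measured conductivity sigma and electron density N, compute
# mu = N e^2 / sigma (inverting Eq. (12.37)) and the relaxation time tau = m/mu.

e_charge = 1.602e-19      # C
m_e = 9.109e-31           # kg
sigma = 5.96e7            # S/m, copper at room temperature (illustrative)
N = 8.5e28                # conduction electrons per m^3, copper (illustrative)

mu = N * e_charge**2 / sigma
tau = m_e / mu            # comes out around 2.5e-14 s
```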
geff = g + (ω × r) × ω.
Henceforth, we replace geff by g so that whenever we write g we actually mean geff . Also
we neglect the resistance due to air.
From Eq. (12.38) we can compute the effect of Coriolis force on the projectile motion,
treating g to be a constant. The principal source of variation in g is the deviation of earth’s
figure from sphericity and the non-uniformity of its mass distribution (density). Another
reason is the possible fall from great heights (multiples of earth’s radius) which is unrealistic
for a surface to surface projectile. Anyway, here we shall treat g to be a constant. Actually,
in the approximation of constant g and ω, Eq. (12.38) can be exactly solved. In our case,
however, for typical velocities we have 2|v × ω| ≪ g because of the relatively small value
of the angular speed of the earth (ω = 7.29 × 10⁻⁵ radians sec⁻¹).
Thus, a perturbation solution is more useful here and we proceed to get it in the
following way.
We regard the Coriolis term in Eq. (12.38) as a small perturbing force. Then
Eq. (12.38) can be solved by the method of successive approximations. We write velocity v
as an expansion of successive orders in ω,
v = v1 + v2 + v3 + · · · (12.39)
The zeroth order term v1 is required to satisfy the unperturbed equation v̇1 = g, which
integrates to
v1 = gt + v0 , (12.40)
where v0 is the initial velocity. Inserting v to the first order in Eq. (12.38) we get,
v̇1 + v̇2 = g + 2(v1 + v2) × ω.
Neglecting the second order term 2v2 × ω, this reduces to an equation for v2 when v1 is
replaced by the RHS of Eq. (12.40),
v̇2 = 2v1 × ω = 2(gt + v0) × ω.
This integrates to
v2(t) = (g × ω)t² + 2(v0 × ω)t. (12.41)
Substituting Eqs (12.40) and (12.41) in Eq. (12.39) we get the velocity to the first order in
ω as
v(t) = gt + v0 + (g × ω)t² + 2(v0 × ω)t. (12.42)
(r × g) = (v0 × g)t,
or,
t = ((r × g) · (v0 × g))/|v0 × g|². (12.48)
Similarly, again from Eq. (12.46) we have,
½t² = ((r × v0) · (g × v0))/|g × v0|². (12.49)
Note that
r − (1/6)gt² = r [r̂ − (1/3) ((r̂ × v̂0) · (ĝ × v̂0)/|ĝ × v̂0|²) ĝ]. (12.50)
This shows that the two terms in Eq. (12.47) are of the same order of magnitude.
To find the change in range due to the Coriolis force we have to find the component of
∆r in the direction r̂, which is easily obtained from Eq. (12.47) as
r̂ · ∆r = (t³/6) r̂ · (ω × g). (12.51)
Similarly, the vertical deflection is given by
ĝ · ∆r = t r · (ω × ĝ). (12.52)
The vector ω × ĝ is directed west, except at the poles, so both Eqs (12.51) and (12.52) vanish
for trajectories to the north or south. They have maximum values for trajectories to
the west. This is due to the rotation of the earth in the opposite direction while the projectile is in
flight.
In most circumstances, resistive forces have a greater effect on the range and vertical
deflection than the Coriolis force. The lateral Coriolis deflection is more significant as it
will not be masked by resistive forces, that is, the observed lateral deflection is solely due to
Coriolis force, as the resistive forces do not have any component in the lateral direction. Of
course, resistive forces will change ∆r (and also its lateral component) via their influence
on the velocity which in turn governs the Coriolis force.
For a target on a horizontal plane, g · r = 0 and ĝ × r̂ is a rightward unit vector. From
Eq. (12.47), then, the rightward deflection ∆R is given by
∆R = (ĝ × r̂) · ∆r
   = −t (ĝ × r̂) · [ω × (r − (t²/6)g)]
   = −t [(ĝ · ω)(r̂ · (r − (t²/6)g)) − (ĝ · (r − (t²/6)g))(r̂ · ω)]
   = t [−r (ĝ · ω) − (t²g/6)(r̂ · ω)]
   = −rtω · (ĝ + (t²g/(6r)) r̂)
   = −rtω · (ĝ + (1/3) ((r̂ × v̂0) · (ĝ × v̂0)/((ĝ × v̂0) · (ĝ × v̂0))) r̂).
Here, we have used Eq. (12.49). We now use the identity II and Fig. 12.10 to get
∆R = rtω cos λ (tan λ − (1/3) tan α cos φ). (12.53)
For nearly horizontal trajectories (α ≈ 0), the second term in Eq. (12.53) can be neglected,
giving ∆R = rtω sin λ which is positive in the northern hemisphere and negative in the
southern hemisphere. As a general rule, therefore, the Coriolis force tends to deflect
particles to the right in the northern hemisphere and to the left in the southern
hemisphere. However, this rule is violated by highly arched trajectories and Eq. (12.53)
tells us that for a trajectory satisfying
tan α0 = 3 tan λ / cos φ, (12.54)
the Coriolis deflection ∆R vanishes. In the northern hemisphere, deflection will be to the
left for α > α0 and to the right for α < α0 . In the southern hemisphere, these inequalities
reverse.
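Equation (12.53) and the critical elevation of Eq. (12.54) can be checked directly; the range, flight time, latitude and azimuth below are illustrative assumptions, not data from the text.

```python
import math

# Lateral Coriolis deflection, Eq. (12.53):
# Delta_R = r t w cos(lam) (tan(lam) - (1/3) tan(alpha) cos(phi)).

w = 7.29e-5               # rad/s, angular speed of the earth
lam = math.radians(50)    # latitude (assumed)
phi = math.radians(40)    # azimuth of the trajectory (assumed)
r, t = 20000.0, 60.0      # range (m) and time of flight (s), illustrative

def delta_R(alpha):
    return r * t * w * math.cos(lam) * (math.tan(lam) - math.tan(alpha) * math.cos(phi) / 3)

alpha0 = math.atan(3 * math.tan(lam) / math.cos(phi))   # Eq. (12.54)

flat = delta_R(0.0)       # nearly horizontal trajectory: r t w sin(lam)
```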
From Eq. (12.48), the time of flight for a target on the horizontal plane is
t = r/(v0 cos α). (12.55)
Since the projectile is fired due east, φ = π/2, so from Eq. (12.53) we get,
∆R = ω (r²/v0) sec α sin λ (12.57)
and
Change in range = −(1/6) ωg (r/v0)³ sec³ α cos λ.
v = v î.
Here, î, ĵ, k̂ are the unit vectors along x, y, z axes respectively. With φ = π the Coriolis
acceleration ac becomes
which is towards the right of the flow (westward). So the total acceleration of the water is
ac + g (see Fig. 12.11) with ac given by Eq. (12.58). From Fig. 12.11 we see that the angle
made by the resultant ac + g with g (angle α in Fig. 12.11) is given by
tan α = ac/g. (12.59)
Now the water surface must be normal to the vector ac + g, so it makes angle α with the
horizontal. If the level difference is h and width of the river is W we have from Eq. (12.59),
h/W = ac/g,
or,
h = (ac/g) W.
Putting numerical values of all the quantities involved we get the result.
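A sketch of the numerical estimate, with assumed values for the flow speed, latitude and river width, and taking ac = 2vω sin λ for the horizontal Coriolis acceleration (an assumption of this sketch, since Eq. (12.58) is not reproduced here):

```python
import math

v = 1.0                   # m/s, flow speed (assumed)
w = 7.29e-5               # rad/s, angular speed of the earth
lam = math.radians(45)    # latitude (assumed)
W = 1000.0                # m, width of the river (assumed)
g = 9.8

a_c = 2 * v * w * math.sin(lam)   # horizontal Coriolis acceleration (assumed form)
h = (a_c / g) * W                 # level difference: about a centimetre here
```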
ṙ = ṙ r̂ + r r̂˙. (12.60)
H = r × ṙ = r² r̂ × r̂˙, (12.61)
where H is the specific angular momentum (angular momentum per unit mass) which is
conserved. Cross Eq. (12.61) by r̂ on the right so that
(H × r̂)/r² = (r̂ × r̂˙) × r̂ = r̂˙, (12.62)
where we have used the identity I and the fact that r̂ · r̂˙ = 0. We substitute Eq. (12.62) in
Eq. (12.60) to get
ṙ = ṙ r̂ + (H × r̂)/r. (12.63)
To get the acceleration we differentiate Eq. (12.63) with respect to t and again use
Eq. (12.62) and identity I. We have,
r̈ = (r̈ − H²/r³) r̂. (12.64)
Now, we make use of the assumption that the motion is circular. This means H = rv,
where v is the constant speed of the particle on the circle and also ṙ = 0 = r̈. Therefore,
the acceleration is
r̈ = −(v²/r) r̂
and the force is
f = mr̈ = −(mv²/r) r̂. (12.65)
Let us now assume that Kepler's third law is valid, i.e., r³/P² is a constant, say C, where P is
the period of the orbit. For circular motion the period P is related to v by v = 2πr/P or,
v² = 4π²r²/P².
Putting 1/P² = C/r³ in this equation we get
v² = 4π²C/r. (12.66)
Put Eq. (12.66) in Eq. (12.65) to get
f = −(4π²Cm/r²) r̂.
Thus, the conservation of angular momentum and Kepler’s third law mean that, for circular
motion, the force exerted on a moving particle is central, attractive and varies inversely as
the square of the radius of the circle.
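This conclusion is easy to verify numerically: for circular orbits sharing the same Kepler constant C (an illustrative value below), the quantity (mv²/r)·r² should be independent of r and equal to 4π²Cm.

```python
import math

C = 2.0       # r^3 / P^2, the same constant for every orbit (illustrative)
m = 1.0

def force_times_r2(r):
    P = math.sqrt(r**3 / C)        # period from Kepler's third law
    v = 2 * math.pi * r / P        # circular-orbit speed
    return (m * v**2 / r) * r**2   # should equal 4 pi^2 C m, independent of r

vals = [force_times_r2(r) for r in (1.0, 2.5, 7.0, 40.0)]
expected = 4 * math.pi**2 * C * m
```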
Exercise The turning points of a satellite orbit are defined by the condition v · r = 0.
Show that, for a turning point, the conservation of Runge–Lenz vector gives the relation
r = (K/(2E′)) (e − r̂), (12.67)
where E′ is the specific energy (energy per unit mass) and K is the constant in the
gravitational force law.
The conservation of the Runge–Lenz (or the eccentricity) vector e is given by
v × H = K (e + r̂), (12.68)
where H is the angular momentum per unit mass (specific angular momentum). Put
H = r × v and use the identity I to get
v²r − (r · v)v = K(e + r̂). (12.69)
At the turning point r · v = 0, so the second term on the LHS vanishes. Further, v² is related
to E′ by [19]
v² = 2(E′ + K/r). (12.70)
Substitute this expression for v 2 in Eq. (12.69) to get
2(E′ + K/r) r = K(e + r̂),
which easily simplifies to Eq. (12.67). It is instructive to sketch this relation on an elliptic
or hyperbolic orbit. Note that Eq. (12.67) specifies the turning points only in terms of the
conserved quantities.
v × H = K (e + r̂), (12.71)
so it is no surprise that the hodograph, (which is the orbit in the velocity space), follows
directly from it. Take the vector product with H on both sides of Eq. (12.71) to get
H × (v × H) = KH × (e + r̂). (12.72)
H²v − (H · v)H = KH × (e + r̂).
Since H · v = 0, we get,
v = (K/H)(Ĥ × e + Ĥ × r̂). (12.73)
Since H × e is a constant vector, let us put
u = (K/H)(Ĥ × e), (12.74)
so that
v − u = (K/H)(Ĥ × r̂), (12.75)
or, squaring both sides,
(v − u)² = K²/H². (12.76)
This equation describes a circle of radius (K/H ) centered at point u given by Eq. (12.74).
Since the centre of the circle is determined by the eccentricity vector as in Eq. (12.74),
the distance u = |u| of the centre from the origin is used to classify the orbits as shown in
the following table. In the fourth column, we use |K| to make room for both attractive (K > 0) and repulsive (K < 0) inverse square law forces (for example, the Coulomb force between two like charges, where K = −q1q2 < 0), although here we have assumed an attractive inverse square law (Newtonian gravity), as we are dealing with spacecraft and satellites.
Thus, the orbit is an ellipse if the origin is inside the circle, or a hyperbola if the origin is
outside the circle. For an elliptical orbit the hodograph described by Eq. (12.73) is a single
complete circle, as shown in Fig. 12.12. You may check the consistency of Fig. 12.12 with
Eq. (12.73). Notice how, by translating any velocity vector v on the hodograph parallel to itself, we
can determine the corresponding position r on the orbit.
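The circular hodograph can be checked numerically. The sketch below samples states of a Kepler ellipse using the standard polar velocity components (K, H, e below are illustrative values) and verifies |v − u| = K/H at every sampled point.

```python
import math

# Kepler ellipse r = p/(1 + e cos(theta)), with velocity components
# v_r = (K/H) e sin(theta) and v_theta = (K/H)(1 + e cos(theta)).

K, H, e = 1.0, 1.2, 0.44
p = H**2 / K                       # semi-latus rectum

# eccentricity vector along x, angular momentum along z, so u = (K/H) e y_hat
u = (0.0, (K / H) * e)

def velocity(theta):
    vr = (K / H) * e * math.sin(theta)           # radial component
    vt = (K / H) * (1 + e * math.cos(theta))     # transverse component
    rhat = (math.cos(theta), math.sin(theta))
    that = (-math.sin(theta), math.cos(theta))
    return (vr * rhat[0] + vt * that[0], vr * rhat[1] + vt * that[1])

radii = [math.hypot(velocity(2 * math.pi * k / 12)[0] - u[0],
                    velocity(2 * math.pi * k / 12)[1] - u[1]) for k in range(12)]
```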
As an application, we find the orbital distance of a satellite as a function of its velocity. First,
I leave it for you to show, using Eq. (12.73), Eq. (12.74), the fact that H · e = 0 and using
(twice!) identity II that
u · v = (K²/H²)(e² + e · r̂).
Now, we know that the eccentricity is related to the specific energy, that is, energy per unit
(reduced) mass by
e² = 1 + 2E′H²/K².
Therefore, after a bit of rearrangement we get,
u · v − 2E′ = (K²/H²)(1 + e · r̂).
Using the equation to the orbit (in the real space!)
1 + e · r̂ = H²/(Kr)
we finally get
r = r(v) = −K/(2E′ − u · v)
as the orbital distance of a satellite as a function of its velocity. Note that both u and E′
are conserved quantities. Thus, knowledge of u and E′ for a particular orbit enables us to
determine the orbital distance of the satellite if we know its velocity.
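A minimal numerical check of this relation (with an arbitrary, illustrative state vector): build H, e, u and E′ from one state (r, v) and confirm that −K/(2E′ − u · v) returns |r|.

```python
import math

K = 1.0
r_vec = (2.0, 1.0)        # illustrative position
v_vec = (-0.3, 0.5)       # illustrative velocity

r = math.hypot(*r_vec)
H = r_vec[0] * v_vec[1] - r_vec[1] * v_vec[0]        # z-component of r x v

# e = (v x H)/K - r_hat ; with H along z, v x H = (v_y H, -v_x H)
e_vec = (v_vec[1] * H / K - r_vec[0] / r, -v_vec[0] * H / K - r_vec[1] / r)

# u = (K/H)(H_hat x e): H_hat x e rotates e by +90 degrees (H > 0 here)
u = (-(K / H) * e_vec[1], (K / H) * e_vec[0])

E_prime = 0.5 * (v_vec[0]**2 + v_vec[1]**2) - K / r  # specific energy
r_from_v = -K / (2 * E_prime - (u[0] * v_vec[0] + u[1] * v_vec[1]))
```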
K∆e = v × ∆H + ∆v × H, (12.77)
where ∆H = r × ∆v. We use this to determine qualitatively the effect of a radial and a
tangential impulse on a circular orbit. We also get the effect of an impulse perpendicular to
the orbital plane.
As pointed out, the impulsive force will change the velocity from v to v + ∆v
instantaneously, without any corresponding change in r. Therefore, after the impulse the
eccentricity vector will go over to the new (conserved) value given by
K(e + ∆e) = (v + ∆v) × (H + ∆H) − Kr̂. (12.78)
Using the distributive property of the cross product and neglecting terms of higher order
in ∆v, the above expression goes over to
K∆e = v × ∆H + ∆v × H. (12.79)
For a circular orbit e = 0, so after the impulse, if ∆e ≠ 0, then a circular orbit will go
over to an orbit with eccentricity ∆e. For a radial impulse to a circular orbit, as shown in
Fig. 12.13(a), ∆H = r × ∆v = 0, so K∆e = ∆v × H, which is a vector pointing towards east
if the direction of ∆v is north. The resulting elliptical orbit is shown in Fig. 12.13(b).
For a tangential impulse towards west, as shown in Fig. 12.13(c), both the terms in
Eq. (12.77) point towards north, pushing the force centre towards north. The resulting
elliptical orbit is shown in Fig. 12.13(d).
I leave it for you to show that ∆e = 0 for an impulse perpendicular to the plane of the
orbit. So this impulse does not change the shape of the orbit.
Exercise Atmospheric drag tends to reduce the orbit of a satellite to a circle. For a rough
estimate of this effect, suppose that the net effect of the atmosphere is a small impulse at
the perigee which reduces the satellite speed by a factor α (see Fig. 12.14). Show that the
resulting change in the eccentricity is
∆e = −2α(1 + e)ê. (12.80)
For e = 0.9 and α = 0.01 estimate the number of orbits required to get to a circular orbit.
Show that the speed at perigee actually increases with each orbit.
Solution We have to obtain the change in the eccentricity due to impulse at perigee. The
general expression for the change in eccentricity due to an impulse ∆v is given by
Eq. (12.77) with the corresponding definition of ∆H. In this problem the relevant
quantities are,
∆v = −αv, ∆H = r × ∆v = −αH, r = r₊ê, |H| = r₊v₊.
Here, r₊ denotes the distance of perigee from the origin (a focus) and v₊ denotes the speed
at perigee. Putting these expressions in Eq. (12.77) and simplifying, we get,
K∆e = −2αv₊²a(1 − e)ê. (12.81)
To get rid of v₊², note that for r̂ = ê, the conservation law for the eccentricity vector
becomes,
v × H = v₊²a(1 − e)ê = K(e + 1)ê. (12.82)
Substitute for v₊²a(1 − e) from Eq. (12.82) into Eq. (12.81) to get Eq. (12.80). The number
of orbits required to get to a circular orbit, that is, to reduce the eccentricity to zero, with
the given values of α and e is
e/∆e = 0.9/0.038 ≈ 24.
I leave it for you to check the last sentence in the exercise.
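The counting can be sketched as follows. Treating Eq. (12.80) as a per-orbit update with the current eccentricity (an approximation of this sketch) gives a slightly larger count than the book's one-step estimate e/∆e.

```python
alpha, e0 = 0.01, 0.9

# One-step estimate: e / |Delta e| with |Delta e| = 2 alpha (1 + e0)
one_step = e0 / (2 * alpha * (1 + e0))     # 0.9 / 0.038, about 24 orbits

# Updating e after every orbit instead:
e, n = e0, 0
while e > 0:
    e -= 2 * alpha * (1 + e)               # eccentricity change per orbit
    n += 1
```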
v̇ = ω × v. (12.85)
Dotting both sides of Eq. (12.85) with v, we see that d(v · v)/dt = 0, which means that the
magnitude of the velocity of a charged particle moving in constant magnetic field is
invariant in time. Thus, we expect vector v to perform pure rotational motion about the
constant magnetic field B or ω. This is expressed by saying that vector v precesses around
magnetic field B (see Fig. 12.15).
Taking cue from this observation, we resolve v into components parallel and
perpendicular to ω or B as
v = vk + v⊥ . (12.86)
We substitute Eq. (12.86) in Eq. (12.85) to get two equations, one for each of vk and v⊥
v̇⊥ = ω × v⊥ and
v̇k = 0. (12.87)
vk (t ) = v0k , (12.88)
where ω = |ω|.
Exercise Show that v · (ω × e^{ωt ω̂×} v0⊥) = 0.
Hint Show first that ω × e^{ωt ω̂×} v0⊥ = cos ωt (ω × v0⊥) − ω sin ωt v0⊥. Both terms cancel
after dotting with v⊥, because v⊥ · v0⊥ = |v⊥|² cos ωt and v⊥ · (ω × v0⊥) = ω|v⊥|² sin ωt.
From this exercise we find that the vector ω × e^{ωt ω̂×} v0⊥ is normal to both v and ω. Therefore,
it must be proportional to v̇. The proportionality constant is not of any physical
consequence and can be taken to be unity. Thus, the solution to Eq. (12.85) is
To get the trajectory of the particle we have to integrate v(t ) with respect to time. We get,
r(t) = x(t) − x0 = v0⊥ sin(ωt)/ω + (v0⊥ × ω̂) cos(ωt)/ω + v0k t, or,
r(ωt) = e^{ωt ω̂×}(v0 × ω)/ω² + ((v0 · ω)/ω²) ωt, (12.91)
where x0 is the constant of integration, so that the state of the particle at t = 0 is given
by (x0, v0). We have also used (ω × v0k) = 0, and v0k = (v0 · ω̂)ω̂. Equation (12.91) is a
coordinate-free equation of a helix (see Fig. 12.16) with radius
a ≡ (v0 × ω)/ω²
and pitch
b ≡ (v0 · ω)/ω².
We can make Eq. (12.91) look like a helix by expressing it in terms of
Fig. 12.16 (a) Right handed helix (b) Left handed helix
where a · θ = 0. The helix is said to be right handed if b > 0 and left handed if b < 0 (see
Fig. 12.16).
Equation (12.91) gives a circular trajectory if v0k = 0. The radius vector r rotates with
an angular speed ω = |qB|/mc called the cyclotron frequency. Equation (12.84) tells us
that ω has the same (opposite) direction as the magnetic field B when the charge q is
negative (positive). As shown in Fig. 12.17, the circular motion of a negative (positive)
charge is right handed (left handed).
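The helix of Eq. (12.91) can be verified numerically: differentiating it by finite differences must reproduce the rotating velocity e^{ωt ω̂×}v0, whose magnitude stays constant. The ω and v0 below are illustrative.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate(v, axis, angle):
    # Rodrigues rotation of v about the unit vector 'axis'
    c, s = math.cos(angle), math.sin(angle)
    ax_v = cross(axis, v)
    return tuple(c*vi + s*avi + (1 - c)*dot(axis, v)*ai
                 for vi, avi, ai in zip(v, ax_v, axis))

w_vec = (0.0, 0.0, 2.0)                  # omega (illustrative)
w = math.sqrt(dot(w_vec, w_vec))
w_hat = tuple(x / w for x in w_vec)
v0 = (1.0, 0.5, 0.8)                     # initial velocity (illustrative)

def r(t):    # Eq. (12.91), up to the constant x0
    circ = tuple(x / w**2 for x in rotate(cross(v0, w_vec), w_hat, w * t))
    drift = tuple(dot(v0, w_vec) / w**2 * wi * t for wi in w_vec)
    return tuple(a + b for a, b in zip(circ, drift))

def v(t):    # rotating velocity e^{wt w_hat x} v0
    return rotate(v0, w_hat, w * t)

t, dt = 0.7, 1e-6
deriv = tuple((rp - rm) / (2 * dt) for rp, rm in zip(r(t + dt), r(t - dt)))
v_t = v(t)
speed_err = abs(math.sqrt(dot(v_t, v_t)) - math.sqrt(dot(v0, v0)))
```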
v̇ = g + ω × v. (12.96)
As in the case of uniform magnetic field, we resolve each vector in this equation into its
components parallel and perpendicular to ω so that,
v = vk + v⊥ ,
g = gk + g⊥ . (12.97)
v̇k = gk ,
v̇⊥ = g⊥ + ω × v⊥ . (12.98)
Let the velocity at t = 0 be v(0) = v0 which is also resolved parallel and perpendicular
to ω:
The first of Eq. (12.98) with initial condition Eq. (12.99) can be readily integrated to give,
vk(t) = gk t + v0k = (g · ω)ω⁻¹ t + v0k = (q/m)Ek t + v0k = bt + v0k, say, (12.100)
where ω⁻¹ = ω/|ω|² (see subsection 1.7.1).
To integrate the second of Eq. (12.98) with initial condition Eq. (12.99), we re-write it,
using identity I and the fact that g⊥ · ω = 0, as follows.
v̇⊥ = ω × (v⊥ + g⊥ × ω⁻¹). (12.101)
Equation (12.101) is the same as the first of Eq. (12.87) with v⊥ replaced by the expression
in the square bracket, which is given by adding a constant vector to v⊥ . Therefore, it can
be solved in a similar way and is given by
v⊥(t) = e^{ωt ω̂×} a + c (12.102)
with
c = ω⁻¹ × g⊥.
Since v⊥(0) = v0⊥, we must have
a = v0⊥ − c. (12.103)
Noting that (g⊥ × ω −1 ) = (g × ω −1 ) and combining Eqs (12.100), (12.102) and (12.103),
we can write the solution of Eq. (12.96) as
v(t) = e^{ωt ω̂×} a + bt + c, (12.104)
where the vectors a and b are defined above and the vector c is re-defined as
c = ω⁻¹ × g + v0k.
Integrating Eq. (12.104) with respect to time, we get the equation to the path of the charge
q (Exercise) as
r(t) = x(t) − x0 = e^{ωt ω̂×}(a × ω⁻¹) + ½bt² + ct, (12.105)
where x0 is the constant of integration, giving the initial position of the particle to be
x(0) = x0 + a × ω −1 . If we take the origin at x(0), then x0 = ω −1 × a. With this choice of
the origin, r(0) = a × ω −1 , so the vector r at t = 0 lies on the circle of radius a/ω with its
center at the origin and the particle trajectory passes through this point. Note that the
vectors a and a × ω −1 lie in the plane perpendicular to ω, while b is parallel to ω.
It is instructive to write
r(t ) = r1 (t ) + r2 (t ), (12.106)
where
r1(t) = ½bt² + ct, (12.107)
which is an equation to a parabola parameterized by t and
r2(t) = e^{ωt ω̂×}(a × ω⁻¹), (12.108)
which describes the uniform circular motion around the guiding center.
Fig. 12.18 Trajectory of a charged particle in uniform electric and magnetic fields
Thus, we see that the motion of a charged particle under the combined influence of
uniform electric and magnetic fields is the composite of two motions, a parabolic motion
of the guiding center described by Eq. (12.107) and the uniform circular motion around
the guiding center along a circle with radius a/ω, in a plane normal to ω, given by
Eq. (12.108). The composite motion corresponding to Eq. (12.106) can be viewed as the
motion of a point on a spinning disc whose axis is aligned with the vertical and whose
center is traversing a parabola. This is depicted in Fig. 12.18 and the corresponding
directions of the electric and magnetic fields are shown in Fig. 12.19.
Fig. 12.19 Directions of electric and magnetic fields for Fig. 12.18
Fig. 12.20 Trochoids traced by a charge q when the electric and magnetic fields are
orthogonal
The position vector of the particle relative to the guiding center repeats itself after a period
of 2π/ω = 2πmc/|qB|. Thus, after every such period, the net change in r(t ) can be
viewed as a result of only the motion of the guiding center along the parabola. This fact is
expressed by saying that the motion about the guiding center averages to zero over a
period of 2π/ω. So motion of the guiding center can be regarded as an average motion of
the particle. Accordingly, the velocity of the guiding center is called the drift velocity of
the particle.
v0k = 0 so that
ṙ1 = c = ω −1 × g = d E × B−1 .
Thus, the drift velocity is perpendicular to both the electric and the magnetic field. The
particle trajectory is the composition of the drift motion of the center of a circle and the
uniform circular motion of a point on this circle. The resulting path of the particle is the
curve traced out by a point on a disc at a distance a/ω from the center, rolling without
slipping with its center drifting along vector c with drift speed |c| = |ω −1 × g| = d |E×
B−1 | = d |E|/|B| and angular speed ω = −q|B|/mc. This curve is, in general, a trochoid
we described in subsection 9.3.2. Now if r2 is the position vector of the dot on the rolling
disc which traces the path of the charged particle, then its linear velocity must match with
that of the particle, namely c. Thus, we require that
|ω × r2 | = |c|.
v0 = ω −1 × g = d E × B−1 . (12.109)
A charged particle entering the field region with the velocity given by Eq. (12.109) will continue moving in its original straight line without any deflection. The E and B fields can be adjusted to select a large range of velocities. The selection is independent of the sign of the charge or the mass of the particle.
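A sketch of the velocity selector in SI units (where the drift velocity is E × B/B²; the field values below are illustrative assumptions): for v = E × B/B² the Lorentz force q(E + v × B) vanishes identically, for either sign of the charge.

```python
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

E = (0.0, 1.0e4, 0.0)        # V/m (assumed)
B = (0.0, 0.0, 0.5)          # T (assumed)
B2 = sum(b * b for b in B)

v_sel = tuple(c / B2 for c in cross(E, B))   # selected velocity: 2e4 m/s along x

def lorentz_force(q, v):
    return tuple(q * (e + c) for e, c in zip(E, cross(v, B)))

# zero force for both signs of the charge
forces = [lorentz_force(q, v_sel) for q in (1.6e-19, -1.6e-19)]
```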
∇2 φ = 0,
or, the potential φ(x) satisfies the Laplace equation in two dimensions
∂²φ/∂x² + ∂²φ/∂y² = 0.
A function ψ(x) which forms a pair of harmonic functions with φ(x) also satisfies
∇²ψ = 0.
The two can be combined into a function of the complex variable z = x + iy,
f(z) = φ(x, y) + iψ(x, y).
∫_C f(z) dz = ∫_C (φ dx − ψ dy) + i ∫_C (ψ dx + φ dy).
For an irrotational flow derivable from a potential, we expect this integral to be independent
of the chosen curve C and be a function only of the end point coordinates. This is possible
if and only if φ(x, y ) and ψ (x, y ) satisfy
∂φ ∂ψ ∂ψ ∂φ
= , and =− ,
∂x ∂y ∂x ∂y
which are the Cauchy–Riemann conditions, necessary and sufficient for the function f (z )
to be analytic. We can turn around and say that the real and imaginary parts of an analytic
function represent a 2-D irrotational steady flow of an incompressible fluid, as all analytic
functions satisfy the Cauchy–Riemann conditions.
It is easy to see that at all points
∇φ · ∇ψ = (î ∂φ/∂x + ĵ ∂φ/∂y) · (î ∂ψ/∂x + ĵ ∂ψ/∂y)
        = (∂φ/∂x)(∂ψ/∂x) + (∂φ/∂y)(∂ψ/∂y)
        = 0
by virtue of the Cauchy–Riemann conditions. Thus, the equipotential surfaces for φ and ψ
at each point are perpendicular to each other. If φ(x, y ) is taken to be the velocity potential,
then the velocity q = −∇φ must be along the line of constant ψ. Such a curve, with its
tangent given by ∇φ, is called the stream line. By Bernoulli’s theorem (see for example,
[19]), the stream function is constant along all stream lines. So ψ can be treated as the
stream function of the problem.
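These relations can be verified numerically. The following Python/NumPy sketch (illustrative, not part of the text) takes the analytic function f(z) = z², for which φ = x² − y² and ψ = 2xy, and checks the Cauchy–Riemann conditions and the orthogonality ∇φ · ∇ψ = 0 at randomly chosen points:

```python
import numpy as np

# For the analytic function f(z) = z^2, phi = x^2 - y^2 and psi = 2xy.
# Check the Cauchy-Riemann conditions and grad(phi) . grad(psi) = 0.
rng = np.random.default_rng(0)
x, y = rng.uniform(-2, 2, size=(2, 100))

dphi_dx, dphi_dy = 2 * x, -2 * y
dpsi_dx, dpsi_dy = 2 * y,  2 * x

assert np.allclose(dphi_dx, dpsi_dy)    # d(phi)/dx =  d(psi)/dy
assert np.allclose(dpsi_dx, -dphi_dy)   # d(psi)/dx = -d(phi)/dy
# Equipotentials and stream lines cross at right angles:
assert np.allclose(dphi_dx * dpsi_dx + dphi_dy * dpsi_dy, 0)
```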
We will now pick up some analytic functions and see what type of flow patterns they
represent.
(i) f (z ) = z2 .
The flow pattern is depicted in Fig. 12.21. This is the flow pattern expected around a
rectangular corner, formed by the positive halves of the x and y axes.
(ii) f (z ) = zn , n > 2.
Here, φ = r n cos nθ and ψ = r n sin nθ.
This corresponds to a flow pattern around an angle α = π/n. The case with n = 3
is shown in Fig. 12.22.
(iii) f (z ) = A√z,
This gives
2φ2/A2 = 2r cos2 (θ/2) = r (1 + cos θ ) = r + x
and
2ψ 2/A2 = 2r sin2 (θ/2) = r (1 − cos θ ) = r − x.
Hence, φ = constant and ψ = constant are the confocal and coaxial parabolas
respectively (see Fig. 12.23). This corresponds to a flow turning around the edge of a
semi-infinite plane sheet.
(iv) f (z ) = −M/(2πz ),
M being a real constant. This gives,
φ = −M cos θ/(2πr ) and ψ = M sin θ/(2πr ).
Fig. 12.24 Two-dimensional flow around a 2-D doublet source consisting of a source
and a sink of equal strength, at an infinitesimal separation
The resulting flow pattern is shown in Fig. 12.24. This flow represents a doublet source
with a source and a sink sitting at the origin. The streamlines resemble the field lines
of a dipole, and the source strength M plays the role of the dipole moment.
f (z ) = q0 z. This gives the uniform stream with stream velocity q0 in the direction of
the negative x axis.
Appendices
A
In this appendix we develop the theory of matrices and determinants, as required by this
book, emphasizing their connection with vectors. This approach is not coordinate-free: we
have to represent vectors by their coordinates with respect to some basis. This approach has
the advantage of being easily generalizable to higher dimensional spaces. Our interest in
matrices and determinants stems from their role in understanding and computing with
linear operators and their connection with the orientations of triplets of vectors and
of surfaces. In the course of this appendix we may re-derive some of the results we have
obtained in the text. Of course, this appendix can be used to explain all instances where
we have used matrices and/or determinants. The theory of matrices is an independent, fully
developed branch of mathematics worthy of an independent, rewarding and fruitful study.
We recommend [12] for such a study.
where Mn×1 is the space of n × 1 matrices called column vectors. For an orthonormal basis
êk k = 1, . . . , n in En we have the correspondence
êk ↔ [0 0 · · · 1 · · · 0]T ; k = 1, . . . , n (A.2)
where for êk , 1 occurs in the kth row. The transpose of a vector x is defined by xT =
(x1 x2 . . . xn ). The transpose of a column vector is the corresponding row vector. Both the
column vectors representing {êk }, k = 1, . . . , n and the row vectors representing {êTk }, k =
1, . . . , n are called “coordinate vectors”.
Exercise Show that the set of all m × n real matrices forms a linear space of dimension
mn.
Hint Show that this set is isomorphic with the space of all mn-tuples, namely Rmn .
The rows of a m × n matrix A can be identified with the vectors a1 , a2 , . . . , am as the vectors
in Rn ,
A = [ a1 ; a2 ; . . . ; am ] , (A.3)
the equation
Ax = y (A.5)
Viewed as a system of simultaneous equations, Eq. (A.5) connects the components
(x1 , . . . , xn ) of the vector x with respect to the basis of vectors defined in the last equation
in an n-dimensional subspace to the components (y1 , . . . , ym ) of the same vector with
respect to the basis êk ; k = 1, . . . , m. Thus, in this case Eq. (A.5) becomes a passive
transformation transforming the components of the same vector from one basis to the
other.
We can also view Eq. (A.5) as an active transformation or as a map or a linear operator
A : En 7→ Em mapping vectors x ∈ En to vectors y ∈ Em . If we shift the origin by a constant
vector b then Eq. (A.5) becomes
y = Ax + b (A.8)
Equation (A.8) defines an affine transformation. This is the most general result of the action
of a matrix on a vector.
As an example, the matrix
A = [ 2/3 −1/3 ; −1/3 2/3 ; −1/3 −1/3 ] (A.9)
can be actively interpreted as a mapping of vectors x = (x1 , x2 ) in the (x1 , x2 ) plane onto
the vectors y = (y1 , y2 , y3 ) in the plane defined by
y1 + y2 + y3 = 0
492 Appendices
which is perpendicular to the vector N = (1, 1, 1) and which we call π. Geometrically, the
point (y1 , y2 , y3 ) is obtained by projecting the point (x1 , x2 , 0) perpendicularly onto the plane
π. Alternatively, the corresponding system of equations
y1 = (2/3)x1 − (1/3)x2 ; y2 = −(1/3)x1 + (2/3)x2 ; y3 = −(1/3)x1 − (1/3)x2
can be interpreted passively as a parametric representation of the plane π, with x1 , x2 as
parameters.
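A quick numerical check of both interpretations (a Python/NumPy sketch, not part of the text): for an arbitrary x, the image y = Ax lies in the plane π, and it differs from (x1 , x2 , 0) by a multiple of the normal N = (1, 1, 1), i.e., the projection is perpendicular.

```python
import numpy as np

# The 3x2 matrix A of Eq. (A.9), mapping (x1, x2) to a point of the
# plane y1 + y2 + y3 = 0.
A = np.array([[ 2/3, -1/3],
              [-1/3,  2/3],
              [-1/3, -1/3]])

x = np.array([1.7, -0.4])        # arbitrary test point
y = A @ x

# y lies in the plane pi ...
assert abs(y.sum()) < 1e-9
# ... and y - (x1, x2, 0) is parallel to the normal N = (1, 1, 1),
# so (x1, x2, 0) is projected perpendicularly onto pi.
d = y - np.array([x[0], x[1], 0.0])
assert np.allclose(np.cross(d, np.ones(3)), 0)
```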
Given a scalar λ we have,
λA = [λaij ] ; i = 1, . . . , m ; j = 1, . . . , n.
Two matrices of the same size can be added. The ijth element of the matrix obtained by
adding A and B is the addition of the ijth elements of the matrices A and B :
A + B = [aij + bij ]
C = A + B implies cij = aij + bij . Thus, we can construct a linear combination Σk λk Ak ,
where Ak , k = 1, . . . are matrices of the same size, say m × n, and λk are scalars. Addition
of matrices is associative, (A + B) + C = A + (B + C ) and commutative, A + B = B + A. It
is distributive with respect to the multiplication by a scalar. That is, λ(A + B) = λA + λB
and (α + β )A = αA + βA, α, β, λ being scalars.
Two matrices can be multiplied provided the number of columns of the left multiplier
equals the number of rows of the right multiplier. Then the ijth element of the product is
cij = Σk aik bkj .
That is, the ith row of A is elementwise multiplied with the jth column of B and the
corresponding products are summed over, to get the ijth element of the product C = AB.
Note that, in general, AB ≠ BA, that is, the matrix product is not commutative. In fact, only
one of the products AB or BA may be defined while the other is not.
Product of matrices can be understood via the composition of mappings. If y = Ax
is the map A : Em 7→ En defined by the matrix An×m = [aji ] then by linearity, as shown
above, its explicit form is
yj = Σ_{i=1}^{m} aji xi .
Now suppose Bp×n = [bkj ] defines a map z = By, En 7→ Ep , then the vector z is given by
zk = Σ_{j=1}^{n} bkj yj = Σ_{j=1}^{n} Σ_{i=1}^{m} bkj aji xi = Σ_{i=1}^{m} cki xi ,
where
cki = Σ_{j=1}^{n} bkj aji ; k = 1, . . . , p; i = 1, . . . , m.
Thus, z = Cx where C = BA = [cki ] is the matrix with p rows and m columns defined by
the last equation. Accordingly, we take the matrix C defined above to be the product BA of
matrices A and B in that order.
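The composition rule just derived can be illustrated with a short NumPy check (illustrative; random matrices stand in for A and B):

```python
import numpy as np

# Matrix product as composition of linear maps: if y = A x and z = B y,
# then z = (BA) x, with (BA)_{ki} = sum_j b_{kj} a_{ji}.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # A : E^3 -> E^4
B = rng.standard_normal((2, 4))   # B : E^4 -> E^2
x = rng.standard_normal(3)

assert np.allclose(B @ (A @ x), (B @ A) @ x)
# The composite map goes E^3 -> E^2, so BA is a 2 x 3 matrix.
assert (B @ A).shape == (2, 3)
```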
The matrix product is associative and distributive with respect to matrix addition. Thus,
for three matrices A, B, C with appropriate sizes,
(AB)C = A(BC )
and
A(B + C ) = AB + AC.
Note that, in the last equation, matrices B and C must be of the same size, so if the product
AB is defined, so is AC. The last equation also holds with the multiplications in the reverse
order: (B + C )A = BA + CA.
For the mappings of vectors determined by matrices, we can write A(Bx) = (AB)x.
From the definition of the scalar product of two vectors in terms of their coordinates, we
see that x · y = xT y where x and y are the column vectors (n × 1 matrices) representing the
vectors x and y. For an orthonormal basis {êk }, k = 1, . . . , n we have
êi · êk = [0 0 · · · 1 · · · 0] [0 0 · · · 1 · · · 0]T = { 0 for i ≠ k, 1 for i = k } , (A.10)
where 1 is at ith place in the left multiplier and at kth place in the right multiplier. Thus,
coordinate vectors are orthonormal, as they should be. In general, for any two orthogonal
vectors, we have,
x · y = xT y = 0.
The transpose AT = [aTij ] of a matrix A = [aij ] has elements aTij = aji . Powers of a square
matrix are defined by A2 = AA, A3 = AAA, · · · .
The zero matrix O of order n is the matrix all of whose elements are zero. All the rows
(columns) of zero matrix are zero vectors 0 = (0, 0, . . . , 0)T of n dimensional space. It has
the obvious properties
A + O = A = O + A, AO = OA = O
and Ox = 0 for all x ∈ En .
The unit matrix of order n, denoted I is the matrix representing the identity mapping
Ix = x for all x ∈ En .
I êk = êk , k = 1, 2, . . . , n,
from which we can conclude that the column (row) vectors in I are given by the coordinate
vectors as in Eq. (A.2).
I = (ê1 , ê2 , · · · , ên ) = [ 1 0 0 · · · 0 ; 0 1 0 · · · 0 ; . . . ; 0 0 0 · · · 1 ] . (A.11)
The nth order unit matrix I is the multiplicative identity for matrix multiplication. That is,
IA = AI = A
A−1 A = I = AA−1
is called the inverse of A. An nth order matrix A for which A−1 exists is called invertible. We
state and prove the following properties of an nth order invertible matrix.
(i) The inverse of an nth order invertible matrix A is unique.
Proof If possible, let B and C be two distinct inverses of A satisfying AB = BA =
I = AC. Then we have,
B − C = BA(B − C ) = B(AB − AC ) = BO = O
so that B = C.
(ii) An nth order matrix A is invertible if and only if Ax = 0 implies x = 0, or, if and only
if x ≠ 0 implies Ax ≠ 0.
Proof (if part). We are given that Ax = 0 implies x = 0. We show that the
corresponding map A : En 7→ En is both one to one and onto and hence invertible. If
possible, let x1 ≠ x2 with Ax1 = Ax2 . This means, by linearity of A, that
A(x1 − x2 ) = 0 so that A maps a non-zero vector x1 − x2 to the zero vector,
contradicting the axiom. Therefore, Ax1 = Ax2 implies x1 = x2 or, in other words,
A is one to one. Since the images of two distinct vectors in En under the map A are
distinct, and since the map A is defined for all vectors in En , the image set of A
is the whole of En or, in other words, A is onto. Therefore, the inverse of
the map A exists and the corresponding matrix is the inverse of the matrix A.
(only if part). We are given that A is invertible. Then Ax = 0 =⇒ A−1 Ax =
0 =⇒ x = 0. A matrix mapping a non-zero vector to the zero vector is called
singular. Thus, a matrix is invertible if and only if it is non-singular.
(iii) An nth order matrix A is invertible if and only if its determinant is not zero.
Proof (if part) The determinant of a square matrix is the product of its eigenvalues.
If the determinant is zero, then at least one of the eigenvalues of A is zero. Since the
eigenvector is non-zero, the corresponding eigenvalue equation reads Ax = 0x = 0,
so that A maps a non-zero vector to the zero vector and hence cannot be
invertible. Alternatively, if det(A) ≠ 0, the system AX = Y has the unique solution
X = BY. Substituting these two equations into each other we get AB = I = BA,
which means B = A−1 .
(only if part) We are given that A is invertible. Therefore, A−1 A = I so that
det(A−1 A) = det(A−1 ) det(A) = det(I ) = 1, which means det(A) ≠ 0.
(iv) An nth order matrix A is invertible if and only if it maps every basis to some basis.
Proof (if part) We are given that A maps a linearly independent set x1 , x2 , . . . , xn to
the linearly independent set Ax1 , Ax2 , . . . , Axn . Consider x = Σ_{k=1}^{n} ak xk such that
(v) An nth order matrix A is invertible if and only if the column vectors of A are linearly
independent.
Proof From Eq. (A.7) it is clear that Ax = 0 for x ≠ 0 if and only if the column
vectors of A are linearly dependent.
Exercise Show that a matrix is singular if and only if its determinant vanishes.
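Properties (ii), (iii) and (v) can be illustrated numerically (a NumPy sketch with hypothetical vectors, not from the text): a matrix with dependent columns annihilates a non-zero vector and has zero determinant, while one with independent columns is invertible.

```python
import numpy as np

# A matrix with linearly dependent columns is singular: it sends a
# non-zero vector to zero and its determinant vanishes.
a1 = np.array([1.0, 2.0, 3.0])
a2 = np.array([0.0, 1.0, 1.0])
S = np.column_stack([a1, a2, a1 + 2 * a2])  # third column depends on first two

assert abs(np.linalg.det(S)) < 1e-12
# x = (1, 2, -1) is in the kernel: 1*a1 + 2*a2 - 1*(a1 + 2*a2) = 0.
assert np.allclose(S @ np.array([1.0, 2.0, -1.0]), 0)

# With an independent third column the determinant is non-zero and
# the matrix is invertible.
T = np.column_stack([a1, a2, np.array([1.0, 0.0, 0.0])])
assert abs(np.linalg.det(T)) > 1e-12
assert np.allclose(T @ np.linalg.inv(T), np.eye(3))
```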
We have defined and used orthogonal matrices in connection with the rotation of a vector
about a direction in space. The orthogonal matrices correspond to linear operators or
transformations that preserve length or distance between points in space. If two points
P , Q in space with coordinates (xi , yi ), i = 1, . . . , n go over to points P 0 , Q0 , with
coordinates (xi0 , yi0 ), i = 1, . . . , n under an orthogonal transformation defined by the
orthogonal matrix R = [aij ], then we require that
d 2 (P , Q ) = Σ_{i=1}^{n} (xi − yi )2 = Σ_{i=1}^{n} (xi′ − yi′ )2 = d 2 (P ′ , Q′ ). (A.12)
where δjk is the Kronecker delta, which is zero when j ≠ k and is 1 if j = k, or,
aj · ak = δjk . (A.14)
That is, the jth and the kth column vectors of R are orthonormal. Since a set of non-zero,
mutually orthogonal vectors is linearly independent, the n column vectors of R form an orthonormal
basis of the n dimensional space. Thus, every orthogonal matrix is invertible, by virtue of
(v) above. In fact Eq. (A.13) can be written as
Σ_{i=1}^{n} aTji aik = δjk ,
or,
RT R = I = RRT . (A.15)
Rx · Ry = x · y. (A.16)
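Eqs (A.12), (A.15) and (A.16) can be checked for a concrete orthogonal matrix; the sketch below (illustrative, not from the text) uses a rotation about the z axis by an arbitrary angle:

```python
import numpy as np

# A rotation about the z axis is orthogonal: its columns are
# orthonormal, R^T R = I = R R^T, and it preserves scalar products.
t = 0.7  # arbitrary rotation angle
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

assert np.allclose(R.T @ R, np.eye(3))
assert np.allclose(R @ R.T, np.eye(3))

rng = np.random.default_rng(2)
x, y = rng.standard_normal((2, 3))
assert np.isclose((R @ x) @ (R @ y), x @ y)                             # Eq. (A.16)
assert np.isclose(np.linalg.norm(R @ (x - y)), np.linalg.norm(x - y))   # Eq. (A.12)
```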
valid for any m vectors x1 , . . . , xm and scalars λ1 , . . . , λm . In fact, expanding any vector a
in a basis ê1 , · · · , ên we can write
f (a) = f (Σ_{i=1}^{n} ai êi ) = Σ_{i=1}^{n} ci ai ,
where ci are the constant values ci = f (êi ). We define the vector c ≡ (c1 , c2 , . . . , cn ) to get
f (a) = c · a.
Thus, the most general linear form in a vector a is the scalar product of a with a
suitable constant vector c.
A function f (x, y) of two vectors x ≡ (x1 , . . . , xn ), y ≡ (y1 , . . . , yn ) is called a bilinear
form in x, y if f is a linear form in x for fixed y and a linear form in y for fixed x. Thus, we
require that
for any vectors x, y, z and scalars λ, µ. The simplest example of a bilinear form is the scalar
product
f (a, b) = a · b.
Here, the rules Eq. (A.17) reduce to the associative and distributive laws for the scalar
product. More generally, we find,
Thus, we can deal with bilinear forms as we deal with ordinary products in multiplying
out expressions. Using the decomposition of a vector in terms of a basis ê1 , · · · , ên , we get,
for the most general bilinear form in a, b,
f (a, b) = Σ_{j,k=1}^{n} aj bk f (êj , êk ) = Σ_{j,k=1}^{n} cjk aj bk (A.19)
A = [a1 , a2 , . . . , am ] = [ajk ],
where a1 , a2 , . . . , am are its column vectors. Generalizing the bilinear case, the most general
multilinear form in a1 , a2 , . . . , am is given by
f (a1 , a2 , . . . , am ) = Σ_{j1 ,j2 ,...,jm =1,...,n} cj1 j2 ···jm aj1 1 aj2 2 · · · ajm m (A.20)
where cj1 j2 ···jm = f (êj1 , êj2 , . . . , êjm ).
Exercise Write explicitly Eq. (A.20) for m = 3, 4, 5 and n = 3. Construct explicitly the
n × m matrix in each case.
An alternating bilinear form satisfies
f (a1 , a2 ) = −f (a2 , a1 )
and hence
f (a, a) = 0.
and using the fact that f is alternating, the right side of this equation can be written
(a11 a22 − a12 a21 )f (ê1 , ê2 ) = c det [ a11 a12 ; a21 a22 ] = c det(a1 , a2 ), (A.21)
where c = f (ê1 , ê2 ) and we take the last equality as the definition of the determinant of
the second order of the matrix whose columns comprise the components of vectors a1 , a2 .
Thus, every bilinear alternating form of two vectors a1 , a2 in two-dimensional space differs
from the determinant of the matrix with columns a1 , a2 by a constant factor c.
More generally, an alternating bilinear form of two vectors in n-dimensional space can
be written
f (a1 , a2 ) = Σ_{j,k=1}^{n} cjk aj1 ak2 ,
where cjk = f (êj , êk ), so that cjk = −ckj and cjj = 0.
Combining the terms with subscripts which differ only by a permutation, we can express f
as the linear combination of second order determinants.
f (a1 , a2 ) = Σ_{j<k} cjk det [ aj1 ak1 ; aj2 ak2 ] . (A.22)
The alternating function of three vectors, f (a1 , a2 , a3 ) changes sign whenever any two
of its arguments are exchanged. More generally, its sign changes when the number of
exchanges of pairs of its arguments is odd, and it does not change if the number of
such exchanges is even. f vanishes if two of its arguments are equal.
Exercise Construct all possible permutations of the arguments a1 , a2 , a3 of an
alternating form which change its sign and which do not change its sign.
Let
f (a1 , a2 , a3 ) = Σ_{j,k,r=1}^{3} cjkr aj1 ak2 ar3 ,
where, using the conditions under which an alternating form changes or does not change
sign and the conditions under which it vanishes, we have,
cjkr = εjkr f (ê1 , ê2 , ê3 ),
where εjkr are simply the Levi-Civita symbols which by now we know so well.
Exercise Show that εjkr = sign(φ(j, k, r )) where φ(j, k, r ) = (r − k )(r − j )(k − j ).
We can now write the expression for f (a1 , a2 , a3 ) explicitly using the definition of {cjkr }.
We have,
f (a1 , a2 , a3 ) = (a11 a22 a33 + a12 a23 a31 + a13 a21 a32
−a13 a22 a31 − a11 a23 a32 − a12 a21 a33 )f (ê1 , ê2 , ê3 ) (A.23)
or,
f (a1 , a2 , a3 ) = c det [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ] , (A.24)
where c = f (ê1 , ê2 , ê3 ) is a constant. Therefore, the most general trilinear alternating form
in three 3-dimensional vectors a1 , a2 , a3 differs from the determinant of the matrix with
columns a1 , a2 , a3 by a constant factor c. Note that det(ê1 , ê2 , ê3 ) = 1, so that
f (ê1 , ê2 , ê3 ) = c,
as it should be.
Generalization to higher order matrices is now straightforward. Consider an n × n matrix
with column vectors a1 , . . . , an . The coefficients of an alternating multilinear form in these
vectors are
cj1 j2 ···jn = f (êj1 , êj2 , . . . , êjn ) = εj1 j2 ···jn f (ê1 , ê2 , . . . , ên ),
where j1 . . . jn runs over the set of permutations of 1, 2, . . . , n (see the following exercise).
Exercise Show that there are n! terms in the expansion of an nth order determinant given
by Eq. (A.26).
Solution We have to show that there are n! non-zero values of εj1 j2 ···jn . Since no two
values of the subscripts can be the same, we have n choices for j1 , n − 1 choices for j2 , . . . ,
n − k + 1 choices for jk , . . . , so that the total number of distinct non-zero εj1 j2 ···jn is
n(n − 1)(n − 2) · · · (n − k + 1) · · · 1, or n!, which is the same as the number of terms in the
required expansion. This makes the nth order determinant an nth degree form in the ajk
consisting of n! terms.
Exercise Show that the determinant is linear in each of its columns separately.
Since the determinant is an alternating form in the column vectors of the corresponding matrix A, the determinant changes
sign if we interchange any two of its columns. Thus, the determinant of a square matrix A
changes sign if we interchange any two columns of A; in particular, the determinant of a
square matrix A with two identical columns vanishes. Using the linearity of the determinant
in each of its columns separately, we find that multiplying one column of the matrix A by a
factor λ has the effect of multiplying the determinant of A by λ. For example,
det(λa1 , a2 , . . . , an ) = λ det(a1 , a2 , . . . , an ).
det(0, a2 , . . . , an ) = 0,
with the same result for any other column so that the determinant of a matrix A vanishes
if any column of A is the zero vector. Multiplying all elements of A by λ amounts to
multiplying every column of A by λ so that
det(λA) = λn det(A).
Further, det(a1 + λa2 , a2 , . . . , an ) = det(a1 , a2 , . . . , an ) + λ det(a2 , a2 , . . . , an ) =
det(a1 , a2 , . . . , an ), since the matrix (a2 , a2 , . . . , an ) has two identical columns. Generally,
the value of the determinant of the matrix A does not change if we add a multiple of one
column to a different column. However, if we multiply a column by λ and add it to the
same column, then the value of the determinant changes by the factor 1 + λ.
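All of these column-operation rules can be confirmed numerically (a NumPy sketch with a random matrix; illustrative only):

```python
import numpy as np

# Column operations on determinants, checked numerically.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
lam = 2.5

# Multiplying one column by lam multiplies det by lam.
B = A.copy(); B[:, 1] *= lam
assert np.isclose(np.linalg.det(B), lam * np.linalg.det(A))

# det(lam * A) = lam^n det(A) for an n x n matrix.
assert np.isclose(np.linalg.det(lam * A), lam**4 * np.linalg.det(A))

# Adding a multiple of one column to a *different* column leaves det unchanged.
C = A.copy(); C[:, 2] += lam * A[:, 0]
assert np.isclose(np.linalg.det(C), np.linalg.det(A))

# Adding lam times a column to the *same* column scales det by (1 + lam).
D = A.copy(); D[:, 2] += lam * A[:, 2]
assert np.isclose(np.linalg.det(D), (1 + lam) * np.linalg.det(A))
```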
We now show that the determinant of the product of two nth order matrices A and B is
the product of their determinants. To see this, note that if C = AB the resulting matrix C
is given by
C = [ a1 · b1 a1 · b2 · · · a1 · bn ; a2 · b1 a2 · b2 · · · a2 · bn ; . . . ; an · b1 an · b2 · · · an · bn ] , (A.27)
where a1 , a2 , . . . , an are the row vectors of A while b1 , b2 , . . . , bn are the column vectors
of B. From Eq. (A.27) we see that, keeping A fixed, det(C ) is a linear form in column
vectors {bk } of B. Further, this is an alternating form because interchanging two columns
of B corresponds exactly to interchanging the corresponding columns of C. Hence, det(C )
is an alternating multilinear form in the column vectors of the matrix B. Consequently,
det(C ) = γ det(B),
where γ is the value of det(C ) when bk = êk , k = 1, . . . , n, or when B is the unit matrix I.
Now, if B = I, then C = AB = AI = A so that γ = det(A). Thus we get
det(AB) = det(A) det(B).
In the expansion of the determinant (Eq. (A.26)) we can rearrange the
factors in each term according to the first subscripts (e.g., a31 a12 a23 = a12 a23 a31 ), so that
det(AT ) = det(A).
Combining this result with Eq. (A.27) we get, for the matrices A, B defined via their column
vectors, A = (a1 , a2 , . . . , an ) and B = (b1 , b2 , . . . , bn ),
det(A) det(B) = det(AT B) = det [ a1 · b1 a1 · b2 · · · a1 · bn ; a2 · b1 a2 · b2 · · · a2 · bn ; . . . ; an · b1 an · b2 · · · an · bn ] . (A.31)
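A numerical check of det(AB) = det(A) det(B), det(AT ) = det(A) and Eq. (A.31) (illustrative NumPy sketch, not from the text):

```python
import numpy as np

# det(AB) = det(A) det(B), and Eq. (A.31): det(A) det(B) = det(A^T B),
# where (A^T B)_{jk} = a_j . b_k is the matrix of scalar products of the
# column vectors of A and B.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

dA, dB = np.linalg.det(A), np.linalg.det(B)
assert np.isclose(np.linalg.det(A @ B), dA * dB)
assert np.isclose(np.linalg.det(A.T), dA)            # det(A^T) = det(A)
assert np.isclose(np.linalg.det(A.T @ B), dA * dB)   # Eq. (A.31)
```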
AX = Y
where X and Y are n × 1 column vectors and A is an n × n matrix with column vectors
a1 , a2 , . . . , an . This system of equations can be re-expressed as
x1 a1 + x2 a2 + . . . + xn an = y.
If the matrix A is non-singular, we can divide by its determinant and get the solution
x1 , x2 , . . . , xn expressed in terms of determinants:
x1 = det(y, a2 , . . . , an ) / det(a1 , a2 , . . . , an ), x2 = det(a1 , y, . . . , an ) / det(a1 , a2 , . . . , an ),
. . . , xn = det(a1 , a2 , . . . , y) / det(a1 , a2 , . . . , an ).
This is Crammer’s rule for the solution of n linear equations in n unknowns.
Written out as an alternating linear form in the vector c we have (see Eq. (A.23))
det(a, b, c) = z1 c1 + z2 c2 + z3 c3 ,
where
z1 = a2 b3 − a3 b2 = det [ a2 b2 ; a3 b3 ] ,
z2 = a3 b1 − a1 b3 = det [ a3 b3 ; a1 b1 ] ,
z3 = a1 b2 − a2 b1 = det [ a1 b1 ; a2 b2 ] .
det(a, b, c) = c · (a × b).
If we cyclically permute the factors on the right side, we have to interchange the columns
(or rows) of the determinant on the left twice, leaving the determinant invariant. Thus,
c · (a × b) = a · (b × c) = b · (c × a).
The components zi of the vector z = a × b are themselves second order determinants and
hence are bilinear alternating forms of vectors a, b. This immediately leads to the laws of
vector multiplication stated in the text (see Eq. (1.10)).
The property a × a = 0 follows from a × b = −b × a. More generally, the vector product of
two vectors a × b vanishes if a and b are linearly dependent, as we have seen in the text. To
prove this using determinants we note that by Eq. (A.34) a × b = 0 implies
det(a, b, c) = c · (a × b) = 0 for every vector c,
which just means that a, b, c are dependent for all c. Since we can always choose c which is
linearly independent of a, b, we conclude that a × b = 0 implies that a and b are linearly
dependent or are proportional to each other.
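The component formulas for z = a × b, the identity det(a, b, c) = c · (a × b), and the vanishing of a × b for dependent vectors can all be checked numerically (illustrative NumPy sketch):

```python
import numpy as np

# Components of z = a x b as second order determinants, and
# det(a, b, c) = c . (a x b) for the matrix with columns a, b, c.
rng = np.random.default_rng(6)
a, b, c = rng.standard_normal((3, 3))

z = np.array([a[1]*b[2] - a[2]*b[1],
              a[2]*b[0] - a[0]*b[2],
              a[0]*b[1] - a[1]*b[0]])
assert np.allclose(z, np.cross(a, b))

M = np.column_stack([a, b, c])
assert np.isclose(np.linalg.det(M), c @ np.cross(a, b))

# a x b = 0 when a, b are linearly dependent.
assert np.allclose(np.cross(a, 3.0 * a), 0)
```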
where θ is the angle between a and b; |a × b| equals the area of the parallelogram spanned by
a and b. Using the above exercise, the square of the area, A2 , of the parallelogram spanned
by vectors a, b can be written elegantly in terms of a determinant as
A2 = (a · a)(b · b) − (a · b)(b · a) = det [ a · a a · b ; b · a b · b ] . (A.35)
The determinant appearing in this equation is called the Gram determinant of vectors a, b
and denoted Γ (a, b). It is clear from the derivation that
Γ (a, b) ≥ 0
for all vectors a, b and that equality holds only if a and b are linearly dependent.
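In three dimensions, Eq. (A.35) says Γ (a, b) = |a × b|2 , the squared area. A short numerical confirmation (illustrative NumPy sketch, not from the text):

```python
import numpy as np

# Gram determinant of two vectors equals the squared area of the
# parallelogram they span: Gamma(a, b) = |a x b|^2 (Eq. (A.35)).
rng = np.random.default_rng(7)
a, b = rng.standard_normal((2, 3))

G = np.array([[a @ a, a @ b],
              [b @ a, b @ b]])
gamma = np.linalg.det(G)

assert np.isclose(gamma, np.dot(np.cross(a, b), np.cross(a, b)))
assert gamma >= 0

# For linearly dependent vectors the Gram determinant vanishes.
Gd = np.array([[a @ a, a @ (2*a)], [(2*a) @ a, (2*a) @ (2*a)]])
assert abs(np.linalg.det(Gd)) < 1e-9
```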
We can derive a similar expression for the square of the volume V of a parallelepiped
spanned by three vectors a, b, c. This volume V is the product of the area A of one of its
faces multiplied by the corresponding altitude h. Choosing for A the area of the
parallelogram spanned by the vectors a and b, we get
V 2 = h2 A2 = h2 Γ (a, b) = h2 det [ a · a a · b ; b · a b · b ] . (A.36)
Let the vectors a, b, c be the position vectors of the points P1 , P2 , P3 respectively and let P
denote the foot of the perpendicular dropped from P3 to the a, b plane. Then h in Eq. (A.36)
is the length of the vector d = P P3 . The position vector of the point P , say p, lies in the
a, b plane so that
p = λa + µb.
d = c − p = c − λa − µb (A.37)
a · d = 0 = b · d.
λa · a + µa · b = a · c, λb · a + µb · b = b · c. (A.38)
The determinant of this system of equations is just the Gram determinant Γ (a, b). Assuming
a and b to be independent vectors (otherwise V = 0), we have Γ (a, b) ≠ 0. There is, then, a
unique solution λ, µ to Eq. (A.38) and hence a unique vector d perpendicular to a, b plane
with initial point in that plane. The length of that vector is the required distance h so that,
by Eq. (A.37) and using orthogonality of d with vectors a and b, we have,
h2 = c · c − λc · a − µc · b.
This gives the volume V of the parallelepiped spanned by the vectors a, b, c as
V 2 = (c · c − λc · a − µc · b) Γ (a, b). (A.39)
This expression can be written more elegantly as the Gram determinant formed from the
vectors a, b, c:
V 2 = det [ a · a a · b a · c ; b · a b · b b · c ; c · a c · b c · c ] = Γ (a, b, c). (A.40)
We show the identity of Eqs (A.39) and (A.40) for V 2 , using the fact that the value of the
determinant Γ (a, b, c) is unaltered if we subtract from the last column λ times the first
column and µ times the second column. Doing this and using Eq. (A.38) we get,
Γ (a, b, c) = det [ a · a a · b 0 ; b · a b · b 0 ; c · a c · b c · c − λc · a − µc · b ] . (A.41)
Expanding this determinant in terms of the last column leads immediately to the expansion
in Eq. (A.39).
Equation (A.40) shows that the volume V of the parallelepiped spanned by the vectors
a, b, c does not depend on the choice of the face and of the corresponding altitude used in
the computation, because the value of Γ (a, b, c) does not change when we permute a, b, c.
For example, Γ (a, b, c) is invariant under the exchange of the first two rows and the first
two columns.
It follows that
Γ (a, b, c) ≥ 0
for any vectors a, b, c. The equality sign can only hold if either Γ (a, b) = 0 or d = 0. The
first of these equations implies that a and b are dependent. The second of these equations
would mean c = λa + µb so that c depends on a and b. Hence, the Gram determinant
vanishes if and only if the vectors a, b, c are dependent.
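Numerically, Γ (a, b, c) is det(MT M) for the matrix M with columns a, b, c, so by Eq. (A.31) it equals det(M)2 , the squared volume. A short check (illustrative NumPy sketch):

```python
import numpy as np

# Gamma(a, b, c) equals the squared volume of the parallelepiped:
# det of the matrix of scalar products equals det(a, b, c)^2.
rng = np.random.default_rng(8)
a, b, c = rng.standard_normal((3, 3))

M = np.column_stack([a, b, c])
G = M.T @ M          # entries are the scalar products a.a, a.b, ...
assert np.isclose(np.linalg.det(G), np.linalg.det(M) ** 2)
assert np.linalg.det(G) >= 0

# Dependent vectors give a vanishing Gram determinant.
Md = np.column_stack([a, b, 2 * a - b])
assert abs(np.linalg.det(Md.T @ Md)) < 1e-9
```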
Our derivation of the expression for V 2 (Eq. (A.40)) is valid for any n-dimensional
space (n finite). If we restrict to 3-dimensional space, Eq. (A.40) follows immediately from
Eq. (A.31).
∇ · f = (1/r2 ) ∂/∂r (r2 · 1/r2 ) = (1/r2 ) ∂/∂r (1) = 0.
However, at r = 0, 1/r2 blows up and r2 · (1/r2 ) becomes indeterminate. Further, the surface
integral of f(r) over a sphere of radius R, centered at the origin, is
∫ f(r) · ds = ∫ (r̂/R2 ) · (R2 sin θ dθ dφ r̂) = (∫_{0}^{π} sin θ dθ)(∫_{0}^{2π} dφ) = 4π.
Thus, the surface integral remains finite despite the singularity at the origin. Now, we
require on physical grounds that the electrostatic field due to a point charge must obey the
divergence theorem. Hence, we must have
∫ ∇ · f dV = 4π
for any volume containing the origin. Since ∇ · f = 0 everywhere except at r = 0, all the
contribution to this integral must come from ∇ · f at the origin. Thus, ∇ · f has the bizarre
property that it vanishes everywhere except at one point, the origin, and yet its integral over
any volume containing that point is 4π. Such a behavior is not expected of any ordinary
function. The object required to salvage the situation can be constructed as follows. We
require the linear space D of infinitely differentiable (C ∞ ) and square integrable functions
φ : E3 7→ R with compact support. Then the required object is the functional δ3 (r) :
D 7→ R defined via
∫_V φ(r) δ3 (r) dV = φ(0), (B.1)
and
∫_V φ(r) δ3 (r − a) dV = φ(a), (B.2)
where we have assumed that the point 0 ∈ V in the first case, while the point a ∈ V in the
second, failing which the corresponding integrals vanish. Taking φ(r) = 1 in Eq. (B.1)
we get,
∫_V δ3 (r) dV = 1. (B.3)
Of course, all of the above three equations hold unconditionally, if all the integrals are over
all space. The functional δ3 (r) defined via the above three equations is an instance of a
mathematical structure called a distribution, but is given the name ‘Dirac delta function’
after its inventor, P. A. M. Dirac, although it is not a function in the usual sense.
Thus, the apparent paradox regarding the application of the divergence theorem to the
electrostatic field due to a point charge at the origin is resolved if we recognize
∇ · (r̂/r2 ) = 4πδ3 (r), (B.4)
so that
∫ ∇ · (r̂/r2 ) dV = 4π ∫ δ3 (r) dV = 4π.
More generally,
∇ · ((r − r0 )/|r − r0 |3 ) = 4πδ3 (r − r0 ), (B.5)
Since ∇(1/|r − r0 |) = −(r − r0 )/|r − r0 |3 , it follows that
∇2 (1/|r − r0 |) = −4πδ3 (r − r0 ). (B.7)
In order to construct the delta function for one dimensional physical phenomena, we need
the linear space D of functions of a single variable which are continuously differentiable
at all orders, and have compact support. Then, the Dirac delta function is the functional
δ (x ) : D 7→ R defined via
∫_{−∞}^{∞} φ(x ) δ (x ) dx = φ(0), (B.8)
and
∫_{−∞}^{∞} φ(x ) δ (x − a) dx = φ(a). (B.9)
The 3-D delta function δ3 (r) and 1-D delta function δ (x ) can be connected by evaluating
the integral over volume by successive evaluation of three single integrals.
∫_{all space} δ3 (r) dV = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} δ (x )δ (y )δ (z ) dx dy dz = 1,
so that
δ3 (r) = δ (x )δ (y )δ (z ). (B.11)
δ (kx ) = (1/|k|) δ (x ), k ≠ 0.
We change the variables to y = kx giving x = (1/k )y and dx = dy/k. With this change
of variables we get
∫_{−∞}^{∞} φ(x ) δ (kx ) dx = ±(1/k ) ∫_{−∞}^{∞} φ(y/k ) δ (y ) dy = (1/|k|) φ(0),
where ± corresponds to k > 0 and k < 0 respectively, so that ±1/k can be replaced by 1/|k|. This
means
∫_{−∞}^{∞} φ(x ) δ (kx ) dx = ∫_{−∞}^{∞} φ(x ) (1/|k|) δ (x ) dx.
This is the required result.
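The scaling property can also be seen numerically by replacing δ(x) with a narrow Gaussian (a standard "nascent delta"); the sketch below (illustrative, not from the text; the width eps and the test function are arbitrary choices) approximates the integrals on a fine grid:

```python
import numpy as np

# Model delta: d_eps(x) = exp(-x^2/eps^2)/(eps*sqrt(pi)), unit integral.
eps = 1e-3
x = np.linspace(-1.0, 1.0, 2_000_001)
dx = x[1] - x[0]
delta_eps = lambda u: np.exp(-(u / eps) ** 2) / (eps * np.sqrt(np.pi))
phi = np.cos   # smooth test function with phi(0) = 1

# Check: integral of phi(x) * delta(k x) dx -> phi(0)/|k|.
for k in (2.0, -3.0):
    lhs = np.sum(phi(x) * delta_eps(k * x)) * dx
    assert abs(lhs - phi(0.0) / abs(k)) < 1e-4
```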
We can define the derivative of the delta function, denoted δ0 (x ), in the following way. For
φ(x ) ∈ D we write, integrating by parts,
∫_{−∞}^{∞} φ(x ) δ′ (x ) dx = [φ(x )δ (x )]_{−∞}^{∞} − ∫_{−∞}^{∞} φ′ (x ) δ (x ) dx = −φ′ (0),
as the first term on the right vanishes because φ(x ) has compact support; the prime
denotes differentiation with respect to x. Thus we get,
∫_{−∞}^{∞} φ(x ) δ′ (x ) dx = −φ′ (0). (B.12)
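Eq. (B.12) can be checked the same way, with the derivative of a narrow Gaussian standing in for δ′(x) (illustrative sketch; the test function is an arbitrary choice with φ′(0) = 1):

```python
import numpy as np

eps = 1e-3
x = np.linspace(-1.0, 1.0, 2_000_001)
dx = x[1] - x[0]
# Derivative of the nascent delta exp(-x^2/eps^2)/(eps*sqrt(pi)):
dprime = -2.0 * x / eps**2 * np.exp(-(x / eps) ** 2) / (eps * np.sqrt(np.pi))

phi = lambda u: np.sin(u) + 2.0    # phi'(0) = cos(0) = 1
lhs = np.sum(phi(x) * dprime) * dx
# Eq. (B.12): integral of phi * delta' equals -phi'(0) = -1.
assert abs(lhs + 1.0) < 1e-4
```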
(iii) xδ′ (x ) = −δ (x ).
(iv) δ (x2 − a2 ) = (2a)−1 [δ (x − a) + δ (x + a)], a > 0.
(v) ∫ δ (a − x ) δ (x − b ) dx = δ (a − b ).
(vi) f (x )δ (x − a) = f (a)δ (x − a).